I have a Keras neural network classifier.
The model takes a tensor of shape (batch_size, 20, 10) as input and performs binary classification.
I want to improve the classification accuracy, and I think a Mixture of Experts (MoE) mechanism could help me achieve that.
So I tried to create multiple models (experts) and then use an MoE mechanism to select the best K experts to process each (20, 10) input in the batch.
My idea is to implement a Keras layer that behaves like a router.
Something like this:
from keras.layers import Layer

class MixtureOfExperts(Layer):
    def __init__(self, experts: list[Layer], **kwargs):
        super().__init__(**kwargs)
        self.experts = experts
    ...
The layer should behave like a router: when it is called on a batch of inputs, each input should be processed by the K best experts.
How can I complete this class to achieve the desired behavior? Assume every expert has the same output shape.

The idea is as follows:
from keras.layers import Layer, Dense, Flatten
import keras.backend as K

class MixtureOfExperts(Layer):
    def __init__(self, num_experts, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.num_experts = num_experts
        self.output_dim = output_dim
        # Dense layers expect 2D input, so flatten each (20, 10) sample to (200,)
        self.flatten = Flatten()
        self.experts = [Dense(output_dim, activation='softmax') for _ in range(num_experts)]
        self.gating = Dense(num_experts, activation='softmax')

    def call(self, inputs):
        flat = self.flatten(inputs)                       # (batch_size, 200)
        gate_outputs = self.gating(flat)                  # (batch_size, num_experts)
        expert_outputs = [expert(flat) for expert in self.experts]  # each (batch_size, output_dim)
        # Weight each expert's output by its gate probability
        weighted = [gate_outputs[:, i:i + 1] * expert_outputs[i]
                    for i in range(self.num_experts)]     # each (batch_size, output_dim)
        combined_output = K.sum(K.stack(weighted, axis=0), axis=0)  # (batch_size, output_dim)
        return combined_output

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)
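Note that this layer soft-weights all experts rather than selecting only the best K, as the question asks. A minimal top-K variant can be sketched by masking the gate logits below the K-th largest value before the softmax, so the remaining experts get zero weight. This is a sketch assuming the TensorFlow backend; the class name `TopKMixtureOfExperts` and the choice of `tf.math.top_k` for the masking are my own, not from the original question:

```python
import tensorflow as tf
from tensorflow.keras.layers import Layer, Dense, Flatten

class TopKMixtureOfExperts(Layer):
    """Routes each sample to its top-K experts; the others get zero gate weight."""

    def __init__(self, num_experts, top_k, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.num_experts = num_experts
        self.top_k = top_k
        self.output_dim = output_dim
        self.flatten = Flatten()  # (batch, 20, 10) -> (batch, 200)
        self.experts = [Dense(output_dim) for _ in range(num_experts)]
        self.gating = Dense(num_experts)  # raw gate logits, softmax applied after masking

    def call(self, inputs):
        flat = self.flatten(inputs)
        logits = self.gating(flat)                          # (batch, num_experts)
        top_vals, _ = tf.math.top_k(logits, k=self.top_k)   # (batch, top_k), sorted descending
        threshold = top_vals[:, -1:]                        # K-th largest logit per sample
        # Push non-top-K logits to -inf so softmax gives them ~zero weight
        neg_inf = tf.fill(tf.shape(logits), -1e9)
        gates = tf.nn.softmax(tf.where(logits >= threshold, logits, neg_inf), axis=-1)
        expert_outputs = tf.stack([e(flat) for e in self.experts], axis=1)  # (batch, num_experts, output_dim)
        # Weighted sum over the experts axis
        return tf.reduce_sum(gates[..., None] * expert_outputs, axis=1)    # (batch, output_dim)
```

Because the masking keeps the whole computation differentiable (every expert still runs on every sample; only the gate weights are sparse), the layer trains end-to-end with a standard optimizer. Skipping the computation of non-selected experts entirely requires dispatch/scatter logic and is considerably more involved.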