Difference between the CBOW and Skipgram gradients in word2vec?

Question

Why are values of f greater than MAX_EXP or less than -MAX_EXP still taken into account during the CBOW update, but ignored in Skipgram?

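I am looking specifically at Google's implementation of word2vec, but the same behavior has been replicated in many other projects; one of them is here, for more context.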

// CBOW negative sampling gradient calculations  
f = 0;
l2 = target * layer1_size;
for (c = 0; c < layer1_size; c++) f += neu1[c] * syn1neg[c + l2];
// ** here, we still update, but essentially round the value to 1 or 0
if (f > MAX_EXP) g = (label - 1) * alpha;
else if (f < -MAX_EXP) g = (label - 0) * alpha;
else g = (label - expTable[(int)((f + MAX_EXP) * (EXP_TABLE_SIZE / MAX_EXP / 2))]) * alpha;

// ---------------------------

// Skipgram hierarchical softmax gradient calculations
f = 0;
l2 = vocab[word].point[d] * layer1_size;
for (c = 0; c < layer1_size; c++) f += syn0[c + l1] * syn1[c + l2];
// ** here, we don't update if f is outside the range given by MAX_EXP **
if (f <= -MAX_EXP) continue;
else if (f >= MAX_EXP) continue;
else f = expTable[(int)((f + MAX_EXP) * (EXP_TABLE_SIZE / MAX_EXP / 2))];
g = (1 - vocab[word].code[d] - f) * alpha;

Answer 1

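It is an empirical value that limits the size of expTable; otherwise you would have to cover the whole range from negative infinity to positive infinity.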

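The specific value of MAX_EXP is 6. It was chosen by the 3-sigma rule, which says that for a normal distribution 99.7% of values fall within this range; at f = 6 the sigmoid is already that close to 1: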

exp(6)/(exp(6)+1) = 0.9975...
