Why is a numpy random seed used in this function?

Question

I am following a tutorial on using Python in bioinformatics. In this tutorial, a Mann-Whitney U test is performed with the following function.

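`numpy.random.seed` is called on the first line after the imports but is not used anywhere else. What is the purpose of this call, since it doesn't seem to affect the result?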

```python
def mannwhitney(descriptor, verbose=False):
    from numpy.random import seed
    from numpy.random import randn  # imported but never called below
    from scipy.stats import mannwhitneyu
    import pandas as pd

    # Seed numpy's global random number generator
    seed(1)

    # df_2class is a DataFrame defined earlier in the tutorial (global here)
    selection = [descriptor, "Bioactivity_Class"]
    df = df_2class[selection]

    # Descriptor values of the active compounds
    active = df[df.Bioactivity_Class == "active"]
    active = active[descriptor]

    # Descriptor values of the inactive compounds
    inactive = df[df.Bioactivity_Class == "inactive"]
    inactive = inactive[descriptor]

    # Mann-Whitney U test comparing the two groups
    stat, p = mannwhitneyu(active, inactive)

    # Create a result DataFrame for easier interpretation
    alpha = 0.05

    if p > alpha:
        interpretation = "Same distribution (fail to reject H0)"
    else:
        interpretation = "Different distribution (reject H0)"

    results = pd.DataFrame({"Descriptor": descriptor, "Statistics": stat, "p": p,
                            "alpha": alpha, "Interpretation": interpretation},
                           index=[0])

    return results
```

Answer 1

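This is a good question. Seeding numpy makes randomly generated values reproducible: after the seed is set, the global generator produces the same sequence of numbers on every run.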
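As a minimal sketch of that reproducibility (the helper name `draw` is hypothetical), reseeding the legacy global generator with the same value replays the same draws, while a different seed gives a different sequence:

```python
import numpy as np

def draw(seed_value):
    # Reset numpy's legacy global generator, then draw five standard-normal values
    np.random.seed(seed_value)
    return np.random.randn(5)

a = draw(1)
b = draw(1)
c = draw(2)

print(np.array_equal(a, b))  # True: same seed, same sequence
print(np.array_equal(a, c))  # different seed, different sequence
```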

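Suppose you are following a tutorial that generates two random distributions and compares them with a statistical test. Random generation is, by definition, random, so the test would give slightly different results every time you run the cell or script. To avoid this, people like to seed the generator.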
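A sketch of that tutorial-style situation, assuming two synthetic Gaussian samples drawn with `randn` (the function `simulated_test` and the sample parameters are illustrative, not from the tutorial): with a fixed seed the statistic and p-value repeat exactly, and without reseeding each call draws new data.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def simulated_test(seed_value=None):
    # Hypothetical setup: seed only when a value is given
    if seed_value is not None:
        np.random.seed(seed_value)
    # Two synthetic samples with slightly different means
    data1 = 5 * np.random.randn(100) + 50
    data2 = 5 * np.random.randn(100) + 51
    stat, p = mannwhitneyu(data1, data2)
    return stat, p

print(simulated_test(1) == simulated_test(1))  # True: identical data, identical result
print(simulated_test() == simulated_test())    # new random data on each call
```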

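In your case, however, the data going into the Mann-Whitney test is probably deterministic, i.e. supplied from outside the function through `descriptor` (the column name) and `df_2class` (the DataFrame). If that data is synthetic for some reason and is generated after the seed is set, the seed ensures that your p-value and statistic, like the underlying synthetic data, are exactly the same between independent runs.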

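If the data is actually static/deterministic, the seed is effectively useless, because nothing is randomly generated. In the function as shown, `randn` is imported but never called, and `mannwhitneyu` does not draw random numbers with its default settings.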
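To illustrate, here is a sketch with fixed, made-up descriptor values standing in for a static `df_2class` (the numbers are hypothetical): whatever seed is set beforehand, `mannwhitneyu` returns the same statistic and p-value.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical fixed descriptor values for the two classes
active = np.array([2.1, 3.4, 1.9, 4.2, 3.3, 2.8])
inactive = np.array([4.5, 5.1, 3.9, 6.0, 5.5, 4.8])

results = []
for seed_value in (1, 42, 12345):
    np.random.seed(seed_value)  # has no effect: the test itself draws no random numbers
    results.append(mannwhitneyu(active, inactive))

# Every run returns the same statistic and p-value
same = all(r.statistic == results[0].statistic and r.pvalue == results[0].pvalue
           for r in results)
print(same)  # True
```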

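Your best bet is to look at what `descriptor` and `df_2class` do to produce the resulting `df` variable, to see whether the seed is useful in this particular definition.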
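One way to check this empirically is sketched below; the helper `seed_matters` and the two example functions are hypothetical. It sets different global seeds, calls a function that returns a DataFrame, and reports whether the outputs differ. To apply it to the tutorial's `mannwhitney`, you would first have to remove its internal `seed(1)` call, since that call would reset the generator on every run anyway.

```python
import numpy as np
import pandas as pd

def seed_matters(func, seeds=(0, 1, 2)):
    """Return True if func() gives different DataFrames under different global seeds."""
    outputs = []
    for s in seeds:
        np.random.seed(s)
        outputs.append(func())
    return not all(out.equals(outputs[0]) for out in outputs[1:])

def static_frame():
    # Deterministic: ignores the random generator
    return pd.DataFrame({"x": [1.0, 2.0, 3.0]})

def random_frame():
    # Draws from numpy's global generator
    return pd.DataFrame({"x": np.random.randn(3)})

print(seed_matters(static_frame))  # False
print(seed_matters(random_frame))  # True
```

With the seed line removed, a call such as `seed_matters(lambda: mannwhitney("MW"))` (where `"MW"` is a hypothetical descriptor column) should return `False` if the input data is static.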
