How to insert statistical annotations (stars or p-values)

Question

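This seems like a trivial question, but I have been searching for a while and cannot find an answer. It also seems like something that should be a standard part of these packages. Does anyone know whether there is a standard way to include statistical annotations between distribution plots in seaborn?

For example, between two box plots or swarm plots?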

Answer 1

The bracket can be drawn directly with matplotlib.pyplot.plot or matplotlib.axes.Axes.plot, and the annotation text can be added with matplotlib.pyplot.text or matplotlib.axes.Axes.text.

seaborn categorical plots are 0-indexed, whereas box plots drawn with matplotlib and pandas are placed at range(1, N+1) by default. This can be adjusted with the positions parameter.

seaborn is a high-level API for matplotlib, and pandas.DataFrame.plot uses matplotlib as its default backend.
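Because the first box sits at x=0 in seaborn but at x=1 by default in matplotlib and pandas, looking up positions by category label can be less error-prone than hardcoding numbers such as 2 and 3. The following is only a small sketch; the helper name category_positions is made up for illustration and is not part of any of these libraries.

```python
def category_positions(labels, start=0):
    """Map each category label to its x-position on a categorical axis."""
    return {label: start + i for i, label in enumerate(labels)}


# Column order of the 'day' category in the tips dataset used in this answer
order = ["Thur", "Fri", "Sat", "Sun"]

seaborn_pos = category_positions(order)       # seaborn: first box at x=0
mpl_pos = category_positions(order, start=1)  # matplotlib/pandas default: first box at x=1

# Bracket endpoints for 'Sat' vs 'Sun' on a seaborn plot
x1, x2 = seaborn_pos["Sat"], seaborn_pos["Sun"]
```

With positions=range(len(...)) passed to the matplotlib or pandas box plot, the 0-based mapping applies to all three libraries.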

Imports and DataFrame

import seaborn as sns
import matplotlib.pyplot as plt

# dataframe in long form for seaborn
tips = sns.load_dataset("tips")

# dataframe in wide form for plotting with pandas.DataFrame.plot
df = tips.pivot(columns='day', values='total_bill')

# data as a list of lists for plotting directly with matplotlib (no nan values allowed)
data = [df[c].dropna().tolist() for c in df.columns]

seaborn

sns.boxplot(x="day", y="total_bill", data=tips, palette="PRGn")

# statistical annotation
x1, x2 = 2, 3   # columns 'Sat' and 'Sun' (first column: 0, see plt.xticks())
y, h, col = tips['total_bill'].max() + 2, 2, 'k'

plt.plot([x1, x1, x2, x2], [y, y+h, y+h, y], lw=1.5, c=col)
plt.text((x1+x2)*.5, y+h, "ns", ha='center', va='bottom', color=col)

plt.show()

pandas.DataFrame.plot

ax = df.plot(kind='box', positions=range(len(df.columns)))

x1, x2 = 2, 3
y, h, col = df.max().max() + 2, 2, 'k'

ax.plot([x1, x1, x2, x2], [y, y+h, y+h, y], lw=1.5, c=col)
ax.text((x1+x2)*.5, y+h, "ns", ha='center', va='bottom', color=col)

matplotlib

plt.boxplot(data, positions=range(len(data)))

x1, x2 = 2, 3

y, h, col = max(map(max, data)) + 2, 2, 'k'

plt.plot([x1, x1, x2, x2], [y, y+h, y+h, y], lw=1.5, c=col)
plt.text((x1+x2)*.5, y+h, "ns", ha='center', va='bottom', color=col)

tips.head()

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4

df.head()

day  Thur  Fri  Sat    Sun
0     NaN  NaN  NaN  16.99
1     NaN  NaN  NaN  10.34
2     NaN  NaN  NaN  21.01
3     NaN  NaN  NaN  23.68
4     NaN  NaN  NaN  24.59

data

[[27.2, 22.76, 17.29, ..., 20.53, 16.47, 18.78],
 [28.97, 22.49, 5.75, ..., 13.42, 16.27, 10.09],
 [20.65, 17.92, 20.29, ..., 29.03, 27.18, 22.67, 17.82],
 [16.99, 10.34, 21.01, ..., 18.15, 23.1, 15.69]]
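The three examples hardcode the label "ns" and repeat the same bracket coordinates. As a sketch, both steps can be factored into small helpers. The function names are hypothetical, and the star thresholds (0.05, 0.01, 0.001, 0.0001) are a common convention assumed here, not something the plotting libraries define.

```python
def p_to_stars(p):
    """Convert a p-value to a significance label.

    Thresholds are an assumed convention: adjust them to your field's rules.
    """
    for threshold, label in [(1e-4, "****"), (1e-3, "***"), (1e-2, "**"), (5e-2, "*")]:
        if p <= threshold:
            return label
    return "ns"


def bracket_coords(x1, x2, y, h):
    """Return the x and y coordinates of a bracket from x1 to x2.

    The bracket starts at height y, rises by h, runs across, and drops back to y.
    """
    return [x1, x1, x2, x2], [y, y + h, y + h, y]


# Usage with any of the plots above (plotting calls shown as comments):
#   xs, ys = bracket_coords(2, 3, y, h)
#   plt.plot(xs, ys, lw=1.5, c=col)
#   plt.text((2 + 3) * .5, y + h, p_to_stars(p), ha='center', va='bottom', color=col)
```

Here p is whatever p-value your chosen statistical test returned for the two groups; the helpers only handle the drawing side.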

Answer 2

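You may also want to add several annotations to different pairs of boxes. In that case it is useful to handle the placement of the lines and text along the y axis automatically. Other contributors and I wrote a small function to handle these cases (see the GitHub repo); it stacks the lines correctly so that they do not overlap. Annotations can be placed inside or outside the plot, and several statistical tests are implemented: Mann-Whitney and t-tests (independent and paired). Here is a minimal example.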

import matplotlib.pyplot as plt
import seaborn as sns
from statannot import add_stat_annotation

sns.set(style="whitegrid")
df = sns.load_dataset("tips")

x = "day"
y = "total_bill"
order = ['Sun', 'Thur', 'Fri', 'Sat']
ax = sns.boxplot(data=df, x=x, y=y, order=order)
add_stat_annotation(ax, data=df, x=x, y=y, order=order,
                    box_pairs=[("Thur", "Fri"), ("Thur", "Sat"), ("Fri", "Sun")],
                    test='Mann-Whitney', text_format='star', loc='outside', verbose=2)

x = "day"
y = "total_bill"
hue = "smoker"
ax = sns.boxplot(data=df, x=x, y=y, hue=hue)
add_stat_annotation(ax, data=df, x=x, y=y, hue=hue,
                    box_pairs=[(("Thur", "No"), ("Fri", "No")),
                                 (("Sat", "Yes"), ("Sat", "No")),
                                 (("Sun", "No"), ("Thur", "Yes"))
                                ],
                    test='t-test_ind', text_format='full', loc='inside', verbose=2)
plt.legend(loc='upper left', bbox_to_anchor=(1.03, 1))
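To illustrate what stacking lines without overlap involves, here is a minimal greedy sketch. It is not statannot's actual implementation: the function stack_levels, the spacing values, and the y_top value are all assumptions made for illustration. Each bracket gets the lowest level not already used by a bracket whose x-range overlaps it.

```python
def stack_levels(pairs):
    """Give each (x1, x2) bracket a level (0 = lowest) so that
    brackets whose x-ranges overlap never share a level."""
    placed = []  # (lo, hi, level) of brackets already assigned
    levels = {}
    # Place narrow brackets first so they sit closest to the boxes.
    for x1, x2 in sorted(pairs, key=lambda p: abs(p[1] - p[0])):
        lo, hi = min(x1, x2), max(x1, x2)
        # Levels taken by brackets that overlap (or touch) this one.
        used = {lvl for plo, phi, lvl in placed if lo <= phi and plo <= hi}
        level = 0
        while level in used:
            level += 1
        placed.append((lo, hi, level))
        levels[(x1, x2)] = level
    return levels


# x-positions follow order = ['Sun', 'Thur', 'Fri', 'Sat'] from the first example.
pos = {"Sun": 0, "Thur": 1, "Fri": 2, "Sat": 3}
box_pairs = [("Thur", "Fri"), ("Thur", "Sat"), ("Fri", "Sun")]
levels = stack_levels([(pos[a], pos[b]) for a, b in box_pairs])

# Hypothetical values: top of the data, gap above it, spacing between levels.
y_top, gap, step = 50.0, 2.0, 4.0
heights = {pair: y_top + gap + lvl * step for pair, lvl in levels.items()}
```

Each entry in heights is the base y of one bracket, which can then be drawn with ax.plot and labeled with ax.text as in the first answer.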

Answer 3

You can use the starbars package. You give it the pairs and their p-values, and it draws them for you:

import seaborn as sns
import matplotlib.pyplot as plt
import starbars

# taking from the previous example
tips = sns.load_dataset("tips")
df = tips.pivot(columns='day', values='total_bill')
data = [df[c].dropna().tolist() for c in df.columns]
sns.boxplot(x="day", y="total_bill", data=tips)

# adding statistical annotation
annotations = [("Sat", "Sun", 0.002), ("Fri", "Thur", 0.05)]
starbars.draw_annotation(annotations)

plt.show()

It also has an option to hide the bars for non-significant p-values:

starbars.draw_annotation(annotations, ns_show=False)

You can find the starbars documentation here.

Disclaimer: I am the author of the package.
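The p-values in the annotations list have to come from a statistical test that you run yourself. In practice a library routine such as scipy.stats.mannwhitneyu would be the usual choice. The sketch below instead uses a dependency-free Monte Carlo permutation test on the difference of means, purely to show how a list of (group, group, p-value) tuples could be assembled. The function name, the resample count, and the toy numbers are all assumptions for illustration and are not taken from the tips dataset.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_resamples=2000, seed=0):
    """Two-sided Monte Carlo permutation test on the difference of means."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    pooled = list(a) + list(b)
    n_a = len(a)
    observed = abs(mean(a) - mean(b))
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


# Toy, made-up measurements for two hypothetical groups "A" and "B"
groups = {
    "A": [10.0, 11.5, 9.8, 10.7, 11.1, 10.2, 9.9, 10.8],
    "B": [20.3, 19.7, 21.0, 20.5, 19.9, 20.8, 21.2, 20.1],
}
pairs = [("A", "B")]

# Same (label, label, p-value) tuple shape as the annotations list above
annotations = [(g1, g2, permutation_p_value(groups[g1], groups[g2])) for g1, g2 in pairs]
```

The resulting list has the same shape as the hand-written annotations in the example, with group labels that must match the categories on the plot's x axis.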
