I have the following DataFrame, merged_dft, and I want to scatter-plot two columns against each other, e.g. snv vs snv-drg:
samples snv het-hom ti-tv snv-drg het-hom-drg ti-tv-drg insertion-drg deletion-drg insertion deletion ins-del-ratio-drg ins-del-ratio Sample_name Sex Superpopulation_code
0 NA20126 4592368 2.14 1.97 4770140 2.26 1.96 523917 536443 472931 494200 0.98 0.96 NA20126 male AFR
1 NA20127 4699751 2.04 1.97 4918959 2.18 1.97 562430 572733 485645 505302 0.98 0.96 NA20127 female AFR
2 NA20128 4636463 2.09 1.97 4854107 2.22 1.97 552634 566283 478801 500632 0.98 0.96 NA20128 female AFR
3 NA20129 4638940 2.11 1.97 4863336 2.23 1.97 552984 565534 478078 499867 0.98 0.96 NA20129 female AFR
4 NA20274 4339811 2.10 1.96 4554995 2.23 1.96 524046 530728 456420 471116 0.99 0.97 NA20274 female AFR
....
....
--
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
import scipy.stats as stats
# Set the figure size before plotting so it applies to this figure
plt.rcParams.update({'figure.figsize': (10, 8), 'figure.dpi': 100})
x = merged_dft['snv']
y = merged_dft['snv-drg']
# Use one common range for both axes so the y = x line is a true diagonal
lineStart = min(x.min(), y.min())
lineEnd = max(x.max(), y.max())
# Create a scatter plot coloured by superpopulation
sns.scatterplot(data=merged_dft, x='snv', y='snv-drg', hue='Superpopulation_code')
plt.xlabel('NPM')
plt.ylabel('Drgen')
plt.title('Count_SNVs')
# Dashed y = x reference line
plt.plot([lineStart, lineEnd], [lineStart, lineEnd], color='r', linestyle='dashed')
plt.xlim(lineStart, lineEnd)
plt.ylim(lineStart, lineEnd)
# Pearson correlation between the two columns
r, p = stats.pearsonr(x, y)
plt.annotate('r = {:.2f}'.format(r), xy=(0.1, 0.95), xycoords='axes fraction')
# plt.legend(bbox_to_anchor=(1.025,1), loc='upper left', borderaxespad=0.)
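I want to make a scatter plot and compute the Pearson correlation for each pair of columns from npm_col and drg_col, in order. I couldn't get it to work with the code below.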
For example:
snv vs snv-drg,
het-hom vs het-hom-drg,
ti-tv vs ti-tv-drg
# set 1 columns
npm_col = merged_dft[['snv', 'het-hom', 'ti-tv']]
npm_col
# set 2 columns
drg_col = merged_dft[['snv-drg', 'het-hom-drg', 'ti-tv-drg']]
drg_col
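Since both selections keep their columns in the order they were listed, the pairs can be read directly from the column labels with zip. A minimal sketch, using a small stand-in frame built from the first two rows of the table above (only the columns needed here); it also shows that len() of a DataFrame counts rows rather than columns:

```python
import pandas as pd

# Stand-in for merged_dft: first two rows of the question's table, selected columns only
merged_dft = pd.DataFrame({
    'snv': [4592368, 4699751],
    'het-hom': [2.14, 2.04],
    'ti-tv': [1.97, 1.97],
    'snv-drg': [4770140, 4918959],
    'het-hom-drg': [2.26, 2.18],
    'ti-tv-drg': [1.96, 1.97],
})

npm_col = merged_dft[['snv', 'het-hom', 'ti-tv']]
drg_col = merged_dft[['snv-drg', 'het-hom-drg', 'ti-tv-drg']]

# len() of a DataFrame is its number of rows, not its number of columns
print(len(npm_col))  # 2

# Pair the column labels position by position
pairs = list(zip(npm_col.columns, drg_col.columns))
print(pairs)  # [('snv', 'snv-drg'), ('het-hom', 'het-hom-drg'), ('ti-tv', 'ti-tv-drg')]
```

Each element of pairs is a tuple of two column names, which is what merged_dft[...] and stats.pearsonr need for a single comparison.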
--
for i in range(len(npm_col)):
    for j in range(len(drg_col)):
        plt.figure()
        plt.scatter(merged_dft[npm_col], merged_dft[drg_col])
        plt.xlabel(npm_col)
        plt.ylabel(drg_col)
        plt.title(f'Scatter plot between {npm_col} and {drg_col}')
        plt.rcParams.update({'figure.figsize':(10,8), 'figure.dpi':100})
        plt.plot([lineStart, lineEnd], [lineStart, lineEnd], color = 'r', linestyle = 'dashed')
        plt.xlim(lineStart, lineEnd)
        plt.ylim(lineStart, lineEnd)
        # r, p = stats.pearsonr(x, y)
        r, p = stats.pearsonr(merged_dft[npm_col], merged_dft[drg_col])
        plt.annotate('r = {:.2f}'.format(r), xy=(0.1, 0.95), xycoords='axes fraction')
        # plt.legend(bbox_to_anchor=(1.025,1), loc='upper left', borderaxespad=0.)
        plt.show()
Thanks for your help!
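Thanks for the details. Your loop fails for three reasons: range(len(npm_col)) iterates over row positions (len() of a DataFrame counts rows, not columns), merged_dft[npm_col] indexes with a whole DataFrame instead of a single column name, and lineStart/lineEnd are only defined in commented-out lines. Iterating over explicit pairs of column names avoids all three. Here is a Python script that creates a scatter plot and computes the Pearson correlation for each pair, in order: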
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
# Define the column pairs, in order
column_pairs = [
    ('snv', 'snv-drg'),
    ('het-hom', 'het-hom-drg'),
    ('ti-tv', 'ti-tv-drg')
]
# Create one figure per column pair
for npm_col, drg_col in column_pairs:
    plt.figure(figsize=(10, 8))
    # Create scatter plot
    sns.scatterplot(data=merged_dft, x=npm_col, y=drg_col, hue='Superpopulation_code')
    # Add title and labels
    plt.title(f'{npm_col} vs {drg_col}')
    plt.xlabel(npm_col)
    plt.ylabel(drg_col)
    # Add a dashed y = x line over one common range for both axes
    x_min, x_max = plt.xlim()
    y_min, y_max = plt.ylim()
    line_start = min(x_min, y_min)
    line_end = max(x_max, y_max)
    plt.plot([line_start, line_end], [line_start, line_end], color='r', linestyle='dashed')
    plt.xlim(line_start, line_end)
    plt.ylim(line_start, line_end)
    # Calculate and add Pearson correlation
    r, p = stats.pearsonr(merged_dft[npm_col], merged_dft[drg_col])
    plt.annotate(f'r = {r:.2f}', xy=(0.1, 0.95), xycoords='axes fraction')
    plt.tight_layout()
    plt.show()
This code creates the three plots you asked for (snv vs snv-drg, het-hom vs het-hom-drg, ti-tv vs ti-tv-drg) and computes the Pearson correlation for each pair.
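If you also want the r and p values collected in one table rather than only printed on each plot, the same pairs can be looped without plotting. A minimal sketch, assuming the columns have the names shown in the question; the stand-in frame below uses the five rows from the question's table, and the names npm_cols, drg_cols and summary are hypothetical:

```python
import pandas as pd
import scipy.stats as stats

# Stand-in for merged_dft: the five rows shown in the question, selected columns only
merged_dft = pd.DataFrame({
    'snv':         [4592368, 4699751, 4636463, 4638940, 4339811],
    'snv-drg':     [4770140, 4918959, 4854107, 4863336, 4554995],
    'het-hom':     [2.14, 2.04, 2.09, 2.11, 2.10],
    'het-hom-drg': [2.26, 2.18, 2.22, 2.23, 2.23],
    'ti-tv':       [1.97, 1.97, 1.97, 1.97, 1.96],
    'ti-tv-drg':   [1.96, 1.97, 1.97, 1.97, 1.96],
})

# Hypothetical lists of column names; position i in one pairs with position i in the other
npm_cols = ['snv', 'het-hom', 'ti-tv']
drg_cols = ['snv-drg', 'het-hom-drg', 'ti-tv-drg']

rows = []
for npm, drg in zip(npm_cols, drg_cols):
    # Drop rows where either value is missing; pearsonr does not skip NaN values itself
    pair = merged_dft[[npm, drg]].dropna()
    r, p = stats.pearsonr(pair[npm], pair[drg])
    rows.append({'npm_col': npm, 'drg_col': drg, 'r': r, 'p': p, 'n': len(pair)})

# One row per column pair
summary = pd.DataFrame(rows)
print(summary)
```

With only five rows the p-values are not very informative; on the full merged_dft the same loop gives one summary row per pair.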