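I want to assign/replace the value in the count column with NaN (in place of the zero value) based on the sequence column, for rows where that column contains alphabetic characters.
pandas DataFrame: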
IDS sequence substitute count
header1 GCTCAGCTGGCtAGAG NAN 0
header1 >>>>........<<<< < 4
Expected output:
IDS sequence substitute count
header1 GCTCAGCTGGCtAGAG NAN Nan
header1 >>>>........<<<< < 4
I tried the code given in the link below, but had no luck.
I could not get the expected change; this is what I got:
ids sequence Count count
0 header1 GCTCAGCTGGCtAGAG 0 NaN
1 header1 >>>>.........<<< 0 3
Thank you in advance.
Assuming you want to match the rows that hold a DNA sequence (a/c/g/t letters), you can use:
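import numpy as np
m = df['sequence'].str.contains('[acgt]', case=False)  # True where the sequence contains a/c/g/t
df.loc[m, 'count'] = np.nan  # blank out count on the matching rows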
Output:
IDS sequence substitute count
0 header1 GCTCAGCTGGCtAGAG NAN NaN
1 header1 >>>>........<<<< < 4
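
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, so treat the details as assumptions: I read "NAN" in the substitute column as a missing value and the first count as 0. The `m_strict` mask is my own addition, not part of the answer above; it uses `str.fullmatch` to require that the whole string is made of a/c/g/t, which may be safer if other rows can contain stray letters.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumption: "NAN" means a missing value).
df = pd.DataFrame({
    "IDS": ["header1", "header1"],
    "sequence": ["GCTCAGCTGGCtAGAG", ">>>>........<<<<"],
    "substitute": [np.nan, "<"],
    "count": [0, 4],
})

# Rows whose sequence contains at least one a/c/g/t letter (case-insensitive).
m = df["sequence"].str.contains("[acgt]", case=False)

# Stricter variant (hypothetical alternative): the whole string must be a/c/g/t only.
m_strict = df["sequence"].str.fullmatch("[acgt]+", case=False)

# An integer column cannot hold NaN, so cast to float explicitly before assigning.
df["count"] = df["count"].astype("float64")
df.loc[m, "count"] = np.nan

print(df)
```

On this sample both masks select only the first row, so either one can be passed to `df.loc`. Note that the count column ends up as float, since NaN is a floating-point value.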