How to search for a value between given positions in each string of a list


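I have a list containing three values (strings), and a substring.

  1. Each string in the list needs to be searched for the given substring between positions 20 and 50. If the substring occurs more than 5 times within a string, that string should be reported.

  2. If a string does not contain the substring, a message saying that the substring is missing should be printed (for each such list item).

The output should be (given my code below):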

1 Enriched with SP1 binding sites
3 Contains no SP1 binding sites

seq_list = ["GGGCGGAAAAGGGCGGAAAAGGGCGGGGGCGGAAAAGGGCGGAAAAGGGCGGGGGCGGAAAAGGGCGGAAAAGGGCGGGGGCGGAAAAGGGCGGAAAAGGGCGG", "GGGCGG", "BBBBBBB"]
binding_site = "GGGCGG"


for count, value in enumerate(seq_list, start=1):               
    if binding_site in value:
        sumSP = int(sum(s.count('GGCGG')for s in seq_list))
        if sumSP >20:
            print(count, "enriched with SP1 binding sites")

else:
    print(count,"No binding sites found.")


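So I have two problems. First, I searched the internet for a simple way to search each string between positions 20 and 50, but I only found how to select list items by position (using slicing). Second, my sumSP code does not work: it evaluates to true for my second string, which should be false, because only value 1 in my list contains more than 5 binding sites.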
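To illustrate the second problem: in the code above, sumSP is computed with `for s in seq_list`, so it adds up the counts from every string in the list rather than counting in the current string only. The short sketch below contrasts the two; the names `list_total` and `per_string` are introduced here purely for illustration, and it counts the full binding site rather than the shorter 'GGCGG'.

```python
seq_list = [
    "GGGCGGAAAAGGGCGGAAAAGGGCGGGGGCGGAAAAGGGCGGAAAAGGGCGGGGGCGGAAAAGGGCGGAAAAGGGCGGGGGCGGAAAAGGGCGGAAAAGGGCGG",
    "GGGCGG",
    "BBBBBBB",
]
binding_site = "GGGCGG"

# List-wide total: one number for the whole list, so it is identical
# no matter which string the loop is currently looking at.
list_total = sum(s.count(binding_site) for s in seq_list)

# Per-string counts: one number for each string, which is what a
# per-string threshold test needs.
per_string = [s.count(binding_site) for s in seq_list]

print(list_total)   # 13
print(per_string)   # [12, 1, 0]
```

Comparing the list-wide total against a threshold gives the same answer for every string that contains the site, which would explain why the second string is treated like the first.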

python list slice
1 Answer

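The code below is what I think you want, and it can easily be modified. It uses a regular expression (the re module) as a simple way to count occurrences of the substring, and it shows how to search only part of a string.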

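import re

seq_list = ["GGGCGGAAAAGGGCGGAAAAGGGCGGGGGCGGAAAAGGGCGGAAAAGGGCGGGGGCGGAAAAGGGCGGAAAAGGGCGGGGGCGGAAAAGGGCGGAAAAGGGCGG", "GGGCGG", "BBBBBBB"]
binding_site = "GGGCGG"
# Pattern counted inside the window; as in the question, it is one base
# shorter than binding_site.
search_for = 'GGCGG'
START = 20   # slice start (0-based, inclusive)
FINISH = 50  # slice end (exclusive)

for i, seq in enumerate(seq_list):
    if binding_site not in seq:
        print(f"seq {i} No binding sites found.")
    elif len(seq) < FINISH:
        # The string is too short to cover the whole search window.
        print(f"seq {i} length {len(seq)} less than search size {FINISH}")
    else:
        # Count non-overlapping matches inside seq[START:FINISH] only.
        # Note: no threshold is applied here; the count is reported as-is.
        num = len(re.findall(search_for, seq[START:FINISH]))
        print(f"seq {i} has {num} found - enriched with SP1 binding sites")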

This gives:

seq 0 has 3 found - enriched with SP1 binding sites
seq 1 length 6 less than search size 50
seq 2 No binding sites found.
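The question also asks for a threshold ("more than 5 times"), which the loop above does not apply. A minimal sketch of one way to add it follows, assuming non-overlapping matches are what is wanted. It uses the optional start and end arguments of the built-in `str.count` instead of a regular expression. The function name `report` and its parameters are hypothetical, chosen only for this example.

```python
seq_list = [
    "GGGCGGAAAAGGGCGGAAAAGGGCGGGGGCGGAAAAGGGCGGAAAAGGGCGGGGGCGGAAAAGGGCGGAAAAGGGCGGGGGCGGAAAAGGGCGGAAAAGGGCGG",
    "GGGCGG",
    "BBBBBBB",
]
binding_site = "GGGCGG"


def report(seqs, site, start=20, finish=50, threshold=5):
    """Hypothetical helper: build one message per string that qualifies.

    A string is "enriched" when `site` occurs more than `threshold` times
    (non-overlapping) inside seq[start:finish]. A string that does not
    contain `site` anywhere gets the "contains no" message.
    """
    lines = []
    for number, seq in enumerate(seqs, start=1):
        if site not in seq:
            lines.append(f"{number} Contains no SP1 binding sites")
        # str.count(sub, start, end) counts only within seq[start:end].
        elif seq.count(site, start, finish) > threshold:
            lines.append(f"{number} Enriched with SP1 binding sites")
    return lines


window_count = seq_list[0].count(binding_site, 20, 50)
print(window_count)  # 3

for line in report(seq_list, binding_site, threshold=2):
    print(line)
```

With the sample data, the first string has only 3 complete sites inside positions 20 to 50, so the default threshold of 5 reports nothing for it. The call with `threshold=2` prints the two lines the question expects. Strings that contain the site but fall below the threshold, such as the second one, produce no message in this sketch.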