Beginner Python script for calculating GC content in a DNA sequence

Question

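I am trying to calculate the GC content (as a percentage) of a DNA sequence for a Rosalind problem. I have the following code, but it returns 0, or only the number of G's or the number of C's alone (no percentage).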

x = raw_input("Sequence?:").upper()
total = len(x)
c = x.count("C")
g = x.count("G")

gc_total = g+c

gc_content = gc_total/total

print gc_content

I also tried this, just to get the count of G's and C's rather than the percentage, but it simply returns a count of the entire string:

x = raw_input("Sequence?:").upper()
def gc(n):
    count = 0
    for i in n:
        if i == "C" or "G":
            count = count + 1
        else:
            count = count
    return count
gc(x)

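Edit: I fixed the typo in the print statement in the first code example. That was not the problem; I just pasted the wrong snippet (there have been many attempts...)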

Answer 1

Your problem is that you are performing integer division, not floating-point division.

Try

gc_content = gc_total / float(total)
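
To see the difference, here is a minimal sketch. It is written for Python 3, where `//` reproduces the truncating division that Python 2's `/` performs on two integers; the sample sequence is made up for illustration.

```python
# Hypothetical sample sequence, used only for illustration.
seq = "AGCTATAG"
total = len(seq)                              # 8 bases
gc_total = seq.count("G") + seq.count("C")    # 2 G + 1 C = 3

# Floor division: what Python 2 computes for int / int.
truncated = gc_total // total

# Float division: the fix suggested above.
gc_content = gc_total / float(total)

print(truncated)     # 0
print(gc_content)    # 0.375
```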

Answer 2

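Shouldn't:

print cg_content

read

print gc_content?

As for the other snippet, your loop says

if i == "C" or "G":

This evaluates "G" as true every time, so the if statement always runs as true.

Instead, it should read

if i == "C" or i == "G":

Also, you don't need the else statement.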
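
The behaviour described above is easy to check. A minimal sketch follows; the function name `count_gc` and the sample strings are illustrative and not taken from the question.

```python
# `ch == "C" or "G"` parses as `(ch == "C") or "G"`.
# A non-empty string is truthy, so the condition holds for every character.
always_true = all(bool(ch == "C" or "G") for ch in "ATAT")

def count_gc(seq):
    """Count the G and C characters in seq."""
    count = 0
    for ch in seq:
        if ch == "C" or ch == "G":   # both sides are full comparisons
            count += 1
    return count

print(always_true)             # True
print(count_gc("AGCTATAG"))    # 3
```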

Hope this helps. Let us know how it goes.

Abdul Sattar

Answer 3

You also need to multiply the answer by 100 to convert it to a percentage.
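
Combining float division with the factor of 100, a minimal sketch might look like this. The function name `gc_percent` and the sample input are illustrative assumptions, not part of the question.

```python
def gc_percent(seq):
    """Return the GC content of seq as a percentage (0 to 100)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc_total = seq.count("G") + seq.count("C")
    # 100.0 keeps the arithmetic in floating point on Python 2 as well.
    return 100.0 * gc_total / len(seq)

print(gc_percent("agctatag"))   # 3 of 8 bases are G or C -> 37.5
```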

Answer 4

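# This works for me.

import sys

filename = sys.argv[1]

# Read the whole file; the with-block closes it afterwards.
with open(filename, 'r') as fh:
    data = fh.read()

c = 0
a = 0
g = 0
t = 0

# Tally each base, one character at a time.
for x in data:
    if x == "C":
        c += 1
    elif x == "G":
        g += 1
    elif x == "A":
        a += 1
    elif x == "T":
        t += 1

print "C=%d, G=%d, A=%d, T=%d" % (c, g, a, t)

# 100.0 forces float division so the percentage is not truncated.
gc_content = (g + c) * 100.0 / (a + t + g + c)

print "gc_content= %f" % (gc_content)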

Answer 5

import sys
orignfile = sys.argv[1]
outfile = sys.argv[2]

sequence = ""
with open(orignfile, 'r') as f:
    for line in f:
        if line.startswith('>'):
            seq_id = line.rstrip()[1:]  # drop the leading '>'
        else:
            sequence += line.rstrip()
GC_content = float((sequence.count('G') + sequence.count('C'))) / len(sequence) * 100
with open(outfile, 'a') as file_out:
    file_out.write("The GC content of '%s' is\t %.2f%%" % (seq_id, GC_content))

Answer 6

It may be too late, but it would be better to use Bio

#!/usr/bin/env python

import sys
from Bio import SeqIO

filename=sys.argv[1]

fh= open(filename,'r')

parser = SeqIO.parse(fh, "fasta")

for record in parser:
    c=0
    a=0
    g=0
    t=0
    for x in str(record.seq):
        if "C" in x:
            c+=1    
        elif "G" in x:
            g+=1
        elif "A" in x:
            a+=1    
        elif "T" in x:
            t+=1
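    # Computed per record, inside the loop; 100.0 avoids integer division in Python 2.
    gc_content = (g + c) * 100.0 / (a + t + g + c)
    print "%s\t%.2f" % (record.id, gc_content)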

Answer 7

This may help

import random
dna=''.join(random.choice('ATGCN') for i in range(2048))
print(dna)
print("A count",round((dna.count("A")/2048)*100),"%")
print("T count",round((dna.count("T")/2048)*100),"%")
print("G count",round((dna.count("G")/2048)*100),"%")
print("C count",round((dna.count("C")/2048)*100),"%")
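# dna.count("AT") would count the dinucleotide "AT"; sum the single-base counts instead.
print("AT count",round(((dna.count("A")+dna.count("T"))/2048)*100),"%")
print("GC count",round(((dna.count("G")+dna.count("C"))/2048)*100),"%")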

Answer 8

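Use this web page (http://www.faculty.ucr.edu/~mmaduro/random.htm) to generate a 15-character string with the following settings:

DNA size (bp) = 15
GC content (between 0 and 1) = 0.7

Write a Python program that counts the number of C and G characters in the generated genome sequence using two methods:

Using a for loop
Without using any kind of loop

Your Python program should print the following:

1- The generated genome sequence
2- The number of 'C' characters, using a for loop
3- The number of 'G' characters, using a for loop
4- The number of 'C' characters, without using a loop
5- The number of 'G' characters, without using a loop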
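
One way the exercise could be approached, as a sketch only: the sequence below is a made-up 15-character placeholder, not output from the page above, and the helper name `count_with_loop` is an assumption.

```python
# Placeholder sequence (hypothetical); paste the generated one here.
sequence = "GCGCCATGCGGCTCA"

def count_with_loop(seq, base):
    """Count occurrences of base in seq using a for loop."""
    count = 0
    for ch in seq:
        if ch == base:
            count += 1
    return count

print(sequence)
print(count_with_loop(sequence, "C"))
print(count_with_loop(sequence, "G"))
# str.count gives the same totals without an explicit loop.
print(sequence.count("C"))
print(sequence.count("G"))
```

Both methods should always agree; `str.count` is the shorter choice when only a total is needed.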
