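I am new to Perl programming and I am stuck with my script.
I am trying to search for a motif in a FASTA file and, if it is found, print the ID of the protein that contains it.
I can load my file, but after I enter the motif nothing happens. I get the following warning: Use of uninitialized value $data[0] in concatenation (.) or string at test.pl line 36, <STDIN> line 2.
Here is my code: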
#!/usr/bin/perl -w
# Searching for motifs
print "Please type the filename of the protein sequence data: ";
$proteinfilename = <STDIN>;
# Remove the newline from the protein filename
chomp $proteinfilename;
# open the file, or exit
unless ( open(FA, $proteinfilename) ) {
print "Cannot open file \"$proteinfilename\"\n\n";
exit;
}
@protein = <FA>; # Read the protein sequence data from the file, and store it into the array variable @protein
my (@description, @ID, @data);
while (my $protein = <FA>) {
chomp($protein);
@description = split (/\s+/, $protein);
push (@ID, $description[0]);
}
# Close the file
close FA;
my %params = map { $_ => 1 } @ID;
# Put the protein sequence data into a single string, as it's easier to search for a motif in a string than in an array of lines
$protein = join( '', @protein);
# Remove whitespace
$protein =~ s/\s//g;
# ask for a motif or exit if no motif is entered.
do {
print "Enter a motif to search for: ";
$motif = <STDIN>;
# Remove the newline at the end of $motif
chomp $motif;
# Look for the motif
@data = split (/\s+/, $protein);
if ( $protein =~ /$motif/ ) {
print $description[0]."\n" if(exists($params{$data[0]}));
}
# exit on an empty user input
} until ( $motif =~ /^\s*$/ );
# exit the program
exit;
An example of the input is:
sp|O60341|KDM1A_HUMAN Lysine-specific histone demethylase 1A OS=Homo sapiens OX=9606 GN=KDM1A PE=1 SV=2 MLSGKKAAAAAAAAAAAAAAATGTEAGPGTAGGSENGSEVAAQPAGLSGPAEVGPGAVGERTPRKKEPPRASPPGGLAEPPGSAGPQAGPTVVPGSATPMETGIAETPEGRRTSRRKRAKVEY
Suppose I want to find the motif 'PMET' in the given sequence. If it is present, I want to get the ID as output -> O60341
Thanks a lot!
Any feedback is much appreciated!
I have written sample code here for a single-line input file.
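One thing that stands out in the script above: `@protein = <FA>;` reads the file in list context, which consumes every line up to end-of-file. The `while (my $protein = <FA>)` loop that follows therefore has nothing left to read, so `@ID` and `%params` stay empty and the `exists` check can never succeed. The sketch below reproduces that behaviour with an in-memory filehandle; the two-line record in `$fasta` is made-up test data, not the real file.

```perl
use strict;
use warnings;

# In-memory stand-in for the FASTA file (hypothetical, shortened record).
my $fasta = ">sp|O60341|KDM1A_HUMAN example header\nGSATPMETGIAE\n";

open(my $fh, '<', \$fasta) or die "Cannot open in-memory file: $!";

# List context: this reads every line up to end-of-file.
my @all_lines = <$fh>;

# The handle is already at end-of-file, so the loop body never runs.
my $loop_runs = 0;
while (my $line = <$fh>) {
    $loop_runs++;
}
print "lines slurped: ", scalar(@all_lines), ", loop iterations: $loop_runs\n";

# Rewinding with seek() makes the lines readable again.
seek($fh, 0, 0);
my $after_rewind = 0;
$after_rewind++ while <$fh>;
print "iterations after seek: $after_rewind\n";

close $fh;
```

Reading the file only once, or looping over the lines already stored in `@protein`, avoids the problem without needing `seek`.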
my $motif = <STDIN>;
chomp($motif);
my $str = "sp|O60341|KDM1A_HUMAN Lysine-specific histone demethylase 1A OS=Homo sapiens OX=9606 GN=KDM1A PE=1 SV=2 MLSGKKAAAAAAAAAAAATGTEAGPGTAGGSENGSEVAAQPAGLSGPAEVGPGAVGERTP RKKEPPRASPPGGLAEPPGSAGPQAGPTVVPGSATPMETGIAETPEGRRTSRRKRAKVEY";
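# Only report the ID if the motif occurs somewhere in the record
if ($str =~ m/$motif/) {
    # The accession is the second '|'-delimited field of the header
    if ($str =~ m/^([^|]+)\|([^|]+)\|/) {
        print "Expected Value: $2\n";
    }
}
else {
    print "Not matched...\n";
}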
Input>$: PMET
Expected Value: O60341
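For a file with several records, the same idea can be applied record by record. The following is only a sketch under some assumptions: header lines start with `>` as in standard FASTA, the accession is the second `|`-separated field, and the motif is matched as a literal substring with `index` rather than as a regular expression. The sub name `ids_with_motif`, the shortened sequences, and the second record (`X00000`) are invented for illustration.

```perl
use strict;
use warnings;

# Return the accession (second '|'-separated header field) of every record
# whose sequence contains the motif. $fh is any readable filehandle.
sub ids_with_motif {
    my ($fh, $motif) = @_;
    my (@hits, $id, $seq);

    # Test the record collected so far and keep its ID on a match.
    my $flush = sub {
        push @hits, $id if defined $id && index($seq, $motif) >= 0;
    };

    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>/) {
            $flush->();                               # finish previous record
            ($id) = $line =~ /^>[^|]*\|([^|]+)\|/;    # undef if header is unusual
            $seq = '';
        }
        elsif (defined $id) {
            $line =~ s/\s+//g;                        # drop stray whitespace
            $seq .= $line;                            # join wrapped sequence lines
        }
    }
    $flush->();                                       # finish the last record
    return @hits;
}

# Made-up test data: the motif PMET spans a line break in the first record.
my $fasta = <<'END';
>sp|O60341|KDM1A_HUMAN Lysine-specific histone demethylase 1A
MLSGKKAAAGPTVVPGSATPM
ETGIAETPEGRRTSRRKRAKVEY
>sp|X00000|FAKE_HUMAN Hypothetical record without the motif
MKKLLAAGGS
END

open(my $fh, '<', \$fasta) or die "Cannot open in-memory file: $!";
my @ids = ids_with_motif($fh, 'PMET');
close $fh;

print "$_\n" for @ids;
```

Searching only the concatenated sequence, and not the header, keeps a motif such as `HUMAN` from matching the description line by accident. To search a real file, open it with the three-argument form of `open` and pass that handle in place of the in-memory one.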