我有以下蛇文件:
import os
SAMPLES = [i.replace('sample_', '') for i in os.listdir('final_raw')]
rule all:
input:
expand('analysis/results/unmapped/hybrid/read1_{sample}_hy.fastq', sample = SAMPLES)
rule trimming:
input:
read1 = 'final_raw/sample_{sample}/read1_sample_{sample}.fastq.gz',
read2 = 'final_raw/sample_{sample}/read2_sample_{sample}.fastq.gz',
barcodes = 'final_bc.fasta'
output:
'analysis/trimmed_data/{sample}_read_1.fastq.gz',
'analysis/trimmed_data/{sample}_read_2.fastq.gz'
threads: 6
shell:
'flexbar -threads {threads} '
'-r {input.read1} '
'-p {input.read2} '
'-aa Nextera -br {input.barcodes} '
'-t analysis/trimmed_data/{wildcards.sample}_read '
'-z GZ'
rule create_hybrid_genome:
input:
human = 'genomes/human/GRCh38.fna',
covid = 'genomes/cov/cov_genome.fna'
output:
'genomes/hybrid/hybrid.fna'
shell:
'cat {input.human} {input.covid} > {output}'
rule hybrid_indexing:
input:
rules.create_hybrid_genome.output
output:
'genomes/hybrid/hybrid.fna.amb'
shell:
'bwa index {input}'
rule map_hybrid:
input:
read1 = 'analysis/trimmed_data/{sample}_read_1.fastq.gz',
read2 = 'analysis/trimmed_data/{sample}_read_2.fastq.gz',
genome = 'genomes/hybrid/hybrid.fna'
output:
'analysis/results/mapped/hybrid/{sample}.sorted.bam'
shell:
'bwa mem -t 8 {input.genome} {input.read1} {input.read2} | '
'samtools sort -@ 2 -o {wildcards.sample}.sorted.bam && '
'samtools index -@ 8 {wildcards.sample}.sorted.bam && '
'mv {wildcards.sample}.sorted.* analysis/results/mapped/hybrid'
rule unmapped_hybrid:
input:
rules.map_hybrid.output
output:
'analysis/results/unmapped/hybrid/um_{sample}.bam'
shell:
'samtools view -b -f 4 {input} > um_{wildcards.sample}.bam && '
'mv um_{wildcards.sample}.bam analysis/results/unmapped/hybrid'
rule fastq_hybrid:
input:
rules.unmapped_hybrid.output
output:
read1 = 'analysis/results/unmapped/hybrid/read1_{sample}_hy.fastq',
read2 = 'analysis/results/unmapped/hybrid/read2_{sample}_hy.fastq'
shell:
'samtools fastq '
'-1 {output.read1} '
'-2 {output.read2} '
'-0 /dev/null '
'-s /dev/null '
'-n {input}'
我正在尝试添加一个运行以下 for 循环的规则“kraken”:
for FILE in $(ls analysis/results/unmapped/hybrid/read1_*_hy.fastq | sed 's/read1_*_hy.fastq//'); do \
kraken2 --db {input.database} \
--memory-mapping \
--threads 8 \
--use-names \
--report analysis/results/kraken2/hybrid/{{}}${{FILE}}.kraken \
--paired read1_${{FILE}}_hy.fastq read2_${{FILE}}_hy.fastq \
--unclassified-out /analysis/results/unclassified/hybrid/{{}}${{FILE}}_uc#.fastq; done
for 循环负责一次运行所有.fastq 文件,以便将kraken 数据库存储在内存中,并对所有样本运行分类。 如果不参考采用通配符并为每个样本运行命令的常规 snakemake shell,如何实现这一点?
正常的方法是有以下内容:
rule kraken2:
input:
database = 'kraken2_db'
read1 = 'analysis/results/unmapped/hybrid/read1_{sample}_hy.fastq'
read2 = 'analysis/results/unmapped/hybrid/read2_{sample}_hy.fastq'
output:
'analysis/results/kraken2/hybrid/{sample}.kraken'
shell:
'kraken2 '
'--db {input.database} '
'--threads 8 '
'--paired '
'--use-names '
'--unclassified-out {wildcards.sample}_uc#.fq '
'--report {output} '
'{input.read1} {input.read2}'
和规则全部更新为以下内容:
rule all:
input:
expand('analysis/results/kraken2/hybrid/{sample}.kraken', sample = SAMPLES)
这将一次运行每个示例,我正在努力在 shell 中运行 for 循环,并且大多数时候都会出现错误:
MissingInputException in rule all in line 5 of /home/stud9/NAS/snakefile: Missing input files for rule all: affected files:
有什么建议吗?