I use snakemake to process 8 files (fastq) in parallel. Each file is then demultiplexed, and I use snakemake again to process, in parallel, the demultiplexed files generated from each of them.
My first attempt (which works fine) was to use 2 snakefiles.
I would like to use only one snakefile.
Here is the solution with 2 snakefiles:
snakefile #1, which processes the 8 files in parallel (wildcard {run}):
configfile: "config.yaml"

rule all:
    input:
        expand("{folder}{run}_R1.fastq.gz", run=config["fastqFiles"], folder=config["fastqFolderPath"]),
        expand('assembled/{run}/{run}.fastq', run=config["fastqFiles"]),
        expand('assembled/{run}/{run}.ali.fastq', run=config["fastqFiles"]),
        expand('assembled/{run}/{run}.ali.assigned.fastq', run=config["fastqFiles"]),
        expand('assembled/{run}/{run}.unidentified.fastq', run=config["fastqFiles"]),
        expand('log/remove_unaligned/{run}.log', run=config["fastqFiles"]),
        expand('log/illuminapairedend/{run}.log', run=config["fastqFiles"]),
        expand('log/assign_sequences/{run}.log', run=config["fastqFiles"]),
        expand('log/split_sequences/{run}.log', run=config["fastqFiles"])
include: "00-rules/assembly.smk"
include: "00-rules/demultiplex.smk"
snakefile #2, which processes the demultiplexed files in parallel:
SAMPLES, = glob_wildcards('samples/{sample}.fasta')

rule all:
    input:
        expand('samples/{sample}.uniq.fasta', sample=SAMPLES),
        expand('samples/{sample}.l.u.fasta', sample=SAMPLES),
        expand('samples/{sample}.r.l.u.fasta', sample=SAMPLES),
        expand('samples/{sample}.c.r.l.u.fasta', sample=SAMPLES),
        expand('log/dereplicate_samples/{sample}.log', sample=SAMPLES),
        expand('log/goodlength_samples/{sample}.log', sample=SAMPLES),
        expand('log/clean_pcrerr/{sample}.log', sample=SAMPLES),
        expand('log/rm_internal_samples/{sample}.log', sample=SAMPLES)
include: "00-rules/filtering.smk"
This solution works fine.
Is it possible to merge these two snakefiles into a single one, like this?
configfile: "config.yaml"

rule all:
    input:
        expand("{folder}{run}_R1.fastq.gz", run=config["fastqFiles"], folder=config["fastqFolderPath"]),
        expand('assembled/{run}/{run}.fastq', run=config["fastqFiles"]),
        expand('assembled/{run}/{run}.ali.fastq', run=config["fastqFiles"]),
        expand('assembled/{run}/{run}.ali.assigned.fastq', run=config["fastqFiles"]),
        expand('assembled/{run}/{run}.unidentified.fastq', run=config["fastqFiles"]),
        expand('log/remove_unaligned/{run}.log', run=config["fastqFiles"]),
        expand('log/illuminapairedend/{run}.log', run=config["fastqFiles"]),
        expand('log/assign_sequences/{run}.log', run=config["fastqFiles"]),
        expand('log/split_sequences/{run}.log', run=config["fastqFiles"])

include: "00-rules/assembly.smk"
include: "00-rules/demultiplex.smk"

SAMPLES, = glob_wildcards('samples/{sample}.fasta')

rule all:
    input:
        expand('samples/{sample}.uniq.fasta', sample=SAMPLES),
        expand('samples/{sample}.l.u.fasta', sample=SAMPLES),
        expand('samples/{sample}.r.l.u.fasta', sample=SAMPLES),
        expand('samples/{sample}.c.r.l.u.fasta', sample=SAMPLES),
        expand('log/dereplicate_samples/{sample}.log', sample=SAMPLES),
        expand('log/goodlength_samples/{sample}.log', sample=SAMPLES),
        expand('log/clean_pcrerr/{sample}.log', sample=SAMPLES),
        expand('log/rm_internal_samples/{sample}.log', sample=SAMPLES)

include: "00-rules/filtering.smk"
So I have to define rule all twice, and I get the following error message:
The name all is already used by another rule
Is there a way to have several rule all rules, or is using several snakefiles the only possible solution?
I would like to use snakemake in the most appropriate way.
You are not restricted in naming the top-level rule. You may name it all, or rename it: the only thing that matters is the order in which the rules are defined. By default, Snakemake takes the first rule as the target rule and then constructs the dependency graph from it.
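A minimal sketch of that behaviour (rule and file names here are made up for illustration): running snakemake with no target builds a.txt, because targets is defined first and is therefore the default target; make_a is only executed because the dependency graph requires it.

```
rule targets:            # first rule in the file => default target
    input: 'a.txt'

rule make_a:
    output: 'a.txt'
    shell: 'touch {output}'
```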
With that in mind, you have several options. First, you could merge the two top-level rules of your workflow into one: after all, the all rule does nothing but define the target files. Next, you could rename the rules to all1 and all2 (so that either workflow can still be run on its own by naming it on the command line), and provide an all rule with the merged inputs. Finally, you could use subworkflows, but as long as all you intend to do is squash two scripts into one, that would be overkill.
One more hint that may prove useful: if you define a distinct output for each run, there is no need to write the pattern expand('filename{sample}', sample=config["fastqFiles"]) for every single file. For example:
rule sample:
    input:
        'samples/{sample}.uniq.fasta',
        'samples/{sample}.l.u.fasta',
        'samples/{sample}.r.l.u.fasta',
        'samples/{sample}.c.r.l.u.fasta',
        'log/dereplicate_samples/{sample}.log',
        'log/goodlength_samples/{sample}.log',
        'log/clean_pcrerr/{sample}.log',
        'log/rm_internal_samples/{sample}.log'
    output:
        temp('flag_sample_{sample}_complete')
In this case, the all rule becomes trivial:
rule all:
    input: expand('flag_sample_{sample}_complete', sample=SAMPLES)
Or, as I suggested above:
rule all:
    input:
        expand('flag_run_{run}_complete', run=config["fastqFiles"]),
        expand('flag_sample_{sample}_complete', sample=SAMPLES)

rule all1:
    input: expand('flag_run_{run}_complete', run=config["fastqFiles"])

rule all2:
    input: expand('flag_sample_{sample}_complete', sample=SAMPLES)
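With an all1/all2 layout like this, each stage can then be run on its own by naming the target rule on the command line (the invocations below are illustrative; adjust --cores to your machine):

```
snakemake --cores 8 all1    # run only the per-run stage
snakemake --cores 8 all2    # run only the per-sample stage
snakemake --cores 8         # no target given: runs the first rule, all
```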