snakemake: using a wildcard to index the config file in a rule's output

Question

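My goal is to write a rule that generates the genome index used for read alignment, based on the organism listed in a sample information csv file.

Each library can be human or mouse (or something else), and I want the pipeline to be as generic as possible.

My pipeline starts with: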

import pandas as pd

configfile: "config.yml"

samples = pd.read_table(config["sampleInfo_path"], sep=";").set_index("sampleName", drop=False)

to collect the sample information and load the config file.

Example sample information file:

sampleName	organism
sampleA	mouse
sampleB	human

The paths to the reference genomes are given in the config file:

yaml config file:

genome:
  human:
    fasta: "/media/References/Human/Genecode/GRch38/Sequences/GRCh38.primary_assembly.genome.fa"
    index: "/media/References/Human/Genecode/GRch38/Indexes/Bowtie2/GRCh38.primary_assembly.genome" # path to index created during the run if not existing yet
    annotation: "/media/References/Human/Genecode/GRch38/Annotations/gencode.v46.annotation.gtf.gz"

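So, for each sampleName, I want to look up its organism in the sampleInfo file, then use that value to extract from the config file the path to the fasta file for that organism. The nested yaml path would look like: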

config['genome'][organism_value_extracted]['fasta']
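The two-step lookup described above (sample to organism, then organism to fasta path) can be sketched in plain Python. This is only an illustration: the `config` and `organism_of` dictionaries are hypothetical stand-ins for the parsed config.yml and the sample table, the paths are placeholders, and `fasta_for_sample` is a made-up helper name.

```python
# Hypothetical stand-ins for the parsed config.yml and the sample table.
# The paths are placeholders, not the real reference locations.
config = {
    "genome": {
        "human": {"fasta": "/refs/human.fa"},
        "mouse": {"fasta": "/refs/mouse.fa"},
    }
}
organism_of = {"sampleA": "mouse", "sampleB": "human"}


def fasta_for_sample(sample):
    """Two-step lookup: sample -> organism -> fasta path."""
    organism = organism_of[sample]
    try:
        return config["genome"][organism]["fasta"]
    except KeyError:
        # Raised when the organism (or its fasta entry) is missing from the config.
        raise KeyError(f"no fasta path configured for organism {organism!r}") from None


print(fasta_for_sample("sampleA"))
```

Failing with an explicit message when an organism has no config entry makes a mismatch between the sample table and the config file easier to spot.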

The snakemake rule would look like this:

rule index:
    input: lambda wildcards: config["genome"][samples["organism"][wildcards.sample]]["fasta"]
    output: config['genome'][samples["organism"]["{sample}"]]["index"]
    shell: """
        bowtie2 ... {input} {output}
        """

Unfortunately, I cannot get the output to work.

With

config['genome']["human"]["index"]

it works like a charm, but I cannot replace "human" with the value of

samples["organism"][wildcards.sample]

I have tried different syntaxes, lambdas, and functions, but none of them work in the output.

My snakemake version is 8.20.5.

Thanks for any help.

Answer 1

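This is not exactly what you asked for, and maybe someone else will give a "proper" answer. But I think the canonical way to solve this is not to define outputs in the config. Without that constraint the snakefile becomes very simple, and you get more reproducible results because the output file tree is consistent.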

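import pandas as pd

configfile: "config.yml"

# Sample table indexed by sample name (loaded as in the question; the rules below do not use it)
samples = pd.read_table(config["sampleInfo_path"], sep=";").set_index("sampleName", drop=False)


# Request one index per organism
rule all:
    input:
        expand("index/{organism}/{organism}_index.genome", organism=["mouse", "human"])

# The organism wildcard comes from the output path and selects the fasta from the config
rule index:
    input:
        lambda wc: config["genome"][wc.organism]["fasta"]
    output:
        "index/{organism}/{organism}_index.genome"
    # touch is a placeholder for the actual indexing command
    shell:
        """
        touch {output}
        """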

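To tie this layout back to the per-sample table, a downstream rule can map each sample to its organism's index through an input function. Below is a minimal sketch of that lookup in plain Python, assuming the `index/{organism}/{organism}_index.genome` layout from the rule above. The inline table and the helper names `load_organisms`, `index_for_sample`, and `all_index_targets` are hypothetical.

```python
import csv
import io

# Hypothetical stand-in for the sample info file (";"-separated, as in the question).
SAMPLE_INFO = "sampleName;organism\nsampleA;mouse\nsampleB;human\n"


def load_organisms(handle):
    """Return a dict mapping sampleName -> organism."""
    reader = csv.DictReader(handle, delimiter=";")
    return {row["sampleName"]: row["organism"] for row in reader}


ORGANISM_OF = load_organisms(io.StringIO(SAMPLE_INFO))


def index_for_sample(sample):
    """Path of the index a given sample should be aligned against."""
    organism = ORGANISM_OF[sample]
    return f"index/{organism}/{organism}_index.genome"


def all_index_targets():
    """Unique index paths needed for the samples in the table, sorted."""
    return sorted({index_for_sample(s) for s in ORGANISM_OF})


print(all_index_targets())
```

In a Snakefile, an alignment rule could then use something like `input: lambda wc: index_for_sample(wc.sample)`, and `all_index_targets()` could replace the hard-coded organism list in `rule all`, so only the organisms that actually appear in the sample table get indexed.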