I'm new to Snakemake and am trying to figure out how nested config values work. I created the following config file...
# dummyconfig.json
{
    "fam1": {
        "numchr": 1,
        "chrlen": 2500000,
        "seeds": {
            "genome": 8013785666,
            "simtrio": 1776,
            "simseq": {
                "mother": 2053695854357871005,
                "father": 4517457392071889495,
                "proband": 2574020394472462046
            }
        },
        "ninherited": 100,
        "ndenovo": 5,
        "numreads": 375000
    }
}
...and the following rule (among others) in my Snakefile.
# Snakefile
rule simgenome:
    input:
        "human.order6.mm",
    output:
        "{family}-refr.fa.gz"
    shell:
        "nuclmm simulate --out - --order 6 --numseqs {config[wildcards.family][numchr]} --seqlen {config[wildcards.family][chrlen]} --seed {config[wildcards.family][seeds][genome]} {input} | gzip -c > {output}"
I then try to create fam1-refr.fa.gz by invoking snakemake --configfile dummyconfig.json fam1-refr.fa.gz. When I do so, I get the following error message.
Building DAG of jobs...
rule simgenome:
input: human.order6.mm
output: fam1-refr.fa.gz
jobid: 0
wildcards: family=fam1
RuleException in line 1 of /Users/standage/Projects/noble/Snakefile:
NameError: The name 'wildcards.family' is unknown in this context. Please make sure that you defined that variable. Also note that braces not used for variable access have to be escaped by repeating them, i.e. {{print $1}}
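So fam1 is correctly recognized as the value of the family wildcard, but variable access of the form {config[wildcards.family][numchr]} does not seem to work.
Is it possible to traverse a nested config this way, or does Snakemake only support access to top-level variables?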
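For context, the error is consistent with ordinary Python str.format behavior: inside a replacement field, the text between square brackets is used as a literal string key (or an integer index if it is all digits) and is never evaluated as an expression. Assuming Snakemake formats shell strings with the same mini-language (an assumption here, not something the error message confirms), config[wildcards.family] looks up the key 'wildcards.family' rather than 'fam1'. A minimal sketch in plain Python, where Wildcards and try_format are hypothetical stand-ins and not part of Snakemake:

```python
# Trimmed copy of the config from dummyconfig.json.
config = {"fam1": {"numchr": 1, "chrlen": 2500000, "seeds": {"genome": 8013785666}}}


class Wildcards:
    """Hypothetical stand-in for Snakemake's wildcards object."""
    family = "fam1"


def try_format(template, **names):
    """Format the template, reporting a KeyError as text instead of raising."""
    try:
        return template.format(**names)
    except KeyError as err:
        return "KeyError: {}".format(err)


# A literal key inside the brackets resolves through the nested dict.
literal = try_format("{config[fam1][numchr]}", config=config)

# 'wildcards.family' is treated as a literal key, not as an attribute lookup.
nested = try_format(
    "{config[wildcards.family][numchr]}", config=config, wildcards=Wildcards()
)

print(literal)
print(nested)
```

So nested access by itself works in a format string; what does not work is using another variable as the key.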
One way to work around it is to use params and resolve the variables outside of the shell block.
rule simgenome:
    input:
        "human.order6.mm",
    output:
        "{family}-refr.fa.gz"
    params:
        seed=lambda w: config[w.family]['seeds']['genome'],
        numseqs=lambda w: config[w.family]['numchr'],
        seqlen=lambda w: config[w.family]['chrlen']
    shell:
        "nuclmm simulate --out - --order 6 --numseqs {params.numseqs} --seqlen {params.seqlen} --seed {params.seed} {input} | gzip -c > {output}"
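The lookups done by the params lambdas can be tried outside Snakemake. The sketch below is plain Python and only imitates what the rule above does: SimpleNamespace is a hypothetical stand-in for the wildcards and params objects, the config is a trimmed copy of dummyconfig.json, and the command string is shortened for readability.

```python
from types import SimpleNamespace

# Trimmed copy of the config from dummyconfig.json.
config = {"fam1": {"numchr": 1, "chrlen": 2500000, "seeds": {"genome": 8013785666}}}

# Same lookups as the params lambdas in the rule: each takes the wildcards
# object and indexes the nested config with real Python expressions.
param_funcs = {
    "seed": lambda w: config[w.family]["seeds"]["genome"],
    "numseqs": lambda w: config[w.family]["numchr"],
    "seqlen": lambda w: config[w.family]["chrlen"],
}

# Hypothetical stand-in for the wildcards matched from "fam1-refr.fa.gz".
wildcards = SimpleNamespace(family="fam1")

# Resolve every parameter before formatting, so the format string only
# needs plain attribute access.
params = SimpleNamespace(
    **{name: func(wildcards) for name, func in param_funcs.items()}
)

command = (
    "nuclmm simulate --numseqs {params.numseqs} "
    "--seqlen {params.seqlen} --seed {params.seed}"
).format(params=params)

print(command)
```

Because the dictionary indexing happens in Python code rather than inside the format string, the wildcard value can be used as a key.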