My workflow looks roughly like this:
import pandas as pd
from snakemake.utils import Paramspace

TEMP_DIR = "temp"
chr = 22

# Set up paramspace for QC values:
paramspace = Paramspace(pd.read_csv("config/qc_values.tsv", sep="\t"))

### Target rule ###
rule all:
    input:
        expand(f"{TEMP_DIR}/snp-stats/post-qc/{{parameters}}/by_chr/data.chr{{chr}}.snp-stats",
               parameters = paramspace.instance_patterns, chr = "22"),

### Modules ###
include: "rules/rules.smk"
rules.smk:
[...]
rule filter_snp_stats:
    # Filter SNPs to keep only variants with a high info score
    # that are in HWE at biallelic loci. The output is per chromosome.
    input:
        f"{TEMP_DIR}/snp-stats/pre-qc/by_chr/data.chr{{chr}}.snp-stats"
    output:
        f"{TEMP_DIR}/snp-stats/post-qc/{{paramspace.wildcard_pattern}}/by_chr/data.chr{{chr}}.snp-stats",
    params:
        thresholds = paramspace.instance
    script:
        "../scripts/filter_snpstats.R"
config/qc_values.tsv:
hwe_p info
1e-06 0.8
1e-06 0.9
1e-06 0.95
1e-06 0.99
But when I run it, I get the following error:
$ snakemake -np
Building DAG of jobs...
MissingInputException in rule all in file /mnt/storage/project/workflow/Snakefile, line 53:
Missing input files for rule all:
affected files:
/mnt/storage/project/project_folder/snp-stats/post-qc/hwe_p~1e-06/info~0.99/by_chr/data.chr22.snp-stats
/mnt/storage/project/project_folder/snp-stats/post-qc/hwe_p~1e-06/info~0.95/by_chr/data.chr22.snp-stats
/mnt/storage/project/project_folder/snp-stats/post-qc/hwe_p~1e-06/info~0.8/by_chr/data.chr22.snp-stats
/mnt/storage/project/project_folder/snp-stats/post-qc/hwe_p~1e-06/info~0.9/by_chr/data.chr22.snp-stats
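...which is correct, since those files are supposed to be generated by filter_snp_stats, but it is not clear to me why Snakemake cannot trace them back through the wildcards. What am I doing wrong?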
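The Paramspace pattern has to be expanded by the f-string rather than treated as a wildcard, so {paramspace.wildcard_pattern} should not be escaped from f-string formatting with {{}}. With the doubled braces, the f-string leaves the literal text {paramspace.wildcard_pattern} in the output path instead of substituting the pattern hwe_p~{hwe_p}/info~{info}, so no rule output matches the files requested by rule all. The Snakemake documentation for Paramspace shows the same usage. The correct rule is: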
rule filter_snp_stats:
    input:
        f"{TEMP_DIR}/snp-stats/pre-qc/by_chr/data.chr{{chr}}.snp-stats"
    output:
        f"{TEMP_DIR}/snp-stats/post-qc/{paramspace.wildcard_pattern}/by_chr/data.chr{{chr}}.snp-stats",
    params:
        thresholds = paramspace.instance
    script:
        "../scripts/filter_snpstats.R"
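To see the difference between the two output strings without running Snakemake, here is a minimal plain-Python sketch. It assumes a hypothetical stand-in object in place of the real Paramspace, carrying only a wildcard_pattern attribute with the value implied by the paths in the error message (hwe_p~{hwe_p}/info~{info}); everything else is ordinary f-string behaviour.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the Paramspace object (an assumption for illustration):
# only the wildcard_pattern attribute is modelled, with the value implied by the
# paths in the error message.
paramspace = SimpleNamespace(wildcard_pattern="hwe_p~{hwe_p}/info~{info}")
TEMP_DIR = "temp"

# Doubled braces escape the expression, so the f-string emits it literally.
escaped = f"{TEMP_DIR}/snp-stats/post-qc/{{paramspace.wildcard_pattern}}/by_chr/data.chr{{chr}}.snp-stats"

# Single braces let the f-string substitute the pattern; {{chr}} stays a wildcard.
expanded = f"{TEMP_DIR}/snp-stats/post-qc/{paramspace.wildcard_pattern}/by_chr/data.chr{{chr}}.snp-stats"

print(escaped)
print(expanded)

# Filling in the remaining wildcards by hand gives one of the requested targets.
target = expanded.format(hwe_p="1e-06", info="0.99", chr="22")
print(target)
```

Only the second string contains the hwe_p and info wildcards, which is what lets Snakemake match it against the paths built from paramspace.instance_patterns in rule all.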