I have input files like the following:
93965_0_0_16000_16000.tif
93965_0_16000_16000_12799.tif
93965_16000_0_14548_16000.tif
93965_16000_16000_14548_12799.tif
93966_0_0_16000_16000.tif
93966_0_16000_16000_3800.tif
93966_16000_0_2980_16000.tif
93966_16000_16000_2980_3800.tif
93967_0_0_16000_16000.tif
93967_0_16000_16000_12799.tif
93967_16000_0_16000_16000.tif
93967_16000_16000_16000_12799.tif
93967_32000_0_2365_16000.tif
93967_32000_16000_2365_12799.tif
94297_0_0_16000_16000.tif
94297_0_16000_16000_16000.tif
94297_0_32000_16000_16000.tif
94297_0_48000_16000_7799.tif
94297_16000_0_16000_16000.tif
94297_16000_16000_16000_16000.tif
94297_16000_32000_16000_16000.tif
The output would be:
6000_16000_14548_12799.png
93966_0_0_16000_16000.png
93966_0_16000_16000_3800.png
93966_16000_0_2980_16000.png
93966_16000_16000_2980_3800.png
93967_0_0_16000_16000.png
93967_0_16000_16000_12799.png
93967_16000_0_16000_16000.png
93967_16000_16000_16000_12799.png
93967_32000_0_2365_16000.png
93967_32000_16000_2365_12799.png
94297_0_0_16000_16000.png
94297_0_16000_16000_16000.png
94297_0_32000_16000_16000.png
94297_0_48000_16000_7799.png
94297_16000_0_16000_16000.png
94297_16000_16000_16000_16000.png
94297_16000_32000_16000_16000.png
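So the first part is a unique ID and the rest varies. I want to group the many inputs/outputs by that leading ID. In other words, one wildcard should be handled normally, while the other needs some special way of being expanded, and that is what I'm looking for.
So far I have the following: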
rule process:
    input:
        '{id}_{patch}.tif'
    output:
        './output_{id}_{patch}.png'
    wildcard_constraints:
        patch = r"\d+_\d+_\d+_\d+.*"
    shell:
        """
        bash command {wildcards.id}
        """
I tried putting regex rules in the input and output to filter the list, but it seems Snakemake does not evaluate input/output that way.
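For illustration, here is a minimal Python sketch of how a pattern like the `patch` constraint above splits one of these filenames into an `id` part and a `patch` part. The helper name `split_name` and the decision to restrict `id` to digits are assumptions made for this sketch; neither is something Snakemake provides.

```python
import re

# Assumed pattern: a numeric id, then four underscore-separated numbers as the patch.
NAME_RE = re.compile(r"^(?P<id>\d+)_(?P<patch>\d+_\d+_\d+_\d+)\.tif$")

def split_name(filename):
    """Return (id, patch) for a matching filename, or None if it does not match."""
    m = NAME_RE.match(filename)
    if m is None:
        return None
    return m.group("id"), m.group("patch")

print(split_name("93965_0_16000_16000_12799.tif"))
# ('93965', '0_16000_16000_12799')
```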
I want each job to run on the list of files that share the same id but have different patches.
I'd appreciate any ideas.
Thanks!
Maybe someone else can come up with a more elegant solution.
# Collect the id and the four patch fields from the .tif files on disk
IDS, PAT1, PAT2, PAT3, PAT4 = glob_wildcards("{id}_{pat1}_{pat2}_{pat3}_{pat4}.tif")

# Rebuild each patch string, e.g. "0_16000_16000_12799"
patches = ["_".join(p) for p in zip(PAT1, PAT2, PAT3, PAT4)]

rule all:
    input:
        expand("{id}_{patch}.png", zip, id=IDS, patch=patches)

rule process:
    input:
        "{id}_{patch}.tif"
    output:
        "{id}_{patch}.png"
    shell:
        """
        echo {wildcards.id} > {output}
        """
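Note that the rules above still run one job per file. If the goal is one job per id that receives all of that id's patches, one possible approach is to group the globbed values by id and hand the list to a rule through an input function. The sketch below is plain Python so it can run outside Snakemake. The hard-coded `IDS` and `patches` lists stand in for the `glob_wildcards` result, and `group_by_id`, `PATCHES_BY_ID` and `tifs_for_id` are hypothetical names chosen for this example.

```python
from collections import defaultdict
from types import SimpleNamespace

# Stand-ins for the lists that glob_wildcards would produce (assumed sample data).
IDS = ["93966", "93966", "93967"]
patches = ["0_0_16000_16000", "0_16000_16000_3800", "0_0_16000_16000"]

def group_by_id(ids, patch_list):
    """Map each id to the list of its patches, preserving input order."""
    grouped = defaultdict(list)
    for id_, patch in zip(ids, patch_list):
        grouped[id_].append(patch)
    return dict(grouped)

PATCHES_BY_ID = group_by_id(IDS, patches)

def tifs_for_id(wildcards):
    """Input-function style helper: all .tif files belonging to wildcards.id."""
    return [f"{wildcards.id}_{patch}.tif" for patch in PATCHES_BY_ID[wildcards.id]]

# Simulate the wildcards object that Snakemake passes to an input function.
print(tifs_for_id(SimpleNamespace(id="93966")))
# ['93966_0_0_16000_16000.tif', '93966_0_16000_16000_3800.tif']
```

In a Snakefile, a rule whose output uses only `{id}` could then declare `input: tifs_for_id`. Its output would probably need to be expressible with `{id}` alone (for example a per-id directory or marker file), and how best to do that depends on the actual command.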