Snakemake: How to implement a function using wildcards?

Question

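I'm trying to use Snakemake to output some files from a specific job. Basically, I have different process channels spanning different mass ranges. Then, for each {channel, mass} pair, I have to run jobs for different "norm" values. I wanted to use this: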

import numpy as np
import pickle as pkl

particle_masses = {
    5: 4.18,     # Bottom quark mass ~4.18 GeV however there is an issue with the charon spectra below 10 GeV
    8: 80.3,   # W boson mass ~80.379 GeV
    11: 1.777,   # Tau lepton mass ~1.777 GeV
    12: 0,       # Electron neutrino mass ~0 (neutrino masses are very small)
    14: 0,       # Muon neutrino mass ~0
    16: 0        # Tau neutrino mass ~0
}

mass_values = np.logspace(0, np.log10(499), num=25).tolist()
# Masked mass dictionary with rounded values
masked_mass_dict = {}
for channel, min_mass in particle_masses.items():
    # Apply mask to exclude values below the particle mass threshold
    masked_mass = [round(m, 2) for m in mass_values if m >= max(min_mass, 3)]
    masked_mass_dict[channel] = masked_mass

channels = list(masked_mass_dict.keys())

rule all: 
    input: 
        expand(
            f"{DATA_LOC}/signal/channel_{{channel}}/trial_distributions/{{mass}}_trial_distrib_{{norm}}.h5",
            channel=channels,
            mass=lambda wildcards: masked_mass_dict[wildcards.channel],
            norm=lambda wildcards: get_norms({"mass": wildcards.mass, "channel": wildcards.channel}),
        )

rule compute_trial_distribution:
    input:
        signal_file=f"{DATA_LOC}/signal/channel_{{channel}}/mc_distrib/{{mass}}_mc_distrib.h5",
        
    output:
        norm_file=f"{DATA_LOC}/signal/channel_{{channel}}/trial_distributions/{{mass}}_trial_distrib_{{norm}}.h5"
    shell: 
        """
        ...
        """

def get_norms(dict):
    """
    Get the norm values (sensitivity) for the given channel and mass from the pre-loaded sensitivity dictionary.
    """
    
    channel = dict["channel"]
    mass = dict["mass"]
    channel = int(channel)
    mass = float(mass)
    #channel = int(wildcards.channel)
    #mass = float(wildcards.mass)

    # Load sensitivity dictionary
    dictionary_file = "path/"
    with open(dictionary_file, "rb") as f:
        sensitivities_dict = pkl.load(f)

    # Get the sensitivity data for the specified channel
    if channel not in sensitivities_dict:
        raise ValueError(f"Channel {channel} not found in sensitivity dictionary.")

    sensitivity_mass_array = sensitivities_dict[channel]
    mass_index = np.where(sensitivity_mass_array[0] == mass)[0]

    if len(mass_index) == 0:
        raise ValueError(f"Mass {mass} not found for channel {channel}")

    # Calculate norms
    azimov_norm = sensitivity_mass_array[1][mass_index[0]]
    norms = np.linspace(0.1 * azimov_norm, 10 * azimov_norm, 50)
    norms = np.insert(norms, 0, 0)  # Include a norm value of 0 for background-only distribution
    norms = np.array([0])
    # Convert to list for use in Snakemake params or shell
    return norms.tolist()

However, this doesn't seem to work, and I don't know how to implement it correctly. This is the error I get:

InputFunctionException in rule all in file /home/egenton/upgrade_solar_WIMP/scripts/Snakefile, line 45:
Error:
  AttributeError: 'Wildcards' object has no attribute 'channel'
Wildcards:

If anyone knows how to fix this, it would be very helpful!

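I tried passing the channel-mass pairs to the get_norms function to expand the norms, but that does not work because get_norms does not recognize the wildcards.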

Tags: python, workflow, jobs, snakemake

Answer

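The rule `all` cannot have wildcards. Wildcards are determined by matching the patterns in a rule's output against the files required by the input of downstream rules, and `all` has neither an output section nor a downstream rule.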
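To illustrate that matching, here is a simplified model in plain Python (an assumption-laden sketch, not Snakemake's actual implementation): each `{name}` in an output pattern becomes a named regex group matching one or more characters, and a concrete requested path is matched against it to recover the wildcard values. The pattern and path are hypothetical examples modeled on the question's file names. Since nothing is matched for `rule all`, its `wildcards` object stays empty, which is why `wildcards.channel` raises the `AttributeError` above.

```python
import re

def pattern_to_regex(pattern):
    """Turn 'a/{x}_{y}.h5' into a regex with one named group per wildcard (simplified)."""
    regex = ""
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        regex += re.escape(pattern[pos:m.start()])  # literal text between wildcards
        regex += f"(?P<{m.group(1)}>.+)"            # each wildcard matches one or more characters
        pos = m.end()
    regex += re.escape(pattern[pos:])
    return re.compile(regex + "$")

def infer_wildcards(pattern, requested_file):
    """Return the wildcard values (as strings) that make `pattern` produce `requested_file`."""
    match = pattern_to_regex(pattern).match(requested_file)
    if match is None:
        raise ValueError(f"{requested_file!r} does not match {pattern!r}")
    return match.groupdict()

# Hypothetical output pattern of compute_trial_distribution, relative to DATA_LOC
output_pattern = "signal/channel_{channel}/trial_distributions/{mass}_trial_distrib_{norm}.h5"
wildcards = infer_wildcards(
    output_pattern,
    "signal/channel_5/trial_distributions/4.8_trial_distrib_0.h5",
)
print(wildcards)  # {'channel': '5', 'mass': '4.8', 'norm': '0'}
```

Note that the recovered values are strings, which is why the question's get_norms converts them with int() and float().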

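For the input of `all`, you should define an explicit list of all the final files you want. This can often be done with `expand`, but by supplying lists of parameter values rather than lambda functions. For what you want, however, I think it is simpler to compute the list with plain Python: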
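By default, `expand` forms every combination of the value lists it receives (a Cartesian product). The plain-Python sketch below mimics that with `itertools.product` to show why it does not fit this case: the valid masses depend on the channel, so a product over one flat mass list would request (channel, mass) pairs that were masked out. The mass values and the shortened file pattern are made up for illustration.

```python
from itertools import product

# Hypothetical per-channel mass lists (the real ones come from masked_mass_dict)
masked_mass_dict = {5: [5.0, 10.0], 8: [100.0]}
pattern = "channel_{channel}/{mass}_trial_distrib.h5"

# Mimic expand's default behaviour: combine every channel with every mass
all_masses = sorted({m for masses in masked_mass_dict.values() for m in masses})
cartesian = [pattern.format(channel=c, mass=m) for c, m in product(masked_mass_dict, all_masses)]

# What is actually wanted: each channel paired only with its own masses
per_channel = [
    pattern.format(channel=c, mass=m)
    for c, masses in masked_mass_dict.items()
    for m in masses
]

print(len(cartesian), len(per_channel))  # 6 3
# Combinations the flat product would request although they were masked out:
print(sorted(set(cartesian) - set(per_channel)))
```

Snakemake's `expand` also accepts `zip` as a combinator, which pairs lists element by element, but the (channel, mass, norm) triples would first have to be flattened into aligned lists; a nested loop avoids that bookkeeping.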

# Define get_norms before using it (same body as in the question)
def get_norms(dict):
    ...

distribs = []
for channel in channels:
    for mass in masked_mass_dict[channel]:  # masked_mass_dict[channel] is a list of masses
        for norm in get_norms({"mass": mass, "channel": channel}):
            distribs.append(
                f"{DATA_LOC}/signal/channel_{channel}/trial_distributions/{mass}_trial_distrib_{norm}.h5"
            )

rule all:
    input:
        distribs
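To check the file list handed to `rule all`, a loop of this kind can be run as ordinary Python outside Snakemake. Since masked_mass_dict[channel] is a list, the masses need their own loop. In this sketch `DATA_LOC`, the mass lists and the stub `get_norms` (which returns fixed norms instead of reading the pickle file) are hypothetical placeholders, not the question's real values.

```python
DATA_LOC = "/data"  # hypothetical base directory

# Hypothetical stand-ins for masked_mass_dict and channels from the question
masked_mass_dict = {5: [4.8, 6.9], 11: [3.0]}
channels = list(masked_mass_dict.keys())

def get_norms(d):
    """Stub: the real function looks up the sensitivity for d["channel"] and d["mass"]."""
    return [0, 1.5]

distribs = []
for channel in channels:
    for mass in masked_mass_dict[channel]:  # one set of files per (channel, mass) pair
        for norm in get_norms({"mass": mass, "channel": channel}):
            distribs.append(
                f"{DATA_LOC}/signal/channel_{channel}/trial_distributions/{mass}_trial_distrib_{norm}.h5"
            )

print(len(distribs))  # 6
print(distribs[0])    # /data/signal/channel_5/trial_distributions/4.8_trial_distrib_0.h5
```

Each of these concrete paths then matches the output pattern of `compute_trial_distribution`, so that rule receives concrete `channel`, `mass` and `norm` wildcards.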

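In my view, Snakemake would be less confusing for beginners if `expand` were replaced with plain Python constructs in tutorials and examples. It would also be an opportunity to learn some Python basics, which are very useful when writing workflows that are not entirely straightforward.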

Side note: couldn't you define `get_norms` as a function taking two arguments (`channel` and `mass`) instead of one taking a dictionary?
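A minimal sketch of that suggestion, under stated assumptions: `get_norms(channel, mass, sensitivities)` takes the two values directly, and the sensitivity table is passed in as an argument rather than loaded from the pickle file on every call (a hypothetical design choice). The table layout (row 0 holds masses, row 1 the matching Asimov norms) is taken from the question's code; the `sensitivities` values below are invented. The norm grid mirrors the question's 50 values from 0.1× to 10× the Asimov norm plus a leading 0.

```python
def get_norms(channel, mass, sensitivities, n_points=50):
    """Return [0.0] followed by n_points norms from 0.1x to 10x the Asimov norm.

    `sensitivities[channel]` is assumed to be a pair of sequences:
    row 0 holds masses, row 1 the matching Asimov norms (layout from the question).
    """
    channel = int(channel)  # wildcard values arrive as strings, so convert explicitly
    mass = float(mass)
    if channel not in sensitivities:
        raise ValueError(f"Channel {channel} not found in sensitivity dictionary.")

    masses, asimov_norms = sensitivities[channel][0], sensitivities[channel][1]
    if mass not in masses:
        raise ValueError(f"Mass {mass} not found for channel {channel}")
    asimov_norm = asimov_norms[list(masses).index(mass)]

    # Evenly spaced grid, equivalent to np.linspace(0.1 * a, 10 * a, n_points)
    start, stop = 0.1 * asimov_norm, 10 * asimov_norm
    step = (stop - start) / (n_points - 1)
    norms = [start + i * step for i in range(n_points)]
    return [0.0] + norms  # leading 0 = background-only distribution

# Hypothetical sensitivity table: channel 5 has masses 4.8 and 6.9
sensitivities = {5: [[4.8, 6.9], [2.0, 1.0]]}
norms = get_norms("5", "6.9", sensitivities)
print(len(norms), norms[0], norms[1])  # 51 0.0 0.1
```

In the Snakefile, the loop would then call `get_norms(channel, mass, sensitivities)` directly, with the table loaded once at the top of the file.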
