Error importing a Python module in a Nextflow script block


I am running into a problem similar to the ones described here and here. The code is as follows:

process q2_predict_dysbiosis {
    publishDir 'results', mode: 'copy'

    input:
    path abundance_file
    path species_abundance_file
    path stratified_pathways_table
    path unstratified_pathways_table

    output:
    path "${abundance_file.baseName}_q2pd.tsv"

    script:
    """
    #!/usr/bin/env python

    from q2_predict_dysbiosis import calculate_index
    import pandas as pd

    pd.set_option('display.max_rows', None)

    taxa = pd.read_csv("${species_abundance_file}", sep="\\t", index_col=0)
    paths_strat = pd.read_csv("${stratified_pathways_table}", sep="\\t", index_col=0)
    paths_unstrat = pd.read_csv("${unstratified_pathways_table}", sep="\\t", index_col=0)

    score_df = calculate_index(taxa, paths_strat, paths_unstrat)
    score_df.to_csv("${abundance_file.baseName}_q2pd.tsv", sep="\\t", float_format="%.2f")
    """
}

I get the following error:

Caused by:
  Process `q2_predict_dysbiosis (1)` terminated with an error exit status (1)


Command executed:

  #!/usr/bin/env python

  from q2_predict_dysbiosis import calculate_index
  import pandas as pd

  pd.set_option('display.max_rows', None)

  taxa = pd.read_csv("abundance1-taxonomy_table.txt", sep="\t", index_col=0)
  paths_strat = pd.read_csv("pathways_stratified.txt", sep="\t", index_col=0)
  paths_unstrat = pd.read_csv("pathways_unstratified.txt", sep="\t", index_col=0)

  score_df = calculate_index(taxa, paths_strat, paths_unstrat)
  score_df.to_csv("abundance1_q2pd.tsv", sep="\t", float_format="%.2f")

Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File ".command.sh", line 3, in <module>
      from q2_predict_dysbiosis import calculate_index
  ModuleNotFoundError: No module named 'q2_predict_dysbiosis'

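I have followed the instructions in this link, but it still does not work. I would like to keep the code in a script block like this rather than run a separate script.py file. I am using the code from this repository.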

Thanks in advance!

Update

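To try to resolve the import error, I have done the following:

  1. Created a bin/ directory in the same directory as script.nf. No effect.

  2. Changed the shebang line. No effect.

q2_predict_dysbiosis is not installed (there are no installation instructions), but it runs locally. I think the problem is that Nextflow cannot find q2_predict_dysbiosis.py, even though it is in the ./bin directory.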

Tags: python, import, nextflow

1 Answer

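Python's import system looks for packages and modules in the following order:

  1. The directory containing the script being run (or the current working directory, $PWD, when no script file is given, e.g. in an interactive session). For a Nextflow script block this is the task's work directory.

  2. The PYTHONPATH environment variable: if set, it lists additional directories for Python to search for packages and modules.

  3. The sys.path list in your program: the paths in this list determine where Python looks for modules, and you can modify sys.path in your code to include other directories.

  4. System-wide or virtual environment packages: packages that have been installed globally on the system or into a virtual environment.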
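As a rough illustration of item 3, the sketch below adds a directory to sys.path at runtime and then imports a module from it. The module name demo_index and the helper import_from_directory are hypothetical stand-ins made up for this example; they are not part of q2_predict_dysbiosis.

```python
import importlib
import os
import sys
import tempfile


def import_from_directory(directory, module_name):
    """Make `directory` searchable via sys.path, then import `module_name`."""
    if directory not in sys.path:
        sys.path.insert(0, directory)
    # Clear cached directory listings so a freshly written file is found.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


with tempfile.TemporaryDirectory() as packages_dir:
    # Hypothetical stand-in for a module such as q2_predict_dysbiosis.py.
    with open(os.path.join(packages_dir, "demo_index.py"), "w") as fh:
        fh.write("def calculate_index():\n    return 'ok'\n")

    demo = import_from_directory(packages_dir, "demo_index")
    print(demo.calculate_index())
```

This works inside a script block too, but it hard-codes a path in the Python code, which is why setting PYTHONPATH from the configuration is usually the less intrusive option.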


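A quick solution is to simply set the PYTHONPATH environment variable using the env scope in nextflow.config. For example, with q2_predict_dysbiosis.py in a folder named packages in the root of the project repository (i.e. the directory containing the main.nf script):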

nextflow.config:

env {
    PYTHONPATH = "${projectDir}/packages"
}

main.nf:

process q2_predict_dysbiosis {

    debug true

    script:
    """
    #!/usr/bin/env python
    import sys
    print(sys.path)

    from q2_predict_dysbiosis import calculate_index

    assert 'q2_predict_dysbiosis' in sys.modules
    """
}

workflow {

    q2_predict_dysbiosis()
}

Results:

$ nextflow run main.nf 

 N E X T F L O W   ~  version 24.10.0

Launching `main.nf` [grave_avogadro] DSL2 - revision: 2f0c31286e

executor >  local (1)
[8f/50976f] q2_predict_dysbiosis [100%] 1 of 1 ✔
[
    '/path/to/project/work/8f/50976fe453d54fd6e11b3501d4b05a',
    '/path/to/project/packages',
    '/usr/lib/python312.zip',
    '/usr/lib/python3.12',
    '/usr/lib/python3.12/lib-dynload',
    '/usr/lib/python3.12/site-packages'
]
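The same effect can be reproduced outside Nextflow. The sketch below is only an approximation of what the env scope does: it starts a fresh interpreter with PYTHONPATH set and reads back that interpreter's sys.path. The helper name child_sys_path is made up for this example, and a temporary directory stands in for the packages folder.

```python
import json
import os
import subprocess
import sys
import tempfile


def child_sys_path(extra_dir):
    """Start a new interpreter with PYTHONPATH=extra_dir and return its sys.path."""
    env = dict(os.environ, PYTHONPATH=extra_dir)
    result = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


with tempfile.TemporaryDirectory() as packages_dir:
    search_path = child_sys_path(packages_dir)
    # Compare resolved paths so symlinked temp directories still match.
    found = os.path.realpath(packages_dir) in [os.path.realpath(p) for p in search_path]
    print(found)
```

If the directory shows up in the child's sys.path here but the import still fails in the pipeline, the first thing to check is whether the task actually sees the variable, which is what the print(sys.path) call in the process above is for.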

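A better solution, however, is to refactor. Move your custom code into a separate file (e.g. your_script.py), place it in the bin directory and make it executable (chmod a+x bin/your_script.py). Also move q2_predict_dysbiosis.py into this directory, perhaps into a subdirectory called utils. Your directory structure might then look like this: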

$ find .
.
./main.nf
./bin
./bin/utils
./bin/utils/q2_predict_dysbiosis.py
./bin/your_script.py

And your_script.py might look like the following, using argparse to provide a user-friendly command-line interface:

#!/usr/bin/env python

import argparse
import pandas as pd

from utils.q2_predict_dysbiosis import calculate_index

pd.set_option('display.max_rows', None)


def custom_help_formatter(prog):
    return argparse.HelpFormatter(prog, max_help_position=80)


def parse_args():
    parser = argparse.ArgumentParser(
        description="Calculate dysbiosis index using abundance and pathways tables.",
        formatter_class=custom_help_formatter,
    )

    parser.add_argument(
        "species_abundance_file",
        help="Path to the species abundance file",
    )
    parser.add_argument(
        "stratified_pathways_table",
        help="Path to the stratified pathways table file",
    )
    parser.add_argument(
        "unstratified_pathways_table",
        help="Path to the unstratified pathways table file",
    )
    parser.add_argument(
        "output_file",
        help="Path to the output file to save the results",
    )

    return parser.parse_args()


def main(
    species_abundance_file,
    stratified_pathways_table,
    unstratified_pathways_table,
    output_file
):
    taxa = pd.read_csv(species_abundance_file, sep="\t", index_col=0)
    paths_strat = pd.read_csv(stratified_pathways_table, sep="\t", index_col=0)
    paths_unstrat = pd.read_csv(unstratified_pathways_table, sep="\t", index_col=0)
    
    score_df = calculate_index(taxa, paths_strat, paths_unstrat)
    score_df.to_csv(output_file, sep="\t", float_format="%.2f")


if __name__ == "__main__":
    args = parse_args()

    main(
        args.species_abundance_file,
        args.stratified_pathways_table,
        args.unstratified_pathways_table,
        args.output_file
    )
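The import from utils in the script above resolves because Python puts the directory containing the script being run (here bin/) at the front of sys.path, regardless of the directory the command is launched from. A minimal sketch of that behaviour, using a throwaway layout with hypothetical file names (demo_script.py and demo_helper.py are stand-ins, not files from the repository):

```python
import os
import subprocess
import sys
import tempfile
import textwrap


def run_script_from_elsewhere():
    """Build a bin/utils layout, run the script from another directory, return its stdout."""
    with tempfile.TemporaryDirectory() as project, tempfile.TemporaryDirectory() as workdir:
        utils_dir = os.path.join(project, "bin", "utils")
        os.makedirs(utils_dir)

        # Hypothetical stand-in for bin/utils/q2_predict_dysbiosis.py.
        with open(os.path.join(utils_dir, "demo_helper.py"), "w") as fh:
            fh.write("def calculate():\n    return 42\n")

        # Hypothetical stand-in for bin/your_script.py.
        script = os.path.join(project, "bin", "demo_script.py")
        with open(script, "w") as fh:
            fh.write(textwrap.dedent("""\
                from utils.demo_helper import calculate
                print(calculate())
            """))

        # The working directory is deliberately not the project directory,
        # much like a Nextflow task running in its own work directory.
        result = subprocess.run(
            [sys.executable, script],
            cwd=workdir,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


print(run_script_from_elsewhere())
```

No PYTHONPATH is needed in this layout, and utils does not need an __init__.py on Python 3.3 or later because it is treated as a namespace package.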

Test with main.nf:

$ cat main.nf 
process q2_predict_dysbiosis {

    debug true

    script:
    """
    your_script.py --help
    """
}

workflow {

    q2_predict_dysbiosis()
}

Results:

$ nextflow run main.nf 

 N E X T F L O W   ~  version 24.10.0

Launching `main.nf` [peaceful_stonebraker] DSL2 - revision: fea21868c7

executor >  local (1)
[88/538f31] q2_predict_dysbiosis [100%] 1 of 1 ✔
usage: your_script.py [-h] species_abundance_file stratified_pathways_table unstratified_pathways_table output_file

Calculate dysbiosis index using abundance and pathways tables.

positional arguments:
  species_abundance_file       Path to the species abundance file
  stratified_pathways_table    Path to the stratified pathways table file
  unstratified_pathways_table  Path to the unstratified pathways table file
  output_file                  Path to the output file to save the results

options:
  -h, --help                   show this help message and exit
