Error importing a Python module in a Nextflow script block

Question

I am running into a problem similar to the ones described here and here. My code is:

process q2_predict_dysbiosis {
publishDir 'results', mode: 'copy'

input:
path abundance_file
path species_abundance_file
path stratified_pathways_table
path unstratified_pathways_table

output:
path "${abundance_file.baseName}_q2pd.tsv"

script:
"""
#!/usr/bin/env python

from q2_predict_dysbiosis import calculate_index
import pandas as pd

pd.set_option('display.max_rows', None)

taxa = pd.read_csv("${species_abundance_file}", sep="\\t", index_col=0)
paths_strat = pd.read_csv("${stratified_pathways_table}", sep="\\t", index_col=0)
paths_unstrat = pd.read_csv("${unstratified_pathways_table}", sep="\\t", index_col=0)

score_df = calculate_index(taxa, paths_strat, paths_unstrat)
score_df.to_csv("${abundance_file.baseName}_q2pd.tsv", sep="\\t", float_format="%.2f")
"""
}

I get this error:

Caused by:
  Process `q2_predict_dysbiosis (1)` terminated with an error exit status (1)


Command executed:

  #!/usr/bin/env python

  from q2_predict_dysbiosis import calculate_index
  import pandas as pd

  pd.set_option('display.max_rows', None)

  taxa = pd.read_csv("abundance1-taxonomy_table.txt", sep="\t", index_col=0)
  paths_strat = pd.read_csv("pathways_stratified.txt", sep="\t", index_col=0)
  paths_unstrat = pd.read_csv("pathways_unstratified.txt", sep="\t", index_col=0)

  score_df = calculate_index(taxa, paths_strat, paths_unstrat)
  score_df.to_csv("abundance1_q2pd.tsv", sep="\t", float_format="%.2f")

Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File ".command.sh", line 3, in <module>
      from q2_predict_dysbiosis import calculate_index
  ModuleNotFoundError: No module named 'q2_predict_dysbiosis'

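I have followed the instructions at this link, but it still does not work. I would like to keep the code block as it is rather than run a script.py file. I am using the code from this repository.

Thanks in advance!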

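Update

To try to fix the import error, I have done the following:

Created a `bin/` directory in the same directory as `script.nf`. No effect.
Changed the shebang line. No effect.

`q2_predict_dysbiosis` is not installed (there are no installation instructions), but it runs locally. I think the problem is that Nextflow cannot find `q2_predict_dysbiosis.py`, even though it is in the `./bin` directory.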

Answer 1

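Python's import system searches the following locations, in order, to find the packages and modules to import:

The directory containing the script being run, or the current working directory (`$PWD`) when Python is started interactively or with `-c`. In a Nextflow task this is the task work directory, because `.command.sh` is executed there.
The `PYTHONPATH` environment variable: if set, it lists additional directories for Python to search for packages and modules.
System-wide or virtual environment packages: these are the packages installed globally on the system or in the active virtual environment (`site-packages`).

All of these locations end up in the `sys.path` list, which determines where Python looks for modules. You can also modify `sys.path` in your code to include other directories.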
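The effect of `PYTHONPATH` on the search order can be checked outside Nextflow. The following is a minimal sketch, not part of the original answer: `demo_dysbiosis_module` is a hypothetical stand-in for `q2_predict_dysbiosis.py`, and the helper `can_import` is an illustrative name. It starts a fresh interpreter from an empty directory, once without and once with `PYTHONPATH` pointing at the module's folder.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def can_import(module_name, pythonpath=None):
    """Return True if a fresh interpreter can import `module_name`.

    The child runs from an empty temporary directory so that the current
    working directory cannot accidentally satisfy the import.
    """
    # Start from the parent's environment, minus any inherited PYTHONPATH.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    if pythonpath is not None:
        env["PYTHONPATH"] = str(pythonpath)
    with tempfile.TemporaryDirectory() as empty_cwd:
        result = subprocess.run(
            [sys.executable, "-c", f"import {module_name}"],
            env=env,
            cwd=empty_cwd,
            capture_output=True,
        )
    return result.returncode == 0


def demo():
    """Try the import without and with PYTHONPATH; return both outcomes."""
    with tempfile.TemporaryDirectory() as packages:
        # Hypothetical stand-in for q2_predict_dysbiosis.py.
        Path(packages, "demo_dysbiosis_module.py").write_text(
            "def calculate_index():\n    return 0\n"
        )
        without_path = can_import("demo_dysbiosis_module")
        with_path = can_import("demo_dysbiosis_module", pythonpath=packages)
    return without_path, with_path


if __name__ == "__main__":
    print(demo())
```

This also suggests why creating a `bin/` directory did not help in the question: Nextflow adds `bin/` to `PATH` so that executables can be found, but `PATH` plays no part in Python's module search.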

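A quick solution is to set the `PYTHONPATH` environment variable using the `env` scope in nextflow.config. For example, with q2_predict_dysbiosis.py in a folder named `packages` in the root of the project repository (i.e. the directory containing the main.nf script):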

env {

    PYTHONPATH = "${projectDir}/packages"
}

process q2_predict_dysbiosis {

    debug true

    script:
    """
    #!/usr/bin/env python
    import sys
    print(sys.path)

    from q2_predict_dysbiosis import calculate_index

    assert 'q2_predict_dysbiosis' in sys.modules
    """
}

workflow {

    q2_predict_dysbiosis()
}

Results:

$ nextflow run main.nf 

 N E X T F L O W   ~  version 24.10.0

Launching `main.nf` [grave_avogadro] DSL2 - revision: 2f0c31286e

executor >  local (1)
[8f/50976f] q2_predict_dysbiosis [100%] 1 of 1 ✔
[
    '/path/to/project/work/8f/50976fe453d54fd6e11b3501d4b05a',
    '/path/to/project/packages',
    '/usr/lib/python312.zip',
    '/usr/lib/python3.12',
    '/usr/lib/python3.12/lib-dynload',
    '/usr/lib/python3.12/site-packages'
]
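If editing nextflow.config is not an option, a similar effect can probably be had from inside the script block by extending `sys.path` before the import. The sketch below is an assumption-laden illustration rather than part of the original answer: `import_from_dir` and `demo_index_module` are hypothetical names, and in a real Nextflow script block the directory would be an interpolated value such as `"${projectDir}/packages"`.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_dir(module_name, directory):
    """Import `module_name` after making `directory` searchable via sys.path."""
    directory = str(Path(directory).resolve())
    if directory not in sys.path:
        # Prepend so this directory wins over same-named modules elsewhere.
        sys.path.insert(0, directory)
    # Make sure the import machinery notices files created after startup.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


def demo():
    """Create a throwaway module, import it by directory, and call it."""
    with tempfile.TemporaryDirectory() as packages:
        # Hypothetical stand-in for q2_predict_dysbiosis.py.
        Path(packages, "demo_index_module.py").write_text(
            "def calculate_index(*tables):\n    return len(tables)\n"
        )
        module = import_from_dir("demo_index_module", packages)
        return module.calculate_index("taxa", "paths_strat", "paths_unstrat")


if __name__ == "__main__":
    print(demo())
```

Setting `PYTHONPATH` in the `env` scope keeps the process scripts free of path handling, so the in-script variant is mainly useful for a one-off test.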

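A better solution, however, is to refactor. Move your custom code into a separate file (e.g. your_script.py), place it in the `bin` directory and make it executable (`chmod a+x bin/your_script.py`). Also move q2_predict_dysbiosis.py into this directory, perhaps into a subdirectory called utils. Your directory structure might look like this: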

$ find .
.
./main.nf
./bin
./bin/utils
./bin/utils/q2_predict_dysbiosis.py
./bin/your_script.py

And your_script.py might look like this, using `argparse` to provide a user-friendly command-line interface:

#!/usr/bin/env python

import argparse
import pandas as pd

from utils.q2_predict_dysbiosis import calculate_index

pd.set_option('display.max_rows', None)

def custom_help_formatter(prog):
    return argparse.HelpFormatter(prog, max_help_position=80)

def parse_args():
    parser = argparse.ArgumentParser(
        description="Calculate dysbiosis index using abundance and pathways tables.",
        formatter_class=custom_help_formatter,
    )

    parser.add_argument(
        "species_abundance_file",
        help="Path to the species abundance file",
    )
    parser.add_argument(
        "stratified_pathways_table",
        help="Path to the stratified pathways table file",
    )
    parser.add_argument(
        "unstratified_pathways_table",
        help="Path to the unstratified pathways table file",
    )
    parser.add_argument(
        "output_file",
        help="Path to the output file to save the results",
    )

    return parser.parse_args()

def main(
    species_abundance_file,
    stratified_pathways_table,
    unstratified_pathways_table,
    output_file
):
    taxa = pd.read_csv(species_abundance_file, sep="\t", index_col=0)
    paths_strat = pd.read_csv(stratified_pathways_table, sep="\t", index_col=0)
    paths_unstrat = pd.read_csv(unstratified_pathways_table, sep="\t", index_col=0)
    
    score_df = calculate_index(taxa, paths_strat, paths_unstrat)
    score_df.to_csv(output_file, sep="\t", float_format="%.2f")

if __name__ == "__main__":
    args = parse_args()

    main(
        args.species_abundance_file,
        args.stratified_pathways_table,
        args.unstratified_pathways_table,
        args.output_file
    )
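The argument handling can be checked without Nextflow or the input tables by passing an explicit list to `parse_args`. This is a small sketch of that idea, assuming a `build_parser` helper (a hypothetical name, not in the script above) that declares the same four positional arguments. The file names are the ones that appear in the error log in the question.

```python
import argparse


def build_parser():
    """Build a parser with the same four positional arguments as your_script.py."""
    parser = argparse.ArgumentParser(
        description="Calculate dysbiosis index using abundance and pathways tables.",
    )
    for name in (
        "species_abundance_file",
        "stratified_pathways_table",
        "unstratified_pathways_table",
        "output_file",
    ):
        parser.add_argument(name)
    return parser


def parse_args(argv=None):
    """Parse `argv`; with argv=None argparse falls back to sys.argv[1:]."""
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    args = parse_args([
        "abundance1-taxonomy_table.txt",
        "pathways_stratified.txt",
        "pathways_unstratified.txt",
        "abundance1_q2pd.tsv",
    ])
    print(args.output_file)
```

In the process itself, the script block would then presumably call something like `your_script.py "${species_abundance_file}" "${stratified_pathways_table}" "${unstratified_pathways_table}" "${abundance_file.baseName}_q2pd.tsv"`, with the `input:` and `output:` declarations from the question left unchanged.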

Test it with `main.nf`:

$ cat main.nf 
process q2_predict_dysbiosis {

    debug true

    script:
    """
    your_script.py --help
    """
}

workflow {

    q2_predict_dysbiosis()
}

Results:

$ nextflow run main.nf 

 N E X T F L O W   ~  version 24.10.0

Launching `main.nf` [peaceful_stonebraker] DSL2 - revision: fea21868c7

executor >  local (1)
[88/538f31] q2_predict_dysbiosis [100%] 1 of 1 ✔
usage: your_script.py [-h] species_abundance_file stratified_pathways_table unstratified_pathways_table output_file

Calculate dysbiosis index using abundance and pathways tables.

positional arguments:
  species_abundance_file       Path to the species abundance file
  stratified_pathways_table    Path to the stratified pathways table file
  unstratified_pathways_table  Path to the unstratified pathways table file
  output_file                  Path to the output file to save the results

options:
  -h, --help                   show this help message and exit
