process q2_predict_dysbiosis {
    publishDir 'results', mode: 'copy'

    input:
    path abundance_file
    path species_abundance_file
    path stratified_pathways_table
    path unstratified_pathways_table

    output:
    path "${abundance_file.baseName}_q2pd.tsv"

    script:
    """
    #!/usr/bin/env python
    from q2_predict_dysbiosis import calculate_index
    import pandas as pd
    pd.set_option('display.max_rows', None)
    taxa = pd.read_csv("${species_abundance_file}", sep="\\t", index_col=0)
    paths_strat = pd.read_csv("${stratified_pathways_table}", sep="\\t", index_col=0)
    paths_unstrat = pd.read_csv("${unstratified_pathways_table}", sep="\\t", index_col=0)
    score_df = calculate_index(taxa, paths_strat, paths_unstrat)
    score_df.to_csv("${abundance_file.baseName}_q2pd.tsv", sep="\\t", float_format="%.2f")
    """
}
I get the following error:
Caused by:
Process `q2_predict_dysbiosis (1)` terminated with an error exit status (1)
Command executed:
#!/usr/bin/env python
from q2_predict_dysbiosis import calculate_index
import pandas as pd
pd.set_option('display.max_rows', None)
taxa = pd.read_csv("abundance1-taxonomy_table.txt", sep="\t", index_col=0)
paths_strat = pd.read_csv("pathways_stratified.txt", sep="\t", index_col=0)
paths_unstrat = pd.read_csv("pathways_unstratified.txt", sep="\t", index_col=0)
score_df = calculate_index(taxa, paths_strat, paths_unstrat)
score_df.to_csv("abundance1_q2pd.tsv", sep="\t", float_format="%.2f")
Command exit status:
1
Command output:
(empty)
Command error:
Traceback (most recent call last):
  File ".command.sh", line 3, in <module>
    from q2_predict_dysbiosis import calculate_index
ModuleNotFoundError: No module named 'q2_predict_dysbiosis'
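I have followed the instructions in this link, but it still doesn't work. I'd like to keep the code inline like this rather than running a script.py file. I'm using the code from this repository.
Thanks in advance!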
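Update
To try to fix the import error, I have done the following:
Created a bin/ directory in the same directory as script.nf. No result.
Changed the shebang declaration. No result.
q2_predict_dysbiosis is not installed (there are no installation instructions), but it runs locally. I think the problem is that Nextflow cannot find q2_predict_dysbiosis.py, even though it is in the ./bin directory.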
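Python's import system looks in the following places, in this order, to locate the packages and modules to import:
1. The directory of the script being run: for a Nextflow script block this is the task work directory (i.e. $PWD), which is where the Python interpreter is started.
2. The PYTHONPATH environment variable: if set, it specifies additional directories in which Python searches for packages and modules.
3. The sys.path list in your program: this is the list Python actually searches, assembled from the other sources listed here, and you can modify sys.path in your code to include other directories.
4. Packages installed system-wide or in a virtual environment: packages installed globally into the system's or the virtual environment's site-packages.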
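This order also explains why creating bin/ did not help: Nextflow adds the project's bin directory to the task's PATH, which is where the shell looks for executables, not where Python looks for modules. The sketch below, a minimal illustration using a hypothetical stand-in module named demo_module, starts a fresh interpreter twice: once with the module's directory only on PATH and once with it on PYTHONPATH.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def can_import(module_name: str, extra_env: dict) -> bool:
    """Start a fresh interpreter with extra environment variables and try the import."""
    env = {**os.environ, **extra_env}
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        env=env,
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


with tempfile.TemporaryDirectory() as tmp:
    packages_dir = Path(tmp) / "packages"
    packages_dir.mkdir()
    # Hypothetical stand-in for q2_predict_dysbiosis.py
    (packages_dir / "demo_module.py").write_text("VALUE = 42\n")

    # Directory on PATH only: this is how Nextflow exposes bin/ to a task
    path_value = f"{packages_dir}{os.pathsep}{os.environ.get('PATH', '')}"
    on_path = can_import("demo_module", {"PATH": path_value})

    # Directory on PYTHONPATH: Python adds it to sys.path at startup
    on_pythonpath = can_import("demo_module", {"PYTHONPATH": str(packages_dir)})

print(f"importable with PATH only:  {on_path}")
print(f"importable with PYTHONPATH: {on_pythonpath}")
```

Only the PYTHONPATH variant should succeed, which is the behaviour the env-scope fix relies on.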
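A quick solution is to set the PYTHONPATH environment variable using the env scope in nextflow.config. For example, with q2_predict_dysbiosis.py in a folder named packages in the project repository root (i.e. the directory containing the main.nf script):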
env {
    PYTHONPATH = "${projectDir}/packages"
}

And a minimal main.nf to test it:

process q2_predict_dysbiosis {
    debug true

    script:
    """
    #!/usr/bin/env python
    import sys
    print(sys.path)
    from q2_predict_dysbiosis import calculate_index
    assert 'q2_predict_dysbiosis' in sys.modules
    """
}

workflow {
    q2_predict_dysbiosis()
}
Result:
$ nextflow run main.nf
N E X T F L O W ~ version 24.10.0
Launching `main.nf` [grave_avogadro] DSL2 - revision: 2f0c31286e
executor > local (1)
[8f/50976f] q2_predict_dysbiosis [100%] 1 of 1 ✔
[
'/path/to/project/work/8f/50976fe453d54fd6e11b3501d4b05a',
'/path/to/project/packages',
'/usr/lib/python312.zip',
'/usr/lib/python3.12',
'/usr/lib/python3.12/lib-dynload',
'/usr/lib/python3.12/site-packages'
]
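If the import still fails inside a task, it can help to print where (or whether) Python resolves the module, rather than only printing sys.path. The sketch below uses importlib.util.find_spec for that, and also shows the sys.path option from the list above: inserting a directory at runtime has the same effect as PYTHONPATH. The module name demo_dysbiosis and its contents are hypothetical stand-ins; in a real script block you would call locate("q2_predict_dysbiosis") instead.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def locate(module_name: str):
    """Return the file a module would be loaded from, or None if Python cannot find it."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for packages/q2_predict_dysbiosis.py
    (Path(tmp) / "demo_dysbiosis.py").write_text("def calculate_index(*args):\n    return None\n")

    before = locate("demo_dysbiosis")
    # Same effect as PYTHONPATH=<tmp>, but done at runtime via sys.path
    sys.path.insert(0, tmp)
    try:
        after = locate("demo_dysbiosis")
    finally:
        # Restore sys.path so the change does not leak into later imports
        sys.path.remove(tmp)

print("before sys.path change:", before)
print("after sys.path change: ", after)
```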
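A better solution, however, is to refactor. Move your custom code into a separate file (e.g. your_script.py), put it in the bin directory and make it executable (chmod a+x bin/your_script.py). Also move q2_predict_dysbiosis.py into this directory, for example into a subdirectory named utils. Your directory structure might then look like this: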
$ find .
.
./main.nf
./bin
./bin/utils
./bin/utils/q2_predict_dysbiosis.py
./bin/your_script.py
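This layout works because, when an executable script is found via PATH, Python puts the directory containing that script (here bin/) first on sys.path, regardless of the working directory. The sketch below reproduces the layout in a temporary directory with hypothetical names (demo_script.py, utils/helper.py, greet) and runs the script from a separate "work" directory, roughly as a Nextflow task would. It also adds an empty utils/__init__.py, which is optional but makes utils a regular package so it cannot be shadowed by an unrelated installed package called utils. It assumes a POSIX system.

```python
import os
import stat
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    bin_dir = Path(tmp) / "bin"
    utils_dir = bin_dir / "utils"
    utils_dir.mkdir(parents=True)
    # Empty __init__.py makes utils a regular package
    (utils_dir / "__init__.py").write_text("")
    # Hypothetical stand-in for bin/utils/q2_predict_dysbiosis.py
    (utils_dir / "helper.py").write_text("def greet():\n    return 'hello from utils'\n")

    # Hypothetical stand-in for bin/your_script.py
    script = bin_dir / "demo_script.py"
    script.write_text(
        f"#!{sys.executable}\n"
        "from utils.helper import greet\n"
        "print(greet())\n"
    )
    # Equivalent of chmod a+x
    script.chmod(script.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

    # Stands in for a Nextflow task work directory
    work_dir = Path(tmp) / "work"
    work_dir.mkdir()

    # Put bin/ on PATH, as Nextflow does, and call the script by name only
    env = {**os.environ, "PATH": f"{bin_dir}{os.pathsep}{os.environ.get('PATH', '')}"}
    result = subprocess.run(
        ["demo_script.py"], cwd=work_dir, env=env, capture_output=True, text=True
    )

print(result.stdout.strip())
```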
Your script can then use argparse to provide a user-friendly command-line interface:
#!/usr/bin/env python
import argparse

import pandas as pd

from utils.q2_predict_dysbiosis import calculate_index

pd.set_option('display.max_rows', None)


def custom_help_formatter(prog):
    return argparse.HelpFormatter(prog, max_help_position=80)


def parse_args():
    parser = argparse.ArgumentParser(
        description="Calculate dysbiosis index using abundance and pathways tables.",
        formatter_class=custom_help_formatter,
    )
    parser.add_argument(
        "species_abundance_file",
        help="Path to the species abundance file",
    )
    parser.add_argument(
        "stratified_pathways_table",
        help="Path to the stratified pathways table file",
    )
    parser.add_argument(
        "unstratified_pathways_table",
        help="Path to the unstratified pathways table file",
    )
    parser.add_argument(
        "output_file",
        help="Path to the output file to save the results",
    )
    return parser.parse_args()


def main(
    species_abundance_file,
    stratified_pathways_table,
    unstratified_pathways_table,
    output_file
):
    taxa = pd.read_csv(species_abundance_file, sep="\t", index_col=0)
    paths_strat = pd.read_csv(stratified_pathways_table, sep="\t", index_col=0)
    paths_unstrat = pd.read_csv(unstratified_pathways_table, sep="\t", index_col=0)
    score_df = calculate_index(taxa, paths_strat, paths_unstrat)
    score_df.to_csv(output_file, sep="\t", float_format="%.2f")


if __name__ == "__main__":
    args = parse_args()
    main(
        args.species_abundance_file,
        args.stratified_pathways_table,
        args.unstratified_pathways_table,
        args.output_file
    )
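If you want to check the command-line wiring without running Nextflow, one option (a suggestion, not part of the script above) is to let parse_args accept an optional argument list; argparse's parse_args(None) falls back to sys.argv[1:], so the script's behaviour is unchanged. The sketch below rebuilds the same four positional arguments (custom formatter omitted) and parses the file names from the error log, i.e. the kind of call a script block running your_script.py with those four paths would make.

```python
import argparse


def parse_args(argv=None):
    """Same positional interface as your_script.py; argv=None means sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Calculate dysbiosis index using abundance and pathways tables."
    )
    for name, help_text in [
        ("species_abundance_file", "Path to the species abundance file"),
        ("stratified_pathways_table", "Path to the stratified pathways table file"),
        ("unstratified_pathways_table", "Path to the unstratified pathways table file"),
        ("output_file", "Path to the output file to save the results"),
    ]:
        parser.add_argument(name, help=help_text)
    return parser.parse_args(argv)


# Positional arguments are assigned in the order they were declared
args = parse_args([
    "abundance1-taxonomy_table.txt",
    "pathways_stratified.txt",
    "pathways_unstratified.txt",
    "abundance1_q2pd.tsv",
])
print(vars(args))
```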
Test it with this main.nf:
$ cat main.nf
process q2_predict_dysbiosis {
    debug true

    script:
    """
    your_script.py --help
    """
}

workflow {
    q2_predict_dysbiosis()
}
Result:
$ nextflow run main.nf
N E X T F L O W ~ version 24.10.0
Launching `main.nf` [peaceful_stonebraker] DSL2 - revision: fea21868c7
executor > local (1)
[88/538f31] q2_predict_dysbiosis [100%] 1 of 1 ✔
usage: your_script.py [-h] species_abundance_file stratified_pathways_table unstratified_pathways_table output_file
Calculate dysbiosis index using abundance and pathways tables.
positional arguments:
species_abundance_file Path to the species abundance file
stratified_pathways_table Path to the stratified pathways table file
unstratified_pathways_table Path to the unstratified pathways table file
output_file Path to the output file to save the results
options:
-h, --help show this help message and exit