I'm new to Scrapy and am trying to deploy my spider to a Scrapyd server. However, I've run into a problem: I can't seem to use os.getenv in my Scrapy settings file.
This is how I tried to set up my settings.py:
# settings.py
import os
from dotenv import load_dotenv
load_dotenv()
SENTRY_DSN = os.getenv("SENTRY_DSN")
MONGO_URI = os.getenv("MONGO_URI")
In my spider code, I try to access these variables like this:
def get_collection(self) -> Collection:
    client = pymongo.MongoClient(self.settings.get("MONGO_URI"))
    database = client["jobs"]
    collection = database[self.name]
    return collection
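I'm using scrapyd-client to deploy the spider to my server, but I seem to be doing something wrong, because I can't access these environment variables from the settings file.
Here is the full response from the server: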
{"node_name": "scrapyd-nd1-c68b9c799-cmwjd", "status": "error", "message": "Traceback (most recent call last):\n File \"<frozen runpy>\", line 198, in _run_module_as_main\n File \"<frozen runpy>\", line 88, in _run_code\n File \"/usr/local/lib/python3.11/dist-packages/scrapyd/runner.py\", line 49, in <module>\n main()\n File \"/usr/local/lib/python3.11/dist-packages/scrapyd/runner.py\", line 45, in main\n execute()\n File \"/usr/local/lib/python3.11/dist-packages/scrapy/cmdline.py\", line 128, in execute\n settings = get_project_settings()\n ^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/local/lib/python3.11/dist-packages/scrapy/utils/project.py\", line 71, in get_project_settings\n settings.setmodule(settings_module_path, priority=\"project\")\n File \"/usr/local/lib/python3.11/dist-packages/scrapy/settings/__init__.py\", line 383, in setmodule\n module = import_module(module)\n ^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3.11/importlib/__init__.py\", line 126, in import_module\n return _bootstrap._gcd_import(name[level:], package, level)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"<frozen importlib._bootstrap>\", line 1206, in _gcd_import\n File \"<frozen importlib._bootstrap>\", line 1178, in _find_and_load\n File \"<frozen importlib._bootstrap>\", line 1149, in _find_and_load_unlocked\n File \"<frozen importlib._bootstrap>\", line 690, in _load_unlocked\n File \"<frozen importlib._bootstrap_external>\", line 940, in exec_module\n File \"<frozen importlib._bootstrap>\", line 241, in _call_with_frames_removed\n File \"/tmp/jobFlow-1695549937-0y8mcj3h.egg/jobFlow/settings.py\", line 4, in <module>\n File \"/tmp/jobFlow-1695549937-0y8mcj3h.egg/dotenv/main.py\", line 336, in load_dotenv\n File \"/tmp/jobFlow-1695549937-0y8mcj3h.egg/dotenv/main.py\", line 300, in find_dotenv\n File \"/tmp/jobFlow-1695549937-0y8mcj3h.egg/dotenv/main.py\", line 257, in _walk_to_root\nOSError: Starting path not found\n"}
This is the full command I'm running:
scrapyd-deploy --include-dependencies
Any ideas on how to fix this?
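I think I understand what is happening. For some reason Scrapyd doesn't like dotenv. If you can avoid using that library in your Scrapy project, you should be able to run the spider via curl.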
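A likely reason, judging from the traceback: load_dotenv() with no arguments calls find_dotenv(), which starts its search from the directory of the file that called it. On Scrapyd that file sits inside the deployed .egg, which is a zip archive, so the directory is not a real path on disk. The sketch below reproduces that condition with the standard library only; the archive and package names are made up for illustration.

```python
import os
import tempfile
import zipfile


def path_inside_archive_exists() -> bool:
    """Build a throwaway zip "egg" and report whether a directory inside it
    is visible to os.path.exists."""
    with tempfile.TemporaryDirectory() as tmp:
        egg_path = os.path.join(tmp, "demo.egg")  # hypothetical egg name
        with zipfile.ZipFile(egg_path, "w") as egg:
            egg.writestr("demo/settings.py", "FOO = 1\n")
        # Mirrors a path such as /tmp/<name>.egg/jobFlow from the traceback.
        package_dir = os.path.join(egg_path, "demo")
        return os.path.exists(package_dir)


if __name__ == "__main__":
    # The egg is a regular file, so a path "below" it does not exist.
    print(path_inside_archive_exists())
```

If that reading is right, any search that begins at the settings module's own directory will fail on Scrapyd, regardless of what the .env file contains.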
In your settings.py, remove dotenv and keep the os import:
import os
# from dotenv import load_dotenv
# load_dotenv()
Then read the secrets that used to live in your .env from the environment in settings.py, like so:
SENTRY_DSN = os.environ.get('SENTRY_DSN')
MONGO_URI = os.environ.get('MONGO_URI')
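If you still want a .env file for local development, one option is to make the dotenv import optional and to fail early when a variable is missing. This is only a sketch: require_env is a hypothetical helper, not part of Scrapy or python-dotenv, and it assumes the variables are exported in the environment that starts Scrapyd.

```python
import os

try:
    # Optional: only useful on a developer machine that has a .env file.
    from dotenv import load_dotenv

    load_dotenv()
except (ImportError, OSError):
    # ImportError: python-dotenv is not installed.
    # OSError: the "Starting path not found" error raised when running from an egg.
    pass


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# In settings.py you could then write, for example:
# MONGO_URI = require_env("MONGO_URI")
```

The trade-off is that a missing variable stops the project from loading at all, instead of surfacing later as a None passed to pymongo.MongoClient.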
Once settings.py is set up with the required environment variables, use scrapy.utils.project to read the values from your settings.py file.
from scrapy.utils.project import get_project_settings
SENTRY_DSN = get_project_settings().get('SENTRY_DSN')
MONGO_URI = get_project_settings().get('MONGO_URI')
Then run scrapyd in your terminal. The default target should be the project settings entry in your scrapy.cfg file.
scrapyd
scrapyd-deploy default
Doing this let me use scrapyd-deploy default!
curl http://localhost:6800/schedule.json -d project=<project> -d spider=<spider_name>
output: {"node_name": "Name-MBP", "status": "ok", "jobid": "bd605850...."}
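For anyone who prefers to schedule the job from Python, here is a rough standard-library equivalent of the curl call above. The function names are my own, and the base URL, project, and spider values are placeholders you would replace with yours.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_schedule_request(base_url: str, project: str, spider: str) -> Request:
    """Build the form-encoded POST that the curl command sends to schedule.json."""
    body = urlencode({"project": project, "spider": spider}).encode("ascii")
    return Request(f"{base_url.rstrip('/')}/schedule.json", data=body, method="POST")


def schedule(base_url: str, project: str, spider: str) -> dict:
    """Send the request and decode Scrapyd's JSON reply (needs a running server)."""
    with urlopen(build_schedule_request(base_url, project, spider)) as response:
        return json.load(response)
```

Calling schedule("http://localhost:6800", "<project>", "<spider_name>") against a running Scrapyd should return the same kind of JSON object as the curl output, with "status" set to "ok" or "error".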
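Here is my error message; it is similar to yours, even if not identical. I hope you find this useful, or that it helps someone else who runs into something similar!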
{"node_name": "NAME-MBP", "status": "error", "message": "Traceback (most recent call last):\n File \"<frozen runpy>\", line 198, in _run_module_as_main\n File \"<frozen runpy>\", line 88, in _run_code\n File \"/Users/name/Coding/proj/lib/python3.11/site-packages/scrapyd/runner.py\", line 38, in <module>\n main()\n File \"/Users/name/Coding/proj/lib/python3.11/site-packages/scrapyd/runner.py\", line 34, in main\n execute()\n File \"/Users/name/Coding/proj/lib/python3.11/site-packages/scrapy/cmdline.py\", line 160, in execute\n cmd.crawler_process = CrawlerProcess(settings)\n ^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/name/Coding/proj/lib/python3.11/site-packages/scrapy/crawler.py\", line 357, in __init__\n super().__init__(settings)\n File \"/Users/name/Coding/proj/lib/python3.11/site-packages/scrapy/crawler.py\", line 227, in __init__\n self.spider_loader = self._get_spider_loader(settings)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/name/Coding/proj/lib/python3.11/site-packages/scrapy/crawler.py\", line 221, in _get_spider_loader\n return loader_cls.from_settings(settings.frozencopy())\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/name/Coding/proj/lib/python3.11/site-packages/scrapy/spiderloader.py\", line 79, in from_settings\n return cls(settings)\n ^^^^^^^^^^^^^\n File \"/Users/name/Coding/proj/lib/python3.11/site-packages/scrapy/spiderloader.py\", line 34, in __init__\n self._load_all_spiders()\n File \"/Users/name/Coding/proj/lib/python3.11/site-packages/scrapy/spiderloader.py\", line 63, in _load_all_spiders\n for module in walk_modules(name):\n ^^^^^^^^^^^^^^^^^^\n File \"/Users/name/Coding/proj/lib/python3.11/site-packages/scrapy/utils/misc.py\", line 106, in walk_modules\n submod = import_module(fullpath)\n ^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/importlib/__init__.py\", line 126, in import_module\n return _bootstrap._gcd_import(name[level:], package, level)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"<frozen importlib._bootstrap>\", line 1204, in _gcd_import\n File \"<frozen importlib._bootstrap>\", line 1176, in _find_and_load\n File \"<frozen importlib._bootstrap>\", line 1147, in _find_and_load_unlocked\n File \"<frozen importlib._bootstrap>\", line 690, in _load_unlocked\n File \"<frozen importlib._bootstrap_external>\", line 940, in exec_module\n File \"<frozen importlib._bootstrap>\", line 241, in _call_with_frames_removed\n File \"/Users/name/Coding/proj/legislative/test/test/eggs/proj/num.egg/test/spiders/spider.py\", line 4, in <module>\n File \"/Users/name/Coding/proj/lib/python3.11/site-packages/dotenv/main.py\", line 336, in load_dotenv\n dotenv_path = find_dotenv()\n ^^^^^^^^^^^^^\n File \"/Users/name/Coding/proj/lib/python3.11/site-packages/dotenv/main.py\", line 300, in find_dotenv\n for dirname in _walk_to_root(path):\n File \"/Users/name/Coding/proj/lib/python3.11/site-packages/dotenv/main.py\", line 257, in _walk_to_root\n raise IOError('Starting path not found')\nOSError: Starting path not found\n"}