My web scraper, built by following a project tutorial, is hitting a ModuleNotFound error. The crawler ran fine, but once I added code to extract the data and store it in a MongoDB database, I started getting a `ModuleNotFoundError` when launching the crawler — it apparently can't find the module for my spider (name: 'stackspider'). Since the spider worked before, I'm not sure where the gap is.

I don't want to dump too much code into a first question. How do I resolve this ModuleNotFound error?

Happy to provide more details or code if needed. Thanks for your help.
[Here is a link to the tutorial I used to create the scraper project](https://realpython.com/web-scraping-with-scrapy-and-mongodb/#scrapy-project)
```
2020-04-14 08:23:28 [twisted] CRITICAL:
Traceback (most recent call last):
  File "/Users/blouie/.conda/envs/GoScrape/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/Users/blouie/.conda/envs/GoScrape/lib/python3.7/site-packages/scrapy/crawler.py", line 89, in crawl
    self.engine = self._create_engine()
  File "/Users/blouie/.conda/envs/GoScrape/lib/python3.7/site-packages/scrapy/crawler.py", line 103, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/Users/blouie/.conda/envs/GoScrape/lib/python3.7/site-packages/scrapy/core/engine.py", line 70, in __init__
    self.scraper = Scraper(crawler)
  File "/Users/blouie/.conda/envs/GoScrape/lib/python3.7/site-packages/scrapy/core/scraper.py", line 71, in __init__
    self.itemproc = itemproc_cls.from_crawler(crawler)
  File "/Users/blouie/.conda/envs/GoScrape/lib/python3.7/site-packages/scrapy/middleware.py", line 53, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/Users/blouie/.conda/envs/GoScrape/lib/python3.7/site-packages/scrapy/middleware.py", line 34, in from_settings
    mwcls = load_object(clspath)
  File "/Users/blouie/.conda/envs/GoScrape/lib/python3.7/site-packages/scrapy/utils/misc.py", line 50, in load_object
    mod = import_module(module)
  File "/Users/blouie/.conda/envs/GoScrape/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'stackspider'
```
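The bottom frames show `load_object` handing a dotted class path to `import_module`, which imports the first package segment — which is why the error names `stackspider` even though the spider file itself loads fine. A minimal sketch of that lookup, assuming a settings entry whose dotted path starts with the spider name instead of the project package (the exact entry is an assumption):

```python
from importlib import import_module

# What Scrapy's load_object('stackspider.pipelines.SomePipeline') begins with:
# importing the module part of the dotted path. The first segment,
# 'stackspider', is not an importable package, so the import fails.
try:
    import_module('stackspider.pipelines')
except ModuleNotFoundError as exc:
    error = str(exc)

print(error)  # No module named 'stackspider'
```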
My spider code (stackspider.py):
```python
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from scrapy import Spider
from scrapy.selector import Selector

from bigscrape.items import BigscrapeItem


class StackspiderSpider(Spider):
    name = 'stackspider'
    allowed_domains = ['stackoverflow.com']
    start_urls = ["http://stackoverflow.com/questions?pagesize=50&sort=newest"]

    rules = (
        Rule(LinkExtractor(allow=r'Items/'), callback='parse_item', follow=True),
    )

    def parse(self, response):
        questions = Selector(response).xpath('//div[@class="summary"]/h3')
        for question in questions:
            item = BigscrapeItem()
            item['title'] = question.xpath(
                'a[@class="question-hyperlink"]/text()').get()
            item['url'] = question.xpath(
                'a[@class="question-hyperlink"]/@href').get()
            yield item
```
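Note that the traceback fails inside `ItemPipelineManager.from_crawler` (via `middleware.py` and `load_object`), so the dotted path Scrapy is trying to import most likely comes from `ITEM_PIPELINES` in `settings.py`, not from the spider itself. A hedged sketch of what that setting should look like, assuming the project package is `bigscrape` (the package the spider already imports `BigscrapeItem` from) and that the pipeline class follows the tutorial's `MongoDBPipeline` naming — both assumptions:

```python
# settings.py (sketch) -- the key must start with the project package
# ("bigscrape"), not the spider name ("stackspider").
# "MongoDBPipeline" is assumed from the tutorial; use your actual class name.
ITEM_PIPELINES = {
    'bigscrape.pipelines.MongoDBPipeline': 300,
}
```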
If you're using this setup, I think you have to add a scrapy.cfg file holding this configuration:

```ini
[settings]
default =
```
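For reference, the scrapy.cfg that `scrapy startproject` generates at the project root points `default` at the project's settings module. Assuming the project package here is `bigscrape` (inferred from the spider's imports, so treat it as a guess), it would look like:

```ini
# scrapy.cfg at the project root (sketch; "bigscrape" is assumed)
[settings]
default = bigscrape.settings
```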