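I built a spider that uses Selenium together with Scrapy. Until yesterday the same script scraped the pages and wrote the output to a CSV file, but this afternoon it started reporting that scrapy is not a recognized command, and the same happened with python and pip.
So I reinstalled everything from scratch, including Python. Now the spider runs smoothly, but it no longer writes the output the way it did before.
I have been stuck on this for four hours and cannot find a fix. I would be very grateful for any help. The relevant files are below.
I have tried changing the pipeline several times.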
settings.py
BOT_NAME = 'mcmastersds'
SPIDER_MODULES = ['grainger.spiders']
NEWSPIDER_MODULE = 'grainger.spiders'
LOG_LEVEL = 'INFO'
ROBOTSTXT_OBEY = False
ITEM_PIPELINES = {'grainger.pipelines.GraingerPipeline': 300,}
DOWNLOAD_DELAY = 1
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36 OPR/43.0.2442.806'
PROXY_MODE = 0
RETRY_TIMES = 0
SPLASH_URL = 'http://localhost:8050'
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
pipelines.py
import csv
import os.path
from scrapy.loader import ItemLoader
from scrapy.loader.processors import TakeFirst, MapCompose, Join

class GraingerPipeline(object):
    def __init__(self):
        if not os.path.isfile('CONTENT_psysci.csv'):
            self.csvwriter = csv.writer(open('safale.csv', 'a',newline="",encoding='utf8'))
            self.csvwriter.writerow(['url','Title','sellername','travlink','travlink1','rating','Crreview','feature','Description','proddescription','Additonalinfo','details','detailsextended','producttable','stockstatus','newseller','condition','deliverystatus','price','bestsellersrank','mainimage','subimage'])

    def process_item(self, item, spider):
        self.csvwriter.writerow([item['url'],item['title'],item['sellername'],item['travlink'],item['travlink1'],item['rating'],item['Crreview'],item['feature'],item['Description'],item['proddescription'],item['Additonalinfo'],item['details'],item['detailsextended'],item['producttable'],item['stockstatus'],item['newseller'],item['condition'],item['deliverystatus'],item['price'],item['bestsellersrank'],item['mainimage'],item['subimage']])
        return item
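Can you help me?
If you just want to write the items out without doing anything specific to the data, I suggest using the feed exports feature. Scrapy provides a built-in CSV feed exporter.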
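As a rough sketch of the feed-export route, the fragment below could go in settings.py. It assumes Scrapy 2.1 or later, where the FEEDS setting exists; older releases used FEED_FORMAT and FEED_URI instead. The output name items.csv and the shortened field list are placeholders, not taken from the question.

```python
# settings.py (sketch): let Scrapy's built-in CSV feed exporter write the items.
# Assumes Scrapy >= 2.1; "items.csv" and the field list are illustrative only.
FEEDS = {
    "items.csv": {
        "format": "csv",
        "encoding": "utf8",
        # Column order of the CSV; list every item field you want exported.
        "fields": ["url", "title", "price"],
    },
}
```

With a setting like this, the custom pipeline would no longer be needed just to produce the CSV. Passing -o with a .csv file name to scrapy crawl is the command-line way to get the same exporter.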
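Your code does not work because you never close the CSV file that you open in the initialization statement for self.csvwriter, so buffered rows may never be flushed to disk.
You should use the open_spider and close_spider methods to open the file and to close it once the items have been processed. Take a look at the similar JSON pipeline example in the Scrapy documentation.
Accordingly, your pipeline above should be adapted to the following code: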
import csv
import os.path

class GraingerPipeline(object):
    csv_file = None

    def open_spider(self, spider):
        # Scrapy passes the spider instance to open_spider and close_spider.
        # As in the question, the check is on CONTENT_psysci.csv while the output goes to safale.csv.
        if not os.path.isfile('CONTENT_psysci.csv'):
            self.csv_file = open('safale.csv', 'a',newline="",encoding='utf8')
            self.csvwriter = csv.writer(self.csv_file)
            self.csvwriter.writerow(['url','Title','sellername','travlink','travlink1','rating','Crreview','feature','Description','proddescription','Additonalinfo','details','detailsextended','producttable','stockstatus','newseller','condition','deliverystatus','price','bestsellersrank','mainimage','subimage'])

    def process_item(self, item, spider):
        self.csvwriter.writerow([item['url'],item['title'],item['sellername'],item['travlink'],item['travlink1'],item['rating'],item['Crreview'],item['feature'],item['Description'],item['proddescription'],item['Additonalinfo'],item['details'],item['detailsextended'],item['producttable'],item['stockstatus'],item['newseller'],item['condition'],item['deliverystatus'],item['price'],item['bestsellersrank'],item['mainimage'],item['subimage']])
        return item

    def close_spider(self, spider):
        # Closing the file flushes any rows still held in the write buffer.
        if self.csv_file:
            self.csv_file.close()
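To try the open/write/close lifecycle without running a crawl, here is a minimal sketch that uses only the standard library. CsvPipelineSketch, run_demo, the two-column layout, and the temporary demo.csv path are all hypothetical stand-ins, and None is passed where Scrapy would supply the spider.

```python
import csv
import os
import tempfile


class CsvPipelineSketch:
    """Hypothetical reduced pipeline: same lifecycle, only two columns."""

    def __init__(self, path):
        self.path = path
        self.csv_file = None

    def open_spider(self, spider):
        # Open the file once per crawl and write the header row.
        self.csv_file = open(self.path, "a", newline="", encoding="utf8")
        self.csvwriter = csv.writer(self.csv_file)
        self.csvwriter.writerow(["url", "title"])

    def process_item(self, item, spider):
        self.csvwriter.writerow([item["url"], item["title"]])
        return item

    def close_spider(self, spider):
        # Closing flushes buffered rows to disk.
        if self.csv_file:
            self.csv_file.close()


def run_demo():
    # Drive the hooks by hand in the order Scrapy would call them.
    path = os.path.join(tempfile.mkdtemp(), "demo.csv")
    pipeline = CsvPipelineSketch(path)
    pipeline.open_spider(spider=None)
    pipeline.process_item({"url": "http://example.com", "title": "Example"}, spider=None)
    pipeline.close_spider(spider=None)
    # Read the file back to confirm the rows reached the disk.
    with open(path, newline="", encoding="utf8") as f:
        return list(csv.reader(f))


rows = run_demo()
```

If the rows read back match what was written, the same three hooks should behave the same way inside the real pipeline.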