I'm trying to run the following code, but I get this error: "NameError: name 'scrapedate' is not defined"
import scrapy
from datetime import datetime, timedelta
from dogscraper.items import DogItem

racedate = '2024-01-25'
days = 2
realdate = datetime.strptime(racedate, '%Y-%m-%d').date()
scrape_list = [(realdate - timedelta(days=x)).strftime('%Y-%m-%d') for x in range(days)]

class DogspiderSpider(scrapy.Spider):
    name = "dogspider"
    allowed_domains = ["www.thedogs.com.au"]
    start_urls = ["https://www.thedogs.com.au/racing/"+racedate]

    def parse(self, response):
        for scrapedate in scrape_list:
            next_dateurl = 'https://www.thedogs.com.au/racing/' + scrapedate
            yield response.follow(next_dateurl, callback=self.parse_date)

    def parse_date(self, response):
        nswmeetings = response.css('table.meeting-grid')[0]
        nswmeetings = nswmeetings.css('td.meetings-venues__name')
        for meeting in nswmeetings:
            meeting_url = meeting.css('a::attr(href)').get()
            nextmeeting = 'https://www.thedogs.com.au' + meeting_url
            yield response.follow(nextmeeting, callback=self.parse_meeting)

    def parse_meeting(self, response):
        races = response.css('a.race-box.race-box--result')
        for race in races:
            race_url = race.css('a.race-box.race-box--result::attr(href)').get()
            nextrace = 'https://www.thedogs.com.au' + race_url
            yield response.follow(nextrace, callback=self.parse_race)

    def parse_race(self, response):
        dogs = response.css('tr.accordion__anchor.race-runner')
        dog_item = DogItem()
        for dog in dogs:
            dog_item['date'] = scrapedate
NameError: name 'scrapedate' is not defined
Essentially, I want to take scrapedate from scrape_list in def parse and use it later, when def parse_race runs, as dog_item['date'] = scrapedate
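Looking at your code, I can see that you are trying to use scrapedate inside the parse_race function (a generator), but it is declared in the parse function (a different generator). This raises a NameError because scrapedate is a local variable of the parse generator and is not visible anywhere else. So, if you want to use scrapedate in parse_race, you have to make it a class attribute: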
class DogspiderSpider(scrapy.Spider):
    # ... (your existing code)
    scrapedate = None  # Initialize to None

    def parse(self, response):
        for scrapedate in scrape_list:
            # ... (your existing code)
            self.scrapedate = scrapedate  # assign the attribute
            yield response.follow(next_dateurl, callback=self.parse_date)

    # ... (your existing code)

    def parse_race(self, response):
        # ... (your existing code)
        dog_item['date'] = self.scrapedate  # Access the attribute
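To see the scoping rule in isolation, here is a minimal sketch that needs no Scrapy at all. The class ScopeDemo and the method parse_race_broken are hypothetical names made up for illustration; only the variable name scrapedate and the dates come from the question.

```python
class ScopeDemo:
    """Hypothetical stand-in for the spider; it makes no network requests."""

    dates = ["2024-01-25", "2024-01-24"]

    def parse(self):
        for scrapedate in self.dates:
            # `scrapedate` is a local name that exists only inside parse()
            yield scrapedate

    def parse_race_broken(self):
        # No local, global, or builtin name `scrapedate` is visible here,
        # so Python raises NameError when this line runs
        return scrapedate


demo = ScopeDemo()
seen = list(demo.parse())  # the loop variable is bound here, inside parse()

try:
    demo.parse_race_broken()
    error_name = None
except NameError as exc:
    error_name = type(exc).__name__

print(seen, error_name)
```

Even though parse() has already run to completion, its loop variable never becomes visible to the other method, which matches the error in the question.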
Thanks, @SIM.
I was able to pass the scrape date along using meta:
#...
yield response.follow(next_dateurl, callback=self.parse_date, meta={'scrapedate' : scrapedate})
then
#...
yield response.follow(nextmeeting, callback=self.parse_meeting, meta={'scrapedate' : response.meta['scrapedate']})
#...
yield response.follow(nextrace, callback=self.parse_race, meta={'scrapedate' : response.meta['scrapedate']})
and I can read it back with
dog_item['date'] = response.meta['scrapedate']
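For anyone who wants to see the idea end to end, below is a rough sketch of how a per-request meta dict travels through a chain of callbacks. It deliberately does not import Scrapy: FakeRequest, FakeResponse, MetaDemo and crawl are hypothetical stand-ins written only for this illustration, and the "/meeting" URL suffix is invented. In a real spider, response.follow(..., meta=...) and response.meta play these roles.

```python
from collections import deque


class FakeRequest:
    """Hypothetical stand-in for a request: URL, callback, and a meta dict."""

    def __init__(self, url, callback, meta=None):
        self.url = url
        self.callback = callback
        self.meta = dict(meta or {})


class FakeResponse:
    """Hypothetical stand-in for a response; exposes the meta of its request."""

    def __init__(self, request):
        self.url = request.url
        self.meta = request.meta


class MetaDemo:
    scrape_list = ["2024-01-25", "2024-01-24"]

    def parse(self, response):
        for scrapedate in self.scrape_list:
            # Attach the date to the request itself
            yield FakeRequest("/racing/" + scrapedate, self.parse_date,
                              meta={"scrapedate": scrapedate})

    def parse_date(self, response):
        # Forward the date that arrived with this response
        yield FakeRequest(response.url + "/meeting", self.parse_race,
                          meta={"scrapedate": response.meta["scrapedate"]})

    def parse_race(self, response):
        yield {"url": response.url, "date": response.meta["scrapedate"]}


def crawl(spider):
    """Toy scheduler: callbacks run later, after their requests were queued."""
    queue = deque([FakeRequest("/racing/", spider.parse)])
    items = []
    while queue:
        request = queue.popleft()
        for result in request.callback(FakeResponse(request)):
            if isinstance(result, FakeRequest):
                queue.append(result)
            else:
                items.append(result)
    return items


items = crawl(MetaDemo())
print(items)
```

Because each request carries its own copy of the date, the value seen in parse_race does not depend on the order in which the callbacks happen to run.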