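The page I want to scrape is https://www.biggerpockets.com/forums/88/topics/895460-cap-rate-vs-interest-rate
In the browser's developer console, my XPath returns the text element that holds the post title.
However, when I run Scrapy, the same XPath doesn't work and the title comes back as 'None'.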
yield SplashRequest("https://www.biggerpockets.com/forums/88/topics/895460-cap-rate-vs-interest-rate", self.parse_post, args={'wait': 2})
def parse_post(self, response):
    title = response.xpath('//div[contains(@class, "simplified-forums__discussion")]//div[contains(@class, "simplified-forums__discussion__first-post")]//div[contains(@class, "simplified-forums__card__content")]//h1/text()').get()
    print(title)
2023-11-01 00:16:11 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.biggerpockets.com/forums/49/topics/276013-interest-rate>
None
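One detail worth knowing about the XPath above: `contains(@class, "simplified-forums__discussion")` is a plain substring test, so it also matches any element whose class merely starts with that text, such as `simplified-forums__discussion__first-post`. And because the expression chains four such steps, it only returns something if that exact nesting of classes exists in the HTML Scrapy actually receives. If you want to match a whole class name rather than a substring, a common XPath 1.0 idiom is to pad the normalized `class` attribute with spaces. The helper below is a hypothetical sketch that builds such an expression; `class_token_xpath` is an illustrative name and is not taken from the question or the answer.

```python
def class_token_xpath(tag, css_class):
    """Return an XPath step matching <tag> elements whose class list contains
    css_class as a whole, space-delimited token (not merely a substring)."""
    # normalize-space() collapses runs of whitespace in @class; padding with a
    # space on each side means ' token ' can only match a complete class name.
    return (
        f"//{tag}[contains(concat(' ', normalize-space(@class), ' '), "
        f"' {css_class} ')]"
    )


# Hypothetical usage: a token-exact variant of an h1 title selector.
title_xpath = class_token_xpath("h1", "simplified-forums__topic-content__title") + "/text()"
print(title_xpath)
```

The resulting string can be passed to `response.xpath(...)` in the same way as the original expression.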
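When I open
http://localhost:8050/render.html?url=https://www.biggerpockets.com/forums/88/topics/895460-cap-rate-vs-interest-rate
in a browser, the page also renders fine. I'm not sure what exactly is going wrong, because I'm confident the XPath is correct.
Please let me know if I'm missing something.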
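When comparing what the browser shows with what Scrapy parses, it can help to request the Splash `render.html` endpoint with the same arguments the `SplashRequest` used (`wait=2`) and then inspect the raw response text rather than the page the browser draws, because a browser displaying that response may run its scripts again and show a DOM that differs from the HTML Scrapy sees. The sketch below only builds the URL; `splash_render_url` is a hypothetical helper, and it assumes Splash is listening on `localhost:8050` as in the question.

```python
from urllib.parse import urlencode


def splash_render_url(target_url, wait=2, splash_base="http://localhost:8050"):
    """Build a Splash render.html URL mirroring SplashRequest(..., args={'wait': 2})."""
    # urlencode percent-escapes the target URL, so any '?' or '&' inside it
    # cannot be confused with Splash's own query parameters.
    query = urlencode({"url": target_url, "wait": wait})
    return f"{splash_base}/render.html?{query}"


url = splash_render_url(
    "https://www.biggerpockets.com/forums/88/topics/895460-cap-rate-vs-interest-rate"
)
print(url)
```

Fetching that URL (for example with `curl` or `urllib.request.urlopen`) and searching the body for `simplified-forums__` is one way to see which class names are really present in the HTML Scrapy will parse.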
As I mentioned in the comments, your XPath seems to be wrong. Try this instead:
import scrapy


class biggerpockets(scrapy.Spider):
    name = 'biggerpockets'
    start_urls = ['https://www.biggerpockets.com/forums/88/topics/895460-cap-rate-vs-interest-rate']

    # Plain scrapy.Spider (no Splash): the title is read from the HTML the server returns.
    def parse(self, response):
        # Select the <h1> by its exact class attribute and take its first text node.
        title = response.xpath("//h1[@class='simplified-forums__topic-content__title']/text()").get()
        print("-------Extracted text-----------------")
        print(title)
        print("------------------------")
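The spider above does not use Splash at all, which suggests (an inference, not something stated in the answer) that the title is already present in the HTML the server sends. To check that outside Scrapy, one option is to save the page source and search it with the standard library's `html.parser`. This is a hypothetical sketch: `H1ClassTextParser` and `first_h1_text` are illustrative names, `SAMPLE_HTML` is made-up markup rather than the real BiggerPockets page, and unlike XPath's `/text()` it joins all text inside the `<h1>`, including text in child elements, and strips surrounding whitespace.

```python
from html.parser import HTMLParser


class H1ClassTextParser(HTMLParser):
    """Collect the text of every <h1> whose class list contains css_class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.texts = []        # one entry per matching <h1>
        self._inside = False   # True while inside a matching <h1>
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            # attrs is a list of (name, value) pairs; value may be None.
            classes = (dict(attrs).get("class") or "").split()
            self._inside = self.css_class in classes
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h1" and self._inside:
            # Join all collected text and strip surrounding whitespace.
            self.texts.append("".join(self._buffer).strip())
            self._inside = False


def first_h1_text(html, css_class):
    """Return the first matching <h1> text, or None (similar to Scrapy's .get())."""
    parser = H1ClassTextParser(css_class)
    parser.feed(html)
    parser.close()
    return parser.texts[0] if parser.texts else None


# Hypothetical markup for illustration only, not the real page source.
SAMPLE_HTML = """
<div class="simplified-forums__topic-content">
  <h1 class="simplified-forums__topic-content__title">Example title</h1>
</div>
"""

print(first_h1_text(SAMPLE_HTML, "simplified-forums__topic-content__title"))  # prints: Example title
```

Running it on a saved copy of the real page instead of `SAMPLE_HTML` would show whether a `None` comes from the markup itself or from how the request was made.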