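I wrote some code to scrape a website that uses a "load more" button, but I only get the content that is shown before the button is clicked.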
import scrapy
from load_more.items import LoadMoreItem
from scrapy_selenium import SeleniumRequest
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC


class FoolSpider(scrapy.Spider):
    name = "fool"

    def start_requests(self):
        url = 'https://www.fool.com/earnings-call-transcripts/'
        yield SeleniumRequest(
            url=url,
            callback=self.parse,
            script="document.querySelector('.load-more-button').click()",
        )

    def parse(self, response):
        load_item = LoadMoreItem()
        load_item['Headline'] = response.css('a.text-gray-1100 > h5.font-medium::text').getall()
        yield load_item
Please help me solve this problem.
I also want the content that appears after clicking the load more button.
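You don't actually need selenium at all to get the information you are looking for.
The load more button dispatches a request to a URL such as https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=? , and that request returns a JSON object containing the HTML for the additional links that get injected into the page.
You can simply send requests directly to those URLs to get the results you are looking for. The log output further down shows results from multiple pages.
For example: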
import scrapy
from scrapy.selector import Selector


class FoolSpider(scrapy.Spider):
    name = "fool"

    def start_requests(self):
        url = 'https://www.fool.com/earnings-call-transcripts/'
        yield scrapy.Request(url, cb_kwargs={"page": 1})

    def parse(self, response, page=1):
        if page > 1:
            # after the first page, extract the html from the json response
            text = response.json()["html"]
            # wrap the fragment in a parent tag and create a scrapy selector
            response = Selector(text=f"<html>{text}</html>")
        for headline in response.css('a.text-gray-1100 > h5.font-medium::text'):
            # iterate through the headlines and yield each one
            yield {"headline": headline.get()}
        # send a request for the next page to the json api url with the appropriate headers
        next_page = page + 1
        yield scrapy.Request(
            f"https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page={next_page}",
            cb_kwargs={"page": next_page},
            headers={"X-Requested-With": "fetch"},
        )
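To make the shape of the JSON response more concrete, here is a minimal standard-library sketch of the "pull the html out of the json, then read the headlines" step. It deliberately swaps Scrapy's Selector for Python's built-in html.parser so it runs without any dependencies. The names HeadlineParser and headlines_from_payload are hypothetical, the sample payload is made up, and the sketch simply collects every h5 element rather than applying the full CSS selector used in the spider.

```python
import json
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text of every <h5> element in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.in_h5 = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h5":
            self.in_h5 = True
            self.headlines.append("")

    def handle_endtag(self, tag):
        if tag == "h5":
            self.in_h5 = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current headline
        if self.in_h5:
            self.headlines[-1] += data


def headlines_from_payload(body):
    """Return headline strings from a JSON body that has an "html" key."""
    fragment = json.loads(body)["html"]
    parser = HeadlineParser()
    parser.feed(fragment)
    parser.close()
    return [h.strip() for h in parser.headlines if h.strip()]


# Made-up payload that mimics the assumed response shape (not real site data)
sample_body = json.dumps({
    "html": '<a class="text-gray-1100" href="#">'
            '<h5 class="font-medium">Example Corp (EXM) Q1 2024 Earnings Call Transcript</h5>'
            '</a>'
})
print(headlines_from_payload(sample_body))
```

In the spider itself the Selector-based approach is the more natural fit, since it reuses the same CSS selector for the first page and the JSON pages.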
Log output
2024-05-20 22:56:54 [scrapy.core.engine] INFO: Spider opened
2024-05-20 22:56:54 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-05-20 22:56:54 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2024-05-20 22:56:54 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.fool.com/earnings-call-transcripts/> (referer: None)
2024-05-20 22:56:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/>
{'headline': 'Zoom Video Communications (ZM) Q1 2025 Earnings Call Transcript'}
2024-05-20 22:56:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/>
{'headline': 'Palo Alto Networks (PANW) Q3 2024 Earnings Call Transcript'}
2024-05-20 22:56:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/>
{'headline': 'Tremor International (TRMR) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/>
{'headline': 'Wix.com (WIX) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/>
{'headline': 'Li Auto (LI) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/>
{'headline': 'Global-e Online (GLBE) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/>
{'headline': 'Take-Two Interactive Software (TTWO) Q4 2024 Earnings Call Transcript'}
2024-05-20 22:56:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/>
{'headline': 'Applied Materials (AMAT) Q2 2024 Earnings Call Transcript'}
2024-05-20 22:56:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/>
{'headline': 'DXC Technology (DXC) Q4 2024 Earnings Call Transcript'}
2024-05-20 22:56:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/>
{'headline': 'Lightspeed Commerce (LSPD) Q4 2024 Earnings Call Transcript'}
2024-05-20 22:56:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/>
{'headline': 'JD.com (JD) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/>
{'headline': 'Baidu (BIDU) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/>
{'headline': 'Walmart (WMT) Q1 2025 Earnings Call Transcript'}
2024-05-20 22:56:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/>
{'headline': 'Cisco Systems (CSCO) Q3 2024 Earnings Call Transcript'}
2024-05-20 22:56:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/>
{'headline': 'Infinera (INFN) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/>
{'headline': 'VOXX International Corporation (VOXX) Q4 2024 Earnings Call Transcript'}
2024-05-20 22:56:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/>
{'headline': 'Dynatrace (DT) Q4 2024 Earnings Call Transcript'}
2024-05-20 22:56:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/>
{'headline': 'Monday.com (MNDY) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/>
{'headline': 'Sportradar Group Ag (SRAD) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/>
{'headline': 'Home Depot (HD) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=2> (referer: https://www.fool.com/earnings-call-transcripts/)
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=2>
{'headline': 'International Game Technology Plc (IGT) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=2>
{'headline': 'Organigram (OGI) Q2 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=2>
{'headline': 'Tencent (TCEHY) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=2>
{'headline': 'Sea Limited (SE) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=2>
{'headline': 'Alibaba Group (BABA) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=2>
{'headline': 'StoneCo (STNE) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=2>
{'headline': 'Inovio Pharmaceuticals (INO) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=2>
{'headline': 'HUYA (HUYA) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=2>
{'headline': 'Main Street Capital (MAIN) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=2>
{'headline': 'Algonquin Power & Utilities (AQN) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=2>
{'headline': 'Honda Motor (HMC) Q4 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=2>
{'headline': 'Cs Disco (LAW) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=2>
{'headline': 'MacroGenics (MGNX) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=2>
{'headline': 'Equinix (EQIX) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=2>
{'headline': 'Figs (FIGS) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=2>
{'headline': 'Amplitude (AMPL) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=2>
{'headline': 'Blink Charging (BLNK) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=2>
{'headline': 'Unity Software (U) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=2>
{'headline': 'Warby Parker (WRBY) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=2>
{'headline': 'Evolent Health (EVH) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=3> (referer: https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=2)
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=3>
{'headline': 'Innovative Industrial Properties (IIPR) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=3>
{'headline': 'Westport Fuel Systems (WPRT) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=3>
{'headline': 'Organogenesis (ORGO) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=3>
{'headline': 'Outbrain (OB) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=3>
{'headline': 'Warner Music Group (WMG) Q2 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=3>
{'headline': 'Medical Properties Trust (MPW) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=3>
{'headline': 'GoodRx (GDRX) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=3>
{'headline': 'Yeti (YETI) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=3>
{'headline': 'Glatfelter (GLT) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=3>
{'headline': 'SelectQuote (SLQT) Q3 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=3>
{'headline': 'NCR (VYX) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=3>
{'headline': 'Krispy Kreme (DNUT) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=3>
{'headline': 'Cedar Fair (FUN) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=3>
{'headline': 'FiscalNote (NOTE) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=3>
{'headline': 'SNDL (SNDL) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=3>
{'headline': 'Camtek (CAMT) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=3>
{'headline': 'Pan American Silver (PAAS) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=3>
{'headline': 'Hecla Mining (HL) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=3>
{'headline': 'Cronos Group (CRON) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=3>
{'headline': 'Plug Power (PLUG) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:56 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=4> (referer: https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=3)
2024-05-20 22:56:56 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=4>
{'headline': 'Clearway Energy (CWEN) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:56 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=4>
{'headline': 'Roblox (RBLX) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:56 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=4>
{'headline': 'Six Flags Entertainment (SIX) Q1 2024 Earnings Call Transcript'}
2024-05-20 22:56:56 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fool.com/earnings-call-transcripts/filtered_articles_by_page/?page=4>