Unable to get the link when web scraping

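Question (votes: 0, answers: 2)
I want to scrape the page with Python when "T20I" is selected. For that, I need a specific link that I can fetch with requests and parse with BeautifulSoup.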

When I open https://www.espncricinfo.com/cricketers/team/india-6, I get the "INTL" selection. Image with "INTL" selected:

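But when I select "T20I", I get a different page, yet the link stays the same:

https://www.espncricinfo.com/cricketers/team/india-6

Image with "T20I" selected:

What should I do to retrieve the data in this case? How do I get the data shown when "T20I" is selected?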

I suggest using Selenium. Here is an example that works well:

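    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    chrome_options = Options()
    chrome_options.add_argument("--start-maximized")

    # Selenium 4 takes the driver path through a Service object
    browser = webdriver.Chrome(service=Service(r"YOUR chromedriver.exe"), options=chrome_options)
    browser.get("https://www.espncricinfo.com/cricketers/team/india-6")

    # XPath of the element you want to click
    xpath = '//*[@id="main-container"]/div[5]/div[1]/div[2]/div[2]/div/div/div[2]/div[2]/div[1]/div/div[5]/a/span/span'

    try:
        # Wait until the "T20I" button is clickable, then click it
        button = WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.XPATH, xpath)))
        button.click()
    except TimeoutException:
        print("The T20I button did not become clickable in time")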

That's it!

python web-scraping beautifulsoup python-requests
2 Answers
1 vote
The data is rendered with JavaScript, and the page pulls it from an API. Don't use Selenium when an API is available.

    import requests
    import pandas as pd

    url = 'https://hs-consumer-api.espncricinfo.com/v1/pages/player/search'
    payload = {
        'mode': 'BOTH',
        'page': '1',
        'records': '40',
        'filterActive': 'true',
        'filterTeamId': '6',
        'filterClassId': '3',
        'filterFormatLevel': 'ALL',
        'sort': 'ALPHA_ASC'}

    jsonData = requests.get(url, params=payload).json()
    df = pd.DataFrame(jsonData['results'])
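To see what `requests.get(url, params=payload)` actually asks for, it can help to print the full URL. The sketch below builds the same query string with the standard library only (`urllib.parse.urlencode` in place of requests). `build_search_url` is a hypothetical helper, and the meaning of each parameter is an assumption: the answer does not say what `filterClassId` `'3'` stands for, although it presumably corresponds to the "T20I" tab since that is what the question asks about.

```python
from urllib.parse import urlencode

API_URL = "https://hs-consumer-api.espncricinfo.com/v1/pages/player/search"


def build_search_url(class_id="3", page=1, records=40):
    """Return the full URL that the payload in the answer resolves to.

    Hypothetical helper: parameter names and values are copied from the
    answer, and their meaning is assumed rather than documented.
    """
    payload = {
        "mode": "BOTH",
        "page": str(page),
        "records": str(records),
        "filterActive": "true",
        "filterTeamId": "6",             # 6 also appears in the page URL (india-6)
        "filterClassId": str(class_id),  # assumed to select the format tab
        "filterFormatLevel": "ALL",
        "sort": "ALPHA_ASC",
    }
    return f"{API_URL}?{urlencode(payload)}"


url = build_search_url()
print(url)
```

Pasting the printed URL into a browser is a quick way to check whether the endpoint still answers, and changing `class_id` or `page` shows which part of the query string each argument controls.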

Output (first 5 of 31 rows):

    print(df.head().to_string())

    id objectId name longName mobileName indexName battingName fieldingName slug imageUrl dateOfBirth dateOfDeath gender battingStyles bowlingStyles longBattingStyles longBowlingStyles image countryTeamId playerRoleTypeIds playingRoles headshotImage
    0 101430 1125976 Arshdeep Singh Arshdeep Singh Arshdeep Singh Arshdeep Singh Arshdeep Singh arshdeep-singh /db/PICTURES/CMS/356700/356795.1.png {'year': 1999, 'month': 2, 'date': 5} None M [lhb] [lmf] [left-hand bat] [left-arm medium-fast] {'id': 356795, 'objectId': 1365005, 'slug': 'arshdeep-singh-player-portrait', 'url': '/db/PICTURES/CMS/356700/356795.1.png', 'width': 160, 'height': 213, 'caption': 'Arshdeep Singh player portrait', 'longCaption': 'Arshdeep Singh player portrait', 'credit': 'Getty Images', 'photographer': None, 'peerUrls': None} 6 [4] [bowler] {'id': 322178, 'objectId': 1264653, 'slug': 'arshdeep-singh-player-page-headshot-cutout-2021', 'url': '/db/PICTURES/CMS/322100/322178.png', 'width': 600, 'height': 436, 'caption': 'Arshdeep Singh player page headshot cutout, 2021', 'longCaption': 'Arshdeep Singh player page headshot cutout, 2021', 'credit': None, 'photographer': None, 'peerUrls': {'FILM': None, 'WIDE': None, 'SQUARE': '/db/PICTURES/CMS/322100/322178.square.png'}}
    1 12894 26421 R Ashwin Ravichandran Ashwin Ashwin Ashwin, R R Ashwin Ashwin ravichandran-ashwin /db/PICTURES/CMS/302300/302395.jpg {'year': 1986, 'month': 9, 'date': 17} None M [rhb] [ob] [right-hand bat] [right-arm offbreak] {'id': 302395, 'objectId': 1220592, 'slug': 'r-ashwin-portrait', 'url': '/db/PICTURES/CMS/302300/302395.jpg', 'width': 160, 'height': 200, 'caption': 'R Ashwin portrait', 'longCaption': 'R Ashwin portrait, April 2020', 'credit': 'Getty Images', 'photographer': None, 'peerUrls': None} 6 [11] [bowling allrounder] {'id': 316521, 'objectId': 1251150, 'slug': 'r-ashwin-headshot', 'url': '/db/PICTURES/CMS/316500/316521.png', 'width': 600, 'height': 436, 'caption': 'R Ashwin headshot', 'longCaption': 'R Ashwin headshot', 'credit': None, 'photographer': None, 'peerUrls': {'FILM': None, 'WIDE': None, 'SQUARE': None}}
    2 73507 694211 Avesh Khan Avesh Khan Avesh Khan Avesh Khan Avesh Khan Avesh Khan avesh-khan /db/PICTURES/CMS/200000/200065.1.jpg {'year': 1996, 'month': 12, 'date': 13} None M [rhb] [rfm] [right-hand bat] [right-arm fast-medium] {'id': 200065, 'objectId': 807641, 'slug': 'avesh-khan-portrait', 'url': '/db/PICTURES/CMS/200000/200065.1.jpg', 'width': 160, 'height': 200, 'caption': 'Avesh Khan portrait', 'longCaption': 'Avesh Khan portrait, November 2014', 'credit': 'MPCA', 'photographer': None, 'peerUrls': None} 6 [4] [bowler] {'id': 322244, 'objectId': 1264747, 'slug': 'avesh-khan-player-page-headshot-cutout-2021', 'url': '/db/PICTURES/CMS/322200/322244.png', 'width': 600, 'height': 436, 'caption': 'Avesh Khan player page headshot cutout, 2021', 'longCaption': 'Avesh Khan player page headshot cutout, 2021', 'credit': None, 'photographer': None, 'peerUrls': {'FILM': None, 'WIDE': None, 'SQUARE': '/db/PICTURES/CMS/322200/322244.square.png'}}
    3 70640 625383 JJ Bumrah Jasprit Bumrah Bumrah Bumrah, JJ JJ Bumrah Bumrah jasprit-bumrah /db/PICTURES/CMS/356800/356849.1.png {'year': 1993, 'month': 12, 'date': 6} None M [rhb] [rf] [right-hand bat] [right-arm fast] {'id': 356849, 'objectId': 1365132, 'slug': 'bumrah-player-portrait', 'url': '/db/PICTURES/CMS/356800/356849.1.png', 'width': 160, 'height': 206, 'caption': 'Bumrah player portrait', 'longCaption': 'Bumrah player portrait', 'credit': 'Getty Images', 'photographer': None, 'peerUrls': None} 6 [4] [bowler] {'id': 319940, 'objectId': 1260219, 'slug': 'jasprit-bumrah-player-page-headshot-cutout-2021', 'url': '/db/PICTURES/CMS/319900/319940.png', 'width': 600, 'height': 436, 'caption': 'Jasprit Bumrah player page headshot cutout, 2021', 'longCaption': 'Jasprit Bumrah player page headshot cutout, 2021', 'credit': None, 'photographer': None, 'peerUrls': {'FILM': None, 'WIDE': None, 'SQUARE': '/db/PICTURES/CMS/319900/319940.square.png'}}
    4 61325 430246 YS Chahal Yuzvendra Chahal Chahal Chahal, YS YS Chahal Chahal yuzvendra-chahal /db/PICTURES/CMS/312100/312155.png {'year': 1990, 'month': 7, 'date': 23} None M [rhb] [lbg] [right-hand bat] [legbreak googly] {'id': 312155, 'objectId': 1239214, 'slug': 'yuzvendra-chahal-portrait', 'url': '/db/PICTURES/CMS/312100/312155.png', 'width': 160, 'height': 200, 'caption': 'Yuzvendra Chahal portrait', 'longCaption': 'Yuzvendra Chahal portrait, November 2020', 'credit': 'Getty Images', 'photographer': None, 'peerUrls': None} 6 [4] [bowler] {'id': 319955, 'objectId': 1260243, 'slug': 'yuzvendra-chahal-player-page-headshot-cutout-2021', 'url': '/db/PICTURES/CMS/319900/319955.png', 'width': 600, 'height': 436, 'caption': 'Yuzvendra Chahal player page headshot cutout, 2021', 'longCaption': 'Yuzvendra Chahal player page headshot cutout, 2021', 'credit': None, 'photographer': None, 'peerUrls': {'FILM': None, 'WIDE': None, 'SQUARE': '/db/PICTURES/CMS/319900/319955.square.png'}}
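Several columns in this output hold nested dicts and lists (`dateOfBirth`, `battingStyles`, `image`, and so on), which makes the frame hard to read. One possible way to tidy it, sketched below, is to reduce each record to a few flat fields before building the DataFrame. `flatten_player` is a hypothetical helper, the choice of fields is arbitrary, and `sample_results` is just two records trimmed from the output above standing in for `jsonData['results']`.

```python
from datetime import date

# Two records trimmed from the output above; only a few fields are kept.
sample_results = [
    {
        "id": 101430,
        "longName": "Arshdeep Singh",
        "dateOfBirth": {"year": 1999, "month": 2, "date": 5},
        "battingStyles": ["lhb"],
        "bowlingStyles": ["lmf"],
        "playingRoles": ["bowler"],
    },
    {
        "id": 12894,
        "longName": "Ravichandran Ashwin",
        "dateOfBirth": {"year": 1986, "month": 9, "date": 17},
        "battingStyles": ["rhb"],
        "bowlingStyles": ["ob"],
        "playingRoles": ["bowling allrounder"],
    },
]


def flatten_player(record):
    """Reduce one API record to flat, scalar fields (hypothetical helper)."""
    dob = record.get("dateOfBirth")
    return {
        "id": record["id"],
        "name": record["longName"],
        # Turn the {'year', 'month', 'date'} dict into an ISO date string
        "born": date(dob["year"], dob["month"], dob["date"]).isoformat() if dob else None,
        # Join list-valued fields so each cell holds a single string
        "batting": ", ".join(record.get("battingStyles") or []),
        "bowling": ", ".join(record.get("bowlingStyles") or []),
        "roles": ", ".join(record.get("playingRoles") or []),
    }


flat_rows = [flatten_player(r) for r in sample_results]
for row in flat_rows:
    print(row)
```

With the real response, `pd.DataFrame([flatten_player(r) for r in jsonData['results']])` would give a table with one scalar value per cell.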

0 votes

But the API now requires an X-HSCI-Auth-token. How do I call it?
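The thread does not say how the token is generated, so the following is only a sketch of how a custom header is attached to the call, assuming you already have a valid token value (for example, one read from the request your own browser sends, visible in its developer tools). It uses `urllib.request` from the standard library rather than requests so that it runs without third-party packages, and it builds the request without sending it. `build_authenticated_request` and the placeholder token are hypothetical; the header name is taken from the comment above.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "https://hs-consumer-api.espncricinfo.com/v1/pages/player/search"


def build_authenticated_request(token, payload):
    """Build (but do not send) a GET request carrying the auth header.

    Hypothetical helper: the token must be supplied by the caller, and
    whether the API accepts it is not something this sketch can verify.
    """
    headers = {"X-HSCI-Auth-token": token}
    return Request(f"{API_URL}?{urlencode(payload)}", headers=headers, method="GET")


payload = {"mode": "BOTH", "page": "1", "records": "40", "filterTeamId": "6", "filterClassId": "3"}
req = build_authenticated_request("PASTE-YOUR-TOKEN-HERE", payload)

# HTTP header names are case-insensitive, so compare them in lower case
sent_headers = {name.lower(): value for name, value in req.header_items()}
print(req.full_url)
print(sent_headers)
```

With requests, the same header dict would be passed as `requests.get(url, params=payload, headers=headers)`. If the token is short-lived or tied to a browser session, falling back to Selenium may be the more practical route.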
	
