How to extract player names from cricinfo using BeautifulSoup in Python

Question (votes: 0, answers: 1)

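I am learning BeautifulSoup. I want to extract the player names from cricinfo.com, i.e. the playing eleven for both teams. The exact link is "https://www.espncricinfo.com/series/13266/scorecard/439146/west-indies-vs-south-africa-1st-t20i-south-africa-tour-of-west-indies-2010". The problem is that the site only lists players under the "batsmen" class if they have batted. Otherwise they are placed under the "wrap dnb" class. I want to extract all players, regardless of whether they batted. How can I maintain two arrays (one per team) that dynamically search for players in "wrap batsmen" and, if required, in "wrap dnb"?

This is my attempt: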

from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd

# Years we will be analyzing
years = list(range(2010, 2018))

names = []

# URL of the page we will be scraping
url = "https://www.espncricinfo.com/series/13266/scorecard/439146/west-indies-vs-south-africa-1st-t20i-south-africa-tour-of-west-indies-2010"
# Fetch the HTML from the given URL and parse it
html = urlopen(url)
soup = BeautifulSoup(html, features="html.parser")

# Skip the first "cell batsmen" match, then take the first link of one cell
batsmen_cells = soup.find_all("div", class_="cell batsmen")[1:]
for i in range(0, 1):
    names.append([link.get_text() for link in batsmen_cells[i].find_all("a", limit=1)])

# Players who did not bat are listed under "wrap dnb"
dnb_blocks = soup.find_all("div", class_="wrap dnb")
print(dnb_blocks[0])
python python-3.x web-scraping beautifulsoup pycharm
1 answer

0 votes

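While this can be done with BeautifulSoup, it is not the best tool for the job. All of this data (and more) is available through the API. Simply pull it, and you can then parse the JSON to get what you need.

You can find the API URL by opening the browser's dev tools (Ctrl-Shift-I) and looking at the requests the browser makes (see Network -> XHR in the side panel; you may need to click around the page to see the requests/calls it makes).

Here is a quick script that gets the 11 players for each team: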

import requests

# API endpoint found via the browser's Network -> XHR panel
url = 'https://site.web.api.espn.com/apis/site/v2/sports/cricket/13266/summary'

# Query parameters; 'event' is the match id from the scorecard URL
payload = {
    'contentorigin': 'espn',
    'event': '439146',
    'lang': 'en',
    'region': 'gb',
    'section': 'cricinfo'}

jsonData = requests.get(url, params=payload).json()

# One roster entry per team
roster = jsonData['rosters']

# Map each team name to the list of its players' names
players = {}
for team in roster:
    players[team['team']['displayName']] = []
    for player in team['roster']:
        playerName = player['athlete']['displayName']
        players[team['team']['displayName']].append(playerName)

Output:

print(players)
{'West Indies': ['Chris Gayle', 'Andre Fletcher', 'Dwayne Bravo', 'Ramnaresh Sarwan', 'Narsingh Deonarine', 'Kieron Pollard', 'Darren Sammy', 'Nikita Miller', 'Jerome Taylor', 'Sulieman Benn', 'Kemar Roach'], 'South Africa': ['Graeme Smith', 'Loots Bosman', 'Jacques Kallis', 'AB de Villiers', 'Jean-Paul Duminy', 'Johan Botha', 'Alviro Petersen', 'Ryan McLaren', 'Roelof van der Merwe', 'Dale Steyn', 'Charl Langeveldt']}
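
The same loop can be wrapped in a small function so it can be checked without a network call. This is only a minimal sketch, assuming the response keeps the shape used in the script above (`rosters`, then `team.displayName` and `roster[].athlete.displayName`). The function name `players_by_team` and the trimmed `sample` payload are hypothetical and exist purely for illustration.

```python
def players_by_team(summary):
    """Map each team's display name to a list of its players' display names."""
    teams = {}
    # .get() with a default keeps the sketch from raising if a list is absent
    for team in summary.get("rosters", []):
        team_name = team["team"]["displayName"]
        teams[team_name] = [
            player["athlete"]["displayName"] for player in team.get("roster", [])
        ]
    return teams


# Hypothetical, trimmed-down payload that mimics the structure used above
sample = {
    "rosters": [
        {
            "team": {"displayName": "West Indies"},
            "roster": [
                {"athlete": {"displayName": "Chris Gayle"}},
                {"athlete": {"displayName": "Andre Fletcher"}},
            ],
        },
        {
            "team": {"displayName": "South Africa"},
            "roster": [
                {"athlete": {"displayName": "Graeme Smith"}},
                {"athlete": {"displayName": "Loots Bosman"}},
            ],
        },
    ]
}

teams = players_by_team(sample)
print(teams["West Indies"])
print(teams["South Africa"])
```

With the real response, `players_by_team(jsonData)` should give one list per team, which corresponds to the two arrays asked for in the question, whether or not a player batted.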

