I have some HTML that I need to scrape with Python. I want the values 1.30, 2.30, 1.31 and 2.31 from
<div class="odds ng-star-inserted"> 1.30 </div>,
<div class="odds ng-star-inserted"> 2.30 </div>,
<div class="odds ng-star-inserted"> 1.31 </div> and
<div class="odds ng-star-inserted"> 2.31 </div>,
but my code returns only 1.30 and 2.30 for every row.
The result should be:
Netherlands South Korea 1.30 2.30 Germany Japan 1.31 2.31
But I get:
Netherlands South Korea 1.30 2.30 Germany Japan 1.30 2.30
This is the Python code:
teams = []
btts = []
odds_events = []
box = driver.find_element(By.XPATH, '//*[@id="page"]/div[2]')
# Looking for 'sports titles'
sport_title = box.find_element(By.CLASS_NAME, 'sport-name')
parent = sport_title.find_element(By.XPATH, './..')
grandparent = parent.find_element(By.XPATH, './..').find_element(By.XPATH, './..').find_element(By.XPATH, './..')
single_row_events = grandparent.find_elements(By.CLASS_NAME, 'event')
for match in single_row_events:
    odds_event = match.find_elements(By.CLASS_NAME, 'games')
    odds_events.append(odds_event)
    # Scrape teams
    for team in match.find_elements(By.CLASS_NAME, 'rivals'):
        teams.append(team.text)
for odds_event in odds_events:
    for n, box in enumerate(odds_event):
        rows = box.find_elements(By.XPATH, '//div[@class="game g2 ng-star-inserted"]')
        if n == 0:
            btts.append(rows[0].text)
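If I set
rows = box.find_elements(By.XPATH, './/*')
and if n == 2:
I get the error
ValueError: All arrays must be of the same length
But if I set
if n == 0:
I get a clean result, but for <div class="game g3 ng-star-inserted">,
so in this case the result is the following, which is not what I need:
Netherlands South Korea 1.10 2.10 3.10 Germany Japan 1.11 2.11 3.11
This is the HTML code: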
<div id="events">
<game-filter class="ng-star-inserted">
<div id="sport-legend" class="single">
<div class="sport-name"> Football </div>
<div class="games g3">
<div class="game ng-star-inserted">
<div class="game-name"> KI </div>
<div class="selections s3 ng-star-inserted">
<div class="selection ng-star-inserted"> Home </div>
<div class="selection ng-star-inserted"> Away </div>
</div>
</div>
<div class="game ng-star-inserted">
<div class="game-name"> UG </div>
<div class="selections s3 ng-star-inserted">
<div class="selection ng-star-inserted"> Over </div>
<div class="selection ng-star-inserted"> O/U </div>
<div class="selection ng-star-inserted"> Under </div>
</div>
</div>
<div class="game ng-star-inserted">
<div class="game-name"> BTTS </div>
<div class="selections s2 ng-star-inserted">
<div class="selection ng-star-inserted"> GG </div>
<div class="selection ng-star-inserted"> NG </div>
</div>
</div>
</div>
</div>
</game-filter>
<standard-item-info class="event ng-star-inserted">
<div class="details">
<div class="info">
<div class="time">01:01</div>
<div class="date">01.01.</div>
</div>
<div class="rivals">
<div class="league">
<!---->
<span class="time-special ng-star-inserted">VIRT 10'
</span> EL
</div>
<div class="home"> Netherlands </div>
<div class="away"> South Korea </div>
</div>
</div>
<standard-item-games class="games g3 ng-star-inserted">
<div class="game g3 ng-star-inserted">
<div class="ng-star-inserted">
<standard-item-game class="ng-star-inserted">
<div class="odds ng-star-inserted"> 1.10 </div>
</standard-item-game>
<standard-item-game class="ng-star-inserted">
<div class="odds ng-star-inserted"> 2.10 </div>
</standard-item-game>
<standard-item-game class="ng-star-inserted">
<div class="odds ng-star-inserted"> 3.10 </div>
</standard-item-game>
</div>
</div>
<div class="game g2 g3 ng-star-inserted">
<div class="ng-star-inserted">
<standard-item-game class="ng-star-inserted">
<div class="odds ng-star-inserted"> 1.20 </div>
</standard-item-game>
<div class="odds limit ng-star-inserted"> 2.20 </div>
<standard-item-game class="ng-star-inserted">
<div class="odds ng-star-inserted"> 3.20 </div>
</standard-item-game>
</div>
</div>
<div class="game g2 ng-star-inserted">
<div class="ng-star-inserted">
<standard-item-game class="ng-star-inserted">
<div class="odds ng-star-inserted"> 1.30 </div>
</standard-item-game>
<standard-item-game class="ng-star-inserted">
<div class="odds ng-star-inserted"> 2.30 </div>
</standard-item-game>
</div>
</div>
</standard-item-games>
<div class="show-all-expand ng-star-inserted">
<div class="event-expand">
<div class="icon"></div>
</div>
</div>
</standard-item-info>
<standard-item-info class="event ng-star-inserted">
<div class="details">
<div class="info">
<div class="time">01:01</div>
<div class="date">01.01.</div>
</div>
<div class="rivals">
<div class="league">
<!---->
<span class="time-special ng-star-inserted">VIRT 10'
</span> EL
</div>
<div class="home"> Germany </div>
<div class="away"> Japan </div>
</div>
</div>
<standard-item-games class="games g3 ng-star-inserted">
<div class="game g3 ng-star-inserted">
<div class="ng-star-inserted">
<standard-item-game class="ng-star-inserted">
<div class="odds ng-star-inserted"> 1.11 </div>
</standard-item-game>
<standard-item-game class="ng-star-inserted">
<div class="odds ng-star-inserted"> 2.11 </div>
</standard-item-game>
<standard-item-game class="ng-star-inserted">
<div class="odds ng-star-inserted"> 3.11 </div>
</standard-item-game>
</div>
</div>
<div class="game g2 g3 ng-star-inserted">
<div class="ng-star-inserted">
<standard-item-game class="ng-star-inserted">
<div class="odds ng-star-inserted"> 1.21 </div>
</standard-item-game>
<div class="odds limit ng-star-inserted"> 2.21 </div>
<standard-item-game class="ng-star-inserted">
<div class="odds ng-star-inserted"> 3.21 </div>
</standard-item-game>
</div>
</div>
<div class="game g2 ng-star-inserted">
<div class="ng-star-inserted">
<standard-item-game class="ng-star-inserted">
<div class="odds ng-star-inserted"> 1.31 </div>
</standard-item-game>
<standard-item-game class="ng-star-inserted">
<div class="odds ng-star-inserted"> 2.31 </div>
</standard-item-game>
</div>
</div>
</standard-item-games>
<div class="show-all-expand ng-star-inserted">
<div class="event-expand">
<div class="icon"></div>
</div>
</div>
</standard-item-info>
</div>
</div>
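One solution:
from selenium.webdriver.common.by import By

# find_elements_by_xpath was deprecated and later removed in Selenium 4,
# so use find_elements(By.XPATH, ...) as in the question.
teamsdiv = driver.find_elements(By.XPATH, "//div[@id='events']//div[@class='home' or @class='away']")
notesdiv = driver.find_elements(By.XPATH, "//div[@id='events']//standard-item-games")
# Home and away names alternate, so pair them two at a time.
teams = []
for i in range(0, len(teamsdiv), 2):
    teams.append([teamsdiv[i].text, teamsdiv[i+1].text])
# The last two text lines of each games block are the odds of its last (g2) game.
notes = []
for i in range(len(notesdiv)):
    notes.append(notesdiv[i].text.split('\n')[-2:])
for i in range(len(notes)):
    print(teams[i], notes[i])
Result: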
['Netherlands', 'South Korea'] ['1.30', '2.30']
['Germany', 'Japan'] ['1.31', '2.31']
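
A note on why the question's code repeats 1.30 and 2.30: an XPath that starts with `//` is evaluated from the document root, even when it is passed to `find_elements` on a child element, so `rows[0]` is always the first event's block. Prefixing the expression with `.` (that is, `.//div[...]`) limits the search to the current element. The sketch below illustrates that scoping offline. It is not the Selenium code itself: it swaps in the standard library's `xml.etree.ElementTree` and a trimmed, hand-written copy of the HTML (the `standard-item-game` wrappers and unrelated blocks are left out), so it can run without a browser. The names `BTTS_BLOCK`, `ODDS` and `odds_in` are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the structure from the question (assumption:
# the real page nests these elements the same way).
HTML = """
<div id="events">
  <standard-item-info class="event ng-star-inserted">
    <div class="rivals">
      <div class="home"> Netherlands </div>
      <div class="away"> South Korea </div>
    </div>
    <standard-item-games class="games g3 ng-star-inserted">
      <div class="game g3 ng-star-inserted">
        <div class="odds ng-star-inserted"> 1.10 </div>
        <div class="odds ng-star-inserted"> 2.10 </div>
        <div class="odds ng-star-inserted"> 3.10 </div>
      </div>
      <div class="game g2 g3 ng-star-inserted">
        <div class="odds ng-star-inserted"> 1.20 </div>
        <div class="odds ng-star-inserted"> 3.20 </div>
      </div>
      <div class="game g2 ng-star-inserted">
        <div class="odds ng-star-inserted"> 1.30 </div>
        <div class="odds ng-star-inserted"> 2.30 </div>
      </div>
    </standard-item-games>
  </standard-item-info>
  <standard-item-info class="event ng-star-inserted">
    <div class="rivals">
      <div class="home"> Germany </div>
      <div class="away"> Japan </div>
    </div>
    <standard-item-games class="games g3 ng-star-inserted">
      <div class="game g3 ng-star-inserted">
        <div class="odds ng-star-inserted"> 1.11 </div>
        <div class="odds ng-star-inserted"> 2.11 </div>
        <div class="odds ng-star-inserted"> 3.11 </div>
      </div>
      <div class="game g2 ng-star-inserted">
        <div class="odds ng-star-inserted"> 1.31 </div>
        <div class="odds ng-star-inserted"> 2.31 </div>
      </div>
    </standard-item-games>
  </standard-item-info>
</div>
"""

root = ET.fromstring(HTML.strip())

# Exact class match, so "game g2 g3 ng-star-inserted" is not selected.
BTTS_BLOCK = ".//div[@class='game g2 ng-star-inserted']"
ODDS = ".//div[@class='odds ng-star-inserted']"


def odds_in(block):
    """Return the stripped text of every odds <div> below the given element."""
    return [d.text.strip() for d in block.findall(ODDS)]


# Searching from the document root and taking the first hit always yields
# the first event's odds, which mirrors the repeated 1.30 / 2.30.
first_everywhere = odds_in(root.findall(BTTS_BLOCK)[0])

# Searching relative to each event yields that event's own odds.
results = []
for event in root.findall(".//standard-item-info"):
    home = event.find(".//div[@class='home']").text.strip()
    away = event.find(".//div[@class='away']").text.strip()
    results.append((home, away, odds_in(event.find(BTTS_BLOCK))))

for home, away, odds in results:
    print(home, away, *odds)
# Netherlands South Korea 1.30 2.30
# Germany Japan 1.31 2.31
```

In the Selenium code from the question, the corresponding change would presumably be to call `match.find_elements(By.XPATH, './/div[@class="game g2 ng-star-inserted"]')` inside the loop over events. Unlike `text.split('\n')[-2:]`, this does not depend on the BTTS odds being the last two lines of the rendered text.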