Scraping a specific div with Selenium and Python


I have the following HTML and need to scrape it with Python:

<div class="odds ng-star-inserted"> 1.30 </div>
<div class="odds ng-star-inserted"> 2.30 </div>
<div class="odds ng-star-inserted"> 1.31 </div>
<div class="odds ng-star-inserted"> 2.31 </div>
I need the values 1.30, 2.30, 1.31 and 2.31, but every row returns only 1.30 and 2.30.

The result should be:

Netherlands South Korea 1.30 2.30 Germany Japan 1.31 2.31

But I get:

Netherlands South Korea 1.30 2.30 Germany Japan 1.30 2.30

Here is the Python code:

from selenium.webdriver.common.by import By

teams = []
btts = []
odds_events = []

box = driver.find_element(By.XPATH, '//*[@id="page"]/div[2]')
# Look for the 'sport-name' title
sport_title = box.find_element(By.CLASS_NAME, 'sport-name')

parent = sport_title.find_element(By.XPATH, './..')
grandparent = parent.find_element(By.XPATH, './..').find_element(By.XPATH, './..').find_element(By.XPATH, './..')

single_row_events = grandparent.find_elements(By.CLASS_NAME, 'event')

for match in single_row_events:
    odds_event = match.find_elements(By.CLASS_NAME, 'games')
    odds_events.append(odds_event)
    # Scrape teams
    for team in match.find_elements(By.CLASS_NAME, 'rivals'):
        teams.append(team.text)

for odds_event in odds_events:
    for n, box in enumerate(odds_event):
        # Note: an XPath that starts with '//' searches the whole page, not just `box`
        rows = box.find_elements(By.XPATH, '//div[@class="game g2 ng-star-inserted"]')
        if n == 0:
            btts.append(rows[0].text)
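The symptom (every row gets 1.30 and 2.30) matches how XPath scoping works. An expression beginning with `//` is evaluated from the document root, even when it is called on a WebElement such as `box`. So `rows[0]` is always the first event's g2 block. Prefixing the expression with a dot (`.//div[...]`) limits the search to that element's descendants. In Selenium, that would be `box.find_elements(By.XPATH, './/div[@class="game g2 ng-star-inserted"]')`.

This sketch runs without a browser. It uses the standard library's `xml.etree.ElementTree` on a trimmed copy of the HTML as a stand-in for Selenium; the question itself does not use ElementTree.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's two events (only the markup the loop touches)
HTML = """<div id="events">
  <standard-item-info class="event ng-star-inserted">
    <standard-item-games class="games g3 ng-star-inserted">
      <div class="game g3 ng-star-inserted"><div class="odds ng-star-inserted"> 1.10 </div></div>
      <div class="game g2 ng-star-inserted">
        <div class="odds ng-star-inserted"> 1.30 </div>
        <div class="odds ng-star-inserted"> 2.30 </div>
      </div>
    </standard-item-games>
  </standard-item-info>
  <standard-item-info class="event ng-star-inserted">
    <standard-item-games class="games g3 ng-star-inserted">
      <div class="game g3 ng-star-inserted"><div class="odds ng-star-inserted"> 1.11 </div></div>
      <div class="game g2 ng-star-inserted">
        <div class="odds ng-star-inserted"> 1.31 </div>
        <div class="odds ng-star-inserted"> 2.31 </div>
      </div>
    </standard-item-games>
  </standard-item-info>
</div>"""

# Exact class match, like the XPath in the question
G2 = ".//div[@class='game g2 ng-star-inserted']"
ODDS = ".//div[@class='odds ng-star-inserted']"

root = ET.fromstring(HTML)
events = root.findall("standard-item-info")


def odds_text(game):
    """Return the stripped text of every odds div inside one game block."""
    return [d.text.strip() for d in game.findall(ODDS)]


# Mimics '//div[...]' in Selenium: the search starts at the document root,
# so rows[0] is the first event's block no matter which event we are in.
buggy = []
for event in events:
    rows = root.findall(G2)
    buggy.append(odds_text(rows[0]))

# Mimics './/div[...]': the search is scoped to the current event.
fixed = []
for event in events:
    rows = event.findall(G2)
    fixed.append(odds_text(rows[0]))

print(buggy)  # [['1.30', '2.30'], ['1.30', '2.30']]
print(fixed)  # [['1.30', '2.30'], ['1.31', '2.31']]
```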

If I change it to

rows = box.find_elements(By.XPATH, './/*')
if n == 2:
I get the error

ValueError: All arrays must be of the same length
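That message is the one pandas raises when a DataFrame is built from columns of different lengths. The DataFrame code is not shown in the question, so this is an assumption. One likely cause is visible in the loop, though. Within each event, `match.find_elements(By.CLASS_NAME, 'games')` matches only the single `<standard-item-games>` wrapper. As a result, `enumerate` only ever yields `n == 0`, the `if n == 2` branch never runs, and `btts` stays empty while `teams` gets one entry per event. The three per-market blocks carry the class `game`, not `games`.

The sketch below checks those counts on one event from the HTML. It uses a hypothetical helper, `find_by_class_name`, as a rough stand-in for Selenium's `By.CLASS_NAME` lookup.

```python
import xml.etree.ElementTree as ET

# One event from the question's HTML, trimmed to the elements that matter here
HTML = """<standard-item-info class="event ng-star-inserted">
  <div class="details">
    <div class="rivals">
      <div class="home"> Netherlands </div>
      <div class="away"> South Korea </div>
    </div>
  </div>
  <standard-item-games class="games g3 ng-star-inserted">
    <div class="game g3 ng-star-inserted"></div>
    <div class="game g2 g3 ng-star-inserted"></div>
    <div class="game g2 ng-star-inserted"></div>
  </standard-item-games>
</standard-item-info>"""


def find_by_class_name(element, name):
    """Hypothetical stand-in for find_elements(By.CLASS_NAME, name):
    descendants (not the element itself) whose class list contains `name`."""
    return [el for el in element.iter()
            if el is not element and name in el.get("class", "").split()]


match = ET.fromstring(HTML)

odds_event = find_by_class_name(match, "games")
print(len(odds_event))  # 1, so enumerate() only ever yields n == 0

btts = []
for n, box in enumerate(odds_event):
    if n == 2:  # never true with a single element
        btts.append(box)

teams = find_by_class_name(match, "rivals")
print(len(teams), len(btts))  # 1 0, i.e. columns of unequal length

# The per-market blocks (KI, UG, BTTS) are the elements with class 'game'
print(len(find_by_class_name(match, "game")))  # 3
```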

But if I use

 if n == 0:
it does return odds, but from the
<div class="game g3 ng-star-inserted">
block instead, which I don't need. In that case the result is:

Netherlands South Korea 1.10 2.10 3.10 Germany Japan 1.11 2.11 3.11

Here is the HTML:

  <div id="events">
    <game-filter class="ng-star-inserted">
      <div id="sport-legend" class="single">
        <div class="sport-name"> Football </div>
        <div class="games g3">
          <div class="game ng-star-inserted">
            <div class="game-name"> KI </div>
            <div class="selections s3 ng-star-inserted">
              <div class="selection ng-star-inserted"> Home </div>
              <div class="selection ng-star-inserted"> Away </div>
            </div>
          </div>
          <div class="game ng-star-inserted">
            <div class="game-name"> UG </div>
            <div class="selections s3 ng-star-inserted">
              <div class="selection ng-star-inserted"> Over </div>
              <div class="selection ng-star-inserted"> O/U </div>
              <div class="selection ng-star-inserted"> Under </div>
            </div>
          </div>
          <div class="game ng-star-inserted">
            <div class="game-name"> BTTS </div>
            <div class="selections s2 ng-star-inserted">
              <div class="selection ng-star-inserted"> GG </div>
              <div class="selection ng-star-inserted"> NG </div>
            </div>
          </div>
        </div>
      </div>
    </game-filter>
    <standard-item-info class="event ng-star-inserted">
      <div class="details">
        <div class="info">
          <div class="time">01:01</div>
          <div class="date">01.01.</div>
        </div>
        <div class="rivals">
          <div class="league">
            <!---->
            <span class="time-special ng-star-inserted">VIRT 10'
            </span> EL
          </div>
          <div class="home"> Netherlands </div>
          <div class="away"> South Korea </div>
        </div>
      </div>
      <standard-item-games class="games g3 ng-star-inserted">
        <div class="game g3 ng-star-inserted">
          <div class="ng-star-inserted">
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 1.10 </div>
            </standard-item-game>
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 2.10 </div>
            </standard-item-game>
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 3.10 </div>
            </standard-item-game>
          </div>
        </div>
        <div class="game g2 g3 ng-star-inserted">
          <div class="ng-star-inserted">
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 1.20 </div>
            </standard-item-game>
            <div class="odds limit ng-star-inserted"> 2.20 </div>
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 3.20 </div>
            </standard-item-game>
          </div>
        </div>
        <div class="game g2 ng-star-inserted">
          <div class="ng-star-inserted">
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 1.30 </div>
            </standard-item-game>
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 2.30 </div>
            </standard-item-game>
          </div>
        </div>
      </standard-item-games>
      <div class="show-all-expand ng-star-inserted">
        <div class="event-expand">
          <div class="icon"></div>
        </div>
      </div>
    </standard-item-info>
    <standard-item-info class="event ng-star-inserted">
      <div class="details">
        <div class="info">
          <div class="time">01:01</div>
          <div class="date">01.01.</div>
        </div>
        <div class="rivals">
          <div class="league">
            <!---->
            <span class="time-special ng-star-inserted">VIRT 10'
            </span> EL
          </div>
          <div class="home"> Germany </div>
          <div class="away"> Japan </div>
        </div>
      </div>
      <standard-item-games class="games g3 ng-star-inserted">
        <div class="game g3 ng-star-inserted">
          <div class="ng-star-inserted">
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 1.11 </div>
            </standard-item-game>
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 2.11 </div>
            </standard-item-game>
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 3.11 </div>
            </standard-item-game>
          </div>
        </div>
        <div class="game g2 g3 ng-star-inserted">
          <div class="ng-star-inserted">
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 1.21 </div>
            </standard-item-game>
            <div class="odds limit ng-star-inserted"> 2.21 </div>
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 3.21 </div>
            </standard-item-game>
          </div>
        </div>
        <div class="game g2 ng-star-inserted">
          <div class="ng-star-inserted">
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 1.31 </div>
            </standard-item-game>
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 2.31 </div>
            </standard-item-game>
          </div>
        </div>
      </standard-item-games>
      <div class="show-all-expand ng-star-inserted">
        <div class="event-expand">
          <div class="icon"></div>
        </div>
      </div>
    </standard-item-info>
  </div>
</div>
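The legend at the top of this HTML lists the markets (KI, UG, BTTS), and each event's `<standard-item-games>` contains one `div.game` block per market. In the sample, both appear in the same order. Assuming that order always holds (not guaranteed by the page), the BTTS odds can be picked by the market's position in the legend instead of by class names such as `g2`. This avoids both the g3 mix-up and the page-wide search.

The sketch uses `xml.etree.ElementTree` on a trimmed copy of the markup, with a small helper `has_class` standing in for Selenium's `By.CLASS_NAME`. An untested Selenium equivalent would read the names with `driver.find_elements(By.CSS_SELECTOR, '#sport-legend .game-name')` and each event's blocks with `event.find_elements(By.CSS_SELECTOR, 'standard-item-games > div.game')`.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup: legend plus two events
HTML = """<div id="events">
  <game-filter class="ng-star-inserted">
    <div id="sport-legend" class="single">
      <div class="sport-name"> Football </div>
      <div class="games g3">
        <div class="game ng-star-inserted"><div class="game-name"> KI </div></div>
        <div class="game ng-star-inserted"><div class="game-name"> UG </div></div>
        <div class="game ng-star-inserted"><div class="game-name"> BTTS </div></div>
      </div>
    </div>
  </game-filter>
  <standard-item-info class="event ng-star-inserted">
    <div class="home"> Netherlands </div>
    <div class="away"> South Korea </div>
    <standard-item-games class="games g3 ng-star-inserted">
      <div class="game g3 ng-star-inserted"><div class="odds ng-star-inserted"> 1.10 </div></div>
      <div class="game g2 g3 ng-star-inserted"><div class="odds ng-star-inserted"> 1.20 </div></div>
      <div class="game g2 ng-star-inserted">
        <div class="odds ng-star-inserted"> 1.30 </div>
        <div class="odds ng-star-inserted"> 2.30 </div>
      </div>
    </standard-item-games>
  </standard-item-info>
  <standard-item-info class="event ng-star-inserted">
    <div class="home"> Germany </div>
    <div class="away"> Japan </div>
    <standard-item-games class="games g3 ng-star-inserted">
      <div class="game g3 ng-star-inserted"><div class="odds ng-star-inserted"> 1.11 </div></div>
      <div class="game g2 g3 ng-star-inserted"><div class="odds ng-star-inserted"> 1.21 </div></div>
      <div class="game g2 ng-star-inserted">
        <div class="odds ng-star-inserted"> 1.31 </div>
        <div class="odds ng-star-inserted"> 2.31 </div>
      </div>
    </standard-item-games>
  </standard-item-info>
</div>"""


def has_class(element, name):
    """True if `name` is one of the element's class tokens (like By.CLASS_NAME)."""
    return name in element.get("class", "").split()


def odds_for_market(root, market):
    """Return (home, away, odds) for every event, for the named legend market."""
    # Column index of the market in the legend: KI=0, UG=1, BTTS=2
    names = [d.text.strip() for d in
             root.findall(".//div[@id='sport-legend']//div[@class='game-name']")]
    column = names.index(market)

    results = []
    for event in root.findall("standard-item-info"):
        home = event.find(".//div[@class='home']").text.strip()
        away = event.find(".//div[@class='away']").text.strip()
        # Assumption: game blocks appear in the same order as the legend
        blocks = [el for el in event.find("standard-item-games") if has_class(el, "game")]
        odds = [d.text.strip() for d in blocks[column].iter("div") if has_class(d, "odds")]
        results.append((home, away, odds))
    return results


for home, away, odds in odds_for_market(ET.fromstring(HTML), "BTTS"):
    print(home, away, *odds)
```

With the sample markup this prints the two lines the question asks for, `Netherlands South Korea 1.30 2.30` and `Germany Japan 1.31 2.31`.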
python python-3.x selenium-webdriver web-scraping xpath
1 Answer

One solution:

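from selenium.webdriver.common.by import By

# find_elements_by_xpath() was removed in Selenium 4.3; use find_elements(By.XPATH, ...)
teamsdiv = driver.find_elements(By.XPATH, "//div[@id='events']//div[@class='home' or @class='away']")
notesdiv = driver.find_elements(By.XPATH, "//div[@id='events']//standard-item-games")

# Names alternate home/away, so take them two at a time
teams = []
for i in range(0, len(teamsdiv), 2):
    teams.append([teamsdiv[i].text, teamsdiv[i + 1].text])

# The last two lines of each event's odds text are the last market (BTTS)
notes = []
for games in notesdiv:
    notes.append(games.text.split('\n')[-2:])

for team_pair, note in zip(teams, notes):
    print(team_pair, note)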

Result:

['Netherlands', 'South Korea'] ['1.30', '2.30']
['Germany', 'Japan'] ['1.31', '2.31']
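Once the text has been read from the page, the answer's post-processing does not depend on Selenium. The sketch below repeats it on plain strings that stand in for the `.text` values; the strings are copied from the question's HTML.

It assumes, as the answer's `split('\n')` does, that Selenium returns one odds value per line. `zip(it, it)` pairs consecutive home/away names. `[-2:]` keeps the last two odds, which are the BTTS column only because BTTS happens to be the last block in each event.

```python
def pair_teams(names):
    """Group a flat [home, away, home, away, ...] list into [home, away] pairs."""
    it = iter(names)
    return [[home, away] for home, away in zip(it, it)]


def last_two_odds(block_text):
    """Keep the last two lines of an event's odds text (the BTTS column in this layout)."""
    return block_text.split("\n")[-2:]


# Stand-ins for the .text values the answer reads from the page
team_texts = ["Netherlands", "South Korea", "Germany", "Japan"]
odds_texts = [
    "1.10\n2.10\n3.10\n1.20\n2.20\n3.20\n1.30\n2.30",
    "1.11\n2.11\n3.11\n1.21\n2.21\n3.21\n1.31\n2.31",
]

rows = list(zip(pair_teams(team_texts), map(last_two_odds, odds_texts)))
for teams, odds in rows:
    print(teams, odds)
```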
© www.soinside.com 2019 - 2024. All rights reserved.