Handling uneven data


How do I handle the scraped pages so that the data is scraped correctly, similar to this?

I tried something along those lines below with no luck, because the structure of the page is not straightforward. Any ideas how I can cater for the uneven data? The data randomly turns out uneven from web page to web page.

Expected

 Azam FC v Mwenge    1.8    https://www.bet365.com.au/#/AC/B1/C1/D13/E104/F16/S1/
 Western Sydney Wanderers v Melbourne City    2.87    https://www.bet365.com.au/#/AC/B1/C1/D13/E101/F16/S1/
 Sydney FC v Newcastle Jets    1.53    https://www.bet365.com.au/#/AC/B1/C1/D13/E101/F16/S1/

The output looks like

 Azam FC v Mwenge    1.8    https://www.bet365.com.au/#/AC/B1/C1/D13/E104/F16/S1/
 Western Sydney Wanderers v Melbourne City    1.53    https://www.bet365.com.au/#/AC/B1/C1/D13/E101/F16/S1/

The 1.53 should not be paired with Western Sydney but with Sydney FC.
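
For illustration, a minimal sketch (with made-up lists, not the real scrape) of how this kind of shift can happen: if names and prices are collected as two independent flat lists and one price cell is missing from the page, every later price attaches to the wrong match.

 # minimal sketch with hypothetical data: one missing odds cell makes two
 # independently collected flat lists fall out of step
 names = ['Azam FC v Mwenge',
          'Western Sydney Wanderers v Melbourne City',
          'Sydney FC v Newcastle Jets']
 odds = ['1.8', '1.53']  # Western Sydney's price cell is missing

 for name, price in zip(names, odds):
     print(name, price)
 # Azam FC v Mwenge 1.8
 # Western Sydney Wanderers v Melbourne City 1.53   <- wrong pairing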

Script:

 import collections
 import csv
 import time

 from selenium import webdriver
 from selenium.common.exceptions import TimeoutException, NoSuchElementException
 from selenium.webdriver.common.by import By
 from selenium.webdriver.support import expected_conditions as EC
 from selenium.webdriver.support.ui import WebDriverWait

 driver = webdriver.Chrome()
 driver.maximize_window()


 driver.get('https://www.bet365.com.au/#/AS/B1/')


 # wait for the "Main Lists" coupon labels to become clickable, then collect their text
 WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//div[div/div/text()="Main Lists"]//div[starts-with(@class, "sm-CouponLink_Label") and normalize-space()]')))
 coupon_labels = [x.text for x in driver.find_elements_by_xpath('//div[div/div/text()="Main Lists"]//div[starts-with(@class, "sm-CouponLink_Label") and normalize-space()]')]

 # number the labels and visit them in reverse order
 links = dict(enumerate(coupon_labels, start=1))
 desc_links = collections.OrderedDict(sorted(links.items(), reverse=True))
 for key, label in desc_links.items():
     driver.get('https://www.bet365.com.au/#/AS/B1/')
     WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//div[div/div/text()="Main Lists"]//div[starts-with(@class, "sm-CouponLink_Label") and normalize-space()]')))
     driver.find_element_by_xpath(f'//div[contains(text(), "{label}")]').click()

     # xpaths: one container div per fixture row, the match name/link cell, and the first price cell
     groups = '/html/body/div[1]/div/div[2]/div[1]/div/div[2]/div[2]/div/div/div[2]/div'
     xp_match_link = "//div//div[contains(@class, 'sl-CouponParticipantWithBookCloses_Name ')]"
     xp_bp1 = "//div[contains(@class, 'gl-Market_HasLabels')]/following-sibling::div[contains(@class, 'gl-Market_PWidth-12-3333')][1]//div[contains(@class, 'gl-ParticipantOddsOnly')]"

     try:
         # wait for the data to populate the tables
         WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.XPATH, xp_bp1)))
         time.sleep(2)

         data = []
         for elem in driver.find_elements_by_xpath(groups):
             try:
                 match_link = elem.find_element_by_xpath(xp_match_link) \
                     .get_attribute('href')
             except NoSuchElementException:
                 match_link = None

             try:
                 bp1 = elem.find_element_by_xpath(xp_bp1).text
             except NoSuchElementException:
                 bp1 = None

             data.append([bp1, match_link])
             # data.append([match_link, bp1, ba1, bp3, ba3])
         print(data)

         with open('C:\\daw.csv', 'a', newline='',
                   encoding="utf-8") as outfile:
             writer = csv.writer(outfile)
             for row in data:
                 writer.writerow(row)

     except TimeoutException:
         # the odds never loaded for this coupon; skip to the next label
         pass
     except NoSuchElementException as ex:
         print(ex)
         break

 driver.close()
1 Answer

It should work if you change the following xpath:

xp_match_link = "//div//div[contains(@class, 'sl-CouponParticipantWithBookCloses_NameContainer ')]"
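
For context, here is a sketch (not from the original answer) of how the changed xpath could slot into the per-row loop from the question. Note the leading `.` added to both lookups: an xpath that starts with `//` searches the whole document even when called on a row element, so anchoring it with `.` is what keeps each link and price attached to its own row. The simplified odds xpath below is an assumption for illustration, not taken from the question.

# sketch only: the answer's class name, plus relative (".") xpaths so each
# lookup is evaluated against the current row element, not the whole page
xp_match_link = ".//div[contains(@class, 'sl-CouponParticipantWithBookCloses_NameContainer ')]"
xp_bp1 = ".//div[contains(@class, 'gl-ParticipantOddsOnly')]"  # assumed simplification

data = []
for elem in driver.find_elements_by_xpath(groups):
    try:
        match_link = elem.find_element_by_xpath(xp_match_link).get_attribute('href')
    except NoSuchElementException:
        match_link = None  # a row with no link keeps its slot instead of shifting later rows
    try:
        bp1 = elem.find_element_by_xpath(xp_bp1).text
    except NoSuchElementException:
        bp1 = None
    data.append([bp1, match_link])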