Web scraping for a school project

Question  Votes: 1  Answers: 2

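I'm trying to scrape data from a page using Selenium. It worked last week, but something changed this week and it no longer works. The problem is the "show more" button, labelled "Prikaži broj", which you can see on the site. I have many pages to scrape, but let's focus on one.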

The code is:

options = Options()
options.headless = True
driver = webdriver.Chrome('/Users/Nenad/chromedriver', options=options)
driver.get('https://www.nekretnine.rs/stambeni-objekti/stanovi/zvezdara-konjarnik-milica-rakica-57m2-milica-rakica/NkJXDiY2ugE/')
try:
    element = driver.find_element_by_css_selector('div.row:nth-child(2) > div:nth-child(2) > div:nth-child(1) > div:nth-child(1) > form:nth-child(2) > button:nth-child(2)').click()
    sleep(randint(3, 5))
    home_phone = driver.find_element_by_css_selector('div.row:nth-child(2) > div:nth-child(2) > div:nth-child(1) > div:nth-child(1) > form:nth-child(2) > span:nth-child(1)')
    condo_agency_cell_phones.append(home_phone.text)
except:
    condo_agency_cell_phones.append('NaN')
try:
    element = driver.find_element_by_css_selector('div.row:nth-child(2) > div:nth-child(2) > div:nth-child(1) > div:nth-child(1) > form:nth-child(4) > button:nth-child(2)').click()
    sleep(randint(3, 5))
    cell_phone = driver.find_element_by_css_selector('div.row:nth-child(2) > div:nth-child(2) > div:nth-child(1) > div:nth-child(1) > form:nth-child(4) > span:nth-child(1)')
    condo_agency_cell_phones.append(cell_phone.text)
except:
    condo_agency_cell_phones.append('NaN')
driver.close()

Last week it worked with XPath, but now it doesn't. I can even locate a button, but it doesn't get clicked:

options = Options()
options.headless = False
driver = webdriver.Chrome('/Users/Nenad/chromedriver', options=options)
driver.get('https://www.nekretnine.rs/stambeni-objekti/stanovi/zvezdara-konjarnik-milica-rakica-57m2-milica-rakica/NkJXDiY2ugE/')
sleep(20)
try:
    element = driver.find_element_by_xpath("//button[@type='button']").click()
    print(element.text)
except:
    print('NaN')
python selenium web-scraping
2 Answers
1
vote

Instead of XPath, try locating the button with a CSS selector: find_element_by_css_selector('button[type="button"]')
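
Changing the selector may not be the whole story, though. In Selenium's Python bindings `click()` returns `None`, so in the question's second snippet `element` ends up as `None`, `element.text` raises `AttributeError`, and the bare `except:` prints 'NaN' even if the click itself went through. The sketch below reproduces that with a hypothetical `FakeButton` stand-in (not a Selenium class) so it runs without a browser; `question_pattern` and `separated_pattern` are illustrative names as well.

```python
class FakeButton:
    """Hypothetical stand-in for a Selenium WebElement (not part of Selenium)."""

    text = "label"

    def __init__(self):
        self.clicked = False

    def click(self):
        # Returns None implicitly, like WebElement.click() does.
        self.clicked = True


def question_pattern(button):
    """Mirrors the question: chain .click() and keep its return value."""
    try:
        element = button.click()  # element is None here
        return element.text       # AttributeError: None has no .text
    except Exception:
        return "NaN"


def separated_pattern(button):
    """Keep the element itself, click it, then read from it."""
    button.click()
    return button.text


first = FakeButton()
print(question_pattern(first), first.clicked)
print(separated_pattern(FakeButton()))
```

With a real driver the same separation applies: store the result of the find call, call `click()` on it as a separate statement, and only then read `.text`.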


0
votes

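If the first answer doesn't solve your problem, try this. I added a few imports. In the code above, the "try:" blocks fail with undefined-name errors because the libraries were never imported.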
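
To illustrate that last point before the full script: a name that was never imported raises `NameError`, and a bare `except:` turns it into a silent 'NaN'. A minimal sketch in plain Python (no Selenium needed) that catches `Exception` and prints the cause instead, so the missing import becomes visible:

```python
condo_agency_cell_phones = []

try:
    # `randint` is deliberately NOT imported here, mirroring the question's snippet.
    pause = randint(3, 5)
    condo_agency_cell_phones.append("phone number would go here")
except Exception as exc:
    # Unlike a bare `except:`, this keeps the real cause visible.
    print(f"{type(exc).__name__}: {exc}")
    condo_agency_cell_phones.append("NaN")

print(condo_agency_cell_phones)
```

The list still ends up holding 'NaN', but the printed `NameError` points at the missing import rather than at the page.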

# Selenium 3-style script, matching the API used in the question.
from time import sleep

from selenium import webdriver
from selenium.common.exceptions import WebDriverException
# Chrome is the driver used below, so Options must come from the Chrome module.
from selenium.webdriver.chrome.options import Options

options = Options()
options.headless = True
driver = webdriver.Chrome('/Users/Nenad/chromedriver', options=options)
# Firefox alternative (would need selenium.webdriver.firefox.options.Options instead):
# driver = webdriver.Firefox(executable_path=r'C:\Py\geckodriver.exe')

driver.get('https://www.nekretnine.rs/stambeni-objekti/stanovi/zvezdara-konjarnik-milica-rakica-57m2-milica-rakica/NkJXDiY2ugE/')
condo_agency_cell_phones = []

# Common ancestor of both phone-number forms.
parent = 'div.row:nth-child(2) > div:nth-child(2) > div:nth-child(1) > div:nth-child(1)'

try:
    # First form: click the button, then read the revealed number.
    driver.find_element_by_css_selector(parent + ' > form:nth-child(2) > button:nth-child(2)').click()
    # sleep(randint(3, 5))
    sleep(4)
    # home_phone1 = driver.find_element_by_xpath("html/body/div[11]/div[1]/div[2]/div[1]/div/div[2]/div[2]/div/div/form[1]/span")
    # condo_agency_cell_phones.append(home_phone1.text)
    home_phone = driver.find_element_by_css_selector(parent + ' > form:nth-child(2) > span:nth-child(1)')
    print(home_phone.text)
    condo_agency_cell_phones.append(home_phone.text)
except WebDriverException:
    condo_agency_cell_phones.append('NaN')

try:
    # Second form: same steps for the other number.
    driver.find_element_by_css_selector(parent + ' > form:nth-child(4) > button:nth-child(2)').click()
    # sleep(randint(3, 5))
    sleep(4)
    cell_phone = driver.find_element_by_css_selector(parent + ' > form:nth-child(4) > span:nth-child(1)')
    condo_agency_cell_phones.append(cell_phone.text)
except WebDriverException:
    condo_agency_cell_phones.append('NaN')

print(condo_agency_cell_phones)
driver.close()
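
The fixed `sleep(4)` calls in the script above either waste time or fire too early when the page is slow. One possible alternative is to poll until the number actually appears. The sketch below is hand-rolled only so that it runs without a browser; with a real driver, Selenium's own `WebDriverWait` is the more usual tool. `click_and_read`, its parameters, and the `CSS` constant are names made up for illustration, and the only thing assumed about `driver` is that it exposes `find_element(by, value)`.

```python
from time import monotonic, sleep

# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS = "css selector"


def click_and_read(driver, button_css, text_css, timeout=5.0, poll=0.25, default="NaN"):
    """Click a button, then poll until the target element has non-empty text."""
    try:
        driver.find_element(CSS, button_css).click()
    except Exception:
        # Button missing or not clickable: give up on this field.
        return default

    deadline = monotonic() + timeout
    while monotonic() < deadline:
        try:
            text = driver.find_element(CSS, text_css).text.strip()
            if text:
                return text
        except Exception:
            pass  # element not rendered yet; retry until the deadline
        sleep(poll)
    return default
```

It would be called once per form, passing the button and span selectors from the script, and the return value appended to `condo_agency_cell_phones`. Catching `Exception` rather than using a bare `except:` still lets Ctrl+C interrupt a long scrape.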