How do I pass a CSS selector with an ID or class name to html_nodes in R?


No matter which CSS selector or XPath I try, I cannot extract the names of the members of parliament from the German parliament's website.

https://www.bundestag.de/ausschuesse/a11#


#names <- landing_page_AS %>%
#html_nodes("main > div") %>% 
#extract2(7) %>%
#html_nodes("h3") %>%
#html_text()

names <- landing_page_AS %>%
  html_nodes(".bt-teaser-person-text h3") %>%
  # html_nodes(xpath = "//*[(@id = 'bt-collapse-538348')]//h3") %>%
  # html_nodes(xpath = "//*[contains(concat(' ', @class, ' '), concat(' ', 'bt-teaser-person-text', ' '))]//h3") %>%
  html_text()
html r xpath web-scraping css-selectors
1 Answer

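I was able to extract the list of names from the German parliament website using Selenium. The problem is probably that the server refuses access to your script unless the request comes from a browser, such as a headless one.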

If you use Selenium, here is the code and XPath that worked for me:

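import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

chrome_options = webdriver.ChromeOptions()
# Selenium 4 syntax: the driver path is passed through Service, and
# find_elements(By.XPATH, ...) replaces the removed find_elements_by_xpath(...)
driver = webdriver.Chrome(service=Service(r"your_webdriver_address"), options=chrome_options)

# Open a new browser window and load the members page
driver.set_page_load_timeout(10)
driver.get('https://www.bundestag.de/en/members')

# Switch to the list view, then wait for the entries to load
button = driver.find_elements(By.XPATH, "//*[contains(@class, 'icon-list-bullet')]")[0]
button.click()
time.sleep(3)

# Each member name is an <h3> directly inside a 'bt-teaser-person-text' element
GE_MEMBERS_NAMES = driver.find_elements(By.XPATH, "//*[contains(@class, 'bt-teaser-person-text')]/h3")

for item in GE_MEMBERS_NAMES:
    print(item.text)

driver.quit()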
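
To check the selector logic without launching a browser, the matching rule can be tried on a small static snippet. The following is only a sketch using Python's standard html.parser. The class TeaserNameParser, the function extract_names and the SAMPLE_HTML markup are hypothetical names and data made up for illustration, not taken from the real page. It mirrors the CSS selector from the question (".bt-teaser-person-text h3", any descendant h3), which matches whole class tokens, whereas the XPath contains(@class, ...) in the answer does a substring match on direct h3 children.

```python
from html.parser import HTMLParser


class TeaserNameParser(HTMLParser):
    """Collect the text of <h3> elements nested inside elements of a given class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.container_tag = None   # tag name of the matching container we are inside
        self.container_depth = 0    # how many tags of that name are currently open
        self.in_h3 = False
        self.buffer = []
        self.names = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.container_tag is None:
            # Enter a container when the class list holds the target token
            if self.target_class in classes:
                self.container_tag = tag
                self.container_depth = 1
        elif tag == self.container_tag:
            self.container_depth += 1
        if self.container_tag is not None and tag == "h3":
            self.in_h3 = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "h3" and self.in_h3:
            # Collapse runs of whitespace, roughly like trimming html_text() output
            self.names.append(" ".join("".join(self.buffer).split()))
            self.in_h3 = False
        if tag == self.container_tag:
            self.container_depth -= 1
            if self.container_depth == 0:
                self.container_tag = None

    def handle_data(self, data):
        if self.in_h3:
            self.buffer.append(data)


def extract_names(html, target_class="bt-teaser-person-text"):
    parser = TeaserNameParser(target_class)
    parser.feed(html)
    return parser.names


# Hypothetical markup for illustration only, not copied from the real page
SAMPLE_HTML = """
<div class="bt-teaser-person-text extra">
  <h3>Example Member A</h3>
</div>
<div class="other"><h3>Not a member</h3></div>
<div class="bt-teaser-person-text"><div><h3>
   Example Member B </h3></div></div>
"""

print(extract_names(SAMPLE_HTML))
```

If the same function returns an empty list for the HTML that your script actually downloads, the names are most likely not present in that response at all, and a browser-driven approach such as the Selenium code above is the more promising route.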