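I want to scrape the website https://www.askramar.com/Ponuda. As a first step, I need to collect all the links that point to the individual car pages. In the HTML structure, the links look like this:
I tried the following code, but in R I get an empty object: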
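library(rvest)  # provides read_html(), html_nodes(), html_attr() and %>%

url <- "https://www.askramar.com/Ponuda"
html_document <- read_html(url)
# Select every element whose class list contains "vozilo", then read its href
links <- html_document %>%
  html_nodes(xpath = '//*[contains(concat(" ", @class, " "), concat(" ", "vozilo", " "))]') %>%
  html_attr(name = "href")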
Is it the JavaScript on the page? Please help! Thanks!
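Yes, the page uses JavaScript to load the content you are interested in. However, it does this simply by issuing an XHR GET request to https://www.askramar.com/Ajax/GetResults.cshtml.
You can make the same request yourself: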
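library(rvest)  # provides read_html(), html_nodes(), html_attr() and %>%

# Request the endpoint the page itself calls via XHR
url <- "https://www.askramar.com/Ajax/GetResults.cshtml"
html_document <- read_html(url)
# Keep only anchors whose href contains "Vozilo" (the car pages)
links <- html_document %>%
  html_nodes(xpath = '//a[contains(@href, "Vozilo")]') %>%
  html_attr(name = "href")
print(links)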
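If you want to check the selector without hitting the site, here is a minimal offline sketch. The HTML fragment is made up for illustration (the real response from GetResults.cshtml may be structured differently), and it assumes the hrefs come back relative, which I have not verified. In that case xml2::url_absolute() can turn them into full URLs:

```r
library(rvest)

# Hypothetical fragment standing in for the AJAX response;
# the real markup may differ.
fragment <- '<div>
  <a href="/Vozilo/1">Car A</a>
  <a href="/Vozilo/2">Car B</a>
  <a href="/Kontakt">Contact</a>
</div>'

doc <- read_html(fragment)

# Same XPath as above: anchors whose href contains "Vozilo"
links <- doc %>%
  html_nodes(xpath = '//a[contains(@href, "Vozilo")]') %>%
  html_attr(name = "href")

# Resolve relative hrefs against the site root (assumed base URL)
full_links <- xml2::url_absolute(links, "https://www.askramar.com/")
print(full_links)
```

The "/Kontakt" link is dropped by the XPath filter, so only the car links remain. If the real hrefs are already absolute, url_absolute() should leave them unchanged.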