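I'm trying to scrape a page that has several buttons, and I want to select/click the last one. With the SelectorGadget Chrome extension, I can select the last button by appending `:last` to the end of the selector. However, when I run either of the following in rvest, both fail with: `Error in onRejected(reason) : code: -32000 message: DOM Error while querying`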
Here is the code:
```r
library(rvest)

page <- read_html_live("https://researchers.cedars-sinai.edu/search?by=text&type=user")

# Both calls fail with "DOM Error while querying"
page %>%
  html_elements("span button:last")
# or
page$click(css = "span button:last")
```
I've also tried replacing `:last` with `:nth-child(1)`, `:first-child`, and `:nth-last-child(1)`, but none of them work.
I know XPath would solve this, but rvest's `click()` doesn't accept XPath yet, so I have to stick with CSS.
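Instead of clicking through the page, you can call the site's JSON API directly and get everything. The first request only reads the total number of records (`pagination$total`); the second step then requests the records 100 at a time, one request per starting offset, and binds the pages into a single tibble.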
```r
library(tidyverse)
library(httr2)

# First request: only used to read the total number of records
req <- request("https://researchers.cedars-sinai.edu/api/users") %>%
  req_body_json(list(params = list(by = "text", type = "user"))) %>%
  req_perform() %>%
  resp_body_json(simplifyVector = TRUE)

n <- req %>%
  pluck("pagination", "total")

# One request per starting offset, 100 records each,
# then row-bind all pages into one tibble
df <- map(seq(0, n, 100),
          ~ request("https://researchers.cedars-sinai.edu/api/users") %>%
            req_body_json(list(params = list(by = "text", type = "user"),
                               pagination = list(startFrom = .x, perPage = 100))) %>%
            req_perform() %>%
            resp_body_json(simplifyVector = TRUE) %>%
            pluck("resource") %>%
            as_tibble()) %>%
  list_rbind()

df
```
```
# A tibble: 986 × 16
   lastName    overview hasThumbnail discoveryUrlId positions tags$explicit discoveryId linkedObjectsCounts$…¹
   <chr>       <chr>    <lgl>        <chr>          <list>    <list>        <chr>                        <int>
 1 Abdel-Hafiz "Hany Ab… TRUE        Hany.Abdel-Ha… <df>      <df [1 × 3]>  1513                             1
 2 Abdul-Haqq  NA       TRUE         Ryan.Abdul     <df>      <NULL>        3636                             0
 3 Aboujaoude  NA       TRUE         Elias.Aboujao… <df>      <NULL>        10472                            0
 4 Abuav       "Dr. Abu… TRUE        Rachel.Abuav   <df>      <NULL>        4847                             0
 5 Accortt     "Eynav A… TRUE        Eynav.Accortt  <df>      <df [8 × 3]>  1865                             8
 6 Ader        "The ove… TRUE        Marilyn.Ader   <df>      <df [8 × 3]>  1237                            13
 7 Ahdoot      NA       TRUE         Michael.Ahdoot <df>      <df [1 × 3]>  877                             10
 8 Ahluwalia   NA       TRUE         Sonu.Ahluwalia <df>      <df [1 × 3]>  2958                             0
 9 Ahmed       NA       FALSE        Waseem.Ahmed   <df>      <NULL>        18202                            0
10 Ainsworth   NA       TRUE         Richard.Ainsw… <df>      <NULL>        7154                             3
# ℹ 976 more rows
# ℹ abbreviated name: ¹​linkedObjectsCounts$grants$all
# ℹ 13 more variables: linkedObjectsCounts$grants$favourites <int>,
#   linkedObjectsCounts$teachingActivities <df[,2]>, $equipment <df[,2]>, $professionalActivities <df[,2]>,
#   $publications <df[,2]>, firstName <chr>, firstNameLastName <chr>, equipmentLinkTypes <list>,
#   objectId <int>, updatedWhen <chr>, hasCollaborationData <lgl>, embeddableMediaList <list>,
#   customFilterOne <list>
# ℹ Use `print(n = ...)` to see more rows
```
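The pagination loop issues one request per offset generated by `seq(0, n, 100)`. As a quick offline check, the sketch below computes those offsets for the 986 records reported above. The helper `page_offsets()` is hypothetical, and the total of 1000 is an invented value used only to show an edge case: when the total is an exact multiple of 100, `seq(0, n, 100)` also yields an offset equal to the total, which would only request an empty page. Stopping at `n - 1` avoids that extra request.

```r
# Hypothetical helper: offsets passed as pagination$startFrom, perPage = 100
page_offsets <- function(total, per_page = 100) {
  # No records means no requests (seq() would error on a negative range)
  if (total <= 0) return(numeric(0))
  # Stop before an offset equal to the total, which would return an empty page
  seq(0, total - 1, by = per_page)
}

page_offsets(986)          # 0, 100, ..., 900: ten requests
length(page_offsets(986))  # 10

# Hypothetical total of 1000: the original seq() adds an 11th, empty request
length(seq(0, 1000, 100))  # 11
length(page_offsets(1000)) # 10
```

With 986 records both forms produce the same ten offsets, so the answer's code is unaffected; the difference only matters when the total lands exactly on a page boundary.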