Why does using first/last ordering on a CSS selector return an error in rvest?


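I'm trying to scrape a page that has several buttons, and I want to select/click the last one. Using the SelectorGadget extension for Chrome, I can successfully select the last button by appending :last to the end of the selector. However, when I run the following calls in rvest, they return:

Error in onRejected(reason) : code: -32000 message: DOM Error while querying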

Here is the code:

page <- 
  read_html_live("https://researchers.cedars-sinai.edu/search?by=text&type=user")

page %>% 
  html_elements("span button:last") 

# or

page$click(css = "span button:last")

I have already tried these variations, but they don't work either:

:nth-child(1)
:first-child
:nth-last-child(1)

Also, I know that XPath would solve this. The problem is that rvest's click() does not accept XPath yet, so I have to stick with CSS.

r web-scraping css-selectors rvest
1 Answer

You can call the site's API directly to get everything.

library(tidyverse)
library(httr2)

# Fetch the first page once, only to learn how many records exist.
first_page <- request("https://researchers.cedars-sinai.edu/api/users") %>% 
  req_body_json(list(params = list(by = "text", type = "user"))) %>% 
  req_perform() %>% 
  resp_body_json(simplifyVector = TRUE) 

# Total number of users reported by the API.
n <- first_page %>% 
  pluck("pagination", "total")

# Request the records 100 at a time and row-bind the pages into one tibble.
df <- map(seq(0, n, 100), 
    ~ request("https://researchers.cedars-sinai.edu/api/users") %>% 
      req_body_json(list(params = list(by = "text", type = "user"), 
                         pagination = list(startFrom = .x, perPage = 100))) %>% 
      req_perform() %>% 
      resp_body_json(simplifyVector = TRUE) %>% 
      pluck("resource") %>% 
      as_tibble()) %>% 
  list_rbind()

# A tibble: 986 × 16
   lastName    overview  hasThumbnail discoveryUrlId positions tags$explicit discoveryId linkedObjectsCounts$…¹
   <chr>       <chr>     <lgl>        <chr>          <list>    <list>        <chr>                        <int>
 1 Abdel-Hafiz "Hany Ab… TRUE         Hany.Abdel-Ha… <df>      <df [1 × 3]>  1513                             1
 2 Abdul-Haqq   NA       TRUE         Ryan.Abdul     <df>      <NULL>        3636                             0
 3 Aboujaoude   NA       TRUE         Elias.Aboujao… <df>      <NULL>        10472                            0
 4 Abuav       "Dr. Abu… TRUE         Rachel.Abuav   <df>      <NULL>        4847                             0
 5 Accortt     "Eynav A… TRUE         Eynav.Accortt  <df>      <df [8 × 3]>  1865                             8
 6 Ader        "The ove… TRUE         Marilyn.Ader   <df>      <df [8 × 3]>  1237                            13
 7 Ahdoot       NA       TRUE         Michael.Ahdoot <df>      <df [1 × 3]>  877                             10
 8 Ahluwalia    NA       TRUE         Sonu.Ahluwalia <df>      <df [1 × 3]>  2958                             0
 9 Ahmed        NA       FALSE        Waseem.Ahmed   <df>      <NULL>        18202                            0
10 Ainsworth    NA       TRUE         Richard.Ainsw… <df>      <NULL>        7154                             3
# ℹ 976 more rows
# ℹ abbreviated name: ¹​linkedObjectsCounts$grants$all
# ℹ 13 more variables: linkedObjectsCounts$grants$favourites <int>,
#   linkedObjectsCounts$teachingActivities <df[,2]>, $equipment <df[,2]>, $professionalActivities <df[,2]>,
#   $publications <df[,2]>, firstName <chr>, firstNameLastName <chr>, equipmentLinkTypes <list>,
#   objectId <int>, updatedWhen <chr>, hasCollaborationData <lgl>, embeddableMediaList <list>,
#   customFilterOne <list>
# ℹ Use `print(n = ...)` to see more rows
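
As for the selector error itself: :last is a jQuery extension (SelectorGadget is built on jQuery), not standard CSS, so as far as I can tell the browser's native selector engine behind read_html_live() rejects it. A minimal sketch of two standard-CSS alternatives follows. The markup is a made-up stand-in for the real page, so the selectors are assumptions that would need checking against the actual DOM.

```r
library(rvest)

# Hypothetical static markup standing in for the live page's buttons;
# the real page structure has not been verified.
page <- minimal_html('
  <span>
    <button>1</button>
    <button>2</button>
    <button>Next</button>
  </span>
')

# Option 1: select every candidate with plain CSS, then take the last one in R.
buttons <- html_elements(page, "span button")
last_button <- buttons[[length(buttons)]]
html_text2(last_button)

# Option 2: a standard structural pseudo-class. This only matches when the
# button really is the last <button> among its siblings.
last_of_type <- html_element(page, "span button:last-of-type")
html_text2(last_of_type)
```

If the live page has the same shape, page$click(css = "span button:last-of-type") might work as well, since it is a single valid CSS selector, but that depends on the real markup.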