rvest web scraping returns an empty xml_nodeset

Question

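I have been trying to scrape data with rvest for a while now, but I cannot complete what looks like a simple task. I have tried various HTML and CSS elements to extract the data, but I keep getting the output {xml_nodeset (0)}.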

For example, the code I have been using is:

library(rvest)
library(dplyr)

url <- "https://www.helmet.beam.vt.edu/bicycle-helmet-ratings.html"
webpage <- read_html(url)
webpage %>% html_nodes("span.helmet-name")

I expected a list starting with the bicycle helmet names, which are the elements highlighted when this class is used with SelectorGadget.

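I also set this class manually in SelectorGadget, because the default option it offered was a more generic selector, which would probably work but looks less specific. I simply looked at the page source and found the class name.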

Answer 1

The page loads its dataset into the bicycleDataRaw JavaScript variable:

var bicycleDataRaw = [
    {
        brand: "Specialized",
        model: "Tactic 4",
        score: 8.554,
        rating: 5,
        cost: "$110",
        style: "Mountain",
        date: "2021",
        photo: "specialized-tactic-4.jpg",
        certifications: "CPSC",
        low: 2.019,
        high: 6.535
    }, ...
]

Because of that, we can extract it from a chromote session as an R list by evaluating JavaScript.
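This also fits the empty result in the question: read_html() only parses the markup the server sends and does not run scripts, so elements that a script would create in the browser are not there to select. A minimal offline sketch of that behaviour follows. The page, the helmets id and the script are made up for illustration and are not copied from the site.

```r
library(rvest)

# Hypothetical page: the <span> only exists after a browser runs the script.
page <- minimal_html('
  <div id="helmets"></div>
  <script>
    var data = [{brand: "Specialized", model: "Tactic 4"}];
    var el = document.createElement("span");
    el.className = "helmet-name";
    el.textContent = data[0].brand;
    document.getElementById("helmets").appendChild(el);
  </script>
')

# The static parser never executes the script, so nothing matches the selector.
static_hits <- html_elements(page, "span.helmet-name")
length(static_hits)

# The data is still present, but only as text inside the <script> element.
script_text <- html_text(html_elements(page, "script"))
any(grepl("Tactic 4", script_text, fixed = TRUE))
```

If the selector matches nothing while the values are visible in the script source, a live browser session such as the one used here is one way to get at them.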

Evaluate JavaScript in the chromote session through the rvest::LiveHTML object returned by rvest::read_html_live():

library(rvest)
s <- read_html_live("https://www.helmet.beam.vt.edu/bicycle-helmet-ratings.html")
s$session$Runtime$evaluate("bicycleDataRaw", returnByValue = TRUE)$result$value |> 
  dplyr::bind_rows() 

#> # A tibble: 241 × 11
#>    brand   model score rating cost  style date  photo certifications   low  high
#>    <chr>   <chr> <dbl>  <int> <chr> <chr> <chr> <chr> <chr>          <dbl> <dbl>
#>  1 Specia… Tact…  8.55      5 $110  Moun… 2021  spec… CPSC            2.02  6.54
#>  2 Sweet … Trai…  8.69      5 $180  Moun… 2021  swee… CPSC            2.03  6.66
#>  3 Specia… Mode   8.80      5 $120  Urban 2021  spec… CPSC            2.42  6.38
#>  4 Fox     Drop…  8.85      5 $200  Moun… 2020  fox-… CPSC            2.30  6.55
#>  5 Giant   Rev …  9.12      5 $65   Road  2021  gian… CPSC            2.20  6.92
#>  6 Lazer   G1 M…  9.23      5 $240  Road  2020  laze… CPSC            2.40  6.83
#>  7 Bontra… Rall…  9.35      5 $150  Moun… 2019  bont… CPSC            2.34  7.01
#>  8 Specia… Alig…  9.55      5 $50   Mult… 2020  spec… CPSC            2.95  6.6 
#>  9 Lazer   Toni…  9.85      5 $80   Road  2021  laze… CPSC            2.56  7.29
#> 10 Troy L… A2 M…  9.99      5 $179  Moun… 2019  troy… CPSC            2.52  7.46
#> # ℹ 231 more rows
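As a possible follow-up, note that cost comes back as a character column ("$110"). The sketch below shows how the list of records becomes a tibble and how cost could be turned into a number. It is only an illustration: the records list is a stand-in for the value returned by Runtime$evaluate(), the second record is invented, the cost_usd column name is hypothetical, and it assumes every cost has the form "$" followed by digits.

```r
library(dplyr)

# Stand-in for the evaluated bicycleDataRaw: one named list per helmet.
# The first record uses values shown above; the second is made up.
records <- list(
  list(brand = "Specialized", model = "Tactic 4", score = 8.554,
       rating = 5L, cost = "$110"),
  list(brand = "Example Brand", model = "Example Model", score = 9.5,
       rating = 5L, cost = "$65")
)

# bind_rows() treats each named list as one row; then strip the "$"
# and convert the remainder to numeric (assumes no other characters).
helmets <- bind_rows(records) |>
  mutate(cost_usd = as.numeric(sub("$", "", cost, fixed = TRUE)))

helmets
```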
