Extracting a table from a web page using rvest

Question

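I have loaded the rvest library and tried several functions to get the table, using the table's XPath, which I copied by right-clicking the page, choosing Inspect Element, and then Copy XPath.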

webpage <- read_html("https://www.wunderground.com/history/daily/KBNA/date/2024-9-30")

Copy XPath:

webpage %>% html_element(xpath = "//*[@id="inner-content"]/div[2]/div[1]/div[5]/div[1]/div/lib-city-history-observation/div/div[2]/table")

Copy full XPath:

webpage %>% html_element(xpath = "/html/body/app-root/app-history/one-column-layout/wu-header/sidenav/mat-sidenav-container/mat-sidenav-content/div[2]/section/div[2]/div[1]/div[5]/div[1]/div/lib-city-history-observation/div/div[2]/table")

With the plain Copy XPath version, I get the following error:

Error: unexpected symbol in "webpage %>% html_element(xpath = "//*[@id="inner"

With the Copy full XPath version, I get:

{xml_missing} <NA>

How can I extract the table using XPath? Or is there another way to do this with rvest?

Answer 1

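You have a few things going on here. First, the site uses JavaScript to build the HTML, so you need to use read_html_live to trigger that process; read_html only sees the page before the scripts run, which is why the full XPath returns {xml_missing}. Second, if you look at the xpath, you will notice that it contains double quotes ("). That is fine as long as you account for them in the xpath argument. In your case you use "xpath = "//*[@id="inner", which throws the unexpected symbol error because R thinks the " inside the xpath closes the " that opens the string. The most straightforward solution is to wrap the xpath in ' instead of ".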
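The quoting issue described above can be reproduced offline. The following is a minimal sketch, not the site's real markup: the HTML snippet is a hypothetical stand-in, the XPath is shortened to `//table` for readability, and the raw-string form assumes R 4.0.0 or later. It shows three ways of writing an XPath that contains double quotes and checks that they all yield the same string.

```r
library(rvest)

# Three equivalent ways to write an XPath that itself contains double quotes.
xp_single  <- '//*[@id="inner-content"]//table'      # single-quoted R string
xp_escaped <- "//*[@id=\"inner-content\"]//table"    # escaped inner quotes
xp_raw     <- r"(//*[@id="inner-content"]//table)"   # raw string (R >= 4.0.0)

# All three produce the same character value.
stopifnot(identical(xp_single, xp_escaped), identical(xp_single, xp_raw))

# Hypothetical stand-in page so the example runs without network access.
page <- minimal_html('
  <div id="inner-content">
    <table>
      <tr><th>Time</th><th>Temperature</th></tr>
      <tr><td>12:53 AM</td><td>67 F</td></tr>
    </table>
  </div>')

# Any of the three strings can be passed as the xpath argument.
tbl <- page |>
  html_element(xpath = xp_single) |>
  html_table()

print(tbl)
```

Single quotes are usually the least noisy choice for XPath copied from browser developer tools, because those expressions tend to use double quotes for attribute values.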

library(rvest)
  
## this is just for reprex's purpose
  
session = read_html_live('https://www.wunderground.com/history/daily/KBNA/date/2024-9-30')

dat = session |>
  html_element(xpath = '//*[@id="inner-content"]/div[2]/div[1]/div[5]/div[1]/div/lib-city-history-observation/div/div[2]/table') |>
  html_table()

head(dat)
#> # A tibble: 6 × 10
#>   Time  Temperature `Dew Point` Humidity Wind  `Wind Speed` `Wind Gust` Pressure
#>   <chr> <chr>       <chr>       <chr>    <chr> <chr>        <chr>       <chr>   
#> 1 12:3… 67 °F       64 °F       90 °%    WNW   5 °mph       0 °mph      29.25 °…
#> 2 12:5… 67 °F       64 °F       90 °%    W     3 °mph       0 °mph      29.25 °…
#> 3 1:53… 67 °F       64 °F       90 °%    WNW   5 °mph       0 °mph      29.25 °…
#> 4 2:53… 67 °F       64 °F       90 °%    WNW   3 °mph       0 °mph      29.25 °…
#> 5 3:53… 67 °F       64 °F       90 °%    CALM  0 °mph       0 °mph      29.25 °…
#> 6 4:53… 67 °F       64 °F       90 °%    WSW   3 °mph       0 °mph      29.25 °…
#> # ℹ 2 more variables: Precip. <chr>, Condition <chr>

Created on 2024-10-31 with reprex v2.1.1
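On the question of whether rvest offers another way: html_element also accepts a CSS selector as its first argument, which tends to be less brittle than a long positional XPath. The sketch below is an assumption-laden illustration rather than a tested recipe for the live site. It uses a hypothetical offline snippet that borrows only the lib-city-history-observation element name from the XPath above, and it assumes the observation table is the first table inside that element.

```r
library(rvest)

# Hypothetical stand-in for the rendered page; the real markup may differ.
page <- minimal_html('
  <lib-city-history-observation>
    <div>
      <div>summary</div>
      <div>
        <table>
          <tr><th>Time</th><th>Wind</th></tr>
          <tr><td>12:53 AM</td><td>WNW</td></tr>
          <tr><td>1:53 AM</td><td>CALM</td></tr>
        </table>
      </div>
    </div>
  </lib-city-history-observation>')

# CSS descendant selector: the first <table> anywhere inside the component.
obs <- page |>
  html_element("lib-city-history-observation table") |>
  html_table()

print(obs)
```

Against the real page, the same selector would be applied to the object returned by read_html_live instead of `page`, since the table only exists after the JavaScript has run.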
