Scraping a web table in accordion format using R

Question

I want to use rvest to scrape the "Monthly Cargo Volumes" data from https://www.panynj.gov/port/en/our-port/facts-and-figures.html. I believe I have the correct XPath, but I get no results. Thank you.

library(rvest)
library(dplyr)

url <- 'https://www.panynj.gov/port/en/our-port/facts-and-figures.html'

schedule <- url %>% 
    read_html() %>%  
    html_nodes(xpath = '/html/body/div[1]/div/div/div[2]/div[62]/div[1]/div/div[1]/div/div[1]/div[5]/div[1]/div[2]/div[1]/div/div/div/div[1]/div/table/tbody/tr[3]/td[2]') %>% 
    html_table() %>% 
    data.frame

Result: data frame with 0 columns and 0 rows

Answer 1

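The first problem is that the page is not static: the table you want to scrape is built with JavaScript, so read_html() never sees it. To fix this, use read_html_live() instead of read_html() (this requires Chrome to be installed on your machine). Second, even with that change you would still get an empty data frame, because your XPath expression targets a single table cell rather than a table. Since the cell does not contain a table, html_table() returns nothing. To get the content of a single cell, use html_text() instead: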

library(rvest)

url <- "https://www.panynj.gov/port/en/our-port/facts-and-figures.html"

schedule <- read_html_live(url) |>
  html_elements(
    xpath = "/html/body/div[1]/div/div/div[2]/div[62]/div[1]/div/div[1]/div/div[1]/div[5]/div[1]/div[2]/div[1]/div/div/div/div[1]/div/table/tbody/tr[3]/td[2]"
  ) |> 
  html_text()

schedule
#> [1] "    3,737,112"
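
The value comes back as a character string with leading spaces and thousands separators, so it usually needs cleaning before it can be used in calculations. Below is a minimal sketch using only base R; parse_count is a hypothetical helper name, not part of rvest, and it assumes the cell holds a plain integer with commas as thousands separators.

```r
# Hypothetical helper (not part of rvest): turn a scraped cell such as
# "    3,737,112" into a number.
# Assumption: commas are thousands separators and there are no other symbols.
parse_count <- function(x) {
  x <- trimws(x)                      # drop leading/trailing whitespace
  x <- gsub(",", "", x, fixed = TRUE) # remove thousands separators
  as.numeric(x)
}

# The string returned by html_text() in the answer above
parse_count("    3,737,112")
#> [1] 3737112
```

The helper is vectorised, so it should also work if html_text() is applied to several cells at once.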
