Web scraping with rvest: extracting a table as a data frame using XPath

Question

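I am trying to extract the table on this page, "https://clinicaltrials.gov/study/NCT05817110?tab=history", using an XPath copied from the Chrome browser.

I tried the code below, but it does not work. I only do web scraping occasionally and have a basic understanding of HTML. Thanks in advance for any help with this.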

library(rvest)

# URL of the webpage
url <- "https://clinicaltrials.gov/study/NCT05817110?tab=history"

# Fetch the webpage (static HTML only; JavaScript is not executed)
webpage <- read_html(url)

# Extract the table using the XPath copied from Chrome
table_data <- webpage %>%
  html_nodes(xpath = '//*[@id="study-record-versions-table"]/ctg-card/div/div[2]/ctg-history-changes-table/table/tbody') %>%
  html_table(fill = TRUE)

Answer 1

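It looks like the page uses JavaScript to load its content, so the table is not present in the HTML that read_html() downloads. There are a couple of possible solutions. Use

read_html_live()

or access the data directly through the API link: "https://clinicaltrials.gov/api/int/studies/NCT05817110?history=true" (found using the Network tab of the browser's developer tools).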

study <- jsonlite::fromJSON("https://clinicaltrials.gov/api/int/studies/NCT05817110?history=true")
study$history$changes
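
For the first option, here is a minimal sketch of how read_html_live() might be used. It assumes rvest 1.0.4 or later, the chromote package, and a local Chrome installation. The helper name extract_live_tables, the default "table" CSS selector, and the fixed wait time are illustrative assumptions and not part of the original answer; the selector may need adjusting to match the rendered page.

```r
library(rvest)  # read_html_live() needs rvest >= 1.0.4 plus the chromote package

# Hypothetical helper: render the page in headless Chrome, then parse its tables.
# `css` and `wait` are assumed defaults; tune them for the page in question.
extract_live_tables <- function(url, css = "table", wait = 3) {
  page <- read_html_live(url)   # loads the page and runs its JavaScript
  Sys.sleep(wait)               # crude pause so the table has time to render
  page %>%
    html_elements(css) %>%      # select the rendered <table> elements
    html_table()                # returns a list of data frames (tibbles)
}

# Example call (requires network access and Chrome):
# tables <- extract_live_tables("https://clinicaltrials.gov/study/NCT05817110?tab=history")
```

If the list comes back empty, the table may not have finished rendering; a longer wait or a more specific selector, such as one based on the study-record-versions-table id from the question, may help. The API route above is usually simpler because it avoids running a browser at all.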
