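I'd like to know how to visit a website in `R` and see the network activity and responses, as you can in Google Chrome, and copy their relevant properties, such as the `url`.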
I don't even know where to start, but I tried `httr2`:
```r
library(httr2)

url <- "https://en.wikipedia.org/wiki/Association_football"

# build the request and perform it (native pipe; httr2 does not attach %>%)
response <- url |>
  httr2::request() |>
  httr2::req_perform()
```
Should I be able to find all of that information somewhere in the `response` object?
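One way to check what `response` does and does not contain: it holds only the HTML document for the requested URL, while the stylesheets, scripts and images that Chrome lists are merely referenced inside that HTML. The sketch below assumes the xml2 package and uses a hypothetical helper, `linked_resources()`, to list those references; it runs on a made-up toy snippet so it works offline. Resources that the page adds later through JavaScript will not show up, so this is only a subset of what a browser actually loads.

```r
library(xml2)

# Hypothetical helper (not part of httr2): list the sub-resources an HTML
# document references. <link> elements use `href`; <script> and <img> use
# `src`. Relative and protocol-relative URLs are resolved against `base_url`.
linked_resources <- function(html, base_url) {
  doc   <- read_html(html)
  nodes <- xml_find_all(doc, "//link[@href] | //script[@src] | //img[@src]")
  tag   <- xml_name(nodes)
  ref   <- ifelse(tag == "link",
                  xml_attr(nodes, "href"),
                  xml_attr(nodes, "src"))
  data.frame(
    tag = tag,
    url = url_absolute(as.character(ref), base_url),
    stringsAsFactors = FALSE
  )
}

# Toy page standing in for a downloaded HTML body (made-up markup)
page <- '<html><head>
  <link rel="stylesheet" href="/w/load.php?modules=site.styles">
  <script src="/w/load.php?modules=startup"></script>
</head><body>
  <img src="//upload.wikimedia.org/example.png">
</body></html>'

res <- linked_resources(page, "https://en.wikipedia.org/wiki/Association_football")
print(res)
```

Applied to the question's `response`, the equivalent call is `linked_resources(resp_body_string(response), resp_url(response))`. The one request httr2 did make can be inspected with `resp_status(response)`, `resp_headers(response)` and `resp_url(response)`.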
`httr2` only makes a single request (unless redirected) and does not load any linked resources. You could use chromote for that, though.
If you only need a list of the loaded resources, you might be able to use the Performance API in a chromote session:
```r
library(chromote)

b <- ChromoteSession$new()

# request pattern from https://rstudio.github.io/chromote/#loading-a-page-reliably
p <- b$Page$loadEventFired(wait_ = FALSE)
b$Page$navigate("https://en.wikipedia.org/wiki/Association_football", wait_ = FALSE)
#> <Promise [pending]>
b$wait_for(p)
#> $timestamp
#> [1] 157552.4

# evaluate JS in the chromote session and get the result by value
perf_entries <- b$Runtime$evaluate(
  'window.performance.getEntries()
     .filter(entry => entry.entryType == "resource" || entry.entryType == "navigation")
     .map(entry => ({e: entry.entryType, i: entry.initiatorType, u: entry.name}))',
  returnByValue = TRUE)$result$value

# a list of named lists; bind_rows() will handle this
dplyr::bind_rows(perf_entries)
#> # A tibble: 74 × 3
#>    e          i          u
#>    <chr>      <chr>      <chr>
#>  1 navigation navigation https://en.wikipedia.org/wiki/Association_football
#>  2 resource   link       https://en.wikipedia.org/w/load.php?lang=en&modules=ex…
#>  3 resource   script     https://en.wikipedia.org/w/load.php?lang=en&modules=st…
#>  4 resource   img        https://en.wikipedia.org/static/images/icons/wikipedia…
#>  5 resource   link       https://en.wikipedia.org/w/load.php?lang=en&modules=si…
#>  6 resource   img        https://en.wikipedia.org/static/images/mobile/copyrigh…
#>  7 resource   img        https://en.wikipedia.org/static/images/mobile/copyrigh…
#>  8 resource   img        https://upload.wikimedia.org/wikipedia/en/thumb/1/1b/S…
#>  9 resource   img        https://upload.wikimedia.org/wikipedia/commons/thumb/4…
#> 10 resource   img        https://upload.wikimedia.org/wikipedia/commons/thumb/4…
#> # ℹ 64 more rows
```
Created on 2024-06-05 with reprex v2.1.0
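The Performance API query above only collects the entry type, initiator type and URL. For something closer to Chrome's Network tab (resource type, HTTP status and MIME type per response), one option is to listen to the DevTools Protocol's `Network.responseReceived` events, which chromote exposes as `b$Network$responseReceived()`. A minimal sketch, assuming a local Chrome/Chromium that chromote can find; `response_row()` and `capture_network()` are hypothetical helper names, and the fields read (`type`, `response$status`, `response$mimeType`, `response$url`) follow the CDP `Network.responseReceived` event definition. Responses that arrive after the page's `load` event (lazy-loaded images, late XHR) may not be captured.

```r
# Flatten one Network.responseReceived event into a one-row data frame.
response_row <- function(msg) {
  data.frame(
    type   = msg$type,               # resource type: "Document", "Script", ...
    status = msg$response$status,    # HTTP status code
    mime   = msg$response$mimeType,
    url    = msg$response$url,
    stringsAsFactors = FALSE
  )
}

# Load `url` in a headless Chrome session and return one row per response
# received before the page's load event.
capture_network <- function(url) {
  b <- chromote::ChromoteSession$new()
  on.exit(b$close(), add = TRUE)

  rows <- list()
  b$Network$enable()
  # callback_ registers a handler that runs for every event of this type;
  # the returned function deregisters it again
  stop_listening <- b$Network$responseReceived(callback_ = function(msg) {
    rows[[length(rows) + 1]] <<- response_row(msg)
  })
  on.exit(stop_listening(), add = TRUE, after = FALSE)

  # same load pattern as in the Performance API example
  p <- b$Page$loadEventFired(wait_ = FALSE)
  b$Page$navigate(url, wait_ = FALSE)
  b$wait_for(p)

  do.call(rbind, rows)
}
```

For example, `net <- capture_network("https://en.wikipedia.org/wiki/Association_football")` followed by `head(net)`. Keeping the event parsing in `response_row()` means it can be checked without starting a browser.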