req_cache and req_perform_parallel in httr2

Question

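I want to cache the results of a large API pull that I have split into a list of smaller pulls to fit the API's limits. With httr2, I can set up the requests and an external cache directory, and the pull runs fine, but nothing is saved in the external directory for later use.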

Sample code with a smaller subset of requests (publicly available from NIH, per https://icite.od.nih.gov/api):

library(httr2) # api with cache capability

# list of several URL requests
citeLst <- list()
citeLst$one <- paste(17038628,17320734,16677657, sep=",") # string of PMIDs
citeLst$two <- paste(17380955,17299329,17297311, sep=",")
citeLst$three <- paste(17280498,17141525,17262759, sep=",")
citeLst <- lapply(citeLst, function(x) paste0("https://icite.od.nih.gov/api/pubs?pmids=", x))

reqLst <- lapply(citeLst, request) # set up list of requests
# add cache path for each request in the list - produces an external directory for each citeLst element
cacheLst <- local({
  lengthVec <- as.character(seq_along(citeLst))
  lst <- mapply(function(x,y) req_cache(x, path=file.path("cache","citePull",y)),
                reqLst, lengthVec, SIMPLIFY=FALSE)
  return(lst)
})
respLst <- req_perform_parallel(cacheLst) # pulls data fine, but nothing in the cache

req_perform_parallel is designed to work on a list, but req_cache only works with a single string. That is why I used mapply to set it up, creating a separate directory for each element of citeLst.

The example here (https://github.com/r-lib/httr2/issues/447) shows that req_cache does work with req_perform_parallel when the request has only a single string.

Although the iCiteR package can be used to pull this data, I don't see anything about it being able to cache the results.

Answer 1

req_cache() relies on cooperation from the server. From ?req_cache:

req_cache() caches responses to GET requests that have status code 200 and at least one of the standard caching headers (e.g. Expires, Etag, Last-Modified, Cache-Control)
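To see whether a particular response meets the header part of that condition, something like the following sketch could help. The helper name has_cache_headers is hypothetical, and the two responses are built by hand with response() so the example runs offline; for a real check you would pass in the result of req_perform().

```r
library(httr2)

# Hypothetical helper: TRUE if the response carries at least one of the
# caching headers listed in ?req_cache. It only checks header presence,
# not the GET / status 200 parts of the condition.
has_cache_headers <- function(resp) {
  headers <- c("Expires", "ETag", "Last-Modified", "Cache-Control")
  any(vapply(headers, function(h) resp_header_exists(resp, h), logical(1)))
}

# Hand-built responses for illustration only (not real API output)
with_etag <- response(headers = list(ETag = '"abc123"'))
without_any <- response(headers = list("Content-Type" = "application/json"))

has_cache_headers(with_etag)
has_cache_headers(without_any)
```

If this returns FALSE for a response from your API, req_cache() has nothing to work with for that request.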

The API server at https://icite.od.nih.gov/api does not send any of these caching headers, so no caching is performed. In this case, you will need to roll your own caching solution.
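A minimal sketch of one such home-made cache, assuming the responses are JSON and the request list is named (as reqLst is in the question). The function perform_parallel_cached and its perform argument are hypothetical names, not part of httr2; the idea is simply to save each parsed body as an .rds file and only send the requests whose file is missing.

```r
library(httr2)

# Hypothetical helper: perform a named list of requests in parallel,
# reusing any results already saved under `cache_dir`.
# `perform` defaults to req_perform_parallel and can be swapped for testing.
perform_parallel_cached <- function(reqs, cache_dir,
                                    perform = req_perform_parallel) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)

  # One .rds file per request, named after the list element
  paths <- file.path(cache_dir, paste0(names(reqs), ".rds"))
  names(paths) <- names(reqs)

  # Only send the requests whose results are not on disk yet
  missing <- names(reqs)[!file.exists(paths)]
  if (length(missing) > 0) {
    resps <- perform(reqs[missing])
    for (i in seq_along(missing)) {
      # Store the parsed JSON body rather than the response object
      saveRDS(resp_body_json(resps[[i]]), paths[[missing[i]]])
    }
  }

  # Return every result, cached or fresh, as a named list
  lapply(paths, readRDS)
}
```

With the objects from the question, a call might look like perform_parallel_cached(reqLst, file.path("cache", "citePull")). This sketch never expires entries, so deleting the directory is the only way to force a fresh pull.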
