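I want to download a PDF file from the internet and save it to my local hard drive. After downloading, the resulting PDF file has a lot of empty pages. What can I do to fix this?
Example: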
require(XML)
url <- ('http://cran.r-project.org/doc/manuals/R-intro.pdf')
download.file(url, 'introductionToR.pdf')
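Thanks in advance.
Try binary mode (mode = "wb"). On Windows the default mode = "w" writes the file in text mode, which can corrupt binary files such as PDFs: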
download.file(url, 'introductionToR.pdf', mode="wb")
This works for me.
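In case it helps, here is a minimal sketch of the same fix wrapped in a small function. The name download_binary is hypothetical (it is not part of base R); it only forces mode = "wb" and returns the size of the file that was written, so you can compare it with the size the server reports.

```r
# Hypothetical helper (not part of base R): download a file in binary mode
# and return the size in bytes of what was written to disk.
download_binary <- function(url, destfile) {
  # mode = "wb" writes the bytes exactly as received;
  # the default mode = "w" is text mode
  status <- download.file(url, destfile, mode = "wb", quiet = TRUE)
  # download.file() returns 0 on success
  if (status != 0) stop("download failed with status ", status)
  file.size(destfile)
}

# Example call (needs network access):
# download_binary("http://cran.r-project.org/doc/manuals/R-intro.pdf",
#                 "introductionToR.pdf")
```

If the size returned matches the size of the file on the server, the download was not altered on the way to disk.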
You can also use the tabulizer package, which can read PDFs from a URL and export their tables as a data.frame:
https://ropensci.org/tutorials/tabulizer_tutorial.html
install.packages("ghit")  # the install_github() calls below come from ghit, not devtools
# on 64-bit Windows
ghit::install_github(c("ropenscilabs/tabulizerjars", "ropenscilabs/tabulizer"), INSTALL_opts = "--no-multiarch")
# elsewhere
ghit::install_github(c("ropenscilabs/tabulizerjars", "ropenscilabs/tabulizer"))
library(tabulizer)
f2 <- "https://github.com/leeper/tabulizer/raw/master/inst/examples/data.pdf"
# newer tabulizer releases select the return type with output = "data.frame" instead of method
extract_tables(f2, pages = 1, method = "data.frame")