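I know there are many similar questions, but I could not find one that asks exactly this (forgive me if I am wrong). I am trying to scrape a website for weather data, and I have managed to do it for one of the pages. However, I would like to loop the process. I have looked at two related questions, but I do not believe they solve my problem.
The end of the URL changes slightly, from
http://climate.rutgers.edu/stateclim_v1/nclimdiv/index.php?stn=NJ00&elem=avgt
to
http://climate.rutgers.edu/stateclim_v1/nclimdiv/index.php?stn=NJ00&elem=pcpn
and so on. How can I loop over these pages even though they do not increase numerically?
Code: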
library(rvest)
library(dplyr)  # provides %>% and select()
nj_weather_data <- read_html("http://climate.rutgers.edu/stateclim_v1/nclimdiv/")
### Get info you want from web page ###
hurr <- html_nodes(nj_weather_data, "#climdiv_table")
### Extract info and turn into dataframe ###
precip_table <- as.data.frame(html_table(hurr)) %>%
  select(-Rank)
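Say you want average T, minimum T, precipitation and so on. Click an entry in the table above the temperature table and watch how the URL changes. The switch is done through JavaScript, so to capture it you would have to load the page through some sort of (headless) browser such as PhantomJS.
Another way is to take the names of the individual pages, append each one to the URL, and load the data from there.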
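The URL-building step can be tried on its own before fetching anything. The sketch below uses only the two element codes that appear in the question's URLs; the variable names `base_url`, `elems` and `urls` are illustrative, and nothing here contacts the server.

```r
# Base URL taken from the question; only the value of `elem` changes.
base_url <- "http://climate.rutgers.edu/stateclim_v1/nclimdiv/index.php?stn=NJ00&elem="

# Element codes seen in the question's URLs; extend this vector as needed.
elems <- c("avgt", "pcpn")

# paste0() is vectorised, so this yields one URL per element code.
urls <- paste0(base_url, elems)
names(urls) <- elems

print(urls)
```

Because the codes are plain strings, it does not matter that they are not numeric: the loop simply runs over the vector of codes.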
library(rvest)
# notice the %s at the end - this is replaced by elements of cs in sprintf
# statement below
x <- "http://climate.rutgers.edu/stateclim_v1/nclimdiv/index.php?stn=NJ00&elem=%s"
cs <- c("mint", "avgt", "pcpn", "hdd", "cdd")
# you could paste together new url using paste, too
customstat <- sprintf(x, cs) # %s is replaced with mint, avgt...
# prepare empty object for results
out <- vector("list", length(customstat))
names(out) <- cs
# get individual table and insert it into the output
for (i in seq_along(customstat)) {
  out[[i]] <- read_html(customstat[i]) %>%
    html_nodes("#climdiv_table") %>%
    html_table() %>%
    .[[1]]
}
> str(out)
List of 5
$ mint:'data.frame': 131 obs. of 15 variables:
..$ Rank : logi [1:131] NA NA NA NA NA NA ...
..$ Year : chr [1:131] "1895" "1896" "1897" "1898" ...
..$ Jan : chr [1:131] "18.1" "18.6" "18.7" "23.2" ...
..$ Feb : chr [1:131] "11.7" "20.7" "22.5" "22.1" ...
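The `str()` output shows that the year and month columns arrive as character vectors, so they probably need converting before any arithmetic. A minimal sketch, assuming every character column is meant to be numeric: `to_numeric` is a hypothetical helper, and `mint` is a toy stand-in built from the values printed above rather than a live download.

```r
# Toy stand-in for one scraped table, using the values shown in str(out).
mint <- data.frame(
  Rank = c(NA, NA, NA, NA),
  Year = c("1895", "1896", "1897", "1898"),
  Jan  = c("18.1", "18.6", "18.7", "23.2"),
  Feb  = c("11.7", "20.7", "22.5", "22.1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert every character column to numeric.
# Entries that are not valid numbers become NA; suppressWarnings()
# hides the coercion warning that as.numeric() raises for them.
to_numeric <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(col) suppressWarnings(as.numeric(col)))
  df
}

mint_num <- to_numeric(mint)
str(mint_num)
```

The same helper could be applied to the whole list with `lapply(out, to_numeric)`. If the real tables contain footnote marks or other non-numeric cells, those would silently turn into NA, so it is worth checking for unexpected NAs afterwards.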
You can now bind the tables together (for example with do.call(rbind, out)) or analyse them however you need.
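One thing to watch when stacking the tables is that the rows no longer carry a column saying which element they came from. The sketch below tags each table with its list name before binding. It runs on a toy `out` with placeholder numbers (not real climate data), and the column name `elem` is an arbitrary choice.

```r
# Toy stand-in for `out`: a named list of data frames with identical columns.
# The values are placeholders, not real observations.
out <- list(
  mint = data.frame(Year = c("1895", "1896"), Jan = c("1", "2"),
                    stringsAsFactors = FALSE),
  avgt = data.frame(Year = c("1895", "1896"), Jan = c("3", "4"),
                    stringsAsFactors = FALSE)
)

# Add an `elem` column holding the list name, one table at a time.
tagged <- Map(
  function(df, nm) data.frame(elem = nm, df, stringsAsFactors = FALSE),
  out, names(out)
)

# Stack the tagged tables and drop the row names that rbind() generates.
combined <- do.call(rbind, tagged)
rownames(combined) <- NULL

print(combined)
```

This relies on all tables sharing the same column names, which rbind() requires for data frames.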