Error downloading a zipped CSV from a website using GET in R

Question (votes: 0, answers: 3)

I want to use download.file() to read data from nse-india.com into R, as shown below.

url = 'http://www.nseindia.com/content/historical/EQUITIES/2014/SEP/cm24SEP2014bhav.csv.zip'
temp = tempfile()
download.file(url, destfile = temp, method = 'wget')

It throws the following error:

SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
syswgetrc = C:\PROGRA~2\GnuWin32/etc/wgetrc
--2014-09-28 21:19:26--  http://www.nseindia.com/content/historical/EQUITIES/2014/SEP/cm24SEP2014bhav.csv.zip
Resolving www.nseindia.com... 202.83.22.200, 202.83.22.203
Connecting to www.nseindia.com|202.83.22.200|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2014-09-28 21:19:26 ERROR 403: Forbidden.

Warning messages:
1: running command 'wget  "http://www.nseindia.com/content/historical/EQUITIES/2014/SEP/cm24SEP2014bhav.csv.zip" -O "C:\Users\ITITHI~1\AppData\Local\Temp\Rtmp2fjADx\file1fb02375882"' had status 1 
2: In download.file(url, destfile = temp, method = "wget") :
  download had nonzero exit status

Please let me know how to solve this.

Edit: any other way of downloading the file from within R would also be great.

r download wget http-status-code-403
3 Answers
Votes: 1

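You need to set a browser-like user agent string so that the site treats the request as coming from a browser rather than from an automated scraping/download bot: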

library(httr) # >=v0.5

GET("http://www.nseindia.com/content/historical/EQUITIES/2014/SEP/cm24SEP2014bhav.csv.zip",
    user_agent("Mozilla/5.0"), write_disk("cm24SEP2014bhav.csv.zip"))

## Response [http://www.nseindia.com/content/historical/EQUITIES/2014/SEP/cm24SEP2014bhav.csv.zip]
##   Date: 2014-09-28 23:53
##   Status: 200
##   Content-type: application/zip
##   Size: 58.2 kB
## <ON DISK>  cm24SEP2014bhav.csv.zip
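
Since the question started from download.file(), the same idea can be sketched in base R. This is a minimal sketch, not something posted in the thread: it assumes the "libcurl" download method is available in your R build and relies on the HTTPUserAgent option, which R uses as the user agent for download.file(). The helper name download_as_browser is hypothetical, and the site may still refuse the request for other reasons.

```r
# Hypothetical helper (not from the thread): download a file with base R
# while presenting a browser-like user agent.
download_as_browser <- function(url, destfile, user_agent = "Mozilla/5.0",
                                method = "libcurl", ...) {
  # Temporarily override the user agent string used by download.file(),
  # and restore the previous value when the function exits.
  old <- options(HTTPUserAgent = user_agent)
  on.exit(options(old), add = TRUE)

  # mode = "wb" keeps the zip archive intact on Windows.
  download.file(url, destfile = destfile, method = method, mode = "wb", ...)
}

# Usage (needs network access):
# url <- "http://www.nseindia.com/content/historical/EQUITIES/2014/SEP/cm24SEP2014bhav.csv.zip"
# zip_path <- tempfile(fileext = ".zip")
# download_as_browser(url, zip_path)
# inner <- unzip(zip_path, list = TRUE)$Name[1]  # first file in the archive
# bhav <- read.csv(unz(zip_path, inner))
```

The commented usage lines also show one way to read the CSV straight out of the archive; picking the first entry of the zip listing is an assumption about how the archive is laid out.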

Votes: 0

You need permission to access the site. This is the message returned when the URL is requested with the httr package (stored in doc):

library(httr)

url = 'http://www.nseindia.com/content/historical/EQUITIES/2014/SEP/cm24SEP2014bhav.csv.zip'
# Without a browser-like user agent the server answers with an HTML error page
doc <- content(GET(url))


<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head><title>Access Denied</title></head>
<body>
<h1>Access Denied</h1>

You don't have permission to access "http://www.nseindia.com/content/historical/EQUITIES/2014/SEP/cm24SEP2014bhav.csv.zip" on this server.<p>
Reference #18.df24317.1411924047.3b4f02a1
</p>
</body>
</html>
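
An "Access Denied" page like this is easy to mistake for real data if the response body is parsed without looking at the status. As a hedged sketch (the helper name fetch_or_stop is hypothetical and not from the thread), one way to guard against that with httr is to check the status before writing anything to disk. It assumes the httr package is installed and reuses the browser-like user agent suggested in the other answer.

```r
# Hypothetical helper (not from the thread): fetch a URL with httr and stop
# with an error on an HTTP failure instead of saving the error page.
fetch_or_stop <- function(url, path, user_agent = "Mozilla/5.0") {
  # Send the request with a browser-like user agent string.
  resp <- httr::GET(url, httr::user_agent(user_agent))

  # Turns 4xx/5xx responses (such as 403 Forbidden) into R errors.
  httr::stop_for_status(resp)

  # Only a successful response body is written to disk, as raw bytes.
  writeBin(httr::content(resp, as = "raw"), path)
  invisible(resp)
}

# Usage (needs network access):
# fetch_or_stop(url, "cm24SEP2014bhav.csv.zip")
```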

Votes: 0

Error in curl::curl_fetch_memory(url, handle = handle): Recv failure: Connection was reset

This is the response I get now.
