I made a simple curl call that works for one website but not for another. Both URLs render a similar PDF when opened in Chrome.
curl -v https://www.bseindia.com/bseplus/AnnualReport/543258/74183543258.pdf -o bse_file.pdf
curl -v https://nsearchives.nseindia.com/content/equities/IPO_RHP_UNICOMM.pdf -o nse_file.pdf
Error:
curl: (92) HTTP/2 stream 1 was not closed cleanly: INTERNAL_ERROR (err 2)
curl --version
curl 8.4.0 (x86_64-apple-darwin23.0) libcurl/8.4.0 (SecureTransport) LibreSSL/3.3.6 zlib/1.2.12 nghttp2/1.58.0
$ curl -v https://nsearchives.nseindia.com/content/equities/IPO_RHP_UNICOMM.pdf -o nse_file.pdf
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 23.46.207.147:443...
* Connected to nsearchives.nseindia.com (23.46.207.147) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
} [329 bytes data]
* CAfile: /etc/ssl/cert.pem
* CApath: none
* (304) (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* (304) (IN), TLS handshake, Unknown (8):
{ [29 bytes data]
* (304) (IN), TLS handshake, Certificate (11):
{ [4274 bytes data]
* (304) (IN), TLS handshake, CERT verify (15):
{ [80 bytes data]
* (304) (IN), TLS handshake, Finished (20):
{ [36 bytes data]
* (304) (OUT), TLS handshake, Finished (20):
} [36 bytes data]
* SSL connection using TLSv1.3 / AEAD-CHACHA20-POLY1305-SHA256
* ALPN: server accepted h2
* Server certificate:
* subject: C=IN; ST=Maharashtra; L=Mumbai; O=National Stock Exchange of India Ltd; CN=www.nseindia.com
* start date: Aug 21 00:00:00 2024 GMT
* expire date: Jan 28 23:59:59 2025 GMT
* subjectAltName: host "nsearchives.nseindia.com" matched cert's "nsearchives.nseindia.com"
* issuer: C=US; O=DigiCert Inc; CN=DigiCert TLS RSA SHA256 2020 CA1
* SSL certificate verify ok.
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://nsearchives.nseindia.com/content/equities/IPO_RHP_UNICOMM.pdf
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: nsearchives.nseindia.com]
* [HTTP/2] [1] [:path: /content/equities/IPO_RHP_UNICOMM.pdf]
* [HTTP/2] [1] [user-agent: curl/8.4.0]
* [HTTP/2] [1] [accept: */*]
> GET /content/equities/IPO_RHP_UNICOMM.pdf HTTP/2
> Host: nsearchives.nseindia.com
> User-Agent: curl/8.4.0
> Accept: */*
>
* HTTP/2 stream 1 was not closed cleanly: INTERNAL_ERROR (err 2)
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
* Connection #0 to host nsearchives.nseindia.com left intact
curl: (92) HTTP/2 stream 1 was not closed cleanly: INTERNAL_ERROR (err 2)
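Alternatives tried:
Following this answer, I tried downgrading the protocol by passing --http1.1 to curl. With that option the download starts, but it keeps going indefinitely and never finishes.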
curl -v --http1.1 https://nsearchives.nseindia.com/content/equities/IPO_RHP_UNICOMM.pdf -o curl_file.pdf
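To keep an attempt like this from hanging indefinitely, curl's `--max-time` option caps the total time a transfer may take. A minimal sketch follows; the function name `fetch_http11` and the 60-second default are illustrative choices, not part of the original commands:

```shell
#!/bin/sh
# Hypothetical helper: download over HTTP/1.1, but give up after a time limit.
# Usage: fetch_http11 URL OUTPUT_FILE [SECONDS]
fetch_http11() {
    url=$1
    out=$2
    limit=${3:-60}   # assumed default; adjust to the expected file size
    # --max-time bounds the whole transfer; curl exits with code 28 on timeout.
    curl --http1.1 --max-time "$limit" -o "$out" "$url"
}
```

For example, `fetch_http11 https://nsearchives.nseindia.com/content/equities/IPO_RHP_UNICOMM.pdf curl_file.pdf 30` would stop after 30 seconds. This only bounds the wait; it does not make the server answer.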
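A probable reason curl cannot complete the request to the second website is that the site detects when a request does not come from a browser and blocks it.
A solution is available from https://github.com/lwthiker/curl-impersonate. It provides curl executables that impersonate the four major browsers: Chrome, Firefox, Safari and Microsoft Edge. For example, this will fetch that PDF file: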
curl_chrome110 https://nsearchives.nseindia.com/content/equities/IPO_RHP_UNICOMM.pdf -o nse_file.pdf
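A block of this kind does not always show up as a curl error; a server may also return an HTML page in place of the document. As a quick sanity check (a sketch only; the function name `is_pdf` is hypothetical), the saved file can be tested for the `%PDF-` signature that PDF files begin with:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file starts with the PDF signature.
# Usage: is_pdf FILE
is_pdf() {
    # Read the first 5 bytes and compare them with the "%PDF-" magic string.
    # A missing or unreadable file yields an empty string and therefore fails.
    [ "$(head -c 5 "$1" 2>/dev/null)" = "%PDF-" ]
}
```

After a download, `is_pdf nse_file.pdf && echo "looks like a PDF" || echo "not a PDF"` reports whether the file at least starts like one. It checks only the header, not whether the download is complete.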