curl HTTP/2 fails on one specific URL but works for other, similar PDF URLs


I made a simple curl call that works for one website but not for another. Opened in Chrome, both URLs render a similar PDF.

Command that works

curl -v https://www.bseindia.com/bseplus/AnnualReport/543258/74183543258.pdf -o bse_file.pdf

Command that does not work

curl  -v https://nsearchives.nseindia.com/content/equities/IPO_RHP_UNICOMM.pdf -o nse_file.pdf

Error:

curl: (92) HTTP/2 stream 1 was not closed cleanly: INTERNAL_ERROR (err 2)

curl version

curl --version
curl 8.4.0 (x86_64-apple-darwin23.0) libcurl/8.4.0 (SecureTransport) LibreSSL/3.3.6 zlib/1.2.12 nghttp2/1.58.0

Verbose curl trace of the error

$ curl  -v https://nsearchives.nseindia.com/content/equities/IPO_RHP_UNICOMM.pdf -o nse_file.pdf
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 23.46.207.147:443...
* Connected to nsearchives.nseindia.com (23.46.207.147) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
} [329 bytes data]
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* (304) (IN), TLS handshake, Unknown (8):
{ [29 bytes data]
* (304) (IN), TLS handshake, Certificate (11):
{ [4274 bytes data]
* (304) (IN), TLS handshake, CERT verify (15):
{ [80 bytes data]
* (304) (IN), TLS handshake, Finished (20):
{ [36 bytes data]
* (304) (OUT), TLS handshake, Finished (20):
} [36 bytes data]
* SSL connection using TLSv1.3 / AEAD-CHACHA20-POLY1305-SHA256
* ALPN: server accepted h2
* Server certificate:
*  subject: C=IN; ST=Maharashtra; L=Mumbai; O=National Stock Exchange of India Ltd; CN=www.nseindia.com
*  start date: Aug 21 00:00:00 2024 GMT
*  expire date: Jan 28 23:59:59 2025 GMT
*  subjectAltName: host "nsearchives.nseindia.com" matched cert's "nsearchives.nseindia.com"
*  issuer: C=US; O=DigiCert Inc; CN=DigiCert TLS RSA SHA256 2020 CA1
*  SSL certificate verify ok.
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://nsearchives.nseindia.com/content/equities/IPO_RHP_UNICOMM.pdf
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: nsearchives.nseindia.com]
* [HTTP/2] [1] [:path: /content/equities/IPO_RHP_UNICOMM.pdf]
* [HTTP/2] [1] [user-agent: curl/8.4.0]
* [HTTP/2] [1] [accept: */*]
> GET /content/equities/IPO_RHP_UNICOMM.pdf HTTP/2
> Host: nsearchives.nseindia.com
> User-Agent: curl/8.4.0
> Accept: */*
>
* HTTP/2 stream 1 was not closed cleanly: INTERNAL_ERROR (err 2)
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* Connection #0 to host nsearchives.nseindia.com left intact
curl: (92) HTTP/2 stream 1 was not closed cleanly: INTERNAL_ERROR (err 2)

Alternatives tried:

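Following this answer, I tried downgrading the protocol with curl's --http1.1 option. With that, the download starts, but it goes on forever and never finishes.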

curl -v --http1.1  https://nsearchives.nseindia.com/content/equities/IPO_RHP_UNICOMM.pdf -o curl_file.pdf
1 Answer

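A likely reason curl cannot complete the request to the second website is that the site detects when a request does not come from a browser and blocks it.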

A workaround is available at https://github.com/lwthiker/curl-impersonate. It provides curl executables that impersonate four major browsers: Chrome, Firefox, Safari, and Microsoft Edge. For example, this fetches the PDF file:

curl_chrome110 https://nsearchives.nseindia.com/content/equities/IPO_RHP_UNICOMM.pdf -o nse_file.pdf
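
Since the plain-curl attempts in the question either abort or hang, it may help to bound the transfer time and confirm that what was saved is really a PDF. The following is only a rough sketch under some assumptions: `CURL_BIN`, `is_pdf`, and `fetch_pdf` are hypothetical names made up for this example, the 120-second limit is arbitrary, and it assumes the curl-impersonate wrapper passes standard curl options such as `--max-time` through unchanged.

```shell
#!/bin/sh
# Sketch only: bounded download plus a sanity check on the saved file.
# CURL_BIN is a hypothetical variable; set it to curl_chrome110 when
# curl-impersonate is installed, otherwise plain curl is used.
CURL_BIN="${CURL_BIN:-curl}"

# Return 0 if the file begins with the PDF signature "%PDF-".
is_pdf() {
    [ "$(head -c 5 "$1" 2>/dev/null)" = "%PDF-" ]
}

# Download URL ($1) to FILE ($2), giving up after 120 seconds
# (an arbitrary limit chosen for this sketch), then verify the result.
fetch_pdf() {
    "$CURL_BIN" --fail --silent --show-error --max-time 120 "$1" -o "$2" || return 1
    is_pdf "$2"
}
```

After sourcing this with `CURL_BIN=curl_chrome110` set, a call such as `fetch_pdf https://nsearchives.nseindia.com/content/equities/IPO_RHP_UNICOMM.pdf nse_file.pdf` returns non-zero if the transfer fails, times out, or the saved file does not start with the PDF signature.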