How to download a CSV file programmatically


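I have a URL to a CSV file. The file is about 300 KB, with 2,700 rows and 15 columns.

I tried several approaches in Python and C#, but each one ended with the exception:

Remote end closed connection without response

What I tried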

Python:

import pandas as pd
import numpy as np
import os

# Download CSV with read_csv
df = pd.read_csv('https://nsearchives.nseindia.com/products/content/sec_bhavdata_full_17072024.csv', low_memory=False)

Python again, this time with urllib:

import urllib.request

url = 'https://nsearchives.nseindia.com/products/content/sec_bhavdata_full_17072024.csv'
filename = 'large_file.csv'

def download_large_file(url, filename):
    with urllib.request.urlopen(url) as response, open(filename, 'wb') as out_file:
        while True:
            chunk = response.read(8192)  # Download in 8KB chunks
            if not chunk:
                break
            out_file.write(chunk)

download_large_file(url, filename)
print("File downloaded successfully!")

C#

using System.Net;

WebClient webClient = new WebClient();
// DownloadFile needs both the address and a local file name
webClient.DownloadFile("URL", "large_file.csv");

Tags: c#, python-3.x, download

1 answer

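The problem is that the server expects cookies set by an earlier request before it will serve the file. Here is a complete C# program that visits the main page first and then downloads the file: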

using System.Diagnostics;
using System.Net;
using System.Net.Http.Headers;

var cookieContainer = new CookieContainer();
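HttpClientHandler handler = new HttpClientHandler()
{
    AllowAutoRedirect = true,
    UseCookies = true,
    CookieContainer = cookieContainer,
    UseDefaultCredentials = true,
    // The client advertises gzip below, so let the handler decompress gzip responses
    AutomaticDecompression = DecompressionMethods.GZip,
};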

HttpClient client = new(handler);
client.DefaultRequestHeaders.Accept.Clear();
client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("text/html"));
client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("text/csv"));
client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("*/*"));
client.DefaultRequestHeaders.AcceptEncoding.Add(new("gzip"));


// First visit the main page to obtain the required cookies
Console.Write("Visiting main page to obtain cookies...");
var MainPage = new Uri(@"https://www.nseindia.com");
var mainPageRes = await client.GetAsync(MainPage);

if (!mainPageRes.IsSuccessStatusCode)
{
    Console.WriteLine("Failed!");
    Console.WriteLine("Can't obtain cookies from the main page");
    Console.WriteLine("status code: " + mainPageRes.StatusCode);
    return;
}
Console.WriteLine("done.");

Console.Write("start to download csv file ....");
var csvUri = new Uri(@"https://nsearchives.nseindia.com/products/content/sec_bhavdata_full_17072024.csv");
var response = await client.GetAsync(csvUri);

if (!response.IsSuccessStatusCode)
{
    Console.WriteLine("Failed!");
    Console.WriteLine("Can't download the file");
    Console.WriteLine("status code: " + response.StatusCode);
    Console.WriteLine(response.Headers);
    return;
}

Console.WriteLine("done.");

var filename = "sec_bhavdata_full_17072024.csv";
using var contentStream = await response.Content.ReadAsStreamAsync();
using var stream = new FileStream(filename, FileMode.Create, FileAccess.Write);

Console.Write("start to save content to file....");
await contentStream.CopyToAsync(stream);
Console.WriteLine("done");
try
{
    if (OperatingSystem.IsWindows())
    {
        Process.Start("explorer.exe", ".");
    }
}
finally
{
    Console.WriteLine($"file name is {filename}");
}
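
Since the question also tried Python, the same idea (visit the main page first so the cookies are stored, then request the file with those cookies) can be sketched with the standard library only. This is a minimal sketch, not something verified against the real server: the function name `download_with_cookies` and the constants `MAIN_PAGE` and `CSV_URL` are made up for illustration, the `Accept` header simply mirrors the C# program above, and the server may require further headers (for example a browser-like `User-Agent`), which is an assumption the source does not confirm.

```python
import http.cookiejar
import shutil
import urllib.request

# URLs taken from the question and the C# answer above
MAIN_PAGE = "https://www.nseindia.com"
CSV_URL = "https://nsearchives.nseindia.com/products/content/sec_bhavdata_full_17072024.csv"


def download_with_cookies(main_page, file_url, filename, timeout=30):
    """Visit main_page to collect cookies, then fetch file_url with them.

    Hypothetical helper for illustration; returns the local file name.
    """
    # The cookie jar keeps whatever cookies the first response sets
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Mirror the Accept values used by the C# program (an assumption, not a requirement)
    opener.addheaders.append(("Accept", "text/html, text/csv, */*"))

    # First request: only needed for its Set-Cookie headers
    with opener.open(main_page, timeout=timeout) as response:
        response.read()

    # Second request: the same opener sends the stored cookies back
    with opener.open(file_url, timeout=timeout) as response, open(filename, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return filename
```

Calling `download_with_cookies(MAIN_PAGE, CSV_URL, "sec_bhavdata_full_17072024.csv")` would then play the role of the two `GetAsync` calls in the C# program. Both requests go through the same opener so that the cookie jar is shared between them.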