Character encoding problem when uploading/downloading CSV to Azure Blob Storage using C#

Question (votes: 0, answers: 1)

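I have a C# service that lets users upload and download CSV files to update data in our system. The data contains non-ASCII characters. The upload appears to work: if I download the file directly from Azure Storage Explorer and view it, all the encoding is correct. However, when I try to download it through the C# service, the French and German characters are replaced with �.

You can view the file in Storage Explorer. It appears to have the correct content type.

[Screenshot: Azure Storage Explorer]

My upload and download functions are as follows: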

public async Task CreateCSV<T>(string reference, IEnumerable<T> data)
{
    using (var writer = new StringWriter())
    using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
    {
        csv.WriteRecords(data);

        await SaveToBlobStorage(reference, Encoding.UTF8.GetBytes(writer.ToString()));
    }
}


public async Task<bool> SaveToBlobStorage(string filePath, byte[] data)
{
    BlobContainerClient container = new BlobContainerClient(Helpers.StorageConnection(), Helpers.ContainerName());

    var blob = container.GetBlobClient(filePath);

    using (var stream = new MemoryStream(data, writable: false))
    {
        blob.Upload(stream);
        var type = MimeTypesMap.GetMimeType(filePath);
        await blob.SetHttpHeadersAsync(new BlobHttpHeaders { ContentType = $"{type}; charset=utf-8" });
    }
}

public async Task<IEnumerable<T>> GetCSV<T>(string reference)
{
    var stream = await GetStreamFromStorage(reference);

    stream.Seek(0, SeekOrigin.Begin);

    using var reader = new StreamReader(stream, Encoding.UTF8);

    using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);

    csv.Configuration.Delimiter = ",";
    csv.Configuration.HeaderValidated = null;
    csv.Configuration.MissingFieldFound = null;

    IEnumerable<T> result = csv.GetRecords<T>();

    return result.ToList();
}

public async Task<MemoryStream> GetStreamFromStorage(string file)
{
    var blobClient = new BlockBlobClient(Helpers.StorageConnection(), Helpers.ContainerName(), file);

    var memoryStream = new MemoryStream();

    await blobClient.DownloadToAsync(memoryStream);

    return memoryStream;
}

How can I return the CSV file with the correct encoding?

c# azure character-encoding azure-blob-storage csvhelper
1 Answer
0 votes

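I agree with dbc's comment: the CSV handling in the code above is done entirely by CsvHelper, and you already appear to be using UTF-8 encoding when you save the CSV file to Azure Blob Storage. The problem may be related to how you read the CSV file back from Azure Blob Storage.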
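One way to narrow this down is to inspect the raw bytes before looking at the reader. The sketch below is an illustration rather than part of the original answer. It assumes the blob's bytes are already in memory (here they are built by hand), and `IsValidUtf8` is a hypothetical helper name. It relies on the fact that a strict `UTF8Encoding` throws on invalid byte sequences, whereas the default `Encoding.UTF8` silently substitutes U+FFFD (�). If the bytes fail this check, the file was probably saved in another encoding, such as ISO-8859-1 or Windows-1252, before it was uploaded, and decoding it as UTF-8 would produce the symptom described in the question.

```csharp
using System;
using System.Text;

// Hypothetical helper: returns true when the bytes decode as strict UTF-8.
static bool IsValidUtf8(byte[] bytes)
{
    // throwOnInvalidBytes: true makes GetString throw instead of inserting U+FFFD
    var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
    try
    {
        strict.GetString(bytes);
        return true;
    }
    catch (DecoderFallbackException)
    {
        return false;
    }
}

// "François", written with an escape so the example does not depend on the source file's encoding
string name = "Fran\u00E7ois";

byte[] utf8Bytes = Encoding.UTF8.GetBytes(name);
byte[] latin1Bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(name);

Console.WriteLine(IsValidUtf8(utf8Bytes));   // True
Console.WriteLine(IsValidUtf8(latin1Bytes)); // False

// Lenient decoding of the Latin-1 bytes yields the replacement character
string lenient = Encoding.UTF8.GetString(latin1Bytes);
Console.WriteLine(lenient.Contains('\uFFFD')); // True
```

In the service, the same check could be applied to `memoryStream.ToArray()` right after the download, before the bytes are handed to `StreamReader`.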

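You can review the sample code below, which includes all the methods, to make sure UTF-8 encoding is maintained throughout the upload and download process.

Code: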

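using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;
using CsvHelper;
using CsvHelper.Configuration;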

namespace CsvBlobExample
{
    public class SampleData
    {
        public string Name { get; set; }
        public string Country { get; set; }
    }

    public static class MimeTypeHelper
    {
        public static string GetMimeType(string fileName)
        {
            string extension = Path.GetExtension(fileName).ToLowerInvariant();
            return extension switch
            {
                ".csv" => "text/csv",
                // Add other extensions as needed
                _ => "application/octet-stream", // Default for unknown types
            };
        }
    }

    public class CsvBlobService
    {
        private readonly string _connectionString;
        private readonly string _containerName;

        public CsvBlobService(string connectionString, string containerName)
        {
            _connectionString = connectionString;
            _containerName = containerName;
        }

        // Method to create a CSV from a collection of data and upload it to Blob Storage
        public async Task CreateCSV<T>(string reference, IEnumerable<T> data)
        {
            using (var writer = new StringWriter())
            using (var csv = new CsvWriter(writer, new CsvConfiguration(CultureInfo.InvariantCulture) { Encoding = Encoding.UTF8 }))
            {
                csv.WriteRecords(data);
                csv.Flush(); // Make sure every record has reached the StringWriter before reading it

                // Convert StringWriter content to UTF-8 bytes (no BOM) for upload
                byte[] encodedData = Encoding.UTF8.GetBytes(writer.ToString());
                await SaveToBlobStorage(reference, encodedData);
            }
        }

        public async Task<bool> SaveToBlobStorage(string filePath, byte[] data)
        {
            BlobContainerClient container = new BlobContainerClient(_connectionString, _containerName);
            await container.CreateIfNotExistsAsync();

            var blob = container.GetBlobClient(filePath);

            using (var stream = new MemoryStream(data, writable: false))
            {
                await blob.UploadAsync(stream, overwrite: true);

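                // Declare the charset as part of Content-Type. Content-Encoding is meant
                // for content codings such as gzip, not for character sets.
                var type = MimeTypeHelper.GetMimeType(filePath);
                await blob.SetHttpHeadersAsync(new BlobHttpHeaders
                {
                    ContentType = $"{type}; charset=utf-8"
                });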
            }
            return true;
        }

        // Method to download a CSV from Blob Storage and parse it back into a collection of data
        public async Task<IEnumerable<T>> GetCSV<T>(string reference)
        {
            var stream = await GetStreamFromStorage(reference);
            stream.Seek(0, SeekOrigin.Begin);

            using var reader = new StreamReader(stream, Encoding.UTF8);
            using var csv = new CsvReader(reader, new CsvConfiguration(CultureInfo.InvariantCulture)
            {
                Delimiter = ",",
                HeaderValidated = null,
                MissingFieldFound = null
            });

            return csv.GetRecords<T>().ToList();
        }

        // Method to get a MemoryStream of data from Blob Storage
        private async Task<MemoryStream> GetStreamFromStorage(string file)
        {
            var blobClient = new BlobContainerClient(_connectionString, _containerName).GetBlobClient(file);
            var memoryStream = new MemoryStream();

            // Download blob content to memory stream
            await blobClient.DownloadToAsync(memoryStream);

            return memoryStream;
        }
    }

    class Program
    {
        static async Task Main(string[] args)
        {
            string connectionString = "DefaultEndpointsProtocol=https;AccountName=venkat326123;AccountKey=redacted;EndpointSuffix=core.windows.net";
            string containerName = "venkat";

            var csvBlobService = new CsvBlobService(connectionString, containerName);
            
            var dataToUpload = new List<SampleData>
            {
                new SampleData { Name = "François", Country = "France" },
                new SampleData { Name = "Müller", Country = "Germany" },
                new SampleData { Name = "Jürgen", Country = "Germany" }
            };

            // Reference name for the CSV file in Blob Storage
            string csvFileName = "sample_data.csv";

            // Upload the CSV file to Blob Storage
            await csvBlobService.CreateCSV(csvFileName, dataToUpload);
            Console.WriteLine("CSV file uploaded successfully!");

            var retrievedData = await csvBlobService.GetCSV<SampleData>(csvFileName);
            Console.WriteLine("CSV file retrieved successfully!");

            foreach (var item in retrievedData)
            {
                Console.WriteLine($"Name: {item.Name}, Country: {item.Country}");
            }
        }
    }
}

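The code above helps preserve special characters, such as French and German accented letters, when the CSV is written, saved to Azure Blob Storage, and read back from it.

Output: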

CSV file uploaded successfully!
CSV file retrieved successfully!
Name: François, Country: France
Name: Müller, Country: Germany
Name: Jürgen, Country: Germany
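The encode and decode steps can also be exercised without a storage account. The following is a minimal sketch under stated assumptions, not the answer's code: `ToUtf8Bytes` and `FromUtf8Bytes` are hypothetical helpers, plain text stands in for CsvHelper output, and a `MemoryStream` stands in for the blob. It writes through a `StreamWriter` with an explicit `UTF8Encoding` and reads back with a `StreamReader`, the same reader setup that `GetCSV` uses. The `emitBom` flag is included only to show that the reader handles both variants.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: encodes text through a StreamWriter; emitBom controls the UTF-8 byte order mark.
static byte[] ToUtf8Bytes(string text, bool emitBom)
{
    using var buffer = new MemoryStream();
    using (var writer = new StreamWriter(buffer, new UTF8Encoding(emitBom), bufferSize: 1024, leaveOpen: true))
    {
        writer.Write(text);
    } // Disposing the writer flushes its contents into the buffer

    return buffer.ToArray();
}

// Hypothetical helper: decodes bytes as UTF-8, skipping a byte order mark if one is present.
static string FromUtf8Bytes(byte[] bytes)
{
    using var reader = new StreamReader(new MemoryStream(bytes), Encoding.UTF8, detectEncodingFromByteOrderMarks: true);
    return reader.ReadToEnd();
}

// Accented characters are escaped so the example does not depend on the source file's encoding
string csvText = "Name,Country\r\nFran\u00E7ois,France\r\nM\u00FCller,Germany\r\n";

byte[] withBom = ToUtf8Bytes(csvText, emitBom: true);
byte[] withoutBom = ToUtf8Bytes(csvText, emitBom: false);

Console.WriteLine(withBom.Length - withoutBom.Length);   // 3 (the UTF-8 BOM is EF BB BF)
Console.WriteLine(FromUtf8Bytes(withBom) == csvText);    // True
Console.WriteLine(FromUtf8Bytes(withoutBom) == csvText); // True
```

If this round trip succeeds locally but the service still returns �, the bytes stored in the blob, or a later step that re-encodes the response, would be the next place to look.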
