Why doesn't WideCharToMultiByte convert Unicode characters correctly?

Question

I am trying to convert a native Windows string to a UTF-8 std::string, using VS2022 and C++17 with the Windows 10 SDK.

The input string is L"½ half ½".

The output string is "Â½ half Â½".

#include <Windows.h>
#include <string>
#include <iostream>

int main() {
    // Wide character string
    const wchar_t* wideStr = L"½ half ½";

    // Calculate the required buffer size for the UTF-8 string
    int utf8Length = WideCharToMultiByte(CP_UTF8, 0, wideStr, -1, nullptr, 0, nullptr, nullptr);

    // Allocate a buffer for the UTF-8 string
    char* utf8Buffer = new char[utf8Length];

    // Convert the wide string to UTF-8
    WideCharToMultiByte(CP_UTF8, 0, wideStr, -1, utf8Buffer, utf8Length, nullptr, nullptr);

    // Create a std::string from the UTF-8 buffer
    std::string utf8String(utf8Buffer);

    // Clean up
    delete[] utf8Buffer;

    return 0;
}

Answer 1

Your code is fine (although I would suggest converting directly into the std::string and getting rid of the char[] buffer).

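"Â½ half Â½" is what you get when the UTF-8 encoded form of "½ half ½" is displayed as Latin-1 instead of as UTF-8. For example, the Unicode character ½ (U+00BD) is encoded in UTF-8 as the bytes C2 BD. In Latin-1, the byte C2 is Â and the byte BD is ½.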
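To make the misreading concrete, here is a small portable sketch that imitates a Latin-1 viewer: it treats every byte as one Latin-1 character and re-encodes the result as UTF-8. The function name ReadAsLatin1 is invented for this illustration and is not part of any library.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: interprets every input byte as one Latin-1 character
// and re-encodes the result as UTF-8, which is effectively what a Latin-1
// viewer does when it is handed UTF-8 bytes.
std::string ReadAsLatin1(std::string_view bytes) {
    std::string out;
    for (unsigned char ch : bytes) {
        if (ch < 0x80) {
            out.push_back(static_cast<char>(ch));  // ASCII is unchanged.
        } else {
            // Code points U+0080..U+00FF take two bytes in UTF-8.
            out.push_back(static_cast<char>(0xC0 | (ch >> 6)));
            out.push_back(static_cast<char>(0x80 | (ch & 0x3F)));
        }
    }
    return out;
}
```

Feeding it the correct UTF-8 bytes C2 BD yields C3 82 C2 BD, which is the UTF-8 encoding of the two characters Â and ½. So one correctly encoded ½ turns into the Â½ seen in the question.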

The actual bytes in your std::string are in fact correct UTF-8. You just need to fix whatever displays the std::string so that it treats the contents as UTF-8.
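One way to confirm that the bytes are right, independently of how anything displays them, is to dump them as hex. This is only a sketch, and ToHex is a made-up helper name:

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: formats each byte as two uppercase hex digits,
// separated by spaces, so the contents can be checked independently of
// whatever encoding the console or debugger uses for display.
std::string ToHex(std::string_view bytes) {
    static constexpr char kDigits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char ch : bytes) {
        if (!out.empty()) {
            out.push_back(' ');
        }
        out.push_back(kDigits[ch >> 4]);
        out.push_back(kDigits[ch & 0x0F]);
    }
    return out;
}
```

For a correctly converted "½ half ½" this gives C2 BD 20 68 61 6C 66 20 C2 BD. If the string is being viewed in a Windows console, calling SetConsoleOutputCP(CP_UTF8) before printing is one common way to make the display side agree, though where the string is actually being inspected is not stated in the question.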
