VBA - Convert a string to UNICODE

Question

I need to convert the string HTML from a mix of Cyrillic and Latin characters to UNICODE.

I tried the following:

Public HTML As String

Sub HTMLsearch()
    GetHTML ("http://nfs.mobile.bg/pcgi/mobile.cgi?act=3&slink=6jkjov&f1=1")
    MsgBox HTML
    HTML = StrConv(HTML, vbUnicode)
    MsgBox HTML
End Sub

Function GetHTML(URL As String) As String
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", URL, False
        .Send
        HTML = .ResponseText
    End With
End Function

You can see the content before and after StrConv. If you want to get the HTML into a file, you can use the following code:

Public HTML As String

Sub HTMLsearch()
    GetHTML ("http://nfs.mobile.bg/pcgi/mobile.cgi?act=3&slink=6jkjov&f1=1")

    Dim path As String

    path = ThisWorkbook.path & "\html.txt"
    Open path For Output As #1
    Print #1, HTML
    Close #1

    HTML = StrConv(HTML, vbUnicode)

    path = ThisWorkbook.path & "\htmlUNICODE.txt"
    Open path For Output As #1
    Print #1, HTML
    Close #1
End Sub

Function GetHTML(URL As String) As String
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", URL, False
        .Send
        HTML = .ResponseText
    End With
End Function

Any ideas?

Answer 1

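VBA's support for Unicode is not all that great.

It is possible to handle Unicode strings, but you will not be able to see the actual characters with Debug.Print or MsgBox: they will appear as ? there.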

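You can set Control Panel > Region and Language > Administrative tab > "Current language for non-Unicode programs" to "Russian" to switch to a different code page. That allows you to see Cyrillic letters in VBA message boxes instead of question marks, but it is only a cosmetic change.

Your real problem is something else.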

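The server (nfs.mobile.bg) sends the document as Content-Type: text/html, with no information about the character encoding. That means the recipient has to work out the character encoding on its own.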

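A browser does this by looking at the response byte stream and making guesses. In your case, the HTML source contains a helpful tag:

<meta http-equiv="Content-Type" content="text/html; charset=windows-1251">

Therefore the byte stream should be interpreted as Windows-1251, which happens to be the Cyrillic ANSI code page in Windows.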
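As a rough illustration of reading the declared charset out of the markup, the sketch below scans a string for the first charset= attribute. The name ExtractCharset and the UTF-8 fallback are assumptions made for this example, and the scan is far simpler than a browser's real encoding detection. It only needs the ASCII part of the markup to be intact, which is the case for the meta tag above.

```vba
Option Explicit

' Hypothetical helper: returns the value of the first "charset=" found in
' the markup, or the fallback when no charset is declared.
Function ExtractCharset(ByVal markup As String, _
                        Optional ByVal fallback As String = "UTF-8") As String
    ' Characters accepted as part of a charset name in this sketch
    Const NAME_CHARS As String = "abcdefghijklmnopqrstuvwxyz0123456789-_.:"
    Dim startPos As Long, endPos As Long
    Dim ch As String

    ExtractCharset = fallback

    startPos = InStr(1, markup, "charset=", vbTextCompare)
    If startPos = 0 Then Exit Function
    startPos = startPos + Len("charset=")

    ' Skip an opening quote, as in <meta charset="...">
    ch = Mid$(markup, startPos, 1)
    If ch = """" Or ch = "'" Then startPos = startPos + 1

    ' Advance until the first character that cannot belong to a charset name
    endPos = startPos
    Do While endPos <= Len(markup)
        ch = Mid$(markup, endPos, 1)
        If InStr(1, NAME_CHARS, ch, vbTextCompare) = 0 Then Exit Do
        endPos = endPos + 1
    Loop

    If endPos > startPos Then
        ExtractCharset = Mid$(markup, startPos, endPos - startPos)
    End If
End Function
```

For the meta tag shown above, this returns windows-1251. The result could then be passed on as the charset name when decoding the response bytes.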

So we do not even have Unicode here!

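In the absence of any additional information, the responseText property of the XMLHTTP object defaults to us-ascii. The extended characters of the Cyrillic alphabet do not exist in ASCII, so they are converted to actual question marks and are lost. That is why you cannot use responseText for anything here.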

However, the original bytes of the response are still available in the responseBody property, which is an array of Byte.

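In VBA you have to do the same thing a browser does: interpret the byte stream as a specific character set. The ADODB.Stream object can do that for you, and it is pretty simple, too: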

' reference: "Microsoft XML, v6.0" (or any other version)
' reference: "Microsoft ActiveX Data Objects 6.1 library" (or any other version)
Option Explicit

Sub HTMLsearch()
    Dim url As String, html As String
    
    url = "http://nfs.mobile.bg/pcgi/mobile.cgi?act=3&slink=6jkjov&f1=1"
    html = GetHTML(url, "Windows-1251")
    
    ' Cyrillic characters are supported in Office, so they will appear correctly
    ActiveDocument.Range.InsertAfter html
End Sub

Function GetHTML(Url As String, Optional Charset As String = "UTF-8") As String
    Dim request As New MSXML2.XMLHTTP
    Dim converter As New ADODB.stream
    
    ' fetch page
    request.Open "GET", Url, False
    request.send
    
    ' write raw bytes to the stream
    converter.Open
    converter.Type = adTypeBinary
    converter.Write request.responseBody
    
    ' switch the stream to text mode and set charset
    converter.Position = 0
    converter.Type = adTypeText
    converter.Charset = Charset
    
    ' read text characters from the stream, close the stream
    GetHTML = converter.ReadText
    converter.Close
End Function

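I have been using MS Word here, and calling HTMLsearch() correctly writes the Cyrillic characters to the page. For me they still show up as ? in a MsgBox, but that is purely a display problem, caused by the fact that the UI created by VBA cannot deal with Unicode.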
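Since MsgBox is not a reliable way to inspect the result, one option is to write the decoded string to a UTF-8 file and open it in an editor. The sketch below is one way to do that with ADODB.Stream. The procedure name SaveTextAsUtf8 is hypothetical, and late binding is used so that no library reference is needed. Note that Print #, as used in the question, writes the string through the ANSI code page, so it is not a good check either.

```vba
Option Explicit

' Hypothetical helper: writes a VBA (UTF-16) string to a file as UTF-8.
Sub SaveTextAsUtf8(ByVal text As String, ByVal filePath As String)
    ' ADODB enum values, declared locally because of late binding
    Const adTypeText As Long = 2
    Const adSaveCreateOverWrite As Long = 2

    Dim stream As Object
    Set stream = CreateObject("ADODB.Stream")

    stream.Type = adTypeText
    stream.Charset = "UTF-8"
    stream.Open
    stream.WriteText text

    ' Overwrites an existing file at filePath
    stream.SaveToFile filePath, adSaveCreateOverWrite
    stream.Close
End Sub
```

A call such as SaveTextAsUtf8 html, ThisWorkbook.Path & "\html.txt" would replace the Open / Print # / Close sequence from the question, assuming html already holds the correctly decoded text.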

Answer 2

Private Const CP_UTF8 = 65001

Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
    ByVal CodePage As Long, ByVal dwFlags As Long, _
    ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, _
    ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long

Public Function sUTF8ToUni(bySrc() As Byte) As String
    ' Converts a UTF-8 byte array to a Unicode string
    Dim lBytes As Long, lNC As Long, lRet As Long

    lBytes = UBound(bySrc) - LBound(bySrc) + 1
    lNC = lBytes
    sUTF8ToUni = String$(lNC, Chr(0))
    lRet = MultiByteToWideChar(CP_UTF8, 0, VarPtr(bySrc(LBound(bySrc))), lBytes, StrPtr(sUTF8ToUni), lNC)
    sUTF8ToUni = Left$(sUTF8ToUni, lRet)
End Function

Usage example:

Dim sHTML As String
Dim bHTML() As Byte

bHTML = GetHTML("http://yoururlhere/myorderdata.php")
sHTML = sUTF8ToUni(bHTML)
sHTML = Mid(sHTML, 2) 'strip off Byte Order Mark: EF BB BF
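The declaration above is written for 32-bit VBA and is fixed to UTF-8, while the page in the question is Windows-1251. As a sketch of how the same API call could be generalized, the variant below takes the code page as a parameter (1251 for Windows-1251) and adds a PtrSafe declaration for 64-bit Office. The name BytesToString is hypothetical, and the function assumes a non-empty byte array.

```vba
Option Explicit

#If VBA7 Then
    ' Office 2010 and later, 32-bit or 64-bit
    Private Declare PtrSafe Function MultiByteToWideChar Lib "kernel32" ( _
        ByVal CodePage As Long, ByVal dwFlags As Long, _
        ByVal lpMultiByteStr As LongPtr, ByVal cchMultiByte As Long, _
        ByVal lpWideCharStr As LongPtr, ByVal cchWideChar As Long) As Long
#Else
    Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
        ByVal CodePage As Long, ByVal dwFlags As Long, _
        ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, _
        ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
#End If

' Hypothetical helper: decodes a non-empty byte array using the given
' Windows code page (for example 1251 or 65001) into a VBA string.
Public Function BytesToString(bytes() As Byte, ByVal codePage As Long) As String
    Dim byteCount As Long, charCount As Long

    byteCount = UBound(bytes) - LBound(bytes) + 1

    ' With a zero-length output buffer, the API returns the number of
    ' UTF-16 characters that the decoded text requires.
    charCount = MultiByteToWideChar(codePage, 0, _
        VarPtr(bytes(LBound(bytes))), byteCount, 0, 0)
    If charCount = 0 Then Exit Function

    ' Allocate the result and decode into it
    BytesToString = String$(charCount, vbNullChar)
    MultiByteToWideChar codePage, 0, _
        VarPtr(bytes(LBound(bytes))), byteCount, _
        StrPtr(BytesToString), charCount
End Function
```

Because responseBody is a Variant, it would first have to be assigned to a Dim body() As Byte variable before being passed in, for example BytesToString(body, 1251).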
