使用 VBA 将雅虎财经中的数据提取到 Excel 电子表格的单元格中

问题描述 投票:0回答:3

我正在尝试将雅虎财经的数据提取到 Excel 的单元格中。

我找到了这个代码。

它返回一个非常长的文本,基本上是所需链接的所有“查看代码”html 文本。
所需的数据已经存在,但尚未被解析和返回。

responseText
中给出我需要的数据的行以此字符串开头:

<script type="application/json" data-sveltekit-fetched data-url="https://query1.finance.yahoo.com/v10/finance/quoteSummary/AAPL?formatted=true&amp;modules=upgradeDowngradeHistory%2CrecommendationTrend%2Cfinanci

“numberOfAnalystOptions”的字符串部分如下所示:

“购买”,“numberOfAnalystOpinions”:{“raw”:37,“fmt”:“37”,“longFmt”:“37”},“totalCash”:{“raw”:67150000128,“fmt”

我如何获取其中包含的数据?

Sub SharePrices()
    
    Const Url As String = "https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL"
    Dim sResp$, sHigh$, currentPrice$
    Dim analystNum$, sLow$, tMeanprice$

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", Url, False
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
        .send
        sResp = .responseText
    End With

    With CreateObject("VBScript.RegExp")
        .Pattern = "numberOfAnalystOpinions[\s\S]+?raw"":(.*?),"
        If .Execute(sResp).Count > 0 Then
            analystNum = .Execute(sResp)(0).SubMatches(0)
        End If

        .Pattern = "targetMeanPrice[\s\S]+?raw"":(.*?),"
        If .Execute(sResp).Count > 0 Then
            tMeanprice = .Execute(sResp)(0).SubMatches(0)
        End If

        .Pattern = "targetHighPrice[\s\S]+?raw"":(.*?),"
        If .Execute(sResp).Count > 0 Then
            sHigh = .Execute(sResp)(0).SubMatches(0)
        End If

        .Pattern = "targetLowPrice[\s\S]+?raw"":(.*?),"
        If .Execute(sResp).Count > 0 Then
            sLow = .Execute(sResp)(0).SubMatches(0)
        End If

        .Pattern = "currentPrice[\s\S]+?raw"":(.*?),"
        If .Execute(sResp).Count > 0 Then
            currentPrice = .Execute(sResp)(0).SubMatches(0)
        End If
    End With

    ActiveCell.Value = "Test"
    ActiveCell.Offset(0, 1).Select
    ActiveCell.Value = analystNum
    ActiveCell.Offset(0, 1).Select
    ActiveCell.Value = tMeanprice
    ActiveCell.Offset(0, 1).Select
    ActiveCell.Value = sHigh
    ActiveCell.Offset(0, 1).Select
    ActiveCell.Value = sLow
    ActiveCell.Offset(0, 1).Select
    ActiveCell.Value = currentPrice
    
End Sub
excel vba yahoo-finance
3个回答
2
投票

响应包含转义的 JSON 文本,因此引号前面有反斜杠 (

\
),这是 JSON 中的转义字符。

您需要将其作为文字字符包含在正则表达式中,这意味着您必须在正则表达式中对其进行转义。正则表达式转义字符也是反斜杠,因此这意味着

\\
表示正则表达式中的文字反斜杠。

因此,您需要在每个正则表达式中的

\\
之前插入
""
,如下所示:

.Pattern = "numberOfAnalystOpinions[\s\S]+?raw\\"":(.*?),"

.Pattern = "targetMeanPrice[\s\S]+?raw\\"":(.*?),"

等等。

插入这些文字反斜杠后,正则表达式全部匹配成功。 (当然,我不能保证结果符合您的预期,尤其是返回的 HTML 可能会有所不同。)


1
投票

由于您要提取的数据作为脚本块内的 JSON 字符串存在于页面中,因此您可以将其解析为 JSON 对象并挑选出您想要的部分,而不是使用正则表达式字符串搜索。 例如,如果您需要进行任何循环,或者有多次出现的文本,这会更容易。

您需要将 JsonConverter.bas(来自 https://github.com/VBA-tools/VBA-JSON)导入到您的 VBA 项目中。

Sub SharePrices()
    
    Dim json As Object, sResp As String, html As Object, scripts, script
    Dim scriptSource, fd As Object
    Const Url As String = "https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL"
    
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", Url, False
        .send
        sResp = .responseText
    End With
    
    'create an HTML document and load the response into it
    Set html = CreateObject("htmlfile")
    html.Open "text/html"
    html.Write sResp
    html.Close
    
    'get all the <script> blocks
    Set scripts = html.getElementsByTagName("script")
 
    'find the script block which has the information we want...
    For Each script In scripts 'find the script we want...
        scriptSource = script.getAttribute("data-url") & "" 'append empty string to handle Null
        If scriptSource Like "*upgradeDowngradeHistory*" Then
            'parse the script content as json
            Set json = JsonConverter.ParseJson(script.Text)
            'actual data is embedded here - parse that
            Set json = JsonConverter.ParseJson(json("body")) 
        End If
    Next script
    
    'didn't locate the required script ?
    If json Is Nothing Then Exit Sub 
    
    'This part of the JSON object has the pieces of data we want
    Set fd = json("quoteSummary")("result")(1)("financialData")

    'verify the values. Here is where you'd write them to the sheet
    Debug.Print fd("numberOfAnalystOpinions")("raw")
    Debug.Print fd("targetMeanPrice")("raw")
    Debug.Print fd("targetHighPrice")("raw")
    Debug.Print fd("targetLowPrice")("raw")
    Debug.Print fd("currentPrice")("raw")
End Sub

1
投票

对正则表达式模式的评论

  • 在正则表达式模式中,反斜杠应转义为
    \\
  • [\s\S]
    匹配任何字符,但
    \\"":{\\""
    更精确地匹配响应文本。
  • 使用
    [\d.]+
    来匹配数字比使用
    .*?
    更好。

  • 使用
    For
    循环来简化代码
Sub SharePrices1()
    Const Url As String = "https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL"
    Dim sResp$, sHigh$, currentPrice$
    Dim analystNum$, sLow$, tMeanprice$
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", Url, False
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
        .send
        sResp = .responseText
    End With
    Dim aKey, i As Long
    Const SPAT = "\\"":{\\""raw\\"":([\d.]+),"
    aKey = Split("numberOfAnalystOpinions targetMeanPrice targetHighPrice targetLowPrice currentPrice")
    ActiveCell.Resize(1, UBound(aKey) + 1).Value = aKey
    With CreateObject("VBScript.RegExp")
        For i = 0 To UBound(aKey)
            .Pattern = aKey(i) & SPAT 
            If .Execute(sResp).Count > 0 Then
                ActiveCell.Offset(1, i).Value = .Execute(sResp)(0).SubMatches(0)
            End If
        Next
    End With
End Sub
© www.soinside.com 2019 - 2024. All rights reserved.