我正在尝试将雅虎财经的数据提取到 Excel 的单元格中。
我找到了这个代码。
它返回一个非常长的文本,基本上是所需链接的所有“查看代码”html 文本。
所需的数据已经存在,但尚未被解析和返回。
responseText
中给出我需要的数据的行以此字符串开头:
<script type="application/json" data-sveltekit-fetched data-url="https://query1.finance.yahoo.com/v10/finance/quoteSummary/AAPL?formatted=true&modules=upgradeDowngradeHistory%2CrecommendationTrend%2Cfinanci
“numberOfAnalystOptions”的字符串部分如下所示:
“购买”,“numberOfAnalystOpinions”:{“raw”:37,“fmt”:“37”,“longFmt”:“37”},“totalCash”:{“raw”:67150000128,“fmt”
我如何获取其中包含的数据?
Sub SharePrices()
Const Url As String = "https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL"
Dim sResp$, sHigh$, currentPrice$
Dim analystNum$, sLow$, tMeanprice$
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", Url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
.send
sResp = .responseText
End With
With CreateObject("VBScript.RegExp")
.Pattern = "numberOfAnalystOpinions[\s\S]+?raw"":(.*?),"
If .Execute(sResp).Count > 0 Then
analystNum = .Execute(sResp)(0).SubMatches(0)
End If
.Pattern = "targetMeanPrice[\s\S]+?raw"":(.*?),"
If .Execute(sResp).Count > 0 Then
tMeanprice = .Execute(sResp)(0).SubMatches(0)
End If
.Pattern = "targetHighPrice[\s\S]+?raw"":(.*?),"
If .Execute(sResp).Count > 0 Then
sHigh = .Execute(sResp)(0).SubMatches(0)
End If
.Pattern = "targetLowPrice[\s\S]+?raw"":(.*?),"
If .Execute(sResp).Count > 0 Then
sLow = .Execute(sResp)(0).SubMatches(0)
End If
.Pattern = "currentPrice[\s\S]+?raw"":(.*?),"
If .Execute(sResp).Count > 0 Then
currentPrice = .Execute(sResp)(0).SubMatches(0)
End If
End With
ActiveCell.Value = "Test"
ActiveCell.Offset(0, 1).Select
ActiveCell.Value = analystNum
ActiveCell.Offset(0, 1).Select
ActiveCell.Value = tMeanprice
ActiveCell.Offset(0, 1).Select
ActiveCell.Value = sHigh
ActiveCell.Offset(0, 1).Select
ActiveCell.Value = sLow
ActiveCell.Offset(0, 1).Select
ActiveCell.Value = currentPrice
End Sub
响应包含转义的 JSON 文本,因此引号前面有反斜杠 (
\
),这是 JSON 中的转义字符。
您需要将其作为文字字符包含在正则表达式中,这意味着您必须在正则表达式中对其进行转义。正则表达式转义字符也是反斜杠,因此这意味着
\\
表示正则表达式中的文字反斜杠。
因此,您需要在每个正则表达式中的
\\
之前插入 ""
,如下所示:
.Pattern = "numberOfAnalystOpinions[\s\S]+?raw\\"":(.*?),"
.Pattern = "targetMeanPrice[\s\S]+?raw\\"":(.*?),"
等等。
插入这些文字反斜杠后,正则表达式全部匹配成功。 (当然,我不能保证结果符合您的预期,尤其是返回的 HTML 可能会有所不同。)
由于您要提取的数据作为脚本块内的 JSON 字符串存在于页面中,因此您可以将其解析为 JSON 对象并挑选出您想要的部分,而不是使用正则表达式字符串搜索。 例如,如果您需要进行任何循环,或者有多次出现的文本,这会更容易。
您需要将 JsonConverter.bas(来自 https://github.com/VBA-tools/VBA-JSON)导入到您的 VBA 项目中。
Sub SharePrices()
Dim json As Object, sResp As String, html As Object, scripts, script
Dim scriptSource, fd As Object
Const Url As String = "https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL"
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", Url, False
.send
sResp = .responseText
End With
'create an HTML document and load the response into it
Set html = CreateObject("htmlfile")
html.Open "text/html"
html.Write sResp
html.Close
'get all the <script> blocks
Set scripts = html.getElementsByTagName("script")
'find the script block which has the information we want...
For Each script In scripts 'find the script we want...
scriptSource = script.getAttribute("data-url") & "" 'append empty string to handle Null
If scriptSource Like "*upgradeDowngradeHistory*" Then
'parse the script content as json
Set json = JsonConverter.ParseJson(script.Text)
'actual data is embedded here - parse that
Set json = JsonConverter.ParseJson(json("body"))
End If
Next script
'didn't locate the required script ?
If json Is Nothing Then Exit Sub
'This part of the JSON object has the pieces of data we want
Set fd = json("quoteSummary")("result")(1)("financialData")
'verify the values. Here is where you'd write them to the sheet
Debug.Print fd("numberOfAnalystOpinions")("raw")
Debug.Print fd("targetMeanPrice")("raw")
Debug.Print fd("targetHighPrice")("raw")
Debug.Print fd("targetLowPrice")("raw")
Debug.Print fd("currentPrice")("raw")
End Sub
对正则表达式模式的评论
\\
。[\s\S]
匹配任何字符,但 \\"":{\\""
更精确地匹配响应文本。[\d.]+
来匹配数字比使用 .*?
更好。For
循环来简化代码Sub SharePrices1()
Const Url As String = "https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL"
Dim sResp$, sHigh$, currentPrice$
Dim analystNum$, sLow$, tMeanprice$
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", Url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
.send
sResp = .responseText
End With
Dim aKey, i As Long
Const SPAT = "\\"":{\\""raw\\"":([\d.]+),"
aKey = Split("numberOfAnalystOpinions targetMeanPrice targetHighPrice targetLowPrice currentPrice")
ActiveCell.Resize(1, UBound(aKey) + 1).Value = aKey
With CreateObject("VBScript.RegExp")
For i = 0 To UBound(aKey)
.Pattern = aKey(i) & SPAT
If .Execute(sResp).Count > 0 Then
ActiveCell.Offset(1, i).Value = .Execute(sResp)(0).SubMatches(0)
End If
Next
End With
End Sub