我录制了一个宏,并试图使用for循环来调整它,我想要从中抓取不同的链接。
问题是,VBA无法将我的变量识别为链接。当我直接在代码中输入链接时,它可以工作。我不仅需要来自一个链接的数据,还需要来自500的数据。
这是我的代码片段:
Dim Link As String
Link = "https://coinmarketcap.com/currencies/bitcoin/historical-data/"
For i = 1 To 5
Link = Cells(i, 1)
ActiveWorkbook.Queries.Add Name:="Table 0 (3)", Formula:= _
"let" & Chr(13) & "" & Chr(10) & " Quelle = Web.Page(Web.Contents(""https://coinmarketcap.com/currencies/ontology/historical-data/""))," & Chr(13) & "" & Chr(10) & " Data0 = Quelle{0}[Data]," & Chr(13) & "" & Chr(10) & " #""Geänderter Typ"" = Table.TransformColumnTypes(Data0,{{""Date"", type date}, {""Open*"", type number}, {""High"", type number}, {""Low"", type number}, {""Close**"", type number}, {""Volume"", type number}, {""Market Cap" & _
""", type number}})" & Chr(13) & "" & Chr(10) & "in" & Chr(13) & "" & Chr(10) & " #""Geänderter Typ"""
With ActiveSheet.ListObjects.Add(SourceType:=0, Source:= _
"OLEDB;Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location=""Table 0 (3)"";Extended Properties=""""" _
, Destination:=Range("$D$1")).QueryTable
.CommandType = xlCmdSql
.CommandText = Array("SELECT * FROM [Table 0 (3)]")
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.PreserveColumnInfo = True
.ListObject.DisplayName = "Table_0__3"
.Refresh BackgroundQuery:=False
End With
Next
一旦我更改了变量“link”的链接(“”https://coinmarketcap.comblabla“”),我就会得到一个应用程序或对象定义的错误。当我深入挖掘并点击数组时,Excel告诉我导入“链接”未连接到导出。
您可以使用下面的代码获取主要历史数据表和上面的信息。它有点棘手,有点脆弱,因为很多都依赖于当前的页面样式,这可能会改变。历史数据位是一个实际的表,它更加健壮。
例如,您可以使用从单元格中拾取的新URL进行循环,并在每个循环的开头简单地使用Sheets.Add
行,这样您就可以使用新的Activesheet来写入数据。
下面,应该足以让您开始,具体取决于您的要求。
我得到了最高点:
使用.Cells(1, 1) = IE.document.querySelector(".col-xs-6.col-sm-8.col-md-4.text-left").innerText
。这不是很强大。文档的样式可以更改。但是,访问和获取页面并不是一个容易的部分,可能是您当前选择的任何方法都很容易受到攻击。我使用元素的类名("."
)来检索使用.querySelector
文档方法的信息来应用CSS selector .col-xs-6.col-sm-8.col-md-4.text-left
。这与.getElementsByClassName(0)
相同。
我得到中间位:
同
Set aNodeList = IE.document.querySelectorAll("[class*='coin-summary'] div")
这使用CSS选择器[class*='coin-summary'] div
,它是元素'中的div
标签,其中className包含字符串'coin-summary'
。
CSS选择器返回一个列表,因此.querySelectorAll
方法用于返回一个随后遍历的nodeLIst。
我使用表标记获取结束历史数据(这是一个实际的表):
Set hTable = .document.getElementsByTagName("table")(0)
然后,我遍历表格中的行和行内的单元格。
VBA:
Option Explicit
Public Sub GetInfo()
Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")
Application.ScreenUpdating = False
With IE
.Visible = True
.navigate "https://coinmarketcap.com/currencies/bitcoin/historical-data/"
While .Busy Or .readyState < 4: DoEvents: Wend '<== Loop until loaded
Dim hTable As HTMLTable
Set hTable = .document.getElementsByTagName("table")(0)
Dim tSection As Object, tRow As Object, tCell As Object, tr As Object, td As Object, r As Long, c As Long, hBody As Object
Dim headers(), headers2()
headers = Array("Date", "Open*", "High", "Low", "Close**", "volume", "Market Cap")
headers2 = Array("Market Cap", "Volume (24h)", "Circulating Supply", "Max Supply")
With ActiveSheet
.Cells.ClearContents
.Cells(1, 1) = IE.document.querySelector(".col-xs-6.col-sm-8.col-md-4.text-left").innerText
Dim aNodeList As Object, i As Long, resumeRow As Long
Set aNodeList = IE.document.querySelectorAll("[class*='coin-summary'] div")
resumeRow = .Cells(.Rows.Count, "A").End(xlUp).Row + 2
.Range("A" & resumeRow).Resize(1, UBound(headers2) + 1) = headers2
For i = 0 To aNodeList.Length - 1
.Cells(resumeRow + 1, i + 1) = aNodeList.item(i).innerText
Next i
r = .Cells(.Rows.Count, "A").End(xlUp).Row + 2
.Cells(r, 1).Resize(1, UBound(headers) + 1) = headers
Set hBody = hTable.getElementsByTagName("tbody")
For Each tSection In hBody 'HTMLTableSection
Set tRow = tSection.getElementsByTagName("tr") 'HTMLTableRow
For Each tr In tRow
r = r + 1
Set tCell = tr.getElementsByTagName("td")
c = 1
For Each td In tCell 'DispHTMLElementCollection
.Cells(r, c).Value = td.innerText 'HTMLTableCell
c = c + 1
Next td
Next tr
Next tSection
End With
'Quit '<== Remember to quit application
Application.ScreenUpdating = True
End With
End Sub
表格中的输出(样本):
来自页面的一些示例数据:
这将从该表中获取数据。
Option Explicit
Sub Web_Table_Option_One()
Dim xml As Object
Dim html As Object
Dim objTable As Object
Dim result As String
Dim lRow As Long
Dim lngTable As Long
Dim lngRow As Long
Dim lngCol As Long
Dim ActRw As Long
Set xml = CreateObject("MSXML2.XMLHTTP.6.0")
With xml
.Open "GET", "https://coinmarketcap.com/currencies/bitcoin/historical-data/", False
.send
End With
result = xml.responseText
Set html = CreateObject("htmlfile")
html.body.innerHTML = result
Set objTable = html.getElementsByTagName("Table")
For lngTable = 0 To objTable.Length - 1
For lngRow = 0 To objTable(lngTable).Rows.Length - 1
For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length - 1
ThisWorkbook.Sheets("Sheet1").Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
Next lngCol
Next lngRow
ActRw = ActRw + objTable(lngTable).Rows.Length + 1
Next lngTable
End Sub
您当然可以遍历一系列URL,并遍历每个URL。这500个网址在哪里?如果它们与您提供的不同,您可能会为您裁掉工作。通常,所有网站都是非常不同的,屏幕抓取是一个高度定制的过程。