如何使用变量来表示链接?

问题描述 投票:0回答:2

我录制了一个宏,并试图使用for循环来调整它,我想要从中抓取不同的链接。

问题是,VBA无法将我的变量识别为链接。当我直接在代码中输入链接时,它可以工作。我不仅需要来自一个链接的数据,还需要来自500的数据。

这是我的代码片段:

Dim Link As String
Link = "https://coinmarketcap.com/currencies/bitcoin/historical-data/"
For i = 1 To 5
Link = Cells(i, 1)

     ActiveWorkbook.Queries.Add Name:="Table 0 (3)", Formula:= _
        "let" & Chr(13) & "" & Chr(10) & "    Quelle = Web.Page(Web.Contents(""https://coinmarketcap.com/currencies/ontology/historical-data/""))," & Chr(13) & "" & Chr(10) & "    Data0 = Quelle{0}[Data]," & Chr(13) & "" & Chr(10) & "    #""Geänderter Typ"" = Table.TransformColumnTypes(Data0,{{""Date"", type date}, {""Open*"", type number}, {""High"", type number}, {""Low"", type number}, {""Close**"", type number}, {""Volume"", type number}, {""Market Cap" & _
        """, type number}})" & Chr(13) & "" & Chr(10) & "in" & Chr(13) & "" & Chr(10) & "    #""Geänderter Typ"""
    With ActiveSheet.ListObjects.Add(SourceType:=0, Source:= _
        "OLEDB;Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location=""Table 0 (3)"";Extended Properties=""""" _
        , Destination:=Range("$D$1")).QueryTable
        .CommandType = xlCmdSql
        .CommandText = Array("SELECT * FROM [Table 0 (3)]")
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = True
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .PreserveColumnInfo = True
        .ListObject.DisplayName = "Table_0__3"
        .Refresh BackgroundQuery:=False
    End With
Next

一旦我更改了变量“link”的链接(“”https://coinmarketcap.comblabla“”),我就会得到一个应用程序或对象定义的错误。当我深入挖掘并点击数组时,Excel告诉我导入“链接”未连接到导出。

excel vba web-scraping
2个回答
1
投票

您可以使用下面的代码获取主要历史数据表和上面的信息。它有点棘手,有点脆弱,因为很多都依赖于当前的页面样式,这可能会改变。历史数据位是一个实际的表,它更加健壮。

例如,您可以使用从单元格中拾取的新URL进行循环,并在每个循环的开头简单地使用Sheets.Add行,这样您就可以使用新的Activesheet来写入数据。

下面,应该足以让您开始,具体取决于您的要求。


我得到了最高点:

top bit

使用.Cells(1, 1) = IE.document.querySelector(".col-xs-6.col-sm-8.col-md-4.text-left").innerText。这不是很强大。文档的样式可以更改。但是,访问和获取页面并不是一个容易的部分,可能是您当前选择的任何方法都很容易受到攻击。我使用元素的类名(".")来检索使用.querySelector文档方法的信息来应用CSS selector .col-xs-6.col-sm-8.col-md-4.text-left。这与.getElementsByClassName(0)相同。


我得到中间位:

middle

Set aNodeList = IE.document.querySelectorAll("[class*='coin-summary'] div")

这使用CSS选择器[class*='coin-summary'] div,它是元素'中的div标签,其中className包含字符串'coin-summary'

CSS选择器返回一个列表,因此.querySelectorAll方法用于返回一个随后遍历的nodeLIst。

List returned by CSS selector


我使用表标记获取结束历史数据(这是一个实际的表):

Set hTable = .document.getElementsByTagName("table")(0)

然后,我遍历表格中的行和行内的单元格。


VBA:

Option Explicit
Public Sub GetInfo()
    Dim IE As Object
    Set IE = CreateObject("InternetExplorer.Application")
    Application.ScreenUpdating = False
    With IE
        .Visible = True
        .navigate "https://coinmarketcap.com/currencies/bitcoin/historical-data/"

        While .Busy Or .readyState < 4: DoEvents: Wend '<== Loop until loaded

        Dim hTable As HTMLTable
        Set hTable = .document.getElementsByTagName("table")(0)

        Dim tSection As Object, tRow As Object, tCell As Object, tr As Object, td As Object, r As Long, c As Long, hBody As Object
        Dim headers(), headers2()
        headers = Array("Date", "Open*", "High", "Low", "Close**", "volume", "Market Cap")
        headers2 = Array("Market Cap", "Volume (24h)", "Circulating Supply", "Max Supply")

        With ActiveSheet
            .Cells.ClearContents
            .Cells(1, 1) = IE.document.querySelector(".col-xs-6.col-sm-8.col-md-4.text-left").innerText
            Dim aNodeList As Object, i As Long, resumeRow As Long
            Set aNodeList = IE.document.querySelectorAll("[class*='coin-summary'] div")
            resumeRow = .Cells(.Rows.Count, "A").End(xlUp).Row + 2
            .Range("A" & resumeRow).Resize(1, UBound(headers2) + 1) = headers2

            For i = 0 To aNodeList.Length - 1
                .Cells(resumeRow + 1, i + 1) = aNodeList.item(i).innerText
            Next i

            r = .Cells(.Rows.Count, "A").End(xlUp).Row + 2

            .Cells(r, 1).Resize(1, UBound(headers) + 1) = headers
            Set hBody = hTable.getElementsByTagName("tbody")
            For Each tSection In hBody           'HTMLTableSection
                Set tRow = tSection.getElementsByTagName("tr") 'HTMLTableRow
                For Each tr In tRow
                    r = r + 1
                    Set tCell = tr.getElementsByTagName("td")
                    c = 1
                    For Each td In tCell         'DispHTMLElementCollection
                        .Cells(r, c).Value = td.innerText 'HTMLTableCell
                        c = c + 1
                    Next td

                Next tr
            Next tSection


        End With

        'Quit '<== Remember to quit application
        Application.ScreenUpdating = True
    End With
End Sub

表格中的输出(样本):

Example output


来自页面的一些示例数据:

Example data


1
投票

这将从该表中获取数据。

Option Explicit
Sub Web_Table_Option_One()
    Dim xml    As Object
    Dim html   As Object
    Dim objTable As Object
    Dim result As String
    Dim lRow As Long
    Dim lngTable As Long
    Dim lngRow As Long
    Dim lngCol As Long
    Dim ActRw As Long
    Set xml = CreateObject("MSXML2.XMLHTTP.6.0")
    With xml
        .Open "GET", "https://coinmarketcap.com/currencies/bitcoin/historical-data/", False
        .send
    End With
    result = xml.responseText
    Set html = CreateObject("htmlfile")
    html.body.innerHTML = result
    Set objTable = html.getElementsByTagName("Table")
    For lngTable = 0 To objTable.Length - 1
        For lngRow = 0 To objTable(lngTable).Rows.Length - 1
            For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length - 1
                ThisWorkbook.Sheets("Sheet1").Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
            Next lngCol
        Next lngRow
        ActRw = ActRw + objTable(lngTable).Rows.Length + 1
    Next lngTable
End Sub

您当然可以遍历一系列URL,并遍历每个URL。这500个网址在哪里?如果它们与您提供的不同,您可能会为您裁掉工作。通常,所有网站都是非常不同的,屏幕抓取是一个高度定制的过程。

© www.soinside.com 2019 - 2024. All rights reserved.