首先,我是VBA的新手,想在Excel中建立一个天气统计数据。非常感谢帮助!
为此,我需要来自多个网站的数据,包括测量温度,气象站和时间的信息。
通过我在互联网上找到的一些VBA信息,到目前为止我已经编写了一个代码,它只在即时窗口中为我提供了一个网站所需的信息。
看起来像这样:
7.4°C | Wien-Mariabrunn (225m) | 14:00
7.6°C | Wien-Hohe Warte (198m) | 14:00
7.6°C | Wien-Unterlaa (200m) | 14:00
7.7°C | Wien-Schwechat (183m) | 14:00
7.8°C | Wien-Donaufeld (160m) | 14:00
8.1°C | Grossenzersdorf (154m) | 14:00
8.2°C | Wien-City (177m) | 14:00
Dim xmlReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim Temps1 As MSHTML.IHTMLElementCollection
Dim temps2 As MSHTML.IHTMLElementCollection
Dim Temp As MSHTML.IHTMLElement
xmlReq.Open "GET", "https://kachelmannwetter.com/at/messwerte/wien/temperatur/20190101-1300z.html", False
xmlReq.send
If xmlReq.Status <> 200 Then
MsgBox "Problem" & vbNewLine & xmlReq.Status & " - " & xmlReq.statusText
Exit Sub
End If
HTMLDoc.body.innerHTML = xmlReq.responseText
Set Temps1 = HTMLDoc.getElementsByClassName("ap o o-1 o-tmp-5")
Set temps2 = HTMLDoc.getElementsByClassName("ap o o-1 o-tmp-1")
For Each Temp In Temps1
Debug.Print Temp.Title
Next Temp
For Each Temp In temps2
Debug.Print Temp.Title
Next Temp
我遇到的第一个问题是,我不知道如何将这些信息放入一个或多个单元格中。
第二个问题是,我需要来自多个网站的相同数据,从开始日期到结束日期。我在这个示例代码中使用的网站是https://kachelmannwetter.com/at/messwerte/wien/temperatur/20190101-1300z.html
。最后,您可以找到日期,在此示例中为“20190101”,时间为“1300”。所以为此,我需要某种循环。
所以最后我需要一个工作表,其中A列带有日期,B列带有时间,C列带有数据(每个日期和时间)。
我希望这在某种程度上是可以理解的,我非常感谢任何帮助。
以下适用于短日期范围。较长的日期范围会导致响应变慢。很可能网站阻塞/限制。为此:
pauseIndex
,这意味着在添加下一个请求之前,每个x(= pauseIndex)URL的数量,延迟为y秒(由waitSeconds
指定)。你可以玩这个。MSXML2.ServerXMLHTTP
。修补上述内容以获得最佳设置和批量请求大小(如果进行批处理)。
指定开始日期时间和结束日期时间:
日期范围在名为Date ranges
的工作表中指定。它有以下设置:
构建数据集:
我建议您构建一个平面数据集,在其中指定要返回信息的确切站点。并非每个时间段都显示每个电台。
stations = Array("Wien-Schwechat", "Wien-Unterlaa", "Wien-Mariabrunn", "Wien-Hohe Warte", "Grossenzersdorf", _
"Wien-Donaufeld", "Wien-City")
你可以扩展它。我包含一个字典变量newStations
,它存储了所有未在列表中进行监视的站点。您可以轻松编写这些内容,以帮助确定要在数据集中监视/包含的其他工作站。
缺失站点读数的占位符值用于确保完整的数据集。
您可能希望将“异常值”标准化 - 例如,实际小时值可以在范围内而不是在小时内。在下面的演示中,为一个站准确检索16:20。您可以将其标准化为16:00。
助手功能/子:
代码中使用了许多辅助函数和1个子函数。
GetAllLinks
。生成开始日期时间和结束日期时间之间的所有请求URL。请参阅代码中的注释。这些可以循环以发出每个数据请求EmptyDict
- 确保在请求之间清除站数据UpdateDictForNoReading
。处理未在指定日期时间内报告受监控工作站的情况。它使用"No reading"
更新温度和长站描述WriteOutResults
。生成"flat"
,即不嵌套的2D数组结构,并将结果写入指定的输出表检索站点和站点数据:
我使用css attribute = value选择器和contains
运算符来定位站数据。
以示例站的HTML为例
<a class="ap o o-1 o-tmp--1" data-target="#obs-detail-3h" data-toggle="modal" data-left="635" data-top="545" onclick="obs_detail_3h('-1.0°C', 'Wien-Schwechat (183m)', '16:20','110360', '201901031500');" title="-1.0°C | Wien-Schwechat (183m) | 16:20" style="left: 408.533px; top: 337.757px;">-1</a>
如果我们查看class属性,我们会看到如下:
class="ap o o-1 o-tmp--1"
class属性的值是"ap o o-1 o-tmp--1"
,它实际上是一系列由空格分隔的类。每个站类值具有相同的子字符串,即o-tmp
。你可以稍微改变一下。我使用querySelectorAll返回在class属性值中具有此子字符串的所有元素的nodeList。
Set mapStations = html.querySelectorAll("[class*='o-tmp']")
这匹配页面上的所有站点(地图)。
nodeList(title
)中每个节点的mapStations
属性包含感兴趣的数据:
title="-1.0°C | Wien-Schwechat (183m) | 16:20"
该字符串包含管道(|
)分隔符。我可以使用split()
生成一个包含每一位信息的数组:
arr = Split(mapStations.item(i).Title, " | ")
这会生成一个数组,其中包含不同指数的-1.0°C
,Wien-Schwechat (183m)
和16:20
。使用这个例子,我将-1.0°C
存储在变量temp
中,Wien-Schwechat (183m)
存储在变量stationFull
中,只是将站点名称Wien-Schwechat
in station
,16:20
存储在time
中。
ALL:
要求:
Date ranges
Output
Date ranges
中的数据应如上图所示。
VBA:
Option Explicit
Public Sub GetInfo()
'VBE > Tools > References > Microsoft HTML Object Library
'Collect hourly temperature readings from list of stations in array stations.
'Missing readings are populated with "Missing reading". Times specified in request are not necessarily identical _
'to that correctly returned from page as reading reported within an hour interval may not be on the hour
Dim html As HTMLDocument, i As Long, arr() As String, mapStations As Object, dict As Object, newStations As Object
Dim time As String, station As String, temp As String, stations(), results(), j As Long
Dim urls As Object, url As Variant, startOfDateString As Long, currDate As String, stationFull As String
Dim outputSht As Worksheet, x As Long
Const pauseIndex As Long = 20
Const waitSeconds As Long = 1
Const PREFIX As String = "https://kachelmannwetter.com/at/messwerte/wien/temperatur/"
Const SUFFIX As String = "z.html"
startOfDateString = InStrRev(PREFIX, "/") + 1
Set outputSht = ThisWorkbook.Worksheets("Output")
Set urls = GetAllLinks(PREFIX, SUFFIX)
Set html = New HTMLDocument
Set dict = CreateObject("Scripting.Dictionary")
Set newStations = CreateObject("Scripting.Dictionary")
stations = Array("Wien-Schwechat", "Wien-Unterlaa", "Wien-Mariabrunn", "Wien-Hohe Warte", "Grossenzersdorf", _
"Wien-Donaufeld", "Wien-City") 'order of stations here should match that in sheet
j = 1
For i = LBound(stations) To UBound(stations)
dict(stations(i)) = vbNullString
Next
ReDim results(1 To 1 * urls.Count)
With CreateObject("MSXML2.XMLHTTP")
For Each url In urls
x = x + 1
If x Mod pauseIndex = 0 Then Application.Wait Now + TimeSerial(0, 0, waitSeconds)
DoEvents
.Open "GET", url, False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
html.body.innerHTML = .responseText
Set mapStations = html.querySelectorAll("[class*='o-tmp']")
For i = 0 To mapStations.Length - 1
arr = Split(mapStations.item(i).Title, " | ")
currDate = Join(Array(Mid$(url, startOfDateString + 4, 2), Mid$(url, startOfDateString + 6, 2), Mid$(url, startOfDateString, 4)), "-")
temp = arr(0)
station = Split(arr(1), " (")(0)
stationFull = arr(1)
time = arr(2)
If Not dict.Exists(station) Then
newStations(station) = vbNullString 'in case you are interested in which stations exist you are not monitoring
Else
dict(station) = Array(currDate, time, station, stationFull, temp)
End If
Next
Set dict = UpdateDictForNoReading(dict, currDate, time)
results(j) = dict.items
j = j + 1
Set dict = EmptyDict(dict)
Next
End With
WriteOutResults outputSht, results, UBound(stations) + 1
End Sub
Public Function UpdateDictForNoReading(ByVal dict As Object, ByVal currDate As String, ByVal time As String) As Object
'Loop dictionary containing station readings. If current value for key is not an array then no readings where found. _
'then dict is updated with "No reading" for station long text (which includes m e.g. Wien-Schwechat (183m)) and temperature
Dim key As Variant
For Each key In dict
If Not IsArray(dict(key)) Then dict(key) = Array(currDate, time, key, "No reading", "No reading")
Next
Set UpdateDictForNoReading = dict
End Function
Public Sub WriteOutResults(ByVal ws As Worksheet, ByRef results As Variant, ByVal stationCount As Long)
'Loop results array which at each index should have a child array which is comprised of all stations specified _
'The code unravels the nested structure into "flat" array for writing out to sheet. Aim is to be more efficient _
'with writing out to sheet
'The sheet to write results to is passed as argument ws. Headers are stated below.
Dim headers(), outputArr(), i As Long, arr(), j As Long, r As Long, c As Long
headers = Array("Date", "Time", "Station", "StationFull", "Temp")
ReDim outputArr(1 To UBound(results) * stationCount, 1 To UBound(headers) + 1)
For i = LBound(results) To UBound(results)
arr = results(i) '0-6
For j = LBound(arr) To UBound(arr)
r = r + 1
If IsArray(arr(j)) Then
For c = LBound(arr(j)) To UBound(arr(j))
outputArr(r, c + 1) = arr(j)(c)
Next
End If
Next
Next
With ws
.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
.Cells(2, 1).Resize(UBound(outputArr, 1), UBound(outputArr, 2)) = outputArr
End With
End Sub
Public Function EmptyDict(ByVal dict As Object) As Object
'Ensures no data remains from prior request
Dim key As Variant
For Each key In dict
dict(key) = vbNullString
Next
Set EmptyDict = dict
End Function
Public Function GetAllLinks(ByVal PREFIX As String, ByVal SUFFIX As String) As Object
'Generate all urls between start date time and end date time. Accounts for fact that there is no real midnight. 00:00 uhr retrieves the 23:00 of prior day.
'Times selected on page e.g. 1:00 uhr are one hour ahead of what is used in url string e.g. 1 > 0.
Dim ws As Worksheet, hours(), urls As Collection
Set urls = New Collection
Set ws = ThisWorkbook.Worksheets("Date ranges")
'url "0000" = 1am. The selection of 00:00 in the sheet gives 23:00 of prior date
hours = Array("0000", "0100", "0200", "0300", "0400", "0500", "0600", "0700", "0800", "0900", "1000", "1100", "1200", _
"1300", "1400", "1500", "1600", "1700", "1800", "1900", "2000", "2100", "2200", "2300")
Dim startDate As Date, endDate As Date, startTime As String, endTime As String, currentDate As Date
Dim endIndex As Long, startIndex As Long
With ws
startDate = .Cells(1, 2).Value2 'Required for input yyyy-mm-dd; Required for output yyyymmdd
endDate = .Cells(1, 5).Value2
startTime = .Cells(2, 2)
endTime = .Cells(2, 5)
End With
startIndex = Application.Match(startTime, hours) - 2
endIndex = Application.Match(endTime, hours) - 2
currentDate = startDate
Dim i As Long, s As Long, e As Long
Do While currentDate <= endDate
If startDate = endDate Then
s = startIndex
e = endIndex
Else
Select Case currentDate
Case startDate
s = startIndex
e = UBound(hours)
Case endDate
s = LBound(hours)
e = endIndex
Case Else
s = LBound(hours)
e = UBound(hours)
End Select
End If
For i = s To e
urls.Add PREFIX & Format$(currentDate, "yyyymmdd") & "-" & hours(i) & SUFFIX
Next
currentDate = DateAdd("d", 1, currentDate)
Loop
Set GetAllLinks = urls
End Function
示例输出: