XMLHTTPRequest always returns "page not found"

Premise

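I am trying to scrape this site:

https://pb.nalog.ru/

If you enter an organization ID in the search bar and press "Искать", it redirects you to a separate page whose base URL is https://pb.nalog.ru/search.html, followed by the hash "#t=*&mode=search-all&queryAll=ID", where t is the current time in milliseconds (as returned by JavaScript's Date.getTime()).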

Problem

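If I take the URL generated by the macro and paste it into the browser manually, it returns the correct page. Whenever I request it programmatically, though, it returns the site's 404 "page not found" placeholder, and the URL the site reports is not the same as mine:

  • The URL I generate

https://pb.nalog.ru/search.html#t=1730356470622&mode=search-all&queryAll=9714055795

  • The URL the site receives

/search.html%23t=1730356470622&mode=search-all&queryAll=9714055795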

I assume %23 is the encoded form of #, but I am new to this and cannot say for sure. I will do my best to answer any follow-up questions.
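The %23 assumption can be checked directly: percent-encoding writes a byte as % followed by its two-digit hexadecimal code, and hex 23 is the character code of #. The part of a URL after # (the fragment) is processed by the browser and is not sent to the server as part of the request, which fits with the pasted URL working in a browser while the encoded one does not. Below is a minimal sketch of both points. The helper name UrlWithoutFragment is hypothetical, and whether the site returns the search results for the bare URL is not something this sketch can confirm.

```vba
Option Explicit

' Hypothetical helper: returns the part of a URL before the first "#".
Public Function UrlWithoutFragment(ByVal url As String) As String
    Dim pos As Long
    pos = InStr(1, url, "#")
    If pos > 0 Then
        UrlWithoutFragment = Left$(url, pos - 1)
    Else
        UrlWithoutFragment = url
    End If
End Function

Public Sub CheckPercent23()
    Debug.Print Chr$(&H23)        ' prints #
    Debug.Print Hex$(Asc("#"))    ' prints 23
    ' prints https://pb.nalog.ru/search.html
    Debug.Print UrlWithoutFragment("https://pb.nalog.ru/search.html#t=1730356470622&mode=search-all&queryAll=9714055795")
End Sub
```

Run CheckPercent23 from the Immediate window to see the three lines of output.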

Here is the code in question:

Option Explicit

Private Type SYSTEMTIME
    wYear As Integer
    wMonth As Integer
    wDayOfWeek As Integer
    wDay As Integer
    wHour As Integer
    wMinute As Integer
    wSecond As Integer
    wMilliseconds As Integer
End Type

Private Declare PtrSafe Sub GetSystemTime Lib "kernel32" (lpSystemTime As SYSTEMTIME)

Function CurrentTimeMillis() As Double
    ' Returns the milliseconds from 1970/01/01 00:00:00.0 to system UTC
    Dim st As SYSTEMTIME
    GetSystemTime st
    Dim t_Start, t_Now
    t_Start = DateSerial(1970, 1, 1) ' Starting time for Linux
    t_Now = DateSerial(st.wYear, st.wMonth, st.wDay) + _
        TimeSerial(st.wHour, st.wMinute, st.wSecond)
    CurrentTimeMillis = DateDiff("s", t_Start, t_Now) * 1000 + st.wMilliseconds
End Function

Public Sub oopsie_doopsie()

    Dim http As New XMLHTTP60
    Dim html As New HTMLDocument
    Dim curr As Double
    
    curr = CurrentTimeMillis(): Debug.Print curr
    With http
        .Open "GET", "https://pb.nalog.ru/search.html#t=" & curr & "&mode=search-all&queryAll=" & "9714055795" & "", False
        Debug.Print "https://pb.nalog.ru/search.html#t=" & curr & "&mode=search-all&queryAll=" & "9714055795" & ""
        DoEvents
        .send
        DoEvents
        html.body.innerHTML = .responseText
    End With
    
    html.getElementsByClassName ("pb-subject-status pb-subject-status--active")
    
End Sub

Tags: html excel vba web-scraping xmlhttprequest

1 answer

I think you need to use XMLHTTP60.setRequestHeader to send headers that the server accepts:

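    With http
        .Open "GET", "https://pb.nalog.ru/search.html#t=" & curr & "&mode=search-all&queryAll=" & "9714055795" & "", False
        ' setRequestHeader is a method taking a name and a value; call it after .Open and before .send.
        ' "Header-Name" and "header zdes" are placeholders for the real header name and value.
        .setRequestHeader "Header-Name", "header zdes"
    ... 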

Use a tool such as Postman to inspect the headers that a browser sends.
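The call order matters here, so a fuller sketch of the suggestion may help: Open first, then one setRequestHeader call per header, then send. The header values below are placeholders chosen for illustration, not values the site is known to require. The URL is requested without the # part on the assumption that the fragment is meant for the browser rather than the server. Late binding through CreateObject is used so that no reference to Microsoft XML, v6.0 is needed.

```vba
Option Explicit

Public Sub RequestWithHeaders()
    ' Late-bound equivalent of "Dim http As New XMLHTTP60".
    Dim http As Object
    Set http = CreateObject("MSXML2.XMLHTTP.6.0")

    With http
        ' Synchronous GET (third argument False means not asynchronous).
        .Open "GET", "https://pb.nalog.ru/search.html", False
        ' Assumption: illustrative headers only; copy the real ones
        ' from a request captured in a browser or in Postman.
        .setRequestHeader "Accept", "text/html"
        .setRequestHeader "Accept-Language", "ru"
        .send
        ' Show the HTTP status code and text returned by the server.
        Debug.Print .Status, .statusText
    End With
End Sub
```

If the status printed is still 404 with the captured headers in place, the headers are probably not the cause and the URL itself deserves a closer look.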
