Unable to scrape a specific field from the results populated when a search is started with an account number on a website


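When I start a search on this website using the account number 0523620090003, I can see the relevant details for that account in the results. I wrote a script with the requests module to scrape two parts of the results: account details and fiduciary. I can already scrape the account details shown in the top-left corner. However, I cannot parse the information related to Fiduciary, which sits toward the middle of the upper-right area.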

import requests
from pprint import pprint

link = 'https://arcweb.hcad.org/server/rest/services/public/public_query/MapServer/0/query'

params = {
    'f': 'json',
    'distance': 2,
    'outFields': '*',
    'outSR': '102100',
    'spatialRel': 'esriSpatialRelIntersects',
    'units': 'esriSRUnit_StatuteMile',
    'where': "HCAD_NUM = '0523620090003'",
}

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
}
with requests.Session() as s:
    s.headers.update(headers)
    res = s.get(link, params=params)
    pprint(res.json()['features'][0]['attributes'])
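
One caveat about the last line of the script: indexing res.json()['features'][0]['attributes'] raises an exception when the query matches no account, or when the body does not contain a features list at all. Below is a minimal sketch of a more defensive lookup. It assumes the response has the features / attributes shape used above, and that a failed query carries an "error" key instead; first_attributes is a hypothetical helper name, not part of requests or of the service.

```python
def first_attributes(payload):
    """Return the attributes dict of the first feature, or None if absent.

    `payload` is the decoded JSON body, e.g. res.json() from the script above.
    """
    # Assumption: a failed query carries an "error" key instead of "features".
    if "error" in payload:
        raise ValueError(f"query failed: {payload['error']}")
    features = payload.get("features") or []
    if not features:
        return None
    return features[0].get("attributes")


# Illustrative payloads (made-up values, not real service output).
sample = {"features": [{"attributes": {"HCAD_NUM": "0523620090003"}}]}
print(first_attributes(sample))
print(first_attributes({"features": []}))
```

In the script, pprint(first_attributes(res.json())) would then print None for an unknown account rather than failing with an IndexError.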

How can I scrape the fiduciary-related information from the website using the requests module?

python python-3.x web-scraping python-requests
1 Answer

As suggested in the comments, you may find it useful to automate the interaction with the website using Selenium.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

import time

ACCOUNT_NUMBER = "0523620090003"

URL = "https://hcad.org/property-search/property-search"

options = Options()
options.add_argument("--disable-gpu")
options.add_argument("--no-sandbox")

driver = webdriver.Remote("http://127.0.0.1:4444/wd/hub", options=options)

driver.get(URL)

# Change focus to <iframe>.
iframe = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "iframe")))
driver.switch_to.frame(iframe)

# Locate the search <input> field (named search_input to avoid shadowing the built-in input()).
search_input = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'input[type="search"]')))

# Insert search term.
search_input.send_keys(ACCOUNT_NUMBER)
time.sleep(2)

# Trigger search.
button = driver.find_element(By.CSS_SELECTOR, ".input-group-append button")
button.click()

time.sleep(5)

# Find first search result and click.
row = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "tr.resulttr.odd")))

row.click()

time.sleep(5)

# Get fiduciary details.
fiduciary = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "i.fa-person-walking-luggage")))
details = fiduciary.find_element(By.XPATH, './../following-sibling::*[1]')
print(details.text)

# End the WebDriver session (close() would only close the current window).
driver.quit()
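
The script mixes fixed time.sleep() pauses with WebDriverWait(...).until(...). The latter returns as soon as its condition holds, so it usually wastes less time than a fixed pause and copes better with a slow page. Below is a pure-Python sketch of that polling idea, meant only to illustrate the pattern; it is not Selenium's implementation, and wait_until and ready are hypothetical names.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Stand-in for "the element is present": becomes truthy on the third check.
attempts = []


def ready():
    attempts.append(1)
    return "row" if len(attempts) >= 3 else None


print(wait_until(ready, timeout=2.0, poll=0.01))
```

In the Selenium script, the same effect comes from passing a condition such as EC.element_to_be_clickable(...) to WebDriverWait instead of sleeping for a fixed number of seconds.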

I am using a remote instance of Selenium. To run against a local browser instead, you can replace the call to webdriver.Remote() with:

driver = webdriver.Chrome(options=options)

Output from the fiduciary section of the page:

BETTENCOURT TAX ADVISORS LLC - 05082
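
If the name and the trailing number are needed as separate values, the printed text can be split on its last " - ". This is a minimal sketch that assumes the text always has the form "<name> - <code>", which is inferred from the single sample above; split_fiduciary is a hypothetical helper name.

```python
def split_fiduciary(text):
    """Split '<name> - <code>' on the last ' - ' into (name, code)."""
    name, sep, code = text.strip().rpartition(" - ")
    if not sep:
        # No separator found: treat the whole text as the name.
        return text.strip(), None
    return name.strip(), code.strip()


name, code = split_fiduciary("BETTENCOURT TAX ADVISORS LLC - 05082")
print(name)  # BETTENCOURT TAX ADVISORS LLC
print(code)  # 05082
```

Splitting on the last separator rather than the first keeps names that themselves contain " - " intact.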