Web scraping in Python: submitting a search form on a website when the URL does not change

Question · Votes: 0 · Answers: 1

I want to look up theatre locations by searching for zip codes and then extract the results. Inspecting the site shows this form:

    <form id="set-location-form" class="ip-geoloc-address" action="/theatres" method="post" accept-charset="UTF-8"><div><button class="btn btn-default form-submit" id="edit-find" name="op" value=" " type="submit"> </button>
    <input type="hidden" name="form_build_id" value="form-C5B0Dm8QYZgOzeTv2uf9FlNjWVK-EbcLpDKjRz_HQt4" />
    <input type="hidden" name="form_id" value="ip_geoloc_set_location_form" />
    <div class="form-type-textfield form-item-street-address form-item form-group">
     <input placeholder="Enter your location" class="form-control form-text" type="text" id="edit-street-address" name="street_address" value="" size="60" maxlength="128" />
    </div>
    <button class="btn btn-default form-submit" id="edit-submit-address" name="op" value="Go" type="submit">Go</button>
    <button class="change-view btn-map-expand btn btn-default form-submit" id="edit-map-expand" name="op" value="Map" type="button">Map</button>
    <button class="change-view btn btn-default form-submit" id="edit-change-view" name="op" value="" type="button"></button>

Inspecting the results in the browser looks like this:

[screenshot of the inspected results]

But when I view the page source, the results are not there:

    <div class="region region-content">
    <section id="block-system-main" class="block block-system clearfix">
    <div class="view view-theatres view-id-theatres view-display-id-page view-dom-id-8a00da3218aaa60e6d4d49fd07033c0b wrapper-container-box">
    <div class="attachment attachment-before fix-wrapper">
    <div class="view view-theatres view-id-theatres view-display-id-attachment_1">
    <div class="view-content">
    <div class="ip-geoloc-map view-based-map">

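I tried these two snippets, but neither worked.

    import requests

    # First attempt: GET with the zip code as a query parameter
    url = 'https://www.imax.com/theatres/'
    data = {'street_address': '78759'}
    r = requests.get(url, params=data)
    with open("requests_results.html", "wb") as f:
        f.write(r.content)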


    import requests
    from bs4 import BeautifulSoup

    # Second attempt: POST the zip code as form data
    url = "https://www.imax.com/theatres/"
    data = {'street_address': '94704'}
    response = requests.post(url, data=data)
    doc = BeautifulSoup(response.text, 'html.parser')

Any help would be appreciated, thanks!

python web-scraping request
1 Answer

0 votes

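The page requests its data using lat and lon. You can mimic that XHR: first get the latitude and longitude of the location you pass in (I use a free geocoding API for this, but how you do it is up to you).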


You can see the request here:

[screenshot of the XHR request]
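As a minimal sketch of what that request URL looks like, the snippet below builds it with the standard library instead of string concatenation. The endpoint and the `date`, `lat` and `lon` parameter names are the ones used in the code further down; the helper name `build_theatres_url` and the coordinates are hypothetical values chosen only for illustration, and whether the endpoint still responds this way is not something this sketch checks.

```python
from urllib.parse import urlencode

# Endpoint taken from the answer's code; it may have changed since then.
BASE = "https://www.imax.com/showtimes/ajax/theatres"


def build_theatres_url(date, lat, lon):
    """Return the XHR URL for a given date and position (hypothetical helper)."""
    query = urlencode({"date": date, "lat": lat, "lon": lon})
    return f"{BASE}?{query}"


# Hypothetical coordinates, used only to show the resulting URL.
url = build_theatres_url("2019-03-09", 30.4, -97.75)
print(url)
```

Passing the same dictionary to `requests.get(BASE, params=...)` should produce an equivalent query string, so the helper is mainly useful for logging or debugging the URL.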


The `rows` in the response JSON each contain HTML under a key (`row`). Sample output here.


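Because the inner value associated with the `row` key in each row is HTML, I pass it back to BeautifulSoup for processing. Example of the HTML content:

[screenshot of the HTML content]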
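To make that shape concrete, here is a small offline sketch. The payload is hypothetical: it only imitates the structure described above (a `rows` list whose items carry an HTML fragment under `row`), and the markup, the link and the address in it are invented. The `.theatre-address` class is the one selected in the code below; `parse_rows` is an illustrative name.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical payload imitating the described structure; not real site data.
sample = json.dumps({
    "rows": [
        {
            "row": '<div><a href="/theatres/example-imax">Example IMAX</a>'
                   '<div class="theatre-address"> 123 Example St </div></div>'
        }
    ]
})


def parse_rows(payload):
    """Parse each row's HTML fragment and pull out the link and address."""
    data = json.loads(payload)
    parsed = []
    for item in data["rows"]:
        soup = BeautifulSoup(item["row"], "html.parser")
        link = soup.select_one("a")
        addr = soup.select_one(".theatre-address")
        parsed.append({
            # Fall back to None when an element is missing from a row.
            "href": link["href"] if link else None,
            "address": addr.get_text(strip=True) if addr else None,
        })
    return parsed


theatres = parse_rows(sample)
print(theatres)
```

Checking for `None` before reading `.text` or `['href']` avoids an exception on rows that lack one of the elements, which may or may not occur in the real response.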


    import json

    import pandas as pd
    import requests
    from bs4 import BeautifulSoup as bs

    apiKey = "yourFreeAPIkey"  # OpenCage geocoding API key
    address = "78759"

    # Step 1: geocode the zip code to latitude/longitude
    url = "https://api.opencagedata.com/geocode/v1/json?q=" + address + "&key=" + apiKey + "&pretty=1"
    res = requests.get(url).json()
    # I take the second match here; check which entry corresponds to your location
    data = res['results'][1]['geometry']
    lat = data['lat']
    lng = data['lng']

    # Step 2: mimic the XHR the page makes for that position and date
    date = '2019-03-09'
    res = requests.get('https://www.imax.com/showtimes/ajax/theatres?date=' + date + '&lat=' + str(lat) + '&lon=' + str(lng))
    soup = bs(res.content, 'lxml')
    newData = json.loads(soup.select_one('p').text)

    # Step 3: each row holds an HTML fragment, so parse it with BeautifulSoup
    columns = ['movieTitle', 'movieLink', 'theatreLink', 'address', 'movieFormat', 'times']
    baseURL = 'https://www.imax.com'
    results = []

    for row in newData['rows']:
        soup = bs(row['row'], 'lxml')
        theatreLink = baseURL + soup.select_one('a')['href']
        theatreAddress = soup.select_one('.theatre-address').text.strip()
        movieTitle = soup.select_one('.movie-title').text.strip()
        movieLink = baseURL + soup.select_one('.movie-title a')['href']
        movieFormat = soup.select_one('.movie-format').text.strip()
        times = [item.text.strip() for item in soup.select('.line-items a')]
        results.append([movieTitle, movieLink, theatreLink, theatreAddress, movieFormat, times])

    df = pd.DataFrame(results, columns=columns)
    print(df)

Sample results:
