BeautifulSoup web scraping: how do I get information that is loaded by JavaScript?

Question (votes: 0, answers: 2)

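I'm trying to scrape a specific page on the Choice Hotels website (https://www.choicehotels.com/tennessee/nashville/hotels) to build a list of all Choice hotels in Nashville, Tennessee. When I open the page in a browser and inspect it with the developer tools, I can see the information I want under <div class="list">, but when I scrape the site I can't find that tag. I can't seem to get anything deeper than <div class="animate-fade z-index-90">; any tag nested deeper just returns None. I do, however, see a lot of JavaScript at the bottom of the source. I believe this is why my program doesn't see what I see when I open the page in a browser. How can I make my program see the tags I see?

This is how I'm trying to scrape it: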

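import csv

import requests
from bs4 import BeautifulSoup

# Fetch the HTML exactly as the server sends it; no JavaScript is executed.
source = requests.get("https://www.choicehotels.com/tennessee/nashville/hotels").text
soup = BeautifulSoup(source, 'lxml')
# Named hotel_list so the built-in list is not shadowed.
hotel_list = soup.find('div', class_='list')
print(hotel_list)  # prints None, as described above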

Is there something I'm missing or doing wrong?

html python-3.x beautifulsoup
2 Answers
2
votes

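You can send the same POST request that the page's JavaScript sends and skip the HTML entirely. It returns a JSON object, which you can parse the way you would parse any JSON.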

import requests

# Form fields for the search request that the page's JavaScript sends.
data = {
    'adults': '1',
    'checkInDate': '2018-09-08',
    'checkOutDate': '2018-09-09',
    'hotelSortOrder': 'RELEVANCE',
    'include': 'amenity_groups, amenity_totals, rating, relative_media',
    'lat': '36.167839',
    'lon': '-86.77816',
    'minors': '0',
    'optimizeResponse': 'image_url',
    'placeId': '414666',
    'placeName': 'Nashville, TN, US',
    'placeType': 'City',
    'platformType': 'DESKTOP',
    'preferredLocaleCode': 'en-us',
    'ratePlanCode': 'RACK',
    'ratePlans': 'RACK,PREPD,PROMO,FENCD',
    'rateType': 'LOW_ALL',
    'searchRadius': '25',
    'siteOpRelevanceSortMethod': 'ALGORITHM_B',
}

# POST to the endpoint the page calls; the response body is JSON.
r = requests.post('https://www.choicehotels.com/webapi/location/hotels', data=data)

for h in r.json()['hotels']:
    print(h['name'])
    print(h['description'])

Output:

Comfort Inn Downtown Nashville-Vanderbilt
Get rested and ready for anything when you stay at the Comfort Inn Downtown Nashville-Vanderbilt hotel in Nashville, TN. We are merely minutes from the Nashville International Airport and conveniently located near Vanderbilt University and the Nashville Convention Center. Each comfortable room is furnished with a flat-screen TV, hair dryer, coffee maker, microwave and more. We also offer free WiFi, a fitness center and outdoor pool. Get going with a free hot breakfast including eggs, waffles and meat plus healthy options like yogurt and fresh fruit. Also, earn rewards including free nights and gift cards with our Choice Privileges Rewards program. 
Comfort Suites Airport
Get more of the space you need to spread out, relax or work at the smoke-free Comfort Suites Airport hotel in Nashville, TN, located near the Grand Ole Opry. Nearby attractions include Opry Mills, Ryman Auditorium, Music City Bowl and Music City Center. Nashville Convention Center, Sommet Center, BridgestoneFirestone and Antique Archaeology are also close. Enjoy free hot breakfast, free WiFi, free airport transportation, fitness center and a seasonal outdoor pool. Your spacious room includes a flat-screen TV, hair dryer, sofa sleeper, microwave and refrigerator. Also, earn rewards including free nights and gift cards with our Choice Privileges Rewards program. 
Clarion Hotel Nashville Downtown - Stadium
Get more value at the 100 percent smoke-free Clarion Hotel Nashville Downtown-Stadium in Nashville, TN. We are near Nissan Stadium, Country Music Hall of Fame, Ryman Auditorium, Vanderbilt University and Bridgestone Arena. Life is better when you get together--enjoy such amenities as free WiFi, ample free parking, free breakfast, free downtown shuttle, business and fitness centers and restaurant. Your guest room features a refrigerator, microwave, coffee maker, hair dryer, iron and ironing board. Also, earn rewards including free nights and gift cards with our Choice Privileges Rewards program.  CC required at check-in. Shuttle runs from 8 am-9 pm on the hour. 
The Capitol Hotel Downtown, an Ascend Hotel Collection Member
Let the destination reach you at The Capitol Hotel Downtown, an Ascend Hotel Collection Member in Nashville, TN. Our smoke-free, upscale property is conveniently located near many key performing arts and sports facilities for which this iconic city is known. All guestrooms include coffee makers, hair dryers, irons and ironing boards, desks, safes, refrigerators and more. Enjoy free breakfast, free WiFi, a fitness center and business center. Then, relax in our bar and bistro at the end of your day. Also, earn rewards including free nights and gift cards with our Choice Privileges Rewards program. 
Sleep Inn
The Sleep Inn hotel in Nashville, TN will give you a simply stylish experience. Were close to attractions like the the Grand Ole Opry, Nashville Convention Center, Opry Mills and the Sommet Center. Enjoy free breakfast, free WiFi, free weekday newspaper, a seasonal outdoor pool and guest laundry facilities. Your guest room offers warm, modern designs, and includes a flat-screen TV in addition to standard room amenities. Some rooms have microwaves, refrigerators, coffee makers, irons and ironing boards. Also, earn rewards including free nights and gift cards with our Choice Privileges Rewards program. 
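
Since the goal in the question is a list of hotels, the parsed JSON could be flattened into rows and written out as CSV. The following is a minimal sketch, not part of the original answer. It assumes only the keys the answer itself uses (`hotels`, `name`, `description`). The helper names `hotels_to_rows` and `rows_to_csv` and the `sample` dictionary are hypothetical; `sample` stands in for `r.json()`, so the sketch runs without a network call.

```python
import csv
import io


def hotels_to_rows(payload):
    """Return (name, description) pairs from a parsed search response.

    Assumes a top-level 'hotels' list whose items may carry 'name' and
    'description'. Missing keys become empty strings instead of raising
    KeyError.
    """
    rows = []
    for hotel in payload.get("hotels", []):
        name = (hotel.get("name") or "").strip()
        description = (hotel.get("description") or "").strip()
        rows.append((name, description))
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "description"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample standing in for r.json(); real responses are larger.
sample = {
    "hotels": [
        {"name": "Sleep Inn", "description": "A simply stylish experience. "},
        {"name": "Comfort Suites Airport"},
    ]
}

rows = hotels_to_rows(sample)
print(rows_to_csv(rows))
```

With a live response, `hotels_to_rows(r.json())` would take the place of the sample, and the CSV text could be written to a file opened with `newline=""`.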

1
vote

You have to deal with the JavaScript, and Selenium can do that for you. See the code below.

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
try:
    driver.get("https://www.choicehotels.com/tennessee/nashville/hotels")
    # Wait up to 10 seconds for the JavaScript-rendered addresses to appear.
    WebDriverWait(driver, 10).until(EC.visibility_of_element_located(
        (By.XPATH, '//*[@class="address"]')))
    # page_source now holds the DOM after the scripts have run.
    source = driver.page_source
finally:
    # quit() ends the browser session even if the wait times out.
    driver.quit()

soup = BeautifulSoup(source, 'lxml')
hotel_list = soup.find('div', class_='list')
print(hotel_list)
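
To decide whether a page needs a browser at all, it can help to check first whether the target element is present in the raw HTML that `requests` receives. The sketch below is an illustration, not part of the original answer. It uses the standard library's `html.parser` only to stay dependency-free; the same check with BeautifulSoup is `soup.find('div', class_='list') is not None`. The two HTML strings are hypothetical stand-ins for the server response and the browser-rendered DOM, and `DivClassFinder` and `has_div_with_class` are made-up names.

```python
from html.parser import HTMLParser


class DivClassFinder(HTMLParser):
    """Record whether any <div> carries the given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        # The class attribute may hold several space-separated names.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.found = True


def has_div_with_class(html, class_name):
    """Return True if the HTML text contains <div class="... class_name ...">."""
    finder = DivClassFinder(class_name)
    finder.feed(html)
    return finder.found


# Hypothetical stand-ins: what the server sends vs. what the browser shows
# after its scripts have run.
raw_html = '<div class="animate-fade z-index-90"></div><script>render()</script>'
rendered_html = '<div class="animate-fade z-index-90"><div class="list">...</div></div>'

print(has_div_with_class(raw_html, "list"))       # False
print(has_div_with_class(rendered_html, "list"))  # True
```

If the check fails on the raw HTML but succeeds on `driver.page_source`, the element is being created by JavaScript, which matches the behavior described in the question.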