https://untappd.com/v/beer-culture/893427

Question (votes: 0, answers: 2)

['https://untappd.com/v/beer-culture/893427?menu_id=1489', 'https://untappd.com/v/beer-culture/893427?menu_id=116472']

It only scrapes the original https://untappd.com/v/beer-culture/893427 twice. Here is my script:

import requests
from bs4 import BeautifulSoup

venue_url = 'https://untappd.com/v/beer-culture/893427'
count = 0

response = requests.get(venue_url, headers = {'User-agent': 'Mozilla/5.0'})
soup = BeautifulSoup(response.text, 'html.parser')

def get_menu_beers(soup):
    global count
    menu = soup.find('div', {'class': 'menu-area'})
    beers_all = menu.find_all('ul', {'class': 'menu-section-list'})
    for beer_group in beers_all:
        beers = beer_group.find_all('li')
        for beer in beers:
            details = beer.find('div', {'class': 'beer-details'})
            name_ = details.find("a",{"class":"track-click"}).text
            count = count + 1
            print(count, ' ', name_)

select_options = soup.find_all('select', {'class':'menu-selector'})
options_list = select_options[0].find_all('option')

menu_ids =[]
for option in options_list:
    menu_ids.append(int(option['value']))

menu_urls = []
for menu_id in menu_ids:
    menu_url = str(venue_url)+ '?menu_id=' + str(menu_id)
    menu_urls.append(menu_url)
print(menu_urls)

for url in menu_urls:
    res = requests.get(venue_url, headers = {'User-agent': 'Mozilla/5.0'})
    s = BeautifulSoup(res.text, 'html.parser')
    get_menu_beers(s)

	

python web-scraping beautifulsoup python-requests

2 Answers

1 vote

It seems the problem is that you keep sending requests to the original venue_url instead of using the correct URL from the menu list. In your last loop you are still passing venue_url to requests.get.

Replace this line in your loop:

res = requests.get(venue_url, headers={'User-agent': 'Mozilla/5.0'})

with this one:

res = requests.get(url, headers={'User-agent': 'Mozilla/5.0'})

0 votes

In the last few lines of your code, you should pass url from menu_urls instead of venue_url:

for url in menu_urls:
    #### pass in url not venue_url ####
    res = requests.get(url, headers = {'User-agent': 'Mozilla/5.0'})
    s = BeautifulSoup(res.text, 'html.parser')
    get_menu_beers(s)
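The fix described above can be checked without hitting the site. The sketch below is only an illustration, not the asker's code: build_menu_urls, scrape_menus, and fake_fetch are hypothetical names, and the network call is replaced by a stand-in function that records which URLs the loop actually requests.

```python
from urllib.parse import urlencode


def build_menu_urls(venue_url, menu_ids):
    """Return one URL per menu id, e.g. <venue_url>?menu_id=1489."""
    return [f"{venue_url}?{urlencode({'menu_id': menu_id})}" for menu_id in menu_ids]


def scrape_menus(menu_urls, fetch):
    """Fetch every menu URL exactly once and return {url: page}."""
    pages = {}
    for url in menu_urls:
        # Use the loop variable here; passing venue_url instead would
        # download the same default page on every iteration.
        pages[url] = fetch(url)
    return pages


if __name__ == "__main__":
    venue_url = "https://untappd.com/v/beer-culture/893427"
    requested = []

    def fake_fetch(url):
        # Stand-in for requests.get(url, headers=...).text; records each call.
        requested.append(url)
        return f"<html>{url}</html>"

    scrape_menus(build_menu_urls(venue_url, [1489, 116472]), fake_fetch)
    print(requested)
```

In the real script, fetch could be something like lambda url: requests.get(url, headers={'User-agent': 'Mozilla/5.0'}).text, with each returned page parsed by BeautifulSoup and handed to get_menu_beers.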
