['https://untappd.com/v/beer-culture/893427?menu_id=1489', 'https://untappd.com/v/beer-culture/893427?menu_id=116472']
only scrapes the original https://untappd.com/v/beer-culture/893427 twice.
Here is my script:
import requests
from bs4 import BeautifulSoup

venue_url = 'https://untappd.com/v/beer-culture/893427'
count = 0
response = requests.get(venue_url, headers = {'User-agent': 'Mozilla/5.0'})
soup = BeautifulSoup(response.text, 'html.parser')

def get_menu_beers(soup):
    global count
    menu = soup.find('div', {'class': 'menu-area'})
    beers_all = menu.find_all('ul', {'class': 'menu-section-list'})
    for beer_group in beers_all:
        beers = beer_group.find_all('li')
        for beer in beers:
            details = beer.find('div', {'class': 'beer-details'})
            name_ = details.find("a",{"class":"track-click"}).text
            count = count + 1
            print(count, ' ', name_)

select_options = soup.find_all('select', {'class':'menu-selector'})
options_list = select_options[0].find_all('option')
menu_ids =[]
for option in options_list:
    menu_ids.append(int(option['value']))
menu_urls = []
for menu_id in menu_ids:
    menu_url = str(venue_url)+ '?menu_id=' + str(menu_id)
    menu_urls.append(menu_url)
print(menu_urls)
for url in menu_urls:
    res = requests.get(venue_url, headers = {'User-agent': 'Mozilla/5.0'})
    s = BeautifulSoup(res.text, 'html.parser')
    get_menu_beers(s)
In the last few lines of your code, you should pass url (the current entry of menu_urls) to requests.get rather than venue_url:
for url in menu_urls:
    #### pass in url not venue_url ####
    res = requests.get(url, headers = {'User-agent': 'Mozilla/5.0'})
    s = BeautifulSoup(res.text, 'html.parser')
    get_menu_beers(s)
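The problem seems to be that you keep requesting the original venue_url instead of the URLs in your menu list. In your last loop, you are still passing venue_url to requests.get, so the same page is downloaded and parsed once per menu.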
Replace this line in your loop:
res = requests.get(venue_url, headers={'User-agent': 'Mozilla/5.0'})
with:
res = requests.get(url, headers={'User-agent': 'Mozilla/5.0'})
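One way to check a fix like this without hitting the site is to record which URLs the loop actually requests. The following is only a rough sketch using the standard library: build_menu_urls, fetch_all and fake_fetch are hypothetical names that do not appear in the original script, and fake_fetch merely stands in for requests.get. The two menu ids are the ones shown in the printed list in the question.

```python
from urllib.parse import urlencode

VENUE_URL = "https://untappd.com/v/beer-culture/893427"


def build_menu_urls(venue_url, menu_ids):
    """Return one URL per menu id, of the form <venue_url>?menu_id=<id>."""
    urls = []
    for menu_id in menu_ids:
        urls.append(venue_url + "?" + urlencode({"menu_id": menu_id}))
    return urls


def fetch_all(menu_urls, fetch):
    """Call fetch(url) once for every menu URL and collect the results."""
    pages = []
    for url in menu_urls:
        # Pass the loop variable, not the base venue URL.
        pages.append(fetch(url))
    return pages


# Hypothetical stand-in for requests.get: it records the URL instead of
# downloading anything, so the loop can be checked offline.
requested = []


def fake_fetch(url):
    requested.append(url)
    return "<html>" + url + "</html>"


menu_urls = build_menu_urls(VENUE_URL, [1489, 116472])
pages = fetch_all(menu_urls, fake_fetch)
print(requested)
```

If the loop passed the venue URL by mistake, requested would contain the same address twice, which is the symptom described in the question. Once the recorded list matches menu_urls, fake_fetch can be swapped for a small wrapper around requests.get.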