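Hi, I'm new to Python and I'm trying to figure out why my list overwrites its previous elements every time a new page is loaded and scraped inside the while loop. Thanks in advance.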
def scrapeurls():
    domain = "https://domain234dd.com"
    count = 0
    while count < 10:
        page = requests.get("{}{}".format(domain, count))
        soup = BeautifulSoup(page.content, 'html.parser')
        data = soup.findAll('div', attrs={'class': 'video'})
        urls = []
        for div in data:
            links = div.findAll('a')
            for a in links:
                urls.append(a['href'])
                print(a['href'])
        print(count)
        count += 1
Because you reset urls to an empty list on every iteration of the while loop. You should move that assignment to before the loop.
(Note that the whole thing would be better expressed as a for loop.)
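A rough sketch of what the for-loop version could look like. To keep it runnable without network access, the requests + BeautifulSoup step is replaced by a stand-in; the names scrape_urls, fetch_links and fake_fetch_links are hypothetical and not part of the original code.

```python
def scrape_urls(fetch_links, pages=10):
    """Collect the links from every page into a single list."""
    urls = []                    # created once, before the loop
    for count in range(pages):   # replaces count = 0 / while / count += 1
        urls.extend(fetch_links(count))
    return urls


def fake_fetch_links(count):
    """Stand-in for the real download-and-parse step (assumption)."""
    return ["/video/{}-a".format(count), "/video/{}-b".format(count)]


all_urls = scrape_urls(fake_fetch_links, pages=3)
print(len(all_urls))  # 6
```

In the real scraper, fetch_links would wrap the requests.get and soup.findAll calls and return the href values for one page.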
You need to initialize the urls list before the loop. If you initialize it inside the loop, it is set back to an empty list on every iteration.
import requests
from bs4 import BeautifulSoup

domain = "https://domain234dd.com"
count = 0
urls = []  # initialized once, so links from every page accumulate
while count < 10:
    page = requests.get("{}{}".format(domain, count))
    soup = BeautifulSoup(page.content, 'html.parser')
    data = soup.findAll('div', attrs={'class': 'video'})
    for div in data:
        links = div.findAll('a')
        for a in links:
            urls.append(a['href'])
            print(a['href'])
    print(count)
    count += 1
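The effect can also be seen in isolation, without any scraping. This is only a minimal illustration; the variable names inside and outside are made up for the comparison.

```python
# Initialized inside the loop: the list is rebuilt on every pass,
# so only the items from the last pass survive.
for count in range(3):
    inside = []
    inside.append(count)

# Initialized before the loop: every pass appends to the same list.
outside = []
for count in range(3):
    outside.append(count)

print(inside)   # [2]
print(outside)  # [0, 1, 2]
```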