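I'm trying to loop through a list of links in a text file and write the scraped information to another text file. I'm getting an 'index out of range' error and I don't understand why.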
import requests
from bs4 import BeautifulSoup

def item(a):
    return a[::2]

def details(a):
    return a[1::2]

sFile = open('scraped_data.txt', 'w+')

with open('C:/Users/Main/Desktop/Python Web Scraper/link_list.txt') as f:
    lines = f.read().splitlines()

def scrape(l):
    r = requests.get(l)
    soup = BeautifulSoup(r.text, "lxml")
    itemlist = []
    for items in soup.find_all('td'):
        itemlist.append(items.text.strip())
    for i in range(0, 6):
        print(item(itemlist)[i] + ' ' + details(itemlist)[i])

for i in range(0, 52):
    scrape(lines[i])

sFile.close()
Here is the console output:
Traceback (most recent call last):
  File "C:/Users/Cobus Uys/PycharmProjects/Scraper/Scraper.py", line 33, in <module>
    scrape(lines[i])
  File "C:/Users/Cobus Uys/PycharmProjects/Scraper/Scraper.py", line 29, in scrape
    print(item(itemlist)[i] + ' ' + details(itemlist)[i])
IndexError: list index out of range

Process finished with exit code 1
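The error happens because range(0, 6) assumes every page yields at least 12 td cells (six item/detail pairs), and range(0, 52) assumes link_list.txt has at least 52 lines; whenever there are fewer, item(itemlist)[i], details(itemlist)[i] or lines[i] indexes past the end of the list. Wrap the loops in a try/except clause that catches the IndexError, so iteration stops when the list runs out instead of crashing. You can also add further except clauses and, where appropriate, use else or finally.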
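A minimal, self-contained sketch of the try/except/else/finally pattern, run on a hard-coded list instead of scraped td texts so it works offline. The helper name print_pairs and the sample cell values are hypothetical; item() and details() are copied from the question.

```python
def item(a):
    # Even-indexed cells: the item names
    return a[::2]

def details(a):
    # Odd-indexed cells: the matching details
    return a[1::2]

def print_pairs(itemlist, count):
    """Print up to `count` item/detail pairs and return how many were printed."""
    printed = 0
    try:
        for i in range(count):
            print(item(itemlist)[i] + ' ' + details(itemlist)[i])
            printed += 1
    except IndexError:
        # Raised as soon as i runs past the end of the shorter slice
        print('Ran out of cells after', printed, 'pairs')
    else:
        # Runs only if the loop completed without an IndexError
        print('All', count, 'pairs printed')
    finally:
        # Runs in every case, e.g. a good place to close a file
        print('Done with this page')
    return printed

# Hypothetical page with only 4 td cells, i.e. 2 pairs
cells = ['Colour', 'Red', 'Size', 'Large']
print_pairs(cells, 6)
```

Asking for 6 pairs from this 4-cell list prints the two pairs, then the except branch and the finally branch; asking for 2 pairs takes the else branch instead.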
In the function:
def scrape(l):
    r = requests.get(l)
    soup = BeautifulSoup(r.text, "lxml")
    itemlist = []
    for items in soup.find_all('td'):
        itemlist.append(items.text.strip())
    try:
        # Stops early if this page has fewer than six item/detail pairs
        for i in range(0, 6):
            print(item(itemlist)[i] + ' ' + details(itemlist)[i])
    except IndexError:
        print('Scraping finished')
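As an alternative to catching the exception inside scrape(), the pairs can be built so that indexing past the end never happens: zip() stops at the shorter of the two slices and itertools.islice() caps the result at six pairs. This is a swapped-in technique, not part of the answer above; the helper name pairs_from_cells and the sample data are hypothetical.

```python
from itertools import islice

def item(a):
    # Even-indexed cells: the item names
    return a[::2]

def details(a):
    # Odd-indexed cells: the matching details
    return a[1::2]

def pairs_from_cells(itemlist, limit=6):
    """Return at most `limit` (item, detail) pairs without indexing past the end."""
    # zip() stops at the shorter slice; islice() stops after `limit` pairs,
    # so neither step can raise IndexError.
    return list(islice(zip(item(itemlist), details(itemlist)), limit))

# Hypothetical page with an odd number of cells (the last item has no detail)
cells = ['Colour', 'Red', 'Size', 'Large', 'Weight']
for name, value in pairs_from_cells(cells):
    print(name + ' ' + value)
```

Inside scrape(), a loop over pairs_from_cells(itemlist) would take the place of the range(0, 6) loop and the try/except around it.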
In the for loop:
try:
    # Stops early if link_list.txt has fewer than 52 lines
    for i in range(0, 52):
        scrape(lines[i])
except IndexError:
    print('Scraping finished')
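Another option for the outer loop is to iterate over lines directly, so the loop runs exactly as many times as there are links and the hard-coded 52 is no longer needed. A minimal sketch, assuming blank lines in the link file should be skipped; scrape_all is a hypothetical helper, and a list's append method stands in for the real scrape() so the example runs offline.

```python
def scrape_all(lines, scrape):
    """Call `scrape` once per non-empty line and return how many links were scraped."""
    count = 0
    for link in lines:
        link = link.strip()
        if not link:
            # Skip blank lines, e.g. an empty line at the end of link_list.txt
            continue
        scrape(link)
        count += 1
    return count

# Hypothetical stand-in for the real scrape() so the sketch runs offline
visited = []
links = ['https://example.com/a', '', 'https://example.com/b']
print(scrape_all(links, visited.append))
```

With the question's code, the equivalent is simply `for link in lines: scrape(link)`, which cannot raise IndexError on lines.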