I am trying to loop through a text file of links and scrape them with BS4. I get an error on the fourth iteration of the loop

Problem description

I am trying to loop through a list of links in a text file and write the scraped information to another text file. I get a 'list index out of range' error and I don't understand why.

import requests
from bs4 import BeautifulSoup


def item(a):
    return a[::2]


def details(a):
    return a[1::2]


sFile = open('scraped_data.txt', 'w+')

with open('C:/Users/Main/Desktop/Python Web Scraper/link_list.txt') as f:
    lines = f.read().splitlines()


def scrape(l):
    r = requests.get(l)
    soup = BeautifulSoup(r.text, "lxml")

    itemlist = []

    for items in soup.find_all('td'):
        itemlist.append(items.text.strip())

    for i in range(0, 6):
        print(item(itemlist)[i] + ' ' + details(itemlist)[i])


for i in range(0, 52):
    scrape(lines[i])

sFile.close()

Here is the console output.

Traceback (most recent call last):
  File "C:/Users/Cobus Uys/PycharmProjects/Scraper/Scraper.py", line 33, in <module>
    scrape(lines[i])
  File "C:/Users/Cobus Uys/PycharmProjects/Scraper/Scraper.py", line 29, in scrape
    print(item(itemlist)[i] + ' ' + details(itemlist)[i])
IndexError: list index out of range
Process finished with exit code 1
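The failure can be reproduced without any network access: `range(0, 6)` assumes every page yields at least 12 `<td>` cells (6 item/detail pairs), and a page with fewer cells makes the index run past the end. A minimal sketch, using a hypothetical short `itemlist` in place of real scraped data:

```python
# Hypothetical itemlist with only 2 item/detail pairs instead of the
# 6 that range(0, 6) assumes.
itemlist = ['Name', 'Widget', 'Price', '9.99']


def item(a):
    return a[::2]    # even positions: labels


def details(a):
    return a[1::2]   # odd positions: values


try:
    for i in range(0, 6):
        print(item(itemlist)[i] + ' ' + details(itemlist)[i])
except IndexError as exc:
    # fires once i reaches 2: both slices have only 2 elements
    print('IndexError:', exc)
```

This is why the script survives the first links but dies on the fourth: that page simply returns fewer `<td>` elements than the loop expects.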
python loops web-scraping beautifulsoup
1 Answer

Wrap it in a try/except clause, which will catch the error and stop cleanly when the iteration is done.

You can also add extra except clauses, and use else and finally where appropriate.

In the function:

def scrape(l):
    r = requests.get(l)
    soup = BeautifulSoup(r.text, "lxml")

    itemlist = []

    for items in soup.find_all('td'):
        itemlist.append(items.text.strip())
    try:
        for i in range(0, 6):
            print(item(itemlist)[i] + ' ' + details(itemlist)[i])
    except IndexError:
        print('Scraping finished')

In the for loop:

try:
    for i in range(0, 52):
        scrape(lines[i])
except IndexError:
        print('Scraping finished')
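Instead of catching the IndexError, you could avoid it entirely: `zip()` stops at the shorter of its two inputs, so pairing labels with values this way yields only the pairs that actually exist. A sketch of that alternative (the `pair_items` helper name is my own, not from the original code):

```python
def pair_items(itemlist):
    """Pair each label (even positions) with the value that follows it
    (odd positions). zip() stops at the shorter slice, so a page with
    fewer <td> cells than expected simply yields fewer pairs instead
    of raising IndexError."""
    return list(zip(itemlist[::2], itemlist[1::2]))


# e.g. a page that returned only 5 <td> cells -- the trailing
# unpaired label 'Qty' is dropped rather than crashing the loop:
for label, value in pair_items(['Name', 'Widget', 'Price', '9.99', 'Qty']):
    print(label + ' ' + value)
# prints:
# Name Widget
# Price 9.99
```

The same idea applies to the outer loop: iterating with `for line in lines:` instead of a hard-coded `range(0, 52)` removes the second place an IndexError can occur.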