Can't get my Python web-scraping script to work with multiprocessing

Problem description (votes: 1, answers: 1)

I read my URLs from a CSV file, and at the end I want to export the results to a new CSV. I use about 60 URLs, like the following:

import csv
from bs4 import BeautifulSoup
import requests
from time import sleep
from multiprocessing import Pool

contents = []

with open('websupplies2.csv') as csvf:
    reader = csv.reader(csvf, delimiter=";")
    for row in reader:
        contents.append(row)  # Add each url to list contents

price_text = '-'
availability_text = '-'

def parse(contents):
    info = []
    with open('output_websupplies.csv', mode='w') as f:
        f_writer = csv.writer(f, delimiter=';', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        f_writer.writerow(['SKU','Price','Availability'])

    for row in contents:  # Parse through each url in the list.
        sleep(3)
        page = requests.get(row[1]).content
        soup = BeautifulSoup(page, "html.parser")

        price = soup.find('div', attrs={'class':'product-price'})
        if price is not None:
            price_text = price.text.strip()
            print(price_text)
        else:
            price_text = "0,00"
            print(price_text)

        availability = soup.find('div', attrs={'class':'available-text'})
        if availability is not None:
            availability_text = availability.text.strip()
            print(availability_text)
        else:
            availability_text = "Μη Διαθέσιμο"
            print(availability_text)

        info.append(row[0])
        info.append(price_text)
        info.append(availability_text)

    return ';'.join(info)

if __name__ == "__main__":
    with Pool(10) as p:
        records = p.map(parse, contents)

if len(records) > 0:
    with open('output_websupplies.csv', 'a+') as f:
        f.write('\n'.join(records))

But I get an error message like NameError: name 'records' is not defined. What should I change to make the script work?

python python-3.x web-scraping
1 Answer

1 vote

First, double-check your indentation. What you pasted here looks inconsistent, and if your if len(records) > 0: line really is not indented, you will certainly get a NameError.

For a statement to be inside a block, it must be indented the same amount as the other statements in that block, and more than the line that opens the block. In other words, everything belonging to the if statement should line up. For example:

if __name__ == "__main__":
    with Pool(10) as p:
        records = p.map(parse, contents)

        if len(records) > 0:
            with open('output_websupplies.csv', 'a+') as f:
                f.write('\n'.join(records))
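Indentation aside, the pasted script has two more problems worth noting: Pool.map calls parse once per item of contents, so parse receives a single [sku, url] row rather than the whole list, and opening output_websupplies.csv in 'w' mode inside parse would make every worker truncate the file. A minimal sketch of a version that avoids both (assuming, as in the question, that each CSV row is sku;url):

```python
import csv
from time import sleep
from multiprocessing import Pool

import requests
from bs4 import BeautifulSoup

def extract(soup):
    """Pull the price and availability text out of a parsed page."""
    price = soup.find('div', attrs={'class': 'product-price'})
    price_text = price.text.strip() if price is not None else "0,00"

    availability = soup.find('div', attrs={'class': 'available-text'})
    availability_text = (availability.text.strip()
                         if availability is not None else "Μη Διαθέσιμο")
    return price_text, availability_text

def parse(row):
    # Pool.map hands each worker ONE row: [sku, url]
    sku, url = row[0], row[1]
    sleep(3)  # be polite to the server
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    price_text, availability_text = extract(soup)
    return [sku, price_text, availability_text]

def main():
    with open('websupplies2.csv') as csvf:
        contents = list(csv.reader(csvf, delimiter=";"))

    with Pool(10) as p:
        records = p.map(parse, contents)

    # Write the output once, in the parent process only
    with open('output_websupplies.csv', 'w', newline='') as f:
        writer = csv.writer(f, delimiter=';', quoting=csv.QUOTE_MINIMAL)
        writer.writerow(['SKU', 'Price', 'Availability'])
        writer.writerows(records)
```

Call main() under an if __name__ == "__main__": guard, as in the original script; on Windows that guard is mandatory for multiprocessing to work at all.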