How to iterate over and download multiple PDFs using Selenium and Python

Question

I am fairly new to using Selenium with Python. Below is the code I am trying to run to download multiple files.

from selenium import webdriver
driver = webdriver.Chrome(executable_path=r'C:\chromedriver_win32\chromedriver.exe')
cusip=['abc123','def456','ghi789']
for a in cusip:

    page=driver.get("http://mylink=" + str(a) + ".pdf")
    with open(a + '.pdf', 'wb') as f:
        for chunk in page.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)

The error I get is as follows:

Traceback (most recent call last):
  File "C:/Users/shashi.singh/PycharmProjects/HiSSS/Selenium.py", line 13, in <module>
    for chunk in page.iter_content(chunk_size=1024):
AttributeError: 'NoneType' object has no attribute 'iter_content'

Answer 1

I would not recommend using Selenium for this task. If you have a list of URLs, just use urllib.request.urlretrieve:

In [5]: from urllib import request

In [6]: request.urlretrieve('https://arxiv.org/pdf/1409.8470.pdf', r'C:\users\chris\test.pdf')
Out[6]: ('C:\\users\\chris\\test.pdf', <http.client.HTTPMessage at 0x59628d0>)

Just pass each URL as the first argument and the destination path as the second argument.
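Applied to the list from the question, that advice could look like the sketch below. The URL prefix is the placeholder from the question; the function name `download_pdfs`, the `dest_dir` argument, and the injectable `fetch` parameter are hypothetical names added for illustration, not part of the original answer.

```python
import os
from urllib import request

BASE_URL = "http://mylink="  # placeholder URL prefix copied from the question


def download_pdfs(cusips, dest_dir, fetch=request.urlretrieve):
    """Download one PDF per identifier and return the list of saved paths.

    `fetch` defaults to urllib.request.urlretrieve; it is a parameter only
    so the loop can be exercised without network access.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for cusip in cusips:
        url = BASE_URL + str(cusip) + ".pdf"
        path = os.path.join(dest_dir, str(cusip) + ".pdf")
        fetch(url, path)  # first argument: URL, second: destination file
        saved.append(path)
    return saved


# Example call (needs a reachable server behind BASE_URL):
# download_pdfs(["abc123", "def456", "ghi789"], r"C:\directory")
```

No browser is started here, so the loop only works when the PDF URLs can be fetched without a logged-in session or JavaScript.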

Answer 2

Thanks for the help, everyone. Below is the code I am using, and it works fine.

from selenium import webdriver

cusip = ['abc123', 'def456', 'ghi789']

tgt = "C:\\directory"  # target directory for the downloaded files
options = webdriver.ChromeOptions()
# Disable Chrome's built-in PDF viewer so PDFs are downloaded instead of displayed
profile = {"plugins.plugins_list": [{"enabled": False, "name": "Chrome PDF Viewer"}],
           "download.default_directory": tgt}
options.add_experimental_option("prefs", profile)
# Create the driver once, after the options have been configured
driver = webdriver.Chrome(executable_path=r'C:\chromedriver_win32\chromedriver.exe', chrome_options=options)

for a in cusip:
    # driver.get() returns None, so there is nothing useful to assign
    driver.get("http://mylink=" + str(a) + ".pdf")  # iterate over the items in the cusip list

print('Process completed successfully')

cusip is the list I have to iterate over, appending each item to the URL of the page I need to download, so you can modify it to suit your needs.
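One thing to watch with this approach: driver.get() returns as soon as navigation finishes, while Chrome keeps writing the file in the background under a temporary `.crdownload` name. The sketch below polls the download directory until the expected files are present and no temporary files remain. The helper name `wait_for_downloads` and the assumption that each file is saved as `<cusip>.pdf` are illustrative only; the actual file name is decided by the server.

```python
import os
import time


def wait_for_downloads(directory, expected_names, timeout=60.0, poll=0.5):
    """Return True once every expected file exists and no Chrome
    '.crdownload' temporary file is left; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        names = set(os.listdir(directory))
        in_progress = any(n.endswith(".crdownload") for n in names)
        if not in_progress and all(n in names for n in expected_names):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


# Example call after the download loop (file names are an assumption):
# wait_for_downloads("C:\\directory", ["abc123.pdf", "def456.pdf", "ghi789.pdf"])
```

Calling something like this before closing the browser may help avoid truncated files.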

Answer 3

I tried many approaches and found the code below the most useful:

import requests

target = 'D:/Pdf_test/'  # location where you want to save the files

count = 0
for link in links:
    count += 1
    print("Downloading file: ", count)

    # Get response object for the link
    response = requests.get(link)
    # Write content into a pdf file; files are named like PDF_FILE_1.pdf
    with open(f"{target}PDF_FILE_{count}.pdf", 'wb') as pdf:
        pdf.write(response.content)
    print("File ", count, " downloaded")

print("All PDF files downloaded")

Where:

links # A list of URLs that point to the PDF files (download links)
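The iter_content call from the question exists on the Response object returned by requests, not on anything Selenium returns (driver.get() returns None, which is what the traceback reports). For large PDFs, the chunked loop the question was aiming for can be written with requests roughly as follows. The function name `stream_to_file`, the injectable `get` parameter, and the 30-second timeout are assumptions made for this sketch.

```python
import requests


def stream_to_file(url, path, get=requests.get, chunk_size=1024):
    """Stream the body of `url` into `path`; return the number of bytes written.

    `get` defaults to requests.get; it is a parameter only so the function
    can be tried without network access.
    """
    response = get(url, stream=True, timeout=30)
    response.raise_for_status()  # stop on 4xx/5xx instead of saving an error page
    written = 0
    with open(path, "wb") as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                written += len(chunk)
    return written


# Example call, using the placeholder URL from the question:
# stream_to_file("http://mylink=abc123.pdf", "abc123.pdf")
```

With stream=True the body is fetched chunk by chunk rather than loaded into memory at once, which is the main difference from writing response.content in a single call.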
