My Python code for web scraping and downloading five images isn't working (using Blender 3D as the IDE)

Problem description

I'm running this code to web scrape and download 5 images from Google. What should happen: after I run the code, a Chrome browser window should open, the code should click an image and download it, then scroll down to the next image, click and download that one, and so on, up to five times. What actually happens is that the Chrome window appears for about a second and then closes, and nothing else happens, even though the code doesn't throw any Python errors.

I'm using the Blender 3D modeling software as my IDE because I eventually want to turn this Python code into a Blender add-on (a Blender add-on is a bit like an extension in Google Chrome: a small piece of software you install into Blender to add functionality). That's why there are extra import lines at the top of my code.
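
For context, the python_exe and py_lib lines near the top of the code follow the usual pattern for reaching Blender's bundled Python so that third-party packages can be pip-installed into it. A minimal sketch of that pattern (the package names here are just examples of what this script needs, not anything the original code runs):

import sys
import os
import subprocess

# Blender's bundled Python interpreter (same expression the code below uses)
python_exe = os.path.join(sys.prefix, 'bin', 'python.exe')

# make sure pip is available, then install packages into Blender's own site-packages
subprocess.call([python_exe, '-m', 'ensurepip'])
subprocess.call([python_exe, '-m', 'pip', 'install', 'selenium', 'requests', 'Pillow', 'opencv-python'])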

One related item: I'm getting this warning:

E:\GLOBAL ASSETS\SCRIPTING\Web Scraping Images\web-scraper.blend\web-scraper.py:21: DeprecationWarning: executable_path has been deprecated, please pass in a Service object

This is the only other information in the console after I run the code:

DevTools 监听 ws://127.0.0.1:52643/devtools/browser/ea448f70-0066-4d50-bfb8-8671528789b8
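
For reference, that deprecation warning refers to Selenium 4's Service-based constructor. A minimal sketch of the newer form, assuming the same chromedriver path, looks like this (it is only a warning, so this alone may not explain the browser closing):

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

PATH = "E:\\GLOBAL ASSETS\\SCRIPTING\\Web Scraping Images\\chromedriver.exe"
# Selenium 4 style: wrap the driver path in a Service object instead of passing it directly
wd = webdriver.Chrome(service=Service(PATH))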

Any help would be appreciated.

import bpy
import subprocess
import sys
import os
import cv2
import random
from random import randrange
from PIL import Image #make sure both pil from c:\users\mjoe6\appdata\local\packages\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\localcache\local-packages\python310\site-packages is in blender pip3.exe folder
from selenium import webdriver
from selenium.webdriver.common.by import By
import requests
import io
import time

# path to python.exe
python_exe = os.path.join(sys.prefix, 'bin', 'python.exe')
py_lib = os.path.join(sys.prefix, 'lib', 'site-packages','pip')


PATH = "E:\\GLOBAL ASSETS\\SCRIPTING\\Web Scraping Images\\chromedriver.exe"
wd = webdriver.Chrome(PATH)

def get_images_from_google(wd, delay, max_images):
    def scroll_down(wd):
        wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(delay)
        
        url = "https://www.google.com/search?q=cats+2019+IMDb&sxsrf=ALiCzsZmBIp-JZmZv23v6ORoc0VL2NRuxg:1654543304286&source=lnms&tbm=isch&sa=X&ved=2ahUKEwiRjquPxpn4AhUymo4IHbC6Cx0Q_AUoAnoECAEQBA&biw=1536&bih=714&dpr=1.25#imgrc=ixH-uoDQFN_gpM"
        wd.get(url)
        
        image_urls = set()
        while len (image_urls) < max_images:
            scroll_down(wd)
            thumbnails = wd.find_elements(By.CLASS_NAME, "Q4LuWd")
            for img in thumbnails[len(image_urls):max_images]:
                try:
                    img.click()
                    time.sleep.delay
                except:
                    continue    
                images = wd.find_elements(By.CLASS_NAME, "n3VNCb")
                for image in images:
                    if image.get_attribute('src') and 'http' in image.get_attribute('src'):
                        image_urls.add(image.get_attribute('src'))
                        print(f"Found {len (image_urls)}")
        return image_urls
        
def download_image(download_path, url, file_name):
    try:
        image_content = requests.get(url).content
        image_file = io.BytesIO(image_content)
        image = Image.open(image_file)
        file_path = download_path + file_name
        with open(file_path, "wb") as f:
            image.save(f, "PNG")
        
        print("Success")
    except Exception as e:
        print('FAILED -', e)    


urls = get_images_from_google(wd, 1, 5)
print(urls)
wd.quit()
python google-chrome web-scraping
1 Answer

This line probably has an error:

                images = wd.find_elements(By.CLASS_NAME, "n3VNCb")

It should be changed like this:

                images = wd.find_elements(By.CLASS_NAME, "iPVvYb")

Also, on this line:

                if image.get_attribute('src') and 'http' in image.get_attribute('src'):

http should be changed to https:

                 if image.get_attribute('src') and 'https' in image.get_attribute('src'):
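
Putting the two suggestions together, the inner part of the collection loop would look roughly like this (a sketch only; Google changes these class names regularly, so "iPVvYb" may itself be stale by the time you try it):

images = wd.find_elements(By.CLASS_NAME, "iPVvYb")  # full-size image element; class name changes often
for image in images:
    src = image.get_attribute('src')
    # keep only real https URLs, skipping inline base64 data: thumbnails
    if src and 'https' in src:
        image_urls.add(src)
        print(f"Found {len(image_urls)}")

Once URLs come back, the download_image helper already defined in the question can be called on each one, for example (the download folder here is just an example path, and it must end with a separator because the helper concatenates it with the file name):

for i, url in enumerate(urls):
    download_image("E:\\GLOBAL ASSETS\\SCRIPTING\\Web Scraping Images\\", url, f"image{i}.png")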