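I'm running this code to web-scrape and download 5 images from Google. What should happen is this: after I run the code, the Chrome web browser should appear, and the code should make the mouse click an image, download it, scroll down to another image, click that one and download it, and so on, up to five times. What actually happens is that the Chrome browser appears for a second and then closes, and nothing else happens, even though the code doesn't throw any Python errors.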
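Two details in the code may explain why nothing visible goes wrong. The script ends with wd.quit(), which closes the browser as soon as get_images_from_google returns. Also, the bare except: continue in the thumbnail loop discards any exception raised while clicking, so a failure there never shows up as a Python error. The following is a minimal, Selenium-free sketch of that second effect. Thumbnail and click_thumbnails are hypothetical names, and the AttributeError is produced by a typo such as writing time.sleep.delay instead of time.sleep(delay).

```python
import time


class Thumbnail:
    """Hypothetical stand-in for a Selenium WebElement."""

    def click(self):
        pass


def click_thumbnails(thumbnails, delay, report_errors):
    clicked = 0
    errors = []
    for thumb in thumbnails:
        try:
            thumb.click()
            time.sleep.delay  # deliberate typo: raises AttributeError
            clicked += 1
        except Exception as exc:  # a bare "except:" swallows this too
            if report_errors:
                errors.append(f"{type(exc).__name__}: {exc}")
            continue
    return clicked, errors


# Silent failure: nothing is counted as clicked and no error is shown.
print(click_thumbnails([Thumbnail(), Thumbnail()], 0, report_errors=False))
# Recording the exception makes the typo visible.
print(click_thumbnails([Thumbnail(), Thumbnail()], 0, report_errors=True))
```

Printing or logging the exception inside the except branch, even temporarily, is usually enough to see why the loop never collects anything.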
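I'm using the Blender 3D modeling software as my IDE because I hope to turn this Python code into a Blender add-on in the future (a Blender add-on is a bit like an extension in Google Chrome: a small piece of software you install into Blender to add functionality). That's why there are extra import lines at the top of my code...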
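On the Blender side, third-party packages such as selenium and requests have to be installed into Blender's bundled Python rather than into a separate system Python, which is presumably what the python_exe path in the script is for. A minimal sketch, assuming Blender's Python already has pip available (python -m ensurepip can bootstrap it if not). pip_install_command and install are hypothetical helpers; the command is only built and printed, and nothing is installed unless install() is called.

```python
import os
import subprocess
import sys


def pip_install_command(python_exe, packages):
    """Build the command that installs `packages` with the given interpreter's pip."""
    return [python_exe, "-m", "pip", "install", *packages]


def install(python_exe, packages):
    # check=True raises CalledProcessError if pip exits with a non-zero status
    subprocess.run(pip_install_command(python_exe, packages), check=True)


# Same interpreter path the script computes (assumes the Windows layout of Blender's Python)
python_exe = os.path.join(sys.prefix, "bin", "python.exe")
print(pip_install_command(python_exe, ["selenium", "requests", "pillow"]))
```

Using "python -m pip" ties the installation to that specific interpreter, which avoids installing into whichever pip happens to be first on PATH.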
A related issue is that I get this warning:
E:\GLOBAL ASSETS\SCRIPTING\Web Scraping Images\web-scraper.blend\web-scraper.py:21: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
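The warning comes from Selenium 4, where passing the driver path positionally (as webdriver.Chrome(PATH) does) is deprecated in favour of a Service object. It is only a warning and does not by itself stop the script. A minimal sketch of the replacement, assuming Selenium 4 is installed in the interpreter that runs the script. make_driver is a hypothetical helper, and its imports are deferred so the function can be defined (for example while a Blender add-on registers) even before Selenium is installed.

```python
def make_driver(driver_path):
    """Start Chrome through a Service object instead of the deprecated executable_path."""
    # Deferred imports: this helper can be defined even if Selenium is missing.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    service = Service(executable_path=driver_path)
    return webdriver.Chrome(service=service)


PATH = "E:\\GLOBAL ASSETS\\SCRIPTING\\Web Scraping Images\\chromedriver.exe"
# wd = make_driver(PATH)  # uncomment to actually launch Chrome
```

More recent Selenium 4 releases can also locate a matching driver by themselves, so webdriver.Chrome() with no path may work as well; treat that as version-dependent.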
This is the only other output in the console after I run the code:
DevTools listening on ws://127.0.0.1:52643/devtools/browser/ea448f70-0066-4d50-bfb8-8671528789b8
Any help would be appreciated.
import bpy
import subprocess
import sys
import os
import cv2
import random
from random import randrange
from PIL import Image  # make sure PIL from c:\users\mjoe6\appdata\local\packages\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\localcache\local-packages\python310\site-packages is in the Blender pip3.exe folder
from selenium import webdriver
from selenium.webdriver.common.by import By
import requests
import io
import time

# path to python.exe
python_exe = os.path.join(sys.prefix, 'bin', 'python.exe')
py_lib = os.path.join(sys.prefix, 'lib', 'site-packages', 'pip')

PATH = "E:\\GLOBAL ASSETS\\SCRIPTING\\Web Scraping Images\\chromedriver.exe"
wd = webdriver.Chrome(PATH)

def get_images_from_google(wd, delay, max_images):
    def scroll_down(wd):
        # scroll to the bottom so Google loads more thumbnails
        wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(delay)

    url = "https://www.google.com/search?q=cats+2019+IMDb&sxsrf=ALiCzsZmBIp-JZmZv23v6ORoc0VL2NRuxg:1654543304286&source=lnms&tbm=isch&sa=X&ved=2ahUKEwiRjquPxpn4AhUymo4IHbC6Cx0Q_AUoAnoECAEQBA&biw=1536&bih=714&dpr=1.25#imgrc=ixH-uoDQFN_gpM"
    wd.get(url)

    image_urls = set()
    while len(image_urls) < max_images:
        scroll_down(wd)
        thumbnails = wd.find_elements(By.CLASS_NAME, "Q4LuWd")

        for img in thumbnails[len(image_urls):max_images]:
            try:
                img.click()
                time.sleep(delay)  # give the full-size preview time to load
            except:  # a bare except hides every error raised while clicking
                continue

            # the full-size preview image(s) shown after the click
            images = wd.find_elements(By.CLASS_NAME, "n3VNCb")
            for image in images:
                if image.get_attribute('src') and 'http' in image.get_attribute('src'):
                    image_urls.add(image.get_attribute('src'))
                    print(f"Found {len(image_urls)}")

    return image_urls

def download_image(download_path, url, file_name):
    try:
        image_content = requests.get(url).content
        image_file = io.BytesIO(image_content)
        image = Image.open(image_file)
        file_path = download_path + file_name

        with open(file_path, "wb") as f:
            image.save(f, "PNG")

        print("Success")
    except Exception as e:
        print('FAILED -', e)

urls = get_images_from_google(wd, 1, 5)
print(urls)
wd.quit()  # closes the browser window when the script finishes
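As posted, the script collects the URLs and prints them but never calls download_image, so nothing would be saved even if URLs were found. A minimal sketch of the missing loop, assuming the script's download_image(download_path, url, file_name) signature. download_all is a hypothetical helper that takes the download function as a parameter so it can be exercised without a network. Because download_image builds the path with download_path + file_name, the folder is normalised to end with a path separator.

```python
import os


def download_all(urls, download_path, download):
    """Call download(folder, url, file_name) once per URL and return the file names."""
    os.makedirs(download_path, exist_ok=True)
    # download_image joins with "+", so make sure the folder ends with a separator
    folder = os.path.join(download_path, "")
    file_names = []
    # sorted() gives a stable order, because the URLs arrive as a set
    for i, url in enumerate(sorted(urls)):
        file_name = f"{i}.png"  # download_image saves in PNG format
        download(folder, url, file_name)
        file_names.append(file_name)
    return file_names


# In the script, after urls = get_images_from_google(wd, 1, 5):
# download_all(urls, "imgs", download_image)
```

The call belongs before wd.quit() only if the download needs the browser; since download_image uses requests, it can also run after the browser has closed.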
These lines may be wrong:
images = wd.find_elements(By.CLASS_NAME, "n3VNCb")
This should be changed to:
images = wd.find_elements(By.CLASS_NAME, "iPVvYb")
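Class names such as n3VNCb and iPVvYb are generated by Google and may change again without notice, so it can help to keep the candidates in one list and use the first one that matches anything. A sketch under that assumption: find_first_match and FULL_IMAGE_CLASSES are hypothetical names, and find stands for a call such as lambda name: wd.find_elements(By.CLASS_NAME, name).

```python
def find_first_match(find, class_names):
    """Return (class_name, elements) for the first class name that finds any elements."""
    for name in class_names:
        elements = find(name)
        if elements:
            return name, elements
    # Nothing matched: the page layout has probably changed again
    return None, []


# Newer name first, older one as a fallback (assumed ordering)
FULL_IMAGE_CLASSES = ["iPVvYb", "n3VNCb"]
```

In the script this would read name, images = find_first_match(lambda n: wd.find_elements(By.CLASS_NAME, n), FULL_IMAGE_CLASSES), and a None name is a clear signal to update the list rather than keep scrolling.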
Also, in this line:
if image.get_attribute('src') and 'http' in image.get_attribute('src'):
http should be changed to https:
if image.get_attribute('src') and 'https' in image.get_attribute('src'):
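Note that 'http' in src is a substring test, so it already accepts https:// URLs; switching to 'https' additionally rejects plain http:// links. What both versions reject are sources that are not web URLs at all, such as inline data: URIs. A sketch of a stricter check, assuming the goal is to keep only HTTPS URLs. is_downloadable is a hypothetical name. It uses startswith to test the scheme itself, whereas a substring test could in principle also match text inside a base64 data: payload.

```python
def is_downloadable(src, scheme="https"):
    """True if src is a non-empty URL that starts with the given scheme."""
    # get_attribute('src') returns None when the attribute is missing
    return bool(src) and src.startswith(scheme + "://")


samples = [
    "https://example.com/cat.jpg",
    "http://example.com/cat.jpg",
    "data:image/jpeg;base64,/9j/4AAQ",
    None,
]
for src in samples:
    # columns: the source, the original substring test, the stricter check
    print(repr(src), "http" in (src or ""), is_downloadable(src))
```

In the loop this becomes if is_downloadable(image.get_attribute('src')):, which also avoids calling get_attribute twice.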