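I am trying to automate some functionality that uses the Azure Image Analysis API and pyautogui to click on specific text, based on the coordinates returned by OCR.
Here is my code: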
import base64
import logging
import pyautogui
import asyncio
import io
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential
from PIL import Image

async def move_mouse_and_click(text: str, session_id: str):
    logging.info("Starting Image Analysis")
    mongo_service = MongoService()  # Ensure MongoService is initialized with the correct parameters
    image_data_base64 = mongo_service.get_last_screenshot(session_id, offset=1)
    if not image_data_base64:
        logging.error("No image data found")
        return
    try:
        image_data = base64.b64decode(image_data_base64)
    except Exception as e:
        logging.error(f"Failed to decode base64 image data: {e}")
        return
    logging.info("Image received and decoded successfully")
    # Validate the image data directly
    try:
        image = Image.open(io.BytesIO(image_data))
        image.verify()  # Verify the image integrity
        logging.info("Image data verified successfully.")
    except Exception as e:
        logging.error(f"Image data is invalid: {e}")
        return
    coordinates = await get_bounding_box_coordinates(text, image_data)
    if coordinates:
        x, y = coordinates[0]  # Assuming the first point is the top-left corner
        loop = asyncio.get_event_loop()
        await loop.run_in_executor(None, pyautogui.moveTo, x, y, 2)
        await loop.run_in_executor(None, pyautogui.click)
    else:
        logging.error("Text not found in the image")

async def get_bounding_box_coordinates(text: str, image_data: bytes):
    endpoint = settings.vision_endpoint
    key = settings.vision_key
    client = ImageAnalysisClient(endpoint=endpoint, credential=AzureKeyCredential(key))
    try:
        image_stream = io.BytesIO(image_data)
        result = client.analyze(image_data=image_stream.read(), visual_features=[VisualFeatures.READ])
    except Exception as e:
        logging.error(f"Image analysis failed: {e}")
        return None
    if result.read and result.read.blocks:
        for block in result.read.blocks:
            for line in block.lines:
                if text in line.text:
                    return line.bounding_polygon
    return None
The image always passes verification, but I keep getting an error like this:
2024-06-10 10:54:54,655 - root - INFO - Starting Image Analysis
2024-06-10 10:54:55,928 - root - INFO - Image received and decoded successfully
2024-06-10 10:54:55,929 - root - INFO - Image data verified successfully.
2024-06-10 10:54:55,931 - azure.core.pipeline.policies.http_logging_policy - INFO - Request URL: 'https://ocr-matrix.cognitiveservices.azure.com//computervision/imageanalysis:analyze?api-version=REDACTED&features=REDACTED'
Request method: 'POST'
Request headers:
'Content-Length': '547998'
'content-type': 'application/octet-stream'
'Accept': 'application/json'
'x-ms-client-request-id': 'c5c95c7c-26e9-11ef-a236-0242560eba69'
'User-Agent': 'azsdk-python-ai-vision-imageanalysis/1.0.0b2 Python/3.11.7 (Linux-6.5.0-35-generic-x86_64-with-glibc2.35)'
'Ocp-Apim-Subscription-Key': 'REDACTED'
A body is sent with the request
2024-06-10 10:54:59,099 - azure.core.pipeline.policies.http_logging_policy - INFO - Response status: 200
Response headers:
'Content-Length': '55770'
'Content-Type': 'application/json; charset=utf-8'
'request-id': '65f22858-bbe1-4001-844f-d8810a2ce719'
'ms-vision-response-time-input-received': 'REDACTED'
'ms-vision-response-time-input-processed': 'REDACTED'
'api-supported-versions': 'REDACTED'
'x-envoy-upstream-service-time': 'REDACTED'
'CSP-Billing-Usage': 'REDACTED'
'apim-request-id': 'REDACTED'
'Strict-Transport-Security': 'REDACTED'
'x-content-type-options': 'REDACTED'
'x-ms-region': 'REDACTED'
'Date': 'Mon, 10 Jun 2024 05:24:58 GMT'
[ WARN:[email protected]] global /croot/opencv-suite_1691620365762/work/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('x'): can't open/read file: check file path/integrity
2024-06-10 10:55:00,136 - root - ERROR - Error occurred: Failed to read x because file is missing, has improper permissions, or is an unsupported or invalid format
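I tried saving the image from the database directly to my system to check whether the file was corrupted, but it is not, since I can open and view it.
There seems to be a problem with how the image data is handled before it is passed to the Azure Image Analysis API.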
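I referred to this MSDOC. To process the image file, use one of the supported formats: JPEG, PNG, GIF, BMP, WEBP, ICO, TIFF, or MPO. Make sure the image file (sample.jpg in this example) exists at the specified path.
When loading the image file into the bytes object image_data, make sure the file is read in binary mode (rb). Use a try-except block to catch file-related exceptions such as FileNotFoundError or PermissionError when reading the file.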
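The file-reading advice above can be sketched as a small helper. This is only an illustration under stated assumptions: the function name load_image_bytes is hypothetical, and returning None on failure is one possible design, not something the SDK requires.

```python
import logging
from typing import Optional


def load_image_bytes(image_path: str) -> Optional[bytes]:
    """Return the raw bytes of the file, or None if it cannot be read.

    load_image_bytes is a hypothetical helper name used for illustration.
    """
    try:
        # "rb" returns the content as bytes; text mode would try to decode it.
        with open(image_path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        logging.error("Image file not found: %s", image_path)
    except PermissionError:
        logging.error("No permission to read image file: %s", image_path)
    return None
```

The returned bytes can be passed as image_data to client.analyze. A None result tells the caller to stop before making the API call.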
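The following code uses the Azure Cognitive Services Computer Vision API to run optical character recognition (OCR) on an image, then uses pyautogui to interact with the desktop based on the OCR result.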
import os
import sys
import logging
import pyautogui
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

# Send the SDK's HTTP logging to stdout.
logger = logging.getLogger("azure")
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(stream=sys.stdout)
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
handler.setFormatter(formatter)
logger.addHandler(handler)

# Read the endpoint (e.g. https://Computervisionname.cognitiveservices.azure.com)
# and the key from environment variables.
try:
    endpoint = os.environ["VISION_ENDPOINT"]
    key = os.environ["VISION_KEY"]
except KeyError:
    print("Missing environment variable 'VISION_ENDPOINT' or 'VISION_KEY'")
    sys.exit(1)

client = ImageAnalysisClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(key),
    logging_enable=True
)

def analyze_image(image_path):
    try:
        # Read the image as raw bytes; binary mode is required.
        with open(image_path, "rb") as f:
            image_data = f.read()
        result = client.analyze(
            image_data=image_data,
            visual_features=[VisualFeatures.READ]
        )
        # Skip processing when the response contains no text blocks.
        if result.read is not None and result.read.blocks:
            for line in result.read.blocks[0].lines:
                word = line.words[0].text
                # (100, 100) is a placeholder position, not the word's location.
                pyautogui.click(100, 100)
                print(f"Clicked on '{word}' at coordinates (100, 100)")
    except Exception as e:
        logger.error(f"Error occurred: {e}")

if __name__ == "__main__":
    image_path = "C://Users//hello.png"
    logger.info("Starting Image Analysis")
    logger.info("Analyzing image...")
    analyze_image(image_path)
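The "Failed to read x" message in the question's log is likely produced on the pyautogui side, not by Azure, whose request returned status 200. pyautogui treats a string argument to moveTo as an image filename to locate on screen. Unpacking a bounding-polygon point with `x, y = point` may yield the key names "x" and "y" instead of numbers, because the SDK's model objects appear to behave like mappings. A minimal sketch of reading numeric coordinates instead follows; polygon_center and _coord are hypothetical helper names. It assumes each point exposes x and y, either as attributes or as mapping keys.

```python
from types import SimpleNamespace
from typing import Any, Sequence, Tuple


def _coord(point: Any, name: str) -> float:
    # Accept attribute access (point.x) as well as mapping access (point["x"]).
    if hasattr(point, name):
        return getattr(point, name)
    return point[name]


def polygon_center(polygon: Sequence[Any]) -> Tuple[int, int]:
    """Return the integer center of a polygon given as a sequence of points.

    polygon_center is a hypothetical helper, not part of the Azure SDK.
    """
    if not polygon:
        raise ValueError("polygon must contain at least one point")
    xs = [_coord(p, "x") for p in polygon]
    ys = [_coord(p, "y") for p in polygon]
    # Average the corners so the click lands inside the text, not on its edge.
    return round(sum(xs) / len(xs)), round(sum(ys) / len(ys))


# Stand-in for line.bounding_polygon: four corners of a 100 x 40 box.
sample_polygon = [
    SimpleNamespace(x=10, y=20),
    SimpleNamespace(x=110, y=20),
    SimpleNamespace(x=110, y=60),
    SimpleNamespace(x=10, y=60),
]
print(polygon_center(sample_polygon))  # (60, 40)
```

With a real OCR result, the call would be `x, y = polygon_center(line.bounding_polygon)`, followed by `pyautogui.moveTo(x, y, 2)` and `pyautogui.click()`. This assumes the analyzed image is a full-screen screenshot at the display's native resolution, so that image pixels correspond to screen coordinates. If it is not, the values need to be scaled or offset first.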