Keep getting a "Failed to read" error with the Azure SDK


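I'm trying to automate some functionality that uses the Azure Image Analysis API together with pyautogui: OCR finds a specific piece of text in a screenshot, and pyautogui clicks on it using the coordinates returned by OCR.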

Here is my code:

import base64
import logging
import pyautogui
import asyncio
import io
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential
from PIL import Image

async def move_mouse_and_click(text: str, session_id: str):
    logging.info("Starting Image Analysis")
    mongo_service = MongoService()  # Ensure MongoService is initialized with the correct parameters
    image_data_base64 = mongo_service.get_last_screenshot(session_id, offset=1)
    if not image_data_base64:
        logging.error("No image data found")
        return

    try:
        image_data = base64.b64decode(image_data_base64)
    except Exception as e:
        logging.error(f"Failed to decode base64 image data: {e}")
        return

    logging.info("Image received and decoded successfully")

    # Validate the image data directly
    try:
        image = Image.open(io.BytesIO(image_data))
        image.verify()  # Verify the image integrity
        logging.info("Image data verified successfully.")
    except Exception as e:
        logging.error(f"Image data is invalid: {e}")
        return

    coordinates = await get_bounding_box_coordinates(text, image_data)
    if coordinates:
        x, y = coordinates[0]  # Assuming the first point is the top-left corner
        loop = asyncio.get_event_loop()
        await loop.run_in_executor(None, pyautogui.moveTo, x, y, 2)
        await loop.run_in_executor(None, pyautogui.click)
    else:
        logging.error("Text not found in the image")

async def get_bounding_box_coordinates(text: str, image_data: bytes):
    endpoint = settings.vision_endpoint
    key = settings.vision_key
    client = ImageAnalysisClient(endpoint=endpoint, credential=AzureKeyCredential(key))
    
    try:
        image_stream = io.BytesIO(image_data)
        result = client.analyze(image_data=image_stream.read(), visual_features=[VisualFeatures.READ])
    except Exception as e:
        logging.error(f"Image analysis failed: {e}")
        return None

    if result.read and result.read.blocks:
        for block in result.read.blocks:
            for line in block.lines:
                if text in line.text:
                    return line.bounding_polygon
    return None

The image always passes verification, but I keep getting errors like this:

2024-06-10 10:54:54,655 - root - INFO - Starting Image Analysis
2024-06-10 10:54:55,928 - root - INFO - Image received and decoded successfully
2024-06-10 10:54:55,929 - root - INFO - Image data verified successfully.
2024-06-10 10:54:55,931 - azure.core.pipeline.policies.http_logging_policy - INFO - Request URL: 'https://ocr-matrix.cognitiveservices.azure.com//computervision/imageanalysis:analyze?api-version=REDACTED&features=REDACTED'
Request method: 'POST'
Request headers:
    'Content-Length': '547998'
    'content-type': 'application/octet-stream'
    'Accept': 'application/json'
    'x-ms-client-request-id': 'c5c95c7c-26e9-11ef-a236-0242560eba69'
    'User-Agent': 'azsdk-python-ai-vision-imageanalysis/1.0.0b2 Python/3.11.7 (Linux-6.5.0-35-generic-x86_64-with-glibc2.35)'
    'Ocp-Apim-Subscription-Key': 'REDACTED'
A body is sent with the request
2024-06-10 10:54:59,099 - azure.core.pipeline.policies.http_logging_policy - INFO - Response status: 200
Response headers:
    'Content-Length': '55770'
    'Content-Type': 'application/json; charset=utf-8'
    'request-id': '65f22858-bbe1-4001-844f-d8810a2ce719'
    'ms-vision-response-time-input-received': 'REDACTED'
    'ms-vision-response-time-input-processed': 'REDACTED'
    'api-supported-versions': 'REDACTED'
    'x-envoy-upstream-service-time': 'REDACTED'
    'CSP-Billing-Usage': 'REDACTED'
    'apim-request-id': 'REDACTED'
    'Strict-Transport-Security': 'REDACTED'
    'x-content-type-options': 'REDACTED'
    'x-ms-region': 'REDACTED'
    'Date': 'Mon, 10 Jun 2024 05:24:58 GMT'
[ WARN:…] global /croot/opencv-suite_1691620365762/work/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('x'): can't open/read file: check file path/integrity
2024-06-10 10:55:00,136 - root - ERROR - Error occurred: Failed to read x because file is missing, has improper permissions, or is an unsupported or invalid format

I tried saving the image from the database directly to my system to check whether the file was corrupted, but it isn't: I can open and view it.

azure python-imaging-library ocr pyautogui
1 Answer

There is a problem with how the image data is handled before it is passed to the Azure Image Analysis API.

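I referred to this MS doc. To reprocess the image file sample.jpg, make sure it is in one of the supported formats: JPEG, PNG, GIF, BMP, WEBP, ICO, TIFF, or MPO. In this example, the image file sample.jpg exists at the specified path.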
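As a pre-flight check, you can confirm that the bytes you are about to send actually start with the signature of one of those supported formats. The sketch below assumes you only have raw bytes (as in the question, where the screenshot comes out of a database as base64). The helper name sniff_image_format and its signature table are illustrative, not part of the Azure SDK, and inspecting only the leading "magic" bytes does not prove the rest of the file is valid. Since the question already uses Pillow, Image.open(...).format is an alternative way to get the same information.

```python
from typing import Optional

# Hypothetical helper: leading-byte signatures for the supported formats.
# MPO files are JPEG-based, so they are reported as "JPEG" here.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"BM", "BMP"),
    (b"\x00\x00\x01\x00", "ICO"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
]


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return a format name if `data` starts with a known signature, else None."""
    # WEBP is a RIFF container: b"RIFF" + 4-byte size + b"WEBP"
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "WEBP"
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None
```

If sniff_image_format(image_data) returns None, it is better to stop and log the problem locally than to send the bytes to the service.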

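When loading the image file into the bytes object image_data, make sure you read it in binary mode ("rb"). Wrap the read in a try-except block to catch file-related exceptions such as FileNotFoundError or PermissionError.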
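The advice above can be sketched as a small loader. The name load_image_bytes is hypothetical, and returning None on failure (rather than re-raising) is just one design choice; it mirrors the early-return style of the question's code.

```python
import logging
from typing import Optional


def load_image_bytes(path: str) -> Optional[bytes]:
    """Read an image file as raw bytes, or return None if it cannot be read."""
    try:
        # "rb" opens the file in binary mode, so the bytes are returned unchanged
        with open(path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        logging.error("Image file not found: %s", path)
    except PermissionError:
        logging.error("No permission to read image file: %s", path)
    return None
```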

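The following code uses the Azure Cognitive Services Computer Vision API to run optical character recognition (OCR) on an image, and then uses pyautogui to interact with the desktop based on the OCR result.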

import os
import sys
import logging
import pyautogui
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential


# Route Azure SDK log output (including HTTP request/response logging) to stdout
logger = logging.getLogger("azure")
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(stream=sys.stdout)
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
handler.setFormatter(formatter)
logger.addHandler(handler)


# Read the endpoint and key from environment variables, e.g.
# VISION_ENDPOINT=https://Computervisionname.cognitiveservices.azure.com
try:
    endpoint = os.environ["VISION_ENDPOINT"]
    key = os.environ["VISION_KEY"]
except KeyError:
    print("Missing environment variable 'VISION_ENDPOINT' or 'VISION_KEY'")
    sys.exit(1)


client = ImageAnalysisClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(key),
    logging_enable=True  # log HTTP request/response details
)


def analyze_image(image_path):
    try:
        # Read the image file as raw bytes (binary mode)
        with open(image_path, "rb") as f:
            image_data = f.read()

        # Run OCR (the READ visual feature) on the image bytes
        result = client.analyze(
            image_data=image_data,
            visual_features=[VisualFeatures.READ]
        )

        # Walk every detected block and line; blocks is empty if no text was found
        if result.read is not None:
            for block in result.read.blocks:
                for line in block.lines:
                    word = line.words[0].text
                    # (100, 100) is a fixed placeholder position, not taken from the OCR result
                    pyautogui.click(100, 100)
                    print(f"Clicked on '{word}' at coordinates (100, 100)")

    except (FileNotFoundError, PermissionError) as e:
        logger.error(f"Could not read image file: {e}")
    except Exception as e:
        logger.error(f"Error occurred: {e}")


if __name__ == "__main__":
    image_path = "C://Users//hello.png"
    logger.info("Starting Image Analysis")
    logger.info("Analyzing image...")
    analyze_image(image_path)
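One more thing worth checking in the question's code is the line x, y = coordinates[0]. line.bounding_polygon is a list of ImagePoint objects, and in the azure-ai-vision-imageanalysis package these models appear to behave like mappings, so unpacking one iterates over its keys and yields the strings 'x' and 'y' rather than numbers. pyautogui treats a string passed to moveTo as an image filename to locate on screen, which would be consistent with the OpenCV imread_('x') warning and the "Failed to read x" message in the log. The sketch below reads the .x/.y attributes explicitly and returns the polygon's center. Point is a hypothetical stand-in for ImagePoint so the snippet runs without the SDK; only its .x and .y attributes are assumed.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Point:
    """Hypothetical stand-in for ImagePoint: only .x and .y are used."""
    x: int
    y: int


def polygon_center(polygon: Sequence[Point]) -> Tuple[int, int]:
    """Return the integer center (vertex average) of a bounding polygon."""
    if not polygon:
        raise ValueError("empty bounding polygon")
    # Read coordinates via attributes instead of `x, y = point`, which on a
    # mapping-like model would unpack the key names rather than the values.
    cx = sum(p.x for p in polygon) // len(polygon)
    cy = sum(p.y for p in polygon) // len(polygon)
    return cx, cy
```

In the question's move_mouse_and_click, this would replace the unpacking with x, y = polygon_center(coordinates) before calling pyautogui.moveTo. This assumes the screenshot was not resized, so that its pixel coordinates match screen coordinates.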


© www.soinside.com 2019 - 2024. All rights reserved.