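I have been exploring AWS Rekognition and Google's Vision to get a count of the objects in an image/video, but have not been able to find a way. On Google's Vision website, though, there is an "Insight from images" section where a count does appear to be captured.
Can someone suggest whether Google's Vision or any other API can help get the count of objects in an image? Thanks.
For example, for the image shown below, the count returned should be 10 cars. As Torry Yang suggested in his answer, the number of label annotations could give the required count, but that does not seem to be the case: the number of label annotations is 18. The returned object looks something like this.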
"labelAnnotations": [
    {
        "mid": "/m/0k4j",
        "description": "car",
        "score": 0.98658943,
        "topicality": 0.98658943
    },
    {
        "mid": "/m/012f08",
        "description": "motor vehicle",
        "score": 0.9631113,
        "topicality": 0.9631113
    },
    {
        "mid": "/m/07yv9",
        "description": "vehicle",
        "score": 0.9223521,
        "topicality": 0.9223521
    },
    {
        "mid": "/m/01w71f",
        "description": "personal luxury car",
        "score": 0.8976857,
        "topicality": 0.8976857
    },
    {
        "mid": "/m/068mqj",
        "description": "automotive design",
        "score": 0.8736646,
        "topicality": 0.8736646
    },
    {
        "mid": "/m/012mq4",
        "description": "sports car",
        "score": 0.8418799,
        "topicality": 0.8418799
    },
    {
        "mid": "/m/01lcwm",
        "description": "luxury vehicle",
        "score": 0.7761523,
        "topicality": 0.7761523
    },
    {
        "mid": "/m/06j11d",
        "description": "performance car",
        "score": 0.76816446,
        "topicality": 0.76816446
    },
    {
        "mid": "/m/03vnt4",
        "description": "mid size car",
        "score": 0.75732976,
        "topicality": 0.75732976
    },
    {
        "mid": "/m/03vntj",
        "description": "full size car",
        "score": 0.6855145,
        "topicality": 0.6855145
    },
    {
        "mid": "/m/0h8ls87",
        "description": "automotive exterior",
        "score": 0.66056395,
        "topicality": 0.66056395
    },
    {
        "mid": "/m/014f__",
        "description": "supercar",
        "score": 0.592226,
        "topicality": 0.592226
    },
    {
        "mid": "/m/02swz_",
        "description": "compact car",
        "score": 0.5807265,
        "topicality": 0.5807265
    },
    {
        "mid": "/m/0h6dlrc",
        "description": "bmw",
        "score": 0.5801241,
        "topicality": 0.5801241
    },
    {
        "mid": "/m/01h80k",
        "description": "muscle car",
        "score": 0.55745816,
        "topicality": 0.55745816
    },
    {
        "mid": "/m/021mp2",
        "description": "sedan",
        "score": 0.5522745,
        "topicality": 0.5522745
    },
    {
        "mid": "/m/0369ss",
        "description": "city car",
        "score": 0.52938646,
        "topicality": 0.52938646
    },
    {
        "mid": "/m/01d1dj",
        "description": "coupé",
        "score": 0.50642073,
        "topicality": 0.50642073
    }
]
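A rough sketch of why the number of label annotations does not track the number of cars. It parses an abbreviated copy of the response above (only the first three labels, with `topicality` dropped for brevity) and lists the descriptions. The names `response_text` and `label_descriptions` are made up for this illustration and are not part of any Google client library.

```python
import json

# Abbreviated copy of the response above: first three labels only.
response_text = """
{
  "labelAnnotations": [
    {"mid": "/m/0k4j", "description": "car", "score": 0.98658943},
    {"mid": "/m/012f08", "description": "motor vehicle", "score": 0.9631113},
    {"mid": "/m/07yv9", "description": "vehicle", "score": 0.9223521}
  ]
}
"""


def label_descriptions(text):
    """Return the description of every label annotation in a JSON response."""
    labels = json.loads(text).get("labelAnnotations", [])
    return [label["description"] for label in labels]


descriptions = label_descriptions(response_text)
# Each entry is a distinct category that applies to the image as a whole,
# so the list length is the number of categories, not the number of cars.
print(len(descriptions), descriptions)
```

In the full response every description is likewise different, which suggests that the 18 is a count of categories assigned to the picture rather than of objects in it.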
On Google Cloud Vision, you should be able to get a count. For example, if you want to count the number of faces with Python, you can do this:
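import io
from google.cloud import vision

def detect_faces(path):
    """Detects faces in an image and returns how many were found."""
    client = vision.ImageAnnotatorClient()
    with io.open(path, 'rb') as image_file:
        content = image_file.read()
    # In google-cloud-vision 2.0 and later this is vision.Image(content=content).
    image = vision.types.Image(content=content)
    response = client.face_detection(image=image)
    faces = response.face_annotations
    # One annotation is returned per detected face.
    return len(faces)

If you want to count labels, you could do something like:

def detect_labels(path):
    """Detects labels in the file and tallies them by description."""
    client = vision.ImageAnnotatorClient()
    with io.open(path, 'rb') as image_file:
        content = image_file.read()
    # In google-cloud-vision 2.0 and later this is vision.Image(content=content).
    image = vision.types.Image(content=content)
    response = client.label_detection(image=image)
    labels = response.label_annotations
    # Key on the description string rather than on the annotation object.
    count = {}
    for label in labels:
        if label.description in count:
            count[label.description] += 1
        else:
            count[label.description] = 1
    return count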
Neither Google Vision nor AWS Rekognition supports counting the objects in a photo.
However, you can count the number of faces in an image with both Vision and Rekognition.
In AWS Rekognition, the DetectFaces API returns a JSON response:
HTTP/1.1 200 OK
Content-Type: application/x-amz-json-1.1
Date: Wed, 04 Jan 2017 23:37:03 GMT
x-amzn-RequestId: b1827570-d2d6-11e6-a51e-73b99a9bb0b9
Content-Length: 1355
Connection: keep-alive
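The response body is not shown above. As a minimal sketch, assuming the body has already been parsed into a Python dict, the face count is the length of the `FaceDetails` list that DetectFaces returns. The function name `count_faces` and the trimmed `sample_response` (including its numbers) are hypothetical and only there to make the example runnable.

```python
def count_faces(response):
    """Count the faces in a parsed DetectFaces response (a dict)."""
    # DetectFaces lists one entry per detected face under "FaceDetails".
    return len(response.get("FaceDetails", []))


# Hypothetical, heavily trimmed response body describing two faces.
sample_response = {
    "FaceDetails": [
        {
            "Confidence": 99.9,
            "BoundingBox": {"Width": 0.2, "Height": 0.3, "Left": 0.1, "Top": 0.1},
        },
        {
            "Confidence": 98.7,
            "BoundingBox": {"Width": 0.2, "Height": 0.3, "Left": 0.6, "Top": 0.2},
        },
    ]
}

print(count_faces(sample_response))  # prints 2
```

With boto3, a dict of this shape would come from `boto3.client("rekognition").detect_faces(Image={"Bytes": image_bytes})`, where `image_bytes` holds the raw image file contents.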
Additionally, if you want to count the objects in a photo, you can set up a custom machine learning model on AWS SageMaker to do so. Example: https://github.com/cosmincatalin/object-counting-with-mxnet-and-sagemaker