Understanding an unusual YOLO label format and its effect on training

Question

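I am working with a dataset of stationery objects. The data is split into train, test, and valid folders, each with its own images and labels. The labels are text files in the following format: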

2 0.3832013609375 0 0 0.19411217812499998 0 0.614612228125 0.1995640296875 1 0.619265075 1 1 0.8055533171875 1 0.386728209375 0.798922646875 0 0.3832013609375 0

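This confuses me, because I expected only 5 numbers per bounding box:

class_id, x_center, y_center, width, height.

Here I see many more numbers than that. Does this format represent something else? Are there other variants of the YOLO label format that I am not aware of?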

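Additional context

The data comes from this website (the Roboflow Universe project referenced in the data.yaml below), but I could not find any clear documentation of this label format.

This is the part I don't understand: when I pass this dataset to YOLO for training with the following code, training runs without any problems: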

import os

from ultralytics import YOLO

# weights_folder and data_yaml are defined elsewhere in the script


def train_yolo(weight_name):
    weight_path = os.path.join(weights_folder, weight_name)

    model = YOLO(weight_path)

    # Train model and save new weights
    results = model.train(data=data_yaml, epochs=100, imgsz=640, batch=16, name=f"yolo_{weight_name.split('.')[0]}", save=True)

    return results

My data.yaml file contains:

train: ../train/images
val: ../valid/images
test: ../test/images

nc: 4
names: ['pencil', 'rubber', 'ruler', 'sharpner']

roboflow:
  workspace: waqas-hussain
  project: stationary-object-detector
  version: 8
  license: CC BY 4.0
  url: https://universe.roboflow.com/waqas-hussain/stationary-object-detector/dataset/8

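Nothing in this YAML file refers directly to the bounding box format, yet YOLO processes the data during training without complaint.

Questions:

How does YOLO handle this unusual label format?
Could my training be incorrect because of this odd bounding box format?
Is there a way to confirm what this format represents and how YOLO parses it?

Any insights or pointers would be greatly appreciated!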

Answer 1

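From the images on the website, I can see that some of the annotations are not bounding boxes. They are polygons. A common way to encode a polygon is as a list of x/y pairs.

So my guess is that the format is

class_id x1 y1 x2 y2 x3 y3

and so on.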
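One way to test this guess on any label line is to count the fields: 5 fields match the usual box layout, while a class id followed by an even number of values (at least three pairs) is consistent with a polygon. The following is a minimal sketch; `parse_label_line` is a hypothetical helper written for this answer, not part of any YOLO library, and the sample line is the one from the question.

```python
import numpy as np


def parse_label_line(line):
    """Split one label line into (class_id, kind, values).

    kind is "box" for class_id + 4 values (x_center, y_center, width, height)
    and "polygon" for class_id + three or more x/y pairs.
    Hypothetical helper for illustration only.
    """
    fields = line.split()
    class_id = int(fields[0])
    coords = np.array([float(s) for s in fields[1:]])
    if coords.size == 4:
        return class_id, "box", coords
    if coords.size >= 6 and coords.size % 2 == 0:
        # One row per vertex: column 0 is x, column 1 is y
        return class_id, "polygon", coords.reshape(-1, 2)
    raise ValueError(f"unexpected number of fields: {len(fields)}")


# The label line quoted in the question
question_line = (
    "2 0.3832013609375 0 0 0.19411217812499998 0 0.614612228125 "
    "0.1995640296875 1 0.619265075 1 1 0.8055533171875 1 0.386728209375 "
    "0.798922646875 0 0.3832013609375 0"
)
class_id, kind, points = parse_label_line(question_line)
print(class_id, kind, points.shape)
```

For the line in the question this gives nine x/y pairs, and the first and last pair are identical, which is what you would expect from a closed polygon.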

To check this, I downloaded one of the images and its associated label file. (Specifically, CamScanner-10-15-2023-14-29_86_jpg.rf.1042acb34a88542b82bbefa27b86569e.jpg.) I wrote a program to parse this label and plot it.

Code:

import numpy as np
import matplotlib.pyplot as plt


label_text = """1 0.3855721390625 0.17391304375 0.26533996718749997 0.1273291921875 0.10779436093749999 0.273291925 0.25290215625 0.3354037265625 0.3855721390625 0.17391304375
0 0.9618573796875 0.381987578125 0.8872305140625001 0.3540372671875 0.327529021875 0.9782608703125 0.45190713125000004 1 0.9618573796875 0.381987578125
2 0.970149253125 0.034161490625 0.8084577109375 0 0.0165837484375 0.9254658390625 0.0414593703125 0.9937888203125 0.178275290625 1 0.970149253125 0.034161490625"""


lines = label_text.split('\n')
for line in lines:
    line = line.split(' ')
    class_id = line[0]
    # Everything after the class id is an alternating x, y, x, y, ... sequence
    label_without_id = np.array([float(s) for s in line[1:]])
    label_x = label_without_id[::2]
    label_y = label_without_id[1::2]
    plt.plot(label_x, label_y, label=class_id)
# The convention when working with image coordinates is that Y-axis gets bigger as you move down the image.
# Invert once, after the loop: invert_yaxis() toggles the axis, so calling it per label would flip it back on every second call.
plt.gca().invert_yaxis()
plt.legend()
plt.show()

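Output:

Given the input, this looks quite reasonable. The aspect ratio is wrong, but they presumably expect you to rescale the x/y coordinates by the image width/height. You can also compare it with the labels shown for this image on Roboflow.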
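To make the rescaling concrete, and to relate the polygon back to the 5-number box the question expected, here is a rough sketch. It multiplies the normalized vertices by the image size and takes the polygon's enclosing box in class_id-less x_center, y_center, width, height form. Both function names are hypothetical, and the 640x480 image size is an assumption for illustration; read the real size from the image file. My understanding is that a detection trainer can reduce polygon labels to their enclosing boxes in roughly this way, which would explain why training runs, but that is an assumption to verify against the version of the library you use.

```python
import numpy as np


def polygon_to_pixels(points, img_w, img_h):
    """Scale normalized (x, y) vertices to pixel coordinates."""
    return points * np.array([img_w, img_h], dtype=float)


def polygon_to_yolo_box(points):
    """Return the enclosing box of a polygon as
    (x_center, y_center, width, height), in the same units as the input."""
    x_min, y_min = points.min(axis=0)
    x_max, y_max = points.max(axis=0)
    return np.array([
        (x_min + x_max) / 2,
        (y_min + y_max) / 2,
        x_max - x_min,
        y_max - y_min,
    ])


# Class-1 polygon from the label file above, as (x, y) rows
polygon = np.array([
    [0.3855721390625, 0.17391304375],
    [0.26533996718749997, 0.1273291921875],
    [0.10779436093749999, 0.273291925],
    [0.25290215625, 0.3354037265625],
    [0.3855721390625, 0.17391304375],
])

# Assumed image size, for illustration only
img_w, img_h = 640, 480

pixel_polygon = polygon_to_pixels(polygon, img_w, img_h)
normalized_box = polygon_to_yolo_box(polygon)
print(pixel_polygon)
print(normalized_box)
```

Plotting `pixel_polygon` instead of the normalized values gives the correct aspect ratio, and `normalized_box` is what the same object would look like as an ordinary 4-value YOLO box.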
