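I trained a PyTorch ResNet-50 Faster R-CNN (FPN v2) model in Python and exported it to ONNX format. I need to load the model and run predictions from C#, and I have written some test code to do this.
The model's input image size appears to be 576x720, even though it was trained on 720x576 images. That is not a big problem, because I can easily resize the images, run the prediction, and then resize them back. I don't know why this happened during training; it may be related to my problem.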
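One possible source of the 576x720 versus 720x576 confusion, offered as an assumption rather than a diagnosis: image libraries such as PIL report size as (width, height), while the model input is laid out as (batch, channels, height, width). A minimal sketch of that convention, where `nchw_shape` is a hypothetical helper used only for illustration:

```python
def nchw_shape(width, height, channels=3, batch=1):
    """Return the (N, C, H, W) tensor shape for an image of the given pixel size."""
    return (batch, channels, height, width)

# A 720x576 (width x height) image becomes a tensor whose last two dims are 576, 720.
landscape = nchw_shape(width=720, height=576)

# resize((576, 720)) in PIL yields width=576, height=720, so the dims end in 720, 576.
portrait = nchw_shape(width=576, height=720)

print(landscape)
print(portrait)
```

If the exported model really expects a tensor ending in 720, 576, then it wants an image that is 576 wide and 720 tall, which matches the resize call in the Python code further down.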
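The results I get from C# are not good. No objects are detected in my images at all, whereas the same model works fine from my Python code. I noticed that in C#, most of the time the ONNX output reports that it attempted to divide by zero, and when I don't get that error, the objects it detects are just garbage.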
-resultsArray {Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue[3]} Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue[]
    [0] {Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue} Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue
    [1] {Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue} Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue
        ElementType Int64 Microsoft.ML.OnnxRuntime.Tensors.TensorElementType
        Name "2546" string
        Value {"Attempted to divide by zero."} object {Microsoft.ML.OnnxRuntime.Tensors.DenseTensor<long>}
        ValueType ONNX_TYPE_TENSOR Microsoft.ML.OnnxRuntime.OnnxValueType
        _disposed false bool
        _mapHelper null Microsoft.ML.OnnxRuntime.MapHelper
        _name "2546" string
        _ortValueHolder {Microsoft.ML.OnnxRuntime.OrtValueTensor<long>} Microsoft.ML.OnnxRuntime.IOrtValueOwner {Microsoft.ML.OnnxRuntime.OrtValueTensor<long>}
        _value {"Attempted to divide by zero."} object {Microsoft.ML.OnnxRuntime.Tensors.DenseTensor<long>}
    [2] {Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue} Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue
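Here is my C# code so far. I believe the problem lies in how the image is processed and how the tensor is set up, but I don't know enough about it to be sure.
The model was trained without any explicit normalization, just raw RGB images. Even so, my mean validation IoU is close to 95%.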
private void cmdAnalyse_Click(object sender, EventArgs e)
{
    // begin analysis
    if (this.txtONNXFile.Text == "")
    {
        MessageBox.Show("Please select an ONNX file");
        return;
    }
    if (this.originalImage == null)
    {
        MessageBox.Show("Please select an image");
        return;
    }

    // flip the width and height dimensions. Images are 720x576, but the model expects 576x720
    this.rescaledImage = new Bitmap(originalImage.Height, originalImage.Width);
    Graphics graphics = Graphics.FromImage(rescaledImage);
    graphics.InterpolationMode = System.Drawing.Drawing2D.InterpolationMode.HighQualityBicubic;
    graphics.DrawImage(originalImage, 0, 0, rescaledImage.Width, rescaledImage.Height);

    Microsoft.ML.OnnxRuntime.Tensors.Tensor<float> input = new Microsoft.ML.OnnxRuntime.Tensors.DenseTensor<float>(new[] { 1, 3, 720, 576 });
    BitmapData bitmapData = rescaledImage.LockBits(new System.Drawing.Rectangle(0, 0, rescaledImage.Width, rescaledImage.Height), ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);
    int stride = bitmapData.Stride;
    IntPtr scan0 = bitmapData.Scan0;
    unsafe
    {
        byte* ptr = (byte*)scan0;
        for (int y = 0; y < rescaledImage.Height; y++)
        {
            for (int x = 0; x < rescaledImage.Width; x++)
            {
                int offset = y * stride + x * 3;
                input[0, 0, y, x] = ptr[offset + 2]; // Red channel
                input[0, 1, y, x] = ptr[offset + 1]; // Green channel
                input[0, 2, y, x] = ptr[offset]; // Blue channel
            }
        }
    }
    rescaledImage.UnlockBits(bitmapData);

    var inputs = new List<Microsoft.ML.OnnxRuntime.NamedOnnxValue>
    {
        Microsoft.ML.OnnxRuntime.NamedOnnxValue.CreateFromTensor("images", input)
    };

    // run inference
    var session = new Microsoft.ML.OnnxRuntime.InferenceSession(this.txtONNXFile.Text);
    Microsoft.ML.OnnxRuntime.IDisposableReadOnlyCollection<Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue> results = session.Run(inputs);

    // process results
    var resultsArray = results.ToArray();
    float[] boxes = resultsArray[0].AsEnumerable<float>().ToArray();
    long[] labels = resultsArray[1].AsEnumerable<long>().ToArray();
    float[] confidences = resultsArray[2].AsEnumerable<float>().ToArray();
    var predictions = new List<Prediction>();
    var minConfidence = 0.0f;
    for (int i = 0; i < boxes.Length; i += 4)
    {
        var index = i / 4;
        if (confidences[index] >= minConfidence)
        {
            predictions.Add(new Prediction
            {
                Box = new Box(boxes[i], boxes[i + 1], boxes[i + 2], boxes[i + 3]),
                Label = LabelMap.Labels[labels[index]],
                Confidence = confidences[index]
            });
        }
    }

    System.Drawing.Graphics graph = System.Drawing.Graphics.FromImage(this.rescaledImage);
    // Put boxes, labels and confidence on image and save for viewing
    foreach (var p in predictions)
    {
        System.Drawing.Pen pen = new System.Drawing.Pen(System.Drawing.Color.Red, 2);
        graph.DrawRectangle(pen, p.Box.Xmin, p.Box.Ymin, p.Box.Xmax - p.Box.Xmin, p.Box.Ymax - p.Box.Ymin);
    }
    graph.Flush();
    graph.Dispose();

    // rescale image back
    System.Drawing.Bitmap bmpResult = new Bitmap(this.originalImage.Width, this.originalImage.Height);
    graphics = Graphics.FromImage(bmpResult);
    graphics.InterpolationMode = System.Drawing.Drawing2D.InterpolationMode.HighQualityBicubic;
    graphics.DrawImage(rescaledImage, 0, 0, originalImage.Width, originalImage.Height);
    graphics.Flush();
    graphics.Dispose();
    //graph.ScaleTransform(720, 576);
    this.pbRibeye.Width = bmpResult.Width;
    this.pbRibeye.Height = bmpResult.Height;
    this.pbRibeye.Image = bmpResult;
    //bmpResult.Dispose();
    rescaledImage.Dispose();
}
The Python code that works is:
import onnxruntime
import torchvision
from PIL import Image

ort_session = onnxruntime.InferenceSession(ONNXFile)
# Preprocess the input image
image = Image.open(image_path) # Load the image using PIL
resized_image = image.resize((576, 720)) # If this is omitted then I receive an error regarding the expected input dimensions
transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(), # Convert PIL image to tensor
])
input_tensor = transform(resized_image)
input_tensor = input_tensor.unsqueeze(0) # Add a batch dimension
# Run the model
outputs = ort_session.run(None, {'images': input_tensor.numpy()})
The Python code produces valid output that is consistent with the results I obtained during validation.
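For comparison with the post-processing loop in the C# code, here is a rough Python sketch of the same step. It assumes the outputs arrive in the order boxes, labels, scores, with the boxes flattened as [x1, y1, x2, y2, ...], which is how the C# code reads them. `to_predictions` and its scale parameters are hypothetical names for illustration; scaling the box coordinates is shown as one alternative to redrawing the resized bitmap.

```python
def to_predictions(boxes, labels, scores, min_score, scale_x=1.0, scale_y=1.0):
    """Group a flat [x1, y1, x2, y2, ...] list into per-detection dicts,
    drop low-scoring detections, and scale coordinates to another image size."""
    predictions = []
    for i, (label, score) in enumerate(zip(labels, scores)):
        if score < min_score:
            continue
        x1, y1, x2, y2 = boxes[4 * i: 4 * i + 4]
        predictions.append({
            "box": (x1 * scale_x, y1 * scale_y, x2 * scale_x, y2 * scale_y),
            "label": label,
            "score": score,
        })
    return predictions

# Model input was 576x720 (width x height); the original image is 720x576.
sx, sy = 720 / 576, 576 / 720
preds = to_predictions(
    boxes=[0, 0, 288, 360, 10, 10, 20, 20],
    labels=[1, 2],
    scores=[0.9, 0.1],
    min_score=0.5,
    scale_x=sx,
    scale_y=sy,
)
print(preds)
```

With the made-up values above, only the first detection survives the 0.5 threshold, and its box is mapped from the 576x720 input back onto the 720x576 original.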
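It turns out the code was fine all along.
I stumbled on the solution by accident: even though the model was trained without any image normalization or preprocessing, it evidently needs the image to be normalized in order to produce predictions now!
The updated section of the code is: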
unsafe
{
    byte* ptr = (byte*)scan0;
    for (int y = 0; y < originalImage.Height; y++)
    {
        for (int x = 0; x < originalImage.Width; x++)
        {
            int offset = y * stride + x * 3;
            input[0, 0, y, x] = ptr[offset + 2] / 255.0f; // Red channel
            input[0, 1, y, x] = ptr[offset + 1] / 255.0f; // Green channel
            input[0, 2, y, x] = ptr[offset] / 255.0f; // Blue channel
        }
    }
}
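This normalizes each pixel as it is copied into the tensor. The bitmap stores each pixel's bytes in BGR order, so the offsets are swapped to write the channels as red, green, blue.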
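A likely explanation for why scaling is needed, stated as an assumption about the training pipeline: torchvision's `ToTensor`, which the working Python code uses, already converts 8-bit pixel values from 0..255 to floats in 0.0..1.0. If training used it as well, the model never saw raw 0..255 values. The sketch below mimics the C# loop in plain Python on a tiny made-up buffer; `bgr_to_planar_rgb` is a hypothetical helper, not part of any library.

```python
def bgr_to_planar_rgb(buf, width, height, stride):
    """Convert packed 24bpp BGR rows (each padded to `stride` bytes) into a flat
    planar list laid out as [R plane, G plane, B plane], scaled to 0.0..1.0."""
    plane = width * height
    out = [0.0] * (3 * plane)
    for y in range(height):
        for x in range(width):
            offset = y * stride + x * 3
            idx = y * width + x
            out[idx] = buf[offset + 2] / 255.0              # red
            out[plane + idx] = buf[offset + 1] / 255.0      # green
            out[2 * plane + idx] = buf[offset] / 255.0      # blue
    return out

# 2x1 test image: one pure-blue pixel, then one pure-red pixel.
# The row holds 6 bytes of pixel data plus 2 padding bytes (stride = 8).
buf = bytes([255, 0, 0,   0, 0, 255,   0, 0])
planar = bgr_to_planar_rgb(buf, width=2, height=1, stride=8)
print(planar)
```

The stride is used for the row offset rather than `width * 3` because bitmap rows are typically padded, which is also why the C# code reads `bitmapData.Stride`.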
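The reason I can now use originalImage is that I was able to remove all the extra bitmaps from the code: I realized that DenseTensor seems to be just a bitmap, so the size/orientation of the image doesn't actually affect the result, as long as the tensor has the correct length. At least that seems to be the case in C#; Python seems to care that the dimensions are correct.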
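To illustrate what "just a bitmap" could mean here: a dense tensor is a flat buffer plus a shape. The sketch below assumes row-major layout (which I believe is the default for DenseTensor, but treat that as an assumption) and uses a hypothetical `flat_index` helper. It suggests that two shapes with the same total length still place the same [y, x] element at different offsets, so it may be worth checking that the dimensions passed to the tensor match the loop bounds.

```python
def flat_index(shape, index):
    """Row-major offset of `index` within a tensor of the given `shape`."""
    offset = 0
    for dim, i in zip(shape, index):
        offset = offset * dim + i
    return offset

shape_a = (1, 3, 720, 576)   # height 720, width 576
shape_b = (1, 3, 576, 720)   # height 576, width 720

# Both shapes hold the same number of elements...
length_a = 1 * 3 * 720 * 576
length_b = 1 * 3 * 576 * 720

# ...but the first pixel of the second row lands at a different offset.
row1_a = flat_index(shape_a, (0, 0, 1, 0))
row1_b = flat_index(shape_b, (0, 0, 1, 0))
print(length_a == length_b, row1_a, row1_b)
```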