ONNX model returns no predictions in C#

I trained a PyTorch ResNet50 Faster R-CNN (FPN V2) model in Python and exported it to ONNX format. I need to load the model from C# and run predictions with it, and I have written some test code to do that.

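The model's input image size appears to be 576x720, even though it was trained on 720x576 images. This is not a big problem, because I can easily resize the images, run the prediction, and then resize them back. I don't know why this happened during training, but it may be related to my problem.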
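One possible source of the 576x720 versus 720x576 confusion is that image libraries usually report a size as (width, height), while an NCHW tensor is laid out as (batch, channels, height, width). The sketch below only illustrates that convention; `nchw_shape` is a hypothetical helper, not part of PIL, PyTorch, or ONNX Runtime.

```python
def nchw_shape(width, height, channels=3, batch=1):
    """Return the NCHW tensor shape for an image given as (width, height)."""
    return (batch, channels, height, width)


# A landscape frame: 720 pixels wide, 576 pixels high.
landscape = nchw_shape(width=720, height=576)

# A portrait frame: 576 wide, 720 high, i.e. what image.resize((576, 720))
# yields in PIL, where the size tuple is (width, height).
portrait = nchw_shape(width=576, height=720)

print(landscape)  # (1, 3, 576, 720)
print(portrait)   # (1, 3, 720, 576)
```

Under this convention, a model input of [1, 3, 720, 576] would correspond to an image 576 wide and 720 high.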

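The results I get from C# are poor. No objects are detected in my images at all, whereas the same model works fine from my Python code. I noticed that in C#, most of the time I get an error from ONNX saying it attempted to divide by zero, and when I don't get that error, the objects it detects are just garbage.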

-resultsArray {Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue[3]} Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue[]

  •   [0] {Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue} Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue
  •   [1] {Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue} Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue
      ElementType Int64   Microsoft.ML.OnnxRuntime.Tensors.TensorElementType
      Name    "2546"  string
  •   Value   {"Attempted to divide by zero."}    object {Microsoft.ML.OnnxRuntime.Tensors.DenseTensor<long>}
      ValueType   ONNX_TYPE_TENSOR    Microsoft.ML.OnnxRuntime.OnnxValueType
      _disposed   false   bool
      _mapHelper  null    Microsoft.ML.OnnxRuntime.MapHelper
      _name   "2546"  string
  •   _ortValueHolder {Microsoft.ML.OnnxRuntime.OrtValueTensor<long>} Microsoft.ML.OnnxRuntime.IOrtValueOwner {Microsoft.ML.OnnxRuntime.OrtValueTensor<long>}
  •   _value  {"Attempted to divide by zero."}    object {Microsoft.ML.OnnxRuntime.Tensors.DenseTensor<long>}
  •   [2] {Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue} Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue

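This is my C# code so far. I believe the problem lies in how the image is processed and how the tensor is set up, but I don't know enough about it to be sure.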

The model was trained without any explicit normalization, just raw RGB images. Even so, my mean validation IoU is close to 95%.

        private void cmdAnalyse_Click(object sender, EventArgs e)
        {
            // begin analysis
            if (this.txtONNXFile.Text == "")
            {
                MessageBox.Show("Please select an ONNX file");
                return;
            }

            if (this.originalImage == null)
            {
                MessageBox.Show("Please select an image");
                return;
            }

            // flip the width and height dimensions. Images are 720x576, but the model expects 576x720
            this.rescaledImage = new Bitmap(originalImage.Height, originalImage.Width);

            Graphics graphics = Graphics.FromImage(rescaledImage);
            graphics.InterpolationMode = System.Drawing.Drawing2D.InterpolationMode.HighQualityBicubic;
            graphics.DrawImage(originalImage, 0, 0, rescaledImage.Width, rescaledImage.Height);

            Microsoft.ML.OnnxRuntime.Tensors.Tensor<float> input = new Microsoft.ML.OnnxRuntime.Tensors.DenseTensor<float>(new[] { 1, 3, 720, 576 });

            BitmapData bitmapData = rescaledImage.LockBits(new System.Drawing.Rectangle(0, 0, rescaledImage.Width, rescaledImage.Height), ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);

            int stride = bitmapData.Stride;
            IntPtr scan0 = bitmapData.Scan0;

            unsafe
            {
                byte* ptr = (byte*)scan0;
                for (int y = 0; y < rescaledImage.Height; y++)
                {
                    for (int x = 0; x < rescaledImage.Width; x++)
                    {
                        // 24bpp bitmaps store each pixel as B, G, R
                        int offset = y * stride + x * 3;
                        input[0, 0, y, x] = ptr[offset + 2]; // Red channel
                        input[0, 1, y, x] = ptr[offset + 1]; // Green channel
                        input[0, 2, y, x] = ptr[offset];     // Blue channel
                    }
                }
            }

            // release the lock before drawing on the bitmap again
            rescaledImage.UnlockBits(bitmapData);

            var inputs = new List<Microsoft.ML.OnnxRuntime.NamedOnnxValue>
            {
                Microsoft.ML.OnnxRuntime.NamedOnnxValue.CreateFromTensor("images", input)
            };

            // run inference

            var session = new Microsoft.ML.OnnxRuntime.InferenceSession(this.txtONNXFile.Text);
            Microsoft.ML.OnnxRuntime.IDisposableReadOnlyCollection<Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue> results = session.Run(inputs);

            // process results
            var resultsArray = results.ToArray();

            float[] boxes = resultsArray[0].AsEnumerable<float>().ToArray();
            long[] labels = resultsArray[1].AsEnumerable<long>().ToArray();
            float[] confidences = resultsArray[2].AsEnumerable<float>().ToArray();
            var predictions = new List<Prediction>();
            var minConfidence = 0.0f;
            for (int i = 0; i < boxes.Length; i += 4)
            {
                // boxes is a flat array with four coordinates per detection
                var index = i / 4;
                if (confidences[index] >= minConfidence)
                {
                    predictions.Add(new Prediction
                    {
                        Box = new Box(boxes[i], boxes[i + 1], boxes[i + 2], boxes[i + 3]),
                        Label = LabelMap.Labels[labels[index]],
                        Confidence = confidences[index]
                    });
                }
            }

            System.Drawing.Graphics graph = System.Drawing.Graphics.FromImage(this.rescaledImage);

            // Put boxes, labels and confidence on image and save for viewing

            foreach (var p in predictions)
            {
                System.Drawing.Pen pen = new System.Drawing.Pen(System.Drawing.Color.Red, 2);

                graph.DrawRectangle(pen, p.Box.Xmin, p.Box.Ymin, p.Box.Xmax - p.Box.Xmin, p.Box.Ymax - p.Box.Ymin);
            }

            // rescale image back
            System.Drawing.Bitmap bmpResult = new Bitmap(this.originalImage.Width, this.originalImage.Height);

            graphics = Graphics.FromImage(bmpResult);
            graphics.InterpolationMode = System.Drawing.Drawing2D.InterpolationMode.HighQualityBicubic;
            graphics.DrawImage(rescaledImage, 0, 0, originalImage.Width, originalImage.Height);

            //graph.ScaleTransform(720, 576);
            this.pbRibeye.Width = bmpResult.Width;
            this.pbRibeye.Height = bmpResult.Height;

            this.pbRibeye.Image = bmpResult;
        }
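The post-processing step above reads the boxes as one flat array with four coordinates per detection and filters by confidence. The same idea can be sketched in Python as follows. This is only an illustration: `group_detections` is a hypothetical helper, the sample values are made up, and it assumes the outputs arrive in the order boxes, labels, scores, as the C# code reads them.

```python
def group_detections(boxes, labels, scores, min_score=0.5):
    """Group a flat [x1, y1, x2, y2, ...] list into per-detection records."""
    if len(boxes) != 4 * len(scores) or len(labels) != len(scores):
        raise ValueError("boxes, labels and scores have inconsistent lengths")
    detections = []
    for i, (label, score) in enumerate(zip(labels, scores)):
        if score >= min_score:
            x1, y1, x2, y2 = boxes[4 * i: 4 * i + 4]
            detections.append({"box": (x1, y1, x2, y2), "label": label, "score": score})
    return detections


# Made-up values: two detections, only the first passes the threshold.
example = group_detections(
    boxes=[10.0, 20.0, 110.0, 220.0, 5.0, 5.0, 50.0, 60.0],
    labels=[1, 2],
    scores=[0.9, 0.3],
)
print(example)
```

Checking the lengths up front makes an empty or mismatched output visible before any indexing happens.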



For comparison, this is the Python inference code:

    import onnxruntime
    import torchvision
    from PIL import Image

    ort_session = onnxruntime.InferenceSession(ONNXFile)

    # Preprocess the input image

    image = Image.open(image_path)  # Load the image using PIL
    resized_image = image.resize((576, 720))  # If this is omitted then I receive an error regarding the expected input dimensions

    transform = torchvision.transforms.Compose([
        torchvision.transforms.ToTensor(),  # Convert PIL image to tensor (scales pixel values to [0, 1])
    ])

    input_tensor = transform(resized_image)
    input_tensor = input_tensor.unsqueeze(0)  # Add a batch dimension

    # Run the model
    outputs = ort_session.run(None, {'images': input_tensor.numpy()})

The Python code produces valid output that is consistent with the results I obtained during validation.

    unsafe
    {
        byte* ptr = (byte*)scan0;
        for (int y = 0; y < originalImage.Height; y++)
        {
            for (int x = 0; x < originalImage.Width; x++)
            {
                int offset = y * stride + x * 3;
                input[0, 0, y, x] = ptr[offset + 2] / 255.0f; // Red channel
                input[0, 1, y, x] = ptr[offset + 1] / 255.0f; // Green channel
                input[0, 2, y, x] = ptr[offset] / 255.0f;     // Blue channel
            }
        }
    }

This normalizes each pixel to the range [0, 1] as it is copied from the bitmap, whose bytes are stored in BGR order, into the tensor.
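The per-pixel copy can be sketched in plain Python to make the byte layout explicit. This is a minimal illustration, not the C# implementation: `bgr_bytes_to_chw` is a hypothetical helper, and it assumes a 24-bit bitmap whose rows are `stride` bytes long (possibly with padding) and whose pixels are stored as blue, green, red.

```python
def bgr_bytes_to_chw(buf, width, height, stride):
    """Convert interleaved BGR bytes to planar RGB floats in [0, 1]."""
    plane = width * height
    out = [0.0] * (3 * plane)
    for y in range(height):
        for x in range(width):
            offset = y * stride + x * 3   # position of this pixel's blue byte
            idx = y * width + x           # position inside one channel plane
            out[0 * plane + idx] = buf[offset + 2] / 255.0  # red
            out[1 * plane + idx] = buf[offset + 1] / 255.0  # green
            out[2 * plane + idx] = buf[offset] / 255.0      # blue
    return out


# 2x1 image: a pure-blue pixel, then a pure-red pixel, then 2 padding bytes.
row = bytes([255, 0, 0, 0, 0, 255, 0, 0])
chw = bgr_bytes_to_chw(row, width=2, height=1, stride=8)
print(chw)  # [0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
```

Dividing by 255 mirrors what `ToTensor` does on the Python side, so both paths feed the model values on the same scale.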

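The reason I can now use originalImage is that I was able to remove all the extra bitmaps from the code. I realized that a DenseTensor appears to be just a flat buffer, much like a bitmap, so the size and orientation of the image don't actually seem to affect the results as long as the tensor has the correct length. At least, that appears to be the case in C#. Python does seem to care that the dimensions are correct.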
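A dense tensor can be pictured as one flat buffer plus a shape. The sketch below assumes a row-major layout, which is an assumption here and worth checking against the ONNX Runtime documentation, and `flat_offset` is a hypothetical helper. It shows how a multi-dimensional index maps to a position in that buffer for the two candidate shapes.

```python
def flat_offset(shape, index):
    """Row-major offset of `index` inside a tensor of the given `shape`."""
    offset = 0
    for dim, i in zip(shape, index):
        offset = offset * dim + i
    return offset


declared = (1, 3, 720, 576)  # N, C, H, W as declared in the C# code
swapped = (1, 3, 576, 720)   # same number of elements, rows of 720 instead

# The same pixel coordinates (y=1, x=0) map to different flat positions,
# because the row length differs between the two shapes.
as_declared = flat_offset(declared, (0, 0, 1, 0))
as_swapped = flat_offset(swapped, (0, 0, 1, 0))

print(as_declared, as_swapped)  # 576 720
```

Both shapes hold the same number of elements, but under this layout a given (y, x) lands at a different position in each, so comparing the C# tensor with the Python one element by element may be a useful check.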

