Passing a pointer from C++ to Python, compatible with both host and device memory

Question

I have a Python function that can run on both the CPU (using NumPy) and the GPU (using CuPy), and I want to invoke it from C++ code through the Python C API.

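I couldn't find a proper solution to this kind of problem in the community, so after working it out myself I am posting my solution here for comments and critique, and hopefully to help whoever needs it in the future. Feel free to mention any detail you think is useful or could be done more efficiently. This answer involved a lot of trial and error, so it may not be efficient and can probably be improved. I have ignored memory management of any kind to keep the solution a bit simpler.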

A (heavily simplified) version of my code:

C++:


#include <Python.h>

void PythonObjectWrapper::applyFilter(float* image, std::array<int, 3> dim) {
    PyObject* python_method = PyObject_GetAttrString(class_object_, method_name_);
    PyObject* py_image = ??? // convert C-array to PyObject
    PyObject* method_args = PyTuple_New(2);
    PyTuple_SetItem(method_args, 0, py_image);
    PyTuple_SetItem(method_args, 1, ...); // transfer dim
    PyObject* py_filtered_image = PyObject_CallObject(python_method, method_args);
    float* filtered_image = ??? // convert PyObject to C-array
}

Python:


class Filter:
    def __init__(self, gpu):
        self.gpu_ = gpu

    def apply_filter(self, image_ptr, dim):
        image_array = ???  # convert image_ptr PyObject to NumPy / CuPy array
        apply_filter_(image_array)
        filtered_image_ptr = ???  # convert image_array to ptr
        return filtered_image_ptr

My actual question is how to fill in the 4 lines marked with ???.

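In addition, the solution needs to avoid any unnecessary copies (especially between host and device, in either direction), do everything efficiently, and support both run modes (CPU/GPU) automatically.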

Answer 1

Good question! There is a workable way to handle the 4 ??? markers in your code. Let's go through them in order:

Converting a C pointer to a PyObject on the host

A convenient way is to go through a PyByteArray, which can be created with:

PyByteArray_FromStringAndSize(reinterpret_cast<char *>(image), sizeof(float) * dim[0] * dim[1] * dim[2]);

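Converting a C pointer to a PyObject on the device

In this case PyByteArray does not work, because it only fits contiguous memory on the host. A convenient PyObject wrapper for raw pointers is PyCapsule, which can be initialized as follows: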

PyCapsule_New(reinterpret_cast<void *>(image), "image", NULL);

Note that no destructor is needed here (NULL is passed), because the C++ code remains responsible for the allocated device memory.

Converting a PyObject to a NumPy array

A PyByteArray is contiguous memory on the host, so NumPy can read it as a simple buffer using:

 image_buffer = np.frombuffer(image_ptr, dtype=np.float32, count=dims[0] * dims[1] * dims[2])
 image_array = np.asarray(image_buffer, dtype=np.float32).reshape(dims[2], dims[1], dims[0]).transpose(1, 2, 0)

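The reshape + transpose maps the flat C++ buffer onto the axis order used on the Python side: reshape interprets the data with dims[0] as the fastest-varying axis, and transpose then reorders the axes to (dims[1], dims[0], dims[2]). Both steps return views, so no data is copied.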
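To make the index mapping concrete, here is a minimal sketch using a plain bytearray as a stand-in for the PyByteArray created on the C++ side. The sizes in `dims` and the helper `flat_index` are made up for illustration; the layout (dims[0] varying fastest) is an assumption read off the reshape call above.

```python
import numpy as np

# Hypothetical sizes standing in for (dim[0], dim[1], dim[2]) from C++.
dims = (4, 3, 2)
count = dims[0] * dims[1] * dims[2]

# Stand-in for the PyByteArray: a flat float32 buffer holding 0, 1, 2, ...
raw = bytearray(np.arange(count, dtype=np.float32).tobytes())

# Same two steps as in the answer: view the bytes, then reshape + transpose.
flat = np.frombuffer(raw, dtype=np.float32, count=count)
image = flat.reshape(dims[2], dims[1], dims[0]).transpose(1, 2, 0)


def flat_index(i0, i1, i2):
    """Offset of element (i0, i1, i2) when dim[0] varies fastest (assumed layout)."""
    return i0 + dims[0] * (i1 + dims[1] * i2)


# image is indexed as [i1, i0, i2]; writing through it changes the shared buffer.
image[1, 2, 1] = -1.0
```

Because `image` is only a view, the write in the last line is visible in `flat` and in `raw`, which is the zero-copy behaviour the question asks for on the host side.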

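Converting a PyObject to a CuPy array

This is probably the trickiest one. You need to call the Python C API directly (via ctypes.pythonapi) to unwrap the pointer, and then use some CuPy utilities to turn it into an array.

Out of the box, calling PyCapsule_GetPointer through ctypes does not work with a PyCapsule created as above. The likely reason is that ctypes assumes a C int return type and unknown argument types for every function until told otherwise, so the expected restype and argtypes need to be redefined manually.

First, we need to unwrap the PyCapsule to get the raw pointer on the device: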

# Without an explicit restype, ctypes would treat the returned pointer as a C int
ctypes.pythonapi.PyCapsule_GetPointer.restype = ctypes.c_void_p
ctypes.pythonapi.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]
raw_address = ctypes.pythonapi.PyCapsule_GetPointer(image_ptr, self.pycapsule_name_.encode('utf-8'))
raw_ptr = ctypes.c_void_p(raw_address)
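The unwrapping step can be tried without a GPU. The sketch below round-trips a host address through a PyCapsule using only the standard library; the capsule name `b"image"` and the four-float payload are placeholders, and host memory stands in for the device pointer purely so the example can run anywhere.

```python
import ctypes

api = ctypes.pythonapi

# Declare the prototypes explicitly; ctypes otherwise assumes an int return value.
api.PyCapsule_New.restype = ctypes.py_object
api.PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
api.PyCapsule_GetPointer.restype = ctypes.c_void_p
api.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

# Placeholder name; the bytes object must outlive the capsule.
CAPSULE_NAME = b"image"

# Host buffer standing in for memory owned by the C++ side.
payload = (ctypes.c_float * 4)(1.0, 2.0, 3.0, 4.0)
address = ctypes.addressof(payload)

# Wrap the address (no destructor), then unwrap it again.
capsule = api.PyCapsule_New(address, CAPSULE_NAME, None)
recovered = api.PyCapsule_GetPointer(capsule, CAPSULE_NAME)

# Reinterpret the recovered address as the original four floats.
view = (ctypes.c_float * 4).from_address(recovered)
```

With a real device pointer the last line would not apply, since the address is not dereferenceable on the host; the recovered integer would instead be handed to CuPy as shown next.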

Now we need to define a CuPy array of the appropriate size on top of this raw_ptr:

mem = cp.cuda.MemoryPointer(cp.cuda.UnownedMemory(raw_ptr.value, dims[0] * dims[1] * dims[2] * cp.dtype(cp.float32).itemsize, None), 0)
cupy_array = cp.ndarray(dims, dtype=cp.float32, memptr=mem)
cupy_array = cp.asarray(cupy_array, dtype=cp.float32).reshape(dims[2], dims[1], dims[0])
image_array = cp.transpose(cupy_array, axes=(1, 2, 0))


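That's it (for the input, at least)! You can now write code that works on both the CPU and the GPU automatically (using the np or cp prefix, with an appropriate wrapper).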
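One possible shape for such a wrapper is sketched below. The function names `get_array_module` and `normalize` are invented for this illustration and are not part of the original code; the CuPy import is optional so the CPU path also runs on machines without a GPU.

```python
import numpy as np

try:
    import cupy as cp  # optional: only needed for the GPU path
except ImportError:
    cp = None


def get_array_module(use_gpu):
    """Return cupy when the GPU is requested and available, else numpy."""
    if use_gpu:
        if cp is None:
            raise RuntimeError("GPU mode requested but CuPy is not installed")
        return cp
    return np


def normalize(image, use_gpu=False):
    """Example filter written once against the shared NumPy/CuPy API."""
    xp = get_array_module(use_gpu)
    image = xp.asarray(image, dtype=xp.float32)
    return (image - image.mean()) / (image.std() + 1e-6)


# CPU usage; with CuPy installed, use_gpu=True runs the same code on the device.
result = normalize([1.0, 2.0, 3.0])
```

The point of the design is that the filter body only ever touches `xp`, so the CPU/GPU decision lives in one place (for example, the `gpu` flag stored in `Filter.__init__`).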

Oh, you also want to return this array to C++ as a raw pointer? That takes a bit more work.

Converting a NumPy array to a PyObject

This one is simple:

filtered_image_ptr = image_array.copy(order='C').data
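As a small illustration of why the copy is there: a transposed array is usually not C-contiguous, while the object C++ receives should expose one flat, contiguous buffer. The array below is a made-up stand-in for `image_array`.

```python
import numpy as np

# Stand-in for image_array: a transposed view, as produced on the input path.
image_array = np.arange(24, dtype=np.float32).reshape(2, 3, 4).transpose(1, 2, 0)
was_contiguous = bool(image_array.flags["C_CONTIGUOUS"])

# copy(order='C') lays the elements out contiguously; .data is a memoryview
# over that copy and keeps the copy alive for as long as the view exists.
filtered_image_ptr = image_array.copy(order="C").data
```

If the array is already C-contiguous, `np.ascontiguousarray(image_array).data` would avoid the extra copy; whether that applies depends on what the filter returns.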

Converting a CuPy array to a PyObject

Here you need to wrap the raw pointer in a PyCapsule again. Once more, the restype and argtypes of the Python C API function have to be redefined.

 ctypes.pythonapi.PyCapsule_New.restype = ctypes.py_object
 PyCapsule_Destructor = ctypes.CFUNCTYPE(None, ctypes.py_object)
 ctypes.pythonapi.PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, PyCapsule_Destructor]

 image_raw_ptr = ctypes.c_void_p(image_array.data.ptr)
 # c_char_p needs bytes, not str; the capsule stores this pointer without copying it, so keep `name` alive
 name = ctypes.c_char_p(self.pycapsule_name_.encode('utf-8'))
 filtered_image_ptr = ctypes.pythonapi.PyCapsule_New(image_raw_ptr, name, PyCapsule_Destructor(0))
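The capsule name is an easy place to trip up, because `ctypes.c_char_p` accepts bytes but rejects str. The following sketch, with a hypothetical name and a dummy host value in place of the CuPy pointer, shows the encoding step and reads the name back with PyCapsule_GetName.

```python
import ctypes

api = ctypes.pythonapi
api.PyCapsule_New.restype = ctypes.py_object
api.PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
api.PyCapsule_GetName.restype = ctypes.c_char_p
api.PyCapsule_GetName.argtypes = [ctypes.py_object]

capsule_name = "filtered_image"             # hypothetical name
name_bytes = capsule_name.encode("utf-8")   # keep a reference while the capsule lives

# Passing the str directly is rejected by ctypes.
try:
    ctypes.c_char_p(capsule_name)
    str_accepted = True
except TypeError:
    str_accepted = False

# Dummy host value standing in for image_array.data.ptr.
dummy = ctypes.c_float(0.0)
capsule = api.PyCapsule_New(ctypes.addressof(dummy), name_bytes, None)
stored_name = api.PyCapsule_GetName(capsule)
```

Keeping `name_bytes` referenced (for example as an attribute of the Filter object) matters because the capsule keeps only the pointer to the name, not a copy of it.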

Converting a PyObject to a C pointer on the host

You can unpack the value returned for the NumPy array into a Py_buffer.

Py_buffer buffer;
if (PyObject_GetBuffer(py_filtered_image, &buffer, PyBUF_FORMAT) == 0) {
    memcpy(filtered_image, buffer.buf, dim[0] * dim[1] * dim[2] * sizeof(float));
    PyBuffer_Release(&buffer);  // release the view once the data has been copied
}

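Converting a PyObject to a C pointer on the device

Simply unwrap the PyCapsule. Here there is no need to redefine restype and argtypes, presumably because this is a direct C API call compiled against the prototype in Python.h rather than a call made through ctypes.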

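// The name passed here must match the name the capsule was created with on the Python side
auto* filtered_image_ptr = reinterpret_cast<float*>(PyCapsule_GetPointer(py_filtered_image, "slices"));
// Assuming filtered_image is also a device buffer, this is a device-to-device copy
cudaMemcpy(filtered_image, filtered_image_ptr, dim_.Volume() * sizeof(float), cudaMemcpyDeviceToDevice);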
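Since PyCapsule_GetPointer checks the capsule name, a mismatch between the name used on the Python side and the one used in C++ makes the lookup fail. The sketch below reproduces that check from Python through ctypes; the names and the dummy payload are placeholders.

```python
import ctypes

api = ctypes.pythonapi
api.PyCapsule_New.restype = ctypes.py_object
api.PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
api.PyCapsule_IsValid.restype = ctypes.c_int
api.PyCapsule_IsValid.argtypes = [ctypes.py_object, ctypes.c_char_p]
api.PyCapsule_GetPointer.restype = ctypes.c_void_p
api.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

NAME = b"slices"                   # name used when the capsule is created
payload = ctypes.c_float(1.5)      # dummy host value standing in for device memory
capsule = api.PyCapsule_New(ctypes.addressof(payload), NAME, None)

# PyCapsule_IsValid reports whether a given name matches, without raising.
matches = api.PyCapsule_IsValid(capsule, b"slices")
mismatch = api.PyCapsule_IsValid(capsule, b"image")

# PyCapsule_GetPointer with the wrong name sets a ValueError.
try:
    api.PyCapsule_GetPointer(capsule, b"image")
    lookup_failed = False
except ValueError:
    lookup_failed = True
```

In C++ the same failure shows up as a NULL return value with a Python exception set, so checking the result of PyCapsule_GetPointer before calling cudaMemcpy is a cheap safeguard.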


That's it! Happy coding :)
