Cython ctypedef struct with large double arrays causes a segfault on Ubuntu 18.04

Question

ctypedef struct ReturnRows:
    double[50000] v1
    double[50000] v2
    double[50000] v3
    double[50000] v4

works, but

ctypedef struct ReturnRows:
    double[100000] v1
    double[100000] v2
    double[100000] v3
    double[100000] v4

fails with: Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

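This makes no sense to me, because the upper limit should be close to what is available on the system dedicated to this processing task. Is an upper limit being set somehow?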

Here is my build script:

import numpy as np
from distutils.core import setup, Extension
from Cython.Build import cythonize

file_names = ['example_cy.pyx', 'enricher_cy.pyx']

for fn in file_names:
    print("cythonize %s" % fn)
    setup(ext_modules = cythonize(fn),
          author='CGi',
          author_email='[email protected]',
          description='Utils for faster data enrichment.',
          packages=['distutils', 'distutils.command'],
          include_dirs=[np.get_include()])

In response to the question of how I use the struct: I loop over it, with the data coming from a pandas DataFrame:

cpdef ReturnRows cython_atrs(list v1, list v2, list v3, list v4):

    cdef ReturnRows s_ReturnRows # Allocate memory for the struct


    s_ReturnRows.v1 = [0] * 50000
    s_ReturnRows.v2 = [0] * 50000
    s_ReturnRows.v3 = [0] * 50000
    s_ReturnRows.v4 = [0] * 50000

    # tmp counters to keep track of the latest data averages and so on.
    cdef int lines_over_v1 = 0
    cdef double all_ranges = 0
    cdef int some_index = 0


    for i in range(len(v3)-1):

        # trs
        s_ReturnRows.v1[i] = (round(v2[i] - v3[i],2))
        # A lot more calculations, why I really need this loop.

Answer 1

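As the linked question that @ead suggested explains, the solution is to allocate the variable on the heap rather than on the stack (as a function-local variable). The reason is that stack space is very limited (about 8 MB on Linux), whereas the heap is (usually) whatever is available on your machine.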
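To put numbers on this, the following is a minimal sketch that compares the size of the struct with the stack limit of the current process. The helper names `struct_size_bytes` and `stack_soft_limit_bytes` are made up for illustration, and the `resource` module is only available on Unix-like systems. Note that a single copy of the 100000-element struct is 3.2 MB, which is below 8 MB on its own; presumably the by-value return and the conversion code keep more than one copy on the stack at once, but that part is an assumption and not something the question confirms.

```python
import resource

DOUBLE_BYTES = 8  # size of a C double on common platforms


def struct_size_bytes(n, fields=4):
    """Size of a struct holding `fields` arrays of `n` doubles each."""
    return n * fields * DOUBLE_BYTES


def stack_soft_limit_bytes():
    """Soft stack limit of this process in bytes, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    return None if soft == resource.RLIM_INFINITY else soft


if __name__ == "__main__":
    limit = stack_soft_limit_bytes()
    for n in (50_000, 100_000):
        print(n, "elements per array:", struct_size_bytes(n), "bytes per copy")
    print("stack soft limit:", "unlimited" if limit is None else limit)
```

Raising the limit (for example with `ulimit -s` in the shell) would only move the threshold; allocating on the heap removes the dependency on it.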

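The linked question mainly refers to new/delete, which is the C++ way of doing this. Although Cython code can use C++, C is more commonly used, and there you use malloc/free. The Cython documentation is pretty good on this, but to demonstrate with the code from your question: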

from libc.stdlib cimport malloc, free

# note some discussion about the return value at the end of this answer...
def cython_atrs(list v1, list v2, list v3, list v4):

    cdef ReturnRows *s_ReturnRows # pointer, not value
    s_ReturnRows = <ReturnRows*>malloc(sizeof(ReturnRows))
    if s_ReturnRows == NULL:
        raise MemoryError()  # malloc could not provide the memory
    try:
        # all your code goes here and remains the same...
        pass
    finally:
        free(s_ReturnRows)  # always release the heap allocation

You could also use a module-level global variable, but you probably don't want to do that.

Another option is to use a cdef class instead of a struct:

cdef class ReturnRows:
    cdef double[50000] v1
    cdef double[50000] v2
    cdef double[50000] v3
    cdef double[50000] v4

This is automatically allocated on the heap, and Python keeps track of the memory.

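You could also use a 2D NumPy array (or an array from another Python library). These are also allocated on the heap, but that is hidden from you. The advantage is that Python keeps track of the memory, so you cannot forget to free it. A second advantage is that you can easily change the array size without complication. If you invent some particularly meaningless microbenchmark that allocates lots of arrays and does nothing else, you may find a performance difference, but for most normal code you won't. Access through a typed memoryview should be just as fast as through a C pointer. You will find plenty of questions comparing speed and other features, but really you should just pick whichever you find easiest to write (which may be structs, as in C).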
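As a rough illustration of the "let Python own the memory" approach, here is a minimal sketch using the standard-library `array` module, so that it runs without extra dependencies; a NumPy array would be used in the same way and is probably the more practical choice. The names `make_rows` and `fill_v1` are hypothetical, and only the first calculation from the question's loop is reproduced. `array('d')` objects expose the buffer protocol, so a Cython typed memoryview such as `double[:]` should be able to wrap them.

```python
from array import array


def make_rows(n, fields=("v1", "v2", "v3", "v4")):
    """Allocate one heap-backed buffer of n doubles per field."""
    return {name: array("d", [0.0]) * n for name in fields}


def fill_v1(rows, v2, v3):
    """Mirror the question's loop: it stops one element short of len(v3)."""
    for i in range(len(v3) - 1):
        rows["v1"][i] = round(v2[i] - v3[i], 2)
    return rows


if __name__ == "__main__":
    # The size that crashed as a stack-allocated struct is unproblematic here.
    big = make_rows(100_000)
    print(len(big["v1"]), "doubles per field,", big["v1"].itemsize, "bytes each")
```

Because the buffers are ordinary Python objects, they can be returned from a def function directly and are freed when the last reference goes away.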

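Returning ReturnRows from the function adds a complication (and would be questionable even if your existing code did not crash). You should either write a cdef function that returns a ReturnRows* and move the deallocation into the calling function, or write a def function and return a valid Python object. That may push you towards NumPy arrays as the better solution, or possibly a cdef class.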

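What your current function will do is perform a huge and inefficient conversion of ReturnRows into a Python dictionary (containing Python lists for the arrays) whenever it is called from Python. That is probably not what you want.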
