Numba is 10x slower than the equivalent Python on a task it should be good at

Question

I have the following function:

from math import atan, sqrt

import numpy as np


def dewarp(image, destination_image, pixels, strength, zoom, pts, players):
    height = image.shape[0]
    width = image.shape[1]
    half_height = height / 2
    half_width = width / 2

    pts_transformed = np.empty((0, 2))
    players_transformed = np.empty((0, 2))

    correctionRadius = sqrt(width ** 2 + height ** 2) / strength

    for x_p, y_p in pixels:
        newX = x_p - half_width
        newY = y_p - half_height

        distance = sqrt(newX ** 2 + newY ** 2)
        r = distance / correctionRadius

        if r == 0:
            theta = 1
        else:
            theta = atan(r) / r

        sourceX = int(half_width + theta * newX * zoom)
        sourceY = int(half_height + theta * newY * zoom)

        if 0 < sourceX < width and 0 < sourceY < height:
            destination_image[y_p, x_p, :] = image[sourceY, sourceX, :]
            if (sourceX, sourceY) in pts:
                pts_transformed = np.vstack((pts_transformed, np.array([[x_p, y_p]])))
            if (sourceX, sourceY) in players:
                players_transformed = np.vstack((players_transformed, np.array([[x_p, y_p]])))

    return destination_image, pts_transformed, players_transformed

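The parameters are:

- image and destination_image: 3840x800x3 NumPy arrays
- pixels: a list of pixel coordinate combinations (I also tried a double for loop, but the result is the same)
- strength and zoom: both floats
- pts and players: both Python sets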

The pure Python version takes about 4 seconds; the Numba version usually takes about 30 seconds. How is this possible?

I used dewarp.inspect_types, and Numba does not appear to be in object mode.

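For convenience, if you want to reproduce the example, you can use the following as image, destination_image, pts and players and check for yourself: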

pts = {(70, 667),
 (70, 668),
 (71, 667),
 (71, 668),
 (1169, 94),
 (1169, 95),
 (1170, 94),
 (1170, 95),
 (2699, 86),
 (2699, 87),
 (2700, 86),
 (2700, 87),
 (3794, 641),
 (3794, 642),
 (3795, 641),
 (3795, 642)}

players = {(1092, 257),
 (1092, 258),
 (1093, 257),
 (1093, 258),
 (1112, 252),
 (1112, 253),
 (1113, 252),
 (1113, 253),
 (1155, 167),
 (1155, 168),
 (1156, 167),
 (1156, 168),
 (1158, 357),
 (1158, 358),
 (1159, 357),
 (1159, 358),
 (1246, 171),
 (1246, 172),
 (1247, 171),
 (1247, 172),
 (1260, 257),
 (1260, 258),
 (1261, 257),
 (1261, 258),
 (1280, 273),
 (1280, 274),
 (1281, 273),
 (1281, 274),
 (1356, 410),
 (1356, 411),
 (1357, 410),
 (1357, 411),
 (1385, 158),
 (1385, 159),
 (1386, 158),
 (1386, 159),
 (1406, 199),
 (1406, 200),
 (1407, 199),
 (1407, 200),
 (1516, 481),
 (1516, 482),
 (1517, 481),
 (1517, 482),
 (1639, 297),
 (1639, 298),
 (1640, 297),
 (1640, 298),
 (1806, 148),
 (1806, 149),
 (1807, 148),
 (1807, 149),
 (1807, 192),
 (1807, 193),
 (1808, 192),
 (1808, 193),
 (1834, 285),
 (1834, 286),
 (1835, 285),
 (1835, 286),
 (1875, 199),
 (1875, 200),
 (1876, 199),
 (1876, 200),
 (1981, 206),
 (1981, 207),
 (1982, 206),
 (1982, 207),
 (1990, 326),
 (1990, 327),
 (1991, 326),
 (1991, 327),
 (2021, 355),
 (2021, 356),
 (2022, 355),
 (2022, 356),
 (2026, 271),
 (2026, 272),
 (2027, 271),
 (2027, 272)}
image = np.zeros((800, 3840, 3))    
destination_image = np.zeros((800, 3840, 3))

Am I missing something? Is this simply something Numba can't do? Should I write it differently? Thanks!

The line profiler shows that a lot of the time, though not most of it, is spent in NumPy. So there should be room for improvement, right?

Answer 1

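Whether or not you are using Numba, you should avoid growing arrays incrementally inside a loop, because it performs very badly. Preallocate an array and fill it one element at a time instead. Since you may not know the exact size in advance, you can preallocate for the largest possible size, such as len(pixels), and slice off the unused space at the end. That said, your code can be vectorized in a more or less straightforward way: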

import numpy as np

def dewarp_vec(image, destination_image, pixels, strength, zoom, pts, players):
    height = image.shape[0]
    width = image.shape[1]
    half_height = height / 2
    half_width = width / 2

    correctionRadius = np.sqrt(width ** 2 + height ** 2) / strength

    x_p, y_p = np.asarray(pixels).T
    newX = x_p - half_width
    newY = y_p - half_height
    distance = np.sqrt(newX ** 2 + newY ** 2)
    r = distance / correctionRadius
    theta = np.arctan(r) / r
    theta[r == 0] = 1
    sourceX = (half_width + theta * newX * zoom).astype(np.int32)
    sourceY = (half_height + theta * newY * zoom).astype(np.int32)
    m1 = (0 < sourceX) & (sourceX < width) & (0 < sourceY) & (sourceY < height)
    x_p, y_p, sourceX, sourceY = x_p[m1], y_p[m1], sourceX[m1], sourceY[m1]
    destination_image[y_p, x_p, :] = image[sourceY, sourceX, :]
    source_flat = sourceY * width + sourceX
    pts_x, pts_y = np.asarray(list(pts)).T
    pts_flat = pts_y * width + pts_x
    players_x, players_y = np.asarray(list(players)).T
    players_flat = players_y * width + players_x
    m_pts = np.isin(source_flat, pts_flat)
    m_players = np.isin(source_flat, players_flat)
    pts_transformed = np.stack([x_p[m_pts], y_p[m_pts]], axis=1)
    players_transformed = np.stack([x_p[m_players], y_p[m_players]], axis=1)
    return destination_image, pts_transformed, players_transformed

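The part that differs most from your code is how it checks whether (sourceX, sourceY) is in pts and players. For that, I compute "flat" pixel indices and use np.isin instead (if you know there are no repeated coordinate pairs in each input, you can add assume_unique=True).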
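To make the flat-index idea concrete, here is a minimal sketch on a handful of made-up coordinates. The values in source_x and source_y and the three-element pts set are illustrative assumptions, not data from the question; only the width of 3840 and the `y * width + x` formula come from the code above.

```python
import numpy as np

width = 3840  # image width, as in the question's example

# Hypothetical source coordinates, standing in for sourceX / sourceY.
source_x = np.array([70, 71, 500, 1169])
source_y = np.array([667, 10, 300, 95])

# A small set of (x, y) points in the same form as `pts` (illustrative values).
pts = {(70, 667), (1169, 95), (2699, 86)}

# Collapse each (x, y) pair into one integer: y * width + x.
# The mapping is unique as long as 0 <= x < width.
source_flat = source_y * width + source_x
pts_x, pts_y = np.asarray(sorted(pts)).T
pts_flat = pts_y * width + pts_x

# Element-wise membership test, equivalent to `(x, y) in pts` for each pixel.
mask = np.isin(source_flat, pts_flat)
print(mask)  # [ True False False  True]
```

The single integer per pixel is what lets np.isin replace one Python set lookup per pixel with one array operation.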

Answer 2

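I don't see why this algorithm would gain anything significant from Numba. All the heavy lifting appears to be in the image copying and the np.vstack calls. That is all NumPy, so Numba won't help there. Calling vstack on every iteration the way you do also performs terribly. You are better off building a list of sub-arrays and stacking them all together once at the end.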
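As a rough sketch of the two alternatives to a per-iteration np.vstack, the snippet below shows the preallocate-and-trim pattern from Answer 1 next to the collect-then-convert pattern suggested here. The function names, the coords list and the wanted set are hypothetical and exist only for illustration. The second function collects plain tuples rather than sub-arrays, which is a simplification of the advice above.

```python
import numpy as np

def collect_preallocated(coords, wanted):
    """Preallocate for the worst case, fill row by row, then trim."""
    out = np.empty((len(coords), 2), dtype=np.int64)
    n = 0  # number of rows actually filled
    for x, y in coords:
        if (x, y) in wanted:
            out[n, 0] = x
            out[n, 1] = y
            n += 1
    return out[:n]  # slice off the unused tail

def collect_in_list(coords, wanted):
    """Gather matches in a Python list and build the array once."""
    hits = [(x, y) for x, y in coords if (x, y) in wanted]
    # reshape keeps the (0, 2) shape even when nothing matched
    return np.array(hits, dtype=np.int64).reshape(-1, 2)

# Illustrative inputs, not taken from the question.
coords = [(0, 0), (70, 667), (5, 5), (1169, 94)]
wanted = {(70, 667), (1169, 94)}

a = collect_preallocated(coords, wanted)
b = collect_in_list(coords, wanted)
```

Both versions allocate the result array once, whereas np.vstack inside the loop copies all the rows collected so far every time a match is found.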

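As for what the problem is: what does dewarp.inspect_types() output? It should show you where Numba needs to interface with Python. If that happens anywhere inside the loop, performance will suffer if the program is multithreaded.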
