How can I convert the Python function any() into CUDA Python-compatible code (to run on the GPU)?

Question

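I would like to know how to implement the numpy function any() on the GPU (using Numba Python). The any() function takes an array and returns True if at least one element of the input evaluates to True.

Something like: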

@vectorize(["boolean(boolean)"], target='cuda')
def AnyFunction(a):
    return any(a)

or

@vectorize(["boolean(boolean)"], target='cuda')
def AnyFunction(a):
    for i in range(len(a)):
        if a[i]==True:
            return True
    return False

Answer 1

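The harder part of the any operation is (probably) the reduction. The true/false test on each item is an elementwise operation that is easy to express with, for example, vectorize. Combining the many results into a single value (the reduction) is not, at least not easily; vectorize was not designed to solve that kind of problem, at least not directly.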
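To make the two aspects concrete, here is a minimal CPU-only sketch in plain Python. It is not GPU code and not how numba implements anything; the helper names `is_truthy` and `pairwise_or_reduce` are made up for illustration. The first step is the independent per-element test, and the second combines the results pairwise in rounds, which is the usual shape of a parallel reduction.

```python
def is_truthy(x):
    """Per-element test: independent for every item, so it maps well to vectorize."""
    return bool(x)


def pairwise_or_reduce(flags):
    """Combine flags two at a time, round by round, until one value remains."""
    flags = list(flags)
    if not flags:
        return False  # same convention as any([])
    while len(flags) > 1:
        # Each round roughly halves the number of values; on a GPU the pairs
        # within one round could be combined in parallel.
        paired = [flags[i] or flags[i + 1] for i in range(0, len(flags) - 1, 2)]
        if len(flags) % 2:
            paired.append(flags[-1])  # carry the unpaired last value forward
        flags = paired
    return flags[0]


data = [0, 0, 3, 0, 0]
flags = [is_truthy(x) for x in data]   # elementwise (map) step
print(pairwise_or_reduce(flags))       # reduction step
```

The point of the split is that the map step needs no communication between elements, while every round of the reduction step depends on the previous one.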

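However, numba cuda does provide some help for simple reduction problems (such as this one), so you are not forced to write a custom numba cuda kernel.

Here is one possible approach: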

$ cat t20.py
import numpy
from numba import cuda

@cuda.reduce
def or_reduce(a, b):
    return a or b

A = numpy.ones(1000, dtype=numpy.int32)
B = numpy.zeros(1000, dtype=numpy.int32)
expect = A.any()      # numpy reduction
got = or_reduce(A)   # cuda reduction
print(expect)
print(got)
expect = B.any()      # numpy reduction
got = or_reduce(B)   # cuda reduction
print(expect)
print(got)
B[100] = 1            # a single nonzero element should flip the result
expect = B.any()      # numpy reduction
got = or_reduce(B)   # cuda reduction
print(expect)
print(got)

$ python t20.py
True
1
False
0
True
1
$

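Some comments on performance:

This is probably not the fastest way to do this. But I got the impression from your question that you were looking for something close to ordinary Python.
Writing a custom CUDA kernel in numba could probably do this job faster.
If you are serious about performance, it would be advisable to combine this operation with other work being done on the GPU. In that case a custom kernel gives you the most flexibility to get the work done at the highest performance.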
