How do I convert an integer to CUDA's __half FP16 type in a constexpr fashion?

Question

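I'm the developer of aerobus, and I'm running into difficulties with half-precision arithmetic.

At some point in the library, I need to convert an IntType to the related FloatType (same number of bits) in a constexpr function (to evaluate polynomials).

I have this function, a member of the enclosing type aerobus::i16::val<x>:

template<typename valueType>
static constexpr valueType get() { return static_cast<valueType>(x); }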

This works fine when valueType is float or double and x is int32_t or int64_t, but it fails when valueType is __half and x is int16_t.

nvcc 12.6.r12.6 produces the following error:

../src/aerobus.h:576:55: note: ‘static constexpr valueType aerobus::i16::val<x>::get() [with valueType = __half; short int x = 64]’ is not usable as a ‘constexpr’ function because:
  576 |             static constexpr INLINED DEVICE valueType get() { return (valueType)x; }
      |                                                       ^~~
../src/aerobus.h:576:55: error: call to non-‘constexpr’ function ‘__half::__half(short int)’
In file included from cuda_fp16.cpp:1:
/usr/local/cuda-12.6/bin/../targets/x86_64-linux/include/cuda_fp16.h:4652:25: note: ‘__half::__half(short int)’ declared here
 4652 |     __CUDA_HOSTDEVICE__ __half(const short val) { __x = __short2half_rn(val).__x; }

How can I solve this? Am I forced to implement a constexpr conversion function from int16_t to __half myself?

Answer 1

Nothing in the cuda_fp16.hpp header is marked constexpr, so you will have to do the hard work yourself. The following compiles for me:

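#include <bit>         // std::bit_cast (C++20)
#include <cstdint>     // int16_t
#include <type_traits> // std::is_same
#include <cuda_fp16.h> // __half, __half_raw

static constexpr unsigned short my_internal_float2half(const float f, unsigned int& sign, unsigned int& remainder) {
    unsigned int x;
    unsigned int u;
    unsigned int result;
    x = std::bit_cast<unsigned int>(f); // C++20
    // std::memcpy(&x, &f, sizeof(f)) is not constexpr, hence std::bit_cast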
    u = (x & 0x7fffffffU);
    sign = ((x >> 16U) & 0x8000U);
    // NaN/+Inf/-Inf
    if (u >= 0x7f800000U) {
        remainder = 0U;
        result = ((u == 0x7f800000U) ? (sign | 0x7c00U) : 0x7fffU);
    } else if (u > 0x477fefffU) { // Overflows
        remainder = 0x80000000U;
        result = (sign | 0x7bffU);
    } else if (u >= 0x38800000U) { // Normal numbers
        remainder = u << 19U;
        u -= 0x38000000U;
        result = (sign | (u >> 13U));
    } else if (u < 0x33000001U) { // +0/-0
        remainder = u;
        result = sign;
    } else { // Denormal numbers
        const unsigned int exponent = u >> 23U;
        const unsigned int shift = 0x7eU - exponent;
        unsigned int mantissa = (u & 0x7fffffU);
        mantissa |= 0x800000U;
        remainder = mantissa << (32U - shift);
        result = (sign | (mantissa >> shift));
        result &= 0x0000FFFFU;
    }
    return static_cast<unsigned short>(result);
}

static constexpr __half my_float2half_rn(const float a) {
    __half val;
    __half_raw r;
    unsigned int sign = 0U;
    unsigned int remainder = 0U;
    r.x = my_internal_float2half(a, sign, remainder);
    if ((remainder > 0x80000000U) || ((remainder == 0x80000000U) && ((r.x & 0x1U) != 0U))) {
        r.x++;
    }
    val = std::bit_cast<__half>(r); // allowed, see operator= for __half_raw -> __half
    return val;
}

static constexpr __half my_int2half_rn(const int i) {
    // double-rounding is not a problem here: if integer
    // has more than 24 bits, it is already too large to
    // be represented in half precision, and result will
    // be infinity.
    const float f = static_cast<float>(i);
    return my_float2half_rn(f);
}

__device__ consteval __half convert_int16_to_half(int16_t i) {
    return my_float2half_rn(static_cast<float>(i));
}

#ifdef INSIDE_OPS_CLASS_DEF
// in your class header, where x is the int16_t template parameter of val<x>
template<typename VT>
static constexpr VT get() {
    // a static member function has no `this`, so refer to x directly
    static_assert(std::is_same<int16_t, decltype(x)>::value);
    if constexpr (std::is_same<VT, __half>::value) {
        return convert_int16_to_half(x);
    } else {
        return static_cast<VT>(x);
    }
}
#endif

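Note that outside of consteval you should use the built-in versions, because they compile to a single assembly instruction and let the GPU do the conversion.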

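Note also that if you care about efficiency, you really should not be handling single __half values; process them in pairs instead (for example as __half2).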
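One alternative to routing through float, offered only as a sketch: every int16_t magnitude is at most 32768, which is below the largest finite half value (65504), so the FP16 bit pattern can be computed with integer operations alone. The helper name int16_to_half_bits below is hypothetical and is not part of CUDA or aerobus. It returns the raw 16 bits; turning them into a __half (for instance with std::bit_cast through __half_raw, as in the code above) is assumed rather than shown.

```cpp
#include <cstdint>

// Hypothetical helper (not a CUDA or aerobus API): returns the IEEE 754
// binary16 bit pattern of an int16_t, rounded to nearest, ties to even.
constexpr std::uint16_t int16_to_half_bits(std::int16_t i) {
    if (i == 0) return 0;
    const std::uint16_t sign = (i < 0) ? 0x8000u : 0u;
    // Take the magnitude in 32 bits so that -32768 does not overflow.
    const std::uint32_t mag = (i < 0)
        ? static_cast<std::uint32_t>(-static_cast<std::int32_t>(i))
        : static_cast<std::uint32_t>(i);
    // Index of the highest set bit (0..15) is the unbiased exponent.
    int msb = 0;
    for (std::uint32_t t = mag; t > 1u; t >>= 1) ++msb;
    const std::uint32_t exponent = static_cast<std::uint32_t>(msb) + 15u; // bias 15
    std::uint32_t mantissa = 0;
    if (msb <= 10) {
        // Fits in the 10-bit mantissa: shift left, drop the implicit leading 1.
        mantissa = (mag << (10 - msb)) & 0x3FFu;
    } else {
        // 1..5 low bits are dropped: round to nearest, ties to even.
        const int shift = msb - 10;
        const std::uint32_t dropped = mag & ((1u << shift) - 1u);
        const std::uint32_t halfway = 1u << (shift - 1);
        mantissa = (mag >> shift) & 0x3FFu;
        if (dropped > halfway || (dropped == halfway && (mantissa & 1u) != 0u)) {
            ++mantissa; // a carry out of the mantissa bumps the exponent below
        }
    }
    // Adding (not OR-ing) the mantissa lets a rounding carry reach the exponent.
    return static_cast<std::uint16_t>(sign | ((exponent << 10) + mantissa));
}

static_assert(int16_to_half_bits(1) == 0x3C00);   // 1.0
static_assert(int16_to_half_bits(64) == 0x5400);  // 64.0, the value from the error log
static_assert(int16_to_half_bits(-2) == 0xC000);  // -2.0
```

Magnitudes up to 2048 convert exactly; larger ones are rounded, so 32767 yields the bit pattern of 32768.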

Answer 2

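If you can separate the host-side and device-side implementations (and you can, at least by using #ifdef __CUDA_ARCH__ and the like), and your host-side compiler supports C++23, you can convert the integer to a half-precision type in the usual way. Example: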

#include <cstdint>
#include <stdfloat>

constexpr std::float16_t magic(std::int16_t x)
{
    return static_cast<std::float16_t>(x);
}

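This works with GCC, but does not seem to work with clang (GodBolt). Of course, you will still need to reinterpret the std::float16_t as a __half, which is not constexpr, but you can defer that step to run time instead of compile time.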
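Because compiler support varies, the host-side path could be guarded with the standard C++23 feature-test macro __STDCPP_FLOAT16_T__, which is defined only when std::float16_t is available. The following is a minimal sketch under that assumption; the names host_half and to_host_half are hypothetical, and the float fallback is an illustrative choice, not something the answer prescribes.

```cpp
#include <cstdint>
#if __has_include(<stdfloat>)
#include <stdfloat>
#endif

#if defined(__STDCPP_FLOAT16_T__)
using host_half = std::float16_t; // real 16-bit type (C++23, where supported)
#else
using host_half = float;          // fallback: wider type, assumption for illustration
#endif

// Hypothetical helper: constexpr integer-to-floating conversion on the host.
constexpr host_half to_host_half(std::int16_t x) {
    return static_cast<host_half>(x);
}

static_assert(static_cast<float>(to_host_half(64)) == 64.0f);
```

The fallback is not bit-compatible with __half and rounds differently for magnitudes above 2048, so it only keeps the code compiling; results there would still need a real half conversion.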

Other things to try:

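Use an existing, established implementation of this conversion that can be marked constexpr. Johan's answer provides his own, but you could also consider the FP16 library on GitHub, which is intended for exactly this purpose and is fairly popular. The relevant function is int2half.
Use the compiler's constexpr int-to-float conversion, if you can manage to up-convert to float in a constexpr fashion (again, this needs checking on, e.g., clang and MSVC).
File a bug with NVIDIA about this. It is not an immediate solution, but pressure from users may get them to provide us with constexpr fp16 functions.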
