Round a floating-point value to the nearest power of 2

Question

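There are plenty of answers on how to find the next power of 2 of an integer or floating-point number, but far fewer on how to find the nearest power of 2 of a number.

I have implemented the following: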

#include <cmath>
#include <type_traits>

template <typename T>
static constexpr T round_pow2(T v) {
    if constexpr (std::is_floating_point_v<T>) {
        auto high = static_cast<unsigned long>(std::ceil(v));
        auto low  = static_cast<unsigned long>(std::floor(v));

        if (high == low) {
            return round_pow2<unsigned long>(high);
        } else {
            T a = static_cast<T>(round_pow2<unsigned long>(low));
            T b = static_cast<T>(round_pow2<unsigned long>(high));

            return std::abs(a - v) <= std::abs(b - v) ? a : b;
        }
    } else {
        T high = v - 1;

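        // Smear the highest set bit into all lower positions. The shifts must
        // cover the width of T in bits, so the bound is sizeof(T) * 8, not sizeof(T).
        for (T i = 1; i < static_cast<T>(sizeof(T) * 8); i *= 2) {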
            high |= high >> i;
        }

        high += 1;
        T low = high >> 1;

        return (high - v) < (v - low) ? high : low;
    }
}

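This should work for any non-negative floating-point value below ULLONG_MAX. But it does not seem optimal. Is there a better (more performant) way to implement this function?

Edit: @Eric For my application, getting 0 when v == 0 is fine. But it could indeed be a problem for some people, since 0 is not a power of 2.

@Blixodus Thank you for your answer, it pointed me in the right direction. Based on your idea, I created the following function: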


#include <cstdint>
#include <tuple>
#include <type_traits>

template <typename T>
constexpr T round_p2(T v) {
    if constexpr (std::is_floating_point_v<T>) {
        using R = std::conditional_t<
            std::is_same_v<T, double>, uint64_t,
            std::conditional_t<std::is_same_v<T, float>, uint32_t,
            void
        >>;

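        // mlen: mantissa length; es: exponent bias plus one; em: exponent mask.
        auto [mlen, es, em] = std::is_same_v<T, double> ? std::make_tuple(52, 1024, 0x7FF) : std::make_tuple(23, 128, 0xFF);
        // Caution: reading v through reinterpret_cast is formally undefined behavior
        // and cannot be evaluated in a constant expression.
        auto y = *reinterpret_cast<R*>(&v);
        // Caution: the shift is done on an int, so the result is only meaningful while
        // the shift count is non-negative and small (roughly 1.5 <= |v| < 2^30).
        return (T(y >> (sizeof(R) * 8 - 1)) * -2 + 1) * (2 << (((y >> mlen) & em) - es + ((y >> mlen - 1) & 0x1)));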
    } else {
        using R = std::make_unsigned_t<T>;
        R rv = static_cast<R>(v);
        T sign = 1;

        if constexpr (std::is_signed_v<T>) {
            if (v < 0) {
                rv = static_cast<R>(-v);
                sign = -1;
            }
        }

        R high = rv - 1;

        // The shifts must cover the width of R in bits, hence sizeof(R) * 8.
        for (R i = 1; i < static_cast<R>(sizeof(R) * 8); i *= 2) {
            high |= high >> i;
        }

        high += 1;
        R low = high >> 1;

        return sign * static_cast<T>((high - rv) <= (rv - low) ? high : low);
    }
}

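Compared with my first implementation, it seems to work well for my application and it generates very good assembly.

Explanation: I first obtain three magic values depending on whether v is a float or a double: the first is the length of the mantissa, the second is the amount by which the exponent needs to be decremented (plus one), and the third is the mask used to extract the exponent from the fp representation.

Then I convert the fp value to an unsigned integer of the same size so that I can work on its bits.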

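Next, I extract the value that will be used to sign the final result, using (T(y >> (sizeof(R) * 8 - 1)) * -2 + 1). This extracts the highest bit of the fp value (the sign, 0 for positive and 1 for negative) and then applies the function f(x) = x * -2 + 1 to it, which gives 1 for x=0 and -1 for x=1.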

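Finally, I use Blixodus's formula to compute the nearest unsigned power of two for the given fp value. Because I use bit shifts instead of the std::pow function (we are working with powers of 2), I need to account for that by subtracting 1 from the amount by which 2 is shifted (hence the es values being 1 more than expected).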
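The field extraction described above can be sketched on its own. This is a minimal illustration, assuming a 32-bit IEEE 754 float; the names FloatFields, decode_float and sign_factor are hypothetical and do not appear in the code above, and std::memcpy is used here in place of the reinterpret_cast so the read is well defined.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper type: the three pieces the explanation refers to.
struct FloatFields {
    std::uint32_t sign;      // 1 bit: 0 for positive, 1 for negative
    std::uint32_t exponent;  // 8 bits, stored with a bias of 127
    std::uint32_t top_bit;   // most significant mantissa bit (bit 22)
};

// Copy the bits of a float into an unsigned integer and split them up.
// Assumes float is a 32-bit IEEE 754 binary32 value.
inline FloatFields decode_float(float v) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    std::uint32_t y;
    std::memcpy(&y, &v, sizeof y);
    return { y >> 31, (y >> 23) & 0xFFu, (y >> 22) & 0x1u };
}

// f(x) = x * -2 + 1 maps the sign bit 0 to +1 and the sign bit 1 to -1.
inline float sign_factor(std::uint32_t sign_bit) {
    return static_cast<float>(sign_bit) * -2.0f + 1.0f;
}
```

For example, 6.0f is 1.5 * 2^2, so its stored exponent is 129 and its top mantissa bit is 1, which is the case where the exponent gets rounded up.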

Answer 1

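Before C++ 2020, there was no way to round an integer to a power of two without using loops, compiler extensions, or bit shifts that assume some limit on the width of the type. There are several related questions on Stack Overflow. The code below shows a solution using C++'s std::countl_zero and an alternative solution using GCC's built-in count-leading-zeros, which works for any integer type up to unsigned long long.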

#include <cmath>
#include <type_traits>

// <bit> (and with it std::countl_zero) is only available from C++20 on.
#if 202002L <= __cplusplus
    #include <bit>
#endif


template <typename T> static constexpr T round_pow2(T v)
{
    /*  Since one is the smallest power of two, all numbers less than or equal
        to one round to one.
    */
    if (v <= 1) return 1;

    /*  For floating-point, the standard frexp function gives us the fraction
        and exponent, and ldexp applies an exponent.  The fraction is scaled to
        [.5, 1), so, if it is less than or equal to .75, we round down.
    */
    if constexpr (std::is_floating_point_v<T>)
    {
        int exponent = 0;
        T fraction = std::frexp(v, &exponent);
        return std::ldexp(T(.5), exponent + (.75 < fraction));
    }

    /*  Here we handle integer types.  The midpoints for rounding to powers of
        two are at 3*2^n.  That is, the transition between rounding to one
        power of two and another occurs at a number that has the form 3*2^n.
        To find which interval v is in, we can divide it by three and then
        find the next lower (instead of nearest) power of two.  To get the
        desired rounding at the midpoint, we use v-1.  So the general algorithm
        is to round (v-1)/3 down to the nearest power of two, then quadruple
        that.  For example:

            v   v-1   (v-1)/3   rounded down   quadrupled
            11   10     3            2             8
            12   11     3            2             8
            13   12     4            4            16

        Note that (v-1)/3 is not quite right for v=2, as the subtraction of 1
        jumps a full power of two, from 2 to 1.  (v-.01)/3 would work, but we
        want to stick with integer arithmetic.

        For the general case, we want 4 * 2**floor(log2((v-1)/3)).  To include
        v=2, we will use 2 * 2**f((v-1)/3), where f(x) is floor(log2(x))+1 but
        clamped to produce at least zero.

        If the C++ 2020 std::countl_zero function is available, we use that.
        Otherwise, we use the GCC builtin __builtin_clzll.  In either case, the
        function returns the number of leading zero bits, which depends on the
        width of the type rather than the operand value alone.  To calculate
        the power of two, we get the bit count for a fixed value (zero or one)
        as a reference point.
    */
    else
    {
        #if __cpp_lib_bitops
            /*  std::countl_zero is only provided for unsigned types, so
                define UT to be the unsigned type corresponding to T.
            */
            using UT  = typename std::make_unsigned<T>::type;

            return static_cast<UT>(2) << (std::countl_zero<UT>(0) - std::countl_zero<UT>((v-1)/3));
        #else
            /*  Since __builtin_clzll is not defined for zero operands, we need
                to ensure its operand is at least 1.  To do this, we change
                (v-1)/3 to (v-1)/3*2+1.  The doubling increases the power of
                two by one, so we change the reference point from zero to one,
                decreasing the number of bits for it by one.
            */
            return 2ull << __builtin_clzll(1) - __builtin_clzll((v-1)/3*2+1);
        #endif
    }
}
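To cross-check the midpoint rule described in the comments (ties at 3*2^n round down), a slow loop-based reference can be handy. This is only a sketch for testing, not part of the answer's solution: the name round_pow2_ref is hypothetical, and it assumes v is below 2^63 so that doubling the lower power of two cannot overflow.

```cpp
#include <cstdint>

// Hypothetical reference implementation for testing: finds the powers of two
// on either side of v with a loop and picks the nearer one.
// Assumes v < 2^63 so that `low * 2` does not overflow.
constexpr std::uint64_t round_pow2_ref(std::uint64_t v) {
    if (v <= 1) return 1;               // same convention as above: 0 and 1 give 1
    std::uint64_t low = 1;
    while (low <= v / 2) low *= 2;      // largest power of two <= v
    const std::uint64_t high = low * 2; // smallest power of two > v
    // On a tie (v == 3 * 2^n) round down, matching the table (12 -> 8).
    return (v - low) <= (high - v) ? low : high;
}
```

Comparing this against the bit-counting version over a range of small inputs is a cheap way to catch off-by-one mistakes around the midpoints.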

Answer 2

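Floating-point numbers are encoded with a 1-bit sign, an n-bit exponent, and an m-bit significand. For example, a 32-bit float has 1 sign bit, 8 exponent bits, and 23 mantissa bits. The value of a 32-bit float is given by the equation

$$\text{value} = (-1)^{\text{sign}} \cdot 2^{E-127} \cdot \left(1 + \sum_{i=1}^{23} b_{23-i} \cdot 2^{-i}\right)$$

See here for the notation.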

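So if the significand part is < 1.5, you'll be closer to (-1)^sign*2^(E-127); if the significand part is >= 1.5, you'll be closer to (-1)^sign*2^(E-126).

The first bit of the significand part tells you whether the significand is < 1.5 or >= 1.5 (it adds 2^(-1) = 0.5 to the sum).

So you can simply look at the first bit of the significand part (bit number 22 in the 32-bit case): if that bit is 0, the value is closer to (-1)^sign*2^(E-127); if that bit is 1, the value is closer to (-1)^sign*2^(E-126).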
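The rule above might be written out roughly as follows. This is a sketch under assumptions, not code from the answer: the function name round_pow2_by_bit is made up, float is taken to be IEEE 754 binary32, and the input is assumed to be a normal, finite, nonzero value (zero, subnormals, infinities and NaN are not handled).

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical sketch: round a normal, finite, nonzero float to the nearest
// power of two by inspecting bit 22, the first bit of the significand.
inline float round_pow2_by_bit(float v) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &v, sizeof bits);

    const float sign = (bits >> 31) ? -1.0f : 1.0f;            // (-1)^sign
    const int   E    = static_cast<int>((bits >> 23) & 0xFFu); // biased exponent
    const int   top  = static_cast<int>((bits >> 22) & 0x1u);  // significand >= 1.5 ?

    // top == 0 gives 2^(E-127); top == 1 gives 2^(E-126).
    return sign * std::ldexp(1.0f, E - 127 + top);
}
```

With this sketch, 5.0f (1.25 * 2^2) goes to 4.0f, while 6.0f (1.5 * 2^2) goes to 8.0f because the tie is rounded up.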

Answer 3

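Finding the nearest power of 2 for a floating-point type is not much different from finding the next power of 2. You only need to look at the most significant bit of the mantissa in order to round the exponent up or down.

This is what I use to round float values (but it can easily be adapted to other types):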

#include <bit>      // std::bit_cast (C++20)
#include <cstdint>

float round_pow2(float x) {
    std::uint32_t bits = std::bit_cast<std::uint32_t>(x);
    // Wipe away all bits of the mantissa except for the most significant one
    bits &= 4290772992; // 2^32-1 - (2^22-1)
    // Add the most significant bit. If it is 1 it will round up the exponent part.
    bits += bits & 4194304; // 2^22
    return std::bit_cast<float>(bits);
}
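As an illustration of adapting this to another type, here is what a double version might look like. It is a sketch under assumptions: the name round_pow2_double is hypothetical, double is taken to be IEEE 754 binary64 (52 mantissa bits, so the top mantissa bit is bit 51), and std::memcpy is swapped in for std::bit_cast so the example also compiles before C++20.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical double adaptation of the float routine above.
// Assumes double is a 64-bit IEEE 754 binary64 value.
inline double round_pow2_double(double x) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "assumes a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    // Keep the sign, the 11 exponent bits and the top mantissa bit (bit 51).
    bits &= 0xFFF8000000000000ull;
    // Add bit 51 to itself: if it is set, the carry increments the exponent.
    bits += bits & 0x0008000000000000ull;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}
```

The masks follow the same pattern as the float constants: everything below the top mantissa bit is cleared, then that bit is added once more so a 1 carries into the exponent field.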
