Unexpected runtime benchmark results with C++20 ranges

Question

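Here "myData" is a vector of integers. I keep only the even numbers and convert them to strings. The strings are stored in a vector, which is returned. The caller then invokes the process function on each element of the returned vector.

I tried to use std::ranges to avoid creating this vector by returning a view instead.

Original code: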

auto getData() {
    std::vector<T2> v;
    for (T1 p: myData) {
        if (filter(p))
            v.push_back(transform(p));
    }
    return v;
}

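Update-1: Changed to return a lambda. The returned lambda accepts a callable and invokes it for each element, so it effectively behaves like a lazy view. This was 2.5x faster than the original code.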

auto getDataCallback() {
    return [this](auto func) {
        for (T1 p: myData) {
            if (filter(p))
                func(transform(p));
        }
    };
}
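
For readers who want to try the callback style in isolation, here is a minimal sketch. The `Source` struct and the `collectEven` helper are hypothetical stand-ins and not part of the benchmark; the filter and transform steps are assumed to match the ones described above (keep even numbers, convert with `std::to_string`).

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the class in the question.
struct Source {
    std::vector<int> myData{1, 2, 3, 4, 5, 6};

    static bool filter(int t) { return t % 2 == 0; }
    static std::string transform(int x) { return std::to_string(x); }

    // Returns a lambda that pushes each filtered, transformed element
    // into the callable supplied by the caller.
    auto getDataCallback() const {
        return [this](auto func) {
            for (int p : myData) {
                if (filter(p))
                    func(transform(p));
            }
        };
    }
};

// Illustrative helper: gathers everything the callback produces.
inline std::vector<std::string> collectEven(const Source& src) {
    std::vector<std::string> out;
    src.getDataCallback()([&out](const std::string& s) { out.push_back(s); });
    return out;
}
```

Calling `collectEven(Source{})` gathers the strings for the even inputs. Note that no intermediate vector exists unless the consumer chooses to build one, which is the point of this variant.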

Update-2: Return a view (std::views) instead of a vector

auto dataView() {
    return myData | std::views::filter(filter) | std::views::transform([this](auto t) { return transform(t); });
}

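I expected this to perform similarly to Update-1, but the benchmark shows that Update-2 is 1.7x slower than Update-1.

Why is the std::views-based approach 1.7x slower than the lambda-based approach?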

Here is the benchmark link: https://quick-bench.com/q/3QMVqBVFU0ma7EN3q9_ADnIA_IM

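I expected the second and third solutions to have almost the same run time, but I am not sure whether I am doing something wrong here.

Here is the complete code: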

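#include <benchmark/benchmark.h>  // provided implicitly on quick-bench
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <ranges>
#include <string>
#include <vector>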

static inline constexpr size_t million = 1000;  //1000000;
static inline size_t sum = 0;

using T1 = int;
using T2 = std::string;
bool filter(T1 t) { return  0 == t % 2; }
void process(const T2 &p) { sum += p.size(); };
T2 transform(T1 x) { return std::to_string(x); }
static inline std::vector<T1> data;

namespace Conventional {
    class MyClass {
        std::vector<T1>& myData = data;
        T2 transform(T1 x) { return std::to_string(x); }

    public:
        auto getData() {
            std::vector<T2> v;
            for (T1 p: myData) {
                if (filter(p))
                    v.push_back(transform(p));
            }
            return v;
        }
    };

    void myFunc() {
        sum = 0;
        MyClass obj;
        auto data = obj.getData();
        std::for_each(data.begin(), data.end(), process);
    }
};

namespace LazyCpp17 {
    class MyClass {
        std::vector<T1>& myData = data;
        T2 transform(T1 x) { return std::to_string(x); }

    public:
        auto getDataCallback() {
            return [this](auto func) {
                for (T1 p: myData) {
                    if (filter(p))
                        func(transform(p));
                }
            };
        }
    };

    void myFunc() {
        sum = 0;
        MyClass obj;
        obj.getDataCallback()(process);
    }
};

namespace LazyCpp20 {
    class MyClass {
        std::vector<T1>& myData = data;
        T2 transform(T1 x) { return std::to_string(x); }

    public:
        auto dataView() {
            return myData | std::views::filter(filter) | std::views::transform([this](auto t) { return transform(t); });
        }
    };

    void myFunc() {
        sum = 0;
        MyClass obj;
        std::ranges::for_each(obj.dataView(), process);
    }
};


static void ConventionalB(benchmark::State& state) {
  data.resize(million);
  std::iota(data.begin(), data.end(), 0);
  // Code inside this loop is measured repeatedly
  for (auto _ : state) {
        Conventional::myFunc();
    // Make sure the variable is not optimized away by compiler
  }
}
// Register the function as a benchmark
BENCHMARK(ConventionalB);

static void LazyCpp17B(benchmark::State& state) {
  data.resize(million);
  std::iota(data.begin(), data.end(), 0);
  // Code before the loop is not measured
  std::string x = "hello";
  for (auto _ : state) {
        LazyCpp17::myFunc();
  }
}
BENCHMARK(LazyCpp17B);

static void LazyCpp20B(benchmark::State& state) {
  data.resize(million);
  std::iota(data.begin(), data.end(), 0);
  // Code before the loop is not measured
  std::string x = "hello";
  for (auto _ : state) {
        LazyCpp20::myFunc();
  }
}
BENCHMARK(LazyCpp20B);

Answer 1

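Your LazyCpp17 iterates over the integers in data, which is fairly cheap. LazyCpp20, meanwhile, iterates over std::string objects.

In the first case, each std::string object is needed only for a very short scope, and since "process" only reads its size, it can be optimized comparatively easily.

In the second case, your std::ranges::for_each loop needs these std::string objects, because it iterates over them and uses them as the container/range/view of the loop. These objects therefore have to live longer and are harder to optimize.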
