Asynchronous writes to a bit array

Question

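TL;DR: How can I safely perform a single-bit update A[n/8] |= (1<<n%8); for A being a huge array of chars (i.e., setting the n-th bit of A to true) when computing in parallel with C++11's <thread> library?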

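I'm performing a computation that is easy to parallelize. I'm computing the elements of a certain subset of the natural numbers, and I want to find the elements that are not in the subset. For this I create a huge array (like A = new char[20l*1024l*1024l*1024l], i.e., 20 GiB). The n-th bit of this array is true if n lies in my set.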

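When running in parallel and setting the bits to true with A[n/8] |= (1<<n%8);, I seem to lose a small amount of information, presumably because several threads work on the same byte of A at once (each thread has to read the byte, update the single bit, and then write the byte back). How can I get around this? Is there a way to do this update as an atomic operation?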
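To make the suspected failure mode concrete, here is a minimal single-threaded sketch that replays one unlucky interleaving of two threads by hand. The function name simulate_lost_update and the chosen bit positions are illustrative assumptions, not part of the original code; a real race would not interleave this predictably.

```cpp
// Replays, in one thread, the interleaving that loses an update when two
// threads run A[k] |= mask on the same byte without synchronization.
unsigned char simulate_lost_update() {
    unsigned char byte = 0;      // stands in for the shared byte A[k]

    unsigned char tmp1 = byte;   // "thread 1" reads 0b00000000
    unsigned char tmp2 = byte;   // "thread 2" reads the same old value

    tmp1 = static_cast<unsigned char>(tmp1 | (1u << 0));  // thread 1 sets bit 0
    tmp2 = static_cast<unsigned char>(tmp2 | (1u << 3));  // thread 2 sets bit 3

    byte = tmp1;                 // thread 1 writes back 0b00000001
    byte = tmp2;                 // thread 2 writes back 0b00001000, erasing bit 0

    return byte;                 // only bit 3 survives
}
```

Because |= on a plain char is a separate read, modify, and write, nothing prevents the second write from overwriting the first. Any fix has to make the whole read-modify-write indivisible.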

The code follows. GCC version: g++ (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609. The machine is an 8-core Intel(R) Xeon(R) CPU E5620 @ 2.40GHz with 37 GB RAM. Compiler options: g++ -std=c++11 -pthread -O3

#include <iostream>
#include <thread>

typedef long long myint; // long long to be sure

const myint max_A = 20ll*1024ll*1024ll; // 20 MiB for testing
//const myint max_A = 20ll*1024ll*1024ll*1024ll; // 20 GiB in the real code
const myint n_threads = 1; // Number of threads
const myint prime = 1543; // Tested prime

char *A; 
const myint max_n = 8*max_A;

inline char getA(myint n) { return A[n/8] & (1<<(n%8)); }
inline void setAtrue(myint n) { A[n/8] |= (1<<n%8); }

void run_thread(myint startpoint) {
    // Calculate all values of x^2 + 2y^2 + prime*z^2 up to max_n
    // We loop through x == startpoint (mod n_threads)
    for(myint x = startpoint; 1*x*x < max_n; x+=n_threads)
        for(myint y = 0; 1*x*x + 2*y*y < max_n; y++)
            for(myint z = 0; 1*x*x + 2*y*y + prime*z*z < max_n; z++)
                setAtrue(1*x*x + 2*y*y + prime*z*z);
}

int main() {
    myint n;

    // Only n_threads-1 threads, as we will use the master thread as well
    std::thread T[n_threads-1];

    // Initialize the array
    A = new char[max_A]();

    // Start the threads
    for(n = 0; n < n_threads-1; n++) T[n] = std::thread(run_thread, n); 
    // We use also the master thread
    run_thread(n_threads-1);
    // Synchronize
    for(n = 0; n < n_threads-1; n++) T[n].join();

    // Print and count all elements not in the set and n != 0 (mod prime)
    myint cnt = 0;
    for(n=0; n<max_n; n++) if(( !getA(n) )&&( n%prime != 0 )) {
        std::cout << n << std::endl;
        cnt++;
    }   
    std::cout << "cnt = " << cnt << std::endl;

    return 0;
}

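With n_threads = 1, I get the correct value cnt = 29289. With n_threads = 7, I got cnt = 29314 and cnt = 29321 on two different runs, which suggests that some of the bitwise operations on a single byte ran concurrently.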

Answer 1

std::atomic provides all the facilities you need here:

std::array<std::atomic<char>, max_A> A;

static_assert(sizeof(A[0]) == 1, "Shall not have memory overhead");
static_assert(std::atomic<char>::is_always_lock_free,
              "No software-level locking needed on common platforms");

inline char getA(myint n) { return A[n / 8] & (1 << (n % 8)); }
inline void setAtrue(myint n) { A[n / 8].fetch_or(1 << n % 8); }

The load in getA is atomic (equivalent to load()), and std::atomic even has built-in support for ORing the stored value with another one (fetch_or), atomically of course.
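As a rough, self-contained sketch of the same idea, the code below wraps the bytes in a small class and hammers neighbouring bits from several threads. The names AtomicBitArray and count_set_after_parallel_fill are hypothetical, and the storage is a heap-allocated std::vector rather than the std::array used above; this is an assumption made so the size can be chosen at run time. Note also that is_always_lock_free in the snippet above requires C++17, while everything in this sketch is C++11.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not from the original post): a bit array whose bytes
// are atomics, so setting a bit is one indivisible read-modify-write.
class AtomicBitArray {
public:
    explicit AtomicBitArray(std::size_t n_bits) : bytes_((n_bits + 7) / 8) {
        for (auto& b : bytes_) b.store(0);  // explicit zeroing before any thread starts
    }
    void set(std::size_t n) {
        bytes_[n / 8].fetch_or(static_cast<unsigned char>(1u << (n % 8)));
    }
    bool test(std::size_t n) const {
        return ((bytes_[n / 8].load() >> (n % 8)) & 1u) != 0;
    }
private:
    std::vector<std::atomic<unsigned char>> bytes_;
};

// Thread t sets every bit n with n % n_threads == t, so adjacent bits of the
// same byte are written by different threads. Returns the number of set bits.
std::size_t count_set_after_parallel_fill(std::size_t n_bits, unsigned n_threads) {
    AtomicBitArray bits(n_bits);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t)
        workers.emplace_back([&bits, n_bits, n_threads, t] {
            for (std::size_t n = t; n < n_bits; n += n_threads) bits.set(n);
        });
    for (auto& w : workers) w.join();

    std::size_t count = 0;
    for (std::size_t n = 0; n < n_bits; ++n) count += bits.test(n) ? 1 : 0;
    return count;
}
```

Calling count_set_after_parallel_fill(65536, 4) should report every bit as set, because no fetch_or can be overwritten by a concurrent one on the same byte. With plain char and |= the same access pattern would be a data race.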

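When initializing A, the naive way, for (auto& a : A) a = 0;, would require synchronization after every store, which you can avoid by giving up some of the thread safety. std::memory_order_release only requires that what we wrote is visible to other threads (but not that other threads' writes are visible to us). And indeed, if you do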

// Initialize the array
for (auto& a : A)
  a.store(0, std::memory_order_release);

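you get the safety you need without any assembly-level synchronization on x86. You could do the reverse for the loads after the threads finish, but that has no added benefit on x86 (it's just a mov either way).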
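Putting the pieces together, a minimal sketch might look like the following. It assumes that "the reverse for the loads" means std::memory_order_acquire, keeps fetch_or at its default ordering, and uses a std::vector instead of the std::array above; the function name and the even-bit workload are made up for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical example: the threads set every even-numbered bit, then the
// caller counts the bits that are still unset (the odd-numbered ones).
std::size_t fill_even_bits_and_count_unset(std::size_t n_bytes, unsigned n_threads) {
    std::vector<std::atomic<unsigned char>> A(n_bytes);

    // Initialization with release stores, as in the snippet above.
    for (auto& a : A) a.store(0, std::memory_order_release);

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t)
        workers.emplace_back([&A, n_bytes, n_threads, t] {
            // Thread t handles the even bits 2t, 2t + 2*n_threads, ...
            for (std::size_t n = 2 * t; n < 8 * n_bytes; n += 2 * n_threads)
                A[n / 8].fetch_or(static_cast<unsigned char>(1u << (n % 8)));
        });
    for (auto& w : workers) w.join();  // all worker writes are complete after this

    // Reads after the join, using acquire loads (assumed "reverse" of release).
    std::size_t unset = 0;
    for (std::size_t n = 0; n < 8 * n_bytes; ++n)
        if (((A[n / 8].load(std::memory_order_acquire) >> (n % 8)) & 1u) == 0) ++unset;
    return unset;
}
```

For 1024 bytes this should report 4096 unset bits, i.e. exactly the odd-numbered ones. Whether the weaker orderings actually save anything depends on the target; on x86 the plain loads and stores compile to the same mov instructions either way, as noted above.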

Demo with the full code: https://godbolt.org/z/nLPlv1
