① 下面清单 1 中的 C++ 代码旨在为 s
函数(在
Produce
线程上运行)和 main
函数(在 Consume
上运行)的
t
变量进行处理数据竞争
在
main
函数开始时创建的线程)可能会遇到。
清单1:
// a.cpp file
#include <chrono> //01
#include <condition_variable> //02
#include <iostream> //03
#include <mutex> //04
#include <thread> //05
//06
std::string s = ""; //07
std::mutex m; //08
std::condition_variable cv; //09
bool stop = false; //10
//11
void Consume() { //12
while (true) { //13
std::unique_lock<std::mutex> ul{m}; //14
cv.wait(ul, []() { return stop || s != ""; }); //15
if (stop) break; //16
std::cout << "Consume \"" << s << "\"\n"; //17
s = ""; //18
m.unlock(); //19
cv.notify_all(); //20
m.lock(); //21
} //22
} //23
//24
void Produce(std::string s1) { //25
std::unique_lock<std::mutex> ul{m}; //26
cv.wait(ul, []() { return s == ""; }); //27
s = s1; //28
m.unlock(); //29
cv.notify_all(); //30
m.lock(); //31
} //32
//33
void Stop() { //34
std::unique_lock<std::mutex> ul{m}; //35
cv.wait(ul, []() { return s == ""; }); //36
stop = true; //37
m.unlock(); //38
cv.notify_all(); //39
m.lock(); //40
} //41
//42
int main() { //43
auto t = std::thread(Consume); //44
Produce("A"); //45
Produce("B"); //46
Produce("C"); //47
Stop(); //48
t.join(); //49
return 0; //50
} //51
② 使用 Ubuntu clang++ 版本 14.0.0-1ubuntu1.1 编译上述 C++ 代码时,Google ThreadSanitizer(又名 TSan)未检测到数据竞争,如下面的清单 2 所示。
清单 2:
$ clang++ -fsanitize=thread -g -O1 -Wall a.cpp #01
$ ./a.out #02
Consume "A" #03
Consume "B" #04
Consume "C" #05
$ #06
Ubuntu clang++ 版本 14.0.0-1ubuntu1.1 编译上述 C++ 代码时,基于 Valgrind 的 Helgrind 线程错误检测器会检测到可能的数据竞争,如下面的 清单 3 所示。
清单 3:
$ clang++ -g -O1 -Wall a.cpp //01
$ valgrind --tool=helgrind ./a.out //02
==<PID>== Helgrind, a thread error detector //03
==<PID>== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al. //04
==<PID>== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info //05
==<PID>== Command: ./a.out //06
==<PID>== //07
==<PID>== ---Thread-Announcement------------------------------------------ //08
==<PID>== //09
==<PID>== Thread #1 is the program's root thread //10
==<PID>== //11
==<PID>== ---Thread-Announcement------------------------------------------ //12
==<PID>== //13
==<PID>== Thread #2 was created //14
==<PID>== at 0x53729F3: clone (clone.S : 76) //15
==<PID>== by 0x53738EE: __clone_internal (clone-internal.c: 83) //16
==<PID>== by 0x52E16D8: create_thread (pthread_create.c:295) //17
==<PID>== by 0x52E21FF: pthread_create@@GLIBC_2.34 (pthread_create.c:828) //18
==<PID>== by 0x4E13585: pthread_create_WRK (hg_intercepts.c :445) //19
==<PID>== by 0x4E14A8C: pthread_create@* (hg_intercepts.c :478) //20
==<PID>== by 0x50FD328: std::thread::_M_start_thread( //21
std::unique_ptr< //22
std::thread::_State, //23
std::default_delete<std::thread::_State> //24
>, //25
void (*)() //26
) //27
(in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30) //28
==<PID>== by 0x10A691: thread<void (&)(), void> (std_thread.h:143) //22
==<PID>== by 0x10A691: main (???:43) //23
==<PID>== //24
==<PID>== ---------------------------------------------------------------- //25
==<PID>== //26
==<PID>== Possible data race during read of size 4 at 0x10D110 by thread #1 //27
==<PID>== Locks held: none //28
==<PID>== at 0x52E4F4A: pthread_mutex_lock@@GLIBC_2.2.5 (pthread_mutex_lock.c: 94) //29
==<PID>== by 0x4E1009A: mutex_lock_WRK (hg_intercepts.c :937) //30
==<PID>== by 0x4E14E73: pthread_mutex_lock (hg_intercepts.c :960) //31
==<PID>== by 0x10A4D4 : __gthread_mutex_lock (gthr-default.h :749) //32
==<PID>== by 0x10A4D4 : lock (std_mutex.h :100) //33
==<PID>== by 0x10A4D4 : lock (unique_lock.h :139) //34
==<PID>== by 0x10A4D4 : unique_lock (unique_lock.h : 69) //35
==<PID>== by 0x10A4D4 : Produce( //36
std::__cxx11::basic_string< //37
char, //38
std::char_traits<char>, //39
std::allocator<char> //40
> //41
) (??? :26) //42
==<PID>== by 0x10A6C5 : main (a.cpp:44) //43
==<PID>== //44
==<PID>== This conflicts with a previous write of size 4 by thread #2 //45
==<PID>== Locks held: none //46
==<PID>== at 0x52E696C: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c: 62) //47
==<PID>== by 0x52E08AD: __pthread_cond_wait_common (pthread_cond_wait.c : 419) //48
==<PID>== by 0x52E08AD: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.c : 627) //49
==<PID>== by 0x4E1390B: pthread_cond_wait_WRK (hg_intercepts.c :1291) //50
==<PID>== by 0x4E14EAA: pthread_cond_wait@* (hg_intercepts.c :1318) //51
==<PID>== by 0x10A3A9 : wait<(lambda at a.cpp:15:17)> (condition_variable : 103) //52
==<PID>== by 0x10A3A9 : Consume() (??? : 15) //53
==<PID>== by 0x10A833 : __invoke_impl<void, void (*)()> (invoke.h : 61) //54
==<PID>== by 0x10A833 : __invoke<void (*)()> (invoke.h : 96) //55
==<PID>== by 0x10A833 : _M_invoke<0UL> (std_thread.h : 259) //56
==<PID>== by 0x10A833 : operator() (std_thread.h : 266) //57
==<PID>== by 0x10A833 : std::thread::_State_impl< //58
std::thread::_Invoker< //59
std::tuple<void (*)()> //60
> //61
>::_M_run() (std_thread.h : 211) //62
==<PID>== by 0x50FD252: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30) //63
==<PID>== by 0x4E13779: mythread_wrapper (hg_intercepts.c : 406) //64
==<PID>== //65
==<PID>== Address 0x10d110 is 8 bytes inside data symbol "m" //66
==<PID>== //67
==<PID>== ---------------------------------------------------------------- //68
<-- Text Truncated -->
==<PID>== Use --history-level=approx or =none to gain increased speed, at
==<PID>== the cost of reduced accuracy of conflicting-access information
==<PID>== For lists of detected and suppressed errors, rerun with: -s
==<PID>== ERROR SUMMARY: 71 errors from 36 contexts (suppressed: 0 from 0)
$
Google TSan 或Valgrind Helgrind 的报告哪一个是正确的,为什么?
在这些情况下,C++ 标准库以一种看起来像是错误的方式使用 pthread(至少在 DRD 和 Helgrind 的作者在编写它们时认为是这样)。
请参阅此问题的答案,了解 DRD 和 Helgrind 期望的解释和示例:理解 pthread_cond_wait() 和 pthread_cond_signal()。
通常情况下,要“解决”这种问题而不把孩子和洗澡水一起倒掉是很棘手的。
我能做的就是添加一个选项,比如
--ignore-dubious=yes|no Ignore dubious usage of pthread_cond_{signal|broadcast}
where the signaller is not holding the mutex [no]
同时您可以抑制此类错误。
{
dubious DRD condition signal/broadcast
drd:CondRaceErr
fun:pthread_cond_*
}
{
dubious Helgrind condition signal/broadcast
Helgrind:Dubious
fun:pthread_cond*
}