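Homework (Code): In this section we write some simple multi-threaded programs and use a specific tool, called helgrind, to find problems in them. Read the README in the homework download for details on how to build the programs and run helgrind.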


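1. First build main-race.c. Examine the code so you can see the (very obvious) data race in it. Now run helgrind (by typing valgrind --tool=helgrind main-race) to see how it reports the race. Does it point to the right lines of code? What other information does it give you?

Build with make main-race. The race is easy to spot: the main thread and the worker thread both modify the shared variable balance without any synchronization.

Run valgrind --tool=helgrind ./main-race to see the result: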


==25676== Helgrind, a thread error detector
==25676== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==25676== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==25676== Command: ./main-race
==25676== ---Thread-Announcement------------------------------------------
==25676== Thread #1 is the program's root thread
==25676== ---Thread-Announcement------------------------------------------
==25676== Thread #2 was created
==25676==    at 0x49B2282: clone (clone.S:71)
==25676==    by 0x48752EB: create_thread (createthread.c:101)
==25676==    by 0x4876E0F: pthread_create@@GLIBC_2.2.5 (pthread_create.c:817)
==25676==    by 0x4842917: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/
==25676==    by 0x109513: Pthread_create (mythreads.h:51)
==25676==    by 0x1095F1: main (main-race.c:14)
==25676== ----------------------------------------------------------------
==25676== Possible data race during read of size 4 at 0x10C014 by thread #1
==25676== Locks held: none
==25676==    at 0x1095F2: main (main-race.c:15)
==25676== This conflicts with a previous write of size 4 by thread #2
==25676== Locks held: none
==25676==    at 0x1095A6: worker (main-race.c:8)
==25676==    by 0x4842B1A: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/
==25676==    by 0x4876608: start_thread (pthread_create.c:477)
==25676==    by 0x49B2292: clone (clone.S:95)
==25676==  Address 0x10c014 is 0 bytes inside data symbol "balance"
==25676== ----------------------------------------------------------------
==25676== Possible data race during write of size 4 at 0x10C014 by thread #1
==25676== Locks held: none
==25676==    at 0x1095FB: main (main-race.c:15)
==25676== This conflicts with a previous write of size 4 by thread #2
==25676== Locks held: none
==25676==    at 0x1095A6: worker (main-race.c:8)
==25676==    by 0x4842B1A: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/
==25676==    by 0x4876608: start_thread (pthread_create.c:477)
==25676==    by 0x49B2292: clone (clone.S:95)
==25676==  Address 0x10c014 is 0 bytes inside data symbol "balance"
==25676== Use --history-level=approx or =none to gain increased speed, at
==25676== the cost of reduced accuracy of conflicting-access information
==25676== For lists of detected and suppressed errors, rerun with: -s
==25676== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

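Helgrind flags the race: Possible data race during read of size 4 at 0x10C014 by thread #1

It points at the right lines, main (main-race.c:15) and worker (main-race.c:8), for example: at 0x1095A6: worker (main-race.c:8)

It also names the variable involved: Address 0x10c014 is 0 bytes inside data symbol "balance". Beyond that, it shows where each thread was created and that neither thread held a lock (Locks held: none).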

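2. What happens when you remove one of the offending lines of code? Now add a lock around one of the updates to the shared variable, and then around both. What does helgrind report in each of these cases?

With either line 15 or line 8 removed, only one thread touches balance, so the program runs correctly and the race is gone.

With a lock around only one of the updates, Helgrind still reports an error, because the other access is unprotected and the race remains. With a lock around every update (cd code && make && valgrind --tool=helgrind ./main-race), the result is clean.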
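
The "lock around every update" case can be sketched as follows. This is a minimal illustration, not the homework's actual file: the names balance_lock, locked_increment, and run_locked_updates are hypothetical, and plain pthread calls are used instead of the wrappers from mythreads.h.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Shared counter, standing in for `balance` in main-race.c. */
static int balance = 0;
/* Hypothetical lock name: every access to balance goes through it. */
static pthread_mutex_t balance_lock = PTHREAD_MUTEX_INITIALIZER;

/* Increment balance while holding the lock. */
static void locked_increment(void) {
    pthread_mutex_lock(&balance_lock);
    balance++;
    pthread_mutex_unlock(&balance_lock);
}

static void *worker(void *arg) {
    (void)arg;
    locked_increment();          /* the update on the worker side */
    return NULL;
}

/* Runs one worker alongside the calling thread and returns the final balance. */
int run_locked_updates(void) {
    pthread_t p;
    int rc;

    pthread_mutex_lock(&balance_lock);
    balance = 0;                 /* reset so the function can be called repeatedly */
    pthread_mutex_unlock(&balance_lock);

    rc = pthread_create(&p, NULL, worker, NULL);
    assert(rc == 0);
    locked_increment();          /* the update on the main-thread side */
    rc = pthread_join(p, NULL);
    assert(rc == 0);
    (void)rc;

    /* The join orders the worker's write before this read. */
    return balance;
}
```

To try it under helgrind, call run_locked_updates() from a main function and build with -pthread. Dropping the lock from either side brings back the situation from the "one lock" case, where the two accesses are no longer ordered.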

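3. Now let's look at main-deadlock.c. Examine the code. It has a problem known as deadlock (which we discuss in much more depth in the next chapter). Can you see what problem it might have?


Thread 0 acquires lock 1 and is then interrupted; thread 1 runs and acquires lock 2. Each thread now waits for the lock the other one holds, so neither can proceed: a deadlock.

4. Now run helgrind on this code. What does helgrind report?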


make main-deadlock 
valgrind --tool=helgrind ./main-deadlock 


==28961== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 7 from 7)

5. Now run helgrind on main-deadlock-global.c. Examine the code; does it have the same problem that main-deadlock.c has? Should helgrind be reporting the same error? What does this tell you about tools like helgrind?


make main-deadlock-global
valgrind --tool=helgrind ./main-deadlock-global


ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 7 from 7)

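main-deadlock-global.c itself is fine: a global lock wraps the two inner lock acquisitions, so only one thread at a time can be inside that section and no deadlock can occur. Helgrind still reports an error, because it checks the order in which locks are acquired rather than whether a deadlock can actually happen, and the two threads still take the inner locks in opposite orders. So tools like helgrind can produce false positives; a report is a hint to inspect the code, not proof of a bug.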
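
The gate-lock pattern behind this answer can be sketched as below. It is an illustration written for this note, not a copy of main-deadlock-global.c: the names g, m1, m2, gated_worker, and run_gated_workers are assumptions, as is the finished counter used to show that both threads complete.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t g  = PTHREAD_MUTEX_INITIALIZER; /* global gate lock */
static pthread_mutex_t m1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t m2 = PTHREAD_MUTEX_INITIALIZER;
static int finished = 0;                               /* protected by g */

/* Each worker takes m1 and m2 in a different order, but only while holding g. */
static void *gated_worker(void *arg) {
    int id = *(int *)arg;

    pthread_mutex_lock(&g);          /* only one thread gets past this point */
    if (id == 0) {
        pthread_mutex_lock(&m1);
        pthread_mutex_lock(&m2);
    } else {
        pthread_mutex_lock(&m2);     /* opposite order: a lock-order checker */
        pthread_mutex_lock(&m1);     /* may still complain about this        */
    }
    finished++;
    pthread_mutex_unlock(&m1);
    pthread_mutex_unlock(&m2);
    pthread_mutex_unlock(&g);
    return NULL;
}

/* Starts two workers with opposite lock orders; returns how many finished. */
int run_gated_workers(void) {
    pthread_t p[2];
    int ids[2] = {0, 1};
    int i, rc, result;

    pthread_mutex_lock(&g);
    finished = 0;
    pthread_mutex_unlock(&g);

    for (i = 0; i < 2; i++) {
        rc = pthread_create(&p[i], NULL, gated_worker, &ids[i]);
        assert(rc == 0);
    }
    for (i = 0; i < 2; i++) {
        rc = pthread_join(p[i], NULL);
        assert(rc == 0);
    }
    (void)rc;

    pthread_mutex_lock(&g);
    result = finished;
    pthread_mutex_unlock(&g);
    return result;
}
```

Removing the two operations on g turns this into the pattern from question 3, where the opposite acquisition orders can deadlock. With g in place the function always returns, yet the inner acquisition orders still disagree.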

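6. Next, let's look at main-signal.c. This code uses a variable (done) to signal that the child thread is done and that the parent can now continue. Why is this code inefficient? (What does the parent end up spending its time doing, particularly if the child thread takes a long time to complete?)

The parent spin-waits on done, so it burns CPU time for as long as the child is still running.

7. Now run helgrind on this program. What does it report? Is the code correct?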


make main-signal
valgrind --tool=helgrind ./main-signal



==30421== Possible data race during write of size 1 at 0x52861A5 by thread #1
==30421==    by 0x109633: main (main-signal.c:17)
==30421==    by 0x1095CC: worker (main-signal.c:8)
==30326== ERROR SUMMARY: 23 errors from 2 contexts (suppressed: 40 from 36)

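The real race is on the done variable; the tool also points at printf.

glibc's printf is a thread-safe function (reference: stackoverflow), which suggests the printf report is not the actual problem in this program.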

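8. Now look at a slightly modified version of main-signal.c: main-signal-cv.c. This version uses a condition variable to do the signaling (along with a lock). Why is this code preferred to the previous version? Is it correctness, or performance, or both?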


make main-signal-cv
valgrind --tool=helgrind ./main-signal-cv

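The result is clean, and the code also performs better than the previous version, so the answer is both. done is now protected by a lock, and the spin-wait is replaced by Pthread_cond_wait, which releases the lock and gives up the CPU, then re-acquires the lock when the thread is woken by the signal.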
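
A minimal sketch of that waiting pattern, assuming plain pthread calls rather than the capitalized wrappers and using the hypothetical names lock, cond, child, and wait_for_child:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int done = 0;                 /* protected by lock */

/* Child: set the flag and signal, both while holding the lock. */
static void *child(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    done = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Parent: sleep on the condition variable instead of spinning on done. */
int wait_for_child(void) {
    pthread_t p;
    int rc, seen;

    pthread_mutex_lock(&lock);
    done = 0;                        /* reset so the function can be reused */
    pthread_mutex_unlock(&lock);

    rc = pthread_create(&p, NULL, child, NULL);
    assert(rc == 0);

    pthread_mutex_lock(&lock);
    while (done == 0)                /* re-check: wakeups can be spurious */
        pthread_cond_wait(&cond, &lock);  /* releases lock while asleep */
    seen = done;
    pthread_mutex_unlock(&lock);

    rc = pthread_join(p, NULL);
    assert(rc == 0);
    (void)rc;
    return seen;
}
```

The while loop matters in both directions: if the child finishes before the parent reaches the wait, done is already 1 and the parent never sleeps, and if the parent is woken without done being set, it goes back to sleep.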

