{"id":980,"date":"2013-02-25T00:00:00","date_gmt":"2013-02-25T00:00:00","guid":{"rendered":"https:\/\/www.fussylogic.co.uk\/blog\/?p=980"},"modified":"2013-02-21T20:11:57","modified_gmt":"2013-02-21T20:11:57","slug":"valgrind-ii-helgrind","status":"publish","type":"post","link":"https:\/\/www.fussylogic.co.uk\/blog\/?p=980","title":{"rendered":"Valgrind II &#8212; helgrind"},"content":{"rendered":"<p><a href=\"?p=\">Last time<\/a> we looked at <code>valgrind's<\/code> memory leak detector, <code>memcheck<\/code>. This time we\u00e2\u20ac\u2122ll be looking at an even more difficult to catch class of bug, thread race conditions.<\/p>\n<p>First let\u00e2\u20ac\u2122s be clear what a thread race is: a race is when two threads run code that access the same piece of memory. The outcome is different depending on which thread runs through faster. I\u00e2\u20ac\u2122ve discussed race conditions <a href=\"?p=694\">before<\/a> in the context of interrupt handlers on embedded microcontrollers. Exactly the same set of problems exists for threaded programs, being that each thread effectively interrupts the other at any point.<\/p>\n<p>This time we\u00e2\u20ac\u2122re thinking about programs on full desktop systems, and we\u00e2\u20ac\u2122ve got access to the <code>helgrind<\/code> and <code>dsd<\/code> tools in <code>valgrind<\/code> to help us find races.<\/p>\n<p>Let\u00e2\u20ac\u2122s write a C++ program using pthreads with some runtime faults in it.<\/p>\n<pre><code>\/\/ C++\n#include &lt;iostream&gt;\n\/\/ C\n#include &lt;stdlib.h&gt;\n\/\/ OS\n#include &lt;sys\/types.h&gt;\n#include &lt;sys\/stat.h&gt;\n#include &lt;fcntl.h&gt;\n#include &lt;unistd.h&gt;\n#include &lt;pthread.h&gt;\n\nusing namespace std;\n\n\/\/ cerr lock, static initialize\npthread_mutex_t cerrLock = PTHREAD_MUTEX_INITIALIZER;\n\n\/\/ misappropriated lock\npthread_mutex_t misappropriatedLock = PTHREAD_MUTEX_INITIALIZER;\n\n\/\/ racer\nint racer = 0;\n\n\nvoid *thread1_main( void * )\n{\n    \/\/ This 
is good locking, and won&#39;t trigger helgrind.\n    pthread_mutex_lock( &amp;cerrLock );\n    cerr &lt;&lt; &quot;[THREAD1] Begin&quot; &lt;&lt; endl;\n    pthread_mutex_unlock( &amp;cerrLock );\n\n    usleep(100000);\n    racer++;\n\n    pthread_mutex_lock( &amp;cerrLock );\n    cerr &lt;&lt; &quot;[THREAD1] End&quot; &lt;&lt; endl;\n    pthread_mutex_unlock( &amp;cerrLock );\n\n    \/\/ Lock in this thread\n    pthread_mutex_lock( &amp;misappropriatedLock );\n\n    \/\/ Exiting with lock held\n    return NULL;\n}\n\nvoid *thread2_main( void * )\n{\n    pthread_mutex_lock( &amp;cerrLock );\n    cerr &lt;&lt; &quot;[THREAD2] Begin&quot; &lt;&lt; endl;\n    pthread_mutex_unlock( &amp;cerrLock );\n\n    usleep(200000);\n    racer++;\n\n    pthread_mutex_lock( &amp;cerrLock );\n    cerr &lt;&lt; &quot;[THREAD2] End&quot; &lt;&lt; endl;\n    pthread_mutex_unlock( &amp;cerrLock );\n\n    \/\/ unlock in this thread\n    pthread_mutex_unlock( &amp;misappropriatedLock );\n\n    return NULL;\n}\n\nint main(void)\n{\n    pthread_t thread1, thread2;\n\n    \/\/ Start threads\n    pthread_create( &amp;thread1, NULL, thread1_main, NULL );\n    pthread_create( &amp;thread2, NULL, thread2_main, NULL );\n\n    \/\/ Wait for them to finish\n    pthread_join(thread1, NULL);\n    pthread_join(thread2, NULL);\n\n    \/\/ Whether or not a race condition happens, corrupting the racer\n    \/\/ variable, we expect helgrind to detect our bad accesses of it in\n    \/\/ the two threads.\n    if( racer != 2 ) {\n        cerr &lt;&lt; &quot;race detected&quot; &lt;&lt; endl;\n    } else {\n        cerr &lt;&lt; &quot;race not detected &quot;&lt;&lt; endl;\n    }\n\n    \/\/ API misuse\n    pthread_mutex_t testLock;\n    \/\/ Unlocking an invalid lock\n    pthread_mutex_unlock(&amp;testLock);\n    \/\/ Destroying an invalid lock\n    pthread_mutex_destroy(&amp;testLock);\n\n\n    pthread_mutex_init(&amp;testLock, NULL);\n    \/\/ Not destroying an allocated mutex\n\n    return 0;\n}\n\n\/\/ 
Command line\n\/\/  $ g++ -ggdb3 -O2 -Wall -rdynamic -Wextra -c -o threadracer.o threadracer.cc\n\/\/  $ g++ -pthread -o threadracer threadracer.o<\/code><\/pre>\n<p>Running it under <code>helgrind<\/code>:<\/p>\n<pre><code>valgrind --tool=helgrind \\\n                --log-file=vg-threadracer.log \\\n                .\/threadracer\n[THREAD1] Begin \n[THREAD2] Begin\n[THREAD1] End\n[THREAD2] End\nrace not detected<\/code><\/pre>\n<p>Now let\u2019s look at the log.<\/p>\n<pre><code>==26230== Helgrind, a thread error detector\n==26230== Copyright (C) 2007-2012, and GNU GPL&#39;d, by OpenWorks LLP et al.\n==26230== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info\n==26230== Command: .\/threadracer\n==26230== Parent PID: 26229\n==26230== \n==26230== ---Thread-Announcement------------------------------------------\n==26230== \n==26230== Thread #2 was created\n==26230==    at 0x4271778: clone (clone.S:111)\n==26230==    by 0x41893BB: pthread_create@@GLIBC_2.1 (createthread.c:256)\n==26230==    by 0x402A790: pthread_create_WRK (hg_intercepts.c:255)\n==26230==    by 0x8048A4D: main (threadracer.cc:69)\n==26230== \n==26230== ----------------------------------------------------------------\n==26230== \n==26230== Thread #2: Exiting thread still holds 1 lock\n==26230==    at 0x4188CCC: start_thread (pthread_create.c:439)\n==26230==    by 0x427178D: clone (clone.S:130)\n==26230== <\/code><\/pre>\n<p>Locks should be acquired and released in the same thread, so leaving one locked when a thread exits will probably lead to another thread deadlocking on that lock.<\/p>\n<pre><code>==26230== ---Thread-Announcement------------------------------------------\n==26230== \n==26230== Thread #3 was created\n==26230==    at 0x4271778: clone (clone.S:111)\n==26230==    by 0x41893BB: pthread_create@@GLIBC_2.1 (createthread.c:256)\n==26230==    by 0x402A790: pthread_create_WRK (hg_intercepts.c:255)\n==26230==    by 0x8048A71: main (threadracer.cc:70)\n==26230== 
\n==26230== ----------------------------------------------------------------\n==26230== \n==26230== Possible data race during read of size 4 at 0x804A330 by thread #3\n==26230== Locks held: none\n==26230==    at 0x8048CDA: thread2_main(void*) (threadracer.cc:52)\n==26230==    by 0x4188C38: start_thread (pthread_create.c:304)\n==26230==    by 0x427178D: clone (clone.S:130)\n==26230== \n==26230== This conflicts with a previous write of size 4 by thread #2\n==26230== Locks held: none\n==26230==    at 0x8048E2A: thread1_main(void*) (threadracer.cc:32)\n==26230==    by 0x4188C38: start_thread (pthread_create.c:304)\n==26230==    by 0x427178D: clone (clone.S:130)\n==26230== <\/code><\/pre>\n<p>Magical! <code>helgrind<\/code> has cleverly found us a race. Quite correctly it\u2019s warning us that the same memory is accessed on line 52 and line 32 without a protective lock.<\/p>\n<pre><code>==26230== ----------------------------------------------------------------\n==26230== \n==26230== Thread #3 unlocked lock at 0x804A334 currently held by thread #2\n==26230==    at 0x402AFB6: pthread_mutex_unlock (hg_intercepts.c:609)\n==26230==    by 0x8048D50: thread2_main(void*) (threadracer.cc:59)\n==26230==    by 0x4188C38: start_thread (pthread_create.c:304)\n==26230==    by 0x427178D: clone (clone.S:130)\n==26230==   Lock at 0x804A334 was first observed\n==26230==    at 0x402AC84: pthread_mutex_lock (hg_intercepts.c:495)\n==26230==    by 0x8048EA0: thread1_main(void*) (threadracer.cc:39)\n==26230==    by 0x4188C38: start_thread (pthread_create.c:304)\n==26230==    by 0x427178D: clone (clone.S:130)\n==26230== <\/code><\/pre>\n<p><code>helgrind<\/code> has found this fault because we\u2019ve released a lock in thread #3 on line 59 that was acquired in thread #2 on line 39.<\/p>\n<pre><code>==26230== ---Thread-Announcement------------------------------------------\n==26230== \n==26230== Thread #1 is the program&#39;s root thread\n==26230== \n==26230== 
----------------------------------------------------------------\n==26230== \n==26230== Thread #1 unlocked an invalid lock at 0xBEB69BC8\n==26230==    at 0x402AFB6: pthread_mutex_unlock (hg_intercepts.c:609)\n==26230==    by 0x8048ACA: main (threadracer.cc:88)\n==26230== <\/code><\/pre>\n<p>On line 88 we\u2019ve released a lock that was never initialised. That\u2019s an API error at least, but can easily result in a crash.<\/p>\n<pre><code>==26230== ----------------------------------------------------------------\n==26230== \n==26230== Thread #1&#39;s call to pthread_mutex_unlock failed\n==26230==    with error code 1 (EPERM: Operation not permitted)\n==26230==    at 0x402B033: pthread_mutex_unlock (hg_intercepts.c:618)\n==26230==    by 0x8048ACA: main (threadracer.cc:88)\n==26230== <\/code><\/pre>\n<p>Same fault really: using an uninitialised lock has resulted in a permission denied response from the OS.<\/p>\n<pre><code>==26230== ----------------------------------------------------------------\n==26230== \n==26230== Thread #1: pthread_mutex_destroy with invalid argument\n==26230==    at 0x402AABC: pthread_mutex_destroy (hg_intercepts.c:455)\n==26230==    by 0x8048AD2: main (threadracer.cc:90)\n==26230== \n==26230== \n==26230== For counts of detected and suppressed errors, rerun with: -v\n==26230== Use --history-level=approx or =none to gain increased speed, at\n==26230== the cost of reduced accuracy of conflicting-access information\n==26230== ERROR SUMMARY: 7 errors from 7 contexts (suppressed: 33 from 33)<\/code><\/pre>\n<p>Unlocking the invalid lock is bad; so is destroying it, and <code>helgrind<\/code> has caught that too.<\/p>\n<p><code>valgrind<\/code> has a similar tool to <code>helgrind<\/code> called <code>drd<\/code>, which serves the same purpose but is (reportedly) faster. It\u2019s not quite as thorough though. 
Here\u2019s its output on the same executable.<\/p>\n<pre><code>==30573== drd, a thread error detector\n==30573== Copyright (C) 2006-2012, and GNU GPL&#39;d, by Bart Van Assche.\n==30573== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info\n==30573== Command: .\/threadracer\n==30573== Parent PID: 30572\n==30573== \n==30573== Mutex still locked at thread exit: mutex 0x804a334, recursion count 1, owner 2.\n==30573==    at 0x402B6FC: pthread_join (drd_pthread_intercepts.c:509)\n==30573==    by 0x8048A85: main (threadracer.cc:73)\n==30573== mutex 0x804a334 was first observed at:\n==30573==    at 0x402CA7F: pthread_mutex_lock (drd_pthread_intercepts.c:650)\n==30573==    by 0x8048EA0: thread1_main(void*) (threadracer.cc:39)\n==30573==    by 0x418FC38: start_thread (pthread_create.c:304)\n==30573==    by 0x427878D: clone (clone.S:130)\n==30573== \n==30573== Thread 3:\n==30573== Conflicting load by thread 3 at 0x0804a330 size 4\n==30573==    at 0x8048CDA: thread2_main(void*) (threadracer.cc:52)\n==30573==    by 0x418FC38: start_thread (pthread_create.c:304)\n==30573==    by 0x427878D: clone (clone.S:130)\n==30573== Allocation context: BSS section of \/home\/andyp\/dev\/projects\/valgrind\/threadracer\n==30573== Other segment start (thread 2)\n==30573==    (thread finished, call stack no longer available)\n==30573== Other segment end (thread 2)\n==30573==    (thread finished, call stack no longer available)\n==30573== \n==30573== Conflicting store by thread 3 at 0x0804a330 size 4\n==30573==    at 0x8048CDA: thread2_main(void*) (threadracer.cc:52)\n==30573==    by 0x418FC38: start_thread (pthread_create.c:304)\n==30573==    by 0x427878D: clone (clone.S:130)\n==30573== Allocation context: BSS section of \/home\/andyp\/dev\/projects\/valgrind\/threadracer\n==30573== Other segment start (thread 2)\n==30573==    (thread finished, call stack no longer available)\n==30573== Other segment end (thread 2)\n==30573==    (thread finished, 
call stack no longer available)\n==30573== \n==30573== Mutex not locked: mutex 0x804a334, recursion count 1, owner 0.\n==30573==    at 0x402D2EF: pthread_mutex_unlock (drd_pthread_intercepts.c:703)\n==30573==    by 0x8048D50: thread2_main(void*) (threadracer.cc:59)\n==30573==    by 0x418FC38: start_thread (pthread_create.c:304)\n==30573==    by 0x427878D: clone (clone.S:130)\n==30573== mutex 0x804a334 was first observed at:\n==30573==    at 0x402CA7F: pthread_mutex_lock (drd_pthread_intercepts.c:650)\n==30573==    by 0x8048EA0: thread1_main(void*) (threadracer.cc:39)\n==30573==    by 0x418FC38: start_thread (pthread_create.c:304)\n==30573==    by 0x427878D: clone (clone.S:130)\n==30573== \n==30573== Thread 1:\n==30573== The object at address 0xbed05bc8 is not a mutex.\n==30573==    at 0x402D2EF: pthread_mutex_unlock (drd_pthread_intercepts.c:703)\n==30573==    by 0x8048ACA: main (threadracer.cc:88)\n==30573== \n==30573== The object at address 0xbed05bc8 is not a mutex.\n==30573==    at 0x402C850: pthread_mutex_destroy (drd_pthread_intercepts.c:636)\n==30573==    by 0x8048AD2: main (threadracer.cc:90)\n==30573== \n==30573== \n==30573== For counts of detected and suppressed errors, rerun with: -v\n==30573== ERROR SUMMARY: 6 errors from 6 contexts (suppressed: 50 from 18)<\/code><\/pre>\n<p>I won\u2019t go through these individually; they\u2019re the same set of faults, just reported by a different tool.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Last time we looked at valgrind&#8217;s memory leak detector, memcheck. This time we\u2019ll be looking at an even more difficult class of bug to catch: thread race conditions. First let\u2019s be clear what a thread race is: a race occurs when two threads run code that accesses the same piece of memory. 
The outcome is\u2026 <span class=\"read-more\"><a href=\"https:\/\/www.fussylogic.co.uk\/blog\/?p=980\">Read More &raquo;<\/a><\/span><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[65,66,42,6,64],"_links":{"self":[{"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/980"}],"collection":[{"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=980"}],"version-history":[{"count":4,"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/980\/revisions"}],"predecessor-version":[{"id":987,"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/980\/revisions\/987"}],"wp:attachment":[{"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=980"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=980"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=980"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}