AbstractWe present Memcheck, a tool that has been implemented with thedynamic binary instrumentation framework Valgrind. Memcheck detects awide range of memory errors in programs as they run. This paperfocuses on one kind of error that Memcheck detects: undefined value errors.Such errors are common, and often cause bugs that are hard to find in programswritten in languages such as C, C++ and Fortran. Memcheck's definednesschecking improves on that of previous tools by being accurate to the level ofindividual bits. This accuracy gives Memcheck a low false positive and falsenegative rate.The definedness checking involves shadowing every bit of data inregisters and memory with a second bit that indicates if the bit has adefined value. Every value-creating operation is instrumented with ashadow operation that propagates shadow bits appropriately. Memcheckuses these shadow bits to detect uses of undefined values that couldadversely affect a program's behaviour.Under Memcheck, programs typically run 20-30 times slower thannormal. This is fast enough to use with large programs. Memcheckfinds many errors in real programs, and has been used during thepast two years by thousands of programmers on a wide range of systems,including OpenOffice, Mozilla, Opera, KDE, GNOME, MySQL, Perl, Samba,The GIMP, and Unreal Tournament.1 IntroductionThe accidental use of undefined values is a notorious source of bugsin programs written in imperative languages such as C, C++ andFortran. Such undefined value errors are easy to make, but canbe extremely difficult to track down manually, sometimes lurkingunfound for years.In this paper we describe Memcheck, a practical tool which detects awide range of memory errors, including undefined value errors.Memcheck is implemented using the dynamic binary instrumentationframework Valgrind [10,9]. 1.1 Basic operation and featuresMemcheck is a dynamic analysis tool, and so checks programs for errorsas they run. Memcheck performs four kinds of memory error checking. First, it tracks the addressability of every byte of memory, updating theinformation as memory is allocated and freed. With this information, it candetect all accesses to unaddressable memory.Second, it tracks all heap blocks allocated with malloc(), new andnew[]. With this information it can detect bad or repeated frees ofheap blocks, and can detect memory leaks at program termination.Third, it checks that memory blocks supplied as arguments tofunctions like strcpy() and memcpy() do not overlap. This does notrequire any additional state to be tracked.Fourth, it performs definedness checking: it tracks the definedness ofevery bit of data in registers and memory. With this information it can detectundefined value errors with bit-precision.All four kinds of checking are useful. However, of the four,definedness checking is easily the most sophisticated, and it isthis checking that this paper focuses on.Memcheck uses dynamic binary instrumentationto instrument the program to be checked (the client)on-the-fly at run-time. Execution of the added instrumentation code isinterleaved with the program's normal execution, not disturbing normal programbehaviour (other than slowing it down), but doing extra work ``on the side'' todetect memory errors. Because it instruments and analyses executable machinecode, rather than source code or object code, it has the following niceproperties.Wide applicability: it works with programs written in any language.1
Total coverage: all parts of the client are executed under Memcheck's control, including dynamically linked libraries and the dynamic linker, even if the source code is not available on the system. Indeed, it is only by doing this that Memcheck can be accurate--partial coverage leads either to lots of missed errors (false negatives) or lots of invalid errors (false positives).
Ease of use: unlike many similar tools, it does not require programs to be prepared (e.g. recompiled or relinked) in any way.
Memcheck is part of the Valgrind suite, which is free (GPL) software,and is available for download from the Valgrind website[14]. It currently runs only on the x86/Linuxplatform, although work is currently underway to port it to otherplatforms such as AMD64/Linux, PowerPC/Linux, x86/FreeBSD, andPowerPC/MacOSX.1.2 Using MemcheckFigure 1:Example program badprog.c Memcheck is easy to use. As an example, consider the (contrived) programbadprog.c in Figure 1. It contains three undefinedvalue errors. To check the compiled program badprog the user only hasto type: valgrind --tool=memcheck badprogThe --tool= option specifies which tool in theValgrind suite is used. The program runs under Memcheck's control,typically 20-30 times slower than usual. This slow-down is partlydue to Memcheck's definedness checking, partly due to its otherchecking, and partly due to Valgrind's inherent overhead. The program'soutput is augmented by Memcheck's output, which goes by default to standarderror, although it can be redirected to a file, file descriptor, or socket witha command line option.Figure 2:Example output for badprog.c Figure 2 shows the resulting output. The first three linesare printed by Memcheck on startup. The middle section shows threeerror messages issued by Memcheck. The final three lines are printed attermination, and summarise Memcheck's findings. Each line of Memcheck's outputis prefixed with the client's process ID, 27607 in this case.The three error messages are quite precise. The first indicates thatthe memory passed to the system call write() via the buf argumentcontains undefined values; its last line indicates that the undefinedvalue is on the stack of thread 1 (the main thread). The secondindicates that a conditional jump depends on an undefined value.The third indicates that an undefined value is used as a pointer.All three error messages include a stack trace that indicate preciselywhere the error is occurring. In order to get exact line numbers inthe stack traces, the client must be compiled with debugginginformation. If this is missing, the code locations are less precise,as can be seen with the location within the write() function(not to be confused with the write() system call)--GNU libc onthis system was not compiled with debugging information.Attentive readers may note that the final line of badprog.c could cause a segmentation fault due to the use of the uninitialisedvariable p as an address. On one system we tried this test on,exactly that happened, and so Memcheck issued an additional``Invalid read of size 4'' warning immediately before the memoryaccess, thanks to its addressability checking. For the run presented,the memory access (un)luckily hit addressable memory, so noaddressability error message was issued.1.3 Bit-level precisionFigure 3:Unsafe use of a bit-arrayMemcheck is the first tool we are aware of that tracks definednessdown to the level of bits. Other tools track definedness at bytegranularity (Purify) or word granularity (Third Degree).This means Memcheck correctly handles code which deals withpartially-defined bytes. In C and C++, two common idiomsgive rise to such bytes: use of bit arrays and use of bitfields instructures. Tools which track definedness at byte or wordgranularities necessarily give inaccurate results in suchsituations - either they fail to report genuine errors resulting fromuses of uninitialised bits, or they falsely flag errors resulting fromcorrect uses of partially defined bytes.The program shown in Figure 3 uses anuninitialised bit in a bit-array. Memcheck reports this, but Purifydoes not2. Memcheck is known to have found previously-undetected usesof single uninitialised bits in C++ structure bitfields, in at leastone large, widely used C++ code base.1.4 ContributionsOur main contribution is a detailed description of Memcheck's definednesschecking, which has only been briefly touched upon in previous publicationsabout Valgrind [10,9]. Memcheck's definednesschecking improves on that performed by previous tools by being accurate to thelevel of individual bits. Its falsepositive and false negative rates are very low. Finally, the run-time overheadof the definedness checking is reasonable enough that it is practical to use onvery large programs.1.5 Paper structureThe rest of this paper is structured as follows. Section 2 describes how Memcheck works, in particular thedetails of shadow bit operations, which are crucial in ensuringMemcheck's accuracy and speed. Section 3 evaluatesMemcheck by considering the cost of its use--in terms of how easy itis to obtain and run, the ease of using its results, and its impact onperformance--and the benefits provided by its bug-finding abilities.Section 4 discusses related work.Section 5 concludes. 2 How Memcheck works2.1 ValgrindMemcheck is implemented as a plug-in toValgrind [10,9]. Valgrind is aframework for creating tools that use dynamic binary instrumentation;it does all the hard work of inserting instrumentation intomachine code at run-time. Tools are created asplug-ins, written in C, to Valgrind'score.The basic view is:Valgrind core tool plug-in Valgrind tool.The resulting tool loads the client at start-up, grafting itself ontothe process as it does so. It then starts executing the client, by(re)compiling the client's code, one basic block at a time, in ajust-in-time, execution-driven fashion. The compilation processinvolves disassembling the machine code into an intermediaterepresentation called UCode. UCode is instrumented by the toolplug-in, and the instrumented UCode is then converted back into x86code. The resulting code is stored in Valgrind's code cache, to bererun as necessary.The core spends most of its execution time making, finding, andrunning translations. None of the client's original code is run. Thecore also provides many services to tools, to ease common tasks suchas recording errors and reading debug information. Only one tool canbe used at a time.The Valgrind distribution contains the core, plus five tools:Memcheck, Addrcheck (a lightweight version of Memcheck that omitsdefinedness checking), a cache profiler, a memory space-use (heap)profiler, and a data race detector for POSIX pthread-ed programs.2.2 OverviewThe basic idea underlying the definedness checking is straightforward. Every single bit of data, , maintained by a program, in both registers and memory, is shadowed by a piece of metadata, called a definedness bit. For historical reasons these are often also referred to as V bits (V being short for ``validity''). Each V bit indicates whether or not the bit it shadows is regarded as currently having a properly defined value.
Every single operation that creates a value is shadowed by a shadow operation that computes the V bits of any outputs, based on the V bits of all inputs and the operation. The exact operations performed by this shadow computation are important, as they must be sufficiently fast to be practical, and sufficiently accurate to not cause many false positives.
Every operation that uses a value in such a way that it could affect the observable behaviour of a program is checked. If the V bits indicate that any of the operation's inputs are undefined, an error message is issued. The V bits are used to detect if any of the following depend on undefined values: control flow transfers, conditional moves, addresses used in memory accesses, and data passed to system calls.Most operations do not directly affect the program's observable behaviour. In such cases, the V bits are not checked. Instead, V bits for the result of the operation are computed. Hence, for the most part, Memcheck silently tracks the flow of undefined values, and only issues an error message when use of such a value is potentially dangerous. This scheme is required because undefined values are often copied around without any problem, due to the common practice, used by both programmers and compilers, of padding structures to ensure fields are word-aligned.
From this overview, the main overheads of definedness checking areapparent. First, the use of V bits doubles the amount of memory inuse. Second, most instructions compute new values, and thus require ashadow operation itself consisting of one or more instructions.2.3 Details of V bitsA V bit of zero indicates thatthe corresponding data bit has a properly defined value, and a V bit of oneindicates that it does not. This is counterintuitive, but makes some of theshadow operations more efficient than they would be if the bit values wereinverted.Every 32-bit general purpose register is shadowed by a 32-bit shadow register,and every byte of memory has a shadow V byte. After that, there are some exceptions to the basic rule of ``one V bit per databit''.The x86 condition code register (%eflags) is approximated witha single V bit, since tracking all 6 condition codes individually isexpensive and mostly unnecessary.
The program counter is not shadowed. Instead, we regard it asalways defined, and emit an error message whenever a conditional jumpdepends on undefined condition codes. One way to interpret such ajump is as an attempt to assign an undefined value to the programcounter, but since we are not tracking the program counter'sdefinedness state, we regard it as always-defined, and so mustimmediately report any attempt to ``assign'' it an undefinedvalue.
Floating point, MMX and SSE registers are not shadowed.Memcheck can still perform definedness checkingon code using these registers, but such checks may produce a higher false positive rate than would have occurred had theregisters been shadowed.
2.4 Instrumentation basicsIn principle it is possible to directly state the V bit transformationsrequired to shadow each x86 instruction. In practice, the x86instruction set is so complex and irregular that this would bedifficult and fragile. Fortunately, UCode (Valgrind's intermediaterepresentation) is RISC-like and clearly exposes all memory and arithmeticoperations, which makes Memcheck's instrumentation task much easier.Another important aspect of UCode is that it supports the use of aninfinite supply of virtual registers. The initial translation fromx86 is expressed in terms of such registers. Memcheckinterleaves its own instrumentation UCode with it, using asmany new virtual registers as required. Valgrind's core has the job ofperforming instruction selection and register allocation toconvert this sequence back to executable x86 code.2.5 Abstract operations on V bitsThe V bit instrumentation scheme is best described in terms of a family of simple abstract operations on V bits. We will use d1 andd2 to denote virtual registers holding real values, and v1 and v2 denote virtual shadow registers holding V bits. Also, some operations below use X and Y indicate the operand andresult widths in bytes (and 0 represents an operand with a width of asingle bit). Those operations for which the width is not specified arewidth-independent.Memcheck uses the following binary operations.DifD(v1,v2) (``defined if either defined'') returns a V bit vector the same width as v1 and v2. Each bit of the result indicates definedness if either corresponding operand bit indicates definedness. Since our encoding is zero for defined and one for undefined, DifD can be implemented using a bitwise-AND operation.
UifU(v1,v2) (``undefined if either undefined'') dually propagates undefinedness from either operand, again at a bitwise level, and can be implemented using a bitwise-OR operation.
ImproveAND(d,v) and ImproveOR(d,v) are helpers for dealing with AND and OR operations. These have the interesting property that the resulting V bits depend not only on the operand V bits, but also on the operand data values. ImproveAND takes a data value (d) and a V bit value (v), and produces an ``improvement value'' bitwise as follows:ImproveAND(0, undefined) = undefinedImproveAND(0, defined) = definedImproveAND(1, undefined) = undefinedImproveAND(1, defined) = undefined The second case is the interesting one. If one of the arguments is a defined zero bit, we don't care that the other argument might be undefined, since the result will be zero anyway. Hence ImproveAND creates a ``defined'' V-bit, which, as described in Section 2.6, is merged into the final result for AND using DifD. This has the effect of forcing that bit of the result to be regarded as defined. ``undefined'' is the identity value for DifD, so the other three cases have no effect on the final outcome.For exactly analogous reasons, ImproveOR behaves similarly, with the interesting case being ImproveOR(1, defined) = defined.
The following unary operations are also needed.Left(v) simulates the worst-case propagation of undefinedness upwards (leftwards) through a carry chain during integer add or subtract. Left(v) is the same as v, except that all bits to the left of the rightmost 1-bit in v are set. For example, using 8-bit values, Left(00010100) = 11111100. Left can be implemented in two x86 instructions, a negation followed by an OR operation.
PCastXY(v) are a family of size-changing operations which can be interpreted as ``pessimising casts''. If all bits of the operand are zero (defined), PCast produces a V bit vector at the new width in which all bits are zero (defined), else it produces a value with all bits one (undefined). For example, PCast12(00010100) = 1111111111111111, and PCast12(00000000) = 0000000000000000.These casts are used in various approximations in which the definedness checking needs to consider the worst-case across a whole word of bits. It is important to appreciate that the narrowing casts (where ) do not simply discard the high bits of the operand. Similarly, the case where is not the identity function. In all cases, each result bit depends on all operand bits, regardless of their relative sizes. PCast can be implemented in at most three x86 instructions: for narrowing casts, negation followed by subtract-with-borrow; for widening casts, a shift, a negation, and a subtract-with-borrow.
ZWidenXY(v) are a family of widening operations which mimic unsigned (zero-extend) widening of data values. As with PCast, X and Y denote argument and result widths, with the additional requirement that . Zero-widening a data value produces zeroes in the new positions, and so ZWiden needs to indicate these new positions are defined. Since defined values are encoded as zero bits, ZWiden can itself be implemented using a zero-widen instruction. This is the first of several cases where choosing zero (rather than one) to mean ``defined'' simplifies the implementation.
SWidenXY(v) is the dual family of signed widening operations. A signed widening copies the top argument bit into all new positions. Therefore SWiden has to copy the top definedness bit into all new positions and so can itself be implemented using a signed-widen instruction.
2.6 The instrumentation scheme properEvery operation (instruction or system call) that creates a value mustbe instrumented with a shadow operation that computes thecorresponding V bits. This section describes these shadow operationsin detail.Recall that for each virtual register in the incoming UCode, Memcheck allocates ashadow virtual register to carry the corresponding V bits.2.6.0.1 Register and memory initialisationAt startup, all registers have their V bits set to one,i.e. undefined. The exception is the stack pointer, which has itsV bits set to zero, i.e. defined.All memory bytes mapped at startup (i.e. code and data segments, andany shared objects) have their V bits set to zero (defined).2.6.0.2 Memory allocation and deallocationThe Valgrind framework intercepts function and system calls whichcause usable address ranges to appear/disappear. Memcheck is notifiedof such events and marks shadow memory appropriately. Forexample, malloc and mmap bring new addresses into play:mmap makes memory addressable and defined, whilst malloc makesmemory addressable but undefined. Similarly, wheneverthe stack grows, the newly exposed area is marked asaddressable but undefined.Whenever memory is deallocated, the deallocated area also has its valuesall marked as undefined.Memcheck also uses such events to update its maps of which addressranges are legitimately addressable. By doing that it can detectaccesses to invalid addresses, and so report to the user problems suchas buffer overruns, use of freed memory, and accesses below the stackpointer. Details of this addressability checking are beyond the scope ofthis paper.2.6.0.3 Literals All literals are, not surprisingly, consideredto be completely defined.2.6.0.4 Data copying operationsThese are straightforward.Register-to-register moves give rise to a move between the correspondingshadow registers. Register-to-memory and memory-to-register transfersare instrumented with a move between the corresponding shadow registerand shadow memory.x86 contains a byte-swap (endianness-change) instruction. As thismerely rearranges bits in a word, a byte-swap instruction is also applied to the shadow value.2.6.0.5 Addition and subtractionGiven d3 = Add(d1,d2) or d3 = Sub(d1,d2), each result bit can simplistically be considereddefined if both the corresponding argument bits are defined. However,a result bit could also be undefined due to an undefined carry/borrowpropagating upwards from less significant bit positions. ThereforeMemcheck needs to generate v3 = Left(UifU(v1,v2)).The same scheme is used for multiplies. This is overly conservative becausethe product of two numbers with and consecutive least-significantdefined bits has least-significant defined bits, rather than as theAdd/Sub scheme generates. It would be possibleto do a better job here, but the extra expense does not seem justifiedgiven that very few, if any, complaints have arisen over the subjectof false positives arising from multiplies.The shadow operation for Neg (negation) is trivially derived byconstant-folding the shadow operation for Sub(0,d), giving the resultLeft(v), where v is the shadow for d.2.6.0.6 Add with carry and subtract with borrowThese take the CPU's carry flag as an additional single-bit operand.Let vfl be the virtual 1-bit-wide register tracking the definedness ofthe condition codes (%eflags). If this extra operand isundefined, the entire result is undefined, so the followingformulation derives straightforwardly from the Add/Sub case:v3 = UifU( Left(UifU(v1,v2)), PCast0X(vfl) )where X is the width of v1, v2 and v3.2.6.0.7 XorA simple case: given d3 = Xor(d1,d2), generatev3 = UifU(v1,v2).The rule for Not is trivially derived byconstant-folding the rule for Xor(0xFF..FF,d), giving,as one might expect, the simple resultv, where v is the shadow for d; i.e. the V bits areunchanged.2.6.0.8 And and OrThese require inspection of the actual operand values as well as theirshadow bits. We start off with a bitwise UifU of the operands,but fold in, using DifD, ``improvements'' contributed by defined zero-arguments (for And) or defined one-arguments (forOr). So, given:d3 = And(d1,d2)d3 = Or(d1,d2)the resulting instrumentation assignments are, respectively:v3 = DifD( UifU(v1,v2), DifD( ImproveAND(d1,v1), ImproveAND(d2,v2) ) )v3 = DifD( UifU(v1,v2), DifD( ImproveOR(d1,v1), ImproveOR(d2,v2) ) )This means instrumentation of And/Or is quite expensive. However,such instructions are often used with one constant operand, in whichcase Memcheck's post-instrumentation cleanup pass can fold these expressionsdown to a single ImproveAND/OR term.2.6.0.9 Shl, Shr, Sar, Rol, Ror(Shift left, Unsigned shift right, Signed shift right, Rotate left,Rotate right). In all cases, if the shift/rotate amount is undefined,the entire result is undefined. Otherwise, for reasons which aresomewhat subtle, the result V bits are obtainedby applying the same shift/rotate operation to the V bits of thevalue to be shifted/rotated.Given input d3 = OP(d1, d2), where d2 is theshift/rotateamount, and the sizes of d1/d3 and d2 arerespectivelyX and Y,the resulting instrumentation assignment isv3 = UifU( PCastYX(v2), OP(v1,d2) )In all five cases, the definedness bits are processed using the sameoperation as the original. For Rol and Ror, the definedness bitsmust be rotated exactly as the data bits are.Shl and Shr shiftzeroes into the data, and so corresponding zeroes--indicatingdefinedness--need to be shifted into the definedness word.Sar copies thetop bit of the data, and so needs to also copy the top bit of thedefinedness word.2.6.0.10 Widening and narrowing conversions A narrowing conversionon data throws away some of the top bits of the word, and so the sameoperation can be used to throw away the top bits of the shadow word.Signed and unsigned widening conversions give rise respectively to asingle SWiden or ZWiden operation.2.6.0.11 Instructions which set flagsOn x86, most integer arithmetic instructions set the condition codes(%eflags) and Memcheck duly tracks the definedness state of %eflagsusing a single shadow bit. When an integer operation sets conditioncodes, it is first instrumented as described above. Memcheck pessimistically narrows the result value(s) of the shadow operation usingPCastX0 to derive a value for the %eflags shadowbit.2.6.0.12 Loads, stores, conditional branches and conditional movesThese are discussed in the next section.2.6.0.13 Floating point (FP) and MMX, SSE, SSE2 (SIMD) operationsValgrind does not disassemble floating point or SIMD instructions tothe same level of detail as it does integer instructions. Instead, itmerely modifies some of the register fields in the instruction, marksany instructions referencing memory as such, and copies them otherwiseunchanged into the output instruction stream.Because of this, Memcheck can only offer crude instrumentation of suchinstructions. Such instrumentation is safe in the sense thatall uses of undefined values, and all illegitimate memoryaccesses, will still be caught. The crudeness of the instrumentationhas the effect that some computations, when done with FP or SIMDregisters, may elicit false-positive undefined value errors, whensimilar or identical operations done using integer registers would not. Themost notable case is that copying undefined data through the FP or SIMDregisters will elicit false positives.The instrumentation scheme is as follows. Neither the FP nor SIMDregisters have any associated V bits. When a value is loadedfrom memory into such a register, if any part of the value isundefined, an error message is issued. When a value is written from such aregister to memory, shadow memory is marked as defined. This is inkeeping with the Eager approximation scheme described shortly.So far, this crude scheme has proven adequate, mostly becauseprogrammers and compilers rarely copy and manipulatepartially-undefined data through FP or SIMD registers. However, vectorising compilers for SIMD architectures are becomingincreasingly common [8], and this scheme cannot continue much longer--itis the biggest weakness of Memcheck. A new version of Memcheckunder development will shadow data in floating point registersand in individual lanes of SIMD registers, thus remedying this deficiency.2.6.0.14 Approximating everything elseThe above cases give sufficient accuracy to achieve anear-zero false positive rate on almost all compiler-generated andhandwritten code. There are a multitude of other cases which could be tracked accurately, but for which there appears to be nopoint. These include: division, rotates through the carry flag, andcalls to helper functions which implement obscure features (CPUID,RDTSC, BCD arithmetic, etc).In such situations, two approximation schemes are possible.Lazy. The V bits of all inputs to the operation are pessimistically summarised into a single bit, using chains of UifU and/or PCastX0 operations. The resulting bit will indicate ``undefined'' if any part of any input is undefined. This bit is duplicated (using PCast0X) so as to give suitable shadow output word(s) for the operation.Using this scheme, undefinedness can be made to ``flow'' through unknown operations, albeit in a pessimistic manner. No error messages will be issued when such operations execute.
Eager. As with Lazy, a summary definedness bit is pessimistically computed. If the bit is one (undefined), an error message is issued. Regardless of the bit's value, shadow output word(s) are created indicating ``defined''. Counterintuitive as this may seem, it stops cascades of undefined value error messages being issued; only the first observation of such values are reported.
Memcheck currently uses Eager for all floating point and SIMDoperations, as decribed above, and Lazy in all other situations. 2.7 Deciding when to issue error messagesAt every point where an undefined value could be consumed byan operation, Memcheck has a choice: should it report the errorright now, or should it silently propagate the undefinedness intothe result? Both approaches have advantages.Reporting the error sooner (the eager strategy mentioned above) makes it easier for users to track down the root cause of undefined values. Undefined value errors originate primarily from reading uninitialised memory. Such values propagate through the computation until they hit a check point. If check points are rare, that path can be long, and users may have to trace back though multiple levels of procedure calls and through their data structures to find the root cause.
Deferring error checking and reporting has two major advantages. Firstly, error checks are expensive--a test and conditional jump--and so minimising them improves performance. Secondly and more importantly, reporting errors too soon can lead to false positives: undefined values might be used in a safe way and then discarded, so an early check on them would give a pointless error to the user.
Memcheck mostly takes the second alternative, deferring error reportingas far as it can. Checks for undefined values are made onlywhen the program is in immediate danger of performing one of the followingactions, which could change its observable behaviour.Taking a memory exception due to use of an undefined address in a load or store.3
Making a conditional jump based on undefined condition codes.
Passing undefined values to a system call.
Loading uninitialised values from memory into a SIMD or FP register.
Accordingly, instrumentation for such events is as follows.Memory access (all kinds): check address for definedness and issue an error message if any address bit is undefined.
Conditional jump: check the V bit which shadows %eflags, and issue an error message if undefined.
System call: check arguments (scalar and in memory) to the extent possible, and issue an error message if undefined values are being passed to the kernel. This requires in-depth knowledge of the kernel interface.
Memory load into a SIMD or FP register: in addition to checking definedness of the address, also check the loaded data for definedness, and issue an error message if necessary.
Conditional moves could be handled using either the eager or lazyscheme. Memcheck handles them eagerly, testing the condition codeand reporting any error immediately.Each error check consists of testing a shadow virtual register4 against zero forany undefinedness, and calling a helper function to issue an error message ifso. An important but non-obvious extra step is that, immediatelyfollowing the test, the shadow register should be set to zero (``alldefined''). Doing so prevents subsequent checks on it issuingessentially duplicate errors, which would confuse users. Considerthe following C fragment: int* p; /* not defined */ ... = *p; *p = ...For the load, p's shadow is tested, and an error message is issued ifnecessary. Subsequently in the store, reporting another such errorfor p would not help lead users to the root cause of theproblem.2.7.0.1 Loads from invalid addressesOne interesting question is how to handle loads from memory whichMemcheck regards as not validly addressable. Our solution iscounterintuitive: data loaded from unaddressable memory is marked asdefined.This helps reduce error cascades. A load from an invalid addresswill in any case cause Memcheck to issue an invalid-address error message.If the loaded data was marked as undefined, Memcheck might, as aresult, later issue undefined value error messages. These would confuse usersand obscure the true cause of the error--the invalid address.Marking the loaded data as defined avoids that problem.2.8 Avoiding false positivesMemcheck has a very low false positive rate. However, a fewhand-coded assembly sequences, and a few very rare compiler-generatedidioms can cause false positives. The few examples we know of are as follows.xor %reg,%reg: %reg is defined after the instruction, even if it is undefined prior to it. This is solved for Memcheck by Valgrind's x86-to-UCode translation phase, which translates this idiom as if it had instead seen mov $0,%reg; xor %reg,%reg.
sbb %reg,%reg: This copies the carry flag into all bits of %reg, and has no real dependence on %reg's original value. The instrumentation described above preserves any undefinedness from %reg's original value, which is inappropriate. Again, the front end solves this by instead translating mov $0,%reg; sbb %reg,%reg.
A more difficult case: GCC occasionally generates code to do a conditional jump based on the highest bit in a register by moving the bit to the sign flag using test %reg,%reg and then doing a conditional jump based on the sign flag. Unfortunately, if bits below the highest bit in %reg are undefined, Memcheck's instrumentation scheme will conclude that all six condition codes are undefined, and so complain at the jump. The problem arises because only one bit is used to approximate the definedness state of all six condition codes. A possible solution is to model the sign flag separately from the rest, but so far we have resisted this extra complexity and run-time overhead.
GNU libc contains highly-optimised, hand-written assembly routines for common string functions, particularly strlen(). These traverse the string a word at a time, relying on detailed properties of carry-chain propagation for correct behaviour. For such code, Memcheck's use of the Left operator to model such propagation is too crude, and leads to false positives.Memcheck has a two-part work-around. First, it replaces the standard versions of these functions with its own less optimised versions that do not cause problems. But GCC sometimes inlines calls to these functions, and handling them currently involves a nasty hack. Such code requires addition/subtraction of carefully chosen constants, such as 0x80808080. If Memcheck sees adds/subtracts with such suspect constants as operands, some undefined value checks in the containing basic block are omitted.A better solution would be to use the presence of such constants as a signal that adds/subtracts in this block should be instrumented using an alternative, more accurate but more expensive formulation which properly tracks carry propagation [6]. We are developing such a scheme.
Finally, Memcheck's underlying assumptions are occasionally invalid. Forexample, some programs deliberately use undefined values as an additionalsource of entropy when generating random numbers.2.9 False negativesWe believe there are very few situations in which Memcheck failsto flag uses of undefined values that could have any observableeffect on program behaviour. The exceptions we are aware of are as follows.The abovementioned omission of some checks in blocks containing magic constants such as 0x80808080. This hack could be removed as suggested above, probably with minimal performance loss.
Caller-saved registers in procedures. Ideally, on entry to a procedure, caller-saved registers should be marked as undefined, since callees assume that caller-saved registers are fresh storage available for use. Memcheck does not currently do so. Doing this correctly is difficult, both because it is calling-convention dependent, and because reliably observing procedure entry/exit on x86/Linux is nearly impossible given the use of tail-call optimisations, leaf-function optimisations, and use of longjmp().5 As a result, it is possible that registers which should be regarded as undefined at the start of a callee are marked as defined due to previous activity in the caller, and so some errors might be missed.
Programs that switch stacks (usually because they implement user-space threading). There is no reliable way to distinguish a large stack allocation or deallocation from a stack-switch. Valgrind uses a heuristic: any change in the stack pointer greater than 2MB is assumed to be a stack-switch. When Valgrind judges that a stack-switch has happened, Memcheck does not take any further actions. So if a stack frame exceeding 2MB is allocated, Valgrind considers this a stack switch, and Memcheck will not mark the newly allocated area as undefined. The program could then use the values in the allocated area unsafely, and Memcheck will not detect the problem.6
Finally, Memcheck of course cannot detect errors on code paths that are not executed, nor can itdetect errors arising from unseen combinations of inputs.This limitation is inherent from the fact that Memcheck uses dynamic analysis.As a result, Memcheck is best used in conjunction with a thorough test suite.In comparison, static analysis does not suffer these limitations, but thepower of the analysis is necessarily much lower [1].2.10 Setting realistic expectationsA system such as Memcheck cannot simultaneously be free of falsenegatives and false positives, since that would be equivalent tosolving the Halting Problem. Our design attempts to almost completelyavoid false negatives and to minimise false positives. Experiencein practice shows this to be mostly successful. Even so,user feedback over the past two years reveals aninteresting fact: many users have an (often unstated) expectation that Memcheckshould not report any false positives at all, no matter how strange the codebeing checked is.We believe this to be unrealistic. A better expectation is to acceptthat false positives are rare but inevitable. Therefore it willoccasionally necessary to add dummy initialisations to code to makeMemcheck be quiet. This may lead to code which is slightly moreconservative than it strictly needs to be, but at least it gives astronger assurance that it really doesn't make use of any undefinedvalues.A worthy aim is to achieve Memcheck-cleanness, so that new errors areimmediately apparent. This is no different from fixing source code toremove all compiler warnings, even ones which are obviously harmless.Many large programs now do run Memcheck-clean, or very nearly so. Inthe authors' personal experience, recent Mozilla releases come closeto that, as do cleaned-up versions of the OpenOffice.org-680development branch, and much of the KDE desktop environment. So thisis an achievable goal.Finally, we would observe that the most effective use of Memcheckcomes not only from ad-hoc debugging, but also when routinely used onapplications running their automatic regression test suites. Suchsuites tend to exercise dark corners of implementations, therebyincreasing their Memcheck-tested code coverage. 3 EvaluationFor a tool such as Memcheck to be worth using, a user must first be using alanguage such as C, C++ or Fortran that is susceptible to the kinds of memoryerrors Memcheck finds. Then, the benefits of use must outweigh the costs.This section considers first the costs of using Memcheck, in terms of its easeof use and performance. It then considers the benefits, that is, itseffectiveness in helping programmers find real bugs.As part of this, we refer to a survey of Valgrind users that weconducted in November 2003, to which 116 responses were received. Theresults are available on the Valgrind website [14].Memcheck's operation has changed very little in the time since, so theresponses about it are still relevant.3.1 Ease of useEase of use has a number of facets: how easy Memcheck is to obtain,how easy it is to run, and how easy it is to act upon its results.3.1.0.1 Obtaining MemcheckMemcheck is very easy to obtain, because the Valgrind suite is freesoftware, is available in source form on the Valgrind website[14], and is widely available in binary form on theweb, packaged for a number of Linux distributions.3.1.0.2 Running MemcheckMemcheck could hardly be easier to run. Using it only requires prefixing aprogram's command line with valgrind -tool=memcheck, plus any otherdesired Valgrind or Memcheck options.Typically, the only further effort a user must make is to compile herprogram with debugging information, to ensure error messages are asinformative as possible. Indeed, 40 of the 116 survey responderspraised Valgrind/Memcheck's ease of running, and another another 14commented that ``it just works'', or similar. Only one respondercomplained that it could be easier to use.3.1.0.3 Custom allocatorsThere are some cases where the user needs to expend a little moreeffort. If a program uses custom memory management rather thanmalloc(), new and new[], Memcheck can miss some errors it wouldotherwise find. The problem can be avoided by embedding small numberof client requests into the code. These are special assemblycode sequences, encoded as C macros for easy use, that Valgrindrecognises, but which perform a cheap no-op if the client is notrunning on Valgrind. They provide a way for the client to passinformation to a Valgrind tool. For example, when using a customallocator, client requests can be used to inform Memcheck when a heapblock has been allocated or deallocated, its size and location, etc.3.1.0.4 Self-modifying codeExtra effort is also needed to handle self-modifying code. Dynamicallygenerated code is not a problem, but if code that has executed ismodified and re-executed, Valgrind will not realise this, and willre-run its out-of-date translations. Auto-detecting this ispossible but expensive [7]; the cost isnot justified by the small number of programs that use self-modifyingcode. Our compromise solution is to provide another client requestwhich tells Valgrind to discard any cached translations of codein a specified address range.3.1.0.5 Different behaviour under MemcheckClient programs sometimes do not behave exactly the same underValgrind/Memcheck as they do normally. Programs that run successfullynormally may fail under Memcheck. This is usually because of latentbugs that are exposed e.g. by different execution timings, or becauseValgrind provides its own implementation of the Pthreads library.This can be regarded as a good thing. More rarely, programs that failnormally may succeed under Memcheck for the same reasons, which can befrustrating for users who hoped to track down a bug with Memcheck.Although Valgrind/Memcheck is robust--we have had feedback from users using iton systems with 25 million lines of code --it occasionally fails due to its own bugs.The main source of problems is that Valgrind interacts closely with thekernel and GNU libc. In particular the signal and thread handling isfragile and hard to get right for all Linux distributions, and amaintenance headache. By comparison, the parts dealing with x86 codeand instrumentation cause few problems, because instruction setschange very slowly. We hope to decrease our level of interaction withthe kernel and GNU libc in the future, and recent development effortshave made major progress in that area.3.1.0.6 Acting upon Memcheck's resultsIn general, Memcheck'saddressability checking, deallocation checking, and overlap checking do quitewell here, in that Memcheck's report of the problem is usually close tothe root cause. However, for definedness checking this is often not the case,as Section 2.7 explained.To help on this front, Memcheck provides another client request thatcan be inserted into the client's source code. It instructs Memcheckto check if a particular variable or memory address range is defined,issuing an error message if not. Judicious uses of this clientrequest can make identifying root causes of undefined value errorsmuch easier.Ideally, error messages would indicate where theundefined value originated from, e.g. from a heapblock whose contents have not been initialised. However, this would requireaugmenting each value with extra information about where it came from, and wecannot see how to do this without incurring prohibitive overheads.Ease of interpreting error messages is also important. Ten surveyresponders complained that the messages are confusing, but 7 praised them.One issue in particular is the depth of stack traces: the default is four,but many users immediately adjust that to a much higher number. This givesmore information, but also makes the error messages longer. This is acase where a GUI (requested by 8 responders) would be useful, in that largestack traces could be gathered, shown to a small depth by default, and then``unfolded'' by the user if necessary. Some users also simply prefer graphicaltools over text-based ones. As it happens, there are several GUI front-endsfor Valgrind, including Alleyoop [17] and Valgui [2].3.2 PerformanceThis section discusses the performance of three Valgrind tools. BesidesMemcheck, it considers two other tools: Addrcheck, and Nulgrind. Addrcheck isa cut-down version of Memcheck that does not perform definedness checking. Theperformance difference between Memcheck and Addrcheck give an indication of thecost of Memcheck's definedness checking. Nulgrind is the ``null'' Valgrind toolthat adds no instrumentation. It gives an idea of the overhead due toValgrind's basic operation.All measurements were performed using Valgrind 2.1.2 on an 1400 MHz AMD Athlonwith 1GB of RAM, running Red Hat Linux 9, kernel version 2.4.20.The test programs are a subset of the SPEC CPU2000 suite[16]. All were tested with the ``test'' (smallest) inputs. Thetime measured was the ``real'' time, as reported by /usr/bin/time. Eachprogram was run once normally, and once under each of the Valgrindtools.Thisis not a very rigorous approach but that does not matter, as the figures hereare only intended to give a broad idea of performance.
Parasoft.C .Test.v2.2.1.2-AGAiN full version
2ff7e9595c
Comments