Ch 7 - Program building blocks
Useful to study recurring code patterns, focusing on areas where developers might make security-relevant mistakes
Search for variables that are related to each other, determine their intended relationships, and then determine whether there’s a way to desynchronize these variables from each other
This usually means finding a block of code that alters one variable in a fashion inconsistent with the other variables
Go through the code quickly (in a function) and identify variable relationships, then make one pass to see whether any vars can be desynchronised
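A minimal sketch of such a relationship, assuming a hypothetical `struct strbuf` (the type and function names are illustrative, not from any real codebase). The intended relationship is `len < cap` with a terminator at `data[len]`; one setter alters `len` without regard to `cap`, the other enforces the relationship.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer. Intended relationship between the members:
 * len < cap, and data[len] == '\0'. */
struct strbuf {
    char   data[16];
    size_t cap;   /* always sizeof(data) */
    size_t len;   /* bytes in use, excluding the terminator */
};

void strbuf_init(struct strbuf *b)
{
    b->cap = sizeof(b->data);
    b->len = 0;
    b->data[0] = '\0';
}

/* Returns 1 if the related variables still agree with each other. */
int strbuf_consistent(const struct strbuf *b)
{
    return b->len < b->cap && b->data[b->len] == '\0';
}

/* BUG: takes len from the caller without checking it against cap,
 * so len and cap can be desynchronised. */
void strbuf_set_len_bad(struct strbuf *b, size_t new_len)
{
    b->len = new_len;
}

/* Rejects lengths that would break the relationship. */
int strbuf_set_len(struct strbuf *b, size_t new_len)
{
    if (new_len >= b->cap)
        return -1;
    b->len = new_len;
    b->data[new_len] = '\0';
    return 0;
}
```

When auditing, the pass over the function amounts to asking which of its callees behave like `strbuf_set_len_bad()`.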
Well-designed application keeps variable relationships to a minimum
Data hiding - concealing complex relationships in separate subsystems so that the internals aren’t exposed to callers
Data hiding can make your job harder by spreading complex relationships across multiple files and functions
Examples of data hiding include private variables in a C++ class and the buffer management subsystem in OpenSSH
Familiarise yourself with the interfaces to learn the purpose of objects and their constituent members
One goal of auditing object-oriented code is to determine whether it’s possible to desynchronise related structure members or leave them in an unexpected or inconsistent state to cause the application to perform some sort of unanticipated operation
Structure mismanagement bugs tend to be quite subtle - the code to manage structures is spread out into several small functions that are individually quite simple. Therefore, any vulnerabilities tend to be a result of aggregate, emergent behaviour occurring across multiple functions
One major problem area in this structure management code is low-level language issues, such as type conversion, negative values, arithmetic boundaries, and pointer arithmetic. The reason is that management code tends to perform a lot of length calculations and comparisons
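As an illustration of the negative-value case, a small sketch with a hypothetical `BUF_SIZE` and two hypothetical length checks. Only the comparison is shown; the copy that would follow is left out.

```c
#include <assert.h>
#include <stddef.h>

#define BUF_SIZE 64   /* hypothetical destination size */

/* BUG: len is signed, so -1 passes this check. A later memcpy() would
 * convert it to a huge size_t. Returns 1 when the length is accepted. */
int len_ok_bad(int len)
{
    return len <= BUF_SIZE;
}

/* Rejects negative values before comparing against the buffer size. */
int len_ok(int len)
{
    return len >= 0 && (size_t)len <= (size_t)BUF_SIZE;
}
```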
Similarly to structures, objects can be left in an inconsistent state
There is also potential for subtle vulnerabilities caused by incorrect assumptions about implicit member functions, e.g. overloaded operators
- Variable was intended to be initialised at the beginning of the function but the developer forgot to specify an initialiser in the declaration
- A code path exists where the variable is accidentally used without ever being initialised
Functions that allocate a number of variables commonly have an epilogue that cleans up objects to avoid memory leaks when an error occurs. If the epilogue frees a variable that was never allocated or initialised, the cleanup operates on a garbage pointer, which is potentially exploitable
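A sketch of the cleanup-epilogue pattern, assuming a hypothetical `process()` function. The point is the `NULL` initialisers: without them, the early `goto cleanup` would pass indeterminate pointers to `free()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical processing step: returns 0 on success, -1 on error. */
int process(const char *input)
{
    /* Initialising both pointers to NULL makes the shared epilogue safe:
     * free(NULL) is a no-op, free(<garbage>) is memory corruption. */
    char *name = NULL;
    char *copy = NULL;
    int   rc   = -1;

    if (input == NULL || input[0] == '\0')
        goto cleanup;            /* error path taken before any allocation */

    name = malloc(strlen(input) + 1);
    if (name == NULL)
        goto cleanup;
    strcpy(name, input);

    copy = malloc(strlen(name) + 1);
    if (copy == NULL)
        goto cleanup;
    strcpy(copy, name);

    rc = 0;

cleanup:
    free(copy);
    free(name);
    return rc;
}
```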
In C++ code, pay close attention to member variables in objects - unexpected code paths can leave objects in an inconsistent or partially uninitialised state
The best way to begin examining this code is by looking at constructor functions to see whether any constructors neglect to initialise certain elements of the object
Destructors are automatically called during the function epilogue for objects declared in the function, similar to the case of vars freed in an epilogue above
- Discover operations that, if a boundary condition could be triggered, would have security-related consequences (primarily length-based calculations and comparisons)
- Determine a set of values for each operand that trigger the relevant arithmetic boundary wrap
- Determine whether this code path can be reached with values within the set determined in the previous step
- Identify the data type of the variable involved
- Determine at which points the variable is assigned a value
- Determine constraints on the variable from assignment until the vulnerable operation
- Determine supporting code path constraints
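A small worked example of the first two steps, assuming a hypothetical length calculation `len + 1` on a 32-bit unsigned value. The set of operand values that triggers the wrap here is the single value `UINT32_MAX`.

```c
#include <assert.h>
#include <stdint.h>

/* Length calculation of interest: room for len bytes plus a terminator. */
uint32_t alloc_size_bad(uint32_t len)
{
    return len + 1;              /* wraps to 0 when len == UINT32_MAX */
}

/* Returns 0 and stores the size, or -1 if len is in the wrapping set. */
int alloc_size(uint32_t len, uint32_t *out)
{
    if (len > UINT32_MAX - 1)
        return -1;
    *out = len + 1;
    return 0;
}
```

The remaining work is tracing whether an attacker-controlled `len` can actually reach the calculation with that value.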
Occasionally, application developers confuse what the data in a union represents. This can have disastrous consequences on an application, particularly when integer data types are confused with pointer data types, or complex structures of one type are confused with another
Most vulnerabilities of this nature stem from misinterpreting a variable used to define what kind of data the structure contains
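A minimal sketch of a tagged union, with hypothetical names throughout. The `type` member is the variable that defines what the union holds; the accessor refuses to touch the pointer member unless the tag says it is valid.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum val_type { VAL_INT, VAL_STR };

/* Hypothetical tagged union: 'type' says which member of 'u' is valid. */
struct value {
    enum val_type type;
    union {
        long        num;
        const char *str;
    } u;
};

/* Checks the tag before touching the pointer member. Code that skipped
 * this check would treat an attacker-supplied integer as a pointer. */
int value_strlen(const struct value *v, size_t *out)
{
    if (v->type != VAL_STR || v->u.str == NULL)
        return -1;
    *out = strlen(v->u.str);
    return 0;
}
```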
Points to address when examining a list-handling algorithm:
- Does the algorithm deal correctly with manipulating list elements when the list is empty?
- What are the implications of duplicate elements?
- Do previous and next pointers always get updated correctly?
- Are data ranges accounted for correctly?
Duplicate elements: elements containing identical keys (data values used to characterise the structure as unique) could cause the two elements to get confused, resulting in the wrong element being selected from the list
Previous and next pointer updates: mistakes often happen when the program treats the current member as the head or tail of a list
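A sketch of an unlink routine for a hypothetical doubly linked list, showing the head and tail cases that such code has to get right. The structure and function names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    struct node *prev, *next;
    int          key;
};

struct list {
    struct node *head, *tail;
};

void list_push_back(struct list *l, struct node *n)
{
    n->next = NULL;
    n->prev = l->tail;
    if (l->tail != NULL)
        l->tail->next = n;
    else
        l->head = n;             /* list was empty */
    l->tail = n;
}

/* Unlinks n, updating head/tail when n is the first or last element. */
void list_unlink(struct list *l, struct node *n)
{
    if (n->prev != NULL)
        n->prev->next = n->next;
    else
        l->head = n->next;       /* n was the head */

    if (n->next != NULL)
        n->next->prev = n->prev;
    else
        l->tail = n->prev;       /* n was the tail */

    n->prev = n->next = NULL;    /* no stale links left behind */
}
```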
Data ranges: in ordered lists, the elements are sorted into some type of order based on a data member that distinguishes each list element. Often each data element in the list represents a range of values
Nuances with this:
- Can overlapping data ranges be supplied?
- Can replacement data ranges (duplicate elements) be supplied?
- Does old or new data take precedence?
- What happens when 0 length data ranges are supplied?
- Is the hashing algorithm susceptible to invalid results? E.g. when the algorithm uses modulus, try to force it to return a negative result (negative dividend), or force it to produce many collisions
- What are the implications of invalidating elements? Some algorithms prune elements based on conditions, which can lead to incorrect unlinking
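To illustrate the modulus point: in C99 and later, `%` truncates toward zero, so a negative dividend gives a negative remainder. A sketch with a hypothetical bucket count `NBUCKETS`:

```c
#include <assert.h>

#define NBUCKETS 8   /* hypothetical table size */

/* BUG: the result of % takes the sign of the dividend, so a negative
 * hash yields a negative (out-of-bounds) bucket index. */
int bucket_bad(int hash)
{
    return hash % NBUCKETS;
}

/* Reducing an unsigned value keeps the index in [0, NBUCKETS). */
unsigned bucket(int hash)
{
    return (unsigned)hash % NBUCKETS;
}
```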
- The terminating conditions don’t account for destination buffer sizes or don’t correctly account for destination sizes in some cases
- The loop is post-test when it should be pretest
- A break or continue statement is missing or incorrectly placed
- Some misplaced punctuation causes the loop to not do what it’s supposed to
The set of terminating conditions in a loop might not adequately account for all possible error conditions, or the implementation of the checks is incorrect
Main problems when calculating lengths:
- The loops fail to account for a buffer’s size
- A size check is made, but it’s incorrect
Off-by-one errors are common in size checks, especially in string processing
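A sketch of the classic case: forgetting to leave room for the terminator. The buggy variant is shown as index arithmetic only, so nothing is actually overwritten; both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* BUG (index arithmetic only): the check uses dstsize rather than
 * dstsize - 1, so the returned terminator index can equal dstsize,
 * which is one byte past the end of the destination. */
size_t terminator_index_bad(size_t srclen, size_t dstsize)
{
    size_t i = 0;
    while (i < srclen && i < dstsize)
        i++;
    return i;                    /* where dst[i] = '\0' would be written */
}

/* Copies at most dstsize - 1 bytes and always terminates dst. */
size_t copy_str(char *dst, size_t dstsize, const char *src)
{
    size_t i;

    if (dstsize == 0)
        return 0;                /* no room even for the terminator */
    for (i = 0; i < dstsize - 1 && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0';
    return i;
}
```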
Occasionally, when loops terminate in an unexpected fashion, variables can be left in an inconsistent state
Another off-by-one error occurs when a variable is incorrectly checked to ensure that it’s within certain boundaries before it’s incremented and used
Loops that can write multiple data elements in a single iteration might also be vulnerable to incorrect size checks, e.g. because of character escaping or expansion that weren’t adequately taken into account by the loop’s size checking
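A sketch of a loop that may write two bytes per iteration, assuming a hypothetical escaper that doubles single quotes. The size check covers the worst case for the current iteration plus the terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical escaper: doubles every single quote. Returns the output
 * length, or -1 if dst is too small. */
int escape_quotes(char *dst, size_t dstsize, const char *src)
{
    size_t o = 0;

    if (dstsize == 0)
        return -1;

    for (; *src != '\0'; src++) {
        size_t need = (*src == '\'') ? 2 : 1;

        /* Require 'need' bytes plus one for the terminator. A check of
         * only one byte per iteration would overflow on quotes. */
        if (dstsize - o <= need)
            return -1;

        if (*src == '\'')
            dst[o++] = '\'';
        dst[o++] = *src;
    }
    dst[o] = '\0';
    return (int)o;
}
```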
A loop’s size check could be invalid because of a type conversion, an arithmetic boundary condition, operator misuse, or pointer arithmetic error
Post-test loops should be used when the body of the loop always needs to be performed at least one time. Look for potential situations where execution of the loop body can lead to an unexpected condition. One thing to look out for is the conditional form of the loop performing a sanity check that should be done before the loop is entered
With pre-test loops - if code following a loop expects that the loop body has run at least once, an attacker might be able to intentionally skip the loop entirely and create an exploitable condition
See chapter 6 as well
Developers might assume that a break statement can break out of any nested block and use it in an incorrect place
Or they might assume the statement breaks out of all surrounding loops instead of just the most immediate loop
Another problem is using a continue statement inside a switch statement to restart the switch comparison
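A short sketch of the first mistake, with hypothetical scanning functions. The author of `scan_bad()` wants to stop at the first `;`, but the `break` only leaves the `switch`.

```c
#include <assert.h>
#include <stddef.h>

/* Returns how many bytes were examined before the loop stopped. */
size_t scan_bad(const char *s)
{
    size_t n;

    for (n = 0; s[n] != '\0'; n++) {
        switch (s[n]) {
        case ';':
            break;               /* exits the switch, not the for loop */
        default:
            break;
        }
    }
    return n;
}

/* Same intent without the switch: break now applies to the loop. */
size_t scan(const char *s)
{
    size_t n;

    for (n = 0; s[n] != '\0'; n++) {
        if (s[n] == ';')
            break;
    }
    return n;
}
```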
When the break statement is left out on purpose, programmers often leave a comment (such as /* FALLTHROUGH */ for lint) indicating that the omission of the break statement is intentional
Check whether there are any unaccounted-for cases
Focus on arguments and aspects of the function that users can influence in some way
Four main vulnerability types:
- Return values are misinterpreted or ignored.
- Arguments supplied are incorrectly formatted in some way.
- Arguments get updated in an unexpected fashion.
- Some unexpected global program state change occurs because of the function call.
Often programmers forget to test malloc or realloc return value for failure
Realloc failures may be exploitable
The same applies to other memory allocation functions, especially if they involve copying data
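A sketch of testing the `realloc()` return value, assuming a hypothetical `struct vec`. The result goes into a temporary so that a failure does not overwrite the only pointer to the old block, and the recorded capacity changes only on success.

```c
#include <assert.h>
#include <stdlib.h>

struct vec {
    int   *items;
    size_t cap;
};

/* Grows v to hold new_cap ints. On failure the old block is still valid
 * and still owned by v, because realloc() leaves it untouched. */
int vec_grow(struct vec *v, size_t new_cap)
{
    int *tmp;

    /* Reject size 0 and a multiplication that would wrap. */
    if (new_cap == 0 || new_cap > (size_t)-1 / sizeof(*v->items))
        return -1;

    tmp = realloc(v->items, new_cap * sizeof(*v->items));
    if (tmp == NULL)
        return -1;               /* v->items and v->cap unchanged */

    v->items = tmp;
    v->cap   = new_cap;
    return 0;
}
```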
Note where the return value (for functions where it indicates success or failure) is not tested
Note the error conditions returned by the function
Effects of ignoring return value depend on the structure of the caller
Misinterpretation often happens when a team of programmers is developing an application and using third-party code and libraries
Example: on UNIX, snprintf() typically returns how many bytes it would have written to the destination had there been enough room, so a caller that treats the return value as the number of bytes actually written miscalculates lengths
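A sketch of checking the C99 `snprintf()` return value for truncation. `format_name()` is a hypothetical wrapper.

```c
#include <assert.h>
#include <stdio.h>

/* Formats s into dst. Returns 0 on success, -1 on error or truncation. */
int format_name(char *dst, size_t dstsize, const char *s)
{
    int n = snprintf(dst, dstsize, "%s", s);

    /* C99 snprintf() returns the length the full output would have had,
     * not the number of bytes stored, so n >= dstsize means truncation. */
    if (n < 0 || (size_t)n >= dstsize)
        return -1;
    return 0;
}
```

Code that advances a write pointer by the raw return value, without this check, ends up past the end of the buffer after a truncated write.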
Systematic approach when finding misinterpreted values:
- Determine the intended meaning of the return value for the function. If code is documented, verifying that the function returns what the documenter says it does is still important.
- Look at each location in the application where the function is called and see what it does with the return value. Is it consistent with that return value’s intended meaning?
Finding these cases:
- Determine all the points in a function where it might return. Usually there are multiple points where it might return because of errors and one point at which it returns because of successful completion.
- Examine the value being returned. Is it within the range of expected return values? Is it appropriate for indicating the condition that caused the function to return?
The function side effects of interest are manipulating arguments passed by reference (value-result arguments) and manipulating globally scoped variables
One common situation is when realloc() is used to resize a buffer passed as a pointer argument. Then the calling function has a pointer that was not updated after a call to realloc(), or the new allocation size is incorrect because of a length miscalculation
“Outdated pointer” bugs are often spread out between several functions. Make note of security-relevant functions that manipulate pass-by-reference arguments, as well as the specific manner in which they perform this manipulation.
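A sketch of the situation, assuming a hypothetical `struct buffer` passed by reference. `buffer_resize()` may move the data, so `buffer_put()` keeps an offset across the call and re-derives the pointer afterwards instead of caching a `char *`.

```c
#include <assert.h>
#include <stdlib.h>

struct buffer {
    char  *data;
    size_t size;
};

/* May move b->data: every pointer derived from the old b->data is
 * outdated once this returns 0. */
int buffer_resize(struct buffer *b, size_t new_size)
{
    char *tmp;

    if (new_size == 0)
        return -1;
    tmp = realloc(b->data, new_size);
    if (tmp == NULL)
        return -1;
    b->data = tmp;
    b->size = new_size;
    return 0;
}

/* Stores c at offset 'used', growing the buffer first if needed. */
int buffer_put(struct buffer *b, size_t used, char c)
{
    if (used >= b->size) {
        if (used > (size_t)-1 - 64)      /* used + 64 would wrap */
            return -1;
        if (buffer_resize(b, used + 64) != 0)
            return -1;
    }
    b->data[used] = c;                   /* pointer read after the resize */
    return 0;
}
```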
These kinds of argument manipulations often use opaque pointers with an associated set of manipulation functions.
This type of manipulation is also an inherent part of C++ classes, as they implicitly pass a reference to the this pointer. However, C++ member functions can be harder to review due to the number of implicit functions that may be called and the fact that the code paths do not follow a more direct procedural structure.
Determining risk of pass-by-reference manipulation:
- Find all locations in a function where pass-by-reference arguments are modified, particularly structure arguments.
- Differentiate between mandatory modification and optional modification. Mandatory modification occurs every time the function is called; optional modification occurs when an abnormal situation arises. Programmers are more likely to overlook exceptional conditions related to optional modification.
- Examine how calling functions use the modified arguments after the function has returned.
Auditing functions that modify global variables is similar, but the vulnerabilities introduced might be more subtle, especially in code that can run at any point in the program, e.g. an exception handler or signal handler
In object-oriented programs, it can be much harder to determine whether global variables are susceptible to misuse because of unexpected modification. The difficulty arises because the order of execution of constituent member functions often isn’t clear.
- List the type and intended meaning of each argument to a function.
- Examine all the calling functions to determine whether type conversions or incorrect arguments could be supplied.
MultiByteToWideChar() - the length argument is often misinterpreted: the destination buffer size is specified in wide characters, not in bytes. Confusing the two sizes (e.g. by specifying sizeof(buf)) leads to an overflow.
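The same element-count versus byte-count distinction can be sketched portably. This example uses the standard `mbstowcs()` as a stand-in (it is not `MultiByteToWideChar()`, but its limit is likewise a count of wide characters); `ARRAY_COUNT` and `to_wide()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Element count of an array, as opposed to its size in bytes.
 * sizeof(wchar_t) is typically 2 or 4, so sizeof(arr) overstates it. */
#define ARRAY_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Converts src into dst, which holds dst_count wide characters. Returns
 * the number of wide characters stored, or (size_t)-1 on error. */
size_t to_wide(wchar_t *dst, size_t dst_count, const char *src)
{
    size_t n;

    if (dst_count == 0)
        return (size_t)-1;
    n = mbstowcs(dst, src, dst_count - 1);   /* limit is in wide chars */
    if (n == (size_t)-1)
        return n;
    dst[n] = L'\0';
    return n;
}
```

A caller would pass `ARRAY_COUNT(buf)` here; passing `sizeof(buf)` is the mistake described above.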
The more difficult the function is to figure out, the more likely it is that it will be used incorrectly
You should be able to answer any questions about a function’s quirks and log the answers so that the information is easily accessible later.
Be especially mindful of type conversions that happen with arguments, such as truncation when dealing with short integers, because they are susceptible to boundary issues
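A sketch of the truncation case, assuming a hypothetical function that stores a length in a 16-bit field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* BUG: converting a size_t length to 16 bits truncates it, so a
 * 65,536-byte input appears to have length 0. */
uint16_t stored_len_bad(size_t len)
{
    return (uint16_t)len;
}

/* Rejects lengths that do not fit instead of truncating them. */
int stored_len(size_t len, uint16_t *out)
{
    if (len > UINT16_MAX)
        return -1;
    *out = (uint16_t)len;
    return 0;
}
```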
Beware of custom allocators
- Unanticipated conditions. Length miscalculations can arise when unanticipated conditions occur during data processing
- Data assumptions. In code dealing with binary data (e.g. proprietary file formats and protocols) programmers tend to be more trusting of the content
E.g. assumptions about a data element’s largest possible size, even when a length is specified before the variable-length data field
- Order of actions. Actions that aren’t performed in the correct order can also result in length miscalculation
- Multiple length calculations on the same input. A common situation is data being processed with an initial pass to determine the length and then a subsequent pass to perform the data copy
Typical issues to look for:
- Is it legal to allocate 0 bytes? Requesting an allocation of 0 bytes on most OS allocation routines is legal. A chunk of a certain minimum size (typically 12 or 16 bytes) is returned. This piece of information is important when you’re searching for integer-related vulnerabilities - a custom alloc call can be a sanitising wrapper to malloc
- Does the allocation routine perform rounding on the requested size? An allocation routine potentially exposes itself to an integer overflow vulnerability when it rounds a requested size up to the next relevant boundary without performing any sanity checks on the request size first
- Are other arithmetic operations performed on the request size? Another potential for integer overflows - when an application performs an extra layer of memory management on top of the OS’s management. E.g. the application memory management routines request large memory chunks from the OS and then divide it into smaller chunks for individual requests. Some sort of header is usually prepended to the chunk and hence the size of such a header is added to the requested chunk size.
Similar situation with reallocation routines when they don’t have sanity checking.
- Are the data types for request sizes consistent? Many typing issues from Ch 6 are relevant for allocators - any type conversion mistake usually leads to memory corruption.
Allocators that use 16-bit sizes are even easier to overflow.
Similar issues arise on LP64 architectures - long and size_t are 64-bit, while int is only 32-bit.
Important case - when values passed to memory allocation routines are signed. If an allocation routine doesn’t do anything except pass the integer to the OS, it might not matter whether the size parameter is signed. If the routine is more complex and performs calculations and comparisons based on the size parameter, however, whether the value is signed is definitely important. Usually, the more complicated the allocation routine, the more likely it is that the signed condition of size parameters can become an issue.
- Is there a maximum request size? Sometimes developers build in a maximum limit for how much memory the code allocates. This often works as a sanitiser.
- Is a different size memory chunk than was requested ever returned? Essentially all integer-wrapping vulnerabilities become exploitable bugs for one reason: A different size memory chunk than was requested is returned. When this happens, there’s the potential for exploitation. Occasionally a memory allocation routine can resize a memory request.
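A sketch that ties several of these questions together, using a hypothetical allocator wrapper with 32-bit request sizes. `ALIGN`, `MAX_REQUEST`, and both function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGN       16u          /* hypothetical rounding boundary */
#define MAX_REQUEST (1u << 20)   /* hypothetical maximum request size */

/* Rounds size up to a multiple of ALIGN with no sanity check. For sizes
 * within 15 of UINT32_MAX the addition wraps and the result is 0, i.e.
 * a different (much smaller) size than was requested. */
uint32_t round_up_unchecked(uint32_t size)
{
    return (size + (ALIGN - 1)) & ~(ALIGN - 1);
}

/* Rejects 0 and oversized requests before any arithmetic is done on
 * the size, so the rounding above cannot wrap. */
void *my_alloc(uint32_t size)
{
    if (size == 0 || size > MAX_REQUEST)
        return NULL;
    return malloc(round_up_unchecked(size));
}
```

The maximum-size check is what turns the wrapper into a sanitiser; without it, the rounding step alone is the vulnerability.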
Error domain is a set of values that, when supplied to the function, generate one of the exceptional conditions that could result in memory corruption.
- The block is freed, then allocated to other data, overwritten, and freed again - with crafted data this can lead to code execution
- The block can be entered in the free block list twice (not possible on Windows and glibc - they check that the block passed to free() is in use). This can also lead to code execution
Especially pay attention when auditing C++ code. Sometimes keeping track of an object’s internal state is difficult, and unexpected states could lead to double-frees. Be mindful of members that are freed in more than one member function in an object (such as a regular member function and the destructor), and attempt to determine whether the class is ever used in such a way that an object can be destructed when some member variables have already been freed.
Many operating systems’ reallocation routines free a buffer that they’re supposed to reallocate if the new size for the buffer is 0. This is true on most UNIX implementations. Therefore, if an attacker can cause a call to realloc() with a new size of 0, that same buffer might be freed again later; there’s a good chance the buffer that was just freed will be written into.
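One defensive shape for this, sketched with a hypothetical `struct blob`: a size of 0 is handled as an explicit release that also clears the pointer, rather than being passed to `realloc()`, so a later release through the same structure is a harmless `free(NULL)`.

```c
#include <assert.h>
#include <stdlib.h>

struct blob {
    unsigned char *data;
    size_t         size;
};

/* Frees the data and clears the pointer so a repeated release through
 * this structure is safe. Other copies of the pointer are not covered. */
void blob_release(struct blob *b)
{
    free(b->data);
    b->data = NULL;
    b->size = 0;
}

/* Resizes b. A new size of 0 never reaches realloc(). */
int blob_resize(struct blob *b, size_t new_size)
{
    unsigned char *tmp;

    if (new_size == 0) {
        blob_release(b);
        return 0;
    }
    tmp = realloc(b->data, new_size);
    if (tmp == NULL)
        return -1;
    b->data = tmp;
    b->size = new_size;
    return 0;
}
```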