Thursday 4 August 2016

TAOSSA Chapter 8

Ch 8 - Strings and metacharacters

Major areas of string handling:
  • memory corruption due to string mishandling;
  • vulnerabilities due to in-band control data in the form of metacharacters;
  • vulnerabilities resulting from conversions between character encodings in different languages

C string handling

In C, string buffers have to be managed manually. Developers can either estimate how much memory to reserve for a statically sized array, or dynamically allocate memory at runtime once the amount of space required for a data block is known.
The second way is better, but it adds overhead and requires the memory to be freed correctly.
C++ has a safer string class, but the need to interface with C reintroduces the same issues.

Unbounded string functions

These functions do not take the size of the destination buffer into account when copying data.
  • scanf()
    Used when reading in data from a file stream or string. Each data element specified in the format string is stored in a corresponding argument. A %s or %[] conversion without a field width writes as much data as the input supplies.
  • sprintf()
    Destination buffer can be overflowed, usually by %s or %[] formats, occasionally by %d or %f. Also a source of format string vulnerabilities when the user can control the format string.
    _wsprintfA() and _wsprintfW() copy a maximum of 1024 chars
  • strcpy()
    Destination buffer can be overflowed. There are Windows variants.
  • strcat()
    The destination buffer (dst) must be large enough to hold the string already there, the concatenated string (src), plus the NUL terminator
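
A minimal sketch of how two of the unbounded calls above can be constrained. The helper names read_word and concat_alloc are hypothetical (they are not from the book): the first gives the %s conversion an explicit field width, the second sizes the destination for the existing string, the appended string and the NUL terminator before calling strcpy() and strcat().

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define WORD_MAX 16

/* Hypothetical helper: read one whitespace-delimited word from `input`.
 * The field width (15) leaves room for the NUL in a 16-byte buffer;
 * a bare "%s" would keep writing past the end of `out`. */
int read_word(const char *input, char out[WORD_MAX])
{
    return sscanf(input, "%15s", out) == 1 ? 0 : -1;
}

/* Hypothetical helper: concatenate into a freshly allocated buffer sized
 * for the first string + the second string + the NUL terminator. */
char *concat_alloc(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    char *dst;

    if (lb > SIZE_MAX - la - 1)   /* guard the size arithmetic itself */
        return NULL;
    dst = malloc(la + lb + 1);
    if (dst == NULL)
        return NULL;
    strcpy(dst, a);               /* fits: dst holds la + lb + 1 bytes */
    strcat(dst, b);
    return dst;                   /* caller must free() the result */
}
```

The caller owns the buffer returned by concat_alloc() and has to free() it, which is the overhead of the dynamic approach noted earlier.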

Bounded string functions

Include a length parameter for the destination buffer. The length is occasionally miscalculated, and boundary conditions or data type conversion issues can still lead to overflows.
  • snprintf()
    Accepts a max number of bytes that can be written to the output buffer.
    On Windows OSs (_snprintf()), if there’s not enough room to fit all the data into the resulting buffer, a value of -1 is returned and NUL termination is not guaranteed.
    UNIX implementations guarantee NUL termination no matter what and return the number of characters that would have been written had there been enough room. That is, if the resulting buffer isn’t big enough to hold all the data, it’s NUL-terminated, and a positive integer is returned that’s larger than the supplied buffer size.
  • strncpy()
    Accepts a max number of bytes to be copied into the destination.
    Does not guarantee NUL-termination of the destination string. If the source string is at least as long as the size parameter, strncpy() copies exactly that many bytes and then stops without NUL-terminating the buffer.
    wcsncpy() is the bounded alternative to wcscpy(). Wide characters confuse developers: they often supply the destination buffer’s size in bytes rather than in wide characters.
  • strncat()
    Copies at most n bytes from src and then appends a NUL, so n must be the space left in the buffer minus 1 for the NUL byte. This one byte is often miscalculated, resulting in an off-by-one.
  • strlcpy()
    BSD alternative to strncpy(). Guarantees NUL termination of the destination buffer. Returns the length of the source string, not including the NUL byte, so the return value can be larger than the destination buffer size. Code that treats it as the number of bytes actually copied, e.g. to compute the space left for a later strncat(), can end up off by one or worse.
  • strlcat()
    Similar to strncat() but the size parameter n is the total size of the destination buffer, not the remaining space. Guarantees NUL termination. Returns the length of the string it tried to create (the initial length of dst plus the length of src), so a return value of n or more signals truncation. If dst is not NUL-terminated within its first n bytes, the buffer is left untouched and n plus the length of src is returned. One of the safest alternatives.
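
The boundary cases above can be sketched as follows. This assumes C99 snprintf() semantics (the UNIX behaviour described above); the helper names copy_checked, copy_strncpy and append_strncat are made up for illustration. strlcpy() and strlcat() are left out because they are not available in every C library.

```c
#include <stdio.h>
#include <string.h>

/* snprintf(): the return value is the length that would have been
 * written, so a value >= dst_size means the output was truncated. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    int n = snprintf(dst, dst_size, "%s", src);

    if (n < 0 || (size_t)n >= dst_size)
        return -1;                    /* error or truncation */
    return 0;
}

/* strncpy(): does not terminate on truncation, so terminate explicitly. */
void copy_strncpy(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return;
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}

/* strncat(): n is the space remaining minus one byte for the NUL.
 * Assumes dst already holds a NUL-terminated string. */
void append_strncat(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);

    if (used + 1 >= dst_size)
        return;                       /* no room left */
    strncat(dst, src, dst_size - used - 1);
}
```

Only copy_checked() tells the caller that truncation happened; the other two silently shorten the data, which is acceptable only when the shortened string cannot change a later security decision.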

Common issues

  • Unbounded copies. Not checking the bounds of destination buffers;
  • Character expansion, where software encodes special characters, resulting in a longer string than the original. Common when processing metacharacters or formatting raw data for human readability;
  • Incorrectly incrementing pointers. Pointers can be incremented outside the bounds of the string being operated on. Two main cases: when a string isn’t NUL-terminated correctly, or when a NUL terminator can be skipped because of a processing error;
  • Typos. One occasional mistake is a simple pointer use error, which happens when a developer accidentally dereferences a pointer incorrectly or doesn’t dereference a pointer when necessary.
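
As an illustration of the character expansion case, here is a sketch of an escaping routine that checks the expanded length, not the input length, before each write. The function name html_escape and the three-character escape table are assumptions chosen for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: HTML-escape `src` into `dst`.
 * Returns 0 on success, -1 if the expanded output would not fit. */
int html_escape(char *dst, size_t dst_size, const char *src)
{
    size_t out = 0;

    for (; *src != '\0'; src++) {
        const char *rep;
        size_t len;

        switch (*src) {
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        case '&': rep = "&amp;"; break;
        default:  rep = NULL;    break;
        }
        len = rep ? strlen(rep) : 1;

        /* One input byte may become up to five output bytes; always
         * keep one byte in reserve for the NUL terminator. */
        if (dst_size < 1 || len > dst_size - 1 - out)
            return -1;

        if (rep)
            memcpy(dst + out, rep, len);
        else
            dst[out] = *src;
        out += len;
    }
    if (out >= dst_size)
        return -1;
    dst[out] = '\0';
    return 0;
}
```

The loop stops on the source NUL and never advances src by more than one byte per iteration, which also avoids the skipped-terminator pointer error listed above.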


In-band representation vs out-of-band representation of control data/metadata.
  • Embedded delimiters. A pattern in which the application takes user input that isn’t filtered sufficiently and uses it as input to a function that interprets the formatted string. This interpretation might not happen immediately; it might be written to a secondary storage facility and then interpreted later. An attack of this kind is sometimes referred to as a “second-order injection attack.”
  • NUL character injection. Special case of embedded delimiter, important in scenarios where Web apps, Java, etc. pass strings to C-based APIs.
    An example is fgets(), which stops reading when it runs out of space in the destination buffer or encounters \n or EOF, but not when it encounters a NUL. Embedded NULs have to be dealt with separately.
  • Truncation. In statically sized buffers, input that exceeds the length of the buffer must be truncated to fit the buffer size and avoid buffer overflows. This avoids memory corruption, but could lead to interesting side effects from data loss in the shortened input string.
    Can happen when using snprintf instead of sprintf. For functions in this family:
    Consider how every function behaves when it receives data that isn’t going to fit in a destination buffer. Does it just overflow the destination buffer? If it truncates the data, does it correctly NUL-terminate the destination buffer? Does it have a way for the caller to know whether it truncated data? If so, does the caller check for this truncation?
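
A small sketch of both checks, under the assumption that the input length is known from a length-returning read such as fread() or getline() (fgets() does not report how many bytes it stored). The helper names has_embedded_nul and build_path, and the ".txt" suffix, are illustrative only.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the first `len` bytes of `buf` contain a NUL byte.
 * C string functions would silently stop at that byte. */
int has_embedded_nul(const char *buf, size_t len)
{
    return memchr(buf, '\0', len) != NULL;
}

/* Hypothetical helper: build "<dir>/<name>.txt" and fail instead of
 * truncating, so a long name cannot cut off the trailing ".txt". */
int build_path(char *out, size_t out_size, const char *dir, const char *name)
{
    int n = snprintf(out, out_size, "%s/%s.txt", dir, name);

    if (n < 0 || (size_t)n >= out_size)
        return -1;                    /* error or truncated: reject */
    return 0;
}
```

Rejecting on truncation answers the last question in the checklist above: the callee reports the condition and the caller has a single return value to check.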

Common metacharacter formats

  • Path metacharacters
    • File canonicalisation - especially directory traversal
    • The Windows registry paths
  • C format strings - printf(), err(), syslog() families of functions
  • Shell metacharacters - e.g. using popen() or Perl open() call
  • SQL queries
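
Two of the formats above, sketched in C. The helpers format_user_text and has_dotdot_component are hypothetical names. The path check assumes UNIX-style '/' separators and input that has already been fully decoded, so it is a starting point for spotting directory traversal, not a complete canonicalisation routine.

```c
#include <stdio.h>
#include <string.h>

/* Untrusted text is passed as an argument, never as the format string.
 * The vulnerable form would be: snprintf(out, out_size, user_text); */
int format_user_text(char *out, size_t out_size, const char *user_text)
{
    int n = snprintf(out, out_size, "%s", user_text);

    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Returns 1 if any '/'-separated component of `path` is exactly "..". */
int has_dotdot_component(const char *path)
{
    const char *p = path;

    while (*p != '\0') {
        size_t len = strcspn(p, "/");   /* length of this component */

        if (len == 2 && p[0] == '.' && p[1] == '.')
            return 1;
        p += len;
        if (*p == '/')
            p++;                        /* step over the separator */
    }
    return 0;
}
```

With the fixed "%s" format, conversion specifiers in the user text are copied literally and never interpreted.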

Metacharacter filtering

Three options:
  • Detect erroneous input and reject what appears to be an attack.
    • whitelists
    • blacklists
  • Detect and strip dangerous characters.
    • insufficient filtering
    • character stripping vulnerabilities - mistakes in sanitisation routines.
  • Detect and encode dangerous characters with a metacharacter escape sequence.
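    • If the escape character itself is not handled carefully, it can be used to undermine the whole escaping routine.
When escaping or decoding happens after a security decision has already been made on the input, the decision was made on different data than what is finally used, and the filter can be bypassed.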
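
To make the first two options concrete, here is a sketch of a whitelist check next to a deliberately flawed stripping routine. Both function names and the allowed character set are assumptions chosen for the example.

```c
#include <string.h>

/* Whitelist: accept only non-empty input made entirely of allowed bytes. */
int is_whitelisted(const char *s)
{
    static const char allowed[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789_-";

    return s[0] != '\0' && s[strspn(s, allowed)] == '\0';
}

/* Deliberately flawed: removes "../" in one left-to-right pass and never
 * re-scans, so input crafted to re-form "../" after removal gets through. */
void strip_dotdot_once(char *s)
{
    char *in = s, *out = s;

    while (*in != '\0') {
        if (strncmp(in, "../", 3) == 0)
            in += 3;                  /* drop the sequence */
        else
            *out++ = *in++;
    }
    *out = '\0';
}
```

Passing "....//etc" through strip_dotdot_once() leaves "../etc": the routine removes the inner "../" and the surrounding characters join up into a new one. This is the character stripping vulnerability listed above, and one reason rejecting input tends to be more robust than repairing it.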

Character sets and Unicode

  • Unicode
    • UTF-8
    • UTF-16
    • UTF-32
    • Vulnerabilities in decoding
    • Homographic attacks
  • Windows Unicode functions
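
One well-known decoding vulnerability in UTF-8 is the overlong encoding, where a character such as '/' is written as a multi-byte sequence that a byte-level filter does not recognise. A minimal sketch of a decoder that rejects the two-byte case follows; the function name decode_utf8_2byte is hypothetical, and a real decoder would also have to cover three- and four-byte sequences.

```c
#include <stddef.h>

/* Decode a two-byte UTF-8 sequence at `s` (`len` bytes available).
 * Returns the code point, or -1 if the sequence is malformed or overlong. */
long decode_utf8_2byte(const unsigned char *s, size_t len)
{
    long cp;

    if (len < 2)
        return -1;
    if ((s[0] & 0xE0) != 0xC0 || (s[1] & 0xC0) != 0x80)
        return -1;                    /* not a two-byte sequence */
    cp = ((long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
    if (cp < 0x80)
        return -1;                    /* overlong: fits in a single byte */
    return cp;
}
```

The bytes 0xC0 0xAF would decode to 0x2F ('/') in a lenient decoder; here they are rejected, so a filter that runs before decoding cannot be bypassed this way.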