Thursday, 25 May 2017

Back from the dead, and possibly a zombie - argh!

I wouldn't call this "back" exactly, though. I'm re-emerging with a completely non-infosec-related topic after a year of learning Chinese (Mandarin, or 普通话, strictly speaking). Also, no links here - use Google if you're the curious type.

After a year of lazy learning (I was somewhat busy for half of 2015 nearly dying and then recovering) and another year of reasonably serious study, I'm at about HSK4 level. That isn't really as much as advertised, but that's a separate topic.

This motivates me to wax philosophical a little. Having applied the "trial and discover" (haha) method a lot, I've formed the opinion that there are few, if any, language-specific tricks to learning Chinese. All such advice can be applied to (or rather derived from) learning other languages.

That is, all language learning is essentially the same - a lot of practice and repetition, preferably a long-term immersion. I've learned a few (human!) languages to various degrees of usefulness, and it always worked that way.

Chinese is especially difficult because of

  • a) its peculiar writing system
  • b) an almost complete lack of cognates with Indo-European languages, and
  • c) a grammar that is somewhat alien to a Western speaker.
On the last point, it depends on your mother tongue, with English speakers being hit the hardest among the major IE language groups. At least, complaints in English are the most vocal! Thank god I'm a native Russian speaker: intriguingly, some concepts are sort of shared with Chinese. For example, verbal aspect is not an entirely alien concept, "verbal complements" in Mandarin correspond to verbal prefixes in Russian, and so on.

The Chinese-specific trick or two I learned are all related to writing: learn to write some characters, maybe a few hundred (but prioritise typing with a pinyin IME), and learn the 150-ish most common radicals, which makes learning characters easier. Skritter is nice, partly because it records the state of every character you've ever learned, regardless of which study lists you add or remove.

Reading and listening comprehension - well, read and listen a lot; maybe prioritise learning words over single characters they consist of (not sure about the latter though). Zhongwen is a good Chrome plugin and ChineseGrammarWiki is a good online grammar.

Speaking - the same. Find a tutor; there are tons online, e.g. on iTalki. A good piece of software for visualising tones is SpeakGoodChinese (or Praat, which it is based on).

That plus all the generic language-learning advice like "you learn what you practice", "find the group/book (Integrated Chinese for me)/method that works for you", "use SRS (Anki) if it's your thing". And motivation, which is key to any learning.

好好学习天天向上 (study hard and make progress every day), in short!

Wednesday, 10 August 2016

TAOSSA code auditing tips

A summary (copy-paste) of all "auditing tips" from the (still!) awesome TAOSSA book

Ch 6

Auditing Tip: Type Conversions
Even those who have studied conversions extensively might still be surprised at the way a compiler renders certain expressions into assembly. When you see code that strikes you as suspicious or potentially ambiguous, never hesitate to write a simple test program or study the generated assembly to verify your intuition.
If you do generate assembly to verify or explore the conversions discussed in this chapter, be aware that C compilers can optimize out certain conversions or use architectural tricks that might make the assembly appear incorrect or inconsistent. At a conceptual level, compilers are behaving as the C standard describes, and they ultimately generate code that follows the rules. However, the assembly might look inconsistent because of optimizations or even incorrect, as it might manipulate portions of registers that should be unused.
Auditing Tip: Signed/Unsigned Conversions
You want to look for situations in which a function takes a size_t or unsigned int length parameter, and the programmer passes in a signed integer that can be influenced by users. Good functions to look for include read(), recvfrom(), memcpy(), memset(), bcopy(), snprintf(), strncat(), strncpy(), and malloc(). If users can coerce the program into passing in a negative value, the function interprets it as a large value, which could lead to an exploitable condition.
Also, look for places where length parameters are read from the network directly or are specified by users via some input mechanism. If the length is interpreted as a signed variable in parts of the code, you should evaluate the impact of a user supplying a negative value.
As you review functions in an application, it’s a good idea to note the data types of each function’s arguments in your function audit log. This way, every time you audit a subsequent call to that function, you can simply compare the types and examine the type conversion tables in this chapter’s “Type Conversions” section to predict exactly what’s going to happen and the implications of that conversion. You learn more about analyzing functions and keeping logs of function prototypes and behavior in Chapter 7, “Program Building Blocks.”
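In the spirit of the "write a simple test program" advice above, here's a minimal C sketch of the classic signed length bug. `length_reaching_memcpy()` and `BUF_SIZE` are hypothetical: they model a signed upper-bound check followed by a call that takes a size_t, such as memcpy(dst, src, user_len), and return the length that call would receive.

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_SIZE 64   /* hypothetical destination buffer size */

/* Buggy: the bounds check is signed, so -1 passes it, and the implicit
   int -> size_t conversion at the call turns -1 into SIZE_MAX. */
size_t length_reaching_memcpy(int user_len) {
    if (user_len > BUF_SIZE)
        return 0;                    /* rejected */
    return (size_t)user_len;         /* what memcpy(dst, src, user_len) sees */
}

/* Fixed: reject negative values before the conversion happens. */
size_t length_reaching_memcpy_fixed(int user_len) {
    if (user_len < 0 || user_len > BUF_SIZE)
        return 0;
    return (size_t)user_len;
}
```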
Auditing Tip: Sign Extension
When looking for vulnerabilities related to sign extensions, you should focus on code that handles signed character values or pointers or signed short integer values or pointers. Typically, you can find them in string-handling code and network code that decodes packets with length elements. In general, you want to look for code that takes a character or short integer and uses it in a context that causes it to be converted to an integer. Remember that if you see a signed character or signed short converted to an unsigned integer, sign extension still occurs.
As mentioned previously, one effective way to find sign-extension vulnerabilities is to search the assembly code of the application binary for the movsx instruction. This technique can often help you cut through multiple layers of typedefs, macros, and type conversions when searching for potentially vulnerable locations in code.
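A small sketch of the sign-extension case from the tip, assuming two's complement (true of every mainstream compiler). The helper names are made up for illustration: the first widens a length byte through signed char, the way buggy code often does, and the second widens it correctly.

```c
#include <limits.h>

/* Widening through signed char: 0x80..0xFF become negative first, and the
   conversion to unsigned int still sign-extends (0xFF -> UINT_MAX). */
unsigned int widen_via_signed_char(unsigned char raw) {
    signed char c = (signed char)raw;   /* -1 for 0xFF on two's complement */
    return (unsigned int)c;
}

/* Widening through unsigned char: plain zero extension (0xFF -> 255). */
unsigned int widen_via_unsigned_char(unsigned char raw) {
    return raw;
}
```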
Auditing Tip: Truncation
Truncation-related vulnerabilities are typically found where integer values are assigned to smaller data types, such as short integers or characters. To find truncation issues, look for locations where these shorter data types are used to track length values or to hold the result of a calculation. A good place to look for potential variables is in structure definitions, especially in network-oriented code.
Programmers often use a short or character data type just because the expected range of values for a variable maps to that data type nicely. Using these data types can often lead to unanticipated truncations, however.
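A minimal sketch of truncation through a narrow structure member. `struct record` and `store_length()` are hypothetical, and unsigned short is assumed to be 16 bits.

```c
/* Hypothetical structure that keeps a length in a short because
   "lengths never exceed 64K". */
struct record {
    unsigned short len;
};

/* Returns the length that survives the assignment: the value is
   reduced modulo 65536, so 65546 silently becomes 10. */
unsigned short store_length(unsigned int computed_len) {
    struct record r;
    r.len = computed_len;
    return r.len;
}
```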
Auditing Tip
Reviewing comparisons is essential to auditing C code. Pay particular attention to comparisons that protect allocation, array indexing, and copy operations. The best way to examine these comparisons is to go line by line and carefully study each relevant expression.
In general, you should keep track of each variable and its underlying data type. If you can trace the input to a function back to a source you’re familiar with, you should have a good idea of the possible values each input variable can have. Proceed through each potentially interesting calculation or comparison, and keep track of potential values of the variables at different points in the function evaluation. You can use a process similar to the one outlined in the previous section on locating integer boundary condition issues.
When you evaluate a comparison, be sure to watch for unsigned integer values that cause their peer operands to be promoted to unsigned integers. sizeof and strlen() are classic examples of operands that cause this promotion.
Remember to keep an eye out for unsigned variables used in comparisons, like the following:
if (uvar < 0) ...  
if (uvar <= 0) ...
The first form typically causes the compiler to emit a warning, but the second form doesn’t. If you see this pattern, it’s a good indication something is probably wrong with that section of the code. You should do a careful line-by-line analysis of the surrounding functionality.
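A sketch combining the two comparison pitfalls above: a sizeof operand promotes the signed length to size_t, and the resulting "<= 0" test on an unsigned value only catches zero. `struct hdr` and both functions are hypothetical.

```c
#include <stddef.h>

struct hdr { unsigned int type; unsigned int len; };   /* hypothetical header */

/* Buggy: length - sizeof(struct hdr) is computed as size_t, so "<= 0"
   only catches length == sizeof(struct hdr). Shorter (or negative)
   lengths wrap around to huge values and are accepted. */
int body_check_buggy(int length) {
    if (length - sizeof(struct hdr) <= 0)
        return 0;                              /* rejected */
    return 1;                                  /* accepted */
}

/* Fixed: compare in the signed domain before any subtraction. */
int body_check_fixed(int length) {
    if (length <= (int)sizeof(struct hdr))
        return 0;
    return 1;
}
```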
Auditing Tip: sizeof
Be on the lookout for uses of sizeof in which developers take the size of a pointer to a buffer when they intend to take the size of the buffer. This often happens because of editing mistakes, when a buffer is moved from being within a function to being passed into a function.
Again, look for sizeof in expressions that cause operands to be converted to unsigned values.
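A hypothetical before/after sketch of the sizeof-on-a-pointer mistake; the function names are made up. Both return the limit they actually used.

```c
#include <stddef.h>
#include <string.h>

/* Before a refactoring, buf was a local char[64] and sizeof(buf) was 64.
   Once it became a parameter, sizeof(buf) is the size of a pointer. */
size_t copy_name_buggy(char *buf, const char *name) {
    size_t limit = sizeof(buf);          /* sizeof(char *), not 64 */
    strncpy(buf, name, limit - 1);
    buf[limit - 1] = '\0';
    return limit;
}

/* Fixed: the caller passes the real buffer size. */
size_t copy_name_fixed(char *buf, size_t buf_size, const char *name) {
    strncpy(buf, name, buf_size - 1);
    buf[buf_size - 1] = '\0';
    return buf_size;
}
```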
Auditing Tip: Unexpected Results
Whenever you encounter a right shift, be sure to check whether the left operand is signed. If so, there might be a slight potential for a vulnerability. Similarly, look for modulus and division operations that operate with signed operands. If users can specify negative values, they might be able to elicit unexpected results.
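A minimal sketch of the signed modulus problem, with hypothetical names (right shifts are not shown). Since C99, integer division truncates toward zero, so the remainder takes the sign of the dividend and a user-supplied negative value produces a negative index.

```c
/* Buggy: -7 % 4 == -3, which would index before the start of a table. */
int bucket_buggy(int user_value, int nbuckets) {
    return user_value % nbuckets;
}

/* Fixed: do the arithmetic on unsigned values so the result is always
   in [0, nbuckets) for positive nbuckets. */
int bucket_fixed(int user_value, int nbuckets) {
    return (int)((unsigned int)user_value % (unsigned int)nbuckets);
}
```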
Auditing Tip
Pointer arithmetic bugs can be hard to spot. Whenever an arithmetic operation is performed that involves pointers, look up the type of those pointers and then check whether the operation agrees with the implicit arithmetic taking place. In Listing 6-29, has sizeof() been used incorrectly with a pointer to a type that’s not a byte? Has a similar operation happened in which the developer assumed the pointer type won’t affect how the operation is performed?
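A hypothetical illustration of pointer arithmetic scaling. Each function reports how many bytes it actually skips when trying to step over a 4-byte header in a buffer of ints.

```c
#include <stddef.h>

/* Buggy: arithmetic on an int * is scaled by sizeof(int), so
   p + sizeof(int) moves sizeof(int) * sizeof(int) bytes. */
size_t header_skip_buggy(void) {
    int words[8] = {0};
    const int *p = words;
    const unsigned char *body = (const unsigned char *)(p + sizeof(int));
    return (size_t)(body - (const unsigned char *)words);
}

/* Fixed: byte offsets are applied to a byte pointer. */
size_t header_skip_fixed(void) {
    int words[8] = {0};
    const unsigned char *body = (const unsigned char *)words + sizeof(int);
    return (size_t)(body - (const unsigned char *)words);
}
```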

Ch 7

Auditing Tip
When data copies in loops are performed with no size validation, check every code path leading to the dangerous loop and determine whether it can be reached in such a way that the source buffer can be larger than the destination buffer.
Auditing Tip
Mark all the conditions for exiting a loop as well as all variables manipulated by the loop. Determine whether any conditions exist in which variables are left in an inconsistent state. Pay attention to places where the loop is terminated because of an unexpected error, as these situations are more likely to leave variables in an inconsistent state.
Auditing Tip
Determine what each variable in the definition means and how each variable relates to the others. After you understand the relationships, check the member functions or interface functions to determine whether inconsistencies could occur in identified variable relationships. To do this, identify code paths in which one variable is updated and the other one isn’t.
Auditing Tip
When variables are read, determine whether a code path exists in which the variable is not initialized with a value. Pay close attention to cleanup epilogues that are jumped to from multiple locations in a function, as they are the most likely places where vulnerabilities of this nature might occur. Also, watch out for functions that assume variables are initialized elsewhere in the program. When you find this type of code, attempt to determine whether there’s a way to call functions making these assumptions at points when those assumptions are incorrect.
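A sketch of the cleanup-epilogue pattern, using a hypothetical two-field parser. Both pointers are set to NULL up front, so every jump to `cleanup` frees only what was actually allocated; without those initializers, the early `goto cleanup` paths would free uninitialized pointers.

```c
#include <stdlib.h>
#include <string.h>

int parse_two_fields(const char *a, const char *b) {
    char *first = NULL;            /* remove these initializers to get the bug */
    char *second = NULL;
    int ret = -1;

    if (a == NULL)
        goto cleanup;              /* neither pointer assigned yet */
    first = malloc(strlen(a) + 1);
    if (first == NULL)
        goto cleanup;
    strcpy(first, a);

    if (b == NULL)
        goto cleanup;              /* second still unassigned */
    second = malloc(strlen(b) + 1);
    if (second == NULL)
        goto cleanup;
    strcpy(second, b);

    ret = 0;
cleanup:
    free(first);                   /* free(NULL) is a no-op */
    free(second);
    return ret;
}
```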

Ch 8

Auditing Tip
When attempting to locate format string vulnerabilities, search for all instances of printf(), err(), or syslog() functions that accept a nonstatic format string argument, and then trace the format argument backward to see whether any part can be controlled by attackers.
If functions in the application take variable arguments and pass them unchecked to printf(), syslog(), or err() functions, search every instance of their use for nonstatic format string arguments in the same way you would search for printf() and so forth.
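A minimal sketch of the fix for format string bugs, with a hypothetical `log_message_safe()` wrapper. The vulnerable call is only shown in a comment, since running it on input like "%x%n" is undefined behaviour.

```c
#include <stdio.h>

/* Vulnerable pattern (do not do this):
       snprintf(out, outlen, user_input);
   Fixed pattern: a static format string, with the input as an argument,
   so any % directives in user_input are copied literally. */
int log_message_safe(char *out, size_t outlen, const char *user_input) {
    return snprintf(out, outlen, "%s", user_input);
}
```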
Auditing Tip
You might find a vulnerability in which you can duplicate a file descriptor. If you have access to an environment similar to one in which the script is running, use lsof or a similar tool to determine what file descriptors are open when the process runs. This tool should help you see what you might have access to.
Auditing Tip
Code that uses snprintf() and equivalents often does so because the developer wants to combine user-controlled data with static string elements. This use may indicate that delimiters can be embedded or some level of truncation can be performed. To spot the possibility of truncation, concentrate on static data following attacker-controllable elements that can be of excessive length.
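A hypothetical sketch of truncation with snprintf(): a fixed buffer receives "<name>.txt". With a 16-byte buffer, the 15-character name `aaaaaaaaaaa.php` would come out as `aaaaaaaaaaa.php`, silently dropping the static `.txt` suffix. Checking snprintf()'s return value catches this.

```c
#include <stdio.h>

/* Hypothetical: build "<name>.txt" in out. Returns 0 on success,
   -1 if the result would be truncated (and so must not be used). */
int build_filename(char *out, size_t outlen, const char *name) {
    int n = snprintf(out, outlen, "%s.txt", name);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return 0;
}
```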
Auditing Tip
When auditing multicharacter filters, attempt to determine whether building illegal sequences by constructing embedded illegal patterns is possible, as in Listing 8-26.
Also, note that these attacks are possible when developers use a single substitution pattern with regular expressions, such as this example:
$path =~ s/\.\.\///g;
This approach is prevalent in several programming languages (notably Perl and PHP).
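A C rendering of that single-pass substitution (a sketch; `strip_dotdot_once()` is a made-up name), showing why `....//` survives as `../`: matches are removed left to right and the output is never rescanned.

```c
#include <string.h>

/* Removes every non-overlapping "../" in one left-to-right pass, like
   s/\.\.\///g. `out` must have room for strlen(in) + 1 bytes. */
void strip_dotdot_once(const char *in, char *out) {
    while (*in != '\0') {
        if (strncmp(in, "../", 3) == 0) {
            in += 3;               /* drop the match, don't copy it */
            continue;
        }
        *out++ = *in++;
    }
    *out = '\0';
}
```

A more robust approach is to reject any path containing `..` components outright rather than trying to strip them.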

Ch 9

Auditing Tip
The access() function usually indicates a race condition because the file it checks can often be altered before it’s actually used. The stat() function has a similar problem.
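A sketch of the usual way around the access()/stat() race, assuming a POSIX system: open first, then inspect the descriptor with fstat(), which describes the object actually opened. `open_regular_file()` is a hypothetical helper. This only closes the race; in a setuid program, where access() is used to check the real user's permissions, the usual approach is to temporarily drop privileges around the open() instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns an open descriptor for a regular file, or -1. A symlink or a
   file swap between "check" and "use" no longer matters, because the
   check is done on the descriptor rather than on the path. */
int open_regular_file(const char *path) {
    int fd = open(path, O_RDONLY | O_NOFOLLOW);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode)) {
        close(fd);
        return -1;
    }
    return fd;
}
```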
Auditing Tip
It’s a common misunderstanding to think that the less specific permission bits are consulted if the more specific permissions prevent an action.
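A simplified model of the classic Unix read-permission check. It's a sketch: it ignores root, supplementary groups and ACLs, and `may_read()` is a made-up name. Exactly one class of bits is consulted, so with mode 0044 the owner is denied read access even though group and other are allowed.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

bool may_read(mode_t mode, uid_t file_uid, gid_t file_gid,
              uid_t uid, gid_t gid) {
    if (uid == file_uid)
        return (mode & S_IRUSR) != 0;   /* owner bits only, no fallback */
    if (gid == file_gid)
        return (mode & S_IRGRP) != 0;   /* group bits only, no fallback */
    return (mode & S_IROTH) != 0;
}
```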

Ch 10

Auditing Tip
When auditing code that’s running with special privileges or running remotely in a way that allows users to affect the environment, verify that any call to execvp() or execlp() is secure. Any situation in which full pathnames aren’t specified, or the path for the program being run is in any way controlled by users, is potentially dangerous.
Auditing Tip
Carefully check for any privileged application that writes to a file without verifying whether writes are successful. Remember that checking for an error when calling write() might not be sufficient; the code also needs to check whether all the bytes it wrote were actually stored. Manipulating this application’s rlimits might trigger a security vulnerability by cutting the file short at a strategically advantageous offset.
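A minimal sketch of the loop this tip implies, assuming POSIX write(); `write_all()` is a hypothetical helper name. A single write() may store fewer bytes than requested, so the loop keeps going until everything is written or a real error occurs.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Returns 0 if all len bytes were written, -1 otherwise. */
int write_all(int fd, const void *buf, size_t len) {
    const unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before writing: retry */
            return -1;
        }
        if (n == 0)
            return -1;             /* no progress; treat as failure */
        p += n;                    /* partial write: advance and retry */
        len -= (size_t)n;
    }
    return 0;
}
```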
Auditing Tip
Never assume that a condition is unreachable because it seems unlikely to occur. Using rlimits is one way to trigger unlikely conditions by restricting the resources a privileged process is allowed to use and potentially forcing a process to die when a system resource is allocated where it usually wouldn’t be. Depending on the circumstances of the error condition you want to trigger, you might be able to use other methods by manipulating the program’s environment to force an error.

Ch 14

Auditing Tip
Examine the TCP sequence number algorithm to see how unpredictable it is. Make sure some sort of cryptographic random number generator is used. Try to determine whether any part of the key space can be guessed deductively, which limits the range of possible correct sequence numbers. Random numbers based on system state (such as system time) might not be secure, as this information could be procured from a remote source in a number of ways.

Ch 17

Auditing Tip
Examine all exposed static HTML and the contents of dynamically generated HTML to make sure nothing that could facilitate an attack is exposed unnecessarily. You should do your best to ensure that information isn’t exposed unnecessarily, but at the same time, look out for security mechanisms that rely on obscurity because they are prone to fail in the Web environment.
Auditing Tip
Look at each page of a Web application as though it exists in a vacuum. Consider every possible combination of inputs, and look for ways to create a situation the developer didn’t intend. Determine whether any of these unanticipated situations cause a page to use the input without first validating it.
Auditing Tip
Always consider what can happen if attackers visit the pages of a Web application in an order the developer didn’t intend. Can you bypass certain security checks by skipping past intermediate verification pages to the functionality that actually performs the processing? Can you take advantage of any race conditions or cause unanticipated results by visiting pages that use session data out of order? Does any page trust the validity of information that users control?
Auditing Tip
First, focus on content that’s available without any kind of authentication because this code is most exposed to Internet-based attackers. Then study the authentication system in depth, looking for any kind of issue that lets you access content without valid credentials.
Auditing Tip
When reviewing authorization, you need to ensure that it’s enforced consistently throughout the application. Do this by enumerating all privilege levels, user roles, and privileges in use.
Auditing Tip
Although this sample application might seem very contrived, it is actually representative of flaws that are quite pervasive throughout modern Web applications. You want to look for two patterns when reviewing Web applications:
  1. The Web application takes a piece of input from the user, validates it, and then writes it to an HTML page so that the input is sent to the next page. Web developers often forget to validate the piece of information in the next page, as they don’t expect users to change it between requests. For example, say a Web page takes an account number from the user and validates it as belonging to that user. It then writes this account number as a parameter to a balance inquiry link the user can click. If the balance inquiry page doesn’t do the same validation of the account number, the user can just change it and retrieve account information for other users.
  2. The Web application puts a piece of information on an HTML page that isn’t visible to users. This information is provided to help the Web server perform the next stage of processing, but the developer doesn’t consider the consequences of users modifying the data. For example, say a Web page receives a user’s customer service complaint and creates a form that mails the information to the company’s help desk when the user clicks Submit. If the application places e-mail addresses in the form to tell the mailing script where to send the e-mail, users could change the e-mail addresses and appear to be sending e-mail from official company servers.
Auditing Tip
Weaknesses in the HTTP authentication protocol can prove useful for attackers. It’s a fairly light protocol, so it is possible to perform brute-force login attempts at a rapid pace. HTTP authentication mechanisms often don’t do account lockouts, especially when they are authenticating against flat files or local stores maintained by the Web server. In addition, certain accounts are exempt from lockout and can be brute-forced through exposed authentication interfaces. For example, NT’s administrator account is immune from lockout, so an exposed Integrated Windows Authentication service could be leveraged to launch a high-speed password guessing attack.
You can find several tools on the Internet to help you launch a brute-force attack against HTTP authentication. Check the tools sections at
Auditing Tip
When you review a Web site, you should pay attention to how it uses cookies. They can be easy to ignore because they are in the HTTP request and response headers, not in the HTML (usually), but they should be reviewed with the same intensity you devote to GET and POST parameters.
You can get access to cookies with certain browser extensions or by using an intercepting Web proxy tool, such as Paros or SPIKE Proxy. Make sure cookies are marked secure for sites that use SSL. This helps mitigate the risk of the cookie ever being transmitted in clear text because of deliberate attacks, such as cross-site scripting, or unintentional configuration and programming mistakes and browser bugs.
Auditing Tip
Tracking state based on client IP addresses is inappropriate in most situations, as the Internet is filled to capacity with corporate clients going through NAT devices and sharing the same source IP. Also, you might face clients with changing source IPs if they come from a large ISP that uses an array of proxies, such as AOL. Finally, there is always the possibility of spoofing attacks that allow IP address impersonation.
There are better ways of tracking state, as you see in the following sections. As a reviewer, you should look out for any kind of state-tracking mechanism that relies solely on client IPs.
Auditing Tip
If you see code performing actions or checks based on the request URI, make sure the developer is handling the path information correctly. Many servlet programmers use request.getRequestURI() when they intend to use request.getServletPath(), which can definitely have security consequences. Be sure to look for checks done on file extensions, as supplying unexpected path information can circumvent these checks as well.
Auditing Tip
Generally, you should encourage developers to use POST-style requests for their applications because of the security concerns outlined previously. One issue to watch for is the transmission of a session token via a query string, as that creates a risk for the Web application’s clients. The risk isn’t necessarily a showstopper, but it’s unnecessary and quite easy for a developer or Web designer to avoid.

Tuesday, 9 August 2016

TAOSSA Chapter 16

Ch. 16 Network application protocols

Auditing application protocols

  • Collect documentation
  • Identify elements of unknown protocols
  • Use packet sniffers
  • Initiate the connection several times
    • Did a single field change by a lot or a little?
    • Was the change of values in a field drastic? Could it be random, e.g. a connection ID?
    • Did the size of the packet change? Did a field change in relation to the size of the packet? Could it be a size field?
  • Many protocols are composed of messages that have a similar header format and a varying body
  • Replay messages

Reverse-engineer the application

  • Use symbols
  • Examine strings in the binary
    • Useful for debug strings
    • Getting from error messages to code that generates them
  • Examine special values - e.g. unique tag values
  • Debug - turn on debugging if possible
  • Find communication primitives
    • For example, for TCP: read(), recv(), recvmsg(), WSARecv()
  • Use library tracing
  • Match data types with the protocol
    • Analyze the structure of untrusted data processed by a server/client, then match elements of these structures with vulnerability classes

Binary protocols

  • Integer overflows and 32-bit length values
    • When 32-bit length variables are used to dynamically allocate space for user-supplied data. Usually results in heap corruption
    • size_t in particular
  • Integer underflows and 32-bit length values
    • When variables are not adequately checked against each other to enforce a relationship
    • When length values are required to hold a minimum length but the parsing code never verifies this requirement
  • Small data types
    • Sign extension issues are more relevant because programs often natively use 32-bit variables even when dealing with smaller data types
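A sketch of both length checks for a hypothetical message format: a 4-byte big-endian body length followed by the body. `copy_body()` and `HDR_LEN` are illustrative names; `avail` is the number of bytes actually received.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HDR_LEN 4u   /* hypothetical header: 4-byte big-endian body length */

/* Returns a NUL-terminated copy of the body, or NULL. */
char *copy_body(const unsigned char *pkt, size_t avail) {
    if (avail < HDR_LEN)                 /* avail - HDR_LEN below would underflow */
        return NULL;
    uint32_t body_len = (uint32_t)pkt[0] << 24 | (uint32_t)pkt[1] << 16 |
                        (uint32_t)pkt[2] << 8 | (uint32_t)pkt[3];
    if (body_len > avail - HDR_LEN)      /* claims more than was received */
        return NULL;
    /* Widen before adding: in 32-bit arithmetic 0xFFFFFFFF + 1 == 0, and
       malloc(0) may return a non-NULL pointer to a zero-byte buffer. */
    char *body = malloc((size_t)body_len + 1);
    if (body == NULL)
        return NULL;
    memcpy(body, pkt + HDR_LEN, body_len);
    body[body_len] = '\0';
    return body;
}
```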

Text-based protocols

  • Most vulns in binary protocol implementations result from type conversions and arithmetic boundary conditions.
  • Text-based protocols tend to contain vulnerabilities related to text processing: standard buffer overflows, pointer arithmetic errors, off-by-ones, etc.
  • One exception is text-based protocols that specify lengths in text, which are then converted to integers
  • Buffer overflows. Text-based protocols manipulate strings, so they're more vulnerable to simple buffer overflows than to type conversion errors.
    • Unsafe use of string functions
    • Pointer arithmetic bugs, especially off-by-ones, are more common because they're more subtle. Common when there are multiple elements in a single line of text.
  • Text-formatting issues. Format string issues, resource access; problems in text data decoding implementations - bad hex or UTF-8 decoding routines.
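For the text-encoded length exception above, a sketch of a stricter parser. `parse_text_length()` is hypothetical; the point is that strtoul() accepts a leading minus sign and negates the result, so "-1" parses as ULONG_MAX unless the input is checked first.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Parses a decimal length (e.g. a Content-Length value) into *out.
   Returns 0 on success, -1 on anything suspicious. */
int parse_text_length(const char *s, unsigned long max, unsigned long *out) {
    if (!isdigit((unsigned char)s[0]))   /* rejects "-", "+", spaces, "" */
        return -1;
    errno = 0;
    char *end;
    unsigned long v = strtoul(s, &end, 10);
    if (errno == ERANGE || *end != '\0' || v > max)
        return -1;                       /* overflow, trailing junk, too big */
    *out = v;
    return 0;
}
```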

Data verification

  • Cases where information disclosure on the network is harmful, or where forged or modified data can result in a security issue on the receiver.
  • Encryption may be necessary; data verification may be required.

Access to system resources

  • Many protocols allow users to request system resources implicitly or explicitly
  • Questions to consider:
    • Is credential verification for accessing the resource adequate?
    • Does the application give access only to the resources it’s supposed to? (i.e. is the implementation flawed so that it discloses more than intended?)


HTTP

See chapter 17 for details.
  • Header parsing. Vulnerabilities are more likely when parsing a “folded header” (a header continued onto following lines that begin with whitespace). Code sometimes assumes headers are limited in length, but an arbitrarily long header can be supplied by using folded headers.
  • Accessing resources. HTTP is designed to serve content to clients.
    • Many examples of disclosing arbitrary files from the filesystem.
    • Encoding-related traversal bugs.
    • If the server implements additional features or keywords, check the corresponding code; it's more likely to have bugs.
  • Utility functions.
    • Functions for URL handling - dealing with port, protocol, path components, etc. Check for buffer overflows.
    • Logging utility functions can be interesting
  • Posting data. Data supplied via POST method. Simple counted data post and chunked encoding.
    • Simple counted - depends on how the length value is interpreted. Large values can lead to integer overflows or sign issues. Processing of signed content length values is also error prone.
    • Chunked encoding - remote attackers specifying arbitrary sizes has been a problem. Careful sanitization of the specified sizes is required to avoid integer overflows or sign-comparison vulnerabilities. Integer overflows are easier to trigger than in simple counted encoding.
    • Note that realloc(ptr, 0) may behave like free() in some implementations, so an overflowed size needs to wrap around to 1 or more, not to 0.
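A sketch of overflow-safe accumulation of chunk sizes, with a hypothetical `add_chunk()` helper. It assumes `*total` starts at or below `limit` and that `limit` is below SIZE_MAX; the `- 1` in the overflow check leaves room for a terminating byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Adds an attacker-supplied chunk size to the running total.
   Returns 0 on success, -1 if the total would wrap or exceed limit
   (in which case *total is left unchanged). */
int add_chunk(size_t *total, size_t chunk, size_t limit) {
    if (chunk > SIZE_MAX - 1 - *total)   /* *total + chunk + 1 would wrap */
        return -1;
    if (*total + chunk > limit)          /* policy limit */
        return -1;
    *total += chunk;
    return 0;
}
```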


ISAKMP

  • The ISAKMP packet header contains a 32-bit length field. Mainly sign issues and integer overflows in this user-controlled value.
  • The length field in the header is the total packet length, including the header itself. Developers might assume the length field is greater than or equal to the ISAKMP header size (28 bytes). Possible underflow.
  • 16-bit length field in ISAKMP payloads. Same issues - overflows and underflows. The minimum size is 4 bytes (the generic payload header).
  • Number of bytes processed so far + current payload length <= ISAKMP packet length
  • Different payload types
<skipped a long description of various issues with different payloads>
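A sketch of the ISAKMP length checks above, assuming the RFC 2408 layout: a 28-byte header with the 32-bit length at offset 24, and a 4-byte generic payload header with the 16-bit length at offset 2. `count_payloads()` is hypothetical, and it treats the packet as a flat run of payloads rather than following the next-payload chain.

```c
#include <stddef.h>
#include <stdint.h>

#define ISAKMP_HDR_LEN     28u  /* fixed ISAKMP header */
#define ISAKMP_GENERIC_LEN 4u   /* next payload, reserved, 16-bit length */

static uint32_t be32(const unsigned char *p) {
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | p[3];
}

/* Returns the number of payloads, or -1 if any length is inconsistent.
   avail is the number of bytes read from the network. */
int count_payloads(const unsigned char *pkt, size_t avail) {
    if (avail < ISAKMP_HDR_LEN)
        return -1;
    uint32_t total = be32(pkt + 24);        /* includes the header itself */
    if (total < ISAKMP_HDR_LEN || total > avail)
        return -1;                          /* underflow / lies about size */
    size_t off = ISAKMP_HDR_LEN;
    int count = 0;
    while (off < total) {
        if (total - off < ISAKMP_GENERIC_LEN)
            return -1;                      /* no room for a payload header */
        uint16_t plen = (uint16_t)(pkt[off + 2] << 8 | pkt[off + 3]);
        if (plen < ISAKMP_GENERIC_LEN || plen > total - off)
            return -1;                      /* too small, or past the packet */
        off += plen;
        count++;
    }
    return count;
}
```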


ASN.1

Three standardized encoding methods: Basic Encoding Rules (BER), Packed Encoding Rules (PER), and XML Encoding Rules (XER)

Data types

  • Universal - in the standard
  • Application - tags that are unique to applications
  • Context-specific - tags used to identify a member in a constructed type (e.g. a set)
  • Private - unique to an organization
Primitive vs. constructed types. Constructed types are composed of one or more simple types and other constructed types. They can be sequences (SEQUENCE), lists (SEQUENCE-OF, SET, and SET-OF), or choices (CHOICE).

Basic encoding rules

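Each object is encoded as an identifier, a length, the content data, and an end-of-contents (EOC) sequence.
  • Identifier: class + primitive/constructed bit + tag number. Tag numbers of 31 or more use a multibyte (high-tag-number) form: the low 5 bits of the first byte are all set, and the tag number follows in base-128 with the top bit set on every byte but the last. To encode the tag number 0x3333 with a private-class constructed identifier, for example, the byte sequence 0xFF 0xE6 0x33 would be used.
  • Length: definite or indefinite. An indefinite length is terminated with an EOC sequence.
  • Contents
  • End of contents: only required when the object has an indefinite length. EOC is 0x00 0x00.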
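A sketch of BER high-tag-number encoding (per ITU-T X.690), with a hypothetical `ber_encode_tag_number()` helper and a 32-bit unsigned int assumed. Tag number 0x3333 encodes as 0xE6 0x33, so with a private-class, constructed identifier byte (0xFF) the full identifier is 0xFF 0xE6 0x33.

```c
#include <stddef.h>

/* Writes the base-128 bytes that follow the first identifier byte
   (whose low 5 bits are 0x1F). Most significant group first; the top
   bit is set on every byte except the last. Returns the byte count. */
size_t ber_encode_tag_number(unsigned int tag, unsigned char *out) {
    unsigned char tmp[5];          /* enough for 32 bits in 7-bit groups */
    size_t n = 0;
    do {
        tmp[n++] = tag & 0x7F;     /* least significant group first */
        tag >>= 7;
    } while (tag != 0);
    for (size_t i = 0; i < n; i++) {
        out[i] = tmp[n - 1 - i];   /* reverse into big-endian order */
        if (i != n - 1)
            out[i] |= 0x80;        /* "more bytes follow" */
    }
    return n;
}
```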
Canonical Encoding Rules (CER) and Distinguished Encoding Rules (DER) limit the ambiguity of BER.
  • CER: same as BER with restrictions. Used when large objects are transmitted, when not all of the object data is available, or when object sizes are not known at transmission time.
    • Constructed types must use an indefinite length encoding.
    • Primitive types must use the fewest encoding bytes necessary to express the object size.
  • DER: used for smaller objects, where all bytes of an object are available and object lengths are known at transmission time.
    • All objects must have a definite length encoding (no EOC)
    • The length encoding must use the fewest bytes necessary (same as CER)

Vulnerabilities in BER, CER, DER implementations

  • Tag encodings. Some combinations of fields are illegal in certain variants of BER.
    • For example, in CER an octet string of 1,000 bytes or fewer must be encoded using the primitive form rather than the constructed form. Is this really enforced? Watch for differences between IDS processing and end-host processing.
    • Can trick the implementation into reading more bytes than are available in the data stream.
  • Length encodings. A common source of problems.
    • Multibyte encodings - when the length field claims more length bytes than are left in the data stream.
    • Extended length-encoding values - you can specify 32-bit integers, so watch for integer overflows and sign issues.
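A sketch of defensive BER definite-length decoding that covers the length problems above. `ber_decode_length()` is hypothetical; as a simplifying policy it rejects the indefinite form and lengths wider than 4 bytes, and it does not enforce DER's minimal-encoding rule.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes the length at buf[0..avail). On success stores the content
   length in *len_out and the size of the length field in *hdr_out,
   and returns 0; returns -1 on any inconsistency. */
int ber_decode_length(const unsigned char *buf, size_t avail,
                      size_t *len_out, size_t *hdr_out) {
    if (avail == 0)
        return -1;
    size_t len, hdr;
    if (buf[0] < 0x80) {                 /* short form: one byte */
        len = buf[0];
        hdr = 1;
    } else {                             /* long form: 0x80 | nbytes */
        size_t nbytes = buf[0] & 0x7F;
        if (nbytes == 0 || nbytes > 4)   /* indefinite form, or too wide */
            return -1;
        if (nbytes > avail - 1)          /* length bytes run past the data */
            return -1;
        uint32_t v = 0;
        for (size_t i = 1; i <= nbytes; i++)
            v = (v << 8) | buf[i];
        len = v;
        hdr = 1 + nbytes;
    }
    if (len > avail - hdr)               /* contents run past the data */
        return -1;
    *len_out = len;
    *hdr_out = hdr;
    return 0;
}
```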

Packed encoding rules (PER)

More compact than BER. Can be used only to encode values of a single ASN.1 type. Consists of three fields: preamble, length, and contents.
  • Preamble - a bit map used when dealing with sequence, set, and set-of-data types.
  • Length - more complex than in BER. Aligned variants and unaligned variants. Constrained, semiconstrained and unconstrained.
    • The program decoding a PER bit stream must already know the structure of the incoming ASN.1 stream so that it knows how to decode the length: whether a length is constrained or unconstrained, and what the boundaries are for constrained lengths.

Vulnerabilities in PER

A variety of integer related issues. Problems are more restricted because the values are more constrained.
  • In particular, for unconstrained lengths the bottom 6 bits can only be 1 to 4, but the implementation might not enforce this rule.
  • Checking return values incorrectly.

XML encoding rules

Very different problems, because this is a text markup language. XER consists of a prologue and an XML document element that describes a single ASN.1 object. The prologue is optional.

XER vulnerabilities

Text-based errors: simple buffer overflows or pointer arithmetic bugs. Programs that exchange XML often expose a huge codebase to untrusted data. In particular, check how the UTF encoding schemes for Unicode code points are handled; see Chapter 8.


DNS

  • Domain names and resource records
  • Name servers and resolvers
    • Resolver code queries DNS on behalf of user applications
    • A fully functional resolver knows what to do when a non-recursive DNS server doesn’t have an answer
    • Stub resolver relies on a recursive name server to do all the work
  • Zones
  • Resource record types
  • DNS protocol structure
  • DNS name encoding and buggy parsers (3www6google3com)
Sample problems:
  • Failure to deal with invalid label lengths. The maximum size for a label is 63 bytes because setting the top 2 bits indicates that the byte is the first in a two-byte pointer, leaving 6 bits to represent a label length. That means any label in which one of the top bits is set but the other one isn’t has an invalid length value.
  • Insufficient destination length checks
  • Insufficient source length checks
  • Pointer values not verified in the packet
  • Special pointer values (when pointer compression is used)
  • Length variables. (There are no 32-bit integers to specify data lengths in the DNS protocol; everything is 8 or 16 bits)
    • Sign extension of 16-bit values
    • Integer overflows
  • DNS spoofing
    • Cache poisoning
    • Spoofing responses
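A sketch of uncompressed DNS name decoding (the "3www6google3com" format) that covers the label-length, source-length and destination-length problems above. `dns_decode_name()` is hypothetical; it rejects compression pointers rather than following them, which a real parser can't do, and following pointers safely needs its own bounds and loop checks.

```c
#include <stddef.h>

/* Decodes src[0..avail) into dotted form in out. Returns the number of
   source bytes consumed, or -1 on error. */
int dns_decode_name(const unsigned char *src, size_t avail,
                    char *out, size_t outlen) {
    size_t in = 0, o = 0;
    if (outlen == 0)
        return -1;
    for (;;) {
        if (in >= avail)
            return -1;                      /* ran off the packet */
        unsigned char len = src[in++];
        if (len == 0)
            break;                          /* root label: end of name */
        if (len & 0xC0)
            return -1;                      /* pointer or invalid 01/10 prefix */
        if (len > avail - in)
            return -1;                      /* source length check */
        /* destination check: optional '.', the label, and the final NUL */
        if ((o > 0) + (size_t)len + 1 > outlen - o)
            return -1;
        if (o > 0)
            out[o++] = '.';
        for (unsigned char i = 0; i < len; i++)
            out[o++] = (char)src[in++];
    }
    out[o] = '\0';
    return (int)in;
}
```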

Monday, 8 August 2016

TAOSSA Chapters 14-15

Ch. 14 Network Protocols

Internet Protocol

General intro about IP packet structure

Basic IP header validation

  • Is the received packet too small?
    • Must be at least 20 bytes.
  • Does the IP packet contain options?
    • Packets with options can be bigger than 20 bytes, up to 60.
  • Is the IP header length valid?
    • IP header length must be at least 5 (4*5=20).
  • Is the total length field too large?
    • Compared to the actual data received.
  • Are all field lengths consistent?
    • IP header length <= data available
    • 20 <= IP header length <= 60
    • IP total length <= data available
    • IP header length <= IP total length
  • Is the IP checksum correct?
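The header checks above can be sketched in C as follows. `ip_header_ok` and `inet_checksum` are hypothetical names; the checksum is the standard Internet one's-complement sum (RFC 1071), which evaluates to 0 over a header whose checksum field is correct.

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum over `len` bytes. Summing a header whose checksum
 * field is correct yields 0. */
static uint16_t inet_checksum(const unsigned char *p, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;
    while (sum >> 16)                          /* fold carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Hypothetical helper: 1 if the IPv4 header at `pkt` passes the checks
 * above given `avail` bytes actually received, 0 otherwise. */
int ip_header_ok(const unsigned char *pkt, size_t avail)
{
    if (avail < 20)                            /* too small */
        return 0;
    size_t hlen = (size_t)(pkt[0] & 0x0F) * 4; /* IHL counts 32-bit words */
    size_t total = ((size_t)pkt[2] << 8) | pkt[3];
    if (hlen < 20)                             /* IHL >= 5; 15 * 4 caps it at 60 */
        return 0;
    if (hlen > avail || total > avail)         /* both must fit in what arrived */
        return 0;
    if (hlen > total)                          /* header inside the datagram */
        return 0;
    return inet_checksum(pkt, hlen) == 0;      /* checksum covers the header only */
}
```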

IP options processing

  • Is the option length sign-extended?
    • It shouldn’t be; byte-to-int promotion issues are common
  • Is the header big enough to contain the IP option?
  • Is the option length too large?
    • Offset of IP option + IP option length <= IP header length
    • Offset of IP option + IP option length <= IP total length
  • Does the option meet minimum size requirements?
    • Should be at least 2
  • Are IP option bits checked?
    • Most implementations ignore the separate bitfields without parsing
  • Unique problems
    • Solaris example
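A sketch of an option walk that follows these rules, assuming the header length has already been validated as above. `ip_options_ok` is a hypothetical name; the length byte is read as `unsigned char`, so it cannot be sign-extended.

```c
#include <stddef.h>

/* Walk the IPv4 options area of a header whose length `hlen` (20..60) has
 * already been validated. Returns 0 if every option is well-formed. */
int ip_options_ok(const unsigned char *hdr, size_t hlen)
{
    size_t off = 20;                          /* options follow the fixed header */
    while (off < hlen) {
        unsigned char type = hdr[off];
        if (type == 0)                        /* End of Option List */
            break;
        if (type == 1) {                      /* No-Operation: a single byte */
            off++;
            continue;
        }
        if (off + 1 >= hlen)                  /* no room for the length byte */
            return -1;
        unsigned char optlen = hdr[off + 1];  /* unsigned: never negative */
        if (optlen < 2)                       /* minimum: type + length */
            return -1;
        if (off + optlen > hlen)              /* option runs past the header */
            return -1;
        off += optlen;
    }
    return 0;
}
```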

Source routing

  • Processing
    • Ensure that the pointer byte is within the specified bounds. During processing, an IP option often modifies the bytes it points at.
    • The pointer is a single-byte field - beware type conversions.
    • Sign extension could cause the offset to take on a negative value
    • Check that the length of routing options is validated


Pathological fragment sets
  • Data beyond the end of the final fragment
    • Attackers can put the final fragment (MF=0) in the middle or beginning of the set of fragments
  • Multiple final fragments
  • Overlapping fragments
  • Idiosyncrasies

User datagram protocol

Basic UDP header validation

  • Is the UDP length field correct?
    • The minimum value is 8 bytes (no data)
  • Is the UDP checksum correct?

Transmission control protocol

Basic TCP header validation

  • Is the TCP data offset field too large?
    • TCP header length <= data available
    • 20 bytes (5 * 4) <= TCP header length <= 60 bytes
  • Is the TCP header length too small?
  • Is the TCP checksum correct?

TCP options processing

  • Is the option length field sign extended?
    • It shouldn’t be; there is a possibility of dangerous bugs
    • For example, assigning a (signed) char value to an int variable
  • Are enough bytes left for the current option?
  • Is the option length too large or too small?
    • Compared to the size of the TCP header / packet

TCP connections

  • 6 flags: SYN, ACK, RST, URG, FIN, PSH
  • Setting up, closing and tearing down connections

TCP streams

Sequence numbers and initial sequence numbers (ISNs); TCP spoofing attacks and others

TCP state processing

Various vulns

Urgent pointer processing

  • Handling pointers into other packets
    • Neglecting to check that the pointer is within the bounds of the current packet
    • Recognising that the pointer is pointing beyond the end of the packet and trying to handle it (often incorrectly)
  • Handling 0-offset urgent pointers
    • 0 offset URG pointer is invalid

Simultaneous open

Both peers send a SYN packet at the same time with mirrored source and destination ports. Then they both send a SYN-ACK packet, and the connection is established.

Ch. 15 Firewalls


Attack surface - Proxy firewalls

Same issues as with network servers. Also make sure the firewall makes a clear distinction between internal and external users or tracks authorised users.

Packet-filtering firewalls

Stateless vs stateful filters

Stateless Firewalls

Stateless firewalls look for connection initiation packets (SYNs) and more or less let other packets through.
Can be abused for FIN scanning (not sure this works anymore). Stateless FW has to let FIN and RST packets through.
Different stacks behave differently for weird combinations of flags, e.g. SYN-FIN may initiate a connection.
Only port-based rules. Return packets are a big problem, e.g. DNS replies from servers. This effectively creates a hole for UDP scanning with source port 53.
Active / passive FTP; active is a problem for stateless firewalls, similar to UDP above but with TCP.
Either deny completely or apply a very simple set of rules. No tracking, because the firewall is stateless. Some rules:
  • Fragments with low IP offset (1, 2, etc.) - drop, as they can overwrite the TCP flags
  • Fragments with 0 offset should contain the full header, otherwise drop
  • Multiple offset 0 fragments - drop all after the full header
  • Fragments with high offset can pass
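The rules above can be expressed as a small stateless decision function for TCP fragments. This is a sketch under assumptions: `stateless_frag_verdict` is a hypothetical name, the 20-byte threshold is the minimum TCP header size, and the multiple-offset-0 rule is left out because spotting a second first fragment requires remembering the first one.

```c
#include <stdint.h>

enum frag_verdict { FRAG_PASS, FRAG_DROP };

/* Hypothetical stateless policy for fragments of TCP datagrams.
 * frag_off: 13-bit fragment offset (8-byte units); frag_len: bytes of IP
 * payload in this fragment. No per-datagram state is kept. */
enum frag_verdict stateless_frag_verdict(uint16_t frag_off, uint16_t frag_len)
{
    if (frag_off == 0)                        /* first fragment: must hold */
        return frag_len >= 20 ? FRAG_PASS : FRAG_DROP;  /* a full TCP header */
    if ((uint32_t)frag_off * 8 < 20)          /* offsets 1 and 2 start inside the */
        return FRAG_DROP;                     /* minimum header (flags at byte 13) */
    return FRAG_PASS;                         /* later fragments carry only data */
}
```

A passing first fragment still goes through the normal port and flag rules; this function only decides whether fragmentation itself is acceptable.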

Simple stateful firewalls

  • TCP
    • These days any issues are rare
  • UDP
    • A common mistake is to allow responses from any UDP port
  • Directionality
  • Fragmentation handling
    • Can be done better than with stateless FWs. Bugs existed
  • Fooling virtual reassembly
  • IP TTL field
  • IP options

Stateful inspection firewalls

  • Check Point’s original term - looking inside the packet
  • Layering issues
    • Firewalls are not doing full TCP/IP processing and so make mistakes because they peek at layers they do not understand
    • For FTP, simplistic port lookup in the packet can be fooled into creating connections in the state table by faking 227 responses in the packet

Spoofing attacks

Obviously cannot muck with the destination IP.
  • Spoofing from an internal trusted source
  • Spoofing for a response
    • Try to get hosts to respond to addresses you cannot reach otherwise
    • Especially with source address or
  • Spoofing for a state entry - to get special entries added to the firewall state table for later use
  • Spoofing from a network peer
  • Spoofing destinations to create state table entries
  • Source routing and encapsulation

Sunday, 7 August 2016

TAOSSA Chapter 13

Ch. 13 Synchronisation and State

Synchronisation problems

Reentrancy - a function’s capability to work correctly even when it’s interrupted by another running thread that calls the same function. It must not modify any global vars or shared resources w/o adequate locking.
Race conditions
In race conditions, an operation succeeds only if certain resources are acted on in an expected order.
Starvation and deadlocks
Starvation - a thread never receives ownership of a synchronisation object.
Deadlocks can occur when several threads are using multiple sync objects at once but in a different order. For a deadlock to be possible, 4 conditions are required: mutual exclusion, hold and wait, no preemption, circular wait

Process synchronisation

System V process synchronisation

Semaphore - a locking device that uses a counter to limit the number of instances that can be acquired. Decremented when acquired, incremented when released.
semget() - create a new semaphore set or obtain an existing set
semop() - performs operations on selected semaphores in a set
semctl() - perform a control operation on a selected semaphore
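A minimal sketch of the three System V calls used as a simple lock. `sem_create_lock` and `sem_adjust` are hypothetical helpers; the 0600 permissions keep other users from squatting on or tampering with the set, and `SEM_UNDO` asks the kernel to reverse the operation if the process exits while holding the lock.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

#ifdef __linux__
/* Linux leaves union semun to the caller, as semctl(2) documents;
 * BSD-derived systems already declare it in <sys/sem.h>. */
union semun {
    int val;                  /* value for SETVAL */
    struct semid_ds *buf;     /* buffer for IPC_STAT, IPC_SET */
    unsigned short *array;    /* array for GETALL, SETALL */
};
#endif

/* Create a private set holding one semaphore initialised to 1, usable as a
 * lock. Returns the semaphore set ID, or -1 on failure. */
int sem_create_lock(void)
{
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);  /* owner-only access */
    if (id == -1)
        return -1;
    union semun arg;
    arg.val = 1;
    if (semctl(id, 0, SETVAL, arg) == -1) {
        semctl(id, 0, IPC_RMID);             /* don't leak the set on failure */
        return -1;
    }
    return id;
}

/* delta = -1 acquires (blocking while the count is 0), +1 releases. */
int sem_adjust(int id, short delta)
{
    struct sembuf op;
    op.sem_num = 0;
    op.sem_op = delta;
    op.sem_flg = SEM_UNDO;                   /* undone automatically on exit */
    return semop(id, &op, 1);
}
```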

Windows process synchronisation


Vulnerabilities with interprocess synchronisation

  1. Synch objects required but not used, e.g. when 2 processes are attempting to access a shared resource
  2. Incorrect use (Windows)
  3. Squatting with named synchronisation objects (Windows)
Helpful tools/notes:
  1. Synchronisation objects scoreboard
  2. Lock matching


Signals are software interrupts that the kernel raises in a process at the request of other processes, or as a reaction to events that occur in the kernel.
Possible actions:
  • Ignore the signal (apart from SIGKILL and SIGSTOP)
  • Block the signal (same exception)
  • Install a signal handler
kill() system call is used to send a signal to a process
signal() for installing a handler
sigaction() interface - more detailed attributes for handled signals
setjmp(), longjmp(), sigsetjmp() and siglongjmp() are often used in signal-handling routines to return to a certain location in the program and continue processing after a signal has been caught. The program context saved by setjmp() is restored when longjmp() is called. A zero return value from setjmp() means it was called directly; a non-zero value indicates a return via longjmp().
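The setjmp() return-value convention can be shown without signals. In this sketch (hypothetical names `run_guarded` and `fail_with`), the function that called setjmp() is still on the stack when longjmp() runs, which is the requirement discussed under signal vulnerabilities below. The C standard only allows setjmp() in a few contexts, such as the controlling expression of an `if`, so the error code travels in a separate static variable.

```c
#include <setjmp.h>

static jmp_buf env;
static int last_error;

/* Hypothetical error path: record a code and jump back to run_guarded(). */
static void fail_with(int code)
{
    last_error = code;
    longjmp(env, 1);           /* setjmp() will appear to return 1 */
}

/* Returns 0 if the body completes, or the code passed to fail_with(). */
int run_guarded(int should_fail)
{
    if (setjmp(env) != 0)      /* non-zero: we arrived here via longjmp() */
        return last_error;
    /* zero: this is the direct call, so run the body */
    if (should_fail)
        fail_with(42);         /* run_guarded() is still on the stack */
    return 0;
}
```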

Signal vulnerabilities

Signal handlers need to be asynchronous-safe: able to run safely and correctly even if interrupted by an asynchronous event. Such a function is reentrant by definition, but must also deal correctly with signal interruptions.
Problems arise when the handler relies on some sort of global program state, such as the assumption that global variables are initialised when in fact they aren’t.
Various problems (non-asynchronous-safe state) may arise from attempting to restart execution using longjmp() function in non-returning signal handlers.
Other problems can be caused by invalid longjmp targets. The function that calls setjmp or sigsetjmp must still be on the runtime execution stack whenever longjmp or siglongjmp is called. If the original function has returned, the jump target will be invalid.
Pay special attention to non-returning signal handlers, for the following reasons:
  • The signal handler doesn’t return, so it’s highly unlikely that it will be asynchronous safe unless it exits immediately.
  • It might be possible to find a code path where the function that did the setjmp returns, but the signal handler with the longjmp is not removed.
  • The signal mask might have changed, which could be an issue if sigsetjmp and siglongjmp aren’t used. If they are, does restoring the old signal mask cause problems as well?
  • Permissions might have changed.
  • Program state might have changed, so that variables that were valid when setjmp() was originally called are not necessarily valid when longjmp() is called.
The signal handler itself can be interrupted or called more than once. A signal handler can be interrupted only if a signal that isn’t blocked is delivered to the process. Signals are blocked explicitly using the sigprocmask() function, or implicitly: signals of the type the handler catches are blocked for the period of time the handler is running. sigaction() can also specify additional signals to block while the handler runs.
Sometimes non-async safe functions are used in signal handlers (see signal(3) or sigaction(2))
Signal handlers using longjmp and siglongjmp are practically guaranteed to be non-async safe unless they jump to a location that immediately exits.


PThreads API is the primary API on UNIX. Uses mutexes and condition variables. Linux has a modified version - LinuxThreads. On Windows the API is more complicated.
<skipped> - Critical sections

Threading Vulnerabilities

  • Race conditions occur when the successful outcome of an operation depends on whether the threads are scheduled for running in a certain order.
  1. Identify shared resources that are acted on by multiple threads.
  2. Determine whether the appropriate locking mechanism has been selected. There are specific rules in the book for different types of resources.
  3. Examine the code that modifies this resource to see whether appropriate locking mechanisms have been neglected or misused.
  • Deadlocks and starvation
In PThreads, deadlocks are more likely to occur from the use of multiple mutexes. A classic situation: two or more locks can be held by a single thread, and another thread can acquire the same locks in a different order.
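A sketch of the usual fix for the lock-ordering deadlock: every thread acquires the mutexes in one global order, so the circular-wait condition can never arise. The transfer example and all names in it are hypothetical.

```c
#include <pthread.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static long balance_a = 100, balance_b = 100;

/* Always lock_a before lock_b, whichever direction the money moves. */
static void transfer(int a_to_b, long amount)
{
    pthread_mutex_lock(&lock_a);
    pthread_mutex_lock(&lock_b);
    if (a_to_b) { balance_a -= amount; balance_b += amount; }
    else        { balance_b -= amount; balance_a += amount; }
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
}

static void *worker(void *arg)
{
    int a_to_b = *(int *)arg;
    for (int i = 0; i < 100000; i++)
        transfer(a_to_b, 1);
    return NULL;
}

/* Two threads transfer in opposite directions; returns the total balance,
 * or -1 if a thread could not be created. */
long run_transfers(void)
{
    pthread_t t1, t2;
    int dir1 = 1, dir2 = 0;
    if (pthread_create(&t1, NULL, worker, &dir1) != 0)
        return -1;
    if (pthread_create(&t2, NULL, worker, &dir2) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return balance_a + balance_b;
}
```

If one worker instead locked lock_b first, each thread could end up holding one mutex while waiting forever for the other.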

Saturday, 6 August 2016

TAOSSA Chapter 10

Ch 10. UNIX II: Processes


fork() creates new processes. In the parent it returns the PID of the new child process; in the child it returns 0. A return value of -1 means the call failed and no child was spawned.
getppid() - get parent PID
If a process terminates while its children are still running, these children are assigned to init (PID 1)
In Linux clone() is a fork() variant that allows callers to specify several parameters of the forking operation
The child inherits a copy of most resources from the parent. Files are handled differently: the child gets a copy of the parent’s file descriptors, and both processes share the same open file structure in the kernel (which points to an inode). As a result, parent and child may be fighting for access to the file.
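A minimal sketch of the three fork() return cases; `fork_and_wait` is a hypothetical helper.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`; the parent waits and returns the
 * child's exit status, or -1 if fork() or waitpid() failed. */
int fork_and_wait(int code)
{
    pid_t pid = fork();
    if (pid == -1)              /* failure: no child was created */
        return -1;
    if (pid == 0)               /* child: fork() returned 0 */
        _exit(code);            /* _exit() skips flushing inherited stdio buffers */
    int status;                 /* parent: pid holds the child's PID */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```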

Program invocation

execve() is the standard way of invoking processes. execvp() and execlp(): if the filename contains no slashes, they use the PATH environment variable to resolve the location of the executable. They also run the file with a shell if execve() fails with ENOEXEC.
It may be possible to supply program switches in the argument array if it is not sanitised properly. Keep in mind that getopt() interprets only the arguments preceding -- (two dashes).
  • Metacharacters - see the Chapter 8 notes
  • Globbing
  • Environment issues
  • Setuid shell scripts

Process Attributes

Process attribute retention:
  • File descriptors usually get passed on from the old process to the new one
  • Signal masks - the new process loses all signal handlers installed by the previous process but retains the same signal masks
  • Effective UID - if the program is setuid, the EUID becomes the user ID of the program file owner. Otherwise it stays the same across the execution.
  • Effective GID - if the program is setgid, the EGID becomes the group ID of the program file’s group
  • Saved set-UID - set to the value of the EUID after any setuid processing has been completed
  • Saved set-GID - similar
  • Real UID, GID - preserved across execution
  • PID, PPID, PGID - don’t change across an execve() call
  • Supplemental group privileges are retained
  • Working dir, root dir - same
  • Controlling terminal - inherits from the old process.
  • Resource limits - a lot of details
  • Umask - inherited by the new program
Users can set tight limits on a process and then run a setuid or setgid program. Rlimits are cleared out when a process does a fork(), but they survive the exec() family of calls, which can be used to force a failure in a predetermined location in the code. The error-handling code is usually less guarded than more well-traveled code paths.
UNIX does allow developers to mark certain file descriptors as close-on-exec, which means they are closed automatically if the process runs a new program. For applications that spawn new processes at any stage, always check to see whether this step is taken when it opens files. It is also useful to make a note of those persistent files that aren’t marked to close when a new program starts.
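A sketch of marking a descriptor close-on-exec. `open_cloexec` and `is_cloexec` are hypothetical helpers; `O_CLOEXEC` (POSIX.1-2008) sets the flag atomically, while the `fcntl()` fallback for older systems leaves a short window in which another thread's fork() and exec() could inherit the descriptor.

```c
#include <fcntl.h>
#include <unistd.h>

/* Open `path` read-only with the descriptor marked close-on-exec. */
int open_cloexec(const char *path)
{
#ifdef O_CLOEXEC
    return open(path, O_RDONLY | O_CLOEXEC);   /* flag set atomically */
#else
    int fd = open(path, O_RDONLY);
    if (fd != -1 && fcntl(fd, F_SETFD, FD_CLOEXEC) == -1) {
        close(fd);
        return -1;
    }
    return fd;
#endif
}

/* Returns 1 if `fd` will be closed on exec, 0 if not, -1 on error. */
int is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    return flags == -1 ? -1 : (flags & FD_CLOEXEC) != 0;
}
```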
Security checks on a file descriptor are performed only once, when the process initially creates a file descriptor by opening or creating a resource. If you can get access to a file descriptor that was opened with write access to a critical system file, you can write to that file regardless of your effective user ID or other system privileges. Therefore, programs that work with file descriptors to security-sensitive resources should close their descriptors before running any user-malleable code.
setenv() and unsetenv() may be dodgy in how they behave with funny variable names.

Interprocess communication

Named pipes created with insufficient privileges might result in unauthorized clients performing some sort of data exchange, potentially leading to compromise via unauthorized (or forged) data messages.
Applications that are intended to deal with regular files might unwittingly find themselves interacting with named pipes. This allows attackers to cause applications to stall in unlikely situations or cause error conditions in unexpected places. When auditing an application that deals with files, if it fails to determine the file type, consider the implications of triggering errors during file accesses and blocking the application at those junctures.
The use of mknod() and mkfifo() might introduce a race condition between the time the pipe is created and the time it’s opened.
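One way to avoid stalling on a named pipe where a regular file was expected is to open with `O_NONBLOCK` and check the type on the descriptor itself, so a rename after the check cannot change what was checked. This is a sketch; `open_regular_file` is a hypothetical name.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open `path` for reading only if it is a regular file. O_NONBLOCK keeps
 * open() from blocking on a FIFO that has no writer. */
int open_regular_file(const char *path)
{
    int fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd == -1)
        return -1;
    struct stat st;
    if (fstat(fd, &st) == -1 || !S_ISREG(st.st_mode)) {  /* FIFO, device, dir */
        close(fd);
        return -1;
    }
    /* Restore blocking mode for normal reads of the regular file. */
    int fl = fcntl(fd, F_GETFL);
    if (fl == -1 || fcntl(fd, F_SETFL, fl & ~O_NONBLOCK) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```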
Three IPC mechanisms in System V IPC are message queues, semaphores, and shared memory.
Named UNIX domain sockets provide a general-purpose mechanism for exchanging data in a stream-based or record-based fashion.

Remote Procedure Calls

XDR (External Data Representation)

Friday, 5 August 2016

TAOSSA Chapter 9

Ch 9. Unix I: Privileges and objects

Unix 101; users, groups and processes; setuid and setgid binaries
Effective UID, real UID, saved set-UID.
Daemons and their children - all 3 IDs are 0 for “root” daemons

UID functions

  • seteuid(). Changes the effective user ID associated with the process. If a process is running with superuser privileges (effective user ID of 0), it can set the effective user ID to any arbitrary ID. Otherwise, for non-root processes, it can toggle the effective user ID between the saved set-user-ID and the real user ID
  • setuid(). Changes all 3 UIDs, is used for permanently assuming the role of a user, usually for the purposes of dropping privileges
  • setresuid(). Explicitly sets all 3 UIDs; “-1” means “keep the same”. A non-superuser can set any of the three to the value of any of its three currently assigned UIDs; a superuser can set them to any value.
  • setreuid(). Sets the real UID and effective UID, similar to setresuid. More important on Solaris and older BSDs that don’t have setresuid
Before OpenBSD imported the setresuid() function and rewrote the setreuid() function, the only straightforward way for a nonprivileged program to clear the saved set-user-ID was to call the setuid() function when the effective user ID is set to the real user ID. This can be accomplished by calling setuid(getuid()) twice in a row.

Group ID functions

  • setegid()
  • setgid()
  • setresgid()
  • setregid()
  • setgroups(). Sets the supplementary groups of the process. Can only be performed by a process with an effective UID 0
  • initgroups(). Set supplementary groups from specified username and add another group. Same, requires EUID 0

Privilege vulnerabilities

A program can drop its root privileges by performing a setuid(getuid()), which sets the saved set-user-ID, the real user ID, and the effective user ID to the value of the real user ID.
A setgid+setuid root program can drop its privileges with the following calls (order is important):
/* drop root privs - correct order */
setgid(getgid());
setuid(getuid());
If the order is reversed, in Linux, Solaris, and OpenBSD, only the effective group ID is modified, and the saved set-group-ID still contains the group ID of the privileged group.
The same pair of calls made by a non-root program only changes the effective IDs, not the saved IDs (in FreeBSD and NetBSD all three IDs are changed).
A similar case is when privileges are temporarily dropped (the effective UID is set to a non-root user) and setuid() is then called. In most implementations this does not affect the saved set-user-ID, so root privileges can be recovered by using seteuid(0).
Another situation - incorrect attempts to drop privileges temporarily
The book has a couple of pages of checklists for auditing privilege-management code
  • setgroups() works only when running with euid 0
  • Attempting to drop privileges while not running with euid 0 will not work
  • Using setegid() or seteuid() to drop root privileges is a mistake
  • Privileged groups and supplemental groups must be dropped before the process gives up its effective user ID of root
  • setgid(getgid()) with a non-zero effective UID leaves the saved set-group-ID set to a privileged group
For temporary dropping of privileges:
  • Make sure the code drops any relevant group permissions as well as supplemental group permissions.
  • Make sure the code drops group permissions before user permissions.
  • Make sure the code restores privileges before attempting to drop privileges again, either temporarily or permanently.
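The checklists above translate into a permanent privilege drop along these lines. `drop_privileges` is a hypothetical helper; it follows the order from the notes (supplementary groups, then group, then user) and then verifies that root cannot be regained. setgroups() is not part of POSIX; it is declared in `<grp.h>` on glibc-based Linux, and some other systems declare it in `<unistd.h>`, which is also included.

```c
#include <grp.h>
#include <sys/types.h>
#include <unistd.h>

/* Permanently drop to `uid`/`gid`. Returns 0 on success, -1 on failure. */
int drop_privileges(uid_t uid, gid_t gid)
{
    if (geteuid() == 0 && setgroups(1, &gid) == -1)  /* needs euid 0 */
        return -1;
    if (setgid(gid) == -1)        /* with euid 0: real, effective, saved GID */
        return -1;
    if (setuid(uid) == -1)        /* with euid 0: real, effective, saved UID */
        return -1;
    /* Verify the drop stuck: regaining root must now fail. */
    if (uid != 0 && (setuid(0) != -1 || seteuid(0) != -1))
        return -1;
    if (getuid() != uid || geteuid() != uid ||
        getgid() != gid || getegid() != gid)
        return -1;
    return 0;
}
```

If this is called while the effective UID is no longer 0, setuid() only changes the effective ID; the final checks are there to catch exactly that mistake.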

File security

File IDs

The kernel sets the file’s owner and group when the file is first created. The owner is always set to the effective user ID of the process that created the file.
There are two common schemes by which the group ID can be initialised.
  1. BSD-based systems tend to set the initial group ID to the group ID of the file’s parent directory.
  2. The System V and Linux approach is to set the group ID to the effective group ID of the creating process.

File permissions

The four components of the permission bitmask are owner permissions, group permissions, other permissions, and a set of special flags.
The kernel looks only at the most specific set of permissions relevant to a given user. It’s a common misunderstanding to think that the less specific permission bits are consulted if the more specific permissions prevent an action.
The three special permission bits are the setuid bit, the setgid bit, and the sticky (or tacky) bit.


To calculate the initial permission bits for a new file, the permission argument (mode) of the file creation system call is calculated with a bitwise AND operation with the complement of the umask value.
Umask is inherited by the new program; default umask is usually 022
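The umask arithmetic as a worked example: a requested mode of 0666 with umask 022 gives 0666 & ~022 = 0644 (rw-r--r--). `effective_mode` is a hypothetical helper that mirrors the calculation described above; only the permission bits of the umask are used.

```c
#include <sys/types.h>

/* Permission bits a new file actually gets: the requested mode with every
 * permission bit that is set in the umask cleared. */
mode_t effective_mode(mode_t requested, mode_t mask)
{
    return requested & ~(mask & 0777);
}
```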

Directory permissions

Permission bits have slightly different meanings for directories. Read - list the contents; write - modify the contents of the directory (create, delete and rename files); execute - search permission, needed to access any files or subdirectories. Read permission works w/o search permission; write permission usually requires search permission as well.

Privilege management with file operations

File opening is typically done with the open(), creat(), mknod(), mkdir(), or socket() system calls; a file’s directory is altered with calls such as unlink() and rename(); and file attributes are changed with calls such as chmod(), chown(), or utimes(). All these privilege checks consider a file’s permission bitmask, ownership, and group membership along with the effective user ID, effective group ID, and supplemental groups of the process attempting the action. Effective permissions of the process are critical.
Sources of issues with file permissions:
  • Recklessness with permissions
  • Libraries doing stuff in the background
  • Permissions when creating files
    • Unix open() interface, specific mode and its umask interaction
    • Forgetting O_EXCL (if open() is called with O_CREAT but not O_EXCL, the system might open an existing file instead of creating a new one)
    • setuid root files created by less privileged users
  • Directory safety (who owns the directory the file is in, who can write to it). All parents in the path must be safe.
  • Filenames and paths: absolute and relative; special entries. Every time you use a system call that takes a pathname, the kernel goes through the process of stepping through each directory to locate the file. For the kernel to follow a path, you must have search permissions on every directory in that path.
    • Pathname tricks, dir traversal
    • Embedded NUL
    • Dangerous dirs

File internals

  • File descriptor
  • File descriptor table
  • Inodes
  • Directories
  • Links - symlinks, hard links

Race conditions

Race conditions are situations in which two different parties simultaneously try to operate on the same resource with deleterious consequences.
For UNIX file system code, these issues usually occur when you have a process that gets preempted or enters a blocking system call at an inopportune moment. This moment is typically somewhere in the middle of a sensitive multiple-step operation involving file and directory manipulation. If another process wins the race and gets scheduled at the right time in the middle of this “window of inopportunity,” it can often subvert a vulnerable nonatomic sequence of file operations and wrest privileges from the application.

Stat() family of functions

stat(), fstat(), lstat()
fstat() is the most resilient in terms of race conditions, as it’s operating on a previously opened file
lstat() does not follow links, stat() does
The standard protection against link-based attacks is to use lstat() on the requested filename and either explicitly check whether it’s a link, or check that it’s a regular file and fail if it isn’t
Beware of TOCTOU issues in the above scenario
It is possible to delete or rename links while files are open. The kernel does not care if the file that the fd indexes has been deleted or renamed. As long as the file descriptor is kept open, the file and corresponding inode in the file system stay available. This can be used when the program checks after opening the file
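The lstat()/open()/fstat() pattern can be sketched as follows; `open_no_symlink` is a hypothetical name. Comparing the device and inode numbers of the checked name with those of the opened descriptor detects a file swapped between the two calls.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open `path` read-only, refusing symlinks and anything that is not a
 * regular file. Returns the descriptor, or -1. */
int open_no_symlink(const char *path)
{
    struct stat before, after;
    if (lstat(path, &before) == -1 || !S_ISREG(before.st_mode))
        return -1;                       /* symlink, directory, device, ... */
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    if (fstat(fd, &after) == -1 ||
        after.st_dev != before.st_dev || after.st_ino != before.st_ino) {
        close(fd);                       /* name was swapped after lstat() */
        return -1;
    }
    return fd;
}
```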

Recap and other races

Most file system race conditions can be traced back to using system calls that work with pathnames
Permissions are established by how the file is opened and security checks at that time, so changing permissions will not affect access
Anything besides a single file-based sys call to open a resource followed by multiple file-descriptor based calls has a chance of a race condition
Evading file access checks: another vulnerability pattern, where a security check function that takes a filename (e.g. access()) is followed by a usage function that also takes a filename (e.g. open()), so the file can be swapped in between
Permission races: an app temporarily exposes a file to modification for a short window of time by creating it with insufficient permissions
If attackers can open the file during that window, they retain access even after permissions have been corrected
Ownership races: the file is created with the effective privileges of a nonprivileged user, then later the file owner is changed to that of a privileged user. If attackers open the file between open() and fchown(), they get a fd with an access mask permitting read and write to the file
Directory races: if a program descends into user-controllable directories, the user can move directories around and cause the program to operate on sensitive files

Temporary files

Temp directories are marked as sticky directories with mode octal 1777

Unique file creation

mktemp() generates a very easily predictable unique name, based on the process ID of the caller plus a static pattern
This can be used in race condition scenarios
tmpnam() and tempnam() have same race condition issues as mktemp()
mkstemp() much safer if used correctly
tmpfile() and mkdtemp() - safe functions
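A minimal sketch of safe temporary file creation with mkstemp(); `make_private_tmpfile` is a hypothetical helper. POSIX.1-2008 requires mkstemp() to create the file exclusively with mode 0600, so an existing file or symlink under the chosen name is never opened.

```c
#include <stdlib.h>
#include <unistd.h>

/* Create a private temporary file from a template such as "/tmp/appXXXXXX"
 * and unlink the name at once, so only the returned descriptor refers to
 * the file. Returns the descriptor, or -1. */
int make_private_tmpfile(char *tmpl)
{
    int fd = mkstemp(tmpl);            /* replaces the trailing XXXXXX in place */
    if (fd == -1)
        return -1;
    if (unlink(tmpl) == -1) {          /* nothing can open it by name now */
        close(fd);
        return -1;
    }
    return fd;
}
```

If another process needs to open the file by name, skip the unlink() and remove the file when done instead.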

File reuse

Applications also might have a requirement to open temporary files that already exist in a temporary directory. Opening these files is difficult.
Preventing opening soft or hard links is difficult.
Cryogenic sleep attack - sending a job control signal such as SIGSTOP to the application at the right moment then manipulating files. Possible if the program is a setuid root program users had started in their terminal session

STDIO file interface

UNIX application code commonly uses stdio in lieu of the lower-level system call API because it automatically implements buffering and a few convenience functions for data formatting.
A typical FILE structure contains a pointer to buffered file data (if it’s a buffered stream), the file descriptor, and flags related to how the stream is opened.
fopen() is used for opening files and has the same potential problems as open(). It does not take a permission mode as a parameter, so new files are created with a default mode of 0666, further restricted by the umask of the current process. In a privileged context, it should be used very carefully.
freopen() has the same problems.
fdopen() does not.
fread() is similar to read() but reads a specified number of elements of a specified size. Multiplication is involved, so there is potential for integer overflow. fgets() reads a single line from the file. Potential problems: ignoring the return value (if it returns NULL, the contents of the destination buffer are unspecified); and incorrectly parsing a file containing user-controlled data, because fgets() reads up to a given number of characters, which may not be the whole line.
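The multiplication hidden in fread()'s arguments usually bites in the allocation that precedes it. A sketch with a hypothetical `read_records` helper, assuming `count` comes from an untrusted file header:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Read `count` records of `rec_size` bytes into a freshly allocated buffer.
 * Returns NULL on overflow, allocation failure or a short read. */
void *read_records(FILE *fp, size_t count, size_t rec_size)
{
    if (rec_size != 0 && count > SIZE_MAX / rec_size)
        return NULL;                       /* count * rec_size would wrap */
    size_t total = count * rec_size;
    void *buf = malloc(total ? total : 1);
    if (buf == NULL)
        return NULL;
    if (fread(buf, rec_size, count, fp) != count) {   /* short read or error */
        free(buf);
        return NULL;
    }
    return buf;
}
```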
fscanf() reads data of a specified format directly into vars. Potential for buffer overflows when using this function to read in string values. Also need to check the return value.
When writing to a file, there are fewer ways for users to affect the application, because the data being manipulated is already in memory. Far fewer security implications of writing it into a file.
Potential format string vulnerabilities for printf family; users messing with file format
fclose() - if called twice on a FILE structure a double free() would occur, with a possibility of corrupting the heap.

Thursday, 4 August 2016

TAOSSA Chapter 8

Ch 8 - Strings and metacharacters

Major areas of string handling:
  • memory corruption due to string mishandling;
  • vulnerabilities due to in-band control data in the form of metacharacters;
  • vulnerabilities resulting from conversions between character encodings in different languages

C string handling

In C, string buffers have to be managed manually. Developers can estimate how much memory to reserve for a statically sized array, or they can dynamically allocate memory at runtime when the amount of space required for a data block is known.
The second way is better but involves more overhead and the need to free memory correctly.
C++ has a safer string class, but the need to interface with C introduces the same issues

Unbounded string functions

The size of the destination buffer is not taken into account when performing data copy.
  • scanf()
    Used when reading in data from a file stream or string. Each data element specified in the format string is stored in a corresponding argument.
  • sprintf()
    Destination buffer can be overflowed, usually by %s or %[] formats, occasionally with %d or %f. Also format string vulnerabilities when the user can control the format string specifier
    _wsprintfA() and _wsprintfW() copy a maximum of 1024 chars
  • strcpy()
    Destination buffer can be overflowed. There are Windows variants.
  • strcat()
    The destination buffer (dst) must be large enough to hold the string already there, the concatenated string (src), plus the NUL terminator

Bounded string functions

Include a length parameter for the destination buffer. Occasionally miscalculated, or some boundary conditions, or data type conversion issues.
  • snprintf()
    Accepts a max number of bytes that can be written to the output buffer.
    On Windows OSs, if there’s not enough room to fit all the data into the resulting buffer, a value of -1 is returned and NUL termination is not guaranteed.
    UNIX implementations guarantee NUL termination no matter what and return the number of characters that would have been written had there been enough room. That is, if the resulting buffer isn’t big enough to hold all the data, it’s NUL-terminated, and a positive integer is returned that’s larger than the supplied buffer size.
  • strncpy()
    Accepts a max number of bytes to be copied into the destination.
    Does not guarantee NUL-termination of the destination string. If the source string is larger than the destination buffer, strncpy() copies as many bytes as indicated by the size parameter, and then ceases copying without NUL-terminating the buffer.
    wcsncpy() is the safer alternative to wcscpy(). Wide characters confuse developers: they supply the destination buffer’s size in bytes, not in wide characters.
  • strncat()
    Appends at most n bytes, so n must be the space left in the buffer minus 1 for the NUL byte that strncat() always writes. This one byte is often miscalculated, resulting in off-by-one errors.
  • strlcpy()
    BSD alternative to strncpy(). Guarantees NUL termination of the destination buffer. The size returned is the length of the source string, not including the NUL byte. It can be larger than the destination buffer size, which, when used in later length calculations (e.g. together with strncat()), can lead to off-by-one errors.
  • strlcat()
    Similar to strncat but the size parameter is the total size of the destination buffer, not the remaining space. Guarantees NUL termination. Returns the number of bytes required to hold the resulting string. If the destination string is already longer than n parameter, the buffer is left untouched and the n parameter is returned. One of the safest alternatives.
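A sketch of careful use of the bounded functions above. `copy_str`, `append_str` and `format_path` are hypothetical helpers; the snprintf() check relies on the UNIX/C99 behaviour of returning the length that would have been written, and also treats a negative return as failure.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* strncpy() leaves dst unterminated when strlen(src) >= dst_size - 1,
 * so terminate explicitly. */
void copy_str(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return;
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}

/* strncat()'s size is the number of characters to append: the space left
 * minus one for the NUL it always writes. Passing dst_size would overflow. */
void append_str(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);
    if (used + 1 >= dst_size)
        return;                            /* no room for even one character */
    strncat(dst, src, dst_size - used - 1);
}

/* Returns 0 if "dir/name" fit in dst, -1 if it was truncated or failed. */
int format_path(char *dst, size_t dst_size, const char *dir, const char *name)
{
    int n = snprintf(dst, dst_size, "%s/%s", dir, name);
    return (n < 0 || (size_t)n >= dst_size) ? -1 : 0;
}
```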

Common issues

  • Unbounded copies. Not checking the bounds of destination buffers;
  • Character expansion where software encodes special chars, resulting in longer string than the original. Common when processing metacharacters or formatting raw data for human readability;
  • Incorrectly incrementing pointers. Pointers can be incremented outside the bounds of the string being operated on. Two main cases: when a string isn’t NUL-terminated correctly; or when a NUL terminator can be skipped because of a processing error
  • Typos. One occasional mistake is a simple pointer use error, which happens when a developer accidentally dereferences a pointer incorrectly or doesn’t dereference a pointer when necessary


In-band representation vs out of band representation of control data/metadata.
  • Embedded delimiters. A pattern in which the application takes user input that isn’t filtered sufficiently and uses it as input to a function that interprets the formatted string. This interpretation might not happen immediately; it might be written to a secondary storage facility and then interpreted later. An attack of this kind is sometimes referred to as a “second-order injection attack.”
  • NUL character injection. Special case of embedded delimiter, important in scenarios of Web apps or Java etc passing strings to C-based APIs.
    Example is fgets() which stops reading when it runs out of space in the destination buffer or encounters \n or EOF. NULs have to be dealt with separately.
  • Truncation. In statically sizes buffers, input that exceeds the length of the buffer must be truncated to fit the buffer size and avoid buffer overflows. THis avoids memory corruption, but could lead to interesting side effects from data loss in the shortened input string.
    Can happen when using snprintf instead of sprintf. For functions in this family:
    Consider how every function behaves when it receives data that isn’t going to fit in a destination buffer. Does it just overflow the destination buffer? If it truncates the data, does it correctly NUL-terminate the destination buffer? Does it have a way for the caller to know whether it truncated data? If so, does the caller check for this truncation?
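A sketch of guarding a C API against NUL injection. It assumes the caller passes a counted string (pointer plus length), as a web or Java front end would. safe_for_c_api() and its suffix check are hypothetical; the point is that the bytes validated and the C string eventually used must be the same.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Validate a counted string before it is handed to a C API. */
bool safe_for_c_api(const char *data, size_t len, const char *required_suffix)
{
    /* Reject embedded NULs: a C API would silently stop at the first one,
     * so "a.php\0.txt" passes a suffix check here but opens "a.php". */
    if (memchr(data, '\0', len) != NULL)
        return false;
    size_t slen = strlen(required_suffix);
    /* With no embedded NUL, len now equals what strlen() would report. */
    return len >= slen && memcmp(data + len - slen, required_suffix, slen) == 0;
}
```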

Common metacharacter formats

  • Path metacharacters
    • File canonicalisation - especially directory traversal
    • Windows registry paths
  • C format strings - printf(), err(), syslog() families of functions
  • Shell metacharacters - e.g. when using popen() or Perl's open()
  • SQL queries
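For the C format string case, the usual fix is to keep untrusted data out of the format argument entirely. A minimal sketch using snprintf(); copy_user_text() is a hypothetical name, and the same pattern applies to the printf(), err() and syslog() families.

```c
#include <stddef.h>
#include <stdio.h>

/* Copy untrusted text into out. The user string is only ever an argument,
 * never the format: snprintf(out, n, user) would interpret "%x" or "%n"
 * in attacker input. Returns snprintf()'s result. */
int copy_user_text(char *out, size_t n, const char *user)
{
    return snprintf(out, n, "%s", user);
}
```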

Metacharacter filtering

Three options:
  • Detect erroneous input and reject what appears to be an attack.
    • whitelists
    • blacklists
  • Detect and strip dangerous characters.
    • insufficient filtering
    • character stripping vulnerabilities - mistakes in sanitisation routines.
  • Detect and encode dangerous characters with a metacharacter escape sequence.
    • If the escape character itself is not escaped, it can be used to undermine the whole escaping routine
Escaping or decoding that occurs after a security decision has been made on the input is a problem: the check sees different data from what is eventually used.
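A sketch of the "detect and encode" option that also shows character expansion: each dangerous character becomes two output bytes, so space is checked per write, and the escape character itself is escaped. escape_quotes() and its choice of metacharacters (single quote and backslash) are assumptions for illustration, not any particular database's escaping rules.

```c
#include <stddef.h>

/* Escape single quotes and backslashes with a backslash. If only quotes were
 * escaped, the input \' would become \\', which the consumer reads as an
 * escaped backslash followed by a bare quote. Returns the output length, or
 * -1 if dst is too small (dst is then NUL-terminated but incomplete). */
long escape_quotes(char *dst, size_t dstsize, const char *src)
{
    size_t out = 0;
    if (dstsize == 0)
        return -1;
    for (; *src != '\0'; src++) {
        size_t need = (*src == '\'' || *src == '\\') ? 2 : 1;
        if (dstsize - out - 1 < need) {   /* keep one byte for the NUL */
            dst[out] = '\0';
            return -1;
        }
        if (need == 2)
            dst[out++] = '\\';
        dst[out++] = *src;
    }
    dst[out] = '\0';
    return (long)out;
}
```

Input "it's\" (5 characters) expands to 7 bytes plus the NUL, so it fits in 8 bytes but not in 7.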

Character sets and Unicode

  • Unicode
    • UTF-8
    • UTF-16
    • UTF-32
    • Vulnerabilities in decoding
    • Homographic attacks
  • Windows unicode functions
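As an example of a decoding vulnerability, a lax UTF-8 decoder accepts overlong forms such as C0 AF, which decodes to '/' and can slip past a directory-traversal filter that ran on the raw bytes. A minimal sketch, assuming the input is NUL-terminated or at least two bytes long, and handling only 1- and 2-byte sequences; utf8_decode_short() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode a 1- or 2-byte UTF-8 sequence; longer forms are rejected here for
 * brevity. Overlong encodings (values that fit in one byte) are rejected. */
bool utf8_decode_short(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) {                           /* plain ASCII */
        *cp = s[0];
        return true;
    }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        uint32_t v = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        if (v < 0x80)                            /* overlong form */
            return false;
        *cp = v;
        return true;
    }
    return false;                                /* invalid or longer form */
}
```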

Wednesday, 3 August 2016

TAOSSA Chapter 7

Ch 7 - Program building blocks

It is useful to study recurring code patterns, focusing on areas where developers might make security-relevant mistakes

Auditing variable use

Different techniques for recognising variable and data structure misuse

Variable relationships

The more variables used to represent state, the higher the chances of error
Search for variables that are related to each other, determine their intended relationships, and then determine whether there’s a way to desynchronise these variables from each other
This usually means finding a block of code that alters one variable in a fashion inconsistent with the other variables
Go through the code quickly (in a function) and identify variable relationships, then make one pass to see whether any vars can be desynchronised
A well-designed application keeps variable relationships to a minimum
Data hiding - concealing complex relationships in separate subsystems so that the internals aren’t exposed to callers
Data hiding can make your job harder by spreading complex relationships across multiple files and functions
Examples of data hiding include private variables in a C++ class and the buffer management subsystem in OpenSSH
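A small sketch of related variables, using a hypothetical binary-protocol parser cursor: pos and left must always move together, and every check is made before either one is updated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Invariant: pos + left == end of input. Any path that moves one member
 * without the other makes later bounds checks against left meaningless. */
struct cursor {
    const uint8_t *pos;
    size_t left;
};

/* Read a 16-bit big-endian value. */
bool cursor_get_u16(struct cursor *c, uint16_t *out)
{
    if (c->left < 2)
        return false;               /* neither member touched on failure */
    *out = (uint16_t)((c->pos[0] << 8) | c->pos[1]);
    c->pos += 2;
    c->left -= 2;
    return true;
}

/* Skip n bytes, e.g. a length-prefixed field. */
bool cursor_skip(struct cursor *c, size_t n)
{
    if (n > c->left)                /* compare first; never compute pos + n */
        return false;
    c->pos += n;
    c->left -= n;
    return true;
}
```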

Structure and object mismanagement

Applications often use large structures to manage program and session state, and group related data elements
Familiarise yourself with the interfaces to learn the purpose of objects and their constituent members
One goal of auditing object-oriented code is to determine whether it’s possible to desynchronise related structure members or leave them in an unexpected or inconsistent state to cause the application to perform some sort of unanticipated operation
Structure mismanagement bugs tend to be quite subtle - the code to manage structures is spread out into several small functions that are individually quite simple. Therefore, any vulnerabilities tend to be a result of aggregate, emergent behaviour occurring across multiple functions
One major problem area in this structure management code is low-level language issues, such as type conversion, negative values, arithmetic boundaries, and pointer arithmetic. The reason is that management code tends to perform a lot of length calculations and comparisons
Similarly to structures, objects can be left in an inconsistent state
Potential for subtle vulnerabilities caused by incorrect assumptions about implicitly called member functions, e.g. overloaded operators

Variable initialisation

Reading a value from a variable before it is initialised. Two cases:
  • Variable was intended to be initialised at the beginning of the function but the developer forgot to specify an initialiser in the declaration
  • A code path exists where the variable is accidentally used without ever being initialised
Most vulnerabilities of this nature occur when a function takes an abnormal code path
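Functions that allocate a number of variables commonly have an epilogue that cleans up objects to avoid memory leaks when an error occurs. If an error path reaches the epilogue before some of these variables have been initialised, the cleanup code operates on uninitialised values, which is potentially exploitable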
In C++ code, pay close attention to member variables in objects - unexpected code paths can leave objects in an inconsistent or partially uninitialised state
The best way to begin examining this code is by looking at constructor functions to see whether any constructors neglect to initialise certain elements of the object
Destructors are automatically called during the function epilogue for objects declared in the function, similar to the case of vars freed in an epilogue above
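A minimal sketch of the cleanup-epilogue pattern, with a hypothetical make_pair() function. Initialising both pointers to NULL is what keeps the error path safe: without it, a failed first allocation would jump to the epilogue and free an uninitialised pointer.

```c
#include <stdlib.h>
#include <string.h>

/* Allocate two n-byte buffers filled with 'A' and 'B'. On success, ownership
 * passes to the caller and 1 is returned; on failure, 0 is returned. */
int make_pair(size_t n, char **first, char **second)
{
    char *a = NULL, *b = NULL;      /* free(NULL) is a no-op */
    int ok = 0;

    a = malloc(n);
    if (a == NULL)
        goto out;                   /* b is still NULL here, not garbage */
    b = malloc(n);
    if (b == NULL)
        goto out;
    memset(a, 'A', n);
    memset(b, 'B', n);
    *first = a;
    *second = b;
    a = b = NULL;                   /* caller owns them now */
    ok = 1;
out:
    free(a);
    free(b);
    return ok;
}
```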

Arithmetic boundaries

Structured process for identifying these vulnerabilities (see Ch 6 for details):
  1. Discover operations that, if a boundary condition could be triggered, would have security-related consequences (primarily length-based calculations and comparisons)
  2. Determine a set of values for each operand that trigger the relevant arithmetic boundary wrap
  3. Determine whether this code path can be reached with values within the set determined in step 2
For step 3 (is this what solvers can be used for?):
  • Identify the data type of the variable involved
  • Determine at which points the variable is assigned a value
  • Determine constraints on the variable from assignment until the vulnerable operation
  • Determine supporting code path constraints
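Applying steps 1-3 to a typical length check, assuming a hypothetical fixed 4-byte header: the operation is len + HDR_LEN, the wrapping set is len in [SIZE_MAX - 3, SIZE_MAX], and step 3 asks whether an attacker can supply such a len. The sketch contrasts the wrapping check with a rearranged one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HDR_LEN 4   /* hypothetical fixed header size */

/* Vulnerable on purpose: for len in [SIZE_MAX - 3, SIZE_MAX], len + HDR_LEN
 * wraps to a small value and the check passes. */
bool fits_naive(size_t len, size_t bufsize)
{
    return len + HDR_LEN <= bufsize;
}

/* Rearranged so that no intermediate value can wrap. */
bool fits_safe(size_t len, size_t bufsize)
{
    return bufsize >= HDR_LEN && len <= bufsize - HDR_LEN;
}
```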

Type confusion

Union data types are used when structures or objects are required to represent multiple data types depending on an external condition, e.g. representing different opaque objects read off the network
Occasionally, application developers confuse what the data in a union represents. This can have disastrous consequences on an application, particularly when integer data types are confused with pointer data types, or complex structures of one type are confused with another
Most vulnerabilities of this nature stem from misinterpreting a variable used to define what kind of data the structure contains
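A sketch of guarding a union with its type tag; struct msg and its two variants are hypothetical. The accessor checks the tag before reading the pointer member, so an attacker-chosen integer is never used as a string pointer.

```c
#include <stdbool.h>
#include <stdint.h>

enum msg_type { MSG_INT = 1, MSG_STR = 2 };

/* type says which union member is valid; it is attacker-controlled if the
 * message was parsed off the network. */
struct msg {
    uint32_t type;
    union {
        uint32_t ival;
        const char *sval;
    } u;
};

/* Only hand out the string member when the tag says it is a string. */
bool msg_get_str(const struct msg *m, const char **out)
{
    if (m->type != MSG_STR)
        return false;
    *out = m->u.sval;
    return true;
}
```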

Lists and tables

Errors in implementing routines that add and modify these data structures, leading to inconsistencies in these data structures
Points to address when examining the algorithm:
  • Does the algorithm deal correctly with manipulating list elements when the list is empty?
  • What are the implications of duplicate elements?
  • Do previous and next pointers always get updated correctly?
  • Are data ranges accounted for correctly?
Empty lists: often list structure members or global variables are used to point to the head of a list and potentially the tail of the list. Look for mistakes in updating these variables. Code that doesn’t deal with head and tail elements correctly isn’t common, but it can occur, particularly when list management is decentralised (that is, there’s no clean interface for list management, so management happens haphazardly at different points in the code)
Duplicate elements: elements containing identical keys (data values used to characterise the structure as unique) could cause the two elements to get confused, resulting in the wrong element being selected from the list
Previous and next pointer updates: mistakes in inserting or deleting elements can leave these pointers stale. This often happens if the program treats the current member as the head or tail of a list
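A minimal doubly linked list removal, assuming a list struct that tracks both head and tail (names are hypothetical). Removing the first, last or only element is exactly where head and tail updates tend to be missed.

```c
#include <stddef.h>

struct node {
    struct node *prev, *next;
    int key;
};

struct list {
    struct node *head, *tail;
};

/* Unlink n from l. If n is the head or tail, the list's own pointers must be
 * updated too, or they keep pointing at a node the caller may free. */
void list_remove(struct list *l, struct node *n)
{
    if (n->prev != NULL)
        n->prev->next = n->next;
    else
        l->head = n->next;          /* n was the head */
    if (n->next != NULL)
        n->next->prev = n->prev;
    else
        l->tail = n->prev;          /* n was the tail */
    n->prev = n->next = NULL;
}
```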
Data ranges: in ordered lists, the elements are sorted into some type of order based on a data member that distinguishes each list element. Often each data element in the list represents a range of values
Nuances with this:
  • Can overlapping data ranges be supplied?
  • Can replacement data ranges (duplicate elements) be supplied?
  • Does old or new data take precedence?
  • What happens when 0 length data ranges are supplied?
Hashing algorithms: hash tables are often implemented as an array of linked lists. The element (or its key) is used as input to a hash function, and the resulting hash value is used as an index into the array
Important questions:
  • Is the hashing algorithm susceptible to invalid results? E.g. when the algorithm uses modulus, force it to return negative results (negative dividend). Or force it to produce many collisions
  • What are the implications of invalidating elements? Some algorithms prune elements based on conditions. Potentially incorrect unlinking
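The negative-modulus case in C, assuming a hypothetical 64-bucket table: % keeps the sign of the dividend, so a negative signed hash yields a negative array index.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 64   /* hypothetical table size */

/* Vulnerable: a negative hash gives an index in (-NBUCKETS, 0]. */
int bucket_signed(int h)
{
    return h % NBUCKETS;
}

/* Doing the arithmetic on an unsigned type keeps the index in [0, NBUCKETS). */
size_t bucket_unsigned(uint32_t h)
{
    return h % NBUCKETS;
}
```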

Auditing control flow

Internal control flow; loops and branches

Looping constructs

Data processing loops - interpret user-supplied data and construct output based on this data
Common errors:
  • The terminating conditions don’t account for destination buffer sizes or don’t correctly account for destination sizes in some cases
  • The loop is post-test when it should be pretest
  • A break or continue statement is missing or incorrectly placed
  • Some misplaced punctuation causes the loop to not do what it’s supposed to
Terminating conditions
Some loops have multiple terminating conditions when processing user data
The set of terminating conditions in a loop might not adequately account for all possible error conditions, or the implementation of the checks is incorrect
Main problems when calculating lengths:
  • The loops fail to account for a buffer’s size
  • A size check is made, but it’s incorrect
When you read complex functions containing nested loops, these types of suspect loop constructs can be difficult to spot
Off-by-one errors in size checks are common, especially in string processing
Occasionally, when loops terminate in an unexpected fashion, variables can be left in an inconsistent state
Another off-by-one error occurs when a variable is incorrectly checked to ensure that it’s within certain bounds before it’s incremented and used
Loops that can write multiple data elements in a single iteration might also be vulnerable to incorrect size checks, e.g. because of character escaping or expansion that weren’t adequately taken into account by the loop’s size checking
A loop’s size check could be invalid because of a type conversion, an arithmetic boundary condition, operator misuse, or pointer arithmetic error
Post-test vs pretest loops
Pretest loops tend to be used primarily; post-test loops are used in some situations out of necessity or for personal preference
Post-test loops should be used when the body of the loop always needs to be performed at least one time. Look for potential situations where execution of the loop body can lead to an unexpected condition. One thing to look out for is the conditional form of the loop performing a sanity check that should be done before the loop is entered
With pretest loops - if code following a loop expects that the loop body has run at least once, an attacker might be able to intentionally skip the loop entirely and create an exploitable condition
Punctuation errors
E.g. a stray semicolon at the end of the line with the for statement gives the loop an empty body
See chapter 6 as well
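A sketch of a data processing loop with two terminating conditions on the input and one on the output space, using a hypothetical copy_field() that copies up to a delimiter. It is a pretest loop, so empty input is handled, and the - 1 reserves room for the NUL byte.

```c
#include <stddef.h>

/* Copy src up to (not including) delim into dst of size dstsize, always
 * NUL-terminating. Returns the number of bytes copied. */
size_t copy_field(char *dst, size_t dstsize, const char *src, char delim)
{
    size_t i = 0;
    if (dstsize == 0)
        return 0;
    while (src[i] != '\0' && src[i] != delim && i < dstsize - 1) {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return i;
}
```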

Flow transfer statements

Dual use of break in C (loops/switch) can be confusing
Developers might assume that a break statement can break out of any nested block and use it in an incorrect place
Or they might assume the statement breaks out of all surrounding loops instead of just the most immediate loop
Another problem is using a continue statement inside a switch statement to restart the switch comparison

Switch statements

A common pitfall that developers fall into when using switch statements is to forget the break statement at the end of each case clause
When the break statement is left out on purpose, programmers often leave a comment (such as /* FALLTHROUGH */ for lint) indicating that the omission of the break statement is intentional
Check whether any cases are unaccounted for, e.g. a missing default clause
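A sketch showing an intentional fall-through annotated for lint, a break that must not be forgotten, and an explicit default. The option letters and flag values are hypothetical.

```c
enum { OPT_VERBOSE = 1, OPT_DEBUG = 2 };

/* Map an option letter to flags; -1 means unknown option. */
int option_flags(char opt)
{
    int flags = 0;
    switch (opt) {
    case 'd':
        flags |= OPT_DEBUG;
        /* debug implies verbose, so falling through is intentional */
        /* FALLTHROUGH */
    case 'v':
        flags |= OPT_VERBOSE;
        break;          /* without this, 'v' would fall into default */
    default:
        flags = -1;     /* unknown options are handled explicitly */
        break;
    }
    return flags;
}
```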

Auditing functions

What program state changes because of that call? What things can possibly go wrong with that function? What role do arguments play in how that function operates?
Focus on arguments and aspects of the function that users can influence in some way
Four main vulnerability types:
  • Return values are misinterpreted or ignored.
  • Arguments supplied are incorrectly formatted in some way.
  • Arguments get updated in an unexpected fashion.
  • Some unexpected global program state change occurs because of the function call.

Function audit logs

Create a per-function log - purpose and side effects; return value type and meaning, conditions that cause errors, erroneous return values

Return value testing and interpretation

If a return value is misinterpreted or simply ignored, the program might take incorrect code paths as a result, which can have severe security implications
Ignoring return values
Ignoring a return value could cause an error condition to go undetected
Programmers often forget to test the return value of malloc() or realloc() for failure
realloc() failures may be exploitable
The same goes for other memory allocation functions, especially those that also copy data
Note where the return value (for functions where it indicates success or failure) is not tested
Note the error conditions returned by the function
Effects of ignoring return value depend on the structure of the caller
Misinterpreting return values
A return value could be misinterpreted in two ways: a programmer might simply misunderstand the meaning of the return value, or the return value might be involved in a type conversion that causes its intended meaning to change
First one often happens when a team of programmers is developing an application and using third-party code and libraries
Example: on UNIX, snprintf() typically returns how many bytes it would have written to the destination, had there been enough room
Systematic approach when finding misinterpreted values:
  1. Determine the intended meaning of the return value for the function. If code is documented, verifying that the function returns what the documenter says it does is still important.
  2. Look at each location in the application where the function is called and see what it does with the return value. Is it consistent with that return value’s intended meaning?
Occasionally, the fault of a misinterpreted return value isn’t with the calling function, but with the called function
Finding these cases:
  1. Determine all the points in a function where it might return. Usually there are multiple points where it might return because of errors and one point at which it returns because of successful completion.
  2. Examine the value being returned. Is it within the range of expected return values? Is it appropriate for indicating the condition that caused the function to return?
The second type of misinterpretation (type conversion) is an extension of the first. Determine what type conversions occur when the return value is tested (conversion rules?) or stored (target variable type?)
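A sketch of handling snprintf()'s return value when building a string in several calls; join_csv() is hypothetical. The return value is an int that can be negative and can exceed the remaining space, so it is checked before being added to a size_t offset. Adding it blindly would let dstsize - off wrap around and pass a huge size to the next call.

```c
#include <stddef.h>
#include <stdio.h>

/* Join n strings into dst, separated by commas. Returns 0 on success,
 * -1 on snprintf() error or truncation. */
int join_csv(char *dst, size_t dstsize, const char *const items[], size_t n)
{
    size_t off = 0;
    if (dstsize == 0)
        return -1;
    dst[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int ret = snprintf(dst + off, dstsize - off, "%s%s",
                           i ? "," : "", items[i]);
        if (ret < 0 || (size_t)ret >= dstsize - off)
            return -1;          /* stop before off can reach dstsize */
        off += (size_t)ret;
    }
    return 0;
}
```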

Function side-effects

A function that does not generate any side-effects is referentially transparent - that is, the function call can be replaced directly with the return value. A function that causes side-effects is referentially opaque
Interesting are the specific function side effects: manipulating arguments passed by reference (value-result arguments) and manipulating globally scoped variables
One common situation is when realloc() is used to resize a buffer passed as a pointer argument. Then the calling function has a pointer that was not updated after a call to realloc(), or the new allocation size is incorrect because of a length miscalculation
“Outdated pointer” bugs are often spread out between several functions. Make note of security-relevant functions that manipulate pass-by-reference arguments, as well as the specific manner in which they perform this manipulation.
These kinds of argument manipulations often use opaque pointers with an associated set of manipulation functions.
This type of manipulation is also an inherent part of C++ classes, as they implicitly pass a reference to the this pointer. However, C++ member functions can be harder to review due to the number of implicit functions that may be called and the fact that the code paths do not follow a more direct procedural structure.
Determining risk of pass-by-reference manipulation:
  1. Find all locations in a function where pass-by-reference arguments are modified, particularly structure arguments.
  2. Differentiate between mandatory modification and optional modification. Mandatory modification occurs every time the function is called; optional modification occurs when an abnormal situation arises. Programmers are more likely to overlook exceptional conditions related to optional modification.
  3. Examine how calling functions use the modified arguments after the function has returned.
Also, note when arguments aren’t updated when they should be. Pay close attention to what happens when functions return early because of some error: Are arguments that should be updated not updated for some reason?
Auditing functions that modify global variables is similar, but the vulnerabilities introduced might be more subtle, especially for code that can run at any point in the program, e.g. an exception handler or signal handler
In object-oriented programs, it can be much harder to determine whether global variables are susceptible to misuse because of unexpected modification. The difficulty arises because the order of execution of constituent member functions often isn’t clear.
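A sketch of a function that manipulates value-result arguments, assuming a hypothetical growable buffer. realloc() may move the block, so any other pointer into the old buffer is outdated after the call, and the caller must re-derive it from *bufp.

```c
#include <stdint.h>
#include <stdlib.h>

/* Double the capacity of *bufp (starting at 16). Returns 1 on success. On
 * failure, 0 is returned and *bufp / *capp are unchanged: realloc() does not
 * free the old block when it fails. */
int buf_grow(char **bufp, size_t *capp)
{
    if (*capp > SIZE_MAX / 2)
        return 0;                       /* doubling would wrap */
    size_t newcap = *capp ? *capp * 2 : 16;
    char *p = realloc(*bufp, newcap);   /* never assign straight to *bufp */
    if (p == NULL)
        return 0;
    *bufp = p;                          /* update both value-result args */
    *capp = newcap;                     /* together, only after success */
    return 1;
}
```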

Argument meaning

When auditing a function for vulnerabilities related to incorrect arguments being supplied, the process is as follows:
  1. List the type and intended meaning of each argument to a function.
  2. Examine all the calling functions to determine whether type conversions or incorrect arguments could be supplied.
Check for type conversions. They may become an issue if the interpretation of the argument can change with its sign
MultiByteToWideChar() - the destination size (cchWideChar) is in wide characters, not in bytes. Confusing the two sizes (e.g. by passing sizeof(buf)) leads to an overflow.
The more difficult the function is to figure out, the more likely it is that it will be used incorrectly
You should be able to answer any questions about a function's quirks and log the answers so that the information is easily accessible later.
Be especially mindful of type conversions that happen with arguments, such as truncation when dealing with short integers, because they are susceptible to boundary issues

Auditing memory management

Allocation-check-copy logs

Record how memory blocks are allocated (size calculations), how lengths are checked against each block, and how data elements are copied into it
Beware of custom allocators. Common sources of length miscalculations:
  • Unanticipated conditions. Length miscalculations can arise when unanticipated conditions occur during data processing
  • Data assumptions. In code dealing with binary data (e.g. proprietary file formats and protocols) programmers tend to be more trusting of the content
    E.g. assumptions about a data element’s largest possible size, even when a length is specified before the variable-length data field
  • Order of actions. Actions that aren’t performed in the correct order can also result in length miscalculation
  • Multiple length calculations on the same input. A common situation is data being processed with an initial pass to determine the length and then a subsequent pass to perform the data copy
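A sketch of keeping the allocation size and the copy loop consistent, using a hypothetical hex_encode() where every input byte expands to two output characters. Both the allocation and the loop follow the same "2 characters per byte plus NUL" rule, and the size calculation is checked for wrap-around before allocating.

```c
#include <stdint.h>
#include <stdlib.h>

/* Return a newly allocated lowercase hex string for data[0..len), or NULL
 * on overflow or allocation failure. The caller frees the result. */
char *hex_encode(const unsigned char *data, size_t len)
{
    static const char digits[] = "0123456789abcdef";
    if (len > (SIZE_MAX - 1) / 2)
        return NULL;                        /* 2 * len + 1 would wrap */
    char *out = malloc(2 * len + 1);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        out[2 * i] = digits[data[i] >> 4];
        out[2 * i + 1] = digits[data[i] & 0x0F];
    }
    out[2 * len] = '\0';
    return out;
}
```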

Allocation functions

Watch for erroneous handling of requests instead of assuming these custom routines are sound. Audit custom allocators as you would any other complex code - by keeping a log of the semantics of these routines and noting possible error conditions and the implications of those errors.
Typical issues to look for:
  • Is it legal to allocate 0 bytes? Requesting an allocation of 0 bytes is legal on most OS allocation routines, and a chunk of a certain minimum size (typically 12 or 16 bytes) is returned. This piece of information is important when you’re searching for integer-related vulnerabilities - a custom allocation function can act as a sanitising wrapper around malloc()
  • Does the allocation routine perform rounding on the requested size? An allocation routine potentially exposes itself to an integer overflow vulnerability when it rounds a requested size up to the next relevant boundary without performing any sanity checks on the request size first
  • Are other arithmetic operations performed on the request size? Another potential for integer overflows - when an application performs an extra layer of memory management on top of the OS’s management. E.g. the application memory management routines request large memory chunks from the OS and then divide it into smaller chunks for individual requests. Some sort of header is usually prepended to the chunk and hence the size of such a header is added to the requested chunk size.
    Similar situation with reallocation routines when they don’t have sanity checking.
  • Are the data types for request sizes consistent? Many typing issues from Ch 6 are relevant for allocators - any type conversion mistake usually leads to memory corruption.
    Allocators that use 16-bit sizes are even easier to overflow.
    Similar issues on LP64 architectures - long and size_t are 64-bit, while int is only 32-bit.
    Important case - when values passed to memory allocation routines are signed. If an allocation routine doesn’t do anything except pass the integer to the OS, it might not matter whether the size parameter is signed. If the routine is more complex and performs calculations and comparisons based on the size parameter, however, whether the value is signed is definitely important. Usually, the more complicated the allocation routine, the more likely it is that the signed condition of size parameters can become an issue.
  • Is there a maximum request size? Sometimes developers build in a maximum limit for how much memory the code allocates. This often works as a sanitiser.
  • Is a different size memory chunk than was requested ever returned? Essentially all integer-wrapping vulnerabilities become exploitable bugs for one reason: A different size memory chunk than was requested is returned. When this happens, there’s the potential for exploitation. Occasionally a memory allocation routine can resize a memory request.
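A sketch of a custom allocator wrapper that prepends a header and rounds the request up, as described above. ALIGN, HDR, my_alloc() and my_free() are hypothetical; the 16-byte header assumes malloc() alignment is at most 16 bytes, so the returned pointer stays suitably aligned. Without the up-front check, a request near SIZE_MAX would wrap and return a tiny chunk, i.e. a different size than was requested.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGN 16u   /* hypothetical rounding boundary */
#define HDR   16u   /* hypothetical header size, stores the request size */

void *my_alloc(size_t n)
{
    /* Both the header addition and the rounding can wrap: check first. */
    if (n > SIZE_MAX - HDR - (ALIGN - 1))
        return NULL;
    size_t total = (n + HDR + (ALIGN - 1)) & ~(size_t)(ALIGN - 1);
    unsigned char *p = malloc(total);
    if (p == NULL)
        return NULL;
    *(size_t *)p = n;               /* remember the requested size */
    return p + HDR;
}

void my_free(void *ptr)
{
    if (ptr != NULL)
        free((unsigned char *)ptr - HDR);
}
```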

Allocator scorecards and error domains

Create allocator scorecard: Prototype, is 0 bytes legal, rounds to X bytes, additional operations, maximum size, exceptional circumstances, notes, errors. Signedness and 16-bit issues can be inferred from function prototype.
Error domain is a set of values that, when supplied to the function, generate one of the exceptional conditions that could result in memory corruption.

Double frees

  1. A block is freed, then allocated to other data, overwritten and freed again - with crafted data this can lead to code execution
  2. A block can be entered in the free block list twice (not possible on Windows and glibc - they check that the block passed to free() is in use). This can also lead to code execution
Track each path throughout a variable’s lifespan to see whether it’s accidentally deallocated with the free() function more than once.
Especially pay attention when auditing C++ code. Sometimes keeping track of an object’s internal state is difficult, and unexpected states could lead to double-frees. Be mindful of members that are freed in more than one member function in an object (such as a regular member function and the destructor), and attempt to determine whether the class is ever used in such a way that an object can be destructed when some member variables have already been freed.
Many operating systems’ reallocation routines free a buffer that they’re supposed to reallocate if the new size for the buffer is 0. This is true on most UNIX implementations. Therefore, if an attacker can cause a call to realloc() with a new size of 0, that same buffer might be freed again later; there’s a good chance the buffer that was just freed will be written into.
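A sketch of two defensive habits against the double frees above, with a hypothetical struct conn: set a member to NULL after freeing it so a second cleanup path is harmless, and treat zero-size resizes as failures rather than relying on what realloc(p, 0) does on a given platform.

```c
#include <stdlib.h>

struct conn {
    char *rxbuf;
};

/* Free the member and clear it, so a later free() of it is a no-op. */
void conn_close(struct conn *c)
{
    free(c->rxbuf);
    c->rxbuf = NULL;
}

/* Safe even if conn_close() has already run. */
void conn_destroy(struct conn *c)
{
    conn_close(c);
    free(c);
}

/* A zero-size request fails and leaves p untouched, like a failed realloc(),
 * instead of possibly freeing p behind the caller's back. */
void *resize(void *p, size_t n)
{
    if (n == 0)
        return NULL;
    return realloc(p, n);
}
```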