Friday 27 April 2018

Pleco - a great crutch for beginners, an obstacle after that.

So much publicity has been given to Pleco, and now I have finally realised the reason - beginners' tools always are the ones generating the most noise! And Pleco is not a horrible tool at its basics.

TLDR; - Pleco is great for foreigners with minimal Chinese and without a data plan in China. For everyone else, there are better tools.

Pleco is indeed a nice English-language interface for dictionaries it resells. A feature very important for beginners.

At the same time, say, CC-CEDICT is more than enough for any learner's purposes and is used by many other apps without any problems. For beginners, more dictionaries is not better than a single authoritative one - it's worse.

Dictionaries recently appearing in Pleco, like 汉语大词典, have been available in Youdao and other apps for quite a while now. These apps might be in Chinese, but then again, why would you insist on an English interface to a Chinese-Chinese dictionary? In fact, Youdao 有道 has a great mobile app once you are able to navigate the interface in Chinese - which is only good for your learning the language! English everywhere simply limits learning of Chinese, once you're past the beginners' stage. Moreover, Youdao and nearly any other Chinese dictionary app has a desktop companion, which Pleco, by Mike Love's own admission, will never have. The latter I suspect is mainly explained by difficulties associated with selling copyrighted stuff via a desktop app.

A personal disappointment - Pleco's flashcards are a complete disaster. Deficiencies are too many to describe:  e.g. an extremely contrived interface with a million knobs, with any change feeling like you're walking on a minefield. Zero ability to undo almost everywhere throughout the app(!) which Mike says is cured by making manual backups of everything before any "dangerous" (=any) operation (he said he makes triple backups of "important stuff" himself). IIRC you need to back up in two or three completely different parts of the interface to back up more or less all your mutable data.

In any case, for flashcards Anki is and will remain unsurpassed. Do not fall for the ease of "adding a flashcard" in Pleco. It is just as easy to lose it and not even realise it! I fell for this ease twice(!) and oh did this bite me. If you need dictionary support in Anki - use the Chinese Support plugin or even link to Pleco if you cannot look up CC-CEDICT using other means. A version of my Anki setup with such a link is here.

Why am I writing all this? I used to be a fan of Pleco a few years ago, when I didn't or couldn't use apps in Chinese (aka when I was a beginner; these apps are superior once you're past that stage). I also thought, no idea why, that Pleco was a serious company and not two (or one?) guys and a dog. Alas, the speed of releases of any feature updates (not bugfixes) - once every few years - plus seeing the owner pay more attention to his promotion forum than to the actual app has broken that illusion.

Mike Love is a great PR man, and spends a lot of time keeping his forums looking like everyone is singing praise to his app. But the moment a thread appears raising a serious design deficiency, and it receives support from others, it gets deleted and the poster - banned, surprise. And there are many such deficiencies, or rather bad design decisions from 10 years ago. Once I mentioned, among other things, Anki and Google Translate as better alternatives in everything but dictionary lookup. Apparently I was "trying to convince people to buy different products". Guess what happened.

Pleco is a small company struggling for survival - I had no idea. If only Mike didn't waste so much time commenting on his own forums and wrote code instead!

Tuesday 23 January 2018

汉语大词典 for OSX

Recently, I picked up Hanyu Da Cidian in .DSL (ABBYY Lingvo) format somewhere, probably on Russian Interwebs. I already bought a Pleco version of this great dictionary a while ago, so I didn't care much about figuring out its copyright status. Anyhow, since I do not use Lingvo at all, I converted it to the native OSX Dictionary format.

Being an old-school nerd (information still wants to be free, no matter how successful the corpos were in convincing millennials otherwise), I don't mind sharing it. Besides, I'll be extremely flattered if this causes some Chinese corpo to send Google a DMCA takedown request!

Unpack the ZIP, put the resulting .dictionary package in ~/Library/Dictionaries/ or wherever your custom dictionaries are kept, enable in Dictionary preferences.

Note it is in simplified, not traditional, characters.

Monday 20 November 2017

A nerd's progress on learning Chinese (for no practical reason)

Once a nerd, always a nerd.

An infrequent progress report. These days I could probably pass HSK5 and am thinking whether it's even worth continuing doing these HSK tests. They are so geared towards students planning to study and get their first job in China (har har) that they are not very useful in assessing overall language proficiency. I've heard good things about TOCFL, yet again it is likely very much geared towards future students in Taiwan. We will see.

After finishing Integrated Chinese I moved on to Reading Into a New China and have just started the second volume. With a bit of ChinesePod sprinkled in, my least developed area is probably writing. Language production is always less developed than comprehension, and wrt speaking - I don't have that much of a desire to develop eloquence in spoken Chinese, at least not yet! Plus a few hours a week online with a tutor, who doesn't focus on speaking either, but we talk nearly exclusively in Mandarin, which sort of helps.

I'm thinking what to do about writing, other than the obvious - write a lot. The trouble is, the stuff I can write is rather limited in vocabulary (according to Anki, my passive vocabulary is just under 4k words and 2k characters) and will end up being like a first grader diary :D With a lot of mistakes, naturally.

Perhaps sticking to reading as much as I can will be the best for the next (half?)year. While flipping through talks by Steve Kaufmann I saw him mentioning a few nice, though old, readers e.g. Intermediate Reader in Modern Chinese and Twenty Lectures on Chinese Culture. In traditional characters (I'm not looking for an easy way out hah)! The thing I don't like about most modern Chinese readers is that they are extremely utilitarian, especially the textbooks published in mainland China itself. Not only do they (China-published books) stick to the repertoire of "read text, memorise the list of words, fill in these blanks", but the topics of their texts are also largely grouped around "how wonderful it is for a foreigner to study and work in our great country". Blergh. Princeton readers and many recent Cheng & Tsui books are more palatable to me. Plus I'll probably have to learn a bit of the traditional charset for the old books above.

So those are on the to-(try to)read list. Plus tentatively Classical Chinese (in simplified, thank gods). Classical more because the further I get away from beginner-ish dialogues, the more I realise how much of modern written Chinese is rooted in 文言文. No matter the charset.

Oh and why I even started writing this post: I wanted to share my nerdy Anki model for Chinese cards. Here it goes - it started with the stock Chinese Support model but grew a bit over the last year. It's not beginner-friendly and has some minimal Russian language usage. I find some Chinese words and sentences make more sense when translated into Russian than into English. Probably a native language bias.

Here we go:
Styling and

An update re Anki - I'll probably stop using it soon, other than for drilling HSK before tests. Time to switch to just reading, it becomes a waste of time - an hour each day. Some stats:

Thursday 25 May 2017

Back from dead, and possibly a Chinese zombie - argh!

I wouldn't call this "back" exactly though. Re-emerging with a completely non-infosex related topic after a year of learning Chinese (Mandarin, or 普通话, strictly speaking). Also, no links here - use Google if you are a curious type.

After a year of lazy (I was somewhat busy for half of 2015 first nearly dying and then recovering from an accident) learning and another year of studying reasonably seriously I'm at HSK4 level. Which isn't really as much as advertised, but that's a separate topic.

This motivates me to wax philosophical a little. Given all my applications of the "trial and discover" method, I am certain that there are few to no language-specific tricks to learning Chinese. All such advice can be applied to, or rather derived from, the learning of other languages.

That is, all language learning is essentially the same - a lot of practice and repetition, preferably a long-term immersion. I've learned a few (human!) languages to various degrees of usefulness in the past, and it always worked that way.

Chinese is especially difficult because of
  • a) its peculiar writing system
  • b) complete lack of cognates with Indo-European (IE) languages and
  • c) a somewhat alien to a Western (IE, really) speaker grammar.
On the last point - it depends on your mother tongue, with English speakers being hit the heaviest among major IE language groups. At least, complaints in English are the most vocal! Thank gods I'm a native Russian speaker and, intriguingly, there are some concepts that are sort of common with Chinese - e.g. verbal aspect (~了) is not an entirely alien concept; "verbal complements" in Mandarin correspond to verbal prefixes in Russian, and so on.

Chinese-specific trick or two I learned are all related to writing: Learn writing of some characters, maybe a few hundred (but prioritise using pinyin IME), and learn the most common 150-ish radicals - this will make learning characters easier. Skritter is nice, partially because they record the state of all characters you ever learn, no matter whether you add or remove lists you're studying from.

Reading and listening comprehension - well, read and listen a lot; maybe prioritise learning words over single characters they consist of (not sure about the latter though). Zhongwen is a good Chrome plugin and ChineseGrammarWiki is a good online grammar.

Speaking - the same. Find a tutor, there are tons online, e.g. on iTalki. A good piece of software to visualise tones is SpeakGoodChinese (or Praat, which it is based on).

That plus all of the generic language learning advice like "you learn what you practice", "find the group/book(Integrated Chinese for me)/method that works for you", "use SRS(Anki) if it's your thing". And motivation, which is the key with any learning.

好好学习天天向上, in short!

Monday 15 August 2016

Thursday 11 August 2016

TAOSSA code auditing tips

A summary (copy-paste) of all "auditing tips" from the (still!) awesome TAOSSA book

Ch 6

Auditing Tip: Type Conversions
Even those who have studied conversions extensively might still be surprised at the way a compiler renders certain expressions into assembly. When you see code that strikes you as suspicious or potentially ambiguous, never hesitate to write a simple test program or study the generated assembly to verify your intuition.
If you do generate assembly to verify or explore the conversions discussed in this chapter, be aware that C compilers can optimize out certain conversions or use architectural tricks that might make the assembly appear incorrect or inconsistent. At a conceptual level, compilers are behaving as the C standard describes, and they ultimately generate code that follows the rules. However, the assembly might look inconsistent because of optimizations or even incorrect, as it might manipulate portions of registers that should be unused.
Auditing Tip: Signed/Unsigned Conversions
You want to look for situations in which a function takes a size_t or unsigned int length parameter, and the programmer passes in a signed integer that can be influenced by users. Good functions to look for include read(), recvfrom(), memcpy(), memset(), bcopy(), snprintf(), strncat(), strncpy(), and malloc(). If users can coerce the program into passing in a negative value, the function interprets it as a large value, which could lead to an exploitable condition.
Also, look for places where length parameters are read from the network directly or are specified by users via some input mechanism. If the length is interpreted as a signed variable in parts of the code, you should evaluate the impact of a user supplying a negative value.
As you review functions in an application, it’s a good idea to note the data types of each function’s arguments in your function audit log. This way, every time you audit a subsequent call to that function, you can simply compare the types and examine the type conversion tables in this chapter’s “Type Conversions” section to predict exactly what’s going to happen and the implications of that conversion. You learn more about analyzing functions and keeping logs of function prototypes and behavior in Chapter 7, “Program Building Blocks.”
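(Not from the book - my own minimal sketch of the pattern this tip describes. The helper names copy_checked and as_size are made up for illustration; the point is only that a negative int turns into a huge size_t the moment it reaches a memcpy()-style length parameter, so the check has to happen while the value is still signed.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: takes the length as a signed int, the way a buggy
 * caller might, and validates it before it is ever converted to size_t.
 * Returns 0 on success, -1 if the length is rejected. */
int copy_checked(char *dst, size_t dst_size, const char *src, int len)
{
    assert(dst != NULL && src != NULL);
    if (len < 0)                    /* reject negatives while still signed */
        return -1;
    if ((size_t)len > dst_size)     /* safe: len is known to be >= 0 here */
        return -1;
    memcpy(dst, src, (size_t)len);
    return 0;
}

/* What an unchecked memcpy() would have received for a given signed length. */
size_t as_size(int len)
{
    return (size_t)len;
}
```

Passing -1 to as_size() yields SIZE_MAX, which is the "large value" the tip warns about.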
Auditing Tip: Sign Extension
When looking for vulnerabilities related to sign extensions, you should focus on code that handles signed character values or pointers or signed short integer values or pointers. Typically, you can find them in string-handling code and network code that decodes packets with length elements. In general, you want to look for code that takes a character or short integer and uses it in a context that causes it to be converted to an integer. Remember that if you see a signed character or signed short converted to an unsigned integer, sign extension still occurs.
As mentioned previously, one effective way to find sign-extension vulnerabilities is to search the assembly code of the application binary for the movsx instruction. This technique can often help you cut through multiple layers of typedefs, macros, and type conversions when searching for potentially vulnerable locations in code.
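(Again my own sketch, not the book's. It assumes a packet whose first byte is a length field; len_buggy and len_fixed are hypothetical names. Reading that byte through a signed char and widening it is exactly the kind of code that compiles to movsx on x86.)

```c
#include <assert.h>
#include <stddef.h>

/* Buggy: bytes 0x80..0xFF are negative as signed char, and the negative
 * value is sign-extended before it becomes an unsigned int. */
unsigned int len_buggy(const signed char *pkt)
{
    assert(pkt != NULL);
    return (unsigned int)pkt[0];
}

/* Fixed: convert to unsigned char first, so the value is zero-extended. */
unsigned int len_fixed(const signed char *pkt)
{
    assert(pkt != NULL);
    return (unsigned int)(unsigned char)pkt[0];
}
```

For a length byte of 0xFF the buggy version returns UINT_MAX, the fixed one 255.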
Auditing Tip: Truncation
Truncation-related vulnerabilities are typically found where integer values are assigned to smaller data types, such as short integers or characters. To find truncation issues, look for locations where these shorter data types are used to track length values or to hold the result of a calculation. A good place to look for potential variables is in structure definitions, especially in network-oriented code.
Programmers often use a short or character data type just because the expected range of values for a variable maps to that data type nicely. Using these data types can often lead to unanticipated truncations, however.
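(My own illustration, assuming the usual 16-bit unsigned short; stored_len and fits_ushort are hypothetical helpers. A size_t length stored in a short field silently loses its high bits, and a round-trip comparison is one simple way to detect that.)

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Buggy pattern: a short field tracks a length that was computed as size_t.
 * With a 16-bit short, a 65536-byte string is recorded as length 0. */
unsigned short stored_len(const char *s)
{
    assert(s != NULL);
    return (unsigned short)strlen(s);   /* high bits are dropped silently */
}

/* Returns 1 if n survives a round trip through unsigned short, else 0. */
int fits_ushort(size_t n)
{
    return n == (size_t)(unsigned short)n;
}
```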
Auditing Tip
Reviewing comparisons is essential to auditing C code. Pay particular attention to comparisons that protect allocation, array indexing, and copy operations. The best way to examine these comparisons is to go line by line and carefully study each relevant expression.
In general, you should keep track of each variable and its underlying data type. If you can trace the input to a function back to a source you’re familiar with, you should have a good idea of the possible values each input variable can have. Proceed through each potentially interesting calculation or comparison, and keep track of potential values of the variables at different points in the function evaluation. You can use a process similar to the one outlined in the previous section on locating integer boundary condition issues.
When you evaluate a comparison, be sure to watch for unsigned integer values that cause their peer operands to be promoted to unsigned integers. sizeof and strlen() are classic examples of operands that cause this promotion.
Remember to keep an eye out for unsigned variables used in comparisons, like the following:
if (uvar < 0) ...  
if (uvar <= 0) ...
The first form typically causes the compiler to emit a warning, but the second form doesn’t. If you see this pattern, it’s a good indication something is probably wrong with that section of the code. You should do a careful line-by-line analysis of the surrounding functionality.
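(My own sketch of the promotion problem, not a listing from the book. HDR_SIZE and the body_ok_* functions are hypothetical. Because sizeof yields a size_t, the subtraction is done in unsigned arithmetic and the "<= 0" test only ever catches the exact-zero case.)

```c
#include <assert.h>
#include <stddef.h>

#define HDR_SIZE (sizeof(unsigned short))   /* 2 on common platforms */

/* Buggy: len is converted to size_t, so (len - HDR_SIZE) can never be
 * negative. Too-short and negative lengths wrap around and are accepted. */
int body_ok_buggy(int len)
{
    return !(len - HDR_SIZE <= 0);
}

/* Fixed: compare in the signed domain before doing any subtraction. */
int body_ok_fixed(int len)
{
    return len > (int)HDR_SIZE;
}
```

With a 2-byte header, body_ok_buggy(1) and body_ok_buggy(-5) both return 1, while body_ok_fixed() rejects them.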
Auditing Tip: sizeof
Be on the lookout for uses of sizeof in which developers take the size of a pointer to a buffer when they intend to take the size of the buffer. This often happens because of editing mistakes, when a buffer is moved from being within a function to being passed into a function.
Again, look for sizeof in expressions that cause operands to be converted to unsigned values.
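(A minimal sketch of the editing mistake described here, with made-up function names. Once the buffer becomes a parameter it is just a pointer, and sizeof reports the pointer size.)

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Buggy: buf is a pointer here, so this is sizeof(char *), not the size
 * of the caller's buffer. */
size_t size_seen_buggy(char *buf)
{
    return sizeof(buf);
}

/* Fixed: the caller passes the real buffer size explicitly.
 * Returns buf for convenience. */
char *clear_fixed(char *buf, size_t buf_size)
{
    assert(buf != NULL);
    memset(buf, 0, buf_size);
    return buf;
}
```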
Auditing Tip: Unexpected Results
Whenever you encounter a right shift, be sure to check whether the left operand is signed. If so, there might be a slight potential for a vulnerability. Similarly, look for modulus and division operations that operate with signed operands. If users can specify negative values, they might be able to elicit unexpected results.
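(My own example for the modulus case; bucket_buggy and bucket_fixed are hypothetical. In C99 and later, % truncates toward zero, so a negative key gives a negative remainder, which is bad news if the result is used as an array index. Right-shifting a negative value is implementation-defined, so I leave that part out of the sketch.)

```c
#include <assert.h>

/* Buggy: a negative key produces a negative bucket index. */
int bucket_buggy(int key, int nbuckets)
{
    assert(nbuckets > 0);
    return key % nbuckets;
}

/* Fixed: fold the remainder back into the range 0..nbuckets-1. */
int bucket_fixed(int key, int nbuckets)
{
    assert(nbuckets > 0);
    int r = key % nbuckets;
    return r < 0 ? r + nbuckets : r;
}
```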
Auditing Tip
Pointer arithmetic bugs can be hard to spot. Whenever an arithmetic operation is performed that involves pointers, look up the type of those pointers and then check whether the operation agrees with the implicit arithmetic taking place. In Listing 6-29, has sizeof() been used incorrectly with a pointer to a type that’s not a byte? Has a similar operation happened in which the developer assumed the pointer type won’t affect how the operation is performed?

Ch 7

Auditing Tip
When data copies in loops are performed with no size validation, check every code path leading to the dangerous loop and determine whether it can be reached in such a way that the source buffer can be larger than the destination buffer.
Auditing Tip
Mark all the conditions for exiting a loop as well as all variables manipulated by the loop. Determine whether any conditions exist in which variables are left in an inconsistent state. Pay attention to places where the loop is terminated because of an unexpected error, as these situations are more likely to leave variables in an inconsistent state.
Auditing Tip
Determine what each variable in the definition means and how each variable relates to the others. After you understand the relationships, check the member functions or interface functions to determine whether inconsistencies could occur in identified variable relationships. To do this, identify code paths in which one variable is updated and the other one isn’t.
Auditing Tip
When variables are read, determine whether a code path exists in which the variable is not initialized with a value. Pay close attention to cleanup epilogues that are jumped to from multiple locations in a function, as they are the most likely places where vulnerabilities of this nature might occur. Also, watch out for functions that assume variables are initialized elsewhere in the program. When you find this type of code, attempt to determine whether there’s a way to call functions making these assumptions at points when those assumptions are incorrect.

Ch 8

Auditing Tip
When attempting to locate format string vulnerabilities, search for all instances of printf(), err(), or syslog() functions that accept a nonstatic format string argument, and then trace the format argument backward to see whether any part can be controlled by attackers.
If functions in the application take variable arguments and pass them unchecked to printf(), syslog(), or err() functions, search every instance of their use for nonstatic format string arguments in the same way you would search for printf() and so forth.
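(Not from the book - a small sketch of what the safe form looks like. log_line is a hypothetical wrapper; the buggy form is shown only in a comment.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Buggy form (do not use): snprintf(out, out_size, user);
 * the user then controls the format string and can supply %x, %n, etc. */

/* Fixed: the format string is a constant and user data is only an argument.
 * Returns the number of characters that the full output needs. */
int log_line(char *out, size_t out_size, const char *user)
{
    assert(out != NULL && user != NULL);
    return snprintf(out, out_size, "%s", user);
}
```

With the constant "%s" format, conversion specifiers inside the user string are copied literally rather than interpreted.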
Auditing Tip
You might find a vulnerability in which you can duplicate a file descriptor. If you have access to an environment similar to one in which the script is running, use lsof or a similar tool to determine what file descriptors are open when the process runs. This tool should help you see what you might have access to.
Auditing Tip
Code that uses snprintf() and equivalents often does so because the developer wants to combine user-controlled data with static string elements. This use may indicate that delimiters can be embedded or some level of truncation can be performed. To spot the possibility of truncation, concentrate on static data following attacker-controllable elements that can be of excessive length.
Auditing Tip
When auditing multicharacter filters, attempt to determine whether building illegal sequences by constructing embedded illegal patterns is possible, as in Listing 8-26.
Also, note that these attacks are possible when developers use a single substitution pattern with regular expressions, such as this example:
$path =~ s/\.\.\///g;
This approach is prevalent in several programming languages (notably Perl and PHP).
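(My own C rendering of the same single-pass substitution, to show why it fails. strip_once and strip_all are hypothetical names. Repeating the pass until nothing changes closes this particular hole, but I would treat it as an illustration only - canonicalising the path and then checking it is usually the more robust approach.)

```c
#include <assert.h>
#include <string.h>

/* One left-to-right pass removing every "../", like s/\.\.\///g.
 * Modifies s in place and returns it. */
char *strip_once(char *s)
{
    assert(s != NULL);
    char *r = s, *w = s;
    while (*r != '\0') {
        if (strncmp(r, "../", 3) == 0)
            r += 3;             /* drop the illegal sequence */
        else
            *w++ = *r++;        /* keep everything else */
    }
    *w = '\0';
    return s;
}

/* Repeats the pass until the string stops shrinking. */
char *strip_all(char *s)
{
    size_t before;
    do {
        before = strlen(s);
        strip_once(s);
    } while (strlen(s) != before);
    return s;
}
```

A single pass over "....//etc/passwd" leaves "../etc/passwd": removing the embedded "../" glues the surrounding characters into a new one.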

Ch 9

Auditing Tip
The access() function usually indicates a race condition because the file it checks can often be altered before it’s actually used. The stat() function has a similar problem.
Auditing Tip
It’s a common misunderstanding to think that the less specific permission bits are consulted if the more specific permissions prevent an action.

Ch 10

Auditing Tip
When auditing code that’s running with special privileges or running remotely in a way that allows users to affect the environment, verify that any call to execvp() or execlp() is secure. Any situation in which full pathnames aren’t specified, or the path for the program being run is in any way controlled by users, is potentially dangerous.
Auditing Tip
Carefully check for any privileged application that writes to a file without verifying whether writes are successful. Remember that checking for an error when calling write() might not be sufficient; they also need to check whether the amount of bytes they wrote were successfully stored in their entirety. Manipulating this application’s rlimits might trigger a security vulnerability by cutting the file short at a strategically advantageous offset.
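(A minimal sketch of what "checking in its entirety" can look like on POSIX; write_all is a hypothetical helper, not something from the book. It treats a short write as "keep going" and anything else unexpected as failure.)

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Writes exactly len bytes to fd. Returns 0 if everything was written,
 * -1 on any error, including the case where no progress can be made. */
int write_all(int fd, const void *buf, size_t len)
{
    assert(buf != NULL);
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            return -1;          /* real error, e.g. EFBIG from an rlimit */
        }
        if (n == 0)
            return -1;          /* no progress: give up rather than spin */
        p += n;                 /* short write: advance and keep going */
        len -= (size_t)n;
    }
    return 0;
}
```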
Auditing Tip
Never assume that a condition is unreachable because it seems unlikely to occur. Using rlimits is one way to trigger unlikely conditions by restricting the resources a privileged process is allowed to use and potentially forcing a process to die when a system resource is allocated where it usually wouldn’t be. Depending on the circumstances of the error condition you want to trigger, you might be able to use other methods by manipulating the program’s environment to force an error.

Ch 14

Auditing Tip
Examine the TCP sequence number algorithm to see how unpredictable it is. Make sure some sort of cryptographic random number generator is used. Try to determine whether any part of the key space can be guessed deductively, which limits the range of possible correct sequence numbers. Random numbers based on system state (such as system time) might not be secure, as this information could be procured from a remote source in a number of ways.

Ch 17

Auditing Tip
Examine all exposed static HTML and the contents of dynamically generated HTML to make sure nothing that could facilitate an attack is exposed unnecessarily. You should do your best to ensure that information isn’t exposed unnecessarily, but at the same time, look out for security mechanisms that rely on obscurity because they are prone to fail in the Web environment.
Auditing Tip
Look at each page of a Web application as though it exists in a vacuum. Consider every possible combination of inputs, and look for ways to create a situation the developer didn’t intend. Determine if any of these unanticipated situations cause a page to use the input without first validating it.
Auditing Tip
Always consider what can happen if attackers visit the pages of a Web application in an order the developer didn’t intend. Can you bypass certain security checks by skipping past intermediate verification pages to the functionality that actually performs the processing? Can you take advantage of any race conditions or cause unanticipated results by visiting pages that use session data out of order? Does any page trust the validity of information that users control?
Auditing Tip
First, focus on content that’s available without any kind of authentication because this code is most exposed to Internet-based attackers. Then study the authentication system in depth, looking for any kind of issue that lets you access content without valid credentials.
Auditing Tip
When reviewing authorization, you need to ensure that it’s enforced consistently throughout the application. Do this by enumerating all privilege levels, user roles, and privileges in use.
Auditing Tip
Although this sample application might seem very contrived, it is actually representative of flaws that are quite pervasive throughout modern Web applications. You want to look for two patterns when reviewing Web applications:
  1. The Web application takes a piece of input from the user, validates it, and then writes it to an HTML page so that the input is sent to the next page. Web developers often forget to validate the piece of information in the next page, as they don’t expect users to change it between requests. For example, say a Web page takes an account number from the user and validates it as belonging to that user. It then writes this account number as a parameter to a balance inquiry link the user can click. If the balance inquiry page doesn’t do the same validation of the account number, the user can just change it and retrieve account information for other users.
  2. The Web application puts a piece of information on an HTML page that isn’t visible to users. This information is provided to help the Web server perform the next stage of processing, but the developer doesn’t consider the consequences of users modifying the data. For example, say a Web page receives a user’s customer service complaint and creates a form that mails the information to the company’s help desk when the user clicks Submit. If the application places e-mail addresses in the form to tell the mailing script where to send the e-mail, users could change the e-mail addresses and appear to be sending e-mail from official company servers.
Auditing Tip
Weaknesses in the HTTP authentication protocol can prove useful for attackers. It’s a fairly light protocol, so it is possible to perform brute-force login attempts at a rapid pace. HTTP authentication mechanisms often don’t do account lockouts, especially when they are authenticating against flat files or local stores maintained by the Web server. In addition, certain accounts are exempt from lockout and can be brute-forced through exposed authentication interfaces. For example, NT’s administrator account is immune from lockout, so an exposed Integrated Windows Authentication service could be leveraged to launch a high-speed password guessing attack.
You can find several tools on the Internet to help you launch a brute-force attack against HTTP authentication. Check the tools sections at
Auditing Tip
When you review a Web site, you should pay attention to how it uses cookies. They can be easy to ignore because they are in the HTTP request and response headers, not in the HTML (usually), but they should be reviewed with the same intensity you devote to GET and POST parameters.
You can get access to cookies with certain browser extensions or by using an intercepting Web proxy tool, such as Paros or SPIKE Proxy. Make sure cookies are marked secure for sites that use SSL. This helps mitigate the risk of the cookie ever being transmitted in clear text because of deliberate attacks, such as cross-site scripting, or unintentional configuration and programming mistakes and browser bugs.
Auditing Tip
Tracking state based on client IP addresses is inappropriate in most situations, as the Internet is filled to capacity with corporate clients going through NAT devices and sharing the same source IP. Also, you might face clients with changing source IPs if they come from a large ISP that uses an array of proxies, such as AOL. Finally, there is always the possibility of spoofing attacks that allow IP address impersonation.
There are better ways of tracking state, as you see in the following sections. As a reviewer, you should look out for any kind of state-tracking mechanism that relies solely on client IPs.
Auditing Tip
If you see code performing actions or checks based on the request URI, make sure the developer is handling the path information correctly. Many servlet programmers use request.getRequestURI() when they intend to use request.getServletPath(), which can definitely have security consequences. Be sure to look for checks done on file extensions, as supplying unexpected path information can circumvent these checks as well.
Auditing Tip
Generally, you should encourage developers to use POST-style requests for their applications because of the security concerns outlined previously. One issue to watch for is the transmission of a session token via a query string, as that creates a risk for the Web application’s clients. The risk isn’t necessarily a showstopper, but it’s unnecessary and quite easy for a developer or Web designer to avoid.

Wednesday 10 August 2016

TAOSSA Chapter 16

Ch. 16 Network application protocols

Auditing application protocols

  • Collect documentation
  • Identify elements of unknown protocols
  • Using packet sniffers
  • Initiate the connection several times
    • Did a single field change by a lot or a little?
    • Was the change of values in a field drastic? Could it be random, e.g. a connection ID?
    • Did the size of the packet change? Did a field change in relation to the size of the packet? Could it be a size field?
  • Many protocols are composed of messages that have a similar header format and a varying body
  • Replay messages

Reverse-engineer the application

  • Use symbols
  • Examine strings in the binary
    • Useful for debug strings
    • Getting from error messages to code that generates them
  • Examine special values - e.g. unique tag values
  • Debug - turn on debugging if possible
  • Find communication primitives
    • For example, for TCP: read(), recv(), recvmsg(), WSARecv()
  • Use library tracing
  • Match data types with the protocol
    • Analyze the structure of untrusted data processed by a server/client, then match elements of these structures with vulnerability classes

Binary protocols

  • Integer overflows and 32-bit length values
    • When 32bit length variables are used to dynamically allocate space for user-supplied data. Usually results in heap corruption
    • size_t in particular
  • Integer underflows and 32-bit length values
    • When variables are not adequately checked against each other to enforce a relationship
    • When length values are required to hold a minimum length but the parsing code never verifies this requirement
  • Small data types
    • Sign extension issues are more relevant because programs often natively use 32-bit variables even when dealing with smaller data types
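(My own minimal sketch of the overflow case, not code from the book. alloc_size_buggy and alloc_size_checked are hypothetical, and max_len stands for whatever limit the protocol or application imposes. The classic bug is adding a small constant, e.g. one byte for a terminator, to an attacker-supplied 32-bit length right before allocating.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Buggy: for wire_len == UINT32_MAX the sum wraps to 0, so a tiny buffer
 * is allocated and the later copy corrupts the heap. */
uint32_t alloc_size_buggy(uint32_t wire_len)
{
    return wire_len + 1;
}

/* Checked: enforce an upper bound and rule out the wrapping value before
 * doing the arithmetic. Returns 0 and stores the size, or -1 on reject. */
int alloc_size_checked(uint32_t wire_len, uint32_t max_len, uint32_t *out)
{
    assert(out != NULL);
    if (wire_len > max_len || wire_len == UINT32_MAX)
        return -1;
    *out = wire_len + 1;
    return 0;
}
```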

Text-based protocols

  • Most vulns in binary protocol implementations result from type conversions and arithmetic boundary conditions.
  • Text-based protocols tend to contain vulnerabilities related to text processing - standard buffer overflows, pointer arithmetic errors, off-by-ones etc.
  • One exception is text-based protocols specifying lengths in text that are converted to integers
  • Buffer overflows. Text-based protocols manipulate strings, more vulnerable to simpler buffer overflows than to type conversion errors.
    • Unsafe use of string functions
    • Pointer arithmetic bugs are more common because more subtle, especially off-by-ones. Common when there are multiple elements in a single line of text.
  • Text-formatting issues. Format string issues, resource access; problems in text data decoding implementations - bad hex or UTF-8 decoding routines.

Data verification

  • Cases where information disclosure on the network is bad; or when forged or modified data can result in a security issue on the receiver.
  • Encryption may be necessary; data verification may be required.

Access to system resources

  • Many protocols allow users to request system resources implicitly or explicitly
  • Questions to consider:
    • Is credential verification for accessing the resource adequate?
    • Does the application give access only to the resources that it’s supposed to? (i.e. is the implementation flawed so that it discloses more than intended?)


See chapter 17 for details.
  • Header parsing. Vulnerabilities are more likely when parsing a “folded header”. Code sometimes assumes headers are limited in length, but an arbitrary long header can be supplied by using folded headers.
  • Accessing resources. HTTP is designed to serve content to clients.
    • Many examples of disclosing arbitrary files from the filesystem.
    • Encoding-related traversal bugs.
    • If the server implements additional features or keywords, check the corresponding code, more likely to have bugs.
  • Utility functions.
    • Functions for URL handling - dealing with port, protocol, path components etc. Check for buffer overflows.
    • Logging utility functions can be interesting
  • Posting data. Data supplied via POST method. Simple counted data post and chunked encoding.
    • Simple counted - depends on how the length value is interpreted. Large values can lead to integer overflows or sign issues. Processing of signed content length values is also error-prone.
    • Chunked encoding - remote attackers specifying arbitrary sizes has been a problem. Careful validation of specified sizes is required to avoid integer overflows or sign-comparison vulnerabilities. Integer overflows are easier to trigger than in simple counted encoding.
    • Note that realloc() with a size of 0 is equivalent to free(), so the size needs to overflow not to 0 but to 1 or more.
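
A minimal sketch of parsing a chunk-size line defensively, assuming a hypothetical per-chunk limit (MAX_CHUNK) that is not part of the notes. The point is that the text-to-integer conversion is validated and bounded before the value is used in any size arithmetic.

```python
MAX_CHUNK = 1 << 20  # hypothetical limit chosen for this sketch
HEX_DIGITS = b"0123456789abcdefABCDEF"


def parse_chunk_size(line: bytes, max_chunk: int = MAX_CHUNK) -> int:
    """Parse the hex size from a chunk-size line such as b"1a3;ext=1\r\n"."""
    # Drop any chunk extension after ';' and surrounding whitespace/CRLF.
    field = line.split(b";", 1)[0].strip()
    # Accept only hex digits: no sign, no "0x" prefix, no empty field.
    if not field or any(c not in HEX_DIGITS for c in field):
        raise ValueError("malformed chunk size")
    size = int(field, 16)
    if size > max_chunk:
        raise ValueError("chunk size exceeds limit")
    return size
```

Rejecting a leading sign outright avoids the signed-comparison problems mentioned above, and the upper bound keeps later additions to the size from wrapping in a C implementation.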


  • ISAKMP packet header contains a 32-bit length field. Primarily signedness issues and integer overflows in this user-controlled value.
  • The length field in the header is the total packet length, including the header itself. Developers might assume the length field is greater than or equal to the ISAKMP header size (=8). Possible underflow.
  • 16-bit length field in ISAKMP payloads. Same - overflows and underflows. Minimal size 4 bytes.
  • Invariant to check: bytes processed so far + current payload length <= ISAKMP packet length
  • Different payload types
<skipped a long description of various issues with different payloads>
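
The payload length checks listed above can be sketched as a walk over a chain of payloads. This assumes the generic payload header layout (next payload, reserved byte, 16-bit big-endian length that includes the 4-byte header); walk_payloads and its arguments are names invented for this sketch.

```python
import struct

GENERIC_HDR = 4  # next payload (1) + reserved (1) + payload length (2)


def walk_payloads(body: bytes, first_type: int) -> list:
    """Return (type, data) pairs for a chain of payloads in `body`."""
    out = []
    offset = 0
    ptype = first_type
    while ptype != 0:  # a next-payload value of 0 ends the chain
        if len(body) - offset < GENERIC_HDR:
            raise ValueError("truncated payload header")
        next_type, _reserved, plen = struct.unpack_from("!BBH", body, offset)
        if plen < GENERIC_HDR:
            raise ValueError("payload length below the 4-byte minimum")
        if offset + plen > len(body):
            raise ValueError("payload runs past the end of the packet")
        out.append((ptype, body[offset + GENERIC_HDR:offset + plen]))
        offset += plen
        ptype = next_type
    return out
```

Python integers do not wrap, so `offset + plen` is safe here; in C the same comparison has to be written so that the addition itself cannot overflow.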


Three standardized methods for encoding: Basic Encoding rules (BER); Packed ER (PER), XML ER (XER)

Data types

  • Universal - in the standard
  • Application - tags that are unique to applications
  • Context-specific - tags used to identify a member in a constructed type (e.g. a set)
  • Private - unique to an organization
Primitive vs. constructed types. Constructed types are composed of one or more simple types and other constructed types. They can be sequences (SEQUENCE), lists (SEQUENCE-OF, SET and SET-OF), or choices

Basic encoding rules

Identifier, length, some content data, and an end-of content (EOC) sequence.
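  • Identifier - class + p/c + tag number. The tag number is encoded in multiple bytes if it’s over 30. To encode the value 0x3333, for example, the 0xFF 0xE6 0x33 byte sequence would be used (0x3333 split into the 7-bit groups 0x66 and 0x33, with the top bit set on every byte but the last).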
  • Length - definite or indefinite. Indefinite = terminated with an EOC sequence.
  • Contents
  • End of content - only required when object has an indefinite length. EOC is 0x00 0x00.
Canonical encoding rules (CER) vs Distinguished encoding rules (DER) - limit the ambiguity of BER.
  • CER - same as BER with restrictions (used when large objects are transmitted; when all the object data is not available; when object sizes are not known at transmission time):
    • Constructed types must use an indefinite length encoding.
    • Primitive types must use the fewest encoding bytes necessary to express the object size.
  • DER - for smaller objects in which all bytes for objects are available and the lengths of objects are known at transmission time.
    • All objects must have a definite length encoding (no EOC)
    • The length encoding must use the fewest bytes necessary (same as CER)

Vulnerabilities in BER, CER, DER implementations

  • Tag encodings. Some combinations of fields are illegal in certain variants of BER
    • e.g. in CER, an octet string of less than or equal to 1,000 bytes must be encoded using a primitive form rather than a constructed form. Is this really enforced? There may be differences between IDS processing and end host processing.
    • Can trick the implementation into reading more bytes than are available in the data stream.
  • Length encodings. A common problem.
    • Multibyte encodings - when the length field is made to be more bytes than are left in the data stream.
    • Extended length-encoding value - you can specify 32 bit integers - integer overflows and signed issues.
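
A minimal sketch of decoding a BER length field with the two checks above in place: the length octets must fit in the remaining data, and so must the declared content. The 4-octet cap (MAX_LENGTH_OCTETS) is an assumption of this sketch, standing in for whatever integer width a real decoder stores the length in.

```python
MAX_LENGTH_OCTETS = 4  # assumption: the decoder stores lengths in 32 bits


def read_ber_length(buf: bytes, offset: int):
    """Decode a BER length at `offset`.

    Returns (length, next_offset); length is None for the indefinite form.
    """
    if offset >= len(buf):
        raise ValueError("missing length octet")
    first = buf[offset]
    offset += 1
    if first == 0x80:  # indefinite form, content ends at the EOC sequence
        return None, offset
    if first < 0x80:  # short form: the octet is the length itself
        length = first
    else:  # long form: low 7 bits give the number of length octets
        count = first & 0x7F
        if count > MAX_LENGTH_OCTETS:
            raise ValueError("length field too wide")
        if count > len(buf) - offset:
            raise ValueError("length octets run past the end of the data")
        length = int.from_bytes(buf[offset:offset + count], "big")
        offset += count
    if length > len(buf) - offset:
        raise ValueError("declared length exceeds the remaining data")
    return length, offset
```

The final comparison is written as `length > remaining` rather than `offset + length > total` so that a C translation would not overflow on a length near 0xFFFFFFFF.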

Packed encoding rules (PER)

More compact than BER. Can be used only to encode values of a single ASN.1 type. Consists of 3 fields: preamble, length and contents
  • Preamble - a bit map used when dealing with sequence, set, and set-of-data types.
  • Length - more complex than in BER. Aligned variants and unaligned variants. Constrained, semiconstrained and unconstrained.
    • The program decoding a PER bit stream must already know the structure of an incoming ASN.1 stream so that it knows how to decode the length: constrained vs unconstrained, and what the boundaries are for constrained lengths.

Vulnerabilities in PER

A variety of integer related issues. Problems are more restricted because the values are more constrained.
  • In particular, for unconstrained lengths the bottom 6 bits can only be 1 to 4, but the implementation might not enforce this rule.
  • Checking return values incorrectly.

XML encoding rules

Very different problems because this is a text markup language. XER prologue and an XML document element that describes a single ASN.1 object. Prologue does not have to be used.

XER vulnerabilities

Text-based errors: simple buffer overflows or pointer arithmetic bugs. Programs that exchange XML are often exposing a huge codebase to untrusted data. In particular, check the UTF encoding schemes for encoding Unicode code points, see Chapter 8.


  • Domain names and resources records
  • Name servers and resolvers
    • Resolver code queries DNS on behalf of user applications
    • Fully functional resolver knows what to do when a non-recursive DNS server doesn’t have an answer
    • Stub resolver relies on a recursive name server to do all the work
  • Zones
  • Resource record types
  • DNS protocol structure
  • DNS name encoding and buggy parsers (3www6google3com)
Sample problems:
  • Failure to deal with invalid label lengths. The maximum size for a label is 63 bytes because setting the top 2 bits indicates that the byte is the first in a two-byte pointer, leaving 6 bits to represent a label length. That means any label in which one of the top bits is set but the other one isn’t is an invalid length value.
  • Insufficient destination length checks
  • Insufficient source length checks
  • Pointer values not verified in the packet
  • Special pointer values (when pointer compression is used)
  • Length variables. (There are no 32-bit integers to specify data lengths in the DNS protocol; everything is 8 or 16 bits)
    • Sign extension of 16-bit values
    • Integer overflows
  • DNS spoofing
    • Cache poisoning
    • Spoofing responses
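
A minimal sketch of a name decoder that covers the sample problems above: invalid label lengths, source length checks, pointer validation and pointer loops. The jump limit (max_jumps) is an arbitrary value picked for this sketch, and decode_name is a hypothetical helper, not code from the book.

```python
def decode_name(packet: bytes, offset: int, max_jumps: int = 16) -> str:
    """Decode a possibly compressed DNS name starting at `offset`."""
    labels = []
    jumps = 0
    total = 0  # encoded size of the name, capped at 255 bytes
    while True:
        if offset >= len(packet):
            raise ValueError("name runs past the end of the packet")
        length = packet[offset]
        top_bits = length & 0xC0
        if top_bits == 0xC0:  # two-byte compression pointer
            if offset + 1 >= len(packet):
                raise ValueError("truncated pointer")
            target = ((length & 0x3F) << 8) | packet[offset + 1]
            jumps += 1
            if jumps > max_jumps:
                raise ValueError("too many pointers, possible loop")
            if target >= len(packet):
                raise ValueError("pointer outside the packet")
            offset = target
            continue
        if top_bits != 0:  # only one of the top two bits is set
            raise ValueError("invalid label length")
        if length == 0:  # root label ends the name
            break
        start, end = offset + 1, offset + 1 + length
        if end > len(packet):
            raise ValueError("label runs past the end of the packet")
        total += length + 1
        if total > 255:
            raise ValueError("name too long")
        labels.append(packet[start:end].decode("ascii", errors="replace"))
        offset = end
    return ".".join(labels)
```

Requiring pointers to point backwards is not enough on its own to prevent loops, which is why the sketch counts jumps instead.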

Monday 8 August 2016

TAOSSA Chapters 14-15

Ch. 14 Network Protocols

Internet Protocol

General intro about IP packet structure

Basic IP header validation

  • Is the received packet too small?
    • Must be at least 20 bytes.
  • Does the IP packet contain options?
    • Packets with options can be bigger than 20 bytes, up to 60.
  • Is the IP header length valid?
    • IP header length must be at least 5 (4*5=20).
  • Is the total length field too large?
    • Compared to the actual data received.
  • Are all field lengths consistent?
    • IP header length <= data available
    • 20 <= IP header length <= 60
    • IP total length <= data available
    • IP header length <= IP total length
  • Is the IP checksum correct?
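
The header checks above can be sketched as one validation function over the raw bytes received. The function names are invented for this sketch; the field offsets are those of the standard IPv4 header.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum, as used by the IP header checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def validate_ipv4(packet: bytes) -> tuple:
    """Return (header_len, total_len) or raise ValueError."""
    if len(packet) < 20:
        raise ValueError("packet too small")
    if packet[0] >> 4 != 4:
        raise ValueError("not IPv4")
    header_len = (packet[0] & 0x0F) * 4  # 4-bit field, so at most 60
    if header_len < 20:
        raise ValueError("IP header length below 5 words")
    if header_len > len(packet):
        raise ValueError("IP header length exceeds data available")
    total_len = struct.unpack_from("!H", packet, 2)[0]
    if total_len > len(packet):
        raise ValueError("IP total length exceeds data available")
    if header_len > total_len:
        raise ValueError("IP header length exceeds IP total length")
    # A header with a correct checksum sums to 0xFFFF.
    if ones_complement_sum(packet[:header_len]) != 0xFFFF:
        raise ValueError("bad IP checksum")
    return header_len, total_len
```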

IP options processing

  • Is the option length sign-extended?
    • It shouldn’t be. Byte to int promotion issues are common
  • Is the header big enough to contain the IP option?
  • Is the option length too large?
    • Offset of IP option + IP option length <= IP header length
    • Offset of IP option + IP option length <= IP total length
  • Does the option meet minimum size requirements?
    • Should be at least 2
  • Are IP option bits checked?
    • Most implementations ignore the separate bitfields without parsing
  • Unique problems
    • Solaris example
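
A minimal sketch of walking IP options with the length checks above. It assumes `header` holds exactly the IP header (20 to 60 bytes, already validated) and treats type 0 as end-of-list and type 1 as a single-byte no-op; walk_ip_options is a hypothetical helper.

```python
def walk_ip_options(header: bytes) -> list:
    """Return (type, data) pairs for the options in an IPv4 header."""
    options = []
    offset = 20  # options start right after the fixed header
    end = len(header)
    while offset < end:
        opt_type = header[offset]
        if opt_type == 0:  # end of option list
            break
        if opt_type == 1:  # no-operation, a single byte with no length
            offset += 1
            continue
        if offset + 2 > end:
            raise ValueError("header too small to hold the option length")
        # Indexing bytes yields 0..255, so the length is never sign-extended.
        opt_len = header[offset + 1]
        if opt_len < 2:
            raise ValueError("option length below the 2-byte minimum")
        if offset + opt_len > end:
            raise ValueError("option runs past the IP header")
        options.append((opt_type, header[offset + 2:offset + opt_len]))
        offset += opt_len
    return options
```

Without the minimum-size check, an option length of 0 would leave the offset unchanged and the loop would never terminate.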

Source routing

  • Processing
    • Ensure that the pointer byte is within the specified bounds. During processing, an IP option often modifies bytes it is pointing at.
    • The pointer is a single-byte field - beware type conversions.
    • Sign extensions could cause the offset to take on a negative value
    • Check that the length of routing options is validated


Pathological fragment sets
  • Data beyond the end of the final fragment
    • Attackers can put the final fragment (MF=0) in the middle or beginning of the set of fragments
  • Multiple final fragments
  • Overlapping fragments
  • Idiosyncrasies

User datagram protocol

Basic UDP header validation

  • Is the UDP length field correct?
    • The minimum value is 8 bytes (no data)
  • Is the UDP checksum correct?
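
A minimal sketch of the UDP length check, assuming `segment` is the data the IP layer handed up. The checksum is left out because it needs the IP pseudo-header; validate_udp is a name made up for this sketch.

```python
import struct

UDP_HEADER = 8  # source port, destination port, length, checksum


def validate_udp(segment: bytes) -> int:
    """Return the UDP payload length or raise ValueError."""
    if len(segment) < UDP_HEADER:
        raise ValueError("segment too small for a UDP header")
    udp_len = struct.unpack_from("!H", segment, 4)[0]
    if udp_len < UDP_HEADER:
        raise ValueError("UDP length below the 8-byte minimum")
    if udp_len > len(segment):
        raise ValueError("UDP length exceeds data available")
    return udp_len - UDP_HEADER
```

Skipping the minimum check is what turns `udp_len - 8` into a huge unsigned value in C implementations.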

Transmission control protocol

Basic TCP header validation

  • Is the TCP data offset field too large?
    • TCP header length <= data available
    • 20 bytes (5 * 4) <= TCP header length <= 60 bytes
  • Is the TCP header length too small?
  • Is the TCP checksum correct?

TCP options processing

  • Is the option length field sign extended?
    • It shouldn’t be, possibility of dangerous bugs
    • For example, assigning char value to an int variable
  • Are enough bytes left for the current option?
  • Is the option length too large or too small?
    • Compared to the size of the TCP header / packet
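
The sign-extension problem with option lengths can be illustrated by reading the same length octet as a signed and as an unsigned char. The helper names and the numbers (an option at offset 20 in a 40-byte header) are made up for this sketch.

```python
import struct


def length_via_signed_char(option: bytes) -> int:
    """Read the length octet the way C does through a signed `char`."""
    return struct.unpack_from("b", option, 1)[0]


def length_via_unsigned_char(option: bytes) -> int:
    """Read the length octet as an unsigned byte."""
    return struct.unpack_from("B", option, 1)[0]


def passes_upper_bound(offset: int, length: int, header_len: int) -> bool:
    """The naive check: option offset + option length <= header length."""
    return offset + length <= header_len
```

With a length octet of 0xF0, the signed read gives -16, which passes the upper-bound check, while the unsigned read gives 240 and is rejected. A check for "too small" is needed in addition to "too large".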

TCP connections

  • 6 flags: SYN, ACK, RST, URG, FIN, PSH
  • Setting up, closing, tearing down of connections

TCP streams

Sequence numbers (ISNs). TCP spoofing attacks and others

TCP state processing

Various vulns

Urgent pointer processing

  • Handling pointers into other packets
    • Neglecting to check that the pointer is within the bounds of the current packet
    • Recognising that the pointer is pointing beyond the end of the packet and trying to handle it (often incorrectly)
  • Handling 0-offset urgent pointers
    • 0 offset URG pointer is invalid

Simultaneous open

Both peers send a SYN packet at the same time with mirrored source and destination ports. Then they both send a SYN-ACK packet, and the connection is established.

Ch. 15 Firewalls


Attack surface - Proxy firewalls

Same issues as with network servers. Also make sure the firewall makes a clear distinction between internal and external users or tracks authorised users.

Packet-filtering firewalls

Stateless vs stateful filters

Stateless Firewalls

Stateless firewalls look for connection initiation packets - SYNs, and more or less let other packets go through.
Can be abused for FIN scanning (not sure this works anymore). Stateless FW has to let FIN and RST packets through.
Different stacks behave differently for weird combinations of flags. Eg. SYN-FIN may initiate a connection.
Only port-based rules. Return packets a big problem - e.g. DNS replies from servers. Effectively creates a hole for UDP scanning with a source port 53.
Active / passive FTP; active is a problem for stateless firewalls, similar to UDP above but with TCP.
Either deny completely or apply very simple set of rules to process. No tracking because stateless. Some rules:
  • Fragments with low IP offset (1,2 etc) - drop as they will mess with TCP flags
  • Fragments with 0 offset should contain the full header, otherwise drop
  • Multiple offset 0 fragments - drop all after the full header
  • Fragments with high offset can pass
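
The per-fragment rules above can be sketched as a stateless decision function. The rule about multiple offset-0 fragments is left out because it needs memory of earlier fragments. The 20-byte threshold (a minimal TCP header) is an assumption of this sketch, as is the function name.

```python
def filter_fragment(frag_offset: int, payload_len: int,
                    min_l4_header: int = 20) -> str:
    """Decide "pass" or "drop" for one fragment, with no state kept.

    frag_offset is the IP fragment offset field, in units of 8 bytes.
    """
    if frag_offset == 0:
        # The first fragment must carry the full transport header.
        return "pass" if payload_len >= min_l4_header else "drop"
    if frag_offset * 8 < min_l4_header:
        # Low offsets would overwrite the TCP flags during reassembly.
        return "drop"
    return "pass"
```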

Simple stateful firewalls

  • TCP
    • These days any issues are rare
  • UDP
    • A common mistake is to allow responses from any UDP port
  • Directionality
  • Fragmentation handling
    • Can be done better than with stateless FWs. Bugs existed
  • Fooling virtual reassembly
  • IP TTL field
  • IP options

Stateful inspection firewalls

  • Checkpoint’s original term - looking inside the packet
  • Layering issues
    • Firewalls are not doing full TCP/IP processing and so make mistakes because they peek at layers they do not understand
    • For FTP, simplistic port lookup in the packet can be fooled into creating connections in the state table by faking 227 responses in the packet

Spoofing attacks

Obviously cannot muck with the destination IP.
  • Spoofing from an internal trusted source
  • Spoofing for a response
    • Try to get hosts to respond to addresses you cannot reach otherwise
    • Especially with source address or
  • Spoofing for a state entry - to get special entries added to the firewall state table for later use
  • Spoofing from a network peer
  • Spoofing destinations to create state table entries
  • Source routing and encapsulation

Sunday 7 August 2016

TAOSSA Chapter 13

Ch. 13 Synchronisation and State

Synchronisation problems

Reentrancy - a function’s capability to work correctly, even when it’s interrupted by another running thread that calls the same function. It must not modify any global vars or shared resources w/o adequate locking.
Race conditions
In a race condition, the outcome of an operation is successful only if certain resources are acted on in an expected order.
Starvation and deadlocks
Starvation - a thread never receives ownership of a synchronisation object.
Deadlocks can occur when several threads are using multiple sync objects at once but in a different order. For a deadlock to be possible, 4 conditions are required: mutual exclusion, hold and wait, no preemption, circular wait.

Process synchronisation

System V process synchronisation

Semaphore - a locking device that uses a counter to limit the number of instances that can be acquired. Decremented when acquired, incremented when released.
semget() - create a new semaphore set or obtain an existing set
semop() - performs operations on selected semaphores in a set
semctl() - perform a control operation on a selected semaphore

Windows process synchronisation


Vulnerabilities with interprocess synchronisation

  1. Synch objects required but not used, e.g. when 2 processes are attempting to access a shared resource
  2. Incorrect use (Windows)
  3. Squatting with named synchronisation objects (Windows)
Helpful tools/notes:
  1. Synchronisation objects scoreboard
  2. Lock matching


Signals are software interrupts that the kernel raises in a process at the request of other processes, or as a reaction to events that occur in the kernel.
Possible actions:
  • Ignore the signal (apart from SIGKILL and SIGSTOP)
  • Block the signal (same exception)
  • Install a signal handler
kill() system call is used to send a signal to a process
signal() for installing a handler
sigaction() interface - more detailed attributes for handled signals
setjmp(), longjmp(), sigsetjmp(), siglongjmp() often used in signal-handling routines to return to a certain location in the program in order to continue processing after a signal has been caught. Program context of setjmp() is restored when returned from longjmp(). Zero return value means a call to setjmp, a non-zero value indicates a return from a longjmp

Signal vulnerabilities

Signal handlers need to be asynchronous-safe - an asynchronous-safe function can safely and correctly run even if it is interrupted by an asynchronous event. It is reentrant by definition, but must also correctly deal with signal interruptions.
Problem when the handler relies on some sort of global program state, such as assumption that global variables are initialised when in fact they aren’t.
Various problems (non-asynchronous-safe state) may arise from attempting to restart execution using longjmp() function in non-returning signal handlers.
Other problems can be caused by invalid longjmp targets. The function that calls setjmp or sigsetjmp must still be on the runtime exec stack whenever longjmp or siglongjmp are called. If the original function has terminated, the pointer will be invalid.
Pay special attention for the following reasons:
  • The signal handler doesn’t return, so it’s highly unlikely that it will be asynchronous safe unless it exits immediately.
  • It might be possible to find a code path where the function that did the setjmp returns, but the signal handler with the longjmp is not removed.
  • The signal mask might have changed, which could be an issue if sigsetjmp and siglongjmp aren’t used. If they are, does restoring the old signal mask cause problems as well?
  • Permissions might have changed.
  • Program state might have changed such that variables that were valid when setjmp was originally called are not necessarily valid when longjmp is called.
The signal handler itself can be interrupted or called more than once. A signal handler can be interrupted only if a signal is delivered to the process that isn’t blocked. Signals are blocked by using the sigprocmask() function, or implicitly - signals of the type the handler catches are blocked for the period of time the signal handler is running. Also see the sigaction() function.
Sometimes non-async safe functions are used in signal handlers (see signal(3) or sigaction(2))
Signal handlers using longjmp and siglongjmp are practically guaranteed to be non-async safe unless they jump to a location that immediately exits.


PThreads API is the primary API on UNIX. Uses mutexes and condition variables. Linux has a modified version - LinuxThreads. On Windows the API is more complicated.
<skipped> - Critical sections

Threading Vulnerabilities

  • Race conditions occur when the successful outcome of an operation depends on whether the threads are scheduled for running in a certain order.
  1. Identify shared resources that are acted on by multiple threads.
  2. Determine whether the appropriate locking mechanism has been selected. There are specific rules in the book for different types of resources.
  3. Examine the code that modifies this resource to see whether appropriate locking mechanisms have been neglected or misused.
  • Deadlocks and starvation
In PThreads deadlocks are more likely to occur from the use of multiple mutexes. A classic situation: two or more locks can be held by a single thread, and another thread can acquire the same locks in a different order.
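
One common way to remove the circular-wait condition is to acquire locks in a single global order. The sketch below uses Python's threading.Lock as a stand-in for PThreads mutexes and orders locks by id(); both the helper name and the ordering key are choices made for this illustration.

```python
import threading


def with_both_locks(lock_a, lock_b, action):
    """Run `action` while holding both locks, acquired in a fixed order."""
    if lock_a is lock_b:
        # The same lock passed twice: acquire it only once.
        with lock_a:
            return action()
    # Sorting by id() gives every caller the same acquisition order,
    # regardless of the order in which the locks were passed in.
    first, second = sorted((lock_a, lock_b), key=id)
    with first:
        with second:
            return action()
```

Two threads calling with_both_locks(a, b, ...) and with_both_locks(b, a, ...) end up taking the locks in the same order, so neither can hold one lock while waiting for the other.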

Saturday 6 August 2016

TAOSSA Chapter 10

Ch 10. UNIX II: Processes


fork() creates new processes. Returns in parent the PID of the new child process; in the child process - 0. Return value -1 means call failed, no child spawned
getppid() - get parent PID
If a process terminates while its children are still running, these children are assigned to init (PID 1)
In Linux clone() is a fork() variant that allows callers to specify several parameters of the forking operation
Child inherits a copy of most resources from the parent. For files - different. Child gets a copy of the parent’s file descriptors, and both processes share the same open file structure in the kernel (which points to an inode). As a result parent and child may be fighting for access to the file.

Program invocation

execve() is the standard way of invoking processes. With execvp() and execlp(), if the filename contains no slashes, the PATH env variable is used to resolve the location of the executable. They also open a shell to run the file if execve() fails with ENOEXEC.
It may be possible to supply program switches in the argument array if it is not sanitised properly. Keep in mind that getopt() interprets only the arguments preceding -- (two dashes)
  • Metacharacters - see [[TAOSSA notes ch 8]]
  • Globbing
  • Environment issues
  • Setuid shell scripts
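
The note about getopt() and the two dashes can be illustrated with Python's getopt module, which follows the same convention. The option string "vf:" and the helper names are made up for this sketch.

```python
import getopt


def parse_args(argv: list) -> tuple:
    """Parse argv with a hypothetical option string: -v and -f <value>."""
    opts, operands = getopt.getopt(argv, "vf:")
    return dict(opts), operands


def build_argv(user_supplied: str) -> list:
    """Place `--` before untrusted input so it cannot act as a switch."""
    return ["-v", "--", user_supplied]
```

With argv `["-v", "-fevil"]` the untrusted string is parsed as the -f switch; with `build_argv("-fevil")` it comes back as a plain operand.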

Process Attributes

Process attribute retention:
  • File descriptors usually get passed on from the old process to the new one
  • Signal masks - the new process loses all signal handlers installed by the previous process but retains the same signal masks
  • Effective UID - if the program is setuid, the EUID becomes the user ID of the program file owner. Otherwise it stays the same across the execution.
  • Effective GID - if setgid, the EGID becomes the group ID of the program file group
  • Saved set-UID - set to the value of the EUID after any setuid processing has been completed
  • Saved set-GID - similar
  • Real UID, GID - preserved across execution
  • PID, PPID, PGID - don’t change across an execve() call
  • Supplemental group privileges are retained
  • Working dir, root dir - same
  • Controlling terminal - inherits from the old process.
  • Resource limits - a lot of details
  • Umask -
Users can set tight limits on a process and then run a setuid or setgid program. Rlimits are cleared out when a process does a fork(), but they survive the exec() family of calls, which can be used to force a failure in a predetermined location in the code. The error-handling code is usually less guarded than more well-traveled code paths.
UNIX does allow developers to mark certain file descriptors as close-on-exec, which means they are closed automatically if the process runs a new program. For applications that spawn new processes at any stage, always check to see whether this step is taken when it opens files. It is also useful to make a note of those persistent files that aren’t marked to close when a new program starts.
Security checks on a file descriptor are performed only once, when the process initially creates a file descriptor by opening or creating a resource. If you can get access to a file descriptor that was opened with write access to a critical system file, you can write to that file regardless of your effective user ID or other system privileges. Therefore, programs that work with file descriptors to security-sensitive resources should close their descriptors before running any user-malleable code.
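
A small sketch of checking the close-on-exec property from Python, where os.get_inheritable() and os.set_inheritable() expose the flag. Descriptors created by Python 3.4+ are non-inheritable by default; the helper names are invented for this illustration.

```python
import os


def open_private(path: str) -> int:
    """Open `path` read-only and make sure it is closed across exec."""
    fd = os.open(path, os.O_RDONLY)
    # Explicit even though it is Python's default, to document the intent.
    os.set_inheritable(fd, False)
    return fd


def survives_exec(fd: int) -> bool:
    """True if the descriptor would be passed on to a new program."""
    return os.get_inheritable(fd)
```

When auditing C code the equivalent questions are whether O_CLOEXEC or FD_CLOEXEC is set on each sensitive descriptor, and which descriptors are deliberately left open for the child.
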
setenv() and unsetenv() may be dodgy in how they behave with funny variable names.

Interprocess communication

Named pipes created with insufficient privileges might result in unauthorized clients performing some sort of data exchange, potentially leading to compromise via unauthorized (or forged) data messages.
Applications that are intended to deal with regular files might unwittingly find themselves interacting with named pipes. This allows attackers to cause applications to stall in unlikely situations or cause error conditions in unexpected places. When auditing an application that deals with files, if it fails to determine the file type, consider the implications of triggering errors during file accesses and blocking the application at those junctures.
The use of mknod() and mkfifo() might introduce a race condition between the time the pipe is created and the time it’s opened.
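
A minimal sketch of an application that expects regular files and refuses anything else. It opens the path without blocking and then checks the type on the open descriptor, which avoids a race between the check and the open. The helper names are made up, and the demo assumes a UNIX system where os.mkfifo() is available.

```python
import os
import stat
import tempfile


def open_regular_file(path: str) -> int:
    """Open `path` read-only; raise ValueError unless it is a regular file."""
    # O_NONBLOCK keeps open() from stalling on a FIFO with no writer.
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_NONBLOCK", 0))
    if not stat.S_ISREG(os.fstat(fd).st_mode):
        os.close(fd)
        raise ValueError("not a regular file")
    return fd


def demo() -> dict:
    """Try the check against a regular file and, where possible, a FIFO."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        regular = os.path.join(tmp, "data.txt")
        with open(regular, "w") as handle:
            handle.write("x")
        os.close(open_regular_file(regular))
        results["regular"] = "accepted"
        if hasattr(os, "mkfifo"):
            fifo = os.path.join(tmp, "pipe")
            os.mkfifo(fifo)
            try:
                os.close(open_regular_file(fifo))
                results["fifo"] = "accepted"
            except ValueError:
                results["fifo"] = "rejected"
    return results
```
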
Three IPC mechanisms in System V IPC are message queues, semaphores, and shared memory.
Named UNIX domain sockets provide a general-purpose mechanism for exchanging data in a stream-based or record-based fashion.

Remote Procedure Calls

XDR External Data Representation