Writing software is similar to translating from one language to another. Specifically, it is similar to translating from your native language into some other language, so that speakers of that language can do a task for you. You might not understand the other language very well, and some concepts might be difficult to express in it. You do your best when translating, but as we know, some things get lost in translation.
On software testing
When writing software, some things do get lost in translation. You know what your software should do, but you need to express your needs in the particular programming language that you are using. Even small pieces of software will have some sort of problem; these problems are called software defects. There is a whole field in computer science called software testing, whose goal is to find such software defects early, so that they get fixed before the software is released and reaches the market. When you buy a software package, it has gone through intensive software testing. If a customer's copy of the software crashes or malfunctions, it reflects really poorly on the vendor. The customer might even return the software and demand their money back!
In the field of software testing, you try to identify actions that a typical customer is likely to perform and that may crash the software. Ideally, you would find all possible software defects and have them fixed. In reality, identifying all software defects is not possible. This is a hard fact and a known limitation of software testing: no matter how hard you try, there will always be some software defects left.
This post is about security though, not about software testing. What gives? Well, a software defect can make the software malfunction. That malfunction can make the software perform an action that was not intended by the software developers; it can make the software do what some attacker wants it to do. Far-fetched? Not at all. This is what a big part of computer security works on.
When security researchers perform software testing with the aim of finding software defects, we say that they are performing security fuzzing, or just fuzzing. Fuzzing is therefore similar to software testing, but with a focus on identifying ways to make the software malfunction in a really bad way.
Security researchers find security vulnerabilities: ways to break into a computer system. Fuzzing is the first half of the job of finding security vulnerabilities. The second half is to analyse each software defect and try to figure out, if possible, a way to break into the system. In this post we focus only on the first half.
Defects and vulnerabilities
Is every software defect a potential candidate for a security vulnerability? Let's take the example of a text editor. If you use the text editor only to edit your own documents, and never to open downloaded documents, then there is no chance for a security vulnerability, because an attacker has no way to influence the text editor: none of its inputs are exposed to the attacker.
However, most computers are connected to the Internet, and most operating systems, whether Windows, macOS or a Linux distribution, are pre-configured to open text documents with a text editor. While browsing the Internet, you may find an interesting document and decide to download and open it on your computer. Or you may receive an email with a document attached. In both cases, the document file is fully under the control of an attacker; the attacker can modify any aspect of that file. A Word (.docx) document, for example, is a ZIP archive that contains several individual files. You could modify any of those individual files, ZIP the result back into a .docx file, and try to open it. If you get a crash, you have successfully fuzzed the application in a manual way. And if you have ever crashed the application simply by editing a document in the course of your own work, then you are a natural at security fuzzing; keep a copy of that exact crashing document, because it could be gold to a security researcher.
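This manual approach can be sketched in a few lines of Python. The sketch below is a hypothetical illustration (the function name and the toy archive are made up for the example): it flips a handful of bytes inside one member of a ZIP-based document, such as a .docx, and produces a mutated copy that you could then try to open in the target application.

```python
import random
import zipfile
from io import BytesIO

def mutate_document(data: bytes, flips: int = 8, seed: int = 0) -> bytes:
    """Return a copy of a ZIP-based document with byte flips in one member."""
    rng = random.Random(seed)
    src = zipfile.ZipFile(BytesIO(data))
    names = src.namelist()
    victim = rng.choice(names)            # pick one inner file to corrupt
    out = BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in names:
            content = bytearray(src.read(name))
            if name == victim and content:
                # flip a few distinct byte positions in the chosen member
                for pos in rng.sample(range(len(content)), min(flips, len(content))):
                    content[pos] ^= rng.randrange(1, 256)
            dst.writestr(name, bytes(content))
    return out.getvalue()

# A toy stand-in for a real .docx: one XML member inside a ZIP archive.
buf = BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("word/document.xml", "<w:document>hello</w:document>")
mutated = mutate_document(buf.getvalue())
```

With a real document you would read the file from disk, write the mutated bytes back out, and open the result in the application under test.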
If there is a complex task that a person could do, but doing it manually is tedious and expensive, then you can either use a computer and make it work just like a person would, or break the task down into a simpler but repetitive form that is suitable for a computer. The latter is quite enticing, because computing power is far cheaper and more abundant than an expert's time.
Suppose you want to recognize apples in digital images. You can employ an apple expert to identify whether there is an apple (of any variety) in a photograph. Or you can get an expert to share their domain knowledge of apples and help create software that understands all shapes and colors of apples. Or you can obtain several thousand photos of different apples and train an AI system to detect apples in new images.
Employing a domain expert to manually identify the apples does not scale. Developing software using domain knowledge does not extend easily to, let's say, other fruits, and such domain-specific software is also expensive to build compared to training an AI system to detect the specific objects.
It is similar with security fuzzing. A security expert working manually does not scale, and the process is expensive to perform repeatedly. Developing software that acts exactly like a security expert is also expensive, as the software would have to capture the whole domain knowledge of software security. The next best option is to break the problem into smaller tasks and rely primarily on cheap computing power.
Advanced Fuzzing League++
And that leads us to the Advanced Fuzzing League++ (afl++). It is security fuzzing software that requires lots of computing power: it runs the software under test many times, with slightly different inputs each time, and checks whether any of the attempts has managed to cause a crash.
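That core loop can be illustrated with a toy Python sketch (all names here are made up for the example, and real fuzzers like afl++ are far smarter about choosing mutations): mutate a seed input, run the target, and keep any input that causes an unexpected crash. The target is a deliberately buggy parser with a planted defect.

```python
import random

def parse_header(data: bytes) -> int:
    """A toy 'target' with a planted defect."""
    if data[:4] != b"DOC1":          # magic-number check
        raise ValueError("bad magic")
    return data[4]                   # defect: IndexError when input is exactly 4 bytes

def fuzz(target, seed: bytes, runs: int = 5000):
    """Randomly mutate the seed, run the target, and collect crashing inputs."""
    rng = random.Random(1)
    crashes = []
    for _ in range(runs):
        data = bytearray(seed)
        # a few random mutations: overwrite a byte, truncate, or extend
        for _ in range(rng.randrange(1, 4)):
            op = rng.randrange(3)
            if op == 0 and data:
                data[rng.randrange(len(data))] = rng.randrange(256)
            elif op == 1 and data:
                del data[rng.randrange(len(data)):]
            else:
                data.append(rng.randrange(256))
        try:
            target(bytes(data))
        except ValueError:
            pass                     # clean rejection: expected, not a defect
        except Exception:
            crashes.append(bytes(data))   # unexpected crash: save the input
    return crashes

found = fuzz(parse_header, seed=b"DOC1\x42extra")
```

Each saved crashing input is the raw material for the second half of the job: the manual analysis of whether the defect is exploitable.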
afl++ performs the security fuzzing, which is just the first part of the security work. A security researcher then takes the results of the fuzzing (i.e. the list of crash reports) and manually investigates whether any of them can be exploited so that an attacker can make the software let them in.
afl++ has been developed to use as much computing power as you can give it. There are many ways to parallelise a fuzzing job across multiple computers.
afl++ uses software instrumentation. When you have access to the source code, you can recompile it in a special way so that, while afl++ is fuzzing, it knows whether a new input causes execution to reach previously unexplored areas of the executable. This feedback helps afl++ expand its coverage across the whole executable.
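The effect of that coverage feedback can be demonstrated with a hand-instrumented toy (everything here is hypothetical and made up for the example): each cov() call plays the role of a probe that afl++'s compiler inserts automatically, and any input that lights up a new probe is kept as a seed for further mutation. This lets the fuzzer reach a defect hidden behind several nested checks, which blind random mutation would almost never hit.

```python
import random

COVERAGE = set()

def cov(point: str):
    COVERAGE.add(point)              # records that this program point was reached

def check_magic(data: bytes):
    """Toy target, hand-instrumented with cov() probes."""
    cov("start")
    if data[0:1] == b"s":
        cov("s")
        if data[1:2] == b"e":
            cov("se")
            if data[2:3] == b"c":
                cov("sec")
                raise RuntimeError("planted crash")   # deeply nested defect

def coverage_guided_fuzz(target, runs: int = 100_000):
    rng = random.Random(0)
    corpus = [b"a"]                          # initial seed input
    for _ in range(runs):
        data = bytearray(rng.choice(corpus))
        # one small mutation per attempt: overwrite a byte or append one
        if data and rng.random() < 0.5:
            data[rng.randrange(len(data))] = rng.randrange(256)
        else:
            data.append(rng.randrange(256))
        before = len(COVERAGE)
        try:
            target(bytes(data))
        except RuntimeError:
            return bytes(data)               # found the crashing input
        if len(COVERAGE) > before:           # new program point reached:
            corpus.append(bytes(data))       # keep this input for further mutation
    return None

crashing_input = coverage_guided_fuzz(check_magic)
```

Without the coverage feedback (i.e. if no inputs were ever added to the corpus), stumbling on the three-byte prefix by pure chance would take on the order of millions of attempts; with it, each nested check is conquered one step at a time.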
afl++ does not automatically recognize the different inputs of a piece of software. You have to tell it whether the input comes from the command line, from a file, from the network, or from elsewhere.
afl++ can be fine-tuned to perform even better. Running the executable from scratch for every input is much slower than repeatedly calling the same main function of an already-running process, which afl++ calls persistent mode.
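The size of that speedup is easy to demonstrate. The Python sketch below is only an analogy (afl++'s real persistent mode is written as a C harness around the __AFL_LOOP macro): it compares starting a fresh process for every input against calling the function under test in-process.

```python
import subprocess
import sys
import time

INPUTS = [b"%d" % i for i in range(20)]

# Slow way: launch a fresh process for every input, like re-executing
# the target binary from scratch on each fuzzing attempt.
t0 = time.perf_counter()
for data in INPUTS:
    subprocess.run(
        [sys.executable, "-c", "import sys; int(sys.stdin.read())"],
        input=data, check=True,
    )
slow = time.perf_counter() - t0

# Fast way: keep one process alive and call the function under test
# in a loop -- the idea behind afl++'s persistent mode.
t0 = time.perf_counter()
for data in INPUTS:
    int(data)
fast = time.perf_counter() - t0
```

Process startup dominates the first loop, which is why avoiding it matters so much for fuzzing throughput: the fuzzer's job is to execute the target as many times per second as possible.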
afl++ can be used whether or not you have the source code of the software.
afl++ can fuzz binaries built for a different architecture than your fuzzing server's. It uses QEMU's user-mode emulation for this, and can also use CPU emulation through Unicorn.
afl++ has captured the mindshare in security fuzzing, and there are more and more efforts to expand its support to different targets. For example, there is support for Frida (dynamic instrumentation).
afl++ has a steep learning curve. Good introductory tutorials are hard to find.