retdec (RETargetable DECompiler) is a decompiler that Avast Software recently released as open-source software.
retdec takes an executable and works backwards to recreate source code that approximates the original (with limitations).
An example with retdec
Let’s start with an example. Here is the original source code, which was compiled to produce the executable mytest.
#include <stdio.h>

int main() {
    for (int i = 0; i < 10; i++)
        printf("%d\n", i);
    return 0;
}
Here is what retdec can deduce from the executable of the previous program.
//
// This file was generated by the Retargetable Decompiler
// Website: https://retdec.com
// Copyright (c) 2017 Retargetable Decompiler <info@retdec.com>
//

#include <stdint.h>
#include <stdio.h>

// ------------------------ Functions -------------------------

// Address range: 0x804840b - 0x804844e
int main(int argc, char ** argv) {
    // entry
    // branch -> 0x8048425
    for (uint32_t i = 0; i < (uint32_t)10; i++) {
        // 0x8048425
        printf("%d\n", i);
        // PHI copies at the loop end
        // loop 0x8048425 end
    }
    // 0x8048442
    return 0;
}

// --------------- Dynamically Linked Functions ---------------

// int printf(const char * restrict format, ...);

// --------------------- Meta-Information ---------------------

// Detected compiler/packer: gcc (4.7.2)
// Detected functions: 1
// Decompilation date: 2017-12-29 19:01:37
Pretty good, huh?
Installing retdec
We are going to compile and install retdec in an LXD container. This way, it will not clutter our Linux installation with additional packages and installed files. In addition, you can follow these instructions even if you do not run Ubuntu; install LXD in your distribution and create an Ubuntu 16.04 container to work with. If you are not using LXD, skip the lxc commands below.
$ lxc launch ubuntu:x retdec
Creating retdec
Starting retdec
$ lxc exec retdec -- sudo --user ubuntu --login
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.
ubuntu@retdec:~$ sudo apt update
....
Now we are in the container and we can mess it up as much as we want. First, we get a copy of the source code. retdec has plenty of submodules, therefore we need the --recursive parameter.
ubuntu@retdec:~$ git clone --recursive https://github.com/avast-tl/retdec.git
Cloning into 'retdec'...
...
ubuntu@retdec:~$
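If the repository was cloned without --recursive, the submodule directories stay empty and the build is likely to fail later. The sketch below shows one way to fetch them after the fact. The helper name fetch_submodules is hypothetical; the underlying command, git submodule update --init --recursive, is standard git.

```shell
# Hypothetical helper: populate all (nested) submodules of an existing clone.
# $1 is the path to the clone, for example ~/retdec.
fetch_submodules() {
    git -C "$1" submodule update --init --recursive
}

# Usage (inside the container):
#   fetch_submodules ~/retdec
```

Running it on a clone that already has its submodules should be harmless, so it can also serve as a quick sanity check before building.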
Installing the latest cmake on Ubuntu
retdec uses cmake, so we need to install it. However, Ubuntu 16.04.3 comes with cmake 3.5 while retdec demands cmake 3.6 or newer. We download a newer release from the cmake website and install it.
ubuntu@retdec:~$ wget https://cmake.org/files/v3.10/cmake-3.10.1-Linux-x86_64.sh
ubuntu@retdec:~$ sh cmake-3.10.1-Linux-x86_64.sh
...
Do you accept the license? [yN]: y
By default the CMake will be installed in:
"/home/ubuntu/cmake-3.10.1-Linux-x86_64"
Do you want to include the subdirectory cmake-3.10.1-Linux-x86_64?
Saying no will install in: "/home/ubuntu" [Yn]: Y
Using target directory: /home/ubuntu/cmake-3.10.1-Linux-x86_64
Extracting, please wait...
Unpacking finished successfully
ubuntu@retdec:~$
Then, we add cmake to our $PATH. To make the change permanent, add the export command below (without the prompt) to ~/.profile.
ubuntu@retdec:~$ export PATH=$PATH:~/cmake-3.10.1-Linux-x86_64/bin
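Because the export above appends to $PATH, a cmake that is already installed in a system directory such as /usr/bin would still be found first. It may therefore be worth checking which cmake actually runs and whether it meets the 3.6 minimum. The following is a minimal sketch; version_ge is a hypothetical helper, and it assumes GNU sort with the -V (version sort) option, which Ubuntu provides.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Relies on "sort -V" (version sort), a GNU coreutils extension.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report which cmake is first on $PATH and whether it is new enough.
if command -v cmake >/dev/null 2>&1; then
    # The first line of "cmake --version" reads "cmake version X.Y.Z".
    found=$(cmake --version | head -n 1 | awk '{print $3}')
    if version_ge "$found" 3.6; then
        echo "OK: $(command -v cmake) is version $found"
    else
        echo "Too old: $(command -v cmake) is version $found, need 3.6 or newer"
    fi
else
    echo "cmake is not on \$PATH"
fi
```

If the check reports an older system cmake, one option is to put the new directory at the front of $PATH instead of the end.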
Installing the prerequisite packages for retdec
After some trial and error, I collected all the required packages; here is the full list (for Ubuntu 16.04.3).
ubuntu@retdec:~$ sudo apt install cmake build-essential zlib1g-dev flex bison pkg-config libtinfo-dev autoconf libtool libssl-dev
Compiling retdec
retdec uses cmake, so we do an out-of-source build and keep all compiled files in a separate build/ subdirectory. When we run cmake .. from within that subdirectory, cmake reads the CMake configuration file in the parent directory and generates the necessary Makefiles in build/.
ubuntu@retdec:~$ cd retdec/
ubuntu@retdec:~/retdec$ mkdir build
ubuntu@retdec:~/retdec$ cd build
ubuntu@retdec:~/retdec/build$ cmake ..
...
Enabling CAPSTONE_ARM_SUPPORT
Enabling CAPSTONE_ARM64_SUPPORT
Enabling CAPSTONE_M68K_SUPPORT
Enabling CAPSTONE_MIPS_SUPPORT
Enabling CAPSTONE_PPC_SUPPORT
Enabling CAPSTONE_SPARC_SUPPORT
Enabling CAPSTONE_SYSZ_SUPPORT
Enabling CAPSTONE_XCORE_SUPPORT
Enabling CAPSTONE_X86_SUPPORT
Enabling CAPSTONE_TMS320C64X_SUPPORT
Enabling CAPSTONE_M680X_SUPPORT
...
-- Target triple: x86_64-unknown-linux-gnu
-- Native target architecture is X86
-- Threads enabled.
...
-- Found PythonInterp: /usr/bin/python3.5 (found version "3.5.2")
-- Constructing LLVMBuild project information
-- Targeting AArch64
-- Targeting ARM
-- Targeting Hexagon
-- Targeting Mips
-- Targeting PowerPC
-- Targeting Sparc
-- Targeting SystemZ
-- Targeting X86
...
-- Configuring done
-- Generating done
-- Build files have been written to: /home/ubuntu/retdec/build
ubuntu@retdec:~/retdec/build$
We are ready to run make. It takes time to compile the source, which includes a copy of LLVM. You may speed up the compilation by using the -j n parameter, where n is the number of cores of your CPU. For a quad-core CPU, try make -j 4 instead.
ubuntu@retdec:~/retdec/build$ make -j 4
Scanning dependencies of target llvm-project
...
[100%] Linking CXX static library libllvmir2hll.a
[100%] Built target llvmir2hll
Scanning dependencies of target llvmir2hlltool
[100%] Building CXX object src/llvmir2hlltool/CMakeFiles/llvmir2hlltool.dir/llvmir2hll.cpp.o
[100%] Linking CXX executable llvmir2hll
[100%] Built target llvmir2hlltool
ubuntu@retdec:~/retdec/build$
The compilation completed successfully!
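Rather than hard-coding the number of cores for -j, the count can be detected at build time. Here is a small sketch; job_count is a hypothetical helper name, nproc comes from GNU coreutils, and getconf _NPROCESSORS_ONLN is used as a fallback where nproc is missing.

```shell
# Prints the number of parallel jobs to hand to "make -j".
job_count() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        # Fall back to getconf, and to a single job if that fails too.
        getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
    fi
}

# Usage, from inside ~/retdec/build:
#   make -j "$(job_count)"
```

Keep in mind that more parallel jobs also means more memory in use at once, so on a small container a lower number may be the safer choice.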
Running “make install”
Let’s install.
ubuntu@retdec:~/retdec/build$ sudo make install
...
Install the project...
-- Install configuration: "Release"
Downloading archive from https://github.com/avast-tl/retdec-support/releases/download/2017-12-15/retdec-support_2017-12-15.tar.xz ...
2017-12-29 18:56:59 URL:https://github-production-release-asset-2e65be.s3.amazonaws.com/114005831/fedb031e-e17f-11e7-9cc7-c7e17e403150?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20171229%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20171229T185626Z&X-Amz-Expires=300&X-Amz-Signature=c6ab7acd11d26ea38240291c9c3ae91f7855a330e0aa07163f6ded20857a184f&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dretdec-support_2017-12-15.tar.xz&response-content-type=application%2Foctet-stream [124618948/124618948] -> "/usr/local/share/retdec/support/retdec-support_2017-12-15.tar.xz" [1]
Verfifying archive's checksum ...
Unpacking archive ...
RetDec support directory downloaded OK
...
-- Installing: /usr/local/bin/decompile-all.sh
-- Installing: /usr/local/bin/decompile-archive.sh
-- Installing: /usr/local/bin/decompile.sh
-- Installing: /usr/local/bin/fileinfo.sh
-- Installing: /usr/local/bin/run-unit-tests.sh
-- Installing: /usr/local/bin/signature-from-library.sh
-- Installing: /usr/local/bin/unpack.sh
-- Installing: /usr/local/bin/utils.sh
-- Installing: /usr/local/bin/bin2llvmir
-- Set runtime path of "/usr/local/bin/bin2llvmir" to "/usr/local/lib"
-- Installing: /usr/local/bin/llvmir2hll
-- Set runtime path of "/usr/local/bin/llvmir2hll" to "/usr/local/lib"
ubuntu@retdec:~/retdec/build$
And that’s it! The main program is decompile.sh. Note that during make install, retdec downloaded an auxiliary package from the Internet.
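A quick way to confirm that the installed programs are reachable is to look them up on $PATH. The sketch below uses a hypothetical helper, missing_tools, that prints every name it cannot find; the tool names in the usage line are the ones that appear in the install log above.

```shell
# Prints the name of every argument that is not an executable on $PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage: no output means everything was found.
#   missing_tools decompile.sh bin2llvmir llvmir2hll
```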
Testing retdec
Here is a program to test retdec.
#include <stdio.h>

int main() {
    for (int i = 0; i < 10; i++)
        printf("%d\n", i);
    return 0;
}
We compile it, run it, and then ask retdec to decompile it.
ubuntu@retdec:~$ gcc mytest.c -o mytest
ubuntu@retdec:~$ ./mytest
0
1
2
3
4
5
6
7
8
9
ubuntu@retdec:~$ decompile.sh mytest
##### Checking if file is a Mach-O Universal static library...
RUN: /usr/local/bin/macho-extractor --list /home/ubuntu/mytest
##### Checking if file is an archive...
RUN: /usr/local/bin/ar-extractor --arch-magic /home/ubuntu/mytest
Not an archive, going to the next step.
##### Gathering file information...
RUN: /usr/local/bin/fileinfo -c /home/ubuntu/mytest.c.json --similarity /home/ubuntu/mytest --no-hashes=all --crypto /usr/local/bin/../share/retdec/support/generic/yara_patterns/signsrch/signsrch.yara
Input file : /home/ubuntu/mytest
File format : ELF
File class : 64-bit
File type : Executable file
Architecture : x86-64
Endianness : Little endian
Entry point address : 0x400430
Entry point offset : 0x430
Entry point section name : .text
Entry point section index: 14
Bytes on entry point : 31ed4989d15e4889e24883e4f0505449c7c0d005400048c7c16005400048c7c726054000e8b7fffffff4660f1f440000b83f
Detected tool : GCC (5.4.0) (compiler), .comment section heuristic
Detected tool : GCC (4.6.3) Ubuntu (compiler), 65 from 80 significant nibbles (81.25%)
##### Trying to unpack /home/ubuntu/mytest into /home/ubuntu/mytest-unpacked.tmp by using generic unpacker...
RUN: /usr/local/bin/unpacker -d /usr/local/bin/unpacker-plugins -o /home/ubuntu/mytest-unpacked.tmp /home/ubuntu/mytest
No matching plugins found for 'GCC 5.4.0'
No matching plugins found for 'GCC 4.6.3'
##### Unpacking by using generic unpacker: nothing to do
##### Trying to unpack /home/ubuntu/mytest into /home/ubuntu/mytest-unpacked.tmp by using UPX...
RUN: upx -d /home/ubuntu/mytest -o /home/ubuntu/mytest-unpacked.tmp
/usr/local/bin/unpack.sh: line 122: upx: command not found
##### Unpacking by using UPX: nothing to do
Error: Unsupported target architecture 'X86-64'. Supported architectures: Intel x86, ARM, ARM+Thumb, MIPS, PIC32, PowerPC.
ubuntu@retdec:~$
We hit the first snag. retdec does not currently support X86-64 (64-bit Intel), so we need to recompile our test program for X86 (32-bit Intel). Note also that retdec expects to find an unpacker called upx (not required in our example, but let’s install it anyway).
ubuntu@retdec:~$ gcc -m32 mytest.c -o mytest
In file included from /usr/include/stdio.h:27:0,
                 from mytest.c:1:
/usr/include/features.h:367:25: fatal error: sys/cdefs.h: No such file or directory
compilation terminated.
ubuntu@retdec:~$
What? A header file was not found? This is because we are compiling for X86 (32-bit) on an X86-64 (64-bit) installation of Ubuntu, therefore we need some extra packages. Let’s install them, and also upx.
ubuntu@retdec:~$ sudo apt install gcc-multilib upx-ucl
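Since retdec rejects 64-bit Intel binaries, it can save a failed run to check whether an executable is 32-bit or 64-bit before handing it over. The sketch below reads the ELF header directly: the first four bytes are the magic number 7f 45 4c 46, and the fifth byte (EI_CLASS) is 1 for 32-bit and 2 for 64-bit. The function name elf_class is hypothetical, and the check only looks at the class, not at the CPU architecture.

```shell
# Prints 32 or 64 according to the ELF class byte of file $1.
# Returns non-zero when the file does not start with the ELF magic number.
elf_class() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "7f454c46" ] || return 1
    # EI_CLASS is the byte at offset 4: 1 = ELFCLASS32, 2 = ELFCLASS64.
    case $(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n') in
        1) echo 32 ;;
        2) echo 64 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   elf_class mytest
```

The file utility reports the same information in a more readable form; the function is mainly handy inside scripts.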
Now we can compile to produce a 32-bit executable and decompile the executable with retdec.
ubuntu@retdec:~$ gcc -m32 mytest.c -o mytest
ubuntu@retdec:~$ decompile.sh mytest
##### Checking if file is a Mach-O Universal static library...
RUN: /usr/local/bin/macho-extractor --list /home/ubuntu/mytest
##### Checking if file is an archive...
RUN: /usr/local/bin/ar-extractor --arch-magic /home/ubuntu/mytest
Not an archive, going to the next step.
##### Gathering file information...
RUN: /usr/local/bin/fileinfo -c /home/ubuntu/mytest.c.json --similarity /home/ubuntu/mytest --no-hashes=all --crypto /usr/local/bin/../share/retdec/support/generic/yara_patterns/signsrch/signsrch.yara
Input file : /home/ubuntu/mytest
File format : ELF
File class : 32-bit
File type : Executable file
Architecture : x86 (or later and compatible)
Endianness : Little endian
Entry point address : 0x8048310
Entry point offset : 0x310
Entry point section name : .text
Entry point section index: 14
Bytes on entry point : 31ed5e89e183e4f050545268b084040868508404085156680b840408e8bffffffff466906690669066906690669066908b1c
Detected tool : GCC (4.7.2) (compiler), 51 from 51 significant nibbles (100%)
Detected tool : GCC (5.4.0) (compiler), .comment section heuristic
##### Trying to unpack /home/ubuntu/mytest into /home/ubuntu/mytest-unpacked.tmp by using generic unpacker...
RUN: /usr/local/bin/unpacker -d /usr/local/bin/unpacker-plugins -o /home/ubuntu/mytest-unpacked.tmp /home/ubuntu/mytest
No matching plugins found for 'GCC 4.7.2'
No matching plugins found for 'GCC 5.4.0'
##### Unpacking by using generic unpacker: nothing to do
##### Trying to unpack /home/ubuntu/mytest into /home/ubuntu/mytest-unpacked.tmp by using UPX...
RUN: upx -d /home/ubuntu/mytest -o /home/ubuntu/mytest-unpacked.tmp
/usr/local/bin/unpack.sh: line 122: upx: command not found
##### Unpacking by using UPX: nothing to do
##### Decompiling /home/ubuntu/mytest into /home/ubuntu/mytest.c.backend.bc...
RUN: /usr/local/bin/bin2llvmir -provider-init -config-path /home/ubuntu/mytest.c.json -decoder -disable-inlining -disable-simplify-libcalls -inst-opt -verify -volatilize -instcombine -reassociate -volatilize -control-flow -cfg-fnc-detect -main-detection -register -stack -control-flow -cond-branch-opt -syscalls -idioms-libgcc -constants -param-return -local-vars -type-conversions -simple-types -generate-dsm -remove-asm-instrs -select-fncs -unreachable-funcs -type-conversions -stack-protect -verify -instcombine -tbaa -targetlibinfo -basicaa -domtree -simplifycfg -domtree -early-cse -lower-expect -targetlibinfo -tbaa -basicaa -globalopt -mem2reg -instcombine -simplifycfg -basiccg -domtree -early-cse -lazy-value-info -jump-threading -correlated-propagation -simplifycfg -instcombine -simplifycfg -reassociate -domtree -loops -loop-simplify -lcssa -loop-rotate -licm -lcssa -instcombine -scalar-evolution -loop-simplifycfg -loop-simplify -aa -loop-accesses -loop-load-elim -lcssa -indvars -loop-idiom -loop-deletion -memdep -gvn -memdep -sccp -instcombine -lazy-value-info -jump-threading -correlated-propagation -domtree -memdep -dse -dce -bdce -adce -die -simplifycfg -instcombine -strip-dead-prototypes -globaldce -constmerge -constprop -instnamer -domtree -instcombine -never-returning-funcs -adapter-methods -class-hierarchy -instcombine -tbaa -targetlibinfo -basicaa -domtree -simplifycfg -domtree -early-cse -lower-expect -targetlibinfo -tbaa -basicaa -globalopt -mem2reg -instcombine -simplifycfg -basiccg -domtree -early-cse -lazy-value-info -jump-threading -correlated-propagation -simplifycfg -instcombine -simplifycfg -reassociate -domtree -loops -loop-simplify -lcssa -loop-rotate -licm -lcssa -instcombine -scalar-evolution -loop-simplifycfg -loop-simplify -aa -loop-accesses -loop-load-elim -lcssa -indvars -loop-idiom -loop-deletion -memdep -gvn -memdep -sccp -instcombine -lazy-value-info -jump-threading -correlated-propagation -domtree -memdep -dse -dce -bdce -adce -die -simplifycfg -instcombine -strip-dead-prototypes -globaldce -constmerge -constprop -instnamer -domtree -instcombine -simple-types -stack-ptr-op-remove -type-conversions -idioms -instcombine -global-to-local -dead-global-assign -instcombine -stack-protect -phi2seq -o /home/ubuntu/mytest.c.backend.bc
Running phase: Initialization ( 0.00s )
Running phase: LLVM ( 0.01s )
Running phase: Providers initialization ( 0.01s )
Running phase: Input binary to LLVM IR decoding ( 0.03s )
Running phase: Assembly instruction optimization ( 0.05s )
Running phase: LLVM ( 0.06s )
Running phase: (Un)Volatilize optimization ( 0.06s )
Running phase: LLVM ( 0.06s )
Running phase: (Un)Volatilize optimization ( 0.06s )
Running phase: Control flow optimization ( 0.06s )
Running phase: Control flow function detection optimization ( 0.06s )
Running phase: Main function identification optimization ( 0.06s )
Running phase: Assembly register optimization ( 0.06s )
Running phase: Stack optimization ( 0.06s )
Running phase: Control flow optimization ( 0.06s )
Running phase: Conditional branch optimization ( 0.06s )
Running phase: Syscalls optimization ( 0.06s )
Running phase: Libgcc idioms optimization ( 0.06s )
Running phase: Constants optimization ( 0.06s )
Running phase: Function parameters and returns optimization ( 0.07s )
Running phase: Register localization optimization ( 0.07s )
Running phase: Data type conversions optimization ( 0.07s )
Running phase: Simple types recovery optimization ( 0.07s )
Running phase: Disassembly generation ( 0.07s )
Running phase: Assembly mapping instruction removal ( 0.07s )
Running phase: Selected functions optimization ( 0.07s )
Running phase: Unreachable functions optimization ( 0.07s )
Running phase: Data type conversions optimization ( 0.07s )
Running phase: Stack protection optimization ( 0.07s )
Running phase: LLVM ( 0.07s )
Running phase: Never-returning-functions optimization ( 0.07s )
Running phase: C++ adapter methods optimization ( 0.07s )
Running phase: C++ class hierarchy optimization ( 0.07s )
Running phase: LLVM ( 0.07s )
Running phase: Simple types recovery optimization ( 0.07s )
Running phase: Stack pointer operations optimization ( 0.07s )
Running phase: Data type conversions optimization ( 0.07s )
Running phase: Instruction idioms optimization ( 0.07s )
Running phase: LLVM ( 0.07s )
Running phase: Global to local optimization ( 0.07s )
Running phase: Dead global assign optimization ( 0.07s )
Running phase: LLVM ( 0.07s )
Running phase: Stack protection optimization ( 0.07s )
Running phase: Phi2Seq optimization ( 0.07s )
Running phase: LLVM ( 0.07s )
Running phase: Bitcode Writer ( 0.07s )
Running phase: Assembly Writer ( 0.08s )
Running phase: Cleanup ( 0.08s )
##### Decompiling /home/ubuntu/mytest.c.backend.bc into /home/ubuntu/mytest.c...
RUN: /usr/local/bin/llvmir2hll -target-hll=c -var-renamer=readable -var-name-gen=fruit -var-name-gen-prefix= -call-info-obtainer=optim -arithm-expr-evaluator=c -validate-module -llvmir2bir-converter=orig -o /home/ubuntu/mytest.c /home/ubuntu/mytest.c.backend.bc -enable-debug -emit-debug-comments -config-path=/home/ubuntu/mytest.c.json
Running phase: initialization ( 0.01s )
 -> creating the used HLL writer [c] ( 0.01s )
 -> creating the used alias analysis [simple] ( 0.01s )
 -> creating the used call info obtainer [optim] ( 0.01s )
 -> creating the used evaluator of arithmetical expressions [c] ( 0.01s )
 -> creating the used variable names generator [fruit] ( 0.01s )
 -> creating the used variable renamer [readable] ( 0.01s )
 -> creating the used LLVM IR to BIR converter [orig] ( 0.01s )
 -> creating the used semantics [libc,gcc-general,win-api] ( 0.01s )
 -> loading the input config ( 0.01s )
Running phase: conversion of LLVM IR into BIR ( 0.01s )
 -> ordering dependent PHI nodes ( 0.01s )
 -> converting main() ( 0.01s )
Running phase: removing functions prefixed with [__decompiler_undefined_function_] ( 0.01s )
Running phase: removing functions from standard libraries ( 0.01s )
Running phase: removing code that is not reachable in a CFG ( 0.01s )
Running phase: signed/unsigned types fixing ( 0.01s )
Running phase: converting LLVM intrinsic functions to standard functions ( 0.01s )
Running phase: obtaining debug information ( 0.01s )
Running phase: alias analysis [simple] ( 0.01s )
Running phase: optimizations [normal] ( 0.01s )
 -> running RemoveUselessCastsOptimizer ( 0.01s )
 -> running UnusedGlobalVarOptimizer ( 0.01s )
 -> running DeadLocalAssignOptimizer ( 0.01s )
 -> running SimpleCopyPropagationOptimizer ( 0.01s )
 -> running CopyPropagationOptimizer ( 0.01s )
 -> running AuxiliaryVariablesOptimizer ( 0.01s )
 -> running SimplifyArithmExprOptimizer ( 0.01s )
 -> running IfStructureOptimizer ( 0.01s )
 -> running LoopLastContinueOptimizer ( 0.01s )
 -> running PreWhileTrueLoopConvOptimizer ( 0.01s )
 -> running WhileTrueToForLoopOptimizer ( 0.01s )
 -> running WhileTrueToWhileCondOptimizer ( 0.01s )
 -> running IfBeforeLoopOptimizer ( 0.01s )
 -> running LLVMIntrinsicsOptimizer ( 0.01s )
 -> running VoidReturnOptimizer ( 0.01s )
 -> running BreakContinueReturnOptimizer ( 0.01s )
 -> running BitShiftOptimizer ( 0.01s )
 -> running DerefAddressOptimizer ( 0.01s )
 -> running EmptyArrayToStringOptimizer ( 0.01s )
 -> running BitOpToLogOpOptimizer ( 0.01s )
 -> running SimplifyArithmExprOptimizer ( 0.01s )
 -> running UnusedGlobalVarOptimizer ( 0.01s )
 -> running DeadLocalAssignOptimizer ( 0.01s )
 -> running SimpleCopyPropagationOptimizer ( 0.01s )
 -> running CopyPropagationOptimizer ( 0.01s )
 -> running SelfAssignOptimizer ( 0.01s )
 -> running VarDefForLoopOptimizer ( 0.01s )
 -> running VarDefStmtOptimizer ( 0.01s )
 -> running SimplifyArithmExprOptimizer ( 0.01s )
 -> running DeadCodeOptimizer ( 0.01s )
 -> running DerefToArrayIndexOptimizer ( 0.02s )
 -> running IfToSwitchOptimizer ( 0.02s )
 -> running CCastOptimizer ( 0.02s )
 -> running CArrayArgOptimizer ( 0.02s )
Running phase: variable renaming [readable] ( 0.02s )
Running phase: converting constants to symbolic names ( 0.02s )
Running phase: module validation ( 0.02s )
 -> running BreakOutsideLoopValidator ( 0.02s )
 -> running NoGlobalVarDefValidator ( 0.02s )
 -> running ReturnValidator ( 0.02s )
Running phase: emission of the target code [c] ( 0.02s )
Running phase: finalization ( 0.02s )
Running phase: cleanup ( 0.02s )
##### Done!
ubuntu@retdec:~$
The executable was called mytest, and retdec wrote the generated C source to a file named mytest.c, overwriting the original source code in the process.
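To avoid losing the original source this way, a small wrapper can refuse to run when the output file already exists. This is only a sketch: guard_decompile is a hypothetical name, and it assumes, based on the log above, that decompile.sh writes its result to the input name with .c appended.

```shell
# Runs decompile.sh on $1 only when "<$1>.c" does not exist yet.
guard_decompile() {
    out="$1.c"
    if [ -e "$out" ]; then
        echo "refusing to overwrite $out" >&2
        return 1
    fi
    decompile.sh "$1"
}

# Usage:
#   guard_decompile mytest
```

A simpler alternative is to give the executable a name that does not match the source file, for example gcc -m32 mytest.c -o mytest32, so that the decompiled output should land in a different file.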
Here is the decompiled source code again.
ubuntu@retdec:~$ cat mytest.c
//
// This file was generated by the Retargetable Decompiler
// Website: https://retdec.com
// Copyright (c) 2017 Retargetable Decompiler <info@retdec.com>
//

#include <stdint.h>
#include <stdio.h>

// ------------------------ Functions -------------------------

// Address range: 0x804840b - 0x804844e
int main(int argc, char ** argv) {
    // entry
    // branch -> 0x8048425
    for (uint32_t i = 0; i < (uint32_t)10; i++) {
        // 0x8048425
        printf("%d\n", i);
        // PHI copies at the loop end
        // loop 0x8048425 end
    }
    // 0x8048442
    return 0;
}

// --------------- Dynamically Linked Functions ---------------

// int printf(const char * restrict format, ...);

// --------------------- Meta-Information ---------------------

// Detected compiler/packer: gcc (4.7.2)
// Detected functions: 1
// Decompilation date: 2017-12-29 19:01:37
ubuntu@retdec:~$
Conclusion
retdec is probably one of the very few good open-source decompilers out there. There are no installation packages yet, but it should be easy to create a snap package for retdec based on the steps in this post.
There is no X86-64 target yet, although the developers plan to add one. Decompilation takes a lot of memory, so if you try to decompile a binary that is over 1MB, you may run out of memory. In any case, give it a try and happy decompilations!
1 comment
Looks interesting but it doesn’t support 64-bit Aarch (ARM v8) architecture. I haven’t found a decompiler yet that does.