Decompiling Android.pdf

For your convenience Apress has placed some of the front
matter material after the index. Please use the Bookmarks
and Contents at a Glance links to access them.
Contents at a Glance
About the Author
About the Technical Reviewer
Acknowledgments
Preface
Chapter 1: Laying the Groundwork
Chapter 2: Ghost in the Machine
Chapter 3: Inside the DEX File
Chapter 4: Tools of the Trade
Chapter 5: Decompiler Design
Chapter 6: Decompiler Implementation
Chapter 7: Hear No Evil, See No Evil: A Case Study
Appendix A: Opcode Tables
Index
Chapter 1: Laying the Groundwork
To begin, in this chapter I introduce you to the problem with decompilers and
why virtual machines and the Android platform in particular are at such risk. You
learn about the history of decompilers; it may surprise you that they’ve been
around almost as long as computers. And because this can be such an emotive
topic, I take some time to discuss the legal and moral issues behind
decompilation. Finally, you’re introduced to some of the options open to you if you
want to protect your code.
Compilers and Decompilers
Computer languages were developed because most normal people can’t work
in machine code or its nearest equivalent, Assembler. Fortunately, people
realized pretty early in the development of computing technology that humans
weren’t cut out to program in machine code. Computer languages such as
Fortran, COBOL, C, VB, and, more recently, Java and C# were developed to
allow us to put our ideas in a human-friendly format that can then be converted
into a format a computer chip can understand.
At its most basic, it’s the compiler’s job to translate this textual representation or
source code into a series of 0s and 1s or machine code that the computer can
interpret as actions or steps you want it to perform. It does this using a series of
pattern-matching rules. A lexical analyzer tokenizes the source code, and any
mistakes or words that aren’t in the compiler’s lexicon are rejected. These
tokens are then passed to the language parser, which matches one or more
tokens to a series of rules and translates the tokens into intermediate code
(VB.NET, C#, Pascal, or Java) or sometimes straight into machine code
(Objective-C, C++, or Fortran). Any source code that doesn’t match a compiler’s
rules is rejected, and the compilation fails.
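To make the tokenize-then-parse pipeline concrete, here is a minimal sketch of a toy compiler for arithmetic expressions. Everything in it is an assumption made for illustration: the class name TinyCompiler, the three-rule grammar, and the PUSH/ADD/MUL stack instructions are hypothetical and don't correspond to any real compiler or to Dalvik bytecode. The lexical analyzer rejects characters outside its lexicon, and the parser rejects token sequences that match none of its rules.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy compiler. The grammar and the PUSH/ADD/MUL instruction
// set are illustrative assumptions, not taken from any real compiler.
class TinyCompiler {

    // Lexical analysis: split the source into number, operator, and
    // parenthesis tokens. Anything outside the lexicon is rejected.
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < source.length() && Character.isDigit(source.charAt(i))) i++;
                tokens.add(source.substring(start, i));
            } else if ("+*()".indexOf(c) >= 0) {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("Unknown character '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    private final List<String> tokens;
    private final List<String> code = new ArrayList<>();
    private int pos = 0;

    private TinyCompiler(List<String> tokens) { this.tokens = tokens; }

    // Parsing: match tokens against the grammar rules below and emit
    // intermediate code for a stack machine.
    //   expr   := term ('+' term)*
    //   term   := factor ('*' factor)*
    //   factor := NUMBER | '(' expr ')'
    static List<String> compile(String source) {
        TinyCompiler c = new TinyCompiler(tokenize(source));
        c.expr();
        if (c.pos != c.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + c.tokens.get(c.pos));
        }
        return c.code;
    }

    private void expr() {
        term();
        while (peek("+")) { pos++; term(); code.add("ADD"); }
    }

    private void term() {
        factor();
        while (peek("*")) { pos++; factor(); code.add("MUL"); }
    }

    private void factor() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("Unexpected end of input");
        String t = tokens.get(pos++);
        if (t.equals("(")) {
            expr();
            if (!peek(")")) throw new IllegalArgumentException("Expected ')'");
            pos++;
        } else if (Character.isDigit(t.charAt(0))) {
            code.add("PUSH " + t);
        } else {
            throw new IllegalArgumentException("Unexpected token: " + t);
        }
    }

    private boolean peek(String expected) {
        return pos < tokens.size() && tokens.get(pos).equals(expected);
    }

    public static void main(String[] args) {
        // Prints [PUSH 1, PUSH 2, PUSH 3, MUL, ADD]
        System.out.println(compile("1 + 2 * 3"));
    }
}
```

Note that the emitted code already encodes operator precedence in the order of its instructions, so the whitespace and any parentheses in the source are no longer needed and aren't kept.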
Now you know what a compiler does, but I’ve only scratched the surface.
Compiler technology has always been a specialized and sometimes complicated
area of computing. Modern advances mean things are going to get even more
complicated, especially in the virtual machine domain. In part, this drive comes
from Java and .NET. Just-in-time (JIT) compilers have tried to close the gap
between Java and C++ execution times by optimizing the execution of Java
bytecode. This seems like an impossible task, because Java bytecode is, after
all, interpreted, whereas C++ is compiled. But JIT compiler technology is making
significant advances and also making Java compilers and virtual machines
much more complicated beasts.
Most compilers do a lot of preprocessing and post-processing. The
preprocessor readies the source code for the lexical analysis by stripping out all
unnecessary information, such as the programmer’s comments, and adding any
standard or included header files or packages. A typical post-processor stage is
code optimization, where the compiler parses or scans the code, reorders it,
and removes any redundancies to increase the efficiency and speed of your
code.
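As a rough sketch of these two stages, the following example strips line comments before lexical analysis and then folds constant arithmetic after code generation. The class and method names, and the PUSH/ADD/MUL stack instructions, are hypothetical and chosen only for illustration; real preprocessors and optimizers do far more, and this comment stripper deliberately ignores cases such as "//" inside a string literal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a preprocessing and a post-processing stage.
// The PUSH/ADD/MUL instruction set is an illustrative assumption.
class PrePostProcessing {

    // Preprocessing: remove "//" line comments and blank lines.
    // Simplification: does not handle "//" inside string literals.
    static String stripLineComments(String source) {
        StringBuilder out = new StringBuilder();
        for (String line : source.split("\n")) {
            int idx = line.indexOf("//");
            String kept = (idx >= 0 ? line.substring(0, idx) : line).trim();
            if (!kept.isEmpty()) out.append(kept).append('\n');
        }
        return out.toString();
    }

    // Post-processing: fold "PUSH a, PUSH b, ADD|MUL" into one "PUSH result".
    // This is valid for straight-line stack code, where two consecutive
    // PUSH instructions are always the top two stack values.
    static List<String> foldConstants(List<String> code) {
        List<String> out = new ArrayList<>();
        for (String instr : code) {
            int n = out.size();
            boolean isOp = instr.equals("ADD") || instr.equals("MUL");
            if (isOp && n >= 2
                    && out.get(n - 1).startsWith("PUSH ")
                    && out.get(n - 2).startsWith("PUSH ")) {
                long b = Long.parseLong(out.remove(n - 1).substring(5));
                long a = Long.parseLong(out.remove(n - 2).substring(5));
                out.add("PUSH " + (instr.equals("ADD") ? a + b : a * b));
            } else {
                out.add(instr);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints the two statements without their comments.
        System.out.print(stripLineComments("int x = 1; // counter\n// whole line\nint y = 2;\n"));
        // Prints [PUSH 7]
        System.out.println(foldConstants(List.of("PUSH 1", "PUSH 2", "PUSH 3", "MUL", "ADD")));
    }
}
```

Both stages throw information away: the comments never reach the compiler proper, and after folding, nothing in the output records that 7 was once written as 1 + 2 * 3.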
Decompilers (no big surprise here) translate the machine code or intermediate
code back into source code. In other words, the whole compiling process is
reversed. Machine code is tokenized in some way and parsed or translated back
into source code. This transformation rarely results in the original source code,
though, because information is lost in the preprocessing and post-processing
stages.
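The reverse direction can be sketched in the same toy setting. The example below assumes a hypothetical stack-machine instruction set with only PUSH n, ADD, and MUL (it is not Dalvik or Java bytecode), and rebuilds an infix expression by symbolically executing the instructions. The class name TinyDecompiler is likewise made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical toy decompiler for an assumed PUSH/ADD/MUL stack code.
class TinyDecompiler {

    // Rebuild source text by running the code on a stack of expression
    // strings instead of a stack of numbers.
    static String decompile(List<String> code) {
        Deque<String> stack = new ArrayDeque<>();
        for (String instr : code) {
            if (instr.startsWith("PUSH ")) {
                stack.push(instr.substring(5));
            } else if (instr.equals("ADD") || instr.equals("MUL")) {
                if (stack.size() < 2) {
                    throw new IllegalArgumentException("Stack underflow at " + instr);
                }
                String right = stack.pop();
                String left = stack.pop();
                String op = instr.equals("ADD") ? "+" : "*";
                stack.push("(" + left + " " + op + " " + right + ")");
            } else {
                throw new IllegalArgumentException("Unknown instruction: " + instr);
            }
        }
        if (stack.size() != 1) {
            throw new IllegalArgumentException("Malformed code: " + stack.size() + " values left");
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // Prints (1 + (2 * 3))
        System.out.println(decompile(List.of("PUSH 1", "PUSH 2", "PUSH 3", "MUL", "ADD")));
        // Prints 7
        System.out.println(decompile(List.of("PUSH 7")));
    }
}
```

The recovered text is equivalent to the source but not identical to it: the decompiler has to guess at parentheses and spacing, comments are gone, and if an optimizer had already reduced the expression to PUSH 7, only the constant 7 can be recovered.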
Consider an analogy with human languages: decompiling an Android package
file (APK) back into Java source is like translating German (classes.dex) into
French (Java class file) and then into English (Java source). Along the way, bits
of information are lost in translation. Java source code is designed for humans
and not computers, and often some steps are redundant or can be performed
more quickly in a slightly different order. Because of these lost elements, few (if
any) decompilations result in the original source.
A number of decompilers are currently available, but they aren’t well publicized.
Decompilers or disassemblers are available for Clipper (Valkyrie), FoxPro (ReFox
and Defox), Pascal, C (dcc, decomp, Hex-Rays), Objective-C (Hex-Rays), Ada,
and, of course, Java. Even the Newton, loved by Doonesbury aficionados
everywhere, isn’t safe. Not surprisingly, decompilers are much more common
for interpreted languages such as VB, Pascal, and Java because of the larger
amounts of information being passed around.