Monday, October 23, 2017

Pwn2Win 2017 - Shift Register

Disclaimer: I am not an electronics engineer. I just play one on Twitter.
A lot of the following might be heresy to someone who ever actually dealt with silicon.

This Saturday, I woke up to the following on the Dragon Sector IRC channel:

<@valis> <@thotypous> where is q3k to solve my shift register?

Well okay, if you insist :D.

Introduction

Let's take a look at the task description:
ButcherCorp has developed an extremely reliable radiation-hardened IC packaging process. Its only drawback is requiring the ASIC to be fabricated using a specific 0.35μm technology, considered outdated by today's standards. The intel we gathered suggests that their standard cell library is based on osu035.One of our operatives managed to exfiltrate a GDSII file ButcherCorp has sent to a silicon fab. Apparently, it is the IC responsible for validating the launching code in their latest rocket engine.The IC samples the in pin at the clock's positive edge. If the correct sequence of bits is presented to the IC, it drives the unlocked pin to a high logical level. Find this sequence. It forms an ASCII string which you should submit as a flag.

So, we are given artwork of lithography masks for a VLSI ASIC (a.k.a. integrated circuit) fabricated in AMI 0.35µm technology using the Oklahoma State University OSU035 open source flow. The chip is a simple three-terminal device: a clock and in pin to receive a bitstream of a flag, and an unlocked pin that goes high if the bitstream is correct. We have to reverse engineer it to find the key.

The file is in GDSII/GDS2 format - an industry standard file format for exchange of lithography masks of integrated circuits, mostly used as en exchange format between EDA suites and as an output format given to silicon fabs by hardware designers.

The challenge ASIC is fabricated in Standard Cell technology - this means, in very simplified terms, that the software that converts a digital logic definition file (in RTL form) has a bunch of different 'cells' to choose from (ie. AND, NOT, D-Flip-flop, etc...), which it places in a grid and then connects with metal layers to form the final integrated circuit. The library of standard cells used in this task is OSU035.

A quick refresher on the physical topology of a VLSI circuit:
By Cepheiden, CC BY 2.5, via Wikimedia Commons
What we'll bee mostly looking at today is the 'BEOL' (back-end of line) section of the chip - the metal interconnect section that wires together the silicon (in the FEOL) of the standard cells together. These metal interconnects consist of a number of planes/layers with vias that connect them together and tungsten pins that connect them to the silicon of the lower layers.

Or, from a different perspective:
By David Caron, CC BY 3.0, via Wikipedia
The yellow part are the metal interconnects.

Let's load the mask in KLayout, an open source GDSII viewer:


Well, that's a bit of a soup. Let's copy over layer name/number mappings from osu_stdcells/lib/ami035/abstract/ami035_layers.map and reorganize them a bit. And zoom in a bit.


We can see the instantiated cells (for example, three 'LDCZ' cells on the top left) represented as boxes with their contents hidden (they are metal 1, pins and silicon layers), and global metal 1-4 layers and vias that join those layers together (visible). If we change display settings, we can peek into some of those standard cells and see how they're implemented.

Here we can see instances of the 'CNLZ' standard cell metal 1 layer (in red), instantiated multiple times into the design. Some of the instances which are flipped vertically (to fit them across power rails in the design) or horizontally (to better route metal signals).

But what is the logical function of this CNLZ cell? We could go about reverse engineering it at gate-level (by inspecting its' silicon layers). However, if we open osu035_stdcells.gds2 from the OSU035 files, we can see that it's in reality AND2X2 (a 2-to-1 AND gate), just renamed and oriented differently during instantiation.


We can now spend around half an hour and map the obfuscated cell names from the challenge GDSII into OSU035 cell names and their logical functions from the cell definition file. I also took the time to note down coordinate of the pins (i.e. A, B, Y, D, Q, CLK ...) and their names, so that we can map the I/O of the cells later on.


Now, here comes the hard part. We know what cells are used, and where. But we also want to know how they're connected, so we can reverse engineer the logic of the chip.

What we need is a netlist - a description of all the electrical nets in the design and how they connect all the logic cells together. There's a few ways to go about this:
Since I had no experience with any IC design EDA tools, I decided for the last option and wrote my own netlist extraction tool from scratch. I ended up with some pretty craptacular code, but it did the trick. So, without further ado, here's

How To Perform Netlist Extraction In Three Easy Steps

Step 1 - combine GDSII rectangles (boundaries) in all metal layers into 'segments'

In GDSII, every layer is just represented by a collection of rectangles (also called boundaries). There's no information on electrical connectivity, just a graphical representation in the form of boxes/boundaries of silicon layers, metal layers and via layers.

Our first job is to turn these into collections of rectangles on metal layers into objects that represent an electrically connected trace, which I called 'segment'.

Here's how I did it:
  • I loaded the GDSII file, transformed every metal rectangle into a slightly larger bounding box (because I wanted to use bounding box collision detection to connect metal boundaries that might be just slightly touching).
  • For every cell/structure of the design, I instantiated the structure's metal rectangles as well, after transforming them accordingly to flips/rotations.
  • For every metal layer of the design:
    • For every metal bounding box created above, I created a new Segment that's made up just of this metal bounding box.
    • For every segment, I try to find all other segments that touch it, and if so, merge the segments together. I repeated this until no more segments got combined.
      • This is the slowest part of the entire process. I tried to use a quadtree, but failed miserably.
This makes us end up with a set of segments per layer, where each segment is an electrically connected set of metal traces. We can then visualize all of the 4 metal layers separately, displaying each one of them in a different colour to check that they got merged correctly:


layer 49 - metal 1

layer 31 - metal 4
Looks promising.

Step 2 - combine segments together into nets that can span multiple layers

Now we want to start combining those segments across layers, if they connect via vias.

This is actually pretty easy. Here's my approach:
  • For every segment of every layer I wrapped it into a new Net. These nets, apart from containing rectangles, also have:
    • a name (autogenerated or using a predefined name if it starts at a know point, eg. one of the I/O pins)
    • a set of vias they go through (initially empty)
  • For every via in the design:
    • I found all rectangles that connect to it (on the neighbouring metal layers) and the nets in which they're contained. I then appended that via to the nets' via set.
  • I then join all nets by their common vias:
    • For every net in the design:
      • For every via in this net:
        • I merge the net with all other nets that contain this via.
By the end of this, we have a set of Nets that are mad up of electrically connected rectangles and vias, that span multiple metal layers. We can now render all of the nets to have an overview of all the metal layers in the chip:


We can finally start making some sense of the chip. For instance, the repeating horizontal and thick vertical traces are power rails. However, we're still just dealing with passive electrical connections, and no logic. Let's work on this next.

Step 3 - map nets to cells

We now want to create a representation of all the logic cells in this design, with their I/O pins mapped to nets. Here's how I went about this:
  • For every cell of the design, I calculated the absolute coordinates of the I/O pins that I mapped out earlier.
    • For every I/O pin, I find a net that contains it on a rectangle in metal layer 1. I map this net as that pin for this cell. I also back-map this the other way around - what pins on what cells is a given net connected to.
This leaves us with nice (albeit not super easy to traverse from a programming point of view) graph of nets and cells, connected together by the cells' I/O pins. We can generate our first netlist from this:


Hooray! We could go to town on this and feed it into a Verilog simulator and try to blackbox this netlist as is. But I used some extra automation to make my life easier.

Netlist Analysis

The first thing I did was prune all of the buffer cells from the netlist. The code is quite simple, I just find all buffers in the design and merge nets on the input and output:


This makes our netlist much nicer. Nice enough, in fact, to notice a few key facts:
  • All the D-flip-flops are clocked from the outside clock line synchronously
  • All the D-flip-flops are daisy-chained together via AND gates.
  • All the only other use of the D-flip-flops data outputs, apart from daisychaining, is to drive a large combinatorial network of logic gates that outputs the 'unlocked' signal.

This means that we're basically just dealing with a single large shift register fed from the 'in' and 'clock' lines and a bunch of combinatorial logic that checks its' state.

I then performed some extra analysis of the cells to find the actual stages of the shift register. I started from the 'in' signal, and looked for the next AND gate (sometimes implemented via INV/NOR gates) that would then feed the next flip-flop, which would then feed the next AND gate, etc:


This found 320 stages (all the D-flops in the design), which hints at a 40-byte ASCII encoded flag as the key.

The last and final step was to reverse engineer the combinatorial logic that checks the shift register outputs. Of course, I also automated that part. I traversed from the 'unlocked' net down recursively until I found either more gates, or one of the shift register stages (which I now knew to which bit of the key they corresponded). This let me build an AST of the expression that calculates the 'unlocked' value as a function of the shift register data.


Finally, I fed this AST (first converted back into assignment expressions) to the Z3 SMT solver to get the flag.


Summary

Extremely fun task, and mad props to its' author. I had a lot of tun writing the tooling from scratch over the weekend and highly recommend it as a practical exercise in algo coding. I will definitely take a closer look at EDA tools for VLSI, though :).

I definitely hope to see more challenges like these at other CTFs.