数据结构课程设计哈夫曼编译器.docx

资源描述

数据结构课程设计哈夫曼编译器.docx

《数据结构课程设计哈夫曼编译器.docx》由会员分享，可在线阅读，更多相关《数据结构课程设计哈夫曼编译器.docx（20页珍藏版）》请在冰点文库上搜索。

数据结构课程设计哈夫曼编译器.docx

数据结构课程设计哈夫曼编译器

中南大学

数据结构课程设计报告

题目哈夫曼编译器

学生

指导教师

学院信息科学与工程学院

专业班级计科1302

实验要求……………………………………………3

问题描述……………………………………………3

问题解决方法………………………………………3

程序模块功能及流程图……………………………4

调试与测试…………………………………………8

测试结果……………………………………………9

心得体会……………………………………………11

源代码………………………………………………12

一．实验要求

（1）从键盘读入字符集大小n,以及n个字符和权值，建立哈夫曼树。

（2）利用已建好的哈夫曼树对文件正文进行编码，将结果存入相关文件中。

（3）利用已建好的哈夫曼树将编码文件中的代码进行译码，结果存入文件中。

（4）输出代码文件，以紧凑格式显示。

二．问题描述

利用哈夫曼编码进行通信可以大大提高信道利用率，缩短信息传输时间，降低传输成本。

这要求在发送端通过一个编码系统对待传数据预先编码，在接收端将传来的数据进行译码。

对于双向传输信息的信道，每端都需要一个完整的编译码系统。

为这样的信息收发站编写哈夫曼编译系统。

哈夫曼树又称最优二叉树，构造的规则即给定n个权值不同的叶子节点，构造一棵二叉树，使二叉树的带权路径长度达到最小。

具体做法即要使权值较大的结点离根节点较近，权值较小的结点离根节点较远。

三．问题解决方法

建立哈夫曼树时要进行多次选择，每次选择出权值最小和次小的两个节点,将两结点权值相加，作为新生成父节点的权值。

并分别将其作为左、右孩子。

再将父节点加入需选择的结点序列中，继续选择，直到将所有节点都选完为止，构成一颗哈夫曼树。

每种字符对应一个节点，将每种字符的出现次数作为对应节点权值。

在编码过程中，较科学的方法是统计文章中每种字符出现的频率，并以其作为对应节点的权值，使出现频率较高的节点离根结点较近，从而使出现频率越高的字符所得的编码位数越少，这样做得到的编码结果是最简练的，也更有利于译码。

编码需从叶节点向上回溯，若叶节点为其父结点的左孩子，则编码为0，若为右孩子，则编码为1。

然后将父节点作为下一轮循环的子节点，继续重复上述步骤，直至到达根节点为止，即得到初始叶节点对应的编码。

译码是编码的逆过程，所以译码只需读入编码位串，从根结点开始，若读到0，则走向左孩子，读到1，则走向右孩子。

并将对应的子节点作为下一轮循环的叶节点，重复上述步骤，直至到达最终叶节点，该叶节点即为编码对应的节点。

四．程序模块功能及流程图

1.主要程序模块及功能

（1）建立哈夫曼树

数据结构：

tree[]为定义在Huffmantree类上的数组对象。

n为节点个数，即字符种类数。

m为建好的哈夫曼树的总节点数，在哈夫曼树中，m=2*n-1。

Smal、small2分别存放每轮循环中权值最小和次小的节点的权值。

p1,p2分别记住每次合并时权值最小和次小的两个根结点的下标。

对应代码段：

for（i=0;i

tree[i]=newHuffmantree（）;

}

floatsmall1,small2;//建立哈夫曼树

for（i=0;i

{

tree[i].parent=0;

tree[i].lchild=-1;

tree[i].rchild=-1;

tree[i].weight=0;

}

for（i=0;i

{

tree[i].ch=ch[i];

tree[i].weight=arr[i];

}

for（i=n;i

{

p1=0;p2=0;

small1=10000;small2=100;

for（j=0;j

if（tree[j].parent==0）

if（tree[j].weight

{

small2=small1;

small1=tree[j].weight;

p2=p1;

p1=j;

}

else

if（tree[j].weight

{

small2=tree[j].weight;

p2=j;

}

tree[p1].parent=i;//建立子节点与父节点间的对应关系，并将父节点权值赋为两子节点权值之和

tree[p2].parent=i;

tree[i].lchild=p1;

tree[i].rchild=p2;

tree[i].weight=tree[p1].weight+tree[p2].weight;

}

（2）编码模块

数据结构：

Code[]为定义在codetype类上的数组对象。

c为缓冲变量，其值为当前节点的下标值。

p为父节点的下标值。

Start为每个字符编码位串中第一个字符的起始位置。

对应代码段：

intc,p;//编码部分，c为当前节点编号，p为其父节点编号

Code=newCodetype[n];

for（i=0;i

Code[i]=newCodetype（）;

Code[i].bits=newCharacter[n];

}

for（i=0;i

{

Code[i].start=n;//start为编码位串的起始位置

Code[i].ch=tree[i].ch;

c=i;

p=tree[i].parent;

while（p!

=0）

{

Code[i].start--;

if（tree[p].lchild==c）//向上回溯编码

Code[i].bits[Code[i].start]='0';

else

Code[i].bits[Code[i].start]='1';

c=p;

p=tree[p].parent;//将父节点作为下一轮循环的子节点

}

Code[i]=Code[i];

}

（3）译码模块

数据结构：

p为父节点编号。

t为待译码文件的字符数。

b[]为存放待译码文件容的数组。

ym存放译码结果。

对应代码段：

for（intq=0;q

if（b[q]=='0'）

p=tree[p].lchild;

else

p=tree[p].rchild;

if（tree[p].lchild==-1）

{

Stringym=tree[p].ch.toString（）;

fw1.write（ym）;

p=m-1;}}

（4）字符统计模块

数据结构：

len为文章中的字符数。

a[i]为存放文章容的数组。

Ch[j]存放不同种类的字符,开始里面所有字符都为0值。

arr[]存放每种字符在文章中出现的次数。

对应代码段：

for（inti=0;i

for（intj=0;j

if（a[i]==ch[j]）break;

else

if（j==n-1）{ch[n-1]=a[i];//若ch[]中找不到a[i]中存放的字符，则将该种字符放到ch[]中。

若找到，则说明该种字符已被存入ch[].

n++;

break;

}

}//初始化ch[],存放字符种类

for（inti=0;i

for（intj=0;j

{if（a[i]==ch[j]）

arr[j]++;//统计文章中每种字符的出现次数。

}

（5）Huffman类

publicclassHuffmantree{

publicintweight;//weight为节点的权值

publicintparent,lchild,rchild;//分别为当前节点的父节点，左、右子节点编号

publicCharacterch;//ch为节点名，即对应的字符。

publicHuffmantree（）{//初始化，每个节点构成一个单节点树，权值为0。

weight=0;

parent=0;

lchild=-1;

rchild=-1;

ch='0';

}

（5）codetype类

publicclassCodetype{

publicCharacterbits[];//一维数组，存放每个字符对应的编码位串

publicintstart;//start为每个字符位串的起始位置

publiccharch;

publicCodetype（）{

start=0;

ch=0;

}}}

2.流程图

将文件容读入数组

统计文件中字符的种类和出现次数

建立哈夫曼树

哈夫曼树编码

将编码容写入数组和文件

对编码文件进行译码

五．调试与测试

分别输入多篇文章进行测试，文章字符数由少到多。

在程序编写过程中，要应用的数组较多，数组的使用使原本难于实现的算法变得简单易行。

但因数组产生的问题也较多。

编码时因存放文章及其频率的数组定义长度较短，不能给较长的文章编码。

故要把相应数组长度改大一些。

输出时会因为数组长度不匹配的问题出现空字符，也要做相应的调整。

测试文章：

1.su.fv,y,uewgbu;i;fewiu!

2.WhenIwasalittlegirl,Idreamedtogrowup.BecauseIthinkachilddoesn'thasfreedom,andcan'tdoanythingbymyself.

ButnowIhavegrowup,tomysurprise,Ifeelmoretiredandhavemoreresponsibility.ThoughIcandosomethingmyself,Idon'tfeelhappyatall.

Ibelieveyoualsohavethesamethoughswithme.wheneveryuswasachild,wewantedtogrowup,butwhenwebecameaolderman,wedon'thavesuchnicelifeaswish.

Sowhateverwearechildrenoradults,weshouldtrytomakeourlifebetter,andmakeourselvesmorehappy.weshouldtryourbesttostudyhard,thenwecanletparentshavegoodlife,too!

doourbesttodoourself!

Believeyourself!

Youarethebest!

六．测试结果

七．心得体会

通过本次实验，我复习了数据结构中常见的一种结构——树形结构，本次实验对象是一种特殊的树结构，即哈夫曼树。

通过构造哈夫曼树，我熟练掌握了树的构建及其要素。

而编码和译码是在以了解树形结构的基础上，考验我的算法分析与设计能力。

而字符统计及文件连接又涉及到许多文件操作，这使我深入了解了java关于文件的库函数及操作语句。

这些提高了我在程序设计上的综合能力。

同时，本次实验也出现了一些问题如在数组、文件等操作上考虑不周，使程序运行结果不尽如人意。

但通过多次的调试及测试，我逐步改正了这些问题。

这使我认识到调试的重要性，即编写程序不仅要知道怎么实现，更要知道怎么找出错误并改正错误，这是很重要的一项技能。

8．源代码

主类

packageHuffman;

importjava.io.File;

importjava.io.FileReader;

importjava.io.FileWriter;

publicclassMain{

publicstaticHuffmantree[]tree;

publicstaticCodetype[]Code;

publicstaticvoidmain（String[]args）throwsException{

floatlen;

intn=1;

int[]sum=newint[50000];

char[]ch=newchar[50000];

Filefile=newFile（"d:

\\原文件.txt"）;

FileReaderfr=newFileReader（file）;

char[]a=newchar[（int）file.length（）];

fr.read（a）;

fr.close（）;

len=a.length;//len为文件长度，n为字符种类数

for（inti=0;i

for（intj=0;j

if（a[i]==ch[j]）break;

else

if（j==n-1）{ch[n-1]=a[i];

n++;

break;

}

}//初始化ch[],存放字符种类

System.out.println（"文件中容如下:

"）;

for（intu=0;u

{System.out.print（a[u]）;

}

System.out.println（"\n"）;

for（inti=0;i

for（intj=0;j

{if（a[i]==ch[j]）

sum[j]++;

}

System.out.println（"文件中各字符及其出现次数如下:

"）;

for（inti=0;i

System.out.println（ch[i]+":

"+sum[i]）;

}

inti,j,p1,p2,x;

n--;

intm=n*2-1;

tree=newHuffmantree[m];

for（i=0;i

tree[i]=newHuffmantree（）;

}

floatsmall1,small2;//建立哈夫曼树

for（i=0;i

{

tree[i].parent=0;

tree[i].lchild=-1;

tree[i].rchild=-1;

tree[i].weight=0;

}

for（i=0;i

{

tree[i].ch=ch[i];

tree[i].weight=sum[i];

}

for（i=n;i

{

p1=0;p2=0;

small1=10000;small2=100;

for（j=0;j

if（tree[j].parent==0）

if（tree[j].weight

{

small2=small1;

small1=tree[j].weight;

p2=p1;

p1=j;

}

else

if（tree[j].weight

{

small2=tree[j].weight;

p2=j;

}

tree[p1].parent=i;

tree[p2].parent=i;

tree[i].lchild=p1;

tree[i].rchild=p2;

tree[i].weight=tree[p1].weight+tree[p2].weight;

}

intc,p;//编码部分

Code=newCodetype[n];

for（i=0;i

Code[i]=newCodetype（）;

Code[i].bits=newCharacter[n];

}

for（i=0;i

{

Code[i].start=n;

Code[i].ch=tree[i].ch;

c=i;

p=tree[i].parent;

while（p!

=0）

{

Code[i].start--;

if（tree[p].lchild==c）

Code[i].bits[Code[i].start]='0';

else

Code[i].bits[Code[i].start]='1';

c=p;

p=tree[p].parent;

}

Code[i]=Code[i];

}

System.out.println（"每种字符的编码结果如下：

"）;

for（i=0;i

System.out.print（Code[i].ch+":

"）;

for（intr=Code[i].start;r

System.out.print（Code[i].bits[r]）;

System.out.println（""）;

}

FileWriterfw=newFileWriter（"d:

\\编码文件.txt"）;

for（intk=0;k

for（intl=0;l

if（a[k]==Code[l].ch）{

for（inth=Code[l].start;h

Stringbm=Code[l].bits[h].toString（）;

fw.write（bm）;

}

}}

fw.close（）;

Filefile1=newFile（"d:

\\编码文件.txt"）;

FileReaderfr1=newFileReader（file1）;

char[]b=newchar[（int）file1.length（）];//将编码后的文件读入数组b[]

fr1.read（b）;

fr1.close（）;

intt=b.length;

p=m-1;//根节点

FileWriterfw1=newFileWriter（"d:

\\译码文件.txt"）;

for（intq=0;q

if（b[q]=='0'）

p=tree[p].lchild;

else

p=tree[p].rchild;

if（tree[p].lchild==-1）

{

Stringym=tree[p].ch.toString（）;

fw1.write（ym）;

p=m-1;

}

}fw1.close（）;

}

Huffmantree类

publicclassHuffmantree{

publicintweight;

publicintparent,lchild,rchild;

publicCharacterch;

publicHuffmantree（）{

weight=0;

parent=0;

lchild=-1;

rchild=-1;

ch='0';

}}

codetype类

publicclassCodetype{

publicCharacterbits[];

publicintstart;

publiccharch;

publicCodetype（）{

start=0;

ch=0;

}}

结果和注释分别见测试结果和程序模块功能部分。

展开阅读全文