
Using libxml2

吴启福　2006-2007

wqf363@

Libxml2 is the XML C parser and toolkit developed for the Gnome project (but usable outside of the Gnome platform); it is free software available under the MIT License.

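Getting started

Data types:

xmlChar replaces char; it is a one-byte string type that uses UTF-8 encoding.

If your data uses another encoding, it must be converted to UTF-8 before it can be passed to libxml functions.

xmlDoc holds the tree structure built by parsing a document; xmlDocPtr is a pointer to that structure.

xmlNode is the structure that holds a single node.

xmlNodePtr is a pointer to that structure; it is used to traverse the document tree.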

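Advantages:

1. It is fairly simple to install and use, and easy to get started with.
2. It supports many encodings and handles Chinese text well (using a very simple encoding-conversion function).
3. It supports XPath, which is very useful for locating arbitrary nodes in an XML document.
4. It supports both well-formedness checking and validation. Specifically, DTD validation is supported, while Schema validation was still being completed at the time of writing (most parsers then did not yet fully support Schema validation).
5. It supports the common DOM and SAX parsing styles, among others.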

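Disadvantages:

1. There are many pointers, and misusing them causes errors, which on Linux typically show up as segmentation faults; careless memory management likewise leads easily to memory leaks.
2. In my opinion some functions are not designed very well. For example, the function that returns a node's XPath does not include the node's attributes, so in some cases the resulting path does not locate the node precisely.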

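The best manual for learning libxml2 is the developer documentation provided by the official developers and shipped with libxml2-devel-2.6.19; running rpm -q -d libxml2 prints the paths of the documentation files.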

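About XML

Before studying the LibXML2 library, let us first review the basics of XML.

XML is a text-based format for creating structured data that can be accessed from a variety of languages and platforms.

It consists of a series of HTML-like tags arranged in a tree structure.

For example, see the simple document in Listing 1.

It is a simplified version of the configuration file example examined in the configuration file section.

It has been simplified to show the general concepts of XML more clearly.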

Listing 1. A simple XML file

<?xml version="1.0" encoding="UTF-8"?>
<files>
  <owner>root</owner>
  <action>delete</action>
  <age units="days">10</age>
</files>

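The first line of Listing 1 is the XML declaration. It tells the application that processes the XML, that is, the parser, which version of XML it is about to handle.

Most files are written in version 1.0, although a small number use version 1.1.

The declaration also defines the encoding used.

Most files use UTF-8, but XML is designed to integrate data from many languages, including those that do not use the English alphabet.

Next come the elements.

An element begins with a start tag (such as <files>) and ends with an end tag (such as </files>), where a slash (/) distinguishes the end tag from the start tag.

An element is one type of Node.

The XML Document Object Model (DOM) defines several different types of Nodes, including Elements (such as files or age), Attributes (such as units), and Text (such as root or 10).

Elements can have child nodes.

For example, the age element has one child, the text node 10.

The files element has seven child nodes.

Three of them are obvious.

They are the three child elements:

owner, action, and age.

The other four are the whitespace text nodes before and after those elements.

An XML parser can use this parent-child structure to traverse the document, and even to modify its structure or content.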

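LibXML2 is one such parser, and the sample applications in this article use this structure for exactly that purpose.

There are many different parsers and libraries for different environments.

LibXML2 is one of the best parsers and libraries for the UNIX environment, and it has been extended to support several scripting languages, such as Perl and Python.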

1 tree

/*******************************************
 * compile: gcc tree1.c -I/usr/include/libxml2/ -lxml2
 * usage: create a xml tree
 *******************************************/
#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

int main(int argc, char **argv)
{
    xmlDocPtr doc = NULL;   /* document pointer */
    xmlNodePtr root_node = NULL, node = NULL, node1 = NULL; /* node pointers */

    // Creates a new document, a node and set it as a root node
    doc = xmlNewDoc(BAD_CAST "1.0");
    root_node = xmlNewNode(NULL, BAD_CAST "root");
    xmlDocSetRootElement(doc, root_node);

    // xmlNewChild() creates a new node, which is "attached" as child node of root_node.
    xmlNewChild(root_node, NULL, BAD_CAST "node1", BAD_CAST "content of node 1");

    // xmlNewProp() creates attributes, which are "attached" to a node.
    node = xmlNewChild(root_node, NULL, BAD_CAST "node3", BAD_CAST "node has attributes");
    xmlNewProp(node, BAD_CAST "attribute", BAD_CAST "yes");

    // Here goes another way to create nodes.
    node = xmlNewNode(NULL, BAD_CAST "node4");
    node1 = xmlNewText(BAD_CAST "other way to create content");
    xmlAddChild(node, node1);
    xmlAddChild(root_node, node);

    // Dumping document to stdio or file
    xmlSaveFormatFileEnc(argc > 1 ? argv[1] : "-", doc, "UTF-8", 1);

    /* free the document */
    xmlFreeDoc(doc);
    xmlCleanupParser();
    xmlMemoryDump();    // debug memory for regression tests
    return (0);
}

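Generated XML:

[denny@localhost xml]$ gcc -I/usr/include/libxml2/ -lxml2 tree1.c
[denny@localhost xml]$ ./a.out
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <node1>content of node 1</node1>
  <node3 attribute="yes">node has attributes</node3>
  <node4>other way to create content</node4>
</root>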

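Execution sequence:

1. Declare pointers: a document pointer (xmlDocPtr) and node pointers (xmlNodePtr).
2. Create the document doc: xmlNewDoc
3. Create the root node root_node: xmlNewDocNode or xmlNewNode
4. Bind the root node to the document: xmlDocSetRootElement
5. Node operations
   1) Create a child node: xmlNewChild or xmlNewNode
   2) Set a node attribute: xmlNewProp
   3) Set a node value: xmlNewText, xmlNewChild, xmlAddChild
6. Free memory: xmlFreeDoc, xmlMemoryDump
7. Library initialization and cleanup: LIBXML_TEST_VERSION, xmlCleanupParser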

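2 parse

For an application, the first step in reading an XML file is to load the data and parse it into a Document object.

From there, the DOM tree can be traversed to reach specific nodes.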

/*******************************************
 * compile: gcc tree2.c -o tree2 -I/usr/include/libxml2/ -lxml2
 * usage: tree2 filename_or_URL
 *******************************************/
#include <stdio.h>
#include <stdlib.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

#ifdef LIBXML_TREE_ENABLED

/* Recursively print the name of every element node, starting at a_node
 * and continuing through its siblings and their descendants. */
static void
print_element_names(xmlNode *a_node)
{
    xmlNode *cur_node = NULL;

    for (cur_node = a_node; cur_node; cur_node = cur_node->next) {
        if (cur_node->type == XML_ELEMENT_NODE) {
            printf("node type: Element, name: %s\n", cur_node->name);
        }
        print_element_names(cur_node->children);
    }
}

/**
 * Simple example to parse a file called "file.xml",
 * walk down the DOM, and print the name of the
 * xml elements nodes.
 */
int
main(int argc, char **argv)
{
    xmlDoc *doc = NULL;
    xmlNode *root_element = NULL;

    if (argc != 2)
        return (1);

    // LIBXML_TEST_VERSION

    /* parse the file and get the DOM */
    doc = xmlReadFile(argv[1], NULL, 0);
    if (doc == NULL) {
        printf("error: could not parse file %s\n", argv[1]);
        return (1);     /* nothing to walk or free */
    }

    /* Get the root element node */
    root_element = xmlDocGetRootElement(doc);
    print_element_names(root_element);

    /* free the document */
    xmlFreeDoc(doc);
    // xmlCleanupParser();
    return 0;
}

#else

int main(void) {
    fprintf(stderr, "Tree support not compiled in\n");
    exit(1);
}

#endif

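Execution sequence:

1. Declare pointers: a document pointer (xmlDocPtr) and node pointers (xmlNodePtr).
2. Obtain the document doc: xmlReadFile
3. Obtain the root node root_node: xmlDocGetRootElement
4. Node operations:
   1) Get a node's value: xmlNodeGetContent (release the returned string with xmlFree)
   2) Traversal:
      First child of a node: xmlNodePtr->children
      Node name: xmlNodePtr->name
      Next sibling on the same level: xmlNodePtr->next
5. Free memory: xmlFreeDoc, xmlFree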

3 reader & writer

For large XML files, use the dedicated xmlreader and xmlwriter interfaces. Reading and writing are handled separately, which improves efficiency.

(writer) Writing an XML file with different APIs:

(The four function interfaces below use the writer in four different ways.)

void testXmlwriterFilename(const char *uri);
void testXmlwriterMemory(const char *file);
void testXmlwriterDoc(const char *file);
void testXmlwriterTree(const char *file);

4 xpath & I/O

5 API Menu

5.1) Loading documents

5.1.1) Loading from a file (file I/O)

// parse an XML file from the filesystem or the network.
xmlDocPtr xmlReadFile(const char *filename,
                      const char *encoding,
                      int options)

// parse an XML document from I/O functions and source and build a tree
xmlDocPtr xmlReadIO(xmlInputReadCallback ioread,
                    xmlInputCloseCallback ioclose,
                    void *ioctx,
                    const char *URL,
                    const char *encoding,
                    int options)

// parse an XML file and build a tree. Automatic support for ZLIB/Compress compressed document is provided by default if found at compile-time. In the case the document is not Well Formed, a tree is built anyway
xmlDocPtr xmlRecoverFile(const char *filename)

5.1.2) DOM (in memory)

// parse an XML in-memory document and build a tree.
xmlDocPtr xmlReadMemory(const char *buffer,
                        int size,
                        const char *URL,
                        const char *encoding,
                        int options)

// parse an XML in-memory document and build a tree. In the case the document is not Well Formed, a tree is built anyway
xmlDocPtr xmlRecoverDoc(xmlChar *cur)

// parse an XML in-memory block and build a tree. In the case the document is not Well Formed, a tree is built anyway
xmlDocPtr xmlRecoverMemory(const char *buffer,
                           int size)

5.1.3) from parse

// Creates a new XML document
xmlDocPtr xmlNewDoc(const xmlChar *version)

xmlNodePtr xmlNewDocNode(xmlDocPtr doc,
                         xmlNsPtr ns,
                         const xmlChar *name,
                         const xmlChar *content)

// parse an XML file and build a tree. Automatic support for ZLIB/Compress compressed document is provided by default if found at compile-time.
xmlDocPtr xmlParseFile(const char *filename)

5.2) Freeing and saving document content

// Dump the current DOM tree into memory using the character encoding specified by the caller. Note it is up to the caller of this function to free the allocated memory with xmlFree().
void xmlDocDumpMemoryEnc(xmlDocPtr out_doc,
                         xmlChar **doc_txt_ptr,
                         int *doc_txt_len,
                         const char *txt_encoding)

// Dump an XML document to a file. Will use compression if compiled in and enabled. If @filename is "-" the stdout file is used.
int xmlSaveFile(const char *filename,
                xmlDocPtr cur)

// Dump an XML document, converting it to the given encoding
int xmlSaveFileEnc(const char *filename,
                   xmlDocPtr cur,
                   const char *encoding)

// Dump an XML document to an I/O buffer. Warning! This calls xmlOutputBufferClose() on buf, which is not available after this call.
int xmlSaveFileTo(xmlOutputBufferPtr buf,
                  xmlDocPtr cur,
                  const char *encoding)

int xmlSaveFormatFile(const char *filename,
                      xmlDocPtr cur,
                      int format)

// Dump an XML document to a file or an URL.
int xmlSaveFormatFileEnc(const char *filename,
                         xmlDocPtr cur,
                         const char *encoding,
                         int format)

int xmlSaveFormatFileTo(xmlOutputBufferPtr buf,
                        xmlDocPtr cur,
                        const char *encoding,
                        int format)

5.3) Root node

// Get the root element of the document (doc->children is a list containing possibly comments, PIs, etc ...).
xmlNodePtr xmlDocGetRootElement(xmlDocPtr doc)

// Set the root element of the document (doc->children is a list containing possibly comments, PIs, etc ...).
xmlNodePtr xmlDocSetRootElement(xmlDocPtr doc, xmlNodePtr root)

5.4) Creating and releasing nodes

// Search the last child of a node.
xmlNodePtr xmlGetLastChild(xmlNodePtr parent)

// Build a structure based Path for the given node
xmlChar *xmlGetNodePath(xmlNodePtr node)

xmlNodePtr xmlNewChild(xmlNodePtr parent,
                       xmlNsPtr ns,
                       const xmlChar *name,
                       const xmlChar *content)

// Creation of a new node element. @ns is optional (NULL).
xmlNodePtr xmlNewNode(xmlNsPtr ns,
                      const xmlChar *name)

// Set (or reset) the name of a node.
void xmlNodeSetName(xmlNodePtr cur,
                    const xmlChar *name)

// Unlink a node from its current context, the node is not freed
void xmlUnlinkNode(xmlNodePtr cur)

// Creation of a new child element, added at the end of
