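Research on Oracle Full-Text Search (Complete)

1. Preparation

1.1 Check and configure database roles
First check whether the database contains the CTXSYS user and the CTXAPP role. If this user and role are missing, the database was created without the interMedia (Oracle Text) feature, and you must modify the database to install it. In a default installation the ctxsys account is locked and its password is expired, so it has to be enabled first: log in to EM as sys, then change the status and password of the ctxsys user. (Figure: editing the ctxsys user in EM.)

1.2 Grant privileges
The test user is the existing user foo, and its table T_DOCNEWS is used as the example. First log in as sys with the DBA role and grant foo the resource and connect roles: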
GRANT resource, connect TO foo;

Then log in as ctxsys and grant foo the Oracle Text privileges:
GRANT ctxapp TO foo;
GRANT execute ON ctxsys.ctx_cls TO foo;
GRANT execute ON ctxsys.ctx_ddl TO foo;
GRANT execute ON ctxsys.ctx_doc TO foo;
GRANT execute ON ctxsys.ctx_output TO foo;
GRANT execute ON ctxsys.ctx_query TO foo;
GRANT execute ON ctxsys.ctx_report TO foo;
GRANT execute ON ctxsys.ctx_thes TO foo;
GRANT execute ON ctxsys.ctx_ulexer TO foo;

To view the system default Oracle Text preferences:
SELECT pre_name, pre_object FROM ctx_preferences;

2. How Oracle Text indexes work
An Oracle Text index converts all of the text into tokens. For example, www.taobao.com is converted into the tokens www, taobao and com. Oracle 10g supports four index types: context, ctxcat, ctxrule,
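and ctxxpath.

2.1 Context index
An Oracle Text index converts every word into a token. A context index is structured as an inverted index: each token maps to the locations of the texts that contain it. For example, the word dog might have an entry listing doc1, doc3 and doc5, which means that dog appears in the documents doc1, doc3 and doc5. Once the index has been built, the following five objects exist in the schema: DR$MYINDEX$I, DR$MYINDEX$K, DR$MYINDEX$R, DR$MYINDEX$X and MYTABLE (assuming the table is mytable and the index is myindex). A context index is not synchronized automatically after DML; it has to be synchronized manually with ctx_ddl.sync_index.

Example:
Create table docs (id number primary key, text varchar2(200));
Insert into docs values(1, 'california is a state in the us.');
Insert into docs values(2, 'paris is a city in france.');
Insert into docs values(3, 'france is in europe.');
Commit;

-- Create the context index
Create index idx_docs on docs(text) indextype is ctxsys.context
parameters('filter ctxsys.null_filter section group ctxsys.html_section_group');

-- Query (the COLUMN command limits the displayed width of text to 40 characters)
Column text format a40;
Select id, text from docs where contains(text, 'france') > 0;

id text
-- ----------------------------------------
 3 france is in europe.
 2 paris is a city in france.

-- Insert more rows
Insert into docs values(4, 'los angeles is a city in california.');
Insert into docs values(5, 'mexico city is big.');
commit;
Select id, text from docs where contains(text, 'city') > 0;

-- The newly inserted rows are not found
id text
-- ----------------------------------------
 2 paris is a city in france.

-- Synchronize the index, using 2M of memory
begin
ctx_ddl.sync_index('idx_docs', '2m');
end;
/

-- Query again
Column text format a50;
Select id, text from docs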
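where contains(text, 'city') > 0;

-- The new rows are now found
id text
-- --------------------------------------------------
 5 mexico city is big.
 4 los angeles is a city in california.
 2 paris is a city in france.

-- or operator
Select id, text from docs where contains(text, 'city or state') > 0;
-- and operator
Select id, text from docs where contains(text, 'city and state') > 0;
-- or the symbolic form (the operator character is missing in the source; & is assumed here)
Select id, text from docs where contains(text, 'city & state') > 0;
-- score returns the relevance score; the higher the score, the better the row matches the query
SELECT SCORE(1), id, text FROM docs WHERE CONTAINS(text, 'oracle', 1) > 0;

A context index is not synchronized automatically, so it must be synchronized manually after DML. The query operator that corresponds to a context index is contains.

2.2 Ctxcat index
A ctxcat index is intended for mixed queries over several columns. With ctxcat you can use an index set to collect the columns that are frequently combined with the text query. For example, when searching for a product name you may also need to filter on production date, price, description and so on; you can add those columns to the index set. Oracle wraps such queries in the catsearch operator, which makes the full-text search more efficient. For transactions with strict real-time requirements, the fact that a context index is not synchronized automatically is clearly a problem, whereas a ctxcat index is synchronized automatically.

Example:
Create table auction(
Item_id number,
Title varchar2(100),
Category_id number,
Price number,
Bid_close date);
Insert into auction values(1, 'nikon camera', 1, 400, '24-oct-2002'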
);
Insert into auction values(2, 'olympus camera', 1, 300, '25-oct-2002');
Insert into auction values(3, 'pentax camera', 1, 200, '26-oct-2002');
Insert into auction values(4, 'canon camera', 1, 250, '27-oct-2002');
Commit;

-- Decide on your query conditions first (this is important)
-- Determine that all queries search the title column for item descriptions
-- Create the index set
begin
ctx_ddl.create_index_set('auction_iset');
ctx_ddl.add_index('auction_iset', 'price'); /* sub-index A */
end;
/

-- Create the index
Create index auction_titlex on auction(title) indextype is ctxsys.ctxcat
parameters ('index set auction_iset');

Column title format a40;
Select title, price from auction where catsearch(title, 'camera', 'order by price') > 0;

title                price
-------------------- -----
pentax camera          200
canon camera           250
olympus camera         300
nikon camera           400

Insert into auction values(5, 'aigo camera', 1, 10, '27-oct-2002');
Insert into auction values(6, 'len camera', 1, 23, '27-oct-2002');
commit;

-- Check whether the index was synchronized automatically
-- (the price condition is garbled in the source; price < 100 is an assumed value consistent with the output shown)
Select title, price from auction where catsearch(title, 'camera', 'price < 100') > 0;

title                price
-------------------- -----
aigo camera             10
len camera              23

Adding several sub-indexes to the index set:
begin
ctx_ddl.drop_index_set('auction_iset');
ctx_ddl.create_index_set('auction_iset');
ctx_ddl.add_index('auction_iset', 'price'); /* sub-index A */
ctx_ddl.add_index('auction_iset', 'price, bid_close'); /* sub-index B */
end;
/
drop index auction_titlex;
Create index auction_titlex on auction(title) indextype is ctxsys.ctxcat
parameters ('index set auction_iset');
SELECT * FROM auction WHERE CATSEARCH(title, 'camera', 'price = 200 order by bid_close') > 0;
SELECT * FROM auction WHERE CATSEARCH(title, 'camera', 'order by price, bid_close') > 0;

After any DML operation, a ctxcat index
is synchronized automatically; no manual step is required. The query operator that corresponds to a ctxcat index is catsearch.

Syntax:
CATSEARCH(
[schema.]column,
text_query VARCHAR2,
structured_query VARCHAR2)
RETURN NUMBER;

Examples:
catsearch(text, 'dog', 'foo > 15')
catsearch(text, 'dog', 'bar = ''SMITH''')
catsearch(text, 'dog', 'foo between 1 and 15')
catsearch(text, 'dog', 'foo = 1 and abc = ''123''')

2.3 Ctxrule index
The function of a classification application is to perform some action based on document content. These actions can include assigning a category id to a document or sending the document to a user. The result is classification of a document.

Example:
Create table queries (
query_id number,
query_string varchar2(80));
insert into queries values (1, 'oracle');
insert into queries values (2, 'larry or ellison');
insert into queries values (3, 'oracle and text');
insert into queries values (4, 'market share');
commit;
Create index queryx on queries(query_string) indextype is ctxsys.ctxrule;
Column query_string format a35;
Select
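query_id, query_string from queries
where matches(query_string, 'oracle announced that its market share in databases increased over the last year.') > 0;

query_id query_string
-------- -----------------------------------
       1 oracle
       4 market share

Here the index is built on the stored queries, and a single sentence is matched against them.

2.4 Ctxxpath index
Create this index when you need to speed up existsNode() queries on an XMLType column.

3. Internal processing flow of an index

3.1 Datastore attribute
The datastore stage retrieves the data from wherever it is stored (for example web pages, database large objects, or the local file system) and passes it on as a data stream to the next stage. The datastore types are Direct datastore, Multi_column_datastore, Detail_datastore, File_datastore, Url_datastore, User_datastore and Nested_datastore.

3.1.1. Direct datastore
Indexes data stored in the database in a single column. It has no attributes.
Supported column types: char, varchar, varchar2, blob, clob, bfile, or xmltype.

Example:
Create table mytable(id number primary key, docs clob);
Insert into mytable values(111555, 'this text will be indexed');
Insert into mytable values(111556, 'this is a direct_datastore example');
Commit;
-- Create the index with the direct datastore
Create index myindex on mytable(docs) indextype is ctxsys.context
parameters ('datastore ctxsys.default_datastore');
Select * from mytable where contains(docs, 'text') > 0;

3.1.2. Multi_column_datastore
Use this type when the data to be indexed is spread over several columns.
The column list is limited to 500 bytes.
Number and date columns are supported; they are converted to text before indexing.
Raw and blob columns are directly concatenated as binary data.
Long, long raw, nchar, nclob and nested table columns are not supported.

Create table mytable1(id number primary key, doc1 varchar2(400), doc2 clob, doc3 clob);
Insert into mytable1 values(1, 'this text will be indexed', 'following example creates a multi-column', 'denotes that the bar column');
Insert into mytable1 values(2, 'this is a direct_datastore example', 'use this datastore when your text is stored in more than one column', 'the system concatenates the text columns');
Commit;

-- Create the multi-column datastore preference
Begin
Ctx_ddl.create_preference('my_multi', 'multi_column_datastore');
Ctx_ddl.set_attribute('my_multi', 'columns', 'doc1, doc2, doc3');
End;
/
-- Create the index
Create index idx_mytable on mytable1(doc1) indextype is ctxsys.context
parameters('datastore my_multi');
Select * from mytable1 where contains(doc1, 'direct datastore') > 0;
Select * from mytable1 where contains(doc1, 'example creates') > 0;

Note: for English text, the search term must be a meaningful word. For example,
Select * from mytable1 where contains(doc1, 'more than one column') > 0;
returns the second row, but searching for more on its own returns nothing, because more is not treated as a meaningful word in that sentence (in Oracle Text terms, presumably a stopword that is left out of the index).

-- Update only a non-indexed column and check whether the change can be found
Update mytable1 set doc2 = 'adladlhadad this datastore when your text is stored test' where id = 2;
Begin
Ctx_ddl.sync_index('idx_mytable');
End;
/
-- No rows returned
Select * from mytable1 where contains(doc1, 'adladlhadad') > 0;

-- Now update the indexed column
Update mytable1 set doc1 = 'this is a direct_datastore example' where id = 2;
-- Synchronize the index
Begin
Ctx_ddl.sync_index('idx_mytable');
End;
/
-- The change to the non-indexed column is now found
Select * from mytable1 where contains(doc1, 'adladlhadad') > 0;

A multi-column full-text index can be created on any one of the columns, but the column named in the query must be the same column that was named when the index was created. Oracle considers the indexed data changed only when the indexed column itself is modified. If only the other columns are changed, the change does not reach the index even after a sync. In other words, synchronization takes effect only when the indexed column has been updated, so whenever you change the other columns, write the indexed column again in the same update. In short: create the index on any one column, update that column together with the others, and synchronize the index once to see the effect.

3.1.3 Detail_datastore
Use this type for master-detail queries (original: use the detail_datastore type for text stored directly in the database in detail tables, with the indexed text column located in the master table).
Because the columns that are actually indexed belong to the detail table, it does not matter which column of the master table is chosen for the index; once chosen, however, the query condition must name that column.
The content of the indexed column in the master table is not itself included in the index.

Example definition using the DETAIL_DATASTORE attribute:
-- Create the master table
create table my_master
(article_id number primary key,
author varchar2(30),
title varchar2(50),
body varchar2(1));
-- Create the detail table
create table my_detail
(article_id number, seq number, text varchar2(4000),
constraint fr_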