Lucene Tutorial in Detail

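Lucene Code Example: Searching Documents

1. Query class: an abstract class that wraps the user's query string into a Query that Lucene can recognize. It has many implementations, such as TermQuery, BooleanQuery, and PrefixQuery.
2. Term class: describes the basic unit of a search. Its constructor is Term("fieldName", "queryWord"), where the first argument names the Field of the document to search and the second argument is the keyword to search for.
3. TermQuery class: a concrete implementation of the abstract class Query, and the most basic query type that Lucene supports. Its constructor is TermQuery(new Term("fieldName", "queryWord")); the only argument is a Term object.
4. IndexSearcher class: the handle class used to search an index that has already been built. It opens the index read-only, so multiple IndexSearcher instances can operate on the same index.
5. Hits class: the search-result class (part of the pre-3.0 Lucene API that this tutorial uses).

Code: searching documents using the index

    package TestLucene;

    import java.io.File;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.store.FSDirectory;

    /**
     * This class is used to demonstrate the
     * process of searching on an existing
     * Lucene index
     */
    public class TxtFileSearcher {
        public static void main(String[] args) throws Exception {
            String queryStr = "lucene";
            // This is the directory that hosts the Lucene index
            File indexDir = new File("D:\\luceneIndex");
            // Check that the index exists before trying to open it
            if (!indexDir.exists()) {
                System.out.println("The Lucene index does not exist");
                return;
            }
            FSDirectory directory = FSDirectory.getDirectory(indexDir, false);
            IndexSearcher searcher = new IndexSearcher(directory);
            // Search the "contents" field for the lowercased keyword
            Term term = new Term("contents", queryStr.toLowerCase());
            TermQuery luceneQuery = new TermQuery(term);
            Hits hits = searcher.search(luceneQuery);
            // Print the stored path of every matching document
            for (int i = 0; i < hits.length(); i++) {
                Document document = hits.doc(i);
                System.out.println("File: " + document.get("path"));
            }
            searcher.close();
        }
    }

In this code, the IndexSearcher constructor accepts an object of type Directory. The FSDirectory object passed in represents the location of the index on disk, and once the IndexSearcher is instantiated it has opened that index read-only. The program then constructs a Term object specifying that documents whose contents contain the keyword "lucene" are to be found, builds a TermQuery from that Term, and passes it to the search method of IndexSearcher. The results are returned in a Hits object. Finally, the program loops over the hits and prints the path of every document found.

This article comes from 金色坐标.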
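As a rough illustration of what the term lookup above amounts to, the following sketch models an index as a plain map from "field:term" to document paths. It is a toy stand-in written for this tutorial, not Lucene's implementation; the class ToyTermSearch, its methods, and the sample file names are all hypothetical, and the whitespace-and-lowercase tokenization is an assumption chosen to mirror the queryStr.toLowerCase() call in the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy stand-in for an index (not Lucene): maps "field:term" to document paths. */
final class ToyTermSearch {
    private final Map<String, List<String>> postings = new HashMap<>();

    /** Splits the text on whitespace, lowercases each token, and records the path under it. */
    void addDocument(String path, String field, String text) {
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            String key = field + ":" + token.toLowerCase(Locale.ROOT);
            List<String> paths = postings.computeIfAbsent(key, k -> new ArrayList<>());
            if (!paths.contains(path)) {
                paths.add(path);
            }
        }
    }

    /** Exact lookup of one term in one field; the caller normalizes the query word. */
    List<String> search(String field, String term) {
        return postings.getOrDefault(field + ":" + term, new ArrayList<>());
    }

    public static void main(String[] args) {
        ToyTermSearch index = new ToyTermSearch();
        index.addDocument("a.txt", "contents", "Lucene is a search library");
        index.addDocument("b.txt", "contents", "plain text file");

        String queryStr = "Lucene";
        // Lowercase the query word so it matches the lowercased index terms.
        for (String path : index.search("contents", queryStr.toLowerCase(Locale.ROOT))) {
            System.out.println("File: " + path); // prints "File: a.txt"
        }
    }
}
```

The lookup is an exact match on the pair (field, term), which is why the query word is normalized the same way as the indexed tokens before searching.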

Building Various Lucene Queries (Part 1)

2009-11-25 17:41

The second step in the search flow is to build a Query. This section introduces Query and how to build one.

When a user enters a keyword, the search engine does not immediately hand it to the back end for retrieval. It first analyzes and processes the keyword into a form the back end can understand; only then can retrieval be efficient and return more useful results. In Lucene, this processing amounts to building a Query object.

Query itself is just an abstract class in Lucene's search package. It has many subclasses, each representing a different kind of retrieval. The common TermQuery, for example, is an object that wraps a single keyword; similarly, BooleanQuery represents a Boolean search.

The search method of an IndexSearcher object always requires a Query object (or an object of a Query subclass). This section introduces the various Query classes.

11.4.1 Searching by term: TermQuery

TermQuery is the simplest and most commonly used Query. It can be understood as "term search": the most basic search in a search engine is looking up a term in the index, and TermQuery does exactly that.

In Lucene, a term is the most basic unit of search. In essence, a term is a name/value pair, in which the name is a field name and the value is a keyword contained in that field.

To search with TermQuery, first construct a Term object:

    Term aTerm = new Term("contents", "java");

Then use aTerm as the argument to construct a TermQuery object:

    Query query = new TermQuery(aTerm);

With this query, every document whose "contents" field contains "java" is returned as a match.

Code 11.4 below shows a complete use of TermQuery.

Code 11.4 TermQueryTest.java

    package ch11;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class TermQueryTest {

        public static void main(String[] args) throws Exception {
            // Create a Document object
            Document doc1 = new Document();
            // Add the content of the "name" field
            doc1.add(Field.Text("name", "word1 word2 word3"));
            // Add the content of the "title" field
            doc1.add(Field.Keyword("title", "doc1"));
            // Create the index writer
            IndexWriter writer = new IndexWriter("c:\\index", new StandardAnalyzer(), true);

            // Add the document to the index
            writer.addDocument(doc1);
            // Close the index
            writer.close();

            // The query object
            Query query = null;

            // The Hits object that holds the returned search results
            Hits hits = null;

            // Create the searcher
            IndexSearcher searcher = new IndexSearcher("c:\\index");

            // Construct a TermQuery object
            query = new TermQuery(new Term("name", "word1"));
            // Run the search and store the results in hits
            hits = searcher.search(query);
            // Print information about the results
            printResult(hits, "word1");

            // Construct another TermQuery, this time on the "title" field
            query = new TermQuery(new Term("title", "doc1"));
            // Run the second search and store the results in hits
            hits = searcher.search(query);
            // Print information about the results
            printResult(hits, "doc1");
        }

        public static void printResult(Hits hits, String key) throws Exception {
            System.out.println("Searching for \"" + key + "\":");
            if (hits != null) {
                if (hits.length() == 0) {
                    System.out.println("No results found");
                } else {
                    System.out.println("Found " + hits.length() + " result(s)");
                    for (int i = 0; i < hits.length(); i++) {
                        Document d = hits.doc(i);
                        String dname = d.get("title");
                        System.out.print(dname + "   ");
                    }
                    System.out.println();
                    System.out.println();
                }
            }
        }
    }

Figure 11-8 shows the result of running the TermQuery search in Code 11.4.

Note: field values are case-sensitive, so the case of the query must match.

As Figure 11-8 shows, Code 11.4 searches twice, with "word1" and "doc1" as the keywords, and each search returns exactly one result.

Code 11.4 performs two keyword lookups by constructing two TermQuery objects. The difference between them is that the first TermQuery searches the "name" field, while the second searches the "title" field.
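The case-sensitivity note can be made concrete with a small model of how the two fields in Code 11.4 might end up as terms. This is a plain-Java sketch, not Lucene code: the class ToyFieldTerms and its methods are hypothetical, and it assumes that a tokenized text field is split on whitespace and lowercased while a keyword field is kept as one unmodified term.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Toy model (not Lucene) of the terms produced for a tokenized field and a keyword field. */
final class ToyFieldTerms {

    /** Tokenized field: split on whitespace and lowercase each token (assumed analyzer behaviour). */
    static Set<String> tokenizedTerms(String value) {
        Set<String> terms = new HashSet<>();
        for (String token : value.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    /** Keyword field: the whole value, unchanged, becomes a single term. */
    static Set<String> keywordTerms(String value) {
        Set<String> terms = new HashSet<>();
        terms.add(value);
        return terms;
    }

    /** A term query matches only if the query text equals an indexed term exactly, including case. */
    static boolean matches(Set<String> indexedTerms, String queryText) {
        return indexedTerms.contains(queryText);
    }

    public static void main(String[] args) {
        Set<String> name = tokenizedTerms("word1 word2 word3");
        Set<String> title = keywordTerms("doc1");

        System.out.println(matches(name, "word1"));  // true
        System.out.println(matches(name, "Word1"));  // false: case differs from the indexed term
        System.out.println(matches(title, "doc1"));  // true
        System.out.println(matches(title, "DOC1"));  // false: keyword value is stored as-is
    }
}
```

Under these assumptions a term query is a pure equality test, so any normalization has to be applied to the query word by the caller.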

11.4.2 AND/OR search: BooleanQuery

BooleanQuery is another Query used frequently in real development. It is a composite Query: you add various Query objects to it and specify the logical relationship among them. All the query types discussed in this section can be combined with BooleanQuery. BooleanQuery is essentially a container of Boolean clauses; it provides a dedicated API method for adding clauses and specifying how they relate to one another. The following is the method BooleanQuery provides for adding a clause:

    public void add(Query query, boolean required, boolean prohibited);

Note: BooleanQuery can be nested; one BooleanQuery can serve as a clause of another BooleanQuery.

Code 11.5 below illustrates a Boolean query that performs an AND operation.

Code 11.5 BooleanQueryTest1.java

    package ch11;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class BooleanQueryTest1 {

        public static void main(String[] args) throws Exception {
            // Create the Document objects
            Document doc1 = new Document();
            doc1.add(Field.Text("name", "word1 word2 word3"));
            doc1.add(Field.Keyword("title", "doc1"));

            Document doc2 = new Document();
            doc2.add(Field.Text("name", "word1 word4 word5"));
            doc2.add(Field.Keyword("title", "doc2"));

            Document doc3 = new Document();
            doc3.add(Field.Text("name", "word1 word2 word6"));
            doc3.add(Field.Keyword("title", "doc3"));

            // Create the index writer
            IndexWriter writer = new IndexWriter("c:\\index", new StandardAnalyzer(), true);
            // Add the documents to the index
            writer.addDocument(doc1);
            writer.addDocument(doc2);
            writer.addDocument(doc3);
            writer.close();

            Query query1 = null;
            Query query2 = null;
            BooleanQuery query = null;
            Hits hits = null;

            // Create the IndexSearcher object
            IndexSearcher searcher = new IndexSearcher("c:\\index");

            query1 = new TermQuery(new Term("name", "word1"));
            query2 = new TermQuery(new Term("name", "word2"));

            // Construct a Boolean query
            query = new BooleanQuery();

            // Add the two sub-queries, both required
            query.add(query1, true, false);
            query.add(query2, true, false);

            hits = searcher.search(query);
            printResult(hits, "word1 and word2");
        }

        public static void printResult(Hits hits, String key) throws Exception {
            System.out.println("Searching for \"" + key + "\":");
            if (hits != null) {
                if (hits.length() == 0) {
                    System.out.println("No results found");
                } else {
                    System.out.println("Found " + hits.length() + " result(s)");
                    for (int i = 0; i < hits.length(); i++) {
                        Document d = hits.doc(i);
                        String dname = d.get("title");
                        System.out.print(dname + "   ");
                    }
                    System.out.println();
                    System.out.println();
                }
            }
        }
    }

Code 11.5 first constructs two TermQuery objects, then constructs a BooleanQuery object and adds the two TermQuery objects to it as its query clauses.

Now look again at the add method of BooleanQuery. Besides its first parameter, it takes two Boolean parameters. The first of these (required) indicates whether the clause being added must be satisfied; the second (prohibited) indicates whether the clause being added must not be satisfied. Since each of the two parameters can be true or false, there are 4 different combinations.

true, false: the clause being added must be satisfied.
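To make the effect of the two flags easier to see, here is a minimal plain-Java sketch that evaluates required and prohibited clauses over sets of document titles. It is a toy model, not Lucene's BooleanQuery: the class ToyBooleanQuery is hypothetical, and its treatment of the flag combinations other than (true, false) is an assumption (false/true excludes matches, false/false is optional, true/true is rejected), since the text above only spells out the first combination.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model (not Lucene) of combining clauses with required/prohibited flags. */
final class ToyBooleanQuery {
    private final List<Set<String>> required = new ArrayList<>();
    private final List<Set<String>> prohibited = new ArrayList<>();
    private final List<Set<String>> optional = new ArrayList<>();

    /** Each clause is given as the set of titles it matches, plus the two flags. */
    void add(Set<String> clauseMatches, boolean isRequired, boolean isProhibited) {
        if (isRequired && isProhibited) {
            // Assumption: a clause cannot be both required and prohibited.
            throw new IllegalArgumentException("clause cannot be both required and prohibited");
        }
        if (isRequired) {
            required.add(clauseMatches);
        } else if (isProhibited) {
            prohibited.add(clauseMatches);
        } else {
            optional.add(clauseMatches);
        }
    }

    /** Keeps titles that satisfy every required clause and no prohibited clause. */
    Set<String> search(Set<String> allTitles) {
        Set<String> result = new TreeSet<>();
        for (String title : allTitles) {
            boolean keep = true;
            for (Set<String> clause : required) {
                keep &= clause.contains(title);
            }
            for (Set<String> clause : prohibited) {
                keep &= !clause.contains(title);
            }
            if (keep && required.isEmpty()) {
                // Assumption: without required clauses, at least one optional clause must match.
                boolean anyOptional = false;
                for (Set<String> clause : optional) {
                    anyOptional |= clause.contains(title);
                }
                keep = anyOptional;
            }
            if (keep) {
                result.add(title);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Titles matched by each term in the three documents of Code 11.5.
        Set<String> all = Set.of("doc1", "doc2", "doc3");
        Set<String> word1 = Set.of("doc1", "doc2", "doc3");
        Set<String> word2 = Set.of("doc1", "doc3");

        ToyBooleanQuery both = new ToyBooleanQuery();
        both.add(word1, true, false);
        both.add(word2, true, false);
        System.out.println(both.search(all)); // [doc1, doc3]

        ToyBooleanQuery without = new ToyBooleanQuery();
        without.add(word1, true, false);
        without.add(word2, false, true);
        System.out.println(without.search(all)); // [doc2]
    }
}
```

With both clauses marked (true, false), the toy query behaves like an AND over the two terms, which corresponds to the combination used in Code 11.5.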
