[Lucene 4.8 Tutorial, Part 4] Analysis

全栈程序员-用户IM • February 4, 2022, 3:00 PM • Uncategorized


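1. Basics

(1) Key concepts

Analysis, in Lucene, is the process of converting field text into its most fundamental indexed representation: terms. During a search, these terms determine which documents match the query.

An analyzer encapsulates the analysis process. It converts text into tokens by running a series of operations, a process also known as tokenization. The chunks of text extracted from the text stream are called tokens. A token combined with its field name forms a term.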
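The relationship between tokens and terms can be sketched in plain Java. This is only an illustration and does not use Lucene itself: the class `TermSketch`, its whitespace-and-lowercase `tokenize` method, and the `field:token` string notation are assumptions made for this example, not Lucene APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class TermSketch {
    // Stand-in for an analyzer: split on whitespace and lowercase each piece.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // A term is a token paired with the name of the field it came from.
    static List<String> toTerms(String field, String text) {
        List<String> terms = new ArrayList<>();
        for (String token : tokenize(text)) {
            terms.add(field + ":" + token);
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [contents:lucene, contents:in, contents:action]
        System.out.println(toTerms("contents", "Lucene in Action"));
    }
}
```

The same token under a different field name is a different term, which is why a search is always resolved against a specific field.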

(2) When analyzers are used

During indexing

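		// indexDir is assumed to be a java.io.File pointing at the index directory
		Directory returnIndexDir = FSDirectory.open(indexDir);

		// The analyzer configured here is applied to field text at index time
		IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_48,
				new StandardAnalyzer(Version.LUCENE_48));

		IndexWriter writer = new IndexWriter(returnIndexDir, iwc);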

When searching with a QueryParser object

		QueryParser parser = new QueryParser(Version.LUCENE_48, "contents",
				new SimpleAnalyzer(Version.LUCENE_48));

When highlighting results in a search
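Why the query text has to go through analysis as well can be shown with a toy inverted index. The sketch below is plain Java, not Lucene: `TinyIndex`, `analyze`, `search`, and `rawLookup` are hypothetical names, and the analysis step (split at non-letters, then lowercase) is a simplification chosen for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class TinyIndex {
    // Maps each indexed token to the ids of the documents that contain it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Simplified analysis: split at non-letters, then lowercase.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Index time: the document text is analyzed before it is stored.
    void add(int docId, String text) {
        for (String token : analyze(text)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Search time: the query is analyzed the same way; any matching token counts.
    Set<Integer> search(String query) {
        Set<Integer> result = new TreeSet<>();
        for (String token : analyze(query)) {
            result.addAll(postings.getOrDefault(token, Collections.<Integer>emptySet()));
        }
        return result;
    }

    // Looks the string up exactly as typed, with no analysis.
    Set<Integer> rawLookup(String term) {
        return postings.getOrDefault(term, Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add(1, "Lucene in Action");
        index.add(2, "Managing Gigabytes");
        System.out.println(index.search("LUCENE"));    // prints [1]
        System.out.println(index.rawLookup("LUCENE")); // prints []
    }
}
```

The index only holds the lowercased token `lucene`, so the unanalyzed lookup for `LUCENE` finds nothing. A query is matched against whatever the index-time analyzer produced, so the analyzer given to QueryParser should produce compatible tokens.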

(3) Four commonly used analyzers:

WhitespaceAnalyzer, as the name implies, simply splits text into tokens on whitespace characters and makes no other effort to normalize the tokens.
SimpleAnalyzer first splits tokens at non-letter characters, then lowercases each token. Be careful! This analyzer quietly discards numeric characters.
StopAnalyzer is the same as SimpleAnalyzer, except it removes common words (called stop words, described more in section XXX). By default it removes common words in the English language (the, a, etc.), though you can pass in your own set.
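StandardAnalyzer is Lucene’s most sophisticated core analyzer. It has quite a bit of logic to identify certain kinds of tokens, such as company names. Note that this describes the older tokenizer grammar, which Lucene 4.8 keeps as ClassicAnalyzer; since Lucene 3.1, StandardAnalyzer splits text using the Unicode word-break rules (UAX #29), then lowercases each token and removes English stop words.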
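The difference between the first three analyzers is easiest to see on one input string. The following plain-Java sketch only approximates the behavior described above and does not call Lucene: `AnalyzerSketch` and its methods are hypothetical, and `STOP_WORDS` is a deliberately short list made up for the example (Lucene's default English set is longer). StandardAnalyzer is left out because its tokenization rules are too involved to imitate in a few lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class AnalyzerSketch {
    // Abbreviated stop list, assumed for illustration only.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "and", "of", "in"));

    // Like WhitespaceAnalyzer: split on whitespace, no normalization.
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Like SimpleAnalyzer: split at non-letters, then lowercase.
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Like StopAnalyzer: the simple tokens minus stop words.
    static List<String> stop(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : simple(text)) {
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The XY&Z Corporation - xyz@example.com 42";
        // prints [The, XY&Z, Corporation, -, xyz@example.com, 42]
        System.out.println(whitespace(text));
        // prints [the, xy, z, corporation, xyz, example, com]
        System.out.println(simple(text));
        // prints [xy, z, corporation, xyz, example, com]
        System.out.println(stop(text));
    }
}
```

Note how the number 42 disappears in the second and third outputs, which is the "quietly discards numeric characters" behavior mentioned for SimpleAnalyzer.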

4. Other notes

When creating an IndexWriter, you must specify an analyzer, for example:

		IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_48,
				new StandardAnalyzer(Version.LUCENE_48));

		writer = new IndexWriter(returnIndexDir, iwc);

In addition, each time you add a document to the writer, you can specify an analyzer for that particular document, for example:

writer.addDocument(doc, new SimpleAnalyzer(Version.LUCENE_48));

Published by 全栈程序员-用户IM. Please credit the source when reposting: https://javaforall.cn/115433.html Original link: https://javaforall.cn
