Lucene 4 TokenStream

Below is the complete source of org.apache.lucene.analysis.TokenStream as it ships with Lucene 4, reproduced together with its Javadoc, which documents the Attribute-based token API introduced in Lucene 2.9. Two short usage sketches follow the source.

package org.apache.lucene.analysis;

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.IOException;
import java.io.Closeable;
import java.lang.reflect.Modifier;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.util.Attribute;
import org.apache.lucene.util.AttributeImpl;
import org.apache.lucene.util.AttributeSource;

/**
 * A <code>TokenStream</code> enumerates the sequence of tokens, either from
 * {@link Field}s of a {@link Document} or from query text.
 * <p>
 * This is an abstract class; concrete subclasses are:
 * <ul>
 * <li>{@link Tokenizer}, a <code>TokenStream</code> whose input is a Reader; and
 * <li>{@link TokenFilter}, a <code>TokenStream</code> whose input is another
 * <code>TokenStream</code>.
 * </ul>
 * A new <code>TokenStream</code> API has been introduced with Lucene 2.9. This API
 * has moved from being {@link Token}-based to {@link Attribute}-based. While
 * {@link Token} still exists in 2.9 as a convenience class, the preferred way
 * to store the information of a {@link Token} is to use {@link AttributeImpl}s.
 * <p>
 * <code>TokenStream</code> now extends {@link AttributeSource}, which provides
 * access to all of the token {@link Attribute}s for the <code>TokenStream</code>.
 * Note that only one instance per {@link AttributeImpl} is created and reused
 * for every token. This approach reduces object creation and allows local
 * caching of references to the {@link AttributeImpl}s. See
 * {@link #incrementToken()} for further details.
 * <p>
 * <b>The workflow of the new <code>TokenStream</code> API is as follows:</b>
 * <ol>
 * <li>Instantiation of <code>TokenStream</code>/{@link TokenFilter}s which add/get
 * attributes to/from the {@link AttributeSource}.
 * <li>The consumer calls {@link TokenStream#reset()}.
 * <li>The consumer retrieves attributes from the stream and stores local
 * references to all attributes it wants to access.
 * <li>The consumer calls {@link #incrementToken()} until it returns false
 * consuming the attributes after each call.
 * <li>The consumer calls {@link #end()} so that any end-of-stream operations
 * can be performed.
 * <li>The consumer calls {@link #close()} to release any resource when finished
 * using the <code>TokenStream</code>.
 * </ol>
 * To make sure that filters and consumers know which attributes are available,
 * the attributes must be added during instantiation. Filters and consumers are
 * not required to check for availability of attributes in
 * {@link #incrementToken()}.
 * <p>
 * You can find some example code for the new API in the analysis package level
 * Javadoc.
 * <p>
 * Sometimes it is desirable to capture a current state of a <code>TokenStream</code>,
 * e.g., for buffering purposes (see {@link CachingTokenFilter},
 * TeeSinkTokenFilter). For this use case
 * {@link AttributeSource#captureState} and {@link AttributeSource#restoreState}
 * can be used.
 * <p>The {@code TokenStream}-API in Lucene is based on the decorator pattern.
 * Therefore all non-abstract subclasses must be final or have at least a final
 * implementation of {@link #incrementToken}! This is checked when Java
 * assertions are enabled.
 */
public abstract class TokenStream extends AttributeSource implements Closeable {

  /**
   * A TokenStream using the default attribute factory.
   */
  protected TokenStream() {
    super();
    assert assertFinal();
  }
  
  /**
   * A TokenStream that uses the same attributes as the supplied one.
   */
  protected TokenStream(AttributeSource input) {
    super(input);
    assert assertFinal();
  }
  
  /**
   * A TokenStream using the supplied AttributeFactory for creating new {@link Attribute} instances.
   */
  protected TokenStream(AttributeFactory factory) {
    super(factory);
    assert assertFinal();
  }
  
  private boolean assertFinal() {
    try {
      final Class<?> clazz = getClass();
      if (!clazz.desiredAssertionStatus())
        return true;
      assert clazz.isAnonymousClass() ||
        (clazz.getModifiers() & (Modifier.FINAL | Modifier.PRIVATE)) != 0 ||
        Modifier.isFinal(clazz.getMethod("incrementToken").getModifiers()) :
        "TokenStream implementation classes or at least their incrementToken() implementation must be final";
      return true;
    } catch (NoSuchMethodException nsme) {
      return false;
    }
  }
  
  /**
   * Consumers (i.e., {@link IndexWriter}) use this method to advance the stream to
   * the next token. Implementing classes must implement this method and update
   * the appropriate {@link AttributeImpl}s with the attributes of the next
   * token.
   * <P>
   * The producer must make no assumptions about the attributes after the method
   * has returned: the caller may arbitrarily change them. If the producer
   * needs to preserve the state for subsequent calls, it can use
   * {@link #captureState} to create a copy of the current attribute state.
   * <p>
   * This method is called for every token of a document, so an efficient
   * implementation is crucial for good performance. To avoid calls to
   * {@link #addAttribute(Class)} and {@link #getAttribute(Class)},
   * references to all {@link AttributeImpl}s that this stream uses should be
   * retrieved during instantiation.
   * <p>
   * To ensure that filters and consumers know which attributes are available,
   * the attributes must be added during instantiation. Filters and consumers
   * are not required to check for availability of attributes in
   * {@link #incrementToken()}.
   * 
   * @return false for end of stream; true otherwise
   */
  public abstract boolean incrementToken() throws IOException;
  
  /**
   * This method is called by the consumer after the last token has been
   * consumed, after {@link #incrementToken()} returned <code>false</code>
   * (using the new <code>TokenStream</code> API). Streams implementing the old API
   * should upgrade to use this feature.
   * <p/>
   * This method can be used to perform any end-of-stream operations, such as
   * setting the final offset of a stream. The final offset of a stream might
   * differ from the offset of the last token, e.g. when one or more whitespace
   * characters followed the last token and a WhitespaceTokenizer was used.
   * 
   * @throws IOException If an I/O error occurs
   */
  public void end() throws IOException {
    // do nothing by default
  }

  /**
   * This method is called by a consumer before it begins consumption using
   * {@link #incrementToken()}.
   * <p/>
   * Resets this stream to a clean state. Stateful implementations must implement
   * this method so that they can be reused, just as if they had been created fresh.
   */
  public void reset() throws IOException {}
  
  /** Releases resources associated with this stream. */
  public void close() throws IOException {}
  
}
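
The class Javadoc above lays out a six-step consumer workflow: instantiate the stream, reset() it, grab local attribute references, loop on incrementToken(), call end(), and finally close(). The sketch below walks through those steps (the attribute references are taken just before reset(), which is equivalent). It assumes the Lucene 4.x lucene-core and lucene-analyzers-common jars are on the classpath; the analyzer choice (WhitespaceAnalyzer), the field name and the sample text are illustrative only and are not taken from the class above.

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.util.Version;

public class TokenStreamConsumerDemo {
  public static void main(String[] args) throws IOException {
    // Step 1: instantiation; the tokenizer adds the attributes it produces.
    WhitespaceAnalyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_40);
    TokenStream ts = analyzer.tokenStream("body", new StringReader("the quick brown fox"));
    try {
      // Step 3: keep local references to the shared attribute instances.
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      OffsetAttribute offset = ts.addAttribute(OffsetAttribute.class);

      ts.reset();                       // step 2: reset to a clean state before consuming
      while (ts.incrementToken()) {     // step 4: advance token by token
        System.out.println(term.toString()
            + " [" + offset.startOffset() + "," + offset.endOffset() + "]");
      }
      ts.end();                         // step 5: end-of-stream work, e.g. the final offset
    } finally {
      ts.close();                       // step 6: release resources
      analyzer.close();
    }
  }
}

The Javadoc also stresses the decorator pattern: every non-abstract subclass must be final, or at least declare a final incrementToken(), and attributes must be added at construction time rather than inside incrementToken(). The following filter is a hypothetical illustration of both rules, not a class shipped with Lucene; it upper-cases each term produced by the wrapped stream (simplified: supplementary characters are ignored).

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Final class, as the assertFinal() check in TokenStream requires.
public final class UpperCaseDemoFilter extends TokenFilter {

  // Attribute reference obtained at construction time, never inside incrementToken().
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

  public UpperCaseDemoFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;                     // the wrapped stream reached end of stream
    }
    // Rewrite the shared term buffer in place; the consumer sees the change after this call returns.
    final char[] buffer = termAtt.buffer();
    for (int i = 0; i < termAtt.length(); i++) {
      buffer[i] = Character.toUpperCase(buffer[i]);
    }
    return true;
  }
}

Because addAttribute(CharTermAttribute.class) is called in a field initializer, the filter shares the single CharTermAttribute instance owned by the source AttributeSource, so mutating its buffer is all that is needed to pass the modified term downstream.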
