Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

I have two main questions about the Lucene demo. Does the demo, as shipped and without any modification, use stop words? What about stemming? If so, which stemmer does it use?

question from:https://stackoverflow.com/questions/65946551/stopwords-and-stemming-in-lucene-demo

He who fights with dragons for too long becomes a dragon himself; gaze too long into the abyss, and the abyss will gaze back…

1 Answer

Which demo are you referring to?

If it's this one, then the answers are:

(a) Stop words: no, it does not. It uses StandardAnalyzer(), which applies no stop words when created with no arguments (but it can, if you choose to provide some).

(b) Stemming: no, it does not. The demo code involves no stemming classes, and the standard analyzer does not stem on its own.

Take a look at the javadoc for the StandardAnalyzer. You will see the following:

Filters StandardTokenizer with LowerCaseFilter and StopFilter, using a configurable list of stop words.

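So, this tells you how your input documents are analyzed: they are tokenized by StandardTokenizer, lowercased by LowerCaseFilter, and passed through a StopFilter whose stop-word list is empty unless you supply one. No stemming filter is part of that chain.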
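As a rough illustration of the tokenize, lowercase, stop-filter chain named in that javadoc, here is a minimal plain-Java sketch. It is not Lucene code: the class AnalysisChainSketch, the method analyze, and the constant NO_STOP_WORDS are hypothetical names invented for this example, and splitting on non-alphanumeric characters is only a crude approximation of what StandardTokenizer does. The point it tries to show is that with an empty stop-word set every lowercased token survives, and that nothing in the chain stems the terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative stand-in only: plain Java, NOT Lucene's StandardAnalyzer.
final class AnalysisChainSketch {

    // Assumption: an empty set models the no-argument analyzer described above.
    static final Set<String> NO_STOP_WORDS = Set.of();

    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> terms = new ArrayList<>();
        // Step 1: tokenize (crude approximation: split on anything that is
        // not a letter or a digit).
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            // Step 2: lowercase each token.
            String term = token.toLowerCase(Locale.ROOT);
            // Step 3: drop the term only if it is in the stop-word set.
            if (!stopWords.contains(term)) {
                terms.add(term);
            }
            // There is deliberately no stemming step: "running" stays "running".
        }
        return terms;
    }

    public static void main(String[] args) {
        // Empty stop set: all tokens are kept, only lowercased.
        System.out.println(analyze("The Foxes are Running", NO_STOP_WORDS));
        // Caller-supplied stop words: "the" and "are" are removed.
        System.out.println(analyze("The Foxes are Running", Set.of("the", "are")));
    }
}
```

Run as written, the first call prints [the, foxes, are, running] and the second prints [foxes, running]. To see what the real analyzer emits for your own text, you would iterate over the token stream returned by the Lucene analyzer itself rather than rely on this approximation.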


...