Elasticsearch (ES) principles
What is Elasticsearch
A distributed document store.
Basic concepts
1. score
Scoring in Elasticsearch is the process of determining how relevant each retrieved document is to the user's query, based on term frequencies and other parameters such as how rare a term is across the index and how long the field is (the default similarity is BM25).
1. index
- As a noun: an index is a collection of documents.
- As a verb: the process of putting a document into an index.
2. document
A document is a collection of fields.
3. shard
An index is divided into multiple shards.
There are primary shards and replica shards.
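1. inverted index
In a search engine, every document is treated as a collection of keywords.
Forward index: maps each document to the words it contains.
document -> word doc1: java, hello; doc2: hello, world
Inverted index: maps each word to the documents that contain it.
word -> document hello: doc1, doc2; go: doc3, doc4; you: doc5
"hello you": doc1, doc2, doc5 (every document that contains either word)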
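A minimal sketch of the forward-to-inverted conversion, in plain Python rather than Elasticsearch itself. The contents of doc3 to doc5 are assumptions chosen to agree with the inverted index listed above, and `build_inverted_index` and `search_any` are hypothetical helper names. The lookup uses OR semantics, so it returns every document that contains at least one of the query words.

```python
from collections import defaultdict

# Forward index: document -> words. doc1 and doc2 come from the notes;
# doc3..doc5 are assumed contents that match the inverted index above.
forward_index = {
    "doc1": ["java", "hello"],
    "doc2": ["hello", "world"],
    "doc3": ["go"],
    "doc4": ["go"],
    "doc5": ["you"],
}


def build_inverted_index(forward):
    """Invert document -> words into word -> set of document ids."""
    inverted = defaultdict(set)
    for doc_id, words in forward.items():
        for word in words:
            inverted[word].add(doc_id)
    return inverted


def search_any(inverted, query):
    """Return the sorted ids of documents containing any word of the query."""
    result = set()
    for word in query.lower().split():
        result |= inverted.get(word, set())
    return sorted(result)


inverted_index = build_inverted_index(forward_index)
print(sorted(inverted_index["hello"]))           # ['doc1', 'doc2']
print(search_any(inverted_index, "hello you"))   # ['doc1', 'doc2', 'doc5']
```

The point of the inversion is that a query only touches the entries for its own words instead of scanning every document.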
2. how to index;
2. mapping
1. What is it?
_mapping
- defines the data type of each field
- defines the attributes of each field
2. dynamic mapping vs explicit mapping
- dynamic: generated automatically
- explicit: you specify it yourself
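As a rough illustration of an explicit mapping, the sketch below builds a request body of the kind that could be sent when creating an index. The index name `articles` and all field names are hypothetical; only the `mappings` / `properties` / `type` layout and the type names follow the Elasticsearch mapping format.

```python
import json

# Hypothetical index name, used only for illustration.
INDEX_NAME = "articles"

# Explicit mapping: every field and its data type is declared up front.
explicit_mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},         # analyzed, full-text searchable
            "status": {"type": "keyword"},     # kept as one exact value
            "published": {"type": "boolean"},
            "views": {"type": "integer"},
            "created_at": {"type": "date"},
        }
    }
}

# This body could be sent as: PUT /articles
print(f"PUT /{INDEX_NAME}")
print(json.dumps(explicit_mapping, indent=2))
```

With dynamic mapping, none of this is declared and the types are inferred from the first documents that are indexed.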
3. text analysis
The process of breaking a string into multiple words (terms)
- tokenization
- normalization
4. data types
- common data types
  - boolean
  - number
  - keyword
  - date
  - text (analyzed)
1. text vs keyword
- text is analyzed; keyword is not analyzed
Because a text field is broken into tokens, the original string is lost, so text fields are not available for sorting or aggregation by default.
A text field can set fielddata: true to allow this, but doing so can use significant memory.
2. Bucket
query
compound query
1. term query vs match query
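- A term query does not analyze the search term and looks for the exact value; a match query analyzes the search text before searching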
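The practical difference is that a term query uses the search value as-is, while a match query runs it through the field's analyzer first. The sketch below is a toy simulation in plain Python, not Elasticsearch code: `analyze`, `term_query` and `match_query` are hypothetical stand-ins, and the analyzer is assumed to only split on whitespace and lowercase.

```python
def analyze(text):
    """Toy stand-in for an analyzer: split on whitespace, then lowercase."""
    return [token.lower() for token in text.split()]


# Index time: the text field value is analyzed and only the tokens are kept.
indexed_tokens = set(analyze("Hello World"))


def term_query(value):
    """The search value is compared as-is, with no analysis."""
    return value in indexed_tokens


def match_query(text):
    """The search text is analyzed first; any matching token is a hit."""
    return any(token in indexed_tokens for token in analyze(text))


print(term_query("Hello"))    # False: the index holds "hello", not "Hello"
print(term_query("hello"))    # True
print(match_query("Hello"))   # True: "Hello" is lowercased before lookup
```

This is why term queries are usually aimed at keyword fields and match queries at text fields.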
analysis
By default, the same analyzer (or normalizer) is applied at index time and at query time.
1. analyzer (full-text search)
analyzer:
- tokenizer: breaks the string into smaller strings called tokens
- token filter: modifies the tokens
  - lowercase filter
2. normalizer (keyword)
A normalizer has no tokenizer, so it always emits a single token.
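The contrast between the two can be sketched in a few lines of plain Python. All function names here are hypothetical stand-ins; the tokenizer is assumed to split on whitespace and the only filter is lowercasing.

```python
def whitespace_tokenizer(text):
    """Tokenizer: break the string into tokens."""
    return text.split()


def lowercase_filter(tokens):
    """Token filter: modify each token (here, lowercase it)."""
    return [token.lower() for token in tokens]


def analyzer(text):
    """Analyzer = tokenizer followed by token filters; emits many tokens."""
    return lowercase_filter(whitespace_tokenizer(text))


def normalizer(text):
    """Normalizer = filters only; the whole value stays a single token."""
    return lowercase_filter([text])[0]


print(analyzer("Quick Brown Fox"))     # ['quick', 'brown', 'fox']
print(normalizer("Quick Brown Fox"))   # quick brown fox
```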
search
1. Full-text search
match, match_phrase, ….
The query string is processed using the same analyzer that was applied to the field during indexing.
2. Term-level queries
term, exists
DSL (domain-specific language)
1. query
- full text
- term query
- compound:
  - bool:
    - must
    - should
    - must_not
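A compound bool query combines full-text and term-level clauses. The sketch below builds such a request body as a Python dict; the field names (`title`, `tags`, `status`) and values are hypothetical, and the clause names follow the query DSL (where the negative clause is spelled `must_not`).

```python
import json

# Field names and values are hypothetical, for illustration only.
bool_query = {
    "query": {
        "bool": {
            # must: the clause has to match and contributes to the score
            "must": [{"match": {"title": "hello world"}}],
            # should: optional, but a match raises the score
            "should": [{"term": {"tags": "featured"}}],
            # must_not: matching documents are excluded
            "must_not": [{"term": {"status": "draft"}}],
        }
    }
}

# This body could be sent to a search endpoint such as: GET /<index>/_search
print(json.dumps(bool_query, indent=2))
```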
2. aggregation
1. bucket aggregation
1. terms aggregation
Returns the top 10 terms by default.
- Buckets are built dynamically, one per unique value. By default, the terms aggregation returns the buckets for the top ten terms, ordered by doc_count.
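The bucketing behaviour described here can be imitated in plain Python. This is only a toy model of the idea, not how Elasticsearch computes it across shards; `terms_aggregation`, the `lang` field and the sample documents are all hypothetical.

```python
from collections import Counter

# Hypothetical sample documents with a keyword-like field "lang".
docs = [
    {"lang": "java"}, {"lang": "java"}, {"lang": "java"},
    {"lang": "go"}, {"lang": "go"},
    {"lang": "python"},
]


def terms_aggregation(documents, field, size=10):
    """One bucket per unique value, largest doc_count first, top `size` only."""
    counts = Counter(doc[field] for doc in documents if field in doc)
    return [
        {"key": key, "doc_count": count}
        for key, count in counts.most_common(size)
    ]


for bucket in terms_aggregation(docs, "lang"):
    print(bucket)
# {'key': 'java', 'doc_count': 3}
# {'key': 'go', 'doc_count': 2}
# {'key': 'python', 'doc_count': 1}
```

The `size` parameter mirrors the default of ten buckets; passing a smaller value truncates the list.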
2. metric aggregation
X-Pack
An extension that provides security, alerting, monitoring, machine learning, and many other capabilities.
index template
1. command
1. What is it?
An index template tells Elasticsearch how to configure an index when it is created.
2. the details
- settings: index-level settings
  - number_of_shards
- mappings: define the fields
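Putting settings and mappings together, a template body might look like the sketch below. It assumes the composable index template layout (`index_patterns` plus a `template` object); older versions use a legacy template API with a different layout. The template name `logs_template`, the `logs-*` pattern and the field names are hypothetical.

```python
import json

# Hypothetical template name, used only for illustration.
TEMPLATE_NAME = "logs_template"

index_template = {
    # Any new index whose name matches a pattern picks up this template.
    "index_patterns": ["logs-*"],
    "template": {
        "settings": {
            "number_of_shards": 1,
        },
        "mappings": {
            "properties": {
                "message": {"type": "text"},
                "level": {"type": "keyword"},
            }
        },
    },
}

# This body could be sent as: PUT /_index_template/logs_template
print(f"PUT /_index_template/{TEMPLATE_NAME}")
print(json.dumps(index_template, indent=2))
```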