es 原理

tony2099 included in elasticsearch

2021-01-25 556 words 3 minutes

Contents

what is es

a distrubuted document store;

basic concept

1. score

In general, scoring in Elasticsearch is a process to determine the relevance of retrieved documents based on user queries, term frequencies, and other important parameters

1. index

as a noun: index is the collections of document
as a verb the process of putting the document in the index

2. document

the collection of fields ;

3. shared

index are divided into mutiple shared;

primary shared; replicated share

1. inverted index;

in search engine, every document is a collection of keywords

forward index: 文档到词映射

document -> word doc1: java, hello doc2: hello,world

inverted index: word到文档的映射； word -> document hello: doc1, doc2; go: doc3, doc4; you: doc5

“hello you”: doc1, doc5

2. how to index;

2. mapping

1. what is ?

_mapping

define field data type
the attribute of the filed

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20


PUT my-index-000001
{
  "mappings": {
    "properties": {
      "message": {
        "type": "keyword",
        "ignore_above": 20 
      },
      "ad_req_id": {
                 "type": "text",
                 "fields": {
                     "keyword": {
                         "type": "keyword",
                         "ignore_above": 256
                     }
                 }
             },
    }
  }
}

2. dynamic mapping vs explict mapping

dynamic: automic generate
explict : you specify

3.text analysis

the process of breaking up a string to mutiple words

tokenization
normalization

4. dataType

commom data type
bool
number
keyword
data
text (analyzed)

1. text vs keyword

text is analyzed; keyword isn’t analyzed

so text lose the origin meaning, they are not available for sorting or aggregatition

text can set Fielddata=true, but this can however use significant memory.

2. Bucket

query

compound query

1. term query vs match query

term query will analyze

analysis

at indexed time and query time,they use the same analyzer (normalizer)

1.analyzer (full text search)

analyzer:

tokenizer: break the strings into secton of strings called token
token filter: modify the token
1. Lowercase filter

2. normalizers(keyword)

normalizer

don’t have tokenizer

search

1. FULL text search

match, match phase ….

The query string is processed using the same analyzer that was applied to the field during indexing.

2. Term query

term; exists

DSL (domain specific languagae)

1. query

full text
term query
compound:
1. bool:
  1. must
  2. should
  3. not

2. aggregation

1. bucket aggregation

1. Term aggregation

default return top 10 terms

1
2
3
4
5
6
7
8


GET /_search
{
  "aggs": {
    "genres": {
      "terms": { "field": "genre" } 
    }
  }
}

bucket are dynamically built, one per unique value, by default,the terms aggregation will return the buckets for the top ten terms ordered by the doc_count

2. metric aggregation

xpack

a extension provide security,alerting,monitoring,machine learning, and many other capabilities

index template

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20


{
  "index_patterns": ["te*", "bar*"], # match index
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "host_name": {
        "type": "keyword"
      },
      "created_at": {
        "type": "date",
        "format": "EEE MMM dd HH:mm:ss Z yyyy"
      }
    }
  }
}

1. command

1
2
3
4
5
6
7
8
9


# create new template
PUT _template/template_1

#  get templates
GET _template

# delete template 

DELETE  _template/template_1

1. what is ?

how to configure an index when it’s created

2. the detail

setting: set the index
1. number_of_shards
mapping: define the files