[筆記] ElasticSearch 基礎操作
ElasticSearch入門隨筆
層級單位對應
MySQL | Elastic |
---|---|
table | Index |
Row | Document |
column | Field |
Schema | Mapping |
SQL | DSL |
差異較大的部分為
因 Index 存儲單位是json. 若沒有定義mapping , 整個index可以沒有統一field, index中裡面每一筆Document都可以是不同格式的json
儲存與查找概念
-
存儲
elasticsearch 存儲document時, 會根據field類型判斷將內容進行分詞, 並用倒排索引方式將document做索引標記 -
查找
elasticseartch 搜索一段 文字時
也會將該文字拆為多組term
該terms查找索引後 document id 出現次數愈高
該document candiate score 分數愈高
會將相關結果由score高至低輸出
倒排索引
一種利用keyword作為索引的方式,
elasticsearch 中是利用詞彙作為Index去查找document id
為一對多,
e.g.
keyword | document_id |
---|---|
大毛 | 1001,1008,1023 |
小毛 | 1001,1010 |
e.g.
搜索 “大毛與小毛”
評分最高的是1001,. 其他 1008,1010,1023也會被查找出來
output: 1001 1008 1010 1023
Index
list all Index
GET _cat/indices?v
create
idempotence, 創建一個名為shoppinga的index
PUT shopping
return
{
"acknowledged" : true,
"shards_acknowledged" : true,
"index" : "shopping"
}
read
讀取index基本屬性
GET shopping
return
{
"shopping" : {
"aliases" : { },
"mappings" : { },
"settings" : {
"index" : {
"routing" : {
"allocation" : {
"include" : {
"_tier_preference" : "data_content"
}
}
},
"number_of_shards" : "1",
"provided_name" : "shopping",
"creation_date" : "1699322452796",
"number_of_replicas" : "1",
"uuid" : "X3DBdgmCQAS41BoQYW8d0w",
"version" : {
"created" : "7170699"
}
}
}
}
}
delete
delete shopping
return
{
"acknowledged" : true
}
Document
create
non-idempotent
每次創建都會隨機一串 id, 重複執行會重複創建, 該id可用來查詢用 類似SQL PrimaryKey
POST shopping/_doc
{
"title":"IPhone100",
"category":"Apple",
"price":"999"
}
return
{
"_index" : "shopping",
"_type" : "_doc",
"_id" : "UVaep4sBBjIGApZi34_r", ## <<<<隨機產生
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
idempotent
產生的id 也可自已定義, 重複執行不影響, 僅version紀錄
PUT shopping/_doc/1000
{
"title":"IPhone100",
"category":"Apple",
"price":"999"
}
return
{
"_index" : "shopping",
"_type" : "_doc",
"_id" : "1000", ## << 固定
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 7,
"_primary_term" : 1
}
update
全局更新
其實等同create的 idempotent, 利用doc id 更新 整個 doc, 因此field 必須完整
PUT shopping/_doc/1000
{
"title":"IPhone100",
"category":"Apple",
"price":"1999"
}
return
{
"_index" : "shopping",
"_type" : "_doc",
"_id" : "1000",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 8,
"_primary_term" : 1
}
局部更新
non-idempotent, 可修改原數值或是增加field, 沒有用put是因為只更新其中一個field無法保證該doc的整個狀態, 只能保證其中之一, 若下次執行時 其他feld已被改動 這次執行與上次執行後 對於整個doc的狀態是不相同的 為non-idempotent,
POST shopping/_update/1000
{
"doc":{
"title":"IPhone300"
}
}
return
{
"_index" : "shopping",
"_type" : "_doc",
"_id" : "1000",
"_version" : 4,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 10,
"_primary_term" : 1
}
POST shopping/_update/1000
{
"doc":{
"title22222":"IPhone300"
}
}
return
{
"_index" : "shopping",
"_type" : "_doc",
"_id" : "1000",
"_version" : 5,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 11,
"_primary_term" : 1
}
delete
DELETE shopping/_doc/1000
return
{
"_index" : "shopping",
"_type" : "_doc",
"_id" : "1000",
"_version" : 7,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 13,
"_primary_term" : 1
}
read
查詢特定doc id
GET shopping/_doc/1000
return
{
"_index" : "shopping",
"_type" : "_doc",
"_id" : "1000",
"_version" : 1,
"_seq_no" : 7,
"_primary_term" : 1,
"found" : true,
"_source" : {
"title" : "IPhone100",
"category" : "Apple",
"price" : "999"
}
}
查詢某index下所有document
GET shopping/_search
or
GET shopping/_search
{
"query": {
"match_all":{}
}
}
return
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3, ## 表示共三筆
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "shopping", ## index name
"_type" : "_doc",
"_id" : "UVaep4sBBjIGApZi34_r", ## doc id
"_score" : 1.0,
"_source" : { ## fieldName:value
"title" : "IPhone100",
"category" : "Apple",
"price" : "999"
}
},
{
"_index" : "shopping",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"title" : "IPhone100",
"category" : "Apple",
"price" : "999"
}
},
{
"_index" : "shopping",
"_type" : "_doc",
"_id" : "1000",
"_score" : 1.0,
"_source" : {
"title" : "IPhone100",
"category" : "Apple",
"price" : "999"
}
}
]
}
}
條件查詢
elasticsearch複雜的條件查詢推薦使用body
可能較常用的
關鍵字 | 功能 |
---|---|
terms | 不分詞 查詢詞綴需與資料庫index中完全相同 |
match | 會將查詢詞綴進行分詞 只要之中包含任一詞彙 就符合資格 |
match_phrase | 會將查詢詞綴進行分詞 匹配方式是依據array的index 相對位置做匹配, 強順序性 |
wildcard | 可使用通配符 查詢 |
match
GET shopping/_search
{
"query": {
"match": {
"title": "IPhone100 is a cell phone"
}
}
}
return
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 0.13353139,
"hits" : [
{
"_index" : "shopping",
"_type" : "_doc",
"_id" : "1000",
"_score" : 0.13353139,
"_source" : {
"title" : "IPhone100",
"category" : "Apple",
"price" : 999
}
},
{
"_index" : "shopping",
"_type" : "_doc",
"_id" : "aLG-qIsBnOvF5S7_MlFU",
"_score" : 0.13353139,
"_source" : {
"title" : "IPhone100",
"category" : "Apple",
"price" : 999
}
},
{
"_index" : "shopping",
"_type" : "_doc",
"_id" : "gLG-qIsBnOvF5S7_SlFE",
"_score" : 0.13353139,
"_source" : {
"title" : "IPhone100",
"category" : "Apple",
"price" : 2000
}
}
]
}
}
## "IPhone100 is a cell phone" 會被拆成 IPhone100,is,a,cell,phone 去查詢 因此可以查找到IPhone100
match_phrase
GET shopping/_search
{
"query": {
"term": {
"title": "IPhone100 is a cell phone"
}
}
}
return
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
## 會需完全匹配 IPhone100 is a cell phone, 因此查不到 'IPhone100'
多條件查詢
其實就是多一層bool, 類似 if 概念 , bool 下 { } 需同時滿足
- must == AND
GET shopping/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"category": "Apple"
}
},
{
"match": {
"title": "IPhone100"
}
}
]
}
}
}
## bool {must: [ 條件1,條件2 ....] } == if (條件1 and 條件2)
- should == or
GET shopping/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"category": "Apple"
}
},
{
"match": {
"title": "IPhone100"
}
}
]
}
}
}
## bool {must: [ 條件1,條件2 ....] } == if (條件1 or 條件2)
- filer range == >= , <= , > , <
GET shopping/_search
{
"query": {
"bool": {
"filter": {
"range": {
"price": {
"gt": 1000
}
}
}
}
}
}
## bool {filter: range: { field1 : { gt : <value> } } } == if field > value
## gt: > , lt, < , gte >= , lte <=
範例 category=Apple and title = iphone100 and price > 1000
GET shopping/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"category": "Apple"
}
},
{
"match": {
"title": "IPhone100"
}
}
],
"filter": {
"range": {
"price": {
"gt": 1000
}
}
}
}
}
}
### == if (條件1 and 條件2) and (price > 1000)
指定特定資源
類似SQL , select A,B,C … 不返回所有資源
GET shopping/_search
{
"_source": ["price"],
"query": {
"match_all":{}
}
}
return
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "shopping",
"_type" : "_doc",
"_id" : "1000",
"_score" : 1.0,
"_source" : {
"price" : 999
}
},
{
"_index" : "shopping",
"_type" : "_doc",
"_id" : "aLG-qIsBnOvF5S7_MlFU",
"_score" : 1.0,
"_source" : {
"price" : 999
}
},
{
"_index" : "shopping",
"_type" : "_doc",
"_id" : "gLG-qIsBnOvF5S7_SlFE",
"_score" : 1.0,
"_source" : {
"price" : 2000
}
}
]
}
}
分頁
用 查詢某index下所有document 舉例
GET shopping/_search
{
"query": {
"match_all": {}
},
"from": 0,
"size": 1,
"_source":["title","price"],
"sort":{
"price": "asc"
}
}
#"from": 0, # 起始位置 假設查詢結果有100筆 從第0筆開始,
#"size": 1, # 顯示多少筆資料
#"_source":["title","price"] # 想顯示的field = select A,B..l.
#"sort":{"price": "asc" } # 排序 asc or desc
from 頁數公式: 參考公式: (page-1)size
e.g. 第二頁 = (2-1)1
return
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "shopping",
"_type" : "_doc",
"_id" : "UVaep4sBBjIGApZi34_r",
"_score" : 1.0,
"_source" : {
"price" : "999",
"title" : "IPhone100"
}
}
]
}
}
Highligt
將查詢結果特定field做highlight, 這邊設為藍色
GET shopping/_search
{
"query": {
"match": {
"category": "Apple"
}
},
"highlight": {
"fields": {
"category": {}
},
"pre_tags": ["<span style='color: blue;'>"],
"post_tags": ["</span>"]
}
}
return
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 0.13353139,
"hits" : [
{
"_index" : "shopping",
"_type" : "_doc",
"_id" : "1000",
"_score" : 0.13353139,
"_source" : {
"title" : "IPhone100",
"category" : "Apple",
"price" : 999
},
"highlight" : {
"category" : [
"<span style='color: blue;'>Apple</span>"
]
}
},
{
"_index" : "shopping",
"_type" : "_doc",
"_id" : "aLG-qIsBnOvF5S7_MlFU",
"_score" : 0.13353139,
"_source" : {
"title" : "IPhone100",
"category" : "Apple",
"price" : 999
},
"highlight" : {
"category" : [
"<span style='color: blue;'>Apple</span>"
]
}
},
{
"_index" : "shopping",
"_type" : "_doc",
"_id" : "gLG-qIsBnOvF5S7_SlFE",
"_score" : 0.13353139,
"_source" : {
"title" : "IPhone100",
"category" : "Apple",
"price" : 2000
},
"highlight" : {
"category" : [
"<span style='color: blue;'>Apple</span>"
]
}
}
]
}
}
聚合函數
可能較常用的
關鍵字 | 功能 |
---|---|
terms | 分組 類似SQL groupby |
avg | 計算平均 |
sum | 計算總和 |
min | 統計最小值 |
max | 統計最大值 |
histogram | 數值區間統計 |
stats | 統計數值訊息,max,min,avg,sum,數值總數量 |
top_hits | index前N doc中最佳匹配 |
GET shopping/_search
{
"aggs": {
"task_name": {
"terms": {
"field": "price"
}
}
},
"size":0
}
## task_name 可隨便自訂
## terms 表示功能
## field 表示 函數作用範圍
## size:0 表示不顯示查詢結果 表示只顯示統計果
return
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"task_name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 999,
"doc_count" : 2
},
{
"key" : 2000,
"doc_count" : 1
}
]
}
}
}
## buckets 回傳類似SQL group_by的分組 分組key, 與該組數量
Mapping
類似MySQL table schema, 只能創建或新增 不能修改
create
PUT user
PUT user/_mapping
{
"properties": {
"name": {
"type": "text",
"index": true,
"analyzer": "ik_max_word",
"search_analyzer": "ik_max_word"
},
"sex": {
"type":"keyword",
"index": true
},
"tel": {
"type": "keyword",
"index": false
}
}
}
## type: text 建立時 會被分詞
## type: keyword 建立時 不會被分詞
## index: false, 不會建立索引, 無法被作為條件query
## analyzer: 儲存時 使用的分析器
## search_analyzer: 查詢時 使用的分析器
測試範例
text
PUT user/_doc/1000
{
"name": "大毛",
"sex": "男性",
"tel": "3345678"
}
GET user/_search
{
"query": {
"match": {
"name": "毛"
}
}
}
return
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "user",
"_type" : "_doc",
"_id" : "1000",
"_score" : 0.2876821,
"_source" : {
"name" : "大毛",
"sex" : "男性",
"tel" : "3345678"
}
}
]
}
}
# 用 "毛"查找 "大毛" 可以被找到 , 表示大毛 儲存時有被分詞
keyword
GET user/_doc/1000
{
"query": {
"match": {
"sex": "男"
}
}
}
return
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
## 用"男"無法查找"男性" 表示 儲存時 沒有被拆分
index: false
GET user/_doc/1000
{
"query": {
"match": {
"tel": "3345678"
}
}
}
{
"error" : {
"root_cause" : [
{
"type" : "query_shard_exception",
"reason" : "failed to create query: Cannot search on field [tel] since it is not indexed.",
"index_uuid" : "CPvT5dd4RpekoekLL8Ne2w",
"index" : "user"
}
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : 0,
"index" : "user",
"node" : "HP2s1sefTHSahOSTYc0EjA",
"reason" : {
"type" : "query_shard_exception",
"reason" : "failed to create query: Cannot search on field [tel] since it is not indexed.",
"index_uuid" : "CPvT5dd4RpekoekLL8Ne2w",
"index" : "user",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "Cannot search on field [tel] since it is not indexed."
}
}
}
]
},
"status" : 400
}
# tel 儲存時 沒有被建立index, 因此不支援被query
score機制
評分方式基於 TF-IDF公式,
簡單說 會統計該term在這個doc裡出現次數, 愈多相關性愈高, 但若在所有文檔中出現愈多次反而愈不重要
一般查詢
GET shopping/_search?
{
"query": {
"match": {
"title": "IPhone100 is a cell phone"
}
}
}
return
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 0.13353139,
"hits" : [
{
"_index" : "shopping",
"_type" : "_doc",
"_id" : "1000",
"_score" : 0.13353139, ##<<<< 查詢score
"_source" : {
"title" : "IPhone100",
"category" : "Apple",
"price" : 999
}
},
{
"_index" : "shopping",
"_type" : "_doc",
"_id" : "aLG-qIsBnOvF5S7_MlFU",
"_score" : 0.13353139, ##<<<<
"_source" : {
"title" : "IPhone100",
"category" : "Apple",
"price" : 999
}
},
{
"_index" : "shopping",
"_type" : "_doc",
"_id" : "gLG-qIsBnOvF5S7_SlFE",
"_score" : 0.13353139, ##<<<<
"_source" : {
"title" : "IPhone100",
"category" : "Apple",
"price" : 2000
}
}
]
}
}
使用explain=true 可以知道查詢的評分權重
GET shopping/_search?explain=true
{
"query": {
"match": {
"title": "IPhone100 is a cell phone"
}
}
}
return
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 0.13353139,
"hits" : [
{
"_shard" : "[shopping][0]",
"_node" : "HP2s1sefTHSahOSTYc0EjA",
"_index" : "shopping",
"_type" : "_doc",
"_id" : "1000",
"_score" : 0.13353139,
"_source" : {
"title" : "IPhone100",
"category" : "Apple",
"price" : 999
},
"_explanation" : {
"value" : 0.13353139,
"description" : "sum of:",
"details" : [
{
"value" : 0.13353139,
"description" : "weight(title:iphone100 in 0) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 0.13353139,
"description" : "score(freq=1.0), computed as boost * idf * tf from:", #### 計算方式
"details" : [
{
"value" : 2.2,
"description" : "boost", ## boost
"details" : [ ]
},
{
"value" : 0.13353139,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:", ######################## idf
"details" : [
{
"value" : 3,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 3,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.45454544,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:", ############### tf
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 1.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 1.0,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
}
]
}
### ... 只保留一組