
Elasticsearch Analyzers, Relevance in Depth, and Aggregation Queries in Practice

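Contents

1. ES Analyzers in Detail

1.1 Basic Concepts

1.2 When Analysis Happens

1.3 Components of an Analyzer

Tokenizer

Token Filter

Character Filter

1.4 Data Structure of the Inverted Index

2. Relevance in Detail

2.1 What Is Relevance

2.2 Relevance Algorithms

TF-IDF

BM25

2.3 Viewing TF-IDF with the Explain API

2.4 Boosting Query (Commonly Used)

Using must_not to exclude documents that are not Apple products

Using negative_boost to lower relevance

3. Single-String, Multi-Field Queries

3.1 Best-Field Query: Dis Max Query

Using the best-field dis max query

Tuning with the tie_breaker parameter

3.2 Multi Match Query (Commonly Used)

Best Fields search

Most Fields search

3.3 Cross Fields Search

4. Elasticsearch Aggregations

Use Cases

Basic Syntax

4.1 Types of Aggregations

4.2 Metric Aggregation

4.3 Bucket Aggregation

Getting the breakdown by job

Limiting the aggregation scope

Range & Histogram aggregations

4.4 Pipeline Aggregation

min_bucket example

Stats example

percentiles example

Cumulative_sum example

4.5 Aggregation Scope

4.6 Sorting

4.7 Why ES Aggregations Can Be Inaccurate

4.8 Elasticsearch Aggregation Performance Tuning

Enable eager global ordinals to speed up high-cardinality aggregations

Pre-sort the index at ingest time

Use the node query cache

Use the shard request cache

Split aggregations so they run in parallel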


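1. ES Analyzers in Detail

1.1 Basic Concepts

The official documentation calls an analyzer a text analyzer. As the name suggests, it is a way of analyzing and processing text: following predefined rules, it splits the original document into smaller terms, and the granularity of those terms depends on the analyzer's rules.

1.2 When Analysis Happens

Analysis runs at two points in time: Index Time and Search Time.

  • Index Time: when a document is written and the inverted index is built. The analysis logic is determined by the analyzer mapping parameter.
  • Search Time: when a search runs. Analysis applies only to the search terms.

1.3 Components of an Analyzer

  • Tokenizer: defines how the text is split into tokens
  • Token Filter: processes the individual terms produced by the tokenizer
  • Character Filter: processes individual characters

Note

  • An analyzer never changes the source data. Analysis only affects the inverted index and the search terms.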
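To make the three components concrete, here is a minimal Python sketch of an analysis pipeline. It is not Elasticsearch's implementation: the function names, the whitespace tokenizer, and the tiny stop word set are assumptions chosen for illustration. It only shows the order in which the stages run: character filters first, then the tokenizer, then token filters.

```python
import re

def html_strip(text):
    """Character filter (illustrative): remove anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)

def whitespace_tokenizer(text):
    """Tokenizer (illustrative): split on whitespace."""
    return text.split()

def lowercase_filter(tokens):
    """Token filter: lowercase every term."""
    return [t.lower() for t in tokens]

# Hypothetical stop word set, much smaller than the built-in lists.
STOPWORDS = {"a", "an", "the", "are", "is"}

def stop_filter(tokens):
    """Token filter: drop stop words."""
    return [t for t in tokens if t not in STOPWORDS]

def analyze(text, char_filters, tokenizer, token_filters):
    """Run character filters, then the tokenizer, then token filters."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens

tokens = analyze(
    "<p>What ARE you doing</p>",
    [html_strip],
    whitespace_tokenizer,
    [lowercase_filter, stop_filter],
)
print(tokens)
```

The input string itself is left untouched, which mirrors the note above: only the derived terms change.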

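Tokenizer

The tokenizer is one of the core components of an analyzer. Its job is to split the original text into fine-grained pieces, and each piece is called a term. You can think of a tokenizer as a predefined set of splitting rules. Elasticsearch ships with many built-in tokenizers, and the default one is standard.

Token Filter

Token filters process the terms after tokenization, for example converting case, removing stop words, or handling synonyms. Elasticsearch also ships with many built-in token filters, which cover most day-to-day needs, and third parties can develop their own.

# No tokenizer is specified, so the text stays a single token and is lowercased
GET _analyze
{
  "filter": ["lowercase"],
  "text": "WWW ELASTIC ORG CN"
}

GET _analyze
{
  "tokenizer": "standard",
  "filter": ["uppercase"],
  "text": ["www.elastic.org.cn", "www elastic org cn"]
}

Output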

First GET
{"tokens" : [{"token" : "www elastic org cn","start_offset" : 0,"end_offset" : 18,"type" : "word","position" : 0}]
}

Second GET
{"tokens" : [{"token" : "WWW.ELASTIC.ORG.CN","start_offset" : 0,"end_offset" : 18,"type" : "<ALPHANUM>","position" : 0},{"token" : "WWW","start_offset" : 19,"end_offset" : 22,"type" : "<ALPHANUM>","position" : 101},{"token" : "ELASTIC","start_offset" : 23,"end_offset" : 30,"type" : "<ALPHANUM>","position" : 102},{"token" : "ORG","start_offset" : 31,"end_offset" : 34,"type" : "<ALPHANUM>","position" : 103},{"token" : "CN","start_offset" : 35,"end_offset" : 37,"type" : "<ALPHANUM>","position" : 104}]
}

Stop words

Stop words are terms that are dropped after tokenization. The stop word list can be customized.

English stop words (english): a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with.

CJK stop words (cjk): a, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, s, such, t, that, the, their, then, there, these, they, this, to, was, will, with, www.

GET _analyze
{
  "tokenizer": "standard",
  "filter": ["stop"],
  "text": ["What are you doing"]
}

### Custom filter
DELETE test_token_filter_stop
PUT test_token_filter_stop
{
  "settings": {
    "analysis": {
      "filter": {
        "my_filter": {
          "type": "stop",
          "stopwords": ["www"],
          "ignore_case": true
        }
      }
    }
  }
}
GET test_token_filter_stop/_analyze
{
  "tokenizer": "standard",
  "filter": ["my_filter"],
  "text": ["What www WWW are you doing"]
}

Output

First GET
{"tokens" : [{"token" : "What","start_offset" : 0,"end_offset" : 4,"type" : "<ALPHANUM>","position" : 0},{"token" : "you","start_offset" : 9,"end_offset" : 12,"type" : "<ALPHANUM>","position" : 2},{"token" : "doing","start_offset" : 13,"end_offset" : 18,"type" : "<ALPHANUM>","position" : 3}]
}

Second GET
{"tokens" : [{"token" : "What","start_offset" : 0,"end_offset" : 4,"type" : "<ALPHANUM>","position" : 0},{"token" : "are","start_offset" : 13,"end_offset" : 16,"type" : "<ALPHANUM>","position" : 3},{"token" : "you","start_offset" : 17,"end_offset" : 20,"type" : "<ALPHANUM>","position" : 4},{"token" : "doing","start_offset" : 21,"end_offset" : 26,"type" : "<ALPHANUM>","position" : 5}]
}

Synonyms

Synonym rule formats

  • a, b, c => d: a, b, and c are replaced by d.
  • a, b, c, d: a, b, c, and d are treated as equivalent.

# The alternative rule "good, nice, excellent" would make all three terms equivalent
PUT test_token_filter_synonym
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym": {
          "type": "synonym",
          "synonyms": ["good, nice => excellent"]
        }
      }
    }
  }
}
GET test_token_filter_synonym/_analyze
{
  "tokenizer": "standard",
  "filter": ["my_synonym"],
  "text": ["good"]
}

Output

{"tokens" : [{"token" : "excellent","start_offset" : 0,"end_offset" : 4,"type" : "SYNONYM","position" : 0}]
}

Character Filter

Character filters preprocess the text before tokenization and remove unwanted characters.

PUT <index_name>
{
  "settings": {
    "analysis": {
      "char_filter": {
        "my_char_filter": {
          "type": "<char_filter_type>"
        }
      }
    }
  }
}

type: the name of the character filter type to use. The following values are available:

  • html_strip
  • mapping
  • pattern_replace

HTML Strip Character Filter

This character filter strips HTML elements such as <b> and decodes HTML entities such as &amp;.

PUT test_html_strip_filter
{
  "settings": {
    "analysis": {
      "char_filter": {
        "my_char_filter": {
          "type": "html_strip",      // use the HTML strip character filter
          "escaped_tags": ["a"]      // keep only the a tag
        }
      }
    }
  }
}
GET test_html_strip_filter/_analyze
{
  "tokenizer": "standard",
  "char_filter": ["my_char_filter"],
  "text": ["<p>I&apos;m so <a>happy</a>!</p>"]
}

Output

{"tokens" : [{"token" : "I'm","start_offset" : 3,"end_offset" : 11,"type" : "<ALPHANUM>","position" : 0},{"token" : "so","start_offset" : 12,"end_offset" : 14,"type" : "<ALPHANUM>","position" : 1},{"token" : "a","start_offset" : 16,"end_offset" : 17,"type" : "<ALPHANUM>","position" : 2},{"token" : "happy","start_offset" : 18,"end_offset" : 23,"type" : "<ALPHANUM>","position" : 3},{"token" : "a","start_offset" : 25,"end_offset" : 26,"type" : "<ALPHANUM>","position" : 4}]
}

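Parameter: escaped_tags lists the HTML tags to keep.

Mapping Character Filter

Replaces specific characters with specified characters according to a set of mapping rules.

# Delete the index created in the previous example before reusing its name
DELETE test_html_strip_filter
PUT test_html_strip_filter
{
  "settings": {
    "analysis": {
      "char_filter": {
        "my_char_filter": {
          "type": "mapping",       // use the mapping character filter
          "mappings": [            // each character is replaced by the one after =>
            "滚 => *",
            "垃 => *",
            "圾 => *"
          ]
        }
      }
    }
  }
}
# No tokenizer is specified here, so the filtered text comes back as a single token
GET test_html_strip_filter/_analyze
{
  "char_filter": ["my_char_filter"],
  "text": "你就是个垃圾!滚"
}

Output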

{"tokens" : [{"token" : "你就是个**!*","start_offset" : 0,"end_offset" : 8,"type" : "word","position" : 0}]
}

Pattern Replace Character Filter

PUT text_pattern_replace_filter
{
  "settings": {
    "analysis": {
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",                 // use the pattern replace character filter
          "pattern": """(\d{3})\d{4}(\d{4})""",      // regular expression
          "replacement": "$1****$2"
        }
      }
    }
  }
}
GET text_pattern_replace_filter/_analyze
{
  "char_filter": ["my_char_filter"],
  "text": "您的手机号是18868686688"
}

Output

{"tokens" : [{"token" : "您的手机号是188****6688","start_offset" : 0,"end_offset" : 17,"type" : "word","position" : 0}]
}
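For readers who want to try the same masking rule outside Elasticsearch, the sketch below applies the same regular expression with Python's re module. The helper name mask_phone is an assumption for illustration. Note that Python writes group references as \1 and \2, where the filter configuration above uses $1 and $2.

```python
import re

# Same pattern as the char filter above: keep the first 3 and last 4 digits.
PHONE = re.compile(r"(\d{3})\d{4}(\d{4})")

def mask_phone(text):
    """Replace the middle four digits of an 11-digit number with asterisks."""
    return PHONE.sub(r"\1****\2", text)

# The sample sentence from the request above ("Your phone number is ...").
masked = mask_phone("您的手机号是18868686688")
print(masked)
```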

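1.4 Data Structure of the Inverted Index

When data is written to ES, analysis splits it into terms, and ES maps each term to the list of documents that contain it. This structure is the inverted index.

To make lookups more efficient, ES builds a term index on top of the terms, using their prefixes or suffixes, to index the terms themselves.

When we search for a keyword, ES first uses its prefix or suffix to quickly narrow down where the keyword sits in the term dictionary, which greatly reduces the number of disk I/O operations.

  • Term Dictionary: records every term across all documents, and the association from each term to its posting list
    • Common dictionary data structures: see "lucene字典实现原理" (how Lucene's dictionary is implemented), zhanlijun, 博客园 (cnblogs)
  • Posting List: records the set of documents that contain the term; it is made up of postings
  • Posting:
    • Document ID
    • Term frequency (TF): the number of times the term appears in the document, used for relevance scoring
    • Position: the position of the term among the document's tokens, used for phrase search (match phrase query)
    • Offset: the start and end offsets of the term, used for highlighting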
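The structure described above can be sketched in a few lines of Python. This is a toy in-memory model, not Lucene's on-disk format: the function name, the regex-based tokenizer, and the dictionary layout are assumptions for illustration. Each term maps to a posting list, and each posting carries the document ID, term frequency, positions, and character offsets.

```python
import re
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to {doc_id: posting}, where a posting holds tf, positions and offsets."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Toy tokenizer: lowercase the text and take runs of word characters.
        for position, match in enumerate(re.finditer(r"\w+", text.lower())):
            posting = index[match.group()].setdefault(
                doc_id, {"tf": 0, "positions": [], "offsets": []}
            )
            posting["tf"] += 1
            posting["positions"].append(position)
            posting["offsets"].append((match.start(), match.end()))
    return index

docs = {
    1: "we like elasticsearch",
    2: "we use Elasticsearch to power the search",
}
index = build_inverted_index(docs)
print(sorted(index["elasticsearch"]))   # document IDs containing the term
print(index["elasticsearch"][1])        # posting for document 1
```

Looking up a term is then a dictionary access, and the positions and offsets are what phrase queries and highlighting would rely on.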

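Every field of a JSON document in Elasticsearch has its own inverted index.

You can choose not to index certain fields:

  • Advantage: saves storage space
  • Disadvantage: the field cannot be searched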

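2. Relevance in Detail

Search is a conversation between the user and the search engine, and what the user cares about is the relevance of the results:

  • Can all relevant content be found?
  • How much irrelevant content is returned?
  • Are the documents scored sensibly?
  • Does the ranking balance business requirements?

2.1 What Is Relevance

The relevance score of a search describes how well a document matches the query. ES computes a score, _score, for every result that matches the query. Scoring is essentially sorting: the documents that best fit the user's need should come first.

Consider the example below. For the query "JAVA multithreading design patterns", the documents with IDs 2 and 3 clearly score higher, because they contain all three keywords.

Keyword            Document IDs
JAVA               1,2,3
design patterns    1,2,3,4,5,6
multithreading     2,3,7,9

How relevance is measured:

  • Precision: return as few irrelevant documents as possible
  • Recall: return as many relevant documents as possible
  • Ranking: whether results can be ordered by relevance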

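2.2 Relevance Algorithms

Before ES 5, the default relevance scoring used TF-IDF. It now uses BM25.

TF-IDF

TF-IDF (term frequency-inverse document frequency) is a weighting technique commonly used in information retrieval and data mining.

  • TF-IDF is widely regarded as the most important invention in information retrieval. Beyond retrieval, it is used extensively in document classification and related fields.
  • The concept of IDF was first proposed by Karen Spärck Jones of the University of Cambridge
    • 1972: "A statistical interpretation of term specificity and its application in retrieval". The paper did not explain theoretically why IDF should be log(total number of documents / number of documents containing the term) rather than some other function, and did not pursue the question further
    • In the 1970s and 1980s, Salton and Robertson carried out further proofs and research, and justified it with Shannon's information theory: http://www.staff.city.ac.uk/~sb317/papers/foundations_bm25_review.pdf
  • Modern search engines have made many fine-grained optimizations to TF-IDF

The TF-IDF scoring formula in Lucene (the practical scoring function of the classic TFIDFSimilarity) has the form:

score(q,d) = coord(q,d) · queryNorm(q) · Σ over t in q of [ tf(t in d) · idf(t)² · boost(t) · norm(t,d) ]

TF is term frequency

The more often a search term appears in a document, the higher the relevance.

Term frequency (TF) = number of times the term appears in the document / total number of terms in the document

IDF is inverse document frequency

This reflects how often each search term appears across the index: the higher the frequency, the lower the relevance. Words such as "is", "of", and "in" appear frequently in every document and carry little meaning, so the weight of terms that are frequent across many documents can be reduced.

Inverse document frequency (IDF) = log(total number of documents in the corpus / (number of documents containing the term + 1))

Field-length norm

A search term that appears in a short title field carries more weight than the same term in a long content field.

These three factors, term frequency, inverse document frequency, and field-length norm, are computed and stored at index time. They are then combined to calculate the weight of a single term in a particular document.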
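The two textbook formulas above translate directly into Python. The sketch below uses a small hypothetical corpus of pre-tokenized documents; the function names are illustrative, and Lucene's actual implementation uses different variants of these formulas. With the "+1" smoothing shown above, the IDF of a term that occurs in almost every document can drop to zero or below, which is one reason real engines use other variants.

```python
import math

def term_frequency(term, doc_tokens):
    """TF = occurrences of the term in the document / total terms in the document."""
    return doc_tokens.count(term) / len(doc_tokens)

def inverse_document_frequency(term, corpus):
    """IDF = log(total documents / (documents containing the term + 1))."""
    containing = sum(1 for doc_tokens in corpus if term in doc_tokens)
    return math.log(len(corpus) / (containing + 1))

# Hypothetical corpus, already tokenized and lowercased.
corpus = [
    ["we", "like", "elasticsearch"],
    ["we", "use", "elasticsearch", "to", "power", "the", "search"],
    ["you", "know", "for", "search"],
    ["the", "scoring", "formula"],
]

tf = term_frequency("search", corpus[2])            # 1 occurrence out of 4 terms
idf = inverse_document_frequency("power", corpus)   # term occurs in 1 of 4 documents
print(tf, idf, tf * idf)
```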

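BM25

BM25 is an improvement on TF-IDF. In TF-IDF, the larger the TF(t) part, the larger the value of the whole formula. BM25 addresses exactly this: as TF(t) keeps growing, the value returned by the algorithm levels off toward a fixed number.

  • Starting with ES 5, the default algorithm is BM25
  • Compared with classic TF-IDF, the BM25 score approaches a limit as TF grows without bound

  • The BM25 formula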
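For a single term t in document d, the formula can be written out from the component descriptions that the Explain API reports in section 2.3 (boost, idf, tf). The symbols follow those descriptions: f is the term frequency in the document, dl the field length, avgdl the average field length, N the number of documents with the field, and n the number of documents containing the term. The values k_1 = 1.2 and b = 0.75 are the ones shown in that output.

```latex
\mathrm{score}(t, d) \;=\; \mathrm{boost} \cdot
\underbrace{\ln\!\left(1 + \frac{N - n + 0.5}{n + 0.5}\right)}_{\mathrm{idf}} \cdot
\underbrace{\frac{f}{f + k_1 \left(1 - b + b \,\frac{dl}{avgdl}\right)}}_{\mathrm{tf}}
```

As f grows, the tf factor approaches 1, which is the saturation behavior described above. For a query with several terms, the per-term scores are summed.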

2.3 Viewing TF-IDF with the Explain API

PUT /test_score/_bulk
{"index":{"_id":1}}
{"content":"we use Elasticsearch to power the search"}
{"index":{"_id":2}}
{"content":"we like elasticsearch"}
{"index":{"_id":3}}
{"content":"The scoring of documents is calculated by the scoring formula"}
{"index":{"_id":4}}
{"content":"you know,for search"}

GET /test_score/_search
{
  "explain": true,
  "query": {"match": {"content": "elasticsearch"}}
}

GET /test_score/_explain/2
{
  "query": {"match": {"content": "elasticsearch"}}
}

Output

{"_index" : "test_score","_type" : "_doc","_id" : "2","matched" : true,"explanation" : {"value" : 0.8713851,"description" : "weight(content:elasticsearch in 1) [PerFieldSimilarity], result of:","details" : [{"value" : 0.8713851,"description" : "score(freq=1.0), computed as boost * idf * tf from:","details" : [{"value" : 2.2,"description" : "boost","details" : [ ]},{"value" : 0.6931472,"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:","details" : [{"value" : 2,"description" : "n, number of documents containing term","details" : [ ]},{"value" : 4,"description" : "N, total number of documents with field","details" : [ ]}]},{"value" : 0.5714286,"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:","details" : [{"value" : 1.0,"description" : "freq, occurrences of term within document","details" : [ ]},{"value" : 1.2,"description" : "k1, term saturation parameter","details" : [ ]},{"value" : 0.75,"description" : "b, length normalization parameter","details" : [ ]},{"value" : 3.0,"description" : "dl, length of field","details" : [ ]},{"value" : 6.0,"description" : "avgdl, average length of field","details" : [ ]}]}]}]}
}
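The numbers in this explanation can be reproduced by hand. The Python sketch below plugs the reported values (N = 4, n = 2, freq = 1, dl = 3, avgdl = 6, k1 = 1.2, b = 0.75, boost = 2.2) into the idf and tf expressions printed in the description fields. The function names are illustrative, and the result agrees with the reported 0.8713851 only up to floating-point rounding, since Elasticsearch computes in single precision.

```python
import math

def bm25_idf(total_docs, docs_with_term):
    """idf, computed as log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))

def bm25_tf(freq, field_len, avg_field_len, k1=1.2, b=0.75):
    """tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    return freq / (freq + k1 * (1 - b + b * field_len / avg_field_len))

# Values copied from the explain output for document 2.
idf = bm25_idf(total_docs=4, docs_with_term=2)
tf = bm25_tf(freq=1.0, field_len=3.0, avg_field_len=6.0)
score = 2.2 * idf * tf   # boost * idf * tf

print(idf, tf, score)
```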

GET /es_db/_explain/3
{"query": {"match": {"address": "广州公园"}}
}

Output

{"_index" : "es_db","_type" : "_doc","_id" : "3","matched" : true,"explanation" : {"value" : 1.6476591,"description" : "sum of:","details" : [{"value" : 0.5978369,"description" : "weight(address:广州 in 2) [PerFieldSimilarity], result of:","details" : [{"value" : 0.5978369,"description" : "score(freq=1.0), computed as boost * idf * tf from:","details" : [{"value" : 2.2,"description" : "boost","details" : [ ]},{"value" : 0.597837,"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:","details" : [{"value" : 5,"description" : "n, number of documents containing term","details" : [ ]},{"value" : 9,"description" : "N, total number of documents with field","details" : [ ]}]},{"value" : 0.45454544,"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:","details" : [{"value" : 1.0,"description" : "freq, occurrences of term within document","details" : [ ]},{"value" : 1.2,"description" : "k1, term saturation parameter","details" : [ ]},{"value" : 0.75,"description" : "b, length normalization parameter","details" : [ ]},{"value" : 5.0,"description" : "dl, length of field","details" : [ ]},{"value" : 5.0,"description" : "avgdl, average length of field","details" : [ ]}]}]}]},{"value" : 1.0498221,"description" : "weight(address:公园 in 2) [PerFieldSimilarity], result of:","details" : [{"value" : 1.0498221,"description" : "score(freq=1.0), computed as boost * idf * tf from:","details" : [{"value" : 2.2,"description" : "boost","details" : [ ]},{"value" : 1.0498221,"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:","details" : [{"value" : 3,"description" : "n, number of documents containing term","details" : [ ]},{"value" : 9,"description" : "N, total number of documents with field","details" : [ ]}]},{"value" : 0.45454544,"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:","details" : [{"value" : 1.0,"description" : "freq, occurrences of term within document","details" : [ 
]},{"value" : 1.2,"description" : "k1, term saturation parameter","details" : [ ]},{"value" : 0.75,"description" : "b, length normalization parameter","details" : [ ]},{"value" : 5.0,"description" : "dl, length of field","details" : [ ]},{"value" : 5.0,"description" : "avgdl, average length of field","details" : [ ]}]}]}]}]}
}

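2.4 Boosting Query (Commonly Used)

Boosting is one way to control relevance. You can influence the query results by setting a boost value on a field.

Meaning of the boost parameter:

  • When boost > 1, the relative weight of the score is raised
  • When 0 < boost < 1, the relative weight of the score is lowered
  • When boost < 0, the clause contributes a negative score (note that Elasticsearch 7.0 and later reject negative boost values)

Use case: results that contain certain content should still appear, but be ranked lower.

POST /blogs/_bulk
{"index":{"_id":1}}
{"title":"Apple iPad","content":"Apple iPad,Apple iPad"}
{"index":{"_id":2}}
{"title":"Apple iPad,Apple iPad","content":"Apple iPad"}

GET /blogs/_search
{
  "query": {
    "bool": {
      "should": [
        {"match": {"title": {"query": "apple,ipad", "boost": 1}}},
        {"match": {"content": {"query": "apple,ipad", "boost": 4}}}
      ]
    }
  }
}

Output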

{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : 2.2558527,"hits" : [{"_index" : "blogs","_type" : "_doc","_id" : "1","_score" : 2.2558527,"_source" : {"title" : "Apple iPad","content" : "Apple iPad,Apple iPad"}},{"_index" : "blogs","_type" : "_doc","_id" : "2","_score" : 2.1472821,"_source" : {"title" : "Apple iPad,Apple iPad","content" : "Apple iPad"}}]}
}

Example: information about Apple (the company's) products should be shown first

POST /news/_bulk
{"index":{"_id":1}}
{"content":"Apple Mac"}
{"index":{"_id":2}}
{"content":"Apple iPad"}
{"index":{"_id":3}}
{"content":"Apple employee like Apple Pie and Apple Juice"}

GET /news/_search
{
  "query": {
    "bool": {
      "must": {"match": {"content": "apple"}}
    }
  }
}

Output

{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 3,"relation" : "eq"},"max_score" : 0.17280531,"hits" : [{"_index" : "news","_type" : "_doc","_id" : "3","_score" : 0.17280531,"_source" : {"content" : "Apple employee like Apple Pie and Apple Juice"}},{"_index" : "news","_type" : "_doc","_id" : "1","_score" : 0.16786805,"_source" : {"content" : "Apple Mac"}},{"_index" : "news","_type" : "_doc","_id" : "2","_score" : 0.16786805,"_source" : {"content" : "Apple iPad"}}]}
}

Using must_not to exclude documents that are not Apple products

GET /news/_search
{
  "query": {
    "bool": {
      "must": {"match": {"content": "apple"}},
      "must_not": {"match": {"content": "pie"}}
    }
  }
}

Output

{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : 0.16786805,"hits" : [{"_index" : "news","_type" : "_doc","_id" : "1","_score" : 0.16786805,"_source" : {"content" : "Apple Mac"}},{"_index" : "news","_type" : "_doc","_id" : "2","_score" : 0.16786805,"_source" : {"content" : "Apple iPad"}}]}
}

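Using negative_boost to lower relevance

If you are unhappy with some of the results but do not want to exclude them outright (must_not), consider the negative_boost of the boosting query.

  • negative_boost applies to documents that match the negative query
  • At scoring time, the score from the positive query is left unchanged for other documents; for documents that also match the negative query it is multiplied by negative_boost
  • negative_boost takes a value between 0 and 1.0, for example 0.3

GET /news/_search
{
  "query": {
    "boosting": {
      "positive": {"match": {"content": "apple"}},
      "negative": {"match": {"content": "pie"}},
      "negative_boost": 0.2
    }
  }
}

Output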

{"took" : 11,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 3,"relation" : "eq"},"max_score" : 0.16786805,"hits" : [{"_index" : "news","_type" : "_doc","_id" : "1","_score" : 0.16786805,"_source" : {"content" : "Apple Mac"}},{"_index" : "news","_type" : "_doc","_id" : "2","_score" : 0.16786805,"_source" : {"content" : "Apple iPad"}},{"_index" : "news","_type" : "_doc","_id" : "3","_score" : 0.034561064,"_source" : {"content" : "Apple employee like Apple Pie and Apple Juice"}}]}
}
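The demotion of document 3 follows directly from the rule above: its positive score of 0.17280531 (from the earlier plain match query) multiplied by 0.2 gives about 0.0345611. The Python sketch below replays that arithmetic with the scores copied from the outputs; the function name and the tuple layout are assumptions for illustration.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Keep the positive score, scaled by negative_boost if the negative query also matches."""
    return positive_score * negative_boost if matches_negative else positive_score

# (doc id, positive score, matches "pie"), in the order returned by the plain match query.
hits = [
    ("3", 0.17280531, True),
    ("1", 0.16786805, False),
    ("2", 0.16786805, False),
]

rescored = [(doc_id, boosting_score(score, neg, 0.2)) for doc_id, score, neg in hits]
# Sort by score, highest first; Python's sort is stable, so ties keep their order.
ranked = sorted(rescored, key=lambda hit: hit[1], reverse=True)
print(ranked)
```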

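3. Single-String, Multi-Field Queries

Three scenarios:

  • Best Fields

When fields compete with each other yet are related, for example the title and body fields of a blog post, the score comes from the best-matching field.

  • Most Fields

A common technique for English content is to use the English analyzer on the main field, which stems words and adds synonyms so that more documents match. The same text is also indexed into a subfield with the standard analyzer to provide more exact matching. The other fields act as signals that raise the relevance of matching documents: the more fields match, the better.

  • Cross Fields

For some entities, such as person names, addresses, or book information, the information has to be pieced together from several fields, and a single field is only part of the whole. The goal is to find as many of the query words as possible in any of the listed fields.

3.1 Best-Field Query: Dis Max Query

Returns every document that matches any of the queries, and uses the score of the best-matching field as the final score: max(a,b).

Official documentation: Disjunction max query | Elasticsearch Guide [7.17] | Elastic

Test:

DELETE /blogs
PUT /blogs/_doc/1
{
  "title": "Quick brown rabbits",
  "body": "Brown rabbits are commonly seen."
}

PUT /blogs/_doc/2
{
  "title": "Keeping pets healthy",
  "body": "My quick brown fox eats rabbits on a regular basis."
}

POST /blogs/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "Brown fox" }},
        { "match": { "body": "Brown fox" }}
      ]
    }
  }
}

Output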

{"took" : 701,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : 0.90425634,"hits" : [{"_index" : "blogs","_type" : "_doc","_id" : "1","_score" : 0.90425634,"_source" : {"title" : "Quick brown rabbits","body" : "Brown rabbits are commonly seen."}},{"_index" : "blogs","_type" : "_doc","_id" : "2","_score" : 0.77041256,"_source" : {"title" : "Keeping pets healthy","body" : "My quick brown fox eats rabbits on a regular basis."}}]}
}

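Question: the result is not what we expected. Why?

How bool should computes the score:

  • Run both queries in the should clause
  • Add the scores of the two queries
  • Multiply by the number of matching clauses
  • Divide by the total number of clauses

The last two steps describe the coordination factor used by older versions. In the output above the score is simply the sum of the matching clauses: document 2 matches only on body and keeps that single score of 0.77041256, while document 1 matches on both fields and its two scores add up to 0.90425634.

In this example title and body compete with each other. Their scores should not simply be added together; instead we should use the score of the single best-matching field.

Using the best-field dis max query

POST /blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Brown fox" }},
        { "match": { "body": "Brown fox" }}
      ]
    }
  }
}

Output (as expected)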

{"took" : 6,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : 0.77041256,"hits" : [{"_index" : "blogs","_type" : "_doc","_id" : "2","_score" : 0.77041256,"_source" : {"title" : "Keeping pets healthy","body" : "My quick brown fox eats rabbits on a regular basis."}},{"_index" : "blogs","_type" : "_doc","_id" : "1","_score" : 0.6931471,"_source" : {"title" : "Quick brown rabbits","body" : "Brown rabbits are commonly seen."}}]}
}

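Tuning with the tie_breaker parameter

tie_breaker is a floating-point number between 0 and 1. 0 means only the best match is used; 1 means all clauses are equally important.

  1. Take the _score of the best-matching clause.
  2. Multiply the scores of the other matching clauses by tie_breaker.
  3. Sum the scores above and normalize.

Final score = best-matching field + other matching fields * tie_breaker

POST /blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Brown fox" }},
        { "match": { "body": "Brown fox" }}
      ],
      "tie_breaker": 0.1
    }
  }
}

Output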

{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : 0.77041256,"hits" : [{"_index" : "blogs","_type" : "_doc","_id" : "2","_score" : 0.77041256,"_source" : {"title" : "Keeping pets healthy","body" : "My quick brown fox eats rabbits on a regular basis."}},{"_index" : "blogs","_type" : "_doc","_id" : "1","_score" : 0.714258,"_source" : {"title" : "Quick brown rabbits","body" : "Brown rabbits are commonly seen."}}]}
}
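The three ways of combining per-field scores seen so far (bool should, dis_max, and dis_max with tie_breaker) can be compared with a short Python sketch. The per-field scores for document 1 are not printed anywhere in the outputs; the second value below is inferred from the reported totals and should be read as an assumption, as should the function names.

```python
def bool_should(field_scores):
    """Sum of the scores of all matching clauses."""
    return sum(field_scores)

def dis_max(field_scores, tie_breaker=0.0):
    """Best score plus tie_breaker times the sum of the remaining scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie_breaker * (sum(field_scores) - best)

# Document 1 matches on both fields: 0.6931471 is its dis_max score from the output,
# 0.2111092 is inferred from the tie_breaker results (an assumption).
doc1 = [0.6931471, 0.2111092]
# Document 2 matches on body only.
doc2 = [0.77041256]

print(bool_should(doc1), bool_should(doc2))      # doc 1 ranks first
print(dis_max(doc1), dis_max(doc2))              # doc 2 ranks first
print(dis_max(doc1, 0.1), dis_max(doc2, 0.1))    # doc 2 still first, gap narrows
```

With tie_breaker set to 1, dis_max degenerates into the plain sum used by bool should.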

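3.2 Multi Match Query (Commonly Used)

Best Fields search

The best_fields strategy takes the score of the best-matching field: final_score = max(score of other matching fields, score of best-matching field)

With best_fields and a tie_breaker parameter, for example tie_breaker=0.1: final_score = score of other matching fields * 0.1 + score of best-matching field

Best Fields is the default type and does not need to be specified. It is equivalent to a dis_max query.

POST /blogs/_search
{
  "query": {
    "multi_match": {
      "type": "best_fields",
      "query": "Brown fox",
      "fields": ["title", "body"],
      "tie_breaker": 0.2
    }
  }
}

Output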

{"took" : 2,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : 0.77041256,"hits" : [{"_index" : "blogs","_type" : "_doc","_id" : "2","_score" : 0.77041256,"_source" : {"title" : "Keeping pets healthy","body" : "My quick brown fox eats rabbits on a regular basis."}},{"_index" : "blogs","_type" : "_doc","_id" : "1","_score" : 0.73536897,"_source" : {"title" : "Quick brown rabbits","body" : "Brown rabbits are commonly seen."}}]}
}

Example

PUT /employee
{
  "settings": {
    "index": {
      "analysis.analyzer.default.type": "ik_max_word"
    }
  }
}

POST /employee/_bulk
{"index":{"_id":1}}
{"empId":"1","name":"员工001","age":20,"sex":"男","mobile":"19000001111","salary":23343,"deptName":"技术部","address":"湖北省武汉市洪山区光谷大厦","content":"i like to write best elasticsearch article"}
{"index":{"_id":2}}
{"empId":"2","name":"员工002","age":25,"sex":"男","mobile":"19000002222","salary":15963,"deptName":"销售部","address":"湖北省武汉市江汉路","content":"i think java is the best programming language"}
{"index":{"_id":3}}
{"empId":"3","name":"员工003","age":30,"sex":"男","mobile":"19000003333","salary":20000,"deptName":"技术部","address":"湖北省武汉市经济开发区","content":"i am only an elasticsearch beginner"}
{"index":{"_id":4}}
{"empId":"4","name":"员工004","age":20,"sex":"女","mobile":"19000004444","salary":15600,"deptName":"销售部","address":"湖北省武汉市沌口开发区","content":"elasticsearch and hadoop are all very good solution, i am a beginner"}
{"index":{"_id":5}}
{"empId":"5","name":"员工005","age":20,"sex":"男","mobile":"19000005555","salary":19665,"deptName":"测试部","address":"湖北省武汉市东湖隧道","content":"spark is best big data solution based on scala, an programming language similar to java"}
{"index":{"_id":6}}
{"empId":"6","name":"员工006","age":30,"sex":"女","mobile":"19000006666","salary":30000,"deptName":"技术部","address":"湖北省武汉市江汉路","content":"i like java developer"}
{"index":{"_id":7}}
{"empId":"7","name":"员工007","age":60,"sex":"女","mobile":"19000007777","salary":52130,"deptName":"测试部","address":"湖北省黄冈市边城区","content":"i like elasticsearch developer"}
{"index":{"_id":8}}
{"empId":"8","name":"员工008","age":19,"sex":"女","mobile":"19000008888","salary":60000,"deptName":"技术部","address":"湖北省武汉市江汉大学","content":"i like spark language"}
{"index":{"_id":9}}
{"empId":"9","name":"员工009","age":40,"sex":"男","mobile":"19000009999","salary":23000,"deptName":"销售部","address":"河南省郑州市郑州大学","content":"i like java developer"}
{"index":{"_id":10}}
{"empId":"10","name":"张湖北","age":35,"sex":"男","mobile":"19000001010","salary":18000,"deptName":"测试部","address":"湖北省武汉市东湖高新","content":"i like java developer, i also like elasticsearch"}
{"index":{"_id":11}}
{"empId":"11","name":"王河南","age":61,"sex":"男","mobile":"19000001011","salary":10000,"deptName":"销售部","address":"河南省开封市河南大学","content":"i am not like java"}
{"index":{"_id":12}}
{"empId":"12","name":"张大学","age":26,"sex":"女","mobile":"19000001012","salary":11321,"deptName":"测试部","address":"河南省开封市河南大学","content":"i am java developer, java is good"}
{"index":{"_id":13}}
{"empId":"13","name":"李江汉","age":36,"sex":"男","mobile":"19000001013","salary":11215,"deptName":"销售部","address":"河南省郑州市二七区","content":"i like java and java is very best, i like it, do you like java"}
{"index":{"_id":14}}
{"empId":"14","name":"王技术","age":45,"sex":"女","mobile":"19000001014","salary":16222,"deptName":"测试部","address":"河南省郑州市金水区","content":"i like c++"}
{"index":{"_id":15}}
{"empId":"15","name":"张测试","age":18,"sex":"男","mobile":"19000001015","salary":20000,"deptName":"技术部","address":"河南省郑州市高新开发区","content":"i think spark is good"}

GET /employee/_search
{
  "query": {
    "multi_match": {
      "query": "elasticsearch beginner 湖北省 开封市",
      "type": "best_fields",
      "fields": ["content", "address"]
    }
  },
  "size": 15
}

# Inspect how the score is computed
GET /employee/_explain/3
{
  "query": {
    "multi_match": {
      "query": "elasticsearch beginner 湖北省 开封市",
      "type": "best_fields",
      "fields": ["content", "address"]
    }
  }
}

GET /employee/_explain/3
{
  "query": {
    "multi_match": {
      "query": "elasticsearch beginner 湖北省 开封市",
      "type": "best_fields",
      "fields": ["content", "address"],
      "tie_breaker": 0.1
    }
  }
}

Most Fields search

The most_fields strategy takes the accumulated score of all matching fields (it combines the scores of every matching field). It is equivalent to a bool should query.

GET /employee/_explain/3
{
  "query": {
    "multi_match": {
      "query": "elasticsearch beginner 湖北省 开封市",
      "type": "most_fields",
      "fields": ["content", "address"]
    }
  }
}

Example

DELETE /titles
PUT /titles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english",
        "fields": {
          "std": {"type": "text", "analyzer": "standard"}
        }
      }
    }
  }
}

POST titles/_bulk
{ "index": { "_id": 1 }}
{ "title": "My dog barks" }
{ "index": { "_id": 2 }}
{ "title": "I see a lot of barking dogs on the road " }

# The result does not match expectations
GET /titles/_search
{
  "query": {
    "match": {"title": "barking dogs"}
  }
}

Output

{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : 0.42221838,"hits" : [{"_index" : "titles","_type" : "_doc","_id" : "1","_score" : 0.42221838,"_source" : {"title" : "My dog barks"}},{"_index" : "titles","_type" : "_doc","_id" : "2","_score" : 0.320886,"_source" : {"title" : "I see a lot of barking dogs on the road "}}]}
}

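Use the broad-matching field title to include as many documents as possible, which improves recall, and at the same time use the field title.std as a signal that pushes the more relevant documents to the top of the results.

GET /titles/_search
{
  "query": {
    "multi_match": {
      "query": "barking dogs",
      "type": "most_fields",
      "fields": ["title", "title.std"]
    }
  }
}

Output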

{"took" : 0,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : 1.4569323,"hits" : [{"_index" : "titles","_type" : "_doc","_id" : "2","_score" : 1.4569323,"_source" : {"title" : "I see a lot of barking dogs on the road "}},{"_index" : "titles","_type" : "_doc","_id" : "1","_score" : 0.42221838,"_source" : {"title" : "My dog barks"}}]}
}

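The contribution of each field to the final score can be controlled with a custom boost value. For example, making the title field more important also reduces the effect of the other signal fields:

# Increase the weight of title
GET /titles/_search
{
  "query": {
    "multi_match": {
      "query": "barking dogs",
      "type": "most_fields",
      "fields": ["title^10", "title.std"]
    }
  }
}

Output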

{"took" : 0,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : 4.3449063,"hits" : [{"_index" : "titles","_type" : "_doc","_id" : "2","_score" : 4.3449063,"_source" : {"title" : "I see a lot of barking dogs on the road "}},{"_index" : "titles","_type" : "_doc","_id" : "1","_score" : 4.222184,"_source" : {"title" : "My dog barks"}}]}
}

3.3 Cross Fields Search

The search terms are spread across several fields. The behavior resembles a combination of bool and dis_max.

DELETE /address
PUT /address
{
  "settings": {
    "index": {
      "analysis.analyzer.default.type": "ik_max_word"
    }
  }
}

PUT /address/_bulk
{ "index": { "_id": "1"} }
{"province": "湖南","city": "长沙"}
{ "index": { "_id": "2"} }
{"province": "湖南","city": "常德"}
{ "index": { "_id": "3"} }
{"province": "广东","city": "广州"}
{ "index": { "_id": "4"} }
{"province": "湖南","city": "邵阳"}

# With most_fields the result is not as expected: operator is applied per field,
# so it cannot require that all terms match across the fields
GET /address/_search
{
  "query": {
    "multi_match": {
      "query": "湖南常德",
      "type": "most_fields",
      "fields": ["province", "city"]
    }
  }
}

# cross_fields can be used instead; it supports operator across fields.
# Compared with copy_to, one advantage is that individual fields can be boosted at search time.
GET /address/_search
{
  "query": {
    "multi_match": {
      "query": "湖南常德",
      "type": "cross_fields",
      "operator": "and",
      "fields": ["province", "city"]
    }
  }
}

Output

First GET
{"took" : 0,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 3,"relation" : "eq"},"max_score" : 1.5606477,"hits" : [{"_index" : "address","_type" : "_doc","_id" : "2","_score" : 1.5606477,"_source" : {"province" : "湖南","city" : "常德"}},{"_index" : "address","_type" : "_doc","_id" : "1","_score" : 0.35667494,"_source" : {"province" : "湖南","city" : "长沙"}},{"_index" : "address","_type" : "_doc","_id" : "4","_score" : 0.35667494,"_source" : {"province" : "湖南","city" : "邵阳"}}]}
}

Second GET
{"took" : 0,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 1,"relation" : "eq"},"max_score" : 1.5606477,"hits" : [{"_index" : "address","_type" : "_doc","_id" : "2","_score" : 1.5606477,"_source" : {"province" : "湖南","city" : "常德"}}]}
}

This can also be solved with copy_to, at the cost of extra storage space.

DELETE /address
# The copy_to parameter copies the values of several fields into a group field, which can then be queried as a single field
PUT /address
{
  "mappings": {
    "properties": {
      "province": {"type": "keyword", "copy_to": "full_address"},
      "city": {"type": "text", "copy_to": "full_address"}
    }
  },
  "settings": {
    "index": {
      "analysis.analyzer.default.type": "ik_max_word"
    }
  }
}

PUT /address/_bulk
{ "index": { "_id": "1"} }
{"province": "湖南","city": "长沙"}
{ "index": { "_id": "2"} }
{"province": "湖南","city": "常德"}
{ "index": { "_id": "3"} }
{"province": "广东","city": "广州"}
{ "index": { "_id": "4"} }
{"province": "湖南","city": "邵阳"}

GET /address/_search
{
  "query": {
    "match": {
      "full_address": {"query": "湖南常德", "operator": "and"}
    }
  }
}

Output

{"took" : 0,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 1,"relation" : "eq"},"max_score" : 1.5606477,"hits" : [{"_index" : "address","_type" : "_doc","_id" : "2","_score" : 1.5606477,"_source" : {"province" : "湖南","city" : "常德"}}]}
}

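4. Elasticsearch Aggregations

Besides search, Elasticsearch provides statistical analysis over the data it stores. Aggregations make it very convenient to compute statistics, analyze, and run calculations on the data. For example:

  • Which phone brand is the most popular?
  • What are the average, highest, and lowest prices of these phones?
  • How do these phones sell month by month?

Use Cases

Aggregation queries are useful in many scenarios, such as business intelligence, data mining, and log analysis.

  • Sales analysis for an e-commerce platform: sales per region, total spending per user, units sold per product, and so on, to better understand sales and trends.
  • User behavior analysis for social media: number of posts, reposts, and comments per user, to better understand user behavior and trends. The data can also be broken down by region, time, topic, and other dimensions.
  • Transport analysis for a logistics company: transport volume per region, number of trips per vehicle, distance driven per driver, to better understand operations and improve transport efficiency.
  • Transaction analysis for a financial company: total transaction amount per customer, units sold per product, performance per trader, to better understand transactions and improve business processes.
  • Device monitoring for smart homes: number of uses per device, energy consumption per household, device utilization per time period, to better understand user needs and improve device performance.

Basic Syntax

The structure of an aggregation query is similar to that of other queries and usually contains the following parts:

  • Query conditions: select the documents to aggregate, using standard Elasticsearch query syntax such as term, match, range, and so on.
  • Aggregation functions: specify the aggregation to run, such as sum, avg, min, max, terms, date_histogram, and so on. Each aggregation produces one aggregation result.
  • Nested aggregations: aggregations can be nested to analyze the data at a finer granularity.

GET <index_name>/_search
{
  "aggs": {
    "<aggs_name>": {            // the aggregation name is chosen by you
      "<agg_type>": {
        "field": "<field_name>"
      }
    }
  }
}

  • aggs_name: the name of the aggregation
  • agg_type: the kind of aggregation, for example a bucket aggregation (terms) or a metric aggregation (avg, sum, min, max, and so on)
  • field_name: the name of the field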
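4.1 Types of Aggregations

  • Metric Aggregation: mathematical operations that compute statistics over document fields, comparable to min(), max(), and sum() in MySQL.
SELECT MIN(price), MAX(price) FROM products
# DSL analog of the metric aggregation:
{"aggs": {"min_price": {"min": {"field": "price"}}, "max_price": {"max": {"field": "price"}}}
}
  • Bucket Aggregation: puts the documents that meet a given condition into a bucket, with each bucket associated with a key, comparable to GROUP BY in MySQL.
SELECT size, COUNT(*) FROM products GROUP BY size
# DSL analog of the bucket aggregation:
{"aggs": {"by_size": {"terms": {"field": "size"}}}
}
  • Pipeline Aggregation: runs a second aggregation over the results of other aggregations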
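The three kinds of aggregation can be imitated in plain Python to see what each one computes. The products list below is hypothetical (it is not part of the article's sample data), and this in-memory model says nothing about how Elasticsearch executes aggregations across shards. It only shows the shape of the results: a metric is a number, buckets are groups keyed by a value, and a pipeline step works on the buckets rather than on the documents.

```python
from collections import defaultdict

# Hypothetical documents with the size and price fields used in the SQL analogies.
products = [
    {"name": "shirt-a", "size": "M", "price": 20.0},
    {"name": "shirt-b", "size": "M", "price": 30.0},
    {"name": "shirt-c", "size": "L", "price": 25.0},
    {"name": "shirt-d", "size": "S", "price": 15.0},
]

# Metric aggregation: one number computed over a field.
min_price = min(p["price"] for p in products)
max_price = max(p["price"] for p in products)

# Bucket aggregation: group documents by key, like GROUP BY size.
buckets = defaultdict(list)
for product in products:
    buckets[product["size"]].append(product)
doc_counts = {size: len(docs) for size, docs in buckets.items()}

# Pipeline aggregation: aggregate over the per-bucket results,
# here the bucket with the lowest average price.
avg_price_by_size = {
    size: sum(p["price"] for p in docs) / len(docs) for size, docs in buckets.items()
}
cheapest_size = min(avg_price_by_size, key=avg_price_by_size.get)

print(min_price, max_price, doc_counts, cheapest_size)
```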

Sample data

DELETE /employees
# Create the index
PUT /employees
{
  "mappings": {
    "properties": {
      "age": {"type": "integer"},
      "gender": {"type": "keyword"},
      "job": {
        "type": "text",
        "fields": {
          "keyword": {"type": "keyword", "ignore_above": 50}
        }
      },
      "name": {"type": "keyword"},
      "salary": {"type": "integer"}
    }
  }
}

PUT /employees/_bulk
{ "index" : {  "_id" : "1" } }
{ "name" : "Emma","age":32,"job":"Product Manager","gender":"female","salary":35000 }
{ "index" : {  "_id" : "2" } }
{ "name" : "Underwood","age":41,"job":"Dev Manager","gender":"male","salary": 50000}
{ "index" : {  "_id" : "3" } }
{ "name" : "Tran","age":25,"job":"Web Designer","gender":"male","salary":18000 }
{ "index" : {  "_id" : "4" } }
{ "name" : "Rivera","age":26,"job":"Web Designer","gender":"female","salary": 22000}
{ "index" : {  "_id" : "5" } }
{ "name" : "Rose","age":25,"job":"QA","gender":"female","salary":18000 }
{ "index" : {  "_id" : "6" } }
{ "name" : "Lucy","age":31,"job":"QA","gender":"female","salary": 25000}
{ "index" : {  "_id" : "7" } }
{ "name" : "Byrd","age":27,"job":"QA","gender":"male","salary":20000 }
{ "index" : {  "_id" : "8" } }
{ "name" : "Foster","age":27,"job":"Java Programmer","gender":"male","salary": 20000}
{ "index" : {  "_id" : "9" } }
{ "name" : "Gregory","age":32,"job":"Java Programmer","gender":"male","salary":22000 }
{ "index" : {  "_id" : "10" } }
{ "name" : "Bryant","age":20,"job":"Java Programmer","gender":"male","salary": 9000}
{ "index" : {  "_id" : "11" } }
{ "name" : "Jenny","age":36,"job":"Java Programmer","gender":"female","salary":38000 }
{ "index" : {  "_id" : "12" } }
{ "name" : "Mcdonald","age":31,"job":"Java Programmer","gender":"male","salary": 32000}
{ "index" : {  "_id" : "13" } }
{ "name" : "Jonthna","age":30,"job":"Java Programmer","gender":"female","salary":30000 }
{ "index" : {  "_id" : "14" } }
{ "name" : "Marshall","age":32,"job":"Javascript Programmer","gender":"male","salary": 25000}
{ "index" : {  "_id" : "15" } }
{ "name" : "King","age":33,"job":"Java Programmer","gender":"male","salary":28000 }
{ "index" : {  "_id" : "16" } }
{ "name" : "Mccarthy","age":21,"job":"Javascript Programmer","gender":"male","salary": 16000}
{ "index" : {  "_id" : "17" } }
{ "name" : "Goodwin","age":25,"job":"Javascript Programmer","gender":"male","salary": 16000}
{ "index" : {  "_id" : "18" } }
{ "name" : "Catherine","age":29,"job":"Javascript Programmer","gender":"female","salary": 20000}
{ "index" : {  "_id" : "19" } }
{ "name" : "Boone","age":30,"job":"DBA","gender":"male","salary": 30000}
{ "index" : {  "_id" : "20" } }
{ "name" : "Kathy","age":29,"job":"DBA","gender":"female","salary": 20000}

4.2 Metric Aggregation

  • Single-value analysis: outputs a single result
    • min, max, avg, sum
    • cardinality (similar to a distinct count)
  • Multi-value analysis: outputs multiple results
    • stats, extended_stats
    • percentiles, percentile_ranks
    • top_hits (the top-ranked sample documents)

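Query the minimum, maximum, and average salary of employees

# Several Metric aggregations in one request: find the minimum, maximum, and average salary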
POST /employees/_search
{"size": 0,  "aggs": {"max_salary": {"max": {"field": "salary"}},"min_salary": {"min": {"field": "salary"}},"avg_salary": {"avg": {"field": "salary"}}}
}

Result

{"took" : 5,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 20,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"max_salary" : {"value" : 50000.0},"avg_salary" : {"value" : 24700.0},"min_salary" : {"value" : 9000.0}}
}

Compute statistics on salary

# A single aggregation that outputs multiple values
POST /employees/_search
{"size": 0,"aggs": {"stats_salary": {"stats": {"field":"salary"}}}
}

Result

{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 20,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"stats_salary" : {"count" : 20,"min" : 9000.0,"max" : 50000.0,"avg" : 24700.0,"sum" : 494000.0}}
}
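
To see where these numbers come from, the stats aggregation can be reproduced outside Elasticsearch. The following is a minimal Python sketch, not Elasticsearch code: the salaries list is copied from the sample bulk request above, and stats_agg is a hypothetical helper name introduced only for illustration.

```python
# Salaries of the 20 sample employees, in _id order (copied from the bulk request).
salaries = [35000, 50000, 18000, 22000, 18000, 25000, 20000, 20000, 22000, 9000,
            38000, 32000, 30000, 25000, 28000, 16000, 16000, 20000, 30000, 20000]


def stats_agg(values):
    """Hypothetical helper mirroring the fields returned by the stats aggregation."""
    return {
        "count": len(values),
        "min": float(min(values)),
        "max": float(max(values)),
        "avg": sum(values) / len(values),
        "sum": float(sum(values)),
    }


stats_salary = stats_agg(salaries)
print(stats_salary)
```

The printed values should line up with the stats_salary object in the response above.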

Use cardinality to count the distinct values in the search results

POST /employees/_search
{"size": 0,"aggs": {"cardinate": {"cardinality": {"field": "job.keyword"}}}
}

Result

{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 20,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"cardinate" : {"value" : 7}}
}

4.3 Bucket Aggregation

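Bucket aggregations assign documents to different buckets according to given rules, which is how documents get classified. Some common bucket aggregations provided by ES:

  • Terms: the field must support aggregation
    • keyword fields support it by default (through doc values)
    • text fields need fielddata enabled in the mapping, and are then bucketed by the terms produced by analysis

  • Numeric and date types
    • Range / Date Range
    • Histogram / Date Histogram

  • Nesting is supported: buckets can be split again into sub-buckets

Bucket aggregations are useful in many scenarios, for example:

  • Grouping data for statistics, e.g. by region, age range, or gender.
  • Analyzing time-series data by period, e.g. per hour, day, month, quarter, or year.
  • Classifying documents by tag and counting each tag.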

Get the job categories

# Aggregate on the keyword sub-field
GET /employees/_search
{"size": 0,"aggs": {"jobs": {"terms": {"field":"job.keyword"}}}
}

Result

{"took" : 6,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 20,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"jobs" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "Java Programmer","doc_count" : 7},{"key" : "Javascript Programmer","doc_count" : 4},{"key" : "QA","doc_count" : 3},{"key" : "DBA","doc_count" : 2},{"key" : "Web Designer","doc_count" : 2},{"key" : "Dev Manager","doc_count" : 1},{"key" : "Product Manager","doc_count" : 1}]}}
}

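Configurable properties of the aggregation:

  • field: the field to aggregate on
  • size: the number of buckets to return
  • order: how the buckets are sorted

By default, a bucket aggregation counts the documents in each bucket, exposes the count as _count, and sorts the buckets by _count in descending order.

The order property lets us customize the sort order: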

GET /employees/_search
{"size": 0,"aggs": {"jobs": {"terms": {"field":"job.keyword","size": 10,"order": {"_count": "desc" }}}}
}

Result

{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 20,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"jobs" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "Java Programmer","doc_count" : 7},{"key" : "Javascript Programmer","doc_count" : 4},{"key" : "QA","doc_count" : 3},{"key" : "DBA","doc_count" : 2},{"key" : "Web Designer","doc_count" : 2},{"key" : "Dev Manager","doc_count" : 1},{"key" : "Product Manager","doc_count" : 1}]}}
}

Restricting the aggregation scope

# Aggregate only documents whose salary is at least 10000
GET /employees/_search
{"query": {"range": {"salary": {"gte": 10000 }}}, "size": 0,"aggs": {"jobs": {"terms": {"field":"job.keyword","size": 10,"order": {"_count": "desc" }}}}
}

Result

{"took" : 2,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 19,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"jobs" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "Java Programmer","doc_count" : 6},{"key" : "Javascript Programmer","doc_count" : 4},{"key" : "QA","doc_count" : 3},{"key" : "DBA","doc_count" : 2},{"key" : "Web Designer","doc_count" : 2},{"key" : "Dev Manager","doc_count" : 1},{"key" : "Product Manager","doc_count" : 1}]}}
}

Note: a terms aggregation on a text field fails and throws an exception

POST /employees/_search
{"size": 0,"aggs": {"jobs": {"terms": {"field":"job"}}}
}

Workaround: enable fielddata on the text field so that it supports the terms aggregation (not recommended)

PUT /employees/_mapping
{"properties" : {"job":{"type":  "text","fielddata": true}}
}

# The text field is analyzed, so the aggregation runs on the resulting terms
POST /employees/_search
{"size": 0,"aggs": {"jobs": {"terms": {"field":"job"}}}
}

Result

{"took" : 7,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 20,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"jobs" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "programmer","doc_count" : 11},{"key" : "java","doc_count" : 7},{"key" : "javascript","doc_count" : 4},{"key" : "qa","doc_count" : 3},{"key" : "dba","doc_count" : 2},{"key" : "designer","doc_count" : 2},{"key" : "manager","doc_count" : 2},{"key" : "web","doc_count" : 2},{"key" : "dev","doc_count" : 1},{"key" : "product","doc_count" : 1}]}}
}

A terms aggregation on job.keyword and on job yields a different total number of buckets; a cardinality aggregation on job confirms this

POST /employees/_search
{"size": 0,"aggs": {"cardinate": {"cardinality": {"field": "job"}}}
}

Result

{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 20,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"cardinate" : {"value" : 10}}
}
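
The gap between 7 distinct values on job.keyword and 10 on job can be reproduced with a small Python sketch. It is only an approximation: lowercasing and splitting on whitespace stands in for the standard analyzer (an assumption that happens to be sufficient for these job titles), and keyword_terms and text_terms are hypothetical helper names.

```python
from collections import Counter

# Job titles of the 20 sample employees (order does not matter here).
jobs = (["Product Manager", "Dev Manager"] + ["Web Designer"] * 2 + ["QA"] * 3
        + ["Java Programmer"] * 7 + ["Javascript Programmer"] * 4 + ["DBA"] * 2)


def keyword_terms(values):
    """A keyword field indexes the whole string as a single term."""
    return Counter(values)


def text_terms(values):
    """Assumed stand-in for the standard analyzer: lowercase, split on whitespace.

    Each term is counted once per document, like doc_count in a terms aggregation.
    """
    counts = Counter()
    for value in values:
        counts.update(set(value.lower().split()))
    return counts


keyword_buckets = keyword_terms(jobs)
text_buckets = text_terms(jobs)
print(len(keyword_buckets), len(text_buckets))
print(text_buckets.most_common(3))
```

Because "programmer" appears in both "Java Programmer" and "Javascript Programmer", the analyzed field merges documents that the keyword field keeps apart, which is why the two fields produce different buckets.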

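Range & Histogram aggregations

  • Bucket documents by numeric range
  • In a Range aggregation, the key of each bucket can be customized

Range example: bucketing by salary range

# Bucket by salary range; the key can be user-defined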
POST employees/_search
{"size": 0,"aggs": {"salary_range": {"range": {"field":"salary","ranges":[{"to":10000},{"from":10000,"to":20000},{"key":">20000","from":20000}]}}}
}

Result

{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 20,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"salary_range" : {"buckets" : [{"key" : "*-10000.0","to" : 10000.0,"doc_count" : 1},{"key" : "10000.0-20000.0","from" : 10000.0,"to" : 20000.0,"doc_count" : 4},{"key" : ">20000","from" : 20000.0,"doc_count" : 15}]}}
}

Histogram example: bucketing salary by a fixed interval

# Salary from 0 to 100000, one bucket per 5000
POST employees/_search
{"size": 0,"aggs": {"salary_histrogram": {"histogram": {"field":"salary","interval":5000,"extended_bounds":{"min":0,"max":100000}}}}
}

Result

{"took" : 9,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 20,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"salary_histrogram" : {"buckets" : [{"key" : 0.0,"doc_count" : 0},{"key" : 5000.0,"doc_count" : 1},{"key" : 10000.0,"doc_count" : 0},{"key" : 15000.0,"doc_count" : 4},{"key" : 20000.0,"doc_count" : 6},{"key" : 25000.0,"doc_count" : 3},{"key" : 30000.0,"doc_count" : 3},{"key" : 35000.0,"doc_count" : 2},{"key" : 40000.0,"doc_count" : 0},{"key" : 45000.0,"doc_count" : 0},{"key" : 50000.0,"doc_count" : 1},{"key" : 55000.0,"doc_count" : 0},{"key" : 60000.0,"doc_count" : 0},{"key" : 65000.0,"doc_count" : 0},{"key" : 70000.0,"doc_count" : 0},{"key" : 75000.0,"doc_count" : 0},{"key" : 80000.0,"doc_count" : 0},{"key" : 85000.0,"doc_count" : 0},{"key" : 90000.0,"doc_count" : 0},{"key" : 95000.0,"doc_count" : 0},{"key" : 100000.0,"doc_count" : 0}]}}
}
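
The bucket keys in this response follow the rule key = floor(value / interval) * interval. The sketch below applies that rule to the sample salaries in plain Python. It is a simplified model: the histogram helper is hypothetical, it assumes an offset of 0, and it assumes every value lies inside the extended bounds.

```python
import math
from collections import Counter

# Salaries of the 20 sample employees, in _id order (copied from the bulk request).
salaries = [35000, 50000, 18000, 22000, 18000, 25000, 20000, 20000, 22000, 9000,
            38000, 32000, 30000, 25000, 28000, 16000, 16000, 20000, 30000, 20000]


def histogram(values, interval, min_bound, max_bound):
    """Hypothetical helper: fixed-interval buckets, with empty buckets filled in
    between min_bound and max_bound (the role extended_bounds plays here)."""
    counts = Counter(math.floor(v / interval) * interval for v in values)
    keys = range(min_bound, max_bound + interval, interval)
    return {float(key): counts.get(key, 0) for key in keys}


buckets = histogram(salaries, 5000, 0, 100000)
for key, doc_count in buckets.items():
    if doc_count:
        print(key, doc_count)
```

Only the non-empty buckets are printed; the full dictionary also contains the zero-count buckets up to 100000.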

top_hits use case: after bucketing, retrieve the top matching documents within each bucket

# With size set: details of the 3 oldest employees for each job type
POST /employees/_search
{"size": 0,"aggs": {"jobs": {"terms": {"field":"job.keyword"},"aggs":{"old_employee":{"top_hits":{"size":3,"sort":[{"age":{"order":"desc"}}]}}}}}
}

Result

{"took" : 13,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 20,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"jobs" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "Java Programmer","doc_count" : 7,"old_employee" : {"hits" : {"total" : {"value" : 7,"relation" : "eq"},"max_score" : null,"hits" : [{"_index" : "employees","_type" : "_doc","_id" : "11","_score" : null,"_source" : {"name" : "Jenny","age" : 36,"job" : "Java Programmer","gender" : "female","salary" : 38000},"sort" : [36]},{"_index" : "employees","_type" : "_doc","_id" : "15","_score" : null,"_source" : {"name" : "King","age" : 33,"job" : "Java Programmer","gender" : "male","salary" : 28000},"sort" : [33]},{"_index" : "employees","_type" : "_doc","_id" : "9","_score" : null,"_source" : {"name" : "Gregory","age" : 32,"job" : "Java Programmer","gender" : "male","salary" : 22000},"sort" : [32]}]}}},{"key" : "Javascript Programmer","doc_count" : 4,"old_employee" : {"hits" : {"total" : {"value" : 4,"relation" : "eq"},"max_score" : null,"hits" : [{"_index" : "employees","_type" : "_doc","_id" : "14","_score" : null,"_source" : {"name" : "Marshall","age" : 32,"job" : "Javascript Programmer","gender" : "male","salary" : 25000},"sort" : [32]},{"_index" : "employees","_type" : "_doc","_id" : "18","_score" : null,"_source" : {"name" : "Catherine","age" : 29,"job" : "Javascript Programmer","gender" : "female","salary" : 20000},"sort" : [29]},{"_index" : "employees","_type" : "_doc","_id" : "17","_score" : null,"_source" : {"name" : "Goodwin","age" : 25,"job" : "Javascript Programmer","gender" : "male","salary" : 16000},"sort" : [25]}]}}},{"key" : "QA","doc_count" : 3,"old_employee" : {"hits" : {"total" : {"value" : 3,"relation" : "eq"},"max_score" : null,"hits" : [{"_index" : "employees","_type" : "_doc","_id" : "6","_score" : null,"_source" : {"name" : "Lucy","age" : 31,"job" : 
"QA","gender" : "female","salary" : 25000},"sort" : [31]},{"_index" : "employees","_type" : "_doc","_id" : "7","_score" : null,"_source" : {"name" : "Byrd","age" : 27,"job" : "QA","gender" : "male","salary" : 20000},"sort" : [27]},{"_index" : "employees","_type" : "_doc","_id" : "5","_score" : null,"_source" : {"name" : "Rose","age" : 25,"job" : "QA","gender" : "female","salary" : 18000},"sort" : [25]}]}}},{"key" : "DBA","doc_count" : 2,"old_employee" : {"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : null,"hits" : [{"_index" : "employees","_type" : "_doc","_id" : "19","_score" : null,"_source" : {"name" : "Boone","age" : 30,"job" : "DBA","gender" : "male","salary" : 30000},"sort" : [30]},{"_index" : "employees","_type" : "_doc","_id" : "20","_score" : null,"_source" : {"name" : "Kathy","age" : 29,"job" : "DBA","gender" : "female","salary" : 20000},"sort" : [29]}]}}},{"key" : "Web Designer","doc_count" : 2,"old_employee" : {"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : null,"hits" : [{"_index" : "employees","_type" : "_doc","_id" : "4","_score" : null,"_source" : {"name" : "Rivera","age" : 26,"job" : "Web Designer","gender" : "female","salary" : 22000},"sort" : [26]},{"_index" : "employees","_type" : "_doc","_id" : "3","_score" : null,"_source" : {"name" : "Tran","age" : 25,"job" : "Web Designer","gender" : "male","salary" : 18000},"sort" : [25]}]}}},{"key" : "Dev Manager","doc_count" : 1,"old_employee" : {"hits" : {"total" : {"value" : 1,"relation" : "eq"},"max_score" : null,"hits" : [{"_index" : "employees","_type" : "_doc","_id" : "2","_score" : null,"_source" : {"name" : "Underwood","age" : 41,"job" : "Dev Manager","gender" : "male","salary" : 50000},"sort" : [41]}]}}},{"key" : "Product Manager","doc_count" : 1,"old_employee" : {"hits" : {"total" : {"value" : 1,"relation" : "eq"},"max_score" : null,"hits" : [{"_index" : "employees","_type" : "_doc","_id" : "1","_score" : null,"_source" : {"name" : "Emma","age" : 32,"job" : 
"Product Manager","gender" : "female","salary" : 35000},"sort" : [32]}]}}}]}}
}

Nested aggregation examples

# Nested aggregation 1: bucket by job type and compute salary statistics
POST employees/_search
{"size": 0,"aggs": {"Job_salary_stats": {"terms": {"field": "job.keyword"},"aggs": {"salary": {"stats": {"field": "salary"}}}}}
}

Result

{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 20,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"Job_salary_stats" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "Java Programmer","doc_count" : 7,"salary" : {"count" : 7,"min" : 9000.0,"max" : 38000.0,"avg" : 25571.428571428572,"sum" : 179000.0}},{"key" : "Javascript Programmer","doc_count" : 4,"salary" : {"count" : 4,"min" : 16000.0,"max" : 25000.0,"avg" : 19250.0,"sum" : 77000.0}},{"key" : "QA","doc_count" : 3,"salary" : {"count" : 3,"min" : 18000.0,"max" : 25000.0,"avg" : 21000.0,"sum" : 63000.0}},{"key" : "DBA","doc_count" : 2,"salary" : {"count" : 2,"min" : 20000.0,"max" : 30000.0,"avg" : 25000.0,"sum" : 50000.0}},{"key" : "Web Designer","doc_count" : 2,"salary" : {"count" : 2,"min" : 18000.0,"max" : 22000.0,"avg" : 20000.0,"sum" : 40000.0}},{"key" : "Dev Manager","doc_count" : 1,"salary" : {"count" : 1,"min" : 50000.0,"max" : 50000.0,"avg" : 50000.0,"sum" : 50000.0}},{"key" : "Product Manager","doc_count" : 1,"salary" : {"count" : 1,"min" : 35000.0,"max" : 35000.0,"avg" : 35000.0,"sum" : 35000.0}}]}}
}

# Multi-level nesting: bucket by job type, then by gender, and compute salary statistics
POST employees/_search
{"size": 0,"aggs": {"Job_gender_stats": {"terms": {"field": "job.keyword"},"aggs": {"gender_stats": {"terms": {"field": "gender"},"aggs": {"salary_stats": {"stats": {"field": "salary"}}}}}}}
}

Result

{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 20,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"Job_gender_stats" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "Java Programmer","doc_count" : 7,"gender_stats" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "male","doc_count" : 5,"salary_stats" : {"count" : 5,"min" : 9000.0,"max" : 32000.0,"avg" : 22200.0,"sum" : 111000.0}},{"key" : "female","doc_count" : 2,"salary_stats" : {"count" : 2,"min" : 30000.0,"max" : 38000.0,"avg" : 34000.0,"sum" : 68000.0}}]}},{"key" : "Javascript Programmer","doc_count" : 4,"gender_stats" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "male","doc_count" : 3,"salary_stats" : {"count" : 3,"min" : 16000.0,"max" : 25000.0,"avg" : 19000.0,"sum" : 57000.0}},{"key" : "female","doc_count" : 1,"salary_stats" : {"count" : 1,"min" : 20000.0,"max" : 20000.0,"avg" : 20000.0,"sum" : 20000.0}}]}},{"key" : "QA","doc_count" : 3,"gender_stats" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "female","doc_count" : 2,"salary_stats" : {"count" : 2,"min" : 18000.0,"max" : 25000.0,"avg" : 21500.0,"sum" : 43000.0}},{"key" : "male","doc_count" : 1,"salary_stats" : {"count" : 1,"min" : 20000.0,"max" : 20000.0,"avg" : 20000.0,"sum" : 20000.0}}]}},{"key" : "DBA","doc_count" : 2,"gender_stats" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "female","doc_count" : 1,"salary_stats" : {"count" : 1,"min" : 20000.0,"max" : 20000.0,"avg" : 20000.0,"sum" : 20000.0}},{"key" : "male","doc_count" : 1,"salary_stats" : {"count" : 1,"min" : 30000.0,"max" : 30000.0,"avg" : 30000.0,"sum" : 30000.0}}]}},{"key" : "Web Designer","doc_count" : 2,"gender_stats" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" 
: "female","doc_count" : 1,"salary_stats" : {"count" : 1,"min" : 22000.0,"max" : 22000.0,"avg" : 22000.0,"sum" : 22000.0}},{"key" : "male","doc_count" : 1,"salary_stats" : {"count" : 1,"min" : 18000.0,"max" : 18000.0,"avg" : 18000.0,"sum" : 18000.0}}]}},{"key" : "Dev Manager","doc_count" : 1,"gender_stats" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "male","doc_count" : 1,"salary_stats" : {"count" : 1,"min" : 50000.0,"max" : 50000.0,"avg" : 50000.0,"sum" : 50000.0}}]}},{"key" : "Product Manager","doc_count" : 1,"gender_stats" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "female","doc_count" : 1,"salary_stats" : {"count" : 1,"min" : 35000.0,"max" : 35000.0,"avg" : 35000.0,"sum" : 35000.0}}]}}]}}
}

4.4 Pipeline Aggregation

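A pipeline aggregation runs a further aggregation on the results of other aggregations.

The result of a pipeline aggregation is written into the original results. Depending on where it lands, there are two kinds:

  • Sibling: the result sits at the same level as the existing aggregation results
    • Max, Min, Avg & Sum Bucket
    • Stats, Extended Stats Bucket
    • Percentiles Bucket
  • Parent: the result is embedded inside the existing aggregation results
    • Derivative
    • Cumulative Sum
    • Moving Function (sliding-window functions such as a moving average)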

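min_bucket example

Among the job types with the most employees, find the one with the lowest average salary

# The job type with the lowest average salary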
POST employees/_search
{"size": 0,"aggs": {"jobs": {"terms": {"field": "job.keyword","size": 10},"aggs": {"avg_salary": {"avg": {"field": "salary"}}}},"min_salary_by_job":{   "min_bucket": {    "buckets_path": "jobs>avg_salary"  }}}
}

Result

{"took" : 2,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 20,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"jobs" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "Java Programmer","doc_count" : 7,"avg_salary" : {"value" : 25571.428571428572}},{"key" : "Javascript Programmer","doc_count" : 4,"avg_salary" : {"value" : 19250.0}},{"key" : "QA","doc_count" : 3,"avg_salary" : {"value" : 21000.0}},{"key" : "DBA","doc_count" : 2,"avg_salary" : {"value" : 25000.0}},{"key" : "Web Designer","doc_count" : 2,"avg_salary" : {"value" : 20000.0}},{"key" : "Dev Manager","doc_count" : 1,"avg_salary" : {"value" : 50000.0}},{"key" : "Product Manager","doc_count" : 1,"avg_salary" : {"value" : 35000.0}}]},"min_salary_by_job" : {"value" : 19250.0,"keys" : ["Javascript Programmer"]}}
}
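  • The min_salary_by_job result is a sibling of the jobs aggregation
  • min_bucket finds the minimum value among the preceding results
  • The path is specified with the buckets_path keyword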
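
The two-stage structure (first an avg per bucket, then a minimum across buckets) can be sketched in plain Python. This is an illustrative model rather than Elasticsearch code: salaries_by_job is regrouped from the sample data, and min_bucket here is a hypothetical function that only mimics the shape of the response.

```python
# Salaries grouped by job, regrouped from the sample bulk request.
salaries_by_job = {
    "Java Programmer": [20000, 22000, 9000, 38000, 32000, 30000, 28000],
    "Javascript Programmer": [25000, 16000, 16000, 20000],
    "QA": [18000, 25000, 20000],
    "DBA": [30000, 20000],
    "Web Designer": [18000, 22000],
    "Dev Manager": [50000],
    "Product Manager": [35000],
}

# Stage 1: the "jobs > avg_salary" part, one metric value per bucket.
avg_salary = {job: sum(s) / len(s) for job, s in salaries_by_job.items()}


def min_bucket(metric_by_bucket):
    """Stage 2 (hypothetical helper): the smallest metric and the bucket keys holding it."""
    lowest = min(metric_by_bucket.values())
    keys = [key for key, value in metric_by_bucket.items() if value == lowest]
    return {"value": lowest, "keys": keys}


min_salary_by_job = min_bucket(avg_salary)
print(min_salary_by_job)
```

Stage 2 never looks at individual documents; it only consumes the per-bucket values, which is the role buckets_path plays in the request.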

Stats example

# Statistics over the average salaries
POST employees/_search
{"size": 0,"aggs": {"jobs": {"terms": {"field": "job.keyword","size": 10},"aggs": {"avg_salary": {"avg": {"field": "salary"}}}},"stats_salary_by_job":{"stats_bucket": {"buckets_path": "jobs>avg_salary"}}}
}

Result

{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 20,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"jobs" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "Java Programmer","doc_count" : 7,"avg_salary" : {"value" : 25571.428571428572}},{"key" : "Javascript Programmer","doc_count" : 4,"avg_salary" : {"value" : 19250.0}},{"key" : "QA","doc_count" : 3,"avg_salary" : {"value" : 21000.0}},{"key" : "DBA","doc_count" : 2,"avg_salary" : {"value" : 25000.0}},{"key" : "Web Designer","doc_count" : 2,"avg_salary" : {"value" : 20000.0}},{"key" : "Dev Manager","doc_count" : 1,"avg_salary" : {"value" : 50000.0}},{"key" : "Product Manager","doc_count" : 1,"avg_salary" : {"value" : 35000.0}}]},"stats_salary_by_job" : {"count" : 7,"min" : 19250.0,"max" : 50000.0,"avg" : 27974.48979591837,"sum" : 195821.42857142858}}
}

percentiles example

# Percentiles of the average salaries
POST employees/_search
{"size": 0,"aggs": {"jobs": {"terms": {"field": "job.keyword","size": 10},"aggs": {"avg_salary": {"avg": {"field": "salary"}}}},"percentiles_salary_by_job":{"percentiles_bucket": {"buckets_path": "jobs>avg_salary"}}}
}

Result

{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 20,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"jobs" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "Java Programmer","doc_count" : 7,"avg_salary" : {"value" : 25571.428571428572}},{"key" : "Javascript Programmer","doc_count" : 4,"avg_salary" : {"value" : 19250.0}},{"key" : "QA","doc_count" : 3,"avg_salary" : {"value" : 21000.0}},{"key" : "DBA","doc_count" : 2,"avg_salary" : {"value" : 25000.0}},{"key" : "Web Designer","doc_count" : 2,"avg_salary" : {"value" : 20000.0}},{"key" : "Dev Manager","doc_count" : 1,"avg_salary" : {"value" : 50000.0}},{"key" : "Product Manager","doc_count" : 1,"avg_salary" : {"value" : 35000.0}}]},"percentiles_salary_by_job" : {"values" : {"1.0" : 19250.0,"5.0" : 19250.0,"25.0" : 21000.0,"50.0" : 25000.0,"75.0" : 35000.0,"95.0" : 50000.0,"99.0" : 50000.0}}}
}
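
Every percentile in this response is one of the seven per-bucket averages, which suggests a nearest-rank selection rather than interpolation. The sketch below uses the rule index = round_half_up(p / 100 * (n - 1)) on the sorted values. That rule is an assumption chosen because it reproduces the output above; it is not confirmed by the article, and percentiles_bucket here is a hypothetical Python function.

```python
import math

# Average salary per job bucket, taken from the response above.
avg_by_job = [25571.428571428572, 19250.0, 21000.0, 25000.0, 20000.0, 50000.0, 35000.0]


def percentiles_bucket(values, percents=(1, 5, 25, 50, 75, 95, 99)):
    """Nearest-rank percentiles over bucket values (assumed rule, see lead-in)."""
    data = sorted(values)
    result = {}
    for p in percents:
        # Round half up; Python's built-in round() rounds halves to the even
        # neighbour and would pick a different element for p=75 here.
        index = math.floor(p / 100 * (len(data) - 1) + 0.5)
        result[float(p)] = data[index]
    return result


percentiles_salary_by_job = percentiles_bucket(avg_by_job)
print(percentiles_salary_by_job)
```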

Cumulative_sum example

# cumulative_sum: running total
POST employees/_search
{"size": 0,"aggs": {"age": {"histogram": {"field": "age","min_doc_count": 0,"interval": 1},"aggs": {"avg_salary": {"avg": {"field": "salary"}},"cumulative_salary":{"cumulative_sum": {"buckets_path": "avg_salary"}}}}}
}

Result

{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 20,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"age" : {"buckets" : [{"key" : 20.0,"doc_count" : 1,"avg_salary" : {"value" : 9000.0},"cumulative_salary" : {"value" : 9000.0}},{"key" : 21.0,"doc_count" : 1,"avg_salary" : {"value" : 16000.0},"cumulative_salary" : {"value" : 25000.0}},{"key" : 22.0,"doc_count" : 0,"avg_salary" : {"value" : null},"cumulative_salary" : {"value" : 25000.0}},{"key" : 23.0,"doc_count" : 0,"avg_salary" : {"value" : null},"cumulative_salary" : {"value" : 25000.0}},{"key" : 24.0,"doc_count" : 0,"avg_salary" : {"value" : null},"cumulative_salary" : {"value" : 25000.0}},{"key" : 25.0,"doc_count" : 3,"avg_salary" : {"value" : 17333.333333333332},"cumulative_salary" : {"value" : 42333.33333333333}},{"key" : 26.0,"doc_count" : 1,"avg_salary" : {"value" : 22000.0},"cumulative_salary" : {"value" : 64333.33333333333}},{"key" : 27.0,"doc_count" : 2,"avg_salary" : {"value" : 20000.0},"cumulative_salary" : {"value" : 84333.33333333333}},{"key" : 28.0,"doc_count" : 0,"avg_salary" : {"value" : null},"cumulative_salary" : {"value" : 84333.33333333333}},{"key" : 29.0,"doc_count" : 2,"avg_salary" : {"value" : 20000.0},"cumulative_salary" : {"value" : 104333.33333333333}},{"key" : 30.0,"doc_count" : 2,"avg_salary" : {"value" : 30000.0},"cumulative_salary" : {"value" : 134333.3333333333}},{"key" : 31.0,"doc_count" : 2,"avg_salary" : {"value" : 28500.0},"cumulative_salary" : {"value" : 162833.3333333333}},{"key" : 32.0,"doc_count" : 3,"avg_salary" : {"value" : 27333.333333333332},"cumulative_salary" : {"value" : 190166.66666666666}},{"key" : 33.0,"doc_count" : 1,"avg_salary" : {"value" : 28000.0},"cumulative_salary" : {"value" : 218166.66666666666}},{"key" : 34.0,"doc_count" : 0,"avg_salary" : {"value" : null},"cumulative_salary" : {"value" : 218166.66666666666}},{"key" : 35.0,"doc_count" : 0,"avg_salary" 
: {"value" : null},"cumulative_salary" : {"value" : 218166.66666666666}},{"key" : 36.0,"doc_count" : 1,"avg_salary" : {"value" : 38000.0},"cumulative_salary" : {"value" : 256166.66666666666}},{"key" : 37.0,"doc_count" : 0,"avg_salary" : {"value" : null},"cumulative_salary" : {"value" : 256166.66666666666}},{"key" : 38.0,"doc_count" : 0,"avg_salary" : {"value" : null},"cumulative_salary" : {"value" : 256166.66666666666}},{"key" : 39.0,"doc_count" : 0,"avg_salary" : {"value" : null},"cumulative_salary" : {"value" : 256166.66666666666}},{"key" : 40.0,"doc_count" : 0,"avg_salary" : {"value" : null},"cumulative_salary" : {"value" : 256166.66666666666}},{"key" : 41.0,"doc_count" : 1,"avg_salary" : {"value" : 50000.0},"cumulative_salary" : {"value" : 306166.6666666666}}]}}
}
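
In the response, buckets with no documents have a null avg_salary and simply carry the previous total forward. A minimal Python sketch of that behaviour, using the first eight age buckets (ages 20 to 27) from the response; cumulative_sum is a hypothetical helper name, and None stands for a null metric.

```python
# avg_salary for the age buckets 20..27; None marks an empty bucket (null in the response).
avg_salary_by_age = [9000.0, 16000.0, None, None, None, 52000 / 3, 22000.0, 20000.0]


def cumulative_sum(bucket_values):
    """Running total over ordered buckets; empty buckets contribute nothing."""
    running = 0.0
    totals = []
    for value in bucket_values:
        if value is not None:
            running += value
        totals.append(running)
    return totals


cumulative_salary = cumulative_sum(avg_salary_by_age)
print(cumulative_salary)
```

This is also why cumulative_sum is a parent pipeline aggregation: it depends on the order of the buckets of its enclosing histogram.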

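4.5 Aggregation Scope

By default, an ES aggregation runs on the result set of the query. ES also supports the following ways to change the scope of an aggregation:

  • Filter
  • Post Filter
  • Global
# Query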
POST employees/_search
{"size": 0,"query": {"range": {"age": {"gte": 30}}},"aggs": {"jobs": {"terms": {"field":"job.keyword"}}}
}

Result

{"took" : 0,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 10,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"jobs" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "Java Programmer","doc_count" : 5},{"key" : "DBA","doc_count" : 1},{"key" : "Dev Manager","doc_count" : 1},{"key" : "Javascript Programmer","doc_count" : 1},{"key" : "Product Manager","doc_count" : 1},{"key" : "QA","doc_count" : 1}]}}
}

#Filter
POST employees/_search
{"size": 0,"aggs": {"older_person": {"filter":{"range":{"age":{"from":35}}},"aggs":{"jobs":{"terms": {"field":"job.keyword"}}}},"all_jobs": {"terms": {"field":"job.keyword"}}}
}

Result

{"took" : 0,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 20,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"older_person" : {"doc_count" : 2,"jobs" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "Dev Manager","doc_count" : 1},{"key" : "Java Programmer","doc_count" : 1}]}},"all_jobs" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "Java Programmer","doc_count" : 7},{"key" : "Javascript Programmer","doc_count" : 4},{"key" : "QA","doc_count" : 3},{"key" : "DBA","doc_count" : 2},{"key" : "Web Designer","doc_count" : 2},{"key" : "Dev Manager","doc_count" : 1},{"key" : "Product Manager","doc_count" : 1}]}}
}

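# post_filter: a single request returns the buckets for all job types, while the hits are narrowed down after aggregation to those matching the condition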
POST employees/_search
{"aggs": {"jobs": {"terms": {"field": "job.keyword"}}},"post_filter": {"match": {"job.keyword": "Dev Manager"}}
}

Result

{"took" : 0,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 1,"relation" : "eq"},"max_score" : 1.0,"hits" : [{"_index" : "employees","_type" : "_doc","_id" : "2","_score" : 1.0,"_source" : {"name" : "Underwood","age" : 41,"job" : "Dev Manager","gender" : "male","salary" : 50000}}]},"aggregations" : {"jobs" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "Java Programmer","doc_count" : 7},{"key" : "Javascript Programmer","doc_count" : 4},{"key" : "QA","doc_count" : 3},{"key" : "DBA","doc_count" : 2},{"key" : "Web Designer","doc_count" : 2},{"key" : "Dev Manager","doc_count" : 1},{"key" : "Product Manager","doc_count" : 1}]}}
}

#global 
POST employees/_search
{"size": 0,"query": {"range": {"age": {"gte": 40}}},"aggs": {"jobs": {"terms": {"field":"job.keyword"}},"all":{"global":{},"aggs":{"salary_avg":{"avg":{"field":"salary"}}}}}
}

Result

{"took" : 0,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 1,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"all" : {"doc_count" : 20,"salary_avg" : {"value" : 24700.0}},"jobs" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "Dev Manager","doc_count" : 1}]}}
}

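4.6 Sorting

Use order to sort by count and key:

  • By default, buckets are sorted by count in descending order
  • Setting size limits how many buckets are returned
# Sort with order
# by count and key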
POST employees/_search
{"size": 0,"query": {"range": {"age": {"gte": 20}}},"aggs": {"jobs": {"terms": {"field":"job.keyword","order":[{"_count":"asc"},{"_key":"desc"}]}}}
}

Result

{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 20,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"jobs" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "Product Manager","doc_count" : 1},{"key" : "Dev Manager","doc_count" : 1},{"key" : "Web Designer","doc_count" : 2},{"key" : "DBA","doc_count" : 2},{"key" : "QA","doc_count" : 3},{"key" : "Javascript Programmer","doc_count" : 4},{"key" : "Java Programmer","doc_count" : 7}]}}
}
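
The two-level order (count ascending, then key descending as the tie-breaker) can be mimicked in Python with two stable sorts. This is only a sketch of the ordering logic: job_counts is copied from the response above, and order_buckets is a hypothetical helper.

```python
# doc_count per job bucket, copied from the response above.
job_counts = {
    "Java Programmer": 7, "Javascript Programmer": 4, "QA": 3, "DBA": 2,
    "Web Designer": 2, "Dev Manager": 1, "Product Manager": 1,
}


def order_buckets(counts):
    """Sort by _count ascending, breaking ties by _key descending."""
    # Sort by the secondary criterion first; Python's sort is stable, so the
    # second pass keeps the key-descending order among equal counts.
    by_key_desc = sorted(counts.items(), key=lambda item: item[0], reverse=True)
    return sorted(by_key_desc, key=lambda item: item[1])


ordered_keys = [key for key, _ in order_buckets(job_counts)]
print(ordered_keys)
```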

# Sort with order
# by a sub-aggregation metric (avg_salary)
POST employees/_search
{"size": 0,"aggs": {"jobs": {"terms": {"field":"job.keyword","order":[  {"avg_salary":"desc"}]},"aggs": {"avg_salary": {"avg": {"field":"salary"}}}}}
}

Result

{"took" : 2,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 20,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"jobs" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "Dev Manager","doc_count" : 1,"avg_salary" : {"value" : 50000.0}},{"key" : "Product Manager","doc_count" : 1,"avg_salary" : {"value" : 35000.0}},{"key" : "Java Programmer","doc_count" : 7,"avg_salary" : {"value" : 25571.428571428572}},{"key" : "DBA","doc_count" : 2,"avg_salary" : {"value" : 25000.0}},{"key" : "QA","doc_count" : 3,"avg_salary" : {"value" : 21000.0}},{"key" : "Web Designer","doc_count" : 2,"avg_salary" : {"value" : 20000.0}},{"key" : "Javascript Programmer","doc_count" : 4,"avg_salary" : {"value" : 19250.0}}]}}
}

# Sort with order
# by one value of a multi-value sub-aggregation (stats_salary.min)
POST employees/_search
{"size": 0,"aggs": {"jobs": {"terms": {"field":"job.keyword","order":[  {"stats_salary.min":"desc"}]},"aggs": {"stats_salary": {"stats": {"field":"salary"}}}}}
}

Result

{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 20,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"jobs" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "Dev Manager","doc_count" : 1,"stats_salary" : {"count" : 1,"min" : 50000.0,"max" : 50000.0,"avg" : 50000.0,"sum" : 50000.0}},{"key" : "Product Manager","doc_count" : 1,"stats_salary" : {"count" : 1,"min" : 35000.0,"max" : 35000.0,"avg" : 35000.0,"sum" : 35000.0}},{"key" : "DBA","doc_count" : 2,"stats_salary" : {"count" : 2,"min" : 20000.0,"max" : 30000.0,"avg" : 25000.0,"sum" : 50000.0}},{"key" : "QA","doc_count" : 3,"stats_salary" : {"count" : 3,"min" : 18000.0,"max" : 25000.0,"avg" : 21000.0,"sum" : 63000.0}},{"key" : "Web Designer","doc_count" : 2,"stats_salary" : {"count" : 2,"min" : 18000.0,"max" : 22000.0,"avg" : 20000.0,"sum" : 40000.0}},{"key" : "Javascript Programmer","doc_count" : 4,"stats_salary" : {"count" : 4,"min" : 16000.0,"max" : 25000.0,"avg" : 19250.0,"sum" : 77000.0}},{"key" : "Java Programmer","doc_count" : 7,"stats_salary" : {"count" : 7,"min" : 9000.0,"max" : 38000.0,"avg" : 25571.428571428572,"sum" : 179000.0}}]}}
}

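4.7 Why ES Aggregations Can Be Inaccurate

When aggregating over massive amounts of data, Elasticsearch gives up some accuracy in order to meet real-time requirements.

Execution flow of a terms aggregation: each shard computes its own top X buckets, and the coordinating node merges the per-shard results into the final answer.

Cause of the inaccuracy: the data is spread across multiple shards and each shard only contributes its top X, so a term that ranks high overall but falls outside some shard's top X is undercounted or missed. ES could aggregate over all terms instead of taking the top X per shard, but that would inevitably cause serious performance problems.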
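
The effect is easy to reproduce with a toy model. In the Python sketch below, the three shards and their term counts are invented for illustration, and exact_top and per_shard_top are hypothetical helpers; the point is only to compare a full merge with a merge of per-shard top X lists.

```python
from collections import Counter

# Hypothetical per-shard term counts (made up for this illustration).
shards = [
    {"A": 10, "B": 7, "C": 6},
    {"A": 10, "B": 7, "C": 6},
    {"A": 10, "D": 9, "C": 8},
]


def exact_top(shards, size):
    """Merge every term from every shard, then take the global top `size`."""
    total = Counter()
    for shard in shards:
        total.update(shard)
    return total.most_common(size)


def per_shard_top(shards, size):
    """Each shard only reports its own top `size` terms before merging."""
    merged = Counter()
    for shard in shards:
        merged.update(dict(Counter(shard).most_common(size)))
    return merged.most_common(size)


print(exact_top(shards, 2))
print(per_shard_top(shards, 2))
```

Term C never makes any shard's top 2, so it disappears from the merged result even though it is the second most frequent term overall.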

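Question: how can aggregation accuracy be improved?

Option 1: set the number of primary shards to 1

Note that since version 7.x the default is already 1.

Applicable to: small clusters with small data volumes.

Option 2: increase shard_size

Set shard_size to a fairly large value; the official recommendation is size*1.5+10. The larger shard_size is, the closer the result gets to the exact aggregation. In addition, the show_term_doc_count_error parameter reports the worst-case error, which helps when choosing shard_size.

  • size: the number of buckets returned by the aggregation. If the client wants the top three, size is 3.
  • shard_size: the number of buckets each shard computes and returns. As a rule, shard_size should be greater than or equal to size.

Applicable to: clusters with large data volumes and many shards.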

Test: using the Kibana sample data

DELETE my_flights

PUT my_flights
{
  "settings": {"number_of_shards": 20},
  "mappings": {
    "properties": {
      "AvgTicketPrice": {"type": "float"},
      "Cancelled": {"type": "boolean"},
      "Carrier": {"type": "keyword"},
      "Dest": {"type": "keyword"},
      "DestAirportID": {"type": "keyword"},
      "DestCityName": {"type": "keyword"},
      "DestCountry": {"type": "keyword"},
      "DestLocation": {"type": "geo_point"},
      "DestRegion": {"type": "keyword"},
      "DestWeather": {"type": "keyword"},
      "DistanceKilometers": {"type": "float"},
      "DistanceMiles": {"type": "float"},
      "FlightDelay": {"type": "boolean"},
      "FlightDelayMin": {"type": "integer"},
      "FlightDelayType": {"type": "keyword"},
      "FlightNum": {"type": "keyword"},
      "FlightTimeHour": {"type": "keyword"},
      "FlightTimeMin": {"type": "float"},
      "Origin": {"type": "keyword"},
      "OriginAirportID": {"type": "keyword"},
      "OriginCityName": {"type": "keyword"},
      "OriginCountry": {"type": "keyword"},
      "OriginLocation": {"type": "geo_point"},
      "OriginRegion": {"type": "keyword"},
      "OriginWeather": {"type": "keyword"},
      "dayOfWeek": {"type": "integer"},
      "timestamp": {"type": "date"}
    }
  }
}

POST _reindex
{"source": {"index": "kibana_sample_data_flights"},"dest": {"index": "my_flights"}}

GET my_flights/_count

GET kibana_sample_data_flights/_search
{"size": 0,"aggs": {"weather": {"terms": {"field":"OriginWeather","size":5,"show_term_doc_count_error":true}}}}

GET my_flights/_search
{"size": 0,"aggs": {"weather": {"terms": {"field":"OriginWeather","size":5,"shard_size":10,"show_term_doc_count_error":true}}}}

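A terms aggregation response contains two special values:

  • doc_count_error_upper_bound: the maximum possible number of documents in term buckets that were left out of the result.
  • sum_other_doc_count: the total number of documents in all terms other than those in the returned buckets (the overall total minus the returned total).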
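To make the two values concrete, here is a simplified model of how a coordinating node could derive them. It assumes the error bound is the sum, over shards that truncated their list, of the doc count of the last term each shard returned; the shard data and the `merge_terms` helper are hypothetical and do not mirror Elasticsearch's internal code.

```python
from collections import Counter

# Hypothetical per-shard term counts (made-up data).
shards = [
    Counter({"A": 10, "B": 9, "D": 8}),
    Counter({"A": 10, "C": 8, "D": 7}),
    Counter({"A": 10, "E": 7, "D": 6}),
]


def merge_terms(shards, size, shard_size):
    merged = Counter()
    error_upper_bound = 0
    total_docs = 0
    for shard in shards:
        total_docs += sum(shard.values())
        top = shard.most_common(shard_size)
        merged.update(dict(top))
        # A term this shard left out can hold at most as many docs as the
        # smallest term it did return, and only if it truncated its list.
        if len(shard) > shard_size:
            error_upper_bound += top[-1][1]
    buckets = merged.most_common(size)
    returned_docs = sum(count for _, count in buckets)
    return {
        "buckets": buckets,
        "doc_count_error_upper_bound": error_upper_bound,
        "sum_other_doc_count": total_docs - returned_docs,
    }


print(merge_terms(shards, size=2, shard_size=2))
print(merge_terms(shards, size=2, shard_size=3))
```

With shard_size = 2 the bound is 9 + 8 + 7 = 24, which covers the 21 documents of the missed term D. With shard_size = 3 no shard truncates, so the bound drops to 0.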

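Option 3: set size to cover the full set of terms

Set size to 2^31 - 1 (2147483647, the largest value an integer parameter can hold) so that every term is returned.

Reason: in 1.x, size = 0 meant "return everything". Later versions dropped the special 0 value, so a maximum value (larger than the full number of terms in the business data) is used instead.

The drawback of returning everything: if the shards hold a very large amount of data, sorting consumes a great deal of CPU and the transfer may saturate the network.

Applicable scenario: workloads that demand extremely accurate aggregations. Because of the performance cost, this option is not recommended.

Option 4: use ClickHouse or Spark for exact aggregation

Applicable scenario: very large data volumes that need both high aggregation accuracy and fast responses.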

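4.8 Elasticsearch Aggregation Performance Tuning

Enable eager global ordinals to speed up high-cardinality aggregations

Applicable scenario: high-cardinality aggregations. High cardinality here means that a large proportion of a field's values are unique.

Global ordinals are a data structure used in the following cases:

  • Bucket aggregations on keyword, ip and similar fields, including terms and composite aggregations.
  • Bucket aggregations on text fields (only when fielddata is enabled).
  • has_child queries and parent aggregations on join (parent/child) fields.

Global ordinals represent each string value of a field with a number, and then assign a bucket to each number.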
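The idea of replacing strings with numbers can be sketched as follows. This is a conceptual illustration only: `build_ordinals` and `count_by_ordinal` are hypothetical names, and the real structure is built per shard on top of segment-level ordinals rather than from a Python list.

```python
def build_ordinals(values):
    """Give each distinct string a dense integer (its ordinal) in sorted order,
    and store every document's value as that integer."""
    terms = sorted(set(values))
    ordinal_of = {term: i for i, term in enumerate(terms)}
    doc_ordinals = [ordinal_of[v] for v in values]
    return terms, doc_ordinals


def count_by_ordinal(doc_ordinals, num_terms):
    """Bucket documents by ordinal: one array slot per distinct term."""
    counts = [0] * num_terms
    for o in doc_ordinals:
        counts[o] += 1  # integer indexing, no string comparison per document
    return counts


tags = ["java", "go", "java", "rust", "go", "java"]
terms, doc_ordinals = build_ordinals(tags)
counts = count_by_ordinal(doc_ordinals, len(terms))
result = {terms[o]: c for o, c in enumerate(counts)}
print(terms, doc_ordinals, result)
```

Building the mapping is the expensive step, which is why the choice of when to build it (at search time or at refresh time) matters for high-cardinality fields.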

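In essence, when eager_global_ordinals is enabled, global ordinals are built whenever a shard is refreshed. This moves the cost of building them from search time to index (write) time.

Enable eager_global_ordinals when creating the index:

PUT /my-index
{"mappings": {"properties": {"tags": {"type": "keyword","eager_global_ordinals": true}}}}

Note: enabling eager_global_ordinals slows down writes, because new global ordinals are built on every refresh. To keep the overhead of frequent rebuilds low, increase the refresh interval, refresh_interval.

The refresh interval can be changed dynamically:

PUT my-index/_settings
{"index": {"refresh_interval": "30s"}}

In essence, this technique trades space for time.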

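Pre-sort the index at write time

  • Index sorting pre-sorts the index when documents are written instead of sorting at query time, which improves the performance of range queries and sort operations.
  • When creating a new index in Elasticsearch, you can configure how the segments inside each shard are sorted.
  • This feature is available in Elasticsearch 6.x and later.

PUT /my_index
{"settings": {"index":{"sort.field": "create_time","sort.order": "desc"}},"mappings": {"properties": {"create_time":{"type": "date"}}}
}

Note: pre-sorting raises the cost of writes. In some workloads, enabling index sorting reduces write performance by roughly 40%-50%. If write throughput matters most in your scenario, index sorting is not a good choice.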

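Use the node query cache

The node query cache stores the results of filter operations. It is effective when the same filter runs many times, but changing even a single value in the filter means a new filter result has to be computed.

For example, because the value of "now" changes constantly, a query that uses "now" in a filter context cannot be cached.

So how can the cache be used? Round now to the nearest minute, hour, and so on with date math. This makes such requests more cacheable, so the filter result can be reused.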

PUT /my_index/_doc/1
{"create_time":"2022-05-11T16:30:55.328Z"
}

# The following request cannot use the cache
GET /my_index/_search
{"query":{"constant_score": {"filter": {"range": {"create_time": {"gte": "now-1h","lte": "now"}}}}}
}

# The following request can use the node query cache
GET /my_index/_search
{"query":{"constant_score": {"filter": {"range": {"create_time": {"gte": "now-1h/m","lte": "now/m"}}}}}
}

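The "now-1h/m" in the example above is date math syntax.

If the current time now is 16:31:29, the range query matches create_time values between 15:31:00 and 16:31:59. Likewise, if the query part of an aggregation request filters on time, or the aggs part aggregates on time, use date math rounding in both so the results can be cached.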
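The rounding can be illustrated outside Elasticsearch. The sketch below assumes that "/m" rounds a gte bound down to the start of the minute and an lte bound up to the last millisecond of the minute; the helper names are hypothetical.

```python
from datetime import datetime, timedelta


def floor_to_minute(t: datetime) -> datetime:
    """Start of the minute that contains t."""
    return t.replace(second=0, microsecond=0)


def last_ms_of_minute(t: datetime) -> datetime:
    """Last millisecond of the minute that contains t."""
    return floor_to_minute(t) + timedelta(minutes=1) - timedelta(milliseconds=1)


def rounded_range(now: datetime):
    """Bounds for {"gte": "now-1h/m", "lte": "now/m"} under the assumption above."""
    gte = floor_to_minute(now - timedelta(hours=1))
    lte = last_ms_of_minute(now)
    return gte, lte


first = rounded_range(datetime(2022, 5, 11, 16, 31, 29))
second = rounded_range(datetime(2022, 5, 11, 16, 31, 45))
print(first[0].time(), first[1].time())
print("same filter within the minute:", first == second)
```

Two requests issued in the same minute resolve to identical bounds, so the second one can reuse the cached filter result. Without rounding, every request would produce a different range.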

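Use the shard request cache

When an aggregation request sets size to 0, its result is cached in the shard request cache. size = 0 means: return only the aggregation results, not the matching documents.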

GET /es_db/_search
{"size": 0,"aggs": {"remark_agg": {"terms": {"field": "remark.keyword"}}}
}

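Split aggregations to run them in parallel

When one Elasticsearch request contains several aggregations, they do not run in parallel by default. Giving each aggregation its own query and sending them together with msearch can improve performance noticeably. So, provided CPU is not the bottleneck, you can shorten response time by splitting multiple aggregations into separate queries and running them in parallel with msearch.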

# Regular request with multiple aggregations
GET /employees/_search
{"size": 0,"aggs": {"job_agg": {"terms": {"field": "job.keyword"}},"max_salary":{"max": {"field": "salary"}}}
}

# The same aggregations split into separate searches with msearch
GET _msearch
{"index":"employees"}
{"size":0,"aggs":{"job_agg":{"terms":{"field": "job.keyword"}}}}
{"index":"employees"}
{"size":0,"aggs":{"max_salary":{"max":{"field": "salary"}}}}
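When the split is done from application code, the msearch body has to be assembled as newline-delimited JSON: one header line and one body line per search, with a trailing newline at the end. The helper below is a minimal sketch of that assembly; `build_msearch_body` is a hypothetical name, and sending the body (for example with an HTTP client and the application/x-ndjson content type) is left out.

```python
import json


def build_msearch_body(index, aggs):
    """Turn a dict of named aggregations into an _msearch body,
    one search per aggregation."""
    lines = []
    for name, agg in aggs.items():
        lines.append(json.dumps({"index": index}))                    # header line
        lines.append(json.dumps({"size": 0, "aggs": {name: agg}}))    # search body line
    return "\n".join(lines) + "\n"  # the body must end with a newline


aggs = {
    "job_agg": {"terms": {"field": "job.keyword"}},
    "max_salary": {"max": {"field": "salary"}},
}
body = build_msearch_body("employees", aggs)
print(body)
```

Each aggregation ends up in its own search, so the two searches in this example can be executed independently of each other.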
