当前位置: 首页 > news >正文

Elasticsearch:通过摄取管道加上嵌套向量对大型文档进行分块轻松地实现段落搜索

作者:VECTOR SEARCH

向量搜索是一种基于含义而不是精确或不精确的 token 匹配技术来搜索数据的强大方法。 然而,强大的向量搜索的文本嵌入模型只能按几个句子的顺序处理短文本段落,而不是可以处理任意大量文本的基于 BM25 的技术。 现在,Elasticsearch 可以将大型文档与向量搜索无缝结合。

简单地说,它是如何在发挥作用的呢?

Elasticsearch 功能(例如摄取管道、脚本处理器的灵活性以及对使用密集向量的嵌套文档的新支持)的组合允许以一种简单的方式在摄取时将大文档分成足够小的段落,然后由文本嵌入模型处理这些段落以生成表示大型文档的全部含义所需的所有向量。

像平常一样摄取文档数据,并向摄取管道添加一个脚本处理器,将大型文本数据分解为句子数组或其他类型的块,然后添加一个 for_each 处理器,对每个块运行推理处理器。 索引的映射被定义为使得块数组被设置为嵌套对象,其中以密集向量映射作为子对象,然后该子对象将正确索引每个向量并使它们可搜索。

让我们详细了解一下:

加载文本嵌入模型

你需要的第一件事是一个模型,用于从块中创建文本嵌入,你可以使用任何你想要的东西,但此示例将在 all-distilroberta-v1 模型上端到端运行。 创建 Elastic Cloud 集群或准备另一个 Elasticsearch 集群后,我们可以使用 eland 库上传文本嵌入模型。

MODEL_ID = "sentence-transformers/all-MiniLM-L6-v2"
ELASTIC_PASSWORD = "YOURPASSWORD"
CLOUD_ID = "YOURCLOUDID"eland_import_hub_model \--cloud-id $CLOUD_ID \--es-username elastic \--es-password $ELASTIC_PASSWORD \--hub-model-id $MODEL_ID \--task-type text_embedding \--start

映射示例

下一步是准备映射以处理将在摄取管道期间创建的句子数组和向量对象。 对于这个特定的文本嵌入模型,维度为 384,dot_product 相似度将用于最近邻计算:

PUT chunker
{"mappings": {"dynamic": "true","properties": {"passages": {"type": "nested","properties": {"vector": {"properties": {"predicted_value": {"type": "dense_vector","index": true,"dims": 384,"similarity": "dot_product"}}}}}}}
}

摄取管道示例

最后的准备步骤是定义一个摄取管道,将 body_content 字段分解为存储在 passages 字段中的文本块。 该管道有两个处理器,第一个脚本处理器将 body_content 字段分解为通过正则表达式存储在 passages 字段中的句子数组。 要进一步研究,请阅读正则表达式的高级功能,例如负向后查找和正向后查找,以了解它如何尝试在句子边界上正确分割,而不是在 Mr. 或 Mrs. 或 Ms. 上分割,并保留句子中的标点符号。 只要总字符串长度低于传递给脚本的参数,它还会尝试将句子块连接在一起。 接下来每个处理器通过推理处理器在每个句子上运行文本嵌入模型:

PUT _ingest/pipeline/chunker
{"processors": [{"script": {"description": "Chunk body_content into sentences by looking for . followed by a space","lang": "painless","source": """String[] envSplit = /((?<!M(r|s|rs)\.)(?<=\.) |(?<=\!) |(?<=\?) )/.split(ctx['body_content']);ctx['passages'] = new ArrayList();int i = 0;boolean remaining = true;if (envSplit.length == 0) {return} else if (envSplit.length == 1) {Map passage = ['text': envSplit[0]];ctx['passages'].add(passage)} else {while (remaining) {Map passage = ['text': envSplit[i++]];while (i < envSplit.length && passage.text.length() + envSplit[i].length() < params.model_limit) {passage.text = passage.text + ' ' + envSplit[i++]}if (i == envSplit.length) {remaining = false}ctx['passages'].add(passage)}}""","params": {"model_limit": 400}}},{"foreach": {"field": "passages","processor": {"inference": {"field_map": {"_ingest._value.text": "text_field"},"model_id": "sentence-transformers__all-minilm-l6-v2","target_field": "_ingest._value.vector","on_failure": [{"append": {"field": "_source._ingest.inference_errors","value": [{"message": "Processor 'inference' in pipeline 'ml-inference-title-vector' failed with message '{{ _ingest.on_failure_message }}'","pipeline": "ml-inference-title-vector","timestamp": "{{{ _ingest.timestamp }}}"}]}}]}}}}]
}

添加一些文档

现在我们可以在 body_content 中添加包含大量文本的文档,并自动将它们分块,并将每个块文本通过模型嵌入到向量中:

PUT chunker/_doc/1?pipeline=chunker
{
"title": "Adding passage vector search to Lucene",
"body_content": "Vector search is a powerful tool in the information retrieval tool box. Using vectors alongside lexical search like BM25 is quickly becoming commonplace. But there are still a few pain points within vector search that need to be addressed. A major one is text embedding models and handling larger text input. Where lexical search like BM25 is already designed for long documents, text embedding models are not. All embedding models have limitations on the number of tokens they can embed. So, for longer text input it must be chunked into passages shorter than the model’s limit. Now instead of having one document with all its metadata, you have multiple passages and embeddings. And if you want to preserve your metadata, it must be added to every new document. A way to address this is with Lucene's “join” functionality. This is an integral part of Elasticsearch’s nested field type. It makes it possible to have a top-level document with multiple nested documents, allowing you to search over nested documents and join back against their parent documents. This sounds perfect for multiple passages and vectors belonging to a single top-level document! This is all awesome! But, wait, Elasticsearch® doesn’t support vectors in nested fields. Why not, and what needs to change? The key issue is how Lucene can join back to the parent documents when searching child vector passages. Like with kNN pre-filtering versus post-filtering, when the joining occurs determines the result quality and quantity. If a user searches for the top four nearest parent documents (not passages) to a query vector, they usually expect four documents. But what if they are searching over child vector passages and all four of the nearest vectors are from the same parent document? This would end up returning just one parent document, which would be surprising. This same kind of issue occurs with post-filtering."
}PUT chunker/_doc/3?pipeline=chunker
{
"title": "Automatic Byte Quantization in Lucene",
"body_content": "While HNSW is a powerful and flexible way to store and search vectors, it does require a significant amount of memory to run quickly. For example, querying 1MM float32 vectors of 768 dimensions requires roughly 1,000,000∗4∗(768+12)=3120000000≈31,000,000∗4∗(768+12)=3120000000bytes≈3GB of ram. Once you start searching a significant number of vectors, this gets expensive. One way to use around 75% less memory is through byte quantization. Lucene and consequently Elasticsearch has supported indexing byte vectors for some time, but building these vectors has been the user's responsibility. This is about to change, as we have introduced int8 scalar quantization in Lucene. All quantization techniques are considered lossy transformations of the raw data. Meaning some information is lost for the sake of space. For an in depth explanation of scalar quantization, see: Scalar Quantization 101. At a high level, scalar quantization is a lossy compression technique. Some simple math gives significant space savings with very little impact on recall. Those used to working with Elasticsearch may be familiar with these concepts already, but here is a quick overview of the distribution of documents for search. Each Elasticsearch index is composed of multiple shards. While each shard can only be assigned to a single node, multiple shards per index gives you compute parallelism across nodes. Each shard is composed as a single Lucene Index. A Lucene index consists of multiple read-only segments. During indexing, documents are buffered and periodically flushed into a read-only segment. When certain conditions are met, these segments can be merged in the background into a larger segment. All of this is configurable and has its own set of complexities. But, when we talk about segments and merging, we are talking about read-only Lucene segments and the automatic periodic merging of these segments. Here is a deeper dive into segment merging and design decisions."
}PUT chunker/_doc/2?pipeline=chunker
{
"title": "Use a Japanese language NLP model in Elasticsearch to enable semantic searches",
"body_content": "Quickly finding necessary documents from among the large volume of internal documents and product information generated every day is an extremely important task in both work and daily life. However, if there is a high volume of documents to search through, it can be a time-consuming process even for computers to re-read all of the documents in real time and find the target file. That is what led to the appearance of Elasticsearch® and other search engine software. When a search engine is used, search index data is first created so that key search terms included in documents can be used to quickly find those documents. However, even if the user has a general idea of what type of information they are searching for, they may not be able to recall a suitable keyword or they may search for another expression that has the same meaning. Elasticsearch enables synonyms and similar terms to be defined to handle such situations, but in some cases it can be difficult to simply use a correspondence table to convert a search query into a more suitable one. To address this need, Elasticsearch 8.0 released the vector search feature, which searches by the semantic content of a phrase. Alongside that, we also have a blog series on how to use Elasticsearch to perform vector searches and other NLP tasks. However, up through the 8.8 release, it was not able to correctly analyze text in languages other than English. With the 8.9 release, Elastic added functionality for properly analyzing Japanese in text analysis processing. This functionality enables Elasticsearch to perform semantic searches like vector search on Japanese text, as well as natural language processing tasks such as sentiment analysis in Japanese. In this article, we will provide specific step-by-step instructions on how to use these features."
}PUT chunker/_doc/5?pipeline=chunker
{
"title": "We can chunk whatever we want now basically to the limits of a document ingest",
"body_content": """Chonk is an internet slang term used to describe overweight cats that grew popular in the late summer of 2018 after a photoshopped chart of cat body-fat indexes renamed the "Chonk" scale grew popular on Twitter and Reddit. Additionally, "Oh Lawd He Comin'," the final level of the Chonk Chart, was adopted as an online catchphrase used to describe large objects, animals or people. It is not to be confused with the Saturday Night Live sketch of the same name. The term "Chonk" was popularized in a photoshopped edit of a chart illustrating cat body-fat indexes and the risk of health problems for each class (original chart shown below). The first known post of the "Chonk" photoshop, which classifies each cat to a certain level of "chonk"-ness ranging from "A fine boi" to "OH LAWD HE COMIN," was posted to Facebook group THIS CAT IS C H O N K Y on August 2nd, 2018 by Emilie Chang (shown below). The chart surged in popularity after it was tweeted by @dreamlandtea[1] on August 10th, 2018, gaining over 37,000 retweets and 94,000 likes (shown below). After the chart was posted there, it began growing popular on Reddit. It was reposted to /r/Delighfullychubby[2] on August 13th, 2018, and /r/fatcats on August 16th.[3] Additionally, cats were shared with variations on the phrase "Chonk." In @dreamlandtea's Twitter thread, she rated several cats on the Chonk scale (example, shown below, left). On /r/tumblr, a screenshot of a post featuring a "good luck cat" titled "Lucky Chonk" gained over 27,000 points (shown below, right). The popularity of the phrase led to the creation of a subreddit, /r/chonkers,[4] that gained nearly 400 subscribers in less than a month. Some photoshops of the chonk chart also spread on Reddit. For example, an edit showing various versions of Pikachu on the chart posted to /r/me_irl gained over 1,200 points (shown below, left). The chart gained further popularity when it was posted to /r/pics[5] September 29th, 2018."""
}

搜索那些文档

要搜索数据并返回与查询最匹配的块,你可以使用 inner_hits 和 knn 子句,以在查询的命中输出中返回文档的最佳匹配块:

GET chunker/_search
{"_source": false,"fields": ["title"],"knn": {"inner_hits": {"_source": false,"fields": ["passages.text"]},"field": "passages.vector.predicted_value","k": 1,"num_candidates": 100,"query_vector_builder": {"text_embedding": {"model_id": "sentence-transformers__all-minilm-l6-v2","model_text": "Can I use multiple vectors per document now?"}}}
}

将返回最佳文档和较大文档文本的相关部分:

{"took": 4,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 1,"relation": "eq"},"max_score": 0.75261426,"hits": [{"_index": "chunker","_id": "1","_score": 0.75261426,"_ignored": ["body_content.keyword","passages.text.keyword"],"fields": {"title": ["Adding passage vector search to Lucene"]},"inner_hits": {"passages": {"hits": {"total": {"value": 1,"relation": "eq"},"max_score": 0.75261426,"hits": [{"_index": "chunker","_id": "1","_nested": {"field": "passages","offset": 3},"_score": 0.75261426,"fields": {"passages": [{"text": ["This sounds perfect for multiple passages and vectors belonging to a single top-level document! This is all awesome! But, wait, Elasticsearch® doesn’t support vectors in nested fields. Why not, and what needs to change? The key issue is how Lucene can join back to the parent documents when searching child vector passages."]}]}}]}}}}]}
}

回顾一下

这里使用的方法展示了利用 Elasticsearch 的不同功能来解决更大问题的力量。

摄取管道允许你在索引之前对文档进行预处理,虽然有许多处理器可以执行特定的目标任务,但有时你需要脚本语言的强大功能才能执行诸如将文本分解为句子数组之类的操作。 因为你可以在对文档建立索引之前访问该文档,所以只要所有信息都在文档本身中,你就可以以几乎任何你能想象到的方式重新创建数据。 foreach 处理器允许我们包装一些可能运行零到 N 次的东西,而无需事先知道它需要执行多少次。 在本例中,我们使用它来运行与提取的句子一样多的句子,以运行推断处理器来生成向量。

索引的映射准备好处理原始文档中不存在的文本和向量的现在的对象数组,并使用嵌套对象来索引数据,以便我们可以正确搜索文档。

使用带有向量嵌套支持的 knn 允许使用 inner_hits 来呈现文档的最佳评分部分,这可以替代通常通过在 BM25 查询中使用 highlight 显示来完成的操作。

原文:Chunking Large Documents via Ingest pipelines plus nested vectors equals easy passage search — Elastic Search Labs

相关文章:

Elasticsearch:通过摄取管道加上嵌套向量对大型文档进行分块轻松地实现段落搜索

作者&#xff1a;VECTOR SEARCH 向量搜索是一种基于含义而不是精确或不精确的 token 匹配技术来搜索数据的强大方法。 然而&#xff0c;强大的向量搜索的文本嵌入模型只能按几个句子的顺序处理短文本段落&#xff0c;而不是可以处理任意大量文本的基于 BM25 的技术。 现在&…...

OpenCV图像纹理

LBP描述 LBP&#xff08;Local Binary Pattern&#xff0c;局部二值模式&#xff09;是一种用来描述图像局部纹理特征的算子&#xff1b;它具有旋转不变性和灰度不变性等显著的优点。它是首先由T. Ojala, M.Pietikinen, 和D. Harwood 在1994年提出&#xff0c;用于纹理特征提取…...

自媒体写手提问常用的ChatGPT通用提示词模板

如何撰写一篇具有吸引力和可读性的自媒体文章&#xff1f; 如何确定自媒体文章的主题和受众群体&#xff1f; 如何为自媒体文章取一个引人入胜的标题&#xff1f; 如何让自媒体文章的开头更加吸引人&#xff1f; 如何为自媒体文章构建一个清晰、逻辑严谨的框架&#xff1f;…...

分类预测 | Matlab实现PSO-LSTM-Attention粒子群算法优化长短期记忆神经网络融合注意力机制多特征分类预测

分类预测 | Matlab实现PSO-LSTM-Attention粒子群算法优化长短期记忆神经网络融合注意力机制多特征分类预测 目录 分类预测 | Matlab实现PSO-LSTM-Attention粒子群算法优化长短期记忆神经网络融合注意力机制多特征分类预测分类效果基本描述程序设计参考资料 分类效果 基本描述 1…...

3GPP TS38.201 NR; Physical layer; General description (Release 18)

TS38.201是介绍性的标准&#xff0c;简单介绍了RAN的信道组成和PHY层承担的功能&#xff0c;下图是PHY层相关标准的关系。 文章目录 结构信道类型调制方式PHY层支持的过程物理层测量其他标准TS 38.202: Physical layer services provided by the physical layerTS 38.211: Ph…...

【GitLab】-HTTP 500 curl 22 The requested URL returned error: 500~SSH解决

写在前面 本文主要介绍通过SSH的方式拉取GitLab代码。 目录 写在前面一、场景描述二、具体步骤1.环境说明2.生成秘钥3.GitLab添加秘钥4.验证SSH方式4.更改原有HTTP方式为SSH 三、参考资料写在后面系列文章 一、场景描述 之前笔者是通过 HTTP Personal access token 的方式拉取…...

【如何学习Python自动化测试】—— 自动化测试环境搭建

1、 自动化测试环境搭建 1.1 为什么选择 Python 什么是python&#xff0c;引用python官方的说法就是“一种解释型的、面向对象、带有励志语义的高级程序设计语言”&#xff0c;对于很多测试人员来说&#xff0c;这段话包含了很多术语&#xff0c;而测试人员大多是希望利用编程…...

在通用jar包中引入其他spring boot starter,并在通用jar包中直接配置这些starter的yml相关属性

场景 我在通用jar包中引入 spring-boot-starter-actuator 这样希望引用通用jar的所有服务都可以直接使用 actuator 中的功能&#xff0c; 问题在于&#xff0c;正常情况下&#xff0c;actuator的配置都写在每个项目的yml文件中&#xff0c;这就意味着&#xff0c;虽然每个项目…...

Seaborn 回归(Regression)及矩阵(Matrix)绘图

Seaborn中的回归包括回归拟合曲线图以及回归误差图。Matrix图主要是热度图。 1. 回归及矩阵绘图API概述 seaborn中“回归”绘图函数共3个&#xff1a; lmplot&#xff08;回归统计绘图&#xff09;&#xff1a;figure级regplot函数&#xff0c;绘图同regplot完全相同。(lm指lin…...

nginx学习(1)

一、下载安装NGINX&#xff1a; 先安装gcc-c编译器 yum install gcc-c yum install -y openssl openssl-devel&#xff08;1&#xff09;下载pcre-8.3.7.tar.gz 直接访问&#xff1a;http://downloads.sourceforge.net/project/pcre/pcre/8.37/pcre-8.37.tar.gz&#xff0c;就…...

CLEARTEXT communication to XX not permitted by network security policy 报错

在进行网络请求时&#xff0c;日志中打印 CLEARTEXT communication to XX not permitted by network security policy 原因&#xff1a; Android P系统网络访问安全策略升级&#xff0c;限制了非加密的流量请求 Android P系统限制了明文流量的网络请求&#xff0c;之下的版本…...

91.移动零(力扣)

问题描述 代码解决以及思想 class Solution { public:void moveZeroes(vector<int>& nums) {int left 0; // 左指针&#xff0c;用于指向当前非零元素应该放置的位置int right 0; // 右指针&#xff0c;用于遍历数组int len nums.size(); // 数组长度while …...

PatchMatchNet笔记

PatchMatchNet笔记 1 概述2 PatchmatchNet网络结构图2.1 多尺度特征提取2.2 基于学习的补丁匹配 3 性能评价 PatchmatchNet: Learned Multi-View Patchmatch Stereo&#xff1a;基于学习的多视角补丁匹配立体算法 1 概述 特点   高速&#xff0c;低内存&#xff0c;可以处理…...

实时人眼追踪、内置3D引擎,联想ThinkVision裸眼3D显示器创新四大应用场景

11月17日&#xff0c;在以“因思而变 智领未来”为主题的Think Centre和ThinkVision 20周年纪念活动上&#xff0c;联想正式发布了业内首款2D/3D 可切换裸眼3D显示器——联想ThinkVision 27 3D。该产品首次将裸眼2D、3D可切换技术应用在显示器领域&#xff0c;并拓展了3D技术多…...

SELinux零知识学习十四、SELinux策略语言之客体类别和许可(8)

接前一篇文章&#xff1a;SELinux零知识学习十三、SELinux策略语言之客体类别和许可&#xff08;7&#xff09; 一、SELinux策略语言之客体类别和许可 4. 客体类别许可实例 &#xff08;2&#xff09;文件客体类别许可 文件客体类别有三类许可&#xff1a;直接映像到标准Lin…...

Unity——URP相机详解

2021版本URP项目下的相机&#xff0c;一般新建一个相机有如下组件 1:Render Type(渲染类型) 有Base和Overlay两种选项&#xff0c;默认是Base选项 Base:主相机使用该种渲染方式&#xff0c;负责渲染场景中的主要图形元素 Overlay&#xff08;叠加&#xff09;:使用了Oveylay的…...

CRUD-SQL

文章目录 前置insertSelective和upsertSelective使用姿势手写sql&#xff0c;有两种方式 一、增当导入的数据不存在时则进行添加&#xff0c;有则更新 1.1 唯一键&#xff0c;先查&#xff0c;后插1.2 批量插1.2.1 批次一200、批次二200、批次三200&#xff0c;有一条数据写入失…...

【C语言 | 数组】C语言数组详解(经典,超详细)

&#x1f601;博客主页&#x1f601;&#xff1a;&#x1f680;https://blog.csdn.net/wkd_007&#x1f680; &#x1f911;博客内容&#x1f911;&#xff1a;&#x1f36d;嵌入式开发、Linux、C语言、C、数据结构、音视频&#x1f36d; &#x1f923;本文内容&#x1f923;&a…...

第三十三节——组合式API生命周期

一、基本使用 组合式api生命周期几乎和选项式一致。注意组合式api是从挂载阶段开始 <template><div></div> </template> <script setup> import {onBeforeMount, onMounted,onBeforeUpdate, onUpdated, onBeforeUnmount, onUnmounted, } from …...

【Linux】Alibaba Cloud Linux 3 安装 PHP8.1

一、系统安装 请参考 【Linux】Alibaba Cloud Linux 3 中第二硬盘、MySQL8.、MySQL7.、Redis、Nginx、Java 系统安装 二、安装源 rpm -ivh --nodeps https://rpms.remirepo.net/enterprise/remi-release-8.rpm sed -i s/PLATFORM_ID"platform:al8"/PLATFORM_ID&q…...

【容器化】Kubernetes(k8s)

文章目录 概述Docker 的管理痛点什么是 K8s云架构 & 云原生 架构核心组件K8s 的服务注册与发现组件调用流程部署单机版部署主从版本Operator来源拓展阅读 概述 Docker 虽好用&#xff0c;但面对强大的集群&#xff0c;成千上万的容器&#xff0c;突然感觉不香了。 这时候就…...

stm32 HSUSB

/ stm32f407xx.h #define USB_OTG_HS_PERIPH_BASE 0x40040000UL #define USB_OTG_HS ((USB_OTG_GlobalTypeDef *) USB_OTG_HS_PERIPH_BASE) // // 定义全局变量 USBD_HandleTypeDef hUsbDeviceHS;并默认全零初始化/* USB Device handle structure */ typedef struct _USB…...

C# String.Trim 方法

String.Trim()方法定义&#xff1a; 命名空间&#xff1a;System 程序集&#xff1a;System.Runtime.dll 返回结果&#xff1a;返回一个新字符串&#xff0c;它相当于从当前字符串中删除了一组指定字符的所有前导匹配项和尾随匹配项。 Trim方法有三个重载的方法&#xff0c;…...

<Linux>(极简关键、省时省力)《Linux操作系统原理分析之Linux 进程管理 4》(8)

《Linux操作系统原理分析之Linux 进程管理 4》&#xff08;8&#xff09; 4 Linux 进程管理4.4 Linux 进程的创建和撤销4.4.1 Linux 进程的族亲关系4.4.2 Linux 进程的创建4.4.3 Linux 进程创建的过程4.4.4 Linux 进程的执行4.4.5 Linux 进程的终止和撤销 4 Linux 进程管理 4.…...

RT-Thread STM32F407 PWM

为了展示PWM效果&#xff0c;这里用ADC来采集PWM输出通道的电平变化 第一步&#xff0c;进入RT-Thread Settings配置PWM驱动 第二步&#xff0c;进入board.h&#xff0c;打开PWM宏 第三步&#xff0c;进入STM32CubeMX&#xff0c;配置时钟及PWM 第四步&#xff0c;回到R…...

idea中把spring boot项目打成jar包

打jar包 打开项目&#xff0c;右击项目选中Open Module Settings进入project Structure 选中Artifacts&#xff0c;点击中间的加号&#xff08;Project Settings->Artifacts->JAR->From modules with dependencies &#xff09; 弹出Create JAR from Modules&#…...

levelDB之基础数据结构-Slice

Slice是levelDB中用于操作字符串的数据结构&#xff0c;以字节为单位。 定义与实现 namespace leveldb {class LEVELDB_EXPORT Slice {public:// Create an empty slice.Slice() : data_(""), size_(0) {}// Create a slice that refers to d[0,n-1].Slice(const c…...

上位机模块之通用重写相机类

在常用的视觉上位机中&#xff0c;我们通常会使用单个上位机匹配多个相机或者多品牌相机&#xff0c;所以在此记录一个可重写的通用相机类&#xff0c;用于后续长期维护开发。 先上代码。 using HalconDotNet; using System.Collections.Generic;namespace WeldingInspection.M…...

机器人导航+OPENCV透视变换示例代码

透视变换又称四点变换&#xff0c;所以不能用于5边形这样的图形变换&#xff0c;不是真正的透视变换&#xff0c;但是这个方法可以把机器人看到的图像转换为俯视图&#xff0c;这样就可以建立地图&#xff0c;要不然怎么建立地图呢。 void CrelaxMyFriendDlg::OnBnClickedOk()…...

KofamScan-KEGG官方推荐的使用系同源和隐马尔可夫模型进行KO注释

文章目录 简介安装使用输入蛋白序列输出detail-tsv格式输出detail格式输出mapper格式 输出结果detail和detail-tsv格式mapper格式常用命令tmp目录 与emapper结果比较其他参数参考 简介 KofamScan 是一款基于 KEGG 直系同源和隐马尔可夫模型&#xff08;HMM&#xff09;的基因功…...

代码随想录算法训练营第五十五天丨 动态规划part16

583. 两个字符串的删除操作 思路 #动态规划一 本题和动态规划&#xff1a;115.不同的子序列 (opens new window)相比&#xff0c;其实就是两个字符串都可以删除了&#xff0c;情况虽说复杂一些&#xff0c;但整体思路是不变的。 这次是两个字符串可以相互删了&#xff0c;这…...

【Linux】kernel与应用消息队列的一种设计

Linux进程间通讯的方式有很多种&#xff0c;这里介绍一种通过消息队列的方式来实现kernel与APP之间的消息收发实现方式&#xff0c;这种方式特别适用于&#xff0c;kernel中发送消息&#xff0c;应用层收取消息。 消息队列设备驱动 该方法的设计思路即是创建一个消息队列的设…...

我们常说的网络资产,具体是如何定义的?

文章目录 什么叫网络资产&#xff1f;官方定义的网络资产网络资产数字化定义推荐阅读 什么叫网络资产&#xff1f; 通过百度查询搜索什么叫网络资产&#xff1f;大体上都将网络资产归类为计算机网络中的各类设备。 基本上会定义网络传输通信架构中用到的主机、网络设备、防火…...

WPF中可冻结对象

在WPF&#xff08;Windows Presentation Foundation&#xff09;中&#xff0c;"可冻结对象"指的是那些在创建之后可以被设置为不可更改状态的对象。这种特性允许这些对象更有效地被共享和复制&#xff0c;并且可以增加性能。 例如&#xff0c;Brushes&#xff0c;P…...

【人工智能实验】A*算法求解8数码问题 golang

人工智能经典问题八数码求解 实际上是将求解转为寻找最优节点的问题&#xff0c;算法流程如下&#xff1a; 求非0元素的逆序数的和&#xff0c;判断是否有解将开始状态放到节点集&#xff0c;并设置访问标识位为true从节点集中取出h(x)g(x)最小的节点判断取出的节点的状态是不…...

Kafka学习笔记(二)

目录 第3章 Kafka架构深入3.3 Kafka消费者3.3.1 消费方式3.3.2 分区分配策略3.3.3 offset的维护 3.4 Kafka高效读写数据3.5 Zookeeper在Kafka中的作用3.6 Kafka事务3.6.1 Producer事务3.6.2 Consumer事务&#xff08;精准一次性消费&#xff09; 第4章 Kafka API4.1 Producer A…...

Typora for Mac:打造全新文本编辑体验

Typora for Mac是一款与众不同的文本编辑器&#xff0c;它不仅拥有直观易用的界面&#xff0c;还融合了Markdown语法和富文本编辑的功能&#xff0c;为用户带来了前所未有的写作和编辑体验。 一、简洁明了的界面设计 Typora for Mac的界面简洁明了&#xff0c;让用户可以专注…...

TikTok与媒体素养:如何辨别虚假信息?

在当今数字时代&#xff0c;社交媒体平台如TikTok已经成为信息传播和社交互动的主要渠道之一。然而&#xff0c;随之而来的是虚假信息的泛滥&#xff0c;这对用户的媒体素养提出了严峻的挑战。本文将探讨TikTok平台上虚假信息的现象&#xff0c;以及如何提高媒体素养&#xff0…...

Spring Boot 中使用 ResourceLoader 加载资源的完整示例

ResourceLoader 是 Spring 框架中用于加载资源的接口。它定义了一系列用于获取资源的方法&#xff0c;可以处理各种资源&#xff0c;包括类路径资源、文件系统资源、URL 资源等。 以下是 ResourceLoader 接口的主要方法&#xff1a; Resource getResource(String location)&am…...

1688往微信小程序自营商城铺货商品采集API接口

一、背景介绍 随着移动互联网的快速发展&#xff0c;微信小程序作为一种新型的电商形态&#xff0c;正逐渐成为广大商家拓展销售渠道、提升品牌影响力的重要平台。然而&#xff0c;对于许多传统企业而言&#xff0c;如何将商品信息快速、准确地铺货到微信小程序自营商城是一个…...

QStatusBar开发详解

一、QStatusBar接口说明 QStatusBar 类是 Qt 中用于创建和管理状态栏的类。它继承自 QFrame 类&#xff0c;提供了在主窗口底部显示消息、进度等信息的功能。以下是一些 QStatusBar 类的重要接口&#xff1a; 1.1 QStatusBar构造函数 QStatusBar(QWidget *parent nullptr);…...

后端接口性能优化分析-程序结构优化

&#x1f44f;作者简介&#xff1a;大家好&#xff0c;我是爱吃芝士的土豆倪&#xff0c;24届校招生Java选手&#xff0c;很高兴认识大家&#x1f4d5;系列专栏&#xff1a;Spring源码、JUC源码&#x1f525;如果感觉博主的文章还不错的话&#xff0c;请&#x1f44d;三连支持&…...

【SpringBoot3+Vue3】三【实战篇】-后端(优化)

目录 一、登录优化-redis 1、SpringBoot集成redis 1.1 pom 1.2 yml 1.3 测试程序&#xff08;非必须&#xff09; 1.4 启动redis&#xff0c;执行测试程序 2、令牌主动失效&#xff08;代码优化&#xff09; 2.1 UserController设置token到redis 2.2 登录拦截器Log…...

DevExpress中文教程 - 如何在macOS和Linux (CTP)上创建、修改报表(上)

DevExpress Reporting是.NET Framework下功能完善的报表平台&#xff0c;它附带了易于使用的Visual Studio报表设计器和丰富的报表控件集&#xff0c;包括数据透视表、图表&#xff0c;因此您可以构建无与伦比、信息清晰的报表。 DevExpress Reports — 跨平台报表组件&#x…...

一个iOS tableView 滚动标题联动效果的实现

效果图 情景 tableview 是从屏幕顶部开始的&#xff0c;现在有导航栏&#xff0c;和栏目标题视图将tableView的顶部覆盖了 分析 我们为了达到滚动到某个分区选中标题的效果&#xff0c;就得知道 展示最顶部的cell或者区头在哪个分区范围内 所以我们必须首先获取顶部的位置 …...

代码执行相关函数以及简单例题

代码/命令 执行系列 相关函数 &#xff08;代码注入&#xff09;...

大数据爬虫分析基于Python+Django旅游大数据分析系统

欢迎大家点赞、收藏、关注、评论啦 &#xff0c;由于篇幅有限&#xff0c;只展示了部分核心代码。 文章目录 一项目简介 二、功能三、系统四. 总结 一项目简介 基于Python和Django的旅游大数据分析系统是一种使用Python编程语言和Django框架开发的系统&#xff0c;用于处理和分…...

C# 结构体介绍

文章目录 定义结构体实例化结构体结构体的值类型特性结构体和类的区别限制 C# 中的结构体&#xff08;Struct&#xff09;是一种值类型数据结构&#xff0c;用于封装不同或相同类型的数据成一个单一的实体。结构体非常适合用来表示轻量级的对象&#xff0c;比如坐标点、颜色值或…...

【机器学习】特征工程:特征预处理,归一化、标准化、处理缺失值

特征预处理采用的是特定的统计方法&#xff08;数学方法&#xff09;将数据转化为算法要求的数字 1. 数值型数据 归一化&#xff0c;将原始数据变换到[0,1]之间 标准化&#xff0c;数据转化到均值为0&#xff0c;方差为1的范围内 缺失值&#xff0c;缺失值处理成均值、中…...

Pytorch torch.norm函数详解用法

torch.norm参数定义 torch版本1.6 def norm(input, p"fro", dimNone, keepdimFalse, outNone, dtypeNone)input input (Tensor): the input tensor 输入为tensorp p (int, float, inf, -inf, fro, nuc, optional): the order of norm. Default: froThe following …...