
Few-shot prompting for large language models with LangChain: fixed examples and dynamic example selection by vector similarity

Source: https://blog.csdn.net/sjxgghg/article/details/140855919 2025/4/18 22:37:12

Contents

- few-shot
- Fixed Examples
- Dynamic few-shot prompting
- Auxiliary
- References

few-shot

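Compared with fine-tuning a large model, there are cases where we would rather use few-shot learning: feed the model a handful of relevant examples so that it performs better on the task at hand.

Fixed-example prompting vs. dynamic-example prompting:

Fixed-example prompting: every inference call uses the same examples in the prompt.
Dynamic-example prompting: for each sample to be inferred, a vector-similarity search over the training set retrieves the most similar samples, and those are used as the prompt examples.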
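
The difference between the two strategies can be sketched in a few lines of plain Python. This is a minimal illustration, not the method used later in the article: `embed` is a toy bag-of-words stand-in for a real embedding model, and `train_set`, `select_examples` and the sample data are made-up names for this sketch.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_examples(query: str, train: list, k: int = 1) -> list:
    """Return the k training examples whose 'input' is most similar to the query."""
    q = embed(query)
    ranked = sorted(train, key=lambda ex: cosine(q, embed(ex["input"])), reverse=True)
    return ranked[:k]


# Hypothetical training set, for illustration only.
train_set = [
    {"input": "translate cat to French", "output": "chat"},
    {"input": "what is 2 plus 2", "output": "4"},
    {"input": "what is the capital of France", "output": "Paris"},
]

# Fixed prompting: the same example is used for every query.
fixed = train_set[:1]
# Dynamic prompting: the example is chosen per query by similarity.
dynamic = select_examples("what is 3 plus 5", train_set, k=1)
print(dynamic)
```

With a real embedding model the ranking step stays the same; only `embed` and the similarity computation would be replaced.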

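Few-shot learning (few-shot prompting):

Definition: few-shot learning teaches the model a task by giving it a small number of examples (for instance 1 to 5). Each example usually consists of an input and its corresponding output.
Implementation: in most cases the examples are placed directly in the model's input as part of the prompt. The model itself receives no additional training or tuning.
Advantage: it adapts quickly to a new task without extra training time or resources.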

Open-source notebook for this article:
https://github.com/JieShenAI/csdn/blob/main/24/07/few_shot_prompt/langchain_fewshot.ipynb

Fixed Examples

Taking a chat model as an example:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate
from langchain_openai import ChatOpenAI

# Optional parser that turns an AIMessage into a plain string (not used in the chains below).
parser = StrOutputParser()
model = ChatOpenAI(model="gpt-4o-mini")

# Each example pairs an input with the expected output.
examples = [
    {"input": "2 🦜 2", "output": "4"},
    {"input": "2 🦜 3", "output": "5"},
]

🦜 stands for addition. The goal is for the model to infer from the examples that 🦜 means addition.

# This is a prompt template used to format each individual example.
example_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}"),
    ]
)
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

few_shot_prompt.invoke({}).messages

Output:

[HumanMessage(content='2 🦜 2'),AIMessage(content='4'),HumanMessage(content='2 🦜 3'),AIMessage(content='5')]

few_shot_prompt.format()

Output:

'Human: 2 🦜 2\nAI: 4\nHuman: 2 🦜 3\nAI: 5'
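
To make the two outputs above concrete, here is a rough plain-Python approximation of what the few-shot template does with the examples. It is only a sketch: `expand_examples` and `to_text` are hypothetical helpers, not LangChain functions, and real message objects carry more fields than a (role, content) pair.

```python
examples = [
    {"input": "2 🦜 2", "output": "4"},
    {"input": "2 🦜 3", "output": "5"},
]


def expand_examples(examples):
    """Turn each example into a human message followed by an AI message."""
    messages = []
    for ex in examples:
        messages.append(("human", ex["input"]))
        messages.append(("ai", ex["output"]))
    return messages


def to_text(messages):
    """Render (role, content) pairs as 'Role: content' lines."""
    labels = {"human": "Human", "ai": "AI", "system": "System"}
    return "\n".join(f"{labels[role]}: {content}" for role, content in messages)


messages = expand_examples(examples)
print(to_text(messages))
```

Each example becomes one human turn and one AI turn, so the model sees the examples as if they were earlier rounds of the same conversation.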

final_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a wondrous wizard of math."),
        few_shot_prompt,
        ("human", "{input}"),
    ]
)
# Order matters: the prompt's output feeds the model (not `model | final_prompt`).
chain = final_prompt | model
chain.invoke({"input": "What's 3 🦜 3?"})

Output:

AIMessage(content='Based on the previous pattern, the 🦜 operation appears to be addition. Therefore:\n\n\\[ 3 🦜 3 = 3 + 3 = 6 \\]', response_metadata={'token_usage': {'completion_tokens': 37, 'prompt_tokens': 30, 'total_tokens': 67}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': '', 'finish_reason': 'stop', 'logprobs': None}, id='run-xxx', usage_metadata={'input_tokens': 30, 'output_tokens': 37, 'total_tokens': 67})

As the output above shows, the model has inferred that 🦜 means addition and returns 3 🦜 3 = 3 + 3 = 6.

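Dynamic few-shot prompting

Why use dynamic few-shot prompting?

In the previous section, Fixed Examples, the same examples are used as the prompt no matter what question comes in.

Dynamic example prompting uses different examples for different questions, with the aim of improving model performance.

To evaluate dynamic few-shot prompting, iterate over the test set one sample at a time; for each test sample, use vector similarity to retrieve the most similar samples from the training set, then run few-shot prompting with them.

We plan to evaluate dynamic few-shot prompting in a follow-up article. This article is a tutorial, so it keeps things simple.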
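
For readers who want a starting point, the evaluation loop described above might look roughly like this. Everything here is an assumption made for illustration: `select_by_overlap` uses word overlap instead of real vector similarity, `stub_model` is a placeholder for an actual LLM call, and the tiny sentiment data set is invented.

```python
def select_by_overlap(query, train_set, k=1):
    """Rank training samples by Jaccard word overlap with the query (toy similarity)."""
    q = set(query.lower().split())

    def score(sample):
        words = set(sample["input"].lower().split())
        return len(q & words) / len(q | words)

    return sorted(train_set, key=score, reverse=True)[:k]


def stub_model(prompt):
    """Placeholder for an LLM call: echoes the label of the first example in the prompt."""
    return prompt.splitlines()[0].split(" -> ")[1]


def evaluate(test_set, train_set, k=1):
    """Accuracy of dynamic few-shot prompting over the test set."""
    correct = 0
    for sample in test_set:
        # Retrieve the most similar training samples for this test sample.
        shots = select_by_overlap(sample["input"], train_set, k)
        lines = [f"{s['input']} -> {s['output']}" for s in shots]
        lines.append(f"{sample['input']} ->")
        prediction = stub_model("\n".join(lines))
        correct += int(prediction.strip() == sample["output"])
    return correct / len(test_set)


train_set = [
    {"input": "great movie loved it", "output": "positive"},
    {"input": "terrible movie hated it", "output": "negative"},
    {"input": "loved the food", "output": "positive"},
    {"input": "hated the service", "output": "negative"},
]
test_set = [
    {"input": "loved it", "output": "positive"},
    {"input": "hated it", "output": "negative"},
]

accuracy = evaluate(test_set, train_set, k=1)
print(accuracy)
```

Running the same loop with a fixed set of examples in place of `select_by_overlap` would give the baseline to compare against.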

The previous section uses:
ChatPromptTemplate and FewShotChatMessagePromptTemplate.

This section uses:
PromptTemplate and FewShotPromptTemplate.

The two classes in each pair belong together; do not mix classes across pairs.

from langchain_core.prompts import PromptTemplate

example_prompt = PromptTemplate.from_template("Question: {question}\n{answer}")

The code below shows what example_prompt produces (qa_examples is defined further down):

print(example_prompt.invoke(qa_examples[0]).text)

Output:

Question: Who lived longer, Muhammad Ali or Alan Turing?
Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali

The qa_examples list below serves as the training set; at inference time the most similar examples by vector similarity are selected from it.

qa_examples = [
    {
        "question": "Who lived longer, Muhammad Ali or Alan Turing?",
        "answer": """Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali""",
    },
    {
        "question": "When was the founder of craigslist born?",
        "answer": """Are follow up questions needed here: Yes.
Follow up: Who was the founder of craigslist?
Intermediate answer: Craigslist was founded by Craig Newmark.
Follow up: When was Craig Newmark born?
Intermediate answer: Craig Newmark was born on December 6, 1952.
So the final answer is: December 6, 1952""",
    },
    {
        "question": "Who was the maternal grandfather of George Washington?",
        "answer": """Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the final answer is: Joseph Ball""",
    },
    {
        "question": "Are both the directors of Jaws and Casino Royale from the same country?",
        "answer": """Are follow up questions needed here: Yes.
Follow up: Who is the director of Jaws?
Intermediate Answer: The director of Jaws is Steven Spielberg.
Follow up: Where is Steven Spielberg from?
Intermediate Answer: The United States.
Follow up: Who is the director of Casino Royale?
Intermediate Answer: The director of Casino Royale is Martin Campbell.
Follow up: Where is Martin Campbell from?
Intermediate Answer: New Zealand.
So the final answer is: No""",
    },
]

example_prompt is passed to FewShotPromptTemplate, which uses it to format the entries in qa_examples.

from langchain_core.prompts import FewShotPromptTemplate

prompt = FewShotPromptTemplate(
    examples=qa_examples,
    example_prompt=example_prompt,
    # prefix="You are a helpful assistant.",
    suffix="Question: {input}",
    input_variables=["input"],
)
print(prompt.invoke({"input": "Who was the father of Mary Ball Washington?"}).to_string())

This prompt does not use a vector-based selector. When invoke is called, FewShotPromptTemplate formats every example in qa_examples and includes all of them as context.

Output:

Question: Who lived longer, Muhammad Ali or Alan Turing?
Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali
......
Question: Who was the father of Mary Ball Washington?

Build a vector-based selector with an embedding model: the entries of qa_examples are embedded and stored in the Chroma vector database.

from langchain_chroma import Chroma
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_openai import OpenAIEmbeddings

example_selector = SemanticSimilarityExampleSelector.from_examples(
    # This is the list of examples available to select from.
    qa_examples,
    # This is the embedding class used to produce embeddings which are used to measure semantic similarity.
    OpenAIEmbeddings(),
    # This is the VectorStore class that is used to store the embeddings and do a similarity search over.
    Chroma,
    # This is the number of examples to produce.
    k=1,
)

Use example_selector to find the single most similar example for the user's question:

# Select the most similar example to the input.
question = "Who was the father of Mary Ball Washington?"
selected_examples = example_selector.select_examples({"question": question})
print(f"Examples most similar to the input: {question}")
for example in selected_examples:
    print("\n")
    print('【')
    for k, v in example.items():
        print(f"{k}: {v}")
    print('】')

Output:

Examples most similar to the input: Who was the father of Mary Ball Washington?


【
answer: Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the final answer is: Joseph Ball
question: Who was the maternal grandfather of George Washington?
】

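Combine the selector example_selector with the formatter example_prompt to build the final prompt.

FewShotPromptTemplate also accepts a prefix and a suffix. The prefix typically carries the system prompt, and the suffix carries the question.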


prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    # prefix="You are a helpful assistant.",
    suffix="Question: {input}",
    input_variables=["input"],
)
print(prompt.invoke({"input": "Who was the father of Mary Ball Washington?"}).to_string())

Output:

Question: Who was the maternal grandfather of George Washington?
Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the final answer is: Joseph Ball

Question: Who was the father of Mary Ball Washington?
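
The prefix, examples and suffix layout described above can be approximated in plain Python. This is a sketch under stated assumptions, not LangChain's implementation: `assemble_prompt` is a hypothetical helper, the blank-line separator is assumed to mirror the template's default, and the example answer is shortened for brevity.

```python
def assemble_prompt(examples, user_input, prefix="", suffix="Question: {input}", separator="\n\n"):
    """Join an optional prefix, the formatted examples, and the suffix into one prompt string."""
    blocks = [prefix] if prefix else []
    # Same layout as the example template: question on one line, answer below it.
    blocks += [f"Question: {ex['question']}\n{ex['answer']}" for ex in examples]
    blocks.append(suffix.format(input=user_input))
    return separator.join(blocks)


# One selected example with a shortened answer (illustrative only).
selected = [
    {
        "question": "Who was the maternal grandfather of George Washington?",
        "answer": "So the final answer is: Joseph Ball",
    }
]

text = assemble_prompt(
    selected,
    "Who was the father of Mary Ball Washington?",
    prefix="You are a helpful assistant.",
)
print(text)
```

The prefix comes first, so it reads like a system instruction, and the suffix comes last, so the model continues from the new question.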

chain = prompt | model
chain.invoke({"input": "Who was the father of Mary Ball Washington?"})

Output:

AIMessage(content='The father of Mary Ball Washington was Joseph Ball.', response_metadata={'token_usage': {'completion_tokens': 10, 'prompt_tokens': 103, 'total_tokens': 113}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_0f03d4f0ee', 'finish_reason': 'stop', 'logprobs': None}, id='run-ae96f9c7-ac89-47ba-8074-69197b89bef5-0', usage_metadata={'input_tokens': 103, 'output_tokens': 10, 'total_tokens': 113})

Auxiliary

Connecting to Hugging Face through a proxy:

import os
os.environ['HTTP_PROXY'] = 'http://127.0.0.1:7890'
os.environ['HTTPS_PROXY'] = 'http://127.0.0.1:7890'
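
The same settings can be wrapped in a small helper so the proxy address lives in one place. This is only a convenience sketch: `set_proxy` is a hypothetical name, and the address is the local one used above, which will differ on other machines. It is safest to call the helper before creating any client that makes network requests.

```python
import os


def set_proxy(url: str = "http://127.0.0.1:7890") -> None:
    """Point both HTTP and HTTPS traffic of this process at the given proxy."""
    for name in ("HTTP_PROXY", "HTTPS_PROXY"):
        os.environ[name] = url


set_proxy()
print(os.environ["HTTP_PROXY"], os.environ["HTTPS_PROXY"])
```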

References

The two official LangChain documentation pages below are both well written:

https://python.langchain.com/v0.2/docs/how_to/few_shot_examples_chat/ How to use few shot examples in chat models
https://python.langchain.com/v0.2/docs/how_to/few_shot_examples/#pass-the-examples-and-formatter-to-fewshotprompttemplate How to use few shot examples
