[Qwen-Audio Deployment in Practice] Deploying and Testing a Chatbot with the Qwen-Audio-Chat Model

news 2026/4/8 11:01:30

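Series articles 💥

No.	Article
1	[Qwen Deployment in Practice] Exploring Qwen-7B-Chat: Conversational Practice with Alibaba Cloud's Large Language Model
2	[Qwen2 Deployment in Practice] First Look at Qwen2: Building an Intelligent Chatbot with Transformers
3	[Qwen2 Deployment in Practice] Exploring Qwen2-7B: Deploying and Calling an API with the FastApi Framework
4	[Qwen2 Deployment in Practice] Qwen2-7B on Ollama: A One-Click Guide to Deploying a Large Language Model
5	[Qwen2 Deployment in Practice] llama.cpp: One-Click Deployment for Running Qwen2-7B Efficiently
6	[Qwen2 Deployment in Practice] Deploying Efficient AI Models: Qwen2-7B Inference with vLLM
7	[AI Large-Model Agent Exploration] Qwen-Agent: An LLM Application Development Framework Based on Qwen
8	[AI Large-Model Agent Exploration] A Deep Dive into Function Calling with Qwen-Agent
9	[AI Large-Model Agent Exploration] Building a RAG Assistant with Qwen-Agent
10	[RAG Retrieval-Augmented Generation] Efficient Retrieval-Augmented Generation with LlamaIndex and Qwen2
11	[Qwen2 Fine-Tuning in Practice] A Practical Guide to LoRA Fine-Tuning Qwen2-7B-Instruct
12	[Qwen2 Fine-Tuning in Practice] Fine-Tuning Qwen2-7B with the LLaMA-Factory Framework
13	[Qwen-Audio Deployment in Practice] Deploying Qwen-Audio-Chat with FastApi
14	[Qwen-Audio Deployment in Practice] Deploying and Testing a Chatbot with the Qwen-Audio-Chat Model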

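Series articles 💥
Introduction
I. Environment Preparation
II. Installing Dependencies
- 1. Upgrade pip and switch the package index
- 2. Install the base dependencies
- 3. Install specific tools and pinned versions
- 4. Install ffmpeg
III. Downloading the Model
- 1. Preparing the download
- 2. Running the download
IV. Preparing the Chatbot Code
V. Running the Chatbot
- 1. Change the default port
- 2. Launch the web chat bot
- 3. Port forwarding
- 4. Open the web chat interface
- 5. Plain text chat
- 6. Audio file recognition
Conclusion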

Introduction

Among recent conversational models, Qwen-Audio-Chat stands out for its strong performance and its support for natural, fluent dialogue. This article walks through deploying and testing the Qwen-Audio-Chat model, with a focus on building and testing a web-based chatbot on top of it.

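I. Environment Preparation

Before starting, make sure you have a stable and sufficiently powerful runtime environment. One option is to rent a server on the AutoDL platform with at least 24 GB of GPU memory, for example an NVIDIA RTX 3090, to meet the model's heavy demand for compute during training and inference.

For the image, we suggest PyTorch 2.0.0 with Python 3.8 (on Ubuntu 20.04), together with CUDA 11.8 (11.3 at minimum) for compatibility with PyTorch and good performance. Once the server is rented, open it through the JupyterLab interface and use its terminal for environment setup, model download, and running the demo.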
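
As a quick sanity check before going further, the sketch below prints the Python, PyTorch, and CUDA versions along with the GPU memory, and compares the CUDA version against the 11.3 minimum mentioned above. The helper names (version_tuple, meets_minimum, describe_environment) are made up for this illustration, and the script is written so that it still runs when PyTorch is not importable.

```python
import itertools
import sys


def version_tuple(text):
    """Turn '11.8' or '2.0.0+cu118' into a tuple of ints, dropping any suffix."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(found, minimum):
    """True when version string `found` is at least `minimum`."""
    return version_tuple(found) >= version_tuple(minimum)


def describe_environment(min_cuda="11.3"):
    """Collect a few human-readable lines about the current environment."""
    lines = [f"Python {sys.version_info.major}.{sys.version_info.minor}"]
    try:
        import torch  # optional here, so the check still runs without PyTorch
    except ImportError:
        lines.append("PyTorch is not installed")
        return lines
    lines.append(f"PyTorch {torch.__version__}")
    if not torch.cuda.is_available():
        lines.append("CUDA is not available to PyTorch")
        return lines
    cuda = torch.version.cuda
    if cuda is None:
        lines.append("PyTorch reports no CUDA version")
    else:
        verdict = "ok" if meets_minimum(cuda, min_cuda) else f"older than {min_cuda}"
        lines.append(f"CUDA {cuda} ({verdict})")
    props = torch.cuda.get_device_properties(0)
    lines.append(f"GPU {props.name}: {props.total_memory / 1024 ** 3:.1f} GiB")
    return lines


if __name__ == "__main__":
    print("\n".join(describe_environment()))
```

On the image described above, the output should list a CUDA version of at least 11.3 and a GPU with roughly 24 GiB of memory; anything else is worth fixing before downloading the model.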

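II. Installing Dependencies

In the terminal, run the following commands to switch the pip index and install the required packages.

1. Upgrade pip and switch the package index

# Upgrade pip
python -m pip install --upgrade pip
# Switch to a PyPI mirror to speed up package installation
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

2. Install the base dependencies

# Install common scientific computing and machine learning libraries
pip install scipy torchvision pillow tensorboard matplotlib

3. Install specific tools and pinned versions

# Install packages for model management and optimization
pip install modelscope==1.9.5 accelerate tiktoken einops transformers_stream_generator==0.0.4
# Install the Transformers and gradio versions used to deploy the model
pip install transformers==4.32.0 gradio==3.39.0 nest_asyncio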
# web_demo_audio.py (section IV) imports pydub; install it if it is not already present
pip install pydub

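4. Install ffmpeg

Open a terminal and run the following commands to install ffmpeg:

sudo apt update
sudo apt install ffmpeg

After these steps, pip is upgraded, a faster package index is configured, and the required Python packages and ffmpeg are installed.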
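
Since several packages are pinned to exact versions, it can be worth confirming what actually got installed. The following is a minimal sketch using the standard library's importlib.metadata; the PINNED table simply repeats the versions from the commands above, and the function names are invented for this example.

```python
import shutil
from importlib import metadata

# Versions copied from the pip commands in this section.
PINNED = {
    "modelscope": "1.9.5",
    "transformers_stream_generator": "0.0.4",
    "transformers": "4.32.0",
    "gradio": "3.39.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins):
    """Map each package name to (wanted, installed, matches)."""
    report = {}
    for name, wanted in pins.items():
        found = installed_version(name)
        report[name] = (wanted, found, found == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in check_pins(PINNED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: wanted {wanted}, found {found} [{status}]")
    # ffmpeg is a system binary, so look for it on PATH instead.
    print("ffmpeg on PATH:", shutil.which("ffmpeg") is not None)
```

A MISMATCH line usually means another install step pulled in a different version, in which case rerunning the pinned pip command is the simplest fix.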

III. Downloading the Model

1. Preparing the download

Use the snapshot_download function from modelscope to download the model. The first argument is the model name, and the cache_dir argument is the download path.
Create a file named d.py under /root/autodl-tmp with the following content:

import torch
from modelscope import snapshot_download, AutoModel, AutoTokenizer
from modelscope import GenerationConfig

model_dir = snapshot_download('qwen/Qwen-Audio-Chat', cache_dir='/root/autodl-tmp', revision='master')


2. Running the download

Run python /root/autodl-tmp/d.py to start the download. The model is about 20 GB, so the download takes roughly 10 to 20 minutes.
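
Because the download is around 20 GB, it may help to confirm there is enough free disk space first. The sketch below uses shutil.disk_usage from the standard library; the 5 GiB safety margin and the function names are assumptions chosen for illustration, not something the model requires.

```python
import os
import shutil


def free_gib(path):
    """Free space on the filesystem that holds `path`, in GiB."""
    return shutil.disk_usage(path).free / 1024 ** 3


def has_room(path, needed_gib, margin_gib=5.0):
    """True when `path` has at least needed_gib plus a safety margin free."""
    return free_gib(path) >= needed_gib + margin_gib


if __name__ == "__main__":
    # Fall back to the current directory when not running on the AutoDL server.
    target = "/root/autodl-tmp" if os.path.isdir("/root/autodl-tmp") else "."
    print(f"{target}: {free_gib(target):.1f} GiB free")
    print("Enough room for a 20 GB model:", has_room(target, 20))
```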

IV. Preparing the Chatbot Code

Create a file named web_demo_audio.py under /root/autodl-tmp with the following content:

# Copyright (c) Alibaba Cloud.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

"""A simple web interactive chat demo based on gradio."""

from argparse import ArgumentParser
from pathlib import Path

import copy
import gradio as gr
import os
import re
import secrets
import tempfile
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig
from pydub import AudioSegment

# DEFAULT_CKPT_PATH = 'Qwen/Qwen-Audio-Chat'
DEFAULT_CKPT_PATH = "/root/autodl-tmp/qwen/Qwen-Audio-Chat"

# Path of the most recently uploaded audio file. predict() updates it through
# `global`; defining it here avoids a NameError before any audio is uploaded.
last_audio = None


def _get_args():
    parser = ArgumentParser()
    parser.add_argument("-c", "--checkpoint-path", type=str, default=DEFAULT_CKPT_PATH,
                        help="Checkpoint name or path, default to %(default)r")
    parser.add_argument("--cpu-only", action="store_true", help="Run demo with CPU only")
    parser.add_argument("--share", action="store_true", default=False,
                        help="Create a publicly shareable link for the interface.")
    parser.add_argument("--inbrowser", action="store_true", default=False,
                        help="Automatically launch the interface in a new tab on the default browser.")
    parser.add_argument("--server-port", type=int, default=6006,
                        help="Demo server port.")
    parser.add_argument("--server-name", type=str, default="127.0.0.1",
                        help="Demo server name.")

    args = parser.parse_args()
    return args


def _load_model_tokenizer(args):
    tokenizer = AutoTokenizer.from_pretrained(
        args.checkpoint_path, trust_remote_code=True, resume_download=True,
    )

    if args.cpu_only:
        device_map = "cpu"
    else:
        device_map = "cuda"

    model = AutoModelForCausalLM.from_pretrained(
        args.checkpoint_path,
        device_map=device_map,
        trust_remote_code=True,
        resume_download=True,
    ).eval()
    model.generation_config = GenerationConfig.from_pretrained(
        args.checkpoint_path, trust_remote_code=True, resume_download=True,
    )

    return model, tokenizer


def _parse_text(text):
    # Convert model output into HTML that the gradio Chatbot can render.
    lines = text.split("\n")
    lines = [line for line in lines if line != ""]
    count = 0
    for i, line in enumerate(lines):
        if "```" in line:
            count += 1
            items = line.split("`")
            if count % 2 == 1:
                lines[i] = f'<pre><code class="language-{items[-1]}">'
            else:
                lines[i] = "<br></code></pre>"
        else:
            if i > 0:
                if count % 2 == 1:
                    # Inside a code block: escape characters Markdown would interpret.
                    line = line.replace("`", r"\`")
                    line = line.replace("<", "&lt;")
                    line = line.replace(">", "&gt;")
                    line = line.replace(" ", "&nbsp;")
                    line = line.replace("*", "&ast;")
                    line = line.replace("_", "&lowbar;")
                    line = line.replace("-", "&#45;")
                    line = line.replace(".", "&#46;")
                    line = line.replace("!", "&#33;")
                    line = line.replace("(", "&#40;")
                    line = line.replace(")", "&#41;")
                    line = line.replace("$", "&#36;")
                lines[i] = "<br>" + line
    text = "".join(lines)
    return text


def _launch_demo(args, model, tokenizer):
    uploaded_file_dir = os.environ.get("GRADIO_TEMP_DIR") or str(
        Path(tempfile.gettempdir()) / "gradio"
    )

    def predict(_chatbot, task_history):
        query = task_history[-1][0]
        print("User: " + _parse_text(query))
        history_cp = copy.deepcopy(task_history)
        full_response = ""

        # Merge each uploaded audio file into the text turn that follows it.
        history_filter = []
        audio_idx = 1
        pre = ""
        global last_audio
        for i, (q, a) in enumerate(history_cp):
            if isinstance(q, (tuple, list)):
                last_audio = q[0]
                q = f'Audio {audio_idx}: <audio>{q[0]}</audio>'
                pre += q + '\n'
                audio_idx += 1
            else:
                pre += q
                history_filter.append((pre, a))
                pre = ""
        history, message = history_filter[:-1], history_filter[-1][0]
        response, history = model.chat(tokenizer, message, history=history)
        ts_pattern = r"<\|\d{1,2}\.\d+\|>"
        all_time_stamps = re.findall(ts_pattern, response)
        print(response)
        if (len(all_time_stamps) > 0) and (len(all_time_stamps) % 2 == 0) and last_audio:
            ts_float = [float(t.replace("<|", "").replace("|>", "")) for t in all_time_stamps]
            ts_float_pair = [ts_float[i:i + 2] for i in range(0, len(all_time_stamps), 2)]
            # Read the audio file
            audio_format = os.path.splitext(last_audio)[-1].replace(".", "")
            audio_file = AudioSegment.from_file(last_audio, format=audio_format)
            chat_response_t = response.replace("<|", "").replace("|>", "")
            chat_response = chat_response_t
            temp_dir = secrets.token_hex(20)
            temp_dir = Path(uploaded_file_dir) / temp_dir
            temp_dir.mkdir(exist_ok=True, parents=True)
            # Cut the clips out of the audio file
            for pair in ts_float_pair:
                audio_clip = audio_file[pair[0] * 1000: pair[1] * 1000]
                # Save the audio clip
                name = f"tmp{secrets.token_hex(5)}.{audio_format}"
                filename = temp_dir / name
                audio_clip.export(filename, format=audio_format)
                _chatbot[-1] = (_parse_text(query), chat_response)
                _chatbot.append((None, (str(filename),)))
        else:
            _chatbot[-1] = (_parse_text(query), response)

        full_response = _parse_text(response)
        task_history[-1] = (query, full_response)
        print("Qwen-Audio-Chat: " + _parse_text(full_response))
        return _chatbot

    def regenerate(_chatbot, task_history):
        if not task_history:
            return _chatbot
        item = task_history[-1]
        if item[1] is None:
            return _chatbot
        task_history[-1] = (item[0], None)
        chatbot_item = _chatbot.pop(-1)
        if chatbot_item[0] is None:
            _chatbot[-1] = (_chatbot[-1][0], None)
        else:
            _chatbot.append((chatbot_item[0], None))
        return predict(_chatbot, task_history)

    def add_text(history, task_history, text):
        history = history + [(_parse_text(text), None)]
        task_history = task_history + [(text, None)]
        return history, task_history, ""

    def add_file(history, task_history, file):
        history = history + [((file.name,), None)]
        task_history = task_history + [((file.name,), None)]
        return history, task_history

    def add_mic(history, task_history, file):
        if file is None:
            return history, task_history
        os.rename(file, file + '.wav')
        print("add_mic file:", file)
        print("add_mic history:", history)
        print("add_mic task_history:", task_history)
        # history = history + [((file.name,), None)]
        # task_history = task_history + [((file.name,), None)]
        task_history = task_history + [((file + '.wav',), None)]
        history = history + [((file + '.wav',), None)]
        print("task_history", task_history)
        return history, task_history

    def reset_user_input():
        return gr.update(value="")

    def reset_state(task_history):
        task_history.clear()
        return []

    with gr.Blocks() as demo:
        gr.Markdown("""\
<p align="center"><img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Audio/logo.jpg" style="height: 80px"/><p>""")  ## todo
        gr.Markdown("""<center><font size=8>Qwen-Audio-Chat Bot</center>""")
        gr.Markdown("""\
<center><font size=3>This WebUI is based on Qwen-Audio-Chat, developed by Alibaba Cloud. \
(本WebUI基于Qwen-Audio-Chat打造，实现聊天机器人功能。)</center>""")
        gr.Markdown("""\
<center><font size=4>Qwen-Audio <a href="https://modelscope.cn/models/qwen/Qwen-Audio/summary">🤖 </a> 
| <a href="https://huggingface.co/Qwen/Qwen-Audio">🤗</a>&nbsp ｜ 
Qwen-Audio-Chat <a href="https://modelscope.cn/models/qwen/Qwen-Audio-Chat/summary">🤖 </a> | 
<a href="https://huggingface.co/Qwen/Qwen-Audio-Chat">🤗</a>&nbsp ｜ 
&nbsp<a href="https://github.com/QwenLM/Qwen-Audio">Github</a></center>""")

        chatbot = gr.Chatbot(label='Qwen-Audio-Chat', elem_classes="control-height", height=750)
        query = gr.Textbox(lines=2, label='Input')
        task_history = gr.State([])
        mic = gr.Audio(source="microphone", type="filepath")

        with gr.Row():
            empty_bin = gr.Button("🧹 Clear History (清除历史)")
            submit_btn = gr.Button("🚀 Submit (发送)")
            regen_btn = gr.Button("🤔️ Regenerate (重试)")
            addfile_btn = gr.UploadButton("📁 Upload (上传文件)", file_types=["audio"])

        mic.change(add_mic, [chatbot, task_history, mic], [chatbot, task_history])
        submit_btn.click(add_text, [chatbot, task_history, query], [chatbot, task_history]).then(
            predict, [chatbot, task_history], [chatbot], show_progress=True
        )
        submit_btn.click(reset_user_input, [], [query])
        empty_bin.click(reset_state, [task_history], [chatbot], show_progress=True)
        regen_btn.click(regenerate, [chatbot, task_history], [chatbot], show_progress=True)
        addfile_btn.upload(add_file, [chatbot, task_history, addfile_btn], [chatbot, task_history], show_progress=True)

        gr.Markdown("""\
<font size=2>Note: This demo is governed by the original license of Qwen-Audio. \
We strongly advise users not to knowingly generate or allow others to knowingly generate harmful content, \
including hate speech, violence, pornography, deception, etc. \
(注：本演示受Qwen-Audio的许可协议限制。我们强烈建议，用户不应传播及不应允许他人传播以下内容，\
包括但不限于仇恨言论、暴力、色情、欺诈相关的有害信息。)""")

    demo.queue().launch(
        share=args.share,
        inbrowser=args.inbrowser,
        server_port=args.server_port,
        server_name=args.server_name,
        file_directories=["/tmp/"],
    )


def main():
    args = _get_args()

    model, tokenizer = _load_model_tokenizer(args)

    _launch_demo(args, model, tokenizer)


if __name__ == '__main__':
    main()
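
The predict function in the script looks for timestamp tags of the form <|0.50|> in the model's reply, pairs them into start and end times, and cuts matching clips out of the last uploaded audio. The sketch below isolates just that parsing step so it can be tried without loading the model. The function names and the sample reply are invented for illustration, and real Qwen-Audio-Chat replies may be worded differently. Unlike the script, this version puts a capture group in the regular expression instead of stripping the delimiters afterwards.

```python
import re

# Same tag shape as ts_pattern in the script, with a capture group on the number.
TS_PATTERN = re.compile(r"<\|(\d{1,2}\.\d+)\|>")


def extract_spans(response):
    """Return (start, end) pairs in seconds; [] if tags are missing or unpaired."""
    stamps = [float(value) for value in TS_PATTERN.findall(response)]
    if not stamps or len(stamps) % 2 != 0:
        return []
    return [(stamps[i], stamps[i + 1]) for i in range(0, len(stamps), 2)]


def to_milliseconds(span):
    """pydub slices an AudioSegment by milliseconds, so scale the seconds."""
    start, end = span
    return int(start * 1000), int(end * 1000)


if __name__ == "__main__":
    sample = 'The word "hello" is spoken at <|0.50|><|2.25|>.'  # invented reply
    for span in extract_spans(sample):
        print(span, "->", to_milliseconds(span))
```

An odd number of tags yields an empty list here, which mirrors the script's behavior of falling back to a plain text reply when the timestamps cannot be paired.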

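V. Running the Chatbot

1. Change the default port

Note the default port setting in web_demo_audio.py (the --server-port argument) and change it to 6006.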
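
If the demo fails to start, one possible cause is that something else is already listening on port 6006. The following minimal sketch checks this with the standard socket module by trying to bind the port; port_is_free is a hypothetical helper name, and the check is only a snapshot, since another process could take the port afterwards.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Try to bind host:port; a failed bind means the port is already taken."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    print("Port 6006 free:", port_is_free(6006))
```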

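2. Launch the web chat bot

Run the following command to start the chatbot:

python /root/autodl-tmp/web_demo_audio.py

When startup succeeds, the terminal prints the local URL the demo is serving on.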

3. Port forwarding

Use AutoDL's SSH tunnel tool to forward port 6006 to your local machine.

4. Open the web chat interface

Open http://localhost:6006/ in a browser to see the chat interface.
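
If the page does not load, it can be useful to tell apart "the tunnel is not working" from "the browser is misbehaving". The sketch below, which uses only urllib from the standard library, requests the URL and reports whether it answered with HTTP 200. The function name ui_is_reachable is made up for this example, and the code deliberately bypasses any configured HTTP proxy on the assumption that the target is a local address.

```python
import http.client
import urllib.error
import urllib.request

# Skip any configured HTTP proxy, since the target is a local address.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def ui_is_reachable(url="http://localhost:6006/", timeout=3.0):
    """True when the URL answers with HTTP 200 within the timeout."""
    try:
        with _OPENER.open(url, timeout=timeout) as reply:
            return reply.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, HTTP error status, or malformed reply.
        return False


if __name__ == "__main__":
    print("Chat UI reachable:", ui_is_reachable())
```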

5. Plain text chat