当前位置: 首页 > news >正文

Imagine Flash、StyleMamba 、FlexControl、Multi-Scene T2V、TexControl

本文首发于公众号:机器感知

Imagine Flash、StyleMamba 、FlexControl、Multi-Scene T2V、TexControl

图片

You Only Cache Once: Decoder-Decoder Architectures for Language Models

图片

We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once. It consists of two components, i.e., a cross-decoder stacked upon a self-decoder. The self-decoder efficiently encodes global key-value (KV) caches that are reused by the cross-decoder via cross-attention. The overall model behaves like a decoder-only Transformer, although YOCO only caches once. The design substantially reduces GPU memory demands, yet retains global attention capability. Additionally, the computation flow enables prefilling to early exit without changing the final output, thereby significantly speeding up the prefill stage. Experimental results demonstrate that YOCO achieves favorable performance compared to Transformer in various settings of scaling up model size and number of training tokens. We also extend YOCO to 1M context length with near-perfect needle retrieval accuracy. The profiling results show that YOCO improves inference memory, p......

Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models

图片

Diffusion Models (DMs) have exhibited superior performance in generating high-quality and diverse images. However, this exceptional performance comes at the cost of expensive architectural design, particularly due to the attention module heavily used in leading models. Existing works mainly adopt a retraining process to enhance DM efficiency. This is computationally expensive and not very scalable. To this end, we introduce the Attention-driven Training-free Efficient Diffusion Model (AT-EDM) framework that leverages attention maps to perform run-time pruning of redundant tokens, without the need for any retraining. Specifically, for single-denoising-step pruning, we develop a novel ranking algorithm, Generalized Weighted Page Rank (G-WPR), to identify redundant tokens, and a similarity-based recovery method to restore tokens for the convolution operation. In addition, we propose a Denoising-Steps-Aware Pruning (DSAP) approach to adjust the pruning budget across different den......

Imagine Flash: Accelerating Emu Diffusion Models with Backward Distillation

图片

Diffusion models are a powerful generative framework, but come with expensive inference. Existing acceleration methods often compromise image quality or fail under complex conditioning when operating in an extremely low-step regime. In this work, we propose a novel distillation framework tailored to enable high-fidelity, diverse sample generation using just one to three steps. Our approach comprises three key components: (i) Backward Distillation, which mitigates training-inference discrepancies by calibrating the student on its own backward trajectory; (ii) Shifted Reconstruction Loss that dynamically adapts knowledge transfer based on the current time step; and (iii) Noise Correction, an inference-time technique that enhances sample quality by addressing singularities in noise prediction. Through extensive experiments, we demonstrate that our method outperforms existing competitors in quantitative metrics and human evaluations. Remarkably, it achieves performance comparable......

Reviewing Intelligent Cinematography: AI research for camera-based video production

图片

This paper offers a comprehensive review of artificial intelligence (AI) research in the context of real camera content acquisition for entertainment purposes and is aimed at both researchers and cinematographers. Considering the breadth of computer vision research and the lack of review papers tied to intelligent cinematography (IC), this review introduces a holistic view of the IC landscape while providing the technical insight for experts across across disciplines. We preface the main discussion with technical background on generative AI, object detection, automated camera calibration and 3-D content acquisition, and link explanatory articles to assist non-technical readers. The main discussion categorizes work by four production types: General Production, Virtual Production, Live Production and Aerial Production. Note that for Virtual Production we do not discuss research relating to virtual content acquisition, including work on automated video generation, like Stable Di......

StyleMamba : State Space Model for Efficient Text-driven Image Style Transfer

图片

We present StyleMamba, an efficient image style transfer framework that translates text prompts into corresponding visual styles while preserving the content integrity of the original images. Existing text-guided stylization requires hundreds of training iterations and takes a lot of computing resources. To speed up the process, we propose a conditional State Space Model for Efficient Text-driven Image Style Transfer, dubbed StyleMamba, that sequentially aligns the image features to the target text prompts. To enhance the local and global style consistency between text and image, we propose masked and second-order directional losses to optimize the stylization direction to significantly reduce the training iterations by 5 times and the inference time by 3 times. Extensive experiments and qualitative evaluation confirm the robust and superior stylization performance of our methods compared to the existing baselines.......

FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation

图片

Controllable text-to-image (T2I) diffusion models generate images conditioned on both text prompts and semantic inputs of other modalities like edge maps. Nevertheless, current controllable T2I methods commonly face challenges related to efficiency and faithfulness, especially when conditioning on multiple inputs from either the same or diverse modalities. In this paper, we propose a novel Flexible and Efficient method, FlexEControl, for controllable T2I generation. At the core of FlexEControl is a unique weight decomposition strategy, which allows for streamlined integration of various input types. This approach not only enhances the faithfulness of the generated image to the control, but also significantly reduces the computational overhead typically associated with multimodal conditioning. Our approach achieves a reduction of 41% in trainable parameters and 30% in memory usage compared with Uni-ControlNet. Moreover, it doubles data efficiency and can flexibly generate imag......

TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation

图片

Recent advances in diffusion-based generative modeling have led to the development of text-to-video (T2V) models that can generate high-quality videos conditioned on a text prompt. Most of these T2V models often produce single-scene video clips that depict an entity performing a particular action (e.g., `a red panda climbing a tree'). However, it is pertinent to generate multi-scene videos since they are ubiquitous in the real-world (e.g., `a red panda climbing a tree' followed by `the red panda sleeps on the top of the tree'). To generate multi-scene videos from the pretrained T2V model, we introduce Time-Aligned Captions (TALC) framework. Specifically, we enhance the text-conditioning mechanism in the T2V architecture to recognize the temporal alignment between the video scenes and scene descriptions. For instance, we condition the visual features of the earlier and later scenes of the generated video with the representations of the first scene description (e.g., `a red pan......

TexControl: Sketch-Based Two-Stage Fashion Image Generation Using Diffusion Model

图片

Deep learning-based sketch-to-clothing image generation provides the initial designs and inspiration in the fashion design processes. However, clothing generation from freehand drawing is challenging due to the sparse and ambiguous information from the drawn sketches. The current generation models may have difficulty generating detailed texture information. In this work, we propose TexControl, a sketch-based fashion generation framework that uses a two-stage pipeline to generate the fashion image corresponding to the sketch input. First, we adopt ControlNet to generate the fashion image from sketch and keep the image outline stable. Then, we use an image-to-image method to optimize the detailed textures of the generated images and obtain the final results. The evaluation results show that TexControl can generate fashion images with high-quality texture as fine-grained image generation.......

相关文章:

Imagine Flash、StyleMamba 、FlexControl、Multi-Scene T2V、TexControl

本文首发于公众号:机器感知 Imagine Flash、StyleMamba 、FlexControl、Multi-Scene T2V、TexControl You Only Cache Once: Decoder-Decoder Architectures for Language Models We introduce a decoder-decoder architecture, YOCO, for large language models, …...

Java Collections.emptyList() 方法详解

前言 在Java开发的日常中,我们常常需要处理集合数据结构,而这其中就免不了要面对“空集合”的场景。传统的做法可能是直接返回 null,但这往往会引入空指针异常的风险,降低了代码的健壮性。幸运的是,Java为我们提供了一…...

Vue前端环境准备

vue-cli Vue-cli是Vue官方提供的脚手架,用于快速生成一个Vue项目模板 提供功能: 统一的目录结构 本地调试 热部署 单元测试 集成打包上线 依赖环境:NodeJs 安装NodeJs与Vue-Cli 1、安装nodejs(已经安装就不用了) node-…...

代码随想录算法训练营第四十二天| 01背包问题(二维、一维)、416.分割等和子集

系列文章目录 目录 系列文章目录动态规划:01背包理论基础①二维数组②一维数组(滚动数组) 416. 分割等和子集①回溯法(超时)②动态规划(01背包)未剪枝版剪枝版 动态规划:01背包理论基…...

故障——蓝桥杯十三届2022国赛大学B组真题

问题分析 这道题纯数学&#xff0c;考察贝叶斯公式 AC_Code #include <bits/stdc.h> using namespace std; typedef pair<int,double> PI; bool cmp(PI a,PI b){if(a.second!b.second)return a.second>b.second;return a.first<b.first; } int main() {i…...

SSD存储基本知识

存储技术随着时间的推移经历了显著变化&#xff0c;新兴的存储介质正逐步挑战已经成为行业标准的硬盘驱动器&#xff08;HDD&#xff09;。在众多竞争者中&#xff0c;固态硬盘&#xff08;SSD&#xff09;是最广泛采用且最有潜力占据主导地位的——它们速度快、运行安静&#…...

buuctf-misc题目练习二

ningen 打开题目后是一张图片&#xff0c;放进winhex里面 发现PK&#xff0c;PK是压缩包ZIP 文件的文件头&#xff0c;下一步是想办法进行分离 Foremost可以依据文件内的文件头和文件尾对一个文件进行分离&#xff0c;或者识别当前的文件是什么文件。比如拓展名被删除、被附加…...

Nginx rewrite项目练习

Nginx rewrite练习 1、访问ip/xcz&#xff0c;返回400状态码&#xff0c;要求用rewrite匹配/xcz a、访问/xcz返回400 b、访问/hello时正常访问xcz.html页面server {listen 192.168.99.137:80;server_name 192.168.99.137;charset utf-8;root /var/www/html;location / {root …...

2024,AI手机“元年”? | 最新快讯

文 | 伯虎财经&#xff0c;作者 | 铁观音 2024年&#xff0c;小米、荣耀、vivo、一加、努比亚等品牌的AI手机新品如雨后春笋般涌现。因此&#xff0c;这一年也被业界广泛视为AI手机的“元年” 试想&#xff0c;当你轻触屏幕&#xff0c;你的手机不仅响应你的指令&#xff0c;更…...

5月9(信息差)

&#x1f30d; 可再生能源发电量首次占全球电力供应的三成 &#x1f384;马斯克脑机接口公司 Neuralink 计划将 Link 功能扩展至现实世界&#xff0c;实现控制机械臂、轮椅等 马斯克脑机接口公司 Neuralink 计划将 Link 功能扩展至现实世界&#xff0c;实现控制机械臂、轮椅等…...

leetcode203-Remove Linked List Elements

题目 给你一个链表的头节点 head 和一个整数 val &#xff0c;请你删除链表中所有满足 Node.val val 的节点&#xff0c;并返回 新的头节点 。 示例 1&#xff1a; 输入&#xff1a;head [1,2,6,3,4,5,6], val 6 输出&#xff1a;[1,2,3,4,5] 示例 2&#xff1a; 输入&…...

2024付费进群系统,源码及搭建变现视频课程(教程+源码)

自从我做资源站项目盈利稳定后&#xff0c;我越来越对网站类项目感兴趣了&#xff0c;毕竟很多网站类项目还是需要一定技术门槛的&#xff0c;可以过滤掉一些人&#xff0c;很多新人做项目就只盯着短视频&#xff0c;所以网站类项目也就没那么的卷。 这个付费进群系统&#xf…...

深入理解Django:中间件与信号处理的艺术

title: 深入理解Django&#xff1a;中间件与信号处理的艺术 date: 2024/5/9 18:41:21 updated: 2024/5/9 18:41:21 categories: 后端开发 tags: Django中间件信号异步性能缓存多语言 引言 在当今的Web开发领域&#xff0c;Django以其强大的功能、简洁的代码结构和高度的可扩…...

rk3588局域网推流

最近无意间看见在网上有使用MediaMtx插件配合ffmpeg在Windows来进行推流&#xff0c;然后在使用其他软件进行拉流显示数据图像的&#xff0c;既然windows都可以使用 &#xff0c;我想linux应该也可以&#xff0c;正好手上也有一块RK3588的开发板&#xff0c;就测试了一下&#…...

Android虚拟机机制

目录 一、Android 虚拟机 dalvik/art&#xff08;6版本后&#xff09;二、Android dex、odex、oat、vdex、art区别 一、Android 虚拟机 dalvik/art&#xff08;6版本后&#xff09; 每个应用都在其自己的进程中运行&#xff0c;都有自己的虚拟机实例。ART通过执行DEX文件可在设…...

【触摸案例-手势解锁案例-按钮高亮 Objective-C语言】

一、我们来说这个self.btns,这个问题啊,为什么不用_btns, 1.我们说,在懒加载里边儿,经常是写下划线啊,_btns,为什么不写,首先啊,这个layoutSubviews:我们第一次,肯定会去执行这个layoutSubviews: 然后呢,去懒加载这个数组, 然后呢,接下来啊,走这一句话, 第一次…...

ChatPPT开启高效办公新时代,AI赋能PPT创作

目录 一、前言二、ChatPPT的几种用法1、通过在线生成2、通过插件生成演讲者模式最终成品遇到问题改进建议 三、ChatPPT其他功能 一、前言 想想以前啊&#xff0c;为了做个PPT&#xff0c;我得去网上找各种模板&#xff0c;有时候还得在某宝上花钱买。结果一做PPT&#xff0c;经…...

【C语言项目】贪吃蛇(上)

个人主页 ~ gitee仓库~ 欢迎大家来到C语言系列的最后一个篇章–贪吃蛇游戏的实现&#xff0c;当我们实现了贪吃蛇之后&#xff0c;我们的C语言就算是登堂入室了&#xff0c;基本会使用了&#xff0c;当然&#xff0c;想要更加熟练地使用还需要多多练习 贪吃蛇 一、目标二、需要…...

LeNet-5上手敲代码

LeNet-5 LeNet-5由Yann LeCun在1998年提出&#xff0c;旨在解决手写数字识别问题&#xff0c;被认为是卷积神经网络的开创性工作之一。该网络是第一个被广泛应用于数字图像识别的神经网络之一&#xff0c;也是深度学习领域的里程碑之一。 LeNet-5的整体架构&#xff1a; 总体…...

javaWeb入门(自用)

1. vue学习 <!DOCTYPE html> <html lang"en"> <head><meta charset"UTF-8"><title>Title</title><script src"https://unpkg.com/vue2"></script> </head> <body><div id"…...

web3风格的网页怎么设计?分享几个,找找感觉。

web3风格的网站是指基于区块链技术和去中心化理念的网站设计风格。这种设计风格强调开放性、透明性和用户自治&#xff0c;体现了Web3的核心价值观。 以下是一些常见的Web3风格网站设计元素&#xff1a; 去中心化标志&#xff1a;在网站的设计中使用去中心化的标志&#xff0…...

ASP.NET MVC(-)表单的提交、获取表单数据

FromCollection 方式...

[AIGC] 《MyBatis-Plus 结合 Spring Boot 的动态数据源介绍及 Demo 演示》

在现代的 Web 应用开发中&#xff0c;Spring Boot 已经成为了一种流行的框架选择。而 MyBatis-Plus 则为 MyBatis 框架提供了更强大的功能和便利。当它们结合使用时&#xff0c;动态数据源的运用变得更加简单和高效。 动态数据源的概念允许我们在运行时根据不同的条件或需求选…...

【华为OD机试C卷D卷】部门人力分配(C++/Java/Python)

【华为OD机试】-(A卷+B卷+C卷+D卷)-2024真题合集目录 【华为OD机试】-(C卷+D卷)-2024最新真题目录 题目描述 部门在进行需求开发时需要进行人力安排。 当前部门需要完成 N 个需求,需求用 requirements 表述,requirements[i] 表示第 i 个需求的工作量大小,单位:人月。 这部…...

毕业设计:《基于 Prometheus 和 ELK 的基础平台监控系统设计与实现》

前言 《基于 Prometheus 和 ELK 的基础平台监控系统设计与实现》&#xff0c;这是我在本科阶段的毕业设计&#xff0c;通过引入 Prometheus 和 ELK 架构实现企业对指标与日志的全方位监控。并且基于云原生&#xff0c;使用容器化持续集成部署的开发方式&#xff0c;通过 Sprin…...

docker私有仓库部署与管理

一、搭建本地公有仓库 1.1 首先下载registry镜像 docker pull registry 1.2 在daemon.json文件中添加私有镜像仓库地址并重新启动docker服务 vim /etc/docker/daemon.json 1.3 运行registry容器 docker run -itd -v /data/registry:/var/lib/registry -p 5000:5000 --restartal…...

2024第六届济南国际大健康产业博会将于5月27日如期开幕

由山东省城市经济学会、山东省科学养生协会主办的第六届中国&#xff08;济南&#xff09;国际大健康产业博览会&#xff0c;将于5月27-29日&#xff0c;在济南黄河国际会展中心盛大举办。 近年来&#xff0c;健康越来越受到大众的重视&#xff0c;在我国经济重要的转型阶段成…...

计算方法实验9:Romberg积分求解速度、位移

任务 输出质点的轨迹 ( x ( t ) , y ( t ) ) , t ∈ { 0.1 , 0.2 , 0.3 , . . . , 10 } (x(t), y(t)), t\in \{0.1, 0.2, 0.3, ..., 10\} (x(t),y(t)),t∈{0.1,0.2,0.3,...,10}&#xff0c;并在二维平面中画出该轨迹.请比较M分别取4, 8, 12, 16, 20 时&#xff0c;Romberg积分达…...

设计模式有哪些基本原则

目录 开闭原则(Open Closed Principle) 里氏替换原则(Liskov Substitution principle) 单一职责原则(Single Responsibility Principle,SRP)...

别再出错了!华为交换机到底如何配置access、trunk、hybird端口?

号主&#xff1a;老杨丨11年资深网络工程师&#xff0c;更多网工提升干货&#xff0c;请关注公众号&#xff1a;网络工程师俱乐部 下午好&#xff0c;我的网工朋友。 我们都知道&#xff0c;网络工程师的工作离不开对交换机的熟练操作。华为交换机的配置&#xff0c;绝对是考验…...

bootstrap模板网/枣庄网络推广seo

前言 InfluxDB 是一个用于存储和分析时间序列数据的开源数据库&#xff0c;内置 HTTP API&#xff0c;类 SQL 语句的支持和无结构的特性对使用者而言都非常友好。它强大的数据吞吐能力以及稳定的性能表现使其非常适合 IoT 领域。 通过 EMQ X 消息引擎&#xff0c;我们可以自定…...

c2c网站模板/汕头网站建设公司

迭代器自始至终&#xff0c;都有一个概念一直在用&#xff0c;但是我们却没来都没有人在的深入剖析它。这个概念就是迭代。迭代的意思有点类似循环&#xff0c;每一次的重复的过程被称为迭代的过程&#xff0c;而每一次迭代得到的结果会被用来作为下一次迭代的初始值。提供迭代…...

合肥做网站优化/互联网营销师是干什么

4.1 Scala数组 数组是最常用的收集一系列元素的结构数组的声明与其他数值类型声明类似定长数组声明 val A new Array [T] (N) val Anew Array [Int] (10)数组声明时若不给出值&#xff0c;会被初始化&#xff0c;初始化后同样能对数组成员进行赋值 数组成员初始化&#xff0c;…...

前后端分离的网站怎么做/种子库

作者&#xff1a; 朱天顺 一、前言 随着时代的发展与科学技术的不断进步&#xff0c;云计算、物联网技术等也得到迅猛的发展和进步&#xff0c;物联网虽起源于传媒领域&#xff0c;然已在各行各业得到深入的应用&#xff1b;在云计算与物联网共同协作下将采集到的数据进行精准…...

一个网站做数据维护3天正常吗/企业网站怎么优化

4.1 实战&#xff1a;在域环境安装和配置企业CA 本节目标&#xff1a; 安装企业CA 为域中的计算机安装pop3和SMTP服务 在域中创建用户zhang 电子邮箱地址为zhangess.com 在域中创建用户wang电子邮箱地址为wangess.com 掌握批量设置域用户电子邮件地址 掌握域用户申请证书…...

万网怎么做网站/槐荫区网络营销seo

现在博客很流行&#xff0c;相信应该上网时间稍微长点的朋友都会在这或者在那的有一个自己的博客。对于一些有一定能力的朋友&#xff0c;可能更喜欢自己去下载一个博客程序来架设一个自己的博客&#xff0c;而不是使用一些博客网站提供的服务。而大部分博客程序所带的搜索功能…...