当前位置: 首页 > news >正文

AIGC笔记--Stable Diffusion源码剖析之UNetModel

1--前言

        以论文《High-Resolution Image Synthesis with Latent Diffusion Models》  开源的项目为例,剖析Stable Diffusion经典组成部分,巩固学习加深印象。

2--UNetModel

一个可以debug的小demo:SD_UNet​​​​​​​

        以文生图为例,剖析UNetModel核心组成模块。

2-1--Forward总揽

提供的文生图Demo中,实际传入的参数只有x、timesteps和context三个,其中:

        x 表示随机初始化的噪声Tensor(shape: [B*2, 4, 64, 64],*2表示使用Classifier-Free Diffusion Guidance)。

        timesteps 表示去噪过程中每一轮传入的timestep(shape: [B*2])。

        context表示经过CLIP编码后对应的文本Prompt(shape: [B*2, 77, 768])。

    def forward(self, x, timesteps=None, context=None, y=None,**kwargs):"""Apply the model to an input batch.:param x: an [N x C x ...] Tensor of inputs.:param timesteps: a 1-D batch of timesteps.:param context: conditioning plugged in via crossattn:param y: an [N] Tensor of labels, if class-conditional.:return: an [N x C x ...] Tensor of outputs."""assert (y is not None) == (self.num_classes is not None), "must specify y if and only if the model is class-conditional"hs = []t_emb = timestep_embedding(timesteps, self.model_channels, repeat_only=False) # Create sinusoidal timestep embeddings.emb = self.time_embed(t_emb) # MLPif self.num_classes is not None:assert y.shape == (x.shape[0],)emb = emb + self.label_emb(y)h = x.type(self.dtype)for module in self.input_blocks:h = module(h, emb, context)hs.append(h)h = self.middle_block(h, emb, context)for module in self.output_blocks:h = th.cat([h, hs.pop()], dim=1)h = module(h, emb, context)h = h.type(x.dtype)if self.predict_codebook_ids:return self.id_predictor(h)else:return self.out(h)

2-2--timestep embedding生成

        使用函数 timestep_embedding() 和 self.time_embed() 对传入的timestep进行位置编码,生成sinusoidal timestep embeddings。

        其中 timestep_embedding() 函数定义如下,而self.time_embed()是一个MLP函数。

def timestep_embedding(timesteps, dim, max_period=10000, repeat_only=False):"""Create sinusoidal timestep embeddings.:param timesteps: a 1-D Tensor of N indices, one per batch element.These may be fractional.:param dim: the dimension of the output.:param max_period: controls the minimum frequency of the embeddings.:return: an [N x dim] Tensor of positional embeddings."""if not repeat_only:half = dim // 2freqs = torch.exp(-math.log(max_period) * torch.arange(start=0, end=half, dtype=torch.float32) / half).to(device=timesteps.device)args = timesteps[:, None].float() * freqs[None]embedding = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)if dim % 2:embedding = torch.cat([embedding, torch.zeros_like(embedding[:, :1])], dim=-1)else:embedding = repeat(timesteps, 'b -> b d', d=dim)return embedding
self.time_embed = nn.Sequential(linear(model_channels, time_embed_dim),nn.SiLU(),linear(time_embed_dim, time_embed_dim),
)

2-3--self.input_blocks下采样

        在 Forward() 中,使用 self.input_blocks 将输入噪声进行分辨率下采样,经过下采样具体维度变化为:[B*2, 4, 64, 64] > [B*2, 1280, 8, 8];

        下采样模块共有12个 module,其组成如下:

ModuleList((0): TimestepEmbedSequential((0): Conv2d(4, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))(1-2): 2 x TimestepEmbedSequential((0): ResBlock((in_layers): Sequential((0): GroupNorm32(32, 320, eps=1e-05, affine=True)(1): SiLU()(2): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))(h_upd): Identity()(x_upd): Identity()(emb_layers): Sequential((0): SiLU()(1): Linear(in_features=1280, out_features=320, bias=True))(out_layers): Sequential((0): GroupNorm32(32, 320, eps=1e-05, affine=True)(1): SiLU()(2): Dropout(p=0, inplace=False)(3): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))(skip_connection): Identity())(1): SpatialTransformer((norm): GroupNorm(32, 320, eps=1e-06, affine=True)(proj_in): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))(transformer_blocks): ModuleList((0): BasicTransformerBlock((attn1): CrossAttention((to_q): Linear(in_features=320, out_features=320, bias=False)(to_k): Linear(in_features=320, out_features=320, bias=False)(to_v): Linear(in_features=320, out_features=320, bias=False)(to_out): Sequential((0): Linear(in_features=320, out_features=320, bias=True)(1): Dropout(p=0.0, inplace=False)))(ff): FeedForward((net): Sequential((0): GEGLU((proj): Linear(in_features=320, out_features=2560, bias=True))(1): Dropout(p=0.0, inplace=False)(2): Linear(in_features=1280, out_features=320, bias=True)))(attn2): CrossAttention((to_q): Linear(in_features=320, out_features=320, bias=False)(to_k): Linear(in_features=768, out_features=320, bias=False)(to_v): Linear(in_features=768, out_features=320, bias=False)(to_out): Sequential((0): Linear(in_features=320, out_features=320, bias=True)(1): Dropout(p=0.0, inplace=False)))(norm1): LayerNorm((320,), eps=1e-05, elementwise_affine=True)(norm2): LayerNorm((320,), eps=1e-05, elementwise_affine=True)(norm3): LayerNorm((320,), eps=1e-05, elementwise_affine=True)))(proj_out): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))))(3): TimestepEmbedSequential((0): Downsample((op): Conv2d(320, 320, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))))(4): TimestepEmbedSequential((0): ResBlock((in_layers): Sequential((0): GroupNorm32(32, 320, eps=1e-05, affine=True)(1): SiLU()(2): Conv2d(320, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))(h_upd): Identity()(x_upd): Identity()(emb_layers): Sequential((0): SiLU()(1): Linear(in_features=1280, out_features=640, bias=True))(out_layers): Sequential((0): GroupNorm32(32, 640, eps=1e-05, affine=True)(1): SiLU()(2): Dropout(p=0, inplace=False)(3): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))(skip_connection): Conv2d(320, 640, kernel_size=(1, 1), stride=(1, 1)))(1): SpatialTransformer((norm): GroupNorm(32, 640, eps=1e-06, affine=True)(proj_in): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))(transformer_blocks): ModuleList((0): BasicTransformerBlock((attn1): CrossAttention((to_q): Linear(in_features=640, out_features=640, bias=False)(to_k): Linear(in_features=640, out_features=640, bias=False)(to_v): Linear(in_features=640, out_features=640, bias=False)(to_out): Sequential((0): Linear(in_features=640, out_features=640, bias=True)(1): Dropout(p=0.0, inplace=False)))(ff): FeedForward((net): Sequential((0): GEGLU((proj): Linear(in_features=640, out_features=5120, bias=True))(1): Dropout(p=0.0, inplace=False)(2): Linear(in_features=2560, out_features=640, bias=True)))(attn2): CrossAttention((to_q): Linear(in_features=640, out_features=640, bias=False)(to_k): Linear(in_features=768, out_features=640, bias=False)(to_v): Linear(in_features=768, out_features=640, bias=False)(to_out): Sequential((0): Linear(in_features=640, out_features=640, bias=True)(1): Dropout(p=0.0, inplace=False)))(norm1): LayerNorm((640,), eps=1e-05, elementwise_affine=True)(norm2): LayerNorm((640,), eps=1e-05, elementwise_affine=True)(norm3): LayerNorm((640,), eps=1e-05, elementwise_affine=True)))(proj_out): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))))(5): TimestepEmbedSequential((0): ResBlock((in_layers): Sequential((0): GroupNorm32(32, 640, eps=1e-05, affine=True)(1): SiLU()(2): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))(h_upd): Identity()(x_upd): Identity()(emb_layers): Sequential((0): SiLU()(1): Linear(in_features=1280, out_features=640, bias=True))(out_layers): Sequential((0): GroupNorm32(32, 640, eps=1e-05, affine=True)(1): SiLU()(2): Dropout(p=0, inplace=False)(3): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))(skip_connection): Identity())(1): SpatialTransformer((norm): GroupNorm(32, 640, eps=1e-06, affine=True)(proj_in): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))(transformer_blocks): ModuleList((0): BasicTransformerBlock((attn1): CrossAttention((to_q): Linear(in_features=640, out_features=640, bias=False)(to_k): Linear(in_features=640, out_features=640, bias=False)(to_v): Linear(in_features=640, out_features=640, bias=False)(to_out): Sequential((0): Linear(in_features=640, out_features=640, bias=True)(1): Dropout(p=0.0, inplace=False)))(ff): FeedForward((net): Sequential((0): GEGLU((proj): Linear(in_features=640, out_features=5120, bias=True))(1): Dropout(p=0.0, inplace=False)(2): Linear(in_features=2560, out_features=640, bias=True)))(attn2): CrossAttention((to_q): Linear(in_features=640, out_features=640, bias=False)(to_k): Linear(in_features=768, out_features=640, bias=False)(to_v): Linear(in_features=768, out_features=640, bias=False)(to_out): Sequential((0): Linear(in_features=640, out_features=640, bias=True)(1): Dropout(p=0.0, inplace=False)))(norm1): LayerNorm((640,), eps=1e-05, elementwise_affine=True)(norm2): LayerNorm((640,), eps=1e-05, elementwise_affine=True)(norm3): LayerNorm((640,), eps=1e-05, elementwise_affine=True)))(proj_out): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))))(6): TimestepEmbedSequential((0): Downsample((op): Conv2d(640, 640, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))))(7): TimestepEmbedSequential((0): ResBlock((in_layers): Sequential((0): GroupNorm32(32, 640, eps=1e-05, affine=True)(1): SiLU()(2): Conv2d(640, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))(h_upd): Identity()(x_upd): Identity()(emb_layers): Sequential((0): SiLU()(1): Linear(in_features=1280, out_features=1280, bias=True))(out_layers): Sequential((0): GroupNorm32(32, 1280, eps=1e-05, affine=True)(1): SiLU()(2): Dropout(p=0, inplace=False)(3): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))(skip_connection): Conv2d(640, 1280, kernel_size=(1, 1), stride=(1, 1)))(1): SpatialTransformer((norm): GroupNorm(32, 1280, eps=1e-06, affine=True)(proj_in): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))(transformer_blocks): ModuleList((0): BasicTransformerBlock((attn1): CrossAttention((to_q): Linear(in_features=1280, out_features=1280, bias=False)(to_k): Linear(in_features=1280, out_features=1280, bias=False)(to_v): Linear(in_features=1280, out_features=1280, bias=False)(to_out): Sequential((0): Linear(in_features=1280, out_features=1280, bias=True)(1): Dropout(p=0.0, inplace=False)))(ff): FeedForward((net): Sequential((0): GEGLU((proj): Linear(in_features=1280, out_features=10240, bias=True))(1): Dropout(p=0.0, inplace=False)(2): Linear(in_features=5120, out_features=1280, bias=True)))(attn2): CrossAttention((to_q): Linear(in_features=1280, out_features=1280, bias=False)(to_k): Linear(in_features=768, out_features=1280, bias=False)(to_v): Linear(in_features=768, out_features=1280, bias=False)(to_out): Sequential((0): Linear(in_features=1280, out_features=1280, bias=True)(1): Dropout(p=0.0, inplace=False)))(norm1): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)(norm2): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)(norm3): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)))(proj_out): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))))(8): TimestepEmbedSequential((0): ResBlock((in_layers): Sequential((0): GroupNorm32(32, 1280, eps=1e-05, affine=True)(1): SiLU()(2): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))(h_upd): Identity()(x_upd): Identity()(emb_layers): Sequential((0): SiLU()(1): Linear(in_features=1280, out_features=1280, bias=True))(out_layers): Sequential((0): GroupNorm32(32, 1280, eps=1e-05, affine=True)(1): SiLU()(2): Dropout(p=0, inplace=False)(3): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))(skip_connection): Identity())(1): SpatialTransformer((norm): GroupNorm(32, 1280, eps=1e-06, affine=True)(proj_in): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))(transformer_blocks): ModuleList((0): BasicTransformerBlock((attn1): CrossAttention((to_q): Linear(in_features=1280, out_features=1280, bias=False)(to_k): Linear(in_features=1280, out_features=1280, bias=False)(to_v): Linear(in_features=1280, out_features=1280, bias=False)(to_out): Sequential((0): Linear(in_features=1280, out_features=1280, bias=True)(1): Dropout(p=0.0, inplace=False)))(ff): FeedForward((net): Sequential((0): GEGLU((proj): Linear(in_features=1280, out_features=10240, bias=True))(1): Dropout(p=0.0, inplace=False)(2): Linear(in_features=5120, out_features=1280, bias=True)))(attn2): CrossAttention((to_q): Linear(in_features=1280, out_features=1280, bias=False)(to_k): Linear(in_features=768, out_features=1280, bias=False)(to_v): Linear(in_features=768, out_features=1280, bias=False)(to_out): Sequential((0): Linear(in_features=1280, out_features=1280, bias=True)(1): Dropout(p=0.0, inplace=False)))(norm1): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)(norm2): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)(norm3): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)))(proj_out): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))))(9): TimestepEmbedSequential((0): Downsample((op): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))))(10-11): 2 x TimestepEmbedSequential((0): ResBlock((in_layers): Sequential((0): GroupNorm32(32, 1280, eps=1e-05, affine=True)(1): SiLU()(2): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))(h_upd): Identity()(x_upd): Identity()(emb_layers): Sequential((0): SiLU()(1): Linear(in_features=1280, out_features=1280, bias=True))(out_layers): Sequential((0): GroupNorm32(32, 1280, eps=1e-05, affine=True)(1): SiLU()(2): Dropout(p=0, inplace=False)(3): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))(skip_connection): Identity()))
)

        12个 module 都使用了 TimestepEmbedSequential 类进行封装,根据不同的网络层,将输入噪声x与timestep embedding和prompt context进行运算。

class TimestepEmbedSequential(nn.Sequential, TimestepBlock):"""A sequential module that passes timestep embeddings to the children thatsupport it as an extra input."""def forward(self, x, emb, context=None):for layer in self:if isinstance(layer, TimestepBlock):x = layer(x, emb)elif isinstance(layer, SpatialTransformer):x = layer(x, context)else:x = layer(x)return x

2-3-1--Module0

 Module 0 是一个2D卷积层,主要对输入噪声进行特征提取;

# init 初始化
self.input_blocks = nn.ModuleList([TimestepEmbedSequential(conv_nd(dims, in_channels, model_channels, 3, padding=1))]
)# 打印 self.input_blocks[0]
TimestepEmbedSequential((0): Conv2d(4, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)

2-3-2--Module1和Module2

        Module1和Module2的结构相同,都由一个ResBlock和一个SpatialTransformer组成;

# init 初始化
for _ in range(num_res_blocks):layers = [ResBlock(ch,time_embed_dim,dropout,out_channels=mult * model_channels,dims=dims,use_checkpoint=use_checkpoint,use_scale_shift_norm=use_scale_shift_norm,)]ch = mult * model_channelsif ds in attention_resolutions:if num_head_channels == -1:dim_head = ch // num_headselse:num_heads = ch // num_head_channelsdim_head = num_head_channelsif legacy:#num_heads = 1dim_head = ch // num_heads if use_spatial_transformer else num_head_channelslayers.append(AttentionBlock(ch,use_checkpoint=use_checkpoint,num_heads=num_heads,num_head_channels=dim_head,use_new_attention_order=use_new_attention_order,) if not use_spatial_transformer else SpatialTransformer(ch, num_heads, dim_head, depth=transformer_depth, context_dim=context_dim))self.input_blocks.append(TimestepEmbedSequential(*layers))self._feature_size += chinput_block_chans.append(ch)# 打印 self.input_blocks[1]
TimestepEmbedSequential((0): ResBlock((in_layers): Sequential((0): GroupNorm32(32, 320, eps=1e-05, affine=True)(1): SiLU()(2): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))(h_upd): Identity()(x_upd): Identity()(emb_layers): Sequential((0): SiLU()(1): Linear(in_features=1280, out_features=320, bias=True))(out_layers): Sequential((0): GroupNorm32(32, 320, eps=1e-05, affine=True)(1): SiLU()(2): Dropout(p=0, inplace=False)(3): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))(skip_connection): Identity())(1): SpatialTransformer((norm): GroupNorm(32, 320, eps=1e-06, affine=True)(proj_in): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))(transformer_blocks): ModuleList((0): BasicTransformerBlock((attn1): CrossAttention((to_q): Linear(in_features=320, out_features=320, bias=False)(to_k): Linear(in_features=320, out_features=320, bias=False)(to_v): Linear(in_features=320, out_features=320, bias=False)(to_out): Sequential((0): Linear(in_features=320, out_features=320, bias=True)(1): Dropout(p=0.0, inplace=False)))(ff): FeedForward((net): Sequential((0): GEGLU((proj): Linear(in_features=320, out_features=2560, bias=True))(1): Dropout(p=0.0, inplace=False)(2): Linear(in_features=1280, out_features=320, bias=True)))(attn2): CrossAttention((to_q): Linear(in_features=320, out_features=320, bias=False)(to_k): Linear(in_features=768, out_features=320, bias=False)(to_v): Linear(in_features=768, out_features=320, bias=False)(to_out): Sequential((0): Linear(in_features=320, out_features=320, bias=True)(1): Dropout(p=0.0, inplace=False)))(norm1): LayerNorm((320,), eps=1e-05, elementwise_affine=True)(norm2): LayerNorm((320,), eps=1e-05, elementwise_affine=True)(norm3): LayerNorm((320,), eps=1e-05, elementwise_affine=True)))(proj_out): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1)))
)# 打印 self.input_blocks[2]
TimestepEmbedSequential((0): ResBlock((in_layers): Sequential((0): GroupNorm32(32, 320, eps=1e-05, affine=True)(1): SiLU()(2): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))(h_upd): Identity()(x_upd): Identity()(emb_layers): Sequential((0): SiLU()(1): Linear(in_features=1280, out_features=320, bias=True))(out_layers): Sequential((0): GroupNorm32(32, 320, eps=1e-05, affine=True)(1): SiLU()(2): Dropout(p=0, inplace=False)(3): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))(skip_connection): Identity())(1): SpatialTransformer((norm): GroupNorm(32, 320, eps=1e-06, affine=True)(proj_in): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))(transformer_blocks): ModuleList((0): BasicTransformerBlock((attn1): CrossAttention((to_q): Linear(in_features=320, out_features=320, bias=False)(to_k): Linear(in_features=320, out_features=320, bias=False)(to_v): Linear(in_features=320, out_features=320, bias=False)(to_out): Sequential((0): Linear(in_features=320, out_features=320, bias=True)(1): Dropout(p=0.0, inplace=False)))(ff): FeedForward((net): Sequential((0): GEGLU((proj): Linear(in_features=320, out_features=2560, bias=True))(1): Dropout(p=0.0, inplace=False)(2): Linear(in_features=1280, out_features=320, bias=True)))(attn2): CrossAttention((to_q): Linear(in_features=320, out_features=320, bias=False)(to_k): Linear(in_features=768, out_features=320, bias=False)(to_v): Linear(in_features=768, out_features=320, bias=False)(to_out): Sequential((0): Linear(in_features=320, out_features=320, bias=True)(1): Dropout(p=0.0, inplace=False)))(norm1): LayerNorm((320,), eps=1e-05, elementwise_affine=True)(norm2): LayerNorm((320,), eps=1e-05, elementwise_affine=True)(norm3): LayerNorm((320,), eps=1e-05, elementwise_affine=True)))(proj_out): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1)))
)

2-3-3--Module3

        Module3是一个下采样2D卷积层。

# init 初始化
if level != len(channel_mult) - 1:out_ch = chself.input_blocks.append(TimestepEmbedSequential(ResBlock(ch,time_embed_dim,dropout,out_channels=out_ch,dims=dims,use_checkpoint=use_checkpoint,use_scale_shift_norm=use_scale_shift_norm,down=True,)if resblock_updownelse Downsample(ch, conv_resample, dims=dims, out_channels=out_ch)))# 打印 self.input_blocks[3]
TimestepEmbedSequential((0): Downsample((op): Conv2d(320, 320, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)))
)

2-3-4--Module4、Module5、Module7和Module8

        与Module1和Module2的结构相同,都由一个ResBlock和一个SpatialTransformer组成,只有特征维度上的区别;

2-3-4--Module6和Module9

        与Module3的结构相同,是一个下采样2D卷积层。

2-3--5--Module10和Module11

        Module10和Module12的结构相同,只由一个ResBlock组成。

# 打印 self.input_blocks[10]
TimestepEmbedSequential((0): ResBlock((in_layers): Sequential((0): GroupNorm32(32, 1280, eps=1e-05, affine=True)(1): SiLU()(2): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))(h_upd): Identity()(x_upd): Identity()(emb_layers): Sequential((0): SiLU()(1): Linear(in_features=1280, out_features=1280, bias=True))(out_layers): Sequential((0): GroupNorm32(32, 1280, eps=1e-05, affine=True)(1): SiLU()(2): Dropout(p=0, inplace=False)(3): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))(skip_connection): Identity())
)# 打印 self.input_blocks[11]
TimestepEmbedSequential((0): ResBlock((in_layers): Sequential((0): GroupNorm32(32, 1280, eps=1e-05, affine=True)(1): SiLU()(2): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))(h_upd): Identity()(x_upd): Identity()(emb_layers): Sequential((0): SiLU()(1): Linear(in_features=1280, out_features=1280, bias=True))(out_layers): Sequential((0): GroupNorm32(32, 1280, eps=1e-05, affine=True)(1): SiLU()(2): Dropout(p=0, inplace=False)(3): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))(skip_connection): Identity())
)

相关文章:

AIGC笔记--Stable Diffusion源码剖析之UNetModel

1--前言 以论文《High-Resolution Image Synthesis with Latent Diffusion Models》 开源的项目为例,剖析Stable Diffusion经典组成部分,巩固学习加深印象。 2--UNetModel 一个可以debug的小demo:SD_UNet​​​​​​​ 以文生图为例&#…...

Linux文件系统与日志分析

目录 inode block 链接 文件修复 实验步骤 针对ext文件系统恢复删除文件 针对xfs文件系统恢复删除文件 日志 日志级别 rsyslogd服务 日志目录 messages日志文件(系统日志) 集中管理日志 - 实验 1.环境配置 1.1 1.2 1.3 1.4 1.5 2.远…...

【SkyWalking】使用PostgreSQL做存储K8s部署

拉取镜像 docker pull apache/skywalking-ui:10.0.1 docker tag apache/skywalking-ui:10.0.1 xxx/xxx/skywalking-ui:10.0.1 docker push xxx/xxx/skywalking-ui:10.0.1docker pull apache/skywalking-oap-server:10.0.1 docker tag apache/skywalking-oap-server:10.0.1 xxx…...

详解大模型微调数据集构建方法(持续更新)

大家好,我是herosunly。985院校硕士毕业,现担任算法t研究员一职,热衷于机器学习算法研究与应用。曾获得阿里云天池比赛第一名,CCF比赛第二名,科大讯飞比赛第三名。拥有多项发明专利。对机器学习和深度学习拥有自己独到的见解。曾经辅导过若干个非计算机专业的学生进入到算…...

自制植物大战僵尸:HTML5与JavaScript实现的简单游戏

引言 在本文中,我们将一起探索如何使用HTML5和JavaScript来创建一个简单的植物大战僵尸游戏。这不仅是一项有趣的编程挑战,也是学习游戏开发基础的绝佳机会。 什么是植物大战僵尸? 植物大战僵尸是一款流行的策略塔防游戏,玩家需…...

Istio_1.17.8安装

项目背景 按照istio官网的命令一路安装下来,安装好的istio版本为目前的最新版本,1.22.0。而我的k8s集群的版本并不支持istio_1.22的版本,导致ingress-gate网关安装不上,再仔细查看istio的发布文档,如果用istio_1.22版本…...

[数据集][目标检测]室内积水检测数据集VOC+YOLO格式761张1类别

数据集格式:Pascal VOC格式YOLO格式(不包含分割路径的txt文件,仅仅包含jpg图片以及对应的VOC格式xml文件和yolo格式txt文件) 图片数量(jpg文件个数):761 标注数量(xml文件个数):761 标注数量(txt文件个数):761 标注类别…...

17_Vue高级监听器生命周期Vue组件组件通信

文章目录 1. 数据监听器watch2. Vue生命周期3. Vue组件4. Vue组件通信Appendix 1. 数据监听器watch 首先watch需要单独引 import {watch} from vuewatch函数监听ref响应式数据 watch(监听的内容,监听行为)监听行为默认为(newValue,oldValue) let firstname ref…...

【ROS使用记录】—— ros使用过程中的rosbag录制播放和ros话题信息相关的指令与操作记录

提示:文章写完后,目录可以自动生成,如何生成可参考右边的帮助文档 文章目录 前言一、rosbag的介绍二、rosbag的在线和离线录制三、rosbag的播放相关的指令四、其他rosbag和ros话题相关的指令总结 前言 rosbag是ROS(机器人操作系统…...

Laravel 富文本内容

Laravel 获取富文本的纯文本内容-CSDN博客 Laravel 富文本内容里面的图片添加前缀URL-CSDN博客 Laravel 富文本图片的style样式删除-CSDN博客. Laravel 获取富文本中的所有图片-CSDN博客 富文本字体font-famly删除 $data preg_replace(/(<[^>])style["\][^"…...

Spark Python环境搭建与优化:深入剖析四个方面、五个方面、六个方面及七个关键要点

Spark Python环境搭建与优化&#xff1a;深入剖析四个方面、五个方面、六个方面及七个关键要点 在大数据处理领域&#xff0c;Apache Spark凭借其出色的性能和灵活性备受瞩目。而要在Python中利用Spark的强大功能&#xff0c;首先需要搭建一个稳定且高效的Spark Python环境。本…...

【微信小程序开发】小程序中的上滑加载更多,下拉刷新是如何实现的?

✨✨ 欢迎大家来到景天科技苑✨✨ &#x1f388;&#x1f388; 养成好习惯&#xff0c;先赞后看哦~&#x1f388;&#x1f388; &#x1f3c6; 作者简介&#xff1a;景天科技苑 &#x1f3c6;《头衔》&#xff1a;大厂架构师&#xff0c;华为云开发者社区专家博主&#xff0c;…...

从 Android 恢复已删除的备份录

本文介绍了几种在 Android 上恢复丢失和删除的短信的方法。这些方法都不能保证一定成功&#xff0c;但您可能能够恢复一些短信或其中存储的文件。 首先要尝试什么 首先&#xff0c;尝试保留数据。如果你刚刚删除了信息&#xff0c;请立即将手机置于飞行模式&#xff0c;方法是…...

如何使用Python中的random模块生成随机数

在Python中&#xff0c;random模块提供了多种用于生成随机数的函数。以下是一些基本示例&#xff1a; 生成随机整数&#xff1a; 使用random.randint(a, b)函数生成一个介于a和b之间的随机整数&#xff08;包括a和b&#xff09;。 python复制代码 import random random_int …...

AI大数据处理与分析实战--体育问卷分析

AI大数据处理与分析实战–体育问卷分析 前言&#xff1a;前一段时间接了一个需求&#xff0c;使用AI进行数据分析与处理&#xff0c;遂整理了一下大致过程和大致简要结果&#xff08;更详细就不方便放了&#xff09;。 文章目录 AI大数据处理与分析实战--体育问卷分析一、数据…...

C++第二十五弹---从零开始模拟STL中的list(下)

✨个人主页&#xff1a; 熬夜学编程的小林 &#x1f497;系列专栏&#xff1a; 【C语言详解】 【数据结构详解】【C详解】 目录 1、函数补充 2、迭代器完善 3、const迭代器 总结 1、函数补充 拷贝构造 思路&#xff1a; 先构造一个头结点&#xff0c;然后将 lt 类中的元…...

STM32/keil把多个c文件编译为静态库lib

把常用的、不经常修改的代码库编译成lib以后&#xff0c;可以加快整个工程的编译速度。 一个常见的应用场景就是&#xff0c;把ST的标准库或HAL库等编译成lib&#xff0c;这样以后再编译整个工程时&#xff0c;就无需再次编译他们了&#xff0c;可以节省编译时间。当然&#x…...

L45---506.相对名次(java)--排序

1.题目描述 2.知识点 &#xff08;1&#xff09;String.join(" ", words) 是 Java 中的一个语法&#xff0c;用于将数组或集合中的元素连接成一个单独的字符串&#xff0c;连接时使用指定的分隔符。这里的 " " 是作为分隔符使用的一个空格字符串。 Strin…...

跨网段路由

跨网段路由通常是指在网络中配置路由&#xff0c;以允许不同子网之间的通信。要设置跨网段的永久路由&#xff0c;取决于你是在操作路由器、交换机这样的网络设备&#xff0c;还是在配置个人计算机&#xff08;如Windows或Linux系统&#xff09;。下面是两种常见情况下的简要指…...

HO-3D 数据集

// 由于非刚体的追踪比较困难&#xff0c;所以看看刚体数据集 HOnnotate: A method for 3D Annotation of Hand and Object Poses // cvpr20https://arxiv.org/abs/1907.01481 https://github.com/shreyashampali/ho3d https://paperswithcode.com/paper/ho-3d-a-mult…...

Elasticsearch 认证模拟题 - 8

一、题目 在集群中输入以下指令&#xff1a; PUT phones/_doc/1 {"brand":"Samsumg","model":"Galaxy S9","features":[{"type":"os", "value":"Android"},{"type":&q…...

【Postman接口测试】第四节.Postman接口测试项目实战(中)

文章目录 前言五、Postman断言 5.1 Postman断言介绍 5.2 响应状态码断言 5.3 包含指定字符串断言 5.4 JSON数据断言六、参数化 5.1 Postman参数化介绍 5.2 Postman参数化实现 5.3 针对项目登录接口参数化实现 总结 前言 五、Postman断言 5.1 Postman断言介…...

Hadoop的Windows环境准备

一、将Hadoop传输到Windows中 1、备份副本 cp -r /opt/softs/hadoop3.1.3/ /opt/softs/hadoop3.1.3_temp 2、删除备份的share目录 cd /opt/softs/hadoop3.1.3_temp rm -rf share/ 3、下载到Windows中 重命名去掉_temp 4、删除备份文件 rm -rf /opt/softs/hadoop3.1.3_t…...

使用亮数据代理IP爬取PubMed文章链接和邮箱地址

&#x1f482; 个人网站:【 摸鱼游戏】【神级代码资源网站】【工具大全】&#x1f91f; 一站式轻松构建小程序、Web网站、移动应用&#xff1a;&#x1f449;注册地址&#x1f91f; 基于Web端打造的&#xff1a;&#x1f449;轻量化工具创作平台&#x1f485; 想寻找共同学习交…...

electron调用dll时应用程序闪退

electron调用dll时&#xff0c;直接闪退&#xff0c;且用如下方式监听无任何输出&#xff1a; window-all-closed 或 will-quit 此时需要检查传给dll的参数及参数类型是否正确&#xff0c;特别是使用ffi-napi时调用dll&#xff0c;使用 ref-napi定义类型&#xff0c;经常容易…...

单片机原理及技术(三)—— AT89S51单片机(二)(C51编程)

一、AT89S51单片机的并行I/O端口 1.1 P0口 AT89S51的P0口是一个通用的I/O口&#xff0c;可以用于输入和输出。每个引脚都可以通过软件控制为输入或输出模式。 1.1.1 P0口的工作原理 P0口的工作原理是通过对P0寄存器的读写操作来控制P0口的引脚。 输出模式&#xff1a;当P0口…...

摄影店展示服务预约小程序的作用是什么

摄影店包含婚照、毕业照、写真、儿童照、工作照等多个服务项目&#xff0c;虽然如今人们手机打开便可随时拍照摄影&#xff0c;但在专业程度和场景应用方面&#xff0c;却是需要前往专业门店服务获取。 除了进店&#xff0c;也有外部预约及活动、同行合作等场景&#xff0c;重…...

【数据分析基础】实验一 Python运算符、内置函数、序列基本用法

一、实验目的 熟练运用Python运算符。熟练运用Python内置函数。掌握Python的基本输入输出方法。了解lambda表达式作为函数参数的用法。掌握列表、元组、字典、集合的概念和基本用法。了解Python函数式编程模式。 二、实验内容&#xff1a; 1. 在命令模式测试如下命令&#x…...

【Redis】构建强韧的远程Redis连接与端口保障机制完美指南

【Redis】构建强韧的远程Redis连接与端口保障机制完美指南 大家好 我是寸铁&#x1f44a; 总结了【Redis】构建强韧的远程Redis连接与端口保障机制完美指南✨ 喜欢的小伙伴可以点点关注 &#x1f49d; 前言 在当今的软件开发领域中&#xff0c;远程访问和操作数据存储是极为常见…...

Flowable项目启动报错#java.time.LocalDateTime cannot be cast to java.lang.String

Flowable 项目启动后报错 flow项目第一次启动创建表成功&#xff0c;但是第二次启动时报错信息如下&#xff1a; 1、Error creating bean with name ‘appRepositoryServiceBean’ defined in class 2、Error creating bean with name ‘flowableAppEngine’: FactoryBean t…...