Title: Semi-Supervised Medical Image Segmentation Using Adversarial Consistency Learning and Dynamic Convolution Network

01 Introduction




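Medical image segmentation plays an important role in computer-aided diagnosis and treatment research because it can extract important organs or lesions from abnormal images. In recent years, many supervised encoder-decoder networks, such as U-Net [1], U-Net++, and H-DenseUNet, have achieved remarkable results in medical image segmentation. However, their success depends heavily on large amounts of pixel-level labeled data, and annotating medical images is usually very expensive in practice. One reason is that medical images often have poor visual quality because of low contrast and noise. In addition, annotating medical images requires more expert knowledge than annotating natural images. It is therefore almost impossible to build large medical image datasets with high-precision labels.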


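Mainstream semi-supervised medical image segmentation methods can be roughly grouped into consistency learning, adversarial learning, self-training, contrastive learning, and co-training. This paper focuses on consistency learning and adversarial learning. Consistency learning typically trains a network with consistency regularization under different perturbations. One of the most representative methods is the self-ensembling Mean Teacher (MT), which applies a perturbation-based consistency loss between a self-ensembled teacher model and a student model on unlabeled data, combined with a supervised loss on labeled data. Building on MT, later methods focus on choosing different data perturbations and feature perturbations to gain performance. More precisely, the quality of the consistent pseudo-labels that the segmentation network generates determines how well the network can mine knowledge from unlabeled data.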
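The Mean Teacher objective described above can be sketched roughly as follows. This is a minimal illustration for a single pixel with per-class logits, not the paper's implementation: the function names, the use of mean squared error for the consistency term, and the weight `lam=0.1` are all assumptions made for this example.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of class logits.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    # Supervised loss for one labeled pixel.
    return -math.log(max(probs[label], 1e-12))

def consistency_mse(student_probs, teacher_probs):
    # Mean squared difference between student and teacher predictions.
    diffs = [(s - t) ** 2 for s, t in zip(student_probs, teacher_probs)]
    return sum(diffs) / len(diffs)

def mean_teacher_loss(labeled, unlabeled, lam=0.1):
    """Combine a supervised loss with a consistency loss.

    labeled:   list of (student_logits, label) pairs
    unlabeled: list of (student_logits, teacher_logits) pairs, where the two
               sets of logits come from differently perturbed inputs
    lam:       hypothetical consistency weight (an assumption, not from the paper)
    """
    sup = sum(cross_entropy(softmax(s), y) for s, y in labeled) / len(labeled)
    con = sum(consistency_mse(softmax(s), softmax(t))
              for s, t in unlabeled) / len(unlabeled)
    return sup + lam * con, sup, con

# One labeled pixel and one unlabeled pixel on which student and teacher disagree.
total, sup, con = mean_teacher_loss(
    labeled=[([2.0, 0.5], 0)],
    unlabeled=[([0.0, 0.0], [3.0, -3.0])],
)
```

The consistency term is zero only when the student and teacher agree on the unlabeled sample, which is why the quality of the teacher's predictions matters so much for this family of methods.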




Popular semi-supervised medical image segmentation networks often suffer from error supervision from unlabeled data since they usually use consistency learning under different data perturbations to regularize model training. These networks ignore the relationship between labeled and unlabeled data, and only compute single pixel-level consistency leading to uncertain prediction results. Besides, these networks often require a large number of parameters since their backbone networks are designed depending on supervised image segmentation tasks. Moreover, these networks often face a high over-fitting risk since a small number of training samples are popular for semi-supervised image segmentation. To address the above problems, in this paper, we propose a novel adversarial self-ensembling network using dynamic convolution (ASE-Net) for semi-supervised medical image segmentation. First, we use an adversarial consistency training strategy (ACTS) that employs two discriminators based on consistency learning to obtain prior relationships between labeled and unlabeled data. The ACTS can simultaneously compute pixel-level and image-level consistency of unlabeled data under different data perturbations to improve the prediction quality of labels. Second, we design a dynamic convolution-based bidirectional attention component (DyBAC) that can be embedded in any segmentation network, aiming at adaptively adjusting the weights of ASE-Net based on the structural information of input samples. This component effectively improves the feature representation ability of ASE-Net and reduces the overfitting risk of the network. The proposed ASE-Net has been extensively tested on three publicly available datasets, and experiments indicate that ASE-Net is superior to state-of-the-art networks, and reduces computational costs and memory overhead.





In this paper, we propose an adversarial self-ensembling network (ASE-Net) for semi-supervised medical image segmentation. As shown in Fig. 1, our ASE-Net consists of segmentation networks and discriminator networks. The segmentation networks consist of a student model and a teacher model. The student model has the same structure as the teacher model and both of them are based on the encoder-decoder structure; the difference is that the former is trained by the loss function while the latter is the exponential moving average (EMA) of the student model weights. The discriminator networks consist of convolutional layers, the proposed DyBAC, and global average pooling; their specific structure is shown in Fig. 1.
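The EMA relationship between teacher and student can be sketched as below. This is only an illustration: weights are represented as a plain dictionary of floats, and the decay value `alpha=0.99` is an assumed placeholder rather than a setting reported in the text above.

```python
def ema_update(teacher, student, alpha=0.99):
    """Return updated teacher weights as an exponential moving average.

    teacher, student: dicts mapping a parameter name to a float value
    alpha: EMA decay (hypothetical default, chosen only for illustration)
    """
    return {
        name: alpha * teacher[name] + (1.0 - alpha) * student[name]
        for name in teacher
    }

# The student is updated by gradient descent elsewhere; the teacher only
# tracks a smoothed copy of the student's weights and receives no gradients.
teacher = {"conv1.weight": 0.0}
student = {"conv1.weight": 1.0}
for _ in range(3):
    teacher = ema_update(teacher, student, alpha=0.9)
```

With a decay close to 1 the teacher changes slowly, so its predictions are a temporally smoothed version of the student's.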




In this work, we have proposed ASE-Net for semi-supervised medical image segmentation. First, the proposed ACTS effectively combines adversarial learning and consistency learning, using adversarial training to maximize consistency learning. This allows the network to quickly learn the prior relationship between unlabeled and labeled data, and further mines the potential knowledge existing in unlabeled data. Then, our proposed DyBAC adaptively adjusts the parameter values of convolutional kernels according to input samples, which not only effectively avoids network overfitting and improves the feature representation ability of the network but also reduces the memory overhead. Experiments on three publicly available benchmark datasets demonstrate that our proposed ASE-Net outperforms state-of-the-art methods and provides an effective solution for semi-supervised medical image segmentation, significantly reducing network overfitting risk and uncertainty prediction in consistency learning.



Fig. 1. The framework of the proposed ASE-Net. The ASE-Net consists of two main parts: the segmentation networks (left) and the discriminator networks (right). The segmentation network is based on the encoder-decoder architecture. The right figure shows the detailed structure of the discriminative network, where k, s, and p represent the kernel size, the stride, and the padding of convolutional kernels, respectively. The discriminators are unnecessary in the inference stage.


Fig. 2. The structure of DyBAC. (a) Spatial attention, (b) Dynamic convolution. The dynamic convolutional kernels are generated mainly based on the channel and spatial information of samples. For different input samples, the values of convolution kernel parameters change adaptively.
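To make the idea of input-dependent kernels concrete, here is a toy 1-D sketch of a generic dynamic convolution: a few candidate kernels are mixed with attention weights computed from the input itself. It is not the DyBAC design from the paper. The gate below uses only a globally pooled statistic, whereas the caption says DyBAC draws on both channel and spatial information, and `candidate_kernels`, `gate_weights`, and the linear gate are hypothetical names and choices.

```python
import math

def softmax(scores):
    # Numerically stable softmax.
    m = max(scores)
    exps = [math.exp(v - m) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_kernel(signal, candidate_kernels, gate_weights):
    """Build one kernel per input by mixing candidate kernels.

    The attention comes from a global average of the input passed through a
    hypothetical linear gate (one scalar weight per candidate kernel).
    """
    pooled = sum(signal) / len(signal)           # global average pooling
    attn = softmax([g * pooled for g in gate_weights])
    size = len(candidate_kernels[0])
    return [sum(a * k[i] for a, k in zip(attn, candidate_kernels))
            for i in range(size)]

def conv1d_valid(signal, kernel):
    # Cross-correlation without padding, as "convolution" is used in deep learning.
    n = len(signal) - len(kernel) + 1
    return [sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
            for i in range(n)]

candidates = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
gates = [1.0, -1.0]

# Two different inputs yield two different effective kernels.
kernel_a = dynamic_kernel([1.0, -1.0, 1.0, -1.0], candidates, gates)
kernel_b = dynamic_kernel([4.0, 4.0, 4.0, 4.0], candidates, gates)
output_a = conv1d_valid([1.0, -1.0, 1.0, -1.0], kernel_a)
```

A standard convolution would apply the same kernel to both inputs; here the zero-mean input gets an even mix of the two candidates, while the constant input is filtered almost entirely by the first candidate.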



Fig. 3. Visualization of the feature heat maps for each convolutional layer in the encoding phase. The first and third rows are feature heat maps of U-Net employing the standard convolution, and the second and fourth rows are feature heat maps of U-Net employing DyBAC. The encoding of U-Net has five stages, and we replace the convolution after the first layer with the proposed dynamic convolution-based bi-directional attention component (DyBAC). From left to right, the feature maps are shown from shallow to deep layers respectively, and different colors indicate different spatial weights.



Fig. 4. The learning curves on the dermoscopy image training and validation sets by utilizing 2,594 labeled data. The blue and red curves represent U-Net++ employing DyBAC, and the gray and yellow curves represent U-Net++ employing the standard convolution. (a) The accuracy curves of the training and validation sets on the dermoscopy image dataset and (b) the loss curves of the training and validation sets on the dermoscopy image dataset.



Fig. 5. Visualization result of different methods on the LiTS testing set by utilizing 10% labeled data of training set. Green is the ground truth, red is the segmentation result, and yellow is the overlap region of the segmentation result and ground truth. Therefore, fewer green and red regions imply better segmentation results.
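The colour coding in this caption can be read as a per-pixel comparison of two binary masks. The sketch below counts the three regions and, as an addition not taken from the caption, derives the Dice coefficient from them; the function names and the tiny example masks are made up for illustration.

```python
def overlap_regions(pred, truth):
    """Count pixels per colour in a Fig. 5 style overlay.

    yellow: predicted and true (overlap)
    green:  true but not predicted (missed ground truth)
    red:    predicted but not true (false alarm)
    """
    counts = {"yellow": 0, "green": 0, "red": 0}
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                counts["yellow"] += 1
            elif t:
                counts["green"] += 1
            elif p:
                counts["red"] += 1
    return counts

def dice(pred, truth):
    # Dice = 2 * overlap / (|pred| + |truth|); defined as 1.0 for two empty masks.
    c = overlap_regions(pred, truth)
    denom = 2 * c["yellow"] + c["green"] + c["red"]
    return 1.0 if denom == 0 else 2 * c["yellow"] / denom

pred = [[1, 1],
        [0, 0]]
truth = [[1, 0],
         [1, 0]]
regions = overlap_regions(pred, truth)   # one pixel of each colour
score = dice(pred, truth)                # 0.5
```

Fewer green and red pixels shrink the denominator toward twice the yellow count, so the score approaches 1, which matches the caption's reading of the overlays.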



Fig. 6. Visualization result of different methods on the dermoscopy image validation set by utilizing 20% labeled data of training set.



Fig. 7. Visualization result of different methods on the left atrium validation set by utilizing 10% and 20% of the labeled data in the training set, respectively.




TABLE I. Comparison of ablation experiments on the LiTS-Liver testing set by utilizing 10% labeled data of the training set. The best values are in bold.



TABLE II. Comparison of ablation experiments on the dermoscopy image validation set utilizing different proportions of labeled data from the training set. The best values are in bold.



TABLE III. Comparison of ablation experiments on the left atrium validation set by utilizing 10% labeled data of the training set. The best values are in bold.



TABLE IV. Quantitative comparison between our method and other comparison methods on the LiTS-Liver testing set by utilizing 10% labeled data of the training set. The backbone network of all evaluated methods is U-Net. The best values are in bold.



TABLE V. Quantitative comparison between our method and other comparison methods on the LiTS-Liver test dataset by utilizing 20% labeled data of the training dataset. The backbone network of all evaluated methods is U-Net. The best values are in bold.



TABLE VI. Quantitative comparison between our method and other comparison methods on the dermoscopy image validation set by utilizing 10% labeled data of the training set. The backbone network of all evaluated methods is U-Net++. The best values are in bold.


TABLE VII. Quantitative comparison between our method and other comparison methods on the dermoscopy image validation set by utilizing 20% labeled data of the training set. The backbone network of all evaluated methods is U-Net++. The best values are in bold.



TABLE VIII. Quantitative comparison between our method and other comparison methods on the left atrium validation set by utilizing 10% labeled data of the training set. The backbone network of all evaluated methods is V-Net. The best values are in bold.



TABLE IX. Quantitative comparison between our method and other comparison methods on the left atrium validation set by utilizing 20% labeled data of the training set. The backbone network of all evaluated methods is V-Net. The best values are in bold.



TABLE X. Comparison of the efficiency of different networks. The best values are in bold.



TABLE XI. Statistical significance of the proposed ASE-Net and baseline MT methods on different datasets.



