Self-supervised Learning
Context Many specific tasks in Computer Vision, such as object detection (e.g., YOLOv7), image classification (e.g., ResNet-50), or semantic segmentation (e.g., UNet, Swin Transformer), have achieved astonishing results in recent years. This has been possible mainly because large (N > \(10^6\)), labeled datasets were easily accessible and freely available. However, many applications in medical imaging lack such large datasets, and annotation can be very time-consuming and difficult even for experienced medical doctors. For instance, predicting mental or neurodevelopmental disorders from neuroanatomical imaging data (e.g., T1-w MRI) has not yet achieved the expected results (i.e., AUC ≥ 90% (Dufumier et al., 2024)). Furthermore, recent studies have yielded contradictory results when comparing Deep Learning with Standard Machine Learning (SML) on top of classical feature extraction (Dufumier et al., 2024).
Challenges The first challenge concerns the small number of pathological samples. In supervised learning, when dealing with a small labeled dataset, the most common and well-known solution is supervised Transfer Learning from ImageNet (or other large vision datasets). However, it has recently been shown that this strategy is useful, i.e., features are actually re-used, only when there is a high visual similarity between the pre-training and target domains (e.g., a low Fréchet inception distance (FID)). This is not the case when comparing natural and medical images. Furthermore, many medical images, and in particular brain MRI scans, are 3D volumes, unlike the 2D images of ImageNet. This entails a large domain gap between the big labeled datasets used in computer vision and medical images. Another approach comprises self-supervised learning (SSL) methods, which leverage an annotation-free pretext task to provide a surrogate supervision signal for feature learning (see the sketch below). Nonetheless, these methods still need large (unannotated) datasets, which, to reduce the domain gap, should comprise data similar to those in the (labeled) target dataset, namely pathological patients. However, the large majority of images currently stored in hospitals and clinical laboratories belong to healthy subjects. Indeed, the largest datasets currently available (e.g., UK Biobank and OpenBHB) mostly contain data from healthy subjects. Furthermore, these datasets usually comprise one or multiple imaging modalities, as well as clinical data, such as age, gender, and weight. The research challenge thus becomes how to leverage large datasets of healthy subjects and combine the heterogeneous sources of information (i.e., clinical and imaging data) to improve the diagnosis and understanding of patients.
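To make the SSL idea concrete, here is a minimal sketch of a SimCLR-style InfoNCE pretext loss, where the surrogate supervision signal is simply "match the two augmented views of the same image". The code is illustrative, not an implementation from the cited works: the function name, the temperature value, and the restriction to cross-view negatives are our simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z1, z2: (N, d) embeddings of two augmented views of the same N images."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                # (N, N) scaled cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    # Surrogate supervision: view i must retrieve its counterpart among all N candidates,
    # symmetrized over the two views. No manual annotation is required.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```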
A second challenge concerns data biases. In our work, we define data biases as visual patterns that correlate with the target task and/or are easy to learn, but are not relevant for the target task. For instance, the site effect in MRI refers to systematic variations or discrepancies in feature distributions across different imaging sites, which arise from differences in equipment, protocols, or settings, and are not related to a disease (i.e., the target task). When working with MRI samples in a binary classification problem (healthy vs. patients), these spurious differences can be visually more pronounced, and thus easier to learn, than the relevant differences between the two classes. This can result in a biased model, whose predictions rely mainly on the bias attributes and not on the true, generalizable, and discriminative features.
Contributions In this project, we have proposed a new geometric approach for contrastive learning (Barbano et al., 2023) that can be used in different settings:
- unsupervised (i.e., no labels) (Sarfati et al., 2023; Ruppli et al., 2022; Barbano et al., 2023),
- supervised (i.e., class labels) (Barbano et al., 2023), and
- weakly-supervised (i.e., weak attributes or regression) (Dufumier et al., 2021; Barbano et al., 2023; Dufumier et al., 2023; Ruppli et al., 2023).
It is well suited to integrating prior information, such as weak attributes or representations learned from generative models, and can thus be used to learn a representation of the healthy population by leveraging both clinical and imaging data.
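As an illustration of how a weak attribute can be integrated, below is a hedged sketch in the spirit of the y-aware contrastive loss of (Dufumier et al., 2021), where the hard positive/negative split of InfoNCE is softened by a Gaussian kernel on a continuous meta-datum such as age. Variable names, the kernel choice, and the bandwidth `sigma` are our illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def weakly_supervised_info_nce(z1, z2, y, temperature=0.1, sigma=5.0):
    """z1, z2: (N, d) embeddings of two views; y: (N,) continuous weak attribute (e.g., age)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sim = z1 @ z2.t() / temperature                        # (N, N) scaled similarities
    # RBF kernel on the attribute: subjects with close values become soft positives.
    w = torch.exp(-(y[:, None] - y[None, :]) ** 2 / (2 * sigma ** 2))
    w = w / w.sum(dim=1, keepdim=True)                     # normalize weights per anchor
    log_p = sim.log_softmax(dim=1)                         # log-probability over candidates
    return -(w * log_p).sum(dim=1).mean()                  # kernel-weighted cross-entropy
```

With `sigma → 0` this reduces to the standard InfoNCE pairing (each subject is only positive with itself), while larger `sigma` pulls together subjects that are clinically similar.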
Based on the proposed geometric approach, we show why recent contrastive losses (InfoNCE, SupCon, etc.) can fail when dealing with biased data, and we derive a new debiasing regularization loss that works well even with extremely biased data (Barbano et al., 2023). You can find a visual explanation below using the color-MNIST dataset.
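As a rough illustration of the idea, the sketch below penalizes, for each anchor, the gap between its mean similarity to bias-aligned negatives (same bias attribute, e.g., the same digit color in color-MNIST or the same acquisition site) and to bias-conflicting negatives. This first-moment matching is a simplification of the FairKL regularizer of (Barbano et al., 2023), not the exact loss of the paper, and all names are ours.

```python
import torch
import torch.nn.functional as F

def debias_reg(z, labels, bias, temperature=0.1):
    """z: (N, d) embeddings; labels: (N,) class targets; bias: (N,) bias attribute (e.g., site)."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / temperature                          # (N, N) scaled similarities
    neg = labels[:, None] != labels[None, :]               # negatives: different class
    same_bias = bias[:, None] == bias[None, :]
    ba = (neg & same_bias).float()                         # bias-aligned negatives
    bc = (neg & ~same_bias).float()                        # bias-conflicting negatives
    eps = 1e-8
    mu_ba = (sim * ba).sum(1) / (ba.sum(1) + eps)          # per-anchor mean similarity
    mu_bc = (sim * bc).sum(1) / (bc.sum(1) + eps)
    # A biased encoder scores bias-aligned negatives systematically higher;
    # penalizing the gap discourages relying on the bias attribute.
    return ((mu_ba - mu_bc) ** 2).mean()
```

In practice, such a term would be added, with a weighting hyper-parameter, to the (supervised) contrastive loss being minimized.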