Self-supervised Learning

Context Many specific tasks in Computer Vision, such as object detection (e.g., YOLOv7), image classification (e.g., ResNet-50), or semantic segmentation (e.g., UNet, Swin Transformer), have achieved astonishing results in recent years. This has been possible mainly because large (N > \(10^6\)), labeled datasets were easily accessible and freely available. However, many applications in medical imaging lack such large datasets, and annotation can be very time-consuming and difficult even for experienced medical doctors. For instance, predicting mental or neurodevelopmental disorders from neuroanatomical imaging data (e.g., T1-w MRI) has not yet achieved the expected results (i.e., AUC ≥ 90% (Dufumier et al., 2024)). Furthermore, recent studies have yielded contradictory results when comparing Deep Learning with Standard Machine Learning (SML) on top of classical feature extraction (Dufumier et al., 2024).

Challenges The first challenge concerns the small number of pathological samples. In supervised learning, when dealing with a small labeled dataset, the most common and well-known solution is supervised Transfer Learning from ImageNet (or other large vision datasets). However, it has recently been shown that this strategy is useful, namely that features are re-used, only when there is a high visual similarity between the pre-training and target domains (e.g., a low Fréchet inception distance (FID)). This is not the case when comparing natural and medical images. Furthermore, many medical images, and in particular brain MRI scans, are 3D volumes, unlike the 2D images of ImageNet. This entails a great domain gap between the large labeled datasets used in computer vision and medical images. Another approach comprises self-supervised learning (SSL) methods, which leverage an annotation-free pretext task to provide a surrogate supervision signal for feature learning. Nonetheless, these methods still need large (unannotated) datasets which, to reduce the domain gap, should contain data similar to those in the (labeled) target dataset, namely pathological patients. However, the large majority of images currently stored in hospitals and clinical laboratories belong to healthy subjects. Indeed, the largest datasets currently available (e.g., UK Biobank and OpenBHB) mostly contain data from healthy subjects. Furthermore, these datasets usually comprise one or multiple imaging modalities, as well as clinical data, such as age, gender and weight. The research challenge thus becomes how to leverage large datasets of healthy subjects and combine the heterogeneous sources of information (i.e., clinical and imaging data) to improve the diagnosis and understanding of patients.

A second challenge concerns data biases. In our work, we define data biases as visual patterns that correlate with the target task and/or are easy to learn, but are not relevant for the target task. For instance, the site effect in MRI images refers to systematic variations or discrepancies in feature distributions across different imaging sites that arise from differences in equipment, protocols, or settings, and are not related to a disease (i.e., the target task). When working with MRI samples in a binary classification problem (healthy vs. patients), these spurious differences can be visually more pronounced, and thus easier to learn, than the relevant differences between the two classes. This can result in a biased model whose predictions rely mainly on the bias attributes and not on the true, generalizable, and discriminative features.

In this project, we propose a new paradigm where, instead of training a network from scratch on a small pathological dataset, we first pre-train it on a large dataset of healthy subjects, thus learning a representation space describing the healthy population. We leverage our new weakly-supervised contrastive loss, which combines clinical, biological and imaging data. Then, we transfer the network and fine-tune it on the small pathological dataset.
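As a rough illustration of this two-stage pipeline, here is a minimal, self-contained NumPy sketch. All names, dimensions, and the toy data are invented for illustration: in the real pipeline the encoder is a deep network pretrained on healthy subjects with the weakly-supervised contrastive loss, not a random linear map.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1 (stand-in): in the real pipeline, the encoder weights would be
# pretrained on a large healthy cohort with the weakly-supervised
# contrastive loss. Here a random linear map plays that role.
W_pretrained = rng.normal(size=(8, 8))

# Stage 2: transfer the encoder and fine-tune a small head on a small
# pathological dataset (60 labeled scans; toy features and labels).
X = rng.normal(size=(60, 8))
y = (X[:, 0] > 0).astype(float)          # toy diagnosis label

Z = X @ W_pretrained                     # frozen pretrained representation
Z = Z / Z.std(axis=0)                    # simple feature scaling

w = np.zeros(8)                          # head: logistic regression via GD
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(Z @ w)))   # sigmoid predictions
    w -= 0.1 * Z.T @ (p - y) / len(y)    # gradient step on logistic loss

acc = ((Z @ w > 0) == (y > 0.5)).mean()
print(f"fine-tuned training accuracy: {acc:.2f}")
```

Only the small head is trained on the pathological data; the transferred representation does the heavy lifting, which is the point of pretraining on the healthy population.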

Contributions In this project, we have proposed a new geometric approach for contrastive learning (Barbano et al., 2023) that can be used in different settings:

  1. unsupervised (i.e., no labels) (Sarfati et al., 2023), (Ruppli et al., 2022), (Barbano et al., 2023),
  2. supervised (i.e., class labels) (Barbano et al., 2023), and
  3. weakly-supervised (i.e., weak attributes or regression) (Dufumier et al., 2021), (Barbano et al., 2023), (Dufumier et al., 2023), (Ruppli et al., 2023), (Dufumier et al., 2021).

It is well adapted to integrate prior information, such as weak attributes or representations learned from generative models, and can thus be used to learn a representation of the healthy population by leveraging both clinical and imaging data.

In a) we show a visual explanation of the proposed geometric approach for Contrastive Learning. We aim to increase the minimal margin ϵ between the distance d+ of a positive sample x+ (+ symbol inside, yellow color) from an anchor x and the distance d− of the closest negative sample x− (− symbol inside, blue color). By increasing the margin, we can achieve a better separation between positive and negative samples. In b) and c), we show two different scenarios, without margin (b) and with margin (c). The filling colors of the data points represent different biases. In both b) and c) the contrastive conditions are fulfilled and thus the loss is minimized (i.e., positives are closer to the anchor than negatives). However, we observe that, without imposing a margin, biased clusters containing both positive and negative samples may appear (b). This issue can be mitigated by increasing the margin ϵ (c) and using the proposed regularization loss FairKL (Barbano et al., 2023).
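The ϵ-margin condition described above can be checked numerically. The sketch below is a toy illustration, not the actual training loss: it only tests whether every positive lies closer to the anchor than the nearest negative by at least ϵ.

```python
import numpy as np

def margin_satisfied(anchor, positives, negatives, eps):
    """Check the epsilon-margin condition: the farthest positive must be
    closer to the anchor than the nearest negative by at least eps."""
    d_pos = np.linalg.norm(positives - anchor, axis=1)
    d_neg = np.linalg.norm(negatives - anchor, axis=1)
    return bool(d_pos.max() + eps <= d_neg.min())

anchor = np.zeros(2)
pos = np.array([[0.5, 0.0], [0.0, 0.6]])   # distances 0.5 and 0.6
neg = np.array([[2.0, 0.0], [0.0, 3.0]])   # distances 2.0 and 3.0

print(margin_satisfied(anchor, pos, neg, eps=0.5))   # True: 0.6 + 0.5 <= 2.0
print(margin_satisfied(anchor, pos, neg, eps=1.5))   # False: 0.6 + 1.5 > 2.0
```

Increasing ϵ makes the condition harder to satisfy, which forces a larger gap between the positive and negative samples around each anchor.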
With standard Contrastive Learning losses (e.g., SimCLR), samples are uniformly scattered in the representation space, without considering their clinical variables (metadata y), such as age. With our weakly-supervised losses, two subjects with similar metadata are mapped closer in the representation space than two subjects with different metadata.
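One simple way to encode "similar metadata should be mapped closer" is to weight pairs by a kernel on the clinical variable, in the spirit of our kernel-based formulation (Dufumier et al., 2023). The Gaussian form and the bandwidth below are illustrative assumptions, not the exact loss.

```python
import numpy as np

def age_kernel(y, sigma=5.0):
    """Gaussian kernel on a continuous clinical variable (here: age).
    Pairs with similar metadata get a weight close to 1 (treated as
    near-positives); dissimilar pairs get a weight close to 0."""
    d = y[:, None] - y[None, :]
    return np.exp(-d ** 2 / (2 * sigma ** 2))

ages = np.array([25.0, 27.0, 60.0])
K = age_kernel(ages)
# Subjects aged 25 and 27 are near-positives; 25 vs. 60 is a clear negative.
print(K.round(3))
```

Such a kernel matrix can then re-weight the attraction/repulsion terms of a contrastive loss, so that the learned representation varies smoothly with the metadata instead of scattering samples uniformly.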

Based on the proposed geometric approach, we show why recent contrastive losses (InfoNCE, SupCon, etc.) can fail when dealing with biased data, and we derive a new debiasing regularization loss that works well even with extremely biased data (Barbano et al., 2023). You can find a visual explanation below using the color-MNIST dataset.

A positive bias-aligned sample x+,b is semantically similar (positive) to the anchor (same digit) and also shares the same bias b (yellow color). A positive bias-conflicting sample shares the same digit but has a different bias b′ (different color). Here, the color is a data bias since it is a visual feature correlated with the semantic content of the target task (digit recognition), but it does not characterize it.
In (Barbano et al., 2023), we propose the FairKL regularization term for debiasing. Ideally, we would like the distances between all positive (resp. negative) samples and the anchor, whatever their bias, to be equal. However, this condition is very strict, as it would enforce a uniform distance among all positive (resp. negative) samples. We therefore propose a more relaxed condition, where we force the distributions of distances of positives with different biases to be similar (and similarly for negatives). Assuming that the distance distributions are normal, we minimize the Kullback-Leibler divergence between the two distributions, obtaining a closed-form solution that we call FairKL.
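Under the normality assumption, the KL divergence between the two distance distributions has the standard closed form for 1-D Gaussians. The sketch below matches the first two moments of the anchor-positive distance distributions across bias groups; the exact weighting and batching used in FairKL may differ (see Barbano et al., 2023).

```python
import numpy as np

def gaussian_kl(mu1, var1, mu2, var2):
    """Closed-form KL( N(mu1, var1) || N(mu2, var2) ) for 1-D Gaussians."""
    return 0.5 * (np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def fairkl_penalty(d_group1, d_group2):
    """Sketch of a FairKL-style penalty: compare the first two moments of
    the anchor-positive distance distributions of two bias groups."""
    return gaussian_kl(d_group1.mean(), d_group1.var(),
                       d_group2.mean(), d_group2.var())

rng = np.random.default_rng(0)
d_aligned = rng.normal(1.0, 0.1, 200)   # distances of bias-aligned positives
d_same    = rng.normal(1.0, 0.1, 200)   # bias-conflicting, same distribution
d_far     = rng.normal(2.0, 0.1, 200)   # bias-conflicting pushed further away

pen_fair   = fairkl_penalty(d_aligned, d_same)   # near 0: unbiased geometry
pen_biased = fairkl_penalty(d_aligned, d_far)    # large: bias drives distances
print(pen_fair, pen_biased)
```

Minimizing this penalty during training pushes bias-aligned and bias-conflicting positives to the same distance distribution from the anchor, so the bias attribute stops being a shortcut.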

References

2024

  1. NeuroImage
    Exploring the potential of representation and transfer learning for anatomical neuroimaging: Application to psychiatry
    Benoit Dufumier, Pietro Gori, Sara Petiton, Robin Louiset, Jean-François Mangin, Antoine Grigis, and Edouard Duchesnay
    NeuroImage, 2024

2023

  1. ICLR
    Unbiased Supervised Contrastive Learning
    Carlo Alberto Barbano, Benoit Dufumier, Enzo Tartaglione, Marco Grangetto, and Pietro Gori
    In The Eleventh International Conference on Learning Representations (ICLR), 2023
  2. IEEE ISBI
    Learning to diagnose cirrhosis from radiological and histological labels with joint self and weakly-supervised pretraining strategies
    Emma Sarfati, Alexandre Bone, Marc-Michel Rohe, Pietro Gori, and Isabelle Bloch
    In IEEE 20th International Symposium on Biomedical Imaging (ISBI), 2023
  3. IEEE ISBI
    Contrastive learning for regression in multi-site brain age prediction
    Carlo Alberto Barbano, Benoit Dufumier, Edouard Duchesnay, Marco Grangetto, and Pietro Gori
    In IEEE 20th International Symposium on Biomedical Imaging (ISBI), 2023
  4. ICML
    Integrating Prior Knowledge in Contrastive Learning with Kernel
    Benoit Dufumier, Carlo Alberto Barbano, Robin Louiset, Edouard Duchesnay, and Pietro Gori
    In International Conference on Machine Learning (ICML), 2023
  5. MICCAI-W
    Decoupled conditional contrastive learning with variable metadata for prostate lesion detection
    Camille Ruppli, Pietro Gori, Roberto Ardon, and Isabelle Bloch
    In MILLanD workshop (MICCAI), 2023

2022

  1. MICCAI-W
    Optimizing Transformations for Contrastive Learning in a Differentiable Framework
    Camille Ruppli, Pietro Gori, Roberto Ardon, and Isabelle Bloch
    In Medical Image Learning with Limited and Noisy Data - MILLanD (Workshop MICCAI), 2022

2021

  1. MICCAI
    Contrastive Learning with Continuous Proxy Meta-data for 3D MRI Classification
    Benoit Dufumier, Pietro Gori, Julie Victor, Antoine Grigis, and Edouard Duchesnay
    In Medical Image Computing and Computer Assisted Intervention - MICCAI, 2021
  2. MedNeurIPS
    Conditional Alignment and Uniformity for Contrastive Learning with Continuous Proxy Labels
    Benoit Dufumier, Pietro Gori, Julie Victor, Antoine Grigis, and Edouard Duchesnay
    In MedNeurIPS, Workshop NeurIPS, 2021