Learning Representations of Satellite Images From Metadata Supervision
Abstract
Self-supervised learning is increasingly applied to Earth observation problems that leverage satellite and other remotely sensed data. Within satellite imagery, metadata such as time and location often hold significant semantic information that improves scene understanding. In this paper, we introduce Satellite Metadata-Image Pretraining (SatMIP), a new approach for harnessing metadata in the pretraining phase through a flexible and unified multimodal learning objective. SatMIP represents metadata as textual captions and aligns images with metadata in a shared embedding space by solving a metadata-image contrastive task. Our model learns a non-trivial image representation that can effectively handle recognition tasks.
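To make the metadata-image contrastive task concrete, here is a minimal sketch of a CLIP-style symmetric InfoNCE objective over (image, metadata-caption) pairs. The caption template, encoder outputs, and temperature value are illustrative assumptions, not the authors' exact implementation.

```python
# Hedged sketch of a metadata-image contrastive objective in the spirit of
# SatMIP: metadata is rendered as a textual caption, and image and caption
# embeddings are aligned with a symmetric InfoNCE loss.
import torch
import torch.nn.functional as F

def metadata_caption(lat: float, lon: float, month: int) -> str:
    # Hypothetical caption template; SatMIP represents metadata as text,
    # but this exact wording is an assumption for illustration.
    return (f"a satellite image taken at latitude {lat:.2f}, "
            f"longitude {lon:.2f} in month {month}")

def contrastive_loss(emb_a: torch.Tensor, emb_b: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of paired embeddings (B, D)."""
    emb_a = F.normalize(emb_a, dim=-1)  # project onto the unit sphere
    emb_b = F.normalize(emb_b, dim=-1)
    logits = emb_a @ emb_b.t() / temperature        # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Matching pairs sit on the diagonal; average both directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```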
We further enhance this model by combining image self-supervision and metadata supervision, introducing SatMIPS. As a result, SatMIPS improves over its image-image pretraining baseline, SimCLR, and accelerates convergence. Comparisons against four recent contrastive and masked-autoencoding-based methods for remote sensing also highlight the efficacy of our approach. Furthermore, our framework enables multimodal classification, using metadata to improve the performance of visual features, and yields more robust hierarchical pretraining. Code and pretrained models will be made available at: https://github.com/preligens-lab/satmip.
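One plausible reading of the combined objective is a weighted sum of the SimCLR-style image-image term and the metadata-image term sketched above. The equal default weighting and the use of the same symmetric InfoNCE for the image views (a simplification of SimCLR's NT-Xent) are assumptions, not the paper's stated formulation.

```python
# Hedged sketch of how SatMIPS might combine image self-supervision with
# metadata supervision, reusing contrastive_loss from the sketch above.
def satmips_loss(img_emb_a: torch.Tensor,   # embeddings of augmented view A
                 img_emb_b: torch.Tensor,   # embeddings of augmented view B
                 meta_emb: torch.Tensor,    # metadata-caption embeddings
                 weight: float = 1.0) -> torch.Tensor:
    # Image-image term: contrast two augmented views of each image
    # (a simplified stand-in for SimCLR's NT-Xent loss).
    image_term = contrastive_loss(img_emb_a, img_emb_b)
    # Metadata-image term: align images with their metadata captions.
    metadata_term = contrastive_loss(img_emb_a, meta_emb)
    return image_term + weight * metadata_term
```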