Moleco: Molecular Contrastive Learning with Chemical Language Models for Molecular Property Prediction

  • Jun Hyung Park
  • Hyuntae Park
  • Yeachan Kim
  • Woosang Lim
  • Sang Keun Lee

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Pre-trained chemical language models (CLMs) excel in the field of molecular property prediction, utilizing string-based molecular descriptors such as SMILES for learning universal representations. However, such string-based descriptors implicitly contain limited structural information, which is closely associated with molecular property prediction. In this work, we introduce Moleco, a novel contrastive learning framework to enhance the understanding of molecular structures within CLMs. Based on the similarity of fingerprint vectors among different molecules, we train CLMs to distinguish structurally similar and dissimilar molecules in a contrastive manner. Experimental results demonstrate that Moleco significantly improves the molecular property prediction performance of CLMs, outperforming state-of-the-art models.
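The contrastive idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the fingerprint type, the similarity measure, or the loss, so the sketch assumes Tanimoto similarity over toy fingerprints (sets of on-bit indices) and an InfoNCE-style loss over plain Python vectors standing in for CLM embeddings. The function names, the `threshold` of 0.5, and the `temperature` of 0.1 are all hypothetical.

```python
import math


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def split_by_similarity(anchor_fp, candidate_fps, threshold=0.5):
    """Return (similar, dissimilar) candidate indices relative to the anchor.

    The threshold is an assumed value for illustration only.
    """
    similar, dissimilar = [], []
    for idx, fp in enumerate(candidate_fps):
        if tanimoto(anchor_fp, fp) >= threshold:
            similar.append(idx)
        else:
            dissimilar.append(idx)
    return similar, dissimilar


def cosine(u, v):
    """Cosine similarity of two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed choice, not taken from the paper).

    The loss is small when the anchor embedding is closer to the
    structurally similar molecule than to the dissimilar ones.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))
```

In this sketch, `split_by_similarity` would choose positives and negatives from fingerprint similarity, and `contrastive_loss` would then be minimized over the embeddings a CLM produces for those molecules.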

Original language: English
Title of host publication: EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Industry Track
Editors: Franck Dernoncourt, Daniel Preotiuc-Pietro, Anastasia Shimorina
Publisher: Association for Computational Linguistics (ACL)
Pages: 408-420
Number of pages: 13
ISBN (Electronic): 9798891761667
Publication status: Published - 2024
Event: 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024 - Hybrid, Miami, United States
Duration: 2024 Nov 12 to 2024 Nov 16

Publication series

Name: EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Industry Track

Conference

Conference: 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024
Country/Territory: United States
City: Hybrid, Miami
Period: 24/11/12 to 24/11/16

Bibliographical note

Publisher Copyright:
© 2024 Association for Computational Linguistics.

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
  • Linguistics and Language
