Generative Machine Learning for Drug Discovery

Project: FDCRGP

Project Details

Grant Program

Faculty-development competitive research grants program for 2023-2025

Project Description

Generative ML models and drug-target affinity (DTA) binding prediction models have shown great promise for drug discovery and beyond. We have nevertheless identified several open issues that must be addressed to strengthen the usefulness and impact of this young, cross-disciplinary area of applied computer science:
● Validation of Transmol [53] for de novo molecule generation on an example gene-regulatory target: We have recently developed a novel molecule generation algorithm based on a variant of the transformer, an architecture originally developed for natural language processing. The analogs generated by Transmol will be validated using the VDR, an important target for the prevention and treatment of various diseases, as the example.
● Introduction of graph-based molecular similarity measures for improved benchmarking: A novel metric applicable to SMILES strings, based on Levenshtein distance (string edit distance) and/or extended reduced graphs (a compact molecular encoding that preserves structural information), will be explored to remedy the shortcomings of the MOSES [30] benchmark.
● Machine-learning-assisted prediction of drug-target binding affinity: A novel deep-learning model combining convolutional and graph neural networks will be implemented. Various ligand and protein representations will be tested, and a data fusion approach will be applied at both the feature level and the model-output level.
● Transfer of kinase affinity prediction to the general case of drug-target affinity prediction: The validated algorithms that perform best on the Davis and KIBA datasets will be applied to other well-studied protein families, such as GPCRs, phosphatases and/or nuclear receptors, to test their applicability outside the datasets on which they were originally developed.
● Biological validation of newly designed and synthesized VDR analogs: Generative ML technology for molecule generation is generally evaluated in silico. However, only experimental work in the wet lab can ultimately determine whether these novel approaches will prevail in drug discovery. Therefore, we plan not only to generate and identify lead compounds, but also to synthesize them and test them biologically in two settings: biochemical and functional assays.
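
As an illustration of the string-based similarity idea in the list above, the following is a minimal sketch of a Levenshtein-based similarity between two SMILES strings. It is not the project's metric: the function names and the length normalization are assumptions made for this example, the comparison is character by character (so two-character atoms such as Cl count as two symbols), and the inputs are assumed to have been canonicalized beforehand with a cheminformatics toolkit, which is not shown.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def smiles_similarity(smiles_1: str, smiles_2: str) -> float:
    """Hypothetical normalization for illustration: 1.0 for identical
    strings, 0.0 when every position has to be edited."""
    longest = max(len(smiles_1), len(smiles_2))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(smiles_1, smiles_2) / longest


if __name__ == "__main__":
    # Ethanol (CCO) versus ethylamine (CCN): one substitution.
    print(levenshtein("CCO", "CCN"))
    print(smiles_similarity("CCO", "CCN"))
```

Because one molecule can be written as many different SMILES strings, a measure of this kind is only meaningful on a consistent canonical form. This is one reason a graph-based encoding is considered alongside it.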

Project Impact

Technical outcomes of the project are summarized below:
● Creation of an attention-based molecular generation algorithm, which creates focussed libraries based on certain pharmacophores or diverse libraries for chemical space exploration
● Validation of Transmol, our recently developed attention-based molecular generation algorithm, which creates focussed libraries based on certain pharmacophores, using known and newly generated VDR analogs
● Establishment of a superior graph-based molecular similarity metric for benchmarking the outcome of generative ML algorithms
● Improvement of current state-of-the-art kinase affinity prediction models by multi-modal data fusion techniques
● Creation of an ML model for binding affinity prediction that is applicable to diverse protein families
● Creation of a comprehensive database of likely binders for various protein target families such as GPCRs, NRs, kinases or proteases
● The possibility of generating biologically active VDR analogs with therapeutic potential that can be further evaluated and studied in vivo in the future
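
To make the multi-modal data fusion named among the outcomes above more concrete, here is a minimal sketch of the two fusion levels, feature level and model-output level. It is illustrative only: the function names, the plain-list vectors and the weighted-average rule are assumptions for this example and are not taken from the project's models.

```python
from typing import List, Optional, Sequence


def fuse_features(ligand_features: Sequence[float],
                  protein_features: Sequence[float]) -> List[float]:
    """Feature-level (early) fusion: concatenate the ligand and protein
    representations into one vector that a single model would consume."""
    return list(ligand_features) + list(protein_features)


def fuse_outputs(predictions: Sequence[float],
                 weights: Optional[Sequence[float]] = None) -> float:
    """Output-level (late) fusion: combine affinity predictions from
    several models with a weighted average (uniform weights by default)."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("weights and predictions must have equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(p * w for p, w in zip(predictions, weights))
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical toy vectors standing in for learned embeddings.
    print(fuse_features([0.1, 0.2], [1.0]))
    # Two hypothetical models predicting the same drug-target affinity.
    print(fuse_outputs([5.0, 7.0]))
    print(fuse_outputs([5.0, 7.0], weights=[3.0, 1.0]))
```

In practice the weights for output-level fusion would be chosen on held-out data, and feature-level fusion would operate on learned embeddings rather than hand-written lists.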
Effective start/end date: 1/1/23 to 12/31/25


  • Machine Learning
  • Computational Chemistry
  • Drug Discovery
  • Generative Machine Learning
  • Biology
  • Cheminformatics
  • DTA binding prediction

