February Journal Club: machine learning strategies in drug discovery


In recent years, advances in Artificial Intelligence (AI) and Machine Learning (ML) have flooded the popular press and captured the attention of professionals from many different backgrounds. A wide variety of ML applications has emerged, and many have made a determined entry into the drug discovery arena.

However, the use of machine learning approaches in drug discovery, and particularly of so-called artificial neural networks (ANNs), is not new. Originally inspired by the biological networks of our brains, ANNs took their first steps in this field in the early 1970s, when Hiller et al. attempted to classify molecules by their activity. Since then, the field has progressed to the point where ANN-based methodologies have been developed for de novo peptide design.

In one such study, Müller AT et al. presented a recurrent ANN (RNN) model incorporating long short-term memory (LSTM) units, with the ultimate goal of identifying new molecules with desired properties. Models of this type can capture long-range dependencies and sequence correlations. Because this learning approach had already been applied successfully to language processing and speech recognition, the authors hypothesized that LSTM networks could learn the amino acid "grammar" of a given set of proteins and peptides and use it to generate new sequences.
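To make the idea of an LSTM unit more concrete, here is a minimal pure-Python sketch of a single LSTM cell reading a peptide one residue at a time. It is not Müller et al.'s implementation: the weights are random and untrained, the hidden size of 8 is arbitrary, and `lstm_step`, `random_params` and `encode` are hypothetical names. What it does show is the gating: the forget gate decides how much of the cell state `c` survives each step, which is what allows information from early residues to influence later ones.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
GATES = ("input", "forget", "output", "candidate")


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def one_hot(residue):
    """Encode one residue as a 20-dimensional one-hot vector."""
    return [1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step. params maps each gate name to (W, U, b):
    W acts on the input x, U on the previous hidden state h_prev."""
    def gate(name, act):
        W, U, b = params[name]
        return [act(a + u + bb) for a, u, bb in zip(matvec(W, x), matvec(U, h_prev), b)]

    i = gate("input", sigmoid)        # how much new information to write
    f = gate("forget", sigmoid)       # how much of the old cell state to keep
    o = gate("output", sigmoid)       # how much of the cell state to expose
    g = gate("candidate", math.tanh)  # candidate values to write
    c = [ff * cp + ii * gg for ff, cp, ii, gg in zip(f, c_prev, i, g)]
    h = [oo * math.tanh(cc) for oo, cc in zip(o, c)]
    return h, c


def random_params(n_in, n_hidden, rng):
    """Small random (untrained) weights and zero biases for every gate."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {name: (mat(n_hidden, n_in), mat(n_hidden, n_hidden), [0.0] * n_hidden)
            for name in GATES}


def encode(peptide, params, n_hidden):
    """Run the cell over a peptide; return the final hidden and cell states."""
    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    for residue in peptide:
        h, c = lstm_step(one_hot(residue), h, c, params)
    return h, c


# Usage on an example sequence with an untrained cell.
rng = random.Random(42)
params = random_params(len(AMINO_ACIDS), 8, rng)
h, c = encode("GLFDIIKKIAESF", params, 8)
```

In a generative model of the kind described, the hidden state at each position would feed a trained output layer that predicts a probability distribution over the next residue, and new peptides would be produced by sampling from that distribution one residue at a time.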


*Fig 1. Long Short-Term Memory (LSTM) Recurrent Neural Network.
Schematic representation of how models based on RNNs with incorporated LSTM memory cells work. Internal interpretation of training peptides precedes de novo sequence design.


To demonstrate the potential of this approach, the model was trained on 1554 linear antimicrobial peptides (AMPs) of 7-48 amino acid residues that form amphipathic helices, a motif considered relevant to the desired antimicrobial activity. The results showed that the generated peptides are a product of the model's internal interpretation of the training data, since none of the generated sequences was identical to any training peptide. Sequence generation with the RNN also approximated the AMP sequence space significantly better than other design approaches, such as random or rule-based peptide design. With this work, the authors demonstrated the first application of an LSTM RNN to de novo sequence design of natural peptides, combined with predictive models trained on suitable molecular descriptors to evaluate the quality of the generated sequences.
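Two properties mentioned in this paragraph can be made concrete: how amphipathic a helical peptide is, and whether generated sequences are novel with respect to the training set. The sketch below computes the Eisenberg mean hydrophobic moment, assuming the normalized Eisenberg consensus hydrophobicity scale and an ideal alpha-helix with 100° between consecutive residues, together with a simple novelty fraction. The scale, the angle and the function names are illustrative choices; the descriptors actually used by Müller et al. may differ.

```python
import math

# Normalized Eisenberg consensus hydrophobicity scale (assumed choice of scale).
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def hydrophobic_moment(seq, scale=EISENBERG, angle_deg=100.0):
    """Mean hydrophobic moment <muH> = |sum_i H_i * (sin(i*d), cos(i*d))| / N.
    Each residue is a vector of length H_i pointing at angle i*d around the
    helix axis; d = 100 degrees models an ideal alpha-helix. A large value
    means hydrophobic and polar residues segregate onto opposite faces."""
    d = math.radians(angle_deg)
    sin_sum = sum(scale[aa] * math.sin(i * d) for i, aa in enumerate(seq))
    cos_sum = sum(scale[aa] * math.cos(i * d) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)


def novelty(generated, training):
    """Fraction of generated sequences not found verbatim in the training set."""
    train = set(training)
    return sum(seq not in train for seq in generated) / len(generated)


# Usage with toy sequences (not real data).
training = ["GLLKKLLKKL"]
generated = ["GLLKKLLKKL", "AAAAAAAA"]
print(novelty(generated, training))  # 0.5
```

A uniform sequence has no amphipathic face, so its moment vanishes over full helical turns (18 residues span exactly five turns at 100° per residue), while alternating hydrophobic and charged residues push the moment up.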

Another recent publication on the topic addressed Blood-Brain Barrier (BBB) permeability prediction using several machine learning strategies. The BBB is a neurovascular structure that protects the Central Nervous System (CNS) by separating it from the bloodstream. Its integrity is essential for normal CNS function: it prevents the entry of many large and small molecules while allowing the passage of water and lipid-soluble molecules and the selective transport of others. Determining the BBB permeability of a compound is therefore a critical step in ADMET (absorption, distribution, metabolism, excretion and toxicity) characterization. The most common methods for this are in vitro assays, such as the parallel artificial membrane permeability assay (PAMPA) and the immobilized artificial membrane (IAM) technique.

In this article, Wang Z et al. propose new methodologies for predicting in silico the BBB permeability of small compounds under development. The authors first evaluated 5 different methods (random undersampling, SMOTE, ADASYN, SMOTE+ENN and a weighted loss function) to overcome an initial data problem: the imbalance between the numbers of available BBB-permeable (BBB+) and BBB-impermeable (BBB-) compounds. They then used 6 types of machine learning algorithms to build the best model for classifying molecules as BBB+ or BBB-. After evaluating the effect of each choice on prediction accuracy, they constructed a final consensus model based on 2358 compounds that achieved an accuracy of 0.9456, with strong sensitivity in predicting BBB+ and specificity in identifying BBB- compounds compared with other BBB classification models. These permeability predictors for small molecules (molecular weight <1000 Da) were subsequently made publicly available through the research team's web server.
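The rebalancing strategies and metrics named above are standard techniques, and a small pure-Python sketch helps show what three of them do: random undersampling of the majority class, SMOTE (synthetic minority oversampling by interpolating between a minority sample and one of its nearest minority-class neighbours) and inverse-frequency class weights for a weighted loss, plus the accuracy, sensitivity and specificity used to report results. This is not Wang et al.'s code; the function names are hypothetical and the feature vectors are toy 2-D points standing in for molecular descriptors.

```python
import math
import random


def random_undersample(majority, n_keep, rng):
    """Randomly keep n_keep majority-class samples, without replacement."""
    return rng.sample(majority, n_keep)


def smote(minority, n_new, k, rng):
    """SMOTE: each synthetic sample lies on the segment between a random
    minority sample and one of its k nearest minority-class neighbours.
    Requires at least two minority samples."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


def balanced_class_weights(labels):
    """Weights n_samples / (n_classes * n_c): rarer classes count more in the loss."""
    counts = {c: labels.count(c) for c in set(labels)}
    return {c: len(labels) / (len(counts) * n) for c, n in counts.items()}


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall on the positive class, e.g. BBB+)
    and specificity (recall on the negative class, e.g. BBB-)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Usage: oversample a toy minority class of three 2-D points.
rng = random.Random(0)
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(minority, n_new=4, k=2, rng=rng)
```

Resampling of this kind is normally applied to the training split only, so that the test set keeps the real class balance. For production work, the imbalanced-learn library provides `SMOTE`, `ADASYN` and `SMOTEENN`, and scikit-learn classifiers accept `class_weight="balanced"`, which uses the same inverse-frequency formula as `balanced_class_weights`.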

These are just two of the many publications from the past year on how AI-based tools can guide drug discovery and improve its success rate in less time and at lower cost. At Iproteos, we believe in the advantages of combining AI and ML techniques to accelerate drug development decisions. For that reason, the continuous improvements to our proprietary physics-based platform IPROTech incorporate the latest AI and ML methods to engineer in silico new, synthetically accessible compounds that target intracellular proteins or complexes, free of Intellectual Property issues. To date, our AI-IPROTech platform has enabled the development of compounds with excellent BBB permeability for CNS indications, such as Epilepsy and Cognitive Impairment associated with Schizophrenia and Parkinson's disease, as well as multiple oncology programs on challenging and pharmaceutically elusive protein targets.

For more information about how our AI-IPROTech platform works, please feel free to contact us.


*Figure from Alex T Müller et al. J Chem Inf Model 2018;58(2):472-479.



Müller AT, Hiss JA, Schneider G. Recurrent Neural Network Model for Constructive Peptide Design. J Chem Inf Model. 2018;58(2):472-479.

Wang Z, Yang H, Wu Z, Wang T, Li W, Tang Y, et al. In Silico Prediction of Blood-Brain Barrier Permeability of Compounds by Machine Learning and Resampling Methods. ChemMedChem. 2018;13(20):2189-2201.