Brief Bioinform | Wu Song and Bai Chen teams jointly develop molecular generation model InterDiff
Recently, MoMed Biotech, in collaboration with Professor Wu Song's team at Shenzhen University, developed InterDiff, a molecular generation model based on hotspot residues and interaction cues (prompts). The work was published in Briefings in Bioinformatics, a leading journal in the field of bioinformatics. InterDiff uses a diffusion model as its core architecture and applies prompt learning to guide molecule generation toward specific interactions with protein pocket residues.
Introduction:
Structure-based drug design (SBDD) aims to obtain molecules that exhibit high affinity and specificity towards a target, and it has become a widely used and important approach in biomedical research. In this design strategy, understanding the binding modes between proteins and ligands is crucial for deciphering the workings of large molecular machines. Within a protein's small-molecule binding pocket, a small number of residues, called hotspots, contribute the majority of the binding affinity. Mutations in these hotspot residues can significantly reduce binding affinity or even lead to drug resistance in patients. In modern drug development, hotspots play a central role in rational drug design, as researchers often aim for drugs to interact with these hotspots to achieve maximum efficacy.
Diffusion generative models are a relatively new approach that generates new data by learning to reverse a gradual noising (diffusion) process, and they have been widely applied in molecular design. However, existing diffusion models overlook protein-ligand interaction information and therefore cannot customize the binding modes of generated molecules within protein pockets. Taking inspiration from prompt learning techniques in natural language processing, MoMed Biotech developed a diffusion generative model based on prompt learning that customizes the binding modes of generated molecules within protein pockets. InterDiff incorporates four learnable prompt embedding vectors that indicate the type of interaction a protein residue should form: π-π interactions, cation-π interactions, hydrogen bonding interactions, and halogen bonding interactions.
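The idea of four learnable prompt vectors can be illustrated with a minimal sketch. It assumes, purely for illustration, that each prompt vector is added to the feature vector of the residue it is attached to; the article does not state how InterDiff combines them. The names `init_prompt_table`, `apply_prompts`, and `EMBED_DIM`, as well as the residue `GLY-10`, are hypothetical, and the vectors here are only randomly initialized, whereas in a real model they would be updated during training.

```python
import random

# The four interaction types named in the article; identifiers are illustrative.
INTERACTION_TYPES = ["pi_pi", "cation_pi", "hydrogen_bond", "halogen_bond"]
EMBED_DIM = 8  # assumed feature width, not taken from the paper


def init_prompt_table(dim, seed=0):
    """Create one small random vector per interaction type (a stand-in for learned weights)."""
    rng = random.Random(seed)
    return {name: [rng.gauss(0.0, 0.1) for _ in range(dim)] for name in INTERACTION_TYPES}


def apply_prompts(residue_features, prompts, table):
    """Add the matching prompt vector to each prompted residue; leave others unchanged.

    residue_features: dict mapping residue id -> list of floats
    prompts: dict mapping residue id -> interaction type requested for that residue
    """
    conditioned = {}
    for res_id, feats in residue_features.items():
        kind = prompts.get(res_id)
        if kind is None:
            conditioned[res_id] = list(feats)  # no cue for this residue
        else:
            conditioned[res_id] = [f + p for f, p in zip(feats, table[kind])]
    return conditioned


table = init_prompt_table(EMBED_DIM)
residues = {
    "ASP-69": [0.0] * EMBED_DIM,
    "TYR-64": [0.0] * EMBED_DIM,
    "GLY-10": [0.0] * EMBED_DIM,  # hypothetical residue with no cue
}
prompts = {"ASP-69": "hydrogen_bond", "TYR-64": "cation_pi"}
conditioned = apply_prompts(residues, prompts, table)
```

Removing every entry from `prompts` leaves all residue features unchanged, which loosely corresponds to generating without interaction cues.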
Research Testing:
To evaluate the ability of InterDiff to generate molecules with specific interactions, the MoMed Biotech research team ran tests on the CrossDocked2020 test set, which consists of 100 samples. Samples without detected interactions were excluded, leaving a total of 99 samples. As shown in Figure 1, InterDiff designs molecules with the specified interactions under given interaction cues more accurately than the other methods. 3D-SBDD is the only method that comes close to InterDiff in terms of accuracy.
To investigate the effects of interaction cues, the research team conducted an additional test by removing all interaction cues (depicted as InterDiff_noprompt in Figure 1), and the results showed a significant decrease in accuracy in reproducing interactions. InterDiff achieved the highest accuracy when given 9 interaction cues and the lowest accuracy with 1 interaction cue. Under the condition of 9 interaction cues, 5 molecules produced interactions identical to the reference molecules, and 13 molecules matched 8 interactions.
Figure 1: Accuracy of InterDiff in designing specified interactions under different numbers of interaction cues, compared to classic methods including 3D-SBDD, TargetDiff, DiffSBDD, Pocket2Mol, and GraphBP.
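One simple way to score whether a generated molecule reproduces the requested interactions is to count how many cues reappear among the interactions detected in its docked pose. The sketch below assumes each interaction is represented as a (residue, interaction type) pair; the exact accuracy definition used in the paper is not given in this article, and the function name `recovery` and the residues `HIS-95`, `LYS-16`, and `GLU-62` are hypothetical.

```python
def recovery(prompted, detected):
    """Return (matched interactions, fraction of prompted interactions recovered)."""
    prompted = set(prompted)
    if not prompted:
        raise ValueError("at least one interaction cue is required")
    matched = prompted & set(detected)
    return matched, len(matched) / len(prompted)


# Hypothetical cues given to the model.
cues = [
    ("ASP-69", "hydrogen_bond"),
    ("TYR-64", "cation_pi"),
    ("HIS-95", "pi_pi"),
    ("LYS-16", "halogen_bond"),
]
# Hypothetical interactions detected in the generated molecule's pose.
found = [
    ("ASP-69", "hydrogen_bond"),
    ("TYR-64", "cation_pi"),
    ("GLU-62", "hydrogen_bond"),  # extra interaction that was not requested
]

matched, rate = recovery(cues, found)
print(sorted(matched), rate)
```

Here two of the four cues are reproduced, so the rate is 0.5; the extra, unrequested interaction does not affect the score under this definition.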
In addition to evaluating the accuracy of InterDiff in designing interactions, MoMed Biotech explored the potential of InterDiff for designing drugs against real targets with given molecular binding modes. The authors selected two protein targets with different subtypes and used InterDiff to design molecules that have the same binding modes as native drugs. The first target is the muscarinic acetylcholine receptor (mAChR), which plays an important role in central nervous system diseases such as Alzheimer's disease and schizophrenia. Xanomeline is a commonly used clinical mAChR agonist, and studies have found that it has similar binding affinities to all mAChR subtypes (M1-M5) but varying activation effects. Recent research has shown that the binding mode of Xanomeline differs between the inactive and active states of mAChR. The researchers used InterDiff to design molecules that bind to the M2 subtype of mAChR, targeting both the inactive and active state structures and using the interaction pattern of Xanomeline as the generation condition.
The second target is KRAS, a frequently mutated gene in cancer and an important therapeutic target in cancers such as lung, colorectal, and pancreatic cancer. Current KRAS inhibitors only target the KRAS G12C mutation, while non-G12C mutations account for a larger proportion of KRAS-driven cancers. Recently, Kim et al. reported a non-covalent inhibitor, BI-2865, that can bind to a wide range of KRAS mutants. Similarly, the researchers used InterDiff to design molecules targeting the KRAS protein, using the interaction pattern of BI-2865 in both KRAS G12C and another mutation, G13D, as the generation condition. The team sampled 300 molecules for each state of the two targets and analyzed the molecular interactions after docking with QuickVina. As shown in Figure 2, the generated molecules included ones with the same or more interactions than the native drugs. Four generated molecules are shown for demonstration.
Figure 2: Comparison of the binding modes of molecules designed by InterDiff in mAChR and KRAS proteins with native drugs.
In addition to de novo molecule generation, InterDiff also supports fragment-based drug design (FBDD). Researchers at MoMed Biotech found that InterDiff can generate molecules that interact with specified residues starting from given molecular fragments. This functionality is particularly useful when medicinal chemists want to optimize certain segments of a molecule while keeping the molecular scaffold intact. The experiment used the same protein targets, and 100 molecules were sampled for each state of the two targets. The fragments of the native drugs that interacted with protein residues were removed, leaving the molecular scaffolds (the removed fragments are the transparent parts of the purple molecules in Figure 3). The molecules designed from the molecular scaffold and interaction cues are shown in Figure 3.
The team successfully designed fragments that have the same interactions as the native drugs with M2 mAChRs. For the KRAS mutant G12C (PDB: 8azx) and KRAS wild type (PDB: 8azv), one out of the three interactions (ASP-69, hydrogen bond interaction) and two out of the four interactions (ASP-69, hydrogen bond interaction; TYR-64, cation-π interaction) were achieved, respectively. The accuracy of achieving interactions in the completion mode of InterDiff was lower than in the pocket-conditioned generation mode. This could be because the model has to simultaneously estimate the positions of both protein and ligand atoms during the denoising step. In contrast, in the pocket-conditioned generation mode, only the positions of ligand atoms need to be estimated. When the atom types and positions of protein atoms are estimated at the same time, errors in those predictions can reduce the accuracy of the estimated ligand atom positions.
Figure 3: Design of new molecular fragments by InterDiff in mAChR and KRAS proteins with given drug scaffold fragments.
Link to the original research paper:
https://academic.oup.com/bib/article/25/3/bbae174/7655598