Uncertainty Estimation with its Application to Inverse Protein Folding

Topic: 
Uncertainty Estimation with its Application to Inverse Protein Folding
Date & Time: 
Friday, October 27, 2023 - 09:00 to 10:00
Speaker: 
Guangyong Chen, Zhejiang Lab
Location: 
Hosted via Zoom (Meeting ID: 985 6222 2731; Passcode: 403508)

Abstract

Deep learning models have attracted widespread attention for their excellent predictive performance. However, training them relies on large numbers of high-quality samples, and their predictions are exposed to the risk of overfitting, so uncertainty estimation is key to applying deep learning models reliably. In this talk, we will introduce a new uncertainty estimation method, Fisher information-based evidential deep learning (IEDL). Specifically, we use the Fisher Information Matrix (FIM) to measure how much information is carried by the evidence of each sample, and on that basis we dynamically reweight the loss terms so that the network focuses more on representation learning for uncertain classes. We further improve the generalization ability of the network by optimizing a PAC-Bayesian bound. We then demonstrate the gain that uncertainty estimates bring to an inverse protein design task. Specifically, by evaluating the uncertainty of a deep model's sequence design, we divide the task into multiple stages and design protein sequences step by step. We compare three mainstream models (GVP-GNN, ProteinMPNN, ESM) on more than five protein datasets; incorporating our method allows them, on average, to reconstruct protein sequences with a success rate of more than 10%.

Biography

Guangyong Chen is a research expert at Zhejiang Lab and a 100 Young Professor at Zhejiang University. He previously worked as an associate researcher at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, and obtained his Ph.D. from the Department of Computer Science, Chinese University of Hong Kong. His research focuses on developing advanced methods for imperfect data in drug discovery scenarios, including the mechanisms that generate imperfect training data, defense methods, and implementation in biopharmaceutical applications. He has published more than 40 papers in top-tier conferences and journals, including Nature Communications, Nature Computational Science, Nature Machine Intelligence, TPAMI, ICML, NeurIPS, and ICLR.

Seminar Series by the NYU-ECNU Center for Computational Chemistry at NYU Shanghai