Branchpoints provide important clues about alternative splicing, so predicting human branchpoints is an important problem. In this paper, we aim to build high-accuracy branchpoint prediction models using machine learning techniques. We observe that almost all branchpoints are located between positions -50 and -11 upstream of the 3' splice site (3'SS, the 3' end of the intron), and that one intron may contain multiple branchpoints. We therefore recast the original problem as a multi-label learning task whose prediction targets are binary vectors representing the presence or absence of a branchpoint at each position between -50 and -11. We then extract a diverse set of intron sequence-derived features that characterize branchpoints, and consider several multi-label learning methods to model the relationship between these features and branchpoint locations. Finally, we adopt an average scoring-based strategy to integrate the different methods and features into the final prediction model. Computational experiments on an experimentally verified dataset demonstrate that the proposed method produces better results than other state-of-the-art methods. More importantly, the method can predict not only ‘A’ branchpoints but also other types of branchpoints.
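The label encoding and the average scoring step described above can be illustrated with a minimal Python sketch. It is not the code from the paper: the function names, the 0.5 decision threshold, and the per-model score vectors are assumptions made for illustration. Only the -50 to -11 window and the idea of averaging scores come from the text.

```python
# Window of candidate positions relative to the 3' splice site (from the text).
WINDOW_START = -50
WINDOW_END = -11
WINDOW_SIZE = WINDOW_END - WINDOW_START + 1  # 40 positions


def encode_branchpoints(positions):
    """Return a 40-element binary label vector for one intron.

    `positions` holds branchpoint offsets relative to the 3'SS (e.g. -24).
    Offsets outside the window are ignored.
    """
    labels = [0] * WINDOW_SIZE
    for pos in positions:
        if WINDOW_START <= pos <= WINDOW_END:
            labels[pos - WINDOW_START] = 1
    return labels


def average_scores(score_lists):
    """Average per-position scores from several models.

    Each element of `score_lists` is one model's score vector
    (hypothetical inputs; the paper's models are not reproduced here).
    """
    if not score_lists:
        raise ValueError("need at least one score vector")
    n = len(score_lists[0])
    if any(len(scores) != n for scores in score_lists):
        raise ValueError("all score vectors must have the same length")
    return [sum(column) / len(score_lists) for column in zip(*score_lists)]


def decode_predictions(scores, threshold=0.5):
    """Map averaged scores back to offsets; the threshold is an assumption."""
    return [WINDOW_START + i for i, s in enumerate(scores) if s >= threshold]
```

For example, `encode_branchpoints([-24, -30])` sets exactly two entries of the 40-element vector, so an intron with several branchpoints is represented by a single multi-label target.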
We have developed a web server that predicts branchpoints and lets you query branchpoints by human gene.
We also provide a desktop app based on Electron for accessing our service. You can download it here.
Multi-label learning approaches to the prediction of human splicing branchpoints, BIBM 2016. Co-corresponding authors: Wen Zhang, Zhiping Weng