NcPred for accurate nuclear protein prediction using n-mer statistics with various classification algorithms
Md. Saiful Islam, Alaol Kabir, Kazi Sakib, and Md. Alamgir Hossain
Abstract Prediction of nuclear proteins is one of the major challenges in genome annotation. A method, NcPred, is described for predicting nuclear proteins with higher accuracy by exploiting n-mer statistics with different classification algorithms, namely Alternating Decision (AD) Tree, Best First (BF) Tree, Random Tree and Adaptive (Ada) Boost. On the BaCello dataset [1], NcPred improves accuracy by about 20% with Random Tree and sensitivity by about 10% with Ada Boost for animal proteins compared to existing techniques. It also increases the accuracy of fungal protein prediction by 20% and recall by 4% with AD Tree. For human proteins, accuracy improves by about 25% and sensitivity by about 10% with BF Tree. Performance analysis of NcPred demonstrates its advantage over contemporary in silico nuclear protein classification methods.
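To make the terms in the abstract concrete, the following is a minimal sketch of how n-mer statistics can be turned into a fixed-length feature vector, and how accuracy and sensitivity (recall) are computed from predictions. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the value of n or the exact feature encoding, so n = 2, overlapping windows, relative frequencies, and the function names nmer_frequencies and accuracy_and_sensitivity are all hypothetical choices.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def nmer_frequencies(sequence, n=2):
    """Relative frequency of every possible n-mer in a protein sequence.

    Assumption: overlapping windows, and windows containing non-standard
    symbols (e.g. X, B, Z) are skipped. The source does not specify this.
    """
    sequence = sequence.upper()
    windows = [sequence[i:i + n] for i in range(len(sequence) - n + 1)]
    valid = [w for w in windows if all(c in AMINO_ACIDS for c in w)]
    counts = Counter(valid)
    total = len(valid)
    # One entry per possible n-mer (20**n entries), in a fixed order,
    # so every protein maps to a vector of the same length.
    return [counts["".join(p)] / total if total else 0.0
            for p in product(AMINO_ACIDS, repeat=n)]


def accuracy_and_sensitivity(y_true, y_pred):
    """Accuracy and sensitivity for binary labels (1 = nuclear, 0 = other)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, sensitivity


if __name__ == "__main__":
    features = nmer_frequencies("MKRPAAKKL", n=2)
    print(len(features))  # 400 possible 2-mers
    print(accuracy_and_sensitivity([1, 1, 0, 0], [1, 0, 0, 0]))
```

Vectors of this kind could then be passed to any of the tree-based or boosting classifiers named in the abstract. Sensitivity here is the fraction of true nuclear proteins that are predicted as nuclear, which is the same quantity as recall.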
1 Introduction
The nucleus, popularly known as the control center of the cell, is the central unit of eukaryotic cells [2]. Unlike other organelles, its function is regulated by two genomes due to the presence of an explicit nuclear genome. It performs a plethora of biochemical reactions, such as oxidative phosphorylation, the Krebs cycle, DNA replication, transcription, and translation. In addition, nuclei are involved in apoptosis and ionic homeostasis [3]. Because of their multidimensional utility, nuclear proteins are associated with several diseases, including xeroderma pigmentosum, Fanconi anaemia, Bloom syndrome, ataxia telangiectasia, and retinoblastoma [4].
Md. S. Islam · A. Kabir
Institute of Information Technology, University of Dhaka, Bangladesh. e-mail: saifulit@univdhaka.edu, alaol kabir@yahoo.com
K. Sakib · Md. A. Hossain
Department of