TopBioMarkers® is a desktop application for analyzing genomic, proteomic, and metabolomic expression profile data to identify robust and accurate biomarkers. The software package incorporates Systems Analytics' proprietary technology, which integrates well-known feature selection methods and classification algorithms through a computational scheme that includes consensus voting and objective ranking techniques, as well as measures of feature list quality derived from estimates of reproducibility and accuracy.
One major hurdle in microarray data analysis is the large number of genes relative to the limited number of samples, which makes selecting “good” genes a difficult problem. No single feature selection method can be universally applied to all types of data, and there is no a priori knowledge as to which method is best suited to a given data set, so it is difficult to decide which method, and which selected features, will yield the best results. Scientists from different backgrounds may also have different preferences in choosing the “best” methods.
To overcome this difficulty, we have developed an algorithm based on a weighted voting method to increase the robustness of the feature selection process. Multiple individual feature selection methods are first applied, each generating its own feature list. These methods are chosen because each emphasizes a different aspect of the data when selecting features. The final feature list is then obtained by consensus voting over these lists. Each feature’s rank in the final list is determined by how often it occurs across the individual lists and by its rank in each of them.
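As a rough illustration of the consensus voting idea, the Python sketch below merges several ranked feature lists into one. It is not TopBioMarkers' proprietary algorithm: the function name `consensus_rank`, the per-method `weights`, and the Borda-style scoring rule (higher positions earn more points, and points add up across lists) are all assumptions made for this example.

```python
from collections import defaultdict


def consensus_rank(feature_lists, weights=None):
    """Merge ranked feature lists into a single consensus ranking.

    Hypothetical scoring rule (an assumption, not the product's method):
    a feature at position `pos` in a list of length n earns
    weight * (n - pos) / n points. Points are summed over all lists, so
    features that appear often and rank high come out on top.
    """
    if weights is None:
        weights = [1.0] * len(feature_lists)
    if len(weights) != len(feature_lists):
        raise ValueError("need exactly one weight per feature list")

    votes = defaultdict(int)    # number of lists containing each feature
    score = defaultdict(float)  # accumulated weighted rank points

    for ranked, weight in zip(feature_lists, weights):
        n = len(ranked)
        for pos, feature in enumerate(ranked):
            votes[feature] += 1
            score[feature] += weight * (n - pos) / n

    # Highest score first; ties go to the feature found in more lists,
    # then to alphabetical order so the result is deterministic.
    return sorted(score, key=lambda f: (-score[f], -votes[f], f))


# Three made-up feature lists, one per selection method.
lists = [
    ["g1", "g2", "g3"],
    ["g2", "g1", "g4"],
    ["g2", "g3", "g5"],
]
print(consensus_rank(lists))  # ['g2', 'g1', 'g3', 'g4', 'g5']
```

Here `g2` ranks first because it appears in all three lists and leads two of them, while `g4` and `g5` each appear only once, in last place. Passing unequal `weights` would let one selection method count for more than another.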
The following flowchart illustrates TopBioMarkers’ workflow: