TopBioMarkers Sample Results
We use a colon cancer microarray data set (Alon et al., Proc. Natl. Acad. Sci. USA, Vol. 96, pp. 6745-6750, 1999) as an example to illustrate the effectiveness of the consensus voting method for reliable feature selection. The data set can be downloaded from http://microarray.princeton.edu/oncology/affydata/index.html.
Consensus Voting Table
In the table, the columns Rank, Voted and Index show the ranks, identities, and indices of the features selected by consensus voting. The numbers in the subsequent columns are the ranks assigned to the selected features by the corresponding individual feature selection methods. The section marked Voting Weights of All Chosen Feature Selection Methods lists the voting weight of each chosen feature selection method; the weights sum to one.
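To make the relationship between the per-method rank columns and the voting weights concrete, here is a minimal sketch of one way to combine ranks with weights that sum to one: a weighted mean rank. This is an illustrative assumption, not necessarily the rule TopBioMarkers applies; the function name consensus_ranking, the method labels, and the toy ranks are all hypothetical.

```python
from typing import Dict, List


def consensus_ranking(ranks_by_method: Dict[str, List[int]],
                      weights: Dict[str, float]) -> List[int]:
    """Order feature indices by weighted mean rank, best first.

    ASSUMPTION: a weighted mean of per-method ranks stands in for the
    consensus vote; the real voting rule is not described in the text.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("voting weights must sum to one")
    n_features = len(next(iter(ranks_by_method.values())))
    scores = [
        sum(weights[m] * ranks_by_method[m][i] for m in ranks_by_method)
        for i in range(n_features)
    ]
    # Lower weighted rank is better; ties are broken by feature index.
    return sorted(range(n_features), key=lambda i: (scores[i], i))


# Hypothetical ranks of four features under two methods (1 = best).
ranks = {"t_test": [1, 2, 3, 4], "fold_change": [3, 1, 2, 4]}
weights = {"t_test": 0.5, "fold_change": 0.5}
print(consensus_ranking(ranks, weights))
```

With equal weights, a feature ranked well by both methods moves ahead of one that is ranked first by a single method only.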
Performance Assessment in terms of Reproducibility and Prediction Accuracy
The columns in the table below show each selected feature’s rank, identity, index in the input expression data file, feature score, p-value from the t-test, fold change, regulation direction, and fraction of appearances in a leave-one-out feature selection process. Two additional tables give the number of misclassified samples in each group and across all groups, and detailed information about the misclassified samples.
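The fraction of appearances can be read as follows: each sample is left out in turn, feature selection is rerun on the remaining samples, and the fraction of runs in which a feature is selected is reported. The sketch below illustrates that idea under stated assumptions: features are scored by the absolute Welch t statistic, the top k are kept, and class labels are 0 and 1. The function names and the toy data are hypothetical, and the scoring used by TopBioMarkers may differ.

```python
from collections import Counter
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two groups of values."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def top_k(X, y, k):
    """Indices of the k features with the largest |t| between classes 0 and 1."""
    scores = []
    for j in range(len(X[0])):
        g0 = [row[j] for row, lab in zip(X, y) if lab == 0]
        g1 = [row[j] for row, lab in zip(X, y) if lab == 1]
        scores.append(abs(welch_t(g0, g1)))
    return sorted(range(len(scores)), key=lambda j: (-scores[j], j))[:k]


def loo_appearance_fraction(X, y, k):
    """For each feature selected on the full data, the fraction of
    leave-one-out runs in which it is selected again."""
    counts = Counter()
    n = len(X)
    for left_out in range(n):
        X_train = X[:left_out] + X[left_out + 1:]
        y_train = y[:left_out] + y[left_out + 1:]
        counts.update(top_k(X_train, y_train, k))
    return {j: counts[j] / n for j in top_k(X, y, k)}


# Toy data (rows = samples, columns = features); only feature 0 separates the classes.
X = [[1.0, 5.0, 2.0], [1.2, 4.0, 2.5], [0.9, 6.0, 1.5],
     [5.0, 5.5, 2.2], [5.3, 4.5, 1.8], [4.8, 5.0, 2.4]]
y = [0, 0, 0, 1, 1, 1]
print(loo_appearance_fraction(X, y, k=1))
```

A fraction close to one indicates that the feature is selected regardless of which sample is removed, which is the sense of reproducibility used in this section.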
Comparison with Individual Feature Selection Methods
The t-test and fold change are both frequently used for feature selection, and there is a continuing debate as to which of the two should be used.
The figure below consists of three volcano plots: (a) using fold change, with p-value cutoff p = 0.05, (b) using p-value, with fold change cutoff FC = 2, and (c) using the consensus voting method, without cutoff values, to select features (genes). The red spots are the top 20 genes selected, and the associated numbers are their ranks.
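The two cutoff-based selections in panels (a) and (b) can be sketched as below. The sketch assumes that (a) ranks genes by absolute log2 fold change among those with p below 0.05, and that (b) ranks genes by p-value among those with at least a two-fold change; this reading of the panels, the function names, and the toy values are assumptions. A volcano plot conventionally places log2 fold change on the x-axis and -log10 of the p-value on the y-axis.

```python
import math


def volcano_xy(gene):
    """Volcano plot coordinates: (log2 fold change, -log10 p-value)."""
    return gene["log2fc"], -math.log10(gene["p"])


def select_by_fold_change(genes, p_cutoff=0.05, top_n=20):
    """Panel (a), as assumed here: filter by p-value, then rank by |log2 FC|."""
    kept = [g for g in genes if g["p"] < p_cutoff]
    kept.sort(key=lambda g: -abs(g["log2fc"]))
    return [g["name"] for g in kept[:top_n]]


def select_by_p_value(genes, fc_cutoff=2.0, top_n=20):
    """Panel (b), as assumed here: filter by fold change, then rank by p-value."""
    kept = [g for g in genes if abs(g["log2fc"]) >= math.log2(fc_cutoff)]
    kept.sort(key=lambda g: g["p"])
    return [g["name"] for g in kept[:top_n]]


# Hypothetical genes with precomputed log2 fold changes and t-test p-values.
genes = [
    {"name": "g1", "log2fc": 3.0, "p": 0.20},
    {"name": "g2", "log2fc": -2.0, "p": 0.001},
    {"name": "g3", "log2fc": 1.1, "p": 0.0001},
    {"name": "g4", "log2fc": 1.5, "p": 0.01},
]
print(select_by_fold_change(genes, top_n=2))
print(select_by_p_value(genes, top_n=2))
```

On the toy values the two rules return different gene lists, which illustrates why the choice between them is debated and why a method without fixed cutoffs, as in panel (c), is of interest.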