TopBioMarkers Sample Results

We use a colon cancer microarray dataset (Alon et al., Proc. Natl. Acad. Sci. USA, Vol. 96, pp. 6745-6750, 1999) as an example to illustrate the effectiveness of the consensus voting method for reliable feature selection. The dataset can be downloaded from http://microarray.princeton.edu/oncology/affydata/index.html.

Consensus Voting Table

In the table, the columns Rank, Voted and Index show the ranks, identities, and indices of the features selected by consensus voting. The numbers in the subsequent columns are the ranks assigned to the selected features by the corresponding individual feature selection methods. The section marked Voting Weights of All Chosen Feature Selection Methods shows the voting weight of each chosen feature selection method; the weights sum to one.
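To make the idea of weighted voting concrete, here is a minimal sketch of one possible rank aggregation scheme. It is an assumption for illustration only: the page does not state how TopBioMarkers combines the votes, so the function name `consensus_rank`, the weighted-average-of-ranks rule, and the toy gene names are all hypothetical.

```python
def consensus_rank(ranks_by_method, weights):
    """Combine per-method ranks into one ordering (hypothetical scheme).

    ranks_by_method: dict of method name -> dict of feature -> rank (1 = best)
    weights:         dict of method name -> voting weight (must sum to one)
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("voting weights must sum to one")

    features = list(next(iter(ranks_by_method.values())).keys())
    # Assumed rule: consensus score = weighted average of the individual ranks.
    score = {
        f: sum(weights[m] * ranks_by_method[m][f] for m in ranks_by_method)
        for f in features
    }
    # Lower weighted rank is better; ties are broken by feature name.
    return sorted(features, key=lambda f: (score[f], f))


# Toy example with two methods and three made-up genes.
ranks = {
    "t_test": {"geneA": 1, "geneB": 2, "geneC": 3},
    "fold_change": {"geneA": 3, "geneB": 1, "geneC": 2},
}
weights = {"t_test": 0.5, "fold_change": 0.5}
consensus = consensus_rank(ranks, weights)
```

With equal weights, a gene ranked moderately well by both methods (geneB) comes out ahead of one ranked first by a single method and last by the other (geneA).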

Performance Assessment in terms of Reproducibility and Prediction Accuracy

The columns in the table below show each selected feature's rank, identity, index in the input expression data file, feature score, p-value from the t-test, fold change, regulation direction, and fraction of appearances in a leave-one-out feature selection process. Two additional tables give the number of misclassified samples in each group and overall, and detailed information about the misclassified samples.
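The "fraction of appearances" column can be illustrated with a short sketch: each sample is left out in turn, feature selection is rerun on the remaining samples, and the fraction of runs in which a feature is selected is recorded. The scoring function below (absolute difference of group means) is a stand-in chosen for brevity, not the method used by the tool, and all names and data are hypothetical.

```python
def top_k_by_mean_difference(samples, labels, k):
    """Stand-in selector: rank features by |mean(group 1) - mean(group 0)|."""
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        g0 = [s[j] for s, y in zip(samples, labels) if y == 0]
        g1 = [s[j] for s, y in zip(samples, labels) if y == 1]
        scores.append(abs(sum(g1) / len(g1) - sum(g0) / len(g0)))
    order = sorted(range(n_features), key=lambda j: (-scores[j], j))
    return order[:k]


def loo_appearance_fraction(samples, labels, k, select=top_k_by_mean_difference):
    """Fraction of leave-one-out runs in which each feature is selected."""
    n = len(samples)
    counts = [0] * len(samples[0])
    for i in range(n):
        # Drop sample i and rerun the selection on the rest.
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        for j in select(train_x, train_y, k):
            counts[j] += 1
    return [c / n for c in counts]


# Toy data: 6 samples x 3 features; only feature 0 separates the two groups.
samples = [
    [1.0, 5.0, 2.0],
    [1.2, 5.1, 2.5],
    [0.9, 4.9, 1.5],
    [3.0, 5.0, 2.1],
    [3.1, 5.2, 2.6],
    [2.9, 4.8, 1.4],
]
labels = [0, 0, 0, 1, 1, 1]
fractions = loo_appearance_fraction(samples, labels, k=1)
```

A fraction close to one suggests that the feature is selected regardless of which sample is held out, which is one way to read reproducibility.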

Comparison with Individual Feature Selection Methods

The t-test and fold change are both frequently used in feature selection, and there is a continuing debate as to which of the two methods should be used.

The figure below consists of three volcano plots, in which features (genes) are selected by (a) fold change, with a p-value cut-off of p = 0.05; (b) p-value, with a fold change cut-off of FC = 2; and (c) the consensus voting method, without cut-off values. The red spots are the top 20 genes selected, and the associated numbers are their ranks.

Figure 3(a) shows that the top features lie at the two extreme sides of the plot. The closer a feature is to the middle, at log2(Fold Change) = 0, the lower its rank. The twenty selected features have high fold changes, although some have p-values close to 0.05.
Figure 3(b) shows that the higher a spot is in the plot, the higher its rank. The twenty selected features have very low p-values, although some have low fold changes.
Figure 3(c) illustrates the effectiveness of the consensus voting method. The top features are located at the two upper corners of the plot. The closer the spots are to the origin, the lower their ranks. The twenty selected features have both high fold changes and low p-values.
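The two cut-off based selections in panels (a) and (b) can be sketched as follows. This is an illustrative reading of the description, with several assumptions: fold change is taken as a ratio, the cut-offs are applied as p < 0.05 and |log2 FC| >= log2(2), and the function names and gene records are made up.

```python
import math


def top_by_fold_change(genes, p_cutoff=0.05, k=20):
    """Rule (a): keep genes with p < p_cutoff, rank by |log2 FC| descending."""
    kept = [g for g in genes if g["p"] < p_cutoff]
    kept.sort(key=lambda g: -abs(math.log2(g["fc"])))
    return [g["name"] for g in kept[:k]]


def top_by_p_value(genes, fc_cutoff=2.0, k=20):
    """Rule (b): keep genes with |log2 FC| >= log2(fc_cutoff), rank by p ascending."""
    threshold = math.log2(fc_cutoff)
    kept = [g for g in genes if abs(math.log2(g["fc"])) >= threshold]
    kept.sort(key=lambda g: g["p"])
    return [g["name"] for g in kept[:k]]


# Hypothetical records: fold change as a ratio, p-value from a t-test.
genes = [
    {"name": "g1", "fc": 8.0, "p": 0.04},
    {"name": "g2", "fc": 2.5, "p": 0.0001},
    {"name": "g3", "fc": 1.2, "p": 0.00001},
    {"name": "g4", "fc": 0.1, "p": 0.2},
    {"name": "g5", "fc": 0.25, "p": 0.001},
]
by_fc = top_by_fold_change(genes, k=2)
by_p = top_by_p_value(genes, k=2)
```

In this toy data, rule (a) admits g1 despite its p-value being close to the cut-off, while rule (b) favours g2 despite its modest fold change; g5, which is strong on both criteria, is picked by both rules.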
Copyright © 2006-2012 - Systems Analytics - Designed by Weboart