Systems Analytics develops and applies state-of-the-science computational technologies for the analysis of high-throughput data, specifically for the identification of advanced biomarkers and the construction of predictive models. We emphasize the development and application of robust bioinformatics tools that generate accurate, reproducible, reliable, and interpretable analysis results.
These tools can be applied to identify biomarkers such as genes, proteins, metabolites, image markers, and their combinations; identify panels of biomarkers; calibrate and standardize the output of analytical instruments for quantitative analysis; adjust measurement data affected by severe systematic bias (batch effects) for comparative studies; automatically develop accurate and robust predictive models; and perform biological system modeling, simulation, and behavior analysis.
Despite the wealth of analysis algorithms for feature selection and sample classification, most computational approaches for predictive model construction suffer from model over-training, or over-fitting. As a result, the constructed predictive models, or classifiers, often achieve relatively high classification performance on the construction dataset but have poor predictive power on independent validation datasets. It is therefore necessary to increase the robustness of both the computational approaches and the resulting predictive models.
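The gap between construction-set and independent-set performance can be illustrated with a small simulation. The following is a minimal sketch, not our production pipeline: it assumes pure-noise data (labels carry no signal), a simple mean-difference feature filter, and a nearest-centroid classifier. All function names, the sample sizes, and the random seed are hypothetical choices made for illustration.

```python
import random


def make_noise_dataset(n_samples, n_features, rng):
    """Pure-noise data: labels are independent of every feature."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # balanced, arbitrary labels
    return X, y


def class_means(X, y, features):
    """Per-class mean of each listed feature, in the order given."""
    means = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        means[label] = [sum(r[j] for r in rows) / len(rows) for j in features]
    return means


def select_features(X, y, k):
    """Keep the k features with the largest gap between class means."""
    all_features = list(range(len(X[0])))
    means = class_means(X, y, all_features)
    gaps = [abs(means[0][j] - means[1][j]) for j in all_features]
    return sorted(all_features, key=lambda j: gaps[j], reverse=True)[:k]


def predict(x, means, features):
    """Nearest-centroid rule restricted to the selected features."""
    def squared_distance(label):
        return sum((x[j] - m) ** 2 for j, m in zip(features, means[label]))
    return 0 if squared_distance(0) <= squared_distance(1) else 1


def accuracy(X, y, means, features):
    hits = sum(predict(x, means, features) == t for x, t in zip(X, y))
    return hits / len(y)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
X_train, y_train = make_noise_dataset(40, 500, rng)  # construction dataset
X_valid, y_valid = make_noise_dataset(40, 500, rng)  # independent dataset

# Features and centroids are both fitted on the construction dataset only.
features = select_features(X_train, y_train, k=10)
means = class_means(X_train, y_train, features)

train_acc = accuracy(X_train, y_train, means, features)
valid_acc = accuracy(X_valid, y_valid, means, features)
print(f"construction-set accuracy: {train_acc:.2f}")
print(f"independent-set accuracy:  {valid_acc:.2f}")
```

Because the labels are unrelated to the features, any accuracy well above chance on the construction dataset reflects over-fitting rather than real signal, and the independent dataset should score near chance.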
We strive to address the following challenges during our data analysis:
1. How to effectively avoid over-fitting during predictive model construction,
2. How to avoid serious leakage of class label information in the data analysis procedure,
3. How to effectively treat the unbalanced sample-size problem,
4. How to effectively account for the effect of sample variability,
5. How to effectively remove the batch effect,
6. What performance evaluation criteria to use for selecting model construction processes and constructing predictive models,
7. How to optimize the algorithms and reduce computation time, and
8. How to apply the constructed predictive models to future datasets.
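Challenges 3 and 6 interact: with unbalanced class sizes, the choice of evaluation criterion can change which model looks best. The sketch below compares plain accuracy with balanced accuracy (the mean of sensitivity and specificity) on a hypothetical cohort of 95 controls and 5 cases. The function names and the cohort are assumptions made for illustration, and balanced accuracy is shown as one possible criterion, not necessarily the one we use.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a two-class problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def plain_accuracy(y_true, y_pred):
    """Fraction of samples classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; each class counts equally."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical unbalanced cohort: 95 controls (0) and 5 cases (1).
y_true = [0] * 95 + [1] * 5
# A trivial classifier that always predicts the majority class.
y_pred = [0] * 100

acc = plain_accuracy(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
print(f"plain accuracy:    {acc:.2f}")
print(f"balanced accuracy: {bal:.2f}")
```

The always-majority classifier scores 0.95 on plain accuracy while detecting none of the cases; balanced accuracy reports 0.50, the chance level, which is why a class-aware criterion is often preferable when sample sizes are unbalanced.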