September 11, 2010
Systems Analytics Ranked Top in Proficiency in Building Predictive Models Using Profiling Data
Needham, MA, September 11, 2010 – Systems Analytics’ proficiency in developing predictive models from microarray profiling data has been ranked first among the teams participating in the MAQC-II project, according to a Nature Biotechnology paper published in August.
The MicroArray Quality Control (MAQC) project is one of the most ambitious and comprehensive studies to date on microarray quality control, addressing issues including cross-laboratory and cross-platform comparisons and performance evaluation of data analysis methods for the identification of differentially expressed genes and the development and validation of predictive classification models. A consortium of government agencies, academia, and commercial participants has contributed substantial resources, time, and expertise to make this project a success.
In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans. In total, more than 30,000 models were built using many combinations of analytical methods. The teams generated predictive models without knowing the biological meaning of some of the endpoints and, to mimic clinical reality, tested the models on data that had not been used for training.
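The train-then-test-on-unseen-data protocol described above can be illustrated with a minimal sketch. This is not the MAQC-II procedure or the RAMP software: the nearest-centroid classifier is a simple stand-in chosen for brevity, and the function names and the tiny two-feature data set are hypothetical, made up purely for illustration.

```python
from statistics import mean

def fit_centroids(samples, labels):
    """Compute one mean profile (centroid) per class, using training data only."""
    centroids = {}
    for cls in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == cls]
        centroids[cls] = [mean(col) for col in zip(*rows)]
    return centroids

def predict(centroids, sample):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(sample, centroid))
    return min(centroids, key=lambda cls: dist(centroids[cls]))

def accuracy(centroids, samples, labels):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(centroids, s) == l for s, l in zip(samples, labels))
    return hits / len(labels)

# Hypothetical toy profiles (two features per sample); not real microarray data.
train_x = [[1.0, 0.2], [0.9, 0.1], [0.1, 1.0], [0.2, 0.8]]
train_y = ["pos", "pos", "neg", "neg"]

# Held-out samples that play no part in fitting the model.
valid_x = [[0.8, 0.3], [0.3, 0.9], [0.6, 0.5]]
valid_y = ["pos", "neg", "neg"]

model = fit_centroids(train_x, train_y)
train_acc = accuracy(model, train_x, train_y)
valid_acc = accuracy(model, valid_x, valid_y)

print(f"training accuracy:   {train_acc:.2f}")
print(f"validation accuracy: {valid_acc:.2f}")
```

In this toy case the model classifies every training sample correctly but misses one of the three held-out samples, which is why performance measured on unseen data is the more honest estimate.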
The study found that model prediction performance depended largely on the endpoint and team proficiency. Using Systems Analytics’ proprietary data analysis software RAMP (Robust Accurate Modeling Protocol), the company developed predictive models and submitted the sample labels predicted by these models. The predicted labels were then compared with the true labels, and the comparison showed that Systems Analytics’ prediction performance ranked at the top. A summarizing manuscript co-authored by more than 200 participants was published in the August issue of Nature Biotechnology.
Dr. John Zhang, president of Systems Analytics, said, “Predictive model building using profiling data or fingerprint data has been studied widely over the past decade. However, achieving robust and accurate predictive results across labs, platforms, chips, or experimental times is still a challenge. False ‘success stories’ due to numerous over-fitting pitfalls have been widely reported. The success of the RAMP software in the MAQC-II project is the first step towards a standardized model building procedure for reaching agreeable conclusions using datasets from different sources. Such an effort can significantly ease the prediction model building process with profiling data and enhance prediction performance.”
Many analytical techniques, such as chromatography, spectrometry and spectroscopy, generate profiling or fingerprint data. Building robust and accurate predictive models from such data is an important step towards many applications. “Besides the applications in drug discovery and medical diagnosis,” Dr. Zhang said, “RAMP has applications in Chinese medicine quality control, brand name wine or liquor confirmation, perfume and dye quality, the chemical and petroleum industry, the agriculture and food industry, fertilizer and environmental protection, and safety and efficacy studies, among many others.”
The Systems Analytics software products and services allow users to construct robust predictive models and identify potential markers and targets; utilize combined datasets from cross-lab, cross-chip, and cross-platform experiments; eliminate false “success stories” by avoiding numerous over-fitting pitfalls; accelerate the research process with an automated analysis workflow; and increase R&D productivity by focusing on the most reliable conclusions.
About Systems Analytics
Systems Analytics is a bioinformatics company dedicated to facilitating the discovery and evaluation of biomarkers of clinical, diagnostic and scientific significance. The company focuses on developing robust computational algorithms for achieving accurate, reliable and reproducible data analysis results. These algorithms are being used in genomics, proteomics and metabolic profiling studies where innovative and robust tools are required to extract useful practical knowledge from high-throughput datasets.