Learning-Based Predictive Big Data Analytics

Problem

Use Big Healthcare/Biomedical Data (large, heterogeneous, incomplete, multisource, incongruent, multiscale)
Identify important data features that may be highly predictive of specific clinical outcomes (e.g., diagnosis, Dx)
Conduct individual-level inference for clinical applications
Formulate new translational research hypotheses

Solution

Our regularized linear models, machine learning tools, and high-throughput computational infrastructure enable efficient, reproducible, (near) real-time processing, analysis, and prediction on extremely large, complex, and heterogeneous datasets. This infrastructure supports open-science discovery, tool interoperability, and advanced statistical analysis that can be generalized to many big, data-intensive biomedical studies.
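As a rough illustration of how a regularized linear model can flag predictive features, the sketch below fits an L1-penalized (LASSO) regression by cyclic coordinate descent. This is an assumption made for illustration: the project summary does not say which regularizer or solver is used. The data are synthetic, and the function names (soft_threshold, lasso_coordinate_descent, make_synthetic_data) are hypothetical, not part of any project tool.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal operator of the L1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent.

    X is a list of n rows with p columns. No intercept is fit, so the
    features and outcome are assumed to be roughly centered.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # residual = y - Xw, and w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant-zero column carries no information
            # Correlation of column j with the residual that excludes feature j
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


def make_synthetic_data(n=200, p=10, seed=0):
    """Synthetic stand-in for real data: only features 0 and 3 affect the outcome."""
    rng = random.Random(seed)
    true_w = [0.0] * p
    true_w[0], true_w[3] = 2.0, -1.5
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [sum(xj * wj for xj, wj in zip(row, true_w)) + rng.gauss(0.0, 0.5)
         for row in X]
    return X, y


X, y = make_synthetic_data()
weights = lasso_coordinate_descent(X, y, alpha=0.2)
# Features with nonzero coefficients are the ones the model treats as predictive
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print("selected feature indices:", selected)
print("coefficients:", [round(wj, 3) for wj in weights])
```

The L1 penalty drives the coefficients of uninformative features to exactly zero, which is why the nonzero set can be read as a feature selection. In practice the penalty strength (alpha here) would be chosen by cross-validation rather than fixed by hand.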

Result

We have built a generic machine-learning-based infrastructure for modeling and interrogating a diverse array of data-intensive biomedical and healthcare challenges. We validated the technique on neuroimaging-genetic studies throughout the age spectrum in health and disease. This project uses BDDS tools such as Deriva, BDBags and Minids.

Reference

Methodological challenges and analytic opportunities for modeling and interpreting Big Healthcare