The Big Data for Discovery Science Center (BDDS), composed of leading experts in biomedical imaging, genetics, proteomics, and computer science, is taking an "-ome to home" approach toward streamlining big data management, aggregation, manipulation, integration, and modeling of biological systems across spatial and temporal scales.

Learning-Based Predictive Big Data Analytics


  1. Use Big Healthcare/Biomedical Data (large, heterogeneous, incomplete, multisource, incongruent, multiscale)
  2. Identify important data features that may be highly predictive of specific clinical outcomes (e.g., diagnosis, Dx)
  3. Conduct individual-level inference for clinical applications
  4. Formulate new translational research hypotheses


Our regularized linear models, machine learning tools, and high-throughput computational infrastructure enable efficient and reproducible (near) real-time processing, analysis, and prediction on extremely large, complex, and heterogeneous datasets. This infrastructure supports open-science discovery, tool interoperability, and advanced statistical analysis that can be generalized to many data-intensive biomedical studies.
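
As a minimal illustration of how a regularized linear model can identify predictive features and then produce individual-level predictions, the sketch below fits an L1-penalized (lasso) regression by coordinate descent on synthetic data. This is not the BDDS implementation: the choice of lasso, the function names (fit_lasso, predict), the penalty value, and the simulated "subjects" and "features" are all assumptions made for the example.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso(X, y, penalty, n_sweeps=200):
    """Minimize (1/2n)*||y - Xw||^2 + penalty*||w||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant-zero feature carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


def predict(w, x):
    """Individual-level prediction for one subject's feature vector x."""
    return sum(wj * xj for wj, xj in zip(w, x))


# Synthetic cohort (hypothetical): only features 0 and 3 drive the outcome.
rng = random.Random(0)
n_subjects, n_features = 200, 10
true_w = [2.0, 0.0, 0.0, -1.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_subjects)]
y = [predict(true_w, row) + rng.gauss(0, 0.1) for row in X]

weights = fit_lasso(X, y, penalty=0.1)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
mean_abs_error = sum(abs(predict(weights, r) - t) for r, t in zip(X, y)) / n_subjects
print("selected features:", selected)
print("mean absolute error:", mean_abs_error)
```

The L1 penalty drives the weights of uninformative features to exactly zero, so the nonzero entries of weights serve as the selected features, and predict applies the fitted model to a single subject. Real studies would add held-out validation, feature standardization, and a data-driven choice of the penalty.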


We have built a generic machine-learning-based infrastructure for modeling and interrogating a diverse array of data-intensive biomedical and healthcare challenges. We validated the technique on neuroimaging-genetic studies throughout the age spectrum in health and disease. BDDS tools such as Deriva, BDBags and Minids are being used by this project.


Methodological challenges and analytic opportunities for modeling and interpreting Big Healthcare