BIG DATA for DISCOVERY SCIENCE
The Big Data for Discovery Science Center (BDDS), composed of leading experts in biomedical imaging, genetics, proteomics, and computer science, is taking an "-ome to home" approach toward streamlining big data management, aggregation, manipulation, integration, and modeling of biological systems across spatial and temporal scales.
 
 

UTILITIES




Many labs already have their own infrastructure and workflows, so BDDS utilities are designed to work independently of other BDDS components and alongside existing frameworks. BDDS has developed utilities that let users find correlations in data (SORC Dashboard), discover outliers (BDQC), unambiguously name and identify research data products (MINID), and assemble large and complex data sets (BDBag). The BDBag package is a collection of utilities for working with BagIt packages. Ensuring data integrity during exchange between components is critical for large data sets, where records may be lost in transfer.
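The integrity concern above can be illustrated with a minimal sketch of the checksum-manifest idea that BagIt packages rely on: each payload file under data/ is listed with its digest in manifest-sha256.txt, and the receiver recomputes the digests after transfer. This is not the BDBag API. It uses only the Python standard library, the function names (sha256_of, write_manifest, verify_manifest, demo) and the sample file records.csv are hypothetical, and a complete bag needs further tag files that are omitted here.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(bag_dir: Path) -> Path:
    """Write manifest-sha256.txt listing every file under data/."""
    manifest = bag_dir / "manifest-sha256.txt"
    payload = sorted(p for p in (bag_dir / "data").rglob("*") if p.is_file())
    lines = [f"{sha256_of(p)}  {p.relative_to(bag_dir).as_posix()}" for p in payload]
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


def verify_manifest(bag_dir: Path) -> list[str]:
    """Return the payload paths that are missing or whose checksum changed."""
    problems = []
    manifest = bag_dir / "manifest-sha256.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines, e.g. an empty payload
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems


def demo() -> tuple[list[str], list[str]]:
    """Build a tiny package, then simulate a record lost in transfer."""
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp)
        (bag / "data").mkdir()
        records = bag / "data" / "records.csv"
        records.write_text("id,value\n1,0.5\n2,0.7\n", encoding="utf-8")
        write_manifest(bag)
        before = verify_manifest(bag)  # nothing has changed yet
        records.write_text("id,value\n1,0.5\n", encoding="utf-8")  # drop a row
        after = verify_manifest(bag)  # the altered file is reported
    return before, after


if __name__ == "__main__":
    print(demo())
```

Because the manifest travels with the data, the receiving component can detect a truncated or altered file without contacting the sender.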

Because biological data, annotations, and tools change constantly, validating results is difficult not only for reviewers and peers but also for data publishers. MINID addresses this problem by giving data publishers a system for unambiguously identifying data products.

Additional Utilities

Code for Ultrafast Comparison of Personal Genomes
https://github.com/gglusman/genome-fingerprints

Reproducible high-performance pipeline for generating footprints using Docker containers
Docker container (https://hub.docker.com/r/bd2kbdds/dnase_footprinting/), workflow definitions (https://bdds.globusgenomics.org/workflow/list_published)

TReNA software at Bioconductor, which uses footprints
http://bioconductor.org/packages/release/bioc/html/trena.html
The footprints can be accessed at www.trena.org, which redirects to a GitHub page with download instructions.