BIG DATA for DISCOVERY SCIENCE
The Big Data for Discovery Science Center (BDDS) - composed of leading experts in biomedical imaging, genetics, proteomics, and computer science - is taking an "-ome to home" approach toward streamlining big data management, aggregation, manipulation, integration, and the modeling of biological systems across spatial and temporal scales.
 
Now Available

BDBAG

The BD Bag software allows researchers to address a significant Big Data challenge: assembling, identifying, and providing access to subsets of data in a large and complex data collection workflow, such as one that runs from a catalog search to an analysis pipeline and then to a publication service. This collection of utilities works with BagIt packages that conform to the BDDS BagIt and BDDS BagIt/RO (https://github.com/ResearchObject/bagit-ro) profiles. A unique aspect of this work is that the aggregated data need not be co-located: a data collection can be uniquely identified even when its large elements remain in cloud or enterprise storage. This is critical for big data elements, where the cost of transferring the data can be prohibitive. Another important feature is the use of JSON-LD, which provides a standard way to link metadata with existing ontologies and vocabularies. As the first example use of JSON-LD metadata, a model has been developed for representing ontology-based file types.
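
To make the idea of a collection whose large elements stay in remote storage more concrete, the following is a minimal sketch that uses only the Python standard library. It is not the BD Bag API. It writes the files that the BagIt format defines for this case: a bag declaration, a SHA-256 manifest, and a fetch.txt that lists remote elements by URL, length, and path. The function name write_holey_bag, the file names, and the example.org URL are hypothetical and chosen only for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def write_holey_bag(bag_dir, local_files, remote_files):
    """Write a minimal BagIt-style bag (illustrative sketch, not the BD Bag API).

    local_files: dict mapping a payload file name to the bytes stored in the bag.
    remote_files: list of (url, length, sha256_hex, name) tuples describing
        payload that stays in remote storage and is only referenced.
    """
    bag = Path(bag_dir)
    (bag / "data").mkdir(parents=True, exist_ok=True)

    # Bag declaration required by the BagIt format.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    manifest_lines = []
    for name, content in sorted(local_files.items()):
        # Local payload is copied into data/ and checksummed.
        (bag / "data" / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")

    fetch_lines = []
    for url, length, digest, name in remote_files:
        # Remote payload appears in the manifest but is not copied into the bag.
        manifest_lines.append(f"{digest}  data/{name}")
        fetch_lines.append(f"{url} {length} data/{name}")

    (bag / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    if fetch_lines:
        (bag / "fetch.txt").write_text(
            "\n".join(fetch_lines) + "\n", encoding="utf-8"
        )
    return bag


if __name__ == "__main__":
    # Hypothetical checksum and URL; a real workflow would take these
    # from a catalog or storage service.
    remote_digest = hashlib.sha256(b"placeholder").hexdigest()
    with tempfile.TemporaryDirectory() as tmp:
        bag = write_holey_bag(
            Path(tmp) / "example_bag",
            {"notes.txt": b"small local file\n"},
            [("https://example.org/scans/big.nii.gz", 123456789,
              remote_digest, "big.nii.gz")],
        )
        print((bag / "fetch.txt").read_text(encoding="utf-8"))
```

Because the manifest records a checksum for every element, including the remote ones, a recipient can retrieve the referenced files later and verify them against the bag without the sender ever moving the large data.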

Please note that this software is stable, beta-quality code suitable for development and testing, but not for production use. Because this is a pre-release, backwards compatibility is not guaranteed and the API may change before the official 1.0 release. For more information or to file bug reports, see the project GitHub repository at http://github.com/ini-bdds/bdbag. Released versions of the software can be downloaded from https://github.com/ini-bdds/bdbag/releases.

Minids have been used extensively in the following “Use Cases”: