BioWarehouse, a component of the Bio-SPICE project, is an open-source software environment for integrating a set of biological databases into a single physical database management system for data management, mining, and exploration.
Key features of BioWarehouse:
The BioWarehouse is populated using loader programs that translate the flat file representation of a source database into the warehouse schema. A loader is provided for each source database supported by BioWarehouse. Once a set of source DBs has been loaded into a BioWarehouse instance (running on, e.g., MySQL), those DBs can be queried together.
Some loaders are specific to a data format rather than to a single source database. For example, the BioPAX and MAGE-ML loaders can load any database that is in BioPAX or MAGE-ML format, respectively.
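To make the loader step concrete, here is a minimal Java sketch of what a loader does: read entries from a flat file and emit rows shaped for a warehouse table. It is not BioWarehouse code. The simplified EMBL-style layout (`ID`/`DE` line codes, `//` entry terminator), the `ProteinRow` record and its fields, and the sample entry are all assumptions made for illustration. A real loader handles many more line types and writes its rows to the DBMS.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: NOT the actual BioWarehouse loader implementation.
public class FlatFileLoaderSketch {

    // One row destined for a hypothetical warehouse "protein" table.
    public record ProteinRow(String sourceId, String name, String dataSet) {}

    // Parses a simplified EMBL-style flat file: each entry has an "ID" line,
    // zero or more "DE" (description) lines, and ends with "//".
    public static List<ProteinRow> parse(List<String> lines, String dataSet) {
        List<ProteinRow> rows = new ArrayList<>();
        String id = null;
        StringBuilder desc = new StringBuilder();
        for (String line : lines) {
            if (line.startsWith("//")) {
                // End of entry: emit a row if an ID was seen, then reset state.
                if (id != null) {
                    rows.add(new ProteinRow(id, desc.toString().trim(), dataSet));
                }
                id = null;
                desc.setLength(0);
            } else if (line.startsWith("ID ")) {
                // The first token after the line code is taken as the entry name.
                id = line.substring(2).trim().split("\\s+")[0];
            } else if (line.startsWith("DE ")) {
                // Multi-line descriptions are joined with single spaces.
                desc.append(line.substring(2).trim()).append(' ');
            }
            // All other line codes are ignored in this sketch.
        }
        return rows;
    }

    public static void main(String[] args) {
        // Invented sample entry, not real database content.
        List<String> input = List.of(
                "ID   EXAMPLE_1   Reviewed;   100 AA.",
                "DE   Example protein",
                "DE   spanning two lines.",
                "//");
        for (ProteinRow row : parse(input, "ExampleDB")) {
            System.out.println(row);
        }
    }
}
```

Running `main` prints one row per parsed entry; in a real loader those rows would instead be inserted into the DBMS (for example through JDBC).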
BioWarehouse loaders:
| Source DB | Contents | Language |
|---|---|---|
| BioCyc DBs | Genomes, genes, proteins, metabolic pathways, reactions, compounds | C |
| BioPAX format | BioPAX format describes biological pathway and protein interaction data. Currently, this loader can process only BioPAX Level 2 protein interaction data. | Java |
| Comprehensive Microbial Resource (CMR) | Genomes, genes, proteins, reactions | C |
| ENZYME DB | Reactions, proteins | Java |
| Eco2dbase | E. coli 2D protein gel database | Java |
| GenBank – bacteria only | Bacterial genes and proteins | Java |
| Gene Ontology | A controlled vocabulary to describe gene and gene product attributes | Java |
| | Genomes, genes, proteins, metabolic pathways, reactions, compounds | C |
| MetaCyc Ontology | The MetaCyc ontology of metabolic pathways and the MetaCyc ontology of chemical compounds | C |
| MAGE-ML format | The MAGE-ML file format describes gene expression datasets | Java |
| Taxonomy DB | Taxonomic organism classification | C |
| UniProt (Swiss-Prot and TrEMBL) | Protein knowledgebase | Java |
The BioWarehouse schema (see the schema documentation) is designed to capture as much of the data of each component DB as possible within a uniform representation.
For example, in encoding data from a set of source DBs pertaining to proteins, BioWarehouse uses a single set of schema definitions that spans all attributes of proteins found across this set of DBs. This approach eliminates the semantic heterogeneity present in these DBs, allowing users to query all protein sequence DBs using the same schema. Such sharing of tables is applied wherever practical. The translation from the component DB to the warehouse is achieved by the DB loaders, which convert the conceptualization used in each component DB into the conceptualization used by the warehouse schema.
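The shared-schema idea can be illustrated with a small, hypothetical Java sketch. Two source formats name the same protein attributes differently, a loader-side mapping function converts each source's records into one common `WarehouseProtein` record, and a single query then covers both sources. The source key names (`AC`, `locus_tag`, and so on), the record fields, and the sample values are invented and are not taken from the BioWarehouse schema.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of a shared schema; all field and key names are invented.
public class SharedSchemaSketch {

    // Single warehouse-side representation used for proteins from every source.
    public record WarehouseProtein(String dataSet, String accession, String name, int length) {}

    // Mapping for a hypothetical source "A" whose records use keys AC / DE / LEN.
    static WarehouseProtein fromSourceA(Map<String, String> rec) {
        return new WarehouseProtein("SourceA", rec.get("AC"), rec.get("DE"),
                Integer.parseInt(rec.get("LEN")));
    }

    // Mapping for a hypothetical source "B" that names the same attributes differently.
    static WarehouseProtein fromSourceB(Map<String, String> rec) {
        return new WarehouseProtein("SourceB", rec.get("locus_tag"), rec.get("product"),
                Integer.parseInt(rec.get("aa_length")));
    }

    // "Loading": convert every source record into the shared representation.
    public static List<WarehouseProtein> loadAll(List<Map<String, String>> a,
                                                 List<Map<String, String>> b) {
        return Stream.concat(a.stream().map(SharedSchemaSketch::fromSourceA),
                             b.stream().map(SharedSchemaSketch::fromSourceB))
                .collect(Collectors.toList());
    }

    // One query over the shared representation, regardless of which source a row came from.
    public static List<String> longerThan(List<WarehouseProtein> all, int minLength) {
        return all.stream()
                .filter(p -> p.length() > minLength)
                .map(p -> p.dataSet() + ":" + p.accession())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<WarehouseProtein> all = loadAll(
                List.of(Map.of("AC", "A1", "DE", "kinase", "LEN", "350")),
                List.of(Map.of("locus_tag", "B7", "product", "peptide", "aa_length", "21")));
        System.out.println(longerThan(all, 100)); // prints [SourceA:A1]
    }
}
```

Keeping all source-specific knowledge in the mapping functions means the query code never needs to know which source a protein came from; the `dataSet` field only records its origin.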
The major biological objects of the BioWarehouse and their interrelationships are depicted below. Arrows indicate that the objects in those tables can refer to entries in the same table.

For each loader, there are two pieces of documentation: how to build and run the loader, and a "manual" for developers describing the details of the loader implementation and schema mappings.
Overall documentation:
All of the overall documentation is listed in a table of contents at doc/index.html. The TOC also includes a table of statistics for all the loaders (latest supported version of the data, input size, number of objects loaded, load time, etc.).
BioWarehouse can be used in two ways:
1. Users can query the public BioWarehouse server, PublicHouse, maintained by SRI International, via SQL queries over the Internet.
2. Users can also download the BioWarehouse software distribution to create their own BioWarehouse instance containing the subset of supported BioWarehouse DBs that are of interest. This approach allows access to DBs that SRI cannot redistribute and lets each user control when new DB versions are loaded. Users can also run their BioWarehouse instance on large hardware configurations and add proprietary data to it.
The Open Source release of BioWarehouse is distributed as a zip file (BioWarehouse-<version>.zip) and is available for download here.
Release notes for the current version may be viewed here.
Projects using BioWarehouse include:
Bio-SPICE use cases
SRI’s Enzyme Genomics project
SRI’s Pathway Hole Filling project
Poster presented at the 2004 Bio-SPICE PI meeting
BioWarehouse paper in BMC Bioinformatics
The BioWarehouse team:
Peter D. Karp, Principal Investigator
Thomas J Lee
Yannick Pouliot
Valerie Wagner
Nan Guo
David Dunkley
BioWarehouse is supported by DARPA contract F30602-01-C-0153.
For support and inquiries, please contact support@biowarehouse.org.