KEGG Loader for Bio-SPICE Warehouse

Version 4.2


(C) 2004 SRI International. All Rights Reserved.  See BioWarehouse Overview for license details.




Introduction
Limitations
Installation and Building
Obtaining KEGG Data
KEGG Dataset
PATHWAY
GENOME
GENES
LIGAND: COMPOUND
LIGAND: ENZYME
References


Introduction

This document describes version 4.2 of the KEGG Loader. It is one of several database loaders comprising the Bio-SPICE Warehouse.

KEGG (the Kyoto Encyclopedia of Genes and Genomes) is a collection of databases curated by the Bioinformatics Center at the Institute for Chemical Research at Kyoto University. KEGG is available online at http://www.genome.ad.jp/kegg/. KEGG contains five types of data:

  1. Pathway maps
  2. Ortholog group tables
  3. Molecular catalogs
  4. Genome maps
  5. Gene catalogs

KEGG contains three major components:

  1. PATHWAY: metabolic and regulatory pathways
  2. GENES: gene catalogs for organisms
  3. LIGAND: enzyme reactions and chemical compounds

LIGAND was originally started by Takaaki Nishioka, and is now maintained in collaboration with the KEGG project. LIGAND itself is composed of three databases:

  1. COMPOUND: collection of chemical compounds that are related to various cellular processes
  2. REACTION: collection of reactions (mostly enzymatic reactions) involving the compounds from COMPOUND
  3. ENZYME: the enzyme nomenclature

This document describes the semantic mapping from the KEGG database components PATHWAY, GENES and LIGAND to their representation in the Bio-SPICE data warehouse. A section is dedicated to each of the KEGG components, defining its mapping to the Bio-SPICE schema.

Overview of Bio-SPICE Warehouse Schema

The Bio-SPICE warehouse schema contains the data definition statements for the Bio-SPICE Warehouse. These statements define four types of tables: constant tables, object tables, linking tables, and special tables.

Constant tables specify scientific data such as information from the Periodic Table of Elements, as well as constants used as column values in various warehouse tables.

Object tables describe a type of entity in a source database, such as compounds and proteins. Each column of an object table specifies a parameter that characterizes the object. In addition to the parameters defined by the source database, the loader assigns a unique warehouse ID (WID) to each object, which is used by other tables to reference the object.

A special type of warehouse object is the dataset. A dataset object is created for each dataset loaded into the warehouse; e.g., the SWISS-PROT loader adds one row to the DataSet table each time it is run. Its WID is referred to as the dataset WID and is a column in each object table, specifying the source database of the object.

A linking table describes relationships among objects. It contains the WIDs of the associated objects, and any additional columns needed to characterize the relationship. In general, many-to-many relationships are supported. Special tables exist to capture reference and cross-reference information and to facilitate lookup of objects.
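
As a rough illustration of how these table types fit together, the following Python sketch builds a toy in-memory SQLite database with a dataset table, two object tables, and one linking table. The column lists are heavily simplified assumptions, and the WidAllocator helper is hypothetical; the authoritative definitions are the schema files shipped with the distribution.

```python
import sqlite3


def build_toy_warehouse():
    """Create a tiny in-memory database mimicking the table types above."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        -- dataset table: one row per loaded dataset
        CREATE TABLE DataSet  (WID INTEGER PRIMARY KEY, Name TEXT);
        -- object tables: one row per entity, tagged with its dataset WID
        CREATE TABLE Chemical (WID INTEGER PRIMARY KEY, Name TEXT, DataSetWID INTEGER);
        CREATE TABLE Reaction (WID INTEGER PRIMARY KEY, ECNumber TEXT, DataSetWID INTEGER);
        -- linking table: relates objects by WID, plus relationship columns
        CREATE TABLE Reactant (ReactionWID INTEGER, OtherWID INTEGER, Coefficient INTEGER);
    """)
    return con


class WidAllocator:
    """Hypothetical stand-in for the warehouse-wide WID counter."""

    def __init__(self, start=1):
        self._next = start

    def next_wid(self):
        wid = self._next
        self._next += 1
        return wid


def load_example(con):
    """Insert one dataset, one chemical, one reaction, and the link between them."""
    wids = WidAllocator()
    dataset = wids.next_wid()
    con.execute("INSERT INTO DataSet VALUES (?, ?)", (dataset, "KEGG Database"))
    chem = wids.next_wid()
    con.execute("INSERT INTO Chemical VALUES (?, ?, ?)", (chem, "ATP", dataset))
    rxn = wids.next_wid()
    con.execute("INSERT INTO Reaction VALUES (?, ?, ?)", (rxn, "2.7.1.1", dataset))
    con.execute("INSERT INTO Reactant VALUES (?, ?, ?)", (rxn, chem, 1))
    return dataset, chem, rxn


def reactant_names(con, reaction_wid):
    """Follow the linking table from a reaction to its reactant chemicals."""
    rows = con.execute(
        "SELECT c.Name FROM Reactant r JOIN Chemical c ON c.WID = r.OtherWID "
        "WHERE r.ReactionWID = ? ORDER BY c.Name",
        (reaction_wid,),
    )
    return [name for (name,) in rows]
```

The point of the sketch is only that every object carries a WID unique across tables, and that linking tables hold nothing but WIDs and relationship attributes.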

Full schema information, including source files and browseable documentation, is available with this distribution.


Limitations

The latest supported data version for the KEGG loader is listed in the loader summary table. The loader may not be compatible with future versions of KEGG. KEGG does not seem to include a current version number in its download, and the version is not displayed prominently on its website, but some version and release information can be found.

The loader does not load any data from the PATHWAY component of KEGG.

The loader does not load any data from the REACTION section of the LIGAND component. This means that no partial EC numbers, nor their reactions, are loaded.

The loader ignores the MASS keyword on compounds, though it could load this into Chemical.MolecularWeightCalc.


Installation and Building

See Building the KEGG Loader for details on installing and building the loader.

Obtaining KEGG Data

See Running the KEGG Loader for details on obtaining KEGG data and running the loader.

KEGG DataSet

All of KEGG (including LIGAND) is loaded as a single dataset in the warehouse. References from one part of KEGG to another (e.g. chemicals used in a reaction) are resolved to the WID within the dataset and do not use the CrossReference table.


DataSet Table

Each loaded version of KEGG will be assigned a new row in the DataSet table as follows:

WID The next available WID in the warehouse.
Name "KEGG Database".
Version The release number of KEGG, e.g. "34.0".
LoadDate The time/date the loader was run (SQL SYSDATE).
ReleaseDate NULL.
ChangeDate The date and time the loader completed, NULL if the loader did not complete successfully.
LoadedBy The value of the system environment variable USER for the account running the loader.
Application 'KEGG Loader'
ApplicationVersion The version of the loader, e.g. 4.2.
HomeURL http://www.genome.ad.jp/.
QueryURL NULL
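
The column assignments above can be sketched in Python as follows. This is only an illustration of the mapping, not the loader's code: the function names, the dictionary representation of a row, and the use of the local clock in place of SQL SYSDATE are all assumptions.

```python
import os
from datetime import datetime


def make_dataset_row(wid, kegg_release, loader_version, now=None):
    """Build the DataSet row written when the loader starts (hypothetical helper)."""
    now = now or datetime.now()  # assumption: stands in for SQL SYSDATE
    return {
        "WID": wid,
        "Name": "KEGG Database",
        "Version": kegg_release,
        "LoadDate": now,
        "ReleaseDate": None,
        "ChangeDate": None,  # stays NULL unless the load completes
        "LoadedBy": os.environ.get("USER"),
        "Application": "KEGG Loader",
        "ApplicationVersion": loader_version,
        "HomeURL": "http://www.genome.ad.jp/",
        "QueryURL": None,
    }


def mark_completed(row, finished_at=None):
    """Return a copy of the row with ChangeDate set to the completion time."""
    done = dict(row)
    done["ChangeDate"] = finished_at or datetime.now()
    return done
```

Leaving ChangeDate as NULL until the end gives a simple way to spot incomplete loads: any DataSet row with a NULL ChangeDate belongs to a run that did not finish.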

Entry Table

All entities that are assigned a WID (other than the DataSet above) are also given an Entry row:

OtherWID The WID assigned to the entity.
InsertDate The time/date the loader was run.
CreationDate NULL.
ModifiedDate NULL.
LoadError "T" if a parse error is detected, "F" otherwise.
DatasetWID The WID assigned to the DataSet (see above).

The LoadError field is set to true if any error occurred in loading the record from the source database. The granularity is that of the source record: if there was an error on one line of the record, all warehouse entries derived from that record will have the LoadError flag set to true.
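
The record-level granularity can be sketched as below. The function name and the dictionary form of an Entry row are assumptions made for illustration; the only behaviour taken from this document is that a single error flag is shared by every entity derived from one source record.

```python
def entry_rows_for_record(entity_wids, dataset_wid, insert_date, had_error):
    """Build Entry rows for all entities derived from one source record.

    had_error is True if any line of the source record failed to parse;
    the same flag is then applied to every row.
    """
    flag = "T" if had_error else "F"
    return [
        {
            "OtherWID": wid,
            "InsertDate": insert_date,
            "CreationDate": None,
            "ModifiedDate": None,
            "LoadError": flag,
            "DatasetWID": dataset_wid,
        }
        for wid in entity_wids
    ]
```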


PATHWAY

The KEGG PATHWAY component is a graphical structure combining the other parts of KEGG. The distributed data contains images and HTML image maps to allow for a convenient visual interaction with the data from LIGAND and GENES. Beyond what LIGAND and GENES already contain, the information it adds is which reactions occur in which organisms.

PATHWAY is not currently loaded into the warehouse.


GENOME

The Genome database contains descriptions of the organisms whose genomes are present in the GENES database. Information represented includes the organism name, the abbreviation used in KEGG, the categorization of the organism, high-level information about its genome and citations of the source of the information.


Semantic Mapping

The present loading of this file ignores statistical and map/catalog information. The ignored fields are:

  1. STATISTICS
  2. GENOMEMAP
  3. GENECATALOG

In addition, several fields are loaded strictly as comments, without any semantic interpretation of their contents:

  1. DEFINITION
  2. TAXONOMY
  3. LINEAGE
  4. MORPHOLOGY
  5. PHYSIOLOGY
  6. DATA_SOURCE
  7. ORIGINAL_DB
  8. ENVIRONMENT
  9. COMMENT
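
The field handling listed above can be pictured with a small Python sketch. It assumes the usual KEGG flat-file layout, in which a keyword starts in the first column, continuation lines begin with whitespace, and a record ends with "///". Indented sub-keywords are simply treated as continuation lines here, which is a simplification; all function names are hypothetical.

```python
IGNORED_FIELDS = {"STATISTICS", "GENOMEMAP", "GENECATALOG"}
COMMENT_FIELDS = {
    "DEFINITION", "TAXONOMY", "LINEAGE", "MORPHOLOGY", "PHYSIOLOGY",
    "DATA_SOURCE", "ORIGINAL_DB", "ENVIRONMENT", "COMMENT",
}


def split_fields(record_text):
    """Group the lines of one record into (keyword, [values]) pairs."""
    fields = []
    for line in record_text.splitlines():
        if not line.strip() or line.startswith("///"):
            continue  # skip blank lines and the record terminator
        if line[0].isspace():
            if fields:  # continuation of the previous field
                fields[-1][1].append(line.strip())
        else:
            keyword, _, rest = line.partition(" ")
            fields.append((keyword, [rest.strip()]))
    return fields


def classify(keyword):
    """Decide how a GENOME field is treated: 'ignore', 'comment', or 'map'."""
    if keyword in IGNORED_FIELDS:
        return "ignore"
    if keyword in COMMENT_FIELDS:
        return "comment"
    return "map"
```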

ENTRY

Each entry in GENOME begins with an ENTRY field, giving the abbreviation used in KEGG for the organism, e.g. "hin" for "H.influenzae". This is stored in the DBID table:

OtherWID The BioSource.WID assigned to this organism (see GENOME Name below).
XID The three-character organism abbreviation.

NAME

The name entry gives the scientific name for the organism. This is used to populate the BioSource table:

WID A new WID assigned to this object.
Name The organism name.
DataSetWID The WID assigned to the DataSet (see DataSet table above).
all other columns NULL.

CHROMOSOME

The chromosome entry optionally gives the topology ("Circular" or "Linear") and, in organisms with multiple chromosomes, a chromosome name. It is used for initial population of the NucleicAcid table:

WID A new WID assigned to this object.
Name See SEQUENCE below.
Type "DNA".
Class "chromosome".
Topology "circular" if Circular, "linear" if Linear, NULL if not specified.
MoleculeLength See LENGTH below.
GeneticCodeWID The GeneticCode.WID associated with the genetic code (see SEQUENCE below).
BioSourceWID The BioSource.WID assigned to this organism (see GENOME Name above).
DataSetWID The WID assigned to the DataSet (see DataSet table above).

PLASMID

The plasmid entry gives the name of the plasmid and (optionally) if it is circular. This is used for initial population of the NucleicAcid table:

WID A new WID assigned to this object.
Name See SEQUENCE below.
Type "DNA".
Class "plasmid".
Topology "circular" if Circular, "linear" if Linear, NULL if not specified.
MoleculeLength See LENGTH below.
GeneticCodeWID The GeneticCode.WID associated with the genetic code (see SEQUENCE below).
BioSourceWID The BioSource.WID assigned to this organism (see GENOME Name above).
DataSetWID The WID assigned to the DataSet (see DataSet table above).

SEQUENCE

The sequence item gives the Genbank accession number for the chromosome or plasmid, and (optionally) the genetic code number used.

The replicon name is constructed from the accession number and the replicon type (e.g. "Chromosome GB:L77117" for the chromosome of Methanococcus jannaschii DSM2661) and is stored in NucleicAcid.Name.

If the NCBI Taxonomy Loader has been run, the loader will use the genetic code number to find the associated entry in the GeneticCode table, and store it in NucleicAcid.GeneticCodeWID for this replicon.

NOTE: As of approximately version 27.0 of KEGG, genetic codes do not seem to be provided in the data, so this column will not be populated.

The Genbank accession number is also stored in the CrossReference table:

OtherWID The WID assigned to this NucleicAcid (see CHROMOSOME or PLASMID above).
XID The Genbank accession number.
DatasetWID NULL.
DatasetName "GENBANK"

LENGTH

The length entry gives the number of nucleotides in the replicon, and populates NucleicAcid.MoleculeLength.
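
Taken together, the CHROMOSOME or PLASMID, SEQUENCE, and LENGTH items fill one NucleicAcid row per replicon. The Python sketch below shows one way to assemble that row. The function name and argument list are hypothetical, and it assumes the accession is passed in exactly the form that should appear in the name (e.g. "GB:L77117").

```python
_TOPOLOGY = {"Circular": "circular", "Linear": "linear"}


def nucleic_acid_row(wid, replicon_type, accession, topology, length,
                     biosource_wid, dataset_wid, genetic_code_wid=None):
    """Assemble a NucleicAcid row for a chromosome or plasmid."""
    if replicon_type not in ("Chromosome", "Plasmid"):
        raise ValueError("unexpected replicon type: %r" % (replicon_type,))
    return {
        "WID": wid,
        "Name": "%s %s" % (replicon_type, accession),  # e.g. "Chromosome GB:L77117"
        "Type": "DNA",
        "Class": replicon_type.lower(),
        "Topology": _TOPOLOGY.get(topology),  # None (NULL) if not specified
        "MoleculeLength": length,
        "GeneticCodeWID": genetic_code_wid,   # None when no genetic code is given
        "BioSourceWID": biosource_wid,
        "DataSetWID": dataset_wid,
    }
```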

Literature Citations

Each entry in the genome file gives one or more citations to the literature, contained in the fields REFERENCE (giving the Pubmed ID), AUTHORS, TITLE, and JOURNAL. These are used to populate the Citation table:

WID A new WID assigned to this citation.
Citation The concatenation of the AUTHORS, TITLE, and JOURNAL entries.
PMID The Pubmed ID from the REFERENCE entry.
DataSetWID The WID assigned to the DataSet (see DataSet table above).

Two entries are made in CitationWIDOtherWID to relate the citation back to the BioSource and to the NucleicAcid:

OtherWID The BioSource.WID assigned to this organism (see GENOME Name above).
CitationWID The WID of the citation
OtherWID The NucleicAcid.WID of the replicon (see CHROMOSOME or PLASMID above).
CitationWID The WID of the citation
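
A minimal Python sketch of this mapping follows. The function name is hypothetical, and joining AUTHORS, TITLE, and JOURNAL with a single space is an assumption; this document only states that the three entries are concatenated.

```python
def citation_rows(citation_wid, pmid, authors, title, journal,
                  biosource_wid, nucleic_acid_wid, dataset_wid):
    """Return the Citation row and its two CitationWIDOtherWID link rows."""
    citation = {
        "WID": citation_wid,
        # assumption: parts are joined with one space, skipping empty ones
        "Citation": " ".join(p for p in (authors, title, journal) if p),
        "PMID": pmid,
        "DataSetWID": dataset_wid,
    }
    links = [
        {"OtherWID": biosource_wid, "CitationWID": citation_wid},
        {"OtherWID": nucleic_acid_wid, "CitationWID": citation_wid},
    ]
    return citation, links
```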

GENES

The GENES database contains information on the genome of particular organisms, one organism per file. The information includes the name(s) of the gene, its position, the codon usage, amino acid sequence and nucleotide sequence.


Semantic Mapping

An entry in GENES contains up to nine fields.

ENTRY

The ENTRY line gives the gene id and the organism name. The gene id is used in the Gene table (see NAME below). The organism name is used to look up the previously loaded organism from GENOME. If the organism is found, a row is created in the BioSourceWIDGeneWID table:

BioSourceWID The WID of the organism (see GENOME NAME above).
GeneWID The WID assigned to this gene.

NAME

The first name given is assumed to be the primary name, and other names are synonyms. The name starts populating the Gene table:

WID A new WID assigned to this object.
Name The primary name of the gene.
GenomeID The gene id (from ENTRY above).
CodingRegionStart See POSITION below.
CodingRegionEnd See POSITION below.
Interrupted See POSITION below.
NucleicAcidWID NucleicAcid.WID of the replicon this gene resides on (see GENOME Chromosome and GENOME Plasmid above).
DataSetWID The WID assigned to the DataSet (see DataSet table above).

Alternate names are stored in SynonymTable:

OtherWID The WID assigned to this gene.
Syn The alternative name

DEFINITION

The definition is stored in CommentTable:

OtherWID The WID assigned to this gene.
Comm The definition text.

CLASS

This is presently ignored.

POSITION

The position of a gene can be a simple numerical range, a join (patching together a number of regions), a complement, or a range relative to other genes. It can also indicate the replicon on which the gene resides.

We presently ignore the non-numerical range information. A join is considered to range from the start of its lowest range to the end of its highest range, and the Interrupted flag is set to "T".

The Gene entry from above is thereby extended with:

WID ...
Name ...
GenomeID ...
CodingRegionStart The low end of the numerical range(s).
CodingRegionEnd The high end of the numerical range(s).
Direction "F" for forward, "R" for complement.
Interrupted "T" if a join was present, "F" otherwise.
DataSetWID ...
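
The POSITION rules above can be sketched in Python as follows. The exact position syntax is an assumption: the sketch supposes feature-table style strings such as "complement(join(100..200,300..450))" and simply collects every numeric range it finds. Replicon prefixes and ranges relative to other genes are not interpreted. The function name is hypothetical.

```python
import re

_RANGE = re.compile(r"(\d+)\.\.(\d+)")


def parse_position(text):
    """Derive the Gene position columns from a POSITION string.

    Returns None when the string contains no numerical range.
    """
    ranges = [(int(lo), int(hi)) for lo, hi in _RANGE.findall(text)]
    if not ranges:
        return None
    return {
        # a join spans from the start of the lowest range to the end of the highest
        "CodingRegionStart": min(lo for lo, _ in ranges),
        "CodingRegionEnd": max(hi for _, hi in ranges),
        "Direction": "R" if "complement" in text else "F",
        "Interrupted": "T" if "join" in text else "F",
    }
```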

References to the replicon on which the gene resides are represented in the GeneWIDRepliconWID table:

GeneWID The WID of the Gene.
RepliconWID The WID of the Replicon.

DBLINKS

The dblinks item contains cross-link information to other databases. This is used to populate the CrossReference table:

OtherWID The WID assigned to this gene (see above).
XID The external database identifier.
DatasetWID NULL.
DatasetName The external database name.

CODON_USAGE

This is presently ignored.

AASEQ

The AASEQ item gives the amino acid sequence for the protein generated by this gene. This is used to complete the AASequence column of the corresponding row in the Protein table (see NAME under LIGAND: ENZYME below):

WID ...
Name ...
AASequence The given sequence.
Charge ...
Fragment ...
MolecularWeightCalc ...
MolecularWeightExp ...
PlCalc ...
PlExp ...
DataSetWID ...

NTSEQ

This is presently ignored.


LIGAND: COMPOUND

The COMPOUND section of LIGAND is a collection of metabolic compounds including substrates, products and inhibitors. Each of the chemicals referenced in the ENZYME and KEGG PATHWAY components is represented in this component. Information represented includes the naming, chemical formula, structural information, metabolic pathways, related enzymes, related protein structures, prosthetic groups and the CAS registry number.

In our semantic mapping, we ignore the information representing the structural information, as there is no current table for this in the Bio-SPICE warehouse schema.


Semantic Mapping

This section describes how each of the fields in a COMPOUND entry is mapped into the Bio-SPICE warehouse schema.

ENTRY

Each data item begins with an ENTRY field, giving the compound accession number for the LIGAND database. The accession number is stored in the DBID table:

OtherWID The WID assigned to this chemical (see below).
XID The accession number

NAME

The name item contains the recommended name for the compound, and optionally some alternatives. The recommended name is always first, and is mandatory. This item starts populating the Chemical table:

WID A new WID assigned to this object.
Name The recommended name.
BeilsteinName NULL.
SystematicName NULL.
CAS NULL.
Charge NULL.
EmpiricalFormula See below.
MolecularWeightCalc NULL.
MolecularWeightExp NULL.
OctH20PartitionCoeff NULL.
PKA1 NULL.
PKA2 NULL.
PKA3 NULL.
WaterSolubility NULL.
Smiles NULL.
DataSetWID The WID assigned to the DataSet (see DataSet table above).

Alternative names are each stored in SynonymTable:

OtherWID The WID assigned to this chemical.
Syn The alternative name

FORMULA

The formula item is an ASCII representation of the chemical formula of this compound, e.g. H2O, C10H16N5O13P3. This is used to populate Chemical.EmpiricalFormula.

PATHWAY

The pathway item is a cross-link to the KEGG PATHWAY data, and consists of the pathway map accession number, followed by the description. This is used to populate the CrossReference table:

OtherWID The WID assigned to this chemical (see above).
XID The pathway accession number.
DatasetWID NULL.
DatasetName "KEGG PATHWAY"

ENZYME

The enzyme item is a cross-link to the KEGG ENZYME data, and consists of the EC number, followed by a type indicating how the compound is related to the enzyme. Valid types are R for reactant, I for inhibitor, C for cofactor and E for effector.

Rather than load this data from COMPOUND, this information is loaded from ENZYME, where it is redundantly replicated in KEGG.

STRUCTURES

The structures item is a cross-link to PDB (the Protein Data Bank), which stores the three-dimensional structure information for proteins. This is used to populate the CrossReference table:

OtherWID The WID assigned to this compound (see above).
XID The PDB ID.
DatasetWID NULL.
DatasetName "PDB"

DBLINKS

The dblinks item contains cross-link information to other databases. This is used to populate the CrossReference table:

OtherWID The WID assigned to this compound (see above).
XID The external database identifier.
DatasetWID NULL.
DatasetName The external database name.

RPAIR

This section is ignored.

GLYCAN

This section is ignored.

COMMENT

A row is added to the CommentTable table for each comment:

OtherWID The Chemical WID assigned to this compound.
Comm The comment string.

LIGAND: ENZYME

The ENZYME section of LIGAND is a collection of all known enzymatic reactions classified according to the nomenclature of the International Union of Biochemistry and Molecular Biology (IUBMB). Some of the entries in this data are taken from the ExPASy ENZYME database (http://expasy.hcuge.ch/sprot/enzyme.html) from the Swiss Institute of Bioinformatics.

Each entry is identified by the Enzyme Commission (EC) number, and contains information on naming, chemical reactions, metabolic compounds, metabolic pathways, genes encoding the enzyme (for several organisms), genetic diseases, and links to other databases.


Semantic Mapping

An entry in the ENZYME data contains up to 17 fields. This section describes how each of these fields is mapped into the Bio-SPICE warehouse schema.

ENTRY

Each data item begins with a mandatory ENTRY field, giving the EC number for the enzyme. The EC Number is stored in the Reaction table (see below).

NAME

The name item contains the recommended name for the enzyme, and optionally some alternatives. All names are assumed to refer to proteins, not ribozymes. The recommended name is always first, and is mandatory. This item is stored in the Protein table:

WID A new WID assigned to this object.
Name The recommended name.
AASequence NULL.
Charge NULL.
Fragment NULL.
MolecularWeightCalc NULL.
MolecularWeightExp NULL.
PlCalc NULL.
PlExp NULL.
DataSetWID The WID assigned to the DataSet (see DataSet table above).

One copy of the protein is made for each gene which can generate it (see below). The amino acid sequence is completed when loading the gene (see above).

Alternative names are each stored in SynonymTable:

OtherWID The WID assigned to this Protein.
Syn The alternative name.

CLASS

The class item contains the meaning of the EC number, and is mandatory for all entries. There are three elements: the class, subclass and sub-subclass of the enzyme.

The class entry is not currently loaded.

SYSNAME

The sysname item contains the systematic name given by the Enzyme Commission, representing the nature of the chemical reaction. This is stored as a synonym of the reaction name, in SynonymTable:

OtherWID WID of the reaction (see below).
Syn The Systematic Name.

REACTION

The reaction item contains the chemical reaction(s) in the form of an equation or a text description. If the reaction is given in text, the SUBSTRATE and PRODUCT items are used in preference to the REACTION item, which is left uninterpreted and stored as a comment:

OtherWID The WID assigned to this reaction.
Comm The reaction string.

Each side of an interpreted equation is stored as per the substrate and product items (see below). The reaction is stored in the Reaction table:

WID A new WID assigned to this object.
DeltaG NULL.
ECNumber The EC Number (see above).
ECNumberProposed NULL.
Spontaneous NULL.
DataSetWID The WID assigned to the DataSet (see DataSet table above).

An EnzymaticReaction entry is also created for every reaction, with one copy for each Protein generated:

WID A new WID assigned to this object.
ReactionWID The WID of the Reaction assigned above.
ProteinWID The WID of the Enzyme (see NAME above).
ComplexWID NULL.
ReactionDirectionWID NULL.
DataSetWID The WID assigned to the DataSet (see DataSet table above).

SUBSTRATE

The substrate item contains the chemical compounds that appear on the left side of the reaction. If the REACTION item gave an interpretable reaction, the SUBSTRATE is ignored.

Each substrate chemical is assigned an entry in the Chemical table. If two chemical names occurring within KEGG are textually identical, they are considered the same entity. For new chemicals (not previously loaded from LIGAND COMPOUND), the fields are completed as follows:

WID A new WID assigned to this object.
Name The name of the substrate chemical.
BeilsteinName NULL.
CAS NULL.
Charge NULL.
EmpiricalFormula NULL.
MolecularWeightCalc NULL.
MolecularWeightExp NULL.
OctH20PartitionCoeff NULL.
SystematicName NULL.
WaterSolubility NULL.
Smiles NULL.
DataSetWID The WID assigned to the DataSet (see DataSet table above).
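
The textual-identity rule amounts to a get-or-create lookup keyed on the exact name string. A minimal Python sketch, assuming an in-memory dictionary and a hypothetical class name, might look like this:

```python
class ChemicalRegistry:
    """Map chemical names to WIDs, creating a new WID only for unseen names."""

    def __init__(self, first_wid=1):
        self._by_name = {}
        self._next_wid = first_wid

    def get_or_create(self, name):
        """Return (wid, created); names are compared as exact strings."""
        if name in self._by_name:
            return self._by_name[name], False
        wid = self._next_wid
        self._next_wid += 1
        self._by_name[name] = wid
        return wid, True
```

Because the comparison is purely textual, two spellings of the same compound (for example differing in case) would receive separate WIDs in this sketch.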

Each of the substrate chemicals is linked to the reaction with a Reactant table entry, including the coefficient when specified. If the coefficient is not given, it is assumed to be 1:

ReactionWID The WID of the reaction
OtherWID The Chemical.WID assigned to the substrate
Coefficient Coefficient of this substrate.
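
The coefficient default can be illustrated with a small Python parser for reaction equations. The equation syntax is an assumption: sides are taken to be separated by " = ", terms by " + ", and an optional integer coefficient is taken to precede the compound name, separated by a space. Anything without " = " is treated as a text description. Both function names are hypothetical.

```python
import re

_TERM = re.compile(r"^(\d+)\s+(.+)$")


def parse_side(side):
    """Split one side of an equation into (coefficient, name) pairs."""
    terms = []
    for raw in side.split(" + "):
        raw = raw.strip()
        match = _TERM.match(raw)
        if match:
            terms.append((int(match.group(1)), match.group(2)))
        else:
            terms.append((1, raw))  # coefficient defaults to 1
    return terms


def parse_equation(equation):
    """Return (substrates, products), or None if no equation is present."""
    left, sep, right = equation.partition(" = ")
    if not sep:
        return None  # text description; left uninterpreted
    return parse_side(left), parse_side(right)
```

A name such as "2-oxoglutarate" is not mistaken for a coefficient here, because the pattern requires whitespace after the leading digits.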

PRODUCT

The product item contains the chemical compounds that appear on the right side of the reaction. If the REACTION item gave an interpretable reaction, the PRODUCT is ignored.

Each product chemical is assigned an entry in the Chemical table. If two chemical names occurring within LIGAND ENZYME are textually identical, they are considered the same entity. For new chemicals (not previously loaded from LIGAND COMPOUND), the fields are completed as follows:

WID A new WID assigned to this object.
Name The name of the product chemical.
BeilsteinName NULL.
CAS NULL.
Charge NULL.
EmpiricalFormula NULL.
MolecularWeightCalc NULL.
MolecularWeightExp NULL.
OctH20PartitionCoeff NULL.
SystematicName NULL.
WaterSolubility NULL.
Smiles NULL.
DataSetWID The WID assigned to the DataSet (see DataSet table above).

Each of the product chemicals is linked to the reaction with a Product table entry, including the coefficient when specified. If the coefficient is not given, it is assumed to be 1:

ReactionWID The WID of the reaction
OtherWID The Chemical.WID assigned to the product chemical
Coefficient Coefficient of this product.

INHIBITOR

The inhibitor item names compounds that inhibit the reaction from taking place. Each compound is given an entry in the Chemical table (subject to the same textual-identity rule as for substrates and products):

WID A new WID assigned to this object.
Name The name of the inhibitor compound.
BeilsteinName NULL.
CAS NULL.
Charge NULL.
EmpiricalFormula NULL.
MolecularWeightCalc NULL.
MolecularWeightExp NULL.
OctH20PartitionCoeff NULL.
SystematicName NULL.
WaterSolubility NULL.
Smiles NULL.
DataSetWID The WID assigned to the DataSet (see DataSet table above).

Each of the inhibitors is linked to each of the enzymatic reactions by the EnzReactionWIDChemicalWID table:

EnzymaticReactionWID The WID of the Enzymatic Reaction (see above).
ChemicalWID The WID assigned to the chemical
InhibitOrActivate 'I'
Mechanism NULL.
PhysioRelevant NULL.

COFACTOR

NOTE: As of approximately version 27 of KEGG, cofactor information appears to be missing from the data files. In this case, no cofactor information is loaded.

The cofactor item names compounds that do not appear in the reaction equation, but are described in the comment item as operating as cofactors in the reaction. Each compound is given an entry in the Chemical table (subject to the same textual-identity rule as for substrates and products):

WID A new WID assigned to this object.
Name The name of the cofactor compound.
BeilsteinName NULL.
CAS NULL.
Charge NULL.
EmpiricalFormula NULL.
MolecularWeightCalc NULL.
MolecularWeightExp NULL.
OctH20PartitionCoeff NULL.
SystematicName NULL.
WaterSolubility NULL.
Smiles NULL.
DataSetWID The WID assigned to the DataSet (see DataSet table above).

Each of the cofactor compounds is linked to each of the enzymatic reactions with an EnzReactionCofactor table entry:

EnzymaticReactionWID The WID of the enzymatic reaction (see above).
ChemicalWID The WID assigned to the cofactor compound.
Prosthetic NULL.

EFFECTOR

The effector item names compounds that activate the reaction. Each compound is given an entry in the Chemical table (subject to the same textual-identity rule as for substrates and products):

WID A new WID assigned to this object.
Name The name of the effector compound.
BeilsteinName NULL.
CAS NULL.
Charge NULL.
EmpiricalFormula NULL.
MolecularWeightCalc NULL.
MolecularWeightExp NULL.
OctH20PartitionCoeff NULL.
SystematicName NULL.
WaterSolubility NULL.
Smiles NULL.
DataSetWID The WID assigned to the DataSet (see DataSet table above).

Each of the effectors is linked to each of the enzymatic reactions by the EnzReactionWIDChemicalWID table:

EnzymaticReactionWID The WID of the Enzymatic Reaction (see above).
ChemicalWID The WID assigned to the chemical
InhibitOrActivate 'A'
Mechanism NULL.
PhysioRelevant NULL.
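
Inhibitors and effectors share the EnzReactionWIDChemicalWID table and differ only in the InhibitOrActivate flag. Since each compound is linked to each enzymatic reaction, the rows form a cross product, as in this Python sketch (the function name and row representation are assumptions):

```python
def regulator_links(enzymatic_reaction_wids, chemical_wids, kind):
    """Build EnzReactionWIDChemicalWID rows for inhibitors ('I') or effectors ('A')."""
    if kind not in ("I", "A"):
        raise ValueError("kind must be 'I' (inhibit) or 'A' (activate)")
    return [
        {
            "EnzymaticReactionWID": reaction_wid,
            "ChemicalWID": chemical_wid,
            "InhibitOrActivate": kind,
            "Mechanism": None,
            "PhysioRelevant": None,
        }
        for reaction_wid in enzymatic_reaction_wids
        for chemical_wid in chemical_wids
    ]
```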

COMMENT

The comment item contains free form text information commenting on the enzyme. This item populates the CommentTable:

OtherWID The WID assigned to this enzyme (see NAME above).
Comm The comment string.

There may be several comments associated with each enzyme.

PATHWAY

The pathway item is a cross-link to the KEGG PATHWAY data, and consists of the pathway map accession number, followed by the description. Because that database is not parseable, this item is used to group reactions into pathways.

A reference (sum of organisms) pathway is created, if it does not already exist:

WID A new WID assigned to this object.
Name The given descriptive name of the pathway.
Type 'R' (Reference).
BioSourceWID The BioSource.WID assigned to this organism (see GENOME Name above).
DataSetWID The WID assigned to the DataSet (see DataSet table above).

The pathway map accession number is stored in the DBID table:

OtherWID The WID assigned to this pathway.
XID The accession number.

The reactions are then linked to the pathway with the PathwayReaction table:

PathwayWID The WID assigned to this pathway.
ReactionWID The reaction WID.
PriorReactionWID NULL.
Hypothetical 'U' (Unknown).
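
The create-if-absent behaviour for reference pathways can be sketched in Python as follows. Keying the lookup on the pathway map accession number is an assumption, as are the class and method names.

```python
class ReferencePathways:
    """Create each reference pathway once and attach reactions to it."""

    def __init__(self, first_wid, dataset_wid):
        self._next_wid = first_wid
        self._dataset_wid = dataset_wid
        self.by_accession = {}       # accession number -> pathway WID
        self.pathways = []           # Pathway rows
        self.pathway_reactions = []  # PathwayReaction rows

    def link(self, accession, name, reaction_wid):
        """Link a reaction to the pathway, creating the pathway if needed."""
        wid = self.by_accession.get(accession)
        if wid is None:
            wid = self._next_wid
            self._next_wid += 1
            self.by_accession[accession] = wid
            self.pathways.append({
                "WID": wid,
                "Name": name,
                "Type": "R",  # reference pathway
                "DataSetWID": self._dataset_wid,
            })
        self.pathway_reactions.append({
            "PathwayWID": wid,
            "ReactionWID": reaction_wid,
            "PriorReactionWID": None,
            "Hypothetical": "U",
        })
        return wid
```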

GENES

The genes item is a cross-link to the KEGG gene catalogs, showing the genes in various organisms that encode this enzyme. This is used to create organism specific pathways, and to indicate the number of proteins to generate in loading: one is generated for each gene, as they may have different amino acid sequences.

For each organism with the necessary gene(s) a new pathway is created (if not already present). The BioSource WID is looked up among the organisms previously loaded from the GENOME data.

WID A new WID assigned to this object.
Name The given descriptive name of the pathway.
Type 'O' (Organism).
BioSourceWID The BioSource.WID assigned to this organism (see GENOME Name above).
DataSetWID The WID assigned to the DataSet (see DataSet table above).

The pathway map accession number for this pathway is stored in the DBID table:

OtherWID The WID assigned to this pathway.
XID The accession number.

The enzyme is also linked to the BioSource by the BioSourceWIDProteinWID table:

BioSourceWID The BioSource.WID assigned to this organism (see GENOME Name above).
ProteinWID The WID assigned to the Enzyme.

Each reaction is then assigned to the new pathway (PathwayReaction table) in the same way as for the reference pathway above:

PathwayWID The WID assigned to this pathway.
ReactionWID The reaction WID.
PriorReactionWID NULL.
Hypothetical 'U' (Unknown).
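
The fan-out driven by the genes item (one protein per gene, one pathway per organism) can be sketched in Python. The sketch assumes the genes item has already been parsed into a mapping from organism abbreviation to gene ids; the function name and the row fields other than Name are hypothetical.

```python
def expand_genes(enzyme_name, genes_by_organism):
    """Return the protein copies and the organisms that need a pathway.

    genes_by_organism maps an organism abbreviation to a list of gene ids.
    """
    # one protein copy per gene, since sequences may differ between genes
    proteins = [
        {"Name": enzyme_name, "Organism": organism, "Gene": gene}
        for organism, genes in genes_by_organism.items()
        for gene in genes
    ]
    # an organism-specific pathway is needed only where genes are present
    organisms = [org for org, genes in genes_by_organism.items() if genes]
    return proteins, organisms
```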

DISEASE

The disease item is a cross-link to the OMIM (Online Mendelian Inheritance in Man) database. This is used to populate the CrossReference table:

OtherWID The WID assigned to this enzyme (see NAME above).
XID The MIM Number.
DatasetWID NULL.
DatasetName "MIM"

MOTIF

The motif item is a cross-link to the PROSITE database. Each PROSITE identifier is used to populate the CrossReference table:

OtherWID The WID assigned to this enzyme (see NAME above).
XID The PROSITE ID.
DatasetWID NULL.
DatasetName "PS"

STRUCTURES

The structures item is a cross-link to PDB (the Protein Data Bank), which stores the three-dimensional structure information for proteins. Each PDB identifier is used to populate the CrossReference table:

OtherWID The WID assigned to this enzyme (see NAME above).
XID The PDB ID.
DatasetWID NULL.
DatasetName "PDB"

DBLINKS

The dblinks item contains cross-link information to other databases, including the ENZYME Nomenclature database from the Swiss Institute of Bioinformatics. This is used to populate the CrossReference table:

OtherWID The WID assigned to this enzyme (see NAME above).
XID The external database identifier.
DatasetWID NULL.
DatasetName The external database name.


References