Introduction
(C) 2004 SRI International. All Rights Reserved. See BioWarehouse Overview for license details.
This document describes version 4.2 of the KEGG Loader. It is one of several database loaders comprising the Bio-SPICE Warehouse.
KEGG (the Kyoto Encyclopedia of Genes and Genomes) is a collection of databases curated by the Bioinformatics Center at the Institute for Chemical Research at Kyoto University. KEGG is available online at http://www.genome.ad.jp/kegg/. KEGG contains five types of data:
KEGG contains three major components:
LIGAND was originally started by Takaaki Nishioka, and is now maintained in collaboration with the KEGG project. LIGAND itself is a composite of three databases:
This document describes the semantic mapping from the KEGG database components PATHWAY, GENES and LIGAND to a representation in the Bio-SPICE data warehouse. A chapter is dedicated to each of the KEGG components, defining its mapping to the Bio-SPICE schema.
Constant tables specify scientific data such as information from the Periodic Table of Elements, as well as constants used as column values in various warehouse tables.
Object tables describe a type of entity in a source database, such as compounds and proteins. Each column of an object table specifies a parameter that characterizes the object. In addition to the parameters defined by the source database, the loader assigns a unique warehouse ID (WID) to each object, which is used by other tables to reference the object.
A special type of warehouse object is the dataset. A dataset object is created for each dataset loaded into the warehouse; e.g., the SWISS-PROT loader adds one row to this table each time it is run. Its WID is referred to as the dataset WID and is a column in each object table, specifying the source database of the object.
Linking tables describe relationships among objects. They contain the WIDs of the associated objects, and any additional columns needed to characterize the relationship. In general, many-to-many relationships are supported. Special tables exist to capture reference and cross-reference information and to facilitate lookup of objects.
Full schema information, including source files and browseable documentation, is available with this distribution.
The latest supported data version for the KEGG loader is listed in the loader summary table. The loader may not be compatible with future versions of KEGG. KEGG does not seem to include a version number in its download, and none is displayed prominently on its website, but some version and release information can be found.
The loader does not load any data from the PATHWAY component of KEGG.
The loader does not load any data from the REACTION section of the LIGAND component. This means that no partial EC numbers, nor their reactions, are loaded.
The loader ignores the MASS keyword on compounds, though it could load this into Chemical.MolecularWeightCalc.
All of KEGG (including LIGAND) is loaded as a single dataset in the warehouse. References from one part of KEGG to another (e.g. chemicals used in a reaction) are resolved to the WID within the dataset and do not use the CrossReference table.
Each loaded version of KEGG will be assigned a new row in the DataSet table as follows:
| WID | The next available WID in the warehouse. |
| Name | "KEGG Database". |
| Version | The release number of KEGG, e.g. "34.0". |
| LoadDate | The time/date the loader was run (SQL SYSDATE). |
| ReleaseDate | NULL. |
| ChangeDate | The date and time the loader completed, NULL if the loader did not complete successfully. |
| LoadedBy | The value of the system environment variable USER for the account running the loader. |
| Application | 'KEGG Loader' |
| ApplicationVersion | 3.5 |
| HomeURL | http://www.genome.ad.jp/. |
| QueryURL | NULL |
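As an illustration only, the DataSet row described above could be assembled as follows. This is a minimal sketch, not the loader's actual code: the function name `make_dataset_row` and the dictionary representation are assumptions, and the WID is taken as an argument rather than drawn from the warehouse.

```python
import os
from datetime import datetime


def make_dataset_row(next_wid, kegg_release, now=None):
    """Build a DataSet row as a dict (hypothetical helper, not loader code)."""
    now = now or datetime.now()
    return {
        "WID": next_wid,
        "Name": "KEGG Database",
        "Version": kegg_release,
        "LoadDate": now,
        "ReleaseDate": None,
        # ChangeDate stays NULL until the load completes successfully.
        "ChangeDate": None,
        "LoadedBy": os.environ.get("USER"),
        "Application": "KEGG Loader",
        "ApplicationVersion": "3.5",
        "HomeURL": "http://www.genome.ad.jp/",
        "QueryURL": None,
    }


def mark_completed(row, finished_at=None):
    """Record the completion time once the load has succeeded."""
    row["ChangeDate"] = finished_at or datetime.now()
    return row
```

Leaving ChangeDate empty until `mark_completed` is called mirrors the rule that the column is NULL when the loader does not finish.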
All entities that are assigned a WID (other than the DataSet above) are also given an Entry row:
| OtherWID | The WID assigned to the entity. |
| InsertDate | The time/date the loader was run. |
| CreationDate | NULL. |
| ModifiedDate | NULL. |
| LoadError | "T" if a parse error is detected, "F" otherwise. |
| DatasetWID | The WID assigned to the DataSet (see above). |
The LoadError field is set to true if any error occurred in loading the record from the source database. The granularity is that of the source record; i.e., if there was an error on one line of the record, all warehouse entries derived from that record will have the LoadError flag set to true.
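The record-level granularity of LoadError can be sketched as below. The helper name `make_entry_rows` and the dict representation are hypothetical; the point is only that one flag, computed per source record, is applied to every Entry row derived from that record.

```python
def make_entry_rows(wids, dataset_wid, load_time, record_had_error):
    """Build one Entry row per WID derived from a single source record.

    Hypothetical helper: the same LoadError flag is shared by all rows
    because the granularity is the source record, not the line.
    """
    flag = "T" if record_had_error else "F"
    return [
        {
            "OtherWID": wid,
            "InsertDate": load_time,
            "CreationDate": None,
            "ModifiedDate": None,
            "LoadError": flag,
            "DatasetWID": dataset_wid,
        }
        for wid in wids
    ]
```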
The KEGG PATHWAY component is a graphical structure combining the other parts of KEGG. The distributed data contains images and HTML image maps that allow convenient visual interaction with the data from LIGAND and GENES. The information it adds beyond that in LIGAND and GENES is which reactions occur in which organisms.
PATHWAY is not currently loaded into the warehouse.
The Genome database contains descriptions of the organisms whose genomes are present in the GENES database. Information represented includes the organism name, the abbreviation used in KEGG, the categorization of the organism, high-level information about its genome and citations of the source of the information.
The present loading of this file ignores statistical and map/catalog information. The ignored fields are:
In addition, several fields are loaded strictly as comments, without any semantic interpretation of their contents:
Each entry in GENOME begins with an ENTRY field, giving the abbreviation used in KEGG for the organism, e.g. 'hin' for 'H.influenzae'. This is stored in the DBID table:
| OtherWID | The BioSource.WID assigned to this organism (see GENOME Name below). |
| XID | The three-character organism abbreviation. |
The name entry gives the scientific name for the organism. This is used to populate the BioSource table:
| WID | A new WID assigned to this object. |
| Name | The organism name. |
| DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
| all other columns | NULL. |
The chromosome entry optionally gives the circularity ('Circular' or 'Linear'), for initial population of the NucleicAcid table, and optionally a chromosome name (in organisms with multiple chromosomes).
| WID | A new WID assigned to this object. |
| Name | See SEQUENCE below. |
| Type | "DNA". |
| Class | "chromosome". |
| Topology | "circular" if Circular, "linear" if Linear, NULL if not specified. |
| MoleculeLength | See LENGTH below. |
| GeneticCodeWID | The GeneticCode.WID associated with the genetic code (see SEQUENCE below). |
| BioSourceWID | The BioSource.WID assigned to this organism (see GENOME Name above). |
| DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
The plasmid entry gives the name of the plasmid and (optionally) whether it is circular. This is used for initial population of the NucleicAcid table:
| WID | A new WID assigned to this object. |
| Name | See SEQUENCE below. |
| Type | "DNA". |
| Class | "plasmid". |
| Topology | "circular" if Circular, "linear" if Linear, NULL if not specified. |
| MoleculeLength | See LENGTH below. |
| GeneticCodeWID | The GeneticCode.WID associated with the genetic code (see SEQUENCE below). |
| BioSourceWID | The BioSource.WID assigned to this organism (see GENOME Name above). |
| DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
The sequence item gives the GenBank accession number for the chromosome or plasmid, and (optionally) the genetic code number used.
The replicon name is constructed from the accession number and the replicon type (e.g. "Chromosome GB:L77117" for the chromosome of Methanococcus jannaschii DSM2661) and is stored in NucleicAcid.Name. If the NCBI Taxonomy Loader has been run, the loader will use the genetic code number to find the associated entry in the GeneticCode table, and store its WID in NucleicAcid.GeneticCodeWID for this replicon.
NOTE: As of approximately version 27.0 of KEGG, genetic codes do not seem to be provided in the data, so this column will not be populated.
The GenBank accession number is also stored in the CrossReference table:
| OtherWID | The WID assigned to this NucleicAcid (see CHROMOSOME or PLASMID above). |
| XID | The GenBank accession number. |
| DatasetWID | NULL. |
| DatasetName | "GENBANK'' |
The length entry gives the number of nucleotides in the replicon, and populates NucleicAcid.MoleculeLength.
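To make the replicon mapping concrete, the sketch below assembles a NucleicAcid row from the CHROMOSOME or PLASMID, SEQUENCE and LENGTH items. It is illustrative only: `make_nucleic_acid_row` is a hypothetical name, the accession is assumed to arrive already prefixed (as in "GB:L77117"), and the replicon type in the name is assumed to be the capitalized class.

```python
def make_nucleic_acid_row(wid, replicon_class, circularity, accession,
                          length, biosource_wid, dataset_wid,
                          genetic_code_wid=None):
    """Build a NucleicAcid row as a dict (hypothetical helper).

    replicon_class is "chromosome" or "plasmid"; circularity is
    "Circular", "Linear", or None when the source does not say.
    """
    # Unspecified circularity maps to NULL (None).
    topology = {"Circular": "circular", "Linear": "linear"}.get(circularity)
    return {
        "WID": wid,
        # Assumption: name is "<Type> <accession>", e.g. "Chromosome GB:L77117".
        "Name": f"{replicon_class.capitalize()} {accession}",
        "Type": "DNA",
        "Class": replicon_class,
        "Topology": topology,
        "MoleculeLength": length,
        # Stays None when no genetic code number is present in the data.
        "GeneticCodeWID": genetic_code_wid,
        "BioSourceWID": biosource_wid,
        "DataSetWID": dataset_wid,
    }
```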
Each entry in the genome file gives one or more citations to the literature, contained in the fields REFERENCE (giving the PubMed ID), AUTHORS, TITLE, and JOURNAL. These are used to populate the Citation table:
| WID | A new WID assigned to this citation. |
| Citation | The concatenation of the AUTHORS, TITLE, and JOURNAL entries. |
| PMID | The PubMed ID from the REFERENCE entry. |
| DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
Two entries are made in CitationWIDOtherWID to relate the citation back to the BioSource and to the NucleicAcid:
| OtherWID | The BioSource.WID assigned to this organism (see GENOME Name above). |
| CitationWID | The WID of the citation |
| OtherWID | The NucleicAcid.WID of the replicon (see CHROMOSOME or PLASMID above). |
| CitationWID | The WID of the citation |
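The citation handling above might be sketched as follows. The separator used when concatenating AUTHORS, TITLE and JOURNAL is not specified in this document, so a single space is assumed here; the function name and dict layout are likewise hypothetical.

```python
def make_citation_rows(citation_wid, pmid, authors, title, journal,
                       biosource_wid, nucleic_acid_wid, dataset_wid):
    """Return (Citation row, [CitationWIDOtherWID rows]) as dicts.

    Assumption: the three text fields are joined with one space and
    empty fields are skipped.
    """
    parts = [p.strip() for p in (authors, title, journal) if p and p.strip()]
    citation = {
        "WID": citation_wid,
        "Citation": " ".join(parts),
        "PMID": pmid,
        "DataSetWID": dataset_wid,
    }
    # One link back to the organism and one to the replicon.
    links = [
        {"OtherWID": biosource_wid, "CitationWID": citation_wid},
        {"OtherWID": nucleic_acid_wid, "CitationWID": citation_wid},
    ]
    return citation, links
```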
The GENES database contains information on the genome of particular organisms, one organism per file. The information includes the name(s) of the gene, its position, the codon usage, amino acid sequence and nucleotide sequence.
An entry in GENES contains up to nine fields.
The ENTRY line gives the gene id and the organism name. The gene id is used in the Gene table (see NAME below). The organism name is used to look up the previously loaded organism from GENOME. If the organism is found, a row is created in the BioSourceWIDGeneWID table:
| BioSourceWID | The WID of the organism (see GENOME NAME above). |
| GeneWID | The WID assigned to this gene. |
The first name given is assumed to be the primary name, and other names are synonyms. The name starts populating the Gene table:
| WID | A new WID assigned to this object. |
| Name | The primary name of the gene. |
| GenomeID | The gene id (from ENTRY above). |
| CodingRegionStart | See POSITION below. |
| CodingRegionEnd | See POSITION below. |
| Interrupted | See POSITION below. |
| NucleicAcidWID | NucleicAcid.WID of the replicon this gene resides on (see GENOME Chromosome and GENOME Plasmid above). |
| DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
Alternate names are stored in SynonymTable:
| OtherWID | The WID assigned to this gene. |
| Syn | The alternative name |
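The split between the primary name and its synonyms can be illustrated with a small helper. This sketch assumes the NAME field is a comma-separated list, which this document does not state; `split_names` is a hypothetical name.

```python
def split_names(name_field):
    """Split a NAME field into (primary_name, [synonyms]).

    Assumption: names are comma-separated. The first name goes to
    Gene.Name; the rest become SynonymTable rows.
    """
    names = [n.strip() for n in name_field.split(",") if n.strip()]
    if not names:
        return None, []
    return names[0], names[1:]


def synonym_rows(gene_wid, synonyms):
    """Build one SynonymTable row per alternative name."""
    return [{"OtherWID": gene_wid, "Syn": s} for s in synonyms]
```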
The definition is stored in CommentTable:
| OtherWID | The WID assigned to this gene. |
| Comm | The definition text. |
The position of a gene can be a simple numerical range, a join (patching together a number of regions), a complement, or a range relative to other genes; it can also indicate the replicon on which the gene resides.
We presently ignore the non-numerical range information. Joins are considered to range from the start of the low range to the end of the high range, and the Interrupted flag is set to 'T'.
The Gene entry from above is thereby extended with:
| WID | ... |
| Name | ... |
| GenomeID | ... |
| CodingRegionStart | The low end of the numerical range(s). |
| CodingRegionEnd | The high end of the numerical range(s). |
| Direction | 'F' for forward, 'R' for complement. |
| Interrupted | 'T' if a join was present, 'F' otherwise. |
| DataSetWID | ... |
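The POSITION rules above can be sketched with a regular expression. This is an illustration under assumptions, not the loader's parser: it assumes ranges are written as `start..end` and that the keywords `join` and `complement` appear literally in the field; anything without a numerical range yields no result.

```python
import re

# Assumption: numerical ranges look like "190..255".
_RANGE = re.compile(r"(\d+)\.\.(\d+)")


def parse_position(position):
    """Map a POSITION string to the Gene columns it populates.

    Returns None when no numerical range is present (such positions
    are ignored in this sketch).
    """
    ranges = [(int(a), int(b)) for a, b in _RANGE.findall(position)]
    if not ranges:
        return None
    return {
        # A join spans from the lowest start to the highest end.
        "CodingRegionStart": min(min(a, b) for a, b in ranges),
        "CodingRegionEnd": max(max(a, b) for a, b in ranges),
        "Direction": "R" if "complement" in position else "F",
        "Interrupted": "T" if "join" in position else "F",
    }
```

For example, `parse_position("complement(join(100..200,300..450))")` gives a start of 100, an end of 450, direction 'R' and Interrupted 'T'.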
References to the replicon on which the gene resides are represented in the GeneWIDRepliconWID table:
| GeneWID | The WID of the Gene. |
| RepliconWID | The WID of the Replicon. |
CrossReference table:
| OtherWID | The WID assigned to this gene (see above). |
| XID | The external database identifier. |
| DatasetWID | NULL. |
| DatasetName | The external database name. |
The AASEQ item gives the amino acid sequence for the protein generated by this gene. This is used to complete the AASequence column of the relevant protein (see NAME below).
| WID | ... |
| Name | ... |
| AASequence | The given sequence. |
| Charge | ... |
| Fragment | ... |
| MolecularWeightCalc | ... |
| MolecularWeightExp | ... |
| PlCalc | ... |
| PlExp | ... |
| DataSetWID | ... |
The COMPOUND section of LIGAND is a collection of metabolic compounds including substrates, products and inhibitors. Each of the chemicals referenced in the ENZYME and KEGG PATHWAY components is represented in this component. Information represented includes the naming, chemical formula, structural information, metabolic pathways, related enzymes, related protein structures, prosthetic groups and the CAS registry number.
In our semantic mapping, we ignore the structural information, as there is currently no table for it in the Bio-SPICE warehouse schema.
This section describes how each of the fields in a COMPOUND entry is mapped into the Bio-SPICE warehouse schema.
Each data item begins with an ENTRY field, giving the compound accession number for the LIGAND database. The accession number is stored in the DBID table:
| OtherWID | The WID assigned to this chemical (see below). |
| XID | The accession number |
The name item contains the recommended name for the compound, and optionally some alternatives. The recommended name is always first, and is mandatory. This item starts populating the Chemical table:
| WID | A new WID assigned to this object. |
| Name | The recommended name. |
| BeilsteinName | NULL. |
| SystematicName | NULL. |
| CAS | NULL. |
| Charge | NULL. |
| EmpiricalFormula | See below. |
| MolecularWeightCalc | NULL. |
| MolecularWeightExp | NULL. |
| OctH20PartitionCoeff | NULL. |
| PKA1 | NULL. |
| PKA2 | NULL. |
| PKA3 | NULL. |
| WaterSolubility | NULL. |
| Smiles | NULL. |
| DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
Alternative names are each stored in SynonymTable:
| OtherWID | The WID assigned to this chemical. |
| Syn | The alternative name |
The formula item is an ASCII representation of the chemical formula of this compound, e.g. H2O, C10H16N5O13P3. This is used to populate Chemical.EmpiricalFormula.
The pathway item is a cross-link to the KEGG PATHWAY data, and consists of the pathway map accession number, followed by the description. This is used to populate the CrossReference table:
| OtherWID | The WID assigned to this chemical (see above). |
| XID | The pathway accession number. |
| DatasetWID | NULL. |
| DatasetName | "KEGG PATHWAY" |
The enzyme item is a cross-link to the KEGG ENZYME data, and consists of the EC number, followed by a type indicating how the compound is related to the enzyme. Valid types are R for reactant, I for inhibitor, C for cofactor and E for effector.
Rather than load this data from COMPOUND, this information is loaded from ENZYME, where it is redundantly replicated in KEGG.
The structures item is a cross-link to PDB (the Protein Data Bank), which stores three-dimensional structure information for proteins. This is used to populate the CrossReference table:
| OtherWID | The WID assigned to this compound (see above). |
| XID | The PDB ID. |
| DatasetWID | NULL. |
| DatasetName | "PDB" |
The dblinks item contains cross-link information to other databases. This is used to populate the CrossReference table:
| OtherWID | The WID assigned to this compound (see above). |
| XID | The external database identifier. |
| DatasetWID | NULL. |
| DatasetName | The external database name. |
This section is ignored.
This section is ignored.
A row is added to the CommentTable table for each comment:
| OtherWID | The Chemical WID assigned to this compound. |
| Comm | The comment string. |
The ENZYME section of LIGAND is a collection of all known enzymatic reactions classified according to the nomenclature of the International Union of Biochemistry and Molecular Biology (IUBMB). Some of the entries in this data are taken from the ExPASy ENZYME database (http://expasy.hcuge.ch/sprot/enzyme.html) from the Swiss Institute of Bioinformatics.
Each entry is identified by the Enzyme Commission (EC) number, and contains information on naming, chemical reactions, metabolic compounds, metabolic pathways, genes encoding the enzyme (for several organisms), genetic diseases, and links to other databases.
An entry in the ENZYME data contains up to 17 fields. This section describes how each of these fields is mapped into the Bio-SPICE warehouse schema.
Each data item begins with a mandatory ENTRY field, giving the EC number for the enzyme. The EC Number is stored in the Reaction table (see below).
The name item contains the recommended name for the enzyme, and optionally some alternatives. All names are assumed to refer to proteins, not ribozymes. The recommended name is always first, and is mandatory. This item is stored in the Protein table:
| WID | A new WID assigned to this object. |
| Name | The recommended name. |
| AASequence | NULL. |
| Charge | NULL. |
| Fragment | NULL. |
| MolecularWeightCalc | NULL. |
| MolecularWeightExp | NULL. |
| PlCalc | NULL. |
| PlExp | NULL. |
| DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
One copy of the protein is made for each gene which can generate it (see below). The amino acid sequence is completed when loading the gene (see above).
Alternative names are each stored in SynonymTable:
| OtherWID | The WID assigned to this Protein. |
| Syn | The alternative name. |
The class item contains the meaning of the EC number, and is mandatory for all entries. There are three elements: the class, subclass and sub-subclass of the enzyme.
The class entry is not currently loaded.
The sysname item contains the systematic name given by the Enzyme Commission, representing the nature of the chemical reaction. This is stored as a synonym of the reaction name, in SynonymTable:
| OtherWID | WID of the reaction (see below). |
| Syn | The Systematic Name. |
The reaction item contains the chemical reaction(s) in the form of an equation or a text description. If the reaction is given in text, the SUBSTRATE and PRODUCT items are used in preference to the REACTION item, which is left uninterpreted and stored as a comment:
| OtherWID | The WID assigned to this reaction. |
| Comm | The reaction string. |
Each side of an interpreted equation is stored as per the substrate and product items (see below). The reaction is stored in the Reaction table:
| WID | A new WID assigned to this object. |
| DeltaG | NULL. |
| ECNumber | The EC Number (see above). |
| ECNumberProposed | NULL. |
| Spontaneous | NULL. |
| DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
An EnzymaticReaction entry is also created for every reaction, with one copy for each Protein generated:
| WID | A new WID assigned to this object. |
| ReactionWID | The WID of the Reaction assigned above. |
| ProteinWID | The WID of the Enzyme (see NAME above). |
| ComplexWID | NULL. |
| ReactionDirectionWID | NULL. |
| DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
The substrate item contains the chemical compounds that appear on the left side of the reaction. If the REACTION item gave an interpretable reaction, the SUBSTRATE is ignored.
Each substrate chemical is assigned an entry in the Chemical table. If two chemicals occur within KEGG that are textually identical, they are considered the same entity. For new chemicals (not previously loaded from LIGAND COMPOUND), the fields are completed as follows:
| WID | A new WID assigned to this object. |
| Name | The name of the substrate chemical. |
| BeilsteinName | NULL. |
| CAS | NULL. |
| Charge | NULL. |
| EmpiricalFormula | NULL. |
| MolecularWeightCalc | NULL. |
| MolecularWeightExp | NULL. |
| OctH20PartitionCoeff | NULL. |
| SystematicName | NULL. |
| WaterSolubility | NULL. |
| Smiles | NULL. |
| DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
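The "textually identical" rule amounts to a lookup keyed on the chemical name. A minimal sketch follows; the class name `ChemicalRegistry` is hypothetical, WIDs are allocated from a local counter rather than the warehouse, and trimming surrounding whitespace before comparison is an assumption.

```python
class ChemicalRegistry:
    """Assign one Chemical WID per distinct name (hypothetical helper)."""

    def __init__(self, first_wid=1):
        self._next_wid = first_wid
        self._wid_by_name = {}

    def wid_for(self, name):
        """Return the existing WID for this name, or allocate a new one."""
        # Assumption: comparison is exact and case-sensitive after trimming.
        key = name.strip()
        if key not in self._wid_by_name:
            self._wid_by_name[key] = self._next_wid
            self._next_wid += 1
        return self._wid_by_name[key]
```

Seeding the registry with the names loaded from LIGAND COMPOUND before parsing ENZYME would make substrates, products, inhibitors, cofactors and effectors resolve to the already loaded chemicals.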
Each of the substrate chemicals is linked to the reaction with a Reactant table entry, including the coefficient when specified. If the coefficient is not given, it is assumed to be 1:
| ReactionWID | The WID of the reaction |
| OtherWID | The Chemical.WID assigned to the substrate |
| Coefficient | Coefficient of this substrate. |
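The default-coefficient rule can be sketched for one side of an equation. The input syntax is an assumption: terms are taken to be separated by " + " and an optional integer coefficient is taken to precede the name, separated by whitespace. Non-integer coefficients (if any occur in the data) are not handled by this sketch.

```python
import re

# Assumption: a coefficient is a leading integer followed by whitespace,
# so names such as "2-Oxoglutarate" are not mistaken for "2 x ...".
_COEFF = re.compile(r"^(\d+)\s+(.+)$")


def parse_side(side):
    """Split one side of an equation into (coefficient, name) pairs."""
    terms = []
    for term in side.split(" + "):
        term = term.strip()
        if not term:
            continue
        match = _COEFF.match(term)
        if match:
            terms.append((int(match.group(1)), match.group(2)))
        else:
            # No coefficient given: assume 1.
            terms.append((1, term))
    return terms
```

Splitting on " + " with surrounding spaces keeps names that end in a plus sign, such as "NAD+", intact.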
The product item contains the chemical compounds that appear on the right side of the reaction. If the REACTION item gave an interpretable reaction, the PRODUCT is ignored.
Each product chemical is assigned an entry in the Chemical table. If two chemicals occur within LIGAND ENZYME that are textually identical, they are considered the same entity. For new chemicals (not previously loaded from LIGAND COMPOUND), the fields are completed as follows:
| WID | A new WID assigned to this object. |
| Name | The name of the product chemical. |
| BeilsteinName | NULL. |
| CAS | NULL. |
| Charge | NULL. |
| EmpiricalFormula | NULL. |
| MolecularWeightCalc | NULL. |
| MolecularWeightExp | NULL. |
| OctH20PartitionCoeff | NULL. |
| SystematicName | NULL. |
| WaterSolubility | NULL. |
| Smiles | NULL. |
| DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
Each of the product chemicals is linked to the reaction with a Product table entry, including the coefficient when specified. If the coefficient is not given, it is assumed to be 1:
| ReactionWID | The WID of the reaction |
| OtherWID | The Chemical.WID assigned to the product chemical |
| Coefficient | Coefficient of this product. |
The inhibitor item names compounds that inhibit the reaction from taking place. Each compound is given an entry in the Chemical table (textually identical names are treated as the same entity, as for substrates and products):
| WID | A new WID assigned to this object. |
| Name | The name of the inhibitor compound. |
| BeilsteinName | NULL. |
| CAS | NULL. |
| Charge | NULL. |
| EmpiricalFormula | NULL. |
| MolecularWeightCalc | NULL. |
| MolecularWeightExp | NULL. |
| OctH20PartitionCoeff | NULL. |
| SystematicName | NULL. |
| WaterSolubility | NULL. |
| Smiles | NULL. |
| DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
Each of the inhibitors is linked to each of the enzymatic reactions by the EnzReactionWIDChemicalWID table:
| EnzymaticReactionWID | The WID of the Enzymatic Reaction (see above). |
| ChemicalWID | The WID assigned to the chemical |
| InhibitOrActivate | 'I' |
| Mechanism | NULL. |
| PhysioRelevant | NULL. |
NOTE: As of approximately version 27 of KEGG, cofactor information appears to be missing from the data files. In this case, no cofactor information is loaded.
The cofactor item names compounds that do not appear in the reaction equation, but are described in the comment item as operating as cofactors in the reaction. Each compound is given an entry in the Chemical table (textually identical names are treated as the same entity, as for substrates and products):
| WID | A new WID assigned to this object. |
| Name | The name of the cofactor compound. |
| BeilsteinName | NULL. |
| CAS | NULL. |
| Charge | NULL. |
| EmpiricalFormula | NULL. |
| MolecularWeightCalc | NULL. |
| MolecularWeightExp | NULL. |
| OctH20PartitionCoeff | NULL. |
| SystematicName | NULL. |
| WaterSolubility | NULL. |
| Smiles | NULL. |
| DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
Each of the cofactor compounds is linked to each of the enzymatic reactions with an EnzReactionCofactor table entry:
| EnzymaticReactionWID | The WID of the enzymatic reaction (see above). |
| ChemicalWID | The WID assigned to the cofactor compound. |
| Prosthetic | NULL. |
The effector item names compounds that activate the reaction. Each compound is given an entry in the Chemical table (textually identical names are treated as the same entity, as for substrates and products):
| WID | A new WID assigned to this object. |
| Name | The name of the effector compound. |
| BeilsteinName | NULL. |
| CAS | NULL. |
| Charge | NULL. |
| EmpiricalFormula | NULL. |
| MolecularWeightCalc | NULL. |
| MolecularWeightExp | NULL. |
| OctH20PartitionCoeff | NULL. |
| SystematicName | NULL. |
| WaterSolubility | NULL. |
| Smiles | NULL. |
| DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
Each of the effectors is linked to each of the enzymatic reactions by the EnzReactionWIDChemicalWID table:
| EnzymaticReactionWID | The WID of the Enzymatic Reaction (see above). |
| ChemicalWID | The WID assigned to the chemical |
| InhibitOrActivate | 'A' |
| Mechanism | NULL. |
| PhysioRelevant | NULL. |
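Inhibitors and effectors share the EnzReactionWIDChemicalWID table and differ only in the InhibitOrActivate flag. The sketch below builds the rows for every (enzymatic reaction, chemical) pair; the function name and the `kind` argument are hypothetical.

```python
def modulator_rows(enzymatic_reaction_wids, chemical_wids, kind):
    """Build EnzReactionWIDChemicalWID rows as dicts.

    kind is "inhibitor" (flag 'I') or "effector" (flag 'A'). Each chemical
    is linked to each enzymatic reaction of the enzyme.
    """
    flag = {"inhibitor": "I", "effector": "A"}[kind]
    return [
        {
            "EnzymaticReactionWID": er_wid,
            "ChemicalWID": chem_wid,
            "InhibitOrActivate": flag,
            "Mechanism": None,
            "PhysioRelevant": None,
        }
        for er_wid in enzymatic_reaction_wids
        for chem_wid in chemical_wids
    ]
```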
The comment item contains free-form text commenting on the enzyme. This item populates the CommentTable:
| OtherWID | The WID assigned to this enzyme (see NAME above). |
| Comm | The comment string. |
There may be several comments associated with each enzyme.
The pathway item is a cross-link to the KEGG PATHWAY data, and consists of the pathway map accession number, followed by the description. As that database is not parseable, this entry is used to associate reactions into pathways.
A reference (sum of organisms) pathway is created, if it does not already exist:
| WID | A new WID assigned to this object. |
| Name | The given descriptive name of the pathway. |
| Type | 'R' (Reference). |
| BioSourceWID | The BioSource.WID assigned to this organism (see GENOME Name above). |
| DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
The pathway map accession number is stored in the DBID table:
| OtherWID | The WID assigned to this pathway. |
| XID | The accession number. |
The reactions are then linked to the pathway with the PathwayReaction table:
| PathwayWID | The WID assigned to this pathway. |
| ReactionWID | The reaction WID. |
| PriorReactionWID | NULL. |
| Hypothetical | 'U' (Unknown). |
The genes item is a cross-link to the KEGG gene catalogs, showing the genes in various organisms that encode this enzyme. This is used to create organism specific pathways, and to indicate the number of proteins to generate in loading: one is generated for each gene, as they may have different amino acid sequences.
For each organism with the necessary gene(s), a new pathway is created (if not already present). The BioSource WID is looked up among the organisms previously loaded from the GENOME data.
| WID | A new WID assigned to this object. |
| Name | The given descriptive name of the pathway. |
| Type | 'O' (Organism). |
| BioSourceWID | The BioSource.WID assigned to this organism (see GENOME Name above). |
| DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
The pathway map accession number for this pathway is stored in the DBID table:
| OtherWID | The WID assigned to this pathway. |
| XID | The accession number. |
The Enzyme is also linked to the BioSource by the BioSourceWIDProteinWID table:
| BioSourceWID | The BioSource.WID assigned to this organism (see GENOME Name above). |
| ProteinWID | The WID assigned to the Enzyme. |
Each reaction is then assigned to the new pathway (PathwayReaction table) in the same way as for the reference pathway above:
| PathwayWID | The WID assigned to this pathway. |
| ReactionWID | The reaction WID. |
| PriorReactionWID | NULL. |
| Hypothetical | 'U' (Unknown). |
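The "create if not already present" behaviour for reference and organism pathways can be sketched as a registry keyed on accession, type and organism. All names here are hypothetical, WIDs come from a local counter, and the choice of key is an assumption about how the loader recognizes an existing pathway.

```python
class PathwayRegistry:
    """Create each Pathway row at most once (hypothetical helper)."""

    def __init__(self, first_wid=1):
        self._next_wid = first_wid
        self.rows = {}

    def get_or_create(self, accession, name, pathway_type,
                      biosource_wid, dataset_wid):
        """Return the pathway WID; pathway_type is 'R' or 'O'."""
        # Assumption: a pathway is identified by accession, type and organism.
        key = (accession, pathway_type, biosource_wid)
        if key not in self.rows:
            self.rows[key] = {
                "WID": self._next_wid,
                "Name": name,
                "Type": pathway_type,
                "BioSourceWID": biosource_wid,
                "DataSetWID": dataset_wid,
            }
            self._next_wid += 1
        return self.rows[key]["WID"]


def pathway_reaction_rows(pathway_wid, reaction_wids):
    """Link reactions to a pathway; ordering and status are not known."""
    return [
        {
            "PathwayWID": pathway_wid,
            "ReactionWID": reaction_wid,
            "PriorReactionWID": None,
            "Hypothetical": "U",
        }
        for reaction_wid in reaction_wids
    ]
```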
The disease item is a cross-link to the OMIM (Online Mendelian Inheritance in Man) database. This is used to populate the CrossReference table:
| OtherWID | The WID assigned to this enzyme (see NAME above). |
| XID | The MIM Number. |
| DatasetWID | NULL. |
| DatasetName | "MIM" |
The motif item is a cross-link to the PROSITE database. Each PROSITE identifier is used to populate the CrossReference table:
| OtherWID | The WID assigned to this enzyme (see NAME above). |
| XID | The PROSITE ID. |
| DatasetWID | NULL. |
| DatasetName | "PS" |
The structures item is a cross-link to PDB (the Protein Data Bank), which stores three-dimensional structure information for proteins. Each PDB identifier is used to populate the CrossReference table:
| OtherWID | The WID assigned to this enzyme (see NAME above). |
| XID | The PDB ID. |
| DatasetWID | NULL. |
| DatasetName | "PDB" |
The dblinks item contains cross-link information to other databases, including the ENZYME Nomenclature database from the Swiss Institute of Bioinformatics. This is used to populate the CrossReference table:
| OtherWID | The WID assigned to this enzyme (see NAME above). |
| XID | The external database identifier. |
| DatasetWID | NULL. |
| DatasetName | The external database name. |