The Sequence Data Bases and Formats

Entrez Gene.  Entrez Gene is NCBI's database for gene-specific information. It does not include all known or predicted genes, but focuses on genomes that have been completely sequenced and presents the curated results from RefSeq. It provides a first glance at the gene structure (exons and introns) and at the genomic context (the sequence of the neighboring coding and non-coding regions, and the cytogenetic locus). It also provides links to the Nucleotide data base and the Protein data base.

Nucleotide Data Bases.   The Nucleotide Data Base is really a compendium of many databases, the most important of which are GenBank and RefSeq.

This data base provides the base sequence and the translated amino acid sequence for each record stored in GenBank.  In addition, information is given about the position of any coding sequences identified within the record and about the putative function of the protein.
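As an illustration of how such a record might be retrieved programmatically, the sketch below builds a request URL for NCBI's E-utilities "efetch" service. It only constructs the URL and makes no network call. The helper name efetch_url is hypothetical, and the accession NM_000546 is used purely as an example identifier; the parameter choices (db "nuccore", rettype "gb" for the GenBank flat-file format or "fasta" for FASTA) are assumptions about a typical request, not something stated in this section.

```python
from urllib.parse import urlencode

# Base address of the E-utilities efetch service at NCBI.
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nuccore", rettype="gb", retmode="text"):
    """Build (but do not send) an efetch request URL for one record.

    Hypothetical helper for illustration only.
    rettype="gb" asks for the GenBank flat-file format, which lists the
    sequence together with annotated features such as coding sequences;
    rettype="fasta" asks for the bare sequence in FASTA format.
    """
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": retmode}
    return EUTILS_EFETCH + "?" + urlencode(params)


if __name__ == "__main__":
    # Example identifier only; any valid accession could be used instead.
    print(efetch_url("NM_000546"))
    print(efetch_url("NM_000546", rettype="fasta"))
```

Opening the resulting URL in a browser, or passing it to any HTTP client, should return the record as plain text.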

GenBank is part of a collaboration among NCBI, the DNA Data Bank of Japan (DDBJ) and the European Molecular Biology Laboratory (EMBL). Sequences may be submitted by investigators to any of these databases, and they then appear in all of them.

RefSeq.
The GenBank archival sequence database includes publicly available DNA sequences from individual laboratories and large-scale sequencing projects that have been submitted to GenBank, EMBL, or DDBJ.  Whereas GenBank is an archival repository of sequences, the RefSeq database is a non-redundant set of curated reference sequences that represents our current knowledge of known genes. Moreover, RefSeq records are owned by NCBI and can therefore be updated as needed to maintain current annotation or to incorporate additional sequence information.
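In practice the two kinds of record can usually be told apart by their accession numbers: RefSeq accessions carry a two-letter prefix followed by an underscore (for example NM_ for mRNA, NP_ for protein, NC_ for complete genomic molecules), whereas archival GenBank accessions contain no underscore. The sketch below encodes that rule of thumb. The function name is hypothetical and the pattern is a simplification, not an official validator.

```python
import re

# Simplified pattern for RefSeq-style accessions: two uppercase letters,
# an underscore, an alphanumeric body, and an optional ".version" suffix.
# This is a rule of thumb for illustration, not NCBI's own validation logic.
REFSEQ_PATTERN = re.compile(r"^[A-Z]{2}_[A-Z0-9]+(\.\d+)?$")


def is_refseq_accession(accession):
    """Return True if the accession looks like a RefSeq accession."""
    return bool(REFSEQ_PATTERN.match(accession.strip()))


if __name__ == "__main__":
    # Example accessions chosen only to show the two formats.
    for acc in ["NM_000546", "NM_000546.6", "NC_000001.11", "U49845", "AF123456.1"]:
        kind = "RefSeq" if is_refseq_accession(acc) else "not RefSeq (archival style)"
        print(acc, "->", kind)
```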

Other Data Bases at NCBI.  There are many other databases at NCBI. The few that follow are among the most interesting.

The Protein Data Base.
The Protein database contains sequence data derived from the translated coding regions of DNA sequences in GenBank, EMBL, and DDBJ.

The Cancer Chromosome Data Base.
Cancer Chromosomes contains databases related to genes associated with cancers.

Online Mendelian Inheritance in Animals (OMIA).
Online Mendelian Inheritance in Animals (OMIA) is a database of genes, inherited disorders and traits in animal species (other than human and mouse). 

The SNP Data Base.
The SNP database (dbSNP) is a central repository for both single-base nucleotide substitutions and short deletion and insertion polymorphisms.

