The Reference Sequence (RefSeq) project provides sequence data
and related information for the scientific community to use as a standard.
The RefSeq database therefore provides standard sequences for gene characterization,
mutation analysis, expression studies, and polymorphism discovery.

RefSeq differs from GenBank, which is an archival repository of sequences
submitted by investigators from all over the world; many of these submissions
are essentially the same sequence. As a product of expert curation at NCBI,
RefSeq addresses several limitations of GenBank:
- The RefSeq database is non-redundant because each molecule is represented
by a single record, derived from all the similar sequences in GenBank.
- Each RefSeq record serves as a reference standard because, in principle,
it is more accurate and more completely annotated than any single sequence
in GenBank.
- GenBank sequence records are owned by the original submitter and cannot
be altered by a third party. RefSeq sequences, in contrast, are created by
NCBI curators from the primary sequences submitted to GenBank.
- Because they are owned by NCBI, RefSeq records can be updated as needed to
incorporate additional sequence information and to revise the annotation so
that it reflects current knowledge of the corresponding biology.

The RefSeq database contains many different kinds of records, which are
distinguished by the prefix of their accession numbers. Some of the more
common prefixes, and the types of records they identify, are given below:

| Accession format | Description |
|---|---|
| NM_123456 | The record contains the sequence of a mature, protein-coding mRNA transcript, that is, the sequence after exon splicing. |
| NP_123456 | The record contains the amino acid sequence of a protein, derived from translation of the mature mRNA. |
| NC_123456 | The record contains the complete sequence of a genome, chromosome, organelle or plasmid. These records provide the sequence of entire genes, including the exons, introns and splice points. |
| NT_123456 | The record contains the DNA sequence of a contig that is part of a larger assembly. These records provide the sequence of entire genes, including the exons, introns and splice points. |
| NR_123456 | The record contains the sequence of a non-coding transcript, such as a structural RNA or a transcribed pseudogene. |
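
The prefix convention in the table can be applied programmatically. The Python sketch below maps an accession to its record type. It covers only the five prefixes listed in the table (RefSeq uses others, which this sketch rejects), and the accepted accession shape (two capital letters, an underscore, digits, and an optional `.version` suffix) is an assumption of the sketch rather than a complete specification of RefSeq accessions.

```python
import re

# Record types for the five prefixes in the table; other RefSeq prefixes
# exist but are deliberately not covered by this sketch.
REFSEQ_PREFIXES = {
    "NM": "mature mRNA transcript (protein-coding)",
    "NP": "protein (amino acid sequence)",
    "NC": "complete genome, chromosome, organelle or plasmid",
    "NT": "genomic contig from a larger assembly",
    "NR": "non-coding RNA transcript",
}

# Assumed accession shape: two capital letters, "_", digits, optional ".version".
ACCESSION_RE = re.compile(r"([A-Z]{2})_(\d+)(?:\.(\d+))?")


def classify_refseq(accession: str) -> str:
    """Return the record type implied by a RefSeq accession's prefix."""
    match = ACCESSION_RE.fullmatch(accession.strip())
    if match is None:
        raise ValueError(f"not a RefSeq-style accession: {accession!r}")
    prefix = match.group(1)
    if prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"prefix {prefix}_ is not covered by this sketch")
    return REFSEQ_PREFIXES[prefix]


for acc in ["NM_123456", "NP_123456.1", "NR_123456"]:
    print(acc, "->", classify_refseq(acc))
```

Raising `ValueError` for unknown prefixes, rather than guessing, keeps the sketch honest about its limited coverage; extending it is a matter of adding entries to the dictionary.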

Note: all RefSeq flat files are in the GenBank format, whose main parts are:
- the Locus section
- the References section
- the Features table, with feature locations and qualifiers
- the ORIGIN section, which holds the sequence itself
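
Because every RefSeq flat file follows the GenBank layout, its main parts can be located by a simple rule: a top-level keyword (LOCUS, REFERENCE, FEATURES, ORIGIN and so on) starts in the first column, indented lines continue the current section, and `//` ends the record. The Python sketch below applies that rule to a single record. The sample record and the helper names `split_sections` and `origin_sequence` are hypothetical, chosen for illustration; the sample is heavily abbreviated and is not a real RefSeq entry. For real files, an established parser such as Biopython's `Bio.SeqIO` with the `"genbank"` format is the more robust choice.

```python
from typing import List, Tuple

# Hypothetical, abbreviated record for illustration only (not a real RefSeq
# entry); the column spacing only approximates the real GenBank layout.
SAMPLE_RECORD = """\
LOCUS       NM_000000                  9 bp    mRNA    linear   PRI 01-JAN-2000
DEFINITION  Hypothetical example record.
ACCESSION   NM_000000
REFERENCE   1  (bases 1 to 9)
  AUTHORS   Doe,J.
  TITLE     Example title
FEATURES             Location/Qualifiers
     source          1..9
                     /organism="Homo sapiens"
     CDS             1..9
ORIGIN
        1 atggcctaa
//
"""

Section = Tuple[str, List[str]]


def split_sections(flat_file: str) -> List[Section]:
    """Split one GenBank-format record into (keyword, lines) pairs.

    A top-level keyword starts in column 1; indented lines continue the
    current section. Parsing stops at the "//" end-of-record line.
    """
    sections: List[Section] = []
    for line in flat_file.splitlines():
        if line.startswith("//"):
            break
        if not line.strip():
            continue
        if not line[0].isspace():
            sections.append((line.split()[0], [line]))
        elif sections:
            sections[-1][1].append(line)
    return sections


def origin_sequence(sections: List[Section]) -> str:
    """Return the ORIGIN sequence with position numbers and spaces removed."""
    for keyword, lines in sections:
        if keyword == "ORIGIN":
            # Skip the "ORIGIN" header line and keep only the letters.
            return "".join(ch for line in lines[1:] for ch in line if ch.isalpha()).upper()
    return ""


sections = split_sections(SAMPLE_RECORD)
print([keyword for keyword, _ in sections])
# ['LOCUS', 'DEFINITION', 'ACCESSION', 'REFERENCE', 'FEATURES', 'ORIGIN']
print(origin_sequence(sections))
# ATGGCCTAA
```

Real records usually contain several REFERENCE blocks and files often hold many records; this sketch keeps each REFERENCE block as its own entry and reads only the first record.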