Features of the Genome: coding DNA


barnes@mail.clarion.edu             Send questions or comments to Dr. Barnes!
 
 
 



 

Coding DNA.

This is DNA that can be related to a cell function or to a phenotype at the level of the organism. It is usually thought of either as coding for a protein or some functional element in the cell, or as playing a role in the expression of a gene or circuit of genes.



cis-acting sequences: The sequences just 5' of the transcription start site are the most important for the initiation of transcription; this is where the transcription complex is assembled. In general, this region is called the promoter. In eukaryotes, several sequences seem to be conserved among many genes. One such sequence is the TATA box. It is located about 30 bases upstream (-30) of the transcription start site and is the one sequence required for any significant transcription to occur. Other sequences aid transcription but are not always part of the promoter. The two most commonly found are the CCAAT box (also called the CAT box) and the GC box. Because mutants in these three sequences express mRNA only at low levels, they are considered the most important sequences for the basic transcription complex. [Phillip McClean, "Control of gene expression in eukaryotes," North Dakota State Univ., 1997]
http://www.ndsu.nodak.edu/instruct/mcclean/plsc431/geneexpress/eukaryex3.htm
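As a rough illustration of how these promoter elements might be spotted in a sequence, the sketch below scans a DNA string for simple consensus patterns and reports each hit's position relative to the transcription start site, so a TATA box sitting 30 bases upstream is reported as -30. The patterns (TATAWAW for the TATA box, CCAAT, and GGGCGG for the GC box) are textbook simplifications assumed here, not taken from the cited source; the function name `scan_promoter` is hypothetical, and only one strand is scanned.

```python
import re

# Assumed, simplified consensus patterns ([AT] = IUPAC "W"); not a real promoter model.
MOTIFS = {
    "TATA box": "TATA[AT]A[AT]",
    "CCAAT box": "CCAAT",
    "GC box": "GGGCGG",
}

def scan_promoter(seq: str, tss: int) -> list[tuple[str, int]]:
    """Return (motif name, position relative to the TSS) for every match.

    `tss` is the 0-based index of the transcription start site in `seq`,
    so elements upstream of it get negative positions (e.g. -30).
    """
    seq = seq.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        # A zero-width lookahead lets overlapping matches be reported too.
        for m in re.finditer(f"(?=({pattern}))", seq):
            hits.append((name, m.start() - tss))
    # Order hits from most upstream to most downstream.
    return sorted(hits, key=lambda hit: hit[1])
```

For example, a toy 91-nt sequence with a GC box, a CCAAT box and `TATAAAA` placed before a start site at index 80 yields `[("GC box", -80), ("CCAAT box", -54), ("TATA box", -30)]`.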

 

enhancers: A cis-acting sequence that increases the utilization of (some) eukaryotic promoters and can function in either orientation and at any location (upstream or downstream) relative to the promoter. Found in eukaryotes and eukaryotic viruses.
http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
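Because an enhancer can work in either orientation, a search for one has to look at both strands. A minimal sketch, assuming the element can be represented by a single exact sequence (the motif used in the usage note is hypothetical): a window matches in the "reverse" orientation when it equals the reverse complement of the motif.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """The opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_either_orientation(seq: str, motif: str) -> list[tuple[int, str]]:
    """Report (0-based start, orientation) for each exact occurrence of
    `motif` on either strand of `seq`."""
    seq, motif = seq.upper(), motif.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "forward"))
        if window == rc:  # the same element, read on the opposite strand
            hits.append((i, "reverse"))
    return hits
```

With the hypothetical motif `ACGTTT`, the sequence `ACGTTTGGGGAAACGT` gives one forward hit at 0 and one reverse hit at 10. A palindromic motif would be reported twice at the same position, once per orientation.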

 

splice sites / splice junctions: Boundaries between exons and introns. There are two varieties:

  1. the border going from exon to intron is called a donor site or a 5' site;
  2. the border separating intron from exon is called an acceptor site or a 3' site.
[T.P. Speed, S. Cawley, "Locating splice sites," Statistics 260: Statistics in Genetics, Univ. of California, Berkeley, 1998] http://www.stat.berkeley.edu/users/terry/Classes/s260.1998/Week12/week12/node14.html
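To make the donor/acceptor terminology concrete, the sketch below takes a genomic sequence plus exon coordinates (0-based, half-open, sorted; an assumed input format), joins the exons, and reports each donor (exon-to-intron) and acceptor (intron-to-exon) boundary. It also checks the GT...AG rule, the observation that most introns begin with GT and end with AG; that rule is added here for illustration and is not stated in the cited notes. Both function names are hypothetical.

```python
def splice(seq: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon sequences, i.e. remove the introns."""
    return "".join(seq[start:end] for start, end in exons)

def splice_junctions(seq: str, exons: list[tuple[int, int]]) -> list[tuple[int, int, bool]]:
    """For each intron between consecutive exons, return
    (donor index, acceptor index, intron follows the GT...AG rule)."""
    junctions = []
    for (_, exon_end), (next_start, _) in zip(exons, exons[1:]):
        intron = seq[exon_end:next_start]
        # Donor (5') site: the exon -> intron border at exon_end.
        # Acceptor (3') site: the intron -> exon border at next_start.
        gt_ag = intron.startswith("GT") and intron.endswith("AG")
        junctions.append((exon_end, next_start, gt_ag))
    return junctions
```

For the toy gene `ATGAAA` + `GTAAGTTTTTCAG` + `GCCTGA` with exons `[(0, 6), (19, 25)]`, `splice` returns `ATGAAAGCCTGA` and `splice_junctions` returns `[(6, 19, True)]`.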
 

Alternative initiation: Transcription of genes with promoters containing a TATA box or initiator element begins at a well-defined initiation site. However, transcription of many protein-coding genes has been shown to begin at any one of multiple possible sites over an extended region, often 20-200 base pairs in length. As a result, such genes give rise to mRNAs with multiple alternative 5' ends. These genes, which generally are transcribed at low rates (e.g., genes encoding the enzymes of intermediary metabolism, often called "housekeeping genes"), do not contain a TATA box or an initiator. Most genes of this type contain a CG-rich stretch of 20-50 nucleotides within approximately 100 base pairs upstream of the start-site region. A transcription factor called SP1 recognizes these CG-rich sequences.

The dinucleotide CG is statistically underrepresented in vertebrate DNAs, and the presence of CG-rich regions just upstream from start sites is a distinctly nonrandom distribution. Such CpG islands can be identified by their susceptibility to restriction enzymes (e.g., HpaII) that have CG in their recognition sequences. The presence of a CpG island in a newly cloned DNA fragment suggests that it may contain a transcription-initiation region. [Alison Stewart, "The human gene map initiative," Genome Digest 2(2): 1-4] http://www.gene.ucl.ac.uk/hugo/london.htm
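The two paragraphs above suggest some simple sequence checks, sketched below. `gc_rich_windows` looks for a CG-rich stretch in the 100 bp upstream of a start site; its 20-nt window and 70% G+C cutoff are hypothetical parameters. Underrepresentation of CG is usually measured as an observed/expected ratio, (CpG count × length) / (C count × G count). The CpG-island thresholds in `looks_like_cpg_island` (at least 200 bp, G+C above 50%, observed/expected above 0.6) are the widely used Gardiner-Garden and Frommer criteria, assumed here rather than taken from this page. `hpaii_sites` lists HpaII recognition sites (5'-CCGG-3'). All function names are illustrative.

```python
def gc_rich_windows(seq: str, start_site: int, window: int = 20,
                    upstream: int = 100, min_gc: float = 0.7) -> list[int]:
    """0-based starts of `window`-nt windows lying within `upstream` nt 5'
    of `start_site` whose G+C fraction is at least `min_gc` (hypothetical cutoff)."""
    seq = seq.upper()
    hits = []
    for i in range(max(0, start_site - upstream), start_site - window + 1):
        chunk = seq[i:i + window]
        if (chunk.count("G") + chunk.count("C")) / window >= min_gc:
            hits.append(i)
    return hits

def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: CpG count * length / (C count * G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every CpG dinucleotide.
    return seq.count("CG") * len(seq) / (c * g)

def looks_like_cpg_island(seq: str) -> bool:
    """Assumed criteria: >= 200 bp, G+C > 50%, observed/expected CpG > 0.6."""
    seq = seq.upper()
    if len(seq) < 200:
        return False
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return gc > 0.5 and cpg_obs_exp(seq) > 0.6

def hpaii_sites(seq: str) -> list[int]:
    """0-based starts of HpaII recognition sites (5'-CCGG-3')."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "CCGG"]
```

For example, `cpg_obs_exp("CGCG")` is 2.0 (two CpGs in four bases, with two Cs and two Gs), and `hpaii_sites("AACCGGTTCCGG")` returns `[2, 8]`. A real island finder would slide these tests along a genome rather than apply them to one pre-cut fragment.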

 

trans-acting factors: Trans-acting factors have two functional domains: a DNA-binding domain and a transcription-activation domain.

The first function was discovered by studying deletion mutants of the factors: some mutant factors could still bind DNA but could not activate transcription, showing that DNA binding and activation are separable. Other experiments, in which a hybrid protein (the non-DNA-binding segment of one trans-acting factor fused to the DNA-binding region of a second trans-acting factor) activated transcription, defined the second function of trans-acting factors.
http://www.ndsu.nodak.edu/instruct/mcclean/plsc431/geneexpress/eukaryex6.htm
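The logic of these two experiments can be mimicked with a toy model in which a factor is just a DNA-binding specificity plus an optional activation domain. This is a deliberately crude sketch: the `Factor` class, the `fuse` helper and the binding sequences in the usage note are hypothetical, and "binding" is reduced to an exact substring match.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Factor:
    binding_site: Optional[str]  # sequence recognized by the DNA-binding domain (None = no such domain)
    activation_domain: bool      # whether the activation domain is present

    def binds(self, promoter: str) -> bool:
        return self.binding_site is not None and self.binding_site in promoter

    def activates(self, promoter: str) -> bool:
        # Activation needs both domains: the factor must be bound to the DNA
        # and must carry an activation domain.
        return self.binds(promoter) and self.activation_domain

def fuse(activation_donor: Factor, binding_donor: Factor) -> Factor:
    """Hybrid: activation domain of one factor + DNA-binding domain of another."""
    return Factor(binding_donor.binding_site, activation_donor.activation_domain)
```

A deletion mutant such as `Factor("GGGCGG", False)` still binds a promoter containing `GGGCGG` but does not activate it, while `fuse(a, b)` activates promoters carrying `b`'s site even though the activation domain came from `a`.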

 

plus strand DNA: in retroviruses, the DNA strand whose sequence matches the viral mRNA and so codes for the protein products
minus strand DNA: in retroviruses, the complementary DNA strand
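The relationship between the two strands can be written down directly. A minimal sketch, assuming both strands are written 5' to 3' (so the minus strand is the reverse complement of the plus strand) and that the mRNA matches the plus strand with U in place of T; the function names are hypothetical.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def minus_strand(plus: str) -> str:
    """Complementary strand, written 5' to 3' (the reverse complement)."""
    return plus.upper().translate(COMPLEMENT)[::-1]

def mrna_from_plus(plus: str) -> str:
    """mRNA carries the plus-strand sequence, with U in place of T."""
    return plus.upper().replace("T", "U")
```

For example, the plus strand `ATGGCC` pairs with the minus strand `GGCCAT` and corresponds to the mRNA `AUGGCC`; taking the reverse complement twice returns the original strand.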
 
