Molecular Cell Biology
RNA Processing, Nuclear Transport, and Post-Transcriptional Control
11.2. Processing of Eukaryotic mRNA
As discussed in Chapter
4, the initial primary transcript synthesized by RNA polymerase II
undergoes several processing steps before a functional mRNA is produced.
In this section, we take a closer look at how eukaryotic cells carry out
mRNA processing, which includes three major processes: 5′ capping,
3′ cleavage/polyadenylation, and RNA
splicing (Figure
11-7). Processing occurs in the nucleus, and the functional mRNA produced
is transported to the cytoplasm by mechanisms discussed later.
After nascent RNA molecules produced by RNA polymerase II reach a length of 25 – 30 nucleotides, 7-methylguanosine is added to their 5′ end. This initial step in RNA processing is catalyzed by a dimeric capping enzyme, which associates with the phosphorylated carboxyl-terminal tail domain (CTD) of RNA polymerase II. Recall that the CTD becomes phosphorylated during transcription initiation (see Figure 10-50). Because the capping enzyme does not associate with polymerase I or III, capping is specific for transcripts produced by RNA polymerase II. One subunit of the capping enzyme removes the γ-phosphate from the
5′ end of the nascent RNA emerging from the surface of an RNA polymerase
II (Figure
11-8). The other subunit transfers the GMP moiety from GTP to the 5′-diphosphate
of the nascent transcript, creating the guanosine 5′-5′-triphosphate
structure. In the final steps, separate enzymes transfer methyl groups
from S-adenosylmethionine to the N7 position of the guanine
and the 2′ oxygens of riboses at the 5′ end of the nascent RNA.
Nascent RNA transcripts from protein-coding genes and mRNA processing intermediates, collectively referred to as pre-mRNA, do not exist as free RNA molecules in the nuclei of eukaryotic cells. From the time nascent transcripts first emerge from RNA polymerase II until mature mRNAs are transported into the cytoplasm, the RNA molecules are associated with an abundant set of nuclear proteins, as numerous in growing eukaryotic cells as histones. These proteins are the major protein components of heterogeneous nuclear ribonucleoprotein particles (hnRNPs), which contain heterogeneous nuclear RNA (hnRNA), a collective term referring to pre-mRNA and other nuclear RNAs of various sizes. The proteins in these ribonucleoprotein particles can be dramatically visualized with fluorescent-labeled monoclonal antibodies (Figure 11-9). To identify hnRNP proteins, researchers exposed cells to high-dose UV irradiation, which causes covalent cross-links to form between RNA bases and closely associated proteins. Chromatography of nuclear extracts from treated cells on an oligo-dT cellulose column, which binds RNAs with a poly(A) tail, was used to recover proteins that had become cross-linked to nuclear mRNA in living cells (i.e., hnRNP proteins). Subsequent treatment of cell extracts from unirradiated human cells with monoclonal antibodies specific for the major hnRNP proteins identified by this cross-linking technique revealed a complex set of abundant hnRNP proteins ranging in size from 34 to 120 kDa. Characterization of the mRNAs encoding these proteins has shown that some of them (e.g., A2 and B1) are related proteins derived by alternative splicing of exons from the same transcription unit. Binding studies with purified hnRNP proteins suggest that different hnRNP proteins associate with different regions of a newly made pre-mRNA molecule as determined by the sequence of the RNA. 
For example, the hnRNP proteins A1, C, and D bind preferentially to the pyrimidine-rich sequences at the 3′ ends of introns, discussed in a later section. Like transcription factors, most hnRNP proteins have a modular structure. They contain one or more RNA-binding domains and at least one other domain that is thought to interact with other proteins. Several different RNA-binding motifs have been identified by constructing deletions of hnRNP proteins and testing their ability to bind RNA. Although some RNA-binding proteins contain domains with the zinc-finger motif common in DNA-binding proteins (see Figure 10-41), this motif has not yet been described in any hnRNP proteins. The RNP motif, also called the RNA-binding domain (RBD), is the most common RNA-binding domain in hnRNP proteins. This ≈80-residue motif, which occurs in many other RNA-binding proteins, contains two highly conserved regions (RNP1 and RNP2) that allow the motif to be recognized in newly sequenced proteins. X-ray crystallographic analysis has shown that the RNP motif consists of a four-stranded β sheet flanked on one side by two α helices. The conserved RNP1 and RNP2 sequences lie side by side on the two central β strands, and their side chains make multiple contacts with a single-stranded region of RNA. The single-stranded RNA loop lies across the surface of the β sheet and fits into a groove between the protein loop connecting strands β2 and β3 and the C-terminal region (Figure 11-10). The RGG box, another RNA-binding motif found in hnRNP proteins, contains five Arg-Gly-Gly (RGG) repeats with several interspersed aromatic amino acids. Although the structure of this motif has not yet been determined, its arginine-rich nature is similar to the RNA-binding domains of the λ-phage N and HIV Tat proteins. The 45-residue
KH motif is found in the hnRNP K protein and several other RNA-binding
proteins; commonly two or more copies of the KH motif are interspersed
with RGG repeats. The three-dimensional structure of a representative KH
motif, determined by NMR methods (Section 3.5), is similar to that of the
RNP motif but smaller, consisting of a three-stranded β sheet supported
from one side by a single α helix. It is not yet clear how this motif
binds RNA. Mutations in the fragile-X gene (FMR1), which encodes
a protein containing the KH motif, are associated with the most common
form of heritable mental retardation. Although the molecular function of
the Fmr1 protein is unknown, it presumably involves RNA binding.
The association of pre-mRNAs with hnRNP proteins may prevent formation of short secondary structures dependent on base-pairing of complementary regions, thereby making the pre-mRNAs accessible for interaction with other macromolecules (Figure 11-11). Moreover, pre-mRNAs associated with hnRNP proteins present a more uniform substrate for further processing steps than would free, unbound pre-mRNAs, each type of which forms a unique secondary structure dependent on its specific sequence. The diversity of hnRNP proteins suggests that they probably have other
functions as well. For example, various hnRNP proteins may interact with
the RNA sequences that specify RNA splicing or cleavage/polyadenylation
and contribute to the structure recognized by RNA-processing factors. Finally,
cell-fusion experiments have shown that some hnRNP proteins remain localized
in the nucleus, whereas others cycle in and out of the cytoplasm, suggesting
that they function in the transport of mRNA (see later section).
In animal cells, all mRNAs, except histone mRNAs, have a 3′ poly(A) tail. Early studies of pulse-labeled adenovirus and SV40 RNA demonstrated that the viral primary transcripts extend beyond the poly(A) site in the viral mRNAs. These results suggested that A residues are added to a 3′ hydroxyl generated by endonucleolytic cleavage, but the predicted downstream RNA fragments are degraded so rapidly in vivo that they cannot be detected. However, this cleavage mechanism was firmly established by detection of both predicted cleavage products in in vitro processing reactions performed with extracts of HeLa-cell nuclei. Early sequencing of cDNA clones from animal cells showed that nearly all mRNAs contain the sequence AAUAAA 10 – 35 nucleotides upstream from the poly(A) tail. Polyadenylation of RNA transcripts from transfected genes is virtually eliminated when template DNA encoding the AAUAAA sequence is mutated to any other sequence except one encoding AUUAAA. The unprocessed RNA transcripts produced from such mutant templates do not accumulate in nuclei, but are rapidly degraded. Further mutagenesis of sequences within a few hundred bases of poly(A) sites revealed that a second signal downstream from the cleavage site is required for efficient cleavage and polyadenylation of most pre-mRNAs in animal cells. This downstream poly(A) signal is not a specific sequence but rather a GU-rich or simply a U-rich region within ≈50 nucleotides of the cleavage site. Identification and purification of the proteins required for cleavage and polyadenylation of pre-mRNA has led to the model shown in Figure 11-12. According to this model, a 360-kDa cleavage and polyadenylation specificity factor (CPSF), composed of four different polypeptides, first forms an unstable complex with the upstream AU-rich poly(A) signal. 
Then at least three additional proteins — a 200-kDa heterotrimer called cleavage stimulatory factor (CStF), a 150-kDa heterotrimer called cleavage factor I (CFI), and a second cleavage factor (CFII), as-yet poorly characterized — bind to the CPSF-RNA complex. Interaction between CStF and the GU- or U-rich downstream poly(A) signal stabilizes the multiprotein complex. Finally, a poly(A) polymerase (PAP) binds to the complex before cleavage can occur. This requirement for PAP binding links cleavage and polyadenylation, so that the free 3′ ends generated are rapidly polyadenylated. Assembly of this large, multiprotein cleavage-polyadenylation complex around the AU-rich poly(A) signal in a pre-mRNA is analogous in many ways to formation of the transcription-initiation complex at the AT-rich TATA box of a template DNA molecule (see Figure 10-50). In both cases, multiprotein complexes assemble cooperatively through a network of specific protein – nucleic acid and protein-protein interactions. Following cleavage at the poly(A) site, polyadenylation proceeds in
two phases. Addition of the first 12 or so A residues occurs slowly, followed
by rapid addition of up to 200 – 250 more A residues. The rapid phase
requires the binding of multiple copies of a poly(A)-binding protein containing
the RNP motif. This protein is designated PABII to distinguish it
from the poly(A)-binding protein that binds to the poly(A) tail of cytoplasmic
mRNAs. PABII binds to the short A tail initially added by PAP, stimulating
polymerization of additional A residues by PAP (see Figure
11-12). PABII is also responsible for signaling poly(A) polymerase
to terminate polymerization when the poly(A) tail reaches a length of 200
– 250 residues, although the mechanism for measuring this length is not
yet understood.
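The two sequence signals described above can be sketched as a small check on a candidate cleavage site. This is a toy illustration only, not a model of CPSF or CStF binding: the function names, the way distances are counted, and the threshold used for "GU-rich" (at least 9 of 10 consecutive bases being G or U) are all assumptions made for the example. Only the hexamers AAUAAA and AUUAAA, the 10 – 35 nucleotide upstream range, and the ≈50-nucleotide downstream range come from the text.

```python
# Toy check for the two poly(A) signals described in the text.
# Distance conventions and the "GU-rich" threshold are assumptions of this sketch.

UPSTREAM_HEXAMERS = ("AAUAAA", "AUUAAA")  # canonical signal and the one tolerated variant


def upstream_signal_positions(rna, cleavage_site, min_gap=10, max_gap=35):
    """Return start indices of hexamers lying min_gap..max_gap nt upstream.

    cleavage_site is the index of the first nucleotide downstream of the cut;
    the gap is counted from the end of the hexamer to that index (assumed).
    """
    hits = []
    for start in range(0, cleavage_site - 5):
        gap = cleavage_site - (start + 6)
        if rna[start:start + 6] in UPSTREAM_HEXAMERS and min_gap <= gap <= max_gap:
            hits.append(start)
    return hits


def has_downstream_element(rna, cleavage_site, reach=50, window=10, min_gu=9):
    """True if any window within `reach` nt downstream is mostly G/U.

    window and min_gu are arbitrary illustrative values, not measured ones.
    """
    region = rna[cleavage_site:cleavage_site + reach]
    for i in range(len(region) - window + 1):
        chunk = region[i:i + window]
        if sum(base in "GU" for base in chunk) >= min_gu:
            return True
    return False


def looks_like_poly_a_site(rna, cleavage_site):
    """Require both the upstream hexamer and the downstream GU/U-rich element."""
    return bool(upstream_signal_positions(rna, cleavage_site)) and \
        has_downstream_element(rna, cleavage_site)


# Made-up test sequence: hexamer 20 nt upstream of the cut, GU-rich run just downstream.
upstream = "GCC" * 5 + "AAUAAA" + "C" * 20
downstream = "CCC" + "UGUGUUUGUU" + "C" * 20
pre_mrna = upstream + downstream
site = len(upstream)

print(looks_like_poly_a_site(pre_mrna, site))                              # True
print(looks_like_poly_a_site(pre_mrna.replace("AAUAAA", "AAGAAA"), site))  # False
```

Mutating the hexamer to anything other than AUUAAA makes the check fail, mirroring the transfection result described above; real poly(A) site prediction would need far more than these two string tests.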
Portions of Two Different RNAs Are Trans-Spliced in Some Organisms
Virtually all functional mRNAs in vertebrate and insect cells are derived from a single molecule of the corresponding pre-mRNA by removal of internal introns and splicing of exons. However, in two types of protozoa (trypanosomes and euglenoids), mRNAs are constructed by splicing together separate RNA molecules. This process, referred to as trans-splicing, is also used in the synthesis of 10 – 15 percent of the mRNAs in the roundworm Caenorhabditis elegans, an important model organism for studying embryonic development.
The parasitic trypanosomes produce abundant amounts of a single 140-nucleotide
leader RNA from tandemly repeated transcription units. In a two-step reaction
analogous to spliceosomal pre-mRNA splicing, a 39-nucleotide portion of
the leader RNA, termed a mini-exon, is spliced to the 5′ end of
protein-coding exons in primary transcripts, which lack internal introns.
The 5′ mini-exon, present in all trypanosome mRNAs, is thought to assist
in initiation of translation. Because of trans-splicing, polycistronic
protein-coding transcription units in trypanosomes, which are common,
yield monocistronic mRNAs from their polycistronic primary transcripts.
Splicing of a 5′ mini-exon to a coding region in a primary transcript
triggers cleavage and polyadenylation at the 3′ end of the exon. Consequently,
trypanosomes use trans-splicing and linked cleavage and polyadenylation
to combine the operon organization of polycistronic transcription units
characteristic of bacteria with the monocistronic organization of mRNAs
characteristic of eukaryotes.
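The bookkeeping of this process, in which one polycistronic primary transcript yields several monocistronic mRNAs that all carry the same 5′ mini-exon, can be sketched in a few lines. This is a deliberately simplified illustration: the function name `trans_splice` is hypothetical, the coding regions are supplied already separated rather than located by splice-site recognition, the mini-exon is assumed to be the first 39 nucleotides of the leader RNA, and the poly(A) tail length is an arbitrary placeholder.

```python
# Simplified bookkeeping for trypanosome trans-splicing.
# Assumptions (not from the text): the mini-exon is taken as the 5'-most
# 39 nt of the leader RNA, coding regions are given pre-separated, and
# the poly(A) length is an arbitrary placeholder.

MINI_EXON_LENGTH = 39  # length stated in the text


def trans_splice(leader_rna, coding_regions, poly_a_length=20):
    """Return one monocistronic mRNA per coding region.

    Each product is mini-exon + coding region + poly(A) tail, reflecting the
    linkage of 5' trans-splicing to 3' cleavage and polyadenylation.
    """
    mini_exon = leader_rna[:MINI_EXON_LENGTH]
    return [mini_exon + region + "A" * poly_a_length for region in coding_regions]


# Made-up sequences: a 140-nt leader and a primary transcript with two coding regions.
leader = "G" * 39 + "U" * 101
regions = ["AUGAAAUAA", "AUGCCCUGA"]

for mrna in trans_splice(leader, regions, poly_a_length=5):
    print(mrna)
```

Every product begins with the same 39-nucleotide mini-exon, which is the feature that lets a polycistronic transcription unit give rise to individually translatable mRNAs.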
Under certain nonphysiological in vitro conditions, pure preparations of some RNA transcripts slowly splice out introns in the absence of any protein. This observation led to recognition that some introns are self-splicing. Two types of self-splicing introns have been discovered: group I introns, present in nuclear rRNA genes of protozoans, and group II introns, present in protein-coding genes and some rRNA and tRNA genes of mitochondria and chloroplasts in plants and fungi. Discovery of the catalytic activity of self-splicing introns revolutionized concepts about the functions of RNA. As discussed in Chapter 4, RNA is now thought to catalyze peptide-bond formation during protein synthesis in ribosomes. Here we discuss the probable role of group II introns, now found only in mitochondrial and chloroplast DNA, in the evolution of snRNAs; the functioning of group I introns is considered in the later section on rRNA processing. Even though their precise sequences are not highly conserved, all group II introns fold into a conserved, complex secondary structure containing numerous stem-loops (Figure 11-20a). Self-splicing by a group II intron occurs via two transesterification reactions, involving intermediates and products analogous to those found in nuclear pre-mRNA splicing. The mechanistic similarities between group II intron self-splicing and spliceosomal splicing led to the hypothesis that snRNAs function analogously to the stem-loops in the secondary structure of group II introns. According to this hypothesis, snRNAs interact with 5′ and 3′ splice sites of pre-mRNAs and with each other to produce an RNA structure functionally analogous to that of group II self-splicing introns (Figure 11-20b). An extension of this hypothesis is that introns in present-day nuclear pre-mRNAs evolved from ancient group II self-splicing introns through the progressive loss of internal RNA structures, which concurrently evolved into trans-acting snRNAs that perform the same functions. 
In support of this kind of evolutionary model, group II intron mutants have been constructed in which domain V and part of domain I are deleted. Such mutants are defective in self-splicing, but when RNA molecules equivalent to the deleted regions are added to the in vitro reaction, self-splicing occurs. This finding demonstrates that these domains in group II introns can be trans-acting, like snRNAs. The similarity in the mechanisms of group II intron self-splicing and spliceosomal splicing of pre-mRNAs also suggests that the splicing reaction is catalyzed by the snRNA, not the protein, components of spliceosomes. Although group II introns can self-splice in vitro at elevated temperatures and Mg²⁺ concentrations, under in vivo conditions proteins called maturases, which bind to group II intron RNA, are required for rapid splicing. Maturases, encoded by group II introns themselves, are thought to stabilize the precise three-dimensional interactions of the intron RNA required to catalyze the two splicing transesterification reactions. By analogy, snRNP proteins in spliceosomes are thought to stabilize the precise geometry of snRNAs and intron nucleotides required to catalyze pre-mRNA splicing. The evolution of snRNAs may have been an important step in the rapid evolution of higher eukaryotes. As internal intron sequences were lost and their functions in RNA splicing supplanted by trans-acting snRNAs, the remaining intron sequences would be free to diverge. This in turn likely facilitated the evolution of new genes through exon shuffling (Section 9.3). It also permitted the increase in protein diversity that results from alternative RNA splicing and an additional level of gene control resulting from regulated RNA splicing. One more remarkable property of group II introns deserves mention, namely,
their ability to behave as mobile DNA elements in the genome. The maturases
that increase the rate of self-splicing of these introns also contain a
domain that is homologous to reverse transcriptase. Thus group II introns
can move in the genome like other nonviral retrotransposons discussed in
Chapter
9. As is generally true for mobile DNA elements, transposition of group
II introns is rare. However, when a group II intron does transpose, it
does not inactivate the gene into which it inserts, because the inserted
intron is spliced out of the transcript produced from the target gene by
self-splicing!
The digital imaging micrographs in Figure 11-21 demonstrate that most of the nuclear polyadenylated RNA (including unspliced and partially spliced pre-mRNA and nuclear mRNA) occurs in discrete foci lying between dense regions of chromatin and that a required protein splicing factor (SC-35) is localized to the center of these same foci. The results of these and other studies suggest that transcription and RNA processing do not occur randomly throughout the eukaryotic nucleus; rather, the nucleus is organized into discrete domains (≈20 – 100 in human fibroblasts) where the bulk of transcription and RNA processing occurs. This highly organized view of the nucleus implies that there is an underlying
nuclear substructure. It has been known for many years that when mammalian
cells are treated with a mild nonionic detergent, DNase I, and high concentrations
of salt, a fibrillar network of protein and RNA remains in the region of
the nucleus (Figure
11-22). This protein network has been called the nuclear matrix,
or nuclear skeleton. It is composed of actin and numerous other
protein components that have not been fully characterized, including components
of the chromosomal scaffold that rearranges and condenses to form metaphase
chromosomes during mitosis (see Figure
9-34). However, snRNPs remain associated with the nuclear matrix prepared
from detergent-extracted, DNase I – treated cells. Moreover, when the
nuclear matrix is prepared with a low concentration of salt, pre-mRNAs
associated with the matrix undergo splicing when ATP is added. These results
suggest that the RNA-processing foci observed microscopically may be associated
with specific regions of the nuclear matrix.
• Eukaryotic mRNA precursors are processed by 5′ capping, 3′ cleavage and polyadenylation, and RNA splicing to remove introns before being transported to the cytoplasm where they are translated by ribosomes.
• The cap is added to the 5′ end of a pre-mRNA nascent transcript by a capping enzyme that associates with the phosphorylated CTD of RNA polymerase II shortly after transcription initiation.
• Nascent pre-mRNA transcripts are associated with a class of abundant RNA-binding proteins called hnRNP proteins.
• In most protein-coding genes, a conserved polyadenylation signal (AAUAAA) lies 10 – 30 nucleotides upstream from a poly(A) site where cleavage and polyadenylation occur. A GU- or U-rich sequence downstream from the poly(A) site contributes to the efficiency of cleavage/polyadenylation.
• A multiprotein complex that includes poly(A) polymerase (PAP) carries out the cleavage and polyadenylation of a pre-mRNA. A nuclear poly(A)-binding protein, PABII, stimulates addition of A residues by PAP and stops addition once the poly(A) tail reaches 200 – 250 residues (see Figure 11-12).
• RNA splicing is carried out by a very large ribonucleoprotein complex, the spliceosome, that is assembled by interactions of five different snRNP particles with each other and with pre-mRNA (see Figure 11-19). The spliceosome catalyzes two transesterification reactions that join the exons and remove the intron as a lariat structure, which is subsequently degraded (see Figure 11-16).
• Group II self-splicing introns, which are found in chloroplast genes and mitochondrial genes of plants and fungi, exhibit a largely conserved secondary structure, which is necessary for self-splicing. The snRNAs in the spliceosome are thought to have an overall secondary structure similar to that of group II introns.
• Most transcription and RNA processing in a mammalian cell nucleus takes place in a limited number of domains. A nuclear matrix or scaffold is formed by a fibrous protein network throughout the nucleus. This nuclear matrix may help to organize the foci of RNA transcription and processing.
© 2000 by W. H. Freeman and Company. All rights reserved.