Having introduced you to the genetic code and mRNA, however briefly, we now return to the process by which a polypeptide is specified by a DNA sequence. Our first task is to understand how we can find the specific region of the DNA molecule that encodes a specific polypeptide; we are looking for a (relatively) short region of DNA within millions (in prokaryotes) or billions (in eukaryotes) of base pairs of DNA. In addition, while the double-stranded nature of DNA makes the information stored in it redundant (a fact that makes DNA replication straightforward), the specific nucleotide sequence that will be decoded using the genetic code is present in only one of the two strands. From the point of view of polypeptide sequence, the other strand is nonsense.
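To make the point about the two strands concrete, here is a minimal sketch in Python. It assumes a made-up 12-nucleotide sequence (`coding_strand` is hypothetical, not taken from any real gene) and reads each strand 5' to 3' directly as DNA, with T standing in for the U found in mRNA. The helper names `reverse_complement` and `translate` are illustrative choices.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand):
    """Return the partner strand, written 5' to 3'."""
    return strand.translate(COMPLEMENT)[::-1]


def translate(strand):
    """Read codons from the first nucleotide until a stop codon ('*')."""
    peptide = []
    for i in range(0, len(strand) - 2, 3):
        amino_acid = CODON_TABLE[strand[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical toy sequence (assumption): start codon, two codons, stop codon.
coding_strand = "ATGGCCAAATGA"
other_strand = reverse_complement(coding_strand)

print(translate(coding_strand))  # MAK
print(translate(other_strand))   # SFGH
```

The two strands carry the same information in the sense that each determines the other, yet decoding them gives unrelated amino acid sequences. Only one of them corresponds to the polypeptide the gene specifies.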
As we have noted, a gene is a region (or set of regions) of a larger DNA molecule. Part of the gene’s sequence is known as its regulatory region; this region of DNA is used (as part of a larger system involving the products of other genes) to specify when, where, and how much the gene is “expressed”. So what is expressed? It is the part of the gene’s sequence that is used to direct the synthesis of an RNA molecule; this part is known as the transcribed region, and the RNA made from it is the transcript. Within the transcript is the region that actually encodes the polypeptide through the process of translation; this is known as the coding region. The regions of the RNA that are not translated are known as untranslated regions (UTRs). Typically the coding region of an RNA molecule is located between a 5’ UTR and a 3’ UTR.
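The parts of a gene named above can be laid out as a small data structure. This is only a sketch: the `Gene` class, its field names, and the short sequences in `toy` are all hypothetical, and placing a single regulatory region next to the transcribed region is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    regulatory: str  # regulatory region (part of the gene, generally not transcribed)
    utr5: str        # DNA corresponding to the 5' untranslated region
    coding: str      # DNA corresponding to the coding region
    utr3: str        # DNA corresponding to the 3' untranslated region

    @property
    def transcribed_region(self):
        """The stretch of DNA that directs RNA synthesis."""
        return self.utr5 + self.coding + self.utr3

    def transcript(self):
        """RNA version of the transcribed region (U in place of T)."""
        return self.transcribed_region.replace("T", "U")

    def coding_region_rna(self):
        """The part of the transcript that is translated."""
        return self.coding.replace("T", "U")


# Hypothetical toy gene (assumption): sequences chosen only for illustration.
toy = Gene(regulatory="TTGACA", utr5="GGAGG", coding="ATGGCCAAATGA", utr3="CCTT")

print(toy.transcript())
print(toy.coding_region_rna())
```

In this layout the regulatory sequence belongs to the gene but does not appear in the transcript, and the coding region sits between the two UTRs.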
Once a gene’s regulatory region is identified (by the binding of a specific type of protein - see below), a DNA-dependent RNA polymerase binds to the protein-DNA complex and the synthesis of an mRNA molecule begins. As a general simplification, we will say that a gene is expressed when the RNA that its transcribed region encodes is synthesized (note: while regulatory regions are generally not transcribed, they are still part of the gene). We can postpone further complexities until later on (and to subsequent classes). It is important to recognize that an organism as “simple” as a bacterium contains thousands of genes, and that different sets of genes are used in different environments and situations, and in different combinations, to produce specific behaviors. In some cases, these behaviors may be mutually antagonistic. For example, a bacterium facing a rapidly drying out environment might turn on specific genes involved in rapid growth and division in order to prepare itself (through the expression of other genes that are then turned on) to survive in a more hostile environment. Our goal is not to have you make accurate predictions about the behavior of an organism in a particular situation, but rather to enable you to make plausible predictions about how gene expression will change in response to various perturbations. This requires us to consider, although at a rather elementary level, a few of the regulatory processes that are active in cells.
So we need to ask: what are the molecular components that can recognize a gene’s regulatory sequences? The answer is proteins. The proteins that do this are known generically as transcription factors. Their shared property is that they bind with high affinity to specific sequences of nucleotides within DNA molecules. The next question is: how is an RNA made based on a DNA sequence? The answer is a DNA-dependent RNA polymerase, which we will refer to simply as RNA polymerase. Often groups of genes share regulatory sequences recognized by specific transcription factors. As we will see, this makes it possible to regulate particular groups of genes in a coordinated manner. Now let us turn to how, exactly (although at low resolution), this is done, first in bacteria and then in eukaryotic cells.
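The idea that genes sharing a regulatory sequence can be controlled as a group can be sketched as a simple search. Everything in this example is an assumption made for illustration: the gene names, the regulatory sequences, and the binding site `BINDING_SITE` are invented, and requiring an exact match is a simplification, since real transcription factors typically tolerate some variation in their target sequences.

```python
# Hypothetical regulatory regions for four made-up genes (assumption).
regulatory_regions = {
    "geneA": "CCGTTGACATTA",
    "geneB": "AATTGACAGGCC",
    "geneC": "GGCCAATTCCGG",
    "geneD": "TTGACACCGGAA",
}

# Hypothetical sequence recognized by one transcription factor (assumption).
BINDING_SITE = "TTGACA"


def genes_with_site(regions, site):
    """Return the names of genes whose regulatory region contains the site."""
    return sorted(name for name, sequence in regions.items() if site in sequence)


co_regulated = genes_with_site(regulatory_regions, BINDING_SITE)
print(co_regulated)  # ['geneA', 'geneB', 'geneD']
```

A single transcription factor that recognizes this site could influence all three genes at once, while the fourth would be unaffected by it.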
At this point, we need to explicitly recognize common aspects of biological systems. They are highly regulated, adaptive, and homeostatic - that is, they can adjust their behavior to changes in their environment (both internal and external) to maintain the living state. These types of behaviors are based on various forms of feedback regulation. In the case of the bacterial gene expression system, there are genes that encode specific transcription factors. Which of these genes are expressed determines which transcription factor proteins are present and, in turn, which genes are actively expressed. Of course, the gene encoding a specific transcription factor is itself regulated. Transcription factors can act positively or negatively, which means that they can lead to the activation of transcription or to its inhibition. In addition, the activity of a particular transcription factor can be regulated (a topic we will return to later on in this chapter).
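One way to get a feel for feedback regulation is a toy numerical model. The sketch below is not a description of any particular bacterial system. It assumes a transcription factor at level x that is made at some rate and removed in proportion to its level, and it compares two cases: constant production, and production that falls as x rises (the factor inhibiting its own gene). The functional form `BETA / (1 + x / K)` and all parameter values are assumptions chosen for simplicity.

```python
def simulate(production, steps=5000, dt=0.01, decay=1.0):
    """Step dx/dt = production(x) - decay * x forward in time, starting from x = 0."""
    x = 0.0
    for _ in range(steps):
        x += dt * (production(x) - decay * x)
    return x


BETA = 10.0  # maximal production rate (arbitrary units, assumption)
K = 1.0      # level at which production is halved (arbitrary units, assumption)

# No feedback: the gene is made at a constant rate.
unregulated = simulate(lambda x: BETA)

# Negative feedback: the more factor present, the less its gene is expressed.
autorepressed = simulate(lambda x: BETA / (1.0 + x / K))

print(round(unregulated, 3))
print(round(autorepressed, 3))
```

With these assumed numbers, the self-inhibiting version settles at a much lower level than the unregulated one, which is the kind of self-limiting behavior that negative feedback produces.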
For a transcription factor to regulate a specific gene, either positively or negatively, it must be able to bind to specific sites on the DNA. Whether or not a gene is expressed (whether it is “on” or “off”) depends upon which transcription factors are expressed, are active, and can interact productively with the DNA-dependent RNA polymerase (RNA polymerase). Inactivation of a transcription factor can involve a number of mechanisms, including its destruction, modification, or interactions with other proteins, so that it can no longer interact productively with either its target DNA sequence or the RNA polymerase. Once a transcription factor is active, it can diffuse throughout the cell and (in prokaryotic cells, which have no barrier that controls interactions with the DNA) can bind to its target DNA sequences. Now an RNA polymerase can bind to the DNA-transcription factor complex, an interaction that leads to the activation of the RNA polymerase and the initiation of RNA synthesis, using one DNA strand to direct RNA synthesis. Once RNA polymerase has been activated, it will move away from the transcription factor-DNA complex. The DNA-bound transcription factor can then bind another polymerase, or the transcription factor can be released from the DNA (in response to molecular level collisions), after which it will diffuse away, interact with other regulatory factors, or rebind to other sites in the DNA. Clearly the number of copies of the transcription factor, of its interaction partners, and of its DNA binding sites will impact the behavior of the system.
As a reminder, RNA synthesis is a thermodynamically unfavorable reaction, so for it to occur it must be coupled to a thermodynamically favorable reaction, in particular nucleotide triphosphate hydrolysis (see previous chapter). The RNA polymerase moves along the DNA (or the DNA moves through the RNA polymerase, your choice) to generate an RNA molecule (the transcript). Other signals within the DNA lead to the termination of transcription and the release of the RNA polymerase. Once released, the RNA polymerase returns to its inactive state. It can act on another gene if it interacts with a transcription factor bound to that gene’s promoter. Since multiple types of transcription factor proteins are present within the cell and RNA polymerase can interact with all of them, which genes are expressed within a cell will depend upon the relative concentrations and activities of specific transcription factors and their regulatory proteins, together with the binding affinities of particular transcription factors for specific DNA sequences (compared to their low-affinity binding to DNA in general).
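The dependence on concentration and binding affinity can be illustrated with the simplest equilibrium binding model. The sketch assumes a single, independent binding site and a transcription factor concentration that is not noticeably depleted by binding. Under those assumptions, the fraction of time a site is occupied is [TF] / ([TF] + Kd), where Kd is the dissociation constant (a lower Kd means higher affinity). The concentration and the two Kd values below are invented for illustration.

```python
def fraction_bound(tf_concentration, kd):
    """Equilibrium occupancy of one site; both arguments in the same units."""
    return tf_concentration / (tf_concentration + kd)


TF_CONCENTRATION = 10.0  # hypothetical, arbitrary units
KD_SPECIFIC = 1.0        # hypothetical high-affinity target sequence
KD_NONSPECIFIC = 1000.0  # hypothetical low-affinity binding to DNA in general

specific = fraction_bound(TF_CONCENTRATION, KD_SPECIFIC)
nonspecific = fraction_bound(TF_CONCENTRATION, KD_NONSPECIFIC)

print(round(specific, 3))
print(round(nonspecific, 3))
```

With these assumed values, the specific site is occupied most of the time while any given non-specific site is rarely occupied. Raising or lowering the concentration of active transcription factor shifts both numbers, which is one way a cell can change which genes are expressed.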