Eukaryotic Species Complexity and Control of Gene Transcription
The increasing complexity of eukaryotic organisms was once thought to arise from an increasing number of genes. This simplistic assumption has not been borne out by the results of sequencing and annotating the genomes of many eukaryotic organisms. Compare these statistics: the numbers of putative genes in the simple nematode roundworm C. elegans, the fruit fly Drosophila, and the human are approximately 20,000, 14,000, and 30,000, respectively. There seems to be little correlation between species complexity and number of genes. Other possible mechanisms for increasing complexity from a given genome size include producing different proteins from the same genes through differential splicing of RNA transcripts, and rearranging DNA, as occurs in immune cells to produce the huge repertoire of antibody molecules necessary for recognition of nonself molecules (such as those of viruses and bacteria). These mechanisms alone cannot account for the incredible complexity of the human species. Levine and Tjian have proposed two other mechanisms that could account for increasing complexity. In both, complexity would arise from the number of gene expression patterns and would involve the nonprotein-coding regions of the genome, which in humans account for up to 98% of the genome. One mechanism requires the presence of greater numbers and complexity of DNA regulatory sequences (enhancers, silencers, promoters) in more complex organisms. Since these sequences lie on the same DNA molecule as the gene being transcribed, they are called cis-regulatory sequences. The second mechanism involves an increase in the elaboration and complexity of proteins (trans-regulatory elements) that regulate gene expression in more complex organisms. These proteins could include transcription factors, proteins interacting with enhancer sequences, and proteins involved in chromatin remodeling (described above).
They estimate that up to a third of the human genome (1 billion base pairs) might be involved in the regulation of gene transcription. In addition, 5-10% of all proteins expressed from genes appear to regulate gene transcription. There appear to be about 300 transcription factors in yeast, about 1000 each in Drosophila and C. elegans, and about 3000 in humans. That corresponds to about one transcription factor for every twenty genes in yeast, but one for every ten in humans.
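The genes-per-transcription-factor ratios can be checked with simple division. The following is a minimal sketch using the approximate gene and transcription factor counts quoted in this section. The yeast gene count of about 6,000 is an assumption, since it is not stated in the text, and the function name is chosen only for illustration.

```python
# Approximate counts. The yeast gene count (~6,000) is an assumption that is
# not stated in the text; the other figures are the ones quoted in this section.
GENES = {
    "yeast": 6_000,
    "Drosophila": 14_000,
    "C. elegans": 20_000,
    "human": 30_000,
}
TRANSCRIPTION_FACTORS = {
    "yeast": 300,
    "Drosophila": 1_000,
    "C. elegans": 1_000,
    "human": 3_000,
}


def genes_per_factor(species: str) -> float:
    """Return the average number of genes per transcription factor."""
    return GENES[species] / TRANSCRIPTION_FACTORS[species]


if __name__ == "__main__":
    for species in GENES:
        ratio = genes_per_factor(species)
        print(f"{species}: about one transcription factor per {ratio:.0f} genes")
```

Under these rounded figures, the more complex organism devotes a larger share of its genes to transcriptional control, which is the trend the second mechanism predicts.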
In simple eukaryotes, cis-regulatory elements would include the promoter (TATA box region), plus upstream regulatory sequences (enhancers) and silencers about 100-200 base pairs from the promoter. In more complex eukaryotic species like humans, the promoter is more complex, containing the TATA box, initiator sequences (INR), and downstream promoter elements (DPE). Upstream cis-regulatory elements (as far as 10 kb from the promoter) include multiple enhancers, silencers, and insulators. Most promoters have TATA boxes, where TATA Binding Protein (TBP) binds. Upstream elements in turn regulate the binding of TBP.
- Eukaryotic promoters and regulatory regions
- Eukaryotic multisubunit general transcription apparatus
- Biological Regulation: BioBase - Gene Regulation (TRANS-FAC 6.0 public site free with registration)
Comparative Genomics - Gene Expression Differences Between Humans and Chimpanzees
Our closest biological relative is the chimpanzee, whose lineage branched off from our common ancestor about six million years ago. Our DNA sequences appear to be 98.6% identical (not just homologous). If we are so close in our genetic blueprint, how can we be so different? There are many possible conjectures that can be tested by comparing the chimp and human genomes. Our genes are presumably very similar. Two major kinds of differences are suspected of making our species different:
- our genes are very similar but are transcribed differently in the two species. Recent evidence shows that the types of RNA transcribed by human and chimp livers are very similar, but many more genes are transcribed in human brains than in chimp brains.
- humans may have lost genes (or their function) that are required for chimp survival in the jungle. The observations that chimps are resistant to many of the disease pathogens that affect humans (immunodeficiency viruses like HIV, influenza A virus, hepatitis B/C, malarial parasite) could be explained by the loss of "protective" genes in humans. In addition, cardiovascular disease and certain types of cancer are rarer in chimps. Humans have apparently "lost" genes involved in body hair, strength, and early maturation, traits that would adapt the chimp to life in the jungle.
We previously discussed an example of a loss of gene function in humans. We have lost a hydroxylase gene involved in the formation of certain types of sialic acids, specifically N-glycolylneuraminic acid, found on cell surface glycoproteins of mammals other than humans. Chimps have a lectin receptor for this sialic acid. Recent work has shown that the human version of the lectin lacks a critical Arg needed to recognize N-glycolylneuraminic acid, making it unable to bind this ligand. Hence both members of the gene pair involved in this type of cell:cell interaction have lost their function. Since sialic acid molecules are often involved in pathogen:host binding, these differences between humans and chimps might account for the difference in disease susceptibility mentioned above.
With respect to gene transcription in the brain, Lai et al. have found a mutation in the human gene FOXP2, a transcription factor, in a family that has significant difficulty in controlling muscles required for articulation of words. This mutation also causes problems in language processing and grammar construction. Comparison of the normal human gene with other primate genes shows distinct differences in the human gene which may have conferred on humans the ability to use speech.
Chimpanzee chromosome 22 is homologous to human chromosome 21. Sequencing found 1.44% single nucleotide differences between the two, a finding in line with the overall identity between chimp and human DNA of 98.6%. The surprising finding was the 68,000 insertions and deletions (indels) relative to the human sequence. Most were short (<30 nucleotides). Those longer than 300 nucleotides involved mobile genetic elements (transposons). Humans have a much higher incidence of insertions called Alu repeats. A surprisingly high 20% of homologous genes displayed significantly different expression levels.
In September 2005, a draft sequence of the chimpanzee genome and a comparison with the human genome was published by The Chimpanzee Sequencing and Analysis Consortium. Here are some of their findings:
- "single nucleotide substitutions occur at a mean rate of 1.23% between copies of the human and chimpanzee genome."
- "insertion and deletion (indel) events are fewer in number than single-nucleotide substitutions, but result in 1.5% of the euchromatic sequence in each species being lineage-specific."
- "There are notable differences in the rate of transposable element insertions: short interspersed elements (SINEs) have been threefold more active in humans, whereas chimpanzees have acquired two new families of retroviral elements."
- "Orthologous proteins in human and chimpanzee are extremely similar, with ~29% being identical and the typical orthologue differing by only two amino acids."
Since their genomes are over 3 billion base pairs, a 2% difference would mean around 60 million differences. The actual number appears to be 35 million single nucleotide differences (not counting insertions and deletions). Most of these would be expected to lie outside genes and to have little overall effect on phenotypic differences between the species. Finding the critical differences will be time consuming, and may require the sequencing of other primate genomes.
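The arithmetic behind these estimates is easy to reproduce. The sketch below assumes a genome size rounded to exactly 3 billion base pairs (the text says only "over 3 billion"), and the function names are chosen for illustration.

```python
# Genome size rounded to 3 billion bp for illustration; the text says "over 3 billion".
GENOME_SIZE_BP = 3_000_000_000


def expected_differences(fraction_different: float,
                         genome_size: int = GENOME_SIZE_BP) -> float:
    """Number of differing positions implied by a fractional difference."""
    return fraction_different * genome_size


def observed_rate(n_differences: int,
                  genome_size: int = GENOME_SIZE_BP) -> float:
    """Fraction of positions that differ, given a count of differences."""
    return n_differences / genome_size


if __name__ == "__main__":
    # A 2% difference over the whole genome.
    print(f"{expected_differences(0.02):,.0f} expected differences")
    # The roughly 35 million single nucleotide differences actually reported.
    print(f"{observed_rate(35_000_000):.2%} observed substitution rate")
```

With these rounded inputs, 35 million differences correspond to a rate of a little under 1.2%, which is in line with the 1.23% mean substitution rate quoted from the consortium.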
In their summary of the findings, Li and Saunders discuss changes in nucleotides that are synonymous (no change in the amino acids of the protein) and nonsynonymous. If a region of a gene cannot tolerate changes that lead to amino acid alterations (i.e., the nucleotides are under significant selective pressure not to change), the nonsynonymous rate of substitution would be lower than the rate of synonymous change. If change can occur without loss of structure or function in the protein, the two rates would be similar. Comparing over 13,000 gene pairs from both organisms, they found the nonsynonymous rate to be about 25% of the synonymous rate. Hence most of the genes are conserved between the species and would not be expected to contribute to the phenotypic differences between the organisms. Of the genes that showed higher nonsynonymous rates, none were obviously linked to brain function, but many were involved in immune function.
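The comparison of the two rates is commonly summarized as a single ratio of nonsynonymous to synonymous substitution rates. The sketch below illustrates that reasoning only. It is not the method used in the study, the function names are hypothetical, and the 0.1 tolerance around a ratio of 1 is an arbitrary assumption chosen for the example.

```python
def rate_ratio(nonsynonymous_rate: float, synonymous_rate: float) -> float:
    """Ratio of the nonsynonymous to the synonymous substitution rate."""
    if synonymous_rate <= 0:
        raise ValueError("synonymous rate must be positive")
    return nonsynonymous_rate / synonymous_rate


def interpret(ratio: float, tolerance: float = 0.1) -> str:
    """Rough reading of the ratio; the tolerance is an arbitrary assumption."""
    if ratio < 1 - tolerance:
        # Amino acid changes are removed by selection: the protein is conserved.
        return "conserved (selection against amino acid changes)"
    if ratio > 1 + tolerance:
        # Amino acid changes accumulate faster than silent ones.
        return "rapidly changing (possible positive selection)"
    return "similar rates (amino acid changes tolerated)"


if __name__ == "__main__":
    # Genome-wide figure from the text: nonsynonymous rate about 25% of synonymous.
    ratio = rate_ratio(0.25, 1.0)
    print(ratio, interpret(ratio))
```

A genome-wide ratio of about 0.25 falls in the conserved category, which is why most gene pairs are not expected to explain the phenotypic differences.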
The biggest differences between the genomes were insertions/deletions (indels, numbering around 5 million) and gene duplications, not single nucleotide mutations. Insertions fall into two classes: duplications of DNA stretches and additions of transposons ("jumping" genes, or movable DNA elements). Transposon insertions can be short (such as Alu repeats) or long (such as L1 insertions). There are about 7000 Alu insertions specific to the human lineage, compared with about 2300 specific to chimps. Both have about equivalent numbers of L1 insertions. Given that we have lost some traits (such as hair and strength), perhaps some genes that remain intact in chimps were disrupted in the human genome by indels. The authors found 53 such human genes. Perhaps the biggest change between chimps and humans is altered gene expression, which was not studied in this paper.
In another study, Xiaoxia Wang et al. compared "pseudogenes" in humans (genes that acquired mutations in the past that disrupted their expression as functional proteins) with the corresponding genes in chimps that still maintain function (i.e., they lead to functional proteins). Analysis showed that the identified pseudogenes were not randomly distributed among different classes of genes. Rather, they were concentrated in genes encoding olfactory receptor proteins, bitter tastant receptors, and immune system proteins. Homo sapiens has a much diminished sense of smell. Bitter receptors probably became less important as humans shifted their diet from plants, which contain many bitter toxins, toward meat. The authors attribute the changes in immune system genes to changes in environment: genes might be lost if a different environment favors a different intensity of the immune response or a different balance between immune recognition of self and nonself.
Another major difference has been noted in gene copy number. Work by Hahn et al. shows that gene copy number differs between humans and chimps by 6.4%. After diverging from a common ancestor, humans gained 689 copies of some genes, compared to 26 for chimps. Likewise, humans lost 86 copies of some genes, compared to a loss in chimps of 729 copies.
What Maintains Species? Barriers to Interspecies Hybrids
New species seem to arise, according to evolutionary theory, when members of a species become geographically isolated. Each separated population accrues different mutations in its genome, which confer adaptive advantages in that population's environment. With a long enough divergence time, genetic barriers to the production of viable hybrids between the populations develop, leading to the divergence of the populations into separate species. This reasonable explanation doesn't give a specific molecular mechanism for hybrid failure. In the 1930s, Dobzhansky and Muller proposed that changes in two genes that produce interacting proteins could account for interspecies hybrid failure. These genes would presumably mutate at a faster rate than usual. Within a species, the two genes would co-mutate at similar rates to produce proteins that still interact, but fast evolutionary change in the corresponding gene pair of the other, "soon to be new" species would make hybrids produced from matings between the two infertile at best, or inviable at worst.
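The logic of the Dobzhansky and Muller proposal can be illustrated with a toy two-locus model. This is a minimal sketch under stated assumptions, not a population genetics simulation: the allele labels and function names are hypothetical, each population is assumed to be homozygous, and a single copy of each derived allele is assumed to be enough to cause the incompatibility.

```python
from itertools import product

# Hypothetical allele labels for a two-locus toy model.
ANCESTRAL_1, DERIVED_1 = "a", "A"  # locus 1: derived allele arises in population 1
ANCESTRAL_2, DERIVED_2 = "b", "B"  # locus 2: derived allele arises in population 2


def alleles_clash(allele_1: str, allele_2: str) -> bool:
    """Assumed rule: the two derived alleles never coexisted, and their products fail to cooperate."""
    return allele_1 == DERIVED_1 and allele_2 == DERIVED_2


def is_viable(genotype) -> bool:
    """A genotype is ((locus 1 alleles), (locus 2 alleles)); any clashing pair is lethal."""
    locus_1, locus_2 = genotype
    return not any(alleles_clash(x, y) for x, y in product(locus_1, locus_2))


def f1_hybrid(parent_1, parent_2):
    """F1 of two homozygous parents: one allele per locus from each parent."""
    return ((parent_1[0][0], parent_2[0][0]), (parent_1[1][0], parent_2[1][0]))


ANCESTOR = ((ANCESTRAL_1, ANCESTRAL_1), (ANCESTRAL_2, ANCESTRAL_2))
POPULATION_1 = ((DERIVED_1, DERIVED_1), (ANCESTRAL_2, ANCESTRAL_2))
POPULATION_2 = ((ANCESTRAL_1, ANCESTRAL_1), (DERIVED_2, DERIVED_2))

if __name__ == "__main__":
    for name, genotype in [("ancestor", ANCESTOR),
                           ("population 1", POPULATION_1),
                           ("population 2", POPULATION_2),
                           ("F1 hybrid", f1_hybrid(POPULATION_1, POPULATION_2))]:
        print(name, genotype, "viable" if is_viable(genotype) else "inviable")
```

Each population stays viable at every step because its new allele is only ever paired with a partner it has been tested against. Only the hybrid brings the two derived alleles together, so the barrier can arise without either population passing through an unfit state.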
Brideau et al. have found such a gene pair in Drosophila: lethal hybrid rescue (Lhr), which has diverged functionally in Drosophila simulans, and hybrid male rescue (Hmr), which has diverged functionally in Drosophila melanogaster. F1 hybrid male offspring from crosses between the two species die. The Hmr gene in D. melanogaster encodes a transcription factor and is one of the most rapidly evolving genes in the genome. The exact function of the Lhr gene product is uncertain, but it is associated with condensed chromatin (heterochromatin).