Mutations and selection
Evolution at the genetic level begins with mutations generating genetic variants. If a variant affects fitness, it can be selected for or against, and it will increase or decrease in frequency in the population accordingly. If there is no selection on the variant, we say it is "neutral". In this case, it is subject to genetic drift and will increase or decrease in frequency just by chance. Most mutations in animals are in intergenic regions, but some will naturally occur in genes. We usually think of these genic mutations as the ones most likely to be seen by selection. If we are looking over long-ish periods of evolutionary time, we will likely not see many mutations that reduce fitness, since these should be weeded out by natural or sexual selection. Instead, we expect to see mutations that are:
- neutral (no effect on fitness)
- nearly neutral (very little effect on fitness)
- buffered (invisible due to epistatic effects) or
- positive (positive effect on fitness)
When we compare sequences of homologous genes (genes that come from the same common ancestral gene) we can run analyses to tell us whether the sequence is under "positive" or "purifying" selection. Genes under purifying selection have had mutations "purified" out. In this case, there is one "best" sequence and most species in a taxon will have very similar sequences of this gene. When mutations arise in the gene, the individuals carrying them tend to have lowered fitness and will not produce many offspring carrying those mutations. As a result, when we compare the sequences of distantly related species, we see fewer differences than we would expect to see by chance (that is, under neutral genetic drift).
Genes under positive selection show the opposite - more differences in genetic sequence in a taxon than we would expect to see by chance. A positively selected gene has had a history of mutations that gave individuals higher fitness. These mutations were different in different lineages, giving rise to species with different sets of positively selected mutations. In summary, if most mutations in a gene will make it less functional and result in a negative effect on fitness, we might expect to see purifying selection on that gene (i.e. most mutations will be weeded out by natural selection, Figure 1). But if there are specific mutations that function better in different ecosystems, we might expect to see positive selection on that gene (i.e. different mutations will prevail in different environments).
Figure 1: Positive vs. Purifying Selection. A mutation that affects the function of a gene (i.e. not neutral) can change the shape and function of the product (here the product is a protein, but it could also be an RNA) or the expression pattern of the product. If this change in function or expression increases organismal fitness, we expect the frequency of the allele to increase over time (Positive Selection). If the mutation decreases organismal fitness, we expect the frequency of the new allele to decrease over time (Purifying Selection).
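The allele-frequency behavior described above (a neutral variant drifting by chance, a beneficial allele rising, a harmful allele declining) can be sketched with a small simulation. This is a minimal illustration, not a method from this text: it assumes a haploid, Wright-Fisher-style population of fixed size in which the new allele has relative fitness 1 + s. The function names, the selection coefficient `s`, the population size, and the starting frequency are all hypothetical choices made for the example.

```python
import random


def expected_next_frequency(p, s):
    """Expected frequency of the new allele after one generation.

    Assumption: the new allele has relative fitness 1 + s and the old
    allele has relative fitness 1 (haploid model).
    """
    return p * (1 + s) / (1 + p * s)


def simulate_allele(p0, s, n_copies, generations, seed=0):
    """Track the allele frequency over time with selection plus drift.

    Each generation, selection shifts the expected frequency, then
    n_copies allele copies are sampled at random (this sampling is drift).
    """
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for _ in range(generations):
        p_after_selection = expected_next_frequency(p, s)
        copies = sum(1 for _ in range(n_copies) if rng.random() < p_after_selection)
        p = copies / n_copies
        trajectory.append(p)
    return trajectory


# Hypothetical parameters: start at 10% frequency, 1000 allele copies, 50 generations.
for label, s in [("neutral", 0.0), ("beneficial", 0.1), ("harmful", -0.1)]:
    final = simulate_allele(0.1, s, n_copies=1000, generations=50)[-1]
    print(f"{label:10s} s={s:+.1f}  final frequency={final:.3f}")
```

With s = 0 the frequency changes only through random sampling, which is what "neutral" means here. Changing the seed or shrinking `n_copies` shows how much stronger drift is in small populations.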
To find out whether a gene is under positive or purifying selection, we compare the gene sequence across organisms. If one species or lineage has many functional changes in a gene compared to its relatives, we say that the gene is under positive selection. If there are very few functional changes between distantly related species, we say that the gene is under purifying selection. Under purifying selection, new gene variants (alleles) tend to decrease fitness. This could be because the gene product already functions near its optimum or because the gene affects multiple processes (pleiotropy).
The "background rate" of mutation over evolutionary time can be estimated from the accumulation of neutral (or nearly neutral) mutations. These are mutations that should not change the function of the gene. The easiest place to look for them is in synonymous changes to the DNA sequence. Synonymous changes are mutations in protein-coding DNA that do not affect the protein sequence. They often occur at the third position of a codon. Nonsynonymous changes can affect function by changing the protein sequence. Because synonymous changes don't affect protein function, we can use them to estimate the rate of neutral drift.
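To see why synonymous changes cluster at the third position of a codon, we can tabulate every possible single-nucleotide change in the standard genetic code and ask how often the amino acid stays the same. The sketch below assumes the standard (nuclear) genetic code and, as a simplification, treats every possible change as equally likely. The function name is made up for this example.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_fraction_by_position():
    """For codon positions 1, 2, and 3, return the fraction of
    single-nucleotide changes that leave the amino acid unchanged."""
    synonymous = [0, 0, 0]
    total = [0, 0, 0]
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid == "*":
            continue  # only start from amino-acid-coding codons
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total[pos] += 1
                if CODON_TABLE[mutant] == amino_acid:
                    synonymous[pos] += 1
    return [s / t for s, t in zip(synonymous, total)]


fractions = synonymous_fraction_by_position()
for position, fraction in enumerate(fractions, start=1):
    print(f"codon position {position}: {fraction:.2f} of changes are synonymous")
```

A change that creates a stop codon is counted as nonsynonymous here, since it alters the protein.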
In the example shown, mouse, human, and snail sequences in a protein-coding region are compared. DNA sequence that is the same for all species is shown in red. In each pairwise comparison, the number of mutations that result in an amino acid change (NS for nonsynonymous) is compared to the number of mutations that do not result in an amino acid change (S for synonymous). When the number of S changes is higher than the number of NS changes, the gene is likely under purifying selection. That is, the number of functional changes is much lower than the number of neutral changes. When the number of S changes is lower than the number of NS changes, the gene is likely under positive selection.
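The pairwise S versus NS count described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the two sequences are already aligned, in frame, and free of gaps, and each differing codon is scored once as either S or NS. The sequences in the usage example are invented, not the mouse, human, or snail sequences from the figure. Dedicated tools go further and normalize the counts by the number of possible synonymous and nonsynonymous sites (the dN/dS ratio), which this sketch does not attempt.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_s_and_ns(seq_a, seq_b):
    """Count codons that differ synonymously (S) and nonsynonymously (NS)
    between two aligned, in-frame coding sequences of equal length."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, equal length, and in frame")
    s_count = 0
    ns_count = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3]
        codon_b = seq_b[i:i + 3]
        if codon_a == codon_b:
            continue  # identical codons carry no change
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            s_count += 1   # DNA differs, amino acid is the same
        else:
            ns_count += 1  # DNA differs, amino acid differs
    return s_count, ns_count


# Made-up example sequences: ATG GCT AAA vs. ATG GCC AGA
s, ns = count_s_and_ns("ATGGCTAAA", "ATGGCCAGA")
print(f"S = {s}, NS = {ns}")  # GCT->GCC keeps Ala (S); AAA->AGA is Lys->Arg (NS)
```

Scoring whole codons keeps the example short. A codon with two or more nucleotide differences is counted only once here, so the result is a rough tally rather than a rate estimate.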
Mutation location in a gene
In the section above, we categorized mutations by their effect on the phenotype. We can also categorize mutations by their position within a gene. There can be mutations in a coding region or mutations in a cis-regulatory region (there can also be mutations in a "junk" region but we don't need to consider those here).
Cis-regulatory mutations can change the expression pattern of a gene and cause loss of expression, lowered expression, increased expression, or ectopic expression (expression in a new place and/or at a new time). If the gene mutated is a regulatory gene itself, this can also have downstream effects on the expression of its targets. Overall, cis-regulatory mutations can change the "cellular fingerprint," which is the suite of genes expressed in a certain cell that gives that cell its identity and function. This can change the way a cell "behaves", both in terms of its own anatomy and physiology as well as the way it interacts with the cells around it, potentially even changing their fate via cell-cell signaling.
Mutations in coding regions can change the structure and function of a protein or RNA. These changes can be strictly structural in that they affect cellular anatomy, physiology, and/or cell signaling. For example, a mutation in the functional protein Myoglobin can change how well a muscle cell stores oxygen. On the other hand, a mutation in the Wnt pathway receptor Frizzled can change whether a cell is responsive to Wnt signaling. The Frizzled mutation isn't just structural since it might also affect target gene expression, turning Wnt-pathway targets on or off in a new pattern.
A third way to categorize mutations is by examining their physical effect on the genome - did the mutational event delete a large portion of the genome? Did it convert one nucleotide into another? The three main types of mutation we will consider are:
- Point mutations: converting one nucleotide pair into another
- Indels: Insertions and deletions that add or remove nucleotides from the genome. These can be caused by transposons, slippage during replication, or DNA repair enzymes.
- Duplications: This is a specific type of insertion mutation that inserts a piece of existing genetic sequence into a new locus without removing it from the original locus.
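The three mutation types listed above can be illustrated by treating a stretch of DNA as a string of letters. This is only a toy sketch: the function names and the example sequence are invented for illustration, positions are counted from zero, and real mutational events act on double-stranded DNA rather than on text.

```python
def point_mutation(seq, pos, new_base):
    """Convert the nucleotide at `pos` into `new_base`."""
    return seq[:pos] + new_base + seq[pos + 1:]


def insertion(seq, pos, fragment):
    """Insert `fragment` before position `pos`."""
    return seq[:pos] + fragment + seq[pos:]


def deletion(seq, pos, length):
    """Remove `length` nucleotides starting at position `pos`."""
    return seq[:pos] + seq[pos + length:]


def duplication(seq, start, end, target):
    """Copy seq[start:end] into position `target`, leaving the original in place."""
    return insertion(seq, target, seq[start:end])


original = "ATGGCTAAA"  # made-up sequence
print(point_mutation(original, 5, "C"))   # ATGGCCAAA
print(insertion(original, 3, "TT"))       # ATGTTGCTAAA
print(deletion(original, 3, 3))           # ATGAAA
print(duplication(original, 3, 6, 9))     # ATGGCTAAAGCT
```

Note that the duplication is built from the insertion function, which mirrors the description of duplications as a specific type of insertion.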
Of these three, duplications and insertions can add genomic complexity. Here I am defining complexity as the number of unique parts in a system. Genomic complexity increases when new genetic information is added. It seems straightforward to imagine a case where an increase in genomic complexity would result in an increase in organismal complexity. For example, if we add an extra gene for digesting amylose, we increase the amount of grain we can eat, diversifying our diet and making it more complex. However, we can also increase genomic complexity without increasing organismal complexity, and we can even increase organismal complexity without increasing genomic complexity.