23.2: DNA Transposable Elements
-
Define Transposable Elements (TEs) and Their Significance
- Explain what TEs are and describe their ability to move within genomes.
- Discuss the prevalence of TEs across different organisms (from bacteria to plants and animals) and how they can constitute large portions of a genome.
-
Historical and Scientific Context
- Summarize Barbara McClintock’s discovery of TEs and its initial reception.
- Recognize how later discoveries (e.g., insertion sequences in bacteria) validated and expanded upon McClintock’s work.
-
Classifications and Mechanisms of TEs
- Distinguish between the two major classes of TEs: Class I (retrotransposons, "copy-and-paste") and Class II (DNA transposons, "cut-and-paste").
- Identify and describe key orders within Class I (e.g., LTR retrotransposons, LINEs, SINEs, DIRS-like elements, Penelope-like elements) and subclass divisions within Class II (e.g., TIR elements, Helitrons, Mavericks).
-
Structural Features and Molecular Mechanisms
- Recognize characteristic structural elements such as long terminal repeats (LTRs), target site duplications (TSDs), and terminal inverted repeats (TIRs).
- Outline the general molecular mechanisms by which TEs mobilize and integrate into genomes (retrotranscription, excision, rolling-circle replication).
-
Impact on Genome Evolution and Function
- Analyze how TE insertions can disrupt gene function, modulate gene expression, or promote genomic rearrangements (e.g., deletions, duplications, inversions).
- Discuss the role of TEs in increasing genetic diversity and shaping genome architecture over evolutionary time.
-
Host–TE Interactions and Regulation
- Explain the selective pressures on both TEs and host genomes that lead to a balance between TE proliferation and host fitness.
- Describe the cellular defense mechanisms (e.g., DNA methylation, RNA interference, and sequence-specific repressors) that repress excessive TE activity.
-
Domestication and Exaptation of TEs
- Identify examples of TE “domestication” where TE-derived proteins (e.g., RAG proteins, CENP-B, SETMAR) have been repurposed for essential cellular functions.
- Evaluate how TE sequences contribute to the evolution of new regulatory elements and noncoding RNAs.
-
TEs as Sources of Mutation and Genetic Variation
- Summarize how active TE mobilization can lead to insertional mutagenesis in both the germline and somatic cells.
- Discuss the implications of TE-induced mutations in human diseases (e.g., genetic disorders, cancer) and natural population diversity.
-
Horizontal Transfer and Genome Dynamics
- Describe the concept of horizontal transposon transfer and its role in spreading TEs between different species.
- Assess how horizontal transfer contributes to rapid genomic evolution and the dissemination of TE families across taxa.
-
Interpreting TE-Related Data and Diagrams
- Develop skills in reading and interpreting TE classification schematics and genome distribution maps.
- Connect the structural and functional properties of TEs with their broader impact on genome organization and regulation.
These goals provide a framework for mastering the diverse roles that transposable elements play in genome dynamics, evolution, and cellular regulation, and will prepare you to critically evaluate both experimental data and current research in this exciting field.
Introduction
Eukaryotic genomes contain a substantial amount of repeated DNA, and some of these repeated sequences are capable of moving. Transposable elements (TEs) are DNA sequences that can move from one location to another in the genome. TEs have been identified in all organisms, prokaryotic and eukaryotic, and can occupy a high proportion of a species’ genome. For example, TEs make up approximately 10% of the genomes of several fish species, 12% of the C. elegans genome, 37% of the mouse genome, 45% of the human genome, and more than 80% of the genomes of some plants, such as maize. From bacteria to humans, TEs have accumulated over time and continue to shape genomes through their mobilization.
TEs were discovered by Barbara McClintock during experiments conducted in 1944 on maize. Because they appeared to influence phenotypic traits, she named them controlling elements. However, her discovery met with a less than enthusiastic reception from the genetics community. Her presentation at the 1951 Cold Spring Harbor Symposium was poorly understood and not well received. She had no better luck with her follow-up publications and, after several years of frustration, decided not to publish on the subject for the next two decades. Not for the first time in the history of science, an unappreciated discovery was revived by a later one. In this case, it was the discovery of insertion sequences (IS) in bacteria by the Szybalski group in the early 1970s. In the original paper, they wrote: “Genetic elements were found in higher organisms which appear to be readily transposed from one to another site in the genome. Such elements, identifiable by their controlling functions, were described by McClintock in maize. They might be somehow analogous to the presently studied IS insertions”. The importance of McClintock’s original work was eventually recognized by the genetics community with numerous awards, including 14 honorary doctoral degrees and a Nobel Prize in 1983 “for her discovery of mobile genetic elements”. Her picture is shown in Figure \(\PageIndex{1}\).

The mobilization of TEs is termed transposition or retrotransposition, depending on the nature of the intermediate used for mobilization. There are several ways in which the activity of TEs can positively and negatively impact a genome; for example, TE mobilization can promote gene inactivation, modulate gene expression or induce illegitimate recombination. Thus, TEs have played a significant role in genome evolution. For example, DNA transposons can inactivate or alter the expression of genes by insertion within introns, exons, or regulatory regions. In addition, TEs can participate in genome reorganization by mobilizing non-transposon DNA or by acting as substrates for recombination. This recombination occurs through homology between two sequences of a transposon located on the same or different chromosomes, which may be the origin of various types of chromosome alterations. Indeed, TEs can contribute to the loss of genomic DNA through internal deletions or other mechanisms.
The reduction in fitness suffered by the host due to transposition ultimately affects the transposon, since host survival is critical to the perpetuation of the transposon. Therefore, both host and transposable elements have developed strategies to minimize the deleterious impact of transposition and to reach equilibrium. For example, some transposons tend to insert in non-essential regions of the genome, such as heterochromatic regions, where insertions are likely to have a minimal deleterious impact. In addition, they might be active in the germ line or embryonic stage, where most deleterious mutations can be selected against during fertilization or development, allowing only non-deleterious or mildly deleterious insertions to pass to successive generations. New insertions may also land within an existing genomic insertion and generate an inactive transposon, and some transposons undergo self-regulation by overproduction inhibition. On the other hand, host organisms have developed various mechanisms of defense against high rates of transposon activity, including DNA methylation to reduce TE expression, several RNA interference-mediated mechanisms that act primarily in the germ line, and the inactivation of transposon activity by specific proteins.
In some cases, transposable elements have been “domesticated” by the host to perform a specific function in the cell. A well-known example is the RAG proteins, which carry out V(D)J recombination (the assembly of antigen receptor genes in developing lymphocytes) and exhibit a high similarity to DNA transposons, from which these proteins appear to be derived. Another example is the centromeric protein CENP-B, which seems to have originated from a pogo-like transposon. Similarly, the human mariner element Hsmar1 has been incorporated into the SETMAR gene, which fuses a histone H3 methylase (SET) domain with the Hsmar1 transposase domain. This gene is involved in the non-homologous end-joining pathway of DNA repair and has been shown to confer resistance to ionizing radiation. From a genome-wide perspective, it has been estimated that approximately 25% of human promoter regions and 4% of human exons contain sequences derived from TEs. Thus, we are likely underestimating the rate of domestication events in mammalian genomes.
The first TE classification system was proposed by Finnegan in 1989, distinguishing two classes of TEs characterized by their transposition intermediates: RNA (class I, or retrotransposons) or DNA (class II, or DNA transposons). The transposition mechanism of class I is commonly called “copy and paste,” and that of class II, “cut and paste.” In 2007, Wicker et al. proposed a hierarchical classification based on the structural characteristics and mode of replication of TEs, as shown in Figure \(\PageIndex{2}\).

Class I: Mobile Elements
As mentioned above, class I TEs transpose through an RNA intermediate. The RNA intermediate is transcribed from genomic DNA and then reverse-transcribed into DNA by a TE-encoded reverse transcriptase (RT), followed by reintegration into the genome. Each replication cycle produces one new copy, and as a result, class I elements are the major contributors to the repetitive fraction in large genomes. Retrotransposons are divided into five orders: LTR retrotransposons, DIRS-like elements, Penelope-like elements (PLEs), LINEs (long interspersed elements), and SINEs (short interspersed elements). This scheme is based on the mechanistic features, organization, and reverse transcriptase phylogeny of these retroelements. Occasionally, the reverse transcriptase encoded by an autonomous TE reverse-transcribes another RNA present in the cell, such as an mRNA, and produces a retrocopy of it, which in most cases results in a pseudogene.
The LTR retrotransposons are characterized by the presence of long terminal repeats (LTRs) ranging from several hundred to several thousand base pairs. Both exogenous retroviruses and LTR retrotransposons contain a gag gene that encodes a viral particle coat and a pol gene that encodes a reverse transcriptase, ribonuclease H, and an integrase, which provide the enzymatic machinery for reverse transcription and integration into the host genome. Reverse transcription is a multi-step process that occurs within the viral or virus-like particle (GAG) in the cytoplasm. Unlike LTR retrotransposons, exogenous retroviruses contain an env gene, which encodes an envelope that facilitates their migration to other cells. Some LTR retrotransposons may contain remnants of an env gene, but their insertion capabilities are limited to the genome from which they originated. This suggests that they originated from exogenous retroviruses by losing the env gene. However, there is also evidence for the opposite route, given that LTR retrotransposons can acquire the env gene and become infectious entities. Currently, most LTR sequences (85%) in the human genome are found only as isolated (solo) LTRs, with the internal sequence likely lost due to homologous recombination between flanking LTRs. Interestingly, LTR retrotransposons target their reinsertion to specific genomic sites, often around genes, with potentially important functional implications for the host gene. It is estimated that 450,000 LTR copies comprise approximately 8% of our genome. LTR retrotransposons inhabiting large genomes, such as maize, wheat, or barley, can comprise thousands of families. However, despite this diversity, very few families make up most of the repetitive fraction in these large genomes. Notable examples are Angela (wheat), BARE1 (barley), Opie (maize), and Retrosor6 (sorghum).
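The loss of the internal sequence by recombination between flanking LTRs can be illustrated with a toy string model. This is only a sketch: the sequences `LTR` and `INTERNAL` are made-up placeholders, and the function `recombine_ltrs` is a hypothetical helper that treats recombination as an exact collapse of two identical direct repeats.

```python
def recombine_ltrs(genome: str, ltr: str) -> str:
    """Collapse the first LTR-internal-LTR unit into a single LTR.

    Toy model of homologous recombination between two identical direct
    repeats: the internal region and one LTR copy are lost.
    """
    first = genome.find(ltr)
    if first == -1:
        return genome  # no LTR present
    second = genome.find(ltr, first + len(ltr))
    if second == -1:
        return genome  # only one LTR left, nothing to recombine with
    # Keep everything up to the end of the first LTR, resume after the second.
    return genome[: first + len(ltr)] + genome[second + len(ltr):]


LTR = "TGTTGGAA"            # hypothetical toy LTR sequence
INTERNAL = "gagpolgagpol"   # placeholder for the gag/pol coding region

full_element = LTR + INTERNAL + LTR
genome = "aaaa" + full_element + "cccc"

solo = recombine_ltrs(genome, LTR)
print(genome)
print(solo)
```

After one recombination event only a single LTR remains between the host flanks, which matches the isolated LTRs described above.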
The DIRS order clusters structurally diverged groups of transposons that possess a tyrosine recombinase (YR) gene instead of an integrase (INT) and do not form target site duplications (TSDs). Their termini resemble either split direct repeats (SDR) or inverted repeats. Such features indicate a different integration mechanism than that of other class I mobile elements. DIRS were discovered in the genome of the slime mold (Dictyostelium discoideum) in the early 1980s and are present in all major phylogenetic lineages, including vertebrates. It has been shown that they are also common in hydrothermal vent organisms.
Another order, termed Penelope-like elements (PLE), has a wide, though patchy, distribution, ranging from amoebae and fungi to vertebrates, with copy numbers of up to thousands per genome. Interestingly, no PLE sequences have been found in mammalian genomes, and it appears that they were lost from the genome of C. elegans. Although PLEs with an intact open reading frame (ORF) have been identified in several genomes, including Ciona and Danio, the only transcriptionally active representative, Penelope, is known from Drosophila virilis. It causes the hybrid dysgenesis syndrome, characterized by the simultaneous mobilization of several unrelated transposable element families in the progeny of dysgenic crosses. It appears that Penelope invaded D. virilis relatively recently, and its invasive potential has been demonstrated in D. melanogaster. PLEs harbor a single ORF that codes for a protein containing reverse transcriptase (RT) and endonuclease (EN) domains. The PLE RT domain more closely resembles telomerase than the RT from LTRs or LINEs. The EN domain is related to GIY-YIG intron-encoded endonucleases. Some PLE members also have LTR-like sequences, which can be in either a direct or an inverted orientation, and contain a functional intron.
LINEs do not have LTRs; however, they have a poly-A tail at their 3′ ends and are flanked by TSDs. They comprise about 21% of the human genome in roughly 850,000 copies, and among them, L1 is the most abundant and best-described family. L1 is the only LINE retrotransposon still active in the human genome. The human genome also contains two other LINE-like repeats, L2 and L3, which are distantly related to L1. A contrasting situation has been observed in the malaria mosquito Anopheles gambiae, where approximately 100 divergent LINE families comprise only 3% of the genome. LINEs in plants, such as Cin4 in maize and Ta11 in Arabidopsis thaliana, appear to be rare compared to LTR retrotransposons. A full copy of mammalian L1 is about 6 kb long and contains an RNA polymerase II (Pol II) promoter and two ORFs. ORF1 codes for a non-sequence-specific RNA-binding protein that contains zinc finger, leucine zipper, and coiled-coil motifs. This protein, ORF1p, functions as a chaperone for the L1 mRNA. The second ORF encodes an endonuclease, which makes a single-stranded nick in the genomic DNA, and a reverse transcriptase, which uses the nicked DNA to prime reverse transcription of LINE RNA from the 3′ end. Reverse transcription is often unfinished, leaving behind fragmented copies of LINE elements; hence, most of the L1-derived repeats are short, with an average size of 900 bp. LINEs are part of the CR1 clade, which has members in various metazoan species, including fruit flies, mosquitoes, zebrafish, pufferfish, turtles, and chickens. Because they encode their own retrotransposition machinery, LINE elements are regarded as autonomous retrotransposons.
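Because reverse transcription is primed at the 3′ end, a copy that stops early keeps its 3′ features and loses its 5′ features. The sketch below makes that explicit with a schematic feature list. The feature names in `L1_FEATURES` and the function `reverse_transcribe` are illustrative assumptions, not a model of real L1 sequence.

```python
# Schematic 5'-to-3' layout of a full-length L1 (labels are illustrative only).
L1_FEATURES = ["5'UTR+promoter", "ORF1", "ORF2", "3'UTR", "poly(A)"]


def reverse_transcribe(features, completed):
    """Return the features present in a new copy.

    `completed` is how many features, counted from the 3' end, were copied
    before reverse transcription stopped.
    """
    if not 0 <= completed <= len(features):
        raise ValueError("completed must be between 0 and len(features)")
    return features[len(features) - completed:]


full_copy = reverse_transcribe(L1_FEATURES, 5)
truncated_copy = reverse_transcribe(L1_FEATURES, 2)

print(full_copy)
print(truncated_copy)
```

In this toy layout a truncated copy lacks the promoter and both ORFs, so it cannot supply its own retrotransposition machinery.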
SINEs evolved from RNA genes, such as 7SL and tRNA genes. By definition, they are short (up to 1000 base pairs). They do not encode their own retrotranscription machinery and are considered nonautonomous elements; in most cases, they are mobilized by the L1 machinery. The outstanding member of this class from the human genome is the Alu repeat, which contains a cleavage site for the AluI restriction enzyme, giving it its name. With over a million copies in the human genome, Alu is probably the most successful transposon in the history of life. Primate-specific Alu and its rodent relative, B1, have a limited phylogenetic distribution, suggesting their relatively recent origins. The mammalian-wide interspersed repeats (MIRs), by contrast, spread before eutherian radiation, and their copies can be found in different mammalian groups, including marsupials and monotremes. SVA elements are unique primate elements due to their composite structure. They are named after their main components: SINE, VNTR (a variable number of tandem repeats), and Alu. Typically, they exhibit the hallmarks of retroposition, i.e., they are flanked by TSDs and terminated by a poly(A) tail. It appears that SVA elements are non-autonomous retrotransposons mobilized by L1 machinery, and they are believed to be transcribed by RNA polymerase II. SVAs are transpositionally active and are responsible for some human diseases. They originated less than 25 million years ago and form the youngest retrotransposon family, with approximately 3,000 copies in the human genome.
Retro(pseudo)genes are a special group of retroposed sequences, which are products of reverse transcription of a spliced (mature) mRNA. Hence, their characteristic features are the absence of a promoter sequence and introns, the presence of flanking direct repeats, and a 3′-end polyadenosine tract. Processed pseudogenes, also known as retropseudogenes, have been generated in vitro at a low frequency in human HeLa cells using mRNA from a reporter gene. The source of the reverse transcription machinery in humans and other vertebrates seems to be active L1 elements. However, not all retroposed messages end up as pseudogenes. About 20% of mammalian protein-encoding genes lack introns in their ORFs, and it is conceivable that many of these genes arose by retroposition. Some genes are known to be more frequently retroposed than others. For instance, the human genome contains over 2000 retropseudogenes of ribosomal proteins. A genome-wide study revealed that the human genome contains approximately 20,000 pseudogenes, 72% of which are likely to have arisen through retroposition. Interestingly, the vast majority (92%) of these are quite recent transpositions that occurred after the divergence of primates and rodents. Some of the retroposed genes may follow quite complicated evolutionary paths. An example is the RNF113B retrogene, which replaced its parental gene in mammalian genomes. This retrocopy was duplicated in primates, and the evolution of this primate-specific copy was accompanied by the exaptation of two TEs, Alu and L1, and by intron gain through the conversion of part of the coding sequence into an intron, leading to the origin of a functional, primate-specific retrogene with two splicing variants.
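The hallmarks of a retrocopy listed above (no promoter, no introns, a poly(A) tract, flanking direct repeats) follow directly from copying a spliced mRNA. A minimal sketch, assuming a made-up gene layout `GENE` and hypothetical helpers `processed_cdna` and `retropose`:

```python
# Toy gene: uppercase = promoter/exons, lowercase = intron (all sequences invented).
GENE = [
    ("promoter", "TATAAA"),
    ("exon", "ATGGCC"),
    ("intron", "gtaagtttcag"),
    ("exon", "GGATAA"),
]


def processed_cdna(gene, poly_a_len=10):
    """DNA copy of the mature mRNA: exons only, followed by a poly(A) tract."""
    exons = "".join(seq for kind, seq in gene if kind == "exon")
    return exons + "A" * poly_a_len


def retropose(target, position, cdna, tsd_len=4):
    """Insert the cDNA at `position`, duplicating `tsd_len` bases of the target."""
    if not 0 <= position <= len(target) - tsd_len:
        raise ValueError("position leaves no room for the target site duplication")
    tsd = target[position:position + tsd_len]
    return target[:position] + tsd + cdna + tsd + target[position + tsd_len:]


cdna = processed_cdna(GENE)
locus = retropose("ccgtttgaacct", 4, cdna)
print(locus)
```

The inserted copy carries neither the promoter nor the intron, which is why most retrocopies are not expressed and decay into pseudogenes unless they land near a usable promoter.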
Class II: Mobile Elements
Class II elements move by a conservative cut-and-paste mechanism; the excision of the donor element is followed by its reinsertion elsewhere in the genome. DNA transposons are abundant in bacteria, where they are called insertion sequences, but are also present in all phyla. Two subclasses of DNA transposons have been distinguished, based on the number of DNA strands that are cut during transposition.
Classical “cut-and-paste” transposons belong to subclass I and are classified in the TIR order. They are characterized by terminal inverted repeats (TIRs) and encode a transposase that binds near the inverted repeats, mediating mobility. This process is not typically replicative, unless the gap caused by excision is repaired using the sister chromatid. When inserted at a new location, the transposon is flanked by small gaps, which, when filled by host enzymes, cause duplication of the sequence at the target site. The length of these TSDs is characteristic of particular transposons. Nine superfamilies belong to the TIR order, including Tc1-Mariner, Merlin, Mutator, and PiggyBac. The second order, Crypton, consists of a single superfamily of the same name. Originally thought to be limited to fungi, Cryptons are now known to have a wide distribution, including animals and heterokonts. MITEs, a heterogeneous group of small, nonautonomous elements, also belong to the TIR order and in some genomes are amplified to thousands of copies, e.g., Stowaway in the rice genome, Tourist in most bamboo genomes, or Galluhop in the chicken genome.
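Two of the structural features just described, terminal inverted repeats and target site duplications, can be made concrete with a small string simulation. This is a sketch under simplifying assumptions: the element sequence is invented, excision is treated as perfectly clean (no footprint), and the 2-bp duplication length is an arbitrary choice. The helper names `has_tirs` and `cut_and_paste` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def has_tirs(element: str, tir_len: int) -> bool:
    """True if the last `tir_len` bases are the reverse complement of the first."""
    return element[:tir_len] == reverse_complement(element[-tir_len:])


def cut_and_paste(genome: str, element: str, target_pos: int, tsd_len: int) -> str:
    """Excise `element` and reinsert it at `target_pos` (post-excision coordinate).

    The `tsd_len` target bases end up on both sides of the element, mimicking
    the fill-in of the staggered gaps left at the insertion site.
    """
    start = genome.find(element)
    if start == -1:
        raise ValueError("element not found in genome")
    donor = genome[:start] + genome[start + len(element):]  # clean excision
    if not 0 <= target_pos <= len(donor) - tsd_len:
        raise ValueError("target position out of range")
    tsd = donor[target_pos:target_pos + tsd_len]
    return donor[:target_pos] + tsd + element + tsd + donor[target_pos + tsd_len:]


# Invented element: 6-bp TIRs around a placeholder internal region.
ELEMENT = "CAGTGC" + "ATGAAACCC" + "GCACTG"
genome = "ggtta" + ELEMENT + "ccgatacgt"   # lowercase = host DNA

moved = cut_and_paste(genome, ELEMENT, target_pos=9, tsd_len=2)
print(has_tirs(ELEMENT, 6))
print(moved)
```

The genome grows by exactly the TSD length even though the element itself was only moved, because the target bases are now present on both flanks.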
Subclass II includes two orders of TEs that, just as those from subclass I, do not form RNA intermediates. However, unlike “classical” DNA transposons, they replicate without double-strand cleavage. Helitrons replicate using a rolling-circle mechanism, and their insertion does not result in duplication of the target site. They encode tyrosine recombinase along with some other proteins. Helitrons were first described in plants, but they are also present in other phyla, including fungi and mammals. Mavericks are large transposons that have been found in different eukaryotic lineages, excluding plants. They encode various proteins, including DNA polymerase B and an integrase. Kapitonov and Jurka suggested that their life cycle includes a single-strand excision, followed by extrachromosomal replication and reintegration to a new location.
TEs are not randomly distributed in the genome
As seen in the previous section, TEs are highly diverse and, in principle, every TE sequence in a genome can be affiliated with a (sub)family, superfamily, subclass, and class. This is summarized in Figure \(\PageIndex{3}\). However, much like the taxonomy of species, the classification of TEs is in constant flux, perpetually subject to revision due to the discovery of completely novel TE types, the introduction of new levels of granularity in the classification, and the ongoing development of methods and criteria to detect and classify TEs.
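The hierarchy described in this chapter can be written down as a small lookup table. The sketch below covers only the classes, subclasses, orders, and superfamilies named in the text; it is not a complete rendering of the Wicker et al. scheme, and the names `Lineage`, `ORDERS`, `SUPERFAMILY_TO_ORDER`, and `lineage_of` are illustrative.

```python
from typing import NamedTuple, Optional


class Lineage(NamedTuple):
    te_class: str
    subclass: Optional[str]   # the text defines subclasses only for class II
    order: str


# Orders as listed in this chapter.
ORDERS = {
    "LTR": Lineage("Class I", None, "LTR"),
    "DIRS": Lineage("Class I", None, "DIRS"),
    "PLE": Lineage("Class I", None, "PLE"),
    "LINE": Lineage("Class I", None, "LINE"),
    "SINE": Lineage("Class I", None, "SINE"),
    "TIR": Lineage("Class II", "Subclass I", "TIR"),
    "Crypton": Lineage("Class II", "Subclass I", "Crypton"),
    "Helitron": Lineage("Class II", "Subclass II", "Helitron"),
    "Maverick": Lineage("Class II", "Subclass II", "Maverick"),
}

# Only the superfamilies the text mentions by name.
SUPERFAMILY_TO_ORDER = {
    "Tc1-Mariner": "TIR",
    "Merlin": "TIR",
    "Mutator": "TIR",
    "PiggyBac": "TIR",
    "Crypton": "Crypton",
}


def lineage_of(superfamily: str) -> Lineage:
    """Look up class, subclass, and order for a named superfamily."""
    return ORDERS[SUPERFAMILY_TO_ORDER[superfamily]]


print(lineage_of("PiggyBac"))
print(sorted(o for o, l in ORDERS.items() if l.te_class == "Class I"))
```

Keeping the table as data rather than code reflects the point made above: the classification is revised often, so entries will need to be added or moved.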

The genome can be viewed as an ecosystem inhabited by diverse communities of TEs, which seek to propagate and multiply through sophisticated interactions with one another and with other cellular components. These interactions encompass processes familiar to ecologists, such as parasitism, cooperation, and competition. Thus, it is perhaps not surprising that TEs are rarely, if ever, randomly distributed in the genome, as shown in Figure \(\PageIndex{4}\). TEs exhibit various levels of preference for insertion within certain features or compartments of the genome. These preferences are often shaped by opposing selective forces, a balancing act that facilitates future propagation while mitigating deleterious effects on host cell function. At the far end of the site-selection spectrum, many elements have evolved mechanisms to target specific loci where their insertions are less detrimental to the host but favorable for their propagation. For instance, several retrotransposons in species as diverse as slime mold, budding yeast, and fission yeast have independently but convergently evolved the capacity to target the upstream regions of genes transcribed by RNA polymerase III, where they do not appear to affect host gene expression but retain the ability to be transcribed themselves.

TEs are an extensive source of mutations and genetic polymorphisms
TEs occupy a substantial portion of the genome of a species, including a large fraction of the DNA unique to that species. In maize, where Barbara McClintock did her seminal work, an astonishing 60 to 70% of the genome is composed of LTR retrotransposons, many of which are unique to this species or its close wild relatives. Still, the less prevalent DNA transposons are currently the most active and mutagenic (Fig. 24.2.4). Similarly, the vast majority of TE insertions in Drosophila melanogaster are absent at the orthologous site in its closest relative D. simulans (and vice versa), and most are not fixed in the population. Many TE families are still actively transposing, and the process is highly mutagenic; more than half of all known phenotypic mutants of D. melanogaster isolated in the laboratory are caused by spontaneous insertions of a wide variety of TEs. Transposition events are also common and mutagenic in laboratory mice, where the ongoing activity of several families of LTR elements is responsible for 10–15% of all inherited mutant phenotypes. This contribution of TEs to genetic diversity may be underestimated, as TEs can be more active when organisms are under stress, such as in their natural environment.
Because TE insertions rarely provide an immediate fitness advantage to their host, those that reach fixation in the population do so largely through genetic drift and are subsequently eroded by point mutations that accumulate neutrally. Over time, these mutations result in TEs that can no longer encode transposition enzymes and therefore can no longer produce new integration events. For instance, our haploid genome contains approximately 500,000 L1 copies, but more than 99.9% of these L1 copies are fixed and no longer mobile due to various forms of mutations and truncations. It is estimated that each person carries a set of approximately 100 active L1 elements, and most of these are young insertions that are still segregating within the human population. Thus, as for any other organism, the ‘reference’ human genome sequence does not represent a comprehensive inventory of TEs in humans. Thousands of ‘non-reference’, unfixed TE insertions have been cataloged through whole-genome sequencing and other targeted approaches. On average, any two human haploid genomes differ by approximately a thousand TE insertions, primarily from the L1 or Alu families. The number of TE insertion polymorphisms in a species with much higher TE activity, such as maize, dwarfs the number in humans.
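The L1 figures quoted above can be checked with a few lines of arithmetic. The sketch takes "more than 99.9%" as exactly 99.9%, which is an assumption that makes the result an upper bound, and uses exact fractions to avoid floating-point rounding.

```python
from fractions import Fraction

l1_copies = 500_000                       # approximate L1 copies per haploid genome
immobile_fraction = Fraction(999, 1000)   # "more than 99.9%", taken as exactly 99.9%

# Upper bound on copies that could still be mobile.
max_mobile = l1_copies * (1 - immobile_fraction)

# Share of all L1 copies represented by ~100 active elements per person.
active_per_person = 100
active_share = Fraction(active_per_person, l1_copies)

print(f"at most {int(max_mobile)} potentially mobile copies")
print(f"active share of all L1 copies: 1 in {active_share.denominator}")
```

Under these assumptions at most a few hundred copies could remain mobile, which is consistent with the estimate of roughly 100 active elements per person.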
If TEs bring no immediate benefit to their host and are largely decaying neutrally once inserted, how do they persist in evolution? One key to this conundrum is the ability of TEs not only to propagate vertically but also horizontally between individuals and species. There is now a substantial body of evidence supporting the idea that horizontal transposon transfer is a common phenomenon that affects virtually every major type of transposable element (TE) and all branches of the tree of life. While the cellular mechanisms underlying horizontal transposon transfer remain murky, it is increasingly apparent that the intrinsic mobility of TEs and ecological interactions between their host species, including those with pathogens and parasites, facilitate the transmission of elements between widely diverged taxa.
Transposition represents a potent mechanism of genome expansion that, over time, is counteracted by the removal of DNA via deletion. The balance between the two processes is a major driver in the evolution of genome size in eukaryotes. Several studies have demonstrated the impact and range of this shuffling and cycling of genomic content on the evolution of plant and animal genomes. Because the insertion and removal of TEs are often imprecise, these processes can indirectly affect surrounding host sequences. Some of these events occur at high enough frequency to result in vast amounts of duplication and reshuffling of host sequences, including genes and regulatory sequences. For example, a single group of DNA transposons, known as MULEs, has been responsible for the capture and reshuffling of approximately 1,000 gene fragments in the rice genome. Such studies have led to the conclusion that the rate at which TEs transpose, which is in part under host control, is a significant driver of genome evolution.
In addition to rearrangements induced as a byproduct of transposition, TEs can promote genomic structural variation long after they have lost the capacity to mobilize. In particular, recombination events can occur between the highly homologous regions dispersed by related TEs at distant genomic positions and result in large-scale deletions, duplications, and inversions (Fig. 24.2.4). TEs also provide regions of microhomology that predispose to template switching during repair of replication errors leading to another source of structural variants. These non-transposition-based mechanisms for TE-induced or TE-enabled structural variation have contributed substantially to genome evolution. These processes can also complicate the identification of actively transposing elements in population studies that infer the existence of active elements through the detection of non-reference insertions.
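The way recombination between dispersed TE copies produces deletions, inversions, and duplications can be sketched with a toy model. In the Python example below, a chromosome is a list of segment labels, a trailing apostrophe marks reversed orientation, and the function names are hypothetical helpers introduced for illustration. The model ignores sequence divergence, crossover position within the element, and all repair details.

```python
def flip(segment):
    """Toggle the orientation mark (a trailing apostrophe) on a segment label."""
    return segment[:-1] if segment.endswith("'") else segment + "'"


def recombine_direct(chrom, i, j):
    """Crossover between same-orientation TE copies at indices i < j on one
    chromosome: the intervening segments and one TE copy are lost."""
    return chrom[:i + 1] + chrom[j + 1:]


def recombine_inverted(chrom, i, j):
    """Crossover between opposite-orientation TE copies at indices i < j:
    the segments between them are reversed and flipped."""
    inverted = [flip(s) for s in reversed(chrom[i + 1:j])]
    return chrom[:i + 1] + inverted + chrom[j:]


def unequal_crossover(chrom_a, i, chrom_b, j):
    """Crossover between TE copy i on one chromosome and TE copy j on a
    misaligned homolog: returns the two reciprocal products."""
    return chrom_a[:i + 1] + chrom_b[j + 1:], chrom_b[:j + 1] + chrom_a[i + 1:]


direct = ["A", "TE", "B", "C", "TE", "D"]
inverted = ["A", "TE", "B", "C", "TE'", "D"]

print(recombine_direct(direct, 1, 4))       # B and C are deleted
print(recombine_inverted(inverted, 1, 4))   # B and C are inverted
deletion, duplication = unequal_crossover(direct, 1, direct, 4)
print(deletion)                             # reciprocal product lacking B and C
print(duplication)                          # reciprocal product with B and C twice
```

The same pair of TE copies therefore yields different structural variants depending only on their relative orientation and on whether the exchange occurs within one chromosome or between misaligned homologs.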
TEs also contribute to the specialized features of chromosomes. An intriguing example is found in Drosophila, where LINE-like retrotransposons form and maintain telomeres in place of the telomerase enzyme, which was lost during dipteran evolution. This domestication event could be viewed as a replay of what might have happened much earlier in eukaryotic evolution to solve the ‘end problem’ created by the linearization of chromosomes. Indeed, the reverse transcriptase component of telomerase is thought to have originated from an ancient lineage of retroelements. TE sequences and domesticated transposase genes also play structural roles at centromeres.
To persist in evolution, TEs must strike a delicate balance between expression and repression (Fig. 24.2.4). Expression should be sufficient to promote amplification, but not so vigorous as to lead to a fitness disadvantage for the host that would offset the benefit to the TE of increased copy numbers. This balancing act may explain why TE-encoded enzymes are naturally suboptimal for transposition and why some TEs have evolved self-regulatory mechanisms that control their copy numbers. A variety of host factors are also employed to control TE expression, including small RNA, chromatin, and DNA modification pathways, as well as sequence-specific repressors such as the recently profiled KRAB zinc-finger proteins. However, many of these silencing mechanisms must be at least partially released to permit the developmental regulation of host gene expression programs, particularly during early embryonic development. For example, genome-wide loss of DNA methylation is necessary to reset imprinted genes in primordial germ cells. This affords TEs an opportunity, as reduced DNA methylation often promotes their expression. Robust expression of a TE in the germ lineage (but not necessarily in the gametes themselves) is often its own downfall. In one example of a clever trick employed by the host, TE repression is relieved in a companion cell derived from the same meiotic product as flowering plant sperm. However, this companion cell does not contribute genetic material to the next generation. Thus, although TEs transpose in a meiotic product, the events are not inherited. Instead, TE activity in the companion cell may further dampen TE activity in sperm via the import of TE-derived small RNAs.
Another important consequence of the intrinsic expression/repression balance is that the effects of TEs on a host can vary considerably among tissue types and stages of an organism’s life cycle. From the TE’s perspective, an ideal scenario is to be expressed and active in the germline, but not in the soma, where expression would gain the TE no advantage, only disadvantages. This phenomenon is indeed observed among many species, with ciliates representing an extreme example of this division—TEs are actively deleted from the somatic macronucleus but retained in the micronucleus, which serves as the germline. Another example is the P-elements in Drosophila, which are differentially spliced in the germline versus soma. Many organisms, including plants, do not differentiate germ lineage cells early in development; rather, they are specified from somatic cells shortly before meiosis commences. Thus, TEs that transpose in somatic cells in plants have the potential to be inherited, which suggests that the interests of TEs and hosts conflict in many more cells and tissues than in animals with a segregated germline.
Like other species, humans contend with a contingent of currently active transposable elements (TEs), where the intrinsic balance between expression and repression is still at play. For us, this includes L1 and other mobile elements that depend on L1-encoded proteins for retrotransposition. These elements are responsible for new germline insertions that can lead to genetic diseases. More than 120 independent TE insertions have been associated with human disease. The rate of de novo germline transposition in humans is approximately one in 21 births for Alu and one in 95 births for L1.
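To give a feel for these rates, the short Python sketch below converts them into expected counts. It reads "one in 21 births" and "one in 95 births" as per-birth probabilities and treats Alu and L1 insertions as independent events; both readings are simplifying assumptions, and the cohort size is hypothetical.

```python
from fractions import Fraction

# Rates quoted in the text, read here as per-birth probabilities (an assumption).
p_alu = Fraction(1, 21)
p_l1 = Fraction(1, 95)

# Probability of at least one new Alu or L1 insertion, assuming the two
# events are independent (also an assumption).
p_either = 1 - (1 - p_alu) * (1 - p_l1)

births = 1_000_000  # hypothetical cohort size
print(f"Expected births with a new Alu insertion: {float(births * p_alu):,.0f}")
print(f"Expected births with a new L1 insertion:  {float(births * p_l1):,.0f}")
print(f"At least one of the two: about 1 in {float(1 / p_either):.1f} births")
```

Under these assumptions, roughly one birth in 17 would carry at least one new Alu or L1 insertion.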
Historically, little attention has been given to transposition in somatic cells and its consequences, as somatic transposition may be viewed as an evolutionary dead end for the transposable element (TE) with no long-term consequences for the host species. Yet, there is abundant evidence that TEs are active in somatic cells in many organisms (Fig. 24.2.4). In humans, L1 expression and transposition have been detected in various somatic contexts, including early embryos and specific stem cells. There is also considerable interest in the expression and activity of mobile elements in the mammalian brain, where L1 transposition has been proposed to diversify neuronal cell populations. One challenge in assessing somatic activity has been the development of reliable single-cell insertion site mapping strategies.
Somatic activity has also been observed in human cancers, where tumors can acquire hundreds of new L1 insertions. As for human polymorphisms, somatic activity in human cancers is caused by small numbers of so-called ‘hot’ L1 loci. The activities of these master copies vary depending on the individual, tumor type, and timeframe in the clonal evolution of the tumor. Some of these de novo L1 insertions disrupt critical tumor suppressors and oncogenes and thus drive cancer formation, although the vast majority appear to be ‘passenger’ mutations. Host cells have evolved several mechanisms to regulate transposable elements (TEs). However, as the force of natural selection begins to diminish with age and completely drops in post-reproductive life, TEs may become more active.
TEs are best known for their mobility and their ability to transpose to new locations. While the breakage and insertion of DNA associated with transposition represent an obvious source of cell damage, this is not the only, nor perhaps even the most common, mechanism by which TEs can be harmful to their host. Reactivated transposons harm the host in multiple ways. First, derepression of transposon loci, including their own transcription, may interfere with transcription or processing of host mRNAs through a myriad of mechanisms. Genome-wide transcriptional derepression of TEs has been documented during replicative senescence of human cells and in several mouse tissues, including the liver, muscle, and brain. Derepression of LTR and L1 promoters can also cause oncogene activation in cancer. Second, TE-encoded proteins, such as the endonuclease activity of L1 ORF2p, can induce DNA breaks and genomic instability. Third, accumulation of RNA transcripts and extrachromosomal DNA copies derived from TEs may trigger an innate immune response leading to autoimmune diseases and sterile inflammation (Fig. 24.2.4). Activation of the interferon response is now a well-documented property of transcripts derived from endogenous retroviruses and may give immunotherapies a boost in identifying and attacking cancer cells. The relative contribution of each of these mechanisms to organismal pathologies remains to be determined.
Following transcription (and sometimes splicing) of TEs, the next step in the process involves the translation of the encoded proteins and, for retroelements, reverse transcription of the TEs into cDNA substrates suitable for transposition. Once engaged by a TE-encoded reverse transcriptase protein, the resulting cytosolic DNAs and RNA:DNA hybrids can alert inflammatory pathways. An example of this is seen in patients with Aicardi–Goutières syndrome, where the accumulation of TE-derived cytosolic DNA is due to mutations in pathways that normally block TE processing or degrade TE-derived DNA. Although not all TEs encode functional proteins, some do, including a few endogenous retroviruses capable of producing Gag, Pol, or envelope (Env) proteins. Overexpression of these Env proteins can be cytotoxic and has been linked to at least two neurodegenerative diseases, multiple sclerosis and amyotrophic lateral sclerosis. Small accessory proteins produced by the youngest human endogenous retrovirus (HERV) group, HERV-K (HML-2), may play a role in some cancers, but the evidence remains circumstantial.
Although TE insertions are usually detrimental, there is growing evidence that they can provide the raw material for the emergence of protein-coding genes and non-coding RNAs, which can take on important and, in some cases, essential cellular functions (Fig. 24.2.4). The process of TE gene ‘domestication’ or exaptation over evolutionary time contributes to both deeply conserved functions and more recent, species-specific traits. Most often, the ancestral or a somewhat modified role of a TE-encoded gene is harnessed by the host and conserved, while the rest of the TE sequence, and hence its ability to autonomously transpose, has been lost. Spectacular examples of deeply conserved TE-derived genes are Rag1 and Rag2, which catalyze V(D)J somatic recombination in the vertebrate immune system. Both genes, and probably the DNA signals they recognize, were derived from an ancestral DNA transposon around 500 million years ago. Indeed, DNA transposases have been co-opted multiple times to form new cellular genes.
The gag and env genes of LTR retrotransposons or endogenous retroviruses (ERVs) have been domesticated numerous times to perform various functions in placental development, contribute to host defense against exogenous retroviruses, influence brain development, and play other diverse roles. One of the most intriguing examples of TE domestication is the repeated, independent capture of ERV env genes, termed syncytins, which appear to function in placentation by facilitating cell–cell fusion and the formation of syncytiotrophoblasts. Notably, one or more syncytin genes have been found in virtually every placental mammalian lineage where they have been sought, strongly suggesting that ERVs have played essential roles in the evolution and extreme phenotypic variability of the mammalian placenta. Another example of a viral-like activity re-purposed for host cell function is provided by the neuronal Arc gene, which arose from the gag gene of an LTR retrotransposon domesticated in the common ancestor of tetrapod vertebrates. Genetic and biochemical studies of murine Arc show that it is involved in memory and synaptic plasticity and has preserved most of the ancestral activities of Gag, including the packaging and intercellular trafficking of its own RNA. Remarkably, flies appear to have independently evolved a similar system of trans-synaptic RNA delivery involving a gag-like protein derived from a similar yet distinct lineage of LTR retrotransposons. Thus, the biochemical activities of TE-derived proteins have been repeatedly co-opted during evolution to foster the emergence of convergent cellular innovations in different organisms.
TEs can donate their own genes to the host, but they can also add exons and rearrange and duplicate existing host genes. In humans, intronic Alu elements are particularly prone to be captured as alternative exons through cryptic splice sites residing within their sequences. L1 and SVA (SINE/VNTR/Alu) elements also contribute to exon shuffling through transduction events of adjacent host sequences during their mobilization. The reverse transcriptase activity of retroelements is also responsible for the trans-duplication of cellular mRNAs to create ‘processed’ retrogenes in a wide range of organisms. The L1 enzymatic machinery is thought to be involved in the generation of tens of thousands of retrogene copies in mammalian genomes, many of which remain transcribed and some of which have acquired new cellular functions. This is a process that is still actively shaping our genomes; it has been estimated that one in every 6,000 humans carries a novel retrogene insertion.
TEs also make substantial contributions to the non-protein-coding functions of the cell. They are major components of thousands of long non-coding RNAs (lncRNAs) in human and mouse genomes, often transcriptionally driven by retroviral long terminal repeats (LTRs). Some of these TE-driven lncRNAs appear to play crucial roles in maintaining stem cell pluripotency and other developmental processes. Numerous studies have shown that TE sequences embedded within lncRNAs and mRNAs can directly modulate RNA stability, processing, or localization, with significant regulatory consequences. Furthermore, TE-derived microRNAs and other small RNAs processed from TEs can also adopt regulatory roles, serving host cell functions. The many mechanisms by which TEs contribute to coding and non-coding RNAs illustrate the multi-faceted interactions between these elements and their host.
Cis-regulatory networks coordinate the transcription of multiple genes that function in concert to orchestrate entire pathways and complex biological processes. In line with Barbara McClintock’s insightful predictions, there is now mounting evidence that TEs have been a rich source of material for the modulation of eukaryotic gene expression (Fig. 24.2.4). Indeed, TEs can disperse vast amounts of promoters and enhancers, transcription factor binding sites, insulator sequences, and repressive elements. The varying coat colors of agouti mice provide a striking example of a host gene controlling coat color whose expression can be altered by the methylation levels of a transposable element (TE) upstream of its promoter. In the oil palm, the methylation level of a transposable element (TE) that sits within a gene important for flowering ultimately controls whether the plants bear oil-rich fruit.
As TE families typically populate a genome as a multitude of related copies, it has long been postulated that they have the potential to donate the same cis-regulatory module to ‘wire’ batteries of genes dispersed throughout the genome. An increasing number of studies support this model, suggesting that TEs have provided the building blocks for the assembly and remodeling of cis-regulatory networks during evolution. These networks underlie processes as diverse as pregnancy, stem cell pluripotency, neocortex development, innate immunity in mammals, and the response to abiotic stress in maize. Indeed, TE sequences harbor all the necessary features of a ‘classical’ gene regulatory network. They are bound by diverse sets of transcription factors that integrate multiple inputs (activation/repression), respond to signals in both cis and trans, and are capable of coordinately regulating gene expression. In this context, TEs are highly suitable agents to modify biological processes by creating novel cis-regulatory circuits and fine-tuning pre-existing networks.
As potent insertional mutagens, TEs can have both positive and negative effects on host fitness. Still, it is likely that the majority of TE copies in any given species—and especially those such as humans with a small effective population size—have reached fixation through genetic drift alone and are now largely neutral to their host. When can we say that TEs have been co-opted for cellular function? The publication of the initial ENCODE paper, which asserted ‘function for 80% of the genome’, was the subject of much debate and controversy. Technically speaking, ENCODE assigned only ‘biochemical’ activity to this large fraction of the genome. Yet critics objected to the grand proclamations in the popular press (The Washington Post Headline: “Junk DNA concept debunked by new analysis of the human genome”) and to the ENCODE consortium’s failure to prevent this misinterpretation. To these critics, ignoring evolutionary definitions of function was a major misstep.
This debate can be easily extended to include TEs. TEs make up the vast majority of what is often referred to as ‘junk DNA’. Today, the term is mainly used (and abused) by the media, but it has deep roots in evolutionary biology. Regardless of the semantics, what evidence is needed to assign a function to a TE? Many TEs encode a wide range of biochemical activities that normally benefit their own propagation. For example, TEs often contain promoter or enhancer elements that hijack cellular RNA polymerases for transcription, and autonomous elements encode proteins with various biochemical and enzymatic activities, all of which are necessary for the transposon to replicate. Do these activities make them functional?
The vast differences in TEs between species make standard approaches to establishing their regulatory roles particularly challenging. For example, intriguing studies on the impact of HERVs, particularly HERV-H, in stem cells and pluripotency must be interpreted using novel paradigms that do not invoke deep evolutionary conservation to imply function, as these particular ERVs are absent outside of the great apes. Evolutionary constraints can be measured at shorter time scales, including the population level, but this remains a statistically challenging task, especially for non-coding sequences. Natural loss-of-function alleles may exist in the human population, and their effect on fitness can be studied if their impact is apparent; however, these are quite rare and do not allow for systematic studies. It is possible to engineer genetic knockouts of a particular human transposable element (TE) locus to test its regulatory role; however, these are typically restricted to in vitro systems, especially when the orthologous TE does not exist in the model species. In this context, studying the impact of TEs in model species with powerful genome engineering tools and vast collections of mutants and other genetic resources, such as plants, fungi, and insects, will also continue to be extremely valuable.
Finally, a growing consensus is urging more rigor when assigning cellular function to TEs, particularly for the fitness benefit of the host. Indeed, a TE displaying biochemical activity (such as those bound by transcription factors or lying within open chromatin regions) cannot be equated to a TE that shows evidence of purifying selection at the sequence level or, when genetically altered, results in a deleterious or dysfunctional phenotype. Recent advances in editing and manipulating the genome and the epigenome en masse, yet with precision, including repetitive elements, offer the promise for a systematic assessment of the functional significance of TEs.
Summary
Transposable elements (TEs) are mobile DNA sequences capable of moving within and between genomes, a discovery that revolutionized our understanding of genome plasticity. First described by Barbara McClintock in maize and later validated by studies on bacterial insertion sequences, TEs are now known to constitute a significant fraction of genomes across all domains of life—from approximately 10% in some fish to over 80% in large plant genomes like maize.
TEs are classified into two major classes based on their transposition mechanisms. Class I retrotransposons (the "copy-and-paste" elements) mobilize via an RNA intermediate, which is reverse transcribed into DNA before integration. These include LTR retrotransposons (related to retroviruses), LINEs, SINEs, and other less common orders. In contrast, Class II DNA transposons (the "cut-and-paste" elements) move directly as DNA, typically excising from one location and reinserting at another. Additionally, novel subclasses such as Helitrons (which replicate via rolling-circle mechanisms) and Mavericks add further diversity.
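The difference between the two classes can be illustrated with a toy model. In the Python sketch below, a genome is a list of labels and the two hypothetical functions mimic only the net outcome of each mechanism. It ignores the RNA intermediate, reverse transcription, target-site duplications, and the ways in which DNA transposons can still increase their copy number.

```python
def cut_and_paste(genome, donor, target):
    """Class II style: excise the element at index `donor` and reinsert it
    at index `target` (counted after excision). Copy number is unchanged."""
    result = list(genome)
    element = result.pop(donor)
    result.insert(target, element)
    return result


def copy_and_paste(genome, donor, target):
    """Class I style: the element at index `donor` stays in place and a new
    copy is inserted at index `target`. Copy number increases by one."""
    result = list(genome)
    result.insert(target, result[donor])
    return result


genome = ["gene1", "TE", "gene2", "gene3"]

moved = cut_and_paste(genome, donor=1, target=2)
copied = copy_and_paste(genome, donor=1, target=3)

print(moved, moved.count("TE"))
print(copied, copied.count("TE"))
```

In this simplified picture the cut-and-paste element changes position without changing its copy number, whereas each copy-and-paste event adds a new copy and leaves the donor intact.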
TEs impact genome structure and function in multiple ways. They can disrupt gene coding regions or regulatory sequences upon insertion, causing mutations or altering gene expression. They also provide substrates for homologous recombination, leading to large-scale genomic rearrangements such as deletions, duplications, and inversions. Over evolutionary time, many TEs become inactivated by accumulating mutations, yet their remnants—often termed "junk DNA"—can be exapted to serve new functions. For example, several key cellular proteins, such as the RAG recombinases involved in immune receptor diversity and centromere protein CENP‑B, have evolved from TE sequences.
The host genome employs various mechanisms to control TE activity and minimize deleterious effects. Epigenetic modifications (such as DNA methylation) and RNA interference pathways, including small RNA-mediated silencing, restrict TE expression. In some cases, TE activity is differentially regulated between the germline and somatic tissues, because activity in the soma gives the TE no transmission advantage while still risking harm to the host.
Furthermore, TEs contribute to genetic diversity not only through their mutagenic potential but also by donating sequences that form regulatory elements or new exons via exon shuffling. Horizontal transfer between species further spreads TEs, providing an evolutionary force that shapes genomes over both short and long timescales.
In summary, transposable elements are far more than genomic parasites; they are dynamic components of the genome that drive evolution, contribute to genetic variation, and even become co-opted for essential cellular functions. Understanding TEs requires an integration of biochemistry, genetics, and evolutionary biology, and underscores the complex interplay between mobile genetic elements and host genome stability.
References
1. Munoz-Lopez, M. and Garcia-Perez, J.L. (2010) DNA Transposons: Nature and Applications in Genomics. Curr Genomics 11(2):115-128. Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2874221/
2. Makalowski, W., Gotea, V., Pande, A. and Makalowska, I. (2019) Transposable Elements: Classification, Identification, and Their Use As a Tool For Comparative Genomics. In: Anisimova M. (eds) Evolutionary Genomics. Methods in Molecular Biology, vol 1910. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9074-0_6
3. Bourque, G., Burns, K.H., Gehring, M., Gorbunova, V., Seluanov, A., Hammell, M., Imbeault, M., Izsvák, Z., Levin, H.L., Macfarlan, T.S., Mager, D.L., Feschotte, C. (2018) Ten things you should know about transposable elements. Genome Biology 19: 199. Available at: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1577-z