Biology LibreTexts

1.5: Genomes and variation


    Learning Objectives

    • Distinguish between genes, chromosomes, and genomes.
    • Describe the range in the size of genomes and numbers of genes between species.
    • Distinguish between SNPs and STRs as variations in DNA sequence.
    • List applications of the field of genomics.

    What is a genome?

    The complete set of an organism's genetic information is the genome.

    Variation exists when sequences of bases differ between chromosomes. Some examples of types of variation include:

    • differences between the base sequence of homologous chromosomes in an individual
    • differences between the base sequence of chromosomes in two individuals
    • differences in base sequences between two different species
    • differences in numbers of chromosomes between species
    • differences in arrangement of similar sequences between species (synteny)
    • differences in base sequence between individuals with and without a disease or trait

    ...and more.
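    A minimal sketch of the simplest kind of comparison: checking two sequences base by base and listing the positions where they differ. It assumes the two sequences are already aligned and the same length (real comparisons between chromosomes, individuals, or species need an alignment step first, which is not shown). The example sequences and the function names `differing_positions` and `percent_identity` are hypothetical, made up for illustration.

```python
def differing_positions(seq_a: str, seq_b: str) -> list[int]:
    """Return the 0-based positions where two aligned sequences differ."""
    # Assumption: the sequences are already aligned, so position i in one
    # sequence corresponds to position i in the other.
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())) if a != b]


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which the two sequences match."""
    mismatches = len(differing_positions(seq_a, seq_b))
    return 100 * (len(seq_a) - mismatches) / len(seq_a)


# Hypothetical stretch of the same region on two homologous chromosomes.
maternal = "ATGCTTAGCA"
paternal = "ATGCTCAGCA"

print(differing_positions(maternal, paternal))
print(percent_identity(maternal, paternal))
```

    Here the two chromosomes differ only at position 5 (T versus C), so they are 90% identical over this short stretch.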

    The study of sequences at the whole genome level is known as genomics.


    Why study genomes?

    Understanding genomes has a variety of applications including:

    • comparing species to understand gene function and evolution
    • understanding health differences between individuals
    • performing research into gene function using model organisms
    • identifying individuals or species


    Genome sequencing projects

    The first genome of a free-living organism to be sequenced was that of the prokaryote Haemophilus influenzae, in 1995 (Fleischmann et al., 1995).

    Determining the sequence of the entire human genome took more than a decade, from the project's official start in 1990 to the draft sequence published in 2001 and the refinement of that draft over the next several years (review key dates in the timeline here). Sequences produced by genome projects are referred to as reference genomes.

    Competing genome projects?

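    The results of the two human genome sequencing efforts were published simultaneously in February 2001: in Nature by the International Human Genome Sequencing Consortium, and in Science by Craig Venter and colleagues at Celera Genomics. The two groups had taken different approaches: the Consortium sequenced large cloned DNA fragments that had first been mapped to positions on the chromosomes (a hierarchical, clone-by-clone strategy), whereas Celera used a whole-genome shotgun strategy.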

    However, there were still some "gaps" in the sequence. For example, regions that are highly repetitive, or similar to sequences on other chromosomes, are difficult to sequence and assemble precisely. The next-generation sequencing techniques described previously have made it possible to obtain complete sequences for some of these missing regions. Recently, the complete "telomere-to-telomere" sequence of an X chromosome was published (Miga et al., 2020), followed by a complete assembly of chromosome 8 (Logsdon et al., 2021).

    Exercise \(\PageIndex{1}\)

    Whose genome was the first to be sequenced?

    For the Celera project, refer to the Sources of DNA and Sequencing Methods section of this paper.

    For the International Consortium project, the sample collection is reported in this paper.


    Celera enrolled 21 donors initially, but only 5 of these were used for the complete sequencing. What factors determined which samples were selected?

    For the International Consortium, volunteer donors were solicited from areas near research labs. After sample collection and processing, all identifying information was removed. Many more samples were collected than were actually used. So truly, we don't know whose genome is included!


    How big are genomes? And does it matter?

    A helpful summary, with visuals comparing genome sizes, is found here (Milo and Phillips, 2015). Bacterial genomes are the smallest in numbers of base pairs, followed by fungi and algae. Beyond these organisms, genome size does not necessarily increase with organismal complexity. Plants have some of the largest genomes (often because they have more than two sets of chromosomes). However, regardless of total genome size, most organisms do not have many more than about 25,000 genes, so a larger genome does not necessarily contain more genes.
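    The point that genome size and gene number are decoupled can be illustrated with a little arithmetic: dividing gene count by genome size gives gene density. The figures below are rounded approximations chosen for illustration (about 1.83 million bp and 1,740 genes for H. influenzae, about 3.1 billion bp and 20,000 protein-coding genes for humans), and `genes_per_megabase` is a hypothetical helper, not a standard tool.

```python
# Rounded, approximate values used only for illustration:
# (genome size in base pairs, approximate number of genes)
GENOMES = {
    "Haemophilus influenzae": (1.83e6, 1_740),
    "Homo sapiens": (3.1e9, 20_000),
}


def genes_per_megabase(genome_bp: float, gene_count: int) -> float:
    """Gene density: number of genes per million base pairs (Mb)."""
    return gene_count / (genome_bp / 1e6)


for species, (size_bp, genes) in GENOMES.items():
    print(f"{species}: {genes_per_megabase(size_bp, genes):.1f} genes per Mb")
```

    With these rounded values the human genome is roughly 1,700 times larger than the H. influenzae genome but has only about 11 times as many genes, so its gene density is far lower (roughly 6.5 versus 950 genes per Mb).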

    Figure \(\PageIndex{1}\): Genome sizes (in base pairs, plotted on an axis running from one million to one hundred billion bp) vary by organism. (CC BY-SA Abizar via Wikimedia Commons)


    Variation: SNPs and STRs

    Genome sequencing projects produce reference genomes, which represent the sequence largely shared among individuals of a species, but they also reveal variation. Although other types of variation exist, two major classes are single nucleotide polymorphisms (SNPs) and short tandem repeats (STRs). SNPs are positions along chromosomes where the nucleotide can differ, for example an A on one chromosome and a G at the same position on another. STRs are short sequences of a few nucleotides repeated back to back, and the number of repeats at a given location can vary between chromosomes.
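    The difference between the two classes can be sketched in a few lines of Python. This is a simplified illustration, not a real variant caller: it assumes the sequences from different individuals are already aligned, treats a SNP position as a column where more than one base is observed, and measures an STR allele as the longest uninterrupted run of a chosen repeat motif. The sequences, the tetranucleotide motif `GATA`, and the function names are hypothetical choices for illustration.

```python
import re


def snp_positions(aligned_seqs: list[str]) -> list[int]:
    """Return 0-based positions where aligned sequences carry different bases."""
    length = len(aligned_seqs[0])
    # A position is variable if more than one distinct base occurs there.
    return [i for i in range(length) if len({seq[i] for seq in aligned_seqs}) > 1]


def str_repeat_count(seq: str, motif: str) -> int:
    """Return the number of copies in the longest tandem run of `motif`."""
    # (?:motif)+ matches one or more back-to-back copies of the motif.
    pattern = f"(?:{re.escape(motif)})+"
    runs = [len(m.group(0)) // len(motif) for m in re.finditer(pattern, seq)]
    return max(runs, default=0)


# SNP: three hypothetical individuals differ only at position 2 (G or T).
individuals = ["ACGTA", "ACTTA", "ACGTA"]
print(snp_positions(individuals))

# STR: two hypothetical alleles that differ in the number of GATA repeats.
allele_1 = "TT" + "GATA" * 3 + "CC"
allele_2 = "TT" + "GATA" * 5 + "CC"
print(str_repeat_count(allele_1, "GATA"), str_repeat_count(allele_2, "GATA"))
```

    In the SNP case the sequences have the same length and differ in which base is present; in the STR case the alleles differ in length because they carry different numbers of repeat units. Comparing repeat counts at several STR loci is the basis of DNA profiling, one way of identifying individuals.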



    Fleischmann, R.D., Adams, M.D., White, O. et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269, 496–512 (1995). doi:10.1126/science.7542800

    International Human Genome Sequencing Consortium (Lander, E. et al.). Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).

    Logsdon, G.A., Vollger, M.R., Hsieh, P. et al. The structure, function and evolution of a complete human chromosome 8. Nature 593, 101–107 (2021).

    Miga, K.H., Koren, S., Rhie, A. et al. Telomere-to-telomere assembly of a complete human X chromosome. Nature 585, 79–84 (2020).

    Milo, R. and Phillips, R. Cell Biology by the Numbers. Garland Science (2015).

    This page titled 1.5: Genomes and variation is shared under a not declared license and was authored, remixed, and/or curated by Stefanie West Leacock.
