5.2: Genomics
With the publication in 1953 of James Watson and Francis Crick's seminal paper on the structure of DNA, the field of genomics was born. Once the fundamental structure of DNA had been described, researchers began to study how genetic information could be stored, transmitted, and expressed. However, it wasn’t until the Human Genome Project (HGP), initiated in October 1990, that genomics became a distinct scientific discipline. The HGP aimed to sequence the entire human genome and identify all of its genes. It was a highly collaborative effort that brought together computer scientists, mathematicians, and biologists from 20 laboratories located all around the world. While the project was met with some skepticism at first, its success is now well known. When it was completed in 2003, the HGP had sequenced more than 3 billion base pairs, or more than 90% of the human genome. A DNA sample from a single individual with blended ancestry was used to elucidate approximately 70% of the genome, with samples from another 19 individuals being used for the remaining regions. Without a doubt, the HGP represents one of the most remarkable collaborative efforts to have been undertaken by the global scientific community.
Even today, the HGP continues to be at the center of remarkable genomic discoveries, and its impact will be measured in future treatments and therapies.
- Read this letter opposing the HGP
- Read the HGP fact sheet to learn more about how the project was conducted
Introduction
Genomics is the study of an organism’s entire genetic material (i.e., the genome). It involves mapping, sequencing, and analyzing the structure and function of genes, and can also include studying their evolution. The term genome, coined in 1920 by the German botanist Hans Winkler as a combination of the words gene and chromosome, refers to the total genetic information of an organism, including all of its coding and non-coding regions. Unlike genetics, which typically focuses on individual genes, genomics is concerned with all of an organism's genes and the interrelationships among them that influence the organism's growth, development, and functioning.
The human genome contains over three billion base pairs and roughly 20,000 to 25,000 genes that encode the proteins needed to carry out the cellular processes of the human organism. In addition to these protein-coding genes, the human genome also contains genes whose RNA transcripts are not translated but carry out their functions as RNA molecules. Roughly 98% to 99% of the human genome is composed of "spaces", regions of DNA found within and between genes whose functions are regulatory in nature or not fully understood. Almost half of these regions are "jumping genes" or their remnants. These jumping genes, also called transposons, can move about the human genome, and their movement can introduce genetic variation into a region.
With advancements in sequencing technologies and bioinformatics, scientists can now analyze vast amounts of genetic data. Genomic insights have led to remarkable developments in fields such as medicine, agriculture, evolutionary biology, and biotechnology. This page provides an overview of key concepts in genomics, tracing its history, principles, and applications.
By using genomics, researchers can begin to understand how an organism's genes determine its function. At the end of this section, you will be able to:
- Explain what a genome is
- List the types of genomics used today
- List some techniques used in structural genomics
- Explain how genetic and physical maps of a genome are created
- Define and explain recombination and recombination frequency
Types of Genomics
- Structural genomics: studies DNA sequences and the content of genomes
- Functional genomics: studies the organization and function of genes within a genome
- Comparative genomics: compares the content and organization of genomes across species
- Epigenomics: studies the epigenetic modifications to the genome
- Metagenomics: studies the structure and function of microbial communities by analyzing DNA extracted from environmental samples
- Pharmacogenomics: studies how genes affect the response to a medication
Structural Genomics, Genetic Maps, and Physical Maps
Structural genomics is the branch of genomics that focuses on the physical structure of genomes. It includes determining the sequence of a gene, mapping the gene to its location on a chromosome, and studying the three-dimensional structure of the protein encoded by that gene. For more information about chromosome structure, go to Chapter 3.1 DNA Structure. By creating a detailed blueprint of a genome's architecture, researchers can begin to understand the relationship between genes and proteins and how this relationship contributes to an organism's structure and function.
Structural genomics integrates several techniques, such as:
- DNA sequencing
- Gene mapping
- X-ray crystallography
- Nuclear magnetic resonance (NMR) spectroscopy
One of the key applications of structural genomics is predicting the structure and function of a protein from its DNA sequence. By determining the structure of the proteins encoded by a genome, scientists can better understand their biological functions, especially for proteins with no known role. Structural genomics often relies on comparative genomics, because the structure and function of a protein from one organism can provide insights into a poorly understood protein from another. In addition to its application in basic biological research, structural genomics can play a critical role in drug discovery and design. Because protein structure is closely linked with function, structural genomics can identify novel protein regions that may serve as targets for drug discovery. By understanding the structure of proteins that play critical roles in disease pathways, researchers can design therapeutic molecules, such as drugs, that specifically target these proteins, leading to the development of new disease therapies.
Genetic Mapping
One of the first steps in structural genomics is to prepare a map of the genome. Genetic maps are a rough approximation of the location of genes or other molecular markers on a chromosome. Genetic maps are based on genetic recombination, a phenomenon that "swaps" the gene alleles in one segment of a chromosome (e.g., alleles A, B, and C) with the gene alleles found in the same segment of the homologous chromosome (e.g., alleles a, b, and c). This recombination is the result of "crossing-over" that occurs during Meiosis I (Figure \(\PageIndex{1}\)). For more information on meiosis, check out this OER chapter on Meiosis. As a result of recombination, two of the four gametes will contain a "parental" chromosome carrying the original gene alleles, and the remaining two gametes will contain a "recombinant" chromosome in which the alleles have been shuffled into new combinations. Genetic recombination is a major source of genetic diversity.
To make a genetic map using recombination, specific genetic crosses are performed. The number of offspring that show the parental phenotypes is recorded, along with the number of offspring that show new combinations of their parents' phenotypes. These offspring with new combinations are called recombinants. From these data, the recombination frequency can be calculated.
Recombination Frequency (RF) = (# of recombinant progeny/total # of progeny) x 100%
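The formula above can be sketched in a few lines of Python. This is only an illustration: the function name and the progeny counts are hypothetical and do not come from a real cross.

```python
def recombination_frequency(recombinant_progeny, total_progeny):
    """Return the recombination frequency (RF) as a percentage."""
    if total_progeny <= 0:
        raise ValueError("total_progeny must be positive")
    if not 0 <= recombinant_progeny <= total_progeny:
        raise ValueError("recombinant_progeny must be between 0 and total_progeny")
    return recombinant_progeny / total_progeny * 100


# Hypothetical cross: 1,000 offspring scored, 120 of them with
# non-parental (recombinant) phenotype combinations.
rf = recombination_frequency(120, 1000)
print(f"RF = {rf:.1f}%")  # RF = 12.0%
```

With these made-up counts, 120 recombinants out of 1,000 progeny give an RF of 12%.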
A recombination frequency of 1% means that, out of 100 progeny, 1 shows a new combination of the parental phenotypes, while the remaining 99 have the phenotype of one parent or the other. If the recombination frequency between two genes is about 50%, the expected maximum, the two genes assort independently: they are located either on different chromosomes or far apart on the same chromosome. If the recombination frequency is less than 50%, the two genes are located on the same chromosome and are said to be "linked". For linked genes, the recombination frequency serves as an estimate of the distance between the two genes on the chromosome. This is why genetic maps are often called linkage maps. Distances on genetic maps are measured in centimorgans (cM), or map units, where an RF of 1% is equal to 1 cM or 1 map unit. The lower the recombination frequency, the closer together the two genes are on the chromosome and the less likely their alleles are to be separated by crossing-over. With enough genetic crosses and calculations of RF, scientists can determine the order of genes on a chromosome and their relative positions with respect to one another. Figure \(\PageIndex{2}\) is an example of such a map, with colored bands showing the locations of multiple genes on chromosome 11 with respect to one another. The farther apart two bands (i.e., genes) are, the greater the RF will be.
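The reasoning above can be sketched for the simplest case of three linked genes: the pair with the largest RF sits at the two ends, and the third gene lies between them. The gene names, RF values, and function names below are all hypothetical, and treating 1% RF as exactly 1 cM is an approximation that holds best over short distances.

```python
from itertools import combinations

# Hypothetical pairwise recombination frequencies (in %) from genetic crosses.
pairwise_rf = {
    frozenset(("A", "B")): 8.0,
    frozenset(("B", "C")): 15.0,
    frozenset(("A", "C")): 22.0,
}


def is_linked(rf_percent):
    """Treat two genes as linked when their RF is below 50%."""
    return rf_percent < 50.0


def order_three_genes(rf):
    """Order three linked genes: the pair with the largest RF lies at the ends."""
    genes = sorted(set().union(*rf))
    if len(genes) != 3:
        raise ValueError("expected pairwise RFs for exactly three genes")
    outer = max(combinations(genes, 2), key=lambda pair: rf[frozenset(pair)])
    middle = next(g for g in genes if g not in outer)
    return outer[0], middle, outer[1]


left, middle, right = order_three_genes(pairwise_rf)
print("Gene order:", "-".join((left, middle, right)))
for a, b in ((left, middle), (middle, right)):
    distance = pairwise_rf[frozenset((a, b))]  # 1% RF is taken as 1 cM
    print(f"{a} to {b}: {distance} cM (linked: {is_linked(distance)})")
```

In this made-up data set the two short intervals add up to 23 cM, slightly more than the 22% measured directly between the outer genes. Small discrepancies of this kind are expected, because double crossovers between distant genes go undetected and make the directly measured RF an underestimate.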
Physical Maps
A significant problem with genetic maps is that they do not always correspond accurately to the actual physical distance between genes (i.e., the number of base pairs). This is because genetic maps are based on rates of recombination, which can vary from one region of a chromosome to another. A physical map is based on direct analysis of DNA and gives the number of base pairs found between genes or molecular markers. A number of techniques exist for creating physical maps, such as restriction mapping, which determines the positions of restriction sites within the DNA; sequence-tagged site (STS) mapping, which locates the positions of short, unique sequences of DNA on a chromosome; fluorescence in situ hybridization (FISH), which physically locates markers on DNA using fluorescent probes; and DNA sequencing. The most detailed physical maps are produced through DNA sequencing. For more about DNA sequencing, go to Chapter 5.3 Genome Sequencing.
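To illustrate what it means to position restriction sites in base pairs, the sketch below searches a known sequence for a recognition site and reports the resulting fragment lengths. This is a simplification: laboratory restriction mapping usually works in the opposite direction, inferring site positions from measured fragment sizes. The DNA sequence and function names are made up for illustration. GAATTC is the recognition site of the enzyme EcoRI, which cuts after the first G.

```python
def find_sites(sequence, site):
    """Return the 0-based start position of every occurrence of site."""
    sequence, site = sequence.upper(), site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def fragment_lengths(sequence, site, cut_offset):
    """Return fragment lengths (in bp) after cutting a linear molecule at each site.

    cut_offset is the number of bases from the start of the site to the cut.
    """
    cuts = [p + cut_offset for p in find_sites(sequence, site)]
    boundaries = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(boundaries, boundaries[1:])]


# Made-up 27 bp linear sequence containing two EcoRI sites (GAATTC).
dna = "ATGCGAATTCTTAGGCCGAATTCAAGT"

print(find_sites(dna, "GAATTC"))           # [4, 17]
print(fragment_lengths(dna, "GAATTC", 1))  # [5, 13, 9]
```

The site positions and fragment lengths are expressed directly in base pairs, which is what distinguishes a physical map from a genetic map measured in centimorgans.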
Genomics is an interdisciplinary field of molecular biology that focuses on the structure, function, mapping, and evolution of genomes. Some important concepts to remember are:
- an organism's genome is its complete set of DNA and includes all of its genes
- genomics characterizes and quantifies all of an organism's genes and studies the relationships between these genes
- there are numerous types of genomics, including structural, functional, and comparative
- structural genomics produces chromosomal maps giving gene location
- functional genomics studies the relationship between gene structure and its function
- comparative genomics compares the genomes of organisms in the hopes of elucidating gene function
- the study of genomes involves several molecular techniques, including high-throughput DNA sequencing and bioinformatics, so that the genome can be assembled and its functions analyzed
- genetic maps, or linkage maps, use recombination frequency to determine the order and relative position of genes on a chromosome
- recombination frequency is the fraction of total progeny that show new combinations of parental phenotypes
- physical maps use techniques like DNA sequencing to determine the physical distance (in base pairs) between genes on a chromosome
- genomic analysis has triggered revolutions in research and has led to advancements in several fields, including pharmaceuticals.
Glossary
Allele - a form of a gene found at a specific location on a chromosome
Centimorgans (cM) - the unit of distance between two genes on a genetic map; also known as map units; one cM is equivalent to a 1% recombination frequency
Chromatid - one of two identical "arms" of a replicated chromosome
Chromosome - the most condensed and organized form of DNA found in a cell; composed of DNA and proteins; carries genetic information in the form of genes
Comparative genomics - compares the content and organization of genomes across species
Crossing-over - a cellular process that occurs during meiosis when homologous chromosomes exchange genetic material; also called recombination
Gene - a specific sequence of DNA that codes for a polypeptide or an RNA molecule
Genetic maps - a rough approximation of the location of genes or other molecular markers on a chromosome; calculated using recombination frequency
Genetic recombination - a process by which DNA sequences are rearranged, resulting in new combinations of traits in offspring; also called recombination
Genome - the total genetic information of an organism
Genomics - the study of an organism’s entire genetic material
Homologous chromosomes - pairs of chromosomes that contain the same genes but may have different alleles
Linkage - the tendency of genes located close to one another on a chromosome to be inherited together
Linkage map - a map showing the order and relative distances of genes on a chromosome
Map unit - the unit of distance between two genes on a genetic map; also known as a centimorgan; one map unit is equivalent to a 1% recombination frequency
Physical map - the location of genes on a chromosome with the actual distance between genes in base pairs
Recombination - a cellular process that occurs during meiosis when homologous chromosomes exchange genetic material; also known as crossing-over
Recombination frequency - a measure of how often a crossover occurs between two genes during meiosis; used to create linkage maps
Recombinants - the offspring produced through recombination
Structural genomics - a branch of genomics that focuses on the physical structure of genomes

