7.13G: Metagenomics
Metagenomics is the study of genetic material derived from environmental samples.
LEARNING OBJECTIVES
Summarize the utility of metagenomics
Key Takeaways
Metagenomics
Metagenomics is the study of metagenomes: genetic material recovered directly from environmental samples. The broad field may also be referred to as environmental genomics, ecogenomics, or community genomics. Traditional microbiology, microbial genome sequencing, and genomics rely on cultivated clonal cultures. Early environmental gene sequencing instead cloned specific genes (often the 16S rRNA gene) to produce a profile of the diversity in a natural sample. Such work revealed that the vast majority of microbial biodiversity had been missed by cultivation-based methods. Recent studies use “shotgun” Sanger sequencing or massively parallel pyrosequencing to obtain largely unbiased samples of all genes from all members of the sampled communities. Because it can reveal the previously hidden diversity of microscopic life, metagenomics offers a powerful lens for viewing the microbial world, one with the potential to revolutionize our understanding of the entire living world.
Conventional Sequencing Studies
Conventional sequencing begins with a culture of identical cells as a source of DNA. However, early metagenomic studies revealed that many environments probably contain large groups of microorganisms that cannot be cultured and therefore cannot be sequenced this way. These early studies focused on 16S ribosomal RNA (rRNA) sequences, which are relatively short, often conserved within a species, and generally different between species. Many 16S rRNA sequences have been found that do not belong to any known cultured species, indicating that there are numerous non-isolated organisms. These surveys of rRNA genes taken directly from the environment revealed that cultivation-based methods recover fewer than 1% of the bacterial and archaeal species in a sample.
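The logic of these 16S surveys can be sketched as a toy calculation: for each environmental sequence, ask whether any sequence from a cultured species is similar enough to count as a match. The Python below is a minimal illustration under stated assumptions, not the method used in the original studies. It assumes the sequences are already aligned to equal length and uses a simple per-position identity score. It also applies a 97% identity cutoff, a commonly used convention for grouping 16S sequences that is nonetheless an assumption here. The names `percent_identity` and `fraction_unmatched` are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions that are identical (sequences must be pre-aligned)."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def fraction_unmatched(env_seqs, cultured_refs, threshold=97.0):
    """Fraction of environmental sequences whose best hit among the cultured
    reference sequences falls below the identity threshold (assumed 97%)."""
    unmatched = 0
    for seq in env_seqs:
        best = max((percent_identity(seq, ref) for ref in cultured_refs), default=0.0)
        if best < threshold:
            unmatched += 1
    return unmatched / len(env_seqs)


# Toy data: one hypothetical "cultured" reference and four environmental sequences.
ref = "ACGT" * 25                  # 100-position reference
near = "TG" + ref[2:]              # 2 mismatches, so 98% identity: counts as a match
env = [ref, near, "T" * 100, "GGCC" * 25]
print(fraction_unmatched(env, [ref]))
```

Here two of the four toy sequences have no close cultured relative. In a real survey, a high unmatched fraction is what signals the many non-isolated organisms described above.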
Shotgun Metagenomics
Advances in bioinformatics, refinements of DNA amplification, and the proliferation of computational power have greatly aided the analysis of DNA sequences recovered from environmental samples. These advances have allowed shotgun sequencing to be adapted to metagenomic samples. The approach, used to sequence many cultured microorganisms and the human genome, randomly shears DNA, sequences many short fragments, and reconstructs them into a consensus sequence. Shotgun sequencing and screens of clone libraries reveal the genes present in environmental samples. This provides information both on which organisms are present and on which metabolic processes are possible in the community. Such information can help in understanding the ecology of a community, particularly when multiple samples are compared with each other.
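The shear-and-reconstruct idea can be illustrated with a toy simulation. This sketch swaps in the simplest possible assembly strategy: a greedy merge of the pair of fragments with the longest exact suffix-prefix overlap. Real assemblers use more elaborate overlap-layout-consensus or de Bruijn graph methods and must cope with sequencing errors and repeats. The sketch assumes error-free reads of a single length drawn from a random, repeat-free sequence. All names (`shear`, `overlap`, `greedy_assemble`) are hypothetical.

```python
import random


def shear(genome: str, n_reads: int, read_len: int, rng: random.Random) -> list[str]:
    """Simulate shotgun sequencing: copy fixed-length reads from random start positions."""
    starts = [rng.randrange(0, len(genome) - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 10) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap; return contigs."""
    # Drop duplicate reads and reads contained inside longer ones (deterministic order).
    frags = sorted(set(reads), key=lambda s: (-len(s), s))
    frags = [f for i, f in enumerate(frags) if not any(f in g for g in frags[:i])]
    while True:
        best_k, best_pair = 0, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_pair = k, (i, j)
        if best_pair is None:
            return frags  # no overlaps left: the remaining fragments are contigs
        i, j = best_pair
        merged = frags[i] + frags[j][best_k:]  # join a and b, counting the overlap once
        frags = [f for n, f in enumerate(frags) if n not in (i, j)] + [merged]


rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(200))  # random stand-in "genome"
reads = shear(genome, n_reads=40, read_len=40, rng=rng)
contigs = greedy_assemble(reads, min_len=15)
print(len(contigs), [len(c) for c in contigs])
```

If random sampling leaves part of the sequence uncovered, or two neighboring reads overlap by too little, the result is several separate contigs rather than one reconstructed sequence.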
Shotgun metagenomics is also capable of sequencing nearly complete microbial genomes directly from the environment. As the collection of DNA from an environment is largely uncontrolled, the most abundant organisms in an environmental sample are most highly represented in the resulting sequence data. To achieve the high coverage needed to fully resolve the genomes of under-represented community members, large samples are needed. On the other hand, the random nature of shotgun sequencing ensures that many of these organisms, which would otherwise go unnoticed using traditional culturing techniques, will be represented by at least some small sequence segments.
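The link between abundance and coverage can be made concrete with a back-of-the-envelope model. This sketch rests on two assumptions. First, sequenced bases are drawn in proportion to each organism's share of the extracted DNA (cell abundance multiplied by genome size). Second, the Lander-Waterman approximation holds, under which the fraction of a genome hit by at least one read at fold coverage c is about 1 - exp(-c). The community composition, genome sizes, and 1 Gb sequencing depth are invented for illustration, and the function names are hypothetical.

```python
import math


def expected_coverage(abundance, genome_size, total_bases):
    """Expected fold coverage per genome, assuming sequenced bases are drawn
    in proportion to each organism's share of the extracted DNA."""
    # An organism's share of the DNA pool is (cell abundance x genome size).
    dna = {org: abundance[org] * genome_size[org] for org in abundance}
    total_dna = sum(dna.values())
    return {org: total_bases * (dna[org] / total_dna) / genome_size[org]
            for org in abundance}


def fraction_covered(coverage):
    """Lander-Waterman (Poisson) estimate of the fraction of a genome
    touched by at least one read at the given fold coverage."""
    return 1.0 - math.exp(-coverage)


# Hypothetical community: three organisms with 5 Mb genomes at very different abundances.
abundance = {"dominant": 0.900, "minor": 0.099, "rare": 0.001}
genome_size = {org: 5_000_000 for org in abundance}
cov = expected_coverage(abundance, genome_size, total_bases=1e9)  # 1 Gb of sequence
for org, c in cov.items():
    print(f"{org:>8}: {c:6.1f}x coverage, ~{fraction_covered(c):.0%} of genome sampled")
```

With these numbers the dominant organism receives about 180x coverage and the minor one about 19.8x, both enough to sample essentially their whole genomes. The rare member gets only about 0.2x, so roughly 18% of its genome is expected to be touched by at least one read. That is enough to detect it but far too little to assemble it.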
High-Throughput Sequencing
The first metagenomic studies conducted using high-throughput sequencing used massively parallel 454 pyrosequencing. Two other technologies commonly applied to environmental sampling are the Illumina Genome Analyzer II and the Applied Biosystems SOLiD system. These techniques generate shorter reads than Sanger sequencing: 454 pyrosequencing typically produces ~400 bp reads, while Illumina and SOLiD produce 25–75 bp reads. These read lengths are significantly shorter than the typical Sanger read length of ~750 bp.
However, this limitation is compensated for by the much larger number of sequence reads. A pyrosequenced metagenome typically comprises 200–500 megabases, while Illumina platforms generate around 20–50 gigabases. An additional advantage of short-read sequencing is that it does not require cloning the DNA before sequencing, which removes one of the main biases in environmental sampling. Because most short-read assembly software was not designed for metagenomic applications, specialized methods have been developed to use mate-pair read data in metagenomic assembly. With these approaches, the microbial communities residing in a soil sample, or even on the surface of a keyboard, can be identified more accurately and efficiently.
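The trade-off between read length and read number comes down to simple division: reads per run ≈ total bases / read length. The Python below applies this to the approximate figures quoted in this section. It is a rough sketch that ignores real-world variation in read length and run output, and the dictionary and function names are hypothetical. SOLiD is omitted because no output figure is given for it here.

```python
# Approximate output and read-length figures quoted in this section.
PLATFORMS = {
    "454 pyrosequencing": {"bases": (200e6, 500e6), "read_len": (400, 400)},
    "Illumina": {"bases": (20e9, 50e9), "read_len": (25, 75)},
}


def read_count_range(bases, read_len):
    """Return (fewest, most) reads implied by an output range and a read-length range.

    Fewest reads: lowest output split into the longest reads.
    Most reads: highest output split into the shortest reads.
    """
    low_bases, high_bases = bases
    shortest, longest = read_len
    return low_bases / longest, high_bases / shortest


for name, spec in PLATFORMS.items():
    fewest, most = read_count_range(spec["bases"], spec["read_len"])
    print(f"{name}: ~{fewest:,.0f} to ~{most:,.0f} reads")
```

This gives roughly 0.5 to 1.25 million reads for a 454 metagenome versus roughly 270 million to 2 billion reads for Illumina. This is why much shorter reads can still yield far more total sequence.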
Key Points
- While previous work needed cultivation of single microbes before they could be sequenced and identified, metagenomics attempts to more completely identify many of the microbes that inhabit a given environmental location.
- The first attempts at metagenomics sequenced a single gene from a sample. Variation in that gene helped determine the microbial diversity in the sample.
- High-throughput sequencing allows the sequencing and assembly of nearly complete genomes of the microbes that inhabit a given environment, giving unprecedented insight into the microbial diversity of the world around us.
Key Terms
- SOLiD : SOLiD (Sequencing by Oligonucleotide Ligation and Detection) is a next-generation DNA sequencing technology developed by Life Technologies and commercially available since 2008. It generates hundreds of millions to billions of short sequence reads in a single run.
- gigabase : One billion bases (nucleotides) as a unit of length of a nucleic acid
- pyrosequencing : A technique used to sequence DNA using chemiluminescent enzymatic reactions