3.2: DNA Sequence Analysis


DNA sequencing is most often accomplished using a procedure referred to by one of the following names:

Sanger sequencing
Di-deoxy sequencing
Chain termination sequencing

Each of these refers to the same method:
- Incorporation of a di-deoxy base during a polymerization reaction terminates primer extension.
- The method was pioneered by Fred Sanger.

The basic method involves:

Annealing a primer 5' to a region of DNA we would like to sequence.
The primer is extended in the traditional manner (i.e. with DNA polymerase and the four dNTPs).
However, a small concentration of di-deoxy bases (ddNTPs) is included in the reaction mix.
- Usually this is accomplished by running four separate reactions, each with one of the ddNTPs added.
- Thus, one tube would contain primer, template, DNA polymerase, the four dNTPs and ddATP; another tube would contain the same components but with ddTTP instead of ddATP, and so on for ddCTP and ddGTP.
During the reaction, the normal dNTPs are incorporated into the growing chain.
- However, occasionally the DNA polymerase will incorporate a di-deoxy base into the growing chain.
- When this happens, the chain cannot be extended any further (because the di-deoxy base at the 3' end lacks the 3' hydroxyl group needed to attach the next nucleotide).
- The resulting DNA fragment begins at the 5' end of the sequencing primer and ends at the site of di-deoxy base incorporation.

Thus, the reaction mixture containing ddATP will produce an ensemble of fragments of varying lengths, each ending with a ddA base (i.e. at every position where the template has a complementary 'T' base).

The mixture containing ddCTP will have a different set of fragments: they will end with ddC at the 3' end (at positions where the template has a 'G' base).
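The relationship between template bases and fragment lengths can be sketched in a few lines of Python. This is an illustrative model only, not part of the original protocol: the function name `termination_fragments`, the example template and the 20-base primer length are all assumptions, and the model ignores how often each termination actually occurs.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def termination_fragments(template_3to5, primer_len):
    """Return {ddNTP base: fragment lengths} for the four reactions.

    template_3to5 is the template region just past the primer, written
    3'->5' so that index 0 is the first base copied by the polymerase.
    """
    fragments = {base: [] for base in "ACGT"}
    for i, template_base in enumerate(template_3to5):
        incorporated = COMPLEMENT[template_base]
        # A chain terminated here is the primer plus i + 1 added bases.
        fragments[incorporated].append(primer_len + i + 1)
    return fragments

# Hypothetical template and primer length, chosen only for illustration.
frags = termination_fragments("TGCATT", primer_len=20)
print(frags)
```

In this sketch the 'A' reaction yields three fragments, one for each 'T' in the template, while each of the other reactions yields a single fragment.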


Figure 3.2.1: DNA fragments

If the fragments from the 'A' reaction mix are run on a urea/acrylamide gel (typically 6%) the fragments will separate according to size.
Likewise for the 'C', 'G' and 'T' reaction mix fragments.
- If the four different reaction mixtures are run next to each other the fragment sizes can be directly compared to one another.
- Note that the shortest expected fragment is the primer itself and the longer the fragment, the further from the primer the extension reaction went before termination.
Consider a case where the template has a stretch of six 'A' bases in a row.
- In the 'T' reaction mix we will therefore get six fragments, each ending in 'T' and each one base longer than the previous one.
- None of the other reaction mixes will contain fragments between these lengths (they will either be longer or shorter) because none of the other reaction mixes will terminate within this region.
- Thus, if we run the four reaction mixes side by side and look at the fragment patterns we would see the following:


Figure 3.2.2: Example fragment patterns with 6 A's
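A rough text rendering of such a gel can make the pattern concrete. The sketch below is an assumption-laden illustration: the flanking bases around the six A's, the lane order A C G T, and the helper name `gel_rows` are all hypothetical, and bands are drawn one row per added base rather than at realistic migration distances.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
LANES = "ACGT"

def gel_rows(template_3to5):
    """Return text rows of a four-lane gel, longest fragment first (top)."""
    rows = []
    for added, template_base in enumerate(template_3to5, start=1):
        dd_base = COMPLEMENT[template_base]
        # One band per fragment, drawn in the lane of its terminating ddNTP.
        lanes = " ".join("-" if lane == dd_base else "." for lane in LANES)
        rows.append(f"+{added:>2}  {lanes}")
    # Short fragments migrate furthest, so they belong at the bottom.
    return rows[::-1]

# Hypothetical template (3'->5'): six A's with two arbitrary bases each side.
rows = gel_rows("GC" + "A" * 6 + "TG")
print("     " + " ".join(LANES))
print("\n".join(rows))
```

The label at the left of each row is the number of bases added to the primer. The six consecutive bands in the 'T' lane correspond to the run of A's, and no other lane has a band in that size range.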

Now consider a template that contains the sequence 3' GATC 5' (note the orientation).

When the primer is extended in the different reaction mixes, extension can terminate first opposite the G (incorporating a ddC), then opposite the A (incorporating a ddT), and so on.
The fragments run on a gel would thus look like:


Figure 3.2.3: Example fragment patterns with GATC

Thus, following the ladder of fragments on the sequencing gel allows you to "read" the sequence of the template.
Note, however, that reading from the bottom of the gel to the top gives the newly synthesized strand in its 5' to 3' direction. With regard to the template, we are therefore moving in the 3' to 5' direction and reading the complement of each template base.
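Reading the ladder can be expressed as a small routine. This is a minimal sketch under stated assumptions: the function name `read_gel` and the fragment lengths (a 20-base primer plus one to four added bases) are hypothetical, and each band is assumed to be cleanly resolved.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def read_gel(bands):
    """bands: iterable of (fragment_length, lane) pairs, in any order.

    Returns (synthesized strand 5'->3', template 3'->5').
    """
    # The bottom of the gel holds the shortest fragment, so sort by length.
    synthesized = "".join(lane for _, lane in sorted(bands))
    # Same order, complementary bases: this is the template read 3'->5'.
    template = "".join(COMPLEMENT[base] for base in synthesized)
    return synthesized, template

# Bands for the 3' GATC 5' example; the lengths are illustrative.
bands = [(23, "A"), (21, "C"), (24, "G"), (22, "T")]
print(read_gel(bands))  # ('CTAG', 'GATC')
```

The first string is what is read directly off the gel; the second recovers the template in its 3' to 5' orientation.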

Visualization of fragments

If radiolabeled dATP is spiked into the mixtures it will be incorporated like a "normal" dATP base.
- However, the resulting DNA fragment will be radiolabeled.
- Thus the acrylamide gel can be exposed to x-ray film and the location of the fragments determined.
Automated sequencers make use of specific fluorescent dyes which are attached to the di-deoxy bases.
- These dyes can be "read" by a laser, and thus the specific terminating di-deoxy base of a particular DNA fragment can be identified.
- Thus, the fragments can be read as they elute, rather than stopping the gel and exposing it.
- Furthermore, since each di-deoxy base can be uniquely identified, all four reactions (ddA, ddC, ddG and ddT chain termination) can be done in a single tube and run in a single lane on a sequencing gel.
  - Automated sequencers can thus read further than manual methods.
  - Since a single lane is used per sample (as compared to four lanes with the radiolabeled method), many more samples can be analyzed and the throughput is greater.
The acrylamide gels used for sequence analysis are typically 50 cm to 100 cm long.
- In manual sequencing the four reaction mixes are loaded and the gel is run for approximately 2 hours; then the samples are reloaded on another part of the gel and the run is continued. A third set of samples may be loaded after another 2 hours.
  - The gel is stopped when the dye front of the last sample loaded has just reached the bottom of the gel.
  - Thus, the short fragments can be visualized in the last load, medium fragments in the second load, and long fragments in the first set of reaction mixtures loaded.
  - Manual sequencing can resolve on the order of 400 bases of continuous sequence. Automated sequencers can routinely provide twice this amount of information.
- Automated sequencers use the same types of glass plates.
  - The continuous running of the gel (and dye identification) means that typically 400-700 bases or more can be read.
  - Automatic software interprets the dye signals as a sequence.
  - Nuances of the sequencing chemistry and expert knowledge can be programmed into the sequence analysis software (e.g. the software can compensate for the "smile" of the gel).
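
The single-lane, dye-based reading described above can be sketched in the same spirit. This is only a toy illustration: the dye colors in `DYE_TO_BASE`, the elution times and the function name `call_bases` are all hypothetical, and real base-calling software works on overlapping signal traces rather than clean, pre-identified peaks.

```python
# Hypothetical dye-to-base assignment; real chemistries define their own.
DYE_TO_BASE = {"green": "A", "blue": "C", "yellow": "G", "red": "T"}

def call_bases(peaks):
    """peaks: iterable of (elution_time, dye) pairs from a single lane."""
    # Shorter fragments elute first, so time order gives the 5'->3' read
    # of the newly synthesized strand.
    return "".join(DYE_TO_BASE[dye] for _, dye in sorted(peaks))

# Illustrative peaks, deliberately listed out of order.
peaks = [(12.4, "red"), (11.9, "blue"), (13.5, "yellow"), (13.0, "green")]
print(call_bases(peaks))  # CTAG
```

Because the dye identifies the terminating base, no lane information is needed: the order of elution alone determines the sequence.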