Biology LibreTexts

22.5: Computational Methods for Studying Nuclear Genome Organization


    Sources of Bias

    Figure 22.10: Image depicting sources of bias

    American Association for the Advancement of Science. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.

    Source: Lieberman-Aiden, Erez, et al. "Comprehensive Mapping of Long-range Interactions Reveals Folding Principles of the Human Genome." Science 326, no. 5950 (2009): 289-93.

    Three experimental steps can introduce bias: digestion, ligation, and sequencing. Digestion efficiency depends on the restriction enzyme used. Regions where the enzyme's recognition site is sparse are digested less often and are under-represented, while regions enriched in the recognition site are over-represented in the results. One remedy is to repeat the experiment with several different restriction enzymes and compare the results. Ligation efficiency is a function of fragment length, so depending on where the restriction enzyme cuts, some fragment ends are more or less likely to ligate together. Finally, sequencing efficiency is a function of sequence composition: some DNA fragments are harder to sequence because of GC richness or the presence of repeats, which introduces further bias.
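
The digestion and sequencing biases described above can be made concrete with a small sketch. The Python below counts recognition sites per genomic bin and computes GC content for a sequence. The function names, the bin size, and the toy sequence are illustrative assumptions, and HindIII (recognition site AAGCTT) is used only as an example enzyme.

```python
import re

# HindIII recognition sequence, chosen here only as an example enzyme.
HINDIII_SITE = "AAGCTT"


def count_sites_per_bin(sequence, site=HINDIII_SITE, bin_size=1000):
    """Count occurrences of a recognition site in consecutive fixed-size bins."""
    sequence = sequence.upper()
    n_bins = -(-len(sequence) // bin_size)  # ceiling division
    counts = [0] * n_bins
    # A zero-width lookahead finds overlapping occurrences as well.
    pattern = re.compile(f"(?={re.escape(site)})")
    for match in pattern.finditer(sequence):
        counts[match.start() // bin_size] += 1
    return counts


def gc_fraction(sequence):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


# Toy sequence (hypothetical): two sites in the first half, none in the second.
toy = "AAGCTT" + "C" * 14 + "AAGCTT" + "G" * 14 + "T" * 40
site_counts = count_sites_per_bin(toy, bin_size=20)
print(site_counts)
print(gc_fraction(toy[:20]))
```

Bins with a count of zero contain no cut sites for this enzyme, so they would contribute few or no fragments; a second enzyme with a different recognition site may cover them.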

    Bias Correction

    To minimize ligation bias, non-specific ligation products are removed. Because non-specific ligation products typically join distant restriction sites, they produce much larger fragments, which makes them identifiable. In addition, the influence of fragment size on ligation efficiency, \( F_{\mathrm{len}}(a_{\mathrm{len}}, b_{\mathrm{len}}) \), the influence of G/C content on amplification and sequencing, \( F_{\mathrm{gc}}(a_{\mathrm{gc}}, b_{\mathrm{gc}}) \), and the influence of sequence uniqueness on mappability, \( M(a) \cdot M(b) \), can all be accounted for and corrected with the equation:

    \( P(X_{a,b}) = P_{\mathrm{prior}} \cdot F_{\mathrm{len}}(a_{\mathrm{len}}, b_{\mathrm{len}}) \cdot F_{\mathrm{gc}}(a_{\mathrm{gc}}, b_{\mathrm{gc}}) \cdot M(a) \cdot M(b) \)

    Alternatively, the sources of bias can be represented less explicitly by the following equation:

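    \( O_{i,j} = B_i \cdot B_j \cdot T_{i,j} \)

    where \( O_{i,j} \) is the observed contact count between bins \( i \) and \( j \), \( B_i \) and \( B_j \) are the biases of the two bins, and \( T_{i,j} \) is the relative contact probability, normalized so that \( \sum_j T_{i,j} = 1 \) for each bin \( i \). The only assumption is that the biases are multiplicative. The equation is solved by matrix balancing (proportional fitting) using an iterative correction algorithm.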
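
A minimal sketch of iterative correction for the multiplicative model above follows. It repeatedly divides the matrix by its normalized row sums until every row has the same total, then rescales so each row sums to 1. The function name, iteration limit, tolerance, and the synthetic test matrix are assumptions made for illustration; this is not a published implementation.

```python
import numpy as np


def iterative_correction(observed, n_iter=200, tol=1e-10):
    """Factor a symmetric matrix O into O[i, j] = B[i] * B[j] * T[i, j],
    where every non-empty row of T sums to 1."""
    O = np.asarray(observed, dtype=float)
    n = O.shape[0]
    mask = O.sum(axis=1) > 0  # bins with no contacts are left untouched
    bias = np.ones(n)
    T = O.copy()
    for _ in range(n_iter):
        row_sums = T.sum(axis=1)
        scale = np.ones(n)
        # Normalize by the mean so the overall magnitude stays stable.
        scale[mask] = row_sums[mask] / row_sums[mask].mean()
        T = T / np.outer(scale, scale)
        bias *= scale
        if np.abs(scale - 1.0).max() < tol:
            break
    # Rescale so rows sum to 1, absorbing the constant into the biases.
    total = T.sum(axis=1)[mask].mean()
    T = T / total
    bias = bias * np.sqrt(total)
    return T, bias


# Synthetic example (hypothetical data): a known T distorted by known biases.
rng = np.random.default_rng(0)
true_T = rng.uniform(1.0, 2.0, size=(6, 6))
true_T = (true_T + true_T.T) / 2  # contact maps are symmetric
true_bias = rng.uniform(0.5, 2.0, size=6)
observed = np.outer(true_bias, true_bias) * true_T

T, bias = iterative_correction(observed)
print(T.sum(axis=1))  # each row total should be close to 1
```

Because the biases are absorbed into a single vector, this approach does not need to know whether a given bias came from fragment length, GC content, or mappability.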

    3D-modeling of 3C-based data

    3D modeling can reveal many general principles of genome organization. Current models are generated using a combination of inter-locus interactions and known spatial distances between nuclear landmarks. However, considerable uncertainty remains in current 3D models because the data are gathered from millions of cells. The practical problems affecting 3D modeling stem from the large amount of data needed to construct models and from the differing dynamics of an individual cell versus a population, both of which lead to unstable models. Next-generation modeling is trending toward single-cell genomics.
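
As a rough illustration of how inter-locus interaction data might be turned into a 3D model, the sketch below converts contact frequencies into distances and embeds them with classical multidimensional scaling (MDS). Both choices are assumptions made for this example: the inverse power-law conversion (with a hypothetical exponent `alpha`) and the use of classical MDS are swapped in for simplicity and are not the method of any particular study. The helix used as input is synthetic.

```python
import numpy as np


def contacts_to_distances(contacts, alpha=1.0):
    """Assumed conversion: distance = contact_frequency ** (-alpha)."""
    C = np.asarray(contacts, dtype=float)
    off_diag = ~np.eye(C.shape[0], dtype=bool)
    if np.any(C[off_diag] <= 0):
        raise ValueError("all off-diagonal contact frequencies must be positive")
    D = np.zeros_like(C)
    D[off_diag] = C[off_diag] ** (-alpha)
    return D


def classical_mds(distances, n_dims=3):
    """Embed points in n_dims dimensions from a pairwise distance matrix."""
    D = np.asarray(distances, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    B = -0.5 * J @ (D ** 2) @ J          # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_dims]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against small negatives
    return eigvecs[:, order] * np.sqrt(top_vals)


def pairwise_distances(points):
    return np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)


# Synthetic "chromosome": 8 loci on a helix (hypothetical coordinates).
t = np.linspace(0.0, 4.0 * np.pi, 8)
true_points = np.column_stack([np.cos(t), np.sin(t), 0.3 * t])
true_dist = pairwise_distances(true_points)

# Simulated contact frequencies that decay as 1 / distance.
contacts = np.zeros_like(true_dist)
off_diag = ~np.eye(len(t), dtype=bool)
contacts[off_diag] = 1.0 / true_dist[off_diag]

model = classical_mds(contacts_to_distances(contacts, alpha=1.0))
model_dist = pairwise_distances(model)
print(np.abs(model_dist - true_dist).max())
```

The recovered coordinates match the original only up to rotation, reflection, and translation, so the comparison is made on pairwise distances. With real population-averaged data the distances need not be consistent with any single 3D structure, which is one way the instability described above shows up.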

    Figure 22.11: Chromosome Territories in 3D

    This page titled 22.5: Computational Methods for Studying Nuclear Genome Organization is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Manolis Kellis et al. (MIT OpenCourseWare) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.