Biology LibreTexts

Section 3: Interpretation


    The fundamental challenge in interpreting the sequencing results lies in differentiating driver mutations from passenger mutations. In order to accomplish this, we need to model the background mutational processes of the analyzed sequences and identify pathways/regions with more mutations than would have been predicted solely by the background model. Those regions then become our candidate cancer genes.

    However, we may select an incorrect background model, or we may encounter systematic artifacts in mutation calling. In either case, we have to go back to the drawing board and come up with a better background model before we can proceed with candidate gene identification.

    Many tools have been developed to accurately detect candidate cancer genes and pathways (sub-networks), including NetSig, GISTIC, and MutSig. NetSig identifies clusters of mutated genes in protein-protein interaction networks. GISTIC scores regions according to the frequency and amplitude of copy-number events. MutSig scores genes according to the number and types of mutations. The main analysis steps in finding candidate cancer genes are 1) estimating the background mutation rate (which varies across samples), 2) calculating p-values based on statistical models, and 3) correcting for multiple hypothesis testing (N genes).
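
    The three steps above can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions, not the method used by MutSig or any other tool named here: it assumes a single uniform background rate per base, models each gene's mutation count as Poisson, and applies the Benjamini-Hochberg procedure for multiple testing. The gene names, lengths, counts, rate `mu`, and cohort size are all made-up toy values.

```python
import math

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0

    def log_pmf(i):
        return -lam + i * math.log(lam) - math.lgamma(i + 1)

    if k <= lam:
        # Below the mean the tail is large, so 1 - lower tail is accurate enough.
        lower = sum(math.exp(log_pmf(i)) for i in range(k))
        return max(0.0, 1.0 - lower)
    # Above the mean, sum the (decreasing) upper-tail terms directly
    # so that very small p-values are not lost to rounding.
    term = math.exp(log_pmf(k))
    total = 0.0
    i = k
    while term > 0.0 and term > total * 1e-17:
        total += term
        i += 1
        term *= lam / i
    return min(1.0, total)

def benjamini_hochberg(pvalues):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from largest p to smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues

# Toy inputs (hypothetical): name -> (coding length in bases, observed mutations)
genes = {"GENE_A": (1500, 12), "GENE_B": (30000, 4), "GENE_C": (2000, 0)}
mu = 1e-6          # step 1: assumed background rate per base per sample
n_samples = 100    # assumed cohort size

names = list(genes)
# Step 2: p-value for seeing at least the observed count under the background.
pvals = [poisson_upper_tail(genes[g][1], mu * genes[g][0] * n_samples) for g in names]
# Step 3: correct for testing N genes.
qvals = benjamini_hochberg(pvals)
results = {g: (p, q) for g, p, q in zip(names, pvals, qvals)}

for g, (p, q) in results.items():
    print(f"{g}: p={p:.3g}, q={q:.3g}")
```

    In this toy data, the short gene with 12 mutations stands far above its expected count of 0.15, while the long gene with 4 mutations is close to its expected count of 3 and is not flagged. The outcome depends entirely on the assumed background rate, which is why step 1 matters so much.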

    As sample size and/or mutation rate increases, the list of significant genes grows and comes to contain many suspicious ("fishy") genes that are unlikely to be true cancer genes. One major breakthrough in reducing these has been the proper modeling of background mutations. Standard tools use a constant background rate (with separate rates for CpG, C/G, A/T, and indel mutations) while ignoring heterogeneity across samples, additional sequence contexts, and positions in the genome. But it was discovered that the mutation rate across cancers varies more than 1000-fold, that the mutation rate is lower in highly expressed genes, and that the frequency of somatic mutations correlates with DNA replication timing: there are more mutations in regions of the genome that replicate late than in those that replicate early. MutSigCV is a tool that corrects for this variation in background mutation rates.
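
    To see why a single background rate can mislead, the sketch below compares a uniform rate with a gene-specific rate estimated from genes that have similar covariates. It is only an illustration of the general idea of letting the background vary with expression and replication timing, and it is not a description of how MutSigCV works internally. The assumptions are loud ones: covariates are pre-scaled to the range 0 to 1, silent mutations per base stand in for the background rate, the nearest-neighbor pooling in `neighbor_background_rate` is a hypothetical simplification, and all gene records are invented toy values.

```python
import math

# Toy gene records (hypothetical). Covariates are assumed scaled to [0, 1];
# rep_time near 1 means late replication. "silent" is the cohort-wide count
# of silent mutations, used here as a stand-in for background mutations.
genes = [
    {"name": "G1", "expression": 0.90, "rep_time": 0.10, "length": 2000, "silent": 1},
    {"name": "G2", "expression": 0.80, "rep_time": 0.20, "length": 3000, "silent": 2},
    {"name": "G3", "expression": 0.85, "rep_time": 0.15, "length": 2500, "silent": 1},
    {"name": "G4", "expression": 0.10, "rep_time": 0.90, "length": 2000, "silent": 10},
    {"name": "G5", "expression": 0.15, "rep_time": 0.85, "length": 3000, "silent": 14},
    {"name": "G6", "expression": 0.05, "rep_time": 0.95, "length": 2500, "silent": 12},
]

def uniform_background_rate(genes):
    """One rate for every gene: total silent mutations / total bases."""
    return sum(g["silent"] for g in genes) / sum(g["length"] for g in genes)

def neighbor_background_rate(target, genes, n_neighbors=2):
    """Rate pooled over the target gene and its nearest genes in covariate space."""
    def dist(g):
        return math.hypot(g["expression"] - target["expression"],
                          g["rep_time"] - target["rep_time"])
    others = sorted((g for g in genes if g is not target), key=dist)
    pool = [target] + others[:n_neighbors]
    return sum(g["silent"] for g in pool) / sum(g["length"] for g in pool)

uniform_rate = uniform_background_rate(genes)
early_gene, late_gene = genes[0], genes[3]
early_rate = neighbor_background_rate(early_gene, genes)
late_rate = neighbor_background_rate(late_gene, genes)

for g, local in ((early_gene, early_rate), (late_gene, late_rate)):
    print(f"{g['name']}: expected count uniform={uniform_rate * g['length']:.2f}, "
          f"covariate-aware={local * g['length']:.2f}")
```

    With these toy numbers, the late-replicating, lowly expressed gene G4 has a higher expected count under the covariate-aware rate than under the uniform rate, so the same observed mutation count looks less surprising. The early-replicating, highly expressed gene G1 shows the opposite effect.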


    Section 3: Interpretation is shared under a not declared license and was authored, remixed, and/or curated by LibreTexts.
