Biology LibreTexts

2.4: Likelihood and paternity calculations


    Introduction & Likelihood

    While we will see many applications for genetic markers over the course of the semester, an immediate use is computing the likelihood that a putative parent (usually the father) is actually the parent of an offspring. Statistically speaking, the likelihood of a model \( \theta \) given some observed data \( x \) is the probability of the data \( x \) under the model \( \theta \). We write the relationship like this:

    \[ \mathcal{L}( \theta | x ) = P_{\theta}(x) \]

    Likelihood is usually used with the outcome treated as fixed (we observed some data, and that data is \( x \)) and the model or assumptions \( \theta \) treated as variable. The question we want to answer is: what model or assumptions give the maximum likelihood of observing the data we did observe? Finding the model with the maximum likelihood, the one that best explains the observed data, is the basis of maximum likelihood estimation. The likelihood is also central to Bayesian statistics, which we will see a bit more of later.
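
    To make the idea of fixed data and a variable model concrete, here is a minimal Python sketch. The data (7 copies of an allele among 10 sampled alleles) and the names `likelihood` and `candidate_thetas` are hypothetical and chosen only for illustration; the model \( \theta \) is taken to be the allele's frequency, and the sampled alleles are assumed to be independent draws.

```python
from math import comb

def likelihood(theta, k, n):
    """Probability of seeing k copies of an allele among n sampled alleles
    when the allele's true frequency is theta (binomial model)."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

# Hypothetical data: 7 of 10 sampled alleles are the allele of interest.
k, n = 7, 10

# The data stay fixed; the model (theta) varies over a grid of candidates.
candidate_thetas = [i / 100 for i in range(1, 100)]
best_theta = max(candidate_thetas, key=lambda t: likelihood(t, k, n))

print(best_theta)  # candidate frequency with the highest likelihood
```

    On this grid the maximum falls at 0.7, the observed proportion 7/10.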

    What's the connection to paternity?

    Here, our question is simpler, because there are only two models we're interested in -- under one, a proposed male is the father, and under the other, a different male was. We often represent this as a likelihood ratio, which is the ratio of the likelihoods of two different models given the same data. It's written thus:

    \[ \Lambda( \theta_1 : \theta_2 | x ) = \frac{ \mathcal{L} ( \theta_1 | x )}{ \mathcal{L} ( \theta_2 | x ) } \]

    You can think of this as the "strength" of the evidence \( x \) supporting model \( \theta_1 \) compared to model \( \theta_2 \).

    When the question is one of paternity, the first model is that a male (for whom we have genotype data) is the father of an offspring. The second model is that some other member of the population is the father. The ratio of these two likelihoods is the strength of the genetic evidence that the male is in fact the father. This likelihood ratio is called the paternity index or PI. It is important to note that the paternity index is computed for each locus individually -- when we are looking at multiple loci, we combine the paternity indices into a combined paternity index or CPI.
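
    Substituting the two paternity models into the likelihood ratio gives the following restatement of the definition. The labels \( \theta_{\text{PF}} \) (putative father is the father) and \( \theta_{\text{R}} \) (a random male is the father) are introduced here only for illustration, and \( x \) stands for the child's genotype at the locus, given the mother's genotype:

    \[ \mathrm{PI} = \Lambda( \theta_{\text{PF}} : \theta_{\text{R}} | x ) = \frac{ \mathcal{L} ( \theta_{\text{PF}} | x )}{ \mathcal{L} ( \theta_{\text{R}} | x ) } = \frac{ P_{\theta_{\text{PF}}}(x) }{ P_{\theta_{\text{R}}}(x) } \]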

    How do we compute the paternity index?

    Some folks teach the computation of paternity indices using a table -- that approach is explained at this website. In a genetics course, however, I would rather you learned to think logically about this -- that's the approach I'll present below.

    To compute the paternity index at a locus, we need four pieces of information:

    • The child's genotype (which alleles they have)
    • The mother's genotype
    • The putative father's genotype
    • The frequency of those alleles in the broader population.

    To compute a paternity index, we need to compute two probabilities:

    1. The probability of seeing the child's genotype given the mother's and putative father's genotypes
    2. The probability of seeing the child's genotype assuming that the father was some other person whose genotype we do not know.

    Let's take these computations each in turn.

    Computing the probability of the child's genotype assuming the putative father is the real father

    Let's use the following microsatellite genotypes as an example. Remember, the alleles given in a microsatellite genotype are the number of times the microsatellite motif was repeated.

    Child    Mother    Potential father
    15,17    14,17     15,16

    What is the probability of the observed data -- that the child has genotype 15,17 -- if we assume that the putative father is the actual father? Note that the 17 allele must have come from the mother, and so the 15 allele must have come from the father (whether or not the potential father is the actual father). Assuming the potential father is the actual father, the probability of receiving the 17 allele from the mother is 0.5 and the probability of receiving the 15 allele from the father is 0.5. These two events are independent, so the probability that the child received the 17 allele from the mother AND the 15 allele from the (potential) father is \( 0.5 \times 0.5 = 0.25 \).

    Let's consider a slightly more complicated scenario:

    Child    Mother    Potential father
    15,17    15,17     15,17

    Again, we ask: what is the probability of the observed data if we assume the putative father is the actual father? Here, there are two mutually exclusive possibilities:

    • First, it is possible that the 15 allele came from the mother -- an event with a probability of 0.5 -- and the 17 allele came from the father -- an event with a probability of 0.5. Thus, the probability that the 15 allele came from the mother AND the 17 allele came from the father is \( 0.5 \times 0.5 = 0.25 \).
    • Second, the opposite is also possible. That is, it is possible that the 17 allele came from the mother -- probability is 0.5 -- and the 15 allele came from the father -- probability is 0.5. Thus, the probability that the 17 allele came from the mother AND the 15 allele came from the father is \( 0.5 \times 0.5 = 0.25 \).
    • The two possibilities above are mutually exclusive. Thus, the probability that (15 came from mother and 17 came from father) OR (17 came from mother and 15 came from father) is \( 0.25 + 0.25 = 0.5 \).

    Remember, we are operating under traditional Mendelian rules here -- either parent could contribute either allele. Sometimes there is only one parent the child could have received an allele from (and, because the child has the genotype we observed, the probability of that outcome is 1.) And sometimes there is ambiguity -- but these outcomes are always mutually exclusive.
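
    The reasoning in the two examples above can be sketched in Python by listing every allele the mother could pass on alongside every allele the father could pass on. This is a minimal sketch that assumes simple Mendelian transmission with no mutation; the function name `prob_child_given_parents` is a hypothetical helper, not part of any library.

```python
from itertools import product

def prob_child_given_parents(child, mother, father):
    """Probability of the child's genotype when both parents are known.

    Each parent passes on one of their two alleles with probability 0.5,
    so each (maternal, paternal) combination has probability 0.25.
    """
    target = sorted(child)
    total = 0.0
    for maternal, paternal in product(mother, father):
        # Count the mutually exclusive combinations that give the child's genotype.
        if sorted((maternal, paternal)) == target:
            total += 0.25
    return total

# First example: child 15,17; mother 14,17; potential father 15,16
print(prob_child_given_parents((15, 17), (14, 17), (15, 16)))  # 0.25

# Second example: child, mother and potential father are all 15,17
print(prob_child_given_parents((15, 17), (15, 17), (15, 17)))  # 0.5
```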

    Computing the probability of the child's genotype assuming the real father is a random member of the population

    If the first model whose likelihood we compute is the model that the proposed father is the actual father, then the second model whose likelihood we need assumes that the actual father is a random member of the population. And to compute that, we need the allele frequency for the alleles we're interested in. (This is the probability that if we draw a random allele from the population, we will see the one we're interested in.) Assume the following frequencies for our examples above.

    Allele    Frequency
    15        0.23
    17        0.15

    Now let's consider our two examples from above:

     

    Child    Mother    Potential father
    15,17    14,17     15,16

    Given these data, the child must have received the 15 allele from the (random) father and the 17 allele from the mother. What is the probability of this? First, the probability that the child received the 17 allele from the mother is 0.5. However, the probability that the child received the 15 allele from a male randomly drawn from the population is the same as the allele frequency in the population -- 0.23. Because these two events are independent, the probability that the mother transmitted 17 and this random male transmitted 15 is \( 0.23 \times 0.5 = 0.115 \).

     

    Child    Mother    Potential father
    15,17    15,17     15,17

    Here, we don't know which allele the child received from the mother and from the random person -- so let's take each in turn.

    • If the child received the 15 allele from the mother, then they must have received 17 from the other person. The probability of this occurring is the allele frequency of 17 -- 0.15 -- times 0.5, following the logic above. Thus, the probability they received the 15 allele from the mother and the 17 allele from another person is \( 0.15 \times 0.5 = 0.075 \).
    • If the child received the 17 allele from the mother, they must have received the 15 allele from the other person. The probability of this occurring is the allele frequency of 15 -- 0.23 -- times 0.5, following the logic above. Thus, the probability they received the 17 allele from the mother and the 15 allele from another person is \( 0.23 \times 0.5 = 0.115 \).
    • These events are mutually exclusive. So, the probability of the child's genotype when the father is a random member of the population is \( 0.075 + 0.115 = 0.19 \).
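
    The same bookkeeping can be sketched in Python. This is a minimal sketch that assumes the random father's allele is drawn according to the population allele frequencies in the table above; the function name `prob_child_given_random_father` and the dictionary `allele_freq` are hypothetical names used only for illustration.

```python
def prob_child_given_random_father(child, mother, allele_freq):
    """Probability of the child's genotype when the father is a random male.

    The mother passes on each of her alleles with probability 0.5; the
    paternal allele is drawn with probability equal to its population frequency.
    """
    total = 0.0
    for maternal in mother:
        remaining = list(child)
        if maternal not in remaining:
            continue  # the mother cannot have passed this allele to the child
        remaining.remove(maternal)
        paternal = remaining[0]  # the allele that must have come from the father
        total += 0.5 * allele_freq[paternal]
    return total

allele_freq = {15: 0.23, 17: 0.15}  # frequencies from the table above

# First example: child 15,17; mother 14,17
print(prob_child_given_random_father((15, 17), (14, 17), allele_freq))  # 0.115

# Second example: child 15,17; mother 15,17
print(prob_child_given_random_father((15, 17), (15, 17), allele_freq))  # about 0.19
```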

     

    Compute the paternity index

    Remember, the paternity index is a likelihood ratio. Here, it's the likelihood that the potential father is the actual father (given the observed genotypes) divided by the likelihood that the actual father is a random person from the population.

    For the first example, the paternity index is \( 0.25 / 0.115 \approx 2.17 \), using the probability of 0.25 computed under the putative-father model.

    For the second example, the paternity index is \( 0.5 / 0.19 \approx 2.63 \)
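
    Putting the two probabilities together, the paternity index for each example can be sketched as follows. This is a minimal sketch under the same assumptions as the text (Mendelian transmission, no mutation, and a mother whose genotype is compatible with the child's); all function names are hypothetical helpers, and the allele frequencies are the ones from the table above.

```python
from itertools import product

ALLELE_FREQ = {15: 0.23, 17: 0.15}  # frequencies from the table above

def prob_given_father(child, mother, father):
    """P(child's genotype) when the putative father is the actual father."""
    target = sorted(child)
    return sum(0.25 for m, f in product(mother, father)
               if sorted((m, f)) == target)

def prob_given_random_father(child, mother, allele_freq):
    """P(child's genotype) when the father is a random male from the population."""
    total = 0.0
    for maternal in mother:
        remaining = list(child)
        if maternal in remaining:
            remaining.remove(maternal)
            total += 0.5 * allele_freq[remaining[0]]
    return total

def paternity_index(child, mother, father, allele_freq):
    """Likelihood ratio: putative-father model over random-father model."""
    return (prob_given_father(child, mother, father)
            / prob_given_random_father(child, mother, allele_freq))

pi_first = paternity_index((15, 17), (14, 17), (15, 16), ALLELE_FREQ)
pi_second = paternity_index((15, 17), (15, 17), (15, 17), ALLELE_FREQ)

print(round(pi_first, 2))   # 2.17
print(round(pi_second, 2))  # 2.63
```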

     

    What if there are multiple loci?

    As we can see above, a single locus does not provide much certainty about parentage -- in the second case, the observed genotypes are only about 2.6 times as likely if the putative father is the actual father as they are if a random male is. As a result, we test a number of loci to increase our evidence for or against the putative father being the actual father. We can combine paternity indices into a combined paternity index or CPI quite easily -- the CPI is the product of the paternity indices for the loci tested.

    Note

    This is because the likelihood ratio is a ratio of probabilities, and the molecular markers are chosen so that they are on different chromosomes and thus the events are independent -- so the probability of seeing all of the events is the product of the probabilities of seeing each of them.
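
    As a minimal Python sketch of the product rule: the function name `combined_paternity_index` is hypothetical, and, purely for illustration, the two worked examples above are treated as if they were two independent loci tested in the same case.

```python
from math import prod

def combined_paternity_index(paternity_indices):
    """Multiply per-locus paternity indices, assuming the loci are independent."""
    return prod(paternity_indices)

# Illustration only: treat the two worked examples as two independent loci.
per_locus_pi = [0.25 / 0.115, 0.5 / 0.19]
cpi = combined_paternity_index(per_locus_pi)

print(round(cpi, 2))  # about 5.72
```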


    2.4: Likelihood and paternity calculations is shared under a not declared license and was authored, remixed, and/or curated by LibreTexts.
