Information Integrity - Data and Analysis


Global Challenges: Information Integrity

Data and Modeling Errors

Literature-Based Guided Assessment (LGA)

    Introduction

    "To Err is human", so the question for scientists is how to reduce error, not eliminate it. Even the best make errors.  One of the greatest scientific errors was by Linus Pauling, a two-time Nobel Laureate, who proposed in a paper in the Proceedings of the National Academy of Science that DNA is triple-stranded.  Weeks later, Watson and Crick correctly propose the classic double-stranded structure that has become so iconic.  The Pauling paper disregarded Chargarff's rule that the number of pyrimidine bases is equal to the number of purine bases, which is necessary to maintain the constant diameter of the Watson-Crick double-stranded helix and gives the correct base pairing of A-G and C-T (i.e. one purine with one pyrimidine).

There are many types of errors and causes of error (recently reviewed by Brown et al.) that result in "bad" data. These include errors from poor data collection methods, sampling, and study design; poor data management; inappropriate statistical analyses; poor logic; and poor communication. Factors that contribute to error include ignorance, poor study design at the inception, the need to publish, excitement about initial results, lack of resources to perform and analyze experiments well, conflicts of interest, and conflicting priorities.

Let's face it: most biochemistry students and researchers are not statisticians or data scientists, so unless they collaborate with others trained in those fields, they are likely to commit errors in statistical analyses. Here are some of the most common errors in statistical analyses (Creative Commons Attribution License):

    • lack of appropriate and adequate controls (both positive and negative).  In clinical biochemistry, every analysis would be problematic unless both types of controls are run. 
    • spurious correlations, which can be caused by an outlier for one variable as illustrated in Figure \(\PageIndex{1}\) below.

    10commonstatmistakesFig2.svg

    Figure \(\PageIndex{1}\):  Spurious correlations: the effect of a single outlier on Pearson’s correlation coefficients.  Tamar R Makin, Jean-Jacques Orban de Xivry (2019) eLife 8:e48175.  https://doi.org/10.7554/eLife.48175.   Creative Commons Attribution License
        

In panels A-C, two uncorrelated variables with 19 samples each (black unfilled circles) were simulated, and an additional data point (solid red circle) was added whose distance from the main population was systematically increased until it became a formal outlier (panel C). Note that the value of Pearson's correlation coefficient R goes from negative to positive, in the direction of an apparently better fit, as the distance between the main population and the red data point increases, demonstrating that a single data point can produce a spurious correlation. The slope of the best-fit linear regression line (dotted) also changes as the red point is moved; in panel C, with the outlier included, the slope is the most positive and the correlation coefficient is the highest.
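
To see this effect yourself, here is a minimal simulation sketch in Python (the random seed, the sample size of 19, and the outlier positions are arbitrary choices made only to mirror the figure):

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)
x = rng.normal(0, 1, 19)      # 19 samples of two uncorrelated variables,
y = rng.normal(0, 1, 19)      # as in panels A-C of the figure
r, _ = pearsonr(x, y)
print(f"without the extra point: r = {r:.2f}")

# add one extra point and move it progressively farther from the main cloud
for d in (2, 5, 10):
    r, _ = pearsonr(np.append(x, d), np.append(y, d))
    print(f"extra point at ({d}, {d}): r = {r:.2f}")

As the extra point moves away from the cloud, r climbs toward 1 even though the underlying variables are uncorrelated.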

• small sample size. With a small sample you can reliably detect only large effects; small effects are lost in the high uncertainty. In addition, simply by chance, a small sample may underrepresent the data points that would reveal a real effect.
    • too much "flexibility of analysis and p-hacking.  If you change your outcome parameters or exclude subjects or outliers, you face statistical troubles. The result might lead to more significant p-values.  Continuing to analyze the data to obtain a significant result that supports your idea is called p-hacking. Changing your test to get the desired results is not statistically sound. 
• over-interpreting nonsignificant results: Everyone has learned that if the p-value is <0.05, the null hypothesis (no relationship between the two samples) is rejected. That threshold is somewhat arbitrary and is retained largely through long-standing historical use. A nonsignificant result does not prove the absence of an effect; it could be that the method used to acquire the data was insensitive (from low sample size, poor experimental design, etc.).
• conflating correlation with causation: Just because two variables are correlated doesn't mean that one causes the change in the other. They may simply covary, or both might be related (and correlated) to a hidden third variable. Here's one that's been on the web: all people who confuse correlation with causation die!

    The next several exercises are based on data generated to illustrate key points in analyzing data correctly.

    Exercise \(\PageIndex{1}\)

You have set up an assay to acquire some experimental data.  You run 4 experiments and make 5 replicate measurements of the experimental variable in each experiment.  These are shown in the table below.

    Exp #1 Exp #2 Exp #3 Exp #4
90 120 120 65
    94 98 92 106
    98 104 105 104
    102 101 94 112
    106 94 80 110

    An Excel plot showing the measurements is shown in the graph below.  Calculate the mean for each experiment and comment on the quality of the data.  Are there any apparent outliers discerned from the graph?  If so, recalculate the mean without the outlier.

    GaussianCurve-Avg.svg

    Answer

The means of each set of experiments are shown in red in the table below (last row) and as red dashed lines in the graph below; a short calculation reproducing these values follows the table.

     

Exp #1 Exp #2 Exp #3 Exp #4
90 120 120 65
94 98 92 106
98 104 105 104
102 101 94 112
106 94 80 110
Mean: 98 103.4 98.2 99.4
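
A short Python sketch reproducing these means, and the Experiment #4 mean with the apparent outlier removed, is shown below (a spreadsheet works equally well):

import numpy as np

data = {
    "Exp #1": [90, 94, 98, 102, 106],
    "Exp #2": [120, 98, 104, 101, 94],
    "Exp #3": [120, 92, 105, 94, 80],
    "Exp #4": [65, 106, 104, 112, 110],
}

for name, values in data.items():
    print(f"{name}: mean = {np.mean(values):.1f}")    # 98.0, 103.4, 98.2, 99.4

exp4_without_65 = [v for v in data["Exp #4"] if v != 65]
print(f"Exp #4 without the 65: mean = {np.mean(exp4_without_65):.1f}")    # 108.0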

     

    GaussianCurve.svg

The value of 65 in Experiment #4 might appear to be an outlier.  It certainly looks like one on the plot.  In actuality, all the points, even the "apparent" outlier, are taken from a perfect bell-shaped (Gaussian) curve with a mean of 100 and a standard deviation of 20. The apparent outlier, with a value of 65, is 35 units below the mean, or 35/20 = 1.75 standard deviations below it.  This signed number of standard deviations is known as the z-score. You probably remember the following values:

Probability that a data point lies:
within ±1 SD    within ±2 SD    within ±3 SD
68%             95%             99.7%

A z-score can be calculated for the value x = 65.

    \begin{equation}
    z=\frac{x-\mu}{\sigma}
    \end{equation}

where μ is the mean and σ the standard deviation.  The z-score for 65 is (65 - 100)/20 = -1.75.  Using this link to calculate the probability of x < 65, x > 65, and 65 < x < 100 (the mean) gives these results.

    ZscoreProbCalc.png

From this calculated z-score, it looks like the value is an outlier, as the probability of a value less than 65 is 0.0401.  However, if you threw out the value of 65 in Experiment #4, the mean of the remaining values is 108, farther away from the actual mean of 100.  One has to be very careful in throwing out a putative outlier. Assuming that there were no errors in data acquisition or entry, the value of 65 probably arose by chance and should be retained.
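
If you prefer to compute these probabilities directly rather than with an online calculator, a minimal sketch using scipy.stats (one option among many) is:

from scipy.stats import norm

mu, sigma, x = 100, 20, 65
z = (x - mu) / sigma                     # z = -1.75

p_below = norm.cdf(z)                    # P(x < 65)  ≈ 0.0401
p_above = 1 - p_below                    # P(x > 65)  ≈ 0.9599
p_between = norm.cdf(0) - norm.cdf(z)    # P(65 < x < 100) ≈ 0.4599
print(f"z = {z:.2f}, P(x < 65) = {p_below:.4f}, P(65 < x < 100) = {p_between:.4f}")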

This z value is essentially what Grubbs' test for outliers uses: G = |value - mean|/SD.  If you input just the Experiment #4 data, the value of 65 has a Grubbs statistic of about 1.8 and is flagged as an outlier (P < 0.05).  If you input all the values from Experiments 1-4, the value 65 has a Grubbs statistic of 2.73 (P < 0.05) and is again a calculated outlier.  However, if you input all the values used to calculate the normal distribution (1-200), none of them are considered outliers!
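
A minimal sketch of such a Grubbs-style check on the Experiment #4 replicates is shown below (the critical value uses the standard two-sided Grubbs formula based on the t-distribution):

import numpy as np
from scipy.stats import t

def grubbs_statistic(values):
    """Grubbs statistic G = max |x - mean| / sample SD."""
    values = np.asarray(values, dtype=float)
    return np.max(np.abs(values - values.mean())) / values.std(ddof=1)

def grubbs_critical(n, alpha=0.05):
    """Two-sided Grubbs critical value for a sample of size n."""
    tcrit = t.ppf(1 - alpha / (2 * n), n - 2)
    return ((n - 1) / np.sqrt(n)) * np.sqrt(tcrit**2 / (n - 2 + tcrit**2))

exp4 = [65, 106, 104, 112, 110]        # Experiment #4 replicates
G = grubbs_statistic(exp4)             # ≈ 1.8 (for the value 65)
Gcrit = grubbs_critical(len(exp4))     # ≈ 1.71 for n = 5, alpha = 0.05
print(f"G = {G:.2f}, critical value = {Gcrit:.2f}, flagged as outlier: {G > Gcrit}")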

     

    Fitting Models

    You have some experimental data and wish to fit it to a model. You should actually fit the data to multiple models and pick the one with the best statistical fit. Let's consider the simplest chemical reaction, the irreversible conversion of a reactant A to a product P with time. Assume that there is no catalyst (such as an enzyme) involved. Two simple models reflecting different mechanisms come to mind, which can be expressed by these simple chemical equations:

• 1st order reaction:  A → P  (an example would be the spontaneous decarboxylation of a β-keto acid, with two products formed, or the radioactive decay of A to P)
• 2nd order reaction:  A + A → P  (which, for the reaction as written, requires the collision of two A molecules to produce product P)

In both cases, A decreases with time.  Let's look at some data and try to see if it fits a 1st or 2nd order reaction. Whichever produces the better fit is likely the better model.

Exercise \(\PageIndex{2}\)

    The graphs below show the progress curves ([A] vs t) for a reactant Ax (Set 1, blue) and reactant Ay (Set 2, red).  (Note:  the graphs show perfect data!  The lines that are drawn simply connect the data points.)  They are similar yet different.  One is 1st order and the other 2nd order. 

    a.  How would you determine if the decay of reactant A is 1st or 2nd-order for each reaction? A 1st order reaction gives a simple exponential decrease in A vs time.  At first glance do they both show an exponential decay?

    b.  Write the initial velocity equations (learned from introductory chemistry) for a 1st and 2nd order reaction.

    c.  Write the differential equations for a 1st and 2nd order reaction.

     

    AtoP_TwoModels_Initial Data.svg

     

    Answer

a.  You would have to fit each set of data to the equations for both a first-order AND a second-order reaction and compare the statistics of the fits.  In one graph, about 4/20 or 20% of the reactant remains at 1 hour, while in the other about 5/20 or 25% of A remains at 1 hour.  It would seem that more data are required to determine which is a simple exponential decay (1st order) and which is second order.

b.  1st order:  \(v_0 = k_1[A]\);  2nd order:  \(v_0 = k_2[A][A] = k_2[A]^2\).

c.  Differential equations:  1st order:  \(d[A]/dt = -k_1[A]\);  2nd order:  \(d[A]/dt = -k_2[A][A] = -k_2[A]^2\)  (the negative sign indicates that [A] decreases with time).
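
Integrating these differential equations gives the integrated rate equations (standard results that follow directly from part c; these are the equations used to generate the data in the spreadsheet exercise below):

\begin{equation}
[A]=[A]_0 e^{-k_1 t} \quad \text{(1st order)}
\end{equation}

\begin{equation}
[A]=\frac{[A]_0}{1+k_2 [A]_0 t} \quad \text{(2nd order)}
\end{equation}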

    Let's see more data over a larger range of time.

Exercise \(\PageIndex{3}\)

The graph below shows both sets of data on the same plot over a longer time period of 3 hours.

    AtoP_TwoModels_All Data.svg

The graphs clearly diverge.  The red curve/data (Set 2) falls more quickly at first and then more slowly after about 0.6 hours.  Just from this observation, can you guess which curve is 1st order and which is 2nd order?

     

    Answer

The red curve is for the second-order reaction.  It falls more quickly at early times, since the rate is proportional to \([A]^2\), but more slowly later, as there is less A to collide with another A to form the product.

Now let's fit the data.

Exercise \(\PageIndex{4}\)

Download this Excel spreadsheet by selecting the link.  The spreadsheet does not "fit" the data (Excel without add-ins does not do nonlinear regression analyses).  Rather, the data points for both the 1st and 2nd order reactions were created using the integrated rate equations for each.  Change the rate constants for the 1st and 2nd order reactions (k1 and k2, respectively) and note the effect on the graphs.  What key points have you learned from this exercise?

    Answer

    Key points

    • When fitting experimental data, you need to use multiple models and the equations derived from them to explain and predict the behavior of the system.  A simple example is for reversible enzyme inhibition, in which the data should be fit to competitive, uncompetitive, and mixed (or noncompetitive) inhibition equations to find the best model;
    • It is important to have sufficient quality data to fit different models and select the one that best describes the behavior of the system.  In this simple case, it was critical to collect data over a large enough range of time points to differentiate and pick the model that best fits the data;
• Of course, you should use nonlinear regression analysis to analyze real data (not the perfect data derived from equations in this exercise) and compare the statistical fits of each model to the data; a minimal sketch of such a fit follows this list.
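
Here is a minimal sketch of that workflow using scipy's curve_fit (the rate constants, noise level, and time points are arbitrary assumptions used only to generate example data; real measurements would replace A_obs):

import numpy as np
from scipy.optimize import curve_fit

def first_order(t, A0, k1):
    return A0 * np.exp(-k1 * t)

def second_order(t, A0, k2):
    return A0 / (1 + k2 * A0 * t)

# simulate "experimental" data from a 2nd-order process plus random noise
rng = np.random.default_rng(0)
t = np.linspace(0, 3, 25)                                     # hours
A_obs = second_order(t, 20, 0.15) + rng.normal(0, 0.3, t.size)

# fit both models and compare the sum of squared residuals (SSR)
for name, model in [("1st order", first_order), ("2nd order", second_order)]:
    popt, _ = curve_fit(model, t, A_obs, p0=[20, 0.1])
    ssr = np.sum((A_obs - model(t, *popt)) ** 2)
    print(f"{name}: fitted parameters = {popt.round(3)}, SSR = {ssr:.2f}")

The model with the smaller SSR (here, the 2nd-order equation used to generate the data) is the better description of the system.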

    Example of data of insufficient quality in biochemistry

The computer models showing the structure of molecules built from PDB coordinate files can seem to be startlingly real representations of the actual 3D structure of the molecule.  What the 3D models show are atoms of the appropriate type, size, and position fit to the electron density interpreted from x-ray diffraction patterns.  The computer programs simply add connected colored spheres representing different atoms at the x, y, and z positions derived from the electron density data.

It is difficult to interpret large data files by simple inspection.  Instead, data are increasingly converted into colored images from which the human eye and brain can infer properties quickly.  Let's look at one particular PDB file of an antibody:antigen structure (the antigen here is the dodecapeptide KLASIPTHTSPL) in which low-quality x-ray data led to low-quality structural models. In particular, the bound peptide has significant steric clashes with the antibody.  This is evident even in the structure of the complex in which no hydrogen atoms are present (as is the case for most X-ray structures, since hydrogen atoms scatter X-rays too weakly to be located).

Figure \(\PageIndex{2}\) below shows an interactive iCn3D model of the anti-arsonate germline antibody 36-65 in complex with the phage display-derived dodecapeptide KLASIPTHTSPL, without added hydrogens (2A6I).


Figure \(\PageIndex{2}\): Anti-arsonate germline antibody 36-65 in complex with a phage display-derived dodecapeptide KLASIPTHTSPL, without added hydrogens (2A6I). (Copyright; author via source).  Click the image for a popup or use this external link: https://structure.ncbi.nlm.nih.gov/i...hi5pLnMmq6MEJ6

    Note the proximity and likely steric clashes between just one pair of amino acids, Y101 (cyan) in the antibody and P4 (magenta) in the bound peptide. 

Exercise \(\PageIndex{5}\)

    Go to the external link for the iCn3D model of the antibody/peptide complex. Determine the distance between the Y101 of the antibody and P4 of the peptide.

    • Zoom into that region of the complex with your mouse or trackpad
• Locate the two closest atoms of the amino acid pair (the CE2 atom of Y101 and the CG atom of P4)
• Choose Analysis, Distance, between Two Atoms and follow the directions to determine the distance

Make a screen snip showing the distance between the two atoms.  Is this distance less than the sum of their van der Waals radii? If so, there would be significant steric clashes between the atoms.


    Answer

    Here is the screen snip showing the distance (4.7 Å) between the C atoms.  

    antibody_peptideYtoPdistminusH.png

The van der Waals radius of a carbon atom is 1.7 Å, so the sum of the radii for the two carbon atoms is 3.4 Å.  It appears there is no steric clash between the carbons themselves, since 3.4 Å (sum of vdW radii) < 4.7 Å (distance between the two C atoms).  However, we have not considered the H atoms on each carbon.
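
You can also check such contacts programmatically. Below is a minimal sketch using Biopython's Bio.PDB module on a downloaded copy of the 2A6I coordinate file; the chain IDs and residue numbers are assumptions that you should confirm against the actual file:

from Bio.PDB import PDBParser

parser = PDBParser(QUIET=True)
structure = parser.get_structure("2a6i", "2a6i.pdb")   # file downloaded from the PDB
model = structure[0]

# chain IDs and residue numbers below are assumptions -- check them in the file
tyr_ce2 = model["A"][101]["CE2"]    # CE2 of Y101 in an antibody chain
pro_cg = model["P"][4]["CG"]        # CG of P4 in the bound peptide

distance = tyr_ce2 - pro_cg         # Bio.PDB subtracts Atom objects to give the distance in Å
sum_vdw_C = 1.7 + 1.7               # sum of two carbon van der Waals radii
print(f"distance = {distance:.2f} Å, carbon-carbon clash: {distance < sum_vdw_C}")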

     

Free web-based computer programs can be used to add hydrogen atoms to the structure in the PDB file.  Download this file and open it in iCn3D (File, Open File, iCn3D PNG Image) to see a model with the attached hydrogen atoms. Then do the next exercise.

     

Exercise \(\PageIndex{6}\)

Repeat the above exercise by determining the distance between the two H atoms attached to the two C atoms. Make a screen snip showing the distance between the two atoms.  Is this distance less than the sum of their van der Waals radii?

    Answer

    Here it is.

    antibody_peptideYtoPdistPLUSHs.png

The van der Waals radius of a hydrogen atom is around 1 Å, so the sum of the radii for the two H atoms is about 2 Å. Hence there is a major steric clash between the two atoms, since 0.8 Å (actual distance) < 2 Å (sum of vdW radii).

Over time, the PDB has added more and more tools that allow users to see the actual quality of the deposited data and of the models derived from them, in a process of model validation.  These metrics should be viewed for all PDB files.  They should remind us (so as not to be "fooled again") that the beautiful rotating models we see are just visualized data, and all data have potential errors associated with them.

Figure \(\PageIndex{3}\), taken directly from the PDB entry for 2A6I, shows percentile scores (ranging from 0 to 100) for global validation metrics.

     2a6i_full_validation 2.svg

Figure \(\PageIndex{3}\):  Percentile scores for global validation metrics of the 2A6I entry. https://www.rcsb.org/structure/2a6i.  Rfree is a measure of the quality of a model's fit to the X-ray crystallographic data.

Figure \(\PageIndex{4}\) below shows the full validation table for this PDB file. It summarizes the geometric issues observed across the polymeric chains and their fit to the electron density. The red, orange, yellow, and green segments on the lower bar indicate the fraction of residues that contain outliers for ≥3, 2, 1, and 0 types of geometric quality criteria, respectively. A grey segment represents the fraction of residues that are not modeled. The numeric value for each fraction is indicated below the corresponding segment, with a dot representing fractions <5%. The upper red bar (where present) indicates the fraction of residues that have a poor fit to the electron density, with the numeric value given above the bar.

    2a6i_full_validation 2TableV2.svg

    Figure \(\PageIndex{4}\):  Full validation table for 2A6I.  Mol 3 is the peptide chain (KLASIPTHTSPL). Only 9 out of 12 amino acids gave observable electron density (S4 to the end). https://www.rcsb.org/structure/2a6i

     

Exercise \(\PageIndex{7}\)

    Comment on the quality of the structure using the PDB percentile scores.

    Answer

    Quite simply, the structure is not good, especially for the P peptide.

An updated PDB structure of the antibody is now available (5VGA), but it does not have the bound peptide.

     

We can use other programs to detect steric clashes in PDB structures.

Exercise \(\PageIndex{8}\)

Show the van der Waals steric clashes in the protein using the program Jsmol, available at this link.

• Check the "with hydrogens" box on the right-hand side.
• Click in the "load mmCIF by PDB ID" box and input 2a6i (lowercase).
• Select the Clashes button on the left.
• Rotate the image to display the greatest density of clashes to the right.
• Select the "PNG + Jmol" button on the right-hand side to download an image showing the clashes.
• Hover over the region with the greatest number of clashes to identify the amino acids in this region.  Which chain (A, B, or P) is involved in the most clashes?
    Answer

The right-hand side, which has the greatest density of clashes, shows that the source of most clashes is the bound peptide (chain P).

     

    Signal vs noise - Seeing a signal where none exists

As our methods to detect and analyze analytes at very low concentrations improve, so does the problem of reliably detecting a signal (S) from a true sample when that signal is embedded in a background (noise, N) contributed by the solvent and by non-sample species that are either present in the solvent or leached from the container. Another problem is detecting species that are inherently unstable during the purification, storage, and analysis processes.

Samples purified in chromatographic separations elute as peaks that are theoretically Gaussian (very symmetric) but in practice have trailing edges. To get the true amount of a sample (signal), the area under the peak must be integrated. This should be compared to the area under the small background (noise) peak, arising from solvent or detector fluctuation, over the same elution time range.  If the area of the signal is at least 3 times greater than the area of the solvent blank for that time range (i.e., S/N > 3), you can be reasonably confident the signal truly derives from the standard or test sample.
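
A minimal sketch of this signal-to-noise comparison is given below; the chromatograms here are simulated stand-ins (a Gaussian peak on a noisy baseline for the sample, baseline noise alone for the blank), and real intensity arrays from the instrument software would replace them:

import numpy as np
from scipy.integrate import trapezoid

def peak_area(time, intensity, t_start, t_end):
    """Trapezoidal integration of the intensity over the elution window."""
    mask = (time >= t_start) & (time <= t_end)
    return trapezoid(intensity[mask], time[mask])

rng = np.random.default_rng(1)
time = np.linspace(6.0, 7.5, 301)                              # minutes
blank = np.abs(rng.normal(50, 15, time.size))                  # solvent blank: baseline noise only
sample = blank + 4e5 * np.exp(-((time - 6.8) / 0.03) ** 2)     # blank plus a peak at 6.8 min

signal = peak_area(time, sample, 6.7, 6.9)
noise = peak_area(time, blank, 6.7, 6.9)
print(f"S/N = {signal / noise:.0f}  (a confident identification requires S/N > 3)")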

The data below show the elution of labile polyunsaturated fatty acid (especially ω-3) derivatives called specialized pro-resolving mediators (SPMs).  These fatty acid derivatives appear to help mediate the end of an inflammatory response and return the system to a noninflammatory state. They are present at low concentrations.  The SPMs include maresins, protectins, resolvins, and lipoxins.

The data below derive from a paper, and responses to it, that question whether the identification of these lipid mediators in actual biological samples by LC-MS/MS analysis is statistically valid.

    O’Donnell, V.B., Schebb, N.H., Milne, G.L. et al. Failure to apply standard limit-of-detection or limit-of-quantitation criteria to specialized pro-resolving mediator analysis incorrectly characterizes their presence in biological samples. Nat Commun 14, 7172 (2023). https://doi.org/10.1038/s41467-023-41766-w.  http://creativecommons.org/licenses/by/4.0/.

     

Exercise \(\PageIndex{9}\)

The structure of Resolvin D2 (RvD2) is shown below.

    resolvinD2.svg

It was analyzed in its pure state by LC-MS/MS.  1.8 ng of RvD2 was dissolved in methanol and applied to the LC column.  The results are shown below, along with three different chromatograms for pure HPLC-grade methanol, which serve as blanks.  The chromatograms for the blanks are shown at about the same elution time as the pure sample.  The blue region shows the area used for the integration.

     

    Failure to apply standard limit-of-detectionFig1AB_RvD2.svg

    Based on the intensity of the peaks, do you think that the signal (blue peak) of the pure sample in panel A is sufficiently higher than the integrated peaks (blue) for the methanol blanks to be confident that the peak in panel A at around 6.8 minutes results from RvD2?  Explain.

    Answer

Even without statistical analyses, it is clear that the signal for the pure peak (with a peak intensity of about 4 x 10^5) is much greater than any contribution to the peak from methanol hidden within it.  Panel B shows that an average integrated area of only about 3100 would be contributed to the actual sample peak.  Hence the signal-to-noise ratio is >> 3, and the peak in panel A very likely derives from RvD2.

    Now let's compare a chromatogram of a pure standard to an actual biological sample and ask the same question - can we be confident that the peak in the biological sample represents the standard?

     

Exercise \(\PageIndex{10}\)

Now let's look at the elution of the protectin PDx (also named PD1, 17-PD1, 10S,17S-diHDHA) from the LC column. The structure of PDx is shown below.

    PD1_PDX.svg

    The figure below shows the elution of the pure standard at 13.2 min (top panel) and the elution profile in the same time range of a biological sample said to contain PDx (bottom).

     

    Failure to apply standard limit-of-detectionFig1_C_PDx.svg

The blue region represents the peak integrated to reflect PDx.  The green rectangle indicates the region chosen to obtain a representative blank for the area under the PDx peak.

Based on the intensity of the peaks as illustrated in the chromatograms, and by simple inspection, do you think that the signal (blue peak) in the actual biological sample (bottom panel) is sufficiently higher than the background hidden under the sample peak to warrant the conclusion that PDx is present in the biological sample?  The investigators used the green peak area to represent the blank.  Explain.

    Answer

The y-axis intensity has been scaled (normalized) to 100, so the contribution of background to the integrated peak would be a much higher percentage of the actual PDx peak than in the RvD2 elution shown earlier.  Hence our intuition based on our eyes should be discarded and replaced with a more rigorous statistical analysis.  The authors of the paper calculated an S/N value of 4, which is quite close to the cutoff value of 3.  Others who have analyzed the data calculate an S/N < 2, which suggests that the identification of PDx in the biological sample was statistically unsound.  The S/N < 2 value was calculated using the blue peak and a background derived not just from the green box but from the entire background (light blue peaks).  A quick inspection shows that the green region, which the investigators chose as their background, has a lower average intensity than the light blue background, especially underneath the dark blue peak area.

     

Better analyses of minor peaks should include the following practices (a sketch encoding these criteria follows the list):

• a peak should have a minimum area of 2000 counts;
• the standard and biological peaks should elute at the same time, within ±0.05 min;
• at least 6 of the MS/MS peaks from the LC peak of the biological sample should match the pure standard's MS/MS spectral peaks.
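
A minimal sketch of how these acceptance criteria might be encoded is shown below; the thresholds mirror the list above, and the numbers for the candidate peak are hypothetical:

def peak_passes(area_counts, rt_sample, rt_standard, n_matching_msms,
                min_area=2000, rt_tol=0.05, min_msms=6):
    """Apply the three acceptance criteria listed above to a candidate peak."""
    return (area_counts >= min_area
            and abs(rt_sample - rt_standard) <= rt_tol
            and n_matching_msms >= min_msms)

# hypothetical candidate peak from a biological sample
print(peak_passes(area_counts=1500, rt_sample=13.24, rt_standard=13.20,
                  n_matching_msms=5))   # False: fails the area and MS/MS criteria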

    Information Integrity - Data and Analysis is shared under a not declared license and was authored, remixed, and/or curated by LibreTexts.
