5.5: Quantitative Trait Loci and GWAS
Thus far, we have considered quantitative traits and their heritability -- and while we examined how Mendelian transmission of multiple genes that affect a trait can lead to quantitative inheritance, we haven't given much consideration to the genes themselves. A locus that affects a quantitative trait is called, fittingly, a quantitative trait locus or QTL. How do we find QTLs? With genome-wide association studies, or GWAS.
Learning Objectives
- Explain how molecular markers and linkage can be used to identify genes that contribute to complex traits through GWAS.
- Interpret a Manhattan plot, understanding that GWAS identify chromosomal regions that are linked to phenotypes.
- Explain how scientists use this information from GWAS to perform experiments on genes in candidate loci to determine if those genes have a role in the phenotype.
What is a genome-wide association study?
We have seen how linkage can be used to map genes responsible for traits. However, this approach is not always appropriate for tackling important questions in genetics. Consider these factors:
- In some cases, especially human genetics, directed crosses (test crosses) are not practical or ethical!
- For complex traits, the phenotype may vary along a spectrum.
- For complex traits, alleles of genes on multiple chromosomes likely contribute to the phenotype.
Instead, genome-wide association studies (GWAS) use single nucleotide polymorphisms (SNPs) across the genome as markers. By comparing many individuals with and without the phenotype, a GWAS asks which SNPs (loci) are likely to be inherited with the phenotype.
Important terms for understanding GWAS
- SNP: a single nucleotide polymorphism is a base pair at a particular position (locus) that varies among individuals
- Linkage: loci are sufficiently close to each other along a chromosome such that homologous recombination occurs between them less often than predicted by independent assortment
- Haplotype: a set of SNPs that can be inherited together on the same chromosome
How to genotype SNPs on a genomic scale
Depending on the species being studied, there may be several options for technologies to genotype the samples. An example is the Illumina platform, in which DNA is attached to slides and probes adjacent to each SNP are provided. DNA polymerase can then extend the probe by adding a base complementary to the next base in the sample. The free nucleotides are tagged and visualized by fluorescence.
Video \(\PageIndex{1}\): Illumina Advances Genomic Research with the Infinium Assay. (Illumina via https://www.youtube.com/watch?v=lVG04dAAyvY)
P values and Manhattan Plots
The genotype data are analyzed to identify SNPs that have a statistically significant association with the trait of interest. Without covering the details of the calculations, a P value represents the probability of seeing an association at least as strong as the one observed if the SNP and the trait were in fact unrelated. A low P value therefore indicates that the association is unlikely to be due to chance alone, which suggests that the SNP and the trait are connected to one another. The datasets in GWAS are very large; for example, a study of human ear morphology genotyped approximately 670,000 SNPs in 4,919 individuals (Adhikari et al, 2015). Because so many SNPs are tested at once, some would pass a conventional threshold by chance alone, so GWAS use much stricter thresholds. The P value threshold may vary depending on the analysis, but it is often less than 0.00001 (\(1 \times 10^{-5}\)) or even less than 0.00000005 (\(5 \times 10^{-8}\)). The differences between such small P values can be difficult to visualize graphically, so scientists convert the P values to negative logarithms (\(-\log_{10}\)(P value)).
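To make the idea of a per-SNP association test concrete, here is a minimal sketch of one simple option: a chi-square test on allele counts in cases versus controls. The allele counts and the function name `allelic_chi_square` are hypothetical, and real GWAS pipelines typically use regression models with covariates rather than this bare 2x2 test.

```python
import math

def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    case_counts and control_counts are (allele_1, allele_2) tuples.
    Returns (chi2, p_value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    # Shortcut formula for the chi-square statistic of a 2x2 table.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical data: 1,000 cases and 1,000 controls, two alleles per person.
chi2, p = allelic_chi_square(case_counts=(700, 1300), control_counts=(500, 1500))
print(f"chi2 = {chi2:.2f}, P = {p:.2e}")
```

In a GWAS, a test of this kind would be repeated for every genotyped SNP, which is why the resulting P values are compared against a very strict threshold.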
Negative logs and P values
Remember that a log is an exponent; for example, \(\log_{10}(100) = 2\) because \(10^2 = 100\).
A significant P value for GWAS is \(P < 5 \times 10^{-8}\). Because \(\log_{10}(0.00000005) \approx -7.3\), the negative log is 7.3, so any significantly associated SNP should have \(-\log_{10}(P) > 7.3\).
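The conversion above can be checked with a few lines of Python. This is only an illustrative sketch: the SNP names and P values in `example_p_values` are made up, and `neg_log10` is a helper name chosen here.

```python
import math

GENOME_WIDE_THRESHOLD = 5e-8  # the P value threshold quoted in the text

def neg_log10(p_value):
    """Convert a P value to the -log10 scale used on Manhattan plots."""
    return -math.log10(p_value)

threshold_line = neg_log10(GENOME_WIDE_THRESHOLD)
print(f"threshold on the -log10 scale: {threshold_line:.2f}")

# Hypothetical SNPs and P values, for illustration only.
example_p_values = {"snp_1": 0.03, "snp_2": 2e-6, "snp_3": 4e-9}
for snp, p in example_p_values.items():
    score = neg_log10(p)
    print(snp, f"{score:.2f}", "significant" if score > threshold_line else "not significant")
```

Smaller P values give larger scores, so the most strongly associated SNPs end up highest on the plot.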
The \(-\log_{10}(P)\) values are plotted on a graph called a Manhattan plot because the peaks at various positions across the genome resemble buildings in a city skyline. The peaks identify SNPs that are associated with the trait. The significant SNPs are not necessarily the cause of the disease or phenotype, but due to linkage may be inherited with the causal mutation.
A Manhattan plot for a GWAS of Crohn's disease by the IBDGC. The blue and red lines mark thresholds for GWAS significance: the blue line represents a P value of 5e-8 and the red line represents approximately 7.2e-8.
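As a rough sketch of how the points on a Manhattan plot can be laid out, the code below places chromosomes end to end along the x-axis and uses \(-\log_{10}(P)\) as the height. The chromosome lengths, SNP positions, and P values are toy values, and `manhattan_coordinates` is a hypothetical helper; the resulting coordinates could be passed to any plotting library.

```python
import math

def manhattan_coordinates(snps, chromosome_lengths):
    """Compute (x, y) plotting coordinates for a Manhattan plot.

    snps: list of (chromosome, position, p_value) tuples.
    chromosome_lengths: dict of chromosome -> length, in plotting order.
    """
    # Each chromosome starts where the previous one ended.
    offsets = {}
    running_total = 0
    for chrom, length in chromosome_lengths.items():
        offsets[chrom] = running_total
        running_total += length
    # x = genome-wide position, y = -log10(P).
    return [(offsets[chrom] + pos, -math.log10(p)) for chrom, pos, p in snps]

# Toy genome and toy results, for illustration only.
lengths = {"1": 1000, "2": 800, "3": 600}
snps = [("1", 250, 0.2), ("2", 100, 3e-9), ("3", 500, 1e-4)]

points = manhattan_coordinates(snps, lengths)
threshold = -math.log10(5e-8)
peaks = [(x, y) for x, y in points if y > threshold]
print(points)
print("above threshold:", peaks)
```

Only points that rise above the threshold line would be read as significant peaks.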
Research example: A genome-wide association study identifies multiple loci for variation in human ear morphology
Craniofacial features are examples of complex traits that are likely controlled by many genes. In a GWAS in 2015, researchers examined 10 traits associated with human ear shape: ear protrusion, lobe size, lobe attachment, tragus size, antitragus size, helix rolling, folding of antihelix, crus helix expression, superior crus of antihelix expression and Darwin’s tubercle (Adhikari et al, 2015).

This study identified SNPs on chromosomes 1 and 2 as the strongest candidate loci with potential to influence human ear morphology. A significant SNP on chromosome 2 is associated with a missense mutation in the EDAR protein, which has been implicated in human ectodermal derivatives including teeth and sweat glands. Additional studies performed in mice suggest that this protein has an additional role in ear formation. A significant SNP on chromosome 1 has been identified as a binding site for a transcriptional regulatory protein for a nearby gene, TBX15, which functions in cartilage development, suggesting this region is part of the regulatory environment that influences ear shape.
Interpretation: How can GWAS inform the biology of disease?
Our primary goal is to use the associations found by GWAS to understand the biology of disease in an actionable manner, which will help guide therapies to treat these diseases. Most associations do not identify specific genes and causal mutations, but rather are just pointers to small regions with causal influences on disease. In order to develop and act on a therapeutic hypothesis, we must go much further, and answer these questions:
- Which gene is connected to disease?
- What biological process is thereby implicated?
- What is the cellular context in which that process acts and is relevant to disease?
- What are the specific functional alleles which perturb the process and promote or protect from disease?
This can be approached in one of two manners: the bottom-up approach, or the top-down approach.
Bottom-up
The bottom-up approach starts from a particular gene that has a known association with a disease and investigates its biological importance within a cell. Kuballa et al.[19] were able to use this bottom-up approach to learn that a particular risk variant associated with Crohn’s Disease leads to impairment of autophagy of certain pathogens. Furthermore, the authors were able to create a mouse model of the same risk variant found in humans. Identifying biological implications of risk variants at the cellular level and creating these models is invaluable as the models can be directly used to test new potential treatment compounds.
Top-down
In contrast, the top-down approach involves looking at all known associations, utilizing the complete set of GWAS results, and trying to link them to shared biological processes/pathways implicated in disease pathogenesis. This approach is based on the idea that many of the genes associated with a disease share relevant biological pathways. This is commonly done by taking existing biological networks, such as protein-protein interaction networks, and layering the associated genes on top of them. However, the resulting disease networks may not be significant, because of bias both in the discovery of associations and in the experimental data that the associations are being integrated with. This significance can be estimated by permuting the labels for the nodes in the network many times, and then computing how rare the level of connectivity is for the given disease network. This process is illustrated in Figure 30.9. As genes connected in the network should be co-expressed, it has been shown that these disease networks can be further validated from gene-expression profiling[14].
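The permutation idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of any particular study: the toy network, the gene names, and the function names are hypothetical, connectivity is measured simply as the number of edges between disease genes, and permuting node labels is implemented as drawing a random gene set of the same size. A real analysis would usually also control for properties such as node degree.

```python
import random

def count_internal_edges(edges, gene_set):
    """Count edges whose two endpoints are both in gene_set."""
    return sum(1 for a, b in edges if a in gene_set and b in gene_set)

def permutation_p_value(edges, disease_genes, n_permutations=2000, seed=0):
    """Estimate how rare the observed connectivity is under random labeling."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    nodes = sorted({node for edge in edges for node in edge})
    observed = count_internal_edges(edges, set(disease_genes))
    at_least_as_connected = 0
    for _ in range(n_permutations):
        # Randomly relabel: pick as many nodes as there are disease genes.
        random_set = set(rng.sample(nodes, len(disease_genes)))
        if count_internal_edges(edges, random_set) >= observed:
            at_least_as_connected += 1
    # Add-one correction so the estimated P value is never exactly zero.
    p_value = (at_least_as_connected + 1) / (n_permutations + 1)
    return observed, p_value

# Toy network: G1-G4 are fully interconnected, G4-G12 form a chain.
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
         ("G2", "G3"), ("G2", "G4"), ("G3", "G4")]
edges += [(f"G{i}", f"G{i + 1}") for i in range(4, 12)]

observed, p_value = permutation_p_value(edges, ["G1", "G2", "G3", "G4"])
print(f"observed internal edges = {observed}, permutation P = {p_value:.4f}")
```

A small permutation P value indicates that randomly labeled gene sets are rarely as interconnected as the disease gene set.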

Comparison with Linkage Analysis
It is important to note that GWAS captures more variants than linkage analysis. Linkage analysis identifies rare variants which have negative effects, and linkage studies are used when pedigrees of related individuals with phenotypic information are available. They can identify rare alleles that are present in smaller numbers of families, usually due to a founder mutation, and have been used to identify mutations in genes such as BRCA1, associated with breast cancer. Association studies can be used for this purpose too, and also to find more common genetic changes that confer smaller influences on susceptibility, as well as rare variants which have protective effects. Linkage analysis cannot identify these protective variants because they are anti-correlated with disease status. Furthermore, linkage analysis relies on the assumption that a single variant explains the disease, an assumption that does not hold for complex traits, including many diseases. Instead, we need to consider many markers in order to explain the genetic basis of these traits.
Conclusions
We have learned several lessons from GWAS. First, fewer than one-third of reported associations are coding or obviously functional variants. Second, only some fraction of associated non-coding variants are significantly associated with the expression level of a nearby gene. Third, many are associated with regions that have no nearby coding gene. Finally, the majority of reported variants are associated with multiple autoimmune or inflammatory diseases. These revelations indicate that there are still many mysteries lurking in the genome waiting to be discovered.
References
Adhikari, K., Reales, G., Smith, A. et al. A genome-wide association study identifies multiple loci for variation in human ear morphology. Nat Commun 6, 7500 (2015). https://doi.org/10.1038/ncomms8500
Ellinghaus D, Degenhardt F, Bujanda L, et al. Severe Covid-19 GWAS Group. Genomewide Association Study of Severe Covid-19 with Respiratory Failure. N Engl J Med. 2020 Jun 17:NEJMoa2020283. doi: 10.1056/NEJMoa2020283. Epub ahead of print. PMID: 32558485; PMCID: PMC7315890.