R Practice: Using Linear Models to Determine the Relationship Between Biodiversity and Socioeconomic Status (Part II: Visualizing Linear Models)
Technical Learning Objective: In this module, students will gain a better understanding of how to create and interpret linear regression plots using R.
The Biodiversity Implications of Socioeconomic Inequality - Visualization
In the previous module, we used a linear model to test for a relationship between the Gini Ratio of income inequality, which ranges from 0 (perfect equality) to 1 (complete inequality), and the percentage of declining resident bird species in a state. In this follow-up module, we will learn how to visualize the results of such models effectively.
Data from: Mikkelson, G. M., Gonzalez, A., & Peterson, G. D. (2007). Economic inequality predicts biodiversity loss. PLoS ONE, 2(5), e444.
Note: This is the second part of a two-part exercise. Please see Part I for guidance on interpreting linear model outputs.
Loading Our Packages and Data
First, let's load our packages! For this module, we will be using rstatix and tidyverse.
We can also get a quick feel for our data using the summary command. This is a quick way, for example, to get an idea of the range of each of our data columns and to see whether there are any extreme outliers.
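A minimal sketch of these two steps is shown below. The module's data file is not shown here, so the data frame `bird_data`, its column names (`gini_ratio`, `pct_declining`), and every value in it are hypothetical stand-ins invented so that the example runs; replace them with your own import step.

```r
# Load the packages used in this module
library(tidyverse)
library(rstatix)

# HYPOTHETICAL stand-in data: names and values are invented for illustration
# and are NOT the Mikkelson et al. (2007) data.
bird_data <- tibble(
  gini_ratio    = c(0.40, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.49),
  pct_declining = c(2.0, 3.5, 2.5, 4.0, 3.0, 4.5, 3.8, 10.0)
)

# Print the minimum, quartiles, mean, and maximum of every column
summary(bird_data)
```

Comparing the maximum of each column with its third quartile is a quick way to spot a possible extreme value.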
The Components of a ggplot object
After we've loaded in our data and gotten a quick feel for it, we can begin to visualize our data. A ggplot object is made up of three components:
- the data - the dataframe being plotted, which can be piped into the figure or defined within the ggplot command
- the aesthetics (aes) - the aesthetics of the geometric object, such as color and shape, as well as the columns to be used for the x and y axes
- the geometries (geom) - the type of plot (e.g. point, boxplot, histogram)
The type of plot you use (the geom) will depend on what type of data you are working with. Here, we are looking for the relationship between two continuous variables, so we will be using a scatterplot. We know from our previous analysis (Part I: Exploring Linear Model Outputs) that the linear relationship between these two variables is borderline insignificant. Generally, you would only include a linear model line on a regression plot if the relationship is significant. We will add a linear model line here, however, for learning purposes.
The first step is to pipe our data into a plot. We do so by using the formula:
Name of Data %>%
ggplot(aes(x = independent variable, y = dependent variable))
In our case, we are testing whether the % declining species depends on the Gini Ratio. We will therefore place the % declining species on the y-axis and the Gini Ratio on the x-axis.
IMPORTANT: After each line of code, it's important to include a brief description of what your code does to keep track of it all. This is called commenting. Any line of code that starts with a # will be treated as a comment by R.
Your code should look something like this:
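A minimal sketch of this step, assuming a data frame called `bird_data` with columns `gini_ratio` and `pct_declining` (hypothetical names and made-up values, since the module's data file is not shown here):

```r
library(tidyverse)

# HYPOTHETICAL stand-in data: invented values, not the published data
bird_data <- tibble(
  gini_ratio    = c(0.40, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.49),
  pct_declining = c(2.0, 3.5, 2.5, 4.0, 3.0, 4.5, 3.8, 10.0)
)

# Pipe the data into ggplot and map the Gini Ratio to the x-axis
# and the % declining species to the y-axis
base_plot <- bird_data %>%
  ggplot(aes(x = gini_ratio, y = pct_declining))

# Display the plot: only empty axes appear, because no geom has been added yet
base_plot
```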
You now have data added to your plot and your aesthetics set, but the plot is blank because you have not yet chosen a geom. Next, we need to add data points and a regression line. To add data points, we will use the function geom_point(). To add a regression line, we will use the function geom_smooth(method = "lm"). We set method = "lm" because we are fitting a linear model. The grey band surrounding the line is the confidence interval for the fitted line (95% by default), which shows the uncertainty in the model's estimate. Several other model types can be added with geom_smooth (e.g., a generalized additive model), but they are beyond the scope of this module.
IMPORTANT: Each addition to the ggplot command must be joined with a "+" rather than a pipe (%>%).
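Putting the two geoms together might look like the following sketch. As before, `bird_data` and its columns are hypothetical stand-ins with invented values.

```r
library(tidyverse)

# HYPOTHETICAL stand-in data: invented values, not the published data
bird_data <- tibble(
  gini_ratio    = c(0.40, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.49),
  pct_declining = c(2.0, 3.5, 2.5, 4.0, 3.0, 4.5, 3.8, 10.0)
)

scatter_plot <- bird_data %>%                        # the data goes in with a pipe
  ggplot(aes(x = gini_ratio, y = pct_declining)) +   # set the x and y aesthetics
  geom_point() +                                     # add one point per row
  geom_smooth(method = "lm")                         # add a linear model line with its confidence band

scatter_plot
```

Note that the pipe appears only once, between the data and ggplot(); every later layer is joined with "+".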
Awesome, the regression plot now has all of its essential elements!
Effective Data Visualization
There is still a long way to go to make our plot look nicer, however. Here are just a few ways to improve our visualization:
- We can change the labels of the x and y axes to be more descriptive using xlab("Text") and ylab("Text"). Make sure to include units for your variables where relevant!
- We can customize the text size of the axis titles. We do this within our plot "theme":
theme(axis.title.x = element_text(size = number), axis.title.y = element_text(size = number))
- We can change the color or size of our points by setting the color and size within geom_point.
Note: Effective data visualization is a complex skill to learn! There are a lot of online resources to help you on this journey.
- For a full list of colors in R, visit this Colors in R Cheatsheet
- For more ways to make prettier plots, visit this Data Visualization in R Cheatsheet
- To explore some additional types of plots in R and figure out which are best for your data, visit datatoviz
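The label, text size, and point adjustments described in this section could be combined as in the sketch below. The data frame `bird_data`, its column names, and its values are hypothetical stand-ins, and the axis label wording, point size, and text sizes are only example choices.

```r
library(tidyverse)

# HYPOTHETICAL stand-in data: invented values, not the published data
bird_data <- tibble(
  gini_ratio    = c(0.40, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.49),
  pct_declining = c(2.0, 3.5, 2.5, 4.0, 3.0, 4.5, 3.8, 10.0)
)

styled_plot <- bird_data %>%
  ggplot(aes(x = gini_ratio, y = pct_declining)) +
  geom_point(color = "blue", size = 3) +             # larger blue points
  geom_smooth(method = "lm") +                       # linear model line
  xlab("Gini Ratio (0 = perfect equality, 1 = complete inequality)") +  # descriptive x label
  ylab("Declining resident bird species (%)") +      # descriptive y label with units
  theme(axis.title.x = element_text(size = 14),      # enlarge the x-axis title
        axis.title.y = element_text(size = 14))      # enlarge the y-axis title

styled_plot
```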
After you've run this code, try out some of your own changes by replacing the color blue and the size of the axis titles.
Finally, let's add a theme to the regression plot. There are several preset themes available for you to use, but for this regression let's use a simple black and white theme.
To do so, use the function: theme_bw()
Here are some other themes. Go ahead and try them out by changing and rerunning the code!
theme_gray()
theme_classic()
theme_minimal()
Here is the code again, including the theme.
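A sketch of the full plot with the theme added, again using a hypothetical `bird_data` data frame with invented values and example label text:

```r
library(tidyverse)

# HYPOTHETICAL stand-in data: invented values, not the published data
bird_data <- tibble(
  gini_ratio    = c(0.40, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.49),
  pct_declining = c(2.0, 3.5, 2.5, 4.0, 3.0, 4.5, 3.8, 10.0)
)

final_plot <- bird_data %>%
  ggplot(aes(x = gini_ratio, y = pct_declining)) +
  geom_point(color = "blue", size = 3) +             # data points
  geom_smooth(method = "lm") +                       # linear model line
  xlab("Gini Ratio (0 = perfect equality, 1 = complete inequality)") +
  ylab("Declining resident bird species (%)") +
  theme_bw() +                                       # black and white preset theme
  theme(axis.title.x = element_text(size = 14),      # custom tweaks come after the preset
        axis.title.y = element_text(size = 14))

final_plot
```

In this sketch theme_bw() is placed before theme() on purpose: a complete preset theme replaces any theme settings added before it, so custom text sizes need to come afterwards. To try another preset, swap theme_bw() for theme_gray(), theme_classic(), or theme_minimal().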
Visualizing Model Fit
Looking at the graph, what can you say about the fit of this linear model?
- Answer
The data points are widely scattered around the line of best fit, indicating that there is a weak correlation between the Gini Ratio and % declining resident bird species. There seem to be some outliers that might be skewing this relationship (e.g., the state that has 10% declining resident bird species). These data are unlikely to fit the assumptions of a linear model without additional statistical consideration.
A male Kirtland's Warbler in a jack pine forest in Michigan, USA by snowmanradio is licensed under CC BY 2.0. Kirtland's Warblers are a near-threatened species that depends on jack pine habitat.
References:
Holland, T. G., Peterson, G. D., & Gonzalez, A. (2009). A cross-national analysis of how economic inequality predicts biodiversity loss. Conservation Biology, 23(5), 1304–1313. https://doi.org/10.1111/j.1523-1739.2009.01207.x
Hope, D., Gries, C., Zhu, W., Fagan, W. F., Redman, C. L., Grimm, N. B., Nelson, A. L., Martin, C., & Kinzig, A. (2003). Socioeconomics drive urban plant diversity. Proceedings of the National Academy of Sciences, 100(15), 8788–8792. https://doi.org/10.1073/pnas.1537557100
Kuras, E. R., Warren, P. S., Zinda, J. A., Aronson, M. F. J., Cilliers, S., Goddard, M. A., Nilon, C. H., & Winkler, R. (2020). Urban socioeconomic inequality and biodiversity often converge, but not always: A global meta-analysis. Landscape and Urban Planning, 198, 103799. https://doi.org/10.1016/j.landurbplan.2020.103799
Mikkelson, G. M., Gonzalez, A., & Peterson, G. D. (2007). Economic inequality predicts biodiversity loss. PLoS ONE, 2(5), e444.