R Practice: Using Linear Models to Determine the Relationship Between Biodiversity and Socioeconomic Status (Part II: Visualizing Linear Models)

Technical Learning Objective: In this module, students will gain a better understanding of how to create and interpret linear regression plots using R.

The Biodiversity Implications of Socioeconomic Inequality - Visualization

In the previous module, we used a linear model to test for a relationship between the Gini Ratio of income inequality, which ranges from 0 (perfect equality) to 1 (complete inequality), and the percentage of declining resident bird species in a state. In this follow-up module, we are going to learn how to effectively visualize the results of such a model.

Data from: Mikkelson, G. M., Gonzalez, A., & Peterson, G. D. (2007). Economic inequality predicts biodiversity loss. PLoS ONE, 2(5), e444.

Note: This is the second part of a two-part exercise. Please see Part I for guidance on interpreting linear model outputs.

Loading Our Packages and Data

First, let's load our packages! For this module, we will be using the tidyverse.

library(tidyverse) #First, let's load the tidyverse packages, which we will use for data wrangling and visualization.

bird <- read.csv(url("https://bio.libretexts.org/@api/deki/files/66201/GINI_BIRDS.csv?origin=mt-web"), sep = ',', header = TRUE)
#read.csv loads our data file, and assigning it to "bird" stores it in R's "brain" so that we can access it later

head(bird) # head() prints the first few rows of the dataset, so that we can check whether it loaded okay

We can also get a quick feel for our data using the summary command. This is a quick way, for example, to get an idea of the range of each of our data columns and to see whether there are any extreme outliers.

summary(bird) # Shows summary statistics (e.g., minimum, mean, and maximum) for each column in the dataset "bird"
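Alongside summary(), a few base R checks can confirm that the columns we plan to plot are numeric and have no missing values. The sketch below is only an illustration: bird_demo is a hypothetical stand-in data frame with simulated values (not the real data), borrowing the two column names used later in this module.

```r
# bird_demo is a hypothetical stand-in for the real "bird" data frame.
# Its values are simulated for illustration only.
set.seed(1)
bird_demo <- data.frame(
  GiniRatio_1969  = runif(20, min = 0.30, max = 0.45),
  DecResBird_1966 = runif(20, min = 10, max = 60)
)

str(bird_demo)                   # column names and data types
colSums(is.na(bird_demo))        # number of missing values in each column
range(bird_demo$GiniRatio_1969)  # smallest and largest value in one column
```

Running the same three commands on bird (instead of bird_demo) would perform these checks on the real dataset.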

The Components of a ggplot object

After we've loaded in our data and gotten a quick feel for it, we can begin to visualize our data. A ggplot object is made up of three components:

the data - the dataframe being plotted, which can be piped into the figure or defined within the ggplot command
the aesthetics (aes) - the aesthetics of the geometric object, such as color and shape, as well as the columns to be used for the x and y axes
the geometries (geom) - the type of plot (e.g. point, boxplot, histogram)
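To make the first component concrete, here is a minimal sketch of the two ways of supplying the data: piping it in, or naming it inside the ggplot command. Both should produce the same plot. The data frame bird_demo is a hypothetical stand-in with simulated values, used here only so that the example runs on its own.

```r
library(tidyverse)

# Hypothetical stand-in data (simulated values, not the real dataset)
set.seed(1)
bird_demo <- tibble(
  GiniRatio_1969  = runif(20, min = 0.30, max = 0.45),
  DecResBird_1966 = runif(20, min = 10, max = 60)
)

# Option 1: pipe the data into ggplot()
p_piped <- bird_demo %>%
  ggplot(aes(x = GiniRatio_1969, y = DecResBird_1966)) + # aesthetics: which columns go on x and y
  geom_point()                                           # geometry: draw points

# Option 2: define the data within the ggplot() command
p_arg <- ggplot(data = bird_demo, aes(x = GiniRatio_1969, y = DecResBird_1966)) +
  geom_point()

p_piped # printing the object draws the plot
```

Which option to use is largely a matter of style. Piping is convenient when the data are filtered or summarized just before plotting.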

The type of plot you use (the geom) will depend on what type of data you are working with. Here, we are looking for the relationship between two continuous variables, so we will be using a scatterplot. We know from our previous analysis (Part I: Exploring Linear Model Outputs) that the linear relationship between these two variables is borderline insignificant. Generally, you would only include a linear model line on a regression plot if the relationship is significant. We will add a linear model line here, however, for learning purposes.

The first step is to pipe our data into a plot. We do so by using the pattern:

name_of_data %>%
  ggplot(aes(x = independent_variable, y = dependent_variable))

In the case of our experiment, we are testing whether the % declining species depends on the Gini Ratio. We will therefore place the % declining species on the y-axis and the Gini Ratio on the x-axis.

IMPORTANT: As you write code, include a brief description of what each line does so that you can keep track of it all. This is called commenting. R treats everything after a # on a line as a comment and does not run it.

Your code should look something like this:

bird %>%
  ggplot(aes(x = GiniRatio_1969, y = DecResBird_1966)) #plotting two continuous variables, Gini Ratio and % declining resident bird species

You have now added data to your plot and set your aesthetics, but the plot is blank because you have not yet chosen a geom. Next, we need to add data points and a regression line. To add data points, we will use the function geom_point(). To add a regression line, we will use the function geom_smooth(method = "lm"). We set method = "lm" because we are fitting a linear model. The grey band surrounding the line shows the confidence interval around the fitted line (95% by default), which reflects the uncertainty in the model's estimate. Other model types can be added with geom_smooth (e.g., a generalized additive model), but they are beyond the goals of this module.

IMPORTANT: Before each addition to the ggplot command, you have to use a "+" rather than a pipe (%>%).

bird %>%
  ggplot(aes(x = GiniRatio_1969, y = DecResBird_1966)) + #plotting two continuous variables, Gini Ratio and % declining resident bird species
  geom_point() + # Creates a scatterplot 
  geom_smooth( method = "lm" ) # Adds a fitted line for the linear relationship between the two variables
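To connect this plot back to Part I, the sketch below checks that the line drawn by geom_smooth(method = "lm") is the same line we would get from fitting lm() ourselves. It uses bird_demo, a hypothetical stand-in data frame with simulated values, so the coefficients it prints say nothing about the real data. The dashed red line added with geom_abline() is drawn from the lm() coefficients and should sit exactly on top of the geom_smooth line.

```r
library(tidyverse)

# Hypothetical stand-in data (simulated values, not the real dataset)
set.seed(1)
bird_demo <- tibble(
  GiniRatio_1969  = runif(20, min = 0.30, max = 0.45),
  DecResBird_1966 = 20 + 50 * GiniRatio_1969 + rnorm(20, sd = 8)
)

# Fit the linear model directly, as in Part I
fit <- lm(DecResBird_1966 ~ GiniRatio_1969, data = bird_demo)
b <- unname(coef(fit)) # b[1] = intercept, b[2] = slope

p <- bird_demo %>%
  ggplot(aes(x = GiniRatio_1969, y = DecResBird_1966)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, level = 0.95) + # fitted line with a 95% confidence band
  geom_abline(intercept = b[1], slope = b[2],
              linetype = "dashed", color = "red")             # the same line, drawn from the lm() coefficients

p
```

Setting se = FALSE inside geom_smooth() would hide the grey band, and changing level would widen or narrow it.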

Awesome, the regression plot now shows all of the data we need!

Effective Data Visualization

There is still a long way to go to make our plot look nicer, however. Here are just a few ways to improve our visualization:

We can change the labels of the x and y axes to be more descriptive using xlab("Text") and ylab("Text"). Make sure to include units for your variables where relevant!
We can customize the text size for the axis titles. We do this within our plot's theme() call:

theme(axis.title.x = element_text(size = number), axis.title.y = element_text(size = number))

We can change the color or size of our points by setting the color and size within geom_point.
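One detail that often trips people up is where the color goes. The sketch below, which uses a hypothetical stand-in data frame (bird_demo, simulated values), contrasts setting a fixed color inside geom_point() with accidentally placing it inside aes().

```r
library(tidyverse)

# Hypothetical stand-in data (simulated values, not the real dataset)
set.seed(1)
bird_demo <- tibble(
  GiniRatio_1969  = runif(20, min = 0.30, max = 0.45),
  DecResBird_1966 = runif(20, min = 10, max = 60)
)

# Fixed appearance: arguments outside aes() apply to every point
p_fixed <- bird_demo %>%
  ggplot(aes(x = GiniRatio_1969, y = DecResBird_1966)) +
  geom_point(size = 2, color = "blue")

# Common slip: inside aes(), "blue" is treated as a data value,
# so ggplot2 assigns its own default color and adds a legend
p_mapped <- bird_demo %>%
  ggplot(aes(x = GiniRatio_1969, y = DecResBird_1966)) +
  geom_point(aes(color = "blue"), size = 2)

p_fixed
```

As a rule of thumb, put column names inside aes() and fixed values such as "blue" or 2 outside it.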

Note: Effective data visualization is a complex skill to learn! There are a lot of online resources to help you on this journey.

For a full list of colors in R, visit this Colors in R Cheatsheet
For more ways to make prettier plots, visit this Data Visualization in R Cheatsheet
To explore some additional types of plots in R and figure out which are best for your data, visit datatoviz

After you've run this code, try out some of your own changes by replacing the color blue and the size of the axis titles.

bird %>%
  ggplot(aes(x = GiniRatio_1969, y = DecResBird_1966)) + #plotting two continuous variables, Gini Ratio and % declining resident bird species
  geom_point(size = 2, color = "blue") + #adding points/creating a scatterplot and making these points larger (size = 2) and blue (color = "blue")
  geom_smooth(method = "lm") + #adds the best fit line for the linear model between the two variables
  ylab("Declining Resident Bird Species, 1966-2005 (%)") + #sets the y-axis label
  xlab("Gini Ratio of Family Income Inequality 1969") + #sets the x-axis label
  theme(axis.title.x = element_text(size = 14), ## Changes x-axis title font size to 14
        axis.title.y = element_text(size = 14)) ## Changes y-axis title font size to 14

Finally, let's add a theme to the regression plot. There are several set themes available for you to use, but for this regression let's use a simple black and white theme.

To do so, use the function: theme_bw()

Here are some other themes. Go ahead and try them out by changing and rerunning the code!
theme_gray()
theme_classic()
theme_minimal()

Here is the code again, including the theme.

bird %>%
  ggplot(aes(x = GiniRatio_1969, y = DecResBird_1966)) + #plotting two continuous variables, Gini Ratio and % declining resident bird species
  geom_point(size = 2, color = "blue") + #adding points/creating a scatterplot and making these points larger (size = 2) and blue (color = "blue")
  geom_smooth(method = "lm") + #adds the best fit line for the linear model between the two variables
  ylab("Declining Resident Bird Species, 1966-2005 (%)") + #sets the y-axis label
  xlab("Gini Ratio of Family Income Inequality 1969") + #sets the x-axis label
  theme_bw() + ## Applies the black and white theme; it goes before theme() because a complete theme resets earlier theme() settings
  theme(axis.title.x = element_text(size = 14), ## Changes x-axis title font size to 14
        axis.title.y = element_text(size = 14)) ## Changes y-axis title font size to 14
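The order in which themes are added matters. A complete theme such as theme_bw() replaces any theme() settings added before it, so custom tweaks are safest when they come after the complete theme. The short sketch below illustrates this by combining the two pieces in both orders and asking ggplot2 for the resulting x-axis title size. The object names (big_titles, th_kept, th_reset) are made up for this example.

```r
library(tidyverse)

# Our custom tweak: larger axis titles
big_titles <- theme(axis.title.x = element_text(size = 14),
                    axis.title.y = element_text(size = 14))

th_kept  <- theme_bw() + big_titles # complete theme first: our size 14 is kept
th_reset <- big_titles + theme_bw() # complete theme last: our size 14 is replaced by the theme_bw() default

# calc_element() resolves the final settings for one theme element
calc_element("axis.title.x", th_kept)$size  # 14
calc_element("axis.title.x", th_reset)$size # 11, the default base size of theme_bw()
```

The same logic applies when swapping in theme_gray(), theme_classic(), or theme_minimal(): add the complete theme first, then any theme() adjustments.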

Visualizing Model Fit

Looking at the graph, what can you say about the fit of this linear model?

Answer: The data points are widely scattered around the line of best fit, indicating that there is a weak correlation between the Gini Ratio and % declining resident bird species. There seem to be some outliers that might be skewing this relationship (e.g., the state that has 10% declining resident bird species). These data are unlikely to fit the assumptions of a linear model without additional statistical consideration.
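One common way to look more closely at model fit is a plot of residuals against fitted values. The sketch below is offered as an optional illustration, not as part of the original analysis. It uses bird_demo, a hypothetical stand-in data frame with simulated values, so the pattern it shows is not that of the real data. For a well-behaved linear model, the points should scatter evenly around the dashed zero line with no obvious curve or funnel shape.

```r
library(tidyverse)

# Hypothetical stand-in data (simulated values, not the real dataset)
set.seed(1)
bird_demo <- tibble(
  GiniRatio_1969  = runif(20, min = 0.30, max = 0.45),
  DecResBird_1966 = 20 + 50 * GiniRatio_1969 + rnorm(20, sd = 8)
)

# Fit the linear model and collect its fitted values and residuals
fit <- lm(DecResBird_1966 ~ GiniRatio_1969, data = bird_demo)
diag_demo <- tibble(
  fitted   = unname(fitted(fit)), # values predicted by the model
  residual = unname(resid(fit))   # observed minus predicted
)

p_resid <- diag_demo %>%
  ggplot(aes(x = fitted, y = residual)) +
  geom_point() +                                     # one point per observation
  geom_hline(yintercept = 0, linetype = "dashed") +  # reference line at zero
  xlab("Fitted values") +
  ylab("Residuals") +
  theme_bw()

p_resid
```

Replacing bird_demo with bird would apply the same check to the real data. Points far from the zero line would correspond to the potential outliers mentioned in the answer above.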

A male Kirtland's Warbler in a jack pine forest in Michigan, USA by snowmanradio is licensed under CC BY 2.0. Kirtland's Warblers are a near-threatened species that depends on jack pine habitat.

References:

Holland, T. G., Peterson, G. D., & Gonzalez, A. (2009). A cross-national analysis of how economic inequality predicts Biodiversity loss. Conservation Biology, 23(5), 1304–1313. https://doi.org/10.1111/j.1523-1739.2009.01207.x

Hope, D., Gries, C., Zhu, W., Fagan, W. F., Redman, C. L., Grimm, N. B., Nelson, A. L., Martin, C., & Kinzig, A. (2003). Socioeconomics Drive Urban Plant Diversity. Proceedings of the National Academy of Sciences, 100(15), 8788–8792. https://doi.org/10.1073/pnas.1537557100

Kuras, E. R., Warren, P. S., Zinda, J. A., Aronson, M. F. J., Cilliers, S., Goddard, M. A., Nilon, C. H., & Winkler, R. (2020). Urban socioeconomic inequality and biodiversity often converge, but not always: A global meta-analysis. Landscape and Urban Planning, 198. https://doi.org/10.1016/j.landurbplan.2020.103799

Mikkelson, G. M., Gonzalez, A., & Peterson, G. D. (2007). Economic inequality predicts biodiversity loss. PLoS ONE, 2(5), e444.