Biology LibreTexts

R Practice: Using Loops and Pattern Matching to Understand Colonial Histories of Species Names


     

    Technical learning objective: In this module, you will learn how to use loops to repeat processes in R and how to use grep to match patterns between two datasets (in this case, bird species names and European surnames).

     

    What's in a name?

    Trisos et al. (2021) explore the history of colonialism in the field of ecology and how we, as scientists, can make ecology a more inclusive space.

    The code below is a modification of one of their analyses, which focused on the number of bird species whose scientific names have a European influence. They found that the names of many bird species outside of Europe contain European surnames, and that this pattern was most prominent in areas formally colonized by European countries. Exploring links between European surnames and species names helps us to recognize the colonial nature of this field, and this recognition is a first step towards decolonization.

    One considerable issue is that these names often convey little ecological information, as compared to the indigenous names for these species, and create a barrier for indigenous involvement in local ecology. There is a movement in ecology to include indigenous names for species alongside their scientific names and, in some cases, to rename species with local ecological knowledge in mind.

    The authors list several approaches to more inclusive ecology: decolonizing access, decolonizing expertise, decolonizing your mind, practicing ethical and inclusive ecology, and knowing your histories. Language is a key consideration in this process because of the important role it plays in our understanding of the world.

    So, what can you do? You can learn more about decolonizing conservation here. Educating yourself on the issues and solutions is the first step towards making a difference.

    Below, we use a pattern-matching tool in R, grep, to compare two datasets, one with species names of birds and one with European surnames. We also learn about how to set up a "for loop" in R. Loops are used to repeat tasks multiple times across different rows or columns of a dataset and can be extremely useful for streamlining analyses. 
     

    Trisos, C. H., Auerbach, J., & Katti, M. (2021). Decoloniality and anti-oppressive practices for a more ethical ecology. Nature Ecology & Evolution, 5(9), 1205-1212.

     
    Loading Packages and Data

    Today, we'll be working with the package "tidyverse" to manipulate our data. We will be loading in two datasets, one of bird names and one of European surnames. Both are saved as csv (comma-separated values) files.

    #Let's first load the tidyverse package
    library(tidyverse)
    
    #We are going to read in our two data files and tell R that they have no column headings (header = FALSE).
    #sep = '\t' tells R that fields are separated by tabs, and stringsAsFactors = FALSE keeps the names as plain text.
    
    #Load bird species names (names follow Taxonomy of Jetz et al. 2012 http://birdtree.org/taxonomy/)
    birdnames <- read.csv(url("https://bio.libretexts.org/@api/deki/files/65125/bird_species_names.csv?origin=mt-web"), sep = '\t', header = FALSE, stringsAsFactors = FALSE)
    
    #Load list of surnames
    peoplenames <- read.csv(url("https://bio.libretexts.org/@api/deki/files/65127/european_names.csv?origin=mt-web"), sep = '\t', header = FALSE, stringsAsFactors = FALSE)
    
    #Now let's make sure our data loaded correctly by examining the first few rows with the function "head"
    head(peoplenames)
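
To see what header = FALSE does without downloading anything, here is a minimal sketch that reads a tiny made-up file from a text string. The three names in toy_text are placeholders chosen for illustration and are not the contents of european_names.csv.

```r
# Hypothetical three-line "file", standing in for a one-column data file
toy_text <- "Temminck\nGray\nGrauer"

# header = FALSE keeps the first line as data; R names the column V1
toy_names <- read.csv(text = toy_text, sep = "\t", header = FALSE,
                      stringsAsFactors = FALSE)

nrow(toy_names)      # number of rows (names) read in
colnames(toy_names)  # default column name assigned by R
toy_names[1, 1]      # first name, indexed as [row, column]

# For comparison: header = TRUE would use the first line as a column name,
# so one name would be lost from the data
toy_with_header <- read.csv(text = toy_text, sep = "\t", header = TRUE,
                            stringsAsFactors = FALSE)
nrow(toy_with_header)
```

If the number of rows is one fewer than you expect after loading a file, a header setting like this is a likely cause.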

     

    Learning About Loops and "grep"

    The goal of this module is to familiarize you with a "for" loop and with the function "grep", a pattern-matching function. The code below creates an empty table (a matrix, which we later convert to a "data frame"), then fills that table with two bits of information: the European surname of interest, and the number of bird species names that contain that surname.
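
Before working with the full datasets, it may help to try both tools on a toy example. The sketch below uses two short made-up vectors (toy_birds and toy_surnames are illustrative names, not part of the module's data) to show what grep returns and how a for loop repeats the search for each surname.

```r
# Made-up example vectors (not the module's datasets)
toy_birds <- c("Coracina graueri", "Graueria vittata", "Turdus merula")
toy_surnames <- c("Grauer", "Smith")

# With value = TRUE, grep() returns the matching elements themselves
grep("grauer", toy_birds, ignore.case = TRUE, value = TRUE)

# Without value = TRUE, grep() returns the positions of the matches
grep("grauer", toy_birds, ignore.case = TRUE)

# A for loop repeats the search once per surname and stores each count
toy_counts <- numeric(length(toy_surnames))
for (i in seq_along(toy_surnames)) {
  hits <- grep(toy_surnames[i], toy_birds, ignore.case = TRUE, value = TRUE)
  toy_counts[i] <- length(hits)
}
toy_counts
```

Here "Grauer" matches two of the three toy bird names and "Smith" matches none, so toy_counts holds 2 and 0.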

    #Step 1: Setting up the length of the loop
    #When we run a loop, we first want to figure out how many "iterations" the loop needs, or how many times it needs to run. In this case, we want the loop to run once for each European name in the "peoplenames" dataset. We can use the "nrow" function to count the rows of the peoplenames dataset, which is the number of names it contains.
    npeople = nrow(peoplenames)
    
    
    #Step 2: Creating a table for the loop to put data into
    #We also need to create an empty table where our for loop will add data. Here the table is a matrix, which we convert to a data frame in Step 5. It has 2 columns (one for surname, and one for bird species count) and 1,543 rows (the number of surnames in the dataset).
    matches = matrix(nrow = npeople, ncol = 2)
    
    
    #Step 3: Let's give the empty table some useful column names 
    #Here we use the "colnames" labelling function to tell R that the columns of our table will represent the European name and the number of bird species that match it, respectively.
    colnames(matches) = c("Name", "NMatches")
    
    
    #Step 4: Creating our loop
    #The "for" loop says "do the tasks below for each of these elements". For each run of the loop, "i" tells us which element we are using. So here, we start with element 1 (the first name on the list) and go through element 1,543 (the total number of surnames on the list).
    for (i in 1:npeople) {
      #The "grep" function searches through a vector of data (x) to find matches with a pattern; here, we are telling R to look for any instances of the ith surname (peoplenames[i,1]) within the list of bird names. Each time the loop runs, it repeats this search for the next surname on the list, until we've reached surname 1,543. We set two additional options: ignore.case = TRUE (matches are not case-sensitive) and value = TRUE (the function returns the matching names themselves rather than their positions; if there are no matches, the result has a length of 0).
      matching_birds = grep(peoplenames[i,1], unlist(birdnames), ignore.case=TRUE, value=TRUE)
      #This counts the number of matches using the "length" function
      number_matches = length(matching_birds)
      #In R, data are indexed as [row,column]. Here we are telling R to fill the table out, putting the ith surname in the ith row of column 1.
      matches[i,1] = peoplenames[i,1]
      #And, lastly, we are putting the number of matches for the ith surname in the ith row of column 2. Because a matrix can hold only one type of data, the count is stored as text alongside the surname.
      matches[i,2] = number_matches
    }
    
    
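    #Step 5: Reorder our dataframe
    data.frame(matches) %>% #here we convert the matrix to a data frame and use the "pipe" to direct it into the next function
      mutate(NMatches = as.numeric(as.character(NMatches))) %>% #the counts were stored as text in the matrix, so we convert them back to numbers; otherwise they would sort alphabetically (for example, "9" before "85")
      arrange(desc(NMatches)) #the arrange function allows us to sort our data by a column of interest (here, the number of bird species based on a given name, or NMatches) and to tell R if we want to sort ascending (from low to high) or descending (from high to low)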
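
The same counting task can also be written without an explicit loop. The sketch below is one possible alternative, not the approach used by Trisos et al.: count_matches is a hypothetical helper name, and the toy vectors stand in for the real datasets. It uses base R's vapply to apply grep to every surname and builds the data frame directly, so the counts stay numeric.

```r
# Hypothetical helper: count how many species names contain each surname
count_matches <- function(surnames, species) {
  n <- vapply(surnames, function(s) {
    length(grep(s, species, ignore.case = TRUE))
  }, integer(1), USE.NAMES = FALSE)
  # Building the data frame directly keeps NMatches as a numeric column
  data.frame(Name = surnames, NMatches = n, stringsAsFactors = FALSE)
}

# Toy inputs for illustration only
toy_birds <- c("Coracina graueri", "Bradypterus graueri", "Graueria vittata",
               "Pseudocalyptomena graueri", "Turdus merula")
toy_surnames <- c("Smith", "Grauer")

result <- count_matches(toy_surnames, toy_birds)

# Sort from most to fewest matches using base R
sorted <- result[order(-result$NMatches), ]
sorted
```

The for loop and this version do the same work; the loop makes each step visible, which is why it is a useful starting point for learning.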

     

    What did you find?

    Based on your outputs, which surnames show up most in bird species names? Using Google to help, what is the origin of each surname?

    Answer

    Dutch naturalist Coenraad Jacob Temminck

    British naturalist John Edward Gray 

     

     

    Case Study: Rudolf Grauer

    Rudolf Grauer was a German zoologist whose research focused on the Belgian Congo, a Belgian colony in Central Africa from 1908 until independence in 1960. Four bird species found in the Democratic Republic of the Congo are still named after Grauer today: Grauer's Cuckoo Shrike (Coracina graueri), Grauer's Swamp Warbler (Bradypterus graueri), Grauer's Warbler (Graueria vittata), and Grauer's Broadbill (Pseudocalyptomena graueri). A photo of Grauer's Broadbill is included below.
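
The four species above also show that a surname can appear in different parts of a scientific name. As a small sketch, the code below types in the four names from this paragraph (the vector grauer_birds is created here for illustration and is not read from the module's dataset, whose formatting may differ) and checks whether "grauer" appears in the genus or in the species epithet.

```r
# The four scientific names listed in the case study, typed in by hand
grauer_birds <- c("Coracina graueri", "Bradypterus graueri",
                  "Graueria vittata", "Pseudocalyptomena graueri")

# Split each name at the space into genus (first word) and epithet (second word)
parts <- strsplit(grauer_birds, " ")
genus <- sapply(parts, `[`, 1)
epithet <- sapply(parts, `[`, 2)

# grepl() is a relative of grep() that returns TRUE/FALSE for each element
in_genus <- grepl("grauer", genus, ignore.case = TRUE)
in_epithet <- grepl("grauer", epithet, ignore.case = TRUE)

data.frame(species = grauer_birds, in_genus, in_epithet)
```

Three of the names carry the surname in the epithet (graueri), while Graueria vittata carries it in the genus. Because grep looks for the pattern anywhere in the text, the loop in this module counts both kinds of match.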

    A photo of an adult Grauer's Broadbill, a small green bird in the Eurylaimidae family, in the foreground. The background contains a nest with two juvenile Grauer's Broadbills.

    Grauer's Broadbill Pseudocalyptomena graueri by Nik Borrow is licensed under CC BY-NC 2.0