Biology LibreTexts

Activity 1-3 - Genetic evolution and Identifying Homologs


Learning Objectives
  • Define what BLAST and BLASTp are and why they are used in evolutionary biology.
  • Explain the difference between homologous and analogous sequences.
  • Describe how protein conservation can reveal functional and evolutionary insights.
  • Recognize the strengths of BLASTp in detecting distant evolutionary relationships.
Note

Students should read:

  • The introductory text on BLAST and BLASTp provided in the chapter.
  • The example comparing jaw bones and ear bones in vertebrates.
  • Definitions of homologs vs. analogs.
  • Overview of how BLASTp works (breaking sequences into words, E-values, identity %).
Definition: Key Terms
  • BLAST (Basic Local Alignment Search Tool): A bioinformatics tool that finds regions of similarity between biological sequences.
  • BLASTp: A version of BLAST that compares protein sequences, useful for finding distant evolutionary relationships.
  • Homologs: Genes or proteins that share a common ancestor and often a similar function.
  • Analogs: Genes or proteins with similar function but different evolutionary origins (convergent evolution).
  • E-value (Expect Value): A statistical measure that estimates how many matches with an equal or better score would be expected by chance in a database of this size. Smaller = more significant.
  • Identity %: The percentage of amino acids that exactly match between two protein sequences.
  • Query Coverage: The percentage of your protein that aligns with a match in the database.
  • Functional Domain: A conserved region in a protein that is critical for its biological function.
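To make the E-value definition concrete, here is a minimal sketch based on the commonly cited relation E = m × n × 2^(−S'), where S' is the bit score, m is the query length, and n is the total length of the database. The lengths below are made-up illustration values, and the function ignores the effective-length corrections that real BLAST applies, so treat the numbers as approximate.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring at least `bit_score`.

    Uses E = m * n * 2**(-S'). This is a simplification: real BLAST
    adjusts the query and database lengths before applying it.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical sizes, for illustration only.
QUERY_LEN = 500          # residues in the query protein
DB_LEN = 100_000_000     # total residues in the database

for score in (30, 50, 100):
    e = expect_value(score, QUERY_LEN, DB_LEN)
    print(f"bit score {score:>3}: E = {e:.2e}")
```

Note that E can be larger than 1 for low scores, which is why it is read as an expected count of chance hits rather than as a probability.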

Exploring Protein Homology Using BLASTp

What is BLAST?

BLAST (Basic Local Alignment Search Tool) is a bioinformatics tool used to compare your sequence (DNA or protein) against the large sequence databases maintained by NCBI. It helps identify similar sequences, many of which are homologs that share a common evolutionary origin. There are several types of BLAST, including:

  • BLASTn compares nucleotide sequences (DNA or RNA).
  • BLASTp compares protein sequences (amino acids).

In this lab, we use BLASTp. Why? Because protein sequences change more slowly than the DNA that encodes them, BLASTp is better at recognizing distant evolutionary relationships. Amino acids also have different chemical properties, which the software takes into account when scoring alignments: even when the DNA changes, the protein may still “look” similar. Imagine a mutation changes a DNA codon, but the new codon still codes for the same amino acid (a synonymous change) or for a chemically similar one (like glutamic acid → aspartic acid, a conservative substitution). BLASTp scores these positions as identical or similar even though the nucleotide sequence looks different. BLASTn sees only the nucleotide mismatch and is more likely to miss the relationship.
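The difference between a nucleotide-level and a protein-level comparison can be sketched with a few codons. The codon assignments below are from the standard genetic code, and the three scores are entries from the BLOSUM62 substitution matrix; the function name and the "positive score means conservative" rule are assumptions made for this illustration, not how BLASTp itself labels substitutions.

```python
# Only the codons used in this example (standard genetic code).
CODON_TO_AA = {
    "GAA": "E",  # glutamic acid
    "GAG": "E",  # glutamic acid
    "GAT": "D",  # aspartic acid
    "GTA": "V",  # valine
}

# Three entries from BLOSUM62; a full matrix covers all amino acid pairs.
BLOSUM62_SUBSET = {("E", "E"): 5, ("E", "D"): 2, ("E", "V"): -2}


def describe_mutation(codon_before: str, codon_after: str):
    """Compare one codon change at the DNA level and at the protein level."""
    aa_before = CODON_TO_AA[codon_before]
    aa_after = CODON_TO_AA[codon_after]
    nt_matches = sum(a == b for a, b in zip(codon_before, codon_after))
    score = BLOSUM62_SUBSET[(aa_before, aa_after)]
    if aa_before == aa_after:
        kind = "synonymous"
    elif score > 0:  # illustrative rule: positive score = similar residues
        kind = "conservative"
    else:
        kind = "non-conservative"
    return aa_before, aa_after, nt_matches, score, kind


for mutated in ("GAG", "GAT", "GTA"):
    print("GAA ->", mutated, describe_mutation("GAA", mutated))
```

All three mutations look the same at the DNA level (two of three nucleotides match), but the protein-level score separates an unchanged residue, a chemically similar one, and a dissimilar one.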

Here’s an interesting example of protein homology and evolution: In mammals, the tiny bones in your middle ear (malleus, incus, and stapes) evolved from jaw bones in reptilian ancestors. The malleus and incus were once part of the jaw joint in early vertebrates. Over time, they shifted function and became part of the hearing apparatus in mammals. So if you compare the proteins involved in forming middle ear bones in humans and jawbones in reptiles, you might find homologous proteins—same evolutionary origin, different anatomical roles. This is a great example of how protein conservation gives us clues about evolutionary history.

  • Homologs: Sequences that are similar because they come from a common ancestor. (e.g., human and chimp hemoglobin).
  • Analogs: Sequences or structures that serve similar functions but evolved independently—a process called convergent evolution. (e.g., bat wings and insect wings).

How BLASTp Works

When you paste your protein into BLASTp, BLAST breaks the sequence into short segments (called "words") and scans the database for matching words. Once a match is found, it extends the alignment in both directions. It then reports the quality of each alignment using Identity (% of aligned amino acids that match exactly) and E-value (the number of equally good matches expected by chance; the smaller, the better). BLAST ranks and displays the best matches (candidate homologs) in three ways: a colorful graphic (where red = highest-scoring hits), a description table with identity %, E-value, and coverage, and the actual sequence alignments.
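The word-matching and extension steps described above can be sketched in a few lines. This is a deliberately simplified toy, not the BLAST implementation: it uses exact word matches only, a +1/−1 score instead of a substitution matrix, no gaps, and a hypothetical `x_drop` cutoff to decide when to stop extending. The sequences are invented for the example.

```python
def find_seeds(query, subject, word_size=3):
    """Return (query_pos, subject_pos) pairs where a query word occurs in the subject."""
    index = {}
    for i in range(len(query) - word_size + 1):
        index.setdefault(query[i:i + word_size], []).append(i)
    seeds = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, qpos, spos, word_size=3,
                match=1, mismatch=-1, x_drop=2):
    """Extend a seed without gaps in both directions.

    Extension in each direction stops once the running score falls
    x_drop below the best score seen, and is trimmed back to that best point.
    """
    def extend(qi, si, step):
        score = best = length = best_len = 0
        while 0 <= qi < len(query) and 0 <= si < len(subject):
            score += match if query[qi] == subject[si] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score >= x_drop:
                break
            qi += step
            si += step
        return best, best_len

    right_score, right_len = extend(qpos + word_size, spos + word_size, +1)
    left_score, left_len = extend(qpos - 1, spos - 1, -1)
    q_start, q_end = qpos - left_len, qpos + word_size + right_len
    s_start = spos - left_len
    total = word_size * match + left_score + right_score
    return total, query[q_start:q_end], subject[s_start:s_start + (q_end - q_start)]


query = "MKTAYIAKQR"          # toy sequences, not real proteins
subject = "GGMKTAYLAKQRWW"
seeds = find_seeds(query, subject)
print(seeds)
print(extend_seed(query, subject, *seeds[0]))
```

Starting from a single three-letter seed, the extension recovers the full ten-residue match and tolerates the one mismatch (I versus L) in the middle.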

Lab Protocol

  1. Find Your Protein
    • Go to NCBI Protein Database (https://www.ncbi.nlm.nih.gov/protein)
    • Search for your protein of interest (e.g., "sonic hedgehog").
    • Click on the result and look for the accession number (GenBank or RefSeq, e.g., NP_000257.2, BAA33523.2, etc.).
  2. Launch BLASTp
    • On the right side of the page, under “Analyze this sequence,” click "Run BLAST".
    • This sends your protein into the BLASTp query form automatically.
  3. Set Up Your Search
    • Confirm that BLASTp is selected (top left).
    • Under Organism, search for one species at a time (e.g., Arabidopsis thaliana, E. coli, etc.).
    • Click "BLAST" at the bottom to begin.
  4. Review Your Results. You’ll see three main sections:
    • Graphic Summary: Shows where the best hits align. Red bars mean highly similar regions.
      • The Graphic Summary is a visual snapshot showing how well other proteins in the database match your query protein. The position of each bar shows where in your protein the match occurred, and the color of the bar indicates how strong that match is. Colors are used to represent similarity scores: red bars indicate very strong matches (high similarity), orange or pink bars show moderate matches, green or blue bars reflect lower similarity, and black or gray bars represent weak or non-significant matches. For example, if you see a red bar for a chimpanzee protein and a green bar for a fruit fly protein, it suggests that the chimpanzee version is highly similar to your query protein (likely the same function), while the fruit fly version is more distantly related, possibly sharing only one conserved domain.
    • Descriptions Table: Includes Identity %, E-value (smaller is better), and Query Coverage.
      • The Descriptions Table gives you a detailed breakdown of each top “hit”, that is, each protein that had some level of similarity to your query. The Max Score is the score of the best-scoring aligned segment between your query and that protein, with higher scores indicating better matches. Query Coverage shows the percentage of your protein that aligned with the match; a higher coverage means a longer stretch of similarity. The E-value (or Expect value) is a critical measure: it estimates how many matches with a score this good you would expect to find by random chance in a database of this size. The lower the E-value, the more significant the match; values close to 0.0 suggest that the sequences are almost certainly homologous. Finally, the Percent Identity tells you what fraction of the aligned amino acids are exactly the same; values above 90% usually indicate that the proteins are nearly identical and probably have the same function in different organisms. For instance, if your human protein aligns with a mouse protein with 98% identity, 100% query coverage, and an E-value of 0.0, the two proteins are almost identical and likely perform the same function in both species.
    • Alignments: Detailed comparison of your protein to others.
      • The Alignments section lets you dig into the exact amino acid-by-amino acid comparison between your protein and the matching protein. It shows whether the amino acids are exactly the same (identical residues) or if they are biochemically similar (conservative substitutions), and highlights regions that match. This is where you can see not only that proteins are similar but also which parts are similar—which is key to understanding whether functional domains are conserved. This detailed view helps you ask deeper biological questions: for example, if only the middle part of the protein aligns across species, maybe that region is a conserved domain important for binding DNA or ATP. If the alignment only occurs in one part of the protein, it might not be the entire protein that’s conserved—just a functional part. This is common with proteins that share functional domains but are otherwise unrelated.
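The Identity % and Query Coverage values discussed in this step can be computed by hand from a single alignment. The sketch below assumes one gapped pairwise alignment given as two equal-length strings with "-" for gaps; the function name and the toy sequences are hypothetical. BLAST's reported query coverage can combine several aligned segments, so this single-alignment version is a simplification.

```python
def alignment_stats(aligned_query: str, aligned_subject: str, query_length: int):
    """Percent identity and query coverage for one gapped alignment."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    columns = len(aligned_query)
    # A column counts as identical when both residues match and neither is a gap.
    identical = sum(q == s and q != "-"
                    for q, s in zip(aligned_query, aligned_subject))
    # Query residues that take part in the alignment (gaps excluded).
    query_residues = sum(q != "-" for q in aligned_query)
    return {
        "identity_pct": 100.0 * identical / columns,
        "query_coverage_pct": 100.0 * query_residues / query_length,
    }


# Toy alignment: 10 query residues aligned out of a 20-residue query.
stats = alignment_stats("MKTAYIAK-QR", "MKTAYLAKGQR", query_length=20)
print(stats)
```

Here identity is counted over all alignment columns, including the gap column, while coverage depends only on how much of the query took part in the alignment. A short, nearly identical match can therefore have high identity but low coverage, which is the pattern expected when only one domain is conserved.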

Record Your Observations

Use the table below to document your results from different clades (groups of organisms). For each species, note the bar color, % identity, and E-value. Then hypothesize the function of the protein in that organism. (Look up the known function of your protein in each organism. If the function is unknown, propose your own hypothesis.)

Clade / Organism (NCBI taxonomy ID) | Bar Color | Identity % | E-value | Hypothesized Function
Primates (9443) | | | |
Marsupials (9263) | | | |
Monotremes (9255) | | | |
Birds (8782) | | | |
Lizards (8504) | | | |
Amphibians (8292) | | | |
Fishes (117569) | | | |
Fruit Flies (7211) | | | |
Sea Urchins (7625) | | | |
Sponges (6040) | | | |
Arabidopsis (3702) | | | |
Yeast (4932) | | | |
E. coli (562) | | | |

Reflection Questions

  1. What organisms had the most conserved (similar) version of your protein?
  2. How do the protein’s functions vary between simpler organisms (like bacteria) and more complex ones (like mammals)?
  3. Can you identify a trend in protein conservation across evolution?
  4. What does this say about the importance of your protein?

Example BLASTp Results Table

Query protein: Human MYH7 (Myosin heavy chain 7)
Function in humans: Part of the motor protein complex for cardiac and skeletal muscle contraction.

Clade / Organism (NCBI taxonomy ID) | Bar Color | Identity % | E-value | Hypothesized Function
Primates (9443) | Red | 99% | 0.0 | Muscle contraction in heart and limbs
Marsupials (9263) | Red | 96% | 0.0 | Skeletal and cardiac muscle contraction
Monotremes (9255) | Red | 94% | 0.0 | Same: contractile protein in heart and skeletal muscle
Birds (8782) | Red | 85% | 2e-100 | Flight and leg muscle contraction
Lizards (8504) | Red | 83% | 5e-90 | Muscle movement (locomotion, tail movement)
Amphibians (8292) | Orange | 78% | 3e-80 | Swimming muscle function; limb movement
Fishes (117569) | Orange | 70% | 2e-60 | Swimming muscle movement; tail fin muscle control
Fruit Flies (7211) | Green | 35% | 0.003 | Muscle contraction in wings and legs; partial homolog
Sea Urchins (7625) | Green | 32% | 0.4 | Tentacle or tube foot movement (using actomyosin system)
Sponges (6040) | Gray | 18% | 10 | No true muscles; possible ancient cytoskeletal role
Arabidopsis (3702) (Plant) | Gray | 20% | 8 | Cytoplasmic streaming (actin-myosin-like transport system)
Yeast (4932) | Gray | 22% | 6 | Organelle transport, cell division via actin-myosin system
E. coli (562) | Black | 5% | >100 | No homolog found; prokaryotes lack myosin-like proteins

Example Observations:

  • High similarity in vertebrates, suggesting MYH7’s role is critical in muscle function and has been conserved.
  • Fruit flies and sea urchins have partial matches—they may use similar motor proteins for movement.
  • Plants and fungi don’t have muscles, but they still use actin-myosin-like systems for internal transport.
  • Bacteria (like E. coli) don’t have homologs—makes sense since they don’t use motor proteins in the same way.
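The grouping in these observations can be reproduced from the example table with a short script. The rows are copied from the table; the two E-value cutoffs are hypothetical thresholds chosen for this sketch, not values prescribed by BLAST or by this activity, and the ">100" entry is treated as a lower bound of 100.

```python
# Rows from the example table: (clade, identity %, E-value as written).
EXAMPLE_HITS = [
    ("Primates", 99, "0.0"), ("Marsupials", 96, "0.0"),
    ("Monotremes", 94, "0.0"), ("Birds", 85, "2e-100"),
    ("Lizards", 83, "5e-90"), ("Amphibians", 78, "3e-80"),
    ("Fishes", 70, "2e-60"), ("Fruit Flies", 35, "0.003"),
    ("Sea Urchins", 32, "0.4"), ("Sponges", 18, "10"),
    ("Arabidopsis", 20, "8"), ("Yeast", 22, "6"),
    ("E. coli", 5, ">100"),
]

# Hypothetical cutoffs for this sketch; choose your own for real analyses.
SIGNIFICANT_E = 1e-5
BORDERLINE_E = 1e-2


def parse_evalue(text: str) -> float:
    """Convert a table entry such as '2e-100' or '>100' to a float."""
    return float(text.lstrip(">"))


def classify(evalue: float) -> str:
    if evalue < SIGNIFICANT_E:
        return "significant"
    if evalue < BORDERLINE_E:
        return "borderline"
    return "not significant"


def summarize(hits):
    """Group clade names by the significance class of their E-value."""
    groups = {"significant": [], "borderline": [], "not significant": []}
    for clade, _identity, evalue_text in hits:
        groups[classify(parse_evalue(evalue_text))].append(clade)
    return groups


for label, clades in summarize(EXAMPLE_HITS).items():
    print(f"{label}: {', '.join(clades)}")
```

With these cutoffs the vertebrate clades fall in the significant group, fruit flies are borderline, and the remaining organisms are not significant, matching the pattern described in the observations.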

Post-Lecture Objectives
  • Use BLASTp to analyze a protein’s conservation across different species.
  • Interpret E-values, bar colors, and identity % to assess evolutionary relatedness.
  • Hypothesize how conserved proteins may function differently in different organisms.
  • Construct a hypothetical evolutionary timeline for a protein’s role across clades.
Reflection Questions
  1. Which species had the most conserved version of your protein? Why might this be?
  2. How did the protein’s hypothesized function change across evolutionary time?
  3. Were there any unexpected matches? What might that say about functional conservation?
  4. What might be the evolutionary advantage of retaining this protein function?
  5. Which of the following best describes a homologous protein?
    • A) Same function, different structure, no evolutionary link
    • B) Similar sequence and common ancestry (Correct Answer)
    • C) Completely different function and origin
    • D) Protein found only in prokaryotes
  6. What does a red bar in the BLASTp graphic summary indicate?
    • A) Low similarity, possible analog
    • B) High identity and strong alignment (Correct Answer)
    • C) Only the protein's N-terminal matched
    • D) Sequence was not found

This page titled Activity 1-3 - Genetic evolution and Identifying Homologs is shared under a CC BY-NC 4.0 license and was authored, remixed, and/or curated by Victor Pham.
