Biology LibreTexts

1: Ortholog/Paralog Lab

    For this assignment you will turn in your alignment, your tree, the names of a pair of orthologs, the names of a pair of paralogs, and a short description of the significance of the gene you chose. You will also include an "Appendix" with notes on each step: anything that happened that was not described in the instructions, and anything interesting you noticed.

    INSTRUCTIONS

    Getting your set of sequences

    1. Get your “bait” sequences.
      1. Go to UniProt and search for the name of your favorite gene (YFG) plus a model species (for example, if you are ultimately interested in centipede genes, you would enter Drosophila). Read a bit about your gene and scroll down to FASTA. Click on this and it will take you to the FASTA-formatted protein sequence.
        1. If there is more than one “version” of YFG for that species, get the FASTA for all of them; they are likely homologs. For example, looking at Dlx (the vertebrate Distal-less genes), I find 5 versions in humans, so I would get the FASTA for each of these. Put them all in the same text document.
    2. Get your “model” sequences. Follow the same instructions for your other two (or three) models. If you are having trouble selecting your models, you can ask me for help! Add these to the FASTA text document.
    3. Get your “test” sequences.
      1. BLAST your “bait” sequence with NCBI blastp. Limit your search to your test species; it helps if you know the scientific name (Wikipedia has these). Choose the sequences with an E-value less than 1e-10. If there are too many, just pick the top 5. If there aren’t any with an E-value this low, pick the top 3. Add these to the FASTA text document.
    4. Get your “outgroup” sequence.
      1. This should be a sequence that is similar to YFG but is not a member of the YFG gene family. To find one, go back to UniProt and click on BLAST. Paste in your bait sequence, choose the UniRef50 database from the pull-down menu, and click run. Wait.
      2. Scroll down the results list until you start seeing gene names that differ from YFG. Click on a high-scoring one of these, get the FASTA sequence, and add it at the very top of your FASTA text document.
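
    The hit-selection rule in step 3 (E-value below 1e-10, at most 5 hits, otherwise the top 3) can be sketched in a few lines of Python. This is only an illustration of the rule, not part of the lab workflow: the `Hit` record, the function name, and the accession labels and E-values are all made up for the example, and you would normally read the values off the blastp results page by hand.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    accession: str  # hypothetical label for a BLAST hit
    evalue: float   # E-value reported for that hit


def select_test_hits(hits, cutoff=1e-10, max_good=5, fallback=3):
    """Keep hits with E-value < cutoff (at most max_good of them).

    If no hit passes the cutoff, fall back to the `fallback` best hits.
    """
    ranked = sorted(hits, key=lambda h: h.evalue)  # smallest E-value first
    good = [h for h in ranked if h.evalue < cutoff]
    if good:
        return good[:max_good]
    return ranked[:fallback]


# Invented example values, not real BLAST output.
hits = [Hit("hitA", 3e-45), Hit("hitB", 2e-12), Hit("hitC", 0.004), Hit("hitD", 1e-8)]
chosen = select_test_hits(hits)
print([h.accession for h in chosen])  # ['hitA', 'hitB']
```

    Sorting first means the same function covers both cases: the "top" hits are always the ones with the smallest E-values.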

    Making your tree

    The next step is to align your sequences and make a tree. Aligning sequences places the most similar parts of each sequence in vertical columns. This makes it easier to see at a glance how similar the sequences really are, and you can sometimes spot features such as conserved domains. Alignments are also the input for statistical algorithms that score similarity and infer phylogenetic relationships.
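
    To make the idea of "scoring similarity from columns" concrete, here is a minimal sketch that computes percent identity between two aligned sequences and lists fully conserved columns. It is a simplification for illustration (tree-building programs use substitution models, not raw identity), and the three short sequences are invented, with `-` standing for a gap.

```python
def percent_identity(a, b):
    """Percent of columns with identical residues, skipping columns with a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def conserved_columns(alignment):
    """Indices of columns where every sequence has the same residue and no gap."""
    return [
        i
        for i, column in enumerate(zip(*alignment))
        if "-" not in column and len(set(column)) == 1
    ]


# Invented toy alignment: three sequences, seven columns.
aln = ["MKT-LLV", "MKTALLI", "MRT-LLV"]
print(round(percent_identity(aln[0], aln[1]), 1))  # 83.3
print(conserved_columns(aln))                      # [0, 2, 4, 5]
```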

    1. Paste your FASTA-formatted sequences into https://www.ebi.ac.uk/Tools/msa/muscle/ and choose Pearson/FASTA as your output format. The output format matters because the next program needs a file it can read.
    2. The results page has a bunch of different options. Save the basic text version and then click around to visualize your alignment in different ways. Do you see anything interesting? Any patterns?
    3. Upload this “alignment file” to http://iqtree.cibiv.univie.ac.at and click submit job. By default, IQ-TREE will test different models of molecular evolution on your data and see which one fits best. We aren’t going to use super fancy models, so we don’t need to add in things like a gamma distribution or free rate heterogeneity. While IQ-TREE runs, you can ponder the difference between paralogs and orthologs and/or start writing up a description of your gene and of what happened at each step you took to find out about its genetic complexity in your non-model organism.
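
    If you are curious what "see which one fits" means in step 3: model-testing programs typically compare candidate models with an information criterion such as BIC, which rewards a high likelihood but penalizes extra parameters, and the lowest score wins. The sketch below assumes BIC is the criterion (as far as I know this is the IQ-TREE default, but check the log of your own run). The model names, log-likelihoods, and parameter counts are invented placeholders, not real output.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


# Invented placeholder values: (log-likelihood, number of free parameters).
candidates = {
    "modelA": (-5230.4, 25),
    "modelB": (-5210.9, 26),
    "modelC": (-5209.8, 45),
}
n_sites = 320  # hypothetical number of alignment columns

scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)
print(best)  # modelB
```

    Note that modelC has the highest likelihood in this toy example but still loses, because its extra parameters cost more than the small gain in fit.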
    [Figure: EvoDevo Tree.png]

    Sample output by Hyung Joo Kim and Kinsei Imada

    This tree seeks to find out whether the Drosophila Distalless-family gene INDY (I'm Not Dead Yet) has a homolog in a related insect, Folsomia candida (FOLCA).

    Human paralogs are boxed in red. The closest relative in this tree to Drosophila INDY (XP_009059854.1) is a molluscan gene (orthologs boxed in red). The FOLCA gene falls within the same clade as Drosophila INDY, suggesting that it might be an INDY homolog.

    In blue are human and Drosophila representatives of an ancestral gene duplication event.
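
    The reasoning used to read a tree like this one can be sketched as a small program. The version below uses the "species overlap" heuristic: find the last common ancestor of two genes, and if the two branches below it share any species, treat that node as a duplication (paralogs); otherwise treat it as a speciation (orthologs). This is one common rule of thumb, not something IQ-TREE does for you, and it assumes a rooted tree with unique leaf labels. The tree, the `SPECIES|gene` label format, and the gene names are all hypothetical.

```python
def leaves(tree):
    """All leaf labels under a node; a leaf is a string like 'SPECIES|gene'."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def species(tree):
    """Set of species names found among the leaves under a node."""
    return {leaf.split("|")[0] for leaf in leaves(tree)}


def classify(tree, a, b):
    """Call two leaves 'orthologs' or 'paralogs' using species overlap."""
    all_leaves = leaves(tree)
    if a == b or a not in all_leaves or b not in all_leaves:
        raise ValueError("a and b must be two different leaves of the tree")
    node = tree
    while True:
        # Descend while a single child still contains both leaves.
        shared = [c for c in node if a in leaves(c) and b in leaves(c)]
        if not shared:
            break
        node = shared[0]
    # node is now the last common ancestor of a and b.
    side_a = next(c for c in node if a in leaves(c))
    side_b = next(c for c in node if b in leaves(c))
    return "paralogs" if species(side_a) & species(side_b) else "orthologs"


# Hypothetical rooted tree: one duplication before the human/fly split.
tree = (
    (("HUMAN|geneA1", "FLY|geneA1"), ("HUMAN|geneA2", "FLY|geneA2")),
    "OUT|outgroup",
)
print(classify(tree, "HUMAN|geneA1", "FLY|geneA1"))    # orthologs
print(classify(tree, "HUMAN|geneA1", "HUMAN|geneA2"))  # paralogs
```

    In this toy tree, human geneA1 and fly geneA2 also come out as paralogs, because their last common ancestor is the duplication node even though they are in different species.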