Biology LibreTexts

16.1: Introduction


    FASTA Format

    Biological sequences are passed to software in a standardized plain-text format called FASTA, which can be opened in any text editor (TextEdit, Notepad, Vim, TextWrangler, etc.). Nucleic acids (DNA and RNA) are written with single-letter nucleotide codes (A, T, C, G, and U for RNA), and proteins are written with single-letter amino acid codes (one for each of the 20 standard amino acids). Each FASTA entry begins with a title line that starts with a > character, followed by descriptive information about the sequence, such as its name. The sequence itself follows on the next line or lines. A FASTA file can contain multiple entries, each starting on a new line with its own title line beginning with >.


    Example FASTA File

    > Made-up nucleic acid sequence
    ATATAGGGATTAGGATTAGAGGATAGAGGGGATTGCGCCG
    > Another nucleic acid sequence in the same file
    GGGTCGGGCTAGCGGAATCGGATTCGGCATTCGGATATTCGGATTCGGAT
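
    The title-line and sequence-line structure shown in this example can be read with a few lines of Python. The following is a minimal sketch, not a reference implementation: the function name parse_fasta is hypothetical, and it assumes that every title is unique and that a sequence may be split across several lines.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return a {title: sequence} mapping for FASTA-formatted lines.

    Assumption (not from the text): titles are unique. A repeated
    title would overwrite the earlier entry.
    """
    records: Dict[str, str] = {}
    title = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between entries
        if line.startswith(">"):
            # A title line opens a new entry; drop the ">" marker.
            title = line[1:].strip()
            records[title] = ""
        elif title is not None:
            # Sequence lines are appended to the current entry.
            records[title] += line
    return records


example = """\
> Made-up nucleic acid sequence
ATATAGGGATTAGGATTAGAGGATAGAGGGGATTGCGCCG
> Another nucleic acid sequence in the same file
GGGTCGGGCTAGCGGAATCGGATTCGGCATTCGGATATTCGGATTCGGAT
"""

records = parse_fasta(example.splitlines())
for title, sequence in records.items():
    print(title, len(sequence))
```

    To read from disk instead, an open file object can be passed directly, for example parse_fasta(open("sequences.fasta")), where the file name is only a placeholder.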


    FASTA files are plain text but usually carry an extension that identifies them as sequence files: .fasta, .fa, .fna, or even .txt.

    A list of single-letter codes for nucleic acids follows below:

    Nucleic Acid Code  Meaning                             Mnemonic
    A                  A                                   Adenine
    C                  C                                   Cytosine
    G                  G                                   Guanine
    T                  T                                   Thymine
    U                  U                                   Uracil
    R                  A or G                              puRine
    Y                  C, T, or U                          pYrimidine
    K                  G, T, or U                          bases which are Ketones
    M                  A or C                              bases with aMino groups
    S                  C or G                              Strong interaction
    W                  A, T, or U                          Weak interaction
    B                  not A (i.e., C, G, T, or U)         B comes after A
    D                  not C (i.e., A, G, T, or U)         D comes after C
    H                  not G (i.e., A, C, T, or U)         H comes after G
    V                  neither T nor U (i.e., A, C, or G)  V comes after U
    N                  A, C, G, T, or U                    Nucleotide
    X                  masked
    -                  gap of indeterminate length
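
    The ambiguity codes listed above can be expressed as a lookup from each code to the set of bases it stands for. The sketch below is illustrative only: the names IUPAC_BASES and matches are hypothetical, U is treated as equivalent to T so that DNA and RNA compare alike, and the mask (X) and gap codes are deliberately left out, so they match nothing.

```python
# Each code maps to the set of concrete bases it may represent.
# Assumption: U is folded into T; X and the gap code are not handled.
IUPAC_BASES = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"}, "U": {"T"},
    "R": {"A", "G"},            # puRine
    "Y": {"C", "T"},            # pYrimidine
    "K": {"G", "T"},            # Ketone
    "M": {"A", "C"},            # aMino
    "S": {"C", "G"},            # Strong interaction
    "W": {"A", "T"},            # Weak interaction
    "B": {"C", "G", "T"},       # not A
    "D": {"A", "G", "T"},       # not C
    "H": {"A", "C", "T"},       # not G
    "V": {"A", "C", "G"},       # neither T nor U
    "N": {"A", "C", "G", "T"},  # any nucleotide
}


def matches(pattern: str, sequence: str) -> bool:
    """Return True if every base in sequence is allowed by the
    ambiguity code at the same position in pattern."""
    if len(pattern) != len(sequence):
        return False
    bases = sequence.upper().replace("U", "T")
    return all(
        base in IUPAC_BASES.get(code, set())
        for code, base in zip(pattern.upper(), bases)
    )


print(matches("RY", "AC"))  # a purine followed by a pyrimidine
print(matches("RY", "CA"))  # a pyrimidine followed by a purine
```

    Storing sets rather than strings keeps the membership test explicit and makes it easy to combine codes with set operations later.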

    Graphical Sequence Manipulation

    The bioinformatics exercises described here use Unipro UGENE, a free and open-source software package.


    This page titled 16.1: Introduction is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Bio-OER.
