Module 4.2: Primary Structure


Learning Objectives

• Distinguish between primary, secondary, tertiary and quaternary structure.
• Predict the result of treating a protein with different cleavage reagents.
• Generate the primary sequence from sequencing data.

Overview of Protein Structure

A protein is composed of amino acids attached in a linear order. This basic level of protein structure is called its primary structure and derives from the formation of peptide bonds between the individual amino acids. Each amino acid in the linear polymer is referred to as a residue. The order, or sequence, of the amino acids is determined by information encoded in the cell's genes. An example of a protein sequence is shown below, where one-letter abbreviations are used for each of the 20 amino acids used in cellular protein synthesis.

Amino acid sequence of Human Estrogen Receptor: Amino acids are indicated using the single letter code.
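The estrogen receptor sequence above uses single-letter codes, while the sequencing examples later in this module use three-letter codes. Below is a minimal Python sketch for converting between the two, using the standard abbreviations for the 20 amino acids; the helper names `to_three_letter` and `to_one_letter` are hypothetical, chosen only for illustration.

```python
# Standard one-letter -> three-letter codes for the 20 amino acids
# used in cellular protein synthesis.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
# Reverse lookup built from the same table.
THREE_TO_ONE = {three: one for one, three in ONE_TO_THREE.items()}


def to_three_letter(seq: str) -> str:
    """Convert a one-letter sequence (N- to C-terminus) to hyphenated three-letter form."""
    return "-".join(ONE_TO_THREE[aa] for aa in seq.upper())


def to_one_letter(seq: str) -> str:
    """Convert a hyphenated three-letter sequence back to one-letter form."""
    return "".join(THREE_TO_ONE[res] for res in seq.split("-"))


print(to_three_letter("SMGAFRLI"))  # Ser-Met-Gly-Ala-Phe-Arg-Leu-Ile
print(to_one_letter("Ala-Gly-Met-Ser-Thr-Gly"))  # AGMSTG
```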

Higher order structure is determined by the Primary Structure

Proteins do not exist as linear threads in cells but rather fold spontaneously into higher-order structures. The higher-order structure is determined by the amino acids in the primary structure. Usually the sequence alone is sufficient to specify the folded structure, but some proteins require chaperones to help them fold.

The stages or levels of protein structure are:

• Primary Structure: The amino acid sequence of the protein, with no regard for the conformation of the amino acids.
• Secondary Structure: Local interactions involving only mainchain (also known as backbone) atoms, resulting in α-helices and β-sheets. Mainchain atoms are the N-Cα-C=O atoms that form the backbone of the protein polymer.
• Tertiary Structure: Long-range interactions resulting in the 3-D folding of a single polypeptide chain.
• Quaternary Structure: The interaction of two or more polypeptide chains to make a functional protein. For example:
  • a homodimer contains two identical chains, represented as $$\alpha_{2}$$
  • a homotrimer contains three identical chains, represented as $$\alpha_{3}$$
  • a heterodimer contains two different chains, represented as $$\alpha\beta$$
  • a heterotrimer can contain two identical chains (e.g. $$\alpha_2\beta$$) or three different chains, as in $$\alpha\beta\gamma$$
  • a heterotetramer often contains two pairs of identical chains, such as in $$\alpha_2\beta_2$$, but can contain four different chains, e.g. $$\alpha \beta \gamma \delta$$
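The subunit notation in the list above can be generated mechanically from a list of chains. This is a minimal Python sketch, assuming each chain is labeled by its type (e.g. "α", "β"); the functions `subunit_composition` and `classify` are hypothetical helpers, and ordinary digits stand in for the subscripts.

```python
from collections import Counter


def subunit_composition(chains: list[str]) -> str:
    """Composition notation, e.g. ["α", "α", "β", "β"] -> "α2β2".

    A count of 1 is written without a number (as in αβ or α2β); plain digits
    stand in for the subscripts used in the text.
    """
    counts = Counter(chains)  # keeps the first-seen order of chain types
    return "".join(name if n == 1 else f"{name}{n}" for name, n in counts.items())


def classify(chains: list[str]) -> str:
    """Name the assembly, e.g. "homodimer" or "heterotetramer"."""
    if len(chains) < 2:
        return "monomer (no quaternary structure)"
    # One distinct chain type -> homo-; more than one -> hetero-.
    prefix = "homo" if len(set(chains)) == 1 else "hetero"
    sizes = {2: "dimer", 3: "trimer", 4: "tetramer"}
    return prefix + sizes.get(len(chains), "-oligomer")


chains = ["α", "α", "β", "β"]
print(subunit_composition(chains), classify(chains))  # α2β2 heterotetramer
```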


Example - Structure Hierarchy in Hemoglobin

Consider the oxygen transport protein, hemoglobin, which consists of heme groups, responsible for binding oxygen, and a protein component. Hemoglobin looks complicated, but we can understand its structure using a hierarchical description.

Primary Structure is the sequence of amino acids. Hemoglobin has four separate polypeptide chains, each with its own sequence, read from the amino terminus of the chain.

Secondary Structure describes the local structure of just the mainchain atoms. Each subunit of hemoglobin contains a number of α-helical secondary structural elements.

Tertiary Structure is the complete description of the structure of both the mainchain and sidechain atoms of one polypeptide chain, such as a single subunit of hemoglobin. The tertiary structure is built up from secondary structural elements.

Quaternary Structure is the complete description of the structure of all of the polypeptide chains that make up the functional molecule; for hemoglobin, this is the arrangement of its four chains (two α and two β chains, $$\alpha_2\beta_2$$, in adult human hemoglobin). The quaternary structure is also built up from secondary structural elements.

Which of the following levels of structure describes only the local structure of the mainchain atoms?

a. Primary

b. Secondary

c. Tertiary

hint

The two key words in this question are local and structure.

b. (Primary structure is just the chemical structure of the protein, i.e. the order of the amino acids; tertiary structure also specifies the structure of the sidechain atoms.)

Determining Primary Structure

We will focus on N-terminal sequencing of the protein itself using Edman degradation. Larger proteins may first have to be fragmented. Note that protein sequences can also be inferred from the DNA sequence or determined experimentally using mass spectrometry.

Edman Degradation: The detailed chemical mechanism of Edman degradation will not be discussed here; an overview of the chemistry is as follows.

The protein is treated with phenyl isothiocyanate (PITC), which reacts with the free amino terminus to give a derivatized protein. The modified amino-terminal residue can then be cleaved off, leaving the rest of the protein intact but one residue shorter and yielding, after conversion, the phenylthiohydantoin (PTH) derivative of the amino-terminal amino acid. The PTH derivative is analyzed to identify the original amino-terminal amino acid. Repeating the cycle identifies the second amino acid of the original peptide, and so on. Under optimal conditions it is possible to determine the first 80-100 residues of a protein.
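The repeated cycle can be sketched in Python as an idealized simulation. It assumes every cycle works perfectly and that the peptide sequence is already known, so it illustrates only the bookkeeping (remove, identify, repeat), not the chemistry; the function name `edman_sequence` and the `max_cycles` cap, which stands in for the practical limit on the number of cycles, are hypothetical.

```python
def edman_sequence(peptide: list[str], max_cycles: int) -> list[str]:
    """Idealized Edman degradation on a peptide given as three-letter codes (N- to C-terminus).

    Each loop iteration models one cycle: PITC labels the free amino terminus,
    the labeled residue is cleaved off and identified (as its PTH derivative),
    and the remaining peptide, one residue shorter, enters the next cycle.
    Every cycle is assumed to be 100% efficient.
    """
    remaining = list(peptide)  # copy so the caller's list is not modified
    identified = []
    for _ in range(max_cycles):
        if not remaining:  # every residue has already been removed
            break
        identified.append(remaining.pop(0))  # remove and "identify" the N-terminal residue
    return identified


peptide = "Ala-Gly-Met-Ser-Thr-Gly-Val-Val-Lys-Gly-Ser-Ala-Phe-Leu".split("-")
print(edman_sequence(peptide, max_cycles=6))
# ['Ala', 'Gly', 'Met', 'Ser', 'Thr', 'Gly']
```

With `max_cycles=6` this reproduces the six-cycle reads assumed in the sequence determination example later in this section.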

Sequencing Long Proteins: It is generally not possible to sequence an entire protein from the amino terminus. To extend the sequence information, the protein is fragmented into smaller peptides. After cleavage, the individual peptide fragments are separated from each other, and each is independently subjected to N-terminal sequencing using the Edman degradation method. Three common fragmentation reactions are:

Cyanogen bromide (CNBr) cleaves the peptide bond after methionine residues. As an example:

$$Ser-Met-Gly-Ala-Phe-Arg-Leu-Ile \stackrel{CNBr}{\longrightarrow} Ser-Met + Gly-Ala-Phe-Arg-Leu-Ile$$

Chymotrypsin hydrolyzes the peptide bonds that follow large hydrophobic residues, e.g. phenylalanine, tyrosine and tryptophan. As an example:

$$Ser-Met-Gly-Ala-Phe-Arg-Leu-Ile \stackrel{Chymotrypsin}{\longrightarrow} Ser-Met-Gly-Ala-Phe + Arg-Leu-Ile$$

Trypsin hydrolyzes the peptide bonds that follow positively charged residues, e.g. lysine and arginine. As an example:

$$Ser-Met-Gly-Ala-Phe-Arg-Leu-Ile \stackrel{Trypsin}{\longrightarrow} Ser-Met-Gly-Ala-Phe-Arg + Leu-Ile$$
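The three cleavage rules above can be expressed as a small lookup table. This is a minimal Python sketch, assuming the simplified specificities stated in the text; real enzymes have exceptions (for example, trypsin generally does not cut when the next residue is Pro), which this sketch ignores. The names `CLEAVE_AFTER` and `cleave` are hypothetical.

```python
# Residues after which each reagent cuts (simplified rules from the text).
CLEAVE_AFTER = {
    "CNBr": {"Met"},
    "chymotrypsin": {"Phe", "Tyr", "Trp"},
    "trypsin": {"Lys", "Arg"},
}


def cleave(peptide: str, reagent: str) -> list[str]:
    """Split a hyphenated three-letter peptide after each residue recognized by `reagent`.

    Simplification: sequence-context exceptions (such as a following Pro
    blocking trypsin) are ignored.
    """
    sites = CLEAVE_AFTER[reagent]
    fragments, current = [], []
    for residue in peptide.split("-"):
        current.append(residue)
        if residue in sites:  # cut on the C-terminal side of this residue
            fragments.append("-".join(current))
            current = []
    if current:  # C-terminal fragment that does not end at a cleavage site
        fragments.append("-".join(current))
    return fragments


pep = "Ser-Met-Gly-Ala-Phe-Arg-Leu-Ile"
for reagent in CLEAVE_AFTER:
    print(reagent, cleave(pep, reagent))
# CNBr ['Ser-Met', 'Gly-Ala-Phe-Arg-Leu-Ile']
# chymotrypsin ['Ser-Met-Gly-Ala-Phe', 'Arg-Leu-Ile']
# trypsin ['Ser-Met-Gly-Ala-Phe-Arg', 'Leu-Ile']
```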

If only two fragments are produced by the cleavage reaction, it is straightforward to reconstruct the sequence: the N-terminal sequence of the intact protein, obtained by Edman degradation, identifies which fragment comes first. However, if the original protein is cleaved into three or more fragments, the correct order of the fragments cannot be determined using a single cleavage agent. Overlapping fragments produced by a second cleavage reagent are needed to determine the correct order, as illustrated below.
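One way to see why a second reagent resolves the ordering problem is a brute-force check. This is a different technique from the overlap reasoning used in the example below: try every order of the fragments from one digest, and keep only the full sequences whose digest with the second reagent reproduces the observed fragments. This Python sketch assumes the complete sequence of every fragment is known (in practice Edman degradation reads only the first residues of each), and the peptide, its fragments and the names `cleave` and `candidate_sequences` are all hypothetical.

```python
from itertools import permutations

# Simplified specificities from the text: cut after these residues.
CLEAVE_AFTER = {"CNBr": {"Met"}, "trypsin": {"Lys", "Arg"}}


def cleave(residues, reagent):
    """Split a residue tuple into fragments (tuples) after each recognized residue."""
    fragments, current = [], []
    for residue in residues:
        current.append(residue)
        if residue in CLEAVE_AFTER[reagent]:
            fragments.append(tuple(current))
            current = []
    if current:
        fragments.append(tuple(current))
    return fragments


def candidate_sequences(frags_a, frags_b, reagent_b):
    """Return every ordering of frags_a whose reagent_b digest matches frags_b."""
    solutions = set()
    for order in permutations(frags_a):
        full = tuple(residue for frag in order for residue in frag)
        # Compare sorted lists: separated fragments carry no order information.
        if sorted(cleave(full, reagent_b)) == sorted(frags_b):
            solutions.add(full)
    return solutions


# Hypothetical data: three trypsin fragments and two CNBr fragments of one peptide.
trypsin_frags = [("Met", "Ser", "Arg"), ("Val", "Leu"), ("Ala", "Gly", "Lys")]
cnbr_frags = [("Ser", "Arg", "Val", "Leu"), ("Ala", "Gly", "Lys", "Met")]

for seq in candidate_sequences(trypsin_frags, cnbr_frags, "CNBr"):
    print("-".join(seq))
# Ala-Gly-Lys-Met-Ser-Arg-Val-Leu
```

Of all the orderings of the three trypsin fragments, only one also agrees with the CNBr data. Because the number of orderings grows factorially with the number of fragments, this enumeration is only practical for small examples.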


Sequence Determination

$$Ala-Gly-Met-Ser-Thr-Gly-Val-Val-Lys-Gly-Ser-Ala-Phe-Leu$$

In this example I have assumed that 6 cycles of Edman degradation are possible. After that, impurities and side reactions prevent the reliable identification of the amino acid. Note that in practice 30-100 cycles can be accomplished, giving the sequence of the first 30-100 residues of the protein.

A: The first six cycles of Edman degradation produced Ala, Gly, Met, Ser, Thr and Gly, in that order. Therefore the amino-terminal sequence is:

$$Ala-Gly-Met-Ser-Thr-Gly$$

B: A new sample of the peptide was treated with CNBr. The two peptides (CNBr-1, CNBr-2) that were produced were isolated, and each was subjected to Edman degradation, giving the following sequences. (The residues in bold were determined by Edman degradation; the remainder of the peptide is present but not detectable.)

CNBr-1: $$\mathbf{Ala-Gly-Met}$$
CNBr-2: $$\mathbf{Ser-Thr-Gly-Val-Val-Lys}-Gly-Ser-Ala-Phe-Leu$$

C: A new sample of the peptide was treated with trypsin. The two peptides (Trp1, Trp2) that were produced were isolated, and each was subjected to Edman degradation. The sequences of these two peptides were:

Trp1: $$\mathbf{Gly-Ser-Ala-Phe-Leu}$$
Trp2: $$\mathbf{Ala-Gly-Met-Ser-Thr-Gly}-Val-Val-Lys$$

Strategy: Find overlaps between fragments obtained with different cleavage reagents, and use these overlaps to place the peptides from one cleavage reaction in the correct order. Overlaps are readily identified by finding a site in one peptide that would be cut by another cleavage reagent (e.g. trypsin) and then identifying the matching fragment from its expected amino-terminal sequence. For example, the sequence from the Edman degradation of the intact peptide contains a Met residue, so you would look for overlaps between the intact sequence and the two CNBr fragments:

$$\begin{array}{lll} \text{Intact peptide:} & Ala-Gly-Met- & Ser-Thr-Gly \\ \text{CNBr fragments:} & Ala-Gly-Met & Ser-Thr-Gly-Val-Val-Lys \\ & \text{CNBr-1} & \text{CNBr-2} \end{array}$$

Combine to give:

$$Ala-Gly-Met-Ser-Thr-Gly-Val-Val-Lys$$

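The partial sequence above contains a $$Lys$$ residue, after which trypsin cleaves. Trp2 begins with the amino-terminal sequence of the intact peptide ($$Ala-Gly-Met-Ser-Thr-Gly$$), so it is the first trypsin fragment and must end at this $$Lys$$. The remaining trypsin fragment, Trp1, which starts with a $$Gly$$ residue and does not end in $$Lys$$ or $$Arg$$, must therefore be the second (C-terminal) fragment, allowing completion of the sequence: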

$$\underbrace{Ala-Gly-Met-Ser-Thr-Gly-Val-Val-Lys}_{Trp2}-\underbrace{Gly-Ser-Ala-Phe-Leu}_{Trp1}$$
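The assembled sequence can be checked against all three data sets by predicting what each experiment would have shown. This is a minimal Python sketch, assuming the same simplified cleavage rules and the six-cycle Edman limit used in this example; `cleave`, `edman_reads` and `CYCLES` are hypothetical names. A match shows the sequence is consistent with the data, not that it is the only possible answer.

```python
CLEAVE_AFTER = {"CNBr": {"Met"}, "trypsin": {"Lys", "Arg"}}
CYCLES = 6  # reliable Edman cycles assumed in this example


def cleave(residues, reagent):
    """Split a residue list into fragments after each residue the reagent recognizes."""
    fragments, current = [], []
    for residue in residues:
        current.append(residue)
        if residue in CLEAVE_AFTER[reagent]:
            fragments.append(current)
            current = []
    if current:
        fragments.append(current)
    return fragments


def edman_reads(fragments):
    """First CYCLES residues of each fragment, sorted because fragment order is unknown."""
    return sorted("-".join(frag[:CYCLES]) for frag in fragments)


proposed = "Ala-Gly-Met-Ser-Thr-Gly-Val-Val-Lys-Gly-Ser-Ala-Phe-Leu".split("-")

# Edman reads reported in parts A-C of the example.
observed = {
    "intact": ["Ala-Gly-Met-Ser-Thr-Gly"],
    "CNBr": sorted(["Ala-Gly-Met", "Ser-Thr-Gly-Val-Val-Lys"]),
    "trypsin": sorted(["Gly-Ser-Ala-Phe-Leu", "Ala-Gly-Met-Ser-Thr-Gly"]),
}
predicted = {
    "intact": edman_reads([proposed]),
    "CNBr": edman_reads(cleave(proposed, "CNBr")),
    "trypsin": edman_reads(cleave(proposed, "trypsin")),
}
for experiment, reads in observed.items():
    print(experiment, predicted[experiment] == reads)
# intact True
# CNBr True
# trypsin True
```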

Review Quiz


1. The primary structure refers to:

a. the conformation of multiple chains.

b. the conformation of a single protein chain.

c. the conformation of the sidechains.

d. the order of amino acids in a protein.

e. the first structure observed for proteins.

d. (The primary structure is the order of amino acids in the protein.)

2. If the peptide, $$Val-Lys-Glu-Met-Ser-Trp-Arg-Ala$$, was digested with chymotrypsin, which of the following fragments would be produced?

a. $$Val-Lys + Glu-Met-Ser + Trp-Arg-Ala$$.

b. $$Val-Lys-Glu-Met-Ser-Trp + Arg-Ala$$.

c. $$Val-Lys-Glu-Met-Ser + Trp-Arg-Ala$$.

d. $$Val-Lys-Glu + Met-Ser-Trp-Arg-Ala$$.

e. $$Val-Lys-Glu-Met + Ser-Trp-Arg-Ala$$.

b. (Chymotrypsin is specific for cleavage after the large aromatic residues, e.g. Phe, Tyr, and Trp.)

3. Digestion of a protein by trypsin produces three fragments: T1, T2, T3. How many different sequences could be constructed for the original protein from these peptides?

a. 1

b. 3

c. 6

hint

Do you know the order of the trypsin fragments?