Problem Set 2
You will study a protein, Myelin Regulatory Factor (MYRF), which may be a transcription factor. One way to learn more about the features and likely function of the MYRF protein is to explore the structure of the 1,139 amino acid sequence in silico.
You will analyze the protein sequence using a variety of web-based proteomics programs. For most of these programs you will need to input the amino acid sequence in FASTA format. Here is the FASTA amino acid sequence (in single letter amino acid code).
Use these programs to gain information about this protein. If you have any problem with any of the programs (lots of error messages), skip that particular program.
a. Sequence Manipulation Suite: Determine the molecular weight of the protein.
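As a cross-check on the value reported in item a, the molecular weight can be approximated by hand: add up the average residue masses and then add one water for the free termini. The sketch below assumes the standard 20 amino acids and no post-translational modifications. The function name `molecular_weight` and the short test peptide are illustrative only and are not part of the Sequence Manipulation Suite.

```python
# Average residue masses in daltons (each amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # one water is added back for the free N- and C-termini


def molecular_weight(sequence):
    """Approximate average molecular weight (Da) of an unmodified protein."""
    seq = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unexpected residue codes: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


if __name__ == "__main__":
    # Illustrative peptide only; paste the MYRF sequence (without the header) instead.
    print(round(molecular_weight("MAGK"), 2))
```

Small differences from the web tool are expected, since programs may use slightly different mass tables.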
b. Eukaryotic Linear Motif: Linear motifs are short, evolutionarily plastic components of regulatory proteins and provide low-affinity interaction interfaces. These compact modules play central roles in mediating every aspect of the regulatory functionality of the cell. They are particularly prominent in mediating cell signaling, controlling protein turnover and directing protein localization. Given their importance, our understanding of motifs is surprisingly limited, largely as a result of the difficulty of discovery, both experimentally and computationally. The Eukaryotic Linear Motif (ELM) resource provides the biological community with a comprehensive database of known, experimentally validated motifs and an exploratory tool to discover putative linear motifs in user-submitted protein sequences.
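To make the idea of a linear motif in item b concrete, here is a minimal sketch of scanning a sequence with a regular expression. The pattern `K[KR].[KR]` is a simplified textbook consensus for a classical monopartite nuclear localization signal. It is used here only as an illustration and is not claimed to be the definition ELM uses for any motif class. The function name `find_motif` and the test peptide are assumptions.

```python
import re


def find_motif(sequence, pattern):
    """Return (1-based start, matched text) for every hit, including overlaps."""
    # A lookahead with a capture group lets overlapping matches be reported.
    scanner = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1)) for m in scanner.finditer(sequence.upper())]


if __name__ == "__main__":
    # Illustrative peptide containing the SV40-style stretch PKKKRKV.
    for start, hit in find_motif("MAPKKKRKVQ", "K[KR].[KR]"):
        print(start, hit)
```

Short patterns like this match by chance very often, which is one reason computational motif discovery is difficult and why hits should be treated as putative.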
c. PSORT II: programs for prediction of eukaryotic sequence subcellular localization as well as other datasets and resources relevant to cellular localization prediction. After running it, examine the link shown as PSORT features and traditional PSORTII prediction.
You might get an error message saying the protein does not begin with an M (Met). Met is the first amino acid encoded from a gene sequence in eukaryotes (using the codon AUG). It is often removed during or after protein synthesis. Don't worry about it. Either way, the output shows you the number of homologous proteins found and where they are located (cyto, nuc, secreted, etc.). Go to the Details link, where the proteins are listed. The ones at the top are most similar to MYRF.
d. NucPred: analyzes a eukaryotic protein sequence and predicts whether the protein spends at least some time in the nucleus or no time in the nucleus.
e. TMpred: The TMpred program predicts membrane-spanning regions and their orientation. The algorithm is based on the statistical analysis of TMbase, a database of naturally occurring transmembrane proteins. The prediction is made using a combination of several weight matrices for scoring.
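TMpred's TMbase-derived weight matrices are not reproduced here. As a rough intuition for how membrane-spanning regions can be spotted, the sketch below swaps in a different and simpler technique: a Kyte-Doolittle sliding-window hydropathy average. The window of 19 residues and the cutoff of 1.6 are commonly cited values for that method. The function names and the toy sequence are assumptions for illustration.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(sequence, window=19):
    """Mean hydropathy of every full-length window along the sequence."""
    scores = [KD[aa] for aa in sequence.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_starts(sequence, window=19, threshold=1.6):
    """1-based start positions of windows hydrophobic enough to span a membrane."""
    return [i + 1 for i, avg in enumerate(hydropathy_windows(sequence, window))
            if avg >= threshold]


if __name__ == "__main__":
    # Toy sequence: a 19-residue leucine stretch flanked by charged residues.
    toy = "DDDDD" + "L" * 19 + "KKKKK"
    print(candidate_tm_starts(toy))
```

This simplification says nothing about orientation, so it complements rather than replaces the TMpred output.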
f. CCTOP: prediction of transmembrane helices and topology of proteins. Select the advanced tab. This program might not work. In the output, under each amino acid you will see I (inside), O (outside), H for transmembrane helical region, and i for indeterminate.
g. Das-TMfilter: you might have to remove the non-sequence part of the FASTA file (the header line that starts with ">") before submitting.
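For programs that accept only raw sequence, as noted in item g, the FASTA header can be stripped by hand or with a few lines of code. The sketch below assumes a standard FASTA record in which header lines begin with ">". The function name `strip_fasta_header` and the example record are hypothetical.

```python
def strip_fasta_header(fasta_text):
    """Return the bare sequence from FASTA text, dropping headers and line breaks."""
    lines = (line.strip() for line in fasta_text.splitlines())
    return "".join(line for line in lines if line and not line.startswith(">"))


if __name__ == "__main__":
    # Hypothetical record for illustration; use the MYRF FASTA text instead.
    record = ">example_protein hypothetical header\nMAGK\nLLVV\n"
    print(strip_fasta_header(record))
```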
h. TopPred 1.1 – Topology predictor for membrane proteins at the Pasteur Institute. You will have to input your email address. http://bioweb.pasteur.fr/seqanal/interfaces/toppred.html
i. PFAM – multiple analyses of Protein FAMilies. View a sequence. Look at the domain organization of a protein sequence. Input MRF_Mouse. Click on the various domains discovered based on sequence homology.
j. Prosite: input your sequence in the fast scan region. Prosite can suggest the likely function of the protein MYRF based on the presence of "patterns, motifs, or signatures" in the protein sequence that are characteristic of a specific biological function, such as ligand binding, catalysis, or in vivo chemical modification. We will only use it to probe for post-translational modification sites. Select "Scan a sequence against PROSITE patterns and profiles" to see possible sites for in vivo chemical modification of the protein. In Prosite Tools, uncheck "exclude patterns of high probability of occurrence."
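The patterns Prosite reports in item j use a compact syntax that maps closely onto regular expressions. As an illustration, the sketch below converts a pattern such as the N-glycosylation site `N-{P}-[ST]-{P}` (PROSITE entry PS00001) into a Python regex. It handles only the common elements (`x`, `[...]`, `{...}`, repeat counts, and the `<` and `>` anchors) and is a simplified converter, not the scanner PROSITE itself runs. The function name `prosite_to_regex` and the test peptide are assumptions.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        core, repeat = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element).groups()
        if core == "x":               # any amino acid
            core = "."
        elif core.startswith("{"):    # any amino acid except those listed
            core = "[^" + core[1:-1] + "]"
        parts.append(prefix + core + ("{" + repeat + "}" if repeat else "") + suffix)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("N-{P}-[ST]-{P}")
    # Illustrative peptide: NGSA fits the pattern, NPSA does not (P is excluded).
    print(regex, [m.start() + 1 for m in re.finditer(regex, "AANGSAANPSA")])
```

Because such patterns are short, they occur frequently by chance, which is why Prosite excludes them by default and the option has to be unchecked.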
k. HHPRED will give you homology detection and structure prediction, returning domain information and alignment with other proteins of known function. Select the input link (FASTA format) to input your sequence.
m. Use CATH (Protein Structure Classification: Class, Architecture, Topology, Homologous superfamily) to determine its domain structure and the superfamily it resides in. Select Search, type 1XWW in the ID/Key Word box, and press Return. Determine its class, architecture, topology, and homologous superfamily classifications. After the search, select the BLAST tab, then select CATH Code or click CATH Code Superfamily (whichever works).
n. Go to UniProt and input the mouse MYRF sequence (accession number Q3UR85) for a trove of information, much of which you have probably just discovered. Do the in silico analyses support the idea that the protein is a transcription factor?