5.4: Exercise 1 - Finding gene records in NCBI databases

The goal of this exercise is to locate the reference chromosome (NC_), transcript (NM_), and protein (NP_) records for your gene. You will need this information for experiments later in the semester.
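The three prefixes can be told apart mechanically. The short Python sketch below only illustrates that idea; the function name and the accession strings are made-up placeholders, not real records.

```python
# Map the RefSeq accession prefixes named in this exercise to record types.
REFSEQ_PREFIXES = {
    "NC_": "reference chromosome",
    "NM_": "transcript",
    "NP_": "protein",
}


def classify_accession(accession: str) -> str:
    """Return the record type implied by the first three characters."""
    prefix = accession.strip().upper()[:3]
    return REFSEQ_PREFIXES.get(prefix, "unknown")


# Placeholder accessions (hypothetical, for illustration only).
for acc in ["NC_000000.1", "NM_000000.1", "NP_000000.1", "ABC123"]:
    print(acc, "->", classify_accession(acc))
```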

Homepage: Point your browser to the NCBI homepage: ncbi.nlm.nih.gov
NCBI maintains a large collection of databases. Clicking on the dropdown box brings up the list of databases for more targeted searching. For a comprehensive search, use the “All databases” setting. Write the name of your MET gene in the search box and click “Search.”

Summary page:

This search brings you to a summary page that lists the number of hits in many (but not all) NCBI databases. The number is probably quite large! Take a look at the results. Note that the databases are arranged into categories, including literature, health, genomes, genes, proteins, and chemicals. List below the number of records for your gene in the PubMed, Nucleotide, Protein, and Structure databases. (You may not receive any hits in the Structure category, since the vast majority of proteins have not been crystallized or studied with NMR.)
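The same record counts can, in principle, be requested through NCBI's E-utilities instead of the web page. The sketch below only builds the request URLs and does not contact NCBI. It assumes the `esearch` endpoint with `rettype=count` and the database names `pubmed`, `nuccore`, `protein`, and `structure`; MET17 is used purely as an example gene name.

```python
from urllib.parse import urlencode

# E-utilities search endpoint (assumed here; the exercise itself uses the web site).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Display name on the summary page -> assumed E-utilities database name.
DATABASES = {
    "PubMed": "pubmed",
    "Nucleotide": "nuccore",
    "Protein": "protein",
    "Structure": "structure",
}


def count_url(gene: str, db: str) -> str:
    """Build a URL that asks one database for the number of hits only."""
    return ESEARCH + "?" + urlencode({"db": db, "term": gene, "rettype": "count"})


for label, db in DATABASES.items():
    print(label, count_url("MET17", db))
```

Pasting one of the printed URLs into a browser should return a small XML document containing the hit count, which you can compare with the number on the summary page.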

Nucleotide database:

Click on the Nucleotide database (genomes category). Note the filters in the left- and right-hand columns that you can use to narrow your search. The source databases include the primary database, GenBank, as well as the derivative RefSeq database, which contains reference sequences. Note the difference in the number of records in GenBank and RefSeq. Clicking on the RefSeq link will restrict results to reference sequences.
The number of reference sequences is probably still very large, because sequence information is available for many organisms. A logical next step is to narrow your search taxonomically. Click on the Tree link in the right column. Narrow your search to the ascomycetes. (Note that search terms are being added to the search string at the top of the page.) Next, narrow your search within the ascomycetes to Saccharomyces. You should be able to find the reference chromosome and transcript sequences from strain S288C.
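Each click on a filter adds a term to the search string. The sketch below mimics that accumulation in Python. The exact terms (`refseq[filter]`, `Ascomycota[Organism]`, `Saccharomyces[Organism]`) are assumptions about Entrez query syntax, and MET17 is only an example gene name, so compare the output with the string that actually appears at the top of your results page.

```python
def narrow(query: str, term: str) -> str:
    """Add one more restriction to an existing search string."""
    return f"({query}) AND {term}"


def narrowing_steps(gene: str, terms: list) -> list:
    """Return the search string after each successive filter is applied."""
    queries = [gene]
    for term in terms:
        queries.append(narrow(queries[-1], term))
    return queries


# Assumed Entrez terms for the three clicks described above.
FILTERS = ["refseq[filter]", "Ascomycota[Organism]", "Saccharomyces[Organism]"]

for step in narrowing_steps("MET17", FILTERS):
    print(step)
```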
Let’s look at the NC_ record first.
Record the accession number __________________________________
Which chromosome is represented in the record? _________________
How many nucleotides are in the chromosome (bp)? __________________________
Click the GenBank link associated with the NC_________________ (fill in the numbers) record. Near the top, you will see links to articles in the primary literature, including both the Goffeau et al. (1996) report on the genome project and a more detailed article on the chromosome by the investigators who determined its sequence.
Scroll down in this very long record to the FEATURES field and look at a few genes. You will note that the first feature is a telomere, because the sequence begins at the end of the chromosome’s left arm. As you scroll down, you are moving from one end of the chromosome to the other, and you will see annotation information for the ORFs identified by the Saccharomyces Genome Project (SGP). Each ORF has a description of its gene, mRNA, and coding sequence (CDS).
Find your MET gene in the NC record by typing its name into the “Find” search box of your browser.
Record the 7-character locus tag (begins with Y) ______________________

This is the systematic name assigned to the ORF by the Saccharomyces Genome Project. This serves as the gene’s unique identifier in the Saccharomyces Genome Database.
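The seven characters of a systematic name are not arbitrary. Under the usual yeast convention, Y stands for yeast, the second letter gives the chromosome (A = I through P = XVI), the third letter gives the arm (L or R), the three digits number the ORF outward from the centromere, and the final letter gives the strand (W for Watson, C for Crick). The Python sketch below decodes a name on that assumption; the function name is hypothetical and YLR303W is used only as an example input.

```python
import re

ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII",
         "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI"]

# Y + chromosome letter (A-P) + arm (L/R) + 3-digit ORF number + strand (W/C)
PATTERN = re.compile(r"^Y([A-P])([LR])(\d{3})([WC])$")


def parse_systematic_name(name: str) -> dict:
    """Decode a 7-character yeast systematic name into its parts."""
    match = PATTERN.match(name.strip().upper())
    if match is None:
        raise ValueError(f"not a 7-character systematic name: {name!r}")
    chrom, arm, number, strand = match.groups()
    return {
        "chromosome": ROMAN[ord(chrom) - ord("A")],
        "arm": "left" if arm == "L" else "right",
        "orf_number": int(number),
        "strand": "Watson" if strand == "W" else "Crick",
    }


print(parse_systematic_name("YLR303W"))
```

You can check the decoded chromosome against your answer to the chromosome question above.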

Is there an intron in your gene? ___________________________

(Introns appear as interruptions in the nucleotide numbering and the word “join” in the mRNA field for your gene.)
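To make the “join” rule concrete, the Python sketch below reads a GenBank-style location string, reports whether it contains a join, and lists the gaps between the joined segments, which correspond to introns. The coordinates are invented for illustration and the function names are hypothetical.

```python
import re


def parse_location(location: str):
    """Return (has_join, segments) for a GenBank-style feature location."""
    has_join = "join(" in location
    segments = [(int(start), int(end))
                for start, end in re.findall(r"<?(\d+)\.\.>?(\d+)", location)]
    return has_join, segments


def intron_spans(segments):
    """Return the gaps between consecutive segments (the implied introns)."""
    ordered = sorted(segments)
    return [(left_end + 1, right_start - 1)
            for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])]


# Made-up coordinates: one spliced gene and one uninterrupted gene.
for loc in ["join(100..109,200..500)", "complement(1000..2000)"]:
    has_join, segments = parse_location(loc)
    print(loc, "| join:", has_join, "| introns:", intron_spans(segments))
```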
Find the transcript record by clicking on the link to the NM__________________ (fill in the accession number) record.
How many nucleotides are in the annotated gene sequence (bp)? _________________

Scroll down to the actual nucleotide sequence at the end of the record. Note that the annotated S. cerevisiae ORFs begin with an ATG and end with a stop codon.
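If you copy the sequence out of the record, a few lines of Python can confirm the ATG and stop codon for you. This is a minimal sketch: the function name is hypothetical, the toy sequence is not a real gene, and the position numbers and spaces that GenBank prints alongside the sequence are stripped before checking.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_orf(raw_sequence: str) -> dict:
    """Report length and start/stop codon status for a pasted sequence."""
    # Keep letters only: drops the line numbers, spaces, and newlines.
    seq = re.sub(r"[^A-Za-z]", "", raw_sequence).upper()
    return {
        "length_bp": len(seq),
        "starts_with_atg": seq.startswith("ATG"),
        "ends_with_stop": seq[-3:] in STOP_CODONS,
        "multiple_of_three": len(seq) % 3 == 0,
    }


# Toy sequence formatted like the end of a GenBank record.
print(check_orf("        1 atggcttggt aa"))
```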
Is the NM_ record the actual sequence of the mRNA for your gene? Why or why not?
Find the field for the protein coding sequence (CDS) in the NM transcript record. The CDS field contains a translation of the NM nucleotide sequence. Find the NP_________________ (fill in the numbers) record. Click on the NP record.
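The translation shown in the CDS field can be reproduced by reading the nucleotide sequence three bases at a time. The Python sketch below does this with the standard genetic code, which is assumed here to apply to your nuclear gene. The codon table is built from the compact TCAG-ordered listing, and the input is a toy sequence, not a real record.

```python
from itertools import product

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[i:i + 3], "X")  # X = unrecognized codon
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCTTGGTAA"))  # toy sequence
```

Running this on the NM sequence should give the same string of one-letter amino acid codes that appears in the CDS translation.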
The NP record will give you additional information about the protein, including links to information about its structure, conserved domains, and homologs in other organisms. Refer to the right panel on the page. If a 3-dimensional structure is available for your protein, you will see a 4-character PDB accession number under “Protein 3D Structure.” Record the accession number, if it is available. (If you would like to see the structure, you can search the Protein Data Bank at www.pdb.org with either this accession number or your gene name.)
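Before searching the Protein Data Bank, you can check that what you recorded has the shape of a PDB accession number. The Python sketch below assumes the classic format of four characters, a digit from 1 to 9 followed by three letters or digits. The function name and the test strings are placeholders, and passing the check does not mean a structure with that ID exists.

```python
import re

# Assumed classic PDB ID format: one digit (1-9) plus three alphanumerics.
PDB_ID = re.compile(r"^[1-9][A-Za-z0-9]{3}$")


def looks_like_pdb_id(text: str) -> bool:
    """Return True if the text has the shape of a 4-character PDB ID."""
    return PDB_ID.match(text.strip()) is not None


# Placeholder strings, chosen only to exercise the format check.
for candidate in ["1ABC", "9x9z", "ABCD", "12345"]:
    print(candidate, looks_like_pdb_id(candidate))
```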