Protein Identification By Combining MS With Database Search.
Mass spectrometry has been combined with database searching to create a valuable, automated protein identification tool. Mainly three types of databases (genomic DNA, cDNA/EST and nrdb) are searched with mass spectrometric data.

The non-redundant protein database (nrdb) contains the known set of full-length protein sequences, devoid of duplicates. In this application, intact proteins are digested into pools of peptides whose masses are determined by mass spectrometry and then searched against genomic DNA, cDNA/EST and nrdb database entries. These databases can be searched with both peptide mass fingerprints and tandem mass spectrometric data.
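The peptide mass fingerprint search described above can be sketched in Python. This is a minimal illustration under stated assumptions, not how any particular search engine works: the function names, the toy database and the 0.5 Da tolerance are hypothetical, cleavage follows the common trypsin rule (after K or R, but not before P), and missed cleavages and modifications are ignored. Candidates are ranked simply by the number of matching peptide masses.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, neutral peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def count_matches(observed, sequence, tolerance=0.5):
    """Number of observed masses within `tolerance` Da of a theoretical peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(any(abs(m - t) <= tolerance for t in theoretical) for m in observed)


def rank_database(observed, database, tolerance=0.5):
    """Rank database entries (name -> sequence) by matched peptide count."""
    scores = {name: count_matches(observed, seq, tolerance)
              for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical two-entry database; the "observed" masses are simulated from protA.
    toy_db = {"protA": "MKWVTFISLLLLFSSAYSRGVFRR", "protB": "MAGTKLEDSPPCRNWSALK"}
    observed = [peptide_mass(p) for p in tryptic_digest(toy_db["protA"])]
    print(rank_database(observed, toy_db))
```

Real search programs replace the simple match count with a statistical score, since large proteins would otherwise collect random matches more easily than small ones.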

Matches can be annotated quickly via links to curated databases such as Swiss-Prot. EST databases, such as dbEST at the National Center for Biotechnology Information (NCBI) and EMEST at the European Bioinformatics Institute (EBI), contain millions of short single-pass sequences obtained by random sequencing of cDNA libraries. These are usually searched by translating both strands of the DNA into all six reading frames (as in tBLASTn).
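The six-frame translation mentioned above can be illustrated with a short Python sketch. It assumes the standard genetic code and uses hypothetical function names; it is not the tBLASTn implementation, only the translation step that such a search relies on. Codons containing characters other than A, C, G or T are translated as "X".

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate both strands in three frames each: keys '+1'..'+3', '-1'..'-3'."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def frames_containing(peptide, dna):
    """List the frames whose translation contains the peptide sequence."""
    return [name for name, protein in six_frame_translation(dna).items()
            if peptide in protein]
```

A peptide sequence derived from tandem mass spectrometric data could then be located with, for example, `frames_containing("MA", "ATGGCCTAA")`, which reports the forward frame 1 for this toy sequence.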

Genome databases can also be searched with mass spectrometric data. In particular, searching the databases of completely sequenced genomes with mass spectrometric data can help to define the structure of a gene, including its start codon, stop codon and intron-exon junctions. Surprisingly, it has been found that peptides can be matched to a raw genomic sequence without any information about the reading frames or coding regions and without translating it into an amino acid sequence.

Protein modifications do not present an obstacle to identification because a typical 50 kD protein generally yields about 50 peptides after tryptic digestion, and only a few of these are modified. Only a small number of peptides are actually required for unique matching to a database entry, especially in the case of data from tandem mass spectrometry; therefore even very extensive modification only marginally increases the difficulty of protein identification. The three main programs used for database mining in protein identification with MS data are listed below.
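The figure of roughly 50 tryptic peptides for a 50 kD protein can be checked with a back-of-the-envelope calculation. The sketch below rests on two assumptions that are not stated in the text: an average residue mass of about 110 Da and a combined lysine plus arginine frequency of about 11%. It ignores the proline rule and missed cleavages, so it gives only an order-of-magnitude estimate.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # assumed average mass of one amino acid residue
K_PLUS_R_FRACTION = 0.11         # assumed combined frequency of Lys and Arg


def expected_tryptic_peptides(protein_mass_da,
                              residue_mass=AVERAGE_RESIDUE_MASS_DA,
                              cleavage_fraction=K_PLUS_R_FRACTION):
    """Estimate the peptide count after complete tryptic digestion."""
    residues = protein_mass_da / residue_mass      # approximate chain length
    cleavage_sites = residues * cleavage_fraction  # one site per K or R
    return cleavage_sites + 1                      # n cuts give n + 1 peptides


if __name__ == "__main__":
    print(round(expected_tryptic_peptides(50_000)))
```

Under these assumptions a 50 kD protein has about 455 residues and about 50 cleavage sites, so even if a handful of peptides carry modifications, dozens remain available for matching.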

Three main programs for protein identification using database mining.

Program: Mascot
Web site: http://www.matrixscience.com
Based on: It incorporates a probability-based implementation of the MOWSE* algorithm.

Program: ProFound
Web site: http://prowl.rockefeller.edu
Based on: It calculates the probability that a protein in a database is the protein being analysed, and it uses a Z-score as an indicator of the quality of the search result.

Program: Protein Prospector (MS-Fit)
Web site: http://prospector.ucsf.edu
Based on: It uses the MOWSE* score to evaluate a hit in protein identification.


*The MOlecular Weight SEarch (MOWSE) algorithm capitalizes on information obtained through mass spectrometry (MS), using the molecular weights of both the intact proteins and the peptides resulting from digestion of the same proteins with a specific protease.
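The idea of using a Z-score as a quality indicator can be illustrated with a generic standard-score calculation. This is a sketch only and is not ProFound's actual computation: the function name and the example scores are hypothetical, and the raw scores are assumed to be simple per-protein match scores such as counts of matching peptide masses. The Z-score expresses how many standard deviations a candidate's score lies above the mean score of all candidates.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Map each candidate name to (score - mean) / standard deviation."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of all candidates
    if sigma == 0:
        # All candidates scored the same, so none stands out.
        return {name: 0.0 for name in scores}
    return {name: (score - mu) / sigma for name, score in scores.items()}


if __name__ == "__main__":
    # Hypothetical match counts for four database entries.
    raw = {"protA": 9, "protB": 1, "protC": 1, "protD": 1}
    for name, z in sorted(z_scores(raw).items(), key=lambda item: -item[1]):
        print(f"{name}: {z:.2f}")
```

A hit whose Z-score is well above those of the other candidates is unlikely to be a random match, which is the sense in which such a score indicates the quality of a search result.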

 
