MS Based Protein Identification

The basics of protein identification are simple: proteins are digested, and the resulting peptides are analyzed by mass spectrometry to obtain their mass data. Three types of MS data can be used for a database search: (1) molecular weights of peptides, used for peptide mass mapping; (2) a combination of mass data and partial amino acid sequence, used for a sequence tag search; and (3) uninterpreted tandem mass spectrometry (MS/MS) data, used for an MS/MS fragment ion search.
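
As an illustration of the first data type, the sketch below digests a sequence in silico and computes the monoisotopic mass of each peptide, which is the list a peptide mass mapping search compares against measured masses. It is a minimal sketch under stated assumptions, not the code of any particular search engine: the cleavage rule (after Lys or Arg, but not before Pro) is the commonly used simplification, no missed cleavages or modifications are considered, cysteines are treated as unmodified, and the function names and the example sequence are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one H2O per free peptide (N-terminal H plus C-terminal OH)


def tryptic_digest(sequence):
    """Split after K or R unless the next residue is P (simplified rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


if __name__ == "__main__":
    # Hypothetical sequence, chosen only to exercise the K/R-before-P rule.
    for pep in tryptic_digest("AKRPGKC"):
        print(pep, round(monoisotopic_mass(pep), 4))
```

For comparison with singly charged MALDI peaks, the mass of one proton (1.00728 Da) would be added to each neutral mass to give [M+H]+.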

Most algorithms these days incorporate a statistical treatment to judge whether the search results are significant. The three searches above allow, at least in theory, the identification of any known protein (one present in the database), but they cannot identify an unknown protein. For unknown proteins (absent from the database), it is best to obtain the actual amino acid sequence by interpreting the MS/MS data (de novo sequencing by MS/MS). Interpreted amino acid sequences can also be used for a sequence homology search (FASTA or BLAST).
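
The core idea behind interpreting MS/MS data can be sketched as follows: within one fragment ion series, the mass difference between consecutive ions corresponds to one amino acid residue. The example is a simplified illustration, assuming a clean, complete, singly charged ladder from a single ion series; real spectra contain gaps, noise, and mixed series, and the function name and tolerance are hypothetical. Leu and Ile have identical masses and are reported together.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(ion_mz, tol=0.02):
    """Assign residues to the gaps between consecutive ions of one series.

    ion_mz: ascending m/z values of singly charged ions from the same series.
    tol:    absolute mass tolerance in Da (hypothetical default).
    Returns one entry per gap: matching residues joined by "/", or "?".
    """
    calls = []
    for lower, upper in zip(ion_mz, ion_mz[1:]):
        delta = upper - lower
        hits = [aa for aa, mass in RESIDUE_MASS.items()
                if abs(delta - mass) <= tol]
        calls.append("/".join(hits) if hits else "?")
    return calls


if __name__ == "__main__":
    # Calculated b1, b2, b3 ions of the hypothetical peptide G-A-S.
    print(read_ladder([58.02874, 129.06585, 216.09788]))
```

With a tolerance wider than about 0.04 Da, Lys and Gln (128.09496 and 128.05858 Da) would also be reported together, which is one reason mass accuracy matters for de novo work.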

General Strategy for the Identification of Gel (SDS-PAGE or 2-D Gel) Purified Proteins Adopted by the MSF

Most proteins submitted to the MSF are gel purified, so the process begins with in-gel digestion with trypsin. When digestion is complete, the peptides are extracted and a small aliquot (less than 10%) of the digest is analyzed on one of the MALDI-TOF instruments (Bruker Biflex III or ABI DE-PRO) to obtain the molecular weights of the tryptic peptides. This molecular weight information is then used for peptide mass fingerprinting (peptide mass mapping) to identify the protein. When a protein is identified with high confidence (a statistically significant identification), no further MS analysis is usually pursued. The MALDI-TOF data are also used to monitor the efficiency of the digestion and to estimate the amount of protein digested in the gel.
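
The matching step in peptide mass fingerprinting can be illustrated with a short sketch that pairs each measured mass with the nearest theoretical peptide mass and keeps the pair if the error is within a tolerance in parts per million. This is only an illustration: the 50 ppm default, the function names, and the example masses are hypothetical, and search engines replace the simple match count with a probability-based score.

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_peaks(observed, theoretical, tol_ppm=50.0):
    """Pair each observed mass with its nearest theoretical mass.

    Returns (observed, theoretical) pairs whose absolute error is within
    tol_ppm. `theoretical` must contain at least one mass.
    """
    matches = []
    for obs in observed:
        nearest = min(theoretical, key=lambda t: abs(obs - t))
        if abs(ppm_error(obs, nearest)) <= tol_ppm:
            matches.append((obs, nearest))
    return matches


if __name__ == "__main__":
    # Hypothetical masses: the first peak is about 30 ppm off, the second
    # about 133 ppm off, and the third has no nearby theoretical mass.
    theoretical = [1000.0, 1500.0, 2000.0]
    observed = [1000.03, 1500.2, 2500.0]
    hits = match_peaks(observed, theoretical)
    print(len(hits), "of", len(observed), "peaks matched:", hits)
```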

However, when a protein cannot be identified unambiguously by peptide mass mapping, the digest is further analyzed by MS and MS/MS on a QSTAR hybrid nanospray/ESI quadrupole-TOF mass spectrometer (Applied Biosystems Inc., Foster City, CA) for de novo peptide sequencing, sequence tag search, and/or MS/MS ion search. Static nanospray MS/MS is especially useful when the target protein is not known (absent from the database). Interpreted MS/MS data can be used for a sequence homology search. For proteins from known genome databases that cannot be identified with a statistically significant score by peptide mass mapping because of impurities (presence of more than two proteins), a cLC/MS/MS analysis is performed using a Finnigan LCQ Deca XP with a Michrom MS4 LC, and the resulting data are used for a Sequest search.

For the highest confidence in protein identification, we often perform a "two to three layered search": a combination of peptide mass mapping, sequence query (sequence tag), and de novo sequencing.


Digestion with other endopeptidases

Not all proteins are digested equally well by trypsin. For example, highly hydrophobic and aggregated proteins and extensively glycosylated proteins are not digested well. Trypsin may also not be the ideal enzyme for very small proteins (<10 kDa) or for some very acidic proteins that lack, or have few, basic amino acids (Lys and Arg) in their sequence. When trypsin fails to digest the protein, we use alternative digestion protocols as an additional effort: CNBr/trypsin double digestion or Lys-C digestion in the presence of SDS for aggregated proteins (SDS is removed afterward by organic solvent partitioning); PNGase F/trypsin digestion for highly glycosylated proteins; and chymotryptic digestion for proteins with no or few basic amino acids. Edman sequencing is an option for non-tryptic peptides after fractionation on a capillary C18 column (0.3 mm ID), if the starting amount of protein is sufficient (>5 pmol). Our Procise Edman sequencer can analyze 1 to 2 pmol of loaded peptide, with a UV detection limit of 300 fmol.
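
When the sequence of a candidate protein is available, a quick count of potential cleavage sites shows why an alternative reagent may help. The sketch below is a rough illustration only: it uses the primary specificities (trypsin after Lys/Arg, Lys-C after Lys, chymotrypsin after Phe/Tyr/Trp, CNBr after Met), ignores secondary specificities, the proline effect, and terminal positions, and says nothing about solubility or glycosylation. The function name and the acidic example sequence are hypothetical.

```python
# Residues after which each reagent primarily cleaves (simplified).
CLEAVAGE_RESIDUES = {
    "trypsin": "KR",
    "Lys-C": "K",
    "chymotrypsin": "FYW",
    "CNBr": "M",
}


def count_cleavage_sites(sequence):
    """Count residues in `sequence` that each reagent could cleave after."""
    return {
        reagent: sum(sequence.count(res) for res in residues)
        for reagent, residues in CLEAVAGE_RESIDUES.items()
    }


if __name__ == "__main__":
    # Hypothetical acidic sequence with a single basic residue.
    for reagent, n in count_cleavage_sites("MDEEFLDDYWAEEK").items():
        print(f"{reagent}: {n} potential site(s)")
```

For a sequence like this one, chymotrypsin offers more potential sites than trypsin, in line with the use of chymotryptic digestion for proteins with few basic residues.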