Bioinformatics Lecture 2 Notes
Second year student studying for a Biochemistry/Biotechnology degree
Content
- Focussed Scoring Methods
- Position-Specific Scoring Matrices (PSSM) including examples of matrices & using PSSM with PSI-BLAST
- Pseudocounts
- Protein domains and Functional sit...
Bioinformatics Lecture 2 Part 1
PSI-BLAST, PSSM and Patterns
Lecture Objectives
• An understanding of Position-Specific Scoring Matrices (PSSM) and PSI-BLAST
o Knowledge of how a PSSM identifies more distant sequence similarities than scoring matrices such as BLOSUM
o Understanding how PSSMs are derived
• Knowledge of protein sequence patterns and motifs
• Understanding how simple sequence patterns can be used to identify features, including domains
Focussed Scoring Methods
PAM and BLOSUM are fine for programs such as BLAST. BLAST generally searches an unknown sequence against all others, so a generic scoring matrix is needed.
BUT - Amino acid substitutions are not uniform
There is evolutionary pressure, both within sequences and at specific positions within sequences. The chance of a mutation being maintained varies across the protein depending on where it is. Generic scoring matrices such as PAM and BLOSUM do not take these chances into account and score every position uniformly.
Amino acid X at a certain position in one protein may be vital; at a position in a different protein it may be free to change.
Position-Specific Scoring Matrices (PSSM)
What is it?
A PSSM is a type of scoring matrix used in protein searches in which amino acid substitution scores are given separately for each position in a protein multiple sequence alignment.
A PSSM is built from a multiple sequence alignment of related sequences, and each position is scored. Scores for the same amino acid may vary at different positions.
C at position 12 may score 20
C at position 43 may score 5
How?
The scoring system is specific to the sequence - PSSMs weigh sequences according to the observed diversity specific to the family of interest.
Producing a PSSM
Each position is scored for the amino acids it contains, and scores can vary depending on position.
“Position” = Column
Say you want to predict a protein structure. You can use these scores to determine which amino acid is most likely to occupy each position.
Example PSSM Matrix
Serine has a higher score because it is highly conserved in the active site, where it acts as a nucleophile.
Using a PSSM with PSI-BLAST
Position-Specific Iterative (PSI)-BLAST is a protein sequence profile search method that builds on the alignments generated by a run of the BLASTp program. The first iteration of a PSI-BLAST search is identical to a run of the BLASTp program. PSI-BLAST is an extension of BLAST.
PSI-BLAST builds a PSSM from the related sequences and searches again, this time using the PSSM as the scoring matrix. New related sequences can be found by alignment using the PSSM scoring mechanism.
1. First search using BLOSUM62 as normal
2. Build an MSA (multiple sequence alignment) from the good hits
3. Generate a PSSM from the MSA and search again using the PSSM
4. Now we have a PSSM specific to the protein sequence
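To make the idea of position-specific scores concrete, here is a minimal Python sketch of scoring sequences against a PSSM. The matrix values are invented for illustration (only the C = 20 and C = 5 figures echo the example scores quoted in these notes), and the names `toy_pssm`, `DEFAULT_SCORE` and `score_sequence` are hypothetical. It assumes an ungapped alignment in which each residue lines up with exactly one column of the matrix, which is a simplification of what a real search does.

```python
# Toy PSSM: one dict per column (position), mapping amino acid -> score.
# All values are made up for illustration; a real PSSM has a score for
# each of the 20 amino acids in every column.
toy_pssm = [
    {"C": 20, "S": 2},   # column 1: cysteine is highly conserved here
    {"A": 4, "G": 3},    # column 2: small residues tolerated
    {"C": 5, "S": 6},    # column 3: cysteine is only weakly favoured
]

DEFAULT_SCORE = -4  # assumed penalty for residues not listed in a column


def score_sequence(pssm, sequence):
    """Sum the column-specific score of each residue (ungapped alignment)."""
    if len(sequence) != len(pssm):
        raise ValueError("sequence length must equal the number of PSSM columns")
    return sum(col.get(aa, DEFAULT_SCORE) for col, aa in zip(pssm, sequence))


# The same residue (C) contributes differently depending on its column.
score_cas = score_sequence(toy_pssm, "CAS")  # 20 + 4 + 6
score_sac = score_sequence(toy_pssm, "SAC")  # 2 + 4 + 5
print(score_cas, score_sac)
```

A generic matrix such as BLOSUM would give C the same score wherever it appears; here the column decides, which is what lets a PSSM reward matches at conserved positions more strongly.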
PSI-BLAST Stages
1. PSI-BLAST takes a single protein sequence as input and compares it to a protein database, using the BLAST program and a BLOSUM matrix. This is the same step as a generic BLAST search.
2. The program constructs a multiple alignment, and then a profile, from any significant local alignments found (significance is judged by E-value). The original query sequence serves as a template for the multiple alignment and profile, whose lengths are identical to that of the query.
3. The profile is compared to the protein database, again seeking local alignments. A slightly modified version of the BLAST algorithm is used for this.
4. PSI-BLAST estimates the statistical significance of the local alignments found.
5. Finally, PSI-BLAST iterates, by returning to step (2), an arbitrary number of times or until convergence.
So this is how we find distantly related proteins. Usually it takes 2-4 iterations before you stop getting any more significant alignments and reach convergence. Each iteration adds sequences to the profile, so a sequence that is homologous to your protein can come up with a significant E-value even though it wasn't detected with BLOSUM, which is too generic to identify it.
Creating a PSSM – Example
Similar to BLOSUM, we do a sequence alignment and, at each position, divide the number of times each amino acid occurs at that position by the number of sequences in the alignment.
Pseudocounts
• Some observed frequencies are equal to 0
• This is due to the limited number of sequences in the MSA and may not reflect reality. Just because an amino acid is not present in the alignment doesn't mean that it's never present
• It also creates a problem if log values are used to create log-odds scores: we cannot take the log of 0
• One solution is to add a small number to the observed frequencies, often referred to as pseudocounts. Pseudocounts give a small probability that the amino acid could be present
• Simplest pseudocount is 1
• Added to the calculation to reflect the 1 in 20 (total amino acids) chance of being present. This means that 1/20 = 0.05
Creating a PSSM with a pseudocount: Example
Creating a PSSM with pseudocounts – Improvements
• The simplest expected frequency is 1 in 20 (20 amino acids) = 0.05
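The frequency and pseudocount calculations described in these notes can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not necessarily the exact formula used in the lecture or by PSI-BLAST: the four-sequence alignment is invented, a pseudocount of 1 is added to every amino acid count (so 20 is added to the column total), the expected frequency is the uniform 1/20 = 0.05, and log base 2 is an arbitrary choice. The function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
BACKGROUND = 1 / len(AMINO_ACIDS)     # uniform expected frequency = 0.05


def column_frequencies(column, pseudocount=0.0):
    """Frequency of each amino acid in one alignment column.

    With pseudocount=0 this is count / number of sequences.
    With pseudocount=p, p is added to every count and 20*p to the total,
    so no amino acid ends up with a frequency of exactly 0.
    """
    counts = Counter(column)
    total = len(column) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def build_pssm(alignment, pseudocount=1.0):
    """Log-odds score (log2 of observed / expected) for every column."""
    pssm = []
    for column in zip(*alignment):  # zip(*rows) iterates over columns
        freqs = column_frequencies(column, pseudocount)
        pssm.append({aa: math.log2(f / BACKGROUND) for aa, f in freqs.items()})
    return pssm


# Made-up ungapped alignment of four sequences, three columns long.
alignment = ["CAS", "CGS", "CAT", "CAS"]

raw = column_frequencies("CCCC")        # first column, no pseudocount
print(raw["C"], raw["S"])               # 1.0 and 0.0: log of 0.0 would fail

pssm = build_pssm(alignment, pseudocount=1.0)
print(round(pssm[0]["C"], 2))           # conserved residue: positive score
print(round(pssm[0]["S"], 2))           # unseen residue: negative, but finite
```

Without the pseudocount, serine in the first column has an observed frequency of 0 and no log-odds score can be computed; with it, serine gets a small non-zero frequency and therefore a finite, negative score.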