Class notes

Bioinformatics Lecture 2 Notes

Pages: 5
Uploaded on: 03-10-2022
Written in: 2020/2021

Notes by a second-year student studying for a Biochemistry/Biotechnology degree.
Content:
- Focussed Scoring Methods
- Position-Specific Scoring Matrices (PSSM), including examples of matrices and using a PSSM with PSI-BLAST
- Pseudocounts
- Protein domains and functional sites
- MSA ClustalO output
- PROSITE

Bioinformatics Lecture 2 Part 1
PSI-BLAST, PSSM and Patterns

Lecture Objectives
• An understanding of Position-Specific Scoring Matrices (PSSM) and PSI-BLAST
  o Knowledge of how a PSSM identifies more distant sequence similarities than scoring matrices such as BLOSUM
  o Understanding how PSSMs are derived
• Knowledge of protein sequence patterns and motifs
• Understanding how simple sequence patterns can be used to identify features, including domains

Focussed Scoring Methods
PAM and BLOSUM are fine for programs such as BLAST. BLAST generally searches an unknown sequence against all others, so a generic scoring matrix is needed.

BUT: amino acid substitutions are not uniform.
There is evolutionary pressure, both across whole sequences and at specific positions within them. The chance of a mutation being maintained varies across the protein depending on where it occurs. Generic scoring matrices such as PAM and BLOSUM do not take this into account and score every position the same way.

Amino acid X may be vital at a certain position in one protein, while at a position in a different protein it may be free to change.

Position-Specific Scoring Matrices (PSSM)
What is it?
A PSSM is a type of scoring matrix used in protein searches in which amino acid substitution scores are given separately for each position in a protein multiple sequence alignment.

A PSSM is built from a multiple sequence alignment of related sequences, with each position scored. Scores for the same amino acid may vary at different positions:
C at position 12 may score 20
C at position 43 may score 5

The scoring system is specific to the sequence family: PSSMs weigh sequences according to the observed diversity specific to the family of interest.

Producing a PSSM
Each position is scored for the amino acids it contains, and scores can vary depending on position.

“Position” = Column
Say you want to predict a protein structure. You can use these scores to determine which amino acid is most likely to occupy each position.

Example PSSM Matrix
Serine has a high score because it is highly conserved in the active site, where it acts as a nucleophile.

Using a PSSM with PSI-BLAST
Position-Specific Iterative (PSI)-BLAST is a protein sequence profile search method that builds on the alignments generated by a run of the BLASTp program. The first iteration of a PSI-BLAST search is identical to a run of BLASTp. PSI-BLAST is an extension of BLAST.

PSI-BLAST builds a PSSM from the related sequences found and searches again, this time using the PSSM as the scoring matrix. New related sequences can then be found by aligning against the PSSM.

How?
1. First search using BLOSUM62 as normal
2. Build an MSA (multiple sequence alignment) from the good hits
3. Generate a PSSM from the MSA and search again using the PSSM
4. Now we have a PSSM specific to the protein sequence
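
The four-step outline (search with a generic matrix, build an MSA from the good hits, derive a PSSM, search again) can be sketched as a small loop. This is only a toy illustration, not how PSI-BLAST is actually implemented: the sequences are made up, all have the same length and are compared without gaps, a simple match/mismatch score stands in for BLOSUM62, fixed score cutoffs stand in for E-values, and the function names (generic_score, build_pssm, toy_psi_blast) are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def generic_score(query, seq):
    """Stand-in for a BLOSUM62 search: +2 per identical residue, -1 otherwise."""
    return sum(2 if a == b else -1 for a, b in zip(query, seq))


def build_pssm(msa, pseudocount=1.0):
    """Log-odds scores per column of an ungapped MSA (assumed background 1/20)."""
    n = len(msa)
    pssm = []
    for column in zip(*msa):
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / (n + 20 * pseudocount)
            scores[aa] = math.log2(freq / 0.05)
        pssm.append(scores)
    return pssm


def pssm_score(pssm, seq):
    """Sum the position-specific score of each residue in seq."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


def toy_psi_blast(query, database, first_cutoff, pssm_cutoff, max_iterations=5):
    # Step 1: first search with the generic scoring scheme.
    hits = {s for s in database if generic_score(query, s) >= first_cutoff}
    for _ in range(max_iterations):
        # Steps 2 and 3: build the MSA and PSSM, then search again with the PSSM.
        pssm = build_pssm([query] + sorted(hits))
        new_hits = {s for s in database if pssm_score(pssm, s) >= pssm_cutoff}
        if new_hits == hits:  # convergence: no change in the hit list
            break
        hits = new_hits
    return hits


query = "ACDSGH"
database = ["ACDSGK", "AVDSGH", "LVESGK", "WWWWWW"]
result = toy_psi_blast(query, database, first_cutoff=6, pssm_cutoff=4.0)
print(sorted(result))  # ['ACDSGK', 'AVDSGH', 'LVESGK']
```

In this made-up data set, LVESGK scores 0 in the first generic search and is missed, but it is picked up once the PSSM rewards the substitutions already seen in the closer hits. The cutoffs are arbitrary values chosen for the example.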

PSI-BLAST Stages
1. PSI-BLAST takes a single protein sequence as input and compares it to a protein database, using the BLAST program and a BLOSUM matrix. This is the same as a generic BLAST search.
2. The program constructs a multiple alignment, and then a profile, from any significant local alignments found (significance is judged by E-value). The original query sequence serves as a template for the multiple alignment and profile, whose lengths are identical to that of the query.
3. The profile is compared to the protein database, again seeking local alignments. A slightly modified version of the BLAST algorithm is used for this.
4. PSI-BLAST estimates the statistical significance of the local alignments found.
5. Finally, PSI-BLAST iterates by returning to step (2), either an arbitrary number of times or until convergence.

So this is how we find distantly related proteins.

Usually it takes 2-4 iterations before you stop getting any more significant alignments and reach convergence. Each iteration adds sequences to the profile, so a sequence that is homologous to your protein can come up with a significant E-value even though it wasn't detected with BLOSUM, which is too generic to identify it.

Creating a PSSM: Example
Similar to BLOSUM, we do a sequence alignment and, at each position, divide the count of each amino acid at that position by the number of sequences in the alignment.

Pseudocounts
• Some observed frequencies are equal to 0
• This is due to the limited number of sequences in the MSA and may not reflect reality. Just because an amino acid is not observed doesn't mean that it's never present
• It also creates a problem if log values are used to create log-odds scores: we cannot take the log of 0
• One solution is to add a small number to the observed frequencies, often referred to as a pseudocount. Pseudocounts give a small probability that the amino acid could be present
• The simplest pseudocount is 1
• It is added to the calculation to reflect a 1 in 20 (total amino acids) chance of being present, i.e. pseudocount/20 = 1/20 = 0.05

Creating a PSSM with a pseudocount: Example

Creating a PSSM with pseudocounts: Improvements
• The simplest expected frequency is 1:20 (20 amino acids) = 0.05
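
The frequency and pseudocount calculation for a single alignment column can be sketched as below. The worked example from the notes is not visible in this preview, so the exact formula here is an assumption: it uses one common convention, (count + pseudocount) / (number of sequences + 20 × pseudocount), compares the result with the expected frequency of 0.05, and takes log base 2. The column "SSSST" and the function name column_scores are made up for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
EXPECTED = 1 / 20  # simplest expected frequency: 0.05


def column_scores(column, pseudocount=1.0):
    """Log-odds score for every amino acid in one MSA column.

    Assumed convention: the pseudocount is added to each of the 20 counts,
    so no frequency is ever 0 and the log is always defined.
    """
    counts = Counter(column)
    total = len(column) + pseudocount * len(AMINO_ACIDS)
    scores = {}
    for aa in AMINO_ACIDS:
        freq = (counts[aa] + pseudocount) / total
        scores[aa] = math.log2(freq / EXPECTED)
    return scores


# One column from a hypothetical alignment of five sequences: four S, one T.
scores = column_scores("SSSST")
for aa in "STW":
    print(aa, round(scores[aa], 2))
```

Serine, seen in four of the five sequences, gets a clearly positive score, while an amino acid that never appears in the column (such as W) gets a small negative score instead of an undefined log of 0.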
christyau Imperial College London