Bioinformatics Lecture 1 Part 1
Sequence Alignment, BLAST and Substitution Matrices

Lecture Objectives
• An understanding of sequence alignment, local and global. Specifically, the difference between the two and when they are used.
• An understanding of scoring/substitution matrices. Specifically, how and why these are created and used.
• An understanding of BLAST and interpreting output. Specifically, understanding BLAST output and the meaning of the reported values and statistics.

Why is Bioinformatics Relevant?
DeepMind
DeepMind is a division of Alphabet, Inc. responsible for developing artificial general intelligence (AGI) technology. DeepMind illustrates the importance of bioinformatics.

Chess and Go (games) were initially used to develop algorithms and demonstrate proof of principle, and for publicity!

Their long-term aim is to solve complex problems, such as protein structure prediction (which has now been achieved): https://deepmind.com/blog/alphafold. This is one of the most difficult tasks in science, and it matters because protein structure is essential to fully understanding disease, drug discovery and drug-target interactions.

This technology revolutionized protein structure determination. We're getting predictions that are very close to the structures of the proteins determined by crystallography.

Why do we need to predict protein structures?
Even if the protein structure is known, a SNP can change an amino acid residue and thus change the overall protein structure. We need structure prediction programs to model these changes.

The BPS genomics lectures illustrated the computational approaches required to model and analyse the rapid advancements in DNA sequencing and genomic data.

Protein structure prediction is at the opposite end. Its aim is to understand what proteins the genes encode and the effects of variants on the protein structure. The material in these lectures is the basis of the algorithms used to predict protein structure.

The ability to predict a protein's shape is useful to scientists because it is fundamental to understanding its role within the body, as well as diagnosing and treating diseases believed to be caused by misfolded proteins, such as Alzheimer's, Parkinson's, Huntington's and cystic fibrosis.

Annotating Sequences
How do you compare a sequence …
>PFA0005w
MVTQSSGGGAAGSSGEEDAKHVLDEFGQQVYNEKVEKYAN
SKIYKEALKGDLSQASILSELAGTYKPCA …

To another sequence …
>PFA0010c
MKIHYINILLFELPLNILIYNQRNHKSTTPHTPNHTQTTRLLCEC
ELYSPANNDNDAEMKKVEKYANSKIYK …

Or a database of sequences …
>PFA0015c
RSGYYDSCSLDHKFHTNINNGYPPARNPCDGRNQERFSNDG
ESKCGSDKIRGNENNSNAGEMK MALK …
>PFA0020w RIF rifin 50586:51859 forward MW:41888
MKIHYTNILLFPLKLNILVNTHKKPSITSRHIQTTRLLCECELYTP
NYDNDPEMKSVMQQTTPHTPNH FHDR …

There are two ways to compare these sequences: global and local alignments.

Global Alignments
• A form of global optimization that "forces" the alignment to span the entire length of all query sequences.
• The simplest relationship: the alignment spans the whole length of both sequences, e.g. the same protein in 2 different species (ClustalO).
• Seeks the best alignment over the whole length of two sequences.
• Used when you have the same protein in different species.
• Used to produce a multiple sequence alignment.
• Any homologous sequences can be aligned globally as long as they are sufficiently similar.
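The idea of a global alignment, where the alignment is forced to span the whole length of both sequences, can be sketched with the classic Needleman-Wunsch dynamic programming algorithm. The notes do not name this algorithm or a scoring scheme, so the function name `global_align` and the match, mismatch and gap values below are assumptions chosen purely for illustration.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Scoring values are illustrative assumptions, not taken from the lecture.
    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned against gaps only

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,    # gap in b
                score[i][j - 1] + gap,    # gap in a
            )

    # Trace back from the bottom-right corner so that the alignment
    # covers both sequences end to end (this is what makes it global).
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    total, top, bottom = global_align("ACGT", "AGT")
    print(total)
    print(top)
    print(bottom)
```

Because the traceback always starts at the last cell and ends at the first, every residue of both sequences appears in the output, padded with `-` where needed. Real tools use substitution matrices and more refined gap penalties rather than these fixed values.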
Pairwise Alignments
Local Alignments
• We look for partial regions of homology across the genome.
• Alignment of a pair of sequences such that homologous sub-sequences are aligned, surrounded by areas of non-related (and unaligned) sequence.
• e.g. align two protein sequences that share a single common domain.
• e.g. align cDNA to genomic sequence. The exons within the cDNA will align but be separated by the introns in the genomic sequence, which have no matches (BLAST).
• It searches for sequence matches at every position, so it is more complicated to compute.

Tool for Local Alignment: BLAST
• BLAST (Basic Local Alignment Search Tool) is the most common tool used to search sequence databases. It is a fast heuristic that approximates the Smith-Waterman local alignment algorithm.
• Quick to run and the first stage in identifying potential similarity targets. BLAST is a key way of identifying your initial homologous proteins.
• The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches.

• Blastn: searches nucleotide sequences against a nucleotide database.
• Blastp: searches protein sequences against a protein database.
• Tblastn: protein sequences against a nucleotide database. Nucleotide translation is provided 'on-the-fly' and can cover all 6 reading frames (or 3 as specified).
• Blastx: nucleotide sequences against a protein database.
• Tblastx: nucleotide sequences against a nucleotide database, but compared at the protein level. The query is a nucleotide sequence, the database is a nucleotide database, and all 6 frames are translated on both the query and the database. This is the most sensitive way to compare nucleotide sequences.

All searches bar blastn actually perform protein/protein searches because this is the most sensitive way to search.

BLAST Output
The middle row of every group shows the amino acids that match. The + signs mark interchangeable amino acids. For example, valine and methionine are both non-polar (hydrophobic) amino acids, so they are considered interchangeable. We quantify these + signs using a substitution matrix.

Substitution Matrix
During the course of evolution, amino acids at particular locations can change due to mutations. When aligning protein sequences, a non-exact amino acid match (indicated by "+" in BLAST) can be as informative as a perfect match. It is the nature of the amino acid that is important (hydrophobicity, polarity, etc.).

If the chemical/physical properties of an amino acid are the important factor, then a mutation to another amino acid with the same properties is more likely to be maintained.

This is the basis of a substitution matrix. Matrices are built by selecting a group of similar proteins and scoring based on the observed frequency of the …
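To make the link between a substitution matrix and the "+" signs in BLAST output concrete, here is a minimal sketch that scores an ungapped pair of aligned sequences and builds a BLAST-style middle row. The tiny `SCORES` table, the function names and the example sequences are all assumptions for illustration. The values are chosen to resemble a BLOSUM62-like matrix and should be checked against a real matrix before any actual use.

```python
# Illustrative substitution scores for a handful of residue pairs.
# ASSUMPTION: hypothetical, BLOSUM62-like values; not taken from the lecture.
SCORES = {
    ("V", "V"): 4,
    ("M", "M"): 5,
    ("K", "K"): 5,
    ("E", "E"): 5,
    ("G", "G"): 6,
    ("W", "W"): 11,
    ("V", "M"): 1,   # both hydrophobic: a conservative substitution
    ("K", "E"): 1,
    ("G", "W"): -2,  # very different side chains: penalised
}


def pair_score(x, y):
    """Look up the score for a residue pair; the matrix is symmetric."""
    if (x, y) in SCORES:
        return SCORES[(x, y)]
    if (y, x) in SCORES:
        return SCORES[(y, x)]
    raise KeyError(f"no score defined for pair {x}/{y} in this toy matrix")


def alignment_score(query, subject):
    """Sum the substitution scores over an ungapped, equal-length alignment."""
    if len(query) != len(subject):
        raise ValueError("sequences must already be aligned to equal length")
    return sum(pair_score(x, y) for x, y in zip(query, subject))


def midline(query, subject):
    """Build the middle row: the letter for an identity, '+' for a
    positive-scoring substitution, and a space otherwise."""
    row = []
    for x, y in zip(query, subject):
        if x == y:
            row.append(x)
        elif pair_score(x, y) > 0:
            row.append("+")
        else:
            row.append(" ")
    return "".join(row)


if __name__ == "__main__":
    query, subject = "VKGE", "MKWE"
    print(query)
    print(midline(query, subject))
    print(subject)
    print(alignment_score(query, subject))
```

In this toy example the valine/methionine pair gets a "+" because its score is positive, while the glycine/tryptophan pair is left blank because its score is negative.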