16-06-23 HC Intro Blast
HC Quantifying Sequence Similarity
HC Sequence Conservation
HC Analysis of omics data
HC Phylogenetic Trees
HC Phylogenetic Inference
HC Bootstrapping
HC GWAS
WC Blast
WC Probabilities
WC Quantifying Molecular Evolution
WC Sequence Conservation
WC Analysis of omics data
WC UPGMA & ArLV1
WC Phylogenetic Trees
WC Phylogenetic inference
WC A bioinformatic murder mystery
WC Bootstrapping
WC GWAS
Practice exam 1
Practice exam 2
Practice exam 3
Understand the formula sheet
ALSO WATCH THE KNOWLEDGE CLIPS!
Intro to Bioinformatics, Blast
Modern genetic research generates ever more, and ever more complex, data and is inextricably
linked to bioinformatics. In this course we combine these two subjects.
The ‘Omics’: Sequence everything of something
Genomics: Sequence all of the DNA of one organism
Transcriptomics: Sequence all of the mRNA in an organism/tissue/cell
Proteomics: Sequence all of the proteins in an organism/tissue/cell
Metagenomics: Sequence the DNA of all organisms in a sample
Metatranscriptomics: Sequence the mRNA of all organisms in a sample
Metaproteomics: Sequence the proteins of all organisms in a sample
Meta = All organisms.
Omics solves a major problem in science: Reducing biases by measuring all of a thing.
People are mostly interested in:
- Their diseases
- Their food
- Themselves
This causes biases in our general understanding of biology, and biases our databases.
Plants have the largest genomes, partly because they often duplicate their whole genome.
Top-down = Question first. Given a biological question, a good bioinformatician will immediately
think about which dataset could be used to answer it.
Bottom-up = Data first. Given a dataset, a good bioinformatician will immediately think about which
biological hypothesis it could help to test.
Learning goal: list the factors, including heuristics, that make BLAST fast.
Looking something up in a database:
Query:
TGCTGCAGGA
CAACAGTT
Database:
AATGAGGTTAAGACTAAGCAATGCATGTGTAAGTATGAACTCTTGTATCATAGATTAAGC
CATGCATGTGTGATATCATGGTTGTGGTGGTATGACTTATT
Step 1: We have to break the search down into smaller pieces, because mutations mean the full
query will rarely match a database sequence exactly.
We do that with k-mers:
K-mer searches
- Sequences can be divided into shorter subsequences or k-mers
- k-mers consist of k nucleotides or amino acids
- We can make an index of all k-mers that occur in the database sequences
- If we split a query sequence into k-mers of the same length, we can rapidly identify all the
database sequences containing them
- But: we limit ourselves to exact matches
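The k-mer index described above can be sketched as follows. This is a minimal illustration, not how BLAST stores its index: the function names (`kmers`, `build_index`, `find_candidates`), the toy database and the choice of k = 5 are all assumptions made for the example.

```python
from collections import defaultdict

def kmers(seq, k):
    """Return every overlapping k-mer of seq together with its start position."""
    return [(seq[i:i + k], i) for i in range(len(seq) - k + 1)]

def build_index(database, k):
    """Map each k-mer to the (sequence id, position) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in database.items():
        for kmer, pos in kmers(seq, k):
            index[kmer].append((seq_id, pos))
    return index

def find_candidates(query, index, k):
    """Return ids of database sequences sharing at least one exact k-mer with the query."""
    hits = set()
    for kmer, _ in kmers(query, k):
        for seq_id, _ in index.get(kmer, []):
            hits.add(seq_id)
    return hits

# Toy database (hypothetical sequences, only for illustration).
database = {
    "seq1": "AATGAGGTTAAGACTAAGCAATGCATGTGTAAG",
    "seq2": "CATGCATGTGTGATATCATGGTTGTGG",
}
index = build_index(database, k=5)

# An exact sub-sequence of seq1 is found through its k-mers.
print(sorted(find_candidates("TTAAGACT", index, k=5)))
# A query with one mutation (A -> T) is still found, because the k-mer
# "TTAAG" before the mutation matches exactly.
print(sorted(find_candidates("TTAAGTCT", index, k=5)))
```

Splitting the query into short k-mers is what makes the search tolerant of mutations: a single changed base only destroys the k-mers that overlap it, while the remaining k-mers can still match exactly.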
Sequence alignment: the difference between global and local
Sequence alignment: we try to match two sequences as well as possible.
There are two ways to search a database for matches: a k-mer search (very fast, but it limits you
to exact matches) and pairwise alignment against every database sequence (this also finds
distantly related sequences, but it takes a very long time).
The solution is to combine the best of both worlds: Quickly find potential hits using k-mers stored in
an index. Make pairwise alignment, but only for potential hits.
There is a tool that does this for you:
Basic Local Alignment Search Tool (BLAST)
BLAST finds similar sequences at reasonable speed
– 10-50x faster than previous algorithms
Terminology:
– Query: sequence we search the database with (like the word in a search bar)
– Hit or Subject: similar sequence found in the database
BLAST is the most used bioinformatics program.
Even faster algorithms are now available.
If you look up a sequence you BLAST it.
If you make a poster and you BLAST something, do cite it!
Heuristics: you are not guaranteed to find the best result. You cut some corners, but this makes
the whole process a lot faster.
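The BLAST search algorithm:
1. Identifies all words (k-mers of length W) in the query: W = 3 for protein, W = 11 for DNA.
2. Scores each word with the substitution scores from a substitution matrix, summed over its
positions. For a three-letter protein word, for example:
7 + 5 + 6 = 18
3. Quickly finds similar words in the database. Similar words are defined using the
substitution matrix: a database word counts as similar when its score against the query word
is high enough (at or above a threshold), so it does not have to be identical. The index
quickly locates all potential hit sequences.
4. Extends the matching words (seeds) in both directions to find HSPs (high-scoring segment
pairs) between query and hit.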
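The word-scoring and seed-extension steps above can be sketched as follows. The scores are the BLOSUM62 entries for the amino acids P, Q and G, chosen because their diagonal values reproduce the 7 + 5 + 6 = 18 in the notes; whether the lecture used this word is an assumption. The threshold of 11, the drop-off of 5 and all function names are also illustrative assumptions, and real BLAST is considerably more elaborate.

```python
from itertools import product

# Excerpt of BLOSUM62 for three amino acids only (assumption: any sequence
# passed to these functions contains just P, Q and G).
SCORES = {
    ("P", "P"): 7, ("Q", "Q"): 5, ("G", "G"): 6,
    ("P", "Q"): -1, ("P", "G"): -2, ("Q", "G"): -2,
}

def score(a, b):
    """Substitution score of two residues; the matrix is symmetric."""
    return SCORES.get((a, b), SCORES.get((b, a)))

def word_score(w1, w2):
    """Sum of substitution scores over the positions of two equal-length words."""
    return sum(score(a, b) for a, b in zip(w1, w2))

def neighborhood(word, alphabet="PQG", threshold=11):
    """All words of the same length scoring at least `threshold` against `word`."""
    candidates = ("".join(c) for c in product(alphabet, repeat=len(word)))
    return sorted(w for w in candidates if word_score(word, w) >= threshold)

def extend(query, subject, q_pos, s_pos, step, drop):
    """Extend without gaps in one direction; return (best gain, residues added)."""
    best = run = length = 0
    i = 0
    while 0 <= q_pos + i * step < len(query) and 0 <= s_pos + i * step < len(subject):
        run += score(query[q_pos + i * step], subject[s_pos + i * step])
        i += 1
        if run > best:
            best, length = run, i
        elif best - run > drop:
            break  # score fell too far below the best seen so far
    return best, length

def extend_seed(query, subject, q_pos, s_pos, w=3, drop=5):
    """Extend a seed of length w in both directions; return (score, q_start, q_end)."""
    seed = word_score(query[q_pos:q_pos + w], subject[s_pos:s_pos + w])
    right, r_len = extend(query, subject, q_pos + w, s_pos + w, 1, drop)
    left, l_len = extend(query, subject, q_pos - 1, s_pos - 1, -1, drop)
    return seed + left + right, q_pos - l_len, q_pos + w + r_len

print(word_score("PQG", "PQG"))   # 7 + 5 + 6 = 18
print(neighborhood("PQG"))        # words similar enough to seed a hit
print(extend_seed("GPQGQ", "PGPQGQG", q_pos=1, s_pos=2))
```

Stopping the extension once the running score drops too far below its best value is one of the corners BLAST cuts: it avoids aligning the full sequences, at the cost of possibly missing a better alignment further along.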
Global and local sequence alignments
- Are the sequences completely or partially homologous (i.e. are they in the same ‘family’,
sharing a common ancestor)?
- Local alignment (what BLAST does): finds the optimal sub-alignment within two sequences.
Suited to partial homologs.
- Global alignment (our goal): aligns two sequences from end to end. Use it if you know two
sequences are full homologs, e.g. resulting from gene duplication.
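The difference between the two alignment types can be made concrete with a small dynamic-programming sketch. It computes only the alignment score, using a simple scheme (match +1, mismatch -1, gap -1) that is an assumption for illustration; the global variant follows the Needleman-Wunsch recurrence and the local variant the Smith-Waterman recurrence. The function name `align_score` and the example sequences are hypothetical.

```python
def align_score(a, b, local, match=1, mismatch=-1, gap=-1):
    """Return the optimal global (local=False) or local (local=True) alignment score."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalised.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                # Local alignment: a sub-alignment may start anywhere, so never go below 0.
                cell = max(cell, 0)
            H[i][j] = cell
            best = max(best, cell)
    # Local: best cell anywhere. Global: bottom-right cell (end-to-end).
    return best if local else H[-1][-1]

# Two sequences that share only the region ACGTACGT (partial homologs).
a = "TTTTACGTACGT"
b = "ACGTACGTCCCC"
print("local :", align_score(a, b, local=True))
print("global:", align_score(a, b, local=False))
```

For these partially matching sequences the local score reflects only the shared region, while the global score is pulled down by the unrelated ends that must also be aligned.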
BLAST flavors: direct searches
1. Nucleotide-nucleotide searches
o Blastn (W = 11 nucleotides): finds homologous genes in different species.
o Megablast (W = 28 nucleotides): designed to find longer alignments between very
similar nucleotide sequences. Best tool to find highly identical hits for a query
sequence, for example sequences from the same species.
o Discontiguous megablast (W = 11 nucleotides): this can focus the search on codons.
Best tool to find nucleotide-nucleotide hits at larger evolutionary distances for
protein-coding query sequences.
2. Protein-protein searches
o Blastp (W = 3 amino acids): finds homologous proteins in different species.
BLAST flavors: translated searches
- These allow for more sensitive searches that detect homology at greater evolutionary
distances.
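The notes do not spell out how a translated search works. As background, translated BLAST programs compare sequences at the protein level by first translating the nucleotide sequence in all six reading frames (three per strand). A minimal sketch of that translation step, using the standard genetic code, is given below; the function names and the example sequence are hypothetical, and this is not BLAST's own implementation.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA sequence (A, C, G, T only)."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    """Translate complete codons from the start of dna; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def six_frames(dna):
    """Translate both strands at offsets 0, 1 and 2; keys look like '+1' or '-3'."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames

for name, protein in six_frames("ATGGCCTAA").items():
    print(name, protein)
```

Comparing proteins rather than nucleotides is more sensitive at large evolutionary distances because different codons can encode the same or a similar amino acid, so protein sequences stay recognisably similar for longer than the underlying DNA.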