Learning Objectives Genomics
HC1: Intro, BLAST
Why study bioinformatics?
Explain why a biologist should know Bioinformatic Data Analysis
Describe the ‘omics: (meta-) genomics, (meta-) transcriptomics, (meta-) proteomics,
metabolomics, etc.
Genomics: Sequence all of the DNA of one organism
Transcriptomics: Sequence all of the mRNA in an organism/tissue/cell
Proteomics: Identify all of the proteins in an organism/tissue/cell
Metagenomics: Sequence the DNA of all organisms in a sample
Metatranscriptomics: Sequence the mRNA of all organisms in a sample
Metaproteomics: Identify the proteins of all organisms in a sample
Explain the biology behind the ‘omics revolution: reduce bias by measuring all of a thing
Omics solves a major problem in science: biases
- People are mostly interested in: 1. Their diseases 2. Their food 3. Themselves
- This causes biases in our general understanding of biology, and biases in our databases
- For example, most studied bacteria are associated with humans
Compare the two ways a bioinformatician exploits existing data to make new discoveries
(top-down and bottom-up)
Sequence similarity searches
Explain what a sequence alignment is and the difference between a global and local
sequence alignment
In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or
protein to identify regions of similarity that may be a consequence of functional, structural,
or evolutionary relationships between the sequences. Aligned sequences of nucleotide or
amino acid residues are typically represented as rows within a matrix. Gaps are inserted
between the residues so that identical or similar characters are aligned in successive
columns.
Local alignment: finds the optimal sub-alignment within two sequences. Use it for partial homologs.
Global alignment: aligns two sequences from end to end. Use it if you know two sequences are
full homologs, e.g. resulting from gene duplication.
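The difference between the two alignment types can be sketched with a small dynamic-programming scorer. This is a minimal illustration, not the course's own method: the scoring values (match = 1, mismatch = -1, gap = -2) and the function name `alignment_score` are assumptions chosen for the example. Global scoring follows the Needleman-Wunsch scheme and local scoring the Smith-Waterman scheme.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Return the optimal alignment score of a and b.

    local=False scores an end-to-end (global) alignment;
    local=True scores the best sub-alignment (local).
    Scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalised, so fill the borders.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # Local: a negative running score restarts the alignment at 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    # Global reads the bottom-right cell; local reads the best cell anywhere.
    return best if local else score[-1][-1]


print(alignment_score("GGGGACGTCCCC", "ACGT"))              # global
print(alignment_score("GGGGACGTCCCC", "ACGT", local=True))  # local
```

With a short sequence embedded in a longer one, the global score is dragged down by the end gaps, while the local score only reflects the shared region.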
Explain the BLAST algorithm
1. Identifies all words (length W) in the query
– Default lengths: W = 3 for protein, W = 11 for DNA
– Based on substitution scores
2. Quickly finds similar words in the database
– “Similar” words are defined by using the substitution matrix (e.g. BLOSUM62)
– The index quickly locates all potential hit sequences
3. Extends seeds in both directions to find HSPs between query and hit
– HSP (high-scoring segment pair): region that can be aligned with a score above a certain threshold
List the factors including heuristics that make BLAST fast
The fastest algorithms generally use heuristics. Heuristic: a practical method that is not
guaranteed to be optimal, but is sufficient for the present goals.
Running BLAST
Evaluate BLAST output/results
Decide which BLAST flavor to use for your similarity search
BLAST flavors: direct searches
o Nucleotide-nucleotide searches
- Nucleotide database & nucleotide query
- blastn (default: W = 11 nucleotides)
Find homologous genes in different species
- Megablast (default: W = 28 nucleotides)
Designed to efficiently find longer alignments between very similar
nucleotide sequences
Best tool to find highly identical hits for a query sequence
For example: find sequences from the same species
- Discontiguous Megablast
Uses discontiguous words (e.g. W = 11 nucleotides: AT-GT-AC-CG-CG-T)
For example, this can focus the search on codons (the third nucleotide
of codons is less conserved due to the degeneracy of the genetic code)
Best tool to find nucleotide-nucleotide hits at larger evolutionary
distances for protein-coding query sequences.
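A discontiguous word can be illustrated with a small template function. This is a sketch only: the template string is derived from the example in these notes (AT-GT-AC-CG-CG-T), not from the templates Discontiguous Megablast actually uses, and `spaced_word` is a hypothetical helper. It shows that two sequences differing only at the ignored (third codon) positions yield the same word.

```python
# "1" = position is compared, "0" = position is ignored.
# Pattern taken from the notes' example: 11 compared positions over 16 bases.
TEMPLATE = "1101101101101101"


def spaced_word(seq, start, template=TEMPLATE):
    """Return the discontiguous word at `start`, with ignored positions shown as '-'."""
    window = seq[start:start + len(template)]
    if len(window) < len(template):
        return None  # not enough sequence left for a full word
    return "".join(base if keep == "1" else "-" for base, keep in zip(window, template))


seq1 = "ATAGTAACACGACGAT"
seq2 = "ATGGTCACTCGCCGGT"  # differs from seq1 only at every third position
print(spaced_word(seq1, 0))
print(spaced_word(seq2, 0))
```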
o Protein-protein searches
- Protein database & protein query sequences
- blastp (default: W = 3 amino acids)
Find homologous proteins in different species
BLAST flavors: translated searches
o We can exploit the conservation of protein sequences when aligning DNA sequences, by
using translated searches
o This allows for more sensitive searches that detect homology at greater evolutionary
distances
– For example: homologous genes in distantly related species
o blastx and tblastx first translate the query from nucleotide into protein before identifying
high-scoring words
o tblastn and tblastx search a database of nucleotide sequences that is translated into protein
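The translation step behind translated searches can be sketched as a six-frame translation (three reading frames on each strand). This is a minimal example using the standard genetic code; the function names are hypothetical and unknown codons are written as X by assumption.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return the translations of frames +1..+3 and -1..-3."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for frame, protein in six_frame_translation("ATGGCCTAA").items():
    print(frame, protein)
```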
HC2: Quantifying Sequence Similarity
Evolution
List the mechanisms of DNA mutation
Nucleotide substitutions
- Replication error
- Physical or chemical reaction
Insertions or deletions (indels)
- Unequal crossing over during meiosis
- Replication slippage
Inversions or rearrangements
Duplications of:
- Partial or whole gene
- Partial (polysomy) or whole chromosome (aneuploidy, polysomy)
- Whole genome (polyploidy)
Horizontal gene transfer (HGT)
- Transfer between individuals of the same generation
Define homology, similarity, and identity
Homology
- Property of two sequences that have a shared ancestor
- Homology is TRUE or FALSE: either you’re family or you’re not
Identity
- Percentage of identical residues in an alignment
- Used for amino acids or nucleotides.
Similarity
- Percentage of amino acid residues in an alignment with a positive substitution score
- Not used for DNA
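Identity and similarity can be computed from two aligned rows as follows. This is a sketch under assumptions: percentages are taken over the full alignment length (gap columns included), and `POSITIVE_PAIRS` is only a small illustrative subset of non-identical residue pairs that score positively in BLOSUM62. A real analysis would look up every pair in the full matrix.

```python
# Illustrative subset of non-identical pairs with a positive BLOSUM62 score.
POSITIVE_PAIRS = {frozenset(p) for p in ("IV", "IL", "LM", "KR", "DE", "ST", "FY")}


def identity_and_similarity(row_a, row_b):
    """Return (percent identity, percent similarity) for two aligned protein rows.

    Gaps are written as '-'. Identical residues also count as similar,
    because identical pairs have a positive substitution score.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    columns = len(row_a)
    identical = similar = 0
    for x, y in zip(row_a, row_b):
        if "-" in (x, y):
            continue  # gap columns are neither identical nor similar
        if x == y:
            identical += 1
            similar += 1
        elif frozenset((x, y)) in POSITIVE_PAIRS:
            similar += 1
    return 100 * identical / columns, 100 * similar / columns


print(identity_and_similarity("MKVLDE-S", "MRILEEAT"))
```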
List four properties of amino acids that might be important in determining their
physicochemical similarity
Size, polarity, hydrophobicity, preferred protein fold
Probability & Permutation Statistics
Work with P-values obtained using permutation statistics
P-value: the probability of observing a hit as good as, or better than, your score by
chance.
In permutation statistics, this corresponds to the fraction of permutations in which the permuted
score is equal to or higher than your score.
A meaningful observation has a low P-value: randomly permuted data rarely reaches a score equal
to or higher than yours.
The minimum P-value depends on the number of random permutations.
Example: for 100 permutations, the best P-value: <0.01
For 1000 permutations, the best P-value: <0.001
Explain how permutation statistics help us evaluate the strength of a result
Statistics are not well defined for many bioinformatic analyses. A simple solution is data
permutation:
- Permute (shuffle) the sequences 1000* times
- Make 1000* new alignment matrices
- Register if the alignment score of the permuted sequences is equal to or higher than
your score
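The permutation procedure can be sketched as follows. This is a minimal example, assuming a simple local alignment score (match = 1, mismatch = -1, gap = -2) as the statistic and shuffling only the second sequence; the function names, the seed, and the scoring values are assumptions made for the illustration.

```python
import random


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score, keeping only two rows of the matrix."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            cell = max(
                0,
                prev[j - 1] + (match if x == y else mismatch),
                prev[j] + gap,
                curr[j - 1] + gap,
            )
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def permutation_p_value(a, b, n_permutations=1000, seed=0):
    """Return (observed score, number of permutations scoring >= observed, fraction)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    observed = local_score(a, b)
    letters = list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(letters)  # permute the sequence, keeping its composition
        if local_score(a, "".join(letters)) >= observed:
            hits += 1
    return observed, hits, hits / n_permutations


seq = "ACGTTGCAAGCTTAGCCGATAGCTA"
observed, hits, fraction = permutation_p_value(seq, seq, n_permutations=200)
print(observed, hits, fraction)
```

When no permutation reaches the observed score, the fraction is 0; in line with the notes, this is reported as P < 1/n (for example P < 0.001 for 1000 permutations), not as P = 0.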