summary Bioinformatics (Genomics) Biology UU

Notes from Bioinformatics lectures, with accompanying slides and tables.

Class notes, 18 pages, 2020/2021, July 3, 2021
Learning objectives: Genomics
HC1: Intro, BLAST
Why study bioinformatics?
 Explain why a biologist should know Bioinformatic Data Analysis

 Describe the ‘omics: (meta-) genomics, (meta-) transcriptomics, (meta-) proteomics,
metabolomics, etc.

Genomics: Sequence all of the DNA of one organism

Transcriptomics: Sequence all of the mRNA in an organism/tissue/cell

Proteomics: Identify and quantify all of the proteins in an organism/tissue/cell (typically by mass spectrometry rather than by sequencing)

Metagenomics: Sequence the DNA of all organisms in a sample

Metatranscriptomics: Sequence the mRNA of all organisms in a sample

Metaproteomics: Identify and quantify the proteins of all organisms in a sample

 Explain the biology behind the ‘omics revolution: reduce bias by measuring all of a thing
Omics solves a major problem in science: biases
- People are mostly interested in: 1. Their diseases 2. Their food 3. Themselves
- This causes biases in our general understanding of biology, and biases in our databases
- For example, most studied bacteria are associated with humans

 Compare the two ways a bioinformatician exploits existing data to make new discoveries
(top-down and bottom-up)

Sequence similarity searches
 Explain what a sequence alignment is and the difference between a global and local
sequence alignment
In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or
protein to identify regions of similarity that may be a consequence of functional, structural,
or evolutionary relationships between the sequences. Aligned sequences of nucleotide or
amino acid residues are typically represented as rows within a matrix. Gaps are inserted
between the residues so that identical or similar characters are aligned in successive
columns.

Local alignment – Finds the optimal sub-alignment within two sequences – Use for partial homologs (sequences that share only a region)
Global alignment – Aligns two sequences from end to end – Use if you know two sequences are
full homologs, e.g. resulting from gene duplication.
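
The difference between the two alignment types can be made concrete with a small dynamic-programming sketch. This is a minimal illustration, not the scoring used in the lectures: the function name `alignment_score` and the match/mismatch/gap values are assumptions chosen for readability.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best global (Needleman-Wunsch) or local (Smith-Waterman) alignment score.

    The scoring values are illustrative assumptions, not lecture defaults.
    """
    # prev[j] is the best score for aligning the processed prefix of a with b[:j]
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                score = max(score, 0)    # a local alignment may start anywhere
                best = max(best, score)  # and it may end anywhere
            curr.append(score)
        prev = curr
    # Global: score of the end-to-end alignment. Local: best sub-alignment.
    return best if local else prev[-1]


# Two sequences that share only the region ACGTACGT
a = "GGGGACGTACGT"
b = "ACGTACGTCCCC"
print("global:", alignment_score(a, b))
print("local: ", alignment_score(a, b, local=True))
```

For partial homologs like these, the local score is driven by the shared region, while the global score is pulled down by the unrelated ends.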

 Explain the BLAST algorithm
1. Identifies all words (length W) in the query
– Default lengths: W = 3 for protein, W = 11 for DNA
– Based on substitution scores
2. Quickly finds similar words in the database
– “Similar” words are defined by using the substitution matrix (e.g. BLOSUM62)
– The index quickly locates all potential hit sequences
3. Extends seeds in both directions to find high-scoring segment pairs (HSPs) between query and hit
– HSP: region that can be aligned with a score above a certain threshold
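
The three steps above can be sketched as a toy seed-and-extend search. This is a simplified illustration under stated assumptions, not the real BLAST implementation: it uses exact word matches only (no substitution-matrix neighborhood words), ungapped extension, and hypothetical names and parameters (`build_index`, `extend`, `blast_like_search`, `x_drop`, `min_score`).

```python
from collections import defaultdict


def build_index(database, w):
    """Map every word of length w to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in database.items():
        for pos in range(len(seq) - w + 1):
            index[seq[pos:pos + w]].append((seq_id, pos))
    return index


def extend(query, subject, q_pos, s_pos, w, match=1, mismatch=-2, x_drop=3):
    """Ungapped extension of a seed in both directions.

    Extension stops once the running score falls more than x_drop
    below the best score seen so far (a heuristic cut-off).
    """
    def one_direction(qi, si, step):
        score = best = length = best_len = 0
        while 0 <= qi < len(query) and 0 <= si < len(subject):
            score += match if query[qi] == subject[si] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score > x_drop:
                break
            qi += step
            si += step
        return best, best_len

    right_score, right_len = one_direction(q_pos + w, s_pos + w, +1)
    left_score, left_len = one_direction(q_pos - 1, s_pos - 1, -1)
    score = w * match + left_score + right_score
    return score, q_pos - left_len, q_pos + w + right_len


def blast_like_search(query, database, w=4, min_score=6):
    """Return (sequence id, aligned query segment, score) for each HSP found."""
    index = build_index(database, w)            # built once, reused for every word
    hits = set()
    for q_pos in range(len(query) - w + 1):     # step 1: words in the query
        word = query[q_pos:q_pos + w]
        for seq_id, s_pos in index.get(word, []):   # step 2: index lookup
            score, start, end = extend(query, database[seq_id], q_pos, s_pos, w)
            if score >= min_score:              # step 3: keep HSPs above threshold
                hits.add((seq_id, query[start:end], score))
    return sorted(hits, key=lambda hit: -hit[2])


database = {"s1": "TTTTACGTACGTTTTT", "s2": "GGGGGGGGGGGG"}
print(blast_like_search("CCACGTACGTCC", database))
# [('s1', 'ACGTACGT', 8)]
```

The speed comes from never aligning the query against sequences that share no word with it; the price is that a true homolog without any matching word is missed.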

 List the factors including heuristics that make BLAST fast
The fastest algorithms generally use heuristics. Heuristic: a practical method that is not
guaranteed to be optimal, but is sufficient for the present goals.

Running BLAST
 Evaluate BLAST output/results

 Decide which BLAST flavor to use for your similarity search
BLAST flavors: direct searches
o Nucleotide-nucleotide searches
- Nucleotide database & nucleotide query
- blastn (default: W = 11 nucleotides)
 Find homologous genes in different species
- Megablast (default: W = 28 nucleotides)
 Designed to efficiently find longer alignments between very similar
nucleotide sequences
 Best tool to find highly identical hits for a query sequence
– For example: find sequences from the same species
- Discontiguous Megablast
 Uses discontiguous words (e.g. W = 11 nucleotides: AT-GT-AC-CG-CG-T)
 For example, this can focus the search on codons (the third nucleotide
of codons is less conserved due to the degeneracy of the genetic code)
 Best tool to find nucleotide-nucleotide hits at larger evolutionary
distances for protein-coding query sequences.
o Protein-protein searches
- Protein database & protein query sequences
- blastp (default: W = 3 amino acids)
 Find homologous proteins in different species
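
The discontiguous words used by Discontiguous Megablast can be illustrated with a small sketch. The template below simply mirrors the AT-GT-AC-CG-CG-T example from the notes (11 matched positions, every third position ignored); it is an assumption for illustration and not necessarily a template that BLAST itself uses. The names `TEMPLATE` and `spaced_word` are hypothetical.

```python
# 1 = position must match, 0 = position is ignored (16 wide, 11 ones)
TEMPLATE = "1101101101101101"


def spaced_word(seq, start, template=TEMPLATE):
    """Return the bases under the '1' positions, or None if the template runs off the end."""
    window = seq[start:start + len(template)]
    if len(window) < len(template):
        return None
    return "".join(base for base, keep in zip(window, template) if keep == "1")


# Two sequences that differ only at every third position
a = "ATAGTAACACGACGAT"
b = "ATCGTCACCCGCCGCT"

print(a[:11] == b[:11])                      # contiguous 11-mers differ
print(spaced_word(a, 0), spaced_word(b, 0))  # discontiguous words agree
```

A contiguous word of length 11 would miss this pair, while the discontiguous word still produces a seed, which is why this flavor tolerates third-position changes in codons.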

BLAST flavors: translated searches

o We can exploit the conservation of protein sequences when aligning DNA sequences, by
using translated searches
o This allows for more sensitive searches that detect homology at greater evolutionary
distances
– For example: homologous genes in distantly related species
o blastx and tblastx first translate the query from nucleotide into protein before identifying
high-scoring words
o tblastn and tblastx search a nucleotide database that is translated into protein in all six reading frames
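
As a sketch of what a translated search does before any words are compared, the snippet below translates a nucleotide sequence in all six reading frames using the standard genetic code, and lists which program pairs which query with which database. The helper names (`six_frame_translations`, `BLAST_PROGRAMS`) are assumptions for illustration; BLAST performs the translation internally.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; 'X' marks codons containing unknown bases."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Three frames on the given strand (+1..+3) and three on the reverse complement (-1..-3)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# (query, database) -> program; "translated" means translated nucleotide sequences
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("translated", "protein"): "blastx",
    ("protein", "translated"): "tblastn",
    ("translated", "translated"): "tblastx",
}

for frame, protein in six_frame_translations("ATGGCCTAA").items():
    print(frame, protein)
```

Because the reading frame and strand of a hit are not known in advance, all six translations are searched, which is part of why translated searches are slower than direct ones.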

HC2: Quantifying Sequence Similarity
Evolution
 List the mechanisms of DNA mutation
Nucleotide substitutions
- Replication error
- Physical or chemical reaction
Insertions or deletions (indels)
- Unequal crossing over during meiosis
- Replication slippage
Inversions or rearrangements
Duplications of:
- Partial or whole gene
- Partial or whole chromosome (aneuploidy, polysomy)
- Whole genome (polyploidy)
Horizontal gene transfer (HGT)
- Transfer of genetic material between individuals of the same generation, rather than from parent to offspring
 Define homology, similarity, and identity
Homology
- Property of two sequences that have a shared ancestor
- Homology is TRUE or FALSE: either you’re family or you’re not
Identity
- Percentage of identical residues in an alignment
- Used for amino acids or nucleotides.
Similarity
- Percentage of amino acid residues in an alignment with a positive substitution score
- Not used for DNA
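
Identity and similarity can be computed directly from the two rows of an alignment. The sketch below is illustrative only: `POSITIVE_PAIRS` is a tiny toy stand-in for a substitution matrix (it is not the full BLOSUM62 matrix), and the percentages are taken over the full alignment length including gap columns, which is an assumption about the denominator.

```python
# Toy stand-in for a substitution matrix, for illustration only:
# identical residues and these few conservative pairs count as "positive score".
POSITIVE_PAIRS = {
    frozenset("IL"), frozenset("IV"), frozenset("KR"),
    frozenset("DE"), frozenset("ST"), frozenset("FY"),
}


def has_positive_score(x, y):
    return x == y or frozenset((x, y)) in POSITIVE_PAIRS


def identity_and_similarity(row_a, row_b):
    """Percent identity and percent similarity of two aligned rows ('-' marks a gap)."""
    if len(row_a) != len(row_b) or not row_a:
        raise ValueError("aligned rows must be non-empty and of equal length")
    # Gap columns are neither identical nor similar
    columns = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    identical = sum(x == y for x, y in columns)
    similar = sum(has_positive_score(x, y) for x, y in columns)
    return 100 * identical / len(row_a), 100 * similar / len(row_a)


identity, similarity = identity_and_similarity("MKV-LDSF", "MRVALETY")
print(f"identity {identity}%, similarity {similarity}%")
# identity 37.5%, similarity 87.5%
```

Similarity is always at least as high as identity, because every identical pair also has a positive substitution score.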
 List four properties of amino acids that might be important in determining their physico-
chemical similarity
Size, polarity, hydrophobicity, preferred protein fold

Probability & Permutation Statistics
 Work with P-values obtained using permutation statistics
P-value: defined as the probability of observing a hit as good as, or better than your score by
chance.
In permutation statistics -> corresponds to the fraction of times that the permuted score is
equal to or higher than your score.
Meaningful observation -> low P-value -> if randomly permuted data rarely has a higher
score
The minimum P-value depends on the number of random permutations.
Example: for 100 permutations, the best P-value: <0.01
For 1000 permutations, the best P-value: <0.001
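
The relation between the number of permutations and the smallest P-value that can be reported can be written out as follows. The symbols are assumptions introduced here for illustration: N is the number of permutations, S_obs is your score, and S_i is the score of the i-th permuted data set.

```latex
\[
  P \;=\; \frac{\#\{\, i : S_i \ge S_{\mathrm{obs}} \,\}}{N},
  \qquad
  \#\{\, i : S_i \ge S_{\mathrm{obs}} \,\} = 0
  \;\Longrightarrow\;
  P < \frac{1}{N}
\]
```

With N = 100 the bound is 1/100 = 0.01, and with N = 1000 it is 1/1000 = 0.001, which matches the two examples above.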
 Explain how permutation statistics help us evaluate the strength of a result
Statistics are not well defined for many bioinformatic analyses. A simple solution is data
permutation:
- Permute (shuffle) the sequences many times (e.g. 1000 times)
- Make a new alignment matrix for each permutation
- Register whether the alignment score of the permuted sequences is equal to or higher than
your score
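
The procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions: only the second sequence is shuffled, the score is a simple local alignment score with made-up match/mismatch/gap values, and the names `local_score` and `permutation_p_value` are hypothetical.

```python
import random


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local (Smith-Waterman) alignment score; scoring values are illustrative."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            score = max(
                0,
                prev[j - 1] + (match if x == y else mismatch),
                prev[j] + gap,
                curr[j - 1] + gap,
            )
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def permutation_p_value(a, b, n_permutations=1000, seed=0):
    """Fraction of shuffled versions of b that score at least as high as the real b."""
    rng = random.Random(seed)   # fixed seed so the example is reproducible
    observed = local_score(a, b)
    residues = list(b)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(residues)   # shuffling keeps the composition, destroys the order
        if local_score(a, "".join(residues)) >= observed:
            at_least_as_good += 1
    return observed, at_least_as_good / n_permutations


score, p = permutation_p_value("ACGTTGCAAGGCTA", "TTACGTTGCAAGCC", n_permutations=200)
print(f"score = {score}, P = {p}")
```

If the returned fraction is 0, it should be reported as P < 1/n_permutations rather than as P = 0.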
