Fundamentals of Bioinformatics - Summary

Summary of the Fundamentals of Bioinformatics course at VU and UvA. It covers all lecture material, lecture slides and compulsory reading, and also includes notes from the Python tutorials (for the conversion class). It has the same lecture material as for the minor course Fundamentals of Bioinforma...


  • October 13, 2020
  • 34 pages
  • 2020/2021
  • Summary

FUNDAMENTALS OF BIOINFORMATICS

TABLE OF CONTENTS
LECTURES

Lecture 1 Fundamentals of Bioinformatics – Introduction
Lecture 2 Evolution
Lecture 3 Introduction to Machine Learning
Lecture 4 Protein structures & sequence profiles
Lecture 5 Impact prediction: SIFT and PolyPhen-2
Lecture 6 DNA Sequencing & Computational Analysis of Massively Parallel Sequencing (MPS) Data
Lecture 7 Title
Lecture 8 Big Data in Life Sciences

TUTORIALS

Tutorial 1
Tutorial 2 First steps in Python
Tutorial 3 Lists and loops
Tutorial 4 Functions
Tutorial 5 File I/O and dictionaries




Lectures
Lecture 1 Fundamentals of Bioinformatics – Introduction
Chapter 1

Keywords of bioinformatics: programming, knowledge of molecular biology, machine learning,
translating the research question into methods, designing workflows, general data science skills,
genomics/genetics, algorithms and statistics. Bioinformaticians often work on workflows.

Bioinformatics: studying the informatic processes in biotic systems.

Sequencing, data analysis flow: images → reads (short stretches of DNA, 200 – 500 nucleotides long) →
align / match the reads to existing genomes (computationally expensive) → significance values (SNP, …).
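
The alignment step can be illustrated with a deliberately naive sketch: each read is located in a reference by exact substring matching. This is only a toy under stated assumptions. Real aligners tolerate mismatches and use indexed data structures, and the sequences and the function name `map_reads` are made up for this example.

```python
def map_reads(reference, reads):
    """Return {read: [0-based start positions]} using exact substring matching."""
    positions = {}
    for read in reads:
        hits = []
        start = reference.find(read)
        while start != -1:
            hits.append(start)
            # Continue searching one position further to find overlapping hits too.
            start = reference.find(read, start + 1)
        positions[read] = hits
    return positions


# Hypothetical toy data, far shorter than real reads or genomes.
reference = "ACGTACGTTAGC"
reads = ["ACGT", "TTAG", "GGGG"]
mapping = map_reads(reference, reads)
print(mapping)
```

Even in this toy version every read is compared against the whole reference, which hints at why matching millions of reads to a full genome is computationally expensive.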

Small sequencing devices that connect to a laptop are available; however, the computation remains heavy and is sent to a server.

We measure molecules / variants in the cells. We can measure proteins, DNA and RNA: these measurements
are called omics. In essence you get a profile of the molecular state of a sample.

The dimensionality of the data is often a challenge: e.g. 10000 features per sample, for only 150 people.

Applications are found in: biomedicine, pharmacy, eco genomics, plant breeding.

Data sources – Molecular profiling:
- Genome (genomics): point mutations / small variants (exome sequencing, DNAseq),
structural variants (arrayCGH, DNAseq), methylation.
- RNA transcription/expression (transcriptomics): RNAseq, micro arrays, single cell RNAseq.
- Proteins (proteomics): immunohistochemistry, mass spectrometry.
- Metabolites – small molecules (metabolomics): NMR, mass spectrometry.
- Microbes – bacteria/virus/parasite: DNAseq (metagenomics), RNAseq (metatranscriptomics)

Genomics → transcriptomics → proteomics → metabolomics.

40% group project, 30% individual assignment, 30% conversion class.

Project: Benchmark impact prediction methods.
Impact prediction methods: try to predict what the effect of a mutation is. PolyPhen and SIFT are
existing methods and will be used.

We need an annotated set in which we know what the effect of each mutation is. We use the
dataset ClinVar. Use the predictions from your method + the gold standard annotations to create a ROC curve.

Which factors determine whether a mutation has an effect on the organism (collected in the lecture):
- Kind of mutation: point mutation, indel, deletion/duplication, frameshift, nonsense mutation,
introduction of an extra start or stop codon.
- Effect on the amino acid: whether the amino acid changes at all (a wobble base mutation, or any
mutation that does not change the amino acid, may do nothing; a change in the first two bases of the
codon is more likely to matter), the type of amino acid change, and whether the new amino acid
belongs to another group.
- Location: the mutation needs to be in a gene and in an exon to affect the protein directly (a
mutation in an intron or in non-coding DNA may do nothing); other loci to consider are IGR, NCR, RNA,
repeat regions and pseudogenes.
- Regulation: a mutation can affect a promoter heavily; changes in regulatory sequences, regulation
signals or splicing signals can lead to over expression or under expression.
- Effect on the protein: whether the mutation is in the active part of the protein, and whether it
changes the structure of the protein; mutations can alter the eventual protein in different ways.
- Cellular context: whether the gene is expressed in the cell type, redundancy (gene copies
available?), whether the gene is part of a major signalling pathway or another system (e.g. DNA
repair), repair mechanisms of the cell, ability to kill the cell, epigenetic modification, and the
number of mutations already present.
- Environmental changes (spontaneous mutations).
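
One of the factors above, the effect of a point mutation on the encoded amino acid, can be checked mechanically. The following is a minimal sketch, assuming the standard genetic code and a single in-frame codon; the function name `classify_substitution` and the category labels are chosen for illustration and are not taken from the lecture.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon, position, new_base):
    """Classify a single-base substitution at 0-based `position` within `codon`."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"   # amino acid unchanged, e.g. a wobble base mutation
    if after == "*":
        return "nonsense"     # introduces a stop codon
    if before == "*":
        return "stop-loss"    # removes an existing stop codon
    return "missense"         # a different amino acid is encoded


print(classify_substitution("GAA", 2, "G"))  # third (wobble) base changed
print(classify_substitution("GAG", 1, "T"))  # second base changed
print(classify_substitution("TAC", 2, "A"))  # creates TAA
```

This only covers the codon-level factor; location, expression and protein structure need additional information that a codon table cannot provide.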

Bioinformatics is used to find which mutations cause a disease: you find many mutations, but not all
of them are important. The aim is to select the biomarker (which distinguishes disease from no disease).

Biomarker: an observation on which to base a clinical decision/diagnosis.

We use ClinVar for human SNPs as the dataset with the gold standard. We will compare the predictions
from each method with the gold standard to create a ROC curve, to assess and compare the quality of
the predictions by the different methods.
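
The ROC construction can be sketched in plain Python. This is a sketch under assumptions, not the project's prescribed implementation: labels are 1 for a damaging variant in the gold standard and 0 for a benign one, a higher score means "predicted more damaging" (a method that scores the other way round would need its scores inverted first), and the scores and labels below are invented.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; everything scoring >= threshold is called damaging.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical predictions for five variants and their gold standard labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
points = roc_points(scores, labels)
print(points)
print(auc(points))
```

An area of 0.5 corresponds to random guessing and 1.0 to a perfect ranking, which makes the area a convenient single number for comparing methods on the same gold standard.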

SNV – single nucleotide variant: a position in the DNA where a person/cell has a different nucleotide
than the reference genome. SNVs are often heteroallelic (present in only one of the two chromosomes).
SNP – single nucleotide polymorphism: a position in the DNA where, throughout a population, a variant
to the common reference may be observed.
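
The difference between the two definitions can be made concrete with a small sketch: an SNV is found by comparing one sample with the reference, whereas an SNP is judged by how often a variant occurs across a population. The sequences are invented, positions are 1-based, and both function names are hypothetical; the sketch assumes already aligned sequences of equal length without indels.

```python
def find_snvs(reference, sample):
    """Return (1-based position, reference base, sample base) for every mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i + 1, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


def variant_frequency(reference, samples, position):
    """Fraction of samples that differ from the reference at a 1-based position."""
    ref_base = reference[position - 1]
    variants = sum(1 for sample in samples if sample[position - 1] != ref_base)
    return variants / len(samples)


reference = "ACGTACGT"
snvs = find_snvs(reference, "ACGAACCT")
print(snvs)

# Hypothetical population of four samples.
population = ["ACGAACGT", "ACGAACGT", "ACGTACGT", "ACGTACCT"]
print(variant_frequency(reference, population, 4))
print(variant_frequency(reference, population, 7))
```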

To make predictions, we look at conserved regions: we look into the past to see whether previous
changes at a certain position were allowed. Impact prediction can reduce the number of SNVs to
consider in biomarker selection.
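
A simple way to quantify how conserved a position is: take a multiple sequence alignment and compute, per column, the fraction of sequences that share the most common residue. This is a rough heuristic for illustration only, not the scoring used by SIFT or PolyPhen, and the alignment below is made up.

```python
from collections import Counter


def column_conservation(alignment):
    """Per column, return the fraction of sequences carrying the most common residue."""
    scores = []
    for column in zip(*alignment):  # iterate over alignment columns
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores


# Hypothetical alignment of four homologous protein fragments.
alignment = ["MKVLA", "MKILA", "MRVLA", "MKVLG"]
conservation = column_conservation(alignment)
print(conservation)
```

Columns with a score of 1.0 never changed in this sample of sequences, so a mutation there is more suspicious than one in a column where several residues were tolerated.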

SNPs occur almost once every 1000 nucleotides, so 4 to 5 million SNPs are present in one genome.

Reading
Central feature of life: ability to reproduce itself.

Three components of an evolutionary process: inheritance, the passing of characteristics from
parents to offspring; variation, the processes that make offspring other than exact copies of their
parents; and selection, the process that differentially favours the reproduction of some organisms,
and hence their characteristics, over others. Evolution is a cumulative process.
Life: the result of an evolutionary process taking place on Earth.

Reproductive fitness: a measure of how many surviving offspring an organism can produce.

All life is divided into four groups: viruses, archaea, bacteria and eucarya.
Vertebrates (animals with backbones: fish, reptiles, amphibians, birds, mammals) are 3% of species.

Viruses: small amount of genetic material surrounded by a protein coat.

Sperm and eggs are germ cells (formed through meiosis); all other kinds of cells in the body are somatic.
Differentiated cells cannot reproduce an animal (except for the reproductive cells).

Membranes: boundaries between the cell and the outside world. All cells have a phospholipid (lipids
with a phosphate group attached) cell membrane. The phosphate group is hydrophilic and the lipid is
hydrophobic. The membrane contains all sorts of signal transduction mechanisms.

Proteins: molecules that accomplish most of the functions of the living cell. They can for example
function as enzyme (which catalyses chemical reactions).
Proteins are built up out of 20 naturally occurring amino acids, with as many as 4500 per protein.
Prosthetic groups: groups of atoms that some proteins need to bind in order to function.
