
Summary

Fundamentals of Bioinformatics - Summary


Summary of the Fundamentals of Bioinformatics course at the VU and UvA. It covers all lecture material, lecture slides and compulsory reading, and also includes notes from the Python tutorials (for the conversion class). It has the same lecture material as for the minor course Fundamentals of Bioinforma...


  • 13 October 2020
  • 34 pages
  • 2020/2021
  • Summary
Seller: lenie22





FUNDAMENTALS OF BIOINFORMATICS

TABLE OF CONTENTS
LECTURES

Lecture 1 Fundamentals of Bioinformatics – Introduction
Lecture 2 Evolution
Lecture 3 Introduction to Machine Learning
Lecture 4 Protein structures & sequence profiles
Lecture 5 Impact prediction: SIFT and PolyPhen-2
Lecture 6 DNA Sequencing & Computational Analysis of Massively Parallel Sequencing (MPS) Data
Lecture 7 Title
Lecture 8 Big Data in Life Sciences

TUTORIALS

Tutorial 1
Tutorial 2 First steps in Python
Tutorial 3 Lists and loops
Tutorial 4 Functions
Tutorial 5 File I/O and dictionaries





Lectures
Lecture 1 Fundamentals of Bioinformatics – Introduction
Chapter 1

Keywords of bioinformatics: programming, knowledge of molecular biology, machine learning,
translating the research question into methods, designing workflows, general data science skills,
genomics/genetics, algorithms and statistics. Bioinformaticians often work on workflows.

Bioinformatics: studying the informatic processes in biotic systems.

Sequencing data analysis flow: images → reads (short stretches of DNA, 200 – 500 nucleotides long) →
alignment / matching to existing genomes (computationally expensive) → significance values (SNP, …).
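The "match reads to an existing genome" step can be illustrated with a deliberately naive sketch. It only finds exact matches by scanning the genome string, which is an assumption made for brevity: real aligners tolerate mismatches and use indexes, which is exactly why this step is computationally expensive. The function name `map_reads` is hypothetical.

```python
def map_reads(genome, reads):
    """Return {read: [start positions]} for exact matches of each read.

    Toy illustration only: no mismatches, no reverse complement, no index.
    """
    hits = {}
    for read in reads:
        positions = []
        start = genome.find(read)
        while start != -1:
            positions.append(start)
            # Continue searching one position further to catch overlaps.
            start = genome.find(read, start + 1)
        hits[read] = positions
    return hits


genome = "ACGTACGTTT"
print(map_reads(genome, ["ACGT", "TTT", "GGG"]))
```

A read that maps to several positions (such as `ACGT` here) is ambiguous, and a read that maps nowhere may contain a variant or a sequencing error.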

Small devices are available for laptops; however, the computation remains heavy and is sent to a server.

We measure molecules / variants in the cells. We can measure proteins, DNA and RNA: these measurements
are called omics. In essence, you get a profile of the molecular state of a sample.

The dimensionality is often challenging: e.g. 10,000 features per sample for only 150 people.

Applications are found in: biomedicine, pharmacy, ecogenomics, plant breeding.

Data sources – Molecular profiling:
- Genome (genomics): point mutations / small variants (exome sequencing, DNAseq),
structural variants (arrayCGH, DNAseq), methylation.
- RNA transcription/expression (transcriptomics): RNAseq, micro arrays, single cell RNAseq.
- Proteins (proteomics): immunohistochemistry, mass spectrometry.
- Metabolites – small molecules (metabolomics): NMR, mass spectrometry.
- Microbes – bacteria/virus/parasite: DNAseq (metagenomics), RNAseq (metatranscriptomics).

Genomics → transcriptomics → proteomics → metabolomics.

40% group project, 30% individual assignment, 30% conversion class.

Project: Benchmark impact prediction methods.
Impact prediction methods: try to predict what the effect of a mutation is. PolyPhen and SIFT are
existing methods and will be used.

We need an annotated set in which we know the effect of each mutation. We use the
ClinVar dataset. Use the predictions from your method + the gold standard annotations & create an ROC curve.

Which factors determine whether a mutation has an effect on the organism or not (brainstorm, so items overlap):
- Kind of mutation: point mutation, indel, deletion/duplication, frameshift, nonsense mutation
(stop codon), mutation in a start or stop codon, introduction of extra stop/start codons.
- Effect on the amino acid: a mutation that doesn't change the amino acid (e.g. a wobble base mutation,
where nothing might happen), a change to another amino acid, a change to an amino acid from another
group. To have an effect, the mutation should affect the amino acid, preferably in the first two bases
of the codon.
- Location: in a gene or not, exon or intron (nothing might happen if it is in an intron), non-coding
DNA (a mutation in an NCR may have no effect), promoter region (y/n; a mutation can affect a promoter
heavily), regulation signal, splicing signal, IGR, NCR, RNA, repeat regions, pseudogenes. Introns and
exons might change.
- Protein: structure of the protein (e.g. a mutation at a position that decides the structure), a
mutation that is not in the active part of the protein, mutations that alter the eventual protein in
different ways.
- Expression: a mutation in a gene that is not expressed in a certain cell type, regulatory sequences
that change and lead to over expression or under expression, epigenetic modification.
- Cellular context: DNA repair / repair mechanisms of cells, environmental changes (spontaneous
mutations), major signalling pathway (y/n) or other system (e.g. DNA repair), redundancy (gene copies
available?), number of mutations already present, ability to kill the cell.
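Several of these factors (wobble base, change of amino acid, introduction of a stop codon) can be checked mechanically for a single-base change within one codon. The following is a minimal sketch, assuming the standard genetic code and DNA codons. The function name `classify_point_mutation` and the category labels are illustrative choices, not taken from the lectures.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering ("*" = stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_point_mutation(codon, position, new_base):
    """Classify a single-base substitution at index 0, 1 or 2 of a codon."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"   # amino acid unchanged, e.g. a wobble base change
    if after == "*":
        return "nonsense"     # a new stop codon is introduced
    if before == "*":
        return "stop-lost"    # the original stop codon is removed
    return "missense"         # a different amino acid is encoded


print(classify_point_mutation("GGT", 2, "C"))  # third-base change in a glycine codon
print(classify_point_mutation("TGG", 2, "A"))  # tryptophan codon becomes TGA
print(classify_point_mutation("GAA", 1, "T"))  # glutamate codon becomes GTA
```

This covers only the coding-sequence factors. Location, expression and cellular context need other data sources.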

Bioinformatics is used to find which mutations cause a disease: you find many mutations, but not all
of them are important. The aim is to select the biomarker (to tell disease apart from no disease).

Biomarker: an observation on which to base a clinical decision/diagnosis.

We use ClinVar for human SNPs as the dataset with the gold standard. We will compare the predictions from
each method with the gold standard to create an ROC curve, to assess and compare the quality of the
predictions by the different methods.
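How an ROC curve follows from predictions plus gold standard labels can be sketched in plain Python. This is a minimal illustration under stated assumptions: each variant has one numeric score where higher means "more likely damaging", the gold standard is a boolean per variant, and both classes occur at least once. The function names and the toy data are hypothetical. For SIFT, where low scores indicate a damaging variant, the scores would have to be inverted first.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    scores: predicted impact, higher = more damaging (assumption).
    labels: gold standard, True = damaging, False = benign.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; everything scoring >= t is called damaging.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and not l)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data, not real ClinVar annotations.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [True, True, False, False]
curve = roc_points(scores, labels)
print(curve, auc(curve))
```

Comparing methods then comes down to plotting their curves in one figure or comparing the areas under the curves.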

SNV – single nucleotide variant: a position in the DNA where a person/cell has a different nucleotide
than the reference genome. SNVs are often heterozygous (present in only one of the two chromosome copies).
SNP – single nucleotide polymorphism: a position in the DNA where variants relative to the common
reference are observed throughout a population.
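The SNV definition translates directly into a position-by-position comparison. A minimal sketch, assuming the sample sequence is already aligned to the reference and has the same length (no indels). The name `find_snvs` is hypothetical.

```python
def find_snvs(reference, sample):
    """Return (position, reference base, sample base) for every mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt
    ]


print(find_snvs("ACGTAC", "ACCTAG"))
```

Running this for many individuals and counting how often each position varies would separate rare SNVs from SNPs that recur throughout the population.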

To make predictions, we look at conserved regions: we look into the past to see whether previous changes
at a certain position were tolerated. Impact prediction can reduce the number of SNVs to consider in
biomarker selection.
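The idea of "looking into the past" can be sketched with a toy multiple sequence alignment: if a residue has already been observed at a position in related sequences, a change to that residue was apparently tolerated. This is only an illustration of the principle and not how SIFT or PolyPhen-2 actually score variants. The function names and the alignment are made up.

```python
def observed_residues(alignment):
    """For each alignment column, collect the residues seen there (gaps ignored)."""
    return [set(column) - {"-"} for column in zip(*alignment)]


def seen_before(alignment, position, residue):
    """True if `residue` already occurs at `position` in the aligned sequences."""
    return residue in observed_residues(alignment)[position]


# Toy alignment of three related protein fragments (hypothetical data).
alignment = ["MKV", "MRV", "MKI"]
print(observed_residues(alignment))
print(seen_before(alignment, 1, "R"))  # K/R both occur at position 1
print(seen_before(alignment, 0, "L"))  # position 0 is fully conserved as M
```

A fully conserved column suggests that changes there were not tolerated, so a new variant at that position is a stronger biomarker candidate.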

SNPs occur almost once every 1000 nucleotides, so 4 to 5 million SNPs are present in one genome.

Reading
Central feature of life: ability to reproduce itself.

Three components of an evolutionary process: inheritance, the passing of characteristics from
parents to offspring; variation, the processes that make offspring other than exact copies of their
parents; and selection, the process that differentially favours the reproduction of some organisms,
and hence their characteristics, over others. Evolution is a cumulative process.
Life: the result of evolutionary process taking place on earth.

Reproductive fitness: a measure of how many surviving offspring an organism can produce.

All life is divided into four groups: viruses, archaea, bacteria and eucarya.
Vertebrates (animals with backbones (fish, reptiles, amphibians, birds, mammals)) are 3% of species.

Viruses: small amount of genetic material surrounded by a protein coat.

Sperm and eggs are germ cells (formed by meiosis); all other kinds of cells in the body are somatic.
Differentiated cells cannot reproduce an animal (except reproductive cells).

Membranes: boundaries between the cell and the outside world. All cells have a phospholipid (lipids
with a phosphate group attached) cell membrane. The phosphate is hydrophilic and the lipid is
hydrophobic. The membrane contains all sorts of signal transduction mechanisms.

Proteins: molecules that accomplish most of the functions of the living cell. They can for example
function as enzymes (which catalyse chemical reactions).
Proteins are built up out of 20 naturally occurring amino acids, often as many as 4500 per protein.
Prosthetic groups: groups of atoms to which some proteins bind to function.
