Good summary, also usable for the exam. Includes worked answers and explanations of the Jupyter notebook questions and code. The lessons are written out as well, with useful links for extra information added.
Bio-informatica: Sequence
Contents
Lesson 1: Intro to NGS, genome variation, genomic medicine
Applications of genomic variations
(NGS) methods to determine genetic variation
Lesson 2: Evaluating and processing raw sequence data
NGS analysis pipeline
Generally useful Jupyter commands
Case studies
Lesson 3: Variant calling and annotation
Variant calling
Jupyter case study (1) - variant calling
Variant annotation
Jupyter case study (1) - annotation
Lesson 4: Non-coding genetic variants
Enformer (Python language)
Lesson 5/6: Variant interpretation & personal genomics
Lesson 7: Copy number variation
Lesson 8+9: Complex structural variation
Lesson 10: Single Cell CNA calling
R (similar to the Python language)
Questions in the Jupyter notebook
Lesson 11: Guest speakers
Lesson 1: Intro to NGS, genome variation, genomic medicine
Genomic variation is related to disease. There are different types of variation:
- SNPs (single-nucleotide polymorphisms): "DNA spelling mistakes", a change of one nucleotide
- INDELs: "extra or missing DNA", a few nucleotides inserted or deleted
- SVs (structural variants): large blocks of extra, missing or rearranged DNA
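The three variant types above can be told apart, to a first approximation, by comparing the lengths of the reference and alternate alleles. The sketch below is only an illustration: the function name and the 50 bp cutoff between INDELs and SVs are assumptions (50 bp is a common convention, not something stated in these notes), and rearrangements that do not change the length are not detected this way.

```python
def classify_variant(ref: str, alt: str, sv_min_length: int = 50) -> str:
    """Classify a variant from its reference and alternate alleles.

    sv_min_length is an assumed cutoff between INDELs and SVs.
    """
    size_change = abs(len(ref) - len(alt))
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNP"  # one nucleotide replaced by another
    if size_change >= sv_min_length:
        return "SV"  # large block inserted or deleted
    if size_change > 0:
        return "INDEL"  # a few nucleotides inserted or deleted
    return "other"  # identical alleles or same-length substitutions


# Hypothetical example alleles (ref, alt).
examples = [("A", "G"), ("AT", "A"), ("A", "A" + "C" * 60)]
labels = [classify_variant(ref, alt) for ref, alt in examples]
print(labels)
```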
Applications of genomic variations
Health conditions:
1. Non-invasive prenatal test (NIPT)
2. Mendelian disorders
a. Trio-based sequencing: unaffected parents and an affected offspring
b. Examples: SMA, BRCA1
3. Complex diseases: polygenic risk
a. No single gene is responsible: the risk is polygenic
b. Many traits are polygenic. Genome-Wide Association Study (GWAS): associates the absence/presence of SNPs with disease status in cases (with disease) and controls (without disease)
i. Every tested SNP gets a p-value for its association with the disease
c. Gene prioritisation is also possible: if a SNP is present, is the gene expression higher? Try to attribute a SNP to the closest gene.
d. Another way is to quantify genetic risk as a diagnostic tool
e. Alzheimer's disease
i. In the meta-analysis of Alzheimer's disease, everything above the red line is significant
4. Cancer genomics:
a. Somatic mutations: very different genetic profiles
b. Far more so than in the other areas discussed above, driver genes and mutations in cancer provide clear molecular targets for therapeutic agents, hence the broad application
c. Non-small cell lung cancers with activating somatic mutations in the EGFR kinase: treated with the EGFR kinase inhibitor gefitinib
d. TCGA and PCAWG: broad surveys
i. About half of the common tumours contain one or more clinically relevant
mutations, predicting sensitivity or resistance to specific agents or suggesting
clinical trial eligibility
e. Tumours shed DNA into the blood: circulating tumour DNA (ctDNA), the basis of liquid biopsies
f. Evolution graphs of mutations show where the problems are: personalised medicine
Traits: Genomic variation also leads to different traits such as height, eye colour, etc.
Ancestry: Genetic variants are the "bread crumbs" for tracking evolution
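The case/control association test behind a GWAS can be sketched for a single SNP as a test on a 2x2 table of counts. The example below uses a two-sided Fisher exact test written with the standard library only; real GWAS tools typically use regression models and other tests, so this is an illustration rather than the method from the lectures. The counts are made up, and the threshold of 5e-8 is the conventional genome-wide significance level, assumed here to correspond to the "red line".

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows: cases / controls. Columns: SNP present / SNP absent.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_probability(x: int) -> float:
        # Hypergeometric probability of seeing x carriers among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables that are no more likely than the observed one.
    p_value = sum(
        p
        for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts: 30 of 100 cases and 10 of 100 controls carry the SNP.
p_value = fisher_exact_two_sided(30, 70, 10, 90)
GENOME_WIDE_THRESHOLD = 5e-8  # assumed conventional cutoff
print(p_value, p_value < GENOME_WIDE_THRESHOLD)
```

Because millions of SNPs are tested, a p-value that looks small for a single test can still fall short of the genome-wide threshold.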
(NGS) methods to determine genetic variation
Restriction fragment length polymorphism (RFLP): restriction enzymes cut DNA, yielding fragments of different sizes. Mutations may disrupt this pattern, and the changed pattern can be linked to disease.
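How a single mutation changes the fragment pattern can be shown with a toy digest. The sketch below assumes the enzyme EcoRI, which recognises GAATTC and cuts after the first G; the sequences and the function name are hypothetical.

```python
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Return the fragment lengths after cutting at every occurrence of site.

    The defaults model EcoRI (recognition site GAATTC, cut after the first G).
    """
    fragments = []
    start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(cut - start)
        start = cut
        position = sequence.find(site, position + 1)
    fragments.append(len(sequence) - start)  # remainder after the last cut
    return fragments


# Hypothetical toy sequences: the variant has one base changed in the second site.
reference = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TTTT"
variant = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAGTTC" + "TTTT"

reference_fragments = digest(reference)
variant_fragments = digest(variant)
print(reference_fragments, variant_fragments)
```

The variant loses one restriction site, so two of the reference fragments merge into one longer fragment.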
Arrays and NGS have resulted in an explosion of genomic testing. Two key technologies:
1. High-density DNA microarrays genotype millions of specific positions in each of many human genomes. Coupled with population-based maps of linkage disequilibrium (LD), array-based genotyping enables the ascertainment of most of the common genetic variation in a human genome at low cost.
2. Massively parallel DNA sequencing technologies can generate billions of short sequencing reads within a day or less. Next generation sequencing (NGS) now permits the near-comprehensive ascertainment of both rare and common genetic variation.
Most technologies deliver the DNA sequencing information in FASTQ format. Demultiplexing the reads generates two FASTQ files for each sample (forward and reverse reads).
Figure: Different types of genome alterations that can be detected by NGS.
Figure: Types of point mutations in protein-coding genes.
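The figure itself is not available here, so the categories below are an assumption: the usual split of point mutations in coding sequence into synonymous (silent), missense and nonsense changes, plus stop-loss. The sketch builds the standard genetic code table and classifies a single-codon change; all names are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_point_mutation(ref_codon: str, alt_codon: str) -> str:
    """Classify the effect of changing ref_codon into alt_codon."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"  # same amino acid, protein unchanged
    if alt_aa == "*":
        return "nonsense"  # premature stop codon
    if ref_aa == "*":
        return "stop-loss"  # stop codon replaced by an amino acid
    return "missense"  # different amino acid


effects = [
    classify_point_mutation("GAA", "GAG"),  # Glu -> Glu
    classify_point_mutation("GAA", "GTA"),  # Glu -> Val
    classify_point_mutation("GAA", "TAA"),  # Glu -> stop
]
print(effects)
```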
Mutations in regulatory regions are harder to interpret. Machine learning approaches can help to understand the effect of such genetic variation.
Lesson 2: Evaluating and processing raw sequence data
NGS analysis pipeline
Three main formats:
1. Raw reads (FASTQ)
2. Alignment file (SAM/BAM)
3. Variant calls (VCF)
Raw reads
Start with sequencing (FASTQ), e.g. Illumina sequencing by synthesis. Each read takes four lines:
1. First line is the identifier; it starts with @
2. Second line is the sequence
3. Third line is the separator, +
4. Fourth line is the quality string: how certain each base of the sequence is
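The four-line layout above can be checked with a small parser. This is a minimal sketch for a single record held in memory; the record contents and the names `FastqRecord` and `parse_fastq_record` are made up for illustration, and real files are usually read with a dedicated library.

```python
from typing import NamedTuple


class FastqRecord(NamedTuple):
    identifier: str
    sequence: str
    quality: str


def parse_fastq_record(lines: list[str]) -> FastqRecord:
    """Parse exactly four lines of a FASTQ file into one record."""
    header, sequence, separator, quality = (line.rstrip("\n") for line in lines)
    if not header.startswith("@"):
        raise ValueError("identifier line must start with '@'")
    if not separator.startswith("+"):
        raise ValueError("third line must start with '+'")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    return FastqRecord(header[1:], sequence, quality)


# Hypothetical read: one quality symbol per base.
example = ["@read_1\n", "GATTACA\n", "+\n", "IIIIHH#\n"]
record = parse_fastq_record(example)
print(record)
```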
Phred scores are quality scores that express how certain it is that a base was recorded correctly (typically 0-40). Everything >28 is good.
The scores are encoded: every symbol/letter stands for a number:
https://en.wikipedia.org/wiki/Phred_quality_score
The Illumina encoding is the one mostly used nowadays.
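A Phred score Q relates to the probability P that a base call is wrong as Q = -10 * log10(P), so Q = 20 means a 1 in 100 chance of error. The sketch below decodes a quality string into scores, assuming the Phred+33 encoding (ASCII code minus 33) used by Sanger and by Illumina 1.8 and later; older Illumina versions used a different offset, so the offset is a parameter. The function names are illustrative.

```python
def decode_phred(quality: str, offset: int = 33) -> list[int]:
    """Convert a FASTQ quality string into Phred scores (assumes Phred+33 by default)."""
    return [ord(symbol) - offset for symbol in quality]


def error_probability(score: int) -> float:
    """Probability that a base with this Phred score was called incorrectly."""
    return 10 ** (-score / 10)


scores = decode_phred("II5#")
good_bases = [score > 28 for score in scores]  # the ">28 is good" rule of thumb
print(scores, good_bases)
print(error_probability(28))
```

At the cutoff of 28 the error probability is roughly 0.0016, or about one wrong base in 630.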