Good summary, also usable for the exam. Includes worked solutions and explanations of the Jupyter notebook questions and code. The lessons are also written out, with useful links for extra information.
Bio-informatica: Sequence
Contents
Lesson 1: Intro to NGS, genome variation, genomic medicine
Applications of genomic variations
(NGS) methods to determine genetic variation
Lesson 2: Evaluating and processing raw sequence data
NGS analysis pipeline
General useful commands Jupyter
Case studies
Lesson 3: Variant calling and annotation
Variant calling
Jupyter case study (1): variant calling
Variant annotation
Jupyter case study (1): annotation
Lesson 4: Non-coding genetic variants
Enformer (Python)
Lesson 5/6: Variant interpretation & personal genomics
Lesson 7: Copy number variation
Lesson 8+9: Complex structural variation
Lesson 10: Single Cell CNA calling
R (similar to Python)
Questions in the Jupyter notebook
Lesson 11: Guest speakers
Lesson 1: Intro to NGS, genome variation, genomic medicine
Genomic variation is related to disease. There are different types of variation:
- SNPs: "DNA spelling mistakes", a change of one nucleotide
- INDELs: "extra or missing DNA", a few nucleotides inserted or deleted
- SVs: large blocks of extra, missing or rearranged DNA
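The three variant types above can be told apart by comparing the reference allele with the alternative allele. The sketch below is only an illustration: the function name `classify_variant` is made up for this example, and the cut-off of 50 bases between an INDEL and an SV is an assumed, commonly used convention that the notes themselves do not state.

```python
def classify_variant(ref, alt, sv_min_length=50):
    """Classify a variant from its reference and alternative allele.

    sv_min_length is an assumed threshold (50 bases), not taken from the notes.
    """
    if ref == alt:
        return "no variant"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"  # one nucleotide changed
    length_change = abs(len(ref) - len(alt))
    if length_change == 0:
        return "other"  # same-length substitution of several bases
    if length_change >= sv_min_length:
        return "SV"  # large block inserted or deleted
    return "INDEL"  # a few nucleotides inserted or deleted


print(classify_variant("A", "G"))               # SNP
print(classify_variant("A", "ATG"))             # INDEL
print(classify_variant("A", "A" + "C" * 60))    # SV
```

Rearrangements such as inversions or translocations are also SVs, but they cannot be recognised from allele lengths alone, so this sketch does not cover them.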
Applications of genomic variations
Health conditions:
1. Non-invasive prenatal test (NIPT)
2. Mendelian disorders
a. Trio-based sequencing: unaffected parents and an affected offspring
b. Examples: SMA, BRCA1
3. Complex diseases: polygenic risk
a. No single gene is responsible (polygenic risk)
b. Many traits are polygenic. Genome-Wide Association Study (GWAS): associate the absence/presence of SNPs in cases (with disease) and controls (without disease)
i. Every SNP tested gets a p-value for its association with the disease
c. Gene prioritisation is also possible: if a SNP is present, is the gene expression higher? Try to attribute a SNP to the closest gene.
d. Another use is to quantify genetic risk as a diagnostic tool
e. Example: Alzheimer's disease
i. Figure (meta-analysis of Alzheimer's): everything above the red line is significant
4. Cancer genomics:
a. Somatic mutations: tumours have very different genetic profiles
b. Far more so than in the other areas discussed above, driver genes and mutations in cancer provide clear molecular targets for therapeutic agents, so the application is broad
c. Example: non-small cell lung cancers with activating somatic mutations in the EGFR kinase are treated with the EGFR kinase inhibitor gefitinib
d. TCGA and PCAWG: broad surveys
i. About half of the common tumours contain one or more clinically relevant mutations, predicting sensitivity or resistance to specific agents or suggesting clinical trial eligibility
e. Tumours shed DNA into the blood: circulating tumour DNA (ctDNA), which enables liquid biopsies
f. Evolution graphs of mutations show where the problems are: personalised medicine
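The GWAS idea from point 3 (compare how often a SNP allele occurs in cases and in controls, and get a p-value per SNP) can be sketched with a chi-square test on a 2x2 table of allele counts. This is a simplified illustration, not the method used in the lectures: the function name and the allele counts are invented, all row and column totals are assumed to be non-zero, and the threshold of 5e-8 is the commonly used genome-wide significance level, assumed here to be what a significance line in a GWAS plot represents.

```python
import math

# Assumed convention for genome-wide significance (not stated in the notes).
GENOME_WIDE_THRESHOLD = 5e-8


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Chi-square test with 1 degree of freedom on a 2x2 allele-count table."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_totals = [case_alt + case_ref, control_alt + control_ref]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts for one SNP: 30% alt allele in cases, 20% in controls.
chi2, p_value = allelic_chi_square(300, 700, 200, 800)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")
print("genome-wide significant:", p_value < GENOME_WIDE_THRESHOLD)
```

A real GWAS repeats such a test for millions of SNPs, which is why the significance threshold is so strict; real analyses also correct for covariates such as ancestry, which this sketch ignores.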
Traits: Genomic variation also leads to different traits, such as height and eye colour.
Ancestry: Genetic variants are the "bread crumbs" for tracking evolution
(NGS) methods to determine genetic variation
Restriction fragment length polymorphism (RFLP): restriction enzymes cut DNA, yielding fragments of different sizes. A mutation may disrupt this fragment pattern, and such a change can be linked to disease.
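The RFLP principle can be illustrated with a small in-silico digest. The sketch below uses EcoRI, which recognises GAATTC and cuts after the first G. The two sequences are invented for this example, and the helper `digest` is a hypothetical name; it only handles a linear sequence and an exact recognition site.

```python
def digest(sequence, site, cut_offset):
    """Cut a linear sequence at every occurrence of site; return fragment lengths."""
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    boundaries = [0] + cut_positions + [len(sequence)]
    return [end - begin for begin, end in zip(boundaries, boundaries[1:])]


ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # EcoRI cuts between G and AATTC

# Made-up reference sequence with two EcoRI sites.
reference = "ATGCGT" + "GAATTC" + "TTAGCCGGA" + "GAATTC" + "CCGTA"
# Same sequence with one SNP (A to G) that destroys the first site.
mutant = reference[:8] + "G" + reference[9:]

print(digest(reference, ECORI_SITE, ECORI_CUT_OFFSET))  # [7, 15, 10]
print(digest(mutant, ECORI_SITE, ECORI_CUT_OFFSET))     # [22, 10]
```

The single nucleotide change removes one cut site, so two fragments merge into one larger fragment. On a gel this shows up as a different band pattern.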
Arrays and NGS have resulted in an explosion of genomic testing. The two key technologies:
1. High-density DNA microarrays genotype millions of specific positions in each of many human genomes. Coupled with population-based maps of linkage disequilibrium (LD), array-based genotyping enables the ascertainment of most common genetic variation in a human genome at low cost.
2. Massively parallel DNA sequencing technologies can generate billions of short sequencing reads within a day or less. This next generation sequencing (NGS) now permits the near-comprehensive ascertainment of both rare and common genetic variation.
Most technologies store the DNA sequencing information in FASTQ format. Demultiplexing the reads generates two FASTQ files for each sample (forward and reverse reads).
Figure: Different types of genome alterations that can be detected by NGS.
Figure: Types of point mutations in protein-coding genes.
Mutations in regulatory regions are harder to interpret. Machine learning approaches can help us understand such genetic variation.
Lesson 2: Evaluating and processing raw sequence data
NGS analysis pipeline
Three main formats:
1. Raw reads (FASTQ)
2. Alignment file (SAM/BAM)
3. Variant calls (VCF)
Raw reads
Start with sequencing, e.g. Illumina (sequencing by synthesis). The output is a FASTQ file with four lines per read:
1. First line is the identifier, which starts with @
2. Second line is the sequence
3. Third line is the separator, which starts with +
4. Fourth line is the quality string: how good/certain each base in the sequence is
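The four-line layout described above can be read with a few lines of Python. This is a minimal sketch for illustration: the record type, the function name `parse_fastq` and the example read are invented, and it assumes each record uses exactly four lines (no wrapped sequences).

```python
from typing import NamedTuple


class FastqRecord(NamedTuple):
    identifier: str
    sequence: str
    quality: str


def parse_fastq(lines):
    """Yield one FastqRecord per group of four non-empty lines."""
    cleaned = [line.strip() for line in lines if line.strip()]
    if len(cleaned) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per read")
    for i in range(0, len(cleaned), 4):
        header, sequence, separator, quality = cleaned[i:i + 4]
        if not header.startswith("@"):
            raise ValueError(f"identifier line must start with @: {header}")
        if not separator.startswith("+"):
            raise ValueError(f"separator line must start with +: {separator}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality string differ in length")
        yield FastqRecord(header[1:], sequence, quality)


# Hypothetical read, only for illustration.
example = """@read_1
GATTACA
+
IIIIHH#
"""

for record in parse_fastq(example.splitlines()):
    print(record.identifier, record.sequence, record.quality)
```

For real data it is usually easier to rely on an existing parser; this sketch only makes the structure of a record explicit, including the rule that the quality string has one symbol per base.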
Phred scores are quality scores that express how certain it is that a base was recorded correctly (range 0-40). Everything above 28 is good.
The scores are encoded: every symbol/letter represents a number, see:
https://en.wikipedia.org/wiki/Phred_quality_score
The Illumina encoding is the one mostly used nowadays.
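Decoding the quality string can be sketched as follows. The example assumes the Phred+33 encoding (ASCII code minus 33), which current Illumina software uses; older Illumina files used an offset of 64, so the offset is left as a parameter. The relation between score and error probability is the standard definition Q = -10 * log10(P). The function names are made up for this illustration.

```python
def decode_phred(quality, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores."""
    return [ord(symbol) - offset for symbol in quality]


def error_probability(score):
    """Probability that the base call is wrong, from Q = -10 * log10(P)."""
    return 10 ** (-score / 10)


scores = decode_phred("I#5=")
print(scores)  # [40, 2, 20, 28]
for score in scores:
    print(score, error_probability(score))
```

A score of 20 corresponds to a 1 in 100 chance of a wrong base call, and a score of 40 to a 1 in 10,000 chance.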