A good summary that can also be used before the exam. It includes elaborations and explanations of the Jupyter notebooks, questions and code. The lessons have also been written out, and useful links for additional information have been added.
Bioinformatics: Sequence
Contents
Lesson 1: Intro to NGS, genome variation, genomic medicine
Applications of genomic variations
(NGS) methods to determine genetic variation
Lesson 2: Evaluating and processing raw sequence data
NGS analysis pipeline
General useful Jupyter commands
Case studies
Lesson 3: Variant calling and annotation
Variant calling
Jupyter case study (1) - variant calling
Variant annotation
Jupyter case study (1) - annotation
Lesson 4: Non-coding genetic variants
Enformer (Python language)
Lesson 5/6: Variant interpretation & personal genomics
Lesson 7: Copy number variation
Lesson 8+9: Complex structural variation
Lesson 10: Single Cell CNA calling
R (similar to Python language)
Questions in the Jupyter notebook
Lesson 11: Guest speakers
Lesson 1: Intro to NGS, genome variation, genomic medicine
Genomic variation is related to disease. There are different types of variation:
- SNPs: “DNA spelling mistakes”, a single-nucleotide change
- INDELs: “extra or missing DNA”, a few nucleotides inserted or deleted
- SVs: large blocks of extra, missing or rearranged DNA
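As a rough illustration, the three categories above can be told apart from the reference and alternative alleles of a variant. This is only a sketch: the function name `classify_variant` and the 50-nucleotide cutoff between INDELs and SVs are assumptions (a commonly used convention, not something stated in these notes), and rearrangements such as inversions or translocations cannot be recognised from allele lengths alone.

```python
def classify_variant(ref: str, alt: str, sv_min_length: int = 50) -> str:
    """Classify a variant by comparing the reference and alternative allele.

    sv_min_length is an assumed cutoff: size changes of at least this many
    nucleotides are labelled as structural variants (SV).
    """
    if ref == alt:
        return "no variant"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"  # one nucleotide replaced by another
    size_difference = abs(len(ref) - len(alt))
    if size_difference >= sv_min_length:
        return "SV"  # large block of extra or missing DNA
    if size_difference > 0:
        return "INDEL"  # a few nucleotides inserted or deleted
    return "other"  # same length, several bases changed


if __name__ == "__main__":
    examples = [("A", "G"), ("A", "ATG"), ("ATG", "A"), ("A", "A" * 60)]
    for ref, alt in examples:
        shown_alt = alt if len(alt) <= 10 else alt[:10] + "..."
        print(ref, shown_alt, classify_variant(ref, alt))
```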
Applications of genomic variations
Health conditions:
1. Non-invasive prenatal test (NIPT)
2. Mendelian disorders
   a. Trio-based sequencing: unaffected parents and an affected offspring
   b. Examples: SMA, BRCA1
3. Complex diseases: polygenic risk
   a. No single gene is responsible = polygenic risk
   b. Many traits are polygenic → Genome-Wide Association Study (GWAS): associate the absence/presence of SNPs in cases (with disease) and controls (without disease)
      i. Every SNP tested gets a P-value for its association with the disease
   c. Gene prioritisation is also possible: if a SNP is present, is the gene expression higher? Try to attribute a SNP to the closest gene.
   d. Another way is to quantify genetic risk as a diagnostic tool
   e. Example: Alzheimer's disease
      i. In the plot of the meta-analysis of Alzheimer’s, everything above the red line is significant
4. Cancer genomics:
   a. Somatic mutations → very different genetic profiles
   b. Far more so than in the other areas discussed above, driver genes and mutations in cancer provide clear molecular targets for therapeutic agents → broad application
   c. Non-small cell lung cancers with activating somatic mutations in the EGFR kinase → EGFR kinase inhibitor gefitinib
   d. TCGA and PCAWG: broad surveys
      i. About half of the common tumours contain one or more clinically relevant mutations, predicting sensitivity or resistance to specific agents or suggesting clinical trial eligibility
   e. Tumours shed DNA into the blood → circulating tumour DNA (ctDNA) → liquid biopsies
   f. Evolution graphs of mutations to see where the problems are → personalised medicine
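The case/control association test described under complex diseases (one P-value per SNP, compared against a significance line) can be sketched as follows. This is a minimal illustration, not the method used in the course: it assumes a simple Pearson chi-square test on a 2x2 table of carriers versus non-carriers, the SNP names and counts are made up, and the threshold of 5e-8 is the conventional genome-wide significance level, which the notes do not state explicitly.

```python
import math

# Conventional genome-wide significance threshold (assumption, not from the notes).
GENOME_WIDE_SIGNIFICANCE = 5e-8


def chi_square_2x2(case_with, case_without, control_with, control_without):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 carrier table.

    Returns (chi-square statistic, P-value).
    """
    a, b, c, d = case_with, case_without, control_with, control_without
    total = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0, 1.0  # a row or column is empty: nothing to test
    chi2 = total * (a * d - b * c) ** 2 / denominator
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: (cases with SNP, cases without, controls with, controls without)
    snp_counts = {
        "snp_example_1": (400, 600, 200, 800),
        "snp_example_2": (250, 750, 245, 755),
    }
    for name, counts in snp_counts.items():
        chi2, p_value = chi_square_2x2(*counts)
        significant = p_value < GENOME_WIDE_SIGNIFICANCE
        print(f"{name}: chi2={chi2:.2f}, P={p_value:.3g}, significant={significant}")
```

A real GWAS tests millions of SNPs, which is why the threshold is so strict, and usually relies on regression models with covariates rather than a plain 2x2 test.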
Traits: Genomic variation also leads to different traits such as height, eye colour, etc.
Ancestry: Genetic variants are the "bread crumbs" for tracking evolution
(NGS) methods to determine genetic variation
Restriction fragment length polymorphism (RFLP): restriction enzymes cut DNA, yielding fragments of different sizes. Mutations may disrupt this fragment pattern, and the changed pattern can be linked to disease.
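The idea behind RFLP can be sketched with a small in-silico digest. The example assumes the enzyme EcoRI, which recognises GAATTC and cuts after the G; the function name `digest` and both toy sequences are made up for illustration. A single base change in one recognition site removes that cut, so two fragments merge into one larger fragment.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Return the fragment lengths after cutting at every occurrence of site.

    cut_offset is the position of the cut within the recognition site.
    """
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    boundaries = [0] + cut_positions + [len(sequence)]
    return [end - begin for begin, end in zip(boundaries, boundaries[1:])]


if __name__ == "__main__":
    ecori_site, ecori_offset = "GAATTC", 1  # EcoRI cuts G^AATTC
    # Toy sequences: the mutant has one base changed in the second site.
    reference = "AAAAA" + "GAATTC" + "TTTTTTTT" + "GAATTC" + "CCC"
    mutant = "AAAAA" + "GAATTC" + "TTTTTTTT" + "GAGTTC" + "CCC"
    print("reference fragments:", digest(reference, ecori_site, ecori_offset))
    print("mutant fragments:   ", digest(mutant, ecori_site, ecori_offset))
```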
Arrays and NGS have resulted in an explosion of genomic testing → 2 key technologies:
1. High-density DNA microarrays genotype millions of specific positions in each of many human genomes. Coupled with population-based maps of linkage disequilibrium (LD), array-based genotyping enables the ascertainment of most of the common genetic variation in a human genome at low cost.
2. Massively parallel DNA sequencing technologies can generate billions of short sequencing reads within a day or less → next generation sequencing (NGS) now permits the near-comprehensive ascertainment of both rare and common genetic variation.
Most technologies deliver the DNA sequencing information in FASTQ format. Demultiplexing the reads generates 2 FASTQ files for each sample (forward and reverse reads).
Figure: Different types of genome alterations that can be detected by NGS.
Figure: Types of point mutations in protein-coding genes.
Mutations in regulatory regions are harder to interpret. With machine learning approaches we can better understand these genetic variations.
Lesson 2: Evaluating and processing raw sequence data
NGS analysis pipeline
Three main formats:
1. Raw reads (FASTQ)
2. Alignment file (SAM/BAM)
3. Variant calls (VCF)
Raw reads
Start with sequencing (FASTQ), e.g. Illumina: sequencing by synthesis. Each read takes four lines:
1. First line is the identifier, which starts with @
2. Second line is the sequence
3. Third line is the separator, which starts with +
4. Fourth line is the quality string: how good/certain each base of the sequence is
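The four-line layout above can be read with a few lines of Python. This is a minimal sketch that assumes every record takes exactly four lines (no wrapped sequences); the names `FastqRecord` and `read_fastq` and the two example reads are made up for illustration. Reading four lines at a time matters because a quality string may itself start with @.

```python
import io
from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    identifier: str
    sequence: str
    quality: str


def read_fastq(handle) -> Iterator[FastqRecord]:
    """Yield one record per four lines of an open FASTQ text handle."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file (or a trailing blank line)
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


if __name__ == "__main__":
    example_text = "@read1\nGATTACA\n+\nIIIIIII\n@read2\nACGT\n+\n!!5I\n"
    for record in read_fastq(io.StringIO(example_text)):
        print(record.identifier, record.sequence, record.quality)
```

For a real file, the same function can be called on `open("sample.fastq")`, where the file name is a placeholder.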
Phred scores are quality scores that express how certain it is that a base was recorded correctly (range 0-40). Everything >28 is good.
The scores are encoded: every symbol/letter represents a number:
https://en.wikipedia.org/wiki/Phred_quality_score
Illumina encoding is mostly used nowadays.
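Decoding the quality string can be sketched as follows. A Phred score Q corresponds to an error probability P through Q = -10 * log10(P), so Q = 20 means a 1 in 100 chance that the base is wrong. The sketch assumes the Phred+33 encoding (ASCII code minus 33, as used by Sanger and by Illumina 1.8 and later; older Illumina files used an offset of 64). The function names are made up, and the cutoff of 28 is the rule of thumb from these notes.

```python
PHRED_OFFSET = 33  # assumed encoding: Phred+33
GOOD_QUALITY = 28  # rule of thumb from the notes: everything >28 is good


def decode_phred(quality: str, offset: int = PHRED_OFFSET) -> list[int]:
    """Convert a FASTQ quality string into a list of Phred scores."""
    return [ord(symbol) - offset for symbol in quality]


def error_probability(score: int) -> float:
    """Probability that a base with this Phred score was called wrongly."""
    return 10 ** (-score / 10)


def fraction_good(quality: str, threshold: int = GOOD_QUALITY) -> float:
    """Fraction of bases in the read with a score above the threshold."""
    scores = decode_phred(quality)
    if not scores:
        return 0.0
    return sum(score > threshold for score in scores) / len(scores)


if __name__ == "__main__":
    quality_string = "!5I"  # made-up example covering low, medium and high quality
    for symbol, score in zip(quality_string, decode_phred(quality_string)):
        print(symbol, score, error_probability(score))
    print("fraction of good bases:", fraction_good(quality_string))
```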