I wrote these notes for the course Big Data in Biomedical Sciences and tried to include all the important information. Wherever you see (EXAM QUESTION), that topic actually came up in my exam. I got an 8.1, and I hope these notes help others too!
BIG DATA NOTES
GENETICS/ GWAS
Issues:
1. Relative influence of genes and environment still not resolved (need more reliable estimates).
MZ twins share 100% of their genes and 100% of the shared environment (0% of the non-shared environment).
DZ twins share on average 50% of their genes and 100% of the shared environment (0% of the non-shared environment). (EXAM QUESTION)
If a trait is 100% heritable, then rMZ = 2 x rDZ.
2. The nature of the genetic effects is also still under debate (additive vs non-additive).
3. The same holds for determining the causal mechanism (detection and interpretation).
Heritability is the proportion of trait variance attributable to genetic variance (the extent to which
observed individual differences can be traced back to genetic differences).
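The twin logic above can be turned into a rough calculation. The sketch below assumes the standard additive twin model, which these notes do not spell out: rMZ = h2 + c2 and rDZ = 0.5 x h2 + c2, which gives Falconer's formulas. The function name and the second set of correlations are made up for illustration.

```python
def falconer_estimates(r_mz, r_dz):
    """Rough variance shares from twin correlations (Falconer's formulas).

    Assumes purely additive genetic effects and equal shared environment
    for MZ and DZ pairs, so r_mz = h2 + c2 and r_dz = 0.5 * h2 + c2.
    """
    h2 = 2 * (r_mz - r_dz)  # heritability (additive genetic share)
    c2 = 2 * r_dz - r_mz    # shared-environment share
    e2 = 1 - r_mz           # non-shared environment (plus measurement error)
    return h2, c2, e2


# Fully heritable trait: r_mz = 1.0 and r_dz = 0.5, so r_mz = 2 * r_dz.
print(falconer_estimates(1.0, 0.5))

# Made-up correlations, only to show a mixed case.
print(falconer_estimates(0.8, 0.5))
```

With rMZ = 1.0 and rDZ = 0.5 all variance goes to h2, which matches the rule that a 100% heritable trait has rMZ = 2 x rDZ.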
DNA facts:
- Each cell contains 23 pairs of chromosomes, carrying the heritable blueprint of life.
- Each single chromosome is a DNA molecule.
- A DNA molecule consists of sequences of nucleotides (ACGT).
- The 23 chromosomes together contain ~3 x 10^9 nucleotides, ‘completely’ sequenced (in 2002).
- A sequence of 3 bases is a codon, and each codon serves as a recipe for an amino acid.
- Multiple codons together enclosed in transcription start and end sites are called genes and
provide blueprints for proteins.
- One chromosome consists of both non-genic (~90%) and genic regions (~10%).
- Humans have in total 22,000-24,000 genes. (EXAM QUESTION)
- Not every gene is expressed in every cell.
- The specific set of genes that is expressed in a cell determines the cell type.
Humans share 87.5% of their DNA with mice, 99% with chimpanzees and 99.9% with other humans. (EXAM QUESTION)
Genetic variation can be harmless, harmful, latent or silent. Causes (EXAM QUESTION):
- Mutation: at the level of base pairs (occurs by accident, e.g. when DNA is replicated).
o Monogenic disorders: influenced by one gene, most genetic causes already known.
o Polygenic disorders: influenced by multiple genes, causes mostly unknown, often complex (genes + environment).
- Recombination: at the level of parts of a chromosome (crossing over).
- Segregation: at the level of the combination of chromosomes.
Candidate gene study: focuses on a very small subset of genes.
GWAS: tests variants across the whole genome (about 1 million SNPs). Microarrays can now contain more than 1 million tagging SNPs covering the genome in high density.
Advantages:
- May identify several possible loci, because it spans the whole genome.
- Relationships between loci may identify new biological pathways.
- Results from multiple studies can be integrated, aiding the prioritization of genes for replication
and increasing statistical confidence.
Disadvantages:
- Increased likelihood of false positives because of multiple testing.
- Population stratification.
- Large number of samples needed.
- Vast amounts of data are analysed and produced (cluster computers needed).
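To see why multiple testing matters, here is a minimal sketch using a Bonferroni correction. The notes do not name a correction method, so Bonferroni and the significance level of 0.05 are assumptions. The number of tests is the 1 million SNPs mentioned above.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


n_snps = 1_000_000  # number of SNPs tested, as in the notes
alpha = 0.05        # assumed overall significance level

# Without correction, about alpha * n_snps tests come out 'significant' by chance alone.
expected_false_positives = alpha * n_snps
threshold = bonferroni_threshold(alpha, n_snps)

print(f"expected false positives without correction: {expected_false_positives:.0f}")
print(f"per-SNP threshold after correction: {threshold:.1e}")
```

The corrected threshold of 0.05 / 1,000,000 = 5 x 10^-8 is also why so many samples are needed: small effects only reach such a small p-value in very large samples.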
Every point above the significance threshold in a Manhattan plot is evidence for association. Every dot in this graph represents the outcome of one single test for an allele frequency difference at one variant for one trait.
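One dot in a Manhattan plot can be sketched as follows. The example assumes a simple chi-square test on allele counts in cases and controls, while real GWAS software typically uses regression with covariates. The allele counts are made up, and the threshold of 5 x 10^-8 is the conventional genome-wide value, not one given in these notes.

```python
import math


def allelic_chi2_test(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Returns (chi2, p). All row and column totals must be non-zero.
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case, row_ctrl = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / (
        row_case * row_ctrl * col_a * col_b
    )
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Made-up allele counts for one SNP: allele A vs allele B.
chi2, p = allelic_chi2_test(case_a=1200, case_b=800, ctrl_a=1000, ctrl_b=1000)

height = -math.log10(p)              # y-axis value of this dot
threshold_line = -math.log10(5e-8)   # assumed genome-wide significance line
print(f"chi2 = {chi2:.2f}, -log10(p) = {height:.2f}, hit = {height > threshold_line}")
```

Repeating this test for every SNP and plotting -log10(p) against genomic position gives the Manhattan plot.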
The majority of human complex traits are probably caused by thousands of genes of very small effect, so huge sample sizes are needed. GWAS have so far detected only a fraction of the genetic variance (<2%).
4 issues with GWAS:
- GWAS hits for polygenic traits mostly outside genes, or in non-coding genic regions, with likely
regulatory functions that are currently unknown.
- GWAS hits for polygenic traits have small effects.
- SNPs are correlated which complicates pinpointing the causal SNP.
- There are hundreds of genes involved in polygenic traits, so a single gene will not provide the whole picture.
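The correlation between SNPs mentioned in the list above is usually measured as linkage disequilibrium (r2). The notes do not give a formula, so the sketch below uses the standard definition from haplotype and allele frequencies. The frequencies are made up for illustration.

```python
def ld_r2(p_ab, p_a, p_b):
    """Squared correlation (r2) between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    d = p_ab - p_a * p_b  # deviation from independence
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Made-up frequencies: the two alleles always occur together.
print(ld_r2(p_ab=0.3, p_a=0.3, p_b=0.3))

# Made-up frequencies: the two alleles occur together only partly.
print(ld_r2(p_ab=0.2, p_a=0.3, p_b=0.4))
```

When r2 is close to 1, both SNPs give nearly the same association signal, so the statistics alone cannot tell which one is causal.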
Functional categories of SNPs:
- Protein coding
o SNPs in exonic regions may alter protein structure and/or function, e.g. nonsense SNPs or missense SNPs.
- Splicing regulation
o SNPs in splice sites may disrupt splicing regulation, resulting in exon skipping or intron retention.
o They can also interfere with alternative splicing regulation by changing exonic splicing enhancers or silencers.
- Transcriptional regulation
o SNPs in transcription regulatory regions can alter binding sites, and thus disrupt proper gene regulation.
- Post-translational modification
o SNPs in protein-coding regions may alter post-translational modification sites.
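The difference between missense and nonsense SNPs in the protein-coding category can be shown with a small sketch. It assumes the standard genetic code and looks at one codon on the coding strand only. The function name and the 'synonymous' and 'stop-lost' labels are additions for completeness and do not come from the notes.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snp(codon, position, alt_base):
    """Classify a single-base change inside one codon (position is 0, 1 or 2)."""
    ref_aa = CODON_TABLE[codon]
    alt_codon = codon[:position] + alt_base + codon[position + 1:]
    alt_aa = CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "synonymous"  # same amino acid, protein unchanged
    if alt_aa == "*":
        return "nonsense"    # premature stop codon
    if ref_aa == "*":
        return "stop-lost"   # stop codon replaced by an amino acid
    return "missense"        # different amino acid


print(classify_coding_snp("GAG", 1, "T"))  # GAG (Glu) -> GTG (Val)
print(classify_coding_snp("TAC", 2, "A"))  # TAC (Tyr) -> TAA (stop)
print(classify_coding_snp("CTT", 2, "C"))  # CTT (Leu) -> CTC (Leu)
```

SNPs in the other categories (splicing, transcriptional regulation) cannot be classified from the codon alone, because their effect depends on regulatory sequence rather than on the encoded amino acid.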