MSc Biomedical Sciences UvA
Molecular Biology of the Cell
Summary of lectures
Important info / dates
• Prepare tutorials beforehand: answer the questions and print your answers to swap with other
students & discuss.
o Tutorial week 36 – September 9th : Michel Haring → preparation not necessary!
o Tutorial week 37 – September 16th: Maike Stam → prepare!
o Tutorial week 38 – September 20: Stanley Brul → prepare!
• Digital test (1 hour) – September 19th, 9:00 – 10:00 (be present at 8:45!) → IWO 4.04A (AMC)
o Learn lectures of Michel Haring & Maike Stam!
• Final exam (3 hours) – September 30th, 14:00-17:00 → IWO 4.04A (AMC)
o Learn all lectures!
Groups
• Tutorial group: R
• Computer practical: Group Y_8 with Ali → I upload the practical answers.
• Assignment group: G_4 with Laura & Carmen
Lecture 1 – Michel Haring
DNA Replication, sequencing & PCR
• The central dogma: replication (DNA) → transcription (RNA) → translation (protein)
o DNA → RNA → protein → metabolite → phenotype
• DNA synthesis / replication: see figure.
o DNA replication is semi-conservative: every new double stranded molecule (=
daughter DNA double helices) consists of one old and one new strand.
o The old strand is the template (example) for the new strand → see figure.
• DNA synthesis: formation of phosphodiester bonds, driven by hydrolysis of the incoming matching
dNTP molecule; synthesis always runs 5’→ 3’.
o Deoxyribonucleoside triphosphate incoming → hydrolysis of matching dNTP →
phosphodiester bond formed + release of pyrophosphate during the process (see
figure).
o New nucleotides are always added to the 3’ end (free 3’-OH) of the growing DNA strand.
o DNA polymerase can only extend an existing strand: it needs a double-stranded “primer”
region (primer base-paired to the template) with a free 3’-OH. Both strands of a
double-stranded DNA molecule can serve as a template for DNA synthesis (see figures).
• The replisome is a molecular machine present at eukaryotic replication forks, consisting of
several subunits (see figure):
o 3 DNA polymerase variants (α, δ, ε): Pol α (with primase) makes the primers, Pol ε
synthesizes the leading strand and Pol δ the lagging strand.
o Topoisomerases: regulate unwinding of DNA during replication / transcription by relaxing
the supercoiling in DNA strands; they transiently cut and reseal the DNA backbone.
o CMG replicative helicase: unwinds the DNA by separating the two strands (breaking the
hydrogen bonds between base pairs), not by cutting the backbone.
o Sliding clamp and clamp loader.
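Semi-conservative replication can be made concrete with a toy simulation (not from the lecture): each strand is labelled with the generation in which it was made, and every round each old strand is paired with one new strand. The function names `replicate` and `simulate` are hypothetical, chosen for illustration.

```python
def replicate(duplexes, generation):
    """One round of semi-conservative replication.

    Each duplex is a pair of strand labels. The two strands separate and
    each old strand is paired with a newly synthesized strand.
    """
    new_strand = f"gen{generation}"
    daughters = []
    for strand_a, strand_b in duplexes:
        daughters.append((strand_a, new_strand))
        daughters.append((strand_b, new_strand))
    return daughters


def simulate(generations):
    """Start from one parental duplex and replicate it `generations` times."""
    duplexes = [("parent", "parent")]
    for g in range(1, generations + 1):
        duplexes = replicate(duplexes, g)
    return duplexes


for n in range(1, 4):
    dna = simulate(n)
    hybrids = sum("parent" in duplex for duplex in dna)
    print(f"generation {n}: {len(dna)} duplexes, {hybrids} still contain a parental strand")
```

The number of duplexes doubles every generation, while the two original strands stay intact and end up in exactly two different duplexes, each paired with a new strand.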
Proof-reading
• “Proof-reading” (3’→5’ exonuclease) activity of the DNA polymerase complex corrects mistakes in
DNA synthesis by removing mispaired nucleotides (see figures):
o Frequency of mistakes:
▪ without proof-reading → 1 error per 10⁵ nucleotides
▪ with proof-reading → 1 error per 10⁷ nucleotides
▪ with strand-directed mismatch repair → 1 error per 10¹⁰ nucleotides
• Mechanism of strand-directed mismatch repair (see figure): error in newly made strand →
binding of mismatch repair proteins → DNA scanning detects a nick in the new DNA strand
(marks it as the new strand) → removal of the strand segment containing the error → repair DNA
synthesis
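The error frequencies above become more tangible as expected errors per genome copy. A minimal sketch, assuming a haploid human genome of roughly 3 × 10⁹ bp (an assumption, not a lecture number) and treating each frequency as the cumulative error rate per nucleotide:

```python
# Assumed haploid human genome size (~3 x 10^9 bp); not a figure from the lecture.
GENOME_BP = 3e9

# Cumulative error frequency (errors per nucleotide) after each fidelity step.
ERROR_RATES = {
    "polymerization only": 1e-5,
    "+ proof-reading": 1e-7,
    "+ mismatch repair": 1e-10,
}


def expected_errors(genome_bp, error_rate):
    """Expected number of misincorporations when copying genome_bp nucleotides once."""
    return genome_bp * error_rate


for step, rate in ERROR_RATES.items():
    print(f"{step:20s}: ~{expected_errors(GENOME_BP, rate):g} errors per genome copy")
```

Under these assumptions, copying the genome goes from tens of thousands of errors without proof-reading to fewer than one error per copy once mismatch repair is included.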
Chemical changes in DNA
• Chemical changes in DNA bases can cause mutation in the DNA! (see figure)
o Depurination → loss of a purine base (adenine or guanine) from the DNA backbone
o Deamination → cytosine loses its amino group and is converted to uracil.
• Results of chemical modifications: point mutations or deletions (see figure)
1. Depurination of adenine (A) → deletion: an A-T nucleotide pair is missing in one of the
daughter DNA molecules produced by DNA replication.
2. Deamination: C to T change / A to G change / deaminated G stops replication
(stutter/deletion)
• Types of endogenous DNA lesions that arise and are repaired (see table): depurination occurs at
an especially high rate.
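How an unrepaired C→U deamination becomes a permanent C-G → T-A change can be traced base by base. A minimal sketch, not from the lecture: strands are written position-aligned (complement without reversal) to keep the bookkeeping simple, and `deaminate_c` is a hypothetical helper.

```python
# U (from deaminated C) pairs with A, just like T.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def complement(strand):
    """Base-by-base complement; strands are written position-aligned, not reversed."""
    return "".join(PAIR[base] for base in strand)


def deaminate_c(strand, position):
    """Deamination of the cytosine at `position` converts it to uracil."""
    if strand[position] != "C":
        raise ValueError("only deamination of cytosine is modelled here")
    return strand[:position] + "U" + strand[position + 1:]


top = "GACTC"
bottom = complement(top)            # undamaged partner strand
damaged_top = deaminate_c(top, 2)   # C at position 2 becomes U (not repaired)

# Replication 1: the damaged strand templates a new strand with A opposite U.
new_from_damaged = complement(damaged_top)
# Replication 2: that strand templates a normal strand with T opposite A.
grandchild = complement(new_from_damaged)

print(top, "->", grandchild)        # the original C-G pair is now a T-A pair
```

The undamaged `bottom` strand still templates a normal daughter, so only one of the two daughter lineages carries the mutation.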
Applications of knowledge of DNA biosynthesis
• Determine the order of the bases A, C, G, and T: DNA sequencing (whole genome)
• DNA amplification: Polymerase Chain Reaction (PCR)
DNA sequencing
• Dideoxy sequencing (according to Sanger) (see figure): DNA synthesis with incorporation of
chain terminators, in a separate reaction for each of the bases A, C, G and T → a chain terminator
(dideoxynucleotide) lacks the 3’-OH necessary for strand extension.
o Deoxyribonucleoside triphosphate (dNTPs): has 3’OH group that allows strand
extension at 3’end.
o Dideoxyribonucleoside triphosphate (ddNTP, e.g. ddATP) = chain terminator: lacks the 3’-OH
group, thus prevents strand extension at the 3’ end → DNA synthesis stops!
o See figure: with a high concentration of all dNTPs and a low concentration of ddATP
(or ddCTP, ddGTP and ddTTP) in separate reactions, DNA synthesis will continue, but
occasionally stops when a dideoxynucleotide (= chain terminator) is incorporated.
• Primer: determines the start of the dideoxy (Sanger) sequencing reaction → see figure: four
separate reactions with a labelled (radioactive or fluorescent) primer generate fragments of
different lengths. These fragments are separated side by side using gel electrophoresis. Every
visible band represents a termination of DNA synthesis.
• Sanger sequencing method: see figure.
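The four-lane logic of Sanger sequencing can be sketched in a few lines. This is a toy model, not the lecture's method: it assumes every possible termination product appears once per lane, ignores band intensities, and writes the template 3’→5’ left to right so the new strand grows left to right. `sanger_reactions` and `read_gel` are hypothetical names.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sanger_reactions(template_3to5, primer_len):
    """Toy model of the four dideoxy reactions.

    template_3to5: template strand written 3'->5' (left to right), starting
    just after the primer-binding site. Returns {ddNTP base: [fragment lengths]},
    where lengths include the labelled primer.
    """
    reactions = {base: [] for base in "ACGT"}
    for i, template_base in enumerate(template_3to5):
        incorporated = PAIR[template_base]
        # At a low ddNTP:dNTP ratio, some chains receive the ddNTP here and stop
        # (no 3'-OH), giving a labelled fragment of this length in that lane.
        reactions[incorporated].append(primer_len + i + 1)
    return reactions


def read_gel(reactions):
    """Order all bands from shortest to longest and report the lane of each band."""
    bands = sorted((length, base) for base, lengths in reactions.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = sanger_reactions("TACGGA", primer_len=20)
print(lanes)            # fragment lengths per ddNTP lane
print(read_gel(lanes))  # sequence of the new strand, 5'->3'
```

Reading the bands from short to long gives the new strand 5’→3’, i.e. the complement of the template.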
Genome sequencing
• See figure for mechanism: BAC libraries and “shotgun” fragment sequencing, followed by assembly
through comparison of overlapping sequences → hierarchical shotgun sequencing
o Genomic DNA → BAC library → organized mapped large clone contigs → BAC to be
sequenced → Shotgun clones → Shotgun sequence: overlapping sequences identified
→ assembly.
• The method above can be used to sequence whole genomes.
• Sequencing whole genomes using shotgun sequencing → mechanism (see figure); multiple
copies of genome = genomic DNA → random fragmentation → genomic library formed →
double-stranded fragments / clones → sequence one strand of two fragments / clones →
sequences of two fragments / clones obtained → original sequence reconstructed based on
overlap of sequences! → the full genome can be reconstructed this way by stitching together
nucleotide sequences of each clone, using overlaps between clones as a guide!
• Shotgun sequencing works efficiently for small genomes (virus/bacteria) that lack repetitive
DNA!
• Contigs: assembly of smaller DNA sequences into one continuous strand (see figure) → a
contig is a set of overlapping DNA segments that together represent a consensus region of
DNA
• BAC clones (see figure) → a bacterial artificial chromosome (BAC) is an engineered DNA molecule
used to clone DNA sequences in bacterial cells. Segments of an organism's DNA, ranging from
100,000 to about 300,000 base pairs, can be inserted into BACs. The BACs, with their inserted
DNA, are then taken up by bacterial cells. As the bacterial cells grow and divide, they amplify
the BAC DNA, which can then be isolated and used in sequencing DNA.
• 454 sequencing or GS FLX technology (see figure):
o The “innovation” of massively parallel sequencing: reactions take place in picoliter
volumes on microscopic beads.
o Sequencing technique: pyrosequencing → incorporation of a dNTP releases PPi, which is
converted to ATP. Every ATP molecule is consumed by luciferase to yield 1 flash of light
→ end result: a pyrogram of the DNA fragment with the determined base order
(see figure).
• Pyrogram of a DNA fragment: a large peak represents a run of the same nucleotide in the
sequence (see figure) → problems with determining the length of homopolymer tracts!
• Ion torrent sequencing: incorporation of a nucleotide results in release of a proton. Resulting
pH change can be measured on a chip (see figures)
• Massive parallel sequencing:
o Bulk DNA sequence data
o Read length
o Error rate
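How a pyrogram arises, and why homopolymers are a problem, can be sketched with idealized flows. A toy model, not the instrument's algorithm: the flow order TACG is an assumption, and the signal is taken as exactly the number of identical bases incorporated in one flow (real light intensity is only roughly proportional, which is where long homopolymer runs get miscounted).

```python
def pyrogram(sequence, flow_order="TACG"):
    """Toy pyrogram: flow one nucleotide at a time in a fixed cyclic order.

    Each flow returns (nucleotide, signal). The signal is the number of identical
    bases incorporated in a row, standing in for the light intensity; it is 0
    when the flowed nucleotide does not match the next template position.
    """
    if not sequence or set(sequence) - set(flow_order):
        raise ValueError("sequence must be non-empty and use only bases in flow_order")
    signals = []
    pos = 0
    while pos < len(sequence):
        for base in flow_order:
            run = 0
            # A homopolymer run is incorporated in a single flow -> one larger peak.
            while pos < len(sequence) and sequence[pos] == base:
                run += 1
                pos += 1
            signals.append((base, run))
            if pos == len(sequence):
                break
    return signals


def call_bases(signals):
    """Translate peak heights back into bases (exact only for noise-free signals)."""
    return "".join(base * round(signal) for base, signal in signals)


flows = pyrogram("TTACGGGA")
print(flows)
print(call_bases(flows))
```

The GGG run shows up as one peak of height 3; with noisy intensities a peak of 6 and a peak of 7 are hard to distinguish, which is the homopolymer problem noted for 454 reads.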
• Ultra-long read scanning: BioNano technology covers hundreds of kilobases to more than a
megabase (see figure).
• Whole genome sequencing is used to reconstruct the genomes of extinct species: cave
bear, mammoth, Neanderthal.
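The shotgun principle of reconstructing a sequence from overlapping reads can be illustrated with a greedy overlap merge. This is a swapped-in textbook simplification, not how production assemblers work (they use overlap or de Bruijn graphs and must handle sequencing errors and repeats); it also assumes error-free reads, ignores reads contained in other reads, and the names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b` (>= min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap into a contig."""
    contigs = list(reads)
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return contigs  # no overlaps left: these are the final contigs
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


reads = ["GATTACAG", "ACAGGCTT", "GCTTACG"]
print(greedy_assemble(reads))
```

Here the three reads collapse into one contig. If the genome contained long repeats, several merges would score equally and the greedy choice could join the wrong pieces, which is why plain shotgun sequencing works best for small genomes without repetitive DNA.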
Polymerase Chain Reaction (PCR)
• Amplification of 1 specific DNA fragment → exact amplification of the desired fragment! (see
figure)
• Process: the fragment length is defined by the choice of primers → DNA synthesis (see
figure)
• PCR step by step (see figure):
1. Choose primers in the desired area → GC content (determines hybridisation temperature)
& 3’ end (start of synthesis)
2. Determine the PCR reaction conditions → hybridisation temperature (specificity), synthesis
time (fragment length) & number of cycles (amount of template).
3. Start reaction: separate the strands of double-stranded DNA by heat (1) → primers hybridize
with the strands (2) → DNA synthesis from the primers by DNA polymerase using dNTPs (3)
• First cycle: producing two double-stranded DNA molecules.
• Second cycle: producing four double-stranded DNA molecules
• Third cycle: producing eight double-stranded molecules.
• Primers anneal to single DNA strands in each cycle (see figure)
• Specific fragment is being amplified during PCR (see figure) → Preferential amplification of the
target fragment!
• PCR amplification in numbers (see figure)
• PCR application: Cloning genomic fragments (see figure)
o Knowledge required: homologous DNA sequence & amino acid sequence of the protein
o Advantages: Sensitive, Fast
o Restrictions: Fragment length (< 20kb), Homologous DNA sequences
• PCR application: RT-PCR → Cloning of specific cDNA fragments
o Knowledge required: homologous DNA sequence, amino acid sequence of the protein
o Advantages: Sensitive, Fast
o Restrictions: 5’ RACE needed to obtain the full-length 5’ end; gene families (primers
may also amplify related family members)
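Two bits of PCR arithmetic help with the steps above: estimating a primer’s hybridisation temperature from its GC content, and counting products per cycle. A minimal sketch: the Wallace rule (Tm ≈ 2 °C × (A+T) + 4 °C × (G+C)) is a rough rule of thumb for short oligonucleotides, not the lecture’s method; 100% efficiency is assumed; and the example primer is hypothetical. The count of exact-length (primer-to-primer) duplexes, 2ⁿ − 2n per starting duplex, matches the figure where 2 of the 8 molecules after cycle 3 have the target length.

```python
def gc_content(primer):
    """Fraction of G + C in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Wallace rule (rough, short oligos only): Tm = 2*(A+T) + 4*(G+C), in degrees C."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def pcr_yield(start_duplexes, cycles):
    """Total double-stranded molecules, assuming perfect doubling every cycle."""
    return start_duplexes * 2 ** cycles


def exact_length_duplexes(cycles):
    """Duplexes of exactly primer-to-primer length from one template duplex: 2^n - 2n."""
    return 2 ** cycles - 2 * cycles


primer = "GCGTACGTTAGC"  # hypothetical 12-nt primer
print(f"GC content {gc_content(primer):.0%}, Wallace Tm {wallace_tm(primer)} C")
for n in (1, 2, 3, 30):
    print(n, pcr_yield(1, n), exact_length_duplexes(n))
```

After 30 cycles almost all of the ~10⁹ molecules are the exact target fragment, which is the preferential amplification described above.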
Recombinant DNA techniques make it possible to move experimentally from gene to protein and from
protein to gene. If a gene has been identified (right), its protein-coding sequence can be inserted into
an expression vector to produce large quantities of the protein, which can then be studied
biochemically or structurally. If a protein has been purified on the basis of its biochemical properties,
mass spectrometry can be used to obtain a partial amino acid sequence, which is used to search a
genome sequence for the corresponding nucleotide sequence. The complete gene can then be cloned
by PCR from a sequenced genome. The gene can also be manipulated and introduced into cells or
organisms to study its function.
DNA sequencing techniques for detecting DNA polymorphisms
• PCR-based: VNTR (variable number tandem repeats)
• PCR- and sequencing-based / melting curve: SNP (single nucleotide polymorphism)
• VNTR = variable number tandem repeats → repeat number can vary between individuals →
hence: VNTRs generate a unique fingerprint of an individual! (see figure)
o Allele frequencies of VNTRs allow us to calculate the likelihood of each fingerprint.
• SNP = single nucleotide polymorphism → variation of a single nucleotide at a specific position
in a DNA sequence → the combination of SNP alleles in a certain genomic region on one
chromosome of an individual is called a haplotype. Three “tag” SNPs are enough to summarize the
haplotype (see figure)
• SNPs are easy to identify using next-generation sequencing.
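How allele frequencies give the likelihood of a VNTR fingerprint can be sketched with the product rule. A minimal sketch, assuming Hardy-Weinberg equilibrium (genotype frequency p² for homozygotes, 2pq for heterozygotes) and independent loci; the allele frequencies below are hypothetical, not real population data.

```python
from math import prod


def genotype_frequency(p, q=None):
    """Hardy-Weinberg genotype frequency: p^2 if homozygous (q omitted), else 2pq."""
    return p * p if q is None else 2 * p * q


def profile_probability(loci):
    """Product rule over independent loci: chance that a random person shares this fingerprint."""
    return prod(genotype_frequency(*alleles) for alleles in loci)


# Hypothetical allele frequencies for three VNTR loci (illustration only).
# A one-element tuple means the person is homozygous at that locus.
profile = [(0.10, 0.20), (0.05,), (0.30, 0.15)]
p = profile_probability(profile)
print(f"P(random match) = {p:.2e} (about 1 in {1 / p:,.0f})")
```

Every additional independent locus multiplies in another small factor, which is why a handful of VNTRs already gives a near-unique fingerprint.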
Lecture 2 – Michel Haring
Genomes, OMICS & Bioinformatics
• Bioinformatics: how to handle the enormous amount of information produced by:
o DNA sequencing
o RNA expression
o Protein patterns
o Metabolite contents
o Digitalized phenotypes
• The science of OMICS:
o DNA → genomics
o RNA → transcriptomics
o Protein → proteomics
o Metabolite → metabolomics
o Phenotype → phenomics
• DNA sequencing shows variable genome size between species (see figure) → however, there is
no correlation between genome size and organismal complexity.
o E.g. model organisms and their genomes (see table)
• Annotation of DNA sequences (see figure):
o Only 2% of the human genome codes for proteins
o Repeated sequences: e.g. transposons, SINE = Short Interspersed Nuclear Element
(13%), LINE = Long Interspersed Nuclear Element (21%)
o Unique sequences: e.g. genes (protein-coding regions + introns) and non-repetitive DNA
that lies neither in introns nor in exons.
o Repeated DNA sequences: important for regulation of gene expression or maintaining
structure of DNA.
• Genomics → can be used for annotation of the human genome:
o DNA sequence analysis of genome fragments
o DNA databases (NCBI, GenBank, EMBL)
o DNA markers (SNPs etc)
o Utilization of databases:
▪ Compare unknown fragments
▪ Identification of gene fragments (annotation)
▪ Chromosome structure / synteny
▪ Intron / exon boundaries
▪ Phylogeny
• Chromosomes contain many duplicated segments: intra- and inter-chromosomal duplications
(see figure; e.g. chromosome 7)
Genome annotation: the process of identifying the locations of genes and all of the coding regions in a
genome and determining what those genes do. An annotation (irrespective of the context) is a note
added by way of explanation or commentary. Once a genome is sequenced, it needs to be annotated
to make sense of it.
• Annotation of a DNA sequence: how can you identify a gene in a genome? What landmarks
are used for annotation? (see + learn figure)!
• Human gene characteristics (see figure).
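One of the simplest landmarks used to spot a protein-coding gene is an open reading frame (ORF): an ATG start codon followed in frame by a stop codon (TAA, TAG, TGA). A minimal sketch of an ORF scan, not the annotation pipeline from the lecture: it scans only the forward strand, ignores introns (which split eukaryotic coding sequences, so real annotation also uses splice-site consensus sequences, RNA/cDNA evidence and homology), and `find_orfs` and `min_codons` are hypothetical names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) pairs of ORFs on the forward strand.

    An ORF runs from an ATG to the first in-frame stop codon; `end` is
    exclusive and includes the stop codon. ORFs with fewer than
    `min_codons` codons before the stop are skipped.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens an ORF in this reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)


seq = "CCATGAAACCCGGGTAGTT"
for start, end in find_orfs(seq):
    print(start, end, seq[start:end])
```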
Comparing genomes
• Exon length is conserved between human, fly and worm: suggests a functional constraint imposed
by the splicing machinery.
• Intron length is much more variable in human: the distribution peaks at 87 bp but has a long tail up to 3300 bp.
• Splice site = consensus sequence required for intron removal (see figure).
• Alternative and differential splicing → one gene can produce different mRNAs that code for
different proteins → variation in exon combination → unique mRNAs for different cell types
(see figure).
• Differential splicing: splicing on predicted / known / canonical splice sites.
• Alternative splicing: splicing on sites you did not predict (so unknown splice sites / non-
canonical splice sites).
o 35% of the genes have alternative splicing
o 70% in the coding sequence → alters the protein.
o 20% terminal exon added
o Alternative splicing: 30,000 genes → 100,000 proteins.
• Comparing genomes to find functional elements: synteny
o Synteny is the preserved order of genes between related organisms.
o Since gene order mostly has a neutral effect in eukaryotes, an organism usually suffers
no ill effects from having its genes rearranged.
o The order of genes is generally preserved best between closely related species.
o Conservation of the order of a cluster of genes suggests a functional relation.
o Synteny helps in forming a hypothesis for gene function!
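Natural selection: changes in DNA that do or do not affect the encoded protein.
• Ka = non-synonymous substitution rate per non-synonymous site (base change leads to a
different amino acid)
• Ks = synonymous substitution rate per synonymous site (base change leads to the same amino
acid)
• Ka/Ks < 1: purifying (negative) selection against amino acid changes
• Ka/Ks ≈ 1: neutral evolution (no selection)
• Ka/Ks > 1: positive selection for amino acid changes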
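A worked Ka/Ks calculation shows how the ratio is read. A minimal sketch under stated assumptions: it uses uncorrected proportions (differences per site, without the multiple-hit corrections that real methods apply), the site and difference counts are hypothetical, and the ±0.1 band used to call a ratio "approximately neutral" is an arbitrary illustration choice. By convention, Ka/Ks < 1 indicates purifying selection, ≈ 1 neutral evolution and > 1 positive selection.

```python
def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Uncorrected Ka/Ks: non-synonymous differences per non-synonymous site
    divided by synonymous differences per synonymous site."""
    ka = nonsyn_diffs / nonsyn_sites
    ks = syn_diffs / syn_sites
    if ks == 0:
        raise ValueError("Ks is zero: ratio undefined")
    return ka / ks


def interpret(ratio, tolerance=0.1):
    """Classify a Ka/Ks ratio (tolerance band is an illustrative choice)."""
    if ratio < 1 - tolerance:
        return "purifying (negative) selection"
    if ratio > 1 + tolerance:
        return "positive selection"
    return "approximately neutral"


# Hypothetical gene comparisons: (non-syn diffs, non-syn sites, syn diffs, syn sites)
for counts in [(7, 700, 20, 200), (30, 700, 2, 200)]:
    r = ka_ks(*counts)
    print(f"Ka/Ks = {r:.2f} -> {interpret(r)}")
```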
What makes us human? → 3 hypotheses to account for the evolution of “humanness traits”:
1. Protein evolution
2. The ‘less-is-more’ hypothesis
3. Changes in the regions of the genome that regulate gene activity
Homolog: A gene related to a second gene by descent from a common ancestral DNA sequence. The
term, homolog, may apply to the relationship between genes separated by the event of speciation
(see ortholog) or to the relationship between genes separated by the event of genetic duplication (see
paralog).
Ortholog: Orthologs are genes in different species that evolved from a common ancestral gene by
speciation. Normally, orthologs retain the same function in the course of evolution. Identification of
orthologs is critical for reliable prediction of gene function in newly sequenced genomes (see figure).
Paralog: Paralogs are genes related by duplication within a genome. Orthologs retain the same
function in the course of evolution, whereas paralogs often evolve new functions, even if these are
related to the original one (see figure).
• Conservation of protein sequences → comparison of protein-coding sequences using BLAST
• Finding similar proteins → database analysis using BLASTP (see figure): the example shows a
search for an exact match. To find similar (not identical) proteins you need a “score matrix”
for amino acids, which scores how closely two amino acids are related (see figure).
• Comparison of several similar proteins = “multiple alignment” → results in: phylogenetic tree,
subgrouping, conserved domains (see figure).
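Why a score matrix matters can be shown by scoring two alignments that have the same number of identical residues. A minimal sketch with a toy scoring scheme, explicitly not BLOSUM62 or any real matrix: identical residues score +4, residues in the same (hypothetical) physicochemical group +1, other pairs −1, gaps −4.

```python
# Toy amino acid groups (illustrative only; real searches use matrices such as BLOSUM62).
GROUPS = [set("ILMV"), set("FWY"), set("KRH"), set("DE"), set("STNQ"), set("AG"), set("C"), set("P")]


def pair_score(a, b, match=4, similar=1, mismatch=-1):
    """Score one aligned residue pair with the toy scheme."""
    if a == b:
        return match
    if any(a in group and b in group for group in GROUPS):
        return similar
    return mismatch


def alignment_score(seq1, seq2, gap=-4):
    """Score an alignment given as two equal-length strings ('-' marks a gap)."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    return sum(gap if "-" in (a, b) else pair_score(a, b) for a, b in zip(seq1, seq2))


query = "MKVLE"
for hit in ["MKVLE", "MRILD", "MPWLG"]:
    print(hit, alignment_score(query, hit))
```

MRILD and MPWLG both share only M and L with the query, yet the matrix ranks MRILD much higher because its other residues are conservative substitutions (K/R, V/I, E/D).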
The new molecular language: protein domains (see figures)
• Proteins are built from protein domains (building blocks) that are preserved in other proteins.
• Similar proteins show identical protein domain combinations.