Gene regulation
Summary – Lectures
Bonifiya Visuvasam
Lecture 1 & 2 – Paul
Introduction / Eukaryotic gene regulation
Epigenetics: a stable change in gene expression without a change in the genetic code →
“Epigenetics is the study of phenotypic variation caused by external or environmental factors that turn
genes 'on' and 'off' and affect how cells 'read' genes, instead of being caused by changes in the DNA
sequence.”
• These changes in gene expression can be mitotically or meiotically heritable
• Changes in gene activity caused by epigenetic systems (e.g. DNA methylation) in, for example,
brain cells are “not heritable” (these cells do not divide and are not part of the germline)
Epigenetic mechanisms of gene regulation (see figure):
• DNA methylation
• Histone modification:
o Methyl
o Acetyl
o Phosphate
o Ubiquitin
o More groups …
• Histone variants
• Noncoding RNA (post-transcriptional)
Overview of all important concepts of gene regulation:
What is genetic information? → two types of “genetic (heritable) information”: genetic vs. epigenetic
Important epigenetic mechanism: chromatin accessibility → various “levels/states” of open/closed
chromatin → development / defense / chromosome stability.
Eukaryotic gene regulation
All cells of our body have the same DNA sequence. However, many different cell types with
distinct functions exist. How is this regulated by the genetic information?
Different views on: what is genetic information?
• DNA
• A set of chromosomes
• Coding and non-coding genes
• Chromatin
What is the relation between heritable genetic information and the genomic sequence/structure of a
chromosome/DNA elements/chromatin?
DNA
• DNA carries the genetic information: a sequence of A/T/C/G nucleotides
o 4 chemical groups determine the variation in DNA → the A/T/C/G (U) nucleotides
differ from each other in these chemical groups (see figure):
1. H (hydrogen)
2. NH2 (amine)
3. O (oxygen)
4. CH3 (methyl)
o Purines: adenine (A) & guanine (G) → 2 rings
o Pyrimidines: cytosine (C), uracil (U) & thymine (T) → 1 ring
• Genetic code: which information (which protein is encoded)
• Sequence specificity: regulatory proteins (transcription factors) can bind to cis-regulatory
elements, e.g. promoters and enhancers (this regulates when + where transcription takes
place):
o Recognition and binding of TFs to DNA (see figure):
▪ TFs read the DNA sequence mainly through the chemical groups of the bases
that are exposed in the major groove (and, to a lesser extent, the minor groove),
for example the methyl groups of nucleotides in the major groove → see figure:
a methylated cytosine (C) serves as a binding site for DNA-binding proteins like TFs.
▪ Specific transcription factors recognize specific consensus DNA sequences →
these consensus sequences contain the nucleotides that the TF preferentially binds
→ see figure: LexA / Oct4 / GATA consensus with the preferred nucleotides for TF
binding.
• Double stranded: essential for maintaining the genetic information through replication
o Double-stranded structure is essential for DNA replication: parental DNA double helix
→ daughter DNA double helices (each contains one template strand from parent +
new strand) (see figure).
• DNA strands form a helix with minor grooves & major grooves.
• See figure for the structure of DNA
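The idea that a TF recognizes a consensus sequence, and that it can find it on either strand of the double helix, can be sketched as a simple pattern search. This is a minimal illustration, not a real motif-finding tool: it assumes uppercase sequences, assumes the consensus is written in IUPAC ambiguity codes (W = A/T, R = A/G, etc.) and uses WGATAR as an example GATA-type consensus; the function names are hypothetical. Real TF binding preferences are better described by position weight matrices that score how well each position fits.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5'->3' (base pairing A-T, C-G)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str, consensus: str) -> list[tuple[int, str]]:
    """Return (start, strand) for every match of an IUPAC consensus on both strands.

    Start positions are 0-based coordinates on the given (+) strand.
    """
    # The lookahead (?=...) also reports overlapping matches
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in consensus) + "))")
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    for m in pattern.finditer(rc):
        # Convert a position on the reverse strand back to + strand coordinates
        hits.append((len(seq) - m.start() - len(consensus), "-"))
    return sorted(hits)
```

For example, `find_sites("TTAGATAAGC", "WGATAR")` finds the site on the + strand at position 2, while the same sequence read from the other strand (`"GCTTATCTAA"`) yields the site at position 2 on the − strand.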
DNA is a carrier of genetic information → coding + non-coding genes are transcribed; but what
about regulatory sequences and “junk” DNA: repetitive sequences (centromere repeats, telomere
repeats & transposons) + other sequences (see figure)?
• Human: only 2% of the genome encodes proteins (98% is non-coding) → Is the genetic
information in the DNA sufficient to build an organism?
• Genomes of organisms are sequenced to determine their size and number of genes.
• Genomes are vastly different in size between organisms:
• Is genome size related to complexity of the organism?
o Genome size differs strongly between related organisms:
▪ See figure: genome size strongly differs between plant species
▪ ‘Additional’/extra DNA in large genomes is mainly repetitive DNA.
▪ See figure: humans have a smaller genome than plants, but more coding
genes and neurons → thus, evidently, genomes do not code for (all) the
complexity of organisms!
A set of chromosomes
• Each chromosome is a single long DNA molecule wrapped in proteins → human: 23 chromosome pairs
• Chromosomal DNA: largest molecules in nature (see figure).
• Chromosome sizes of some plants: see figure.
• The linear chromosome (see figure):
o A linear chromosome is a chromosome with a linear shape and two terminal
ends. In most eukaryotic cells, the DNA is organized in multiple linear
chromosomes (human = 23 chromosome pairs = 46 linear chromosomes).
In contrast, most prokaryotic cells generally contain a single circular
chromosome.
o Telomere: the terminal end of a linear chromosome → telomere sequence
human: TTAGGG vs. telomere sequence Arabidopsis: TTTAGGG.
o Centromere: holds the two sister chromatids of a chromosome together →
centromere sequence human: 171 bp repeat vs. centromere sequence Arabidopsis:
178 bp repeat → centromere sequences are highly variable!
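Because telomeres consist of tandem copies of a short unit, the species difference above can be made concrete by counting how many copies sit at the end of a chromosome sequence. A minimal sketch, assuming the input is the G-rich strand read 5'→3' towards the chromosome end (the strand on which the repeat reads TTAGGG in humans); `count_terminal_repeats` is a hypothetical helper.

```python
HUMAN_TELOMERE = "TTAGGG"
ARABIDOPSIS_TELOMERE = "TTTAGGG"


def count_terminal_repeats(seq: str, unit: str) -> int:
    """Count how many tandem copies of `unit` sit at the 3' end of `seq`."""
    n = 0
    end = len(seq)
    # Walk backwards from the chromosome end, one repeat unit at a time
    while end >= len(unit) and seq[end - len(unit):end] == unit:
        n += 1
        end -= len(unit)
    return n
```

With the human unit, `count_terminal_repeats("ACGT" + "TTAGGG" * 3, HUMAN_TELOMERE)` gives 3, whereas the Arabidopsis unit finds no copies in that sequence.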
• Human genome overview (see figure)
• What is a transposable element (TE)?
o A TE / transposon is a genetic unit that can move within the genome!
o A (DNA) TE needs a transposase enzyme to catalyze its movement to another
spot in the genome; in maize, this transposase is encoded by the Activator (Ac) element.
o For example (see figure): a normal C gene produces pigment (purple) →
Dissociator (Ds), a Class II TE, inserts into gene C; its movement is activated by
the transposase from Ac (Activator) → disrupted (mutant) c gene → yellow
(unpigmented) kernel → in some cells Ds moves out of gene C again → yellow kernel
with purple spots.
o Transposons must be silenced to prevent them from disturbing the expression of
important genes!
o Two classes of transposons:
1. Class I transposons = retrotransposons;
▪ TEs with long terminal repeats (LTRs): encode reverse transcriptase,
similar to retroviruses
▪ LINEs (LINE-1s or L1s): encode reverse transcriptase, lack LTRs, and are
transcribed by RNA polymerase II.
▪ SINEs: do not encode reverse transcriptase and are transcribed by RNA
polymerase III.
2. Class II transposons = DNA transposons; cut-and-paste transposition mechanisms
that do not involve an RNA intermediate.
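The classification above follows a small decision tree: does the element move via an RNA intermediate, does it have LTRs, does it encode reverse transcriptase? A simplified sketch of that logic; the feature names and the `classify_te` function are illustrative, and real TE annotation uses many more criteria (such as which RNA polymerase transcribes the element).

```python
from dataclasses import dataclass


@dataclass
class TEFeatures:
    rna_intermediate: bool               # transposes via an RNA copy (retrotransposon)
    encodes_reverse_transcriptase: bool  # carries its own reverse transcriptase gene
    has_ltrs: bool                       # long terminal repeats at both ends


def classify_te(te: TEFeatures) -> str:
    """Assign a TE to a class/subgroup following the simplified scheme in these notes."""
    if not te.rna_intermediate:
        return "Class II (DNA transposon, cut-and-paste)"
    if te.has_ltrs:
        return "Class I: LTR retrotransposon"
    if te.encodes_reverse_transcriptase:
        return "Class I: LINE"
    return "Class I: SINE"
```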
Why do TEs / transposons exist? What is their use? → many are important for genome function and
evolution! → 45% of the human genome consists of TEs!
• Chr4 (~25 Mb) in Arabidopsis: in heterochromatin (~7 Mb) → 1 TE per 4 kb! / In
euchromatin (~18 Mb) → 1 gene per 4 kb!
• Chr22 (~48 Mb) in human: has the highest gene density → 1 gene per 27 kb
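The densities above can be turned into rough absolute numbers with a back-of-the-envelope calculation. This only rescales the rounded figures in these notes and assumes even spacing, so the results are estimates, not annotated counts; `items_per_region` is a hypothetical helper.

```python
def items_per_region(region_mb: float, spacing_kb: float) -> float:
    """Approximate number of elements in a region, given one element per `spacing_kb`."""
    return region_mb * 1000 / spacing_kb  # 1 Mb = 1000 kb


# Arabidopsis chromosome 4 (rounded figures from the notes)
tes_in_heterochromatin = items_per_region(7, 4)   # ~1750 TEs
genes_in_euchromatin = items_per_region(18, 4)    # ~4500 genes
# Human chromosome 22
genes_on_chr22 = items_per_region(48, 27)         # ~1778 genes
```

The comparison also shows that Arabidopsis euchromatin (1 gene per 4 kb) is roughly seven times more gene-dense than human chr22 (1 gene per 27 kb).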
Chromosome transcriptome map → highly transcribed genes within a chromosome are clustered!
• Genes are not evenly distributed on chromosomes → transcriptome map gives a great
overview of the distribution.
• RIDGE = regions of increased gene expression!
• Example: 1208 genes on human chromosome 11 (see figure)
o Each vertical bar represents a gene
o Height of bar represents average expression level in several cell types
• Large clusters of highly expressed genes and of lowly expressed genes
• Discovered at the UvA/AMC
o Caron et al. Science 291, 1289-1291 (2001) → transcriptome map
• Relation to function...?
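A RIDGE can be found computationally by sliding a window along the genes of a chromosome (in chromosomal order) and flagging stretches where the average expression is high. The sketch below is a simplified illustration with hypothetical window and threshold values; it is not the exact procedure used by Caron et al. (2001).

```python
def find_ridges(expression: list[float], window: int, threshold: float) -> list[tuple[int, int]]:
    """Return (start, end) gene-index ranges (end exclusive) where the sliding-window
    mean expression exceeds `threshold`; overlapping high windows are merged."""
    ridges: list[tuple[int, int]] = []
    for start in range(len(expression) - window + 1):
        mean = sum(expression[start:start + window]) / window
        if mean > threshold:
            end = start + window
            if ridges and start <= ridges[-1][1]:
                # Window overlaps or touches the current ridge: extend it
                ridges[-1] = (ridges[-1][0], end)
            else:
                ridges.append((start, end))
    return ridges
```

Because each window includes flanking genes, a ridge reported this way is slightly wider than the block of highly expressed genes itself.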
Coding and non-coding genes
• What is a gene? → a gene is a coding unit.
• One gene = one protein hypothesis (see figure)
• One gene = multiple protein types hypothesis → alternative splicing of mRNA transcript →
same gene, but different genetic information → several distinct protein types produced from
one gene.
• DNA as a carrier of genetic information:
o The gene concept:
▪ Stretch of DNA coding for a protein (via a mRNA)
▪ Classic: one gene → one protein hypothesis
▪ Alternative splicing: one gene → multiple different proteins
o What is the definition of a ‘gene’ … ?
▪ Genetic terms: a gene encodes a heritable trait
▪ Internet: A gene is the fundamental and physical unit of heredity
▪ For annotation: gene is preceded by promoter and ends with terminator
sequence.
▪ Gene is transcription unit!
o More than 80% of human genome is transcribed! → see figure.
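How one gene can give several proteins via alternative splicing can be illustrated by enumerating which exons end up in the mRNA. This sketch only models skipping of optional (“cassette”) exons; other splicing types (alternative 5'/3' splice sites, intron retention) are ignored, not every combination is necessarily produced in a real cell, and the exon names are made up.

```python
from itertools import product


def isoforms(exons: list[tuple[str, bool]]) -> list[str]:
    """List all exon combinations for a gene.

    Each exon is (name, optional). Constitutive exons (optional=False) are always
    included; optional exons may be skipped. Exon order is preserved.
    """
    choices = [(True, False) if optional else (True,) for _, optional in exons]
    result = []
    for pattern in product(*choices):
        kept = [name for (name, _), keep in zip(exons, pattern) if keep]
        result.append("-".join(kept))
    return result


gene = [("E1", False), ("E2", True), ("E3", True), ("E4", False)]
```

Here `isoforms(gene)` returns four mRNAs from one gene: E1-E2-E3-E4, E1-E2-E4, E1-E3-E4 and E1-E4.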
• Are there more transcription units? → TFs bind at many more sites than just the promoters of
protein-coding genes!
• Binding sites of several TFs mapped:
o SP1: 12,000 sites in genome
o cMyc: 25,000
o p53: 1,600
o CREB: 19,000
o NF-kB: 19,000
• Many more than expected based on known genes!
o 78% of binding sites for TFs found were NOT in a known promoter region
o 24% in promoter-like genomic sites
o Many binding sites inside and downstream of genes
• Promoter-like binding sites not near known genes → explanation…?? → are there many more
non-coding genes?
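Statements like “78% of binding sites were not in a known promoter region” come from comparing mapped binding-site positions with gene annotation. A minimal sketch of such an overlap count, assuming a promoter is defined as ±1 kb around a transcription start site (TSS) on the same chromosome; this window, the function name and the example positions are assumptions, and the original studies may have used different definitions.

```python
from bisect import bisect_left


def fraction_near_tss(sites: list[int], tss: list[int], window: int = 1000) -> float:
    """Fraction of binding-site positions lying within `window` bp of any TSS."""
    tss_sorted = sorted(tss)
    near = 0
    for pos in sites:
        i = bisect_left(tss_sorted, pos)
        # The closest TSS is either just before (i - 1) or at/after (i) the site
        candidates = tss_sorted[max(i - 1, 0):i + 1]
        if any(abs(pos - t) <= window for t in candidates):
            near += 1
    return near / len(sites) if sites else 0.0
```

For hypothetical sites at 9,500, 10,800, 30,000, 49,000 and 70,000 bp and TSSs at 10,000 and 50,000 bp, three of five sites (0.6) fall in a “promoter”; the remaining 40% would count as binding sites outside known promoters.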
Chromatin
Heterochromatin vs. euchromatin → E. Heitz, 1929 (see figure).
• Distinguished between heterochromatin and euchromatin.
• Heterochromatin are all the intensely stained domains.
• Euchromatin are the diffusely staining regions.
• Heterochromatin is usually spread over the whole nucleus and has a granular appearance.
General characteristics of heterochromatin:
• Cytological:
o Intensely stained chromosomal regions
o C-band positive
o G-band positive
• Molecular:
o Chromosome segments containing abundant (tandemly) repeated DNA and
transposons
• Biochemical:
o A DNA-protein complex that is relatively insensitive to DNase I
o Enriched in methylated DNA
o Enriched in histone H3 methylated at lysine 9 (H3K9me)
o Enriched in heterochromatin protein HP1
• Physical:
o Chromatin with regularly arranged nucleosomes
o Condensed chromatin
• Functional:
o Largely inactive in transcription
o Late-replicating
o Rarely involved in meiotic recombination
• Heterochromatin is full of transposons!
• Heterochromatin distribution (see figure):
o Pericentromeric heterochromatin is near the centromere of a chromosome (see
figure).
o As mentioned earlier, heterochromatin is full of transposons and other repetitive DNA
→ 14% of Arabidopsis, 45% of human and 80% of maize genome consists of
transposons!
Eukaryotic gene regulation → epigenetics → stable change in gene expression without a change in the
genetic code → can be explained by chromatin!
• Chromatin in all eukaryotes (see figure) → simple overview of chromosome organization in the
nucleus:
o The beads-on-a-string of chromatin folds up in the nucleus
o Different levels of folding!
o Folding of chromatin is important for switching genes on and off within a
chromosome.
• Chromatin folding patterns (see figure):
o Solenoid model
o Zig-zag model
• Chromatin structure (see figure):
o Chromatin = the DNA of our cells packaged with proteins!
o Same structure in all (!) plants, animals and fungi
o Beads-on-a-string form of chromatin: DNA wrapped around core histones to form a
nucleosome → nucleosome includes ~200 nucleotide pairs of DNA → nucleosomes
linked through linker DNA to form string of chromatin (see figure).
o Breakdown of chromatin: nuclease digests linker DNA → released nucleosome core
particle → dissociation with a high salt concentration → octameric histone core +
147-nucleotide-pair DNA double helix → dissociation of the octameric histone core into
H2A, H2B, H3 and H4 = histone proteins (see figure).
• The nucleosome: universal in eukaryotes (see figure)
o 4 different histone proteins
▪ H2A, H2B, H3, H4
▪ Two of each in a histone octamer core of nucleosome
o Histone octamer is disk-shaped → 6 nm high, 11 nm diameter
o About 200 bp DNA
▪ ~145 bp is 1.7 times wrapped around histone octamer
▪ ~50 bp is linker DNA
o The nucleosome structure is highly conserved: histone H4 of human and pea
differs in only 2 amino acids!
o Dimer and tetramer formation within nucleosome histone core: between H2A-H2B
and H3-H4 (see figure); 2 H3-H4 dimers formed → H3-H4 dimers form a H3-H4
tetramer → DNA wrapped around H3-H4 tetramer → 2 H2A-H2B dimers formed → 2
H2A-H2B dimers bind to H3-H4 tetramer wrapped with DNA → histone octamer
formed with DNA wrapped → nucleosome!
o Nucleosomes strongly reduce accessibility to DNA! (see figure) → only a part of the
DNA helix within nucleosome is accessible for interaction with a transcription factor
(see figure).
o Nucleosomes must be removed before transcription factors can bind in vivo:
▪ Chromatin remodeling is the process required to remove nucleosomes to
provide accessibility of DNA for TFs → two different types of enzymatic
processes:
➢ One requires ATP (see figure): transcription of chromatin relies on
factors that use the energy from ATP hydrolysis to displace
nucleosomes from specific DNA sequences.
➢ The other involves enzymatic acetylation of lysine residues in the
N-terminal part (tails) of histone proteins
o The structure of the nucleosome is precisely known → atomic resolution: see figure.
o Histone core proteins have histone tails!
o Over 30 different modifications of histones possible at many different sites (amino
acids): each modification is made, read and erased by different proteins! (see figure)
▪ acetylated lysine → acK
▪ methylated arginine → meR
▪ methylated lysine → meK
▪ phosphorylated serine → PS
▪ ubiquitinylated lysine → uK
o Modified histones are epigenetic marks:
▪ Chromatin proteins = histones can undergo chemical modifications in the
cell
▪ Modifications are on specific amino acids in the histone tails!
▪ The histone code determines whether the genetic information in the DNA is
“available or not”
▪ The histone code is “hierarchically higher” than the genetic code in the DNA
o Thus, modified histones provide epigenetic information (see figure):
▪ H3K4me3 and/or H3K9ac modifications on the histone H3 tail are epigenetic
marks for active chromatin (or chromatin poised to become active).
▪ H3K9me3 and/or H3K27me3 modifications on Histone H3 tail are epigenetic
marks for inactive/repressed chromatin.
• Switching from active to inactive chromatin and vice versa (see figure for overview):
o Euchromatin:
▪ Acetylation of lysine amino acids
- Open chromatin structure
- Genes can be active
▪ Methylation of lysine 4 of histone H3
- Open chromatin
- Genes can be active
o Heterochromatin:
▪ Methylation of lysine 9 of histone H3
- Compact chromatin structure
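The link between histone marks and chromatin state described above can be written as a lookup. This is a deliberately coarse sketch that only uses the four marks named in these notes; the “mixed” and “unknown” labels are hypothetical categories, and real chromatin-state annotation combines many more marks (for example with probabilistic models).

```python
ACTIVE_MARKS = {"H3K4me3", "H3K9ac"}        # active (or poised) chromatin
REPRESSIVE_MARKS = {"H3K9me3", "H3K27me3"}  # inactive/repressed chromatin


def chromatin_state(marks: set[str]) -> str:
    """Coarse call of a region's chromatin state from the histone marks found there."""
    active = bool(marks & ACTIVE_MARKS)
    repressive = bool(marks & REPRESSIVE_MARKS)
    if active and repressive:
        return "mixed"
    if active:
        return "active"
    if repressive:
        return "repressed"
    return "unknown"
```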
What is genetic information? (see figures) → all heritable stuff!
Heritability → see figure
4 epigenetic mechanisms for gene regulation → see figure:
• Different species use different (combinations of) epigenetic mechanisms / different molecular
levels of epigenetic gene regulation.
Key players of eukaryotic gene regulation: see figure for overview.
Tutorial 3 – Maike
Common Molecular Techniques to study eukaryotic gene regulation
Techniques for the analysis of eukaryotic gene regulation
• Chromatin immunoprecipitation → ChIP
• Polymerase Chain Reaction – (q)PCR
• Microarray analysis
• High-throughput sequencing
• These techniques can be used to analyze the obtained “precipitated material”,
i.e. the specific DNA material of interest obtained after separation/specific targeting
with beads/primers etc.
ChIP
• Chromatin ImmunoPrecipitation = ChIP
• What can be examined with ChIP?
• What is the principle of ChIP?
• Different ChIP protocols
• Analyses of precipitated material
• Points of attention with ChIP
What can be examined with ChIP? → ChIP identifies DNA sequences that are associated with
proteins/modifications of interest in the endogenous, in vivo situation
• With ChIP, it’s possible to determine presence & location in the genome of transcription
factors, chromosomal proteins, and histone modifications (see figure).
For example, ChIP can be used for epigenetic profiling by ChIP-sequencing (see figure) → annotation
of the genome → histone modifications & specific proteins demarcate functional elements in the DNA
→ their locations can be determined by generating an epigenetic profile through ChIP-sequencing.
• Analyze modifications (methylation/acetylation) in chromatin & determine at which locations/functional
elements proteins such as RNA polymerase, transcription factors etc. bind to the DNA.
• Example of a ChIP target (see figure): histone modifications → locations with high histone
acetylation (e.g. H3Ac) indicate the location of active regulatory elements, like enhancers or b1, in
the DNA of the chromatin environment, because high histone acetylation indicates open chromatin =
more transcription.
o Method: antibody that targets the histone modification H3Ac = ChIP target in chromatin
o Orange = high antibody binding = high histone modification.
o White = no/low antibody binding = no/low histone modification.
What is the general principle of the ChIP technique? (see figure) → Aim: immunoprecipitation of
chromatin to study protein-protein/DNA associations in a chromatin environment → incubate the
chromatin with specific antibodies against the protein of interest = ChIP target → incubate with
capture beads that bind the antibody → isolate the beads to obtain the antibody-bound chromatin and
wash away unbound chromatin → isolate the DNA from the chromatin (remove antibodies & beads) →
analyze the DNA by high-throughput sequencing to precisely determine the locations = sequences of
interest → these locations indicate important protein-protein/DNA associations.
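After sequencing, the ChIP signal in a region is usually compared with a control. A minimal sketch of a library-size-normalized fold enrichment, assuming an “input” control (fragmented chromatin that was not immunoprecipitated) was sequenced alongside the ChIP sample; the input control, the pseudocount and the function name are assumptions not covered in these notes. Dedicated peak callers (e.g. MACS2) use more elaborate statistics.

```python
def fold_enrichment(chip_reads: int, input_reads: int,
                    chip_total: int, input_total: int,
                    pseudocount: float = 1.0) -> float:
    """Library-size-normalized ChIP/input read ratio for one genomic region.

    The pseudocount avoids division by zero when a region has no input reads.
    """
    chip_rate = (chip_reads + pseudocount) / chip_total
    input_rate = (input_reads + pseudocount) / input_total
    return chip_rate / input_rate
```

A region with 99 ChIP reads and 9 input reads, from two libraries of 10 million reads each, is about 10-fold enriched; normalizing by library size keeps this comparison fair when the ChIP library is sequenced more deeply than the input.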
Different ChIP protocols
• Two different types of ChIP: X-ChIP and N-ChIP (see figure)
• X-ChIP → based on formaldehyde crosslinking of chromatin fragments → X-ChIP =
crosslinking ChIP → why crosslinking? → the ChIP target proteins that bind to the DNA stay
attached more firmly because of the crosslinking, which benefits the ChIP analysis = a big
advantage over N-ChIP!
• N-ChIP → no crosslinking of chromatin → N-ChIP = native ChIP → why no crosslinking? → pro:
antibodies can bind better because their binding is not hampered by crosslinking.
• Both X & N-ChIP require small chromatin fragments (see figure) → these are obtained through
distinct methods;
o X-ChIP: small chromatin fragments by sonication (see figure) → chromatin is
crosslinked and afterwards sonicated with a sonication device to obtain small
chromatin fragments. The longer you sonicate, the smaller the chromatin fragments.
▪ Current sonication devices can handle several samples at once (see figure).
o N-ChIP: small chromatin fragments by micrococcal nuclease (see figure) →
micrococcal nuclease (MNase) digests the linker DNA between nucleosomes to obtain
small fragments (mono- and oligonucleosomes).
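Because MNase cuts mainly in linker DNA, digested chromatin gives a “ladder” of fragment sizes. Using the numbers from the nucleosome section (~147 bp core DNA, ~50 bp linker), the expected sizes can be estimated as below; the assumption that each oligonucleosome fragment keeps its internal linkers but loses its outer ones is a simplification, and `mnase_ladder` is a hypothetical helper.

```python
def mnase_ladder(max_nucleosomes: int, core_bp: int = 147, linker_bp: int = 50) -> list[int]:
    """Approximate DNA fragment sizes (bp) for mono-, di-, tri-... nucleosomes.

    An n-nucleosome fragment is assumed to contain n core particles plus the
    n - 1 linkers between them (the outer linkers are trimmed by MNase).
    """
    return [n * core_bp + (n - 1) * linker_bp for n in range(1, max_nucleosomes + 1)]
```

`mnase_ladder(3)` gives roughly 147, 344 and 541 bp, i.e. steps of about 200 bp, which matches the ~200 bp of DNA per nucleosome described earlier.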
Overview of pros and cons: X-ChIP vs. N-ChIP (see figure)
• Big pro X-ChIP: the protein of interest/epitope of the antibody stays more firmly attached to the
DNA because of the crosslinking, which benefits the ChIP analysis = a big advantage over
N-ChIP!
• Big con X-ChIP: overcrosslinking may disrupt binding of antibody.
• Big pro N-ChIP: antibody binding is not disrupted by crosslinking. Another big advantage
of N-ChIP is efficient precipitation: DNA and protein can both be analyzed (see
figure) → antibody against H4K8ac = ChIP target → the antibody precipitates entire nucleosomes
(containing H2A, H2B, H3 & H4).
• Big con N-ChIP: the protein of interest/epitope of the antibody stays less firmly attached to the
DNA, so the ChIP analysis is less accurate/efficient.