Methylation analysis by bisulfite sequencing
Practical Week 37; MOBC, University of Amsterdam
Introduction
Bisulfite sequencing is used to investigate the genome-wide methylation status of cytosines at base-pair
resolution. Cytosine methylation is known to negatively influence the expression levels of genes
and the activity of transposable elements (TEs). In mammals, for instance, DNA methylation of regulatory
elements is generally negatively correlated with the binding of transcription factors. Hence, for proper
expression of genes, regulatory elements should generally be depleted of DNA methylation. TEs, in contrast,
are sequences that need to be methylated in a genome. Methylation of TEs is important to keep them
inactive; active TEs can, among other things, cause mutations in the genome.
Aberrant methylation is thus a potential cause of abnormal gene expression and hence an important topic
in cell and molecular biology. It is for example increasingly investigated in cancer research. Although DNA
methylation appears to be a widespread epigenetic regulatory mechanism, genomes are methylated in
different ways in diverse organisms. In animals, DNA methylation occurs mostly symmetrically (both
strands) at the cytosines of a CG dinucleotide. DNA methylation in plant genomes can occur symmetrically
at cytosines in both CG and CHG contexts, and also asymmetrically in a CHH context (H = A, T, or C), with
the latter mediated and maintained by small RNAs. In the model plant Arabidopsis thaliana, levels of
cytosine methylation in the CG, CHG, and CHH contexts are about 24%, 6.7%, and 1.7%, respectively. DNA
methylation in the CHH context is also observed in specific animal cells, such as brain cells.
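Because methylation-calling tools report results per sequence context, it helps to see how the context of a cytosine follows directly from the two bases after it. The function below is a minimal sketch, not taken from any tool: it assumes an uppercase A/C/G/T sequence, looks only at the strand as written (cytosines on the other strand would be classified on the reverse complement), and the name `cytosine_context` is hypothetical.

```python
def cytosine_context(seq, i):
    """Return 'CG', 'CHG' or 'CHH' for the C at 0-based index i of seq,
    or None when the sequence ends too early to decide (H = A, C or T)."""
    if seq[i] != "C":
        raise ValueError("position is not a cytosine")
    if i + 1 >= len(seq):
        return None
    if seq[i + 1] == "G":          # C followed directly by G
        return "CG"
    if i + 2 >= len(seq):
        return None
    if seq[i + 2] == "G":          # C, then a non-G base, then G
        return "CHG"
    return "CHH"                   # C followed by two non-G bases


s = "CGACAGCTT"
print([(i, cytosine_context(s, i)) for i, b in enumerate(s) if b == "C"])
# [(0, 'CG'), (3, 'CHG'), (6, 'CHH')]
```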
Bisulfite sequencing is a powerful method to determine which cytosines in a genome are methylated.
In this method, DNA is treated with bisulfite, which converts unmethylated cytosines into uracil (U);
uracil is read as thymine (T) after a PCR amplification step (see Figure 1).
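The net effect of bisulfite treatment plus PCR on a single strand can be simulated in a few lines. This is a minimal sketch assuming complete conversion, an uppercase A/C/G/T sequence, and methylated cytosines given as 0-based positions; `bisulfite_convert` is a hypothetical name.

```python
def bisulfite_convert(seq, methylated):
    """Simulate bisulfite treatment followed by PCR on one strand:
    every unmethylated C is read as T, methylated C (positions in
    `methylated`) stays C. Assumes 100% conversion efficiency."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


print(bisulfite_convert("ACGTTCGACCA", methylated={1}))  # ACGTTTGATTA
print(bisulfite_convert("ACGTTCGACCA", methylated=set()))  # ATGTTTGATTA
```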
Figure 1. Bisulfite conversion of genomic DNA and subsequent PCR amplification gives rise to two types of PCR
products. Methylated cytosine residues are resistant to bisulfite conversion and can be used as a read-out of the DNA
methylation level. mC, 5-methylcytosine; OT, original top strand; CTOT, strand complementary to the original top
strand; OB, original bottom strand; and CTOB, strand complementary to the original bottom strand.
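The four products named in Figure 1 can be derived from a double-stranded sequence as a sketch. Assumptions (not from the figure): sequences are uppercase A/C/G/T written 5' to 3', conversion is complete, `meth_top` holds 0-based top-strand positions of methylated C, and `meth_bottom` holds the top-strand positions of the G whose paired bottom-strand C is methylated. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement, so the result is again written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def convert(seq, methylated):
    """Unmethylated C becomes T; C at positions in `methylated` is kept."""
    return "".join("T" if b == "C" and i not in methylated else b
                   for i, b in enumerate(seq))


def bisulfite_strands(top, meth_top, meth_bottom):
    n = len(top)
    ot = convert(top, meth_top)                 # original top strand
    bottom = revcomp(top)                       # bottom strand, 5' to 3'
    # top-strand index i pairs with bottom-strand index n - 1 - i
    ob = convert(bottom, {n - 1 - i for i in meth_bottom})
    return {"OT": ot, "CTOT": revcomp(ot),      # CTOT/CTOB arise during PCR
            "OB": ob, "CTOB": revcomp(ob)}


# A symmetrically methylated CG (top C at 1, paired bottom C opposite G at 2)
print(bisulfite_strands("ACGTCA", meth_top={1}, meth_bottom={2}))
# {'OT': 'ACGTTA', 'CTOT': 'TAACGT', 'OB': 'TGACGT', 'CTOB': 'ACGTCA'}
```

Note that the two original strands differ after conversion because each strand carries its own cytosines, which is why OT and OB reads must be handled separately during alignment.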
Note that the bisulfite reaction is not always 100% efficient. In that case, unconverted cytosines appear to be
methylated, although in reality they are not. Such unconverted cytosines are randomly spread throughout
the sequence of interest.
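One way to quantify incomplete conversion is to count how often cytosines that are known to be unmethylated still read as C. The sketch below assumes such a reference is available (for example, spiked-in unmethylated control DNA; the practical does not specify one) and that reads are already aligned without gaps to the top strand of `ref`, starting at position 0. The function name is hypothetical.

```python
def conversion_rate(ref, reads):
    """Fraction of reference cytosines read as T across all reads.
    Assumes `ref` is fully unmethylated, so any C left in a read
    reflects failed conversion rather than real methylation."""
    converted = unconverted = 0
    for read in reads:
        for ref_base, read_base in zip(ref, read):
            if ref_base != "C":
                continue
            if read_base == "T":
                converted += 1
            elif read_base == "C":
                unconverted += 1
    total = converted + unconverted
    return converted / total if total else float("nan")


print(conversion_rate("ACCTGCA", ["ATTTGTA", "ATCTGTA"]))  # 5 of 6 C converted
```

Because failed conversions are scattered at random, a cytosine that remains C in only a single read is weaker evidence for methylation than one that remains C in most reads.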
Figure 2. After bisulfite conversion, PCR amplification and sequencing, the resulting reads show which cytosines are
converted or unconverted, hence unmethylated or methylated, respectively.
Figure 2 illustrates the effect of bisulfite treatment of an unmethylated piece of DNA. At the top you see the DNA
sequence before conversion, with the cytosines indicated. At the bottom you see the DNA sequence after conversion:
all cytosines have been changed into thymines. To align such a converted sequence to the reference sequence and
determine the level of cytosine methylation, specific alignment methods are needed, because a T in a read can
correspond either to a T or to an unmethylated C in the reference. Several such methods exist, including methods
that only consider cytosines in a CG context and methods that consider the methylation of all cytosines.
You will use examples of both types of methods in this practical.
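The core idea behind bisulfite-aware alignment can be sketched as follows: convert every C to T in both the read and the reference so that a converted C no longer counts as a mismatch, locate the read in this reduced alphabet, and only then compare the original read with the reference to call methylation. This resembles the in-silico conversion used by aligners such as Bismark, but it is a toy: it assumes reads from the original top strand only, exact ungapped matches, and it keeps only the first hit. The function names are hypothetical.

```python
def c_to_t(seq):
    """In-silico conversion: treat every C as T."""
    return seq.replace("C", "T")


def align_and_call(ref, read):
    """Locate an original-top-strand read in C->T space, then call each
    covered reference C as methylated (read shows C) or unmethylated (T).
    Returns (offset, calls); offset is None if the read does not map."""
    offset = c_to_t(ref).find(c_to_t(read))   # first exact match only
    if offset < 0:
        return None, []
    calls = []
    for j, base in enumerate(read):
        pos = offset + j
        if ref[pos] == "C":
            # at a reference C the read can only show C or T after matching
            calls.append((pos, "methylated" if base == "C" else "unmethylated"))
    return offset, calls


print(align_and_call("GGACGTTCAGCGA", "ACGTTTAG"))
# (2, [(3, 'methylated'), (7, 'unmethylated')])
```

Restricting the calls to positions where `ref[pos:pos + 2] == "CG"` would turn this into a CG-only caller, the first of the two method types mentioned above.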
Practical
In this practical, we will look at bisulfite sequencing data derived from a regulatory region of the b1
gene in two different maize lines: B-I and B-prime. b1 is a gene coding for a transcription factor in the
anthocyanin pigmentation pathway.
In the practical, using web-based methylation calling, we will compare the methylation levels of the
regulatory regions (blue rectangles) in the B-I and B-prime lines (data from R. Bader and M. Stam, UvA). In
one of the lines, the regulatory region is not methylated and expression of the b1 gene is enhanced;
as a result, the plant is darkly pigmented. In the other line, the regulatory region is methylated and
the b1 gene is expressed at a low level; the plant is lightly pigmented. From the methylation status, we will
determine which line is purple and which is green. But first, we will use different methods to analyse the
sequence reads obtained after bisulfite treatment, to learn which methods are suited to such
analysis and why.
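Comparing lines such as B-I and B-prime usually comes down to a methylation level per cytosine: the fraction of reads covering that position in which it is called methylated. A minimal sketch, assuming per-read calls are already available as lists of (position, state) pairs with state "methylated" or "unmethylated"; this input format and the function name are hypothetical, not the output of any specific web tool.

```python
from collections import defaultdict


def methylation_levels(calls_per_read):
    """Return {position: fraction of covering reads called methylated}."""
    methylated = defaultdict(int)
    covered = defaultdict(int)
    for calls in calls_per_read:
        for pos, state in calls:
            covered[pos] += 1
            if state == "methylated":
                methylated[pos] += 1
    return {pos: methylated[pos] / covered[pos] for pos in sorted(covered)}


reads = [
    [(3, "methylated"), (7, "unmethylated")],
    [(3, "methylated"), (7, "unmethylated")],
    [(3, "unmethylated"), (7, "unmethylated")],
    [(3, "methylated")],
]
print(methylation_levels(reads))  # {3: 0.75, 7: 0.0}
```

Dividing by the number of covering reads rather than the total number of reads keeps positions with lower coverage comparable.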
Note that, depending on the tool or paper, CG dinucleotides may be written as CG, CpG or CGN.
All these notations refer to the same thing: a cytosine followed by a guanine (in CGN, followed in turn by any
nucleotide, N). The "p" in CpG denotes the phosphate linking the two nucleotides.
First, download the file "Files for PART II.zip" from Canvas. Unzip it using, for instance, 7-Zip, WinZip or
WinRAR. The files needed for the questions below are in the respective folders.