Molecular Basis of Bacterial Infections Evelien Floor
Whole genome sequencing and bioinformatics
With whole genome sequencing (WGS) the complete genome of an organism is sequenced. A single cell
does not contain enough DNA for sequencing, so for bacteria the isolate is first grown into colonies
and one of those colonies is used for sequencing. WGS is not the same as RNAseq, TnSeq or metagenome
sequencing; metagenome sequencing sequences all the DNA present in an environmental sample.
Whole genome sequencing
The WGS process starts with a bacterial culture. The cells are lysed, which releases the DNA. The
DNA is isolated and sheared into fragments of a known length. The DNA is then amplified by PCR,
generating a DNA library, which is subsequently sequenced. The size of the reads that the sequencer
outputs depends on the read length of the sequencing platform.
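The steps from sheared DNA to reads can be illustrated with a small simulation. This is a minimal sketch under idealised assumptions: every fragment has exactly the chosen length, adapters, PCR and sequencing errors are ignored, and each read is simply the first `read_length` bases of a fragment. The function names `shear` and `sequence_fragments` and the toy genome are hypothetical.

```python
import random


def shear(genome: str, fragment_length: int, n_fragments: int, seed: int = 0) -> list[str]:
    """Simulate shearing many copies of a genome into fragments of one known length.

    Each fragment starts at a random position; in this idealised sketch every
    fragment has exactly `fragment_length` bases.
    """
    if not 0 < fragment_length <= len(genome):
        raise ValueError("fragment_length must be between 1 and the genome length")
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    last_start = len(genome) - fragment_length
    starts = [rng.randint(0, last_start) for _ in range(n_fragments)]
    return [genome[start:start + fragment_length] for start in starts]


def sequence_fragments(fragments: list[str], read_length: int) -> list[str]:
    """Read the first `read_length` bases of each fragment (single-end reads)."""
    return [fragment[:read_length] for fragment in fragments]


if __name__ == "__main__":
    toy_genome = "ATGCGTACGTTAGCCGATCGATCGGCTAACGTACGCAT"  # hypothetical sequence
    fragments = shear(toy_genome, fragment_length=12, n_fragments=5)
    for read in sequence_fragments(fragments, read_length=8):
        print(read)
```

Because the fragments come from random positions on many genome copies, they overlap; these overlaps are what later allow the original sequence to be reconstructed.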
Before PCR, adapters are ligated to the DNA fragments. The adapters contain binding sites for the
sequencing primers and have the same sequence on every fragment. The fragments are then separated so
that each fragment is read individually; because this happens in parallel, hundreds of millions of
sequencing reads can be generated. Each separated fragment is amplified into a polony (PCR colony), a
cluster of identical copies, to enhance the signal.
WGS determines the genotype of an organism, but often it is the phenotype that we want to know more
about. The phenotype is not determined by the genes alone; other factors contribute as well, such as
gene dosage, growth rate, epigenetic modifications, gene interactions and the environment.
What can be predicted with WGS:
- The species of the bacterial isolate
- Antibiotic resistance (in some species)
- Whether two isolates are part of the same outbreak, based on how similar their genome sequences are
- Which proteins are secreted
  - Further validation is needed, for instance with RNAseq
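One common way to judge whether two isolates belong to the same outbreak is to count the single-nucleotide differences (SNPs) between their genomes. The sketch below assumes the two sequences are already aligned to equal length; the function names and the 10-SNP cut-off are hypothetical placeholders, since suitable thresholds differ between species and studies.

```python
def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned genome sequences differ.

    Gaps ('-') and unknown bases ('N') are skipped, because they do not show a
    real difference between the isolates.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ignored = {"-", "N"}
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignored and b not in ignored
    )


def likely_same_outbreak(seq_a: str, seq_b: str, max_snps: int = 10) -> bool:
    """Hypothetical rule: isolates at most `max_snps` SNPs apart are treated as related.

    The default of 10 is an illustrative placeholder, not a validated cut-off.
    """
    return snp_distance(seq_a, seq_b) <= max_snps


if __name__ == "__main__":
    isolate_1 = "ATGCGTACGTTAGC"  # hypothetical aligned sequences
    isolate_2 = "ATGCGTACCTTAGC"  # differs from isolate_1 at one position
    print(snp_distance(isolate_1, isolate_2))  # 1
    print(likely_same_outbreak(isolate_1, isolate_2))  # True
```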
Bioinformatics
Bioinformatics is used to go from raw data to answers to research questions. First, the sequencing
reads are obtained in FASTQ format. Next, a quality check is performed and low-quality bases are
removed. This step is called trimming, and the remaining reads are called trimmed reads.
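Reading FASTQ and trimming low-quality bases can be sketched as follows. FASTQ stores each read as four lines: a header starting with `@`, the bases, a `+` separator, and one quality character per base. This sketch assumes Phred+33 quality encoding, a hypothetical quality cut-off of Q20 (an estimated error rate of 1 in 100) and a hypothetical minimum read length of 5 bases. It only trims the ends of each read, whereas dedicated trimming tools use more elaborate rules such as sliding-window averages.

```python
from typing import NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str  # one character per base, assumed Phred+33 encoded


def read_fastq(text: str) -> list[FastqRecord]:
    """Parse FASTQ text in which every record spans exactly four lines."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    records = []
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        records.append(FastqRecord(header[1:], sequence, quality))
    return records


def phred_scores(quality: str) -> list[int]:
    """Decode Phred+33 characters ('!' = 0, 'I' = 40) into quality scores."""
    return [ord(char) - 33 for char in quality]


def quality_trim(record: FastqRecord, min_quality: int = 20) -> FastqRecord:
    """Remove bases scoring below `min_quality` from both ends of a read."""
    scores = phred_scores(record.quality)
    start, end = 0, len(scores)
    while start < end and scores[start] < min_quality:
        start += 1
    while end > start and scores[end - 1] < min_quality:
        end -= 1
    return FastqRecord(record.name, record.sequence[start:end], record.quality[start:end])


def trim_reads(text: str, min_quality: int = 20, min_length: int = 5) -> list[FastqRecord]:
    """Trim every read and drop reads that become shorter than `min_length`."""
    trimmed = (quality_trim(record, min_quality) for record in read_fastq(text))
    return [record for record in trimmed if len(record.sequence) >= min_length]


if __name__ == "__main__":
    example = "@read1\nACGTACGTAC\n+\n##IIIIII#!\n"  # hypothetical read
    for record in trim_reads(example):
        print(record.name, record.sequence)  # read1 GTACGT
```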
After trimming, sequence assembly takes place. Sequence assembly means aligning and merging fragments
of a longer DNA sequence in order to reconstruct the original sequence. This is done with a de Bruijn
graph.
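Assembly with a de Bruijn graph can be sketched in Python. This is a minimal sketch under simplifying assumptions: reads are error-free, every distinct k-mer (substring of length k) becomes one edge from its (k-1)-base prefix to its (k-1)-base suffix, and the genome has no repeats longer than k-1 bases, so the graph has a single Eulerian path. The Eulerian path is found with Hierholzer's algorithm. The function names and toy reads are hypothetical; real assemblers also have to deal with sequencing errors, uneven coverage and repeats.

```python
from collections import defaultdict


def kmers(sequence: str, k: int) -> list[str]:
    """All substrings of length k, in order of their position."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, list[str]]:
    """Each distinct k-mer becomes an edge from its (k-1)-prefix to its (k-1)-suffix."""
    distinct = sorted({kmer for read in reads for kmer in kmers(read, k)})
    graph: dict[str, list[str]] = defaultdict(list)
    for kmer in distinct:
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def eulerian_path(graph: dict[str, list[str]]) -> list[str]:
    """Hierholzer's algorithm: walk a path that uses every edge (k-mer) exactly once."""
    if not graph:
        return []
    in_degree: dict[str, int] = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            in_degree[target] += 1
    # Start where a node has one more outgoing than incoming edge, if such a node exists.
    start = next(
        (node for node, targets in graph.items() if len(targets) - in_degree[node] == 1),
        next(iter(graph)),
    )
    remaining = {node: list(targets) for node, targets in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: this node is finished
    return path[::-1]


def assemble(reads: list[str], k: int) -> str:
    """Spell the sequence: first node in full, then the last base of every next node."""
    path = eulerian_path(de_bruijn_graph(reads, k))
    if not path:
        return ""
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    reads = ["ATGGCG", "GGCGTG", "CGTGCA"]  # hypothetical overlapping reads
    print(assemble(reads, k=4))  # ATGGCGTGCA
```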
k-mers
A de Bruijn graph is built from k-mers. A k-mer is a substring of length k, taken here from the
reads. All of these k-mers are used to assemble the genome. A path that visits every k-mer exactly
once is called an Eulerian path. A node in the Eulerian path represents a