Since the discovery that inherited genes contribute crucially to the development and function of our
body, many years of medical and biological research have been dedicated to the functions of single
genes, their protein products and the ways in which all those proteins work together. In the past
decade, large initiatives of the scientific community have culminated in an essentially complete
description of all our genetic material, together called the genome. After this landmark, more and
more genomes have been analyzed. These include genomes of many other species, allowing us to
recognize DNA sequences that are highly conserved through evolution. In addition, the genomes of
many different human individuals can be compared to get an idea of the genetic variation that exists
in human populations.
With free information sharing being the standard in the scientific community, there is a wealth of
valuable data available to everyone with a computer and internet connection. So, how can you use
all this information to aid medical diagnostics and to solve the scientific problems that we are facing
today?
The term genomics refers to the discipline that studies an organism’s entire genome, as opposed to
studies that focus on one or several genes. In this course, you will learn fundamental concepts
relevant for such studies and be introduced to powerful techniques that can be applied to investigate
all genes simultaneously (i.e. on a genome-wide scale). The availability of such techniques is of
immense value for the biomedical sciences, for example in clinical genetics, where they can
rapidly detect genetic abnormalities in patients; their use is not limited to humans. The steeply increasing
efficiency of mutation discovery prompts a new need for bioinformatics tools to handle all the
available data and to distinguish causative mutations from neutral variations. At the same time,
systematic approaches to restore genetic defects are also progressing. In the coming years,
genome-wide approaches can be expected to yield many medical advancements, collectively
referred to as medical genomics.
CHEMISTRY OF THE GENOME
Bases
A deoxyribonucleic acid (DNA) chain contains four types of deoxynucleotides which differ in their
chemical base: either adenine (A), guanine (G), cytosine (C) or thymine (T). Adenine and guanine
each have two ring-like structures and together form the group of purines (Fig. 1.1, left). Cytosine
and thymine belong to the pyrimidines (Fig. 1.1, right). In RNA, the base thymine is replaced by uracil
(U), also a pyrimidine. The bases are attached to a sugar molecule that contains five carbon atoms, four of which
form a ring together with one oxygen atom (see Fig. 1.2). The carbon
atoms are numbered C1 to C5, with the base group attached to C1 and a phosphate group attached
to C5. The carbon atoms C3 and C5 play a key role in DNA polymerization; the terms 5’ and 3’ refer to
this numbering system. A base, sugar, and phosphate together form a nucleotide.
Nucleotides
A base attached to a sugar, without any phosphate group, is termed a nucleoside. When a nucleoside
binds to one or more phosphate groups, the resultant chemical is termed a
nucleotide. Depending on the number of phosphate groups, such a nucleotide is called a nucleoside
mono-, di- or triphosphate. The respective abbreviations are: NMP, NDP and NTP. In the same
manner, the triphosphate deoxyribonucleotides that are used as a substrate by DNA polymerase
enzymes are commonly abbreviated as dNTPs. Since there are four possible bases in DNA, dNTPs can
be specified to be dATP, dTTP, dCTP or dGTP.
Nucleotide polymerization
The nucleotides are linked together by the phosphate groups to form a DNA polymer, also called a
DNA strand. The order of nucleotides (i.e. bases) in the DNA polymer is referred to as the DNA
sequence. It is that sequence that defines the amino acid sequence of the gene product. The DNA
polymer has a sense of direction (Fig. 1.3) because one end is different from the other. The so-called 5’
end contains a free phosphate group (-PO4) on the C5 carbon atom. By contrast, the 3’ (‘three prime‘)
end contains a free hydroxyl (-OH) group. The synthesis of DNA polymers from nucleotides always
takes place in the same direction: from 5’ to 3’.
The double helix
The DNA in a mammalian chromosome (or in a bacterial genome) is formed by two DNA polymers
that run in opposite orientations. This is called double-stranded DNA. It has a characteristic
spiral-like structure known as the double helix (Fig. 1.4). The structure of the double helix resembles
a ladder in which the base pairs meet in the center of the structure and the sugar backbone and the
phosphate molecules are exposed to the outside. The strands are not covalently bound to each
other; instead, they are held together by hydrogen bonds between the bases in opposite strands. These base-pairing
interactions can only occur in specific combinations: adenine (A) pairs with thymine (T), whereas
guanine (G) pairs with cytosine (C). Thus, A and T are called complementary, and so are G and C. The two
strands of a double-stranded DNA molecule (which, you could argue, are actually two molecules)
behave like a zipper that can be opened for gene transcription or DNA replication. During these
processes, one DNA strand can function as a template for the production of copies that carry the
same genetic information. The replication of DNA is very reliable and provides a basis for the
inheritance of sequence information to new generations of cells and organisms.
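The pairing rules described above can be illustrated with a short Python sketch. It is only an illustration and not part of the course material: the function name reverse_complement and the example sequence are assumptions chosen for this example. Given one strand written 5’ to 3’, it derives the opposite strand, also written 5’ to 3’. Because the two strands run in opposite orientations, the sequence has to be reversed as well as complemented.

```python
# Base-pairing rules from the text: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    """Return the opposite DNA strand, written 5' to 3'.

    Hypothetical helper for illustration; expects only A, C, G and T.
    """
    # Reverse first (the strands are antiparallel), then pair each base.
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


top_strand = "ATGGCA"                      # made-up example, 5' to 3'
bottom_strand = reverse_complement(top_strand)
print(top_strand, bottom_strand)           # ATGGCA TGCCAT
```

Applying the function twice returns the original sequence, which mirrors the statement that either strand can serve as a template for copying the other.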
RNA
The structure of RNA (ribonucleic acid) closely resembles that of DNA. It is a polymer that contains
nucleotides, and again, synthesis takes place in the 5’ to 3’ direction. The ribose sugar ring is slightly
different, however, because it contains a hydroxyl (OH) group at the second carbon atom where DNA
has a hydrogen (H) atom (Fig. 1.5). Thymidine nucleotides are virtually absent from RNA and are
replaced by uridine, which contains uracil as the base group. As a final key difference, RNA is typically a
single-stranded molecule. This is not always the case: RNA strands can bind to
DNA or RNA strands with complementary sequences, and in some molecules (e.g. in microRNAs or
in tRNAs) this base pairing is functionally important.
FROM DNA TO RNA TO PROTEINS
The “Central Dogma” is a long-standing theory which states that genetic information is transferred
from DNA to RNA and finally to proteins (Fig. 1.6).
DNA is considered the blueprint of the cell. It is passed from generation to generation. DNA cannot
be converted directly into protein, even though it contains the code for all proteins. First, the code must be
converted into the form of an RNA transcript. This process (the synthesis of RNA from a DNA
template) is called gene transcription. This typically produces messenger RNA (mRNA), which is
recognized by ribosomes that are the principal part of the protein synthesis machinery. The coding
sequence in the open reading frame (ORF) within the mRNA transcript is matched at the ribosome
with transfer RNA (tRNA). These tRNAs help to couple the correct amino acids to the growing
polypeptide chain. More details about the protein translation process will follow in a separate
section. The role of tRNAs already illustrates that not all RNA takes the form of messenger RNA. The
ribosomes themselves also contain large amounts of RNA, named ribosomal RNA (rRNA). These three types of
RNA (mRNA, rRNA and tRNA) are the most important forms of RNA in this classical view of the
transcription and translation process.
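As a rough illustration of the transcription step, the following Python sketch builds an mRNA sequence from a DNA template strand. The function name transcribe and the example sequence are assumptions made for this illustration. Real transcription also depends on promoters and termination signals, which are ignored here.

```python
# Pairing between a DNA template base and the RNA base built opposite it.
# RNA uses uracil (U) where DNA would use thymine (T).
RNA_PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_strand):
    """Return the mRNA (5' to 3') for a DNA template strand given 5' to 3'.

    Hypothetical helper for illustration; expects only A, C, G and T.
    """
    # The mRNA is antiparallel to its template, so the template is read
    # in reverse while each base is paired with its RNA partner.
    return "".join(RNA_PAIRING[base] for base in reversed(template_strand.upper()))


template = "TGCCAT"        # made-up template strand, 5' to 3'
mrna = transcribe(template)
print(mrna)                # AUGGCA
```

The result has the same sequence as the opposite (coding) DNA strand, ATGGCA, with every T replaced by U.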
Nevertheless, additional types do exist (e.g. miRNA, ncRNA, snRNA and others). A detailed
description of these types of RNA is beyond the scope of this chapter. For now, please note that
these RNA types also serve important functions, and the discovery of these functions is an expanding
field of research in genomics.
Finally, proteins make up a major part of the structural elements of all cells. They carry out
many functions in the synthesis, breakdown or transport of nutrients, are needed for DNA and RNA
synthesis, transmit signals between intracellular and extracellular compartments, protect against cell
damage or carry out damage repair, as well as countless other functions critical to cellular behavior.
GENES
Structural elements of genes
To express a gene as mRNA and finally as protein, a cell needs to identify the start and end
points of the gene. This occurs at two levels: during transcription and during translation. A functional
gene contains at least the following elements:
1) A promoter (or regulatory sequence) that facilitates the production of messenger RNA. Since
messenger RNA is synthesized in the 5’ to 3’ direction, the promoter is always located at
the 5’ end of the gene. In other words, the promoter is said to be upstream of the transcribed
region. In cartoons, it is a widely followed convention to always depict the 5’ end on the left, and the
3’ end on the right side.
2) A transcription termination signal at the 3’ end of the gene. This signal defines the end of the
produced mRNA molecule.
3) A start codon, usually ATG in the coding strand of the DNA. In the mRNA it appears as AUG. At this
point, the ribosome starts to synthesize a polypeptide chain beginning with a methionine. In addition
to the start codon itself, the neighboring nucleotide sequence (i.e. the context of the start codon)
also contributes to effective translation initiation. The optimal sequence differs between species. For
vertebrates, this optimal sequence is called the Kozak sequence.
4) The open reading frame (ORF) is a series of codons (nucleotide triplets). Each codon encodes an
amino acid. The corresponding amino acid is added to the growing polypeptide chain that will
ultimately form the protein encoded by the gene. The open reading frame continues until a stop
codon is encountered.
5) A stop codon, which causes the ribosome to detach from the mRNA, thereby terminating translation.
Since a stop codon does not itself encode an amino acid, the end of the
protein is formed by the amino acid encoded by the previous codon.
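The roles of the start codon, the open reading frame and the stop codon can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it uses the standard genetic code, works on the coding DNA strand (so the start codon is written ATG), takes the first ATG it finds and ignores the sequence context of the start codon. The function name translate_first_orf and the example sequence are made up for this example.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
# "*" marks the three stop codons (TAA, TAG, TGA).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_first_orf(coding_dna):
    """Translate from the first ATG up to (not including) the first stop codon.

    Hypothetical helper for illustration; returns one-letter amino acid
    codes, N-terminus first. Returns "" if no ATG is present.
    """
    seq = coding_dna.upper()
    start = seq.find("ATG")
    if start == -1:
        return ""
    protein = []
    # Read the sequence codon by codon (triplets) from the start codon.
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":      # stop codon: no amino acid is added
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate_first_orf("CCATGGCATTTTAAGG"))  # MAF
```

In the example, the codons read are ATG (methionine, M), GCA (alanine, A), TTT (phenylalanine, F) and TAA (stop), so the last amino acid of the product is the one encoded by the codon before the stop codon.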
The first amino acid of the resulting protein contains a free amino-group (NH2) and is therefore called
the amino-terminus (N-terminus). Likewise, the last amino acid of the protein contains a free
carboxyl group. This end of the protein is termed the carboxy-terminus (or C-terminus). Consistent
with the convention for the 5’ and 3’ ends of genes, proteins are usually displayed with their
N-terminus on the left and their C-terminus on the right.
Prokaryotic versus eukaryotic gene structure
The gene organization in prokaryotes differs from that in eukaryotes. In contrast to eukaryotic mRNA,
prokaryotic mRNA is essentially mature without any further processing. Many prokaryotic genes are organized
in operons, groups of genes that are transcribed together into a so-called polycistronic mRNA. An
example is the tryptophan (trp) operon in the bacterium Escherichia coli that contains five genes
designated trpA, -B, -C, -D and -E. Each gene in the operon has its own ribosome binding site, so
translation of each gene starts at its own start codon in the mRNA. At the 5’ end of the operon, a single
control region makes sure that transcription of the polycistronic mRNA is repressed when tryptophan
synthesis is not required, resulting in a well-coordinated expression of all involved enzymes.
Eukaryotic genomes usually do not encode polycistronic mRNAs. In contrast to prokaryotic genes,
eukaryotic genes have exon-intron structures. After the initial production of an mRNA transcript, which is called
the immature transcript, non-coding introns are removed before the mRNA is translated by the
ribosomes. This removal process is termed splicing. Splicing removes the introns and joins the
surrounding exons together. Eukaryotic mRNAs carry a poly(A) signal (AAUAAA)
near the 3’ end. This signal causes the addition of a stretch of adenosines (the poly(A) tail) that stimulates transcription
termination and stabilizes the mRNA. The mRNA transcript also undergoes a modification at the first
nucleotide, called a 5’-cap. It is composed of a 7-methylguanosine (m7G) that is linked to the
first nucleotide of the transcript by a 5’-to-5’ triphosphate link, which does not exist in other parts of
DNA or RNA polymers.
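The splicing step can be pictured with a minimal Python sketch. It assumes that the exon positions are already known and given as coordinates, which is a simplification: how the cell recognizes intron boundaries is not modeled here. The function name splice, the coordinates and the toy sequence are all assumptions made for this illustration.

```python
def splice(immature_transcript, exons):
    """Join the exon segments in order, leaving out the introns between them.

    Hypothetical helper for illustration. `exons` is a list of
    (start, end) positions, 0-based with the end position excluded.
    """
    return "".join(immature_transcript[start:end] for start, end in exons)


# Toy transcript: two exons (upper case) separated by one intron (lower case).
immature = "AUGGCC" + "guaaguuuucag" + "UUUUAA"
exon_positions = [(0, 6), (18, 24)]

mature = splice(immature, exon_positions)
print(mature)  # AUGGCCUUUUAA
```

The mature transcript contains only the exon sequences, joined in their original order.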
Transcription
The conversion of a DNA sequence to an RNA copy is performed by RNA polymerase. A so-called
promoter sequence in the DNA directs the RNA polymerase to a start site in the coding strand of the
gene. This signal is unidirectional: only a single DNA strand is transcribed in a single direction. In