Proteome
Comparative genomics
Comparative genomics is the systematic comparison of the genomic sequences of different species.
Conserved sequences tend to be functionally important, while rapidly evolving sequences can tell us
why species differ. Comparative genomics is important in human genetic research because it allows the
identification of new genes and other genomic elements. It also helps us to understand gene function
and the effects (pathogenicity) of mutations: the more strongly an amino acid position is conserved,
the larger the expected effect of a change at that position.
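As a rough illustration of the conservation idea above, the following Python sketch scores each column of a small protein alignment by the fraction of sequences that share the most common residue. The toy alignment and the function name column_conservation are made up for this example; real analyses use many more sequences and more refined scores.

```python
from collections import Counter


def column_conservation(alignment):
    """Per alignment column, return the fraction of sequences that carry
    the most common residue (gaps are not counted as residues)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)  # column consists of gaps only
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


# Hypothetical fragment of one protein aligned across four species
toy_alignment = ["MKVLA", "MKILA", "MRVLS", "MKVLA"]
scores = column_conservation(toy_alignment)
print(scores)  # [1.0, 0.75, 0.75, 1.0, 0.75]
```

Columns 1 and 4 are fully conserved here, so under the reasoning above a change at those positions would be expected to have the largest effect.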
Comparative genomics is most often performed at the protein level. Many different kinds of
genomes are being sequenced:
New genomes of industrially and agriculturally relevant organisms (plant pathogenic fungi)
New genomes of medically relevant organisms (pathogens)
New genomes of evolutionarily interesting organisms
The human genome
Three reasons to compare the proteins in your genome of interest to proteins in other organisms:
1. An alignment reveals important residues (those under negative/purifying selection); the
alignment needs orthologs
2. Find information from “the same” gene (= ortholog) in model organisms, or find the same
gene in a model organism to do experiments on
3. Copy/hypothesize functional information from an experimentally characterized homolog to
the gene of interest
For all three purposes you want the same gene in other species/genomes: not just homologs but
orthologs. A homolog is a gene or sequence that is derived from the same ancestral gene or sequence
as another one. There are two types of homologs. Orthologs are genes in different species that derive
from a single gene in the common ancestor of those species; an orthologous gene arises by speciation.
Paralogs are genes in the same species created through gene duplication.
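The distinction can be sketched in Python on a toy gene tree in which every internal node is labelled with the event that created it. The tree, the gene names and the function names below are hypothetical. The sketch uses the common convention that two genes are orthologs when their last common ancestor is a speciation node and paralogs when it is a duplication node, which is slightly broader than the same-species description above.

```python
def leaves(node):
    """Return the set of gene names below a node."""
    if isinstance(node, str):
        return {node}
    _event, left, right = node
    return leaves(left) | leaves(right)


def relationship(node, gene_a, gene_b):
    """Classify two genes by the event at their last common ancestor."""
    if isinstance(node, str):
        raise ValueError("both genes must be distinct leaves of the tree")
    event, left, right = node
    for child in (left, right):
        below = leaves(child)
        if gene_a in below and gene_b in below:
            # The last common ancestor lies deeper in this subtree
            return relationship(child, gene_a, gene_b)
    here = leaves(node)
    if gene_a not in here or gene_b not in here:
        raise ValueError("both genes must be distinct leaves of the tree")
    return "orthologs" if event == "speciation" else "paralogs"


# Hypothetical gene tree: an ancient duplication produced copies A and B,
# after which human and mouse split (speciation) in both copies.
gene_tree = (
    "duplication",
    ("speciation", "human_A", "mouse_A"),
    ("speciation", "human_B", "mouse_B"),
)

print(relationship(gene_tree, "human_A", "mouse_A"))  # orthologs
print(relationship(gene_tree, "human_A", "human_B"))  # paralogs
```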
Phylogenetic trees
To differentiate orthologs from other homologs we need to look at the relations between genes. We
infer these relations from, and summarize them in, trees. A taxonomic tree consists of a hierarchical
classification: order → family → genus → species. A phylogenetic tree represents the historical
pattern of relationships among organisms.
Bioinformatica & Genoomanalyse, Evelien Floor
A phylogenetic tree can be rooted and can assume a molecular clock. With a uniform clock, all
leaves lie at the same distance from the root (ultrametric tree). With a non-uniform evolutionary
clock, the leaves have different distances to the root (additive tree).
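A minimal Python sketch of this difference, assuming a rooted tree stored as nested lists of (child, branch length) pairs; the representation, the branch lengths and the function names are illustrative choices, not taken from the course material.

```python
def root_to_leaf_distances(node, distance=0.0):
    """Return {leaf name: summed branch length from the root}."""
    if isinstance(node, str):
        return {node: distance}
    distances = {}
    for child, branch_length in node:
        distances.update(root_to_leaf_distances(child, distance + branch_length))
    return distances


def is_ultrametric(tree, tolerance=1e-9):
    """True if every leaf is equally far from the root."""
    values = list(root_to_leaf_distances(tree).values())
    return max(values) - min(values) <= tolerance


# Uniform clock: A, B and C are all 3.0 away from the root
clock_tree = [([("A", 1.0), ("B", 1.0)], 2.0), ("C", 3.0)]
# Non-uniform clock: the lineage leading to B evolved faster
no_clock_tree = [([("A", 0.5), ("B", 2.0)], 2.0), ("C", 3.0)]

print(is_ultrametric(clock_tree))     # True
print(is_ultrametric(no_clock_tree))  # False
```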
Without a molecular clock, a phylogenetic reconstruction method only
infers relations and no direction, so the analysis gives you an unrooted
tree. To go from an unrooted to a rooted tree you introduce a root
somewhere in the tree; removing the root does the reverse. Because the
root can be placed on any branch, one unrooted tree can be turned into
multiple rooted trees.
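To make this concrete, the Python sketch below places a root on every branch of a small unrooted tree and lists the resulting rooted trees. The edge-list representation, the internal node names x and y, and the function name are assumptions made for the example.

```python
def rooted_at_edge(edges, edge):
    """Root an unrooted tree (given as an edge list) on one edge and
    return the rooted tree as nested tuples of leaf names."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, []).append(v)
        neighbours.setdefault(v, []).append(u)

    def subtree(node, parent):
        children = [n for n in neighbours[node] if n != parent]
        if not children:
            return node  # a leaf
        return tuple(subtree(child, node) for child in children)

    u, v = edge
    return (subtree(u, v), subtree(v, u))


# Unrooted tree with four leaves (A-D) and two internal nodes (x, y)
edges = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("D", "y")]
rooted_trees = [rooted_at_edge(edges, edge) for edge in edges]

for tree in rooted_trees:
    print(tree)
print(len(rooted_trees))  # 5
```

An unrooted binary tree with n leaves has 2n - 3 branches, so the four-leaf tree here yields five different rooted trees.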
The first step in making a molecular phylogenetic tree is aligning
the sequences. From the aligned sequences of the different species you
can make a radial unrooted tree. To go from unrooted to rooted you add
another organism that is known to fall outside the group being studied
(an outgroup) and you introduce the root on that branch.
There are two ways to make a molecular phylogenetic tree:
1. Alignment → distances → clustering
2. Alignment → best fitting tree
• Parsimony
• Maximum likelihood
Phylogenetic tree by distance methods
After alignment you start by making a distance matrix based on the
differences between the aligned sequences. To build a phylogenetic tree
from this matrix, the UPGMA algorithm (unweighted pair group method with
arithmetic mean) can be used:
Initialization:
• Fill distance matrix with pairwise distances
• Start with N clusters of 1 element (gene) each
Iteration:
• Merge the clusters Ci and Cj for which dij is minimal
• Place the internal node connecting Ci and Cj at height dij/2
• Delete Ci and Cj; replace them by a new cluster C with group average distances to the remaining clusters
Termination:
• When only two clusters Ci and Cj remain, put the root at dij/2
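The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the distance used (fraction of differing alignment positions), the toy sequences, and the nested-tuple tree format (left, right, node height) are assumptions made for the example.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def upgma(alignment):
    """Build a rooted tree from {name: aligned sequence} with UPGMA.
    Internal nodes are tuples (left, right, height)."""
    # Initialization: pairwise distances, N clusters of one gene each
    dist = {frozenset(pair): p_distance(alignment[pair[0]], alignment[pair[1]])
            for pair in combinations(alignment, 2)}
    sizes = {name: 1 for name in alignment}  # cluster -> number of genes
    while len(sizes) > 1:
        # Iteration: merge the pair of clusters with minimal distance
        ci, cj = min(combinations(sizes, 2), key=lambda p: dist[frozenset(p)])
        d_ij = dist[frozenset((ci, cj))]
        merged = (ci, cj, d_ij / 2)  # internal node placed at d_ij / 2
        n_i, n_j = sizes.pop(ci), sizes.pop(cj)
        for ck in sizes:
            # Group average: weight each old distance by cluster size
            dist[frozenset((merged, ck))] = (
                n_i * dist[frozenset((ci, ck))]
                + n_j * dist[frozenset((cj, ck))]
            ) / (n_i + n_j)
        sizes[merged] = n_i + n_j
    # Termination: the last merge is the root
    (root,) = sizes
    return root


# Hypothetical aligned sequences for four genes
toy = {"A": "AAAAAAAA", "B": "AAAAAAAT", "C": "AAAATTTT", "D": "TTTTTTTT"}
tree = upgma(toy)
print(tree)
```

For this toy input A and B are merged first (distance 0.125, node at 0.0625), then C joins them, and D is attached last at the root. Because UPGMA places every node at half the cluster distance, it always returns an ultrametric tree, which is only appropriate when a uniform molecular clock is a reasonable assumption.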
Phylogenetic tree by best fitting alignment
There are two ways to find the best fitting tree directly after alignment: maximum parsimony and
maximum likelihood. Maximum parsimony: the tree that requires the fewest evolutionary events to
explain the alignment, i.e. the simplest explanation of the observations. Maximum likelihood: the
tree under which the observed alignment is most probable, given a certain model of evolution.
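To make the parsimony criterion concrete, the Python sketch below counts the minimum number of changes a given tree needs to explain an alignment. It uses Fitch's small-parsimony algorithm, which these notes do not name, and a made-up alignment of four sequences; trees are written as nested pairs of leaf names.

```python
def fitch(node, states):
    """For one alignment column, return (possible ancestral states,
    minimum number of changes) for the subtree below node."""
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Children can agree on a state: no extra change needed here
        return shared, left_cost + right_cost
    # Children disagree: one change is required on one of the branches
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


# Hypothetical alignment and the three possible groupings of four taxa
alignment = {"A": "AAG", "B": "AAG", "C": "TAC", "D": "TGC"}
candidate_trees = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
scores = [parsimony_score(tree, alignment) for tree in candidate_trees]
best_tree = candidate_trees[scores.index(min(scores))]
print(scores)     # [3, 5, 5]
print(best_tree)  # (('A', 'B'), ('C', 'D'))
```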
With maximum parsimony you can, in principle, draw all possible trees for the sequences/species
present in your multiple alignment. For each tree, identify where the mutations must have taken
place. You then choose the tree with the minimum number of required mutations. However, a problem
with this method is that the number of possible trees explodes: for only 50 species there are
already far more than billions of possible trees (on the order of 10^74 unrooted trees). Therefore,
the method does not search all the trees but only a selection (heuristic search).
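How quickly the search space grows can be checked with a few lines of Python. The sketch uses the standard counting formula for unrooted binary trees with n labelled leaves, (2n - 5)!! = 1 x 3 x 5 x ... x (2n - 5); the function name is chosen for this example.

```python
def count_unrooted_trees(n_species):
    """Number of distinct unrooted binary trees for n labelled species,
    computed as the product 1 * 3 * 5 * ... * (2n - 5)."""
    if n_species < 3:
        raise ValueError("need at least three species")
    count = 1
    for factor in range(3, 2 * n_species - 4, 2):
        count *= factor
    return count


for n in (4, 5, 10, 50):
    print(n, count_unrooted_trees(n))
# 4 species give 3 trees, 5 give 15, 10 already give 2027025
```

Each unrooted tree can in addition be rooted on any of its 2n - 3 branches, so the number of rooted trees is larger still.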