Summary: Bioinformatics and Functional Genomics, Chapters 2, 3, 4, 5, 9


Summary of the study book Bioinformatics and Functional Genomics by Jonathan Pevsner (Chapters 2, 3, 4, 5, 9). ISBN: 9781118581780, 3rd Edition.

29 March 2021
RSusan
Chapter 2 Access to Sequence Data and Related Information

Introduction to biological databases
There are two main technologies for DNA sequencing. Beginning in the 1970s,
dideoxynucleotide sequencing (“Sanger sequencing”) was the principal method. Since 2005,
next-generation sequencing (NGS) technology has emerged, allowing orders of magnitude
more sequence data to be generated. The availability of vastly more sequence data has
affected most areas of bioinformatics and genomics.
There are two ways of thinking about accessing data:
1. In terms of individual genes, proteins, or related molecules.
2. In terms of large datasets related to a problem of interest, for example:
a. Studying all the variants that have been identified across all human globin genes.
b. In patients with mutations in a gene, studying the collection of all of the tens of
thousands of RNA transcripts in a given cell type in order to assess the functional
consequences of that variation.
c. Sequencing the DNA corresponding to a set of 100 genes implicated in
hemoglobin function.
Databases and resources such as Entrez, BioMart, and Galaxy facilitate the manipulation
of larger datasets.
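As a rough illustration of programmatic access through Entrez, the sketch below builds a request URL for NCBI's E-utilities `efetch` service without sending it. The helper name `build_efetch_url` is hypothetical, and NM_000518 is used on the assumption that it is the RefSeq accession of the human beta globin (HBB) mRNA. Passing a longer list of accessions is one simple way to request a larger dataset in a single call.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accessions, db="nucleotide", rettype="fasta"):
    """Return an efetch URL requesting one or more records (hypothetical helper)."""
    params = {
        "db": db,                     # Entrez database to query
        "id": ",".join(accessions),   # comma-separated accession list
        "rettype": rettype,           # record format, here FASTA
        "retmode": "text",            # plain-text response
    }
    return EFETCH_URL + "?" + urlencode(params)


# Assumed example accession: NM_000518 (human beta globin mRNA).
url = build_efetch_url(["NM_000518"])
print(url)
```

The URL can be opened in a browser or fetched with any HTTP client; NCBI asks users to keep request rates modest.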

CENTRALIZED DATABASES STORE DNA SEQUENCES
Three main sites have been responsible for storing nucleotide sequence data from 1982 to
the present:
1. GenBank at the National Center for Biotechnology Information (NCBI), part of the
National Institutes of Health (NIH)
2. the European Molecular Biology Laboratory (EMBL)-Bank
3. the DNA Database of Japan (DDBJ)
All three are coordinated by the International Nucleotide Sequence Database Collaboration
(INSDC) and they share their data.
GenBank, EMBL-Bank and DDBJ accept sequence data that consist of complete or
incomplete genomes (or chromosomes) analyzed by a whole-genome shotgun (WGS)
strategy. The WGS division consists of sequences generated by high-throughput sequencing
efforts.




CONTENTS OF DNA, RNA AND PROTEIN DATABASES
Organisms in GenBank/EMBL-Bank/DDBJ

Types of Data in GenBank/EMBL-Bank/DDBJ
Suppose we want to find the sequence of human beta globin. A fundamental distinction is that
DNA, RNA-based, and protein sequences are stored in separate databases.
Furthermore, within each database, sequence data are represented in a variety of forms.
Because RNA is relatively unstable, it is typically converted to complementary DNA (cDNA),
and a variety of databases contain cDNA sequences corresponding to RNA transcripts.
Beginning with the DNA, a first task is to learn the official name and symbol of a gene. For
humans and many other species, the RNA or cDNA is generally given the same name, while
the protein name may differ and is not italicized.

Genomic DNA Databases
A gene is localized to a chromosome. The gene is the functional unit of heredity and is a
DNA sequence that typically consists of regulatory regions, protein-coding exons, and
introns. A bacterial artificial chromosome (BAC) is a large segment of DNA that is cloned into
bacteria. Similarly, yeast artificial chromosomes (YACs) are used to clone large amounts of DNA
into yeast. BACs and YACs are useful vectors with which to sequence large portions of
genomes.

DNA-Level Data: Sequence-Tagged Sites (STSs)
The Probe database at NCBI includes STSs, which are short genomic landmark sequences
for which both DNA sequence data and mapping data are available. Because they are
sometimes polymorphic, containing short sequence repeats, STSs can be useful for mapping
studies.

DNA-Level Data: Genome Survey Sequences (GSSs)
All searches of the NCBI Nucleotide database provide results that are divided into three
sections: GSS, ESTs and “CoreNucleotide”. The GSS division of GenBank consists of
sequences that are genomic in origin. The GSS division contains:
• random “single-pass read” genome survey sequences
• cosmid/BAC/YAC end sequences
• exon-trapped genomic sequences
• Alu polymerase chain reaction (PCR) sequences

DNA-Level Data: High-Throughput Genomic Sequence (HTGS)
The HTGS division was created to make “unfinished” genomic sequence data rapidly
available to the scientific community. The HTGS division contains unfinished DNA
sequences generated by the high-throughput sequencing centers.

RNA data
RNA-Level Data: cDNA Databases Corresponding to Expressed Genes
Protein-coding genes, pseudogenes, and noncoding genes are all transcribed from DNA to
RNA. Genes are expressed in particular regions of the body and at particular times of
development. To study expressed genes, one obtains a tissue such as liver, purifies the RNA,
and then converts the RNA to the more stable form of complementary DNA (cDNA).
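The relationship between an RNA transcript and its cDNA can be sketched in a few lines. This is a minimal illustration of base pairing only, not a model of the laboratory procedure. The function names and the nine-base example fragment are assumptions made for the example.

```python
# Watson-Crick pairing from RNA bases to the complementary DNA bases.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """Return the DNA strand complementary to an mRNA, both written 5'->3'."""
    # Reverse the mRNA so the complementary strand also reads 5'->3'.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


def sense_strand_dna(mrna):
    """Return the mRNA rewritten in the DNA alphabet (U replaced by T)."""
    return mrna.upper().replace("U", "T")


mrna = "AUGGUGCAC"  # short illustrative fragment
print(first_strand_cdna(mrna))  # GTGCACCAT
print(sense_strand_dna(mrna))   # ATGGTGCAC
```

Database records for cDNA are generally written in the DNA alphabet in the same orientation as the transcript, which corresponds to the second function.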

RNA-Level Data: Expressed Sequence Tags (ESTs)
The database of expressed sequence tags (dbEST) is a division of GenBank that contains
sequence data and other information on “single-pass” cDNA sequences from a number of
organisms. An EST is a partial DNA sequence of a cDNA clone. All cDNA clones, and
therefore all ESTs, are derived from some specific RNA source. The RNA is converted into a
more stable form, cDNA, which may then be packaged into a cDNA library. Typically ESTs
are randomly selected cDNA clones that are sequenced on one strand (and therefore may
have a relatively high sequencing error rate). ESTs are often 300-800 base pairs in length.
Currently, GenBank divides ESTs into three major categories:
• human
• mouse
• other

RNA-Level Data: UniGene
The goal of the UniGene project is to create gene-oriented clusters by automatically
partitioning ESTs into nonredundant sets. Ultimately there should be one UniGene cluster
assigned to each gene of an organism.
A UniGene cluster is a database entry for a gene containing a group of corresponding ESTs.

There are far more human UniGene clusters than there are genes, because:
1. Much of the genome is transcribed at low levels. Currently, 64,000 human UniGene
clusters consist of a single EST and ~100,000 UniGene clusters consist of just 1-4
ESTs. These could reflect rare transcription events of unknown biological relevance.
2. Some DNA may be transcribed during the creation of a cDNA library without
corresponding to an authentic transcript; it is therefore a cloning artifact. Alternative
splicing may introduce apparently new clusters of genes because the spliced exon
has no homology to the rest of the sequence.
3. Clusters of ESTs could correspond to distinct regions of one gene. In that case there
would be two (or more) UniGene entries corresponding to a single gene. As a
genome sequence becomes finished, it may become apparent that the two UniGene
clusters should properly cluster into one. The number of UniGene clusters may
therefore collapse over time.
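The third point can be illustrated with a toy example. The sketch below is not the UniGene procedure, which groups sequences by similarity. Here each EST is instead reduced to a hypothetical (name, start, end) position on a single transcript, and ESTs that overlap are grouped. Two clusters from distinct regions of the same gene collapse into one as soon as a sequence spanning both regions is added.

```python
def cluster_by_overlap(reads):
    """Group (name, start, end) reads into clusters of overlapping intervals."""
    clusters = []
    current, current_end = [], None
    # Walk through the reads from left to right along the transcript.
    for name, start, end in sorted(reads, key=lambda read: read[1]):
        if current and start <= current_end:
            # The read overlaps (or touches) the current cluster: extend it.
            current.append(name)
            current_end = max(current_end, end)
        else:
            # A gap was found: close the current cluster and start a new one.
            if current:
                clusters.append(current)
            current, current_end = [name], end
    if current:
        clusters.append(current)
    return clusters


# Hypothetical ESTs covering the two ends of one gene, with a gap in between.
ests = [("est1", 0, 400), ("est2", 300, 700),
        ("est3", 1500, 1900), ("est4", 1800, 2200)]
before = cluster_by_overlap(ests)

# A hypothetical finished sequence spanning the whole gene bridges the gap.
after = cluster_by_overlap(ests + [("finished_seq", 0, 2200)])

print(len(before), len(after))  # 2 1
```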

Access to Information: Protein Databases
The Protein database at NCBI consists of translated coding regions from GenBank and
external databases such as UniProt, the Protein Information Resource (PIR), SWISS-PROT,
Protein Research Foundation (PRF) and the Protein Data Bank (PDB). The EBI similarly
provides information on proteins via these major databases.

UniProt
The Universal Protein Resource (UniProt) is the most comprehensive, centralized protein
sequence catalog. Formed as a collaborative effort in 2002, it consists of a combination of
three key databases:
1. Swiss-Prot is considered the best annotated protein database, with descriptions of
protein structure and function added by expert curators
2. The translated EMBL (TrEMBL) Nucleotide Sequence Database Library provides
automated annotations of proteins not in Swiss-Prot. It was created because of the
vast number of protein sequences that have become available through genome
sequencing projects.
3. PIR maintains the Protein Sequence Database, another protein database curated by
experts

UniProt is organized in three database layers:
1. The UniProt Knowledgebase (UniProtKB) is the central database that is divided into
the manually annotated UniProtKB/Swiss-Prot and the computationally annotated
UniProtKB/TrEMBL.
2. The UniProt Reference Clusters (UniRef) offer nonredundant reference clusters
based on UniProtKB. UniRef clusters are available with members sharing at least
50%, 90% or 100% identity.
3. The UniProt Archive, UniParc, consists of a stable, nonredundant archive of protein
sequences from a wide variety of sources.
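The identity thresholds used by UniRef can be made concrete with a small calculation. The sketch below assumes two sequences that are already aligned and of equal length, and reports which of the 100%, 90% and 50% levels their identity reaches. The actual UniRef clustering procedure is more involved than a single pairwise comparison. The function names and the ten-residue strings are illustrative assumptions.

```python
def percent_identity(seq_a, seq_b):
    """Percent of identical positions in two equal-length, pre-aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def identity_levels_reached(seq_a, seq_b, thresholds=(100, 90, 50)):
    """Return the identity thresholds (in percent) that the pair satisfies."""
    identity = percent_identity(seq_a, seq_b)
    return [level for level in thresholds if identity >= level]


reference = "MVHLTPEEKS"        # illustrative ten-residue string
one_change = "MVHLTGEEKS"       # differs at one position: 90% identity
unrelated = "MVLSPADKTN"        # matches at two positions: 20% identity

print(identity_levels_reached(reference, reference))   # [100, 90, 50]
print(identity_levels_reached(reference, one_change))  # [90, 50]
print(identity_levels_reached(reference, unrelated))   # []
```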


CENTRAL BIOINFORMATICS RESOURCES: NCBI AND EBI

Introduction to NCBI
The NCBI creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical information.
Prominent resources include the following:
• PubMed is the search service from the National Library of Medicine (NLM) that
provides access to over 24 million citations in MEDLINE (Medical Literature Analysis
and Retrieval System Online) and other related databases, with links to participating
online journals.
