Bioinformatics summary
Lecture 1 databases
Bioinformatics
• Lookup → databases: organized for fast retrieval of the data
• Compare → information transfer => alignment
• Predict → information transfer
Databases
Databases: growing very fast and storing large amounts of information → fast retrieval of the data
is necessary
➔ Databases’ innovation: technological, logistical and administrative
• Heterogeneous data types: different kinds of data
• Homogeneous data types: the same kind of data
Fast growth due to reduced cost and time for sequencing
Primary databases: contain experimental data and annotation information → biomolecular
sequences/structures
• Nucleic acid sequences: EMBL, GenBank, DDBJ
• Protein sequences: Swiss-Prot, TrEMBL, UniProt
• Protein structures: PDB
• Small compounds’ structures: CSD
• Genomes: Ensembl, UCSC
Databases’ format
In order to function, every database must come in a specific format that the software can recognize
Essential components of format depend on content of database, however data elements that are
essential for each database are:
- Unique identifier = accession code
- Name of depositor
- Literature references
- Deposition date
- The real data
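
The essential elements listed above can be sketched as a small record type, with an index on the accession code to illustrate why a unique identifier allows fast retrieval. This is only a minimal sketch: the class name `Entry`, the helper `build_index`, and the two example records are invented for illustration and do not correspond to any real database or its format.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Entry:
    accession: str          # unique identifier (accession code)
    depositor: str          # name of depositor
    references: List[str]   # literature references
    deposition_date: date   # deposition date
    data: str               # the real data, e.g. a sequence


def build_index(entries: List[Entry]) -> Dict[str, Entry]:
    """Map accession code -> entry, so a lookup does not scan every record."""
    index: Dict[str, Entry] = {}
    for entry in entries:
        if entry.accession in index:
            # An accession code must be unique within one database.
            raise ValueError(f"duplicate accession code: {entry.accession}")
        index[entry.accession] = entry
    return index


# Invented example records (hypothetical accession codes and depositors).
entries = [
    Entry("X0001", "A. Depositor", ["Hypothetical et al. (2020)"],
          date(2020, 1, 15), "ATGGCC"),
    Entry("X0002", "B. Depositor", [], date(2021, 6, 3), "MKTAYIAK"),
]
index = build_index(entries)
print(index["X0002"].data)
```

Looking up `index["X0002"]` goes straight to the record, which is the idea behind organizing a database around accession codes.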
Data quality
“Quality” of the database’s data, in the sense of it being true according to current understanding,
depends on several aspects:
• Deposition date → must be findable for every entry and is therefore an essential element of a database
• Automatic checks
• Who is allowed to deposit files? → depositor mentioned, cross references
• Are there annotations to the data?
Swiss-Prot: only submitted by experts, manually annotated and reviewed
➔ Pro: high quality data
➔ Con: less data available than in less strict databases
EMBL, PDB, UniProt: anyone who wants to submit data can do so
➔ Pro: lots of data stored in the database → high chance of finding what you’re looking for
➔ Con: data quality is not always as high
Swiss-Prot
Only submission by experts
Info reviewed → kept up to date and checked for correctness
Manual fact check during deposition
Manually added annotations
Obligatory deposit in Swiss-Prot before publication
Swiss-Prot is part of UniProt
The other part of UniProt is TrEMBL → relatively low quality
Swiss-Prot is a keyword-organized flat file
How you can view the data depends on the database
Important Swiss-Prot fields:
1)
2) Cross references: hyperlinks to related entries in all other databases that concern the specific data in
Swiss-Prot
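
To illustrate what "keyword-organized flat file" means, here is a minimal sketch of reading one entry. It assumes the usual Swiss-Prot layout in which each line starts with a two-letter line code (for example ID, AC, DR, SQ), the content starts at column 6, continuation lines leave the code blank, and `//` ends the entry. The sample entry below is invented (its identifiers are placeholders, not real records), and the helper names `parse_entry` and `cross_references` are hypothetical. Real entries contain many more line types, so for actual work an established parser such as Biopython's `Bio.SwissProt` module is the safer choice.

```python
# Invented sample entry; all identifiers are placeholders.
SAMPLE = """\
ID   EXAMPLE_HUMAN           Reviewed;         10 AA.
AC   X00001;
DE   RecName: Full=Hypothetical example protein;
DR   PDB; 0XXX; X-ray; 2.00 A; A=1-10.
DR   EMBL; X00000; XXX00000.1; -; mRNA.
SQ   SEQUENCE   10 AA;
     MKTAYIAKQR
//
"""


def parse_entry(text):
    """Group the lines of one flat-file entry by their two-letter line code."""
    fields = {}
    code = None
    for line in text.splitlines():
        if line.startswith("//"):      # end-of-entry marker
            break
        if line[:2].strip():           # a new line code starts here
            code = line[:2]
        if code is None:               # skip anything before the first code
            continue
        # Lines with a blank code (e.g. sequence data) continue the last code.
        fields.setdefault(code, []).append(line[5:].strip())
    return fields


def cross_references(fields):
    """Return (database, identifier) pairs taken from the DR lines."""
    refs = []
    for line in fields.get("DR", []):
        parts = [p.strip() for p in line.rstrip(".").split(";")]
        refs.append((parts[0], parts[1]))
    return refs


fields = parse_entry(SAMPLE)
print(fields["AC"])
print(cross_references(fields))
```

Each DR pair names another database and the identifier of the related entry there, which is what the hyperlinks in the web view are built from.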