
Bioinformatics summary/study aid for the exam

Summary of the bioinformatics slides and notes. Code with explanations, plus worked-out exercise sessions with explanations. Explains the various databases and frequently used code.


  • 26 May 2024
  • 33 pages
  • Academic year 2023/2024
  • Summary
sisivorst
Bioinformatics
Contents
Lesson 1: intro + databases
  Gene database
  Protein database
Lesson 2: databases
  Ontologies
  Gene expression
  Phenotypes/Diseases
  Model Organism databases
Lesson 3: genome browsers + SQL
  Genome browsers
  Homology
  Database architectures
Lesson 4: Linux + Jupyter
  Navigating the file system
  Additional Jupyter Notebook notes
Lesson 5: EMBOSS + BedTools exercises
Lesson 6: Gene prediction
Lessons 7+8: Python
Lesson 9: Alignment, pattern matching, gene set analysis
Exercise session 1
  Information retrieval
  CpG islands
  Unknown sequence study
Exercise session 2
  Python CpG island
  miRNA

Lesson 1: intro + databases




Gene database
- Entrez Gene
o Part of NCBI: https://www.ncbi.nlm.nih.gov/gene/
o Each line is a transcript isoform (due to alternative promoters and alternative
splicing); look at the exons, introns, non-coding exons (light green: 5’UTR,
3’UTR), coding exons (dark green)
o Each transcript has a unique NM_ identifier = RefSeq identifier
o Each NM transcript corresponds to a unique NP_ protein entry
o More details about each NM/NP and links to the sequence in Entrez
Nucleotide are at the bottom of the Gene page
▪ Entrez Nucleotide contains all nucleotide sequences
▪ Search Nucleotide db with NM_000564
▪ (After the dot “.” is the version number)
- RefSeq
o https://www.ncbi.nlm.nih.gov/refseq/
o Many sequences were/are represented more than once in GenBank
o RefSeq = curated “secondary” database that aims to provide a
comprehensive, integrated, nonredundant set of sequences
o Goal is to provide a reference sequence for each molecule in the central dogma
(DNA, mRNA, and protein)
o Each RefSeq represents a single, naturally occurring molecule from one
organism
o Nucleotide and protein sequences in RefSeq are explicitly linked to one
another
o Distinct accession number: 2+6 format (2 letters, underscore, six-digit number)
▪ NT_123456 (Genomic contigs), NM_123456 (mRNAs), NP_123456
(Proteins)
▪ XM_123456 (Model mRNAs), XP_123456 (model proteins):
computational predictions
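
The accession scheme above (two-letter prefix, underscore, number, optional version after the dot) can be checked with a small script. This is only a sketch: the function name `parse_refseq_accession` and the prefix table are illustrative, and the table covers just the prefixes listed in these notes. The pattern accepts any number of digits, because current RefSeq accessions can be longer than six digits.

```python
import re

# Prefix meanings as listed in the notes above (not the complete RefSeq list).
REFSEQ_PREFIXES = {
    "NT": "genomic contig",
    "NM": "mRNA",
    "NP": "protein",
    "XM": "model mRNA (computational prediction)",
    "XP": "model protein (computational prediction)",
}

# Two capital letters, underscore, digits, and an optional ".version" suffix.
_ACCESSION_RE = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def parse_refseq_accession(accession):
    """Split a RefSeq-style accession into prefix, number, version and molecule type."""
    match = _ACCESSION_RE.match(accession.strip())
    if match is None:
        raise ValueError(f"not a RefSeq-style accession: {accession!r}")
    prefix, number, version = match.groups()
    return {
        "prefix": prefix,
        "number": number,
        # The version is whatever follows the dot; None if no version was given.
        "version": int(version) if version is not None else None,
        "molecule": REFSEQ_PREFIXES.get(prefix, "unknown prefix"),
    }


print(parse_refseq_accession("NM_000564"))
print(parse_refseq_accession("XP_123456.2"))
```

The first call uses the NM_000564 example from the notes; the second uses the placeholder XP_123456 with a made-up version number to show how the part after the dot is separated.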

To inspect the data, download the record in GenBank format (.gb) as a text file and open it in a
text editor, such as Visual Studio Code or a Jupyter Notebook.
- How to download:
o Click on “Send to” (right upper screen)
o Select “Complete Record” and “File”
o Choose GenBank format, or FASTA (sequence only, without the GenBank header and feature table)
- In the FEATURES section
o The sequence has a coding sequence (CDS) made up of five exons
▪ The first exon begins at base 201 and ends at base 224
▪ It is then joined to the next exon, which runs from bp 1550 to bp 1920, and so forth.
o Each comma in this line represents a splicing event, and each “..” represents
the string of letters between the two coordinates.
o The gene product is eukaryotic initiation factor 4E-II, and the gene name is
eIF4E
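
The way a CDS location line encodes exons can be sketched in Python. The example assumes the usual GenBank `join(start..end,start..end,...)` notation and uses only the first two exons, since those are the only coordinates given in the notes. The helper names `parse_join_location` and `spliced_length` are made up for this illustration, and more complex locations (`complement(...)`, `<`/`>` partial markers) are deliberately not handled.

```python
import re


def parse_join_location(location):
    """Return a list of (start, end) exon coordinates from a CDS location string."""
    inner = location.strip()
    # Strip the join(...) wrapper; a single-exon CDS is simply "start..end".
    if inner.startswith("join(") and inner.endswith(")"):
        inner = inner[len("join("):-1]
    exons = []
    # Each comma separates two exons, i.e. marks one splicing event.
    for segment in inner.split(","):
        match = re.fullmatch(r"\s*(\d+)\.\.(\d+)\s*", segment)
        if match is None:
            raise ValueError(f"unsupported segment: {segment!r}")
        exons.append((int(match.group(1)), int(match.group(2))))
    return exons


def spliced_length(exons):
    """Total number of bases covered by the exons."""
    # GenBank coordinates are 1-based and inclusive, hence the +1.
    return sum(end - start + 1 for start, end in exons)


exons = parse_join_location("join(201..224,1550..1920)")
print(exons)                  # [(201, 224), (1550, 1920)]
print(spliced_length(exons))  # 395
```

Here the first exon contributes 24 bases and the second 371, so the two joined segments cover 395 bases in total.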




- EMBL/EBI
o https://www.ebi.ac.uk/
o European database
o DBFETCH provides an easy way to retrieve entries from various databases at
the EMBL-EBI
o Format: https://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=refseqn;id=NM_000231;format=fasta&style=raw
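
A DBFETCH request is just a URL with `db`, `id`, `format` and `style` parameters, so it can be assembled programmatically. The sketch below only builds the URL and does not contact the server. The endpoint path `dbfetch/dbfetch` and the use of `&` between all parameters are assumptions, and `dbfetch_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: the dbfetch service is reachable at this endpoint.
DBFETCH_BASE = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch"


def dbfetch_url(db, entry_id, fmt="fasta", style="raw"):
    """Build a dbfetch URL for one entry of one database."""
    # urlencode joins the key=value pairs with "&" and escapes unsafe characters.
    query = urlencode({"db": db, "id": entry_id, "format": fmt, "style": style})
    return f"{DBFETCH_BASE}?{query}"


url = dbfetch_url("refseqn", "NM_000231")
print(url)
```

The resulting string can be pasted into a browser or passed to `urllib.request.urlopen` to download the entry.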
Protein database
- UniProt: https://www.uniprot.org/
o Offers entries in General Feature Format (GFF), a plain-text file
▪ Click Download
▪ Choose the GFF format
- Protein sequences in databases can be derived from translation of nucleotide
sequences (secondary databases)
o e.g., RefSeq NM_ to RefSeq NP_
o e.g., TrEMBL
o Go to the protein database by following one of the NP_ isoforms
- There are also curated databases: experts enhance the original data by adding new
information

o e.g., SwissProt (in the UniProt Knowledgebase)
▪ Information from literature
▪ Curator-evaluated computational analysis/predictions
- 3D structures
o https://www.ncbi.nlm.nih.gov/structure/ or UniProt → Structure
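
Returning to the GFF download mentioned under UniProt: a GFF file is tab-separated text with nine columns (sequence ID, source, feature type, start, end, score, strand, phase, attributes), so it is easy to read in Python. The sketch below is a minimal reader under that assumption. The sample rows, including the accession `P00000`, are invented placeholders and not real UniProt data.

```python
from collections import namedtuple

GffFeature = namedtuple(
    "GffFeature", "seqid source type start end score strand phase attributes"
)


def parse_gff(lines):
    """Parse GFF lines into GffFeature tuples, skipping comments and short rows."""
    features = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank line or "##gff-version"-style directive
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # not a complete 9-column feature row
        # Column 9 holds "key=value" pairs separated by semicolons.
        attributes = dict(
            pair.split("=", 1) for pair in cols[8].split(";") if "=" in pair
        )
        features.append(
            GffFeature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                       cols[5], cols[6], cols[7], attributes)
        )
    return features


# Invented example rows, for illustration only.
SAMPLE = (
    "##gff-version 3\n"
    "P00000\tUniProtKB\tChain\t1\t120\t.\t.\t.\tID=PRO_0000000000;Note=Example protein\n"
    "P00000\tUniProtKB\tDomain\t10\t85\t.\t.\t.\tNote=Example domain\n"
)

features = parse_gff(SAMPLE.splitlines())
for feature in features:
    # Coordinates are 1-based and inclusive, so length = end - start + 1.
    print(feature.type, feature.start, feature.end, feature.end - feature.start + 1)
```

For a downloaded file, the same function can be called on an open file handle, for example `parse_gff(open("entry.gff"))`, where `entry.gff` stands for whatever name the download was saved under.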


Lesson 2: databases




Ontologies
- Gene ontology (GO)
o https://geneontology.org/ or https://www.ebi.ac.uk/QuickGO/ (human gene symbols are
usually capitalized)
▪ Downloading data from QuickGO
• Click on Export
• Choose the format: gene association file (then add .txt to the file name)
• Adjust the number of annotations
o Specific purpose: “Annotation of genes and proteins in genomic and protein
databases”
o Facilitate complex queries
o Applicable to all species
o Databases involved:
▪ FlyBase (Drosophila)
▪ MGI (Mouse)
▪ SGD (S. cerevisiae)
▪ TAIR (Arabidopsis)
▪ TIGR (microbes including prokaryotes)
▪ SWISS-PROT (several thousand species inc. human)
▪ PSU (P. falciparum)
▪ ZFIN (zebrafish)
▪ PAMGO (plant pathogens)
o GO structure
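
Returning to the QuickGO export described above: the downloaded gene association file can be summarised with a few lines of Python. This sketch assumes the file follows the tab-separated GAF layout, in which comment lines start with `!`, column 3 is the gene symbol, column 5 the GO ID, column 7 the evidence code and column 9 the aspect (P, F or C). The helper `_row`, the symbol `GENE1`, the accession `P00000` and the reference `PMID:0` are invented so that the example runs without a real download.

```python
from collections import Counter

ASPECTS = {
    "P": "biological process",
    "F": "molecular function",
    "C": "cellular component",
}


def read_gaf(lines):
    """Return one dict per annotation row of a GAF-formatted file."""
    annotations = []
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # blank line or header/comment line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue  # incomplete row
        annotations.append({
            "symbol": cols[2],
            "go_id": cols[4],
            "evidence": cols[6],
            "aspect": cols[8],
        })
    return annotations


def _row(symbol, go_id, evidence, aspect):
    # Hypothetical helper: builds one 17-column row with placeholder values.
    cols = ["UniProtKB", "P00000", symbol, "", go_id, "PMID:0", evidence, "",
            aspect, "", "", "protein", "taxon:9606", "20240101", "UniProt", "", ""]
    return "\t".join(cols)


# Invented annotations, for illustration only.
SAMPLE = [
    "!gaf-version: 2.2",
    _row("GENE1", "GO:0005515", "IPI", "F"),
    _row("GENE1", "GO:0005634", "IDA", "C"),
    _row("GENE1", "GO:0006355", "IEA", "P"),
    _row("GENE1", "GO:0005737", "IDA", "C"),
]

annotations = read_gaf(SAMPLE)
counts = Counter(ASPECTS[a["aspect"]] for a in annotations)
for aspect, n in sorted(counts.items()):
    print(f"{aspect}: {n}")
```

With a real export, replace `SAMPLE` with an open file handle; the per-aspect counts then show how the annotations are spread over the three GO branches.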
