Summary of Data Science in Biomedicine

Course
Data Science In Biomedicine

Institution
Rijksuniversiteit Groningen (RuG)

Summary for the subject of Data Science in Biomedicine. Every subject compulsory for the test is summarized.

Uploaded on 8 October 2022
Number of pages: 41
Written in 2022/2023
Type: Summary

Summary Data Science in Biomedicine

Lecture 1: Introduction to Data Science in Biomedicine
Date: 26-09-2022

- Patient data collection -> Biomedical data: Electronic Health Record (EHR) and
Omics -> Personalised Health Data Analysis: Large Volume Data, Data
Management and High Performance Computing

- Translate large data sets into something you can understand and discuss

There are many types of (big) data available:
- Numerical
- Textual
- Categorical
- Imaging
- Clinical
- Demographic
- Psychosocial
- Lifestyle
- Environmental
- Genomic
- DNA
- Genes
- Proteins
- RNA
- SNPs
- ncRNA
- Splice variants
- RNA expression levels

Next Generation Sequencing (NGS)
- 1 Illumina NovaSeq 6000 run reads 6,000,000,000,000 bases (6,000,000,000 kb,
6,000,000 Mb, 6,000 Gb, 6 Tb) in ~44 hr (computers and software are necessary)
- Bioinformatics pipelines, e.g. analyzing NGS data:
Reference mapping -> transcript assembly, comparison, merging -> detection of
differentially expressed genes/transcripts (understand the input and output of programs,
know your statistics, modify the graphical output).
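
The unit conversions quoted for a single run can be checked with a few lines of Python. This is only a sketch of the arithmetic: the two constants come from the notes above, while the helper name `to_units` is made up for this example.

```python
# Unit arithmetic for one sequencing run, using the figures quoted in the notes.
TOTAL_BASES = 6_000_000_000_000   # bases per run (from the notes)
RUN_HOURS = 44                    # approximate run time in hours (from the notes)


def to_units(bases: int) -> dict:
    """Express a base count in kb, Mb, Gb and Tb (decimal steps of 1000)."""
    return {
        "kb": bases / 1e3,
        "Mb": bases / 1e6,
        "Gb": bases / 1e9,
        "Tb": bases / 1e12,
    }


units = to_units(TOTAL_BASES)
bases_per_second = TOTAL_BASES / (RUN_HOURS * 3600)

for unit, value in units.items():
    print(f"{value:,.0f} {unit}")
print(f"about {bases_per_second:,.0f} bases per second")
```

The per-second figure shows why storage and computing power are needed: the instrument produces tens of millions of bases every second for almost two days.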

Using R or Python
R -> Retrieve data from a database, apply statistical analyses and visualize results
Python -> If the data is in the wrong format, write a small Python script to convert it
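
A small conversion script of this kind might look like the sketch below. It assumes one particular "wrong format", namely a semicolon-separated table with decimal commas, and rewrites it as a tab-separated table with decimal points. The function name `convert_table` and the sample rows are invented for illustration.

```python
import csv
import io


def convert_table(src, dst):
    """Rewrite a semicolon-separated table with decimal commas
    as a tab-separated table with decimal points."""
    reader = csv.reader(src, delimiter=";")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for row in reader:
        # Assumption: commas only occur as decimal separators in this file.
        writer.writerow(cell.strip().replace(",", ".") for cell in row)


# Hypothetical input, invented for illustration only.
raw = "sample;expression\nA; 1,5\nB; 2,25\n"
out = io.StringIO()
convert_table(io.StringIO(raw), out)
converted = out.getvalue()
print(converted)
```

For real files, `io.StringIO` would be replaced by `open(...)` handles. Blindly replacing commas would damage text columns that contain commas, so the rule should be checked against the actual data first.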

R vs Python
- R is dedicated to statistics
- R is very popular in research
- Many good libraries for R; Genomics, GWAS, Proteomics, Transcriptomics,
Metabolomics etc.
- R is designed as a statistical scripting tool rather than as a general-purpose programming language
- Python is easier and much better at handling text files and text-based data files
- R and Python are slower than C++
- Although many people use R, its use may decline, so why still learn it?

R
- open source package for Statistics
- most popular statistics program in bioinformatics
- Also popular -> Python data analysis library - pandas
- MATLAB

R vs Excel
- In Excel you can load data by opening a file or by copy-pasting a data table
- You can edit this data directly in Excel
- In R you do NOT edit the data by hand; changes are made through code in a script

R Graphics
- a popular graphics library is ggplot2 (ports also exist for Python, e.g. plotnine)
- you can log-transform the data with log(my_data)
- How to plot multiple classes: multiple_classes <- c("N", "O", "P") and
my_multi_subset <- subset(my_annotated_subset, classID %in% multiple_classes)
- c() combines values into a vector (it is not a list)
- to show an extra dimension of the data, graphs are often arranged in a matrix of plots
- You have: script, data sets, text output and graphic output
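
For comparison, the same two steps (selecting several classes and log-transforming the values) can be sketched in plain Python. The variable names mirror the R snippet in the notes; the rows themselves, including the `gene` and `value` fields, are invented for illustration.

```python
import math

# Hypothetical annotated rows, invented for illustration only.
my_annotated_subset = [
    {"gene": "g1", "classID": "N", "value": 100.0},
    {"gene": "g2", "classID": "K", "value": 10.0},
    {"gene": "g3", "classID": "O", "value": 1000.0},
    {"gene": "g4", "classID": "P", "value": 1.0},
]

multiple_classes = {"N", "O", "P"}

# Keep rows whose classID is one of the selected classes
# (the counterpart of `classID %in% multiple_classes` in R).
my_multi_subset = [
    row for row in my_annotated_subset if row["classID"] in multiple_classes
]

# Natural-log transform of the values, like log() in R.
logged = [math.log(row["value"]) for row in my_multi_subset]

selected_genes = [row["gene"] for row in my_multi_subset]
print(selected_genes)
print(logged)
```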

Lecture 2: Data Science in Biomedicine Basic Statistics 1
Date: 27-09-2022

What is statistics?
- Why do we need statistics?
- when is there a difference?
- p-value?
- impact of risk
- identify problems
- where does the data come from?
- which data and conclusions are trustworthy
- properties?
- Reliable p-value

Measurements
- experiments -> variation
- variation between persons, equipment and time of day
- define the experiments properly
- what is the main source of variation?
- after standardization, do we always get exactly the same value?
- measurements show variation!

P-value
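- a p-value is the probability of obtaining a result at least as extreme as the observed one, assuming the null hypothesis is true
- a commonly used cutoff is 0.05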
- x axis = set of possible results
- y-axis is probability density
- same statistics, same p-value, different “impact of risk”
- you can calculate p-values but it never tells you if it’s good or bad
- especially in Biomedical sciences this can be an ethical discussion: Risk for
treating/not treating patients and until which age should you treat a patient
- 0.05 is a good starting point but always evaluate this assumption
- p-value cutoff: a p-value below the cutoff means that, if your null hypothesis
is indeed correct and there is no difference between the groups, the result that
you obtained is rare. With a cutoff of 0.05 you would expect to obtain such a
result fewer than 1 in 20 times if you collected samples over and over again.
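
The "1 in 20" statement can be illustrated with a small simulation. The sketch below draws two groups from the same population many times, so the null hypothesis is true by construction, and counts how often p < 0.05. To stay within the Python standard library it uses a z-test with a known standard deviation of 1 instead of a t-test. That simplification, the function names and all parameter values are assumptions made for this example.

```python
import math
import random


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_false_positive_rate(n_experiments=10_000, n_per_group=10,
                                 alpha=0.05, seed=1):
    """Fraction of experiments with p < alpha when the null hypothesis is true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # Both groups come from the same N(0, 1) population: no real difference.
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        diff = sum(a) / n_per_group - sum(b) / n_per_group
        # Known sigma = 1, so the standard error of the difference is sqrt(2/n).
        z = diff / math.sqrt(2 / n_per_group)
        if two_sided_p_from_z(z) < alpha:
            hits += 1
    return hits / n_experiments


rate = simulate_false_positive_rate()
print(f"fraction of 'significant' results under the null: {rate:.3f}")
```

The fraction should land close to the chosen cutoff, which is the sense in which the cutoff sets the accepted risk of a false alarm.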

Generating data
- a statistician wants: a well-designed study, trustworthy data and many
replicates
- a statistician knows how to: analyze data and calculate p-values
- a statistician does not know: the detailed theoretical background, the impact of risk
(threshold) and potential pitfalls.

Some basic statistics in this course
- t-test
- linear regression
- permutation testing
- FDR testing
- Fisher's exact test
- Chi-squared test
- Pearson’s vs Spearman correlation
- PCA

T-statistic
- Compares two data sets and tells you if they are different from each other
- e.g. compare two groups, one treated with a drug the other with a placebo
- Pearson (born 1857)
- Fisher (born 1890)
- Neyman (born 1894; random stats)
- Bayes (born c. 1702; probability stats)
- A t-test is a statistical test that is used to compare the means of two
groups. It is often used in hypothesis testing to determine whether a process
or treatment actually has an effect on the population of interest, or whether
two groups are different from one another

Types of t-test
1. Independent Samples: compares the means of two independent groups
2. Paired Samples: compares means from the same group (e.g. at different time
points)
3. One Sample: tests the mean of a single group against a known mean (a standard or
reference)
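
The independent-samples and one-sample statistics can be sketched with the Python standard library. The notes do not say which independent-samples variant the course uses; this example picks Welch's version, which does not assume equal variances. The function names and the measurements are invented for illustration.

```python
import math
import statistics


def one_sample_t(sample, reference_mean):
    """t statistic for testing a sample mean against a known reference mean."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - reference_mean) / se


def welch_t(group_a, group_b):
    """t statistic for two independent groups (Welch's version,
    which does not assume equal variances)."""
    var_a = statistics.variance(group_a) / len(group_a)
    var_b = statistics.variance(group_b) / len(group_b)
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    return mean_diff / math.sqrt(var_a + var_b)


# Hypothetical measurements, invented for illustration only.
drug = [5.1, 4.9, 5.6, 5.8, 5.3]
placebo = [4.2, 4.8, 4.5, 4.1, 4.9]

print(f"independent samples: t = {welch_t(drug, placebo):.2f}")
print(f"one sample vs 5.0:   t = {one_sample_t(drug, 5.0):.2f}")
```

Both statistics have the same shape: a difference in means divided by its standard error. Turning t into a p-value additionally requires the t distribution, which a statistics package such as R provides.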

Paired data
- group of mice (8) before and after albumin treatment
- the null hypothesis is that the mean pairwise difference between the two
measurements is zero (H0: μd = 0)
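
A paired comparison works on the per-animal differences. The sketch below computes the paired t statistic and, because the standard library has no t distribution, obtains a p-value with an exact sign-flip permutation test instead: under H0 each difference is equally likely to be positive or negative. The values for the 8 mice and the function names are invented for illustration and are not data from the lecture.

```python
import itertools
import math
import statistics


def paired_t(before, after):
    """t statistic for paired data: mean difference over its standard error."""
    diffs = [a - b for a, b in zip(after, before)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


def sign_flip_p_value(before, after):
    """Exact two-sided permutation p-value over all sign patterns of the differences."""
    diffs = [a - b for a, b in zip(after, before)]
    observed = abs(sum(diffs))
    patterns = list(itertools.product((1, -1), repeat=len(diffs)))
    extreme = sum(
        1 for signs in patterns
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12
    )
    return extreme / len(patterns)


# Hypothetical values for 8 mice, invented for illustration only.
before = [30.1, 28.4, 31.0, 29.5, 30.8, 27.9, 29.0, 30.3]
after = [31.2, 29.0, 31.9, 30.1, 31.5, 28.8, 29.4, 31.6]

t_value = paired_t(before, after)
p_value = sign_flip_p_value(before, after)
print(f"t = {t_value:.2f} with {len(before) - 1} degrees of freedom")
print(f"permutation p-value = {p_value:.4f}")
```

With 8 pairs there are only 2^8 = 256 sign patterns, so every pattern can be enumerated and the p-value is exact rather than estimated.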

This document is sold on the Stuvia marketplace by seller willemdevries99.
