Summary Advanced Data Analysis


An extensive English summary of the course Advanced Data Analysis followed in academic year . Obtained result with this summary was 17/20. The explanations during the class were attentively noted and processed with the slides and course material to a complete summary. This summary is a perfect prep...


Preview: 4 of the 107 pages

  • 7 October 2020
  • 107 pages
  • 2019/2020
  • Summary
UA-BiomedischeWetenschappen
CHAPTER 1: INTRODUCTION

A bit of context
Big data revolution
= a revolution in information technology that is affecting industries around the globe and is
radically changing many domains
= a disruptive trend in computer science

Big data
= data for which conventional computing techniques no longer suffice because of size,
complexity, …
= characterized by:

1. Data volume
a. Data is collected everywhere
b. Evolution to the cloud: data is stored in clouds where it can be accessed from anywhere in
the world (no longer kept on one physical computer)
c. The cost of sequencing a genome is dropping sharply: it is becoming affordable

2. Data velocity
a. Is the speed at which data is being generated (= enormous)
b. Data is generated continuously: e.g. a smartphone is collecting
a lot of data all the time (light sensor, barometer,…)
c. Data management gap: the number of IT staff has not grown as fast as the amount of data
d. Dynamic molecular profiles: we are able to do transcriptome
profiling, sequencing the immune system, microbiome,…

! The sequencing facility and the data analysis facility are 1 km apart
→ what is the most appropriate way to send the data between them? → you would think a
network, the cloud, … but in fact it is a bicycle (you can carry a lot of hardware holding
many TB)

3. Data variety
a. A huge diversity of data type: DNA sequences, protein structures, gene regulation,
interactions, morphology, metabolism
b. A lot of this data is heterogeneous and unstructured (e.g. text)

4. Data veracity (truthfulness)
a. To what extent can we trust the things we see? How certain are we about things?

→ Is big data a reality in life sciences? Yes (volume ✓ - velocity ✓ - variety ✓ - veracity ✓)
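
The bicycle remark under data velocity can be checked with back-of-the-envelope arithmetic. The sketch below is only an illustration: the 1 km distance comes from the notes, but the cycling speed, number of drives, drive capacity, and network speed are assumed values, not figures from the course.

```python
# All numbers except the distance are illustrative assumptions, not course data.
DISTANCE_KM = 1.0          # distance between the facilities (from the notes)
CYCLING_SPEED_KMH = 15.0   # assumed cycling speed
DRIVES = 10                # assumed number of hard drives carried
TB_PER_DRIVE = 4.0         # assumed capacity per drive, in terabytes
NETWORK_GBIT_S = 1.0       # assumed network link speed, in gigabit per second


def bicycle_throughput_gbit_s(distance_km, speed_kmh, drives, tb_per_drive):
    """Effective throughput of carrying drives by bicycle, in gigabit/s."""
    travel_seconds = distance_km / speed_kmh * 3600
    total_gigabits = drives * tb_per_drive * 1000 * 8  # 1 TB = 1000 GB = 8000 Gbit
    return total_gigabits / travel_seconds


def network_transfer_hours(total_tb, gbit_s):
    """Time to push the same amount of data over the network, in hours."""
    total_gigabits = total_tb * 1000 * 8
    return total_gigabits / gbit_s / 3600


bike = bicycle_throughput_gbit_s(DISTANCE_KM, CYCLING_SPEED_KMH, DRIVES, TB_PER_DRIVE)
hours = network_transfer_hours(DRIVES * TB_PER_DRIVE, NETWORK_GBIT_S)
print(f"bicycle: {bike:.0f} Gbit/s effective; network: {hours:.1f} h for the same data")
```

With these assumed numbers the ride takes 4 minutes, whereas the assumed 1 Gbit/s link would need several days for the same 40 TB. The sketch ignores the time needed to copy data onto and off the drives.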




Emergence of a fourth research paradigm
We have been doing science for a long time and have gone through 4 different paradigms:

1. Experimental science
a. A thousand years ago
b. Description of natural phenomena

2. Theoretical science
a. Last few hundred years
b. Newton’s laws, Maxwell’s equations,…

3. Computational science
a. Last few decades
b. Simulation of complex phenomena

4. Data-intensive science
a. Today
b. Many things are no longer studied from simple observations, as they were in the past;
instead we start from large amounts of data
c. Scientists are overwhelmed with data sets from many different sources
i. Data captured by instruments
ii. Data generated by simulations
iii. Data generated by sensor networks

d. eScience is the set of tools and technologies to support data federation and
collaboration
i. for analysis and data mining
ii. for data visualization and exploration
iii. for scholarly communication and dissemination


But what is data?

- Collection of data objects and their attributes

- An attribute is a property or characteristic of an object
o Examples: eye color of a person, temperature, etc
o An attribute describes an object
o Attribute is also known as variable, field, characteristic,
or feature

- A collection of attributes describes an object
o Examples: individuals,…


o Object is also known as record, point, case, sample, entity, or instance

SO: Each row is an object – for each of these objects we have a series of attributes (characteristics)
→ These objects and attributes are the basis of a lot of the data we have
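
As a minimal sketch of this row/column view, the snippet below stores each object as one record and each attribute as a named field. The individuals and their values are made up for illustration.

```python
# Hypothetical example data: each dict is one object (record),
# each key is one attribute (variable, feature).
individuals = [
    {"id": 1, "eye_color": "blue", "height_m": 1.82},
    {"id": 2, "eye_color": "brown", "height_m": 1.65},
    {"id": 3, "eye_color": "green", "height_m": 1.74},
]

# The attributes are the columns shared by all objects.
attributes = list(individuals[0].keys())

# Reading one attribute across all objects gives a column of attribute values.
eye_colors = [person["eye_color"] for person in individuals]

print(len(individuals), "objects with attributes", attributes)
print("eye_color values:", eye_colors)
```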


Attribute values

Attribute values are numbers or symbols assigned to an attribute
- Example: eye color (attribute) can be blue, green, brown,… (attribute values)

- Distinction between attributes and attribute values
o Same attribute can be mapped to different attribute values
§ Example: height can be measured in feet or meters

o Different attributes can be mapped to the same set of values
§ Example: attribute values for ID and age are integers

o However, properties of attribute values can still be different
§ Example: ID has no limit but age has a maximum and minimum value
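
The distinction between an attribute and its values can be sketched in a few lines. This is an illustration only: the age bounds of 0 and 130 are assumed, and the function names are hypothetical.

```python
FEET_PER_METER = 3.28084  # standard conversion factor


def meters_to_feet(height_m):
    """Same attribute (height), mapped to a different attribute value."""
    return height_m * FEET_PER_METER


def is_valid_age(age, min_age=0, max_age=130):
    """Age is an integer with a minimum and maximum (bounds assumed here)."""
    return isinstance(age, int) and min_age <= age <= max_age


def is_valid_id(identifier):
    """ID is also an integer, but without an upper limit."""
    return isinstance(identifier, int)


height_ft = meters_to_feet(1.80)
print(f"1.80 m = {height_ft:.2f} ft")
print("250 valid as age:", is_valid_age(250), "| 250 valid as ID:", is_valid_id(250))
```

Both ID and age are stored as integers, yet the same value can be acceptable for one attribute and impossible for the other.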


Attribute types

There are different types of attributes:
- Nominal
o Examples: ID numbers, eye color, zip codes → categorical attribute
o Values can only be checked for being equal or different; they cannot be ranked

- Ordinal
o Examples: rankings (e.g. taste of potato chips on a scale from 1-10), grades, height
in tall, medium, short
o Values can be ranked

- Interval
o Examples: calendar dates, temperatures in Celsius or Fahrenheit
o Values can be subtracted → we know both the order and the exact
difference
o There is no true zero – values can go below 0

- Ratio
o Examples: temperature in Kelvin, length, time, counts
o Values can be divided and multiplied
o There is a ‘true zero’ – can’t go below 0
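
The difference between interval and ratio attributes can be illustrated with the temperature examples above. The two temperatures are arbitrary values chosen for this sketch.

```python
def celsius_to_kelvin(temp_c):
    """Convert a temperature from degrees Celsius to Kelvin."""
    return temp_c + 273.15


t1_c, t2_c = 10.0, 20.0
t1_k, t2_k = celsius_to_kelvin(t1_c), celsius_to_kelvin(t2_c)

# Differences agree on both scales: subtraction is meaningful for interval data.
diff_c = t2_c - t1_c
diff_k = t2_k - t1_k

# Ratios disagree: 20 °C is not "twice as warm" as 10 °C, because 0 °C is not
# a true zero. Only the Kelvin ratio is physically meaningful.
ratio_c = t2_c / t1_c
ratio_k = t2_k / t1_k

print(f"difference: {diff_c:.2f} °C vs {diff_k:.2f} K")
print(f"ratio: {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```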




Properties of attributes

- The type of an attribute depends on which of the following properties it possesses:
o Distinctness: = ≠
o Order: < >
o Addition: + -
o Multiplication: * /

- Nominal attribute: distinctness
- Ordinal attribute: distinctness & order
- Interval attribute: distinctness, order & addition
- Ratio attribute: all 4 properties
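
Because each attribute type adds one property to the previous type, the list above can be sketched as a cumulative lookup. The function names are hypothetical and only serve this illustration.

```python
# Properties and types in the order the notes list them; each type adds one more.
PROPERTIES = ["distinctness", "order", "addition", "multiplication"]
ATTRIBUTE_TYPES = ["nominal", "ordinal", "interval", "ratio"]


def properties_of(attribute_type):
    """Return the properties an attribute type possesses."""
    level = ATTRIBUTE_TYPES.index(attribute_type)
    return PROPERTIES[: level + 1]


def supports(attribute_type, prop):
    """Check whether a property (and its operations) applies to an attribute type."""
    return prop in properties_of(attribute_type)


for name in ATTRIBUTE_TYPES:
    print(f"{name:8s} -> {', '.join(properties_of(name))}")
```

For example, `supports("interval", "multiplication")` is false, which matches the statement that ratios of Celsius temperatures are not meaningful.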




Discrete vs. continuous

- Discrete attribute
o Can only take particular values (no decimal numbers)
o Has only a finite or countably infinite set of values
o Often represented as integer variables
o Examples: zip codes, counts, or the set of words in a collection of documents
o Other examples: eye color, house number in streets,…

