Experimental Design and Analysis - Summary Slides

Pages
76
Uploaded on
31-12-2024
Written in
2023/2024

A summary of all the slides for the course Experimental Design and Analysis, MSc AI.

Experimental Design and Data Analysis - Summary


Lecture 0
What is experimental design?
● Experiments are performed with varied preconditions represented by independent variables, also
referred to as input variables or predictor variables.
● The change in predictors is hypothesized to result in a change in one or more dependent
variables, also referred to as output or response variables.
● The experimental design may also identify control variables that must be held constant
to prevent external factors from affecting the results.
● Experimental design also involves planning the experiment under statistically optimal
conditions given the constraints of available resources.
● Main concerns in experimental design: validity, reliability, replicability, achieving
appropriate levels of statistical power and sensitivity.
● Ronald Fisher: The Arrangement of Field Experiments (1926) and The Design of
Experiments (1935).

Experimental design, randomization
● Statistics allows us to generalize from data to a true state of nature, but statistical inference
requires assumptions and mathematical modeling.
● The data should be obtained by a carefully designed experiment (or at least it must be
possible to think about the data in this way).
● Any good design involves a chance element: “experimental units” are assigned to
“treatments” by chance, or by randomization. The purpose is to exclude other possible
explanations of an observed difference.
● We need probability to quantify the randomization. In practice, randomization is
implemented with a random number generator, for example in R.
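A minimal sketch of such a randomization in R, assuming 20 experimental units and two treatments labelled "A" and "B" (the numbers and labels are illustrative, not taken from the slides):

```r
# Randomly assign 20 experimental units to two treatments, 10 units each.
set.seed(1)                                        # only to make the sketch reproducible
units <- 1:20
treatment <- sample(rep(c("A", "B"), each = 10))   # random permutation of the labels
assignment <- data.frame(unit = units, treatment = treatment)
table(assignment$treatment)                        # 10 units per treatment
```

Permuting a fixed vector of labels keeps the groups balanced, whereas drawing each label independently would not guarantee equal group sizes.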




Examples, observational studies
1. To compare two fertilisers we prepare 20 plots of land, apply the first fertiliser to 10
randomly chosen plots and the second one to the remaining plots. We plant a crop and
measure the total yield from each plot.
2. To compare two web designs we randomly select 50 subjects and measure the time
needed to find some information. All 50 subjects perform this task with both designs, but
for each subject the order of the two designs is based on tossing a coin.
3. If an experiment involves subjects, then it could be wrong to assign “task A” to the first
10 subjects who arrive and “task B” to the last 10. (There may be a reason for arriving
early.) Instead assign the tasks at random. Then an observed difference is due to the
task (or chance).


a. Data obtained by registering an ongoing phenomenon, without randomization or
other controls, are called observational data.
4. The incidence of lung cancer among 500 smokers is observed to be higher than among
500 non-smokers. Does this finding generalize to the full population? Does this show
that smoking causes lung cancer?
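The randomization in examples 1 and 2 could be sketched in R as follows. The variable names are hypothetical, and `sample()` is used here as one possible way to implement the "chance element":

```r
set.seed(2)   # reproducibility of the sketch only

# Example 1: choose 10 of the 20 plots at random for the first fertiliser;
# the remaining 10 plots receive the second fertiliser.
plots <- 1:20
first_fertiliser <- sort(sample(plots, 10))
second_fertiliser <- setdiff(plots, first_fertiliser)

# Example 2: a fair "coin toss" per subject decides which design comes first.
design_order <- sample(c("AB", "BA"), size = 50, replace = TRUE)
table(design_order)
```

In example 4 no such step exists: smokers and non-smokers are not assigned by the experimenter, which is why the data are observational.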

Probability distributions: continuous, discrete
● A probability distribution P determines the probability of different outcomes of a
random variable.
● Probability distributions for:
○ discrete random variables, which have finite or countably infinite sets of possible
outcome values (e.g., dice, coins, birthdays);
○ continuous random variables, which have uncountably infinite sets of possible outcome
values (e.g., temperature, length).
● The corresponding probability distributions are called discrete and continuous, respectively.
● Note: There are distributions which are neither continuous nor discrete.

Probability density functions
● Examples of the probability density p of some continuous distributions
(also available in R, with some default parameter values):
○ normal distribution norm with parameters μ (mean=0) and σ (sd=1)
○ exponential distribution exp with parameter λ (rate=1)
○ uniform distribution unif with parameters minimum (min=a) and
maximum (max=b) of the support interval
○ Gamma distribution gamma with parameters shape (shape) and
rate (rate=1).
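As a sketch, the densities above can be evaluated in R with the `d` prefix (`dnorm`, `dexp`, `dunif`, `dgamma`). The evaluation points, the uniform interval [0, 3] and the Gamma shape 2 are illustrative assumptions:

```r
x <- c(0.5, 1, 2)                                # illustrative evaluation points

d_norm  <- dnorm(x, mean = 0, sd = 1)            # standard normal density
d_exp   <- dexp(x, rate = 1)                     # exponential density, lambda = 1
d_unif  <- dunif(x, min = 0, max = 3)            # uniform density on [0, 3]
d_gamma <- dgamma(x, shape = 2, rate = 1)        # Gamma density, shape chosen for illustration

# Sanity checks against the closed-form densities
all.equal(dnorm(0), 1 / sqrt(2 * pi))
all.equal(d_exp, exp(-x))
```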

Probabilities of events – continuous distribution
● If a random variable X has a distribution with the density p(x), then for any interval I = [a, b],

$$P(X \in I) = P(a \le X \le b) = \int_a^b p(x)\,dx$$

● In other words, the probability of an outcome in some interval I is the area under
the density function p(x) over that interval.
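The "area under the density" reading can be checked numerically. The sketch below assumes an exponential distribution with rate 1 and the interval [0.5, 2] (both illustrative) and compares numerical integration with the built-in distribution function:

```r
a <- 0.5
b <- 2

# Area under the exponential density between a and b, by numerical integration
area <- integrate(dexp, lower = a, upper = b, rate = 1)$value

# The same probability from the cumulative distribution function
exact <- pexp(b, rate = 1) - pexp(a, rate = 1)

c(area = area, exact = exact)
```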






● Example. For X ∼ N(0,1),




● For continuous distributions it does not matter whether an event is written with
< or ≤ (or with > or ≥), because P(X = x) = 0 for every single value x.
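A sketch of typical calculations for X ∼ N(0,1) using `pnorm`. The cut-off values 1.96 and ±1 are illustrative choices, not the numbers from the slides:

```r
p_left  <- pnorm(1.96)             # P(X <= 1.96), the same as P(X < 1.96)
p_mid   <- pnorm(1) - pnorm(-1)    # P(-1 <= X <= 1)
p_right <- 1 - pnorm(1.96)         # P(X > 1.96), the same as P(X >= 1.96)

round(c(p_left, p_mid, p_right), 3)
```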

Location and scale, normal density
● Two important characteristics of a population are location
(or mean) µ and scale (or standard deviation) σ.
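● The normal density curve is given by

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

● The parameters µ and σ are the location and scale. Normal
distributions with different µ and σ have the same shape up to shifting and
rescaling: if Z ∼ N(0,1), then µ + σZ is normal with location µ and scale σ.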
● Note: The normal curve is very specific! There are many
“bell shaped” curves that are not normal.
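The role of location and scale can be illustrated by coding the normal density by hand and comparing it with `dnorm`. The helper `normal_density` and the values µ = 2, σ = 1.5 are hypothetical and only serve the sketch:

```r
# Normal density written out from the formula
normal_density <- function(x, mu = 0, sigma = 1) {
  exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
}

x <- seq(-3, 7, by = 0.5)
mu <- 2
sigma <- 1.5

manual   <- normal_density(x, mu, sigma)
builtin  <- dnorm(x, mean = mu, sd = sigma)
rescaled <- dnorm((x - mu) / sigma) / sigma   # standard normal, shifted and rescaled

all.equal(manual, builtin)
all.equal(builtin, rescaled)
```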

Other symmetric and asymmetric densities




Probabilities and quantiles
● If a random variable X is distributed according to a density curve, the probability P(X ≤ u)
is the (red) area under the density curve left of u.
● Likewise, P(X ≥ u) is the (green) area under the density curve right of u.






● For a distribution P, the quantile of level α ∈ (0, 1) is the number qα such that P(X ≤ qα) =
α; the upper quantile of level α is the number uα such that P(X ≥ uα) = α.
● For the standard normal distribution, the quantile and upper quantile are usually denoted
by ξα and zα.
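In R, quantiles are obtained with the `q` prefix. A sketch for the standard normal distribution, with the level α = 0.05 chosen for illustration:

```r
alpha <- 0.05

q_alpha <- qnorm(alpha)                        # quantile: P(X <= q_alpha) = alpha
u_alpha <- qnorm(alpha, lower.tail = FALSE)    # upper quantile: P(X >= u_alpha) = alpha

pnorm(q_alpha)                                 # recovers alpha
pnorm(u_alpha, lower.tail = FALSE)             # recovers alpha
all.equal(u_alpha, qnorm(1 - alpha))           # upper quantile equals the (1 - alpha)-quantile
```

For a continuous distribution the upper quantile of level α is the quantile of level 1 − α, and for the symmetric standard normal it also equals −qα.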

Probability of events – discrete distribution
● For discrete distributions we have a probability mass function p
○ p(x) = P(X = x).
● The probability of an outcome in some set A is the sum

$$P(X \in A) = \sum_{x \in A} p(x)$$

● Examples of discrete distributions are the binomial and Poisson distributions.




Probability mass functions for some discrete distributions
● Discrete distributions (realised also in R):
○ Binomial distribution binom with parameters n (size) and p (prob)
○ Poisson distribution pois with parameter λ (lambda)
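A sketch of these probability mass functions in R, compared with their standard closed forms. The parameter values n = 10, p = 0.3, λ = 2 and the set A = {2, 3, 4} are illustrative assumptions:

```r
n <- 10
p <- 0.3
k <- 0:n

pmf_binom    <- dbinom(k, size = n, prob = p)
manual_binom <- choose(n, k) * p^k * (1 - p)^(n - k)   # binomial formula

lambda <- 2
j <- 0:5
pmf_pois    <- dpois(j, lambda = lambda)
manual_pois <- exp(-lambda) * lambda^j / factorial(j)  # Poisson formula

# Probability of a set A is the sum of the mass function over A
A <- c(2, 3, 4)
prob_A <- sum(dbinom(A, size = n, prob = p))
prob_A
```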




Cumulative distribution/probability function
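● The cumulative distribution function (CDF) (sometimes also called cumulative
probability function) of a random variable X is F(u) = P(X ≤ u). In R it is computed as
pdist(u,par), where dist stands for the distribution name (e.g. pnorm, pbinom); this
works for both continuous and discrete distributions.

● Continuous distr.: $$F(u) = \int_{-\infty}^{u} p(x)\,dx$$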
● Any other probability can be computed via F(u), e.g., for any a ≤ b,
P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a) = F(b) − F(a).
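The identity P(a < X ≤ b) = F(b) − F(a) can be checked in R. The sketch assumes a binomial distribution with n = 10, p = 0.3 and the bounds a = 2, b = 5, all chosen for illustration:

```r
a <- 2
b <- 5

# Via the cumulative distribution function: F(b) - F(a)
via_cdf <- pbinom(b, size = 10, prob = 0.3) - pbinom(a, size = 10, prob = 0.3)

# Via the mass function: sum over the outcomes a < x <= b, i.e. 3, 4, 5
via_pmf <- sum(dbinom((a + 1):b, size = 10, prob = 0.3))

all.equal(via_cdf, via_pmf)
```

For a discrete distribution the strictness of the inequalities matters: the outcome a itself is excluded and b is included.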




tararoopram Vrije Universiteit Amsterdam