Experimental Design and Analysis - Summary Slides

A summary of all the slides for the course Experimental Design and Analysis, MSc AI.

  • December 31, 2024
  • 76 pages
  • 2023/2024
Experimental Design and Data Analysis - Summary


Lecture 0
What is experimental design?
● Experiments are performed with varied preconditions, represented by independent
variables (also referred to as input variables or predictor variables).
● The change in the predictors is hypothesized to result in a change in one or more
dependent variables (also referred to as output or response variables).
● The experimental design may also identify control variables that must be held constant
to prevent external factors from affecting the results.
● Experimental design also involves planning the experiment under statistically optimal
conditions given the constraints of available resources.
● Main concerns in experimental design: validity, reliability, replicability, and achieving
appropriate levels of statistical power and sensitivity.
● Ronald Fisher: The Arrangement of Field Experiments (1926) and The Design of
Experiments (1935).

Experimental design, randomization
● Statistics allows us to generalize from data to a true state of nature, but statistical
inference requires assumptions and mathematical modeling.
● The data should be obtained by a carefully designed experiment (or at least it must be
possible to think about the data in this way).
● Any good design involves a chance element: “experimental units” are assigned to
“treatments” by chance, or by randomization. The purpose is to exclude other possible
explanations of an observed difference.
● We need probability to quantify the randomization. In practice, randomization is
implemented with a random number generator. In R:
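A minimal sketch of such a randomization, assuming 20 experimental units and two equally sized treatment groups. The seed, the number of units and the variable names are illustrative choices, not taken from the slides.

```r
# Fix the seed so that the random assignment can be reproduced.
set.seed(1)

n_units <- 20   # hypothetical number of experimental units

# Shuffle a balanced vector of treatment labels: 10 units per treatment.
treatment <- sample(rep(c("A", "B"), each = n_units / 2))

assignment <- data.frame(unit = seq_len(n_units), treatment = treatment)
print(table(assignment$treatment))   # each treatment is used exactly 10 times
```

Shuffling a balanced label vector keeps the group sizes equal, whereas tossing a coin per unit would generally give unequal groups.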




Examples, observational studies
1. To compare two fertilisers we prepare 20 plots of land, apply the first fertiliser to 10
randomly chosen plots and the second one to the remaining plots. We plant a crop and
measure the total yield from each plot.
2. To compare two web designs we randomly select 50 subjects and measure the time
needed to find some information. All 50 subjects perform this task with both designs, but
for each subject the order of the two designs is based on tossing a coin.
3. If an experiment involves subjects, then it could be wrong to assign “task A” to the first
10 subjects who arrive and “task B” to the last 10. (There may be a reason for arriving
early.) Instead assign the tasks at random. Then an observed difference is due to the
task (or chance).


a. Data obtained by registering an ongoing phenomenon, without randomization or
applying other controls, is called observational.
4. The incidence of lung cancer among 500 smokers is observed to be higher than among
500 non-smokers. Does this finding generalize to the full population? Does this show
that smoking causes lung cancer?
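The randomization in examples 1 and 2 could be sketched as follows. The seed and all variable names are assumptions made for illustration; only the numbers of plots and subjects come from the examples.

```r
set.seed(2)

# Example 1: choose 10 of the 20 plots at random for the first fertiliser.
plots <- 1:20
first_fertiliser  <- sort(sample(plots, 10))
second_fertiliser <- setdiff(plots, first_fertiliser)

# Example 2: one fair coin toss per subject decides which design comes first.
n_subjects <- 50
coin <- rbinom(n_subjects, size = 1, prob = 0.5)
first_design  <- ifelse(coin == 1, "A", "B")
second_design <- ifelse(coin == 1, "B", "A")
order_table <- data.frame(subject = seq_len(n_subjects),
                          first_design, second_design)
head(order_table)
```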

Probability distributions: continuous, discrete
● A probability distribution P determines the probability of different outcomes of a
random variable.
● Probability distributions for:
○ discrete random variables, which have finite or countably infinite sets of possible
outcome values (e.g., dice, coins, birthdays);
○ continuous random variables, which take values in a continuum such as an interval
of real numbers (e.g., temperature, length).
● The corresponding probability distributions are called discrete and continuous, respectively.
● Note: There are distributions which are neither continuous nor discrete.

Probability density functions
● Examples of the probability density p of some continuous distributions
(also available in R, with some default parameter values):
○ normal distribution norm with parameters μ (mean=0) and σ (sd=1);
○ exponential distribution exp with parameter λ (rate=1);
○ uniform distribution unif with parameters minimum (min=a) and
maximum (max=b) of the support interval;
○ Gamma distribution gamma with parameters shape (shape) and
rate (rate=1).
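In R the density of each of these distributions is obtained by prefixing the distribution name with d. A short sketch, in which the evaluation point and the Gamma shape value are arbitrary choices:

```r
x <- 0.5   # point at which each density is evaluated (arbitrary)

dens <- c(
  normal      = dnorm(x, mean = 0, sd = 1),
  exponential = dexp(x, rate = 1),             # R calls the parameter `rate`
  uniform     = dunif(x, min = 0, max = 1),
  gamma       = dgamma(x, shape = 2, rate = 1) # shape = 2 is an arbitrary value
)
print(dens)
```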

Probabilities of events – continuous distribution
● If a random variable X has a distribution with density p(x), then for any interval I = [a, b]
$$P(X \in I) = P(a \le X \le b) = \int_a^b p(x)\,dx.$$
● In other words, the probability of an outcome in an interval I is the area under
the density function p(x) over that interval.
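The "area under the density" statement can be checked numerically. The sketch below assumes an exponential distribution with rate 1 and the interval [0, 1], both chosen only for illustration.

```r
# Area under the exponential density (rate 1) over the interval [0, 1].
area <- integrate(function(x) dexp(x, rate = 1), lower = 0, upper = 1)$value

# The same probability in closed form: 1 - exp(-1).
exact <- 1 - exp(-1)

c(numerical = area, exact = exact)   # both are about 0.632
```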






● Example. For X ∼ N(0,1),




● For continuous distributions P(X = x) = 0 for every x, so in events it does not
matter whether we write < or ≤ (> or ≥).
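As an illustration for X ∼ N(0,1), a few interval probabilities can be computed with pnorm. The cut-off values 1 and 1.96 are my own choice and not necessarily the ones used in the slides.

```r
p_within_1 <- pnorm(1) - pnorm(-1)   # P(-1 <= X <= 1), about 0.683
p_below    <- pnorm(1.96)            # P(X <= 1.96),    about 0.975
p_above    <- 1 - pnorm(1.96)        # P(X > 1.96),     about 0.025

# P(X < 1.96) and P(X <= 1.96) coincide because P(X = 1.96) = 0.
c(p_within_1, p_below, p_above)
```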

Location and scale, normal density
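● Two important characteristics of a population are location
(or mean) μ and scale (or standard deviation) σ.
● The normal density curve is given by
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$$
● The parameters μ and σ are the location and scale. Normal
distributions with different μ and σ all have the same shape: if Z ∼ N(0,1),
then μ + σZ is normal with mean μ and standard deviation σ.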
● Note: The normal curve is very specific! There are many
“bell shaped” curves that are not normal.
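A small check of the normal density formula p(x) = exp(−(x−μ)²/(2σ²)) / (σ√(2π)) against R's built-in dnorm, together with the location-scale relation. The function name normal_density and the values μ = 1, σ = 2 are hypothetical choices for this sketch.

```r
# Normal density written out from the formula (hypothetical helper).
normal_density <- function(x, mu = 0, sigma = 1) {
  exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
}

x <- seq(-3, 3, by = 0.5)
max_diff <- max(abs(normal_density(x, mu = 1, sigma = 2) -
                    dnorm(x, mean = 1, sd = 2)))

# Location-scale relation: the N(mu, sigma^2) density at x equals the
# standard normal density at (x - mu) / sigma, divided by sigma.
rescaled <- dnorm((x - 1) / 2) / 2
max_diff   # zero up to rounding error
```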

Other symmetric and asymmetric densities




Probabilities and quantiles
● If a random variable X is distributed according to a density curve, the probability P(X ≤ u)
is the (red) area under the density curve to the left of u.
● Likewise, P(X ≥ u) is the (green) area under the density curve to the right of u.






● For a distribution P, the quantile of level α ∈ (0, 1) is the number q_α such that
P(X ≤ q_α) = α, and the upper quantile is the number u_α such that P(X ≥ u_α) = α.
● For the standard normal distribution, the quantile and upper quantile are usually denoted
by ξ_α and z_α.
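In R, quantiles are obtained by prefixing the distribution name with q. A sketch for the standard normal distribution, with the level α = 0.05 chosen arbitrarily:

```r
alpha <- 0.05

q_alpha <- qnorm(alpha)                       # quantile: P(X <= q_alpha) = alpha
u_alpha <- qnorm(alpha, lower.tail = FALSE)   # upper quantile: P(X >= u_alpha) = alpha

c(q_alpha, u_alpha)   # about -1.645 and 1.645
```

For a continuous distribution the upper quantile of level α equals the quantile of level 1 − α, so qnorm(1 - alpha) gives the same value as u_alpha.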

Probability of events – discrete distribution
● For discrete distributions we have a probability mass function p
○ p(x) = P(X = x).
● The probability of an outcome in some set A is the sum
$$P(X \in A) = \sum_{x \in A} p(x).$$
● Examples of discrete distributions are the binomial and Poisson distributions.
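The probability of a set A for a discrete distribution can be sketched as a sum of probability mass values. The binomial parameters and the set A = {2, 3, 4} below are arbitrary illustrative choices.

```r
# X ~ Binomial(n = 10, p = 0.3); the event A = {2, 3, 4} is an arbitrary example.
A <- 2:4
p_A <- sum(dbinom(A, size = 10, prob = 0.3))   # sum of the pmf over A

# The same event via the cumulative distribution: P(X <= 4) - P(X <= 1).
p_A_cdf <- pbinom(4, size = 10, prob = 0.3) - pbinom(1, size = 10, prob = 0.3)

c(p_A, p_A_cdf)
```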




Probability mass functions for some discrete distributions
● Discrete distributions (also available in R):
○ binomial distribution binom with parameters n (size) and p (prob);
○ Poisson distribution pois with parameter λ (lambda).
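The standard probability mass functions of these two distributions, C(n, x) pˣ (1 − p)ⁿ⁻ˣ for the binomial and e^(−λ) λˣ / x! for the Poisson, can be compared with R's dbinom and dpois. The parameter values in this sketch are arbitrary.

```r
n <- 10; p <- 0.3; lambda <- 2
k <- 0:10   # outcome values at which the pmfs are evaluated

# Probability mass functions written out from the formulas.
binom_manual <- choose(n, k) * p^k * (1 - p)^(n - k)
pois_manual  <- exp(-lambda) * lambda^k / factorial(k)

# Largest deviation from the built-in functions (zero up to rounding error).
binom_diff <- max(abs(binom_manual - dbinom(k, size = n, prob = p)))
pois_diff  <- max(abs(pois_manual - dpois(k, lambda = lambda)))
c(binom_diff, pois_diff)
```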




Cumulative distribution/probability function
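● The cumulative distribution function (CDF), sometimes also called the cumulative
probability function, of a random variable X is F(u) = P(X ≤ u), for both continuous
and discrete distributions. In R it is computed as pdist(u, par), where dist is the
name of the distribution (e.g., pnorm, pbinom).

● For continuous distributions:
$$F(u) = \int_{-\infty}^{u} p(x)\,dx.$$
● Any other probability can be computed via F(u), e.g., for any a ≤ b,
P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a) = F(b) − F(a).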
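The identity P(a < X ≤ b) = F(b) − F(a) can be sketched in R for one continuous and one discrete distribution. The distributions and the end points are arbitrary choices for illustration.

```r
# Continuous case: X ~ N(0, 1), interval (-1, 2].
a <- -1; b <- 2
via_cdf     <- pnorm(b) - pnorm(a)              # F(b) - F(a)
via_density <- integrate(dnorm, a, b)$value     # area under the density

# Discrete case: X ~ Poisson(2), P(1 < X <= 3) = P(X = 2) + P(X = 3).
pois_cdf <- ppois(3, lambda = 2) - ppois(1, lambda = 2)
pois_sum <- sum(dpois(2:3, lambda = 2))

c(via_cdf, via_density, pois_cdf, pois_sum)
```

In the discrete case the strict inequality at a matters: F(b) − F(a) excludes the outcome a itself.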



