College notes Data Analysis for EBE (30K215-B-6) for midterm
Course: Data Analysis for EBE (Dutch: Data-Analyse voor EBE, 30K215B6)
Institution: Tilburg University (UVT)
This document contains all lecture slides + the lecturer's explanations (very detailed) + R code (with an explanation of what everything means and how to get the output)
Intro:
•Statistics and business decisions
•Forecasts and/or their verification needed:
•costs of inputs (electricity, raw materials, ...)
•profit forecast of a company
•expected wage
•First-year course: estimation and inference for a single variable
•The above quantities often depend on other factors/quantities (some of which the company controls itself)
•costs depend on the demand and supply
•profit on the competition, margins, advertisement, ...
•wage on a person’s qualification, field of industry, ...
•This course: Interest in relating two or more measures/variables with each other
Motivation:
Examples – relationships between:
•Y= ‘weekly sales (€) of a clothing shop’ and
X= ‘weekly amount (€) spent on ads’
•Y= ‘weekly sales (units, €) of an ebook’ and
X= ‘price (€) of the ebook’ (controlled by the company)
•Y = ‘hourly wage of an employee’ and
X= ‘attained education level’
•Y= ‘productivity of a worker’ and
X= ‘number of months of experience of that worker’
•Y= ‘final grade of a student for this course’ and
X= ‘number of lectures/tutorials visited’
•Y= ‘weekly return of share Microsoft’ and
X = ‘weekly return of the Dow Jones stock index’
•.......
Introduction:
Case: relationship between ebook sales ($) and ebook price ($) for ebooks published and sold by Amazon;
Period: January 2016; ebooks selling more than 1000 copies per day
Amazon is a (e)bookseller, but also an ebook publisher
Is the number of sold ebooks independent of their price?
Relation: Y = the number of sold ebooks, X = ebook price
Do the dollar sales of ebooks depend on their price?
Relation: Y = the dollar sales of ebook, X = ebook price
Which of the two quantities is more price-sensitive?
Dataset: AmazonSales.sav (to be continued)
5.1 Scatter plot, Covariance and Correlation
Quantitative variables X and Y
Questions:
-Are the y-data related to the corresponding x-data?
-Is it a linear relationship, that is, do the pairs of observations concentrate around a straight line?
-What is the degree (strength) of that linear relationship?
Objectives:
-Construct a plot visually relating the y-data and x-data.
-Find measures of association that measure the strength of the linear relationship.
Scatter plot: a plot that depicts the pairs $(x_i, y_i)$, $i = 1, \dots, n$
In the first graph, the points are close to the line. This means that the line approximates the increasing relationship very well. In the second graph, the points lie further from the line. This means that the line does not approximate the decreasing relationship very well. In the third graph, the mountain-shaped pattern implies that the growth is stagnating. A linear relationship (black line) might not be the right form of dependence. Based on the data, you have to determine which relationship fits best. In the fourth graph there is a weak positive relationship: the higher the GNI per capita, the higher the happiness (looking at the data, it is very hard to say what kind of relationship it is).
General objectives:
-Single variable X: how can we measure/quantify the location and variation? Does it generalize to
multiple variables Y and X?
What we want to study: how strongly are X and Y related to each other, and how much does Y move when X changes? For that, we first need measures of association (see next line!)
-First, we study measures of association between Y and X: covariance and correlation.
-Later, we study the variation of a variable Y.
-We explain this variation by the dependence of Y on some other variable X.
[Ebook case: dependence of sales Y on price X]
-Consider regression of Y on X with
Y being the dependent variable and
X being the independent variable.
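Covariance:
Interpretation: the higher the sales (x-axis), the higher the profits (y-axis). The averages $\bar{x}$ and $\bar{y}$ split the plot into four areas. Area 1 lies above both averages and area 3 lies below both averages. A positive relationship means that most points lie in area 1 (high sales, high profits) or in area 3 (low sales, low profits). For a point in area 1, both deviations $x_i - \bar{x}$ and $y_i - \bar{y}$ are positive; in area 3, both are negative. In both cases their product is > 0. In areas 2 and 4, one deviation is positive and the other negative, so the product is < 0. Adding up these products therefore gives a positive number when most points lie in areas 1 and 3, and a negative number when most lie in areas 2 and 4.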
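The sign argument above can be checked on a few numbers. This is a minimal Python sketch with made-up sales and profit figures (not course data). It assumes the areas are numbered counterclockwise starting top-right: 1 = above both averages, 2 = top-left, 3 = below both averages, 4 = bottom-right. The helper name `quadrant_products` is hypothetical.

```python
# Made-up toy data (not from the course): weekly sales (x) and profits (y).
sales = [10, 12, 15, 18, 20]
profits = [2, 3, 5, 6, 9]


def quadrant_products(x, y):
    """Return (area, (x_i - x_bar) * (y_i - y_bar)) for each pair.

    Assumed numbering: 1 = above both means, 2 = left of x_bar but above
    y_bar, 3 = below both means, 4 = right of x_bar but below y_bar.
    Area 0 marks a point lying on one of the mean lines.
    """
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    result = []
    for xi, yi in zip(x, y):
        dx, dy = xi - x_bar, yi - y_bar
        if dx > 0 and dy > 0:
            area = 1  # both deviations positive -> product > 0
        elif dx < 0 and dy > 0:
            area = 2  # mixed signs -> product < 0
        elif dx < 0 and dy < 0:
            area = 3  # both deviations negative -> product > 0
        elif dx > 0 and dy < 0:
            area = 4  # mixed signs -> product < 0
        else:
            area = 0  # on a mean line -> product is 0
        result.append((area, dx * dy))
    return result


for area, prod in quadrant_products(sales, profits):
    print(f"area {area}: product of deviations = {prod}")
```

With this toy data, the points fall in areas 3 and 1, plus one point exactly on both mean lines. Every product is therefore ≥ 0, which is the pattern of a positive relationship.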
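Definitions of covariance (a single number, which can be quicker and handier than a scatter plot!):
Sample dataset $(x_1, y_1), \dots, (x_n, y_n)$:
sample covariance $s_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$
Population dataset $(x_1, y_1), \dots, (x_N, y_N)$:
population covariance $\sigma_{XY} = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu_X)(y_i - \mu_Y)$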
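The two definitions translate directly into code. This is a minimal Python sketch in which the function names and the toy data are hypothetical, not taken from the course materials. The only difference between the two functions is the divisor: n − 1 for a sample and N for a population.

```python
# Made-up toy data (not from the course): weekly sales (x) and profits (y).
sales = [10, 12, 15, 18, 20]
profits = [2, 3, 5, 6, 9]


def _sum_of_products(x, y):
    """Sum of (x_i - mean_x) * (y_i - mean_y) over all pairs."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))


def sample_covariance(x, y):
    """s_XY: divide by n - 1 (x and y are a sample)."""
    n = len(x)
    if n < 2:
        raise ValueError("a sample covariance needs at least 2 observations")
    return _sum_of_products(x, y) / (n - 1)


def population_covariance(x, y):
    """sigma_XY: divide by N (x and y are the whole population)."""
    return _sum_of_products(x, y) / len(x)


print(sample_covariance(sales, profits))      # sum of products 44 divided by 4
print(population_covariance(sales, profits))  # sum of products 44 divided by 5
```

R's built-in `cov(x, y)` also divides by n − 1, so it corresponds to `sample_covariance` here.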
Remarks:
-notice the different notations (sample: Latin letters, population: Greek letters)
-$s_{XY}$ is often used as an estimate of the unknown $\sigma_{XY}$
-dividing by n − 1 generally leads to better estimates (it makes $s_{XY}$ an unbiased estimator of $\sigma_{XY}$)
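The remark about dividing by n − 1 can be checked exactly on a tiny example. This is a minimal Python sketch that makes two assumptions: the population consists of three made-up (x, y) pairs, and samples of size n = 2 are drawn with replacement (independent draws). Under these assumptions, all ordered samples are equally likely, so averaging over every one of them gives the exact expected value. The function name `covariance` and its `ddof` ("delta degrees of freedom") parameter are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population of N = 3 (x, y) pairs, made up for illustration.
population = [(1, 2), (2, 1), (4, 5)]


def covariance(pairs, ddof):
    """Exact covariance: sum of products of deviations / (n - ddof)."""
    n = len(pairs)
    mean_x = Fraction(sum(x for x, _ in pairs), n)
    mean_y = Fraction(sum(y for _, y in pairs), n)
    total = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return total / (n - ddof)


sigma_xy = covariance(population, ddof=0)  # population covariance: divide by N

n = 2
# Every ordered sample of size n drawn with replacement is equally likely,
# so the plain average over all of them is the exact expected value.
samples = list(product(population, repeat=n))
expected_s = sum(covariance(s, ddof=1) for s in samples) / len(samples)
expected_biased = sum(covariance(s, ddof=0) for s in samples) / len(samples)

print("sigma_XY                  =", sigma_xy)
print("average s_XY (divide n-1) =", expected_s)
print("average when dividing n   =", expected_biased)
```

Averaged over all samples, the n − 1 version hits $\sigma_{XY}$ exactly. Dividing by n falls short by the factor (n − 1)/n, which is 1/2 for n = 2.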
See area 1: both deviations are positive, so this observation lies in area 1. See area 3: both deviations are negative, so the observation lies in area 3. Etc.
Notes by Stuvia seller Economiestudentje.