Summary Statistics for Business and Economics 1


Summary Statistics for Business and Economics from chapter 1 to 11 in the book.

  • Chapters 1-11
  • 21 May 2022
  • 16 pages
  • Academic year 2021/2022
  • Summary
LukaBuggenhout
Statistics module 1: Data and decisions
1. What are data?
• Statistics is about data and decisions. It is a toolbox and a way of thinking -> its purpose is to gather data that are relevant to the problem.
• A statistic: a quantity calculated from data.
• Statistics: a collection of tools and the associated reasoning to model, summarize and understand data.
• Data: not just information, but a set of values that are measured or observed, along with their context.
• The "Five W's" (and How):
  o Who: the subjects or cases that the data describe.
  o What: the variables recorded for each case.
  o Where: where the values were recorded.
  o When: when the values were recorded.
  o How: how the values were recorded.
  o Why: why the values were recorded.
  o Who and What are essential parts of the data (the values); Where, When, How and Why are part of the metadata.
• Big data: data sets so large that traditional methods of storage and analysis are inadequate.
• Data mining -> when companies try to obtain actionable information from data that may have been collected in the course of doing business.
• Predictive analytics -> focuses on future performance.
• Business analytics -> any use of data and statistical analysis to inform business decisions.
• Metadata -> information about the data.
• Data warehouses: vast digital repositories where data are recorded and stored.
• Data -> Latin for "givens" (plural).
• Data are stored in a table -> the data table.
• Relational database -> two or more separate data tables are linked together so that information can be merged across them.
• Cases (who?) -> rows of a data table.
• Variables (what?) -> columns of a data table.
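The row/column layout and the linking of tables can be sketched in plain Python. This is only an illustration: the two tables, their contents and the field names (`customer_id`, `amount_eur`, and so on) are invented for the example and do not come from the book.

```python
# Hypothetical data: each dict is one case (row); each key is one variable (column).
customers = [
    {"customer_id": 1, "name": "Ann", "country": "BE"},
    {"customer_id": 2, "name": "Bob", "country": "NL"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount_eur": 25.0},
    {"order_id": 11, "customer_id": 2, "amount_eur": 40.0},
    {"order_id": 12, "customer_id": 1, "amount_eur": 15.5},
]

def join_on(left, right, key):
    """Merge two data tables on a shared identifier variable."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]

# Every order (case) now also carries the customer's variables.
merged = join_on(orders, customers, "customer_id")
for row in merged:
    print(row)
```

The identifier `customer_id` plays the role of the link between the tables: it carries no quantity itself, it only tells which rows belong together.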

2. Variable types
• Categorical: values are names of categories.
  o Nominal -> just a label without a particular order. Example -> the source of a website visit: Google, direct.
  o Ordinal -> labels with a given order. Example -> steak: rare, medium, well done.
  o Identifier -> assigns a unique code to each individual case. Special case of a nominal categorical variable.
• Quantitative: values are numerical quantities that record measurements or amounts, and they must have units.
• Cross-sectional data -> several units are recorded at a single point in time. Example -> Starbucks locations at the end of 2018.
• Time series data -> information on one subject is collected at different points in time -> if there is displacing, then there is no time series. Example -> number of Starbucks locations in the world for each year.


Statistics module 2: Displaying and describing categorical Data
3. Displaying and describing categorical data introduction
• Descriptive statistics: summarizing and displaying data.
• Instead of measuring the whole population, it is usually more practical to take a sample from the population and make inferences about the population from that sample.
• Inferential statistics: making inferences about a population from a randomly chosen sample.
• Display and summarize data to see patterns, relationships, and exceptions in values and observations.

4. Summarizing a categorical variable

IP number (categorical, nominal)   Time (quantitative)       Source (categorical, nominal)
245.240.221.71                     1/feb/2013 at 13:15:08    Google
196.345.281.51                     1/feb/2013 at 14:56:23    direct
• Frequency table: records the counts for each of the categories of a categorical variable, i.e. how often each category occurs.
 Absolute frequency -> Count of the number of cases in each category.
 Relative frequency -> count of the number of cases in each category divided by the total number of cases.

Source   Absolute frequency (count)   Relative frequency (%)
Google   130158                       57.36
Direct   52969                        23.34
….       ….                           ….
Total    226925                       100

 Express frequency as a percentage: Compute the proportion: 130158/226925 ≈ 0.5736
 Percentage = proportion x 100% -> 0.5736 x 100% = 57.36%
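The same arithmetic as a minimal Python sketch. The function name `relative_frequency` is just a label chosen here; the counts are the Google and Direct rows and the total from the frequency table.

```python
def relative_frequency(count, total):
    """Return the relative frequency of one category as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total * 100

total_visits = 226925  # total number of cases in the frequency table
google = relative_frequency(130158, total_visits)
direct = relative_frequency(52969, total_visits)

print(f"Google: {google:.2f}%")  # Google: 57.36%
print(f"Direct: {direct:.2f}%")  # Direct: 23.34%
```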

5. Displaying a frequency table
• Bar chart -> displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison.
  o Keep the bars proportional to the counts, leave gaps between the bars to indicate separate categories, and label the axes.
• Pie chart -> shows how the whole group breaks into several categories: all the cases form a circle sliced into pieces whose areas are proportional to the fraction of cases in each category.
• A pie chart gives less information and makes it hard to see what the story is. Bar charts are better for representing frequency tables.
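A rough text-only sketch of a proportional bar chart, using the two source counts that appear in the frequency table. The function `text_bar_chart` and its `width` parameter are made up for this illustration; a real chart would normally come from a plotting library.

```python
def text_bar_chart(counts, width=40):
    """Return one line per category; bar length is proportional to the count."""
    largest = max(counts.values())
    lines = []
    for category, count in counts.items():
        bar = "#" * round(count / largest * width)
        lines.append(f"{category:<8} {bar} {count}")
    return lines

counts = {"Google": 130158, "Direct": 52969}
lines = text_bar_chart(counts)
for line in lines:
    print(line)
```

Each category gets its own line, which plays the role of the gap between bars, and the count printed next to the bar serves as the axis label.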

6. Two categorical variables
• Example: survey of 5039 people in 5 countries: "Do you use social network sites?"
• Data table:

Respondent ID (identifier)   Social networking (categorical)   Country (categorical)
0001                         Yes                               Egypt
0002                         No access                         Egypt
….                           ….                                ….
5039                         Yes                               US

• Frequency table:

Social networking   Count   Relative frequency
No                  1249    24.79%
Yes                 2175    43.16%
No access           1615    32.05%
Total               5039    100%


7. Contingency table
• Contingency table: lists the counts for every combination of values of the two categorical variables. It shows how individuals are distributed along one variable depending on the value of the other variable.

            GB     EG     DE     RU     US     Total
No          336    70     460    90     293    1249
Yes         529    300    340    500    506    2175
No access   153    630    200    420    212    1615
Total       1018   1000   1000   1010   1011   5039


• Marginal distribution: the frequency distribution of one of the variables on its own, given by the totals in the margins of the table.
• Each cell of the contingency table gives the count for a combination of values of the 2 variables.
• For every cell, we can compute three percentages (here for the cell Egypt/Yes, count 300):
  o Total percentage = 300/5039 x 100% ≈ 6.0% -> 6.0% of all respondents are from Egypt and answered Yes.
  o Row percentage = 300/2175 x 100% ≈ 13.8% -> 13.8% of the respondents who answered Yes are from Egypt.
  o Column percentage = 300/1000 x 100% = 30.0% -> 30.0% of the respondents from Egypt answered Yes.
• To compare social networking from country to country -> use column percentages.
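The three percentages can be checked with a short Python sketch. The variable and function names are chosen for this example only. One assumption is labeled in the code: the GB/"No" cell is taken as 336, the value that agrees with the stated row total (1249) and column total (1018).

```python
countries = ["GB", "EG", "DE", "RU", "US"]
# Counts per answer, in the column order of `countries`.
# Assumption: GB/"No" = 336, the value consistent with the stated totals.
table = {
    "No":        [336, 70, 460, 90, 293],
    "Yes":       [529, 300, 340, 500, 506],
    "No access": [153, 630, 200, 420, 212],
}

# Marginal distributions: totals per row (answer) and per column (country).
row_totals = {answer: sum(counts) for answer, counts in table.items()}
col_totals = {c: sum(table[a][i] for a in table) for i, c in enumerate(countries)}
grand_total = sum(row_totals.values())

def cell_percentages(answer, country):
    """Return (total %, row %, column %) for one cell of the table."""
    count = table[answer][countries.index(country)]
    return (
        100 * count / grand_total,
        100 * count / row_totals[answer],
        100 * count / col_totals[country],
    )

total_pct, row_pct, col_pct = cell_percentages("Yes", "EG")
print(f"{total_pct:.1f}% {row_pct:.1f}% {col_pct:.1f}%")  # 6.0% 13.8% 30.0%
```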

Statistics Module 3: Displaying and describing quantitative data
1. Displaying quantitative variables
• Quantities you can count: discrete.
• Quantities that can be measured: continuous.
• E.g. monthly stock price ($) of AIG, 2002-2007.


Month      AIG stock price
Jan 2002   $77.26
Feb 2002   $72.95
Mar 2002   $73.72
….         ….
Nov 2007   $56.86
Dec 2007   $58.13

• How are the values of the price distributed? -> construct a frequency table.
• To construct a frequency table:
1) Sort the values from small to large.
2) Decide which bins you will use, e.g. $45-$50, $50-$55, $55-$60, $60-$65, $65-$70, $70-$75, $75-$80.
   • Values that fall on a boundary go into the higher bin: e.g. bin $45-$50 includes $45 but not $50 (left closed, right open): [$45, $50[
3) Count the number of cases that fall into each bin:

AIG stock price   Count (absolute frequency)   Relative frequency   Density (per $)
$45-$50           2                            2.8%                 0.0056
$50-$55           3                            4.2%                 0.0084
$55-$60           13                           18.1%                0.0362
$60-$65           16                           22.2%                0.0444
$65-$70           24                           33.3%                0.0667
$70-$75           13                           18.1%                0.0362
$75-$80           1                            1.4%                 0.0028
Total             72                           100% (the rounded values add up to 100.1%)   /
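The binning rule (left closed, right open) can be sketched in Python as follows. The function names and the `start`/`width` parameters are choices made for this example. The five prices are only the rows visible in the stock price table, not the full 72 months, so the counts do not reproduce the frequency table above.

```python
import math

def bin_of(value, start=45, width=5):
    """Return the left-closed, right-open bin [low, high[ that contains value."""
    low = start + width * math.floor((value - start) / width)
    return (low, low + width)

def frequency_table(values, start=45, width=5):
    """Count how many values fall into each bin, sorted by bin."""
    counts = {}
    for v in values:
        b = bin_of(v, start, width)
        counts[b] = counts.get(b, 0) + 1
    return dict(sorted(counts.items()))

prices = [77.26, 72.95, 73.72, 56.86, 58.13]  # the five prices listed in the notes
table = frequency_table(prices)
for (low, high), count in table.items():
    print(f"${low}-${high}: {count}")

# A boundary value belongs to the higher bin: 50 falls in [50, 55[, not [45, 50[.
print(bin_of(50))  # (50, 55)
```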
• Before making a histogram, check the quantitative data condition: the data must be values of a quantitative variable whose units are known. A bar chart and a histogram look similar, but they are not the same: you can't display categorical data in a histogram, and histograms don't have gaps between the bars, whereas bar charts do.
• A bar chart of a frequency table of quantitative data is a histogram; a bar chart of a relative frequency table of quantitative data is a relative frequency histogram.
• When you look at a histogram, look for three characteristics:
1) Shape: symmetry vs skew, bumps and valleys, gaps
2) Center
3) Spread




2. Data density
• Density histogram: the area of each bar represents the relative frequency.
• Area = height x width, so relative frequency = height x width of bin.
• Height (density) = relative frequency / width of bin.
• E.g. for $55-$60: height = 18.1% / $5 = 3.62% per $, and area = 3.62%/$ x $5 = 18.1% -> in the $55-$60 bin, every interval of $1 wide contains about 3.62% of the values.
• Density is usually expressed as a decimal fraction per horizontal unit: 3.62%/$ = 3.62/100 per $ = 0.0362/$.
• [Figure: density histogram; the $55-$60 bar has height 3.62%/$ and area 18.1%]
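A small Python sketch of the density calculation, using the counts from the AIG frequency table. The dictionary keys and variable names are chosen for this example. It works from the exact counts, so the $55-$60 density comes out as 0.0361 rather than the 0.0362 obtained from the rounded 18.1%.

```python
bin_width = 5  # dollars; every bin in the table is $5 wide
counts = {"45-50": 2, "50-55": 3, "55-60": 13, "60-65": 16,
          "65-70": 24, "70-75": 13, "75-80": 1}
n = sum(counts.values())

# Density = relative frequency / bin width, as a decimal fraction per dollar.
density = {b: c / n / bin_width for b, c in counts.items()}

# Each bar's area (density x width) is its relative frequency,
# so all areas together add up to 1.
total_area = sum(d * bin_width for d in density.values())

print(round(density["55-60"], 4))  # 0.0361
print(round(total_area, 10))       # 1.0
```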




3. Shape
 You should pay attention to 3 things: shape, center and spread.
• Mode:
o A single mode (e.g. the bin $65-$70): unimodal distribution -> the IQR should be larger than the standard deviation; if it is not, check whether the distribution is skewed or multimodal.
o Two modes: bimodal distribution
o Three or more modes: multimodal distribution
o No clear mode: uniform distribution
• Symmetry: a distribution is symmetric when you can fold the histogram in the middle and the 2 sides almost match (mirror images).
o Skewed to the left: the left side has the longer "tail"
o Skewed to the right: the right side has the longer "tail"
• Outliers: values that stick out -> they tell us something interesting about the data and can be the most informative part of your data.
• Center: what is the typical stock price? -> about $65 -> for a more precise number, calculate the average (mean).
• Mean (average) = (sum of all values) / (number of values): $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$

• The mean is sensitive to skewness. If the distribution is right skewed, the mean will typically be larger than the median; if it is left skewed, the mean will typically be smaller than the median.
• Adolphe Quetelet (1796-1874): popularized the average as a way to describe a population (the "average man").
• Median = the value that splits the histogram into two equal areas. It is resistant to unusual observations (outliers), which makes it the better choice for variables that are likely to be skewed, such as cost or income.
1) Order the values
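The remaining steps are not shown in this preview. As a hedged sketch, the standard definition of the median (middle value of the ordered data, or the average of the two middle values when the count is even) can be compared with the mean on skewed data. The income figures are invented for the illustration.

```python
def mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical right-skewed data: one unusually large value.
incomes = [20, 22, 25, 27, 30, 31, 250]

print(median(incomes))                           # 27
print(mean(incomes) > median(incomes))           # True: the outlier pulls the mean up
print(median([20, 22, 25, 27, 30, 31, 2500]))    # 27: the median does not move
```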
