Summary Statistics for Business and Economics 1

Summary Statistics for Business and Economics from chapter 1 to 11 in the book.


  • No
  • Chapter 1- 11
  • May 21, 2022
  • 16
  • 2021/2022
  • Summary
Statistics module 1: Data and decisions
1. What are data?
 Statistics is about data and decisions based on quantities calculated from data. It is both a toolbox and a way of thinking -> its purpose is to gather data that is relevant to the problem.
 Statistics: A collection of tools and the associated reasoning to model, summarize and understand data.
 Data: not just information, but a set of values that are measured or observed, together with their context.
 The “Five W’s”
 Who: the subjects or cases that the data describe -> the Who and the What are the essential part of the data (the values).
 What: the variables recorded for each case.
 Where: where the values were recorded -> the Why, When, Where and How are part of the metadata.
 When: when the values were recorded.
 How: how the values were recorded.
 Why: why the values were recorded.
 Big data: Data sets so large that traditional methods of storage and analysis are inadequate.
 Data mining -> when companies try to obtain actionable information from data that may have been collected in the course of doing business.
 Predictive analytics -> focuses on future performance.
 Business analytics -> any use of data and statistical analysis to inform business decisions.
 Metadata -> information about data
 Data warehouses: vast digital repositories where data is recorded and stored.
 Data -> from Latin: "givens" (plural of datum).
 Data is stored in a table -> data table.
 Relational database -> two or more separate data tables are linked together so that information can be merged across them.
 Cases (who?) -> rows of a data table.
 Variables (what?) -> columns of a data table.
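
The idea of linking two data tables through a shared identifier can be sketched in Python. The tables, column names and values below are hypothetical and only illustrate the principle; they do not come from the notes.

```python
# Hypothetical data tables (illustration only): each dict is one case (row),
# each key is one variable (column).
customers = [
    {"customer_id": 1, "name": "Ann"},
    {"customer_id": 2, "name": "Bob"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 25.0},
    {"order_id": 11, "customer_id": 2, "amount": 40.0},
    {"order_id": 12, "customer_id": 1, "amount": 15.0},
]

# Look up customers by their identifier variable.
name_by_id = {c["customer_id"]: c["name"] for c in customers}

# Merge: add the customer's name to every order via the shared identifier.
merged = [{**o, "name": name_by_id[o["customer_id"]]} for o in orders]

for row in merged:
    print(row)
```

The identifier variable (here `customer_id`) is what makes the merge possible, which is why identifiers matter even though they carry no quantitative meaning.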

2. Variable types
 Categorical: values are names of categories.
 Nominal -> just a label without a particular order. Some nominal variables are used as identifiers.
 Ordinal -> labels with a given order. Example -> steak: rare, medium, well done.
 Identifier -> purpose is to assign a unique identifier code to each individual. Special case of nominal categorical variables.
 Quantitative: values are numerical quantities with units -> the variable records measurements or amounts, so each value must be a number and must have units.
 Cross-sectional data -> values recorded at a single point in time for several different units. Example -> Starbucks locations at the end of 2018.
 Time series data -> values recorded for 1 subject at different points in time -> if there is displacing, then there is no time series. Example -> number of Starbucks locations in the world for each year.


Statistics module 2: Displaying and describing categorical Data
3. Displaying and describing categorical data introduction
 Descriptive statistics: summarizing and displaying data.
 In statistics, instead of measuring the whole population, it is usually more practical to take a sample from the population and make inferences about the population based on that sample.
 Inferential statistics: making inferences about a population from a sample (chosen randomly).
 Display and summarize data to see patterns, relationships and exceptions in the values and observations.

4. Summarizing a categorical variable

IP number (categorical, nominal)   Time (quantitative)      Source (categorical, nominal)
245.240.221.71                     1/feb/2013 at 13:15:08   Google
196.345.281.51                     1/feb/2013 at 14:56:23   direct
 Frequency table: records the counts for each of the categories of a categorical variable, i.e. how often each category occurs.
 Absolute frequency -> Count of the number of cases in each category.
 Relative frequency -> count of the number of cases in each category divided by the total number of cases.

Source   Absolute frequency (count)   Relative frequency (%)
Google   130158                       57.36
Direct   52969                        23.34
….       ….                           ….
Total    226925                       100

 Express frequency as a percentage: Compute the proportion: 130158/226925 ≈ 0.5736
 Percentage = proportion x 100% -> 0.5736 x 100% = 57.36%
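
The conversion from counts to relative frequencies can be sketched in Python. The counts and the total are the ones shown in the frequency table; the function name `relative_frequency` is just an illustrative choice.

```python
# Counts taken from the frequency table above. The table elides the other
# sources, so the total is used as given rather than recomputed.
counts = {"Google": 130158, "Direct": 52969}
total = 226925

def relative_frequency(count, total):
    """Proportion of cases in one category (a number between 0 and 1)."""
    return count / total

for source, count in counts.items():
    proportion = relative_frequency(count, total)
    # Percentage = proportion x 100%
    print(f"{source}: {proportion:.4f} -> {proportion * 100:.2f}%")
```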

5. Displaying a frequency table
 Bar chart -> displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison.
 Keep it proportional, leave gaps between the bars to indicate different categories, and label the axes.
 Pie chart: shows how the whole group breaks into several categories by showing all the cases as a circle sliced into pieces whose areas are proportional to the fraction of cases in each category.
 A pie chart gives less information and makes it difficult to see what the story is. Bar charts are better for representing frequency tables.

6. Two categorical variables
 Example: survey of 5039 people in 5 countries: “Do you use social network sites?”
 Data table:

Respondent ID (identifier)   Social networking (categorical)   Country (categorical)
0001                         Yes                               Egypt
0002                         No access                         Egypt
….                           ….                                ….
5039                         Yes                               US
 Frequency table:

Social networking   Count   Relative frequency
No                  1249    24.79%
Yes                 2175    43.16%
No access           1615    32.05%
Total               5039    100%


7. Contingency table
 Lists all possible combinations of outcomes for the two categorical variables. Shows how individuals are distributed along one variable depending on the value of the other variable.

            GB     EG     DE     RU     US     Total
No          336    70     460    90     293    1249
Yes         529    300    340    500    506    2175
No access   153    630    200    420    212    1615
Total       1018   1000   1000   1010   1011   5039


 Marginal distribution: the frequency distribution of either one of the variables (the totals in the margins of the table).
 Each cell of the contingency table gives the count for a combination of values of the 2 variables.
 For every cell, we can compute three percentages. Example: the cell for Egypt (EG) and Yes, with count 300:
 Total percentage = 300/5039 x 100% ≈ 6.0% -> 6.0% of the total number of respondents are from Egypt and answered Yes.
 Row percentage = 300/2175 x 100% ≈ 13.8% -> 13.8% of the respondents who answered Yes are from Egypt.
 Column percentage = 300/1000 x 100% = 30.0% -> 30.0% of the respondents from Egypt answered Yes to the survey question.
 To compare social networking from country to country -> use column percentages.
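
As a small sketch, the three percentages for one cell can be computed in Python. The numbers are the Egypt/Yes cell and its totals from the contingency table; the variable names are illustrative.

```python
# Egypt / Yes cell and the totals it is compared against (from the table).
cell = 300          # respondents from Egypt who answered Yes
row_total = 2175    # all respondents who answered Yes
col_total = 1000    # all respondents from Egypt
grand_total = 5039  # all respondents

total_pct = 100 * cell / grand_total  # share of all respondents
row_pct = 100 * cell / row_total      # share of the Yes answers
col_pct = 100 * cell / col_total      # share of the Egyptian respondents

print(f"Total percentage:  {total_pct:.1f}%")
print(f"Row percentage:    {row_pct:.1f}%")
print(f"Column percentage: {col_pct:.1f}%")
```

Only the denominator changes between the three percentages, and that choice determines which group the statement is about.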

Statistics Module 3: Displaying and describing quantitative data
1. Displaying quantitative variables
 Quantities you can count: discrete
 Quantities that can be measured: continuous
 E.g. monthly stock price ($), AIG 2002-2007.


Month AIG stock price
Jan 2002 $77.26
Feb 2002 $72.95
Mar 2002 $73.72

Nov 2007 $56.86
Dec 2007 $58.13

 How are the values of price distributed? -> construct a frequency table
 To construct a frequency table:
1) Sort the values from small to large.
2) Decide which bins you will use: e.g. -> $45-$50, $50-$55, $55-$60, $60-$65, $65-$70, $70-$75, $75-$80
 Values that fall on a boundary go into the higher bin -> e.g. bin $45-$50 includes $45 but not $50: left closed, right open: [$45, $50[
3) Count the number of cases that fall into each bin:

AIG stock price   Count (absolute frequency)   Relative frequency   Density (per $)
$45-$50           2                            2.8%                 0.0056
$50-$55           3                            4.2%                 0.0084
$55-$60           13                           18.1%                0.0362
$60-$65           16                           22.2%                0.0444
$65-$70           24                           33.3%                0.0667
$70-$75           13                           18.1%                0.0362
$75-$80           1                            1.4%                 0.0028
Total             72                           100%                 /
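
The binning rule (left closed, right open) can be sketched in Python. Only the five prices visible in the price table are used, so the counts below do not reproduce the full table of 72 months; the function name `bin_index` and its parameters are illustrative assumptions.

```python
# The five prices shown in the price table; the other 67 months are elided
# in the notes, so these counts will not match the full frequency table.
prices = [77.26, 72.95, 73.72, 56.86, 58.13]

START, WIDTH, N_BINS = 45.0, 5.0, 7  # bins $45-$50, ..., $75-$80

def bin_index(value, start=START, width=WIDTH):
    """Index of the left-closed, right-open bin [lo, hi[ containing value."""
    return int((value - start) // width)

# Steps 1-3: sort the values, then count the cases falling in each bin.
counts = [0] * N_BINS
for p in sorted(prices):
    counts[bin_index(p)] += 1

for i, c in enumerate(counts):
    lo = START + i * WIDTH
    print(f"${lo:.0f}-${lo + WIDTH:.0f}: {c}")
```

A value exactly on a boundary, such as $50.00, lands in the $50-$55 bin because each bin includes its left edge but not its right edge.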
 Before making a histogram, check the quantitative data condition: the data must be values of a quantitative variable whose units are known. A bar chart and a histogram look similar, but they are not the same. You can’t display categorical data in a histogram; histograms don’t have gaps between the bars, bar charts do.
 A bar chart of a frequency table of quantitative data is a histogram; a bar chart of a relative frequency table of quantitative data is a relative frequency histogram.
 When you look at a histogram, look for three characteristics:
1) Shape: symmetry vs skew, bumps and valleys, gaps
2) Center
3) Spread




2. Data density
 Density histogram: the area of each bar represents the relative frequency of its bin.
 Area = height x width of bin = relative frequency.
 Height (density) = relative frequency / width of bin.
 E.g. for the $55-$60 bin: area = 18.1%, so height = 18.1%/$5 = 3.62% per $ -> in the $55-$60 bin, every interval of $1 wide contains about 3.62% of the values.
 Density is usually expressed as a decimal fraction per horizontal unit: 3.62%/$ = 3.62/100 per $ = 0.0362/$.
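
A minimal Python sketch of the density calculation, using the bin counts from the AIG frequency table. It works from the unrounded relative frequencies, so the $55-$60 density comes out near 0.0361 rather than the 0.0362 obtained from the rounded 18.1%.

```python
# Bin counts from the AIG frequency table ($45-$50 up to $75-$80).
counts = [2, 3, 13, 16, 24, 13, 1]
width = 5.0  # every bin is $5 wide

n = sum(counts)
relative_freqs = [c / n for c in counts]

# Height of each bar in a density histogram: relative frequency / bin width.
densities = [rf / width for rf in relative_freqs]

# Area of each bar = height x width; the areas add up to 1 (100%).
total_area = sum(d * width for d in densities)

for i, d in enumerate(densities):
    lo = 45 + i * 5
    print(f"${lo}-${lo + 5}: density {d:.4f} per $")
print(f"Total area: {total_area:.2f}")
```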
 Density histogram:




3. Shape
 You should pay attention to 3 things: shape, center and spread.
 Mode: a peak in the histogram.
o A single mode (e.g. the bin $65-$70): unimodal distribution -> the IQR should be bigger than the standard deviation; if it is not, check whether the distribution is skewed or multimodal.
o Two modes: bimodal distribution
o No clear modes: uniform distribution
o Three or more modes: multimodal distribution
 Symmetry: the distribution is symmetric when you can fold the histogram in the middle and the 2 sides almost match (mirror images).
o Skewed to the left: left side has the longer “tail”
o Skewed to the right: right side has the longer “tail”
 Outliers: values that stick out -> they tell us something interesting about the data and can be the most informative part of your data.
 Centre: What is the typical stock price? -> about $65 -> if we want a more precise number -> calculate the average (mean).
 Mean (average) = (sum of all values) / (number of values)

 The mean is sensitive to skewness. If the distribution is right skewed, the mean will be bigger than the median; if it is left skewed, the mean will be smaller than the median.
 Adolphe Quetelet (1796-1874): inventor of the average.
 Median = the value that splits the histogram into two equal areas. It is resistant to unusual observations (outliers), which makes it the better choice for variables that are likely to be skewed, such as cost or income.
1) Order the values
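
How the mean and the median react to a skewed distribution can be sketched with Python's standard `statistics` module. The income values below are hypothetical and chosen only to produce a right-skewed data set with one large outlier.

```python
import statistics

# Hypothetical incomes (in thousands): five similar values and one outlier.
incomes = [20, 22, 25, 27, 30, 250]

mean = statistics.mean(incomes)      # sum of all values / number of values
median = statistics.median(incomes)  # middle of the ordered values

print(f"Mean:   {mean:.2f}")
print(f"Median: {median:.2f}")

# Without the outlier both measures of centre nearly coincide.
trimmed = incomes[:-1]
print(f"Mean without outlier:   {statistics.mean(trimmed):.2f}")
print(f"Median without outlier: {statistics.median(trimmed):.2f}")
```

The single large value pulls the mean far above the median, while the median barely moves when the outlier is removed.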

Who am I buying these notes from?

Stuvia is a marketplace, so you are not buying this document from us, but from seller LukaBuggenhout. Stuvia facilitates payment to the seller.