Summary Statistics for Business and Economics 1

Summary Statistics for Business and Economics from chapter 1 to 11 in the book.

  • 21 May 2022
  • 2021/2022
LukaBuggenhout
Statistics module 1: Data and decisions
1. What are data?
 • Statistics is about data and decisions. It is both a toolbox and a way of thinking -> its purpose is to gather data that is relevant to the problem at hand. In the plural, statistics are also quantities calculated from data.
 • Statistics: a collection of tools, and the associated reasoning, used to model, summarize and understand data.
 • Data: not just information, but a set of values that are measured or observed, together with their context.
 • The "Five W's" (and How):
o Who: the subjects or cases the data describes.
o What: the variables recorded for each case.
o Where: where the values were recorded.
o When: when the values were recorded.
o How: how the values were recorded.
o Why: why the values were recorded.
 • Who and What are the essential part of the data (the values); Where, When, How and Why are part of the metadata.
 • Big data: data sets so large that traditional methods of storage and analysis are inadequate.
 • Data mining -> obtaining actionable information from data that was collected in the course of doing business.
 • Predictive analytics -> focuses on future performance.
 • Business analytics -> any use of data and statistical analysis to inform business decisions.
 • Metadata -> information about the data.
 • Data warehouse: a vast digital repository where data is recorded and stored.
 • Data -> from Latin, the plural of datum ("something given").
 • Data is stored in a table -> a data table.
 • Relational database -> two or more separate data tables linked together so that information can be merged across them.
 • Cases (who?) -> the rows of a data table.
 • Variables (what?) -> the columns of a data table.
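As a minimal sketch of how linked data tables can be merged, the Python snippet below joins two small tables on a shared identifier. The tables, their column names (`customer_id`, `name`, `amount`) and the helper `merge_tables` are hypothetical and chosen only for illustration; a real relational database would do this with a join.

```python
# Hypothetical data tables: each dict is a case (row), each key is a variable (column).
customers = [
    {"customer_id": 1, "name": "Ann"},
    {"customer_id": 2, "name": "Bob"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 25.0},
    {"order_id": 11, "customer_id": 2, "amount": 40.0},
    {"order_id": 12, "customer_id": 1, "amount": 15.0},
]


def merge_tables(left, right, key):
    """Attach the matching row of `right` to every row of `left` that shares `key`."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


# Every order now also carries the name of the customer who placed it.
merged = merge_tables(orders, customers, "customer_id")
for row in merged:
    print(row)
```

Each merged row is still one case; the merge only adds variables from the second table.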

2. Variable types
 • Categorical: values are names of categories.
o Nominal -> just a label, without any particular order. Some nominal variables are used as identifiers. Example -> the source of a website visit: Google, direct.
o Ordinal -> labels with a given order. Example -> steak: rare, medium, well done.
o Identifier -> a special case of a nominal categorical variable whose purpose is to assign a unique code to each individual.
 • Quantitative: values are numerical quantities with units -> these variables record measurements or amounts.
 • Cross-sectional data -> several units recorded at a single point in time. Example -> Starbucks locations at the end of 2018.
 • Time series data -> one subject recorded at different points in time; if the subject changes from one measurement to the next, the data is not a time series. Example -> number of Starbucks locations in the world for each year.


Statistics module 2: Displaying and describing categorical Data
3. Displaying and describing categorical data introduction
 • Descriptive statistics: summarizing and displaying data.
 • Instead of measuring the whole population, it is usually more practical to take a sample from the population and draw inferences from that sample.
 • Inferential statistics: drawing conclusions about the population from a randomly chosen sample.
 • Display and summarize data to see patterns, relationships and exceptions in values and observations.

4. Summarizing a categorical variable

IP number (categorical, ordinal)    Time (quantitative)       Source (categorical, nominal)
245.240.221.71                      1/feb/2013 at 13:15:08    Google
196.345.281.51                      1/feb/2013 at 14:56:23    Direct
 • Frequency table: records the counts for each category of a categorical variable, i.e. how often each value occurs.
o Absolute frequency -> the number of cases in each category.
o Relative frequency -> the number of cases in each category divided by the total number of cases.

Source    Absolute frequency (count)    Relative frequency (%)
Google    130158                        57.36
Direct    52969                         23.34
….        ….                            ….
Total     226925                        100

 • To express a frequency as a percentage, first compute the proportion: 130158/226925 ≈ 0.5736
 • Percentage = proportion x 100% -> 0.5736 x 100% = 57.36%
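The calculation above can be sketched in Python as follows. The helper `frequency_table` and the short `visits` list are hypothetical illustrations; only the counts 130158 and 226925 come from the table.

```python
from collections import Counter


def frequency_table(values):
    """Return {category: (absolute frequency, relative frequency in %)}."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}


# Hypothetical mini data set: one entry per case (website visit).
visits = ["Google", "Direct", "Google", "Google"]
table = frequency_table(visits)
for source, (count, pct) in table.items():
    print(f"{source:<8}{count:>4}{pct:>8.2f}%")

# Checking the Google row of the table in the notes.
google_pct = round(100 * 130158 / 226925, 2)
print(google_pct)  # 57.36
```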

5. Displaying a frequency table
 • Bar chart -> displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison.
 • Keep the bars proportional to the counts, leave gaps between the bars to indicate separate categories, and label the axes.
 • Pie chart: shows how the whole group breaks into categories by displaying all cases as a circle sliced into pieces whose areas are proportional to the fraction of cases in each category.
 • A pie chart gives less information, and it is hard to see what the story is. Bar charts are better for representing frequency tables.

6. Two categorical variables
 • Example: a survey of 5039 people in 5 countries: "Do you use social networking sites?"
 • Data table:

Respondent ID (identifier)    Social networking (categorical)    Country (categorical)
0001                          Yes                                Egypt
0002                          No access                          Egypt
….                            ….                                 ….
5039                          Yes                                US
 • Frequency table:

Social networking    Count    Relative frequency
No                   1249     24.79%
Yes                  2175     43.16%
No access            1615     32.05%
Total                5039     100%


7. Contingency table
 • A contingency table lists the counts for every combination of values of two categorical variables. It shows how individuals are distributed along one variable depending on the value of the other variable.

             GB      EG      DE      RU      US      Total
No           336     70      460     90      293     1249
Yes          529     300     340     500     506     2175
No access    153     630     200     420     212     1615
Total        1018    1000    1000    1010    1011    5039


 • Marginal distribution: the frequency distribution of one of the variables on its own (the Total row or the Total column).
 • Each cell of the contingency table gives the count for a combination of values of the 2 variables.
 • For every cell we can compute three percentages. For the cell (Yes, EG):
o Total percentage = 300/5039 x 100% ≈ 6.0% -> 6.0% of all respondents are from Egypt and answered Yes.
o Row percentage = 300/2175 x 100% ≈ 13.8% -> 13.8% of the respondents who answered Yes are from Egypt.
o Column percentage = 300/1000 x 100% = 30.0% -> 30.0% of the respondents from Egypt answered Yes.
 • To compare social networking use from country to country -> use column percentages.
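The three percentages can be sketched in Python as below. The function name `cell_percentages` is hypothetical. The counts come from the contingency table, with the GB/No cell taken as 336 so that the cells add up to the stated row and column totals (1249 and 1018).

```python
countries = ["GB", "EG", "DE", "RU", "US"]
# Rows: answer to the survey question; columns follow `countries`.
# Assumption: GB/No is 336, the value consistent with both stated margins.
table = {
    "No":        [336, 70, 460, 90, 293],
    "Yes":       [529, 300, 340, 500, 506],
    "No access": [153, 630, 200, 420, 212],
}


def cell_percentages(answer, country):
    """Return (total %, row %, column %) for one cell of the table."""
    j = countries.index(country)
    cell = table[answer][j]
    row_total = sum(table[answer])                    # marginal count for the answer
    col_total = sum(row[j] for row in table.values()) # marginal count for the country
    grand_total = sum(sum(row) for row in table.values())
    return (100 * cell / grand_total,
            100 * cell / row_total,
            100 * cell / col_total)


total_pct, row_pct, col_pct = cell_percentages("Yes", "EG")
print(round(total_pct, 1), round(row_pct, 1), round(col_pct, 1))  # 6.0 13.8 30.0
```

Column percentages condition on the country, which is why they are the ones to use for country-to-country comparisons.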

Statistics Module 3: Displaying and describing quantitative data
1. Displaying quantitative variables
 • Quantities you can count: discrete
 • Quantities that can be measured: continuous
 • E.g. monthly stock price ($), AIG 2002-2007.


Month       AIG stock price
Jan 2002    $77.26
Feb 2002    $72.95
Mar 2002    $73.72
….          ….
Nov 2007    $56.86
Dec 2007    $58.13

 • How are the values of price distributed? -> construct a frequency table.
 • To construct a frequency table:
1) Sort the values from small to large.
2) Decide which bins to use, e.g. -> $45-50, $50-55, $55-60, $60-65, $65-70, $70-75, $75-80
 • Values that fall on a boundary go into the higher bin -> e.g. the bin $45-50 includes 45 but not 50 (left closed, right open): [$45, $50[
3) Count the number of cases that fall into each bin:

AIG stock price    Count (absolute frequency)    Relative frequency    Density (per $)
$45-50             2                             2.8%                  0.0056
$50-55             3                             4.2%                  0.0084
$55-60             13                            18.1%                 0.0362
$60-65             16                            22.2%                 0.0444
$65-70             24                            33.3%                 0.0667
$70-75             13                            18.1%                 0.0362
$75-80             1                             1.4%                  0.0028
Total              72                            100%                  /
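The binning steps can be sketched in Python as follows, assuming bins of width $5 starting at $45 that are left closed and right open. The helper `bin_of` is hypothetical, and the input is only the five prices visible in the stock price table, not the full set of 72 values.

```python
import math
from collections import Counter


def bin_of(value, start=45.0, width=5.0):
    """Return the (low, high) bin containing `value`, treating bins as [low, high[."""
    index = math.floor((value - start) / width)
    low = start + index * width
    return (low, low + width)


# Only the five prices shown in the notes; the remaining months are not listed.
prices = [77.26, 72.95, 73.72, 56.86, 58.13]

# Step 1: sort; steps 2 and 3: assign each value to its bin and count.
bin_counts = Counter(bin_of(p) for p in sorted(prices))
for (low, high), count in sorted(bin_counts.items()):
    print(f"${low:.0f}-{high:.0f}: {count}")

# A value exactly on a boundary goes to the higher bin.
print(bin_of(50.0))  # (50.0, 55.0)
```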
 • Before making a histogram, check the quantitative data condition: the data must be values of a quantitative variable whose units are known. A bar chart and a histogram look similar, but they are not the same: you cannot display categorical data in a histogram, and histograms have no gaps between the bars, whereas bar charts do.
 • A bar chart of a frequency table of quantitative data is a histogram; a bar chart of a relative frequency table of quantitative data is a relative frequency histogram.
 • When you look at a histogram, look for three characteristics:
1) Shape: symmetry vs skew, bumps and valleys, gaps
2) Center
3) Spread




2. Data density
 • Density histogram: the area of each bar represents the relative frequency of its bin.
 • Area = height x width of bin = relative frequency
 • Height (density) = relative frequency / width of bin
 • E.g. for $55-$60: 18.1%/$5 = 3.62% per $ -> in the $55-$60 bin, every interval $1 wide contains about 3.62% of the values. The area of that bar is 3.62%/$ x $5 = 18.1%.
 • Density is usually expressed as a decimal fraction per horizontal unit: 3.62%/$ = 3.62/100 per $ = 0.0362/$
 • [Figure: density histogram of the AIG stock prices]
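The density calculation can be sketched in Python using the bin counts from the AIG frequency table. The function name `density` is hypothetical. The sketch divides exact relative frequencies rather than the rounded percentages, so a result can differ in the last digit from a hand calculation (for example 0.0361 instead of 0.0362 for the $55-60 bin).

```python
# Bin counts from the AIG frequency table; keys are (low, high) bin edges in $.
counts = {
    (45, 50): 2, (50, 55): 3, (55, 60): 13, (60, 65): 16,
    (65, 70): 24, (70, 75): 13, (75, 80): 1,
}
total = sum(counts.values())  # 72 cases


def density(count, total, width):
    """Height of a density histogram bar: relative frequency per unit of width."""
    return (count / total) / width


densities = {b: density(n, total, b[1] - b[0]) for b, n in counts.items()}
for (low, high), d in densities.items():
    print(f"${low}-{high}: {d:.4f} per $")

# The bar areas (height x width) add up to 1, i.e. 100% of the cases.
total_area = sum(d * (b[1] - b[0]) for b, d in densities.items())
```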




3. Shape
 • Pay attention to 3 things: shape, center and spread.
 • Mode:
o A single mode (e.g. the bin $65-$70): unimodal distribution -> the IQR is usually larger than the standard deviation; if it is not, check again whether the distribution is skewed or multimodal.
o Two modes: bimodal distribution
o No clear mode: uniform distribution
o Three or more modes: multimodal distribution
 • Symmetry: a distribution is symmetric when you can fold the histogram in the middle and the 2 sides almost match (mirror images).
o Skewed to the left: the left side has the longer "tail"
o Skewed to the right: the right side has the longer "tail"
 • Outliers: values that stick out -> they tell us something interesting about the data and can be the most informative part of it.
 • Center: what is the typical stock price? -> about $65 -> if we want a more precise number -> calculate the average (mean).
 • Mean (average) = (sum of all values) / (number of values)
 • The mean is sensitive to skewness. If the distribution is right-skewed, the mean is typically larger than the median; if it is left-skewed, the mean is typically smaller than the median.
 • Adolphe Quetelet (1796-1874): popularized the use of the average with his concept of the "average man".
 • Median = the value that splits the histogram into two equal areas. It is used for variables like cost or income, which are likely to be skewed, and is the better choice for skewed data because it is resistant to outliers and unusual observations.
1) Order the values
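A minimal sketch of how the mean and the median react to an unusual value, using Python's standard `statistics` module. The input is only the five AIG prices visible in the notes, and the extra value of $500 is a hypothetical outlier added for illustration.

```python
from statistics import mean, median

# The five prices shown in the notes (not the full data set of 72 values).
prices = [56.86, 58.13, 72.95, 73.72, 77.26]

mean_price = mean(prices)      # sum of all values / number of values
median_price = median(prices)  # middle value of the ordered list

# Hypothetical outlier: one unusually high price.
with_outlier = prices + [500.0]
mean_outlier = mean(with_outlier)
median_outlier = median(with_outlier)  # average of the two middle values

print(f"mean:   {mean_price:.2f} -> {mean_outlier:.2f}")
print(f"median: {median_price:.2f} -> {median_outlier:.2f}")
```

The single outlier roughly doubles the mean, while the median barely moves, which is why the median is preferred for skewed data.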
