Summary

Summary Statistics and data sciences 188

Name: Statistics and data sciences 188
SKU: doc_1247434
Rating: 4.75 (4 reviews)
Author: hollymadison

A helpful guide to stats chapters 1-4 and chapter 6, with examples and the formulas for semester one.

Connected book

David M. Levine, Kathryn A. Szabat, David Stephan: Statistics for Managers Using Microsoft Excel

Edition: 2016
ISBN: 9780134173054

Written for

Institution: Stellenbosch University (SUN)
Course: Statistics and data science 188 (STATISTICSANDDATASCIENCES188)

Document information

Summarized whole book?: No
Which chapters are summarized?: Chapter 1 - 4 and chapter 6
Uploaded on: August 9, 2021
Number of pages: 35
Written in: 2021/2022
Type: Summary

Subjects

stats
data cleaning
defining data
organising variables
graphs
probability
normal distribution
continuous distributions
statistics and data sciences
numerical descriptive measures

Content preview

INTRODUCTION
All statistical methods require data. Data is the facts about the world that one seeks to study
and explore.
Data can be summarized or unsummarized [raw].

WHAT IS STATISTICS
Statistics is the collection of methods that allow one to work w. data effectively.
Stats is a tool to obtain information from data. It provides us w. a formal basis to summarize and
visualize data, reach conclusions about the data, make reliable predictions about business
activities and improve business processes.

DCOVA framework
Define data you want to study to meet an objective.
Collect the data from appropriate sources.
Organize data collected by developing tables.
Visualize data by developing charts.
Analyze data collected, reach conclusions and present results.

BUSINESS ANALYTICS
Combines statistical methods w. management science and information systems to form an
interdisciplinary tool that supports fact-based decision making.

DATA SCIENCE
The field of study that combines domain expertise, programming skills and knowledge of
mathematics and statistics to extract meaningful insights from data.

BIG DATA
A collection of data that cannot be easily browsed or analyzed using traditional methods.
It is data being collected in huge volumes, at very fast rates [real time] and in a variety of forms.
It may refer to large data sets of structured data stored in files / worksheets. It may also be
unstructured, such that the data has an irregular pattern and contains values that are not
comprehensible without further interpretation [unstructured data could be text, pictures,
videos or audio].

DEFINITIONS

Descriptive statistics: methods of organizing, summarizing, and presenting data in an
informative and convenient way.

Inferential statistics: methods used to make a conclusion about a characteristic of a
population, based on a smaller sample of the population.

Variable: characteristic / property of an item that can vary among the occurrences of those
items. [Note: each value for a variable is a single fact, not a list of facts].

Data: set of values associated w. one / more variables.

Statistics: methods that analyze the data of the variables of interest.

CLASSIFYING VARIABLES BY TYPE
Categorical [qualitative] variables:
- Take categories as their values [e.g. “yes” / “no”].

Numerical [quantitative] variables:
- Have values that represent a counted / measured quantity.
o Discrete variables arise from a counting process. Values are countable over a
finite range.
o Continuous variables arise from a measuring process. Values are uncountable
over a finite range.

MEASUREMENT SCALES
Nominal scale – classifies categorical data into distinct categories in which no ranking is
implied.
Ordinal scale – classifies categorical data into distinct categories in which ranking is implied.
Numerical variables use an interval scale or ratio scale
- Interval scale: ordered scale in which the difference btwn. measurements is a
meaningful quantity but the measurements do not have a true zero point.
- Ratio scale: ordered scale in which the difference btwn. measurements is a
meaningful quantity and the measurements have a true zero point.

Variables
- Categorical
o Ordinal [ordered categories]: e.g. ratings; Good, Better, Best.
o Nominal [defined categories]: e.g. marital status / eye colour.
- Numerical
o Discrete [counted items]: e.g. number of children / defects per hour.
o Continuous [measured characteristics]: e.g. weight / time.

POPULATION VS SAMPLE
Data is collected from a population / sample.
Population:
- Contains all items / individuals of interest that you seek to study / about which you
want to reach conclusions.
Sample:
- Contains only a portion of a population of interest.
- Used because:
o Less time consuming than selecting every item in population.
o Less costly than selecting every item in population.
o Less cumbersome and more practical than analyzing entire population.
- Analyzed to estimate characteristics of an entire pop.
o Population parameter summarizes the value of a specific variable for a population.
o Sample statistic summarizes the value of a specific variable for sample data.
o Sample statistics are used to estimate population parameters.
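
The idea that a sample statistic estimates a population parameter can be illustrated w. a minimal Python sketch. The population below is made up purely for illustration [1 000 simulated heights], and the function name `sample_mean_estimate` is hypothetical, not something from the textbook.

```python
import random
import statistics

def sample_mean_estimate(population, n, seed=0):
    """Return (population mean, mean of a simple random sample of size n)."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)  # draw n items without replacement
    return statistics.mean(population), statistics.mean(sample)

# Hypothetical population: 1 000 simulated heights in cm (illustration only).
rng = random.Random(42)
population = [rng.gauss(170, 10) for _ in range(1000)]

parameter, statistic = sample_mean_estimate(population, n=50)
print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

The two printed values will usually be close but not identical, which is why a sample statistic is only an estimate of the parameter.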

OBSERVATIONAL STUDIES AND DESIGNED EXPERIMENTS
Both have a common objective.
- Both attempt to quantify the effect that a process change [called a treatment] has
on a variable of interest.
In an observational study, there is no direct control over which items receive treatment.
In a designed experiment, there is direct control over which items receive treatment.

SOURCES OF DATA
Primary sources:
- Data collector is one using data for analysis:
o Data from political survey.
o Data collected from experiment.
o Observed data.
Secondary sources:
- Person performing data analysis is not data collector:
o Analyzing census data.
o Examining data from print journals / data published on internet.

SOURCES OF DATA ARISE FROM
Capturing data generated by ongoing business activities.
Distributing data compiled by an organization / individual.
Compiling the responses from a survey.
Conducting a designed experiment and recording the outcomes.
Conducting an observational study and recording results.

SAMPLING PROCESS
Begins w. sampling frame:
- Sampling frame is a listing of items that make up pop.
- Frames are data sources.
- Inaccurate / biased results can result if frame excludes certain groups / portions of
pop.
- Using different frames to generate data can lead to dissimilar conclusions.

TYPES OF SAMPLES

Samples
- Non Probability Samples
o Judgement
o Convenience
- Probability Samples
o Simple Random
o Systematic
o Stratified
o Cluster

Non Probability sample
Items included are chosen without regard to their probability of occurrence.
- In Convenience sampling, items are selected based only on fact they are easy,
inexpensive or convenient to sample.
- In Judgement sample, get opinions of pre-selected experts on subject matter.

Probability sample
Items in sample are chosen on basis of known probabilities.

Simple Random sample
- Every individual / item from frame has equal chance of being selected.
- Selection may be w. replacement [selected individual is returned to frame for
possible reselection] or without replacement [selected individual is not returned to
frame].
- Samples obtained from table of random numbers / computer random number
generators.
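
A minimal sketch of both selection modes, using a computer random number generator from Python's standard library. The frame of 20 numbered items and the function name `simple_random_sample` are assumptions made for illustration.

```python
import random

def simple_random_sample(frame, n, replace=False, seed=None):
    """Draw n items from frame, each item having an equal chance on every draw."""
    rng = random.Random(seed)
    if replace:
        # w. replacement: a selected item goes back into the frame and may be picked again
        return rng.choices(frame, k=n)
    # without replacement: a selected item is not returned, so it appears at most once
    return rng.sample(frame, n)

frame = list(range(1, 21))  # hypothetical frame of 20 numbered items
print(simple_random_sample(frame, 5, replace=False, seed=1))
print(simple_random_sample(frame, 5, replace=True, seed=1))
```

W. replacement the sample size may even exceed the frame size; without replacement it cannot.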

Systematic sample
- Decide on sample size: n.
- Divide frame of N individuals into n groups of k individuals: k = N / n.
- Randomly select one individual from 1st group, i.e. choose a number btwn 1 and k.
- Select every kth individual thereafter [e.g. if you choose 2 and k = 10 then it will be 2,
12, 22, etc.].
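
The steps above can be sketched in Python as follows. This is a minimal sketch that assumes N is a multiple of n [otherwise k is rounded down]; the function name `systematic_sample` and the frame of 100 items are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start in the first group of k items, then every kth item after it."""
    N = len(frame)
    k = N // n                      # group size k = N / n (rounded down if N is not a multiple of n)
    rng = random.Random(seed)
    start = rng.randint(1, k)       # random position btwn 1 and k, inclusive
    # 1-based positions start, start + k, start + 2k, ... converted to 0-based indices
    return [frame[i - 1] for i in range(start, N + 1, k)][:n]

frame = list(range(1, 101))         # hypothetical frame: items numbered 1 to 100
print(systematic_sample(frame, 10, seed=3))
```

Because the frame items here are simply the numbers 1 to 100, the printed values are the selected positions themselves, spaced k = 10 apart.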

Stratified sample
- Divide pop into two / more subgroups [called strata] according to some common
characteristic.
- Simple random sample is selected from each subgroup, w. sample sizes proportional
to strata sizes.
- Samples from subgroups are combined into one.
- This is a common technique when sampling a population of voters, stratifying across
provincial / socio-economic lines.
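
A minimal sketch of stratified sampling w. sample sizes proportional to strata sizes. The voter data, the strata labels "A", "B", "C" and the function name `stratified_sample` are all made up for illustration; rounding the stratum sample sizes is an assumption of this sketch and can shift the total sample size slightly.

```python
import random
from collections import defaultdict

def stratified_sample(items, stratum_of, n, seed=None):
    """Take a simple random sample from each stratum, sized in proportion to the stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)   # group items by their common characteristic
    total = len(items)
    sample = []
    for members in strata.values():
        size = round(n * len(members) / total)  # proportional allocation (rounded)
        sample.extend(rng.sample(members, min(size, len(members))))
    return sample                               # samples from all strata combined into one

# Hypothetical population of 100 voters in three strata of sizes 60, 30 and 10.
voters = [("A", i) for i in range(60)] + [("B", i) for i in range(30)] + [("C", i) for i in range(10)]
sample = stratified_sample(voters, lambda v: v[0], n=10, seed=0)
print(sample)
```

W. these sizes, a sample of 10 contains 6 voters from "A", 3 from "B" and 1 from "C".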

Cluster sample
- Population is divided into several “clusters”, each representative of the pop.
- Simple random sample of clusters is selected.
- All items in selected clusters can be used, or items can be chosen from a cluster using
another probability sampling technique.
- Common application of cluster sampling involves election exit polls, where certain
election districts are selected and sampled.
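
A minimal sketch of cluster sampling in which all items in the selected clusters are used. The district names, their contents and the function name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select a simple random sample of clusters and return every item in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # simple random sample of cluster names
    items = [item for name in chosen for item in clusters[name]]
    return chosen, items

# Hypothetical election districts, each holding the ids of its voters.
districts = {"D1": [1, 2, 3], "D2": [4, 5], "D3": [6, 7, 8, 9], "D4": [10]}
chosen, items = cluster_sample(districts, n_clusters=2, seed=0)
print(chosen, items)
```

To sample within the chosen clusters instead of using all their items, another probability sampling step could be applied to each selected cluster.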

Comparing sampling methods
- Simple random sample and systematic sample:
o Simple to use.
o May not be a good representation of the population's underlying characteristics.
- Stratified sample:
o Ensures representation of individuals across entire population.
- Cluster sample:
o More cost effective.
o Less efficient [need larger sample to acquire the same level of precision].

Selection w. probability proportionate to size
- In a simple random sample, elements of population are selected without monetary
value on invoice playing a role [e.g. if we consider sales, an invoice w. value R10 has
same probability of being selected as invoice w. value R100].
- If correctness of monetary value must be verified, the magnitude of monetary value
becomes important.
- In such a case a selection process that takes the magnitude of monetary values on
each invoice into account is preferred.
- Refer to this type of selection process as selection w. probability proportional to size
[PPS], where size refers to monetary value on each invoice.
- Suppose several invoices must be selected from N invoices via PPS selection process.
- Let T denote the total monetary value of the N invoices.
- According to PPS selection process, each of T rand units has an equal probability of
being selected.
- This implies invoice w. R4 000 entry has a probability of selection four times as large as
selection probability of an invoice w. R1 000 entry.
- In this case, we deal w. two types of elements [invoices and rand units].
- W. PPS selection an invoice is selected in an indirect manner, because a rand unit is
selected first and then the invoice on which it occurs is selected.
- Note that each rand unit has same chance of selection, but chance of selection for
each invoice is proportionate to number of rand units that appear on it.
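
The PPS selection process described above can be sketched in Python. This is a minimal sketch under stated assumptions: invoice values are whole rand amounts, each draw is independent [so the same invoice may be selected more than once], and the function name `pps_select` and the invoice values are made up for illustration.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def pps_select(invoice_values, n, seed=None):
    """Select n invoices (0-based indices) w. probability proportional to rand value."""
    rng = random.Random(seed)
    cumulative = list(accumulate(invoice_values))  # running totals; the last one is T
    T = cumulative[-1]
    selected = []
    for _ in range(n):
        unit = rng.randint(1, T)                   # each of the T rand units is equally likely
        # the invoice on which this rand unit occurs is the first one whose
        # running total reaches the unit
        selected.append(bisect_left(cumulative, unit))
    return selected

invoices = [1000, 4000]                            # hypothetical invoices: R1 000 and R4 000
picks = pps_select(invoices, n=10000, seed=0)
print(picks.count(1) / len(picks))                 # share of draws landing on the R4 000 invoice
```

Here T = 5 000, so rand units 1 to 1 000 lie on the first invoice and units 1 001 to 5 000 on the second. Over many draws the R4 000 invoice should be picked about four times as often as the R1 000 invoice, i.e. in roughly 80% of draws.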

Example
