Data Science for Business - everything for the exam 2020


This is a summary of the book that helped me pass the Data Science course in the Business Administration MSc.

Preview: 4 of the 52 pages

  • 8 February 2021
  • 52 pages
  • 2020/2021
  • Lecture notes
  • Chintan Amrit
  • All lectures
khandapanda
C1: Introduction. Data-analytic thinking

C2: Business Problems and Data Science Solutions

C3: Introduction to Predictive Modeling: From Correlation to Supervised Segmentation

C4: Fitting a Model to Data

C5: Overfitting and its avoidance

C7: Decision analytic thinking 1: What is a good model?

C6: Similarity, neighbors, and clusters

C9: Evidence and probabilities

C1: Introduction. Data-analytic thinking
At a high level, data science is a set of fundamental principles that guide the extraction of
knowledge from data. Data mining is the extraction of knowledge from data, via technologies
that incorporate these principles. As a term, “data science” often is applied more broadly than the
traditional use of “data mining,” but data mining techniques provide some of the clearest
illustrations of the principles of data science.




Figure 1-1. Data science in the context of various data-related processes in the organization.

Data-driven decision-making (DDD) refers to the practice of basing decisions on the analysis of
data, rather than purely on intuition.

The decisions of interest in this book mainly fall into two types: (1) decisions for which
“discoveries” need to be made within data, and (2) decisions that repeat, especially at
massive scale.

As shown in Figure 1-1, data engineering and data processing technologies are not data science
technologies. They do, however, support data science, and they are useful for much more besides.

Big data essentially means datasets that are too large for traditional data processing systems, and
therefore require new processing technologies.

One of the fundamental principles of data science: data, and the capability to extract useful
knowledge from data, should be regarded as key strategic assets.

A fundamental strategy of data science: acquire the necessary data, even when doing so comes at a cost.

Four fundamental concepts underlie the rest of the book:
Extracting useful knowledge from data to solve business problems can be treated systematically
by following a process with reasonably well-defined stages: the Cross Industry Standard Process
for Data Mining → CRISP-DM.

From a large mass of data, information technology can be used to find informative descriptive
attributes of entities of interest.

If you look too hard at a set of data, you will find something, but it might not generalize beyond
the data you’re looking at. This is referred to as overfitting a dataset.

Formulating data mining solutions and evaluating the results involves thinking carefully about
the context in which they will be used.
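
The overfitting concept above can be made concrete with a small sketch. It assumes an extreme "table model" that simply memorizes its training examples; the customer IDs, the churn labels and all function names are invented for illustration and do not come from the course material.

```python
from collections import Counter

# Hypothetical toy data: (customer_id, churned) pairs, invented for illustration.
train = [(1, True), (2, False), (3, False), (4, True), (5, False), (6, False)]
holdout = [(7, True), (8, False), (9, True), (10, True)]

def fit_table_model(rows):
    """Memorize every training example; fall back to the majority class otherwise."""
    lookup = dict(rows)
    majority = Counter(label for _, label in rows).most_common(1)[0][0]
    return lambda customer_id: lookup.get(customer_id, majority)

def accuracy(model, rows):
    """Fraction of rows for which the model predicts the recorded label."""
    return sum(model(cid) == label for cid, label in rows) / len(rows)

model = fit_table_model(train)
train_accuracy = accuracy(model, train)      # every training row was memorized
holdout_accuracy = accuracy(model, holdout)  # unseen customers get the majority class

print(train_accuracy, holdout_accuracy)
```

The model looks perfect on the data it has already seen and does far worse on customers it has not seen, which is why performance should be judged on data held out from model building.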

C2: Business Problems and Data Science Solutions
Despite the large number of specific data mining algorithms developed over the years, there are
only a handful of fundamentally different types of tasks these algorithms address:
- Classification and class probability estimation: attempt to predict, for each individual in a
  population, which of a (small) set of classes this individual belongs to. A closely related
  task is scoring or class probability estimation. A scoring model applied to an individual
  produces, instead of a class prediction, a score representing the probability that the
  individual belongs to each class.
- Regression (value estimation): attempts to estimate or predict, for each individual, the
  numerical value of some variable for that individual.
  - Related to classification but different: classification predicts whether something
    will happen, whereas regression predicts how much something will happen.
- Similarity matching: attempts to identify similar individuals based on data known about
  them.
- Clustering: attempts to group individuals in a population together by their similarity, but
  not driven by any specific purpose.
- Co-occurrence grouping: attempts to find associations between entities based on
  transactions involving them.
- Profiling: attempts to characterize the typical behavior of an individual, group, or
  population.
- Link prediction: attempts to predict connections between data items, usually by
  suggesting that a link should exist, and possibly also estimating the strength of the link.
- Data reduction: attempts to take a large set of data and replace it with a smaller set of
  data that contains much of the important information in the larger set.
- Causal modeling: attempts to help us understand what events or actions actually
  influence others.
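
To show how the outputs of classification, class probability estimation and regression differ, here is a minimal sketch. It assumes a nearest-neighbor approach (the topic of C6), which is only one of many techniques that could be used; the customer data, the features (age, income in thousands), the 0.5 threshold and all names are invented for illustration.

```python
import math

# Hypothetical toy data: ((age, income_k), churned, yearly_spend).
customers = [
    ((25, 30), True, 120.0),
    ((30, 35), True, 150.0),
    ((45, 80), False, 400.0),
    ((50, 90), False, 450.0),
    ((28, 40), False, 180.0),
]

def nearest_neighbors(query, rows, k=3):
    """Similarity matching: the k rows closest to the query (Euclidean distance)."""
    return sorted(rows, key=lambda row: math.dist(query, row[0]))[:k]

neighbors = nearest_neighbors((27, 33), customers)

# Class probability estimation: share of the neighbors that churned.
churn_probability = sum(churned for _, churned, _ in neighbors) / len(neighbors)

# Classification: turn the score into a class with a 0.5 threshold (an assumption).
predicted_class = churn_probability >= 0.5

# Regression: estimate a numeric value as the neighbors' average spend.
estimated_spend = sum(spend for _, _, spend in neighbors) / len(neighbors)

print(predicted_class, round(churn_probability, 2), estimated_spend)
```

The same three neighbors answer three different questions: whether the customer will churn (a class), how likely that is (a score), and how much the customer will spend (a number).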

The terms “supervised” and “unsupervised” were inherited from machine learning:
- Unsupervised data mining: when there is no specific target
  - Clustering, co-occurrence grouping, and profiling generally are unsupervised
- Supervised data mining: when it is done for a specific reason with a specific target
  - Condition: there must be data on the target
  - Classification, regression, and causal modeling generally are supervised
- Similarity matching, link prediction, and data reduction could be either
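
As an illustration of an unsupervised task, the sketch below does a very simple form of co-occurrence grouping: it counts which pairs of items appear together in the same transaction. Note that no target variable is involved. The baskets and variable names are invented for illustration, and real co-occurrence methods go further than raw pair counts.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "jam"},
    {"bread", "butter"},
    {"milk"},
]

# Count how often each pair of items appears in the same basket.
pair_counts = Counter()
for basket in transactions:
    # Sorting gives every pair a canonical order, e.g. ("bread", "butter").
    pair_counts.update(combinations(sorted(basket), 2))

# Support: the fraction of all baskets that contain the most frequent pair.
top_pair, top_count = pair_counts.most_common(1)[0]
support = top_count / len(transactions)

print(top_pair, top_count, support)
```

A supervised version of this problem would need a recorded target for each basket, for example whether the customer returned, which is exactly the condition stated above.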
