Summary Data Mining for Business & Governance


Summary Data Mining for Business & Governance, written in the spring semester of 2021 for Data Science & Society, Tilburg University.


  • 6 October 2021
  • 65 pages
  • 2020/2021
  • Summary
Seller: xtessaroes
Recap before midterm
What is data mining?
(slides) Data mining is the computational process of discovering patterns
in large data sets involving methods at the intersection of artificial
intelligence, machine learning, statistics and database systems.
(google) Data mining is searching for patterns in data. More precisely,
data mining is the actual extraction of knowledge from data via
technologies that incorporate these principles.
(slides Chris) Data mining is a concept to unify statistics, data analysis
and their related methods in order to understand and analyze actual
phenomena with data.
With data mining, we want to prove that something can be predicted
better than the baseline, or that a certain method works better than a
method that has been explored before.
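The idea of beating a baseline can be sketched in a few lines. The data, the threshold rule and all names below are made up for illustration; the baseline used here is the common "always predict the most frequent class" rule, which the slides may or may not have had in mind.

```python
from collections import Counter

# Toy labelled data (hypothetical): hours studied -> passed the exam (1) or not (0).
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
passed = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]


def accuracy(predictions, truth):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == t for p, t in zip(predictions, truth))
    return correct / len(truth)


# Baseline: always predict the most frequent class in the data.
majority_class = Counter(passed).most_common(1)[0][0]
baseline_predictions = [majority_class] * len(passed)

# Candidate method: a hand-picked threshold rule (an assumption for illustration).
rule_predictions = [1 if h >= 4 else 0 for h in hours]

baseline_accuracy = accuracy(baseline_predictions, passed)
rule_accuracy = accuracy(rule_predictions, passed)
print(f"baseline: {baseline_accuracy:.2f}, threshold rule: {rule_accuracy:.2f}")
```

In a real comparison both methods would be scored on held-out data rather than on the data used to pick the rule.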


What are the related disciplines?
The disciplines that overlap with data mining are:
1. Artificial Intelligence (AI): interdisciplinary field aiming to develop
intelligent machines
2. Machine Learning (ML): branch of computer science studying
learning from data
3. Statistics: branch of mathematics focused on data
4. Information retrieval/knowledge discovery in databases
Others are;

What are the applications?
In companies, data mining is applied as business intelligence (market
analysis and management).
In science, data mining is applied as knowledge discovery (scientific
discovery in large data sets). Science also uses text mining (natural
language processing), which goes from unstructured text to structured
knowledge.


What is big data?
(slides) Big data consists of three parts:
1. Volume: data that is too big for manual analysis, too big to fit in
RAM and too big to store on disk.
2. Variety: big data has wide ranges of values (variance), has outliers,
confounders and noise, and consists of different data types.
3. Velocity: big data changes quickly (results are required before the
data changes) and big data is streaming data (no storage).
(readings) Datasets that are too large for traditional data-processing
systems and that therefore require new technology. With big data 1.0,
businesses got the basic internet technologies in place so that they could
establish a web presence, build electronic commerce capability and
improve operating efficiency. With big data 2.0, new systems and
companies started to exploit the interactive nature of the web. The
changes brought on by this shift in thinking are extensive and pervasive;
the most obvious are the incorporation of social-networking components
and the rise of the ‘voice’ of the individual consumer and citizen.
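The velocity point in the slides definition (streaming data, no storage) can be illustrated with a statistic that is updated one value at a time. This is only a minimal sketch: the class name, the generator and the readings are hypothetical, and the incremental mean is just one example of a quantity that can be computed without keeping the data.

```python
class RunningMean:
    """Keeps a mean over a stream without storing the individual values."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        # Incremental update: new_mean = old_mean + (value - old_mean) / n
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


def sensor_stream():
    """Hypothetical stream; the generator hands out one value at a time."""
    for reading in [10.0, 12.0, 11.0, 15.0]:
        yield reading


stats = RunningMean()
for reading in sensor_stream():
    stats.update(reading)  # each value is used once and then discarded

print(stats.count, stats.mean)
```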


Different types of learning: supervised and unsupervised
Supervised learning (classification, regression) is done using a ground
truth: we have prior knowledge of what the output values of our samples
should be. The goal of supervised learning is to learn a function that,
given a sample of data and desired outputs, best approximates the
relationship between input and output observable in the data. Supervised
learning means that the data is labeled. In supervised learning, you know
x and y.
Unsupervised learning (clustering, dimensionality reduction) does not
have labeled outputs, so its goal is to infer the natural structure present
within a set of data points. Unsupervised learning means that the data is
not labeled; we want to find patterns within the data. In unsupervised
learning, you know only x (you do not know yet what to research). In
short, unsupervised learning can be defined as data mining algorithms
that infer patterns from a dataset without reference to outcomes or
decisions.
Semi-supervised classification is a combination of both. It means that
some instances still have to be assigned to the decision classes: we
have a small amount of labeled data together with a large amount of
unlabeled data.
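The difference between knowing x and y and knowing only x can be sketched with two tiny routines. The numbers and function names are invented for illustration, and 1-nearest-neighbour and 2-means are only stand-ins for "a classifier" and "a clustering method", not methods prescribed by the course.

```python
# Supervised: x and y are both known, so we can learn a mapping x -> y.
train_x = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
train_y = ["low", "low", "low", "high", "high", "high"]


def predict_1nn(x):
    """Return the label of the closest training point (1-nearest neighbour)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


# Unsupervised: only x is known, so we look for structure (here: two clusters).
def two_means(points, iterations=10):
    """Tiny 1-D k-means with k=2, initialised at the min and max value."""
    centers = [min(points), max(points)]
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            closest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            groups[closest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


predicted_label = predict_1nn(2.4)      # uses the labels in train_y
cluster_centers = two_means(train_x)    # never looks at train_y
print(predicted_label, cluster_centers)
```

Note that the clustering step recovers two groups but cannot name them "low" and "high"; those names only exist because the supervised data was labeled.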


Examples of supervised and unsupervised learning (regression,
classification, clustering, dimensionality reduction)
Supervised: regression, classification (three parts: input, output and function)
Unsupervised: clustering, dimensionality reduction


Workflow of supervised learning
1. Collect data
2. Label examples
3. Choose representation (features are numerical or categorical,
possibly converted to a feature vector)
4. Train models (use a training set for learning and a validation
set for tuning. Hyperparameters are settings of the learning
algorithm. For each hyperparameter setting, apply the
algorithm to the training set to learn, check performance on the
validation set, and keep the best-performing setting)
5. Evaluate (check performance of the tuned model on the test set. You
want to estimate how well your model will do in the real
world).
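The five steps above can be sketched end to end on toy data. Everything here is an assumption made for illustration: the data set, the 60/20/20 split by index, the choice of k-nearest neighbours as the learning algorithm and k as its hyperparameter.

```python
from collections import Counter

# 1-2. Collect and label data (hypothetical: label is 1 when x >= 15,
#      with a few labels flipped on purpose to mimic noise).
noisy = {4, 13, 17, 26}
data = [(x, (1 if x >= 15 else 0) ^ (1 if x in noisy else 0)) for x in range(30)]

# 3. Representation: each example is a single numerical feature x.
#    Split into training, validation and test sets (roughly 60/20/20).
train = [d for i, d in enumerate(data) if i % 5 < 3]
validation = [d for i, d in enumerate(data) if i % 5 == 3]
test = [d for i, d in enumerate(data) if i % 5 == 4]


def knn_predict(train_set, x, k):
    """Majority label among the k training points closest to x."""
    neighbours = sorted(train_set, key=lambda d: abs(d[0] - x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(train_set, eval_set, k):
    """Fraction of eval_set examples that k-NN labels correctly."""
    hits = sum(knn_predict(train_set, x, k) == y for x, y in eval_set)
    return hits / len(eval_set)


# 4. Tune: try each hyperparameter value, keep the best one on validation.
candidate_ks = [1, 3, 5]
validation_scores = {k: accuracy(train, validation, k) for k in candidate_ks}
best_k = max(validation_scores, key=validation_scores.get)

# 5. Evaluate the tuned model once on the untouched test set.
test_accuracy = accuracy(train, test, best_k)
print(validation_scores, best_k, test_accuracy)
```

The test set is used only once, after k has been fixed, so that its score is an honest estimate of real-world performance rather than a number the tuning step has already seen.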
