Summary: Data Mining


Extensive summary of the slides and lecture notes. It earned me a 15/20 :)

Preview: 4 of 82 pages

  • 3 October 2022
  • 82 pages
  • 2021/2022
  • Summary
Seller: emmabosteels

Lecture 1: Introduction
Definitions
Data mining = “The automatic extraction of patterns from large amounts of data”
→ via tools/technologies that incorporate the principles of data science

Data science = “A set of fundamental principles that guide the extraction of knowledge from data”
→ this is what underlies data mining

Importance of data science
You need more data-savvy managers than data scientists. Everyone in the company should
know, to some extent, what to do with data. Data-driven firms have 5-6% higher
productivity, a higher market value, and a higher return on equity.

“AI is the new electricity”: AI will transform companies in the same way electricity once did.

Roles in data science:
• Data architect (focuses on the storage of data; what should the database look like?)
• Data analyst (won’t do data mining)
• Data & machine learning engineer (they put data science into production)
• Data scientist (works with the models and presents results)

Skills: a data scientist must master all three skills.
[Figure: the three skill areas of a data scientist]

1.1 What is Data Mining?
Example: hurricane Frances
Walmart wanted to find non-obvious buying patterns of customers before a hurricane. It could then
stock up on those products in advance and increase its sales.

Terminology
Big Data = datasets so large that traditional data processing systems cannot handle them (both the
storage and the analysis component)

Querying and reporting
• You know exactly what you are looking for: e.g. What is the profitability of my store in Brussels?
• SQL
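
A minimal sketch of such a query, using Python's built-in sqlite3 module. The table name, columns, and figures are hypothetical and only serve to show that the question is fully specified before the data is touched:

```python
import sqlite3

# Hypothetical schema and figures, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, revenue REAL, cost REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Brussels", 1200.0, 900.0),
        ("Brussels", 800.0, 650.0),
        ("Antwerp", 1500.0, 1000.0),
    ],
)

# The question is known in advance: the profit of the Brussels store.
row = conn.execute(
    "SELECT SUM(revenue - cost) FROM sales WHERE store = ?", ("Brussels",)
).fetchone()
brussels_profit = row[0]
print(brussels_profit)
conn.close()
```

No pattern is discovered here: the query only retrieves an answer to a question the analyst already formulated.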

OLAP (On-Line Analytical Processing) = advanced query and reporting
• Multidimensional analysis
• Nice visualization, data cubes, roll-up, slice and dice, ...
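
Roll-up and slice can be sketched in a few lines of plain Python. The fact table and the helper names roll_up and slice_cube are hypothetical; real OLAP tools do this on precomputed data cubes:

```python
from collections import defaultdict

# Hypothetical fact table: (city, product, quarter, units sold). Values are made up.
facts = [
    ("Brussels", "laptop", "Q1", 10),
    ("Brussels", "phone", "Q1", 20),
    ("Brussels", "laptop", "Q2", 15),
    ("Antwerp", "laptop", "Q1", 5),
    ("Antwerp", "phone", "Q2", 25),
]

def roll_up(rows, dims):
    """Sum units over every dimension except those listed in dims (row indices)."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in dims)
        totals[key] += row[3]
    return dict(totals)

def slice_cube(rows, dim, value):
    """Keep only the rows where one dimension is fixed to a single value."""
    return [row for row in rows if row[dim] == value]

by_city = roll_up(facts, dims=[0])               # roll up over product and quarter
q1_only = slice_cube(facts, dim=2, value="Q1")   # slice: quarter = Q1
q1_by_product = roll_up(q1_only, dims=[1])       # then aggregate per product
```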


Business Intelligence = getting the right information to the right person at the right time.

Data warehousing: collect and coalesce data from across an enterprise, often from multiple
transaction-processing systems, each with its own database.




Machine Learning
• Improving the knowledge of a learning agent
• More than just data mining, also computer vision and robotics
Artificial Intelligence
• A computer interacts through data
• Learning from data leads to intelligence
• Big Data + Machine Learning = Artificial Intelligence
• Renewed interest thanks to deep learning
• Most work in AI is on data mining
The separation between these two fields has blurred.

Data Mining
Example: credit scoring in banks
Bank: should I grant credit to this loan applicant? → We want to predict creditworthiness, based on
historical data.
All major banks use data mining for credit scoring: the model predicts whether an applicant will be
able to repay the loan. The bank holds information on each of its customers: income, profession, the
amount of the loan, the mortgage, the value of the house they want to buy, etc. It also knows whether
they repaid their earlier loans. Together, that is the dataset. You give the dataset to the data mining
algorithm so that it can learn what predicts whether a loan will be repaid. For this, you need to know
the value of what you want to predict (= the target variable) for some set of customers. In other words:
you need to know which people did and which people did not repay their loan. Only then can the
algorithm learn.

We need:
• Input matrix X
• Target variable Y = the variable you want to predict
• Column (vector) = a feature/input variable
• Row = a data instance (e.g. a customer)
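
As a small illustration of this layout, the sketch below stores a loan dataset as a list of rows. The feature names and values are invented; only the structure (rows = instances, columns = features, one target value per row) follows the notes:

```python
# Hypothetical loan records; feature names and values are invented.
feature_names = ["income", "loan_amount", "house_value"]

# Input matrix X: one row per customer (data instance), one column per feature.
X = [
    [3200, 150000, 250000],
    [1800, 200000, 210000],
    [4500, 100000, 300000],
]

# Target variable y: 1 = loan repaid, 0 = not repaid, known from history.
y = [1, 0, 1]

n_instances = len(X)
n_features = len(X[0])

# A single feature is one column of X, read across all rows.
income_column = [row[feature_names.index("income")] for row in X]
```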




The upper half of the figure illustrates the mining of historical data to produce a model. Importantly,
the historical data have the target value “class” specified. The bottom half shows the result of the data
mining, where the model is applied to new data for which we do not know the class value. The model
predicts both the class value and the probability that the class variable will take on that value.
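
The two halves of the figure can be sketched as follows. Logistic regression is used here only as one possible model (the notes do not name the algorithm), and the data, the two scaled features, and the function names are assumptions made for illustration:

```python
import math

# Hypothetical historical data with known class: [scaled income, loan-to-value ratio].
X_hist = [[0.5, 0.9], [0.6, 0.8], [0.4, 0.95], [1.2, 0.5], [1.5, 0.4], [1.1, 0.6]]
y_hist = [0, 0, 0, 1, 1, 1]  # 1 = repaid, 0 = did not repay

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Estimated probability that the class is 1 (repaid)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Upper half of the figure: learn a model from labelled historical data."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            error = predict_proba(weights, bias, x) - target
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

weights, bias = fit_logistic(X_hist, y_hist)

# Lower half of the figure: apply the model to a new applicant with unknown class.
new_applicant = [1.3, 0.45]
p_repay = predict_proba(weights, bias, new_applicant)
predicted_class = 1 if p_repay >= 0.5 else 0
```

The model returns both outputs mentioned above: a predicted class and the estimated probability of that class.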

Other examples:
• Market basket analysis
• Recommendation systems
• Facebook likes predict personality traits
o Study done at Cambridge: can we use Facebook likes to predict personality
characteristics? For most of these characteristics, the prediction worked quite well.
o You build your dataset: each row is a user and each column is a Facebook
page that the user could like. If the user liked the page it’s ‘1’, otherwise ‘0’.
o What to predict? For example: gender or IQ. Some people were willing to share this
information, and that dataset was used to find patterns.
o A nice thing about a linear model is that you can ask questions like: give me the top 10
pages associated with the highest predicted IQ scores.
• Clustering
• Predicting political preference with Twitter
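
The Facebook likes example above can be sketched as a binary user-by-page matrix scored by a linear model. The page names, users, and weights below are hypothetical and do not come from the study; the weights stand in for a model that has already been fitted, and "top pages" is read here as the pages with the largest weights:

```python
# Hypothetical page names, users, and weights; none come from the actual study.
pages = ["page_a", "page_b", "page_c", "page_d"]
likes = {
    "user_1": {"page_a", "page_c"},
    "user_2": {"page_b"},
    "user_3": {"page_a", "page_b", "page_d"},
}

# Binary matrix: one row per user, one column per page (1 = liked, 0 = not liked).
users = sorted(likes)
X = [[1 if page in likes[user] else 0 for page in pages] for user in users]

# Assumed weights of an already-fitted linear model:
# score = intercept + sum of the weights of the liked pages.
intercept = 100.0
weights = {"page_a": 2.5, "page_b": -1.0, "page_c": 4.0, "page_d": 0.5}

def predict(row):
    return intercept + sum(weights[p] * x for p, x in zip(pages, row))

scores = {user: predict(row) for user, row in zip(users, X)}

# Because the model is linear, the pages with the largest weights are the ones
# that raise the predicted score the most.
top_pages = sorted(pages, key=lambda p: weights[p], reverse=True)[:2]
```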

1.2 Data Mining Process
CRISP-DM: Cross Industry Standard Process for Data Mining




Business understanding: understanding the problem to be solved.
The initial formulation may not be complete, so multiple iterations may be necessary to arrive at a
good solution formulation.
→ The analyst’s creativity plays a great role here
→ What exactly do we want to do? How would we do it? What parts of this use scenario
constitute possible data mining models?

Data understanding: where is the data coming from, what is the data?
Understand the strengths and limitations of the data. Historical data often are collected for purposes
unrelated to the current business problem.
Costs of data can also vary → estimate costs and benefits of each data source.
Uncover the structure of the business problem and the data that are available, and then match them
to one or more data mining tasks for which we may have substantial science and technology to apply.

Data preparation:
You might have outliers, amounts recorded in both € and $, etc. So, data often has to be cleaned and
converted into another form that yields better results.
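
A minimal sketch of two such preparation steps: converting all amounts to one currency and flagging outliers. The records, the fixed exchange rate, and the "10 times the median" rule are assumptions chosen for illustration, not a recommended standard:

```python
import statistics

# Hypothetical records and exchange rate, for illustration only.
USD_TO_EUR = 0.9  # assumed fixed rate; a real pipeline would look this up

records = [
    {"customer": "a", "amount": 100.0, "currency": "EUR"},
    {"customer": "b", "amount": 200.0, "currency": "USD"},
    {"customer": "c", "amount": 120.0, "currency": "EUR"},
    {"customer": "d", "amount": 50000.0, "currency": "EUR"},
]

def to_eur(record):
    """Express every amount in the same unit before modelling."""
    rate = USD_TO_EUR if record["currency"] == "USD" else 1.0
    return record["amount"] * rate

amounts_eur = [to_eur(r) for r in records]

# Simple outlier rule (an assumption): flag amounts above 10 times the median.
median_amount = statistics.median(amounts_eur)
outliers = [
    r["customer"] for r, a in zip(records, amounts_eur) if a > 10 * median_amount
]
```

Whether a flagged value is an error or a genuine large loan still has to be judged by the analyst.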
Important → beware of leaks: a leak is a situation where a variable collected in historical data gives
information on the target variable, but that information is not actually available at the moment the
decision must be made.
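
One way to guard against leaks is to keep an explicit list of the variables that are known at decision time and drop everything else. The sketch below assumes a hypothetical loan dataset in which months_in_arrears is only recorded after the loan has been granted:

```python
# Hypothetical loan dataset; column names are invented for illustration.
columns_available_at_decision = {"income", "profession", "loan_amount"}

historical_rows = [
    {"income": 3200, "profession": "nurse", "loan_amount": 150000,
     "months_in_arrears": 0, "repaid": 1},
    {"income": 1800, "profession": "clerk", "loan_amount": 200000,
     "months_in_arrears": 7, "repaid": 0},
]
target = "repaid"

def split_leak_free(rows, allowed, target_name):
    """Keep only features known at decision time and report the dropped ones."""
    all_columns = set(rows[0]) - {target_name}
    leaky = sorted(all_columns - allowed)
    features = [{k: row[k] for k in row if k in allowed} for row in rows]
    labels = [row[target_name] for row in rows]
    return features, labels, leaky

features, labels, leaky_columns = split_leak_free(
    historical_rows, columns_available_at_decision, target
)
```

Here months_in_arrears almost reveals the target, yet it cannot be known when the credit decision is taken, so it is excluded.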

Modeling: make the model and look for patterns in the dataset.
The output of this stage is a model/pattern capturing regularities in the data.

Evaluation: is this model good or not? It’s quite a difficult step!
Assess the data mining results and gain confidence that they are valid and reliable before moving on.
It is also used to help ensure that the model satisfies the original business goals.
→ Even if a model passes strict evaluation tests “in the lab”, there may be external considerations
that make it impractical.
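
A minimal sketch of an "in the lab" evaluation: the model is fitted on one part of the data and scored on a held-out part, next to a trivial baseline. The data, the toy threshold model, and the fixed split are assumptions for illustration only:

```python
# Hypothetical labelled data: (income in thousands, repaid yes/no).
data = [(1.0, 0), (1.2, 0), (2.8, 1), (3.0, 1), (3.5, 1), (2.6, 1),
        (1.1, 0), (2.9, 1), (1.3, 0), (3.2, 1), (1.4, 1)]

train, test = data[:6], data[6:]  # fixed split so the example is reproducible

def fit_threshold(rows):
    """Toy model: midpoint between the mean incomes of the two classes."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(predictions, labels):
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)

threshold = fit_threshold(train)
test_labels = [y for _, y in test]
model_preds = [1 if x >= threshold else 0 for x, _ in test]

# Baseline: always predict the majority class seen in the training data.
train_labels = [y for _, y in train]
majority = max(set(train_labels), key=train_labels.count)
baseline_preds = [majority] * len(test)

model_acc = accuracy(model_preds, test_labels)
baseline_acc = accuracy(baseline_preds, test_labels)
```

Comparing against a baseline shows whether the model adds anything; a score of exactly 100% on real data would be a reason to look for a leak rather than to celebrate.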

Deployment: if the model is good, you start using it in practice.
The results of the data mining are put into real use. For example: implementing a predictive model in
an information system or business process.

Which step would be the most difficult one?
Data preparation! It takes up the most time and is the least fun part, but it’s really important.
Modeling is an easier step! Running the algorithm is often a matter of milliseconds.




Craft: you learn by doing; the more you do it, the easier it becomes.
Creativity: it’s typically an inherent skill. What kind of variables could be useful? What can I do with
this model? Could I use it in other settings as well?
Common sense: if the evaluation says the model is correct in all of its predictions,
you should realize that this is not plausible.

