"Comprehensive Notes on Data Mining and Data Warehousing Techniques for Effective Information Extraction and Analysis"

Data mining and data warehousing are two related fields that deal with the process of extracting useful information from large datasets. Your DMDW notes might cover a wide range of topics, including data preprocessing, data mining techniques such as clustering, classification, and association rule ...

  • April 30, 2023
  • 5
  • 2021/2022
  • Class notes
  • Student
  • All classes
Data Mining – Cluster Analysis
Cluster analysis is the process of finding groups of similar
objects in a dataset. It is an unsupervised machine learning
technique, which means it works on unlabelled data. Data points
that are similar to one another are placed together in a
cluster, so that all the objects in a cluster belong to the
same group.
Cluster:
The given data is divided into groups by combining similar
objects. Each such group of similar data objects is a cluster.
For example, consider a dataset that contains information about
different vehicles such as cars, buses and bicycles. Because
this is unsupervised learning, there are no class labels like
Cars or Bikes attached to the vehicles; all the records are
mixed together with no grouping.
Our task is to assign each unlabelled record to a group, and
this can be done using clusters.
The main idea of cluster analysis is to arrange the data points
into clusters, such as a cars cluster that contains all the
cars, a bikes cluster that contains all the bikes, and so on.
Put simply, it is the partitioning of unlabelled data into
groups of similar objects.
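The vehicle example can be sketched in a few lines of Python.
This is only an illustration, not a full clustering algorithm:
the feature values, the choice of three seed vehicles and the
helper names scale and group_by_nearest are all assumptions made
for this sketch. Note that the result is a list of group numbers;
names such as "cars" or "bikes" are something a person assigns
afterwards.

```python
from math import dist

# Hypothetical unlabelled vehicles: (number of wheels, weight in kg).
vehicles = [(4, 1500), (4, 1400), (2, 12), (2, 15), (6, 12000), (6, 11000)]

def scale(points):
    """Rescale every feature to [0, 1] so that weight does not dominate."""
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(p, lows, highs))
        for p in points
    ]

def group_by_nearest(points, seeds):
    """Give each point the index of its closest seed point."""
    return [
        min(range(len(seeds)), key=lambda i: dist(p, seeds[i]))
        for p in points
    ]

scaled = scale(vehicles)
# Assumption: the first, third and fifth vehicles act as seeds.
seeds = [scaled[0], scaled[2], scaled[4]]
cluster_ids = group_by_nearest(scaled, seeds)
print(cluster_ids)  # [0, 0, 1, 1, 2, 2]
```

Vehicles with similar wheel counts and weights end up with the
same group number even though no labels were supplied.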
Properties of Clustering:
1. Clustering Scalability: Nowadays there is a vast amount of
data, and clustering often has to deal with huge databases. To
handle such databases, the clustering algorithm should be
scalable. If the algorithm is not scalable, we cannot get
appropriate results on large databases.
2. High Dimensionality: The algorithm should be able to handle
high-dimensional space, including when the amount of data is
small.
3. Algorithm Usability with multiple data kinds: Clustering
algorithms can be applied to different kinds of data. An
algorithm should be capable of dealing with different types of
data such as discrete, categorical, interval-based and binary
data.
4. Dealing with unstructured data: Some databases contain
missing values and noisy or erroneous data. If an algorithm is
sensitive to such data, it may produce poor-quality clusters.
So it should be able to handle unstructured data and give some
structure to it by organising it into groups of similar data
objects. This makes it easier for the data expert to process
the data and discover new patterns.
5. Interpretability: The clustering outcomes should be
interpretable, comprehensible, and usable. Interpretability
reflects how easily the results can be understood.
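Properties 3 and 4 can be made concrete with a small dissimilarity
function that accepts numeric, categorical and binary attributes
and skips missing values. This is a sketch in the spirit of
Gower's coefficient, not a reference implementation; the function
name mixed_dissimilarity, the attribute layout and the sample
records are assumptions made for illustration.

```python
def mixed_dissimilarity(a, b, kinds, ranges):
    """Average per-attribute dissimilarity in [0, 1].

    kinds[i] is "numeric" or "categorical" (binary values are treated
    as categorical); ranges[i] is the value range of a numeric
    attribute and is ignored for categorical ones.
    Returns None when the two records share no known attribute.
    """
    total, used = 0.0, 0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue  # missing value: leave this attribute out
        if kind == "numeric":
            total += abs(x - y) / rng if rng else 0.0
        else:
            total += 0.0 if x == y else 1.0
        used += 1
    return total / used if used else None

# Hypothetical records: (wheels, fuel type, has an engine).
kinds = ["numeric", "categorical", "categorical"]
ranges = [4, None, None]  # assumed: wheel counts span 2 to 6
car = (4, "petrol", True)
bicycle = (2, None, False)  # fuel type is missing
van = (4, "diesel", True)

print(mixed_dissimilarity(car, bicycle, kinds, ranges))  # 0.75
print(mixed_dissimilarity(car, van, kinds, ranges))
```

The car and the van differ only in fuel type, so they come out
much closer to each other than the car and the bicycle.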
Clustering Methods:
The clustering methods can be classified into the following
categories:
  • Partitioning Method
  • Hierarchical Method
  • Density-based Method
  • Grid-Based Method
  • Model-Based Method
  • Constraint-based Method
Partitioning Method: It is used to make partitions on the data
in order to form clusters. If “n” partitions are made on “p”
objects of the database, then each partition is represented by
a cluster and n ≤ p. The two conditions that must be satisfied
by this Partitioning Clustering Method are:
  • Each object must belong to exactly one group.
  • Each group must contain at least one object.
The partitioning method uses a technique called iterative
relocation, in which an object is moved from one group to
another to improve the partitioning.
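Iterative relocation can be sketched as a loop in the style of
k-means: assign every object to the group with the nearest
centre, recompute the centres, and repeat until no object moves.
Choosing k-means as the relocation rule is an assumption of this
sketch (the notes do not name a specific algorithm), as are the
function name relocate, the sample points and the starting
centres.

```python
from math import dist

def relocate(points, centres, max_rounds=100):
    """Move objects between groups until the assignment stops changing."""
    centres = [tuple(c) for c in centres]  # work on a copy
    assignment = None
    for _ in range(max_rounds):
        new_assignment = [
            min(range(len(centres)), key=lambda j: dist(p, centres[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no object changed group
        assignment = new_assignment
        for j in range(len(centres)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty group keeps its previous centre
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centres

# Hypothetical objects and deliberately poor starting centres.
points = [(1, 1), (2, 1), (3, 1), (8, 1), (9, 1), (10, 1)]
assignment, centres = relocate(points, [(1, 1), (2, 1)])
print(assignment)  # [0, 0, 0, 1, 1, 1]
print(centres)     # [(2.0, 1.0), (9.0, 1.0)]

# The two conditions: one group id per object, and no empty group.
assert len(assignment) == len(points)
assert set(assignment) == {0, 1}
```

In the first round the objects (2, 1) and (3, 1) land in group 1;
after the centres are recomputed they are relocated to group 0,
which is the improvement step the method relies on. The second
condition is checked explicitly because this relocation rule does
not guarantee non-empty groups by itself.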
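Hierarchical Method: In this method, a hierarchical
decomposition of the given set of data objects is created.
Hierarchical methods are classified on the basis of how the
hierarchical decomposition is formed: agglomerative (bottom-up)
approaches start with each object in its own group and merge
groups step by step, while divisive (top-down) approaches start
with all objects in one group and split it step by step.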
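One way such a hierarchical decomposition can be formed is
bottom-up: start with every object in its own group and keep
merging the two closest groups. The sketch below records the
merge order. Measuring the distance between two groups by their
closest pair of objects (single linkage) is an assumption of this
example, as are the function name agglomerate and the sample
points.

```python
from math import dist

def agglomerate(points):
    """Merge the two closest groups until one remains; return the merges."""
    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return history

# Hypothetical objects, referred to by their index in this list.
points = [(0, 0), (0, 1), (5, 0), (5, 2)]
history = agglomerate(points)
for left, right, d in history:
    print(left, "+", right, "at distance", d)
```

Reading the merges from first to last gives the hierarchy from
the bottom up: objects 0 and 1 join first, then 2 and 3, and
finally the two resulting groups are merged into one.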

Who am I buying these notes from?

Stuvia is a marketplace, so you are not buying this document from us, but from seller ajijnadaf. Stuvia facilitates payment to the seller.
