These are the lecture notes I created and used to revise for the CS3002 Artificial Intelligence exam at Brunel University, for which I received a First Class mark.
Machine learning is the ability of a system to simulate human learning, and it appears in different forms.
Learning
• Unsupervised learning is a way of learning from historical situations or data without any
knowledge of what it is supposed to be doing with that data. There is no signal to tell it
whether it is doing well or badly. Unsupervised learning is learning without the desired output.
• Supervised learning is learning with the desired output.
• Reinforcement learning is learning from reward / punishment signals.
Methods:
Supervised:
• Classification: the input is some data, and the output is a decision.
• Regression: the output is not a decision but a number, so it may be a prediction.
Unsupervised:
• Learning without the desired output (teacher signals)
• Clustering is one of the most widely used unsupervised learning methods.
Clustering
There is no desired output; the aim is to find structure in the data (the inputs). For example, a
human will see cats and recognise what a cat is…
• Clustering is to partition a data set into subsets (clusters) so that the data in each subset share
some common trait, often similarity or proximity under some defined distance measure.
It is the process of organising objects into groups whose members are similar in some way. A cluster
is a group of objects that are similar to each other and dissimilar to the objects in other
clusters.
Clustering algorithms have several uses. For example:
• Social networks: people who talk to each other more form a separate cluster. E.g. ads marketed to
this cluster are more likely to be shared between friends.
• Customer segmentation: identify different clusters of consumers to know how to advertise to them.
• Gene networks: Understanding gene interactions, identifying important genes linked to
disease.
Pattern Similarity
A key concept in clustering is similarity. Clusters are formed by similar patterns. We need to define
some metric to measure similarity. Using distance as the metric, the shorter the distance, the more
similar the two patterns.
Euclidean distance is the most common distance measure.
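For reference, the Euclidean distance is usually written as below. The symbols are introduced here for illustration and are not taken from the lecture: x and y are two patterns, each measured on n variables, and x_i is the i-th measurement of x.

```latex
d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
```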
You go through each pair of data points to see how close they sit in the data space. For each pair,
you go through all the variables and take the difference between the two measurements. For Euclidean
distance you square each difference, sum the squares, and take the square root of the total.
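As a minimal sketch of that calculation, assuming each pattern is a plain list of numbers (the function name `euclidean_distance` is made up for this example):

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length numeric patterns."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same number of variables")
    # Take the difference for each variable, square it, sum the squares,
    # then take the square root of the total.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```

The shorter the returned distance, the more similar the two patterns are taken to be.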
K-Means Clustering
The K parameter determines how many clusters are going to be in the final clustering arrangement.
Limitation: at each iteration of K-Means, a pattern can be assigned to only one cluster.
Advantages:
• Computationally faster than hierarchical clustering (if K is small)
• May produce tighter clusters than hierarchical clustering, especially if the clusters are globular
Disadvantages:
• The number of clusters is fixed in advance, and it is difficult to predict what K should be
• Different initial partitions (the initial allocation of centroids) can result in different final
clusters
• Potential empty cluster (not always bad)
• Does not work well with non-globular clusters (so it is difficult if you have elongated clusters
or clusters of massively different sizes)
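The notes do not spell out the steps of K-Means, so the following is only a minimal sketch of the usual assign-then-update loop, not the lecture's own method. It assumes Euclidean distance, points given as tuples, and initial centroids supplied by the caller; the function name `k_means` and its parameters are made up for this example.

```python
import math


def k_means(points, initial_centroids, max_iterations=100):
    """Cluster points around K centroids, where K = len(initial_centroids)."""
    centroids = [tuple(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each pattern goes to exactly one cluster,
        # the one whose centroid is nearest (Euclidean distance).
        new_assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no pattern changed cluster, so the result is stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # an empty cluster simply keeps its old centroid
                centroids[k] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centroids


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels, centroids = k_means(points, initial_centroids=[(0, 0), (10, 10)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because the result depends on `initial_centroids`, running the sketch with a different starting allocation is a simple way to check how sensitive a data set is to initialisation.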
Hierarchical Clustering
Agglomerative clustering is a form of hierarchical clustering.
It generates a series of different clustering results. It starts with each object / data point in its
own cluster, then runs through the algorithm, merging clusters until everything is in one cluster at
the end.
The intermediate clusters are what is important: the algorithm creates a series of merges, which can
be drawn as a dendrogram. This allows you to see the underlying structure of the data set, for
example that there are 3 or 4 clusters in it. Unlike K-Means, there is a visualisation of what the
number of clusters should be. You have more control over the structure, because you choose where to
cut the dendrogram. You can control how many clusters you have and how big they need to be.
The distance between the two clusters being compared can be measured in several ways (linkage):
Single: the smallest distance between any pair of points, one from each of the two clusters
Average: the average distance over all such pairs
Complete: the largest distance between any such pair
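The merging process and the single, average and complete rules can be sketched as follows. This is a minimal illustration, not the lecture's implementation: it assumes Euclidean distance between points, and the names `cluster_distance`, `agglomerative` and `num_clusters` are made up for this example. Stopping when `num_clusters` remain plays the role of cutting the dendrogram at that level.

```python
import math


def cluster_distance(a, b, linkage):
    """Distance between clusters a and b (lists of points) under a linkage rule."""
    pair_distances = [math.dist(p, q) for p in a for q in b]
    if linkage == "single":
        return min(pair_distances)
    if linkage == "complete":
        return max(pair_distances)
    if linkage == "average":
        return sum(pair_distances) / len(pair_distances)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerative(points, num_clusters, linkage="single"):
    """Merge the two closest clusters repeatedly until num_clusters remain."""
    clusters = [[p] for p in points]  # start: every point in its own cluster
    merges = []  # merge history, the information a dendrogram displays
    while len(clusters) > num_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]], linkage),
        )
        distance = cluster_distance(clusters[i], clusters[j], linkage)
        merges.append((clusters[i], clusters[j], distance))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merges = agglomerative(points, num_clusters=2, linkage="single")
print(clusters)  # [[(10, 0)], [(0, 0), (0, 1), (5, 5), (5, 6)]]
```

The `merges` list records which clusters were joined and at what distance, which is the information you would read off a dendrogram when deciding where to cut it.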