FULL data mining summary

This is a full summary of the course Data mining given by Prof. Fransen, Prof. Laukens and Prof. Meysman. It includes all the topics of the theory classes. Using this together with the notes from the practical classes gave me a 16/20.

Preview 4 out of 84  pages

  • March 19, 2025
  • 84
  • 2023/2024
  • Summary
Introduction
Confidence Confident

Moment of lecture @February 14, 2024

Review @February 17, 2024

Materials advdatanalysis-01-introduction.pdf

Last Edited @March 12, 2025 10:32 AM

People have always studied the human body from within separate disciplines,
each with its own perspective, while still understanding one another. That has
not changed, but a new perspective is emerging: large-scale data. New
technologies have become available, and the data they produce requires new
techniques to analyse it. These all fit under ‘big data’ (nowadays also: deep
learning, AI, etc.).
Big data refers to data for which conventional computing techniques are no
longer sufficient (for example, because of its size). Present-day tools need to
solve more complex problems, which means people have to use those tools in a
smarter way. Big data is also considered a disruptive trend in computer science.
Big data is characterized by 4 main aspects:

Volume: the amount of data you’re dealing with. Example: a genome printed on
paper would fill stacks and stacks of pages

Moore's law is a prediction that the number of transistors on a
chip doubles every two years, making computers faster and cheaper.

Velocity: the speed at which data is produced/collected, and the fact that it
is produced continuously: machines generate data all the time. Data is
everywhere and it changes our world.

For example: a smartphone holds a massive amount of data at all times

There is a need for new, effective, high-tech approaches to data transfer

The speed of data production increases faster than the staff available to
handle it




Variety: in the life sciences there are many different data types and data
sets. A distinction is made between structured and unstructured data → 80% of
the data is unstructured. The life sciences have much more variability in the
data that is collected.

Examples: DNA sequencing, morphology, metabolic data, protein
structures, etc

Transcriptome is more variable than the genome

Veracity: data is never perfect (for example: noise, biases, missing points).
This is especially problematic in the life sciences because imperfect data is
present almost everywhere there (it also occurs in other fields, but in the
life sciences it is always present).
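
To make the veracity point concrete, here is a minimal sketch in plain Python. The measurements are invented, and the names `measurements`, `count_missing` and `mean_of_observed` are illustrative and not taken from the lecture. It only shows that missing points have to be detected and handled before any summary is computed:

```python
from statistics import mean

# Hypothetical measurements; None marks a missing point.
measurements = [2.1, None, 1.9, 2.4, None, 2.0]

def count_missing(values):
    """Return how many entries are missing (None)."""
    return sum(1 for v in values if v is None)

def mean_of_observed(values):
    """Average only the values that were actually measured."""
    observed = [v for v in values if v is not None]
    return mean(observed)

missing = count_missing(measurements)
observed_mean = mean_of_observed(measurements)
print(f"{missing} of {len(measurements)} points are missing")
print(f"mean of the observed points: {observed_mean:.2f}")
```

Simply skipping the missing points is only one possible choice; whether that is appropriate depends on why the points are missing.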

⇒ Large-scale data and AI have brought a new, data-intensive research paradigm.
A lot of science nowadays starts from data, from which predictions and
hypotheses are made. In most cases the paradigm shifts during the research.

Terminology

Data = a collection of objects (also known as records, points, cases, etc.) and
their attributes. The objects could be the samples, and the attributes (also
called features or variables) could be the measurements performed on them.

Attribute = a property or characteristic of an object. A collection of
attributes describes the object → more attributes means more knowledge about
the object.
Example: student = object; the attributes of a student are grades, student
number, etc.
It is typical to have the objects in rows and the attributes in columns.
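
The rows-and-columns layout can be sketched in plain Python as follows. The student data and the helper names `attribute_values` and `describe` are invented for illustration:

```python
# Each object (a student) is one row; each attribute is one column.
# The values below are made up for this example.
columns = ["student_number", "name", "grade"]
rows = [
    [1001, "An", 16],
    [1002, "Bram", 12],
    [1003, "Chloe", 14],
]

def attribute_values(rows, columns, attribute):
    """Return one column: the values of a single attribute for all objects."""
    index = columns.index(attribute)
    return [row[index] for row in rows]

def describe(row, columns):
    """Return one row as an attribute -> value mapping for a single object."""
    return dict(zip(columns, row))

grades = attribute_values(rows, columns, "grade")
first_student = describe(rows[0], columns)
print(grades)
print(first_student)
```

Reading along a row gives everything known about one object; reading down a column gives one attribute across all objects.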




An attribute value = a number or symbol assigned to an attribute. Examples of
the difference between attributes and attribute values:

The same attribute can be mapped to different attribute values: height can be
measured in feet or meters

Different attributes can be mapped to the same set of values: the attribute
values for ID and age are both integers

→ The properties of the attribute values can still differ (an ID number has no
limit but age does).
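
A small sketch of both points, assuming invented values. `meters_to_feet`, `student_id` and `ages` are hypothetical names, and the remark about averaging is an added illustration rather than something stated in the notes:

```python
FEET_PER_METER = 3.28084  # 1 m is approximately 3.28084 ft

def meters_to_feet(height_m):
    """Map the same attribute (height) to a different set of values."""
    return height_m * FEET_PER_METER

# Same attribute, different attribute values.
height_m = 1.80
height_ft = meters_to_feet(height_m)

# Different attributes, same kind of value: both are stored as integers.
student_id = 20241001
age = 23
same_value_type = type(student_id) is type(age)

# The values still behave differently: an average age is meaningful,
# while an average of ID numbers would run but mean nothing.
ages = [23, 25, 21]
mean_age = sum(ages) / len(ages)

print(f"{height_m} m = {height_ft:.2f} ft")
print(same_value_type, mean_age)
```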
There are different types of attributes:

Nominal → only has the distinction property

Examples: ID numbers, eye color, zip codes

Ordinal → has both the distinction and order properties

Examples: rankings (e.g., taste of potato chips on a scale from 1-10),
grades, height in {tall, medium, short}

Interval → has the distinction, order and addition properties

Examples: calendar dates, temperatures in Celsius or Fahrenheit.

Ratio → has all 4 properties

Examples: temperature in Kelvin, length, time, counts

⇒ The distinction between the types is based on the mathematical properties
they have: distinction, order, addition and/or multiplication.
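
The four types and their properties can be written down as a lookup table. This is a minimal sketch: the names `PROPERTIES` and `supports` are illustrative, and the temperature values are chosen only to show why multiplication is meaningful for ratio attributes but not for interval attributes:

```python
# Mathematical properties supported by each attribute type, as listed above.
PROPERTIES = {
    "nominal": {"distinction"},
    "ordinal": {"distinction", "order"},
    "interval": {"distinction", "order", "addition"},
    "ratio": {"distinction", "order", "addition", "multiplication"},
}

def supports(attribute_type, prop):
    """Return True if the attribute type has the given property."""
    return prop in PROPERTIES[attribute_type]

# Ratio attribute: Kelvin has a true zero, so 200 K is twice 100 K.
kelvin_ratio = 200.0 / 100.0

# Interval attribute: differences in Celsius are meaningful...
celsius_difference = 20.0 - 10.0
# ...but 20 C is not "twice as hot" as 10 C. Converting both to Kelvin
# shows the actual ratio is only slightly above 1.
true_ratio = (20.0 + 273.15) / (10.0 + 273.15)

print(supports("interval", "multiplication"))
print(kelvin_ratio, celsius_difference, round(true_ratio, 3))
```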




You can also make a distinction between discrete and continuous attributes
(roughly: a discrete attribute takes integer values, while a continuous
attribute takes real-number values, which can have a decimal part):

Discrete Attribute: has only a finite or countably infinite set of values.
Often represented as integer variables.

Examples: zip codes, counts, or the set of words in a collection of
documents

Continuous Attribute: has real numbers as attribute values. Practically, real
values can only be measured and represented using a finite number of
digits. Continuous attributes are typically represented as floating-point
variables.

Examples: temperature, height, or weight.
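
A short sketch of the two kinds of attribute in Python, using a made-up document collection. It also shows the point about finite digits: floating-point values are approximations, so exact equality tests on continuous values can fail:

```python
import math
from collections import Counter

# Discrete attribute: word counts in a tiny, invented document collection.
documents = ["data mining", "big data", "data matrix"]
word_counts = Counter(word for doc in documents for word in doc.split())
vocabulary_size = len(word_counts)

# Continuous attribute: stored as a float with a finite number of digits.
total = 0.1 + 0.2
is_exact = total == 0.3              # False: 0.1 and 0.2 are not stored exactly
is_close = math.isclose(total, 0.3)  # True: compare with a tolerance instead

print(word_counts["data"], vocabulary_size)
print(is_exact, is_close)
```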


Dataset types
There are 3 main types of datasets:

Record data

Graph data

Ordered data

Record data

= data that consists of a collection of records, each of which consists of a fixed
set of attributes.
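
As a rough illustration of "a fixed set of attributes", the sketch below uses `typing.NamedTuple` from the Python standard library. The `Patient` record type and its fields are hypothetical and not part of the course material:

```python
from typing import NamedTuple

class Patient(NamedTuple):
    """One record; the three fields form the fixed attribute set."""
    patient_id: int
    age: int
    temperature_c: float

# A record data set: a collection of records of the same type.
records = [
    Patient(1, 34, 36.8),
    Patient(2, 58, 38.1),
]

# Every record exposes the same attributes in the same order.
attribute_names = Patient._fields
same_attributes = all(r._fields == attribute_names for r in records)

print(attribute_names)
print(same_attributes)
```

Because every record has the same attributes, such a data set can always be laid out as a table with one row per record.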
Data Matrix



