Introduction
Big data
1. Data volume: the size of the data sets that an organization has
collected for analysis and processing, i.e. the quantity of generated
and stored data.
2. Data velocity: data is collected at an enormous speed. Compared
with small data, big data is produced more continuously. Two kinds of
velocity are relevant: the frequency of generation and the
frequency of handling, recording and publishing.
3. Data variety: the type and nature of the data. Big data is
often unstructured and heterogeneous.
4. Data veracity: the reliability of the data, i.e. its quality and
value. The data must not only be large but also reliable enough to
yield value when it is analyzed.
What is data
-> a collection of data objects and their attributes
-> an attribute is a property or characteristic of an object. The more
attributes, the more information about an object. The attributes describe
an object (also called a record, point, case, sample, instance or entity)
Attribute values: numbers or symbols assigned to an attribute
The same attribute can be mapped to different attribute values (height ->
meters or feet)
Different attributes can be mapped to the same set of values (ID, age ->
integers)
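The two remarks about attribute values can be illustrated with a small sketch. The example values and the helper name `meters_to_feet` are made up for illustration and are not from the course material; the only outside fact used is that one foot is 0.3048 m.

```python
# Hypothetical example values, chosen only for illustration.
METERS_PER_FOOT = 0.3048  # definition of the international foot


def meters_to_feet(height_m: float) -> float:
    """Same attribute (height), different attribute value (other unit)."""
    return height_m / METERS_PER_FOOT


height_m = 1.8288
height_ft = meters_to_feet(height_m)  # the same height expressed in feet

# ID and age are both mapped to integers, but they behave differently.
ids = [101, 102, 103]
ages = [20, 25, 30]

mean_age = sum(ages) / len(ages)            # an average age is meaningful
ids_are_unique = len(set(ids)) == len(ids)  # for IDs only (in)equality is meaningful
```

Averaging the IDs would run without error, but the result would not describe anything, which is why the type of an attribute matters more than the values it happens to be stored as.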
Types of attributes
Nominal: category or state (categorical attribute) -> ID, eye color,
ZIP code, sex
Ordinal: values have a meaningful order -> rankings, grades, height in
{tall, medium, short}
Interval: values are measured on a scale where order and differences
are meaningful, but there is no true zero-point -> calendar dates,
temperature in °C
Ratio: a numeric attribute with an inherent zero-point, so ratios are
meaningful -> temperature in K, length, time, counts
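The difference between interval and ratio attributes can be checked with the temperature example from the list above. This is a minimal sketch; the function name `celsius_to_kelvin` and the two temperatures are assumptions chosen for illustration.

```python
def celsius_to_kelvin(t_c: float) -> float:
    """Convert an interval-scale temperature (Celsius) to the ratio-scale Kelvin."""
    return t_c + 273.15


t1_c, t2_c = 10.0, 20.0

# Differences are meaningful on an interval scale and survive the conversion.
diff_c = t2_c - t1_c
diff_k = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)

# Ratios are only meaningful when the scale has an inherent zero-point.
naive_ratio = t2_c / t1_c  # 2.0, yet 20 °C is not "twice as hot" as 10 °C
true_ratio = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)  # about 1.035
```

Both differences come out as 10 degrees, while the ratio changes from 2.0 to roughly 1.035 once the values are expressed on a scale with a true zero.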
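Discrete attribute -> has a finite or countably infinite set of values
(counts, ZIP codes); often represented as integer variables
Continuous attribute -> has real numbers as values, in practice represented
as floating-point variables (temperature, height, weight)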
Dataset types
Record
   Data matrix
   Document data
   Transaction data
Graph
   World Wide Web
   protein interactions
Ordered
   spatial data
   temporal data
   sequential data
   molecular sequences
1. record data: data that consists of a collection of records, each of
which consists of a fixed set of attributes
a. data matrix
-> if all records have the same fixed set of numeric attributes, the
data objects can be seen as points in a multi-dimensional space and
stored as an m by n matrix (m rows = objects, n columns = attributes)
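As a rough sketch of record data stored in matrix form: each record below has the same fixed set of numeric attributes, so the collection can be laid out as an m by n matrix with one row per object and one column per attribute. The attribute names and values are hypothetical and serve only as an illustration.

```python
# Hypothetical records: every record has the same fixed set of attributes.
ATTRIBUTES = ("height_m", "weight_kg", "age")

records = [
    {"height_m": 1.70, "weight_kg": 65.0, "age": 30},
    {"height_m": 1.82, "weight_kg": 80.0, "age": 41},
    {"height_m": 1.65, "weight_kg": 58.0, "age": 25},
    {"height_m": 1.90, "weight_kg": 92.0, "age": 37},
]

# m by n layout: one row per object, one column per attribute.
matrix = [[float(rec[name]) for name in ATTRIBUTES] for rec in records]

m = len(matrix)     # number of objects (rows)
n = len(matrix[0])  # number of attributes (columns)

# Reading one column gives all values of a single attribute.
age_column = ATTRIBUTES.index("age")
ages = [row[age_column] for row in matrix]
```

Here `m` is 4 and `n` is 3. A plain list of lists keeps the sketch dependency-free; an array library would be the usual choice for real data.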