Interactive Data Transforming | Lecture 4
In the early days of databases (before the DBMS), data was stored using file systems, but this approach had
many issues (as explained in Lecture 1).
Following development: the RDBMS.
It is designed to address the drawbacks and inefficiencies of the DBMS. Data is stored in the form of tables,
and the relationships among the tables are maintained.
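As a minimal illustration (not from the lecture), here is a sketch using Python's built-in sqlite3 module; the table and column names are invented, but it shows data stored in tables and a relationship maintained between them:

```python
import sqlite3

# In-memory database; in practice an RDBMS server such as MySQL or PostgreSQL would be used.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Data is stored in tables ...
cur.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
# ... and relationships among tables are expressed through keys.
cur.execute("""
    CREATE TABLE articles (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER REFERENCES authors(id)
    )
""")

cur.execute("INSERT INTO authors VALUES (1, 'Alice')")
cur.execute("INSERT INTO articles VALUES (10, 'Intro to Big Data', 1)")

# A join uses the relationship to combine the two tables.
cur.execute("""
    SELECT articles.title, authors.name
    FROM articles JOIN authors ON articles.author_id = authors.id
""")
print(cur.fetchall())   # [('Intro to Big Data', 'Alice')]
```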
Big Data refers to large and complex data sets that require advanced methods of processing to
uncover valuable insights and help with decision-making. Here’s a breakdown of the key
characteristics of Big Data, known as the “Vs”:
1. Volume
Amount of generated and stored data. For example, Wikipedia has millions of articles across
different languages, contributing to massive data storage needs.
2. Velocity
This describes the speed at which data is created, collected, and processed. For example,
Wikipedia has thousands of editors making constant updates, adding to the continuous flow
of new data.
3. Variety
Data comes in many forms, from structured data (like tables in an RDBMS) to unstructured data (like
social media posts, which can include text, images, and videos).
Big Data consists of high-volume, high-velocity, and high-variety information assets that require new
forms of processing to enable enhanced decision-making and insight discovery.
For smaller datasets, an RDBMS is sufficient. Big Data analytics is about dealing with massive amounts of
data in ways that traditional methods, like relational databases (SQL), cannot handle efficiently. The solution
is a compromise: instead of perfect answers, we focus on patterns, trends, and the most important
information (e.g. top results or partial answers). The integral parts of Big Data analytics are:
Interactive Processing
Users are involved in the data analysis. They give feedback or opinions during the process.
This helps the system make decisions, as users understand the problem and guide the
analysis.
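A toy human-in-the-loop sketch in Python (the data, column names, and prompt are invented for illustration): the system shows an intermediate result, and the user's answer steers the next round of analysis.

```python
# Invented sample records for the sketch.
records = [
    {"country": "DE", "amount": 120},
    {"country": "DE", "amount": 80},
    {"country": "FR", "amount": 300},
    {"country": "FR", "amount": 40},
]

country = None
while True:
    # Show the current (intermediate) result to the user.
    subset = [r for r in records if country is None or r["country"] == country]
    print(f"{len(subset)} records, total amount = {sum(r['amount'] for r in subset)}")

    # The user's feedback guides the next step of the analysis.
    answer = input("Drill down into a country (or press Enter to stop): ").strip()
    if not answer:
        break
    country = answer
```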
Approximate Processing
Instead of analyzing all the data, the system looks at a sample that represents the whole. This
method gives approximate answers, not exact ones, but it’s much faster. For example, if 95%
of people have similar behavior, the system will assume this is representative.
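A minimal sampling sketch in Python (the data and the 95% figure are invented for illustration): instead of scanning the whole data set, a random sample is used to estimate the answer.

```python
import random

# Invented population: True means an "active" user (about 95% of them).
population = [random.random() < 0.95 for _ in range(1_000_000)]

sample = random.sample(population, k=10_000)   # look at ~1% of the data
estimate = sum(sample) / len(sample)           # approximate answer, much faster
exact = sum(population) / len(population)      # exact answer (only for comparison)

print(f"estimated share of active users: {estimate:.3f}")
print(f"exact share of active users:     {exact:.3f}")
```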
Crowdsourcing Processing
Complex tasks are given to groups of people to solve; for example, humans fill in surveys in
exchange for small payments. The challenges are deciding what to ask, how to ask it, and how to
handle different or conflicting answers.
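One common way to handle conflicting answers is a simple majority vote; a toy Python sketch (the workers and labels are invented) could look like this:

```python
from collections import Counter

# Invented crowd answers: each worker labels the same item.
answers = {
    "worker_1": "cat",
    "worker_2": "cat",
    "worker_3": "dog",
    "worker_4": "cat",
    "worker_5": "dog",
}

# Keep the answer that most workers agree on.
counts = Counter(answers.values())
label, votes = counts.most_common(1)[0]
print(f"majority answer: {label} ({votes} of {len(answers)} workers agree)")
```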
Progressive Processing
The system starts showing results as soon as possible, even if it hasn’t processed all the data.
This is useful when time or computing power is limited, so users can see early results and
work with them.
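A toy Python sketch of progressive processing (the data is invented): a running result is emitted after each chunk, so early answers are available before all data has been processed.

```python
# Emit a partial result after each chunk instead of waiting for all the data.
def progressive_average(chunks):
    total, count = 0.0, 0
    for chunk in chunks:
        total += sum(chunk)
        count += len(chunk)
        yield total / count   # partial result, refined as more data arrives

data_chunks = [[10, 12, 11], [50, 9], [8, 10, 12]]
for i, partial in enumerate(progressive_average(data_chunks), start=1):
    print(f"after chunk {i}: running average = {partial:.2f}")
```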
Incremental Processing
Since Big Data is constantly changing, results from earlier processing can quickly become
outdated. This method allows systems to update their results when new data comes in,
correcting or completing earlier analyses.
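A toy Python sketch of incremental processing (the values are invented): the stored result is updated from the new data only, instead of recomputing everything from scratch.

```python
# Keep running totals so new data only updates the existing result.
class IncrementalMean:
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, new_values):
        # Only the new data is processed; the earlier result is reused.
        self.total += sum(new_values)
        self.count += len(new_values)
        return self.total / self.count

mean = IncrementalMean()
print(mean.update([4, 6, 8]))   # initial batch        -> 6.0
print(mean.update([10, 12]))    # new data arrives     -> 8.0
```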
Ensuring transparency, providing clear explanations, and making results interpretable are key to
getting users to trust and accept new technologies, even when data challenges affect the results.
RDBMS have limitations when it comes to handling the massive amounts of data and demands we
face today. Here’s why people are moving beyond traditional RDBMS:
1. Data Growth: The amount of data keeps growing rapidly, and an RDBMS often struggles
to keep up.
2. User Expectations: Users expect faster access to more complex data, which puts
pressure on databases.
3. Scaling Limitations: An RDBMS can only scale vertically, adding more resources like
memory or storage to a single server; it cannot easily scale horizontally by adding more
servers, which limits its ability to handle large amounts of data.
Scaling is the ability of a system to handle increasing amounts of data (by adding
more servers or storage, for example).
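As a rough illustration of horizontal scaling (not from the lecture; the server names and keys are invented), data can be spread across several servers by hashing a key, so adding servers adds capacity:

```python
import hashlib

# Invented pool of servers; adding one here means adding capacity.
servers = ["server_a", "server_b", "server_c"]

def shard_for(key: str) -> str:
    # Hash the key and map it to one of the servers ("sharding").
    digest = hashlib.md5(key.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

for user_id in ["user_1", "user_2", "user_3", "user_4"]:
    print(user_id, "->", shard_for(user_id))
```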
Here are some alternatives to Traditional RDBMS: