1 LECTURE 1
1. Big data is a big thing. Often, the 5 V’s are used to understand this phenomenon of big
data. Explain all V’s.
Volume: this refers to the vast amounts of data generated every second. This increasingly makes data sets
too large to store and analyse using traditional database technology. With big data technology we can now
store and use these data sets with the help of distributed systems.
Velocity: refers to the speed at which new data is generated and the speed at which data moves around. Big
data technology now allows us to analyse the data while it is being generated, without ever putting it into
databases.
Variety: refers to the different types of data we can now use. In the past it was structured data that neatly
fits into tables or relational databases, such as financial data. Nowadays, most of the data (80%) is
unstructured and therefore can’t easily be put into tables (photos, video, social media files). With big data
technology, we can now harness different types of data (structured and unstructured) including messages,
social media conversations, photos, sensor data, video or voice recordings and bring them together with
more traditional, structured data.
Veracity: refers to the messiness or trustworthiness of the data. With many forms of big data, quality and
accuracy are less controllable (just think of Twitter posts with hashtags, abbreviations, typos and colloquial
speech as well as the reliability and accuracy of content) but big data and analytics technology now allows
us to work with these types of data. The volumes often make up for the lack of quality or accuracy.
Value: this is the final V to take into account when looking at big data. It is all well and good
having access to big data but unless we can turn it into value it is useless. So you can safely argue that 'value'
is the most important V of Big Data. It is important that businesses make a business case for any attempt
to collect and leverage big data. It is so easy to fall into the buzz trap and embark on big data initiatives
without a clear understanding of costs and benefits.
2. “There are many reasons for Nokia’s downfall, but one of the biggest reasons that I
witnessed in person was that the company over-relied on numbers. They put a higher value
on quantitative data, they didn’t know how to handle data that wasn’t easily measurable,
and that didn’t show up in existing reports.”
What was the general message by Tricia Wang based on Nokia’s downfall?
Well, she said that companies must be aware of quantitative bias or quantitative addiction. One must not put
too much emphasis on quantitative numbers alone. Big data must be supported with thick data (emotion,
context, meaning). In addition, companies should improve their use of big data and analytics by seeing the
whole picture of decision making. Also, using only quantitative data means that a lot of data is missed out on,
not reported or included.
3. Why is big data nowadays so booming?
Big data is booming due to the availability of massive amounts of digital data in combination with technical
developments, as well as social needs. In addition, many people are discovering the power of data.
4. Explain the differences in viewpoints on big data by rationalism versus empiricism.
Well, the empiricists are the ones that just want to measure and analyse everything that is possible.
Rationalists, however, argue that things like data and sensors misguide us. They argue that we first need a
solid basis and good principles before we collect data.