HMPYC 80: Research Methodology
Section C
Chapter 10: Big Data in Social Science Research
➢ 1. Introduction:
• There has been exponential growth of data over recent years due to increased technological advances: data volumes have grown exponentially from gigabytes (GB) to terabytes (TB) to petabytes (PB) to exabytes (EB), and now to zettabytes (ZB) (see the sketch at the end of this list).
- With this expansion of data, traditional database storage mechanisms have become obsolete or outdated, which has created new opportunities and challenges.
• Contrary to popular belief, most social science research projects, to some extent, lack scientific validity, partly as a result of using relatively small datasets.
- Qualitative projects frequently use small samples of fewer than 20 participants, while quantitative studies with samples of 300 to 500 are relatively common.
• A further limitation of social science databases is that a single survey represents a snapshot of a carefully selected sample within a certain population, with true representation seldom achieved.
• With the changing database landscape, social science researchers will need to take cognizance of new developments in data collection and how these affect social science research in general.
• As we enter a period where everything can be measured unobtrusively by means of some kind of sensor, with readings recorded or transmitted to live databases running in the background, huge datasets have become common.
- Sensors enable knowledge about the status of vehicles, the level of milk in the fridge, and how long it would take crops to ripen in the field. Perhaps the only being, or object, not yet measured extensively by sensors, is the human being.
• Smart applications such as electronic wristwatches are increasingly able to count the number of steps people take and to monitor their heartbeat and blood pressure, carefully recording such data for future use.
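• To make the scale of the units in the first bullet concrete, here is a minimal Python sketch (an illustration, not from the source notes) that prints the byte value of each unit, assuming decimal (SI) prefixes where each step up is a factor of 1 000:

    # Storage-unit scale from the first bullet above, assuming SI
    # (decimal) prefixes: each unit is 1000 times the previous one.
    UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

    for power, unit in enumerate(UNITS):
        print(f"1 {unit} = 10^{power * 3} bytes = {1000 ** power:,} bytes")

  Running this shows, for example, that a single zettabyte is 10^21 bytes, a trillion gigabytes.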
➢ 2. Conceptualising Big Data:
➢ 2.1. Big Data in Context:
• In the world of data analytics, the concept of big data refers to masses of datasets originating from disparate source systems, which are then aggregated and mined for business value.
• Translated into social science language, big data consists of very large and complex sets of quantitative and qualitative data, collected from various sources, in increasing volumes and with increasing velocity.
• The term big data is globally accepted for describing very large sets of information, and the application of specialized computational methods for analyzing such data by means of programming.
• Traditionally, conventional data-processing and analytical software did not have the capacity to manage large, complex datasets, and new methods had to be developed to deal effectively with them.
• Traditional information technology architectures are referred to as ‘legacy systems’ and use online analytical processing (OLAP) tools and static data repositories such as relational database management systems (RDBMS), which became ubiquitous across the commercial landscape (a small illustration follows at the end of this section).
- These databases became inadequate with the advancement and explosion of mainframe computing in the 1980s, which gave rise to parallel database systems to meet the increased demand.
- This clustered storage approach gave birth to the first Teradata parallel database.
• The K-mart franchise acquired its first Teradata database storage system for its retail stores across North America, and this was widely recognized across the industry as a pioneering platform for large datasets.
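• To ground the contrast above, here is a minimal, hypothetical Python sketch of the kind of static relational repository and OLAP-style aggregate query that legacy systems relied on. It uses Python's built-in sqlite3 module; the table and query are illustrative, not from the source notes:

    # A tiny in-memory relational store (RDBMS) with a fixed schema,
    # illustrating the 'static data repository' described above.
    import sqlite3

    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("""
        CREATE TABLE sales (
            store   TEXT,
            product TEXT,
            amount  REAL
        )
    """)
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("Store A", "milk", 3.50), ("Store A", "bread", 2.00),
         ("Store B", "milk", 3.75)],
    )

    # An OLAP-style aggregate query: total sales per store.
    for store, total in conn.execute(
        "SELECT store, SUM(amount) FROM sales GROUP BY store"
    ):
        print(store, total)

  A fixed schema on a single-node store like this works well at small scale; it is precisely its limits under exploding data volumes that, as the notes describe, motivated parallel systems such as Teradata.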