SLI NOTES
DE
1 ---
2 ---
3 ---
4 Some of the skills required:
Critical thinking: You use the data to find information or to answer a question, so you
must know how to ask the data the right questions, which can be tricky.
Knowledge of Python: Python is a programming language widely used to analyse large
sets of data, and analysis is usually much quicker in Python than by hand, so it is useful to know.
Presentation skills: Sometimes you have to present a large amount of data to your
manager or to your team, and that is when presentation skills come in.
5 ----
6 Quantitative, Qualitative, Internal, External.
Quantitative Advantages:
- You can conduct more in-depth research: because quantitative data can be
statistically analysed, the research is likely to be detailed.
- Personal bias can lead to incorrect results. Because quantitative data is
numerical, it reduces this risk considerably.
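As a rough illustration of analysing quantitative data statistically, here is a minimal Python sketch using the standard `statistics` module. The scores are made-up numbers used only for this example, and the variable names are assumptions.

```python
import statistics

# Hypothetical survey scores (made-up values, for illustration only).
scores = [4, 5, 3, 4, 5, 2, 4]

# Basic summary statistics that numerical data makes possible.
mean_score = statistics.mean(scores)      # average score
median_score = statistics.median(scores)  # middle value when sorted
spread = statistics.stdev(scores)         # sample standard deviation

print(f"mean={mean_score:.2f} median={median_score} stdev={spread:.2f}")
```

Because every value is a number, the same summary can be repeated by anyone with the same data, which is one reason the results depend less on personal judgement.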
Qualitative Advantages:
- Just like quantitative data, it provides a detailed analysis of the subject.
- It helps companies understand what customers think and what their mindset is.
It tells them why a customer has bought a specific product.
Internal Advantages:
- One of the biggest advantages of internal data is that it is ready and available for
analysis.
- It lets you make quick decisions because you have almost instant access to
the information.
Disadvantages:
- There can be a few downsides as well. If you are a beginner, the chances are that you
might not have access to many departments, and therefore your data will be
limited.
External Advantages:
- External data gives owners the opportunity to see how the world around
them is functioning, allowing them to make the right decisions.
- It relieves the company of the pressure of producing all the relevant data itself.
7 What is a data warehouse: Data warehouses are used to store structured, filtered data that
has already been processed for a specific purpose.
What is a data lake: Data lakes are used as a pool to store raw data that has no defined purpose yet.
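The difference can be sketched in a few lines of Python. This is only an illustration under assumptions: the event records, the field names and the `load_sales` helper are all hypothetical, and real warehouses and lakes are dedicated storage systems rather than Python lists.

```python
import json

# Hypothetical raw events kept exactly as they arrived (the "lake").
data_lake = [
    '{"user": "a1", "action": "purchase", "amount": "19.99"}',
    '{"user": "b2", "action": "page_view"}',
    'corrupted line',
]

def load_sales(raw_records):
    """Process raw records into structured, filtered sales rows."""
    rows = []
    for raw in raw_records:
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unreadable records are skipped, but remain in the lake
        if isinstance(event, dict) and event.get("action") == "purchase":
            # Keep only the fields needed for the specific purpose (sales).
            rows.append({"user": event["user"], "amount": float(event["amount"])})
    return rows

# Structured data prepared for one purpose (the "warehouse").
data_warehouse = load_sales(data_lake)
print(data_warehouse)
```

The lake keeps everything, including records nobody has a use for yet, while the warehouse holds only what survived processing for the sales purpose.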
8 The three Vs are the defining properties or dimensions of big data. Volume refers to the
amount of data, variety refers to the number of types of data and velocity refers to the
speed of data processing.
According to the 3Vs model, the challenges of big data management result from the
expansion of all three properties rather than just volume itself.
Data Volume: The size of all available data is growing day by day. This applies to individuals
and companies. One text file is a few kilobytes, a sound file is a few megabytes, while a full
movie can be up to several gigabytes. More data is added each day. Most of the data is
generated by customers and employees. More sources with larger amounts of data increase
the volume that needs to be analysed. This is a major issue for those looking to store data
instead of letting it disappear.
Data Velocity: Companies often analyse data using a batch process. A batch process is one
in which a computer completes a group of jobs, often simultaneously, in a non-stop
run. One takes a chunk of data, submits a job to the server and waits for delivery of the
result. This only works when the incoming data arrives more slowly than the computer can
process it and when the result is useful despite the delay.
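The "take a chunk, submit a job, wait for the result" cycle can be sketched as follows. The function names `make_batches` and `run_job` are hypothetical, and the job here is just a sum standing in for whatever a real server would compute.

```python
def make_batches(records, batch_size):
    """Split the records into consecutive chunks of at most batch_size items."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

def run_job(batch):
    """Stand-in for a job submitted to a server: here it only sums the chunk."""
    return sum(batch)

# Hypothetical incoming data: the numbers 1 to 10.
incoming = list(range(1, 11))

# Each chunk is processed in turn and its result collected.
results = [run_job(batch) for batch in make_batches(incoming, batch_size=4)]
print(results)
```

Each result only becomes available once its whole chunk has been processed, which is the delay the notes refer to.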
Data Variety: Data used to come mainly from Excel tables or databases. It has since
lost that structure and now comes in hundreds of formats: text, photos, audio, video, GPS data.
No one has control over the input of data.
9 Password protection: Passwords are commonly used to protect access to systems that
contain personal data. You must ensure that any password you implement meets the
minimum requirement of 8 characters.
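A minimal sketch of the length rule in Python is shown below. The constant and function names are hypothetical, and a real system would normally apply further checks beyond length.

```python
MIN_LENGTH = 8  # minimum number of characters stated in the notes

def meets_minimum_length(password: str) -> bool:
    """Return True if the password has at least MIN_LENGTH characters."""
    return len(password) >= MIN_LENGTH

print(meets_minimum_length("abc1234"))   # 7 characters
print(meets_minimum_length("abcd1234"))  # 8 characters
```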
Data Encryption: A way of securing data so that the only way to access it is to
decrypt it with the correct key. Encrypted data, also known as “ciphertext”,
appears scrambled, in other words unreadable to a person.
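To show the idea of a key turning readable data into ciphertext and back, here is a toy sketch using a one-time-pad style XOR with a random key from Python's `secrets` module. It is for illustration only and is an assumption of this example, not a recommended way to protect real data, where an established encryption library should be used.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte (key must be the same length)."""
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

plaintext = b"customer record"                 # hypothetical data to protect
key = secrets.token_bytes(len(plaintext))      # random key, used once

ciphertext = xor_bytes(plaintext, key)         # scrambled bytes
recovered = xor_bytes(ciphertext, key)         # the same key reverses the XOR

print(ciphertext)
print(recovered)
```

Applying the correct key a second time returns the original bytes, while any other key produces different output.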
Maintenance of information systems: It is important that computer systems are properly
maintained to guard against data breaches and other potential risks. The better a
system is maintained, the better it will be able to deal with potential threats.
10 Data marts make specific data more readily available, making it easier for people to access it
without wasting time searching for information. One benefit of data
marts is cost efficiency, as a data mart is much cheaper than a data warehouse.
They also offer simplified data access because they hold only a small subset of data, so
people can easily extract it. There are 3 main types of data marts.
- Dependent, which is created by drawing data directly from a central data warehouse.
- Independent, which is created without the need of a central warehouse, drawing data directly from operational or external sources.
- Hybrid, which is able to take data from the data warehouse or the operational systems.
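The idea of a data mart as a small subset of a larger store can be sketched as below. The rows, the field names and the `build_mart` helper are hypothetical and only illustrate the filtering step.

```python
# Hypothetical rows in a larger central store.
warehouse = [
    {"department": "sales", "region": "north", "revenue": 1200},
    {"department": "sales", "region": "south", "revenue": 950},
    {"department": "hr", "region": "north", "headcount": 14},
]

def build_mart(rows, department):
    """Keep only the rows that belong to one department."""
    return [row for row in rows if row["department"] == department]

# The sales team only sees the subset relevant to them.
sales_mart = build_mart(warehouse, "sales")
print(sales_mart)
```

Because the mart holds fewer rows, the people who use it do not have to search through data that belongs to other departments.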
11 The amount of data companies collect can be quite large, which makes it hard to
analyse all of it.
There are systems that collect all the information and organize it. If you had to do
this manually, it would take far too long.
Poor Quality Data: Nothing is more harmful than data that is not accurate. Data collected
without good input or output will be unreliable. Most inaccurate data is caused by
manual errors made during data entry, meaning that they are mostly human