Written for: Universiteit Antwerpen (UA), Master In Digital Business Engineering, Data Engineering
Seller: PVE1
Exam Questions Data Engineering
Introduction and file formats
Computer architecture and operating systems
Networks
Regular expressions exercises
Cloud services
Linux
Algorithms
Data structures
Algorithm and data structure exercises
Relational databases
SQL exercises
Data warehousing
NoSQL
Visualization
Parallel and distributed computing
Map-Reduce
Map-Reduce exercises
Recommender systems
Introduction and file formats
How are integer, decimal numbers, text and images stored in a computer? Give an
example of binary encoding for each type.
Computers work with bits, i.e. boolean values (0/1).
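1) For integers:
Unsigned: N bits represent a number between 0 and $2^N - 1$.
Signed: the first bit stores the sign (1 is negative, 0 is positive) and the remaining bits store the magnitude. In practice, most computers use two's complement instead, where N bits cover $-2^{N-1}$ to $2^{N-1} - 1$.
For example, 5 as an 8-bit unsigned integer is 00000101.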
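The three integer encodings above can be compared with a small sketch. The helper names (`to_unsigned_bits`, `to_sign_magnitude_bits`, `to_twos_complement_bits`) are hypothetical and chosen for illustration; an 8-bit width is assumed by default.

```python
def to_unsigned_bits(value: int, n_bits: int = 8) -> str:
    """Unsigned: n_bits bits cover 0 .. 2**n_bits - 1."""
    if not 0 <= value <= 2**n_bits - 1:
        raise ValueError("out of range for unsigned encoding")
    return format(value, f"0{n_bits}b")


def to_sign_magnitude_bits(value: int, n_bits: int = 8) -> str:
    """First bit is the sign (1 = negative, 0 = positive), the rest is the magnitude."""
    if abs(value) > 2 ** (n_bits - 1) - 1:
        raise ValueError("out of range for sign-magnitude encoding")
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{n_bits - 1}b")


def to_twos_complement_bits(value: int, n_bits: int = 8) -> str:
    """Two's complement: n_bits bits cover -2**(n_bits-1) .. 2**(n_bits-1) - 1."""
    if not -(2 ** (n_bits - 1)) <= value <= 2 ** (n_bits - 1) - 1:
        raise ValueError("out of range for two's complement encoding")
    # Masking keeps the lowest n_bits bits; for a negative value this yields 2**n_bits + value
    return format(value & (2**n_bits - 1), f"0{n_bits}b")


print(to_unsigned_bits(5))          # 00000101
print(to_sign_magnitude_bits(-5))   # 10000101
print(to_twos_complement_bits(-5))  # 11111011
```

Sign-magnitude has two zeros (00000000 and 10000000), which is one reason hardware usually prefers two's complement.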
2) For decimal numbers:
We get rid of the decimal point and store the number like scientific notation: a sign bit, an exponent and a mantissa (significand).
For example, 3.14 in the IEEE 754 single-precision (32-bit) format is
01000000010010001111010111000011
= sign 0 | exponent 10000000 | mantissa 10010001111010111000011.
This is the closest representable value, not exactly 3.14.
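The IEEE 754 example can be checked with Python's standard `struct` module. This is a sketch: the helper names `float32_bits` and `split_fields` are hypothetical, and the reconstruction formula below only covers normal numbers (not zero, subnormals, infinity or NaN).

```python
import struct


def float32_bits(x: float) -> str:
    """Return the 32 bits of x in IEEE 754 single precision (big-endian)."""
    (as_int,) = struct.unpack(">I", struct.pack(">f", x))
    return format(as_int, "032b")


def split_fields(bits: str) -> tuple:
    """Split a 32-bit string into sign (1 bit), exponent (8 bits) and mantissa (23 bits)."""
    return bits[0], bits[1:9], bits[9:]


bits = float32_bits(3.14)
sign, exponent, mantissa = split_fields(bits)
print(bits)                      # 01000000010010001111010111000011
print(sign, exponent, mantissa)  # 0 10000000 10010001111010111000011

# Normal numbers: value = (-1)^sign * (1 + mantissa / 2^23) * 2^(exponent - 127)
value = (-1) ** int(sign) * (1 + int(mantissa, 2) / 2**23) * 2 ** (int(exponent, 2) - 127)
print(value)  # slightly above 3.14, the closest float32 value
```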
3) For text:
Text is a sequence of characters (a string).
In ASCII, each character is encoded as a single byte using an encoding table; encodings such as UTF-8 use one to four bytes per character.
Example: “Len” = [76, 101, 110], i.e. 3 bytes (see the ASCII table for the codes).
For example, the ASCII code of the letter 'A' is 65, which in binary is 01000001.
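A short sketch of the text example using Python's built-in `ord`, `str.encode` and `bytes.decode`; the non-ASCII character at the end is an illustrative assumption added to show a multi-byte UTF-8 encoding.

```python
text = "Len"

# Character codes from the ASCII table
codes = [ord(ch) for ch in text]
print(codes)                     # [76, 101, 110]

# Encoding: ASCII uses exactly one byte per character
data = text.encode("ascii")
print(len(data))                 # 3

# Binary form of a single character code
a_bits = format(ord("A"), "08b")
print(a_bits)                    # 01000001

# Decoding the bytes with the same table gives back the string
print(data.decode("ascii"))      # Len

# Characters outside ASCII need more than one byte in UTF-8: 'é' (U+00E9) takes 2
utf8_len = len("\u00e9".encode("utf-8"))
print(utf8_len)                  # 2
```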
4) For images:
- An image is a matrix of pixels.
- Each pixel is represented by 3 numbers between 0 and 255 (one byte each) for the red, green and blue intensity.
- Thus, an uncompressed 4K image takes 3840 x 2160 x 3 bytes = 24,883,200 bytes ≈ 24.9 MB.
For example, a pixel with red = 255, green = 0 and blue = 127 is represented in binary as 11111111 00000000 01111111.
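The image arithmetic can be reproduced with a small sketch, assuming an uncompressed RGB image with one byte per channel. The helpers `raw_rgb_size` and `pixel_bits` and the 2 x 2 example image are hypothetical.

```python
def raw_rgb_size(width: int, height: int, channels: int = 3) -> int:
    """Bytes for an uncompressed image with one byte per colour channel."""
    return width * height * channels


def pixel_bits(r: int, g: int, b: int) -> str:
    """Binary encoding of one RGB pixel, 8 bits per channel."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("channel values must be between 0 and 255")
    return " ".join(format(channel, "08b") for channel in (r, g, b))


# A tiny 2 x 2 image as a matrix (rows) of (red, green, blue) pixels
image = [
    [(255, 0, 127), (0, 0, 0)],
    [(255, 255, 255), (0, 255, 0)],
]
# Flatten row by row into raw bytes: 2 * 2 * 3 = 12 bytes
raw = bytes(channel for row in image for pixel in row for channel in pixel)

print(len(raw) == raw_rgb_size(2, 2))  # True
print(raw_rgb_size(3840, 2160))        # 24883200 (about 24.9 MB)
print(pixel_bits(255, 0, 127))         # 11111111 00000000 01111111
```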
What is encoding and decoding? Explain and give an example.
1. Encoding (in-memory to on-file), also called serialization: converting data from its in-memory representation into a format suitable for storage in files or other persistent storage.
2. Decoding (on-file to in-memory), also called deserialization: the reverse of encoding. Data is read from files or other persistent storage and the original in-memory data structures or objects are reconstructed.
During decoding, the stored data is parsed according to the encoding scheme that was used, so the relevant information can be extracted and the appropriate data structures or objects rebuilt.
Example: a Python dictionary in memory is encoded as a JSON string and written to a file; reading the file and parsing the JSON decodes it back into an equal dictionary.
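The JSON example can be sketched with Python's standard `json` module. The `person` record is hypothetical, and a temporary directory stands in for persistent storage.

```python
import json
import os
import tempfile

# In-memory representation: a Python dict (hypothetical example record)
person = {"name": "Len", "age": 30, "courses": ["Data Engineering"]}

# Encoding (serialization): in-memory object -> JSON text
encoded = json.dumps(person)
print(encoded)  # {"name": "Len", "age": 30, "courses": ["Data Engineering"]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "person.json")
    # Write the encoded text to a file (persistent storage)
    with open(path, "w", encoding="utf-8") as f:
        f.write(encoded)
    # Decoding (deserialization): read the file and parse it back into a dict
    with open(path, encoding="utf-8") as f:
        decoded = json.load(f)

print(decoded == person)  # True
```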
We saw three different data models for representing data. Name and provide a short
summary of each data model.
The relational model:
- Consists of tables with columns and rows (or tuples/records)
- Each column contains a primitive value such as a string, integer, float or date
- Two types of tables:
o Entities, e.g. persons, groups, objects
o Relations between entities, e.g. part-of, has-a, has-many, linked-to
- Each table can be saved as a Comma-Separated Values (CSV) file
Strengths:
- Structured
- Schema checking
- Natural model
- Flexible queries
Weaknesses:
- Static and less flexible schema
- Joins are a necessary evil (they are complex)
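A minimal sketch of the relational model using the standard `csv` module: two hypothetical entity tables and one relation table stored as CSV text, combined with a join implemented by hand as an in-memory hash join. All table and column names are illustrative assumptions.

```python
import csv
import io

# Hypothetical entity tables and a relation table (person MEMBER-OF group), stored as CSV
persons_csv = "person_id,name\n1,Len\n2,Ada\n"
groups_csv = "group_id,title\n10,Data Engineering\n"
member_of_csv = "person_id,group_id\n1,10\n2,10\n"


def read_table(text: str) -> list:
    """Parse CSV text into a list of rows (dicts keyed by column name).
    Note: CSV itself has no schema, so every value comes back as a string."""
    return list(csv.DictReader(io.StringIO(text)))


persons = read_table(persons_csv)
groups = read_table(groups_csv)
member_of = read_table(member_of_csv)

# Join: index the entity tables by key, then look up each row of the relation table
persons_by_id = {p["person_id"]: p for p in persons}
groups_by_id = {g["group_id"]: g for g in groups}
joined = [
    (persons_by_id[m["person_id"]]["name"], groups_by_id[m["group_id"]]["title"])
    for m in member_of
]
print(joined)  # [('Len', 'Data Engineering'), ('Ada', 'Data Engineering')]
```

In a real relational database the join would be written declaratively (e.g. in SQL) and the engine would enforce the schema; the manual version only shows what a join has to do.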
The document-oriented model:
- Consists of keys and documents, that is, each key is associated with one document
- Document is a tree containing:
o Primitive values
o Nested entities
o One-to-many relations
- Each document can be stored (and transferred) in JSON or XML
Strengths:
- Flexible
- Natural model when data is tree-structured with few intra-document relations (e.g. text documents with chapters, sections, paragraphs...)
- Good performance, since no joins are needed
Weaknesses:
- No static schema checking
- Less flexible queries: the document structure reflects the common operations
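A sketch of the document-oriented model as a plain Python dictionary acting as a hypothetical key-document store. It shows a primitive value, a nested entity and a one-to-many relation inside one document, and why a query that goes against the tree structure needs a full scan. The keys and field names are assumptions for illustration.

```python
import json

# Hypothetical document store: each key is associated with exactly one document (a tree)
store = {
    "person:1": {
        "name": "Len",                    # primitive value
        "address": {"city": "Antwerp"},   # nested entity
        "courses": [                      # one-to-many relation
            {"title": "Data Engineering"},
            {"title": "Algorithms"},
        ],
    },
    "person:2": {
        "name": "Ada",
        "address": {"city": "Ghent"},
        "courses": [{"title": "Algorithms"}],
    },
}

# Following the tree from a key needs no join
titles = [c["title"] for c in store["person:1"]["courses"]]
print(titles)  # ['Data Engineering', 'Algorithms']

# A query against the tree structure (course -> persons) has to scan every document
followers = [
    doc["name"]
    for doc in store.values()
    if any(c["title"] == "Algorithms" for c in doc["courses"])
]
print(followers)  # ['Len', 'Ada']

# Each document can be stored or transferred as JSON
print(json.dumps(store["person:2"]))
```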
The graph-oriented model: consists of nodes and edges
- A node is an instance of an entity and has a unique ID
- An edge is a relation between two nodes and has a unique ID
- Nodes and edges have named properties with primitive values
Strengths:
- Flexible: the schema is easily changed
- Natural model for social or geographic networks
- Variable number of joins
Weaknesses:
- No static schema checking
- Used less in industry (academic model)
- Mostly used in domains where everything is connected to everything
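A sketch of a small property graph in plain Python, assuming directed edges and hypothetical node and edge IDs. The breadth-first search follows edges of one type for any number of hops, which in the relational model would require a variable number of self-joins.

```python
from collections import deque

# Hypothetical property graph: every node and edge has a unique ID and named properties
nodes = {
    "n1": {"label": "Person", "name": "Len"},
    "n2": {"label": "Person", "name": "Ada"},
    "n3": {"label": "Person", "name": "Bob"},
    "n4": {"label": "City", "name": "Antwerp"},
}
edges = {
    "e1": {"from": "n1", "to": "n2", "type": "FRIEND_OF", "since": 2019},
    "e2": {"from": "n2", "to": "n3", "type": "FRIEND_OF", "since": 2021},
    "e3": {"from": "n1", "to": "n4", "type": "LIVES_IN"},
}


def reachable(start: str, edge_type: str) -> set:
    """Nodes reachable from start via directed edges of edge_type, any number of hops (BFS)."""
    seen, queue = {start}, deque([start])
    while queue:
        current = queue.popleft()
        for edge in edges.values():
            if edge["type"] == edge_type and edge["from"] == current and edge["to"] not in seen:
                seen.add(edge["to"])
                queue.append(edge["to"])
    seen.discard(start)
    return seen


print(sorted(nodes[n]["name"] for n in reachable("n1", "FRIEND_OF")))  # ['Ada', 'Bob']
```

Scanning all edges per step keeps the sketch short; a graph database would instead store adjacency per node so each hop only touches the neighbouring edges.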