JADS master Data Engineering Notes


Extensive summary of the relevant notes in the Data Engineering course.


  • 21 March 2024
  • 65 pages
  • 2023/2024
  • Lecture notes
  • Dr. Indika Weerashinga Dewage
  • All lectures
Seller: juultjevandervelden
Lecture 1: Introduction to Data Engineering
Week 2



A primer to data engineering
The 3 Vs: Volume, Velocity and Variety
Volume: Enterprises are awash with ever-growing data of all types, easily amassing terabytes, even petabytes, of information. (Amount)

Velocity: Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value. (Speed)

Variety: Big data is any type of data → structured and unstructured data such as text, sensor data, audio, video, click streams, log files and more. (Type)


(Big) Data Structure
Structured data: relational databases (RDBMSs)

Semi-structured data: XML, JSON, CSV, etc.

Unstructured data: natural language, video, images, etc.
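A small Python sketch of how the three kinds of data differ in practice, using made-up records: structured rows share one fixed schema, semi-structured JSON records describe themselves and may each carry different (or nested) fields, and free text has no schema at all. All names and values here are hypothetical.

```python
import json

# Structured: every record has the same fixed columns, like rows in an RDBMS table.
structured_rows = [("alice", 34), ("bob", 29)]

# Semi-structured: self-describing JSON records; fields may be nested or missing.
semi_structured = json.loads(
    '[{"name": "alice", "age": 34, "tags": ["admin"]},'
    ' {"name": "bob", "address": {"city": "Tilburg"}}]'
)

# Unstructured: free text with no schema; meaning has to be extracted (e.g., with NLP).
unstructured = "Alice (34) logged in this morning and later called the helpdesk."


def schemas(records):
    """Return the sorted top-level field names of each JSON record."""
    return [sorted(record) for record in records]


# Every structured row has the same number of columns in the same order...
print({len(row) for row in structured_rows})
# ...whereas each semi-structured record carries its own set of fields.
print(schemas(semi_structured))
```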




Processing Big Data: Data Pipelines
A data pipeline aggregates, organizes, and moves data to a destination for storage, insights, and analysis. Modern data pipeline systems automate the ETL (extract, transform, load) process and include data ingestion, processing, filtering, transformation, and movement across any cloud architecture and add additional layers of resiliency against failure.
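A minimal ETL sketch in Python, assuming a tiny hypothetical CSV of orders as the source and an in-memory SQLite database as the destination (the standard-library `csv` and `sqlite3` modules stand in for real ingestion and storage systems). The table and column names are made up; the point is only to show the three stages and a filtering step.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; a real pipeline would ingest files, APIs or streams.
RAW_CSV = """order_id,amount_eur,country
1,19.99,NL
2,,BE
3,5.50,nl
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dicts (one per row)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: filter out incomplete rows, cast types, normalise country codes."""
    cleaned = []
    for row in rows:
        if not row["amount_eur"]:
            continue  # filtering: skip rows with a missing amount
        cleaned.append(
            (int(row["order_id"]), float(row["amount_eur"]), row["country"].upper())
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount_eur REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT country, SUM(amount_eur) FROM orders GROUP BY country").fetchall())
```

Keeping each stage in its own function makes it easy to swap the source or destination later without touching the transformation logic.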




Stages in a Big Data Pipeline





Lecture 2: Virtualization and Cloud Computing
Week 3



Virtualization
Virtualization is the ability to run multiple operating systems on a single physical system and share the underlying hardware resources.

It uses software to create an abstraction layer over computer hardware that allows the hardware elements of a single computer (processors, memory, storage, and more) to be divided into multiple virtual computers, commonly called virtual machines (VMs).

Each VM runs its own operating system (OS) and behaves like an independent computer, even though it is running on just a portion of the actual underlying computer hardware.

It improves IT throughput and reduces costs by using physical resources as a pool from which virtual resources can be allocated.
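To illustrate "physical resources as a pool", here is a toy Python model of one physical host whose CPU cores and memory are carved up into VMs and returned to the pool on release. The `Host` class and its methods are hypothetical, and the model refuses any overcommit, whereas real hypervisors can deliberately overcommit CPU and memory.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A physical machine whose CPU cores and memory form a shared pool."""
    cpus: int
    mem_gb: int
    vms: dict = field(default_factory=dict)  # VM name -> (cpus, mem_gb)

    def free(self):
        """Return the (cpus, mem_gb) still unallocated in the pool."""
        used_cpu = sum(c for c, _ in self.vms.values())
        used_mem = sum(m for _, m in self.vms.values())
        return self.cpus - used_cpu, self.mem_gb - used_mem

    def allocate(self, name, cpus, mem_gb):
        """Carve a VM out of the pool; refuse if not enough resources are left."""
        free_cpu, free_mem = self.free()
        if cpus > free_cpu or mem_gb > free_mem:
            return False
        self.vms[name] = (cpus, mem_gb)
        return True

    def release(self, name):
        """Return a VM's resources to the pool."""
        self.vms.pop(name, None)


host = Host(cpus=16, mem_gb=64)
print(host.allocate("web", 4, 8))
print(host.allocate("db", 8, 32))
print(host.allocate("batch", 8, 16))  # only 4 cores left, so this is refused
print(host.free())
```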


Virtual Architecture
A virtual machine (VM) is an isolated runtime environment (guest OS and applications)

Multiple virtual systems (VMs) can run on a single physical system




Hypervisor
A hypervisor, a.k.a. a virtual machine manager/monitor (VMM), or virtualization manager, is a program that allows multiple operating systems to share a single hardware host.

Each guest operating system appears to have the host's processor, memory, and other resources all to itself. However, the hypervisor is actually controlling the host processor and resources, allocating what is needed to each operating system in turn and making sure that the guest operating systems (in virtual machines) cannot disrupt each other.
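The hypervisor "allocating what is needed to each operating system in turn" can be pictured as time-slicing a physical CPU. The sketch below is a toy round-robin scheduler in Python with made-up guest names and workloads; real hypervisor schedulers are considerably more sophisticated (priorities, weights, multiple cores).

```python
from collections import deque


def round_robin(work, quantum):
    """Share one physical CPU among guests by giving each a fixed time slice in turn.

    work: dict mapping guest name -> remaining CPU time units.
    Returns the granted slices as a list of (guest, units) pairs, in order.
    """
    queue = deque(work.items())
    schedule = []
    while queue:
        guest, remaining = queue.popleft()
        granted = min(quantum, remaining)
        schedule.append((guest, granted))
        if remaining > granted:
            queue.append((guest, remaining - granted))  # not finished: back of the queue
    return schedule


print(round_robin({"vm1": 5, "vm2": 2, "vm3": 3}, quantum=2))
```

No guest can monopolise the CPU: after at most one quantum it goes to the back of the queue, which is a simple form of the isolation the hypervisor enforces.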


Benefits of virtualization
Economies of Scale: Sharing of resources helps cost reduction

Isolation: Virtual machines are isolated from each other as if they are physically separated

Encapsulation: Virtual machines encapsulate a complete computing environment

Hardware Independence: Virtual machines run independently of underlying hardware

Portability: Virtual machines can be migrated between different hosts.




The Cloud
A style of computing where massively scalable (and elastic) IT-related capabilities are provided “as a service” to external customers using Internet technologies


What’s new

Acquisition Model: Based on purchasing of services

Business Model: Based on pay for use

Access Model: Over the internet to any device

Technical Model: Scalable, elastic, dynamic, multi-tenant & sharable




Cloud computing
“A consumption and on-demand delivery computing paradigm that enables convenient network access to a shared pool of configurable and often virtualized computing resources (e.g., networks, servers, storage, middleware and applications as services) that can be rapidly provisioned and released with minimal management effort or service provider interaction”





Cloud computing is one answer to the crisis of complexity in the data center.

Clouds are primarily a new way of consuming and delivering IT services.




Three aspects of cloud modelling

Self-service: A new relationship with IT, which gives the user a degree of freedom in configuring and accessing services and can dramatically reduce labor on the delivery side

Flexible sourcing options: More choices and hybrid modes of delivery, which allow CIOs to optimize costs and quality of service per workload

Greater focus on scale: enables both new economics and new capabilities




Why cloud
Cost reduction

Lower infrastructure costs

Lower maintenance and energy costs

Elasticity / Scalability

Capacity only when you need it

Ability to handle expected or unexpected changes in load

Achieve high business agility

Speed to serve

Reduction of time to pilot and test projects

Faster availability to customers

High performance computing

Increase capacity from your current physical infrastructure

Avoid provisioning (and paying) for the peak

“Infinite” computing capacity on demand
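"Avoid provisioning (and paying) for the peak" is easy to quantify. The Python sketch below uses a made-up hourly demand profile and counts server-hours under two assumptions: owning enough servers for the peak all day versus paying only for what is used each hour. It ignores that pay-per-use prices per hour are usually higher than the amortised cost of owned hardware, so it shows the utilisation argument, not a full cost comparison.

```python
# Hypothetical hourly demand (servers needed) over one day; the numbers are made up.
demand = [2] * 8 + [6] * 4 + [10] * 2 + [6] * 6 + [3] * 4  # 24 hourly values


def peak_provisioned_server_hours(demand):
    """On-premises style: buy enough servers for the peak and run them all day."""
    return max(demand) * len(demand)


def pay_per_use_server_hours(demand):
    """Elastic cloud style: pay only for the servers actually needed each hour."""
    return sum(demand)


fixed = peak_provisioned_server_hours(demand)
elastic = pay_per_use_server_hours(demand)
print(fixed, elastic, f"utilisation of peak capacity: {elastic / fixed:.0%}")
```

With this profile, capacity sized for the two peak hours sits largely idle for the rest of the day, which is exactly the waste elasticity removes.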




Cloud Service Delivery Models / Usage Models




Cloud Service Type
There are three cloud service types:





IaaS

If a company needs a virtual machine, opt for infrastructure as a service.

PaaS

If a company requires a platform for building software products, pick platform as a service.

SaaS

If a company doesn’t want to maintain any IT equipment, choose software as a service.

A customer of SaaS is called a tenant.

A tenant can be an individual user or a group of users (e.g., a customer organization).
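One way to see the three service types side by side is the usual "who manages which layer" split. The Python sketch below encodes that split as a commonly used simplification (it is not from the lecture, and providers draw the lines slightly differently), plus a hypothetical `choose_model` helper that mirrors the rules of thumb above.

```python
# Layers of the stack, from hardware up to the application (a common simplification).
LAYERS = ["networking", "storage", "servers", "virtualization",
          "os", "middleware", "runtime", "data", "applications"]

# Index of the first layer the *customer* manages; everything below it is the provider's.
CUSTOMER_STARTS_AT = {
    "IaaS": LAYERS.index("os"),    # provider supplies (virtual) machines
    "PaaS": LAYERS.index("data"),  # provider also runs OS, middleware and runtime
    "SaaS": len(LAYERS),           # provider runs the whole stack
}


def customer_manages(model):
    """Layers the customer is responsible for under the given service model."""
    return LAYERS[CUSTOMER_STARTS_AT[model]:]


def provider_manages(model):
    """Layers the cloud provider is responsible for under the given service model."""
    return LAYERS[:CUSTOMER_STARTS_AT[model]]


def choose_model(needs_vm=False, builds_software=False):
    """Rules of thumb from the notes: VM -> IaaS, build software -> PaaS, else SaaS."""
    if needs_vm:
        return "IaaS"
    if builds_software:
        return "PaaS"
    return "SaaS"


for model in ("IaaS", "PaaS", "SaaS"):
    print(model, "customer manages:", customer_manages(model))
```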




Cloud Deployment Models
Public Clouds: The cloud infrastructure is available to the general public (anyone wanting to use or purchase cloud services).

Private Clouds: The cloud infrastructure is operated solely by a single organization.

Community Clouds: The cloud infrastructure is available to members of a community. A community can be a set of organizations with similar requirements and goals (e.g., universities).

Hybrid Clouds: A combination of public and private clouds.

Multi Clouds: A combination of more than one public cloud (a private cloud can also be included).



Public Clouds vs. Private Clouds

Public Clouds
Often depicted as being available to users from a third-party provider
“Public” clouds are typically made available via the internet and may be free or inexpensive to use
e.g., Amazon Web Services
Greater risks in terms of security, resiliency, transparency and performance predictability
Better cost effectiveness and agility
Key benefit: tremendous elasticity
Move to SLA-based service delivery

Private Clouds
Offer many of the same benefits as “public” clouds but are managed within the organization
These types of clouds are not burdened by network bandwidth and availability issues or potential security exposures that may be associated with public clouds
Can offer the provider and user greater control, security and resilience
Lower elasticity in comparison to external clouds
Single-tenant environment: all resources are accessible to one customer only (isolated access)
Typically hosted on-premises in the customer’s data center (can also be hosted on an independent cloud provider’s infrastructure)




Tenancy Models for SaaS Application
A customer of a SaaS application is called a tenant. A tenant of a SaaS can be an individual user or a group of users, such as a customer organization.

There are three main tenancy models for SaaS applications:

Single tenant

Mixed tenant

Multi-tenant



Single Tenant Model
3-tier Simple Example: A single dedicated instance of an application is deployed for each customer
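A minimal Python sketch of the single-tenant idea: every tenant that is onboarded gets its own dedicated instance. Here an "instance" is modelled, as an assumption, by a separate in-memory SQLite database standing in for a full dedicated 3-tier stack; the class, method and tenant names are hypothetical.

```python
import sqlite3


class SingleTenantDeployment:
    """Single-tenant model: each tenant gets its own dedicated instance.

    Each instance is a separate in-memory SQLite database, standing in for a
    dedicated presentation/logic/data stack per customer.
    """

    def __init__(self):
        self.instances = {}  # tenant name -> dedicated database connection

    def onboard(self, tenant):
        """Deploy a fresh, isolated instance for a new tenant."""
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE notes (body TEXT)")
        self.instances[tenant] = conn

    def add_note(self, tenant, body):
        self.instances[tenant].execute("INSERT INTO notes VALUES (?)", (body,))

    def notes(self, tenant):
        return [row[0] for row in self.instances[tenant].execute("SELECT body FROM notes")]


app = SingleTenantDeployment()
app.onboard("tenant_a")
app.onboard("tenant_b")
app.add_note("tenant_a", "hello from A")
print(app.notes("tenant_a"))
print(app.notes("tenant_b"))  # tenant B's instance never sees tenant A's data
```

The isolation comes for free because nothing is shared, but the cost is one full instance to run and maintain per customer.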



