100% tevredenheidsgarantie Direct beschikbaar na betaling Zowel online als in PDF Je zit nergens aan vast
logo-home
EECS 485 Final Exam with complete solution €14,73   In winkelwagen

Tentamen (uitwerkingen)

EECS 485 Final Exam with complete solution

 8 keer bekeken  0 keer verkocht
  • Vak
  • Instelling

EECS 485 Final Exam with complete solution Map Reduce System provides - 1 Automatic parallelization & distribution Map Reduce System provides - 2 Fault-tolerance Map Reduce System provides - 3 Status and monitoring tools Map Reduce System provides - 4 Clean abstraction for programmers Ma...

[Meer zien]

Voorbeeld 4 van de 32  pagina's

  • 12 augustus 2023
  • 32
  • 2023/2024
  • Tentamen (uitwerkingen)
  • Vragen en antwoorden
avatar-seller
EECS 485 Final Exam with complete solution
Map Reduce System provides - 1
Automatic parallelization & distribution
Map Reduce System provides - 2
Fault-tolerance
Map Reduce System provides - 3
Status and monitoring tools
Map Reduce System provides - 4
Clean abstraction for programmers
Map Reduce is Popular because - 1
Easy to write distributed programs
Map Reduce is Popular because - 2
Built-in reliability on large clusters
Map Reduce is Popular because - 3
Byte streams, not relations
Map Reduce is Popular because - 4
Schema-later, or schema-never
Map Reduce is Popular because - 5
Your choice of programming languages
Map Reduce is Popular because - 6
Hadoop relatively easy to administer
Mappers can work...
Mappers can work independently
Reducers can work...
Reducers can work independently
Grouper does not have to...
Count
Map Reduce Optimization 1
the mappers don't need write the words down, just "1" for each word of a certain length
Map Reduce Optimization 2
mappers do part of the reducer work: number of 1-letter words, 2-letter words, etc
Master work
Dividing the work
Grouper work
Grouping the values by keys
Work of the mappers and reducers
differs with problem
MR Computation
Input key/value pairs

Output different key/value pairs
Map Function
Takes an input pair and produces a set of intermediate key/value pairs
MapReduce library

,Groups together all intermediate values associated with the same intermediate key and
passes them to the Reduce function
Reduce function
Accepts an intermediate key and a list of values for that key.

Merges together these values to form a possibly smaller set of values.
Map and reduce have related types
map (k1, v1) → list(k2, v2)

reduce (k2, list(v2)) → list(v2)
MR output is smaller
computing summary statistics
MR output is larger
computing some kind of data structure for downstream use
Output value produced per reduce invocation
Zero or one
Execution Pipeline
Stage 1, partition input data and run map() on many machines

Then group intermediate data by intermediate key

Stage 2, partition intermediate data by key and run reduce() on many machines

Output is whatever reduce() emits
Shuffle/Sort
Default: hash(key)%R

- Break input into M chunks

- Process each chunk w/ map process

- Group-by map output keys

- Place key-groups into R chunks

- Process each chunk w/ reduce process

- reduce fn's outputs go to disk
What can be a MapReduce program?
- URL counting in logs

- Inverted index construction for search engines, Sorting

- Massive image conversion, others
How do we know if a machine goes down?
Heartbeat messages tell master which machines are online

,Map Worker Dies
- Just restart that task on a different box

- You lose the map() work, but no big deal
Reduce Worker Dies
- Restart the reducer, using output from source mappers
Relational database management system (RDBMS) machine dies
Query is restarted
What about slow, not dead, machines?
- Speculative execution for stragglers

- Kill the 2nd-place finisher
What about data placement?
Use GFS across cluster disks; start tasks where the target data already lies
Isn't the intermediate data size large?
- Use a local reducer called a Combiner at each map

- Compress data between map and reduce
Mapper and Reducer are...
Stateless - no side effects
Scalability and fault-tolerance achieved by optimizing the execution engine once
Use it many times by writing different map and reduce functions for different
applications
Map and reduce functions inspired by functions of the same name in...
Lisp programming language
Functional programming
Computation as the evaluation of mathematical functions
MR is easy to...
Parallelize
Simple Distributed System Failures
Power loss

Process crash
Complex Distributed System Failures
- Slow or unresponsive application

- Network partition
Nightmare Distributed System Failures
Bugs, hackers
Fail-Stop, Synchronous environment
- Component halts immediately

- Other components can tell that it has failed

- Example: process crash
Fail-Stop, Asynchronous environment

, - Components may fail (stop working) and recover

- Components may take arbitrarily long to respond

- Others may not be able to tell when a component fails

- Examples: network partition, hung process
Byzantine
- Components may fail in arbitrary ways

- Examples: hacker compromises a component
Primary-Backup fault tolerance
- Both primary and backup should commit modifications before acknowledging

- Client will never lose data for operations it observes to complete

- Client is uncertain about operations in progress. Must recheck with new primary to see
if it completed.
Asynchronous environment 1
Network can delay messages arbitrarily
Asynchronous environment 2
Network can partition
Asynchronous environment 3
Servers can crash and take a long time to recover
Asynchronous environment 4
Servers can hang and stop responding
Asynchronous environment 5
Servers can run very slow and appear to be crashed
In async environment...
Hard to differentiate slow and failed components
Async means...
primary-backup not great in practice
Paxos
Goal: nodes all agree on one result

Tolerates fail-stop failures in synchronous environ.

Requires 2f+1 nodes to tolerate f faults (Primary-backup only needed f+1)

Example: 5 Paxos nodes can tolerate 2 such faults
Basic (synod) protocol Phase 1
- Leaders propose ballot #.

- Others (acceptors) adopt ballot if highest # so far.

- Leader waits for f+1 adoptions

Voordelen van het kopen van samenvattingen bij Stuvia op een rij:

Verzekerd van kwaliteit door reviews

Verzekerd van kwaliteit door reviews

Stuvia-klanten hebben meer dan 700.000 samenvattingen beoordeeld. Zo weet je zeker dat je de beste documenten koopt!

Snel en makkelijk kopen

Snel en makkelijk kopen

Je betaalt supersnel en eenmalig met iDeal, creditcard of Stuvia-tegoed voor de samenvatting. Zonder lidmaatschap.

Focus op de essentie

Focus op de essentie

Samenvattingen worden geschreven voor en door anderen. Daarom zijn de samenvattingen altijd betrouwbaar en actueel. Zo kom je snel tot de kern!

Veelgestelde vragen

Wat krijg ik als ik dit document koop?

Je krijgt een PDF, die direct beschikbaar is na je aankoop. Het gekochte document is altijd, overal en oneindig toegankelijk via je profiel.

Tevredenheidsgarantie: hoe werkt dat?

Onze tevredenheidsgarantie zorgt ervoor dat je altijd een studiedocument vindt dat goed bij je past. Je vult een formulier in en onze klantenservice regelt de rest.

Van wie koop ik deze samenvatting?

Stuvia is een marktplaats, je koop dit document dus niet van ons, maar van verkoper magdamwikash23. Stuvia faciliteert de betaling aan de verkoper.

Zit ik meteen vast aan een abonnement?

Nee, je koopt alleen deze samenvatting voor €14,73. Je zit daarna nergens aan vast.

Is Stuvia te vertrouwen?

4,6 sterren op Google & Trustpilot (+1000 reviews)

Afgelopen 30 dagen zijn er 85651 samenvattingen verkocht

Opgericht in 2010, al 14 jaar dé plek om samenvattingen te kopen

Start met verkopen
€14,73
  • (0)
  Kopen