What are the 5 Phases of Real-Time? - answer-1) Data Distillation 2) Model Development 3) Validation and Deployment 4) Real-Time Scoring 5) Model Refresh

SQOOP - answer--SQL + Hadoop = Sqoop
-To import data from relational databases into Hadoop
-to export data to relational databases fr...

6 September 2024 · 21 pages · 2024/2025 · Exam (elaborations) · Questions and answers
BIGDATA EXAM1 AND ANSWERS 2024

GFS - answer--Google File System
-designed to solve the issues with distributed systems

What does GFS store? - answer-It stores large volumes of data; distributed MapReduce
processes that data

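When was Hadoop published? - answer-2003-2004 is when Google published the papers
Hadoop is based on (GFS in 2003, MapReduce in 2004), describing the solution Google
used internally. Hadoop itself grew out of Nutch and became its own Apache project in 2006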

What led to Hadoop? - answer-Doug Cutting's open source project: Nutch

What is the Hadoop solution? - answer--bring computation to the data rather than
bringing data to the computation
-Distribute computing to where data is stored
-run computations where data resides
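
The idea of bringing computation to the data can be sketched as a small scheduling rule: prefer a free node that already holds the block, and fall back to a remote node only when none is available. This is an illustrative sketch, not Hadoop's actual scheduler; the function name `schedule_task` and the node and block names are hypothetical.

```python
def schedule_task(block_id, block_locations, free_nodes):
    """Pick a node for the task that reads `block_id`.

    block_locations: dict mapping block id -> list of nodes holding a copy
    free_nodes: set of nodes that can currently accept work
    Returns (node, is_local). Hypothetical helper for illustration only.
    """
    # Prefer a free node that already stores the block (no data transfer).
    local = [n for n in block_locations.get(block_id, []) if n in free_nodes]
    if local:
        return local[0], True
    if not free_nodes:
        raise RuntimeError("no free node available")
    # Otherwise the data must travel to whichever free node we pick.
    return sorted(free_nodes)[0], False


locations = {"block-0": ["node2", "node3"], "block-1": ["node1"]}
print(schedule_task("block-0", locations, {"node1", "node3"}))
print(schedule_task("block-1", locations, {"node3"}))
```

The first call runs the task on a node that already has the data; the second has to ship the block to a remote node, which is the case Hadoop tries to avoid.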

Apache's Hadoop Definition - answer-The Apache Hadoop software library is a framework
that allows for the distributed processing of large scale data sets across clusters of
computers using simple programming models. It is designed to scale up from single
servers to thousands of machines, each offering local computation and storage. Rather
than rely on hardware to deliver high availability, the library itself is designed to detect
and handle failures at the application layer, so delivering a highly available service on
top of a cluster of computers, each of which may be prone to failures.

What does Hadoop Include? - answer--Hadoop Distributed File System (HDFS)
-Hadoop YARN
-Hadoop Common

Hadoop Distributed File System (HDFS)? - answer--A distributed file system that provides
high-throughput access to application data
-is a distributed, scalable, and portable file system written in Java for the Hadoop
framework
-can store any type of file
-data is automatically split into chunks and replicated for high availability
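
A minimal sketch of the last point, splitting data into chunks and replicating each chunk, might look like the following. The tiny block size and the round-robin placement are assumptions made for readability; real HDFS uses much larger blocks and a rack-aware placement policy. The function names are hypothetical.

```python
BLOCK_SIZE = 4    # bytes; deliberately tiny for illustration
REPLICATION = 3   # number of copies kept of each block


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Cut the data into fixed-size chunks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list, replication: int = REPLICATION) -> dict:
    """Assign each block to `replication` distinct nodes, round-robin.

    Round-robin is an assumption of this sketch, not the HDFS policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


blocks = split_into_blocks(b"abcdefghij")
placement = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"])
print(blocks)
print(placement)
```

Because every block lives on three different nodes, losing any single node still leaves two copies of each block it held.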

Hadoop YARN? - answer--A framework for job scheduling and cluster resource
management
-manages the cluster resources for job processing
-Hadoop MapReduce, a YARN-based system for parallel processing of large data sets, runs on top of it

Hadoop Common? - answer-The common utilities that support the other Hadoop modules

Hadoop: processing, storing, and analyzing large volumes of data - answer--Software:
handles the distribution of data and the handling of failures
-Hardware: provides data storage and processing power

Hadoop is distributed - answer-a Hadoop cluster can have several machines

Hadoop is scalable - answer-can add more machines to the cluster (proportionally adds
capacity)

Hadoop is Fault-Tolerant - answer-can recover from hardware failures
-Master re-assigns work
-Data is replicated on 3 machines by default
-Nodes that recover rejoin the cluster automatically
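
The "master re-assigns work" step can be sketched as follows: when a node fails, its tasks are handed to the surviving nodes, least-loaded first. This is a simplified illustration under that assumption, not the actual Hadoop implementation; `reassign_tasks` and the task and node names are hypothetical.

```python
def reassign_tasks(assignments: dict, failed: str) -> dict:
    """Return new assignments with the failed node's tasks moved to survivors.

    assignments: dict mapping node name -> list of task ids
    Each orphaned task goes to the survivor with the fewest tasks so far.
    """
    survivors = sorted(n for n in assignments if n != failed)
    if not survivors:
        raise RuntimeError("no surviving nodes to take over the work")
    new = {n: list(assignments[n]) for n in survivors}
    for task in assignments.get(failed, []):
        target = min(survivors, key=lambda n: len(new[n]))
        new[target].append(task)
    return new


before = {"node1": ["t1", "t2"], "node2": ["t3"], "node3": ["t4"]}
print(reassign_tasks(before, "node1"))
```

The tasks can be rerun elsewhere because the input data is replicated, so another node still holds a copy of each block.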

Hadoop is Open Source - answer--overseen by Apache
-close to 100 committers from companies like Cloudera, Hortonworks, etc.

Hadoop tools? - answer--ETL(extract, transform, load)
-BI
-Data Storage
-Predictive and Statistical Modeling
-Machine Learning
-others

Hadoop MapReduce - answer--processing framework to process the data
-other processing frameworks are now also available
- A MapReduce job usually splits the input data-set into independent chunks which are
processed by the map tasks in a completely parallel manner. The framework sorts the
outputs of the maps, which are then input to the reduce tasks. Typically both the input
and the output of the job are stored in a file-system.

MapReduce Process - answer-usually splits the input data-set into independent chunks
which are processed by the map tasks in a completely parallel manner. The framework
sorts the outputs of the maps, which are then the input to the reduce tasks. Typically both
the input and the output of the job are stored in a file-system.
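
The map, sort, and reduce steps described above can be sketched in plain Python with a word count, the usual introductory example. This is a single-process illustration of the data flow only; it does not use the Hadoop API, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(chunk: str) -> list:
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_and_sort(mapped: list) -> dict:
    """Group all values by key and sort the keys, as the framework would."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: dict) -> dict:
    """Reduce: sum the counts collected for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks: list) -> dict:
    # Each chunk is mapped independently; on a cluster this runs in parallel.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle_and_sort(mapped))


print(word_count(["big data big", "data lake"]))
# prints {'big': 2, 'data': 2, 'lake': 1}
```

Because no chunk depends on another, the map calls could run on different machines, which is what makes the problem a good fit for Hadoop.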

Japan is seeking ________, while India craves _____________ and ___________. The leaders of
both countries, _______________ (India) and _______________ (Japan), are also working to
counter the growing regional influence of _________ -- an important economic partner to
both but also historically a rival. - answer-Japan is seeking growth markets, while India
craves advanced technology and foreign investment. The leaders of both countries,
_______________ (India) and Shinzo Abe (Japan), are also working to counter the growing
regional influence of China -- an important economic partner to both but also
historically a rival.

Reports suggest that the big data and analytics market in India will grow approximately __
times, to ______ by 2020 - answer-8 times, to $16 billion

Japanese companies are using India as a _____________ ______ to expand into Africa, and
service providers are expanding from Japan into India - answer-Manufacturing base

Hadoop "Ecosystem" - answer--Tools built around the core Hadoop
-All ecosystem tools are open source
-Tools are designed to extend Hadoop's functionality
-New tools are added all the time

Hadoop Ecosystem projects included in Cloudera's CDH: - answer--Spark, HBase, Hive,
Impala, Parquet, Sqoop, Flume/Kafka, Solr, Hue, Sentry

Spark - answer-in-memory and Streaming processing framework

HBase - answer-NoSQL database built on HDFS

Hive - answer-SQL processing engine designed for batch workloads

Impala - answer-SQL Query Engine designed for BI workloads

Parquet - answer-Columnar data storage format

Sqoop - answer-Data movement/ETL to and from RDBMS

Flume, Kafka - answer-streaming data ingestion

Solr - answer-text search functionality

Hue - answer-web based user interface for Hadoop

Sentry - answer-an authorization tool for managing security

Hadoop Is? - answer--Scalable, for parallel/distributable problems (no dependencies
across data)
-A write-once, read-many solution (vs. an RDBMS for workloads that write and update a lot)

Hadoop is not? - answer--Database (random Access)
-Interactive OLAP (for the moment)
-Updates to files
-Nonparallel work
-Many small files
-Low latency

What do most organizations prefer? - answer--An enterprise-ready distribution of Hadoop
that is thoroughly tested, supported, and well integrated with Hadoop projects and
other key software like ETL tools and databases.

Most widely used enterprise-ready Hadoop distributions? - answer-Cloudera,
Hortonworks, and MapR

A cluster? - answer-a group of computers working together

a node? - answer-is an individual computer in that cluster

Two kinds of nodes? - answer--Master node (Name Node)
