
BigDataEx1


What are the 5 Phases of Real-Time? - answer-1) Data distillation 2) Model development 3) Validation and deployment 4) Real-time scoring 5) Model refresh
Sqoop - answer--SQL + Hadoop = Sqoop -To import data from relational databases into Hadoop and -to export data to relational databases fr...


  • September 6, 2024
  • 2024/2025
BIGDATA EXAM1 AND ANSWERS 2024

GFS - answer--Google File System
-designed to solve the issues with distributed systems

What does GFS store? - answer-Stores large volumes of data; distributed MapReduce
processes that data

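When was Hadoop published? - answer-Google published the papers it builds on in 2003
(GFS) and 2004 (MapReduce); Hadoop itself became a separate open-source project in
2006. It is based on the solutions Google developed for its own systems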

What led to Hadoop? - answer-Doug Cutting's open-source project: Nutch

What is the Hadoop solution? - answer--bring computation to the data rather than
bringing data to the computation
-Distribute computing to where data is stored
-run computations where data resides
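The idea of running computations where the data resides can be sketched as a tiny scheduling rule. This is only an illustration: `pick_node` and the node names are hypothetical, and real Hadoop schedulers weigh many more factors.

```python
def pick_node(block_locations, free_nodes):
    """Choose where to run a task that reads one data block.

    block_locations: set of nodes that store a copy of the block.
    free_nodes: list of nodes that currently have spare capacity.
    """
    # Prefer a free node that already holds the data (no network copy needed).
    for node in free_nodes:
        if node in block_locations:
            return node, "local"
    # Otherwise fall back to any free node and ship the data to it.
    if free_nodes:
        return free_nodes[0], "remote"
    # No capacity anywhere: the task has to wait.
    return None, "wait"


print(pick_node({"node2", "node3"}, ["node1", "node3"]))
```

Here the task goes to `node3` because it is both free and already stores the block, which is the "bring computation to the data" preference in miniature.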

Apache's Hadoop Definition - answer-The Apache Hadoop software library is a framework
that allows for the distributed processing of large scale data sets across clusters of
computers using simple programming models. It is designed to scale up from single
servers to thousands of machines, each offering local computation and storage. Rather
than rely on hardware to deliver high availability, the library itself is designed to detect
and handle failures at the application layer, so delivering a highly available service on
top of a cluster of computers, each of which may be prone to failures.

What does Hadoop include? - answer--Hadoop Distributed File System (HDFS)
-Hadoop YARN
-Hadoop MapReduce
-Hadoop Common

Hadoop Distributed File System (HDFS)? - answer--A distributed file system that provides
high-throughput access to application data
-is a distributed, scalable, and portable file system written in Java for the Hadoop
framework
-can store any type of file
-data is automatically split into chunks and replicated for high availability
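A minimal sketch of the "split into chunks and replicated" idea, assuming a 128 MB block size and the replication factor of 3 mentioned elsewhere in these notes. The function names and the round-robin placement are illustrative assumptions; real HDFS uses a rack-aware placement policy.

```python
BLOCK_SIZE_MB = 128   # assumed block size for this sketch
REPLICATION = 3       # replica count, matching the default of 3 in these notes


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file would be split into."""
    full, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full
    if remainder:
        sizes.append(remainder)  # the last block may be smaller than the rest
    return sizes


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


blocks = split_into_blocks(300)
layout = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
print(blocks)
print(layout)
```

A 300 MB file becomes two full blocks plus one smaller block, and each block lands on three different machines, so losing any single machine leaves two copies of every block.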

Hadoop YARN? - answer--A framework for job scheduling and cluster resource
management
-manages the cluster resources for job processing
-Hadoop MapReduce runs on top of it: a YARN-based system for parallel processing of
large data sets
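To make "managing cluster resources for job processing" concrete, here is a toy first-fit allocator. It is a sketch under simplifying assumptions (memory is the only resource, and the job and node names are made up); it is not how YARN's actual schedulers work.

```python
def allocate_containers(requests, free_memory_mb):
    """Place each (job, memory_mb) request on the first node with enough free memory.

    Returns (assignments, pending): jobs that got a node, and jobs that must wait.
    """
    free = dict(free_memory_mb)  # copy so the caller's dict is not modified
    assignments = []
    pending = []
    for job, memory_mb in requests:
        for node, available in free.items():
            if available >= memory_mb:
                free[node] = available - memory_mb  # reserve the memory
                assignments.append((job, node))
                break
        else:
            pending.append(job)  # no node can host this request right now
    return assignments, pending


cluster = {"node1": 4096, "node2": 2048}
jobs = [("jobA", 3072), ("jobB", 2048), ("jobC", 2048)]
assignments, pending = allocate_containers(jobs, cluster)
print(assignments, pending)
```

With this input, `jobA` and `jobB` are placed and `jobC` waits until memory frees up, which is the basic bookkeeping a cluster resource manager has to do.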

Hadoop Common? - answer-The common utilities that support the other Hadoop modules

Hadoop: processing, storing, and analyzing large volumes of data - answer--Software:
handles the distribution of data and the handling of failures
-Hardware: provides data storage and processing power

Hadoop is distributed - answer-a Hadoop cluster can have several machines

Hadoop is scalable - answer-can add more machines to the cluster (proportionally adds
capacity)
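The proportional scaling can be checked with simple arithmetic. This sketch assumes every node has the same disk size and that each block is stored 3 times (the default mentioned in these notes); it ignores operating-system and temporary-space overhead, and the numbers are made up.

```python
def usable_capacity_tb(num_nodes, disk_per_node_tb, replication=3):
    """Raw cluster capacity divided by the number of copies kept of each block."""
    return num_nodes * disk_per_node_tb / replication


print(usable_capacity_tb(10, 12))  # 10 nodes with 12 TB each
print(usable_capacity_tb(20, 12))  # doubling the nodes doubles usable capacity
```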

Hadoop is Fault-Tolerant - answer--can recover from hardware failures
-Master re-assigns work
-Data is replicated on 3 machines by default
-Nodes that recover rejoin the cluster automatically
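The recovery behaviour can be sketched as follows: when a machine fails, blocks that lost a copy are copied to other live machines until they are back at 3 replicas. The function and node names are hypothetical and the logic is deliberately simplified.

```python
def handle_node_failure(placement, failed, live_nodes, replication=3):
    """Return a new block placement after `failed` goes down.

    placement: dict mapping block id -> list of nodes holding a replica.
    live_nodes: nodes still available to receive new replicas.
    """
    new_placement = {}
    for block, holders in placement.items():
        survivors = [n for n in holders if n != failed]
        # Nodes that are alive and do not already hold this block.
        candidates = [n for n in live_nodes if n not in survivors and n != failed]
        while len(survivors) < replication and candidates:
            survivors.append(candidates.pop(0))  # re-create a lost replica
        new_placement[block] = survivors
    return new_placement


before = {"b0": ["n1", "n2", "n3"], "b1": ["n2", "n3", "n4"]}
after = handle_node_failure(before, failed="n2", live_nodes=["n1", "n3", "n4"])
print(after)
```

After `n2` fails, both blocks are back to three replicas spread over the remaining machines, so a second failure would still leave data available.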

Hadoop is Open Source - answer--overseen by Apache
-close to 100 committers from companies like Cloudera, Hortonworks, etc.

Hadoop tools? - answer--ETL (extract, transform, load)
-BI
-Data Storage
-Predictive and Statistical Modeling
-Machine Learning
-others

Hadoop MapReduce - answer--a framework for processing the data
-other processing frameworks are now available as well
- A MapReduce job usually splits the input data-set into independent chunks which are
processed by the map tasks in a completely parallel manner. The framework sorts the
outputs of the maps, which are then input to the reduce tasks. Typically both the input
and the output of the job are stored in a file-system.

MapReduce Process - answer-A job usually splits the input data-set into independent chunks
which are processed by the map tasks in a completely parallel manner. The framework
sorts the outputs of the maps, which are then the input to the reduce tasks. Typically both
the input and the output of the job are stored in a file-system.
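The map, sort, and reduce steps described above can be imitated in plain Python with a word count. This is a single-process sketch for study purposes, not the Hadoop API; the function names and sample text are made up.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map task: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_and_sort(mapped_outputs):
    """Group values by key and sort by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce task: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    mapped = [map_phase(chunk) for chunk in chunks]  # chunks are independent of each other
    grouped = shuffle_and_sort(mapped)
    return [reduce_phase(key, values) for key, values in grouped]


chunks = ["big data big", "data cluster"]
result = word_count(chunks)
print(result)
```

Because each chunk is mapped without looking at any other chunk, the map calls could run on different machines at the same time; only the grouping step needs to bring matching keys together.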

Japan is seeking ________, while India craves _____________ and ___________. The leaders of
both countries, _______________ (India) and _______________ (Japan), are also working to
counter the growing regional influence of _________ -- an important economic partner to
both but also historically a rival. - answer-Japan is seeking growth markets, while India
craves advanced technology and foreign investment. The leaders of both countries,
including Shinzo Abe (Japan), are also working to counter the growing regional influence
of China -- an important economic partner to both but also historically a rival.

Reports suggest that the big data and analytics market in India will grow approximately __
times, to ______ by 2020 - answer-8 times, to $16 billion

Japanese companies are using India as a _____________ ______ to expand into Africa, and
service providers are expanding from Japan into India - answer-Manufacturing base

Hadoop "Ecosystem" - answer--Tools built around the core Hadoop
-All ecosystem tools are open source
-Tools are designed to extend Hadoop's functionality
-New tools are added all the time

Hadoop Ecosystem projects included in Cloudera's CDH: - answer--Spark, HBase, Hive,
Impala, Parquet, Sqoop, Flume/Kafka, Solr, Hue, Sentry

Spark - answer-in-memory and Streaming processing framework

HBase - answer-noSQL database built on HDFS

Hive - answer-SQL processing engine designed for batch workloads

Impala - answer-SQL Query Engine designed for BI workloads

Parquet - answer-Columnar data storage format
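What "columnar" means can be shown without any Parquet library: the same records are stored column by column instead of row by row, so a query that needs one field reads only that column. The sample records below are made up, and this sketch does not reproduce Parquet's actual file layout or encoding.

```python
rows = [
    {"id": 1, "city": "Tokyo", "sales": 120},
    {"id": 2, "city": "Mumbai", "sales": 80},
    {"id": 3, "city": "Osaka", "sales": 50},
]


def to_columns(records):
    """Turn a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)
total_sales = sum(columns["sales"])  # touches only the "sales" column
print(columns)
print(total_sales)
```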

Sqoop - answer-Data movement/ETL to and from RDBMS

Flume, Kafka - answer-streaming data ingestion

Solr - answer-text search functionality

Hue - answer-web based user interface for Hadoop

Sentry - answer-an authorization tool for managing security

Hadoop is? - answer--Scalable, for parallel/distributable problems (no dependencies
across data)
-A write once, read many solution (vs. an RDBMS, which suits frequent writes and updates)

Hadoop is not? - answer--Database (random Access)
-Interactive OLAP (for the moment)
-Updates to files
-Nonparallel work
-Many small files
-Low latency

What do most organizations prefer? - answer--An enterprise-ready distribution of Hadoop
that is thoroughly tested, supported, and well integrated with Hadoop projects and
other key software like ETL tools and databases.

Most widely used enterprise-ready Hadoop distributions? - answer-Cloudera,
Hortonworks, and MapR

A cluster? - answer-a group of computers working together

a node? - answer-is an individual computer in that cluster

Two kinds of nodes? - answer--Master node (Name Node)
