Databricks - Data Engineer Associate Practice Exam 2: Questions with Correct Answers (2024)
What is the best way to describe a data lakehouse compared to a data warehouse?
a. A data lakehouse provides a relational system of data management.
b. A data lakehouse captures snapshots of data for version control purposes.
c. A data lakehouse couples storage and compute for complete control.
d. A data lakehouse utilizes proprietary storage formats for data.
e. A data lakehouse enables both batch and streaming analytics. - ANSWER: ➡ e. A data lakehouse enables both batch and streaming analytics.
Explanation
The answer is: A data lakehouse enables both batch and streaming analytics.
A lakehouse has the following key features:
Transaction support: In an enterprise lakehouse, many data pipelines will often be reading and writing data concurrently. Support for ACID transactions ensures consistency as multiple parties concurrently read or write data, typically using SQL.
Schema enforcement and governance: The lakehouse should have a way to support schema enforcement and evolution, supporting data warehouse (DW) schema architectures such as star and snowflake schemas. The system should be able to reason about data integrity, and it should have robust governance and auditing mechanisms.
BI support: Lakehouses enable using BI tools directly on the source data. This reduces staleness and improves recency, reduces latency, and lowers the cost of having to operationalize two copies of the data in both a data lake and a warehouse.
Storage is decoupled from compute: In practice, this means storage and compute use separate clusters, so these systems are able to scale to many more concurrent users and larger data sizes. Some modern data warehouses also have this property.
Openness: The storage formats they use are open and standardized, such as Parquet, and they provide an API so a variety of tools and engines, including machine learning and Python/R libraries, can efficiently access the data directly.
Support for diverse data types, ranging from unstructured to structured data: The lakehouse can be used to store, refine, analyze, and access data types needed for many new data applications, including images, video, audio, semi-structured data, and text.
Support for diverse workloads: including data science, machine learning, and SQL and analytics. Multiple tools might be needed to support all these workloads, but they all rely on the same data repository.
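The schema enforcement and evolution feature described above can be illustrated with a toy, pure-Python sketch. It is not Delta Lake's implementation: `ToyTable`, `SchemaMismatchError`, and the `merge_schema` flag are hypothetical names. The sketch assumes two behaviors: a write whose columns or value types do not match the table schema is rejected as a whole, and new columns are accepted only when schema evolution is explicitly requested (in Delta Lake, that opt-in is the `mergeSchema` write option).

```python
from dataclasses import dataclass, field
from typing import Any


class SchemaMismatchError(Exception):
    """Raised when a write does not match the table schema (hypothetical)."""


@dataclass
class ToyTable:
    # Hypothetical in-memory table; schema maps column name -> Python type.
    schema: dict[str, type]
    rows: list[dict[str, Any]] = field(default_factory=list)

    def append(self, new_rows: list[dict[str, Any]], merge_schema: bool = False) -> None:
        # Validate the whole batch first so a rejected write leaves the table unchanged.
        pending_schema = dict(self.schema)
        for row in new_rows:
            for col, value in row.items():
                if col not in pending_schema:
                    if not merge_schema:
                        raise SchemaMismatchError(f"unexpected column: {col!r}")
                    # Schema evolution (opt-in): adopt the new column and its type.
                    pending_schema[col] = type(value)
                elif value is not None and not isinstance(value, pending_schema[col]):
                    raise SchemaMismatchError(
                        f"column {col!r} expects {pending_schema[col].__name__}, "
                        f"got {type(value).__name__}"
                    )
        # Columns absent from a row are not checked; they are treated as missing values.
        self.schema = pending_schema
        self.rows.extend(new_rows)


orders = ToyTable(schema={"order_id": int, "amount": float})
orders.append([{"order_id": 1, "amount": 9.99}])

try:
    orders.append([{"order_id": 2, "amount": "free"}])  # wrong type: rejected
except SchemaMismatchError as err:
    rejected = str(err)

# A new column is accepted only because evolution is requested explicitly.
orders.append([{"order_id": 3, "amount": 5.0, "channel": "app"}], merge_schema=True)
print(rejected)               # column 'amount' expects float, got str
print(sorted(orders.schema))  # ['amount', 'channel', 'order_id']
```

Validating the full batch before mutating anything mirrors, very loosely, the all-or-nothing behavior that transaction support gives a real lakehouse table.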
You are designing an analytical system to store structured data from your e-commerce platform and unstructured data from website traffic and the app store. How would you approach where to store this data?
a. Use a traditional data warehouse for structured data and a data lakehouse for unstructured data.
b. A data lakehouse can only store unstructured data and cannot enforce a schema.
c. A data lakehouse can store structured and unstructured data and can enforce a schema.
d. Traditional data warehouses are good for storing structured data and enforcing a schema. - ANSWER: ➡ c. A data lakehouse can store structured and unstructured data and can enforce a schema.
Explanation
The answer is: A data lakehouse can store structured and unstructured data and can enforce a schema.
See: What Is a Lakehouse? - The Databricks Blog
You are investigating a production job failure caused by a data issue, and the job runs on a job cluster. Which cluster do you need to start to investigate and analyze the data?
a. A job cluster can be used to analyze the problem.
b. An all-purpose (interactive) cluster is the recommended way to run commands and view the data.
c. The existing job cluster can be used to investigate the issue.
d. A Databricks SQL endpoint can be used to investigate the issue. - ANSWER: ➡ b. An all-purpose (interactive) cluster is the recommended way to run commands and view the data.
Explanation
The answer is: An all-purpose (interactive) cluster is the recommended way to run commands and view the data.
A job cluster does not provide a way for a user to interact with a notebook once the job is submitted, but an interactive cluster lets you display data, view visualizations, and write or edit queries, which makes it a perfect fit for investigating and analyzing the data.
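To see where the two cluster types are chosen, here is a minimal Python sketch that only builds (and does not send) request bodies shaped like the Databricks Jobs API 2.1 `jobs/create` call. The `task_key`, `notebook_task`, `new_cluster`, and `existing_cluster_id` task fields are assumed to match that API; verify them against the current Jobs API reference. The helper `create_job_body`, the notebook path, runtime version, node type, and cluster ID are hypothetical placeholders.

```python
import json

# Hypothetical values for illustration: notebook path, runtime version,
# node type, and cluster ID are placeholders, not recommendations.
NOTEBOOK_PATH = "/Repos/team/etl/ingest_orders"


def create_job_body(task_key: str, cluster_spec: dict) -> dict:
    """Build a body shaped like a Jobs API 2.1 create request with one notebook task."""
    task = {"task_key": task_key, "notebook_task": {"notebook_path": NOTEBOOK_PATH}}
    task.update(cluster_spec)  # each task picks exactly one kind of compute
    return {"name": f"orders-{task_key}", "tasks": [task]}


# Job cluster: created for this run and terminated when the run finishes;
# it is not available for interactive notebook sessions.
job_cluster_body = create_job_body(
    "nightly_load",
    {"new_cluster": {"spark_version": "13.3.x-scala2.12",
                     "node_type_id": "i3.xlarge",
                     "num_workers": 2}},
)

# All-purpose cluster: a long-running cluster, referenced by ID, that users
# can also attach notebooks to for interactive investigation.
all_purpose_body = create_job_body("debug_load", {"existing_cluster_id": "0000-000000-abcdefgh"})

print(json.dumps(job_cluster_body, indent=2))
```

For the question above, the takeaway is that a failed run on a `new_cluster` leaves no cluster to explore, so you start an all-purpose cluster and attach a notebook to it to inspect the data.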
Which of the following describes how Databricks Repos can help facilitate CI/CD workflows on the Databricks Lakehouse Platform?
a. Databricks Repos can facilitate the pull request, review, and approval process before merging branches.
b. Databricks Repos can merge changes from a secondary Git branch into a main Git branch.
c. Databricks Repos can be used to design, develop, and trigger Git automation pipelines.
d. Databricks Repos can store the single-source-of-truth Git repository.
e. Databricks Repos can commit or push code changes to trigger a CI/CD process. - ANSWER: ➡ e. Databricks Repos can commit or push code changes to trigger a CI/CD process.
Explanation
The answer is: Databricks Repos can commit or push code changes to trigger a CI/CD process.
See the diagram below to understand the roles that Databricks Repos and the Git provider play when building a CI/CD workflow.
All the steps highlighted in yellow can be done in Databricks Repos; all the steps highlighted in gray are done in a Git provider such as GitHub or Azure DevOps.
You notice that a colleague is manually copying notebooks with a _bkp suffix to store previous versions. Which of the following features would you recommend instead?
a. Databricks notebooks support change tracking and versioning.
b. Databricks notebooks should be copied to a local machine, with source control set up locally to version the notebooks.
c. Databricks notebooks can be exported into DBC archive files and stored in the data lake.
d. Databricks notebooks can be exported as HTML and imported at a later time. - ANSWER: ➡ a. Databricks notebooks support change tracking and versioning.
Explanation