DATABRICKS - DATA ENGINEER
ASSOCIATE EXAM 2 2024/2025
What is the best way to describe a data lakehouse compared to a data warehouse?
a. A data lakehouse provides a relational system of data management
b. A data lakehouse captures snapshots of data for version control purposes.
c. A data lakehouse couples storage and compute for complete control.
d. A data lakehouse utilizes proprietary storage formats for data.
e. A data lakehouse enables both batch and streaming analytics. - Precise Answer ✔✔e. A data
lakehouse enables both batch and streaming analytics.
Explanation
The answer is: A data lakehouse enables both batch and streaming analytics.
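To make "both batch and streaming" concrete, here is a minimal pure-Python sketch. It is a toy model, not the Delta Lake or Spark API: `ToyTable`, `read_batch`, and `read_stream` are hypothetical names invented for illustration. The idea it shows is that one versioned, append-only table can serve a full batch query and an incremental (streaming-style) reader at the same time.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyTable:
    """Hypothetical append-only table; each commit becomes one version."""
    commits: list[list[dict[str, Any]]] = field(default_factory=list)

    def append(self, rows: list[dict[str, Any]]) -> int:
        """Commit rows as a new version and return that version number."""
        self.commits.append(list(rows))
        return len(self.commits) - 1

    def read_batch(self) -> list[dict[str, Any]]:
        """Batch read: every row committed so far."""
        return [row for commit in self.commits for row in commit]

    def read_stream(self, start_version: int) -> tuple[list[dict[str, Any]], int]:
        """Incremental read: rows committed at or after start_version,
        plus the version to resume from on the next call."""
        new_rows = [row for commit in self.commits[start_version:] for row in commit]
        return new_rows, len(self.commits)


table = ToyTable()
table.append([{"order_id": 1, "amount": 20.0}])
table.append([{"order_id": 2, "amount": 35.5}])

# A streaming-style consumer processes what exists, then remembers its position.
first_rows, checkpoint = table.read_stream(start_version=0)

table.append([{"order_id": 3, "amount": 12.0}])

# The next incremental read sees only the new commit...
later_rows, checkpoint = table.read_stream(start_version=checkpoint)
# ...while a batch query over the same table sees every row.
batch_total = sum(row["amount"] for row in table.read_batch())
```

In this sketch the streaming reader and the batch query share one copy of the data, which is the property the correct option describes.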
A lakehouse has the following key features:
Transaction support: In an enterprise lakehouse many data pipelines will often be reading and writing
data concurrently. Support for ACID transactions ensures consistency as multiple parties concurrently
read or write data, typically using SQL.
Schema enforcement and governance: The lakehouse should have a way to support schema
enforcement and evolution, supporting DW schema architectures such as star and snowflake schemas. The
system should be able to reason about data integrity, and it should have robust governance and auditing
mechanisms.
BI support: Lakehouses enable using BI tools directly on the source data. This reduces staleness and
improves recency, reduces latency, and lowers the cost of having to operationalize two copies of the
data in both a data lake and a warehouse.
Storage is decoupled from compute: In practice this means storage and compute use separate clusters,
thus these systems are able to scale to many more concurrent users and larger data sizes. Some modern
data warehouses also have this property.
Openness: The storage formats they use are open and standardized, such as Parquet, and they provide
an API so a variety of tools and engines, including machine learning and Python/R libraries, can
efficiently access the data directly.
Support for diverse data types ranging from unstructured to structured data: The lakehouse can be used
to store, refine, analyze, and access data types needed for many new data applications, including
images, video, audio, semi-structured data, and text.
Support for diverse workloads: including data science, machine learning, and SQL and analytics. Multiple
tools might be needed to s
You are designing an analytical system to store structured data from your e-commerce platform and
unstructured data from website traffic and the app store. How would you approach where you store this
data?
a. Use traditional data warehouse for structured data and use data lakehouse for unstructured data.
b. Data lakehouse can only store unstructured data but cannot enforce a schema
c. Data lakehouse can store structured and unstructured data and can enforce schema
d. Traditional data warehouses are good for storing structured data and enforcing schema - Precise
Answer ✔✔c. Data lakehouse can store structured and unstructured data and can enforce schema
Explanation
The answer is: Data lakehouse can store structured and unstructured data and can enforce schema
What Is a Lakehouse? - The Databricks Blog
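The "can enforce schema" part of the answer can be illustrated with a small pure-Python sketch. It is not the Delta Lake API: `EXPECTED_SCHEMA`, `SchemaMismatch`, `validate_row`, and `append_rows` are hypothetical names, and the column names are made up. It only shows the general idea of rejecting a write whose rows do not match the table's declared schema.

```python
# Hypothetical schema for an orders table: column name -> required Python type.
EXPECTED_SCHEMA = {"order_id": int, "customer": str, "amount": float}


class SchemaMismatch(ValueError):
    """Raised when a row does not match the table schema."""


def validate_row(row: dict, schema: dict) -> None:
    """Check that a row has exactly the schema's columns with matching types."""
    missing = schema.keys() - row.keys()
    extra = row.keys() - schema.keys()
    if missing or extra:
        raise SchemaMismatch(f"missing={sorted(missing)} extra={sorted(extra)}")
    for column, expected_type in schema.items():
        if not isinstance(row[column], expected_type):
            raise SchemaMismatch(
                f"column {column!r} expected {expected_type.__name__}, "
                f"got {type(row[column]).__name__}"
            )


def append_rows(table: list, rows: list, schema: dict) -> None:
    """Validate every row first, so a bad batch is rejected as a whole."""
    for row in rows:
        validate_row(row, schema)
    table.extend(rows)


orders: list = []
append_rows(orders, [{"order_id": 1, "customer": "ana", "amount": 9.5}], EXPECTED_SCHEMA)

try:
    # order_id is a string here, so the whole batch is refused.
    append_rows(
        orders,
        [
            {"order_id": 2, "customer": "bo", "amount": 1.0},
            {"order_id": "3", "customer": "cy", "amount": 2.0},
        ],
        EXPECTED_SCHEMA,
    )
    rejected = False
except SchemaMismatch:
    rejected = True
```

Validating the full batch before extending the table is a deliberate choice in this sketch: a partially written batch would leave the table in an inconsistent state.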
You are investigating a production job failure caused by a data issue. The job is set up to run on job
clusters. Which cluster do you need to start to investigate and analyze the data?
a. A Job cluster can be used to analyze the problem
b. All-purpose cluster/ interactive cluster is the recommended way to run commands and view the data.
c. Existing job cluster can be used to investigate the issue
d. Databricks SQL Endpoint can be used to investigate the issue - Precise Answer ✔✔b. All-purpose
cluster/ interactive cluster is the recommended way to run commands and view the data.
Explanation
The answer is: All-purpose cluster/ interactive cluster is the recommended way to run commands and view
the data.
A job cluster cannot provide a way for a user to interact with a notebook once the job is submitted, but
an interactive cluster allows you to display data, view visualizations, and write or edit queries, which makes it a
perfect fit to investigate and analyze the data.
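As a rough illustration of the distinction, the sketch below looks at how a task's compute is declared. It assumes the task-level fields of the Databricks Jobs API 2.1 (`new_cluster`, `job_cluster_key`, `existing_cluster_id`). The helper `compute_kind`, the task keys, notebook paths, cluster ID, runtime version, and node type are all hypothetical placeholders.

```python
def compute_kind(task: dict) -> str:
    """Classify a Jobs API-style task by the compute it declares."""
    if "existing_cluster_id" in task:
        # Points at a long-lived cluster that users can also attach notebooks to.
        return "all-purpose cluster"
    if "new_cluster" in task or "job_cluster_key" in task:
        # Created for the run and terminated afterwards; not interactive.
        return "job cluster"
    return "unspecified"


# Placeholder task as a production job might define it (values are made up).
scheduled_task = {
    "task_key": "nightly_load",
    "notebook_task": {"notebook_path": "/Jobs/nightly_load"},
    "new_cluster": {
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2,
    },
}

# Placeholder task pointing at an existing interactive cluster for debugging.
debug_task = {
    "task_key": "investigate_failure",
    "notebook_task": {"notebook_path": "/Users/someone/investigate"},
    "existing_cluster_id": "0000-000000-example",
}

scheduled_kind = compute_kind(scheduled_task)
debug_kind = compute_kind(debug_task)
```

In practice the investigation itself is usually done by attaching a notebook to the all-purpose cluster in the workspace UI; the payloads here only show where the two kinds of compute differ.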
Which of the following describes how Databricks Repos can help facilitate CI/CD workflows on the
Databricks Lakehouse Platform?
a. Databricks Repos can facilitate the pull request, review, and approval process before merging
branches
b. Databricks Repos can merge changes from a secondary Git branch into a main Git branch
c. Databricks Repos can be used to design, develop, and trigger Git automation pipelines
d. Databricks Repos can store the single-source-of-truth Git repository
e. Databricks Repos can commit or push code changes to trigger a CI/CD process - Precise Answer ✔✔e.
Databricks Repos can commit or push code changes to trigger a CI/CD process
Explanation
The answer is: Databricks Repos can commit or push code changes to trigger a CI/CD process
See the diagram below to understand the roles Databricks Repos and the Git provider play when building a
CI/CD workflow.
All the steps highlighted in yellow can be done in Databricks Repos; all the steps highlighted in gray are
done in a Git provider such as GitHub or Azure DevOps.
You noticed that a colleague is manually copying the notebook with _bkp to store previous versions.
Which of the following features would you recommend instead?
a. Databricks notebooks support change tracking and versioning
b. Databricks notebooks should be copied to a local machine and setup source control locally to version
the notebooks
c. Databricks notebooks can be exported into dbc archive files and stored in data lake
d. Databricks notebook can be exported as HTML and imported at a later time - Precise Answer ✔✔a.
Databricks notebooks support change tracking and versioning
Explanation
The answer is: Databricks notebooks support automatic change tracking and versioning.
While editing a notebook, open the version history on the right side to view all the changes; every
change you make is captured and saved.
A newly joined data analyst requested read-only access to tables. Assuming you are the owner/admin, which
section of the Databricks platform is going to facilitate granting select access to the user ____.