Exam (elaborations)

DATABRICKS EXAM 2024/2025 WITH 100% ACCURATE SOLUTIONS

Pages
42
Grade
A+
Uploaded on
03-09-2024
Written in
2024/2025

Institution
DATABRICKS ENGINEER ASSOCIATE
Course
DATABRICKS ENGINEER ASSOCIATE

DATABRICKS - DATA ENGINEER
ASSOCIATE EXAM 1 2024/2025

You were asked to create a table to store the data below. <orderTime> is a timestamp, but the
finance team usually prefers to query <orderTime> in date format. You would like to create a
calculated column that converts the <orderTime> timestamp to a date and stores it. Fill in the
blank to complete the DDL.



CREATE TABLE orders (
  orderId int,
  orderTime timestamp,
  orderdate date _____________________________________________ ,
  units int)



A. AS DEFAULT (CAST(orderTime as DATE))

B. GENERATED ALWAYS AS (CAST(orderTime as DATE))

C. GENERATED DEFAULT AS (CAST(orderTime as DATE))

D. AS (CAST(orderTime as DATE))

E. Delta Lake does not support calculated columns; the value should be inserted into the table as part
of the ingestion process

Precise Answer ✔✔ B. GENERATED ALWAYS AS (CAST(orderTime as DATE))



Explanation

The answer is GENERATED ALWAYS AS (CAST(orderTime as DATE)).



https://docs.microsoft.com/en-us/azure/databricks/delta/delta-batch#--use-generated-columns



Delta Lake supports generated columns, a special type of column whose values are automatically
generated based on a user-specified function over other columns in the Delta table. When you write
to a table with generated columns and do not explicitly provide values for them, Delta Lake
automatically computes the values.

Note: Databricks also supports partitioning by generated columns.
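As a minimal sketch, the completed DDL from the question (option B filling the blank) can be kept in a Python string and submitted through a Spark session. The helper names `create_orders_table` and `expected_orderdate` are hypothetical, and the `spark` argument is assumed to be a session that can create Delta tables. The plain-Python helper only mimics what the cast keeps (the date part) and ignores session time zone handling.

```python
from datetime import date, datetime

# Completed DDL from the question, with option B filling the blank.
ORDERS_DDL = """
CREATE TABLE orders (
  orderId int,
  orderTime timestamp,
  orderdate date GENERATED ALWAYS AS (CAST(orderTime AS DATE)),
  units int)
"""


def create_orders_table(spark):
    """Run the DDL on an existing SparkSession (assumed to support Delta tables)."""
    spark.sql(ORDERS_DDL)


def expected_orderdate(order_time: datetime) -> date:
    """Plain-Python stand-in for CAST(orderTime AS DATE): keep only the date part."""
    return order_time.date()
```

With such a table, an insert that supplies only orderId, orderTime, and units would leave orderdate to be computed by Delta Lake, as described above.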



The data engineering team noticed that one of its jobs fails randomly as a result of using spot
instances. Which feature in Jobs/Tasks can be used to make the job more stable when using spot
instances?



A. Use the Databricks REST API to monitor and restart the job

B. Use Jobs runs, active runs UI section to monitor and restart the job

C. Add second task and add a check condition to rerun the first task if it fails

D. Restart the job cluster, job automatically restarts

E. Add a retry policy to the task

Precise Answer ✔✔ E. Add a retry policy to the task



The answer is: Add a retry policy to the task.



Tasks in Jobs support a retry policy, which can be used to retry a failed task. This is especially
useful with spot instances, where it is common for executors or the driver to fail.
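A retry policy is usually expressed as a few fields on the task definition. The sketch below builds such a task as a Python dictionary; the task key and notebook path are hypothetical, and the field names `max_retries`, `min_retry_interval_millis`, and `retry_on_timeout` reflect my understanding of the Jobs API task settings, so they should be checked against the current API reference before use.

```python
import json

# Hypothetical task definition; only the three retry fields matter here.
task_with_retries = {
    "task_key": "ingest_orders",                                 # hypothetical name
    "notebook_task": {"notebook_path": "/Jobs/ingest_orders"},   # hypothetical path
    "max_retries": 3,                     # retry a failed run up to three times
    "min_retry_interval_millis": 60_000,  # wait at least one minute between attempts
    "retry_on_timeout": False,            # do not retry runs that hit the task timeout
}

# Serialize to JSON, the form in which a job definition would be submitted.
print(json.dumps(task_with_retries, indent=2))
```

The same settings can also be chosen in the task configuration UI; the dictionary form is only convenient when jobs are defined programmatically.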



What is the main difference between AUTO LOADER and COPY INTO?



A. COPY INTO supports schema evolution.

B. AUTO LOADER supports schema evolution.

C. COPY INTO supports file notification when performing incremental loads.

D. AUTO LOADER supports reading data from Apache Kafka

E. AUTO LOADER supports file notification when performing incremental loads.

Precise Answer ✔✔ E. AUTO LOADER supports file notification when performing incremental loads.



Explanation

Auto Loader supports both directory listing and file notification, but COPY INTO only supports
directory listing.

Auto Loader file notification mode automatically sets up a notification service and a queue service
that subscribe to file events from the input directory in cloud object storage such as Azure Blob
Storage or S3. File notification mode is more performant and scalable for large input directories or a
high volume of files.



Auto Loader and Cloud Storage Integration



Auto Loader supports two ways to ingest data incrementally:



Directory listing - Lists the input directory and maintains state in RocksDB; supports incremental
file listing.

File notification - Uses a trigger and a queue to store file notifications, which are later used to
retrieve the files. Unlike directory listing, file notification can scale up to millions of files per day.
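The choice between the two modes comes down to a single Auto Loader option. The sketch below builds the option dictionaries in plain Python; the helper name `discovery_mode_options` and the JSON file format are assumptions for illustration, and the resulting dictionary would be passed to `spark.readStream.format("cloudFiles").options(...)` together with the schema settings a real stream needs.

```python
def discovery_mode_options(use_file_notification: bool) -> dict:
    """Return the Auto Loader option that selects the file-discovery mode.

    "false" selects directory listing; "true" selects file notification.
    """
    return {
        "cloudFiles.useNotifications": "true" if use_file_notification else "false"
    }


# Options shared by both variants (the file format is an assumption).
base_options = {"cloudFiles.format": "json"}

# Directory listing: Auto Loader lists the input directory itself.
listing_options = {**base_options, **discovery_mode_options(False)}

# File notification: Auto Loader relies on a notification and queue service.
notification_options = {**base_options, **discovery_mode_options(True)}
```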




[OPTIONAL]

Auto Loader vs COPY INTO?



Auto Loader

Auto Loader incrementally and efficiently processes new data files as they arrive in cloud storage
without any additional setup. Auto Loader provides a new Structured Streaming source called cloudFiles.
Given an input directory path on the cloud file storage, the cloudFiles source automatically processes
new files as they arrive, with the option of also processing existing files in that directory.

When to use Auto Loader instead of COPY INTO?



You want to load data from a file location that contains files in the order of millions or higher. Auto
Loader can discover files more efficiently than the COPY INTO SQL command and can split file processing
into multiple batches.

You do not plan to load subsets of previously uploaded files. With Auto Loader, it can be more difficult
to reprocess subsets of files. However, you can use the COPY INTO SQL



Why does AUTO LOADER require schema location?

A. Schema location is used to store user provided schema



B. Schema location is used to identify the schema of target table



C. AUTO LOADER does not require schema location, because it supports schema evolution



D. Schema location is used to store schema inferred by AUTO LOADER



E. Schema location is used to identify the schema of target table and source table

Precise Answer ✔✔ D. Schema location is used to store schema inferred by AUTO LOADER



Explanation

The answer is: Schema location is used to store the schema inferred by AUTO LOADER. On later runs,
AUTO LOADER starts faster because it uses the last known schema instead of inferring the schema
every single time.



Auto Loader samples the first 50 GB or 1000 files that it discovers, whichever limit is crossed first. To
avoid incurring this inference cost at every stream start up, and to be able to provide a stable schema
across stream restarts, you must set the option cloudFiles.schemaLocation. Auto Loader creates a
hidden directory _schemas at this location to track schema changes to the input data over time.
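A minimal sketch of setting this option on a cloudFiles stream follows. The function names, the default JSON format, and the example paths are hypothetical, and `spark` is assumed to be a Databricks SparkSession where the cloudFiles source is available.

```python
import posixpath


def schema_tracking_dir(schema_location: str) -> str:
    """Path of the hidden _schemas directory Auto Loader creates at the schema location."""
    return posixpath.join(schema_location, "_schemas")


def read_with_schema_location(spark, source_path: str, schema_location: str,
                              file_format: str = "json"):
    """Build an Auto Loader stream that persists its inferred schema."""
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", file_format)
        # Stores the inferred schema so it is not re-inferred at every stream start.
        .option("cloudFiles.schemaLocation", schema_location)
        .load(source_path)
    )
```

For a schema location such as `/mnt/checkpoints/orders` (a hypothetical path), the tracked schema versions would live under `/mnt/checkpoints/orders/_schemas`.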



The link below contains detailed documentation on the different options:



Auto Loader options | Databricks on AWS



Which of the following statements are incorrect about the lakehouse?



A. Supports end-to-end streaming and batch workloads



B. Supports ACID
