Databricks - Data Engineer Associate
Practice Exam 1 questions with correct
answers
You were asked to create a table that can store the below data, <orderTime> is a timestamp
x x x x x x x x x x x x x x x x
but the finance team when they query this data normally prefer the <orderTime> in date
x x x x x x x x x x x x x x x
format, you would like to create a calculated column that can convert the <orderTime>
x x x x x x x x x x x x x x
column timestamp datatype to date and store it, fill in the blank to complete the DDL.
x x x x x x x x x x x x x x x x
CREATE TABLE orders ( x x x
x orderId int, x
x orderTime timestamp, x
x orderdate date _____________________________________________ ,
x x x
x units int) x
A. AS DEFAULT (CAST(orderTime as DATE))
x x x x x
B. GENERATED ALWAYS AS (CAST(orderTime as DATE))
x x x x x x
C. GENERATED DEFAULT AS (CAST(orderTime as DATE))
x x x x x x
D. AS (CAST(orderTime as DATE))
x x x x
E. Delta lake does not support calculated columns, value should be inserted into the table as
x x x x x x x x x x x x x x x
part of the ingestion process - ANSWER: ➡ B. GENERATED ALWAYS AS (CAST(orderTime as
x x x x x x x xx x x x x x x
DATE))
x
Explanation
The answer is, GENERATED ALWAYS AS (CAST(orderTime as DATE))
x x x x x x x x
,Delta Lake supports generated columns which are a special type of columns whose values are
x x x x x x x x x x x x x x
automatically generated based on a user-specified function over other columns in the Delta
x x x x x x x x x x x x x
table. When you write to a table with generated columns and you do not explicitly provide
x x x x x x x x x x x x x x x x
values for them, Delta Lake automatically computes the values.
x x x x x x x x x
Note: Databricks also supports partitioning using generated column
x x x x x x x
The data engineering team noticed that one of the job fails randomly as a result of using spot
x x x x x x x x x x x x x x x x x
instances, what feature in Jobs/Tasks can be used to address this issue so the job is more stable
x x x x x x x x x x x x x x x x x x
when using spot instances?
x x x x
A. Use Databrick REST API to monitor and restart the job
x x x x x x x x x x
B. Use Jobs runs, active runs UI section to monitor and restart the job
x x x x x x x x x x x x x
C. Add second task and add a check condition to rerun the first task if it fails
x x x x x x x x x x x x x x x x
D. Restart the job cluster, job automatically restarts
x x x x x x x
E. Add a retry policy to the task - ANSWER: ➡ E. Add a retry policy to the task
x x x x x x x x x xx x x x x x x x x
The answer is, Add a retry policy to the task
x x x x x x x x x
Tasks in Jobs support Retry Policy, which can be used to retry a failed tasks, especially when
x x x x x x x x x x x x x x x x
using spot instance it is common to have failed executors or driver.
x x x x x x x x x x x x
What is the main difference between AUTO LOADER and COPY INTO?
x x x x x x x x x x
A. COPY INTO supports schema evolution.
x x x x x
B. AUTO LOADER supports schema evolution.
x x x x x
C. COPY INTO supports file notification when performing incremental loads.
x x x x x x x x x
D. AUTO LOADER supports reading data from Apache Kafka
x x x x x x x x
E, AUTO LOADER Supports file notification when performing incremental loads. - ANSWER: ➡
x x x x x x x x x x x xx
E, AUTO LOADER Supports file notification when performing incremental loads.
x x x x x x x x x x
,Explanation
Auto loader supports both directory listing and file notification but COPY INTO only supports
x x x x x x x x x x x x x
directory listing.
x x
Auto loader file notification will automatically set up a notification service and queue service
x x x x x x x x x x x x x
that subscribe to file events from the input directory in cloud object storage like Azure blob
x x x x x x x x x x x x x x x x
storage or S3. File notification mode is more performant and scalable for large input
x x x x x x x x x x x x x x
directories or a high volume of files.
x x x x x x x
Auto Loader and Cloud Storage Integration
x x x x x
Auto Loader supports a couple of ways to ingest data incrementally
x x x x x x x x x x
Directory listing - List Directory and maintain the state in RocksDB, supports incremental file
x x x x x x x x x x x x x
listing
x
File notification - Uses a trigger+queue to store the file notification which can be later used to
x x x x x x x x x x x x x x x x
retrieve the file, unlike Directory listing File notification can scale up to millions of files per day.
x x x x x x x x x x x x x x x x x
[OPTIONAL]
Auto Loader vs COPY INTO?
x x x x
Auto Loader x
Auto Loader incrementally and efficiently processes new data files as they arrive in cloud
x x x x x x x x x x x x x
storage without any additional setup. Auto Loader provides a new Structured Streaming
x x x x x x x x x x x x
source called cloudFiles. Given an input directory path on the cloud file storage, the cloudFiles
x x x x x x x x x x x x x x x
source automatically processes new files as they arrive, with the option of also processing
x x x x x x x x x x x x x x
existing files in that directory.
x x x x x
When to use Auto Loader instead of the COPY INTO?
x x x x x x x x x
, You want to load data from a file location that contains files in the order of millions or higher.
x x x x x x x x x x x x x x x x x x
Auto Loader can discover files more efficiently than the COPY INTO SQL command and can split
x x x x x x x x x x x x x x x x
file processing into multiple batches.
x x x x x
You do not plan to load subsets of previously uploaded files. With Auto Loader, it can be more
x x x x x x x x x x x x x x x x x
difficult to reprocess subsets of files. However, you can use the COPY INTO SQL
x x x x x x x x x x x x x x
Why does AUTO LOADER require schema location?
x x x x x x
A. Schema location is used to store user provided schema
x x x x x x x x x
B. Schema location is used to identify the schema of target table
x x x x x x x x x x x
C. AUTO LOADER does not require schema location, because its supports Schema evolution
x x x x x x x x x x x x
D. Schema location is used to store schema inferred by AUTO LOADER
x x x x x x x x x x x
E. Schema location is used to identify the schema of target table and source table - ANSWER: ➡
x x x x x x x x x x x x x x x x xx
D. Schema location is used to store schema inferred by AUTO LOADER
x x x x x x x x x x x x
Explanation
The answer is, Schema location is used to store schema inferred by AUTO LOADER, so the next
x x x x x x x x x x x x x x x x
time AUTO LOADER runs faster as does not need to infer the schema every single time by trying
x x x x x x x x x x x x x x x x x x
to use the last known schema.
x x x x x x
Auto Loader samples the first 50 GB or 1000 files that it discovers, whichever limit is crossed
x x x x x x x x x x x x x x x x
first. To avoid incurring this inference cost at every stream start up, and to be able to provide a
x x x x x x x x x x x x x x x x x x x
stable schema across stream restarts, you must set the option cloudFiles.schemaLocation.
x x x x x x x x x x x
Auto Loader creates a hidden directory _schemas at this location to track schema changes to
x x x x x x x x x x x x x x x
the input data over time.
x x x x x
The below link contains detailed documentation on different options
x x x x x x x x
The benefits of buying summaries with Stuvia:
Guaranteed quality through customer reviews
Stuvia customers have reviewed more than 700,000 summaries. This how you know that you are buying the best documents.
Quick and easy check-out
You can quickly pay through credit card or Stuvia-credit for the summaries. There is no membership needed.
Focus on what matters
Your fellow students write the study notes themselves, which is why the documents are always reliable and up-to-date. This ensures you quickly get to the core!
Frequently asked questions
What do I get when I buy this document?
You get a PDF, available immediately after your purchase. The purchased document is accessible anytime, anywhere and indefinitely through your profile.
Satisfaction guarantee: how does it work?
Our satisfaction guarantee ensures that you always find a study document that suits you well. You fill out a form, and our customer service team takes care of the rest.
Who am I buying these notes from?
Stuvia is a marketplace, so you are not buying this document from us, but from seller Fordenken. Stuvia facilitates payment to the seller.
Will I be stuck with a subscription?
No, you only buy these notes for $13.99. You're not tied to anything after your purchase.