DATABRICKS EXAM 2024/2025 WITH 100% ACCURATE SOLUTIONS
Course: DATABRICKS ENGINEER ASSOCIATE · Questions & answers · 42 pages · September 3, 2024 · Seller: YANCHY
DATABRICKS - DATA ENGINEER
ASSOCIATE EXAM 1 2024/2025

You were asked to create a table that can store the data below. <orderTime> is a timestamp, but the finance team prefers <orderTime> in date format when they query this data. You would like to create a calculated column that converts the <orderTime> timestamp to a date and stores it. Fill in the blank to complete the DDL.



CREATE TABLE orders (
    orderId int,
    orderTime timestamp,
    orderdate date _____________________________________________ ,
    units int)



A. AS DEFAULT (CAST(orderTime as DATE))

B. GENERATED ALWAYS AS (CAST(orderTime as DATE))

C. GENERATED DEFAULT AS (CAST(orderTime as DATE))

D. AS (CAST(orderTime as DATE))

E. Delta Lake does not support calculated columns; the value should be inserted into the table as part of the
ingestion process - Precise Answer ✔✔B. GENERATED ALWAYS AS (CAST(orderTime as DATE))



Explanation

The answer is, GENERATED ALWAYS AS (CAST(orderTime as DATE))



https://docs.microsoft.com/en-us/azure/databricks/delta/delta-batch#--use-generated-columns



Delta Lake supports generated columns, which are a special type of column whose values are
automatically generated based on a user-specified function over other columns in the Delta table. When
you write to a table with generated columns and you do not explicitly provide values for them, Delta
Lake automatically computes the values.

Note: Databricks also supports partitioning using generated columns.
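As a concrete sketch, the completed DDL from the question would look like this (table and column names are taken from the question itself; partitioning on the generated column is optional and shown only as an illustration):

```sql
-- orderdate is computed automatically from orderTime on every write
CREATE TABLE orders (
    orderId int,
    orderTime timestamp,
    orderdate date GENERATED ALWAYS AS (CAST(orderTime AS DATE)),
    units int)
-- optional: a generated column can also serve as the partition column
PARTITIONED BY (orderdate);
```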



The data engineering team noticed that one of the jobs fails randomly as a result of using spot instances.
What feature in Jobs/Tasks can be used to address this issue so the job is more stable when using spot
instances?



A. Use the Databricks REST API to monitor and restart the job

B. Use Jobs runs, active runs UI section to monitor and restart the job

C. Add a second task and add a check condition to rerun the first task if it fails

D. Restart the job cluster, job automatically restarts

E. Add a retry policy to the task - Precise Answer ✔✔E. Add a retry policy to the task



The answer is, Add a retry policy to the task



Tasks in Jobs support a retry policy, which can be used to retry a failed task; this is especially useful when
using spot instances, where failed executors or drivers are common.
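As a sketch, a retry policy is configured per task when creating a job through the Jobs API; the field names below come from the Jobs API 2.1 task schema, while the task key, notebook path, and values are purely illustrative:

```json
{
  "task_key": "ingest_orders",
  "max_retries": 3,
  "min_retry_interval_millis": 60000,
  "retry_on_timeout": true,
  "notebook_task": { "notebook_path": "/Jobs/ingest_orders" }
}
```

The same settings are available in the Jobs UI under the task's Retries section.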



What is the main difference between AUTO LOADER and COPY INTO?



A. COPY INTO supports schema evolution.

B. AUTO LOADER supports schema evolution.

C. COPY INTO supports file notification when performing incremental loads.

D. AUTO LOADER supports reading data from Apache Kafka

E. AUTO LOADER supports file notification when performing incremental loads. - Precise Answer ✔✔E.
AUTO LOADER supports file notification when performing incremental loads.



Explanation

Auto Loader supports both directory listing and file notification, but COPY INTO only supports directory
listing.

Auto Loader's file notification mode automatically sets up a notification service and a queue service that
subscribe to file events from the input directory in cloud object storage such as Azure Blob Storage or S3.
File notification mode is more performant and scalable for large input directories or a high volume of
files.



Auto Loader and Cloud Storage Integration



Auto Loader supports two ways to ingest data incrementally:



Directory listing - lists the directory and maintains the state in RocksDB; supports incremental file listing.

File notification - uses a trigger + queue to store file notifications, which can later be used to retrieve
the files; unlike directory listing, file notification can scale up to millions of files per day.
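A minimal sketch of an Auto Loader stream in file notification mode (PySpark on a Databricks cluster; the storage path, checkpoint location, and table name are illustrative placeholders):

```python
# Auto Loader stream using file notification instead of directory listing.
# cloudFiles.useNotifications = true makes Auto Loader set up the
# notification and queue services automatically.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.useNotifications", "true")  # file notification mode
      .load("abfss://raw@myaccount.dfs.core.windows.net/orders/"))

(df.writeStream
   .option("checkpointLocation", "/mnt/checkpoints/orders")
   .table("orders_bronze"))
```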




[OPTIONAL]

Auto Loader vs COPY INTO?



Auto Loader

Auto Loader incrementally and efficiently processes new data files as they arrive in cloud storage
without any additional setup. Auto Loader provides a new Structured Streaming source called cloudFiles.
Given an input directory path on the cloud file storage, the cloudFiles source automatically processes
new files as they arrive, with the option of also processing existing files in that directory.

When to use Auto Loader instead of COPY INTO?



You want to load data from a file location that contains files in the order of millions or higher. Auto
Loader can discover files more efficiently than the COPY INTO SQL command and can split file processing
into multiple batches.

You do not plan to load subsets of previously uploaded files. With Auto Loader, it can be more difficult
to reprocess subsets of files. However, you can use the COPY INTO SQL command to reload subsets of files.
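For completeness, a hedged sketch of reloading a specific subset of files with COPY INTO (the table name, storage path, and file name are illustrative; FILES and the 'force' copy option are documented COPY INTO parameters):

```sql
-- Re-ingest one specific file, even if it was already loaded before
COPY INTO orders_bronze
FROM 'abfss://raw@myaccount.dfs.core.windows.net/orders/'
FILEFORMAT = JSON
FILES = ('2024/09/01/part-0001.json')
COPY_OPTIONS ('force' = 'true');  -- 'force' reloads previously ingested files
```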



Why does AUTO LOADER require schema location?

A. Schema location is used to store a user-provided schema



B. Schema location is used to identify the schema of target table



C. AUTO LOADER does not require a schema location, because it supports schema evolution



D. Schema location is used to store schema inferred by AUTO LOADER



E. Schema location is used to identify the schema of target table and source table - Precise Answer
✔✔D. Schema location is used to store schema inferred by AUTO LOADER



Explanation

The answer is, Schema location is used to store the schema inferred by AUTO LOADER, so that subsequent
AUTO LOADER runs are faster: they do not need to re-infer the schema every single time and can start
from the last known schema.



Auto Loader samples the first 50 GB or 1000 files that it discovers, whichever limit is crossed first. To
avoid incurring this inference cost at every stream start up, and to be able to provide a stable schema
across stream restarts, you must set the option cloudFiles.schemaLocation. Auto Loader creates a
hidden directory _schemas at this location to track schema changes to the input data over time.
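A sketch of setting cloudFiles.schemaLocation so inference runs once and the result is persisted across restarts (PySpark on Databricks; the paths shown are illustrative placeholders):

```python
# Auto Loader persists the inferred schema under <schemaLocation>/_schemas,
# so later stream starts reuse it instead of re-sampling the input files.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("cloudFiles.schemaLocation", "/mnt/schemas/orders")
      .load("/mnt/raw/orders/"))
```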



The link below contains detailed documentation on the different options:



Auto Loader options | Databricks on AWS



Which of the following statements are incorrect about the lakehouse?



A. Support end-to-end streaming and batch workloads



B. Supports ACID
