Databricks Competition Study
What does the Lakehouse consist of? - answer-The lakehouse combines the scalability and cost
effectiveness of data lakes with the reliability and performance of data warehouses
How does Databricks differentiate from Snowflake? - answer-Data Science/Machine Learning
Data Ingestion
ETL
Streaming Capabilities
Data Sharing
With Snowflake, you face high costs, limited capabilities, limited support for unstructured
data, inefficient engineering, and vendor lock-in
For Data Science- how does Databricks differ from Snowflake? - answer-Snowflake is built for
SQL workloads. For DS and ML, users either use Snowpark or rely on 3rd-party tools such as
Dataiku or DataRobot
Databricks lets your data scientists use any libraries/languages on the same platform as
their data engineers (DE), and manage the end-to-end ML pipeline with MLflow
For Data Ingestion- how does Databricks differ from Snowflake? - answer-1) With Snowflake,
ingestion goes through various Snowflake stages, often needing limited SQL pipelines or
Snowpipe, which is inefficient and expensive
2) Snowflake tax - an egress tax on data moving to and from Snowflake, plus a cost while that data is stored
Databricks - ingestion is simple with Auto Loader, where data is automatically loaded into
Delta tables
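As a rough illustration of the Auto Loader ingestion mentioned above, the sketch below wires a `cloudFiles` stream into a Delta table. The function name, the paths, and the table name are hypothetical. `spark` is assumed to be the SparkSession that a Databricks notebook provides, and the incoming files are assumed to be JSON.

```python
def ingest_with_auto_loader(spark, source_path, checkpoint_path, target_table):
    """Incrementally load new JSON files from source_path into a Delta table.

    Assumptions: `spark` is a Databricks SparkSession; every path and the
    table name are placeholders chosen for illustration.
    """
    stream = (
        spark.readStream
        .format("cloudFiles")                                  # Auto Loader source
        .option("cloudFiles.format", "json")                   # format of the incoming files
        .option("cloudFiles.schemaLocation", checkpoint_path)  # where the inferred schema is tracked
        .load(source_path)
    )
    return (
        stream.writeStream
        .option("checkpointLocation", checkpoint_path)  # records which files were already processed
        .trigger(availableNow=True)                     # process what is available, then stop
        .toTable(target_table)                          # target table (Delta on Databricks)
    )
```

In a notebook this might be called as `ingest_with_auto_loader(spark, "/landing/events", "/chk/events", "bronze.events")`, where all three arguments are made-up names.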
For ETL- how does Databricks differ from Snowflake? - answer-Snowflake has no ETL tools and
relies heavily on 3rd-party vendors, which increases cost and complexity
Databricks - Delta Live Tables
What is Delta Live Tables? - answer-Delta Live Tables (DLT) is the first ETL framework that uses
a simple declarative approach to build reliable data pipelines (accelerates ETL development)
- automatically manages your infrastructure at scale so data analysts and engineers can spend
less time on tooling and focus on getting value from data
- fully supports Python and SQL and works with both batch and streaming
- manages task orchestration, cluster management, monitoring, data quality and error
handling
- supports CDC (change data capture)
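To make the CDC idea concrete, here is a minimal pure-Python sketch of applying a change feed to a target table. It is not the DLT API. The event layout (`op`, `id`, `seq`, `row`) and the function name are assumptions made up for illustration.

```python
def apply_changes(target, events, key="id", sequence="seq"):
    """Apply upsert/delete change events to `target`, a dict keyed by `key`.

    Events are replayed in `sequence` order, so records that arrive out of
    order cannot leave a stale value behind.
    """
    for event in sorted(events, key=lambda e: e[sequence]):
        if event["op"] == "delete":
            target.pop(event[key], None)     # remove the row if it exists
        else:
            target[event[key]] = event["row"]  # insert or overwrite the row
    return target


# Hypothetical change feed; note that seq 2 arrives before seq 1.
events = [
    {"op": "upsert", "id": 1, "seq": 2, "row": {"id": 1, "name": "Ada L."}},
    {"op": "upsert", "id": 1, "seq": 1, "row": {"id": 1, "name": "Ada"}},
    {"op": "upsert", "id": 2, "seq": 3, "row": {"id": 2, "name": "Bob"}},
    {"op": "delete", "id": 2, "seq": 4, "row": None},
]
table = apply_changes({}, events)
print(table)  # {1: {'id': 1, 'name': 'Ada L.'}}
```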
Streaming Capabilities- how does Databricks differ from Snowflake? - answer-Databricks reads from
any streaming service, while Snowflake only supports Kafka and was not designed for
high-velocity data but rather for structured data at rest
What Databricks features help with observability and governance? - answer-Expectations -
help prevent bad data from flowing into tables, track data quality over time, and provide tools
to troubleshoot bad data. With granular pipeline observability, you get a high-fidelity lineage
diagram of your pipeline, track dependencies, and aggregate data quality metrics across all of
your pipelines.
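The behaviour of Expectations can be sketched in plain Python. This is a simulation, not the DLT API: the function name, the rule names, and the sample rows are assumptions. The three actions are loosely modelled on the warn, drop, and fail behaviours that DLT expectations offer.

```python
from collections import Counter


def apply_expectations(rows, expectations):
    """Return (kept_rows, failure_counts).

    `expectations` maps a rule name to (predicate, action), where action is
    "warn" (count only), "drop" (count and discard the row), or "fail"
    (raise on the first violation).
    """
    kept, failures = [], Counter()
    for row in rows:
        drop = False
        for name, (predicate, action) in expectations.items():
            if predicate(row):
                continue
            failures[name] += 1  # data quality metric per rule
            if action == "fail":
                raise ValueError(f"expectation {name!r} failed for row {row!r}")
            if action == "drop":
                drop = True
        if not drop:
            kept.append(row)
    return kept, dict(failures)


rows = [
    {"id": 1, "amount": 10},
    {"id": None, "amount": 5},
    {"id": 3, "amount": -2},
]
rules = {
    "valid_id": (lambda r: r["id"] is not None, "drop"),
    "non_negative_amount": (lambda r: r["amount"] >= 0, "warn"),
}
kept, metrics = apply_expectations(rows, rules)
print(kept)     # [{'id': 1, 'amount': 10}, {'id': 3, 'amount': -2}]
print(metrics)  # {'valid_id': 1, 'non_negative_amount': 1}
```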
For Data Sharing- how does Databricks differ from Snowflake? - answer-Snowflake's data
format is proprietary, so users can only share data with other Snowflake accounts (vendor lock-in)
-Snowflake takes the data from your cloud storage, conducts transformations, and then
pushes the data back, so you pay an egress tax in both directions and pay again while that data
is stored
-you pay for compute to send data
Delta Sharing - an open standard for data sharing, no replication of datasets
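On the recipient side, a shared table could be read roughly as follows. This sketch assumes the open-source `delta-sharing` Python package is installed and that the data provider has supplied a profile file. The helper names and the share, schema, and table names are hypothetical.

```python
def shared_table_url(profile_path, share, schema, table):
    """Build the '<profile>#<share>.<schema>.<table>' address of a shared table."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path, share, schema, table):
    """Load a shared table into a pandas DataFrame without copying it into a vendor platform."""
    # Imported lazily so the module loads even where the package is missing.
    import delta_sharing  # assumption: installed via `pip install delta-sharing`

    return delta_sharing.load_as_pandas(
        shared_table_url(profile_path, share, schema, table)
    )


# Placeholder names, for illustration only.
url = shared_table_url("config.share", "sales_share", "default", "orders")
print(url)  # config.share#sales_share.default.orders
```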
How does Delta Lake differ from Iceberg? - answer-1) Overall performance - loading and querying
data is 3.5x faster with Delta
2) Load performance - loading from Parquet into the target format is faster with Delta
3) Query performance - Delta is 4.5x faster
What are data clean rooms? - answer-A secure environment to run computations on joint
data
Run any computation in Python, SQL, R, or Java
No data replication
Scalability
What is Databricks Marketplace? - answer-An open marketplace for data solutions, built on Delta
Sharing
Consists of
Notebooks
Data files, Data Tables
Solution Accelerators
ML Models
Dashboards
Why have data marketplaces seen limited use? - answer-Closed platforms (one per vendor)
Limited to just datasets
What is Project Lightspeed? - answer-Faster and simpler stream processing
1) Predictable low latency
2) Enhanced functionality
3) Operations and Troubleshooting
4) Connectors & Ecosystem