Pythian Blog: Technical Track

Datascape Episode 65: Recapping The 2022 Databricks Data and AI Summit With Luan Moreno Maciel

Episode 65 Shownotes

Welcome to another episode of the Datascape Podcast. In this episode, the hosts discuss updates to Databricks as a product and to its open-source projects. Tune in to hear about new integrations, improved features, performance optimization capabilities, proprietary announcements, and more.

Don’t miss this jam-packed episode on all things Databricks.

Key Points From this Episode

  • Introduction of today’s topic: recapping the 2022 Databricks Data and AI Summit.
  • Luan introduces Delta Lake 2.0, Databricks’s upgrade to Spark tables, and some of its features, including change data feed for batch and streaming reads, Z-Ordering, and Python and Scala API support for optimizing tables with Z-Ordering.
  • Luan introduces MLflow pipelines and their ability to simplify entire workloads for machine learning.
  • Luan describes Project Lightspeed, which provides offset management, checkpoint recovery, and stream checkpointing on Spark.
  • The hosts explore Spark structured streaming.
  • The hosts discuss Spark Connect, including its security protocols for connections into Spark and its remote connectivity capabilities.
  • The hosts cover proprietary announcements: Spark 3.3’s query execution improvements, copying capabilities, trigger jobs, and dbt integration.
  • Luan discusses Delta Live Tables implementation.
  • The hosts examine Unity Catalog.
  • The hosts discuss the state of Databricks and where it sits in an organization’s data platform.

 

Links Mentioned in Today’s Episode

 
