A big thank you to Jixin Jia (Gin) for a brilliant presentation at the August Sydney Databricks meetup. It was full of practical tips for improving Databricks performance, backed up by a set of Databricks notebooks as proof (download them below if you want to try them out).
I highly recommend checking out his blog, the Book of Architecture – you are guaranteed to learn something new!
Here’s what was covered:
- The sweet spot between cluster performance and cost
- More about partitioning
- How to manage small files (a problem we regularly face with data lakes)
- The use of Delta Lake
- Things about Databricks/Spark caching that you didn’t know existed
Download the Slides & Databricks Notebook
Fill in the form below to download the goodies…