A big thank you to Jixin Jia (Gin) for a brilliant presentation at the August Sydney Databricks meetup. It was full of practical tips for improving speed, backed up by a set of Databricks Notebooks as proof (download them below if you want to try them out).
I highly recommend checking out his blog, the Book of Architecture – you are guaranteed to learn something new!
Here’s what was covered:
- The sweet spot between cluster performance and cost
- More about partitioning
- How to manage small files (a problem we face regularly with data lakes)
- The use of Delta Lake
- Things you didn’t know about Databricks/Spark caching
Download the Slides & Databricks Notebook
Fill in the form below to download the goodies…