Spark

Delta Lake Performance

Databricks Performance: Fixing the Small File Problem with Delta Lake

Anjana Rupasinghege is the Technical Director and Lead
Architect at Data Driven, specialising in Cloud, Security, Data
and Analytics.

With a background in Azure modern data architecture, he
has over 15 years of experience working in Information
Technology in industries such as Government, Banking,
Telecommunication and Consulting.


A common Databricks performance problem we see in enterprise data lakes is the “Small Files” issue. One of our customers is a great example: we ingest 0.5 TB of JSON and CSV data per day, made up of 5 KB files, which equates to millions of files a week in the data lake Raw zone. …
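To give a feel for the scale involved, here is a rough back-of-envelope sketch of the file counts implied by the figures above. It assumes decimal units and that every file is exactly 5 KB, which real data will not match, so treat the output as order-of-magnitude only. The 1 GB compaction target is a hypothetical value chosen purely for illustration, not a figure from the post.

```python
# Back-of-envelope file counts for the ingest volume quoted above.
# Assumptions (not from the original post): decimal units
# (1 TB = 10**12 bytes, 1 KB = 10**3 bytes) and uniform 5 KB files.
DAILY_INGEST_BYTES = 500 * 10**9   # 0.5 TB per day
SMALL_FILE_BYTES = 5 * 10**3       # 5 KB per file
TARGET_FILE_BYTES = 10**9          # hypothetical 1 GB compacted file size


def file_count(total_bytes: int, file_bytes: int) -> int:
    """Number of files needed to hold total_bytes (ceiling division)."""
    return -(-total_bytes // file_bytes)


small_files_per_day = file_count(DAILY_INGEST_BYTES, SMALL_FILE_BYTES)
small_files_per_week = small_files_per_day * 7
compacted_files_per_day = file_count(DAILY_INGEST_BYTES, TARGET_FILE_BYTES)

print(f"small files per day:     {small_files_per_day:,}")
print(f"small files per week:    {small_files_per_week:,}")
print(f"compacted files per day: {compacted_files_per_day:,}")
```

The point of the comparison is the ratio: the same daily volume spread across far fewer, larger files means far fewer objects for the engine to list and open on every read.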


Data Science for dummies

Data Science for Dummies – Data Engineering with Titanic dataset + Databricks + Python (Tech Talk 3 of 9)

Azure-certified Data Architect with a focus on delivering business value and guiding customers through the maze of analytical architectures, design and implementation activities.

Experienced in setting up modern data platforms with advanced predictive analytics workloads. Brings strong people skills and a DevOps-centric, entrepreneurial approach to enterprise software delivery.



I put together a tech talk on Machine Learning and Databricks, which is the 3rd part of a 9-part Data Science for Dummies series: Data Engineering with Titanic dataset + Databricks + Python. Data preparation and feature engineering highlighted the importance of domain knowledge, even with something as simple as a 10-column dataset! It …

