Data Science for Dummies – Data Science Overview with Databricks (Tech Talk 1 of 9)


You might have heard of Spark and how it’s the evolution of Hadoop, great for processing big data... but have you heard of Databricks?

Here are the slides for the next tech talk in the Data Science for Dummies series I am presenting around Sydney: Part 1 of 9: Data Science Overview with Databricks

Think Spark-as-a-service, serverless Spark, the unification of data engineering and data science, and you start to get the picture. The business value add is HUGE, as you can focus on outcomes without the overheads of Spark server management and set-up time. Add to this 4 familiar languages (Python, Scala/Java, R and SQL, and soon .NET for Apache Spark), massively distributed parallel processing power, and the ability to import TensorFlow, scikit-learn and 1001 other open source libraries, and you’ll wonder why you are not already using it.

I haven’t even got started on MLflow, Databricks Delta and the other amazing products from the creators of Spark!

Check it out: spin up a Databricks workspace in Azure (or AWS if that’s your fancy) and start playing around.

Here’s the rest of the series:

  1. Data Science overview with Databricks
  2. Titanic survival prediction with Azure Machine Learning Studio + Kaggle
  3. Data Engineering with Titanic dataset + Databricks + Python
  4. Titanic with Databricks + Spark ML
  5. Titanic with Databricks + Azure Machine Learning Service
  6. Titanic with Databricks + MLS + AutoML
  7. Titanic with Databricks + MLflow
  8. Titanic with .NET Core + ML.NET
  9. Deployment, DevOps/MLOps and Productionisation
