Data Science for Dummies – Data Engineering with Titanic dataset + Databricks + Python (Tech Talk 3 of 9)

Data Science for dummies

I put together a tech talk on Machine Learning and Databricks, which is the 3rd part of a 9-part Data Science for Dummies series: Data Engineering with Titanic dataset + Databricks + Python.

Preparing the data and engineering features highlighted the importance of domain knowledge, even with something as simple as a 10-column dataset! It also aptly demonstrated how much more time is spent on ingesting and prepping data for machine learning than on the actual modelling. I also get asked how important the maths and statistics are for getting started. There’s no doubt they are essential for this field; however, I personally enjoy the data engineering/DataOps role and am happy to hand over to a dedicated data scientist when it gets too hairy. It’s important for everyone involved to have an idea of the end-to-end workflow. With tools like AutoML I can focus on data engineering & architecture.
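To give a flavour of what domain knowledge looks like in code, here is a minimal sketch in plain Python. It is not the notebook from the talk. It assumes the Kaggle Titanic column names `Name`, `SibSp` and `Parch`, and the three sample rows are made up for illustration. The helper names `extract_title` and `add_features` are hypothetical too. The idea is that a passenger's title and family size are not columns in the raw data, but knowing how names were recorded and how families travelled lets you derive them.

```python
import re

# Made-up rows in the shape of the Kaggle Titanic CSV (assumed column names).
passengers = [
    {"Name": "Smith, Mr. John", "SibSp": 1, "Parch": 0},
    {"Name": "Jones, Mrs. Mary (Ann Lee)", "SibSp": 1, "Parch": 2},
    {"Name": "Brown, Master. Tom", "SibSp": 0, "Parch": 0},
]

# Names are assumed to follow "Surname, Title. Given names".
TITLE_PATTERN = re.compile(r",\s*([^.]+)\.")


def extract_title(name):
    """Return the honorific between the comma and the first full stop."""
    match = TITLE_PATTERN.search(name)
    return match.group(1).strip() if match else "Unknown"


def add_features(row):
    """Return a copy of the row with three derived features added."""
    enriched = dict(row)
    enriched["Title"] = extract_title(row["Name"])
    # Siblings/spouses + parents/children + the passenger themselves.
    enriched["FamilySize"] = row["SibSp"] + row["Parch"] + 1
    enriched["IsAlone"] = enriched["FamilySize"] == 1
    return enriched


features = [add_features(p) for p in passengers]
for f in features:
    print(f["Name"], "->", f["Title"], f["FamilySize"], f["IsAlone"])
```

In a Databricks notebook the same logic would more likely be written against a Spark or pandas DataFrame; plain dictionaries are used here only to keep the sketch self-contained.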

I’ll be back for Part 2 where we’ll finish the feature engineering and then run the training data through a series of machine learning classifiers to determine which gives the best accuracy.

PM me if you’d like to give your dev team some technical training on how to get started with Machine Learning or Azure/Databricks/Spark for advanced analytics.

Slides can be found here (Note: PowerPoint animation is not working so well 😉)

[slideshare id=150788201&doc=datasciencefordummies-1titanicwithdatabricks-190620044318]

Here’s the rest of the series: https://data-driven.com/blog/tag/data-science-for-dummies/

  1. Data Science overview with Databricks
  2. Titanic survival prediction with Azure Machine Learning Studio + Kaggle
  3. Data Engineering with Titanic dataset + Databricks + Python
  4. Titanic with Databricks + Spark ML
  5. Titanic with Databricks + Azure Machine Learning Service
  6. Titanic with Databricks + MLS + AutoML
  7. Titanic with Databricks + MLFlow
  8. Titanic with .NET Core + ML.NET
  9. Deployment, DevOps/MLOps and Productionisation

Rodney Joyce

Azure-certified Data Architect with a focus on delivering business value and guiding customers through the maze of analytical architectures, design and implementation activities.

Experienced in setting up modern data platforms with advanced predictive analytic workloads. Brings strong people skills and a devops-centric, entrepreneurial approach to Enterprise software delivery.
