Data Science for Dummies – Data Engineering with Titanic dataset + Databricks + Python (Tech Talk 3 of 9)

I put together a tech talk on Machine Learning and Databricks, which is the 3rd part of a 9-part Data Science for Dummies series: Data Engineering with Titanic dataset + Databricks + Python.

Data preparation & feature engineering highlighted the importance of domain knowledge, even with something as simple as a 10-column dataset! It also aptly demonstrated how much more time is spent ingesting and prepping data for machine learning than on the actual modelling. I also get asked how important the maths and statistics are for getting started. There’s no doubt they are essential in this field; however, I personally enjoy the data engineering/DataOps role and am happy to hand over to a dedicated data scientist when it gets too hairy. It’s important for everyone involved to understand the end-to-end workflow. With tools like AutoML, I can focus on data engineering & architecture.
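To make the domain-knowledge point a little more concrete, here is a minimal sketch in plain Python of the kind of features the Titanic data invites: a passenger's title (Mr, Mrs, Miss, Master) parsed out of the Name column, family size from SibSp and Parch, a has-cabin flag, and median imputation for missing ages. It assumes the column layout of Kaggle's Titanic train.csv; the sample rows, the function names (`extract_title`, `engineer_features`, `prepare`) and the feature choices are hypothetical illustrations, not the exact steps from the talk. On Databricks you would more likely do this on a Spark or pandas DataFrame; the standard-library version just keeps the logic visible.

```python
import csv
import io
import re
import statistics

# Hypothetical sample rows in the column layout of Kaggle's Titanic train.csv
# (illustrative values only, not taken from the real file).
SAMPLE_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Smith, Mr. John",male,22,1,0,A/5 0001,7.25,,S
2,1,1,"Jones, Mrs. Mary",female,38,1,0,PC 0002,71.28,C85,C
3,1,3,"Brown, Miss. Anne",female,,0,0,STON 0003,7.92,,S
"""

# Names look like "Surname, Title. Given names"; capture the text between ", " and "."
TITLE_RE = re.compile(r",\s*([^.]+)\.")


def extract_title(name):
    """Return the honorific (Mr, Mrs, Miss, Master, ...) or 'Unknown' if none is found."""
    match = TITLE_RE.search(name)
    return match.group(1).strip() if match else "Unknown"


def engineer_features(row, median_age):
    """Turn one raw CSV row into model-ready features (hypothetical feature set)."""
    family_size = int(row["SibSp"]) + int(row["Parch"]) + 1  # relatives aboard + the passenger
    return {
        "Pclass": int(row["Pclass"]),
        "IsFemale": 1 if row["Sex"] == "female" else 0,
        "Age": float(row["Age"]) if row["Age"] else median_age,  # impute missing ages
        "Title": extract_title(row["Name"]),
        "FamilySize": family_size,
        "IsAlone": 1 if family_size == 1 else 0,
        "HasCabin": 1 if row["Cabin"] else 0,  # Cabin is blank for many passengers
        "Embarked": row["Embarked"] or "S",  # assumption: fill blanks with Southampton
    }


def prepare(csv_text):
    """Ingest the CSV text, compute the median known age, and engineer features per row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    known_ages = [float(r["Age"]) for r in rows if r["Age"]]
    median_age = statistics.median(known_ages)
    return [engineer_features(r, median_age) for r in rows]


features = prepare(SAMPLE_CSV)
for f in features:
    print(f)
```

Replacing the single median with a per-title median age (a "Master" is a boy, for example) is a typical next refinement, and exactly the kind of decision that needs domain knowledge rather than maths.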

I’ll be back for Part 2 where we’ll finish the feature engineering and then run the training data through a series of machine learning classifiers to determine which gives the best accuracy.
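The model-selection step itself is simple: score every candidate on the same labelled hold-out rows and keep the most accurate. The sketch below shows only that loop. The candidates are hypothetical hand-written rules standing in for real ML classifiers, and the hold-out rows and names (`accuracy`, `pick_best`) are made up for illustration; a real comparison would use trained models and a proper validation split or cross-validation.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
Classifier = Callable[[Row], int]


def accuracy(model: Classifier, rows: List[Row], labels: List[int]) -> float:
    """Fraction of rows where the model's 0/1 prediction matches the survival label."""
    correct = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return correct / len(labels)


def pick_best(models: Dict[str, Classifier], rows: List[Row],
              labels: List[int]) -> Tuple[str, Dict[str, float]]:
    """Score every candidate on the same hold-out rows and return the most accurate one."""
    scores = {name: accuracy(model, rows, labels) for name, model in models.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical rule-based stand-ins for real ML classifiers (1 = survived, 0 = died).
models: Dict[str, Classifier] = {
    "everyone_dies": lambda r: 0,
    "women_survive": lambda r: int(r["IsFemale"]),
    "women_and_children": lambda r: 1 if r["IsFemale"] or r["Age"] < 12 else 0,
}

# Tiny hypothetical hold-out set, for illustration only.
holdout_rows: List[Row] = [
    {"IsFemale": 1, "Age": 30.0},
    {"IsFemale": 0, "Age": 40.0},
    {"IsFemale": 0, "Age": 8.0},
    {"IsFemale": 1, "Age": 50.0},
    {"IsFemale": 0, "Age": 25.0},
]
holdout_labels = [1, 0, 1, 0, 0]

best, scores = pick_best(models, holdout_rows, holdout_labels)
print(scores)
print("best:", best)
```

Swapping the stand-ins for trained classifiers only changes what goes into `models`; scoring them all against one shared hold-out set is what keeps the accuracies comparable.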

PM me if you’d like to give your dev team some technical training on how to get started with Machine Learning or Azure/Databricks/Spark for advanced analytics.

Slides can be found here (Note: PowerPoint animation is not working so well 😉)

[slideshare id=150788201&doc=datasciencefordummies-1titanicwithdatabricks-190620044318]

Here’s the rest of the series: https://data-driven.com/blog/tag/data-science-for-dummies/

Data Science overview with Databricks
Titanic survival prediction with Azure Machine Learning Studio + Kaggle
Data Engineering with Titanic dataset + Databricks + Python
Titanic with Databricks + Spark ML
Titanic with Databricks + Azure Machine Learning Service
Titanic with Databricks + MLS + AutoML
Titanic with Databricks + MLFlow
Titanic with .NET Core + ML.NET
Deployment, DevOps/MLOps and Productionisation


Rodney Joyce

Azure-certified Data Architect with a focus on delivering business value and guiding customers through the maze of analytical architectures, design and implementation activities.

Experienced in setting up modern data platforms with advanced predictive analytic workloads. Brings strong people skills and a DevOps-centric, entrepreneurial approach to enterprise software delivery.
