If you want to break into competitive data science, then this course is for you! Participating in predictive modelling competitions can help you gain practical experience, improve and harness your data modelling skills in various domains such as credit, insurance, marketing, natural language processing, sales’ forecasting and computer vision to name a few. At the same time you get to do it in a competitive context against thousands of participants where each one tries to build the most predictive algorithm. Pushing each other to the limit can result in better performance and smaller prediction errors. Being able to achieve high ranks consistently can help you accelerate your career in data science.
Offered By
About this Course
Learner Career Outcomes
14%
10%
Skills you will gain
Learner Career Outcomes
14%
10%
Offered by

HSE University
HSE University is one of the top research universities in Russia. Established in 1992 to promote new research and teaching in economics and related disciplines, it now offers programs at all levels of university education across an extraordinary range of fields of study including business, sociology, cultural studies, philosophy, political science, international relations, law, Asian studies, media and communicamathematics, engineering, and more.
Syllabus - What you will learn from this course
Introduction & Recap
This week we will introduce you to competitive data science. You will learn about competitions' mechanics, the difference between competitions and a real life data science, hardware and software that people usually use in competitions. We will also briefly recap major ML models frequently used in competitions.
Feature Preprocessing and Generation with Respect to Models
In this module we will summarize approaches to work with features: preprocessing, generation and extraction. We will see, that the choice of the machine learning model impacts both preprocessing we apply to the features and our approach to generation of new ones. We will also discuss feature extraction from text with Bag Of Words and Word2vec, and feature extraction from images with Convolution Neural Networks.
Final Project Description
This is just a reminder, that the final project in this course is better to start soon! The final project is in fact a competition, in this module you can find an information about it.
Exploratory Data Analysis
We will start this week with Exploratory Data Analysis (EDA). It is a very broad and exciting topic and an essential component of solving process. Besides regular videos you will find a walk through EDA process for Springleaf competition data and an example of prolific EDA for NumerAI competition with extraordinary findings.
Validation
In this module we will discuss various validation strategies. We will see that the strategy we choose depends on the competition setup and that correct validation scheme is one of the bricks for any winning solution.
Data Leakages
Finally, in this module we will cover something very unique to data science competitions. That is, we will see examples how it is sometimes possible to get a top position in a competition with a very little machine learning, just by exploiting a data leakage.
Metrics Optimization
This week we will first study another component of the competitions: the evaluation metrics. We will recap the most prominent ones and then see, how we can efficiently optimize a metric given in a competition.
Advanced Feature Engineering I
In this module we will study a very powerful technique for feature generation. It has a lot of names, but here we call it "mean encodings". We will see the intuition behind them, how to construct them, regularize and extend them.
Hyperparameter Optimization
In this module we will talk about hyperparameter optimization process. We will also have a special video with practical tips and tricks, recorded by four instructors.
Advanced feature engineering II
In this module we will learn about a few more advanced feature engineering techniques.
Ensembling
Nowadays it is hard to find a competition won by a single model! Every winning solution incorporates ensembles of models. In this module we will talk about the main ensembling techniques in general, and, of course, how it is better to ensemble the models in practice.
Reviews
TOP REVIEWS FROM HOW TO WIN A DATA SCIENCE COMPETITION: LEARN FROM TOP KAGGLERS
Top Kagglers gently introduce one to Data Science Competitions. One will have a great chance to learn various tips and tricks and apply them in practice throughout the course. Highly recommended!
Really excellent. Very practical advice from top competitors. This specialization is much more information-dense than most machine learning MOOCs. You really get your money's worth.
This course is fantastic. It's chock full of practical information that is presented clearly and concisely. I would like to thank the team for sharing their knowledge so generously.
I really enjoyed this course but it was probably 2-3 times more work than I anticipated. Most of that extra time comes from working on the final project, testing things out, etc.
About the Advanced Machine Learning Specialization
This specialization gives an introduction to deep learning, reinforcement learning, natural language understanding, computer vision and Bayesian methods. Top Kaggle machine learning practitioners and CERN scientists will share their experience of solving real-world problems and help you to fill the gaps between theory and practice. Upon completion of 7 courses you will be able to apply modern machine learning methods in enterprise and understand the caveats of real-world data and settings.

Frequently Asked Questions
When will I have access to the lectures and assignments?
What will I get if I subscribe to this Specialization?
Is financial aid available?
Will I earn university credit for completing the Course?
More questions? Visit the Learner Help Center.