Over the past 3 weeks I took the new Coursera online course on “How to win Data Science competitions”. I wanted to take this course because I thought it would have a good ROI for my Master Tier journey, and it exceeded my expectations!

This is an advanced 5-week course on methods, tricks and tips for winning data science competitions. The course is presented by Kaggle GrandMasters, who share their expertise on how they won multiple competitions. The videos teach by example: they start with the concept under discussion and finish by showing how that particular method helped them win a certain competition.

Topics Covered

The course covers the different stages of the data pipeline. Some of these topics are specific to data competitions, but I think a big part of the material transfers to real-world data applications. The course includes the following topics:

  • Overview of different models
  • Feature preprocessing
  • Feature generation
  • Feature extraction
  • Feature interactions
  • Setting up correct validation schemes
  • Discussion of evaluation metrics and how to optimize them
  • Hyperparameter tuning
  • Ensembling
  • Analysis of past competitions

Workload

Each week of the course consists of 30 to 90 min of video lectures, quizzes and programming assignments. The lectures are split into 10 to 20 min videos, which is a good thing for short attention spans. The quizzes are available only to students who pay for the course. The programming assignments are related to the course's Kaggle competition, and you can access them even when you just audit the course. The assignments get you to implement the methods discussed in the video lectures on real-world data. All of the work for the course is in Python.

Final competition

The highlight of this course is the final competition hosted on Kaggle. The goal is to predict the future sales volume of different items based on sales history. The data, provided by 1C, consists of daily sales from January 2013 to October 2015, along with item descriptions and item categories. The tutors provide some notebooks to help students get started with the competition and try out the different methods and tricks. One challenge with this data is that the goal is to predict next month's sales for (shop, item) combinations, while the provided data is daily sales. Another challenge is that the train and test data share only 50% of the items.
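
To make the daily-versus-monthly mismatch concrete, here is a minimal pandas sketch of how daily records could be rolled up into monthly totals per (shop, item), plus a helper that measures how many test items also appear in train. The column names (`date`, `shop_id`, `item_id`, `units_sold`), the toy data and both function names are my own assumptions for illustration, not the competition's actual schema or the tutors' code.

```python
import pandas as pd

# Toy daily sales. Column names and values are illustrative assumptions,
# not the real competition files.
daily = pd.DataFrame({
    "date": pd.to_datetime([
        "2013-01-02", "2013-01-15", "2013-01-20", "2013-02-03", "2013-02-10",
    ]),
    "shop_id": [1, 1, 2, 1, 2],
    "item_id": [10, 10, 10, 10, 11],
    "units_sold": [2, 3, 1, 4, 5],
})


def to_monthly(daily_sales):
    """Sum daily unit sales into one row per (month, shop, item)."""
    return (
        daily_sales
        .assign(month=daily_sales["date"].dt.to_period("M"))
        .groupby(["month", "shop_id", "item_id"], as_index=False)["units_sold"]
        .sum()
        .rename(columns={"units_sold": "units_sold_month"})
    )


def share_of_seen_items(train_items, test_items):
    """Fraction of distinct test items that also occur in the training data."""
    test_set = set(test_items)
    return len(test_set & set(train_items)) / len(test_set)


monthly = to_monthly(daily)
print(monthly)
print(share_of_seen_items(train_items=[10, 11], test_items=[10, 12]))
```

The monthly table is what a model would then be trained on, and a low share of seen items is a hint that item-level history alone will not be enough for the unseen part of the test set.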

Final thought

I highly recommend this course to anyone who wants to get better at data science competitions and is looking for the edge needed to take their skills to the next level.