Kaggle Exercises

Intro to Machine Learning

This module uses the following machine learning commands and concepts:

  • Decision Trees
  • Pandas DataFrame: a structure that handles two-dimensional data, like an SQL table or Excel sheet.
  • Pandas data inspection tools: the describe() and head() methods, the columns attribute, etc.
  • Selecting ML features and prediction values
  • Scikit-learn DecisionTreeRegressor
  • Fitting the model and making predictions
  • Comparing validation errors to select a suitable max_leaf_nodes value for DecisionTreeRegressor
  • Using a random forest to reduce the error further
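
The workflow in the list above can be sketched roughly as follows. This is a minimal illustration, not the course's own code: the data is synthetic, the column names (rooms, area, price) and the helper tree_mae are hypothetical, and the candidate max_leaf_nodes values are arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data; the column names are hypothetical.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "rooms": rng.integers(1, 8, size=500),
    "area": rng.uniform(30, 300, size=500),
})
data["price"] = (
    50_000 * data["rooms"]
    + 1_000 * data["area"]
    + rng.normal(0, 10_000, size=500)
)

# Select the features (X) and the prediction target (y).
X = data[["rooms", "area"]]
y = data["price"]
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)


def tree_mae(max_leaf_nodes):
    """Fit a tree with the given leaf limit and return its validation MAE."""
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(train_X, train_y)
    return mean_absolute_error(val_y, model.predict(val_X))


# Compare a few tree sizes and keep the one with the lowest validation error.
candidates = [5, 50, 500]
scores = {n: tree_mae(n) for n in candidates}
best_leaf_nodes = min(scores, key=scores.get)

# Fit a random forest on the same split for comparison.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(train_X, train_y)
forest_mae = mean_absolute_error(val_y, forest.predict(val_X))

print(scores, best_leaf_nodes, forest_mae)
```

Scoring on a held-out validation split rather than on the training data is what makes the comparison meaningful: a very large tree can fit the training set almost perfectly and still predict poorly on unseen rows.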

Data and code files can be found here.

Data Visualization

Use of the Python matplotlib and seaborn visualization libraries.

Data and code files can be found here.

Intermediate Machine Learning

The Intermediate Machine Learning module covers the following topics:

  • Handling Categorical data: Label Encoding, One-Hot Encoding
  • Cross Validation
  • Data Leakage: Target Leakage, Train-Test Contamination
  • Gradient Boosting: Using xgboost library
  • Imputation
  • Pipelines
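
Several of these topics fit together in one short sketch: imputation and one-hot encoding wrapped in a pipeline and scored with cross-validation. The data and column names (area, kind) are made up for illustration. Scikit-learn's GradientBoostingRegressor stands in for the xgboost library here so that the example needs only scikit-learn; it is a substitute, not the model the module uses.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic data with one numeric and one categorical column (hypothetical names).
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "area": rng.uniform(30, 300, size=n),
    "kind": rng.choice(["house", "flat", "studio"], size=n),
})
y = 1_000 * X["area"] + rng.normal(0, 5_000, size=n)

# Blank out every tenth numeric value so that imputation has work to do.
X.loc[X.index[::10], "area"] = np.nan

# Impute the numeric column and one-hot encode the categorical column.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="mean"), ["area"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["kind"]),
])

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", GradientBoostingRegressor(random_state=0)),
])

# scikit-learn returns negated MAE, so flip the sign to get errors.
scores = -cross_val_score(
    pipeline, X, y, cv=5, scoring="neg_mean_absolute_error"
)
mean_mae = scores.mean()
print(scores, mean_mae)
```

Because the preprocessing steps live inside the pipeline, they are refit on the training portion of each fold, which is one way to avoid train-test contamination.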

Data and code files can be found here.

Pandas

The Pandas module covers the following topics:

  • Creating, Reading and Writing
  • Indexing, Selecting & Assigning
  • Summary Functions and Maps
  • Grouping and Sorting
  • Data Types and Missing Values
  • Renaming and Combining
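
As a rough illustration, the snippet below touches each topic in the list once. The reviews and regions tables and their column names are invented for the example and are not taken from the course data.

```python
import io

import numpy as np
import pandas as pd

# Creating: build a small DataFrame by hand (hypothetical data).
reviews = pd.DataFrame({
    "country": ["Italy", "France", "Italy", "US", "France"],
    "points": [87, 90, 92, 85, 88],
    "price": [15.0, np.nan, 40.0, 12.0, 25.0],
})

# Writing and reading: round-trip through CSV text held in memory.
csv_text = reviews.to_csv(index=False)
reloaded = pd.read_csv(io.StringIO(csv_text))

# Indexing and selecting: rows for one country, two columns only.
italian = reviews.loc[reviews["country"] == "Italy", ["points", "price"]]

# Summary functions and maps: mean, then center each value around it.
mean_points = reviews["points"].mean()
centered = reviews["points"].map(lambda p: p - mean_points)

# Grouping and sorting: best score per country, highest first.
best_by_country = (
    reviews.groupby("country")["points"].max().sort_values(ascending=False)
)

# Data types and missing values: fill gaps, then change a column's dtype.
reviews["price"] = reviews["price"].fillna(reviews["price"].median())
reviews["points"] = reviews["points"].astype("float64")

# Renaming and combining: rename a column, then merge in a lookup table.
renamed = reviews.rename(columns={"points": "score"})
regions = pd.DataFrame({
    "country": ["Italy", "France", "US"],
    "continent": ["Europe", "Europe", "North America"],
})
combined = renamed.merge(regions, on="country", how="left")
print(combined)
```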

Data and code files can be found here.

Feature Selection

The Feature Selection module covers the following topics:

  • Baseline Model
  • Categorical Encodings
  • Feature Generation
  • Feature Selection
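
One possible shape for the baseline, feature generation, and feature selection steps is sketched below. Everything in it is an assumption made for illustration: the data comes from make_classification, the feature names f0 to f9 and the interaction feature f0_x_f1 are hypothetical, and univariate selection with SelectKBest is only one of several selection techniques.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic classification data with a few informative features.
X_arr, y = make_classification(
    n_samples=400, n_features=10, n_informative=3, n_redundant=0, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(10)])
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

# Baseline model: all original features, no extra engineering.
baseline = LogisticRegression(max_iter=1000).fit(train_X, train_y)
baseline_acc = baseline.score(val_X, val_y)

# Feature generation: add a simple interaction feature to both splits.
train_X = train_X.assign(f0_x_f1=train_X["f0"] * train_X["f1"])
val_X = val_X.assign(f0_x_f1=val_X["f0"] * val_X["f1"])

# Feature selection: keep the 5 features with the highest F-scores,
# fitting the selector on the training split only.
selector = SelectKBest(f_classif, k=5).fit(train_X, train_y)
selected = list(train_X.columns[selector.get_support()])

# Refit on the selected features and compare against the baseline.
selected_model = LogisticRegression(max_iter=1000).fit(train_X[selected], train_y)
selected_acc = selected_model.score(val_X[selected], val_y)
print(selected, baseline_acc, selected_acc)
```

Whether the reduced feature set beats the baseline depends on the data, which is why the baseline score is computed first and kept for comparison.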

Data and code files can be found here.

Written on May 23, 2020