Kaggle Exercises
Intro to Machine Learning
This module covers the following machine learning commands and concepts:
- Decision Trees
- Pandas DataFrame: A structure that holds two-dimensional data, like an SQL table or an Excel sheet.
- Pandas data analysis and description tools: describe(), head(), columns, etc.
- Selecting ML features and prediction values
- Scikit-learn DecisionTreeRegressor
- Fitting the model and making predictions
- Comparing validation errors to select a suitable max_leaf_nodes value for DecisionTreeRegressor
- Using the Random Forest algorithm to reduce error
Data and code files can be found here.
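As a rough illustration of the workflow listed above, the sketch below inspects a DataFrame, selects features and a target, compares several max_leaf_nodes values by validation error, and then fits a random forest. The data is synthetic and the column names (rooms, land_size, year_built, price) are hypothetical stand-ins, not the course dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for a housing table (hypothetical columns, not the course data).
rng = np.random.default_rng(0)
n = 500
homes = pd.DataFrame({
    "rooms": rng.integers(1, 7, size=n),
    "land_size": rng.uniform(100, 1000, size=n),
    "year_built": rng.integers(1950, 2020, size=n),
})
homes["price"] = (
    50_000 * homes["rooms"]
    + 200 * homes["land_size"]
    + rng.normal(0, 20_000, size=n)
)

# Quick look at the data.
print(homes.describe())
print(homes.head())
print(homes.columns)

# Select the features (X) and the prediction target (y).
features = ["rooms", "land_size", "year_built"]
X = homes[features]
y = homes["price"]
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)


def tree_mae(max_leaf_nodes):
    """Fit a tree with the given leaf budget and return its validation MAE."""
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(train_X, train_y)
    return mean_absolute_error(val_y, model.predict(val_X))


# Compare several tree sizes and keep the one with the lowest validation error.
candidates = [5, 25, 50, 100, 250]
scores = {k: tree_mae(k) for k in candidates}
best_leaf_nodes = min(scores, key=scores.get)
print("MAE per max_leaf_nodes:", scores)
print("Best max_leaf_nodes:", best_leaf_nodes)

# A random forest averages many trees, which often (not always) lowers the error.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(train_X, train_y)
forest_mae = mean_absolute_error(val_y, forest.predict(val_X))
print("Random forest MAE:", forest_mae)
```

Scoring on a held-out validation split, rather than on the training data, is what makes the max_leaf_nodes comparison meaningful: very large trees can look near-perfect on the rows they were fitted to.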
Data Visualization
Use of the Python matplotlib and seaborn visualization libraries. Data and code files can be found here.
Intermediate Machine Learning
The Intermediate Machine Learning module covers the following topics:
- Handling Categorical data: Label Encoding, One-Hot Encoding
- Cross Validation
- Data Leakage: Target Leakage, Train-Test Contamination
- Gradient Boosting: Using the xgboost library
- Imputation
- Pipelines
Data and code files can be found here.
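Several of the topics above fit together in a single scikit-learn pipeline, sketched below on synthetic data with hypothetical column names. Note one deliberate substitution: to keep the dependencies to scikit-learn only, GradientBoostingRegressor stands in for xgboost's XGBRegressor, which follows the same fit/predict estimator interface and could be placed in the same pipeline step.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic data with one categorical column (hypothetical names).
rng = np.random.default_rng(0)
n = 300
data = pd.DataFrame({
    "area": rng.uniform(50, 250, size=n),
    "rooms": rng.integers(1, 6, size=n).astype(float),
    "district": rng.choice(["north", "south", "east"], size=n),
})
data["price"] = (
    1_000 * data["area"] + 10_000 * data["rooms"] + rng.normal(0, 5_000, size=n)
)
# Remove some values so that imputation has something to do.
data.loc[rng.choice(n, size=30, replace=False), "area"] = np.nan

X = data.drop(columns="price")
y = data["price"]
numeric_cols = ["area", "rooms"]
categorical_cols = ["district"]

# Impute numeric columns; impute and one-hot encode categorical columns.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), numeric_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

# Stand-in for xgboost.XGBRegressor (see the note above).
model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", GradientBoostingRegressor(random_state=0)),
])

# Because preprocessing lives inside the pipeline, it is refitted on the
# training folds only, which avoids train-test contamination.
scores = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
mean_mae = scores.mean()
print("MAE per fold:", scores)
print("Mean MAE:", mean_mae)
```

Fitting the imputer on the full dataset before splitting would let validation rows influence the fill values, which is the train-test contamination form of leakage listed above.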
Pandas
The Pandas module covers the following topics:
- Creating, Reading and Writing
- Indexing, Selecting & Assigning
- Summary Functions and Maps
- Grouping and Sorting
- Data Types and Missing Values
- Renaming and Combining
Data and code files can be found here.
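The short sketch below touches each of the topics above once, in the same order. The small reviews table and its column names are made up for illustration, and the CSV round trip uses an in-memory buffer instead of a real file.

```python
import io

import pandas as pd

# Creating
reviews = pd.DataFrame({
    "country": ["Italy", "Portugal", "US", "US", "Italy"],
    "points": [87, 88, 90, None, 92],
    "price": [20.0, 15.0, None, 65.0, 30.0],
})

# Writing and reading (in-memory buffer instead of a file on disk)
buffer = io.StringIO()
reviews.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# Indexing, selecting and assigning
first_row = reviews.iloc[0]
us_rows = reviews.loc[reviews["country"] == "US", ["country", "price"]]
reviews["source"] = "example"

# Summary functions and maps
mean_price = reviews["price"].mean()
centered = reviews["price"].map(lambda p: p - mean_price)

# Grouping and sorting
by_country = (
    reviews.groupby("country")["price"].mean().sort_values(ascending=False)
)

# Data types and missing values
median_points = reviews["points"].median()
reviews["points"] = reviews["points"].fillna(median_points).astype("int64")
missing_price = reviews["price"].isna().sum()

# Renaming and combining
renamed = reviews.rename(columns={"points": "score"})
regions = pd.DataFrame({
    "country": ["Italy", "Portugal", "US"],
    "continent": ["Europe", "Europe", "North America"],
})
combined = renamed.merge(regions, on="country", how="left")
print(combined)
```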
Feature-Selection
The Feature Selection module covers the following topics:
- Baseline Model
- Categorical Encodings
- Feature Generation
- Feature Selection
Data and code files can be found here.
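One possible way to walk through the four steps above with scikit-learn is sketched below. Everything here is an assumption made for illustration: the data is synthetic, the column names are hypothetical, count encoding is just one of several categorical encodings, and univariate selection with SelectKBest is just one of several selection methods.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with a binary outcome (hypothetical columns).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "category": rng.choice(["games", "music", "film"], size=n),
    "country": rng.choice(["US", "GB", "CA"], size=n),
    "goal": rng.uniform(100, 10_000, size=n),
    "noise": rng.normal(size=n),
})
logit = 1.0 * (df["category"] == "games") - 0.0003 * df["goal"] + 1.0
df["outcome"] = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

# Split first so that encodings and selection only ever see training rows.
train, valid = train_test_split(df, test_size=0.25, random_state=0)
train, valid = train.copy(), valid.copy()

# Feature generation: an interaction of two categoricals and a log transform.
for part in (train, valid):
    part["category_country"] = part["category"] + "_" + part["country"]
    part["log_goal"] = np.log(part["goal"])

# Categorical encoding: count encoding learned on the training set only.
for col in ["category", "country", "category_country"]:
    counts = train[col].value_counts()
    train[col + "_count"] = train[col].map(counts)
    valid[col + "_count"] = valid[col].map(counts).fillna(0)


def validation_auc(columns):
    """Fit a scaled logistic regression and return the validation AUC."""
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(train[columns], train["outcome"])
    return roc_auc_score(valid["outcome"], model.predict_proba(valid[columns])[:, 1])


# Baseline model: raw numeric columns only.
baseline_auc = validation_auc(["goal", "noise"])

# Feature selection: keep the 3 highest-scoring features by ANOVA F-value.
all_features = ["goal", "noise", "log_goal", "category_count",
                "country_count", "category_country_count"]
selector = SelectKBest(f_classif, k=3).fit(train[all_features], train["outcome"])
selected = [c for c, keep in zip(all_features, selector.get_support()) if keep]
selected_auc = validation_auc(selected)

print("Baseline AUC:", baseline_auc)
print("Selected features:", selected)
print("AUC with selected features:", selected_auc)
```

Comparing every later step against the same baseline score is the point of the exercise: a generated or encoded feature is only worth keeping if it improves the validation metric.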
Written on May 23, 2020