Predicting a continuous target (regression) from a number of anonymised feature columns given in the data. Below is documentation of the changes and techniques tried/implemented, for reference.
NB: You may view the notebook here (since GitHub only renders static notebooks).
Feature scaling - MinMaxScaler/StandardScaler —> popular regression models (trial and error), with hyperparameter tuning via sklearn GridSearchCV.
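A minimal sketch of this step, assuming a feature matrix `X` and target `y` (placeholder names) and Ridge as one example candidate model:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Scale features, then fit a candidate regressor; MinMaxScaler() was also an option.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", Ridge()),
])

param_grid = {"model__alpha": [0.1, 1.0, 10.0]}  # example grid only

search = GridSearchCV(pipe, param_grid,
                      scoring="neg_root_mean_squared_error", cv=5)
search.fit(X, y)
print(search.best_params_, -search.best_score_)  # best params and CV RMSE
```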
The highest-performing model, CatBoost regression, was used for submission 1 (public RMSE of 0.701).
Feature engineering - creating interaction features (multiplying feature pairs with each other to obtain 2nd-order terms).
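One way to generate such 2nd-order interaction terms (a sketch, not necessarily the exact code used in the notebook):

```python
from sklearn.preprocessing import PolynomialFeatures

# interaction_only=True keeps pairwise products (2nd-order interaction terms)
# without squared terms; include_bias=False drops the constant column.
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_interactions = poly.fit_transform(X)
```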
Feature selection - SelectKBest was tried, but there was a noticeable performance decrease, so no selection was used.
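The SelectKBest experiment, sketched with assumed values for `k` and the score function:

```python
from sklearn.feature_selection import SelectKBest, f_regression

# Keep the k features with the strongest univariate relationship to the target.
selector = SelectKBest(score_func=f_regression, k=50)  # k is a placeholder
X_selected = selector.fit_transform(X_interactions, y)
```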
Ensembling CatBoost regression, LightGBM regression and XGBoost regression together (the three highest-performing models) - averaging the predictions of the three.
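A minimal sketch of this unweighted averaging blend, assuming placeholder `X_train`/`y_train`/`X_test` splits:

```python
import numpy as np
from catboost import CatBoostRegressor
from lightgbm import LGBMRegressor
from xgboost import XGBRegressor

models = [
    CatBoostRegressor(verbose=0),
    LGBMRegressor(),
    XGBRegressor(),
]

for m in models:
    m.fit(X_train, y_train)

# Simple unweighted average of the three models' predictions.
blend_pred = np.mean([m.predict(X_test) for m in models], axis=0)
```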
This did not perform better than submission 1.
After adding the 2nd-order features, applied a Gaussian rank transformation to all features (since normally distributed features help non-tree models). Then used keras-tuner (Hyperband) to tune a NN with 2 to 20 layers (ReLU Dense layers with dropout regularisation). The model did not perform well, at ~0.707.
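A sketch of these two steps, approximating the Gaussian rank transform with sklearn's QuantileTransformer and tuning with keras-tuner's Hyperband; the layer, unit and dropout ranges below are illustrative assumptions:

```python
import keras_tuner as kt
import tensorflow as tf
from sklearn.preprocessing import QuantileTransformer

# Gaussian rank transform: map each feature to a normal distribution via its ranks.
qt = QuantileTransformer(output_distribution="normal")
X_gauss = qt.fit_transform(X_interactions)

def build_model(hp):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Input(shape=(X_gauss.shape[1],)))
    # Search over 2 to 20 Dense + Dropout blocks.
    for i in range(hp.Int("num_layers", 2, 20)):
        model.add(tf.keras.layers.Dense(hp.Int(f"units_{i}", 32, 256, step=32),
                                        activation="relu"))
        model.add(tf.keras.layers.Dropout(hp.Float(f"dropout_{i}", 0.0, 0.5, step=0.1)))
    model.add(tf.keras.layers.Dense(1))
    model.compile(optimizer="adam", loss="mse",
                  metrics=[tf.keras.metrics.RootMeanSquaredError()])
    return model

tuner = kt.Hyperband(build_model, objective="val_loss", max_epochs=50,
                     directory="kt_dir", project_name="nn_tuning")
tuner.search(X_gauss, y, validation_split=0.2)
```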
Note - adversarial validation was performed and confirmed that the train data is a good representation of the test data (AUC ~0.5).
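The adversarial validation check, sketched with an assumed LightGBM classifier and placeholder train/test frames:

```python
import numpy as np
import pandas as pd
from lightgbm import LGBMClassifier
from sklearn.model_selection import cross_val_score

# Label train rows 0 and test rows 1, then try to tell them apart.
X_adv = pd.concat([X_train_df, X_test_df], axis=0)
y_adv = np.r_[np.zeros(len(X_train_df)), np.ones(len(X_test_df))]

auc = cross_val_score(LGBMClassifier(), X_adv, y_adv,
                      scoring="roc_auc", cv=5).mean()
print(auc)  # AUC near 0.5 means train and test are indistinguishable
```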
Back to CatBoost/LightGBM/XGBoost: kept the 2nd-order terms and used Optuna to tune all three models.
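An Optuna objective for one of the three models might have looked like the following sketch (parameter ranges and the single train/validation split are assumptions; the reliance on one split is the weakness discussed further below):

```python
import numpy as np
import optuna
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_squared_error

# Objective scored on a single held-out validation split.
def objective(trial):
    params = {
        "num_leaves": trial.suggest_int("num_leaves", 16, 256),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "n_estimators": trial.suggest_int("n_estimators", 100, 2000),
    }
    model = LGBMRegressor(**params).fit(X_train, y_train)
    return np.sqrt(mean_squared_error(y_valid, model.predict(X_valid)))

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=100)
```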
Results:
No feature engineering (no 2nd-order terms), simple Optuna tuning on LightGBM and CatBoost.
None of the results were close to satisfactory.
I realised my mistake in hyperparameter tuning was over-reliance on a single held-out validation set as the metric for tuning performance, which led to overfitting on that particular set. This should have been obvious, since the hyperparameter tuning tool provided by sklearn (GridSearchCV) uses cross-validation over the entire training set.
Thus, I switched Optuna's objective metric to cross_val_score (6-fold CV, which I believe to be optimal for parallel processing on a 6-core CPU) to evaluate a more generalised performance. Given the unsatisfactory training speed after this change, I tinkered with the pruners in the Optuna library and, in the hope of speeding up hyperparameter tuning, switched to the Hyperband pruner.
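A sketch of the revised objective for the LightGBM model (parameter ranges are illustrative; note that the pruner attached to the study only takes effect when intermediate values are reported, e.g. via Optuna's integration callbacks):

```python
import optuna
from lightgbm import LGBMRegressor
from sklearn.model_selection import cross_val_score

def objective(trial):
    params = {
        "num_leaves": trial.suggest_int("num_leaves", 16, 256),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "n_estimators": trial.suggest_int("n_estimators", 100, 2000),
    }
    # 6-fold CV, one fold per CPU core, so tuning optimises generalised RMSE
    # rather than the score on a single held-out split.
    scores = cross_val_score(LGBMRegressor(**params), X, y,
                             scoring="neg_root_mean_squared_error",
                             cv=6, n_jobs=6)
    return -scores.mean()

study = optuna.create_study(direction="minimize",
                            pruner=optuna.pruners.HyperbandPruner())
study.optimize(objective, n_trials=200)
```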
I borrowed this approach for the optimised LGBM model (LB 0.69760).