Predicting a continuous target (regression) from a number of anonymised feature columns given in the data. Below is documentation of the changes and techniques tried/implemented, for reference.
NB: You may view the notebook here (since GitHub only renders static notebooks).
Feature scaling - MinMaxScaler/StandardScaler —> popular regression models (trial and error), with hyperparameter tuning via sklearn GridSearchCV.
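A minimal sketch of this step, assuming a feature matrix `X` and target `y` (placeholder names) and Ridge as one example candidate model:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Scale features, then fit a candidate regressor; MinMaxScaler() was also an option.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", Ridge()),
])

param_grid = {"model__alpha": [0.1, 1.0, 10.0]}  # example grid only

search = GridSearchCV(pipe, param_grid,
                      scoring="neg_root_mean_squared_error", cv=5)
search.fit(X, y)
print(search.best_params_, -search.best_score_)  # best params and CV RMSE
```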
The highest-performing model, CatBoost regression, was used for submission 1 (public RMSE of 0.701).
Feature engineering - creating interaction features (multiplying feature pairs with each other to obtain 2nd-order terms).
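One way to generate such 2nd-order interaction terms (a sketch, not necessarily the exact code used in the notebook):

```python
from sklearn.preprocessing import PolynomialFeatures

# interaction_only=True keeps pairwise products (2nd-order interaction terms)
# without squared terms; include_bias=False drops the constant column.
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_interactions = poly.fit_transform(X)
```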
Feature selection - SelectKBest was tried, but there was a noticeable performance decrease, so no selection was used.
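The SelectKBest experiment, sketched with assumed values for `k` and the score function:

```python
from sklearn.feature_selection import SelectKBest, f_regression

# Keep the k features with the strongest univariate relationship to the target.
selector = SelectKBest(score_func=f_regression, k=50)  # k is a placeholder
X_selected = selector.fit_transform(X_interactions, y)
```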
Ensembling CatBoost regression, LightGBM regression and XGBoost regression together (the three highest-performing models) - averaging the predictions of the three.
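A minimal sketch of this unweighted averaging blend, assuming placeholder `X_train`/`y_train`/`X_test` splits:

```python
import numpy as np
from catboost import CatBoostRegressor
from lightgbm import LGBMRegressor
from xgboost import XGBRegressor

models = [
    CatBoostRegressor(verbose=0),
    LGBMRegressor(),
    XGBRegressor(),
]

for m in models:
    m.fit(X_train, y_train)

# Simple unweighted average of the three models' predictions.
blend_pred = np.mean([m.predict(X_test) for m in models], axis=0)
```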
This did not perform better than submission 1.
After adding the 2nd-order features, applied a Gaussian rank transformation to all features (since normally distributed features help non-tree models). Then used keras-tuner (Hyperband) to tune a NN with 2 to 20 layers (ReLU Dense layers with dropout regularisation). The model did not perform well, at ~0.707.
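A sketch of these two steps, approximating the Gaussian rank transform with sklearn's QuantileTransformer and tuning with keras-tuner's Hyperband; the layer, unit and dropout ranges below are illustrative assumptions:

```python
import keras_tuner as kt
import tensorflow as tf
from sklearn.preprocessing import QuantileTransformer

# Gaussian rank transform: map each feature to a normal distribution via its ranks.
qt = QuantileTransformer(output_distribution="normal")
X_gauss = qt.fit_transform(X_interactions)

def build_model(hp):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Input(shape=(X_gauss.shape[1],)))
    # Search over 2 to 20 Dense + Dropout blocks.
    for i in range(hp.Int("num_layers", 2, 20)):
        model.add(tf.keras.layers.Dense(hp.Int(f"units_{i}", 32, 256, step=32),
                                        activation="relu"))
        model.add(tf.keras.layers.Dropout(hp.Float(f"dropout_{i}", 0.0, 0.5, step=0.1)))
    model.add(tf.keras.layers.Dense(1))
    model.compile(optimizer="adam", loss="mse",
                  metrics=[tf.keras.metrics.RootMeanSquaredError()])
    return model

tuner = kt.Hyperband(build_model, objective="val_loss", max_epochs=50,
                     directory="kt_dir", project_name="nn_tuning")
tuner.search(X_gauss, y, validation_split=0.2)
```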
Note - adversarial validation was performed and confirmed that the train data is a good representation of the test data (AUC ~0.5).
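The adversarial validation check, sketched with an assumed LightGBM classifier and placeholder train/test frames:

```python
import numpy as np
import pandas as pd
from lightgbm import LGBMClassifier
from sklearn.model_selection import cross_val_score

# Label train rows 0 and test rows 1, then try to tell them apart.
X_adv = pd.concat([X_train_df, X_test_df], axis=0)
y_adv = np.r_[np.zeros(len(X_train_df)), np.ones(len(X_test_df))]

auc = cross_val_score(LGBMClassifier(), X_adv, y_adv,
                      scoring="roc_auc", cv=5).mean()
print(auc)  # AUC near 0.5 means train and test are indistinguishable
```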
Back to CatBoost/LightGBM/XGBoost: kept the 2nd-order terms and used Optuna to tune all three models.
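An Optuna objective for one of the three models might have looked like the following sketch (parameter ranges and the single train/validation split are assumptions; the reliance on one split is the weakness discussed further below):

```python
import numpy as np
import optuna
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_squared_error

# Objective scored on a single held-out validation split.
def objective(trial):
    params = {
        "num_leaves": trial.suggest_int("num_leaves", 16, 256),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "n_estimators": trial.suggest_int("n_estimators", 100, 2000),
    }
    model = LGBMRegressor(**params).fit(X_train, y_train)
    return np.sqrt(mean_squared_error(y_valid, model.predict(X_valid)))

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=100)
```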
Results:
No feature engineering (no 2nd-order terms), simple Optuna tuning on LightGBM and CatBoost.
None of the results were close to satisfactory.
I realised my mistake in hyperparameter tuning was over-reliance on a single held-out validation set as the metric for tuning performance, which led to overfitting on that particular set. This should have been obvious, since the hyperparameter tuning tool provided by sklearn (GridSearchCV) uses cross-validation over the entire training set.
Thus, I switched Optuna's objective metric to cross_val_score (6-fold CV, which I believe to be optimal for parallel processing on a 6-core CPU) to evaluate a more generalised performance. Given the unsatisfactory training speed after this change, I tinkered with the pruners in the Optuna library and, in the hope of speeding up hyperparameter tuning, switched to the Hyperband pruner.
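A sketch of the revised objective for the LightGBM model (parameter ranges are illustrative; note that the pruner attached to the study only takes effect when intermediate values are reported, e.g. via Optuna's integration callbacks):

```python
import optuna
from lightgbm import LGBMRegressor
from sklearn.model_selection import cross_val_score

def objective(trial):
    params = {
        "num_leaves": trial.suggest_int("num_leaves", 16, 256),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "n_estimators": trial.suggest_int("n_estimators", 100, 2000),
    }
    # 6-fold CV, one fold per CPU core, so tuning optimises generalised RMSE
    # rather than the score on a single held-out split.
    scores = cross_val_score(LGBMRegressor(**params), X, y,
                             scoring="neg_root_mean_squared_error",
                             cv=6, n_jobs=6)
    return -scores.mean()

study = optuna.create_study(direction="minimize",
                            pruner=optuna.pruners.HyperbandPruner())
study.optimize(objective, n_trials=200)
```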
I borrowed this approach for the optimised LGBM model (LB 0.69760).