
In this post, I’ll try using scikit-learn’s GridSearchCV to optimize hyperparameters. GridSearchCV is a powerful tool in scikit-learn that automates hyperparameter tuning by exhaustively searching through a predefined grid of parameter combinations. It evaluates each configuration using cross-validation, allowing you to identify the settings that yield the best performance. It doesn’t guarantee the globally optimal solution, but GridSearchCV provides a reproducible way to improve model accuracy, reduce overfitting, and better understand how a model responds to different parameter choices.
Hyperparameter Tuning with GridSearchCV
First Attempt
The images below show the initial parameters I used in my GridSearchCV experimentation and the results. Based on my reading, I decided to try just a few parameters to start. Here are the parameters I chose to start with and a brief description of why I felt each was a good place to start.
| Parameter | Description | Why It’s a Good Starting Point |
|---|---|---|
n_estimators | Number of trees in the forest | Controls model complexity and variance; 100–300 is a practical range for balancing performance and compute. |
bootstrap | Whether sampling is done with replacement | Tests the impact of bagging vs. full dataset training—can affect bias and variance. Bagging means each decision tree in the forest is trained on a random sample of the training data. |
criterion | Function used to measure the quality of a split | Offers diverse loss functions to explore how the model fits different error structures. |
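Roughly, in code, that first attempt looked like the sketch below. The grid values, the synthetic data, and the criterion names (which assume a recent scikit-learn version) are illustrative stand-ins rather than my exact setup, and I’ve left scoring at its default here; the scoring choice I actually made is discussed below.

```python
# Illustrative sketch of the first-attempt setup (values and data are stand-ins).
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Stand-in regression data; my real experiment used my own dataset.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=42)

param_grid = {
    "n_estimators": [100, 200, 300],     # number of trees in the forest
    "bootstrap": [True, False],          # bagging vs. training each tree on the full dataset
    "criterion": ["squared_error", "absolute_error", "friedman_mse"],  # split-quality measures
}

search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=42),
    param_grid=param_grid,
    cv=5,
)
search.fit(X, y)
print(search.best_params_)
```

With a grid like this, GridSearchCV fits 3 × 2 × 3 = 18 parameter combinations, each cross-validated 5 times, so 90 model fits in total.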


You may recall from my earlier post that I achieved these results during manual tuning:
Mean squared error: 160.7100736652691
RMSE: 12.677147694385717
R2 score: 0.3248694960846078
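For reference, those three numbers come from the standard scikit-learn metrics, and RMSE is just the square root of MSE. A quick sketch with placeholder arrays (not my actual data):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Placeholder arrays standing in for the hold-out targets and model predictions.
y_test = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([12.0, 18.0, 33.0, 37.0])

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)              # RMSE is the square root of MSE
r2 = r2_score(y_test, y_pred)
print(f"Mean squared error: {mse}")
print(f"RMSE: {rmse}")
print(f"R2 score: {r2}")
```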
Interpretation
My Manual Configuration Wins on Performance
- Lower MSE and RMSE: Indicates better predictive accuracy and smaller average errors.
- Higher R²: Explains more variance in the target variable.
Why Might GridSearchCV Underperform Here?
- Scoring mismatch: I used "f1" as the scoring metric, which, as I discovered while reading, is actually a classification metric! So the grid search may have optimized the wrong objective. Since I’m using a regressor, I should use "neg_mean_squared_error" or "r2".
- Limited search space: My grid only varied n_estimators, bootstrap, and criterion. It didn’t explore other impactful parameters like min_samples_leaf, max_features, or max_depth.
- Default values: GridSearchCV used default settings for parameters like min_samples_leaf=1, which could lead to overfitting or instability.
Second Attempt
In this attempt, I changed the scoring to neg_mean_squared_error. This scorer returns the negative of the mean squared error, so that GridSearchCV’s "higher is better" convention effectively minimizes the mean squared error (MSE). That in turn means GridSearchCV will choose parameters that minimize large deviations between predicted and actual values.
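Continuing the earlier sketch, the only change is the scoring argument; the sign flip at the end converts the best cross-validated score back into a positive MSE:

```python
# Same stand-in estimator, grid, and data as in the first sketch above.
search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=42),
    param_grid=param_grid,
    scoring="neg_mean_squared_error",  # higher (less negative) is better, so MSE is minimized
    cv=5,
)
search.fit(X, y)
print(search.best_params_)
print(-search.best_score_)  # flip the sign back to report the cross-validated MSE
```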
So how did that affect results? The below images show what happened.


While the results aren’t much better, they are more valid because it was a mistake to use F1 scoring in the first place. Using F1 was wrong because:
- The F1 score is defined for binary classification problems, and I am fitting continuous outputs.
- F1 needs discrete class labels, not continuous outputs (see the short demo after this list).
- When used in regression, scikit-learn would have forced predictions into binary labels, which distorts the optimization objective.
- Instead of minimizing prediction error, it tried to maximize F1 on binarized outputs.
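A tiny demonstration of that point, using made-up labels: the F1 scorer only makes sense when both the targets and the predictions are discrete classes.

```python
from sklearn.metrics import f1_score

# F1 compares discrete class labels...
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
print(f1_score(y_true, y_pred))  # fine: binary labels

# ...but a regressor produces continuous outputs like 12.7 or 33.1,
# so there are no classes for F1 to compare.
```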
Reflections
- The "f1"-optimized model accidentally landed on a slightly better MSE, but this is not reliable or reproducible.
- The "neg_mean_squared_error" model was explicitly optimized for MSE, so its performance is trustworthy and aligned with my regression goals.
- The small difference could simply be due to random variation or hyperparameter overlap, not because "f1" is a viable scoring metric here.
In summary, using "f1" in regression is methodologically invalid. Even if it produces a superficially better score, it’s optimizing the wrong objective and introduces unpredictable behavior.
In my next post I will try some more parameters and also RandomizedSearchCV.
– William

