Tag: science

  • From Scratch to Streamlined: Comparing My Hand-Built Genetic Algorithm with sklearn-genetic


    After building a genetic algorithm from scratch in Jupyter, I wanted to see what would happen if I used a library instead. Specifically, I tried out sklearn-genetic, a tool that wraps genetic feature selection into a few clean lines of code.

    The difference is incredible. My original notebook was over one hundred lines of code. With sklearn-genetic, the same process became a single call:

    selector = GAFeatureSelectionCV(
        estimator=DecisionTreeClassifier(random_state=RANDOM_STATE),
        cv=CROSS_VALIDATION_SPLITTING,
        scoring=SCORING_STRATEGY,
        population_size=POPULATION_SIZE,
        generations=NUM_GENERATIONS,
        mutation_probability=MUTATION_RATE,
    )
    selector.fit(X, y)

    It worked beautifully. But it’s worth thinking about what I gained and lost with the different approaches.

    What the Library Does Well

    • Speed of implementation: No need to write selection, crossover, or mutation logic. It’s all built in.
    • Robustness: It easily handles edge cases, parallelism, and scoring strategies.
    • Integration: Fits seamlessly into scikit-learn pipelines and workflows.
    • Convenience: You can run a full GA in minutes, with clean syntax and very little code.

    The library certainly has some big advantages, as a good library should! 😀

    What I Missed from Building It Myself

    • Visibility: In my notebook, I saw every generation evolve. With the library, that process is hidden.
    • Control: I had access to the state of the system at all times, so I could change parameters or visualize data in the middle of a run.
    • Learning: Writing the GA by hand taught me how each operator affects convergence, diversity, and exploration.
    • Philosophy: My notebook felt like a real experiment. The library felt like a tool.

    The approaches serve different purposes. But if your goal is to actually learn genetic algorithms, building one yourself is irreplaceable.

    Side-by-Side Summary

    | Aspect | Hand-Built GA | sklearn-genetic |
    | --- | --- | --- |
    | Transparency | Full control over internals | Abstracted |
    | Flexibility | Easy to customize logic | Limited to API |
    | Speed | Slower to build, faster to understand | Faster to run, harder to inspect |
    | Learning Value | High | Moderate |

    Final Thoughts

    Using sklearn-genetic felt like using any library: you hand off control. It’s efficient, clean, and powerful. But building the algorithm myself taught me how the engine works, how selection pressure shapes populations, how mutation keeps diversity alive, and how exploration leads to clarity.

    If you’re just trying to get results, use the library.
    If you’re trying to understand the process, build it yourself.
    And if you’re trying to do both — start with the notebook, then graduate to the tool.

    – William

    My notebook can be found in my GitHub repo here:

    Genetic Algorithm Notebook

  • When Code Evolves: Learning Genetic Algorithms Through a Simple Notebook


    There’s something cool about watching a solution evolve. In this case, it was a population of digital organisms competing, mutating, and adapting until a solution emerged. That’s what genetic algorithms do: they take a problem that may feel too tangled to reason about directly, then explore and optimize their way to a solution.

    After reading about genetic algorithms, I wanted to understand them more deeply, not just in theory but also in practice. So I opened a Jupyter notebook, loaded a simple dataset, and built a genetic algorithm from scratch. No libraries, no shortcuts. Just Python, NumPy, and a willingness to let evolution take over. Just like my first experimentation with Naive Bayes Classifiers.

    I chose a simple dataset on Kaggle that contains heart disease data. I chose this dataset because it isn’t too large but has a decent set of features to use for optimization.

    A Simple Idea: Evolving Feature Sets

    The experiment was straightforward: Could a genetic algorithm discover the best subset of features for predicting heart disease?

    Each potential solution was represented as a row of 0s and 1s that indicate which features to keep and which to remove. So for example, a row might look like this:

    [1, 0, 1, 1, 0, 0, 1]

    That means “only use features 1, 3, 4, and 7.”

    It’s a really simple encoding. In biological terms: each 1 or 0 is a gene, each list of 1s and 0s is a genome, and each generation is a chance for something better to emerge.
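As a rough sketch of this encoding (variable names here are mine, not lifted from the notebook), creating a random population of binary genomes and decoding one into feature indices takes only a few lines of NumPy:

```python
import numpy as np

rng = np.random.default_rng(42)

NUM_FEATURES = 7      # columns in the dataset
POPULATION_SIZE = 10  # individuals per generation

# Each individual is a binary genome: 1 = keep that feature, 0 = drop it.
population = rng.integers(0, 2, size=(POPULATION_SIZE, NUM_FEATURES))

# Decode one genome into the (0-indexed) positions of the features it keeps.
individual = np.array([1, 0, 1, 1, 0, 0, 1])
selected = np.flatnonzero(individual)
print(selected)  # [0 2 3 6] -> features 1, 3, 4, and 7 when counting from 1
```

From here, `X[:, selected]` would give the reduced dataset for that individual.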

    How the Algorithm Works (In Human Terms)

    The notebook follows a classic evolutionary loop:

    1. We start with a population of random individuals, made up of a subset of features of the dataset

    Many of them may be terrible, but that’s OK. Evolution doesn’t need a good starting point, just variation.

    2. Evaluate each individual

    For every set of features, we train a small decision tree using only those features. The accuracy of the tree becomes the “fitness score.”

    3. Select parents

    We use tournament selection: pick two individuals at random, keep the better one. It’s very simple, but it pushes the population toward improvement.

    4. Crossover

    Two parents randomly combine their “genes” to create a child. Some genes from one, the rest from the other. This is where new combinations emerge.

    5. Mutation

    Every 1 or 0 has a small chance of flipping. This simulates mutation: the spark of creativity, the thing that keeps evolution from getting stuck.

    6. Repeat for many generations

    And watch the accuracy climb. The notebook prints out the best accuracy of each generation, like this:

    Generation 1: Best Accuracy = 0.7692
    Generation 2: Best Accuracy = 0.8022
    Generation 3: Best Accuracy = 0.8352
    Generation 4: Best Accuracy = 0.8242
    Generation 5: Best Accuracy = 0.8352

    It’s like watching a species learn.
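The whole loop above fits in a few dozen lines. Here is a minimal stand-in, not the notebook’s actual code: to keep it self-contained I score each genome by how many 1s it contains (the classic OneMax toy problem) instead of training a decision tree, but the tournament selection, single-point crossover, and per-gene mutation are the same shape as the steps described above:

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_GENES = 20
POP_SIZE = 30
GENERATIONS = 40
MUTATION_RATE = 0.05

def fitness(genome):
    # Stand-in fitness: count the 1s (OneMax). In the notebook, this would
    # instead be the accuracy of a decision tree trained on the features
    # the genome selects.
    return genome.sum()

def tournament(pop, scores):
    # Pick two individuals at random, keep the better one.
    i, j = rng.integers(0, len(pop), size=2)
    return pop[i] if scores[i] >= scores[j] else pop[j]

def crossover(a, b):
    # Single-point crossover: some genes from one parent, the rest from the other.
    point = rng.integers(1, NUM_GENES)
    return np.concatenate([a[:point], b[point:]])

def mutate(genome):
    # Each gene flips with a small probability.
    flips = rng.random(NUM_GENES) < MUTATION_RATE
    return np.where(flips, 1 - genome, genome)

# Start with a random population, then evolve it for many generations.
population = rng.integers(0, 2, size=(POP_SIZE, NUM_GENES))
for gen in range(GENERATIONS):
    scores = np.array([fitness(ind) for ind in population])
    population = np.array([
        mutate(crossover(tournament(population, scores),
                         tournament(population, scores)))
        for _ in range(POP_SIZE)
    ])

best = max(population, key=fitness)
print(f"Best fitness after {GENERATIONS} generations: {fitness(best)} / {NUM_GENES}")
```

Swapping the toy fitness for a model-training fitness function is the only change needed to turn this into feature selection.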

    What I Learned by Building It Myself

    The most fun part of this project wasn’t the accuracy score or the final feature set. It was what I learned by writing the code myself. When you don’t rely on a library or a prebuilt GA tool, you’re forced to think through the problem directly. You get a feel for the algorithm.

    That helps make it all click.

    Just as I’d read, genetic algorithms don’t assume the world is smooth or predictable. They don’t need gradients or clean math. They don’t freeze when the search space is really messy. They just explore, adapt, and keep going. Watching that happen in code, watching the population of feature selections slowly learn which features matter more, made the philosophy behind GAs feel real in a way that reading about them or using a library never would.

    It showed me that in complex systems, you don’t get to reason your way to the perfect solution upfront. You need to start wide, stay curious, and let patterns emerge before you decide what matters. Writing the notebook by hand was a lesson in how exploration leads to clarity.

    Why This Notebook Is a Great Playground

    Because it’s small, clear, and easy to modify. You can:

    • swap in a different model
    • evolve hyperparameters instead of features
    • visualize fitness over time
    • and a lot more

    It’s a simple sandbox for learning how evolutionary computation works.

    When you see a population of solutions improving generation after generation, it’s hard not to appreciate the elegance of genetic algorithms.

    Closing Thoughts

    Genetic algorithms aren’t the hottest technique in machine learning anymore. But they are still pretty cool and were a very important part of the evolution of data science. They show us that exploration is not a waste of time; it’s a strategy. That creativity can be computational. And they prove that sometimes the best solutions emerge from processes we don’t control.

    Building one in a notebook made that lesson tangible. And honestly, it made me appreciate evolution, both biological and computational, in a whole new way.

    – William

    My notebook can be found in my GitHub repo here:

    Genetic Algorithm Notebook