Tag: science

  • From Scratch to Streamlined: Comparing My Hand-Built Genetic Algorithm with sklearn-genetic


    After building a genetic algorithm from scratch in Jupyter, I wanted to see what would happen if I used a library instead. Specifically, I tried out sklearn-genetic, a tool that wraps genetic feature selection into a few clean lines of code.

    The difference is incredible. My original notebook was over one hundred lines of code. With sklearn-genetic, the same process became a single call:

    selector = GAFeatureSelectionCV(
        estimator=DecisionTreeClassifier(random_state=RANDOM_STATE),
        cv=CROSS_VALIDATION_SPLITTING,
        scoring=SCORING_STRATEGY,
        population_size=POPULATION_SIZE,
        generations=NUM_GENERATIONS,
        mutation_probability=MUTATION_RATE,
    )
    selector.fit(X, y)

    It worked beautifully. But it’s worth thinking about what I gained and lost with the different approaches.

    What the Library Does Well

    • Speed of implementation: No need to write selection, crossover, or mutation logic. It’s all built in.
    • Robustness: It easily handles edge cases, parallelism, and scoring strategies.
    • Integration: Fits seamlessly into scikit-learn pipelines and workflows.
    • Convenience: You can run a full GA in minutes, with clean syntax and very little code.

    The library certainly has some big advantages, as a good library should! 😀

    What I Missed from Building It Myself

    • Visibility: In my notebook, I saw every generation evolve. With the library, that process is hidden.
    • Control: I had access to the state of the system at all times, so I could change parameters or visualize data in the middle of a run.
    • Learning: Writing the GA by hand taught me how each operator affects convergence, diversity, and exploration.
    • Philosophy: My notebook felt like a real experiment. The library felt like a tool.

    The approaches serve different purposes. But if your goal is to actually learn genetic algorithms, building one yourself is irreplaceable.

    Side-by-Side Summary

    | Aspect | Hand-Built GA | sklearn-genetic |
    | --- | --- | --- |
    | Transparency | Full control over internals | Abstracted |
    | Flexibility | Easy to customize logic | Limited to API |
    | Speed | Slower to build, faster to understand | Faster to run, harder to inspect |
    | Learning Value | High | Moderate |

    Final Thoughts

    Using sklearn-genetic felt like using any library: you hand off control. It’s efficient, clean, and powerful. But building the algorithm myself taught me how the engine works, how selection pressure shapes populations, how mutation keeps diversity alive, and how exploration leads to clarity.

    If you’re just trying to get results, use the library.
    If you’re trying to understand the process, build it yourself.
    And if you’re trying to do both — start with the notebook, then graduate to the tool.

    – William

    My notebook can be found in my GitHub repo here:

    Genetic Algorithm Notebook

  • When Code Evolves: Learning Genetic Algorithms Through a Simple Notebook


    There’s something cool about watching a solution evolve. In this case, it was a population of digital organisms competing, mutating, and adapting until a solution emerged. That’s what genetic algorithms do: they take a problem that may feel too tangled to reason about directly, then explore and optimize their way to a solution.

    After reading about genetic algorithms, I wanted to understand them more deeply, not just in theory but also in practice. So I opened a Jupyter notebook, loaded a simple dataset, and built a genetic algorithm from scratch. No libraries, no shortcuts. Just Python, NumPy, and a willingness to let evolution take over. Just like my first experimentation with Naive Bayes Classifiers.

    I chose a simple dataset on Kaggle that contains heart disease data. I chose this dataset because it isn’t too large but has a decent set of features to use for optimization.

    A Simple Idea: Evolving Feature Sets

    The experiment was straightforward: Could a genetic algorithm discover the best subset of features for predicting heart disease?

    Each potential solution was represented as a row of 0s and 1s that indicate which features to keep and which to remove. So for example, a row might look like this:

    [1, 0, 1, 1, 0, 0, 1]

    That means “only use features 1, 3, 4, and 7.”

    It’s a really simple encoding. In biological terms: each 1 or 0 is a gene, each list of 1s and 0s is a genome, and each generation is a chance for something better to emerge.
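As a rough sketch of this encoding (variable names here are mine, not lifted from the notebook), creating a random population of binary genomes and decoding one into feature indices takes only a few lines of NumPy:

```python
import numpy as np

rng = np.random.default_rng(42)

NUM_FEATURES = 7      # columns in the dataset
POPULATION_SIZE = 10  # individuals per generation

# Each individual is a binary genome: 1 = keep that feature, 0 = drop it.
population = rng.integers(0, 2, size=(POPULATION_SIZE, NUM_FEATURES))

# Decode one genome into the (0-indexed) positions of the features it keeps.
individual = np.array([1, 0, 1, 1, 0, 0, 1])
selected = np.flatnonzero(individual)
print(selected)  # [0 2 3 6] -> features 1, 3, 4, and 7 when counting from 1
```

From here, `X[:, selected]` would give the reduced dataset for that individual.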

    How the Algorithm Works (In Human Terms)

    The notebook follows a classic evolutionary loop:

    1. We start with a population of random individuals, made up of a subset of features of the dataset

    Many of them may be terrible, but that’s OK. Evolution doesn’t need a good starting point, just variation.

    2. Evaluate each individual

    For every set of features, we train a small decision tree using only those features. The accuracy of the tree becomes the “fitness score.”

    3. Select parents

    We use tournament selection: pick two individuals at random, keep the better one. It’s very simple, but it pushes the population toward improvement.

    4. Crossover

    Two parents randomly combine their “genes” to create a child. Some genes from one, the rest from the other. This is where new combinations emerge.

    5. Mutation

    Every 1 or 0 has a small chance of flipping. This simulates mutation: the spark of creativity, the thing that keeps evolution from getting stuck.

    6. Repeat for many generations

    And watch the accuracy climb. The notebook prints out the best accuracy of each generation, like this:

    Generation 1: Best Accuracy = 0.7692
    Generation 2: Best Accuracy = 0.8022
    Generation 3: Best Accuracy = 0.8352
    Generation 4: Best Accuracy = 0.8242
    Generation 5: Best Accuracy = 0.8352

    It’s like watching a species learn.
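The whole loop above fits in a few dozen lines. Here is a minimal stand-in, not the notebook’s actual code: to keep it self-contained I score each genome by how many 1s it contains (the classic OneMax toy problem) instead of training a decision tree, but the tournament selection, single-point crossover, and per-gene mutation are the same shape as the steps described above:

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_GENES = 20
POP_SIZE = 30
GENERATIONS = 40
MUTATION_RATE = 0.05

def fitness(genome):
    # Stand-in fitness: count the 1s (OneMax). In the notebook, this would
    # instead be the accuracy of a decision tree trained on the features
    # the genome selects.
    return genome.sum()

def tournament(pop, scores):
    # Pick two individuals at random, keep the better one.
    i, j = rng.integers(0, len(pop), size=2)
    return pop[i] if scores[i] >= scores[j] else pop[j]

def crossover(a, b):
    # Single-point crossover: some genes from one parent, the rest from the other.
    point = rng.integers(1, NUM_GENES)
    return np.concatenate([a[:point], b[point:]])

def mutate(genome):
    # Each gene flips with a small probability.
    flips = rng.random(NUM_GENES) < MUTATION_RATE
    return np.where(flips, 1 - genome, genome)

# Start with a random population, then evolve it for many generations.
population = rng.integers(0, 2, size=(POP_SIZE, NUM_GENES))
for gen in range(GENERATIONS):
    scores = np.array([fitness(ind) for ind in population])
    population = np.array([
        mutate(crossover(tournament(population, scores),
                         tournament(population, scores)))
        for _ in range(POP_SIZE)
    ])

best = max(population, key=fitness)
print(f"Best fitness after {GENERATIONS} generations: {fitness(best)} / {NUM_GENES}")
```

Swapping the toy fitness for a model-training fitness function is the only change needed to turn this into feature selection.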

    What I Learned by Building It Myself

    The most fun part of this project wasn’t the accuracy score or the final feature set. It was what I learned by writing the code myself. When you don’t rely on a library or a prebuilt GA tool, you’re forced to think through the problem directly. You get a feel for the algorithm.

    That helps make it all click.

    Just as I’d read, genetic algorithms don’t assume the world is smooth or predictable. They don’t need gradients or clean math. They don’t freeze when the search space is really messy. They just explore, adapt, and keep going. Watching that happen in code, watching the population of feature selections slowly learn which features matter more, made the philosophy behind GAs feel real in a way that reading about them or using a library never would.

    It showed me that in complex systems, you don’t get to reason your way to the perfect solution upfront. You need to start wide, stay curious, and let patterns emerge before you decide what matters. Writing the notebook by hand was a lesson in how exploration leads to clarity.

    Why This Notebook Is a Great Playground

    Because it’s small, clear, and easy to modify. You can:

    • swap in a different model
    • evolve hyperparameters instead of features
    • visualize fitness over time
    • and a lot more

    It’s a simple sandbox for learning how evolutionary computation works.

    When you see a population of solutions improving generation after generation, it’s hard not to appreciate the elegance of genetic algorithms.

    Closing Thoughts

    Genetic algorithms aren’t the hottest technique in machine learning anymore. But they are still pretty cool and were a very important part of the evolution of data science. They show us that exploration is not a waste of time; it’s a strategy. That creativity can be computational. And they prove that sometimes the best solutions emerge from processes we don’t control.

    Building one in a notebook made that lesson tangible. And honestly, it made me appreciate evolution, both biological and computational, in a whole new way.

    – William

    My notebook can be found in my GitHub repo here:

    Genetic Algorithm Notebook