
Building a Triathlon Photo Classifier: The Dataset Challenge

July 26, 2025
technology, projects
This post is the first of a series. Stay tuned for the next one!

Intro

Leading up to my first Olympic-distance triathlon, I thought it would be a fun exercise to classify race photos into the different disciplines: when I got my own race photos, I could feed them into my model and have it predict which category each belonged to.

If you are unfamiliar, a triathlon is a multi-sport race consisting of three sequential events: swimming, cycling, and running.

To begin, my model would classify photos into the following categories:

  1. Swim
  2. Bike
  3. Run
  4. Transitions (not a discipline but very important parts of the race!)

My first goal was to fine-tune an existing base residual neural network on a small dataset of scraped triathlon images and build a working model, with 85% or higher accuracy, that I could deploy.

My second goal was to allow users to "correct" the model if it predicted the wrong category for an image. That way, the model's mistakes could benefit from live user input, and I could continue to improve the model through retraining.
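One way to wire up such corrections, assuming the one-folder-per-class layout used later for labeling: move the corrected image into the folder for its true class, so the next retraining run picks it up. This is my own sketch, not the project's actual implementation; the function name and layout are assumptions.

```python
from pathlib import Path
import shutil

def record_correction(image_path: str, correct_label: str, dataset_root: str) -> Path:
    """Move a misclassified image into the folder for its correct label.

    Assumes the one-folder-per-class layout that parent_label expects,
    e.g. dataset_root/swim, dataset_root/bike, dataset_root/run, ...
    """
    src = Path(image_path)
    dest_dir = Path(dataset_root) / correct_label
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.move(str(src), dest)  # next retraining run sees the corrected label
    return dest
```

Keeping corrections in the same folder structure means retraining needs no extra bookkeeping: the labels are just the folder names again.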

My third goal was to use modern ML tools like Weights & Biases to compare different model training runs. I also wanted to see how easy it would be to engineer with the fastai library, which I was introduced to by the second chapter of a great book, Practical Deep Learning for Coders (Howard and Gugger).

Without further ado, let's dive into the model-building process:

The Dataset

To start, I needed to collect enough photos to fine-tune my base model, resnet18, which fastai provides out of the box. I created an unsplash.com account and used their API to pull 50 URLs for each triathlon discipline:

```python
import requests

# headers carries the Unsplash auth, e.g. {"Authorization": "Client-ID <access_key>"}
params = {
    "query": query,  # for example: "triathlon biking"
    "per_page": per_page,
    "page": page,
}
url = "https://api.unsplash.com/search/photos"
response = requests.get(url, headers=headers, params=params)

data = response.json()
page_urls = [photo["urls"]["regular"] for photo in data["results"]]
all_urls.extend(page_urls)
```
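With the URLs collected, the images still need to be saved into one folder per discipline so that folder names can serve as labels later. A rough sketch of that download step (the function name and file-naming scheme are my own assumptions, not the original code):

```python
from pathlib import Path
import requests

def download_urls(urls, label, root="triathlon"):
    """Save each image URL into root/<label>/ so the folder name is the label."""
    dest = Path(root) / label
    dest.mkdir(parents=True, exist_ok=True)
    for i, url in enumerate(urls):
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()                       # fail loudly on bad URLs
        (dest / f"{label}_{i:03d}.jpg").write_bytes(resp.content)
```

Run once per query ("triathlon swimming", "triathlon biking", ...), this produces exactly the layout that fastai's parent_label expects.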

I wanted to take a look at the different photos I had gotten, so I created a DataBlock as follows:

DataBlock: high-level API to quickly get data into DataLoaders
DataLoader: fastai's replacement for PyTorch's DataLoader that adds useful functionality and flexibility
```python
photos = DataBlock(
    blocks=(ImageBlock, CategoryBlock),              # images in, one category out
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,                              # label = parent folder name
    item_tfms=Resize(128))

path = Path('/content/drive/MyDrive/triathlon')
dataloaders = photos.dataloaders(path)
dataloaders.valid.show_batch(max_n=5, nrows=1)
```
[sample batch: labeled triathlon photos]

So far, so good. Each of the photos matched the label.

The First Run

To get an initial impression of how the model performed, I didn't verify the downloaded photos beyond spot-checking a few above. I knew the model's performance would be terrible, but it would give me a baseline to fine-tune and improve from.

I trained the base resnet18 model for just 5 epochs:

```python
learn = vision_learner(dataloaders, resnet18, metrics=error_rate)
learn.fine_tune(5)
```
| Metric | Value |
| --- | --- |
| Accuracy | 61.6% |
| Error Rate | 0.384 |
| Epochs | 5 |
| Architecture | resnet18 |
| Training Loss | 1.380 |
| Valid Loss | 1.096 |

Unsurprisingly, my first run would turn out to be the worst by far: 61.6% accuracy! Why was my model performing so poorly?

There are three main building blocks of training a well-performing classification model.

  1. Data Foundation:
    1. Quality data over quantity with correctly labeled data representative of real world scenarios
    2. Proper training/validation splits with normalization and handling class imbalance
  2. Model Architecture & Training
    1. Choosing an appropriate base model to avoid overfitting
    2. Finding the optimal learning rate, loss function, regularization techniques, and early stopping
  3. Iteration & Improvement
    1. Systematic approach to changing one variable at a time while tracking experiments
    2. Domain knowledge to understand distinguishable classes and considering edge/corner cases
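On the splits-and-imbalance point (1.2): RandomSplitter draws one global random sample, so a small class can end up barely represented in validation. A stratified variant is easy to sketch in plain Python; this helper is my own illustration, not a fastai API.

```python
import random
from collections import defaultdict

def stratified_split(items, labels, valid_pct=0.2, seed=42):
    """Split so each class contributes ~valid_pct of its own images to
    validation, instead of relying on one global random draw."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for item, label in zip(items, labels):
        by_class[label].append(item)
    train, valid = [], []
    for label, group in by_class.items():
        rng.shuffle(group)
        n_valid = max(1, round(len(group) * valid_pct))  # at least one per class
        valid.extend(group[:n_valid])
        train.extend(group[n_valid:])
    return train, valid
```

With a dataset this small, guaranteeing every class appears in validation makes the accuracy numbers far less noisy between runs.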

To address my own model's performance, I focused on the first: Data Foundation. To identify and clean my data, I could use fastai's handy ImageClassifierCleaner:

```python
cleaner = ImageClassifierCleaner(learn)
cleaner
```

I ran through each category's training and validation sets. For each photo, I could choose to keep it, delete it, or re-classify its category.

After running the tool, I reloaded the dataset and re-ran the model.

The Second Run

| Metric | Value |
| --- | --- |
| Accuracy | 74.36% |
| Error Rate | 0.256 |
| Epochs | 5 |
| Architecture | resnet18 |
| Training Loss | 1.080 |
| Valid Loss | 0.702 |

Improvement already! My accuracy had increased to 74.36% from 61.6% 🥳

How much better could we get?

"Tuning" the Dataset Further

I knew I could see more improvement, so I output the state of my training and validation sets:

Total Dataset

| Discipline | Num. of Images |
| --- | --- |
| swim | 34 |
| bike | 67 |
| transition | 28 |
| run | 46 |

Validation Dataset

| Discipline | Photo Count |
| --- | --- |
| swim | 5 |
| bike | 17 |
| transition | 3 |
| run | 10 |

This skewed balance was causing the model to perform better on the majority class (bike) and less well on the minority classes (swim and transition).

This meant I needed to find more photos of those activities.
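Per-class counts like the tables above can be regenerated any time from the folder layout. A small helper I wrote for illustration (it assumes one folder per class and common image extensions):

```python
from pathlib import Path

def class_counts(root):
    """Count images per class folder, e.g. triathlon/swim, triathlon/bike, ..."""
    exts = {".jpg", ".jpeg", ".png"}
    return {
        d.name: sum(1 for f in d.iterdir() if f.suffix.lower() in exts)
        for d in sorted(Path(root).iterdir())
        if d.is_dir()
    }
```

Printing this before every training run makes imbalance impossible to miss.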

There were also duplicates and close-ups in swim, which probably weren't helping the model. Here are some examples of duplicates:

[gallery: duplicate swim photos]

Additionally, we had some photos that were way too zoomed in:

[gallery: overly zoomed-in photos]

Too many zoomed-in photos can cause models problems like:

  1. Context Loss: Missing environmental cues and setting context that helps classification
  2. Scale Sensitivity: The model may overfit to body parts or lose the ability to handle regular photography
  3. Feature Confusion: Close-ups can cause the model to confuse one activity with another.

To help, I decided to remove some of the close-up shots and the duplicate-looking images.
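Exact byte-for-byte duplicates can be swept automatically by hashing file contents. Near-duplicates would need a perceptual hash (e.g. the imagehash library), but this is a cheap first pass; the helper below is my own sketch, not the tool used in the post.

```python
import hashlib
from pathlib import Path

def find_exact_duplicates(root):
    """Group image files by content hash; any group >1 is a set of exact duplicates."""
    by_hash = {}
    for path in sorted(Path(root).rglob("*.jpg")):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        by_hash.setdefault(digest, []).append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Each returned group keeps its first file and flags the rest for deletion.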

Additionally, I added more transition photos to try to balance out the skewed dataset.

Confusion Matrix Deep Dive

I built a confusion matrix to see what the model was getting wrong and right.

Confusion Matrix: A table used to evaluate the performance of a classification model by summarizing the number of correct & incorrect predictions made.
```python
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
interp.most_confused()   # (actual, predicted, count) tuples, most confused first
```
[confusion matrix]

The good: Bike predictions were looking pretty good. Swim was also looking excellent.

  • Bike: 13/14 correct (93% accuracy) - this was working really well now
  • Swim: 7/7 correct (100% accuracy) - perfect classification!
  • Transition: 6/7 correct (86% accuracy) - huge improvement from before

The bad: run was the trouble

  • Run: Only 3/11 correct (27% accuracy) - yikes!

According to the matrix, the model was predicting transition when the true label was run! I needed to understand what was actually happening with my predictions.

The main issue was clear: 5 run images were being predicted as transitions, and 2 runs were being predicted as bike. I needed to figure out why.
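Those per-class counts are exactly what a confusion matrix tabulates. As an illustration, the matrix can be built from scratch in a few lines of plain Python (fastai's plot is just a rendering of these counts):

```python
from collections import Counter

def confusion_matrix(actual, predicted, classes):
    """Rows = true class, columns = predicted class."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]
```

Off-diagonal cells, like run predicted as transition, are the errors worth chasing.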

FastAI has a useful method to call on the results to determine which photos were giving the model the most trouble:

```python
interp.plot_top_losses(5, nrows=1)   # the five images the model got most wrong
```

Sure enough, several of the photos I had scraped from the web were not really related to triathlon running at all:

[gallery: scraped "running" photos unrelated to triathlon]

Investigating the Run Confusion

I started analyzing the misclassified run images to understand the pattern:

Run → Transition (5 errors): Looking at these images, I could see they often contained:

  • Athletes with prominent race numbers and gear
  • Backgrounds that looked like transition areas
  • Athletes who appeared to be in the process of switching from bike to run gear

Run → Bike (2 errors): These seemed to include:

  • Athletes running on what looked like bike paths or mixed terrain
  • Images with bikes visible in the background
  • Athletes wearing similar athletic gear that could be confused between disciplines

This analysis helped me realize I needed to be more selective about which run images I included in my dataset.

The Third Run

After one final cleaning of the dataset, I was ready to run the same model and see how much I could improve.

This time, I saw a huge jump over runs 1 and 2: 89.47% accuracy, up from 74.36%.

| Metric | Value |
| --- | --- |
| Accuracy | 89.47% |
| Error Rate | 0.105 |
| Epochs | 5 |
| Architecture | resnet18 |
| Training Loss | 0.9414 |
| Valid Loss | 0.4738 |

From these stats, I surmised the following:

  1. The accuracy jump meant that my data cleaning contributed to the increase
  2. Validation loss being lower than training loss meant it's possible my validation set was smaller and/or easier
  3. We trained for a low number of epochs, so we could possibly train longer and with a larger model in the future!

Conclusion

This project taught me valuable lessons about the importance of data quality over quantity. While I started with a modest 61.6% accuracy, careful data curation and balancing improved performance to 89.47%. The biggest insight was that model performance is often limited by the quality and balance of training data rather than architectural complexity.

The journey from a barely-better-than-random classifier to a reasonably accurate model highlighted the iterative nature of machine learning projects. Each confusion matrix analysis revealed new insights about data quality issues, and addressing these systematically led to measurable improvements.

The next phase will focus on experimenting with hyperparameter tuning and model selection. Additionally, I'd like to add a new category, "finish line", since transition photos were most often being confused with it. Finally, the last phase will focus on deploying the model and building a small web application to do real-time inference!

To summarize, this project reinforced that building effective ML models is as much about understanding your data as it is about choosing the right algorithms. Sometimes the best optimization isn't a fancier model; it's simply better data.

Try it out!

Lastly, I exported my model and created a Gradio app on HuggingFace's Spaces platform.

Feel free to classify your own race photo.

https://huggingface.co/spaces/cjkorv3r/triathlon-classifier

If you don't have a photo, you can use mine from my first triathlon as a sample image.

Here I am riding my bike during the race:

After I upload it, the model predicts with 100% confidence that this is indeed biking.

Nice work, computer!