Project 5 - Visualising Olympics data

Author: Andrew Moles
Affiliation: Learning Developer, Digital Skills Lab
Published: September 25, 2025

Learning objectives:

  • Load in data
  • Load and use packages
  • Filter data to show only the information you need
  • Recode and categorise your data to be more meaningful
  • Perform an aggregation of a dataset
  • Make a bar plot using your aggregated data
  • Use factors to re-order categorical variables

Outcomes

We will write a program that provides us with summary statistics of a question we have from a dataset. Your code will automate the process of loading the data, performing operations on that data, and making it into a presentable format.

We will be looking at data on the Olympics. From this data we are asking the following question:

In all Olympic Games over the years, how many medals has Great Britain won, and not won, in the Men’s and Women’s 800 and 1,500 metres running events in athletics?

We will be aiming for two outcomes in this project.

Outcome 1 - Make an aggregation to show your results

# A tibble: 13 × 3
   medal    event                          n_indiv_medals
   <fct>    <chr>                                   <int>
 1 Gold     Athletics Men's 1,500 metres                5
 2 Gold     Athletics Men's 800 metres                  6
 3 Gold     Athletics Women's 1,500 metres              1
 4 Gold     Athletics Women's 800 metres                2
 5 Silver   Athletics Men's 1,500 metres                6
 6 Silver   Athletics Men's 800 metres                  3
 7 Bronze   Athletics Men's 1,500 metres                3
 8 Bronze   Athletics Men's 800 metres                  1
 9 Bronze   Athletics Women's 800 metres                1
10 No medal Athletics Men's 1,500 metres               76
11 No medal Athletics Men's 800 metres                 77
12 No medal Athletics Women's 1,500 metres             29
13 No medal Athletics Women's 800 metres               32

Outcome 2 - Make a bar plot to make your results look better

The data

We will be using a CSV file that is stored on GitHub: https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-08-06/olympics.csv

The dataset provides individual-level data on participants in the Summer and Winter Olympics up to and including the 2016 Games.

Steps to help you get to the outcome

Part 1 - the setup

Open an R script file and save it.

Load in the Olympics dataset and store it in R.
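
As a minimal sketch of this step, assuming the readr package is installed, the file can be read straight from the URL given in "The data". The object names olympics_url and olympics are chosen here for illustration only.

```r
library(readr)

# URL from "The data" section of this worksheet
olympics_url <- "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-08-06/olympics.csv"

# read_csv() accepts a URL as well as a local file path
# (olympics is an illustrative name, not one the worksheet requires)
olympics <- read_csv(olympics_url)

# Quick checks: number of rows and columns, then the column names
dim(olympics)
names(olympics)
```

Looking at the column names before moving on helps you decide which columns to use for the filtering in Part 2.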

Part 2 - making a subset

Filter your data to create a subset that includes:

  • Only Great Britain
  • Only the following events:
    • Athletics Men’s 800 metres
    • Athletics Women’s 800 metres
    • Athletics Men’s 1,500 metres
    • Athletics Women’s 1,500 metres
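
One possible way to build this subset is sketched below with dplyr on a few made-up rows. It assumes the country is stored in a column called team (the noc column with the code "GBR" may be an alternative), and the names events_wanted and gb_running are hypothetical. Event names are typed with straight apostrophes, as they appear in the Outcome 1 table.

```r
library(dplyr)

# Made-up toy rows standing in for the real Olympics data
olympics <- data.frame(
  team  = c("Great Britain", "Great Britain", "Kenya", "Great Britain"),
  event = c("Athletics Men's 800 metres", "Athletics Women's 1,500 metres",
            "Athletics Men's 800 metres", "Swimming Men's 100 metres Freestyle"),
  medal = c("Gold", NA, "Silver", NA)
)

# The four events we want to keep
events_wanted <- c(
  "Athletics Men's 800 metres",
  "Athletics Women's 800 metres",
  "Athletics Men's 1,500 metres",
  "Athletics Women's 1,500 metres"
)

# Keep rows that match the team AND one of the wanted events
# (assumption: the team column identifies Great Britain)
gb_running <- olympics |>
  filter(team == "Great Britain", event %in% events_wanted)

gb_running
```

Using %in% avoids writing four separate == comparisons joined with |.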

Part 3 - creating a new category

In our data the medal column has a lot of missing values. A missing value shows that the participant did not win a medal. Instead of leaving these values as missing, we want to replace them with a string such as “no_medal”.

Using conditional element selection, convert the missing values in the medal column from NA to no_medal.
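
A minimal sketch of conditional element selection on made-up data is shown below. The data frame name gb_running is hypothetical, and the replacement label can be any string you prefer (the Outcome 1 table displays it as "No medal").

```r
# Made-up toy rows; NA marks a participant without a medal
gb_running <- data.frame(
  event = c("Athletics Men's 800 metres", "Athletics Men's 800 metres",
            "Athletics Women's 800 metres", "Athletics Women's 800 metres"),
  medal = c("Gold", NA, "Bronze", NA)
)

# is.na() returns TRUE/FALSE for every element of the column.
# Putting that logical vector inside [ ] selects only the missing
# elements, which are then overwritten with the new label.
gb_running$medal[is.na(gb_running$medal)] <- "no_medal"

gb_running
```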

Part 4 - aggregation

Aggregation is the process of splitting your data, applying a function, then combining the results.

Get summary statistics by group to show the number of individual medals won for each medal type in each event.

Save the results as a new data frame, and print the outcome. Your results should look like what we see in Outcome 1.
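
The split, apply, combine idea can be sketched with dplyr as follows. This is one possible approach on made-up rows, not necessarily the one used to produce Outcome 1; the names gb_running and medal_counts are hypothetical, while n_indiv_medals matches the column name shown in Outcome 1.

```r
library(dplyr)

# Made-up toy rows, with missing medals already recoded
gb_running <- data.frame(
  event = c("Athletics Men's 800 metres", "Athletics Men's 800 metres",
            "Athletics Men's 800 metres", "Athletics Women's 800 metres"),
  medal = c("Gold", "no_medal", "no_medal", "Gold")
)

medal_counts <- gb_running |>
  group_by(medal, event) |>                            # split into groups
  summarise(n_indiv_medals = n(), .groups = "drop")    # apply n(), then combine

print(medal_counts)
```

Each row of the result is one combination of medal and event, with the number of rows that fell into that group.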

Part 5 - making an initial visualisation

R has packages that allow for elegant graphics for data analysis.

Using such a package, make a simple bar plot using the data we just created with the aggregation. You should have medal on your x axis and individual medal count on the y axis.
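
A minimal sketch of such a bar plot, assuming ggplot2 is the package you choose and using made-up counts, might look like this. The name medal_counts is hypothetical.

```r
library(ggplot2)

# Made-up aggregated counts standing in for your own results
medal_counts <- data.frame(
  medal          = c("Gold", "Silver", "Bronze", "no_medal"),
  n_indiv_medals = c(2, 1, 1, 5)
)

# geom_col() uses the y values as bar heights, which suits data
# that has already been counted
p <- ggplot(medal_counts, aes(x = medal, y = n_indiv_medals)) +
  geom_col()

print(p)
```

If your aggregated data has several rows per medal (one per event), geom_col() stacks them into a single bar for that medal.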

Part 6 - re-ordering the x axis

Re-order the x axis so it is in a more sensible order such as gold, silver, bronze, no medal.

R will default to ordering characters alphabetically.
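
Converting the column to a factor with explicit levels is one way to control the order. The sketch below uses a made-up character vector, and the name medal_fct is hypothetical.

```r
# Made-up medal labels in no particular order
medal <- c("no_medal", "Gold", "Bronze", "Silver", "Gold")

# Without the levels argument, factor() sorts the levels itself
levels(factor(medal))

# Supplying levels fixes the order explicitly
medal_fct <- factor(medal, levels = c("Gold", "Silver", "Bronze", "no_medal"))
levels(medal_fct)
```

Plotting functions generally follow the level order of a factor, so applying this to the medal column of your aggregated data changes the order of the bars.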

Part 7 - making several visuals in one plot

We want to split the chart window so each category of event has a separate plot. This is known as faceting.
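
Assuming ggplot2 is being used, faceting can be sketched with facet_wrap() as below. The data is made up and the name medal_counts is hypothetical.

```r
library(ggplot2)

# Made-up aggregated counts for two events
medal_counts <- data.frame(
  medal          = c("Gold", "no_medal", "Gold", "no_medal"),
  event          = c("Athletics Men's 800 metres", "Athletics Men's 800 metres",
                     "Athletics Women's 800 metres", "Athletics Women's 800 metres"),
  n_indiv_medals = c(2, 5, 1, 3)
)

# facet_wrap() draws one panel per unique value of event
p <- ggplot(medal_counts, aes(x = medal, y = n_indiv_medals)) +
  geom_col() +
  facet_wrap(~ event)

print(p)
```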

Part 8 - final improvements on how the visualisation looks

There are a few more minor adjustments to make to finish up our visualisation so it looks like what we see in Outcome 2.

  • Add colours to the bars, which has two steps:
    • Make the fill aesthetic the same as your x axis
    • Manually change the fill scale. The colours used in outcome 2 are: "#D6AF36", "#A7A7AD", "#A77044", "#333333"
  • Add a title to the visualisation
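
The adjustments above can be sketched as follows, again assuming ggplot2 and made-up counts. The title text and the name medal_counts are placeholders chosen for illustration; the colours are the ones listed above.

```r
library(ggplot2)

# Made-up aggregated counts; medal is a factor so the bars and
# colours follow gold, silver, bronze, no medal order
medal_counts <- data.frame(
  medal = factor(c("Gold", "Silver", "Bronze", "no_medal"),
                 levels = c("Gold", "Silver", "Bronze", "no_medal")),
  n_indiv_medals = c(2, 1, 1, 5)
)

p <- ggplot(medal_counts,
            aes(x = medal, y = n_indiv_medals, fill = medal)) +  # fill matches x
  geom_col() +
  # Unnamed values are assigned in the order of the factor levels
  scale_fill_manual(values = c("#D6AF36", "#A7A7AD", "#A77044", "#333333")) +
  # Placeholder title, replace with your own
  labs(title = "Great Britain results in 800 and 1,500 metres events")

print(p)
```

Because the colours are matched to levels by position, this relies on the factor order set in Part 6.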

Final task - fill out the survey!

We are always looking to improve and iterate our workshops. Follow the link to give your feedback.