# A tibble: 13 × 3
   medal    event                          n_indiv_medals
   <fct>    <chr>                                   <int>
 1 Gold     Athletics Men's 1,500 metres                5
 2 Gold     Athletics Men's 800 metres                  6
 3 Gold     Athletics Women's 1,500 metres              1
 4 Gold     Athletics Women's 800 metres                2
 5 Silver   Athletics Men's 1,500 metres                6
 6 Silver   Athletics Men's 800 metres                  3
 7 Bronze   Athletics Men's 1,500 metres                3
 8 Bronze   Athletics Men's 800 metres                  1
 9 Bronze   Athletics Women's 800 metres                1
10 No medal Athletics Men's 1,500 metres               76
11 No medal Athletics Men's 800 metres                 77
12 No medal Athletics Women's 1,500 metres             29
13 No medal Athletics Women's 800 metres               32
Project 5 - Visualising Olympics data
Learning objectives:
- Load in data
- Load and use packages
- Filter data to show only the information you need
- Recode and categorise your data to be more meaningful
- Perform an aggregation of a dataset
- Make a bar plot using your aggregated data
- Use factors to re-order categorical variables
Outcomes
We will write a program that produces summary statistics to answer a question about a dataset. Your code will automate the process of loading the data, performing operations on that data, and putting the results into a presentable format.
We will be looking at data on the Olympics. From this data we are asking the following question:
In all Olympic Games over the years, how many medals have Great Britain won, and not won, in the Men’s and Women’s 800 and 1,500 metres running events in athletics?
We will be aiming for two outcomes in this project.
Outcome 1 - Make an aggregation to show your results
Outcome 2 - Make a bar plot to make your results look better
The data
We will be using a csv file that is stored on GitHub: https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-08-06/olympics.csv
The dataset provides individual-level data on participants in the summer and winter Olympics up to and including the 2016 event.
Steps to help you get to the outcome
Part 1 - the setup
Open an R script file and save it.
Load in the Olympics dataset and store it in R.
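One way the loading step might look is sketched below, assuming base R's `read.csv()` (the workshop may intend a package function such as `readr::read_csv()` instead). The helper name `load_olympics` and the object names are illustrative, and the tiny stand-in file contains made-up rows so the example also runs without an internet connection.

```r
# URL given in "The data" section above.
olympics_url <- "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-08-06/olympics.csv"

# Hypothetical helper: read a csv from a file path or URL, keeping text
# columns as character vectors rather than factors.
load_olympics <- function(path) {
  read.csv(path, stringsAsFactors = FALSE)
}

# With an internet connection, load the real data:
# olympics <- load_olympics(olympics_url)

# Offline demonstration on a tiny stand-in file (made-up rows, not real results).
demo_file <- tempfile(fileext = ".csv")
writeLines(
  c("team,event,medal",
    "Great Britain,Athletics Men's 800 metres,Gold",
    "Great Britain,Athletics Men's 800 metres,NA"),
  demo_file
)
demo <- load_olympics(demo_file)

# Inspect the column names and types that were read in.
str(demo)
```

Running `str()` (or `head()`) straight after loading is a quick check that the columns arrived with the names and types you expect.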
Part 2 - making a subset
Filter your data to create a subset that includes:
- Only Great Britain
- Only the following events:
  - Athletics Men’s 800 metres
  - Athletics Women’s 800 metres
  - Athletics Men’s 1,500 metres
  - Athletics Women’s 1,500 metres
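The filter described above can be sketched as follows, assuming the dplyr package and a `team` column that holds the country name (the `team` column and the object names are assumptions, and the rows are a made-up stand-in for the real data). Note that the event strings use a straight apostrophe, as in the printed results, so typing a curly one would match nothing.

```r
library(dplyr)

# Toy stand-in for the Olympics data (made-up rows, not real results).
olympics <- data.frame(
  team  = c("Great Britain", "Great Britain", "Kenya", "Great Britain"),
  event = c("Athletics Men's 800 metres", "Athletics Women's 1,500 metres",
            "Athletics Men's 800 metres", "Athletics Men's Marathon"),
  medal = c("Gold", NA, "Silver", NA)
)

# The four events we care about, spelled exactly as they appear in the data.
events_of_interest <- c(
  "Athletics Men's 800 metres",
  "Athletics Women's 800 metres",
  "Athletics Men's 1,500 metres",
  "Athletics Women's 1,500 metres"
)

# Keep rows that satisfy both conditions: the right team AND one of the events.
gb_running <- olympics %>%
  filter(team == "Great Britain", event %in% events_of_interest)

print(gb_running)
```

Using `%in%` with a vector of event names avoids chaining four separate `==` comparisons with `|`.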
Part 3 - creating a new category
In our data the medal column has a lot of missing values. A missing value shows that a participant did not win a medal. Instead of leaving these values missing, we want to turn them into a string such as “no_medal”.
Using conditional element selection, convert the missing values in the medal column from NA to no_medal.
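A minimal sketch of conditional element selection in base R is shown below, assuming the medal column is stored as a character vector (assigning a new string into a factor column would need the level to exist first). The data frame is a made-up stand-in.

```r
# Toy stand-in for the filtered data (made-up rows).
gb_running <- data.frame(
  event = c("Athletics Men's 800 metres", "Athletics Men's 800 metres",
            "Athletics Women's 800 metres"),
  medal = c("Gold", NA, NA),
  stringsAsFactors = FALSE
)

# is.na() returns TRUE/FALSE for each row; using it inside [ ] selects only
# the missing elements, which are then overwritten with the new label.
gb_running$medal[is.na(gb_running$medal)] <- "no_medal"

print(gb_running)
```

The logical vector inside the square brackets does the selecting, so rows that already hold a medal are left untouched.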
Part 4 - aggregation
Aggregation is the process of splitting your data, applying a function, then combining the results.
Get summary statistics by group to show the number of individual medals won for each event.
Save the results as a new data frame, and print the outcome. Your results should look like what we see in Section 2.1.
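The split, apply, combine steps might be sketched like this, assuming dplyr's `group_by()` and `summarise()` (the workshop's own solution may differ). The column name `n_indiv_medals` follows the printed results; the object names and rows are illustrative.

```r
library(dplyr)

# Toy stand-in for the recoded data (made-up rows).
gb_running <- data.frame(
  event = c("Athletics Men's 800 metres", "Athletics Men's 800 metres",
            "Athletics Men's 800 metres", "Athletics Women's 800 metres"),
  medal = c("Gold", "no_medal", "no_medal", "no_medal")
)

medal_counts <- gb_running %>%
  group_by(medal, event) %>%                           # split into groups
  summarise(n_indiv_medals = n(), .groups = "drop")    # apply n(), then combine

print(medal_counts)
```

Each row of the result is one medal and event combination, with `n()` counting how many rows of the original data fell into that group.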
Part 5 - making an initial visualisation
R has packages that allow for elegant graphics for data analysis.
Using such a package, make a simple bar plot from the data we just created with the aggregation. You should have medal on the x axis and the individual medal count on the y axis.
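As a rough sketch, assuming the package in question is ggplot2, a first bar plot could look like the following. The counts are the Men's 800 metres rows from the printed results; the object names are illustrative.

```r
library(ggplot2)

# Men's 800 metres counts taken from the printed results.
medal_counts <- data.frame(
  medal = c("Gold", "Silver", "Bronze", "No medal"),
  event = "Athletics Men's 800 metres",
  n_indiv_medals = c(6, 3, 1, 77)
)

# geom_col() draws bars whose heights come straight from the y column,
# which suits data that has already been aggregated.
medal_plot <- ggplot(medal_counts, aes(x = medal, y = n_indiv_medals)) +
  geom_col()

print(medal_plot)
```

`geom_col()` is used rather than `geom_bar()` because the counting has already been done in the aggregation step.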
Part 6 - re-ordering the x axis
Re-order the x axis so it is in a more sensible order such as gold, silver, bronze, no medal.
By default, R orders character values alphabetically, so the medal column needs to become a factor with its levels in the order you want.
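A small sketch of the re-ordering, assuming base R's `factor()` with explicit levels (packages such as forcats offer alternatives). The counts are the Men's 800 metres rows from the printed results.

```r
# Counts in alphabetical medal order, which is how a character column is sorted.
medal_counts <- data.frame(
  medal = c("Bronze", "Gold", "No medal", "Silver"),
  n_indiv_medals = c(1, 6, 77, 3)
)

# Convert medal to a factor whose levels are listed in the order we want
# them to appear on the x axis.
medal_counts$medal <- factor(
  medal_counts$medal,
  levels = c("Gold", "Silver", "Bronze", "No medal")
)

levels(medal_counts$medal)
# [1] "Gold"     "Silver"   "Bronze"   "No medal"
```

Plotting functions use the level order of a factor, so once the column is converted the bars follow gold, silver, bronze, no medal without further changes to the plot code.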
Part 7 - making several visuals in one plot
We want to split the chart window so each category of event has a separate plot. This is known as faceting.
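Faceting could be sketched as below, assuming ggplot2's `facet_wrap()`. The counts are the two 800 metres events from the printed results; the object names are illustrative.

```r
library(ggplot2)

medal_levels <- c("Gold", "Silver", "Bronze", "No medal")

# Men's and Women's 800 metres counts taken from the printed results.
medal_counts <- data.frame(
  medal = factor(
    c("Gold", "Silver", "Bronze", "No medal", "Gold", "Bronze", "No medal"),
    levels = medal_levels
  ),
  event = c(rep("Athletics Men's 800 metres", 4),
            rep("Athletics Women's 800 metres", 3)),
  n_indiv_medals = c(6, 3, 1, 77, 2, 1, 32)
)

# facet_wrap(~ event) draws one panel per distinct value of event.
faceted_plot <- ggplot(medal_counts, aes(x = medal, y = n_indiv_medals)) +
  geom_col() +
  facet_wrap(~ event)

print(faceted_plot)
```

By default the panels share the same axes, which makes the bar heights directly comparable between events.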
Part 8 - final improvements on how the visualisation looks
There are a few more minor adjustments to make to finish up our visualisation so it looks like what we see in Section 2.2.
- Add colours to the bars, which has two steps:
  - Make the fill aesthetic the same as your x axis
  - Manually change the fill scale. The colours used in outcome 2 are: "#D6AF36", "#A7A7AD", "#A77044", "#333333"
- Add a title to the visualisation
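The finishing touches listed above might be sketched as follows, assuming ggplot2's `scale_fill_manual()` and `labs()`. The title text and object names are placeholders, and pairing each colour with a medal by name is an assumption based on the order in which the colours are listed. The counts are the two 800 metres events from the printed results.

```r
library(ggplot2)

medal_levels <- c("Gold", "Silver", "Bronze", "No medal")

# Men's and Women's 800 metres counts taken from the printed results.
medal_counts <- data.frame(
  medal = factor(
    c("Gold", "Silver", "Bronze", "No medal", "Gold", "Bronze", "No medal"),
    levels = medal_levels
  ),
  event = c(rep("Athletics Men's 800 metres", 4),
            rep("Athletics Women's 800 metres", 3)),
  n_indiv_medals = c(6, 3, 1, 77, 2, 1, 32)
)

# Assumed pairing of the listed colours with the medal categories.
medal_colours <- c(
  "Gold" = "#D6AF36", "Silver" = "#A7A7AD",
  "Bronze" = "#A77044", "No medal" = "#333333"
)

final_plot <- ggplot(medal_counts,
                     aes(x = medal, y = n_indiv_medals, fill = medal)) +
  geom_col() +
  facet_wrap(~ event) +
  scale_fill_manual(values = medal_colours) +   # manual fill scale
  labs(title = "Great Britain in the 800 metres")  # placeholder title

print(final_plot)
```

Naming the elements of the colour vector ties each colour to a category explicitly, so the mapping does not depend on the order of the factor levels.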
Final task - fill out the survey!
We are always looking to improve and iterate on our workshops. Follow the link to give your feedback.