We will write a program that provides us with various metrics on Harry Kane (a football player). Your code will automate the process of loading the data, performing operations on that data, and making it into a presentable format.
We will be aiming for two outcomes in this project.
You should end up with an output that contains the following information, presented as two text outputs:
[1] "Player: Harry Kane | Seasons: 15 | Appearances: 436"
[1] "Goals: 289 | Goals and assists: 351 | Average expected goals: 20.3"
You should end up with an output that shows a filtered dataset with some selected columns.
You will need to create the columns goals_assists, goals_xg_diff, and goals_no_pens to get this output.
      season age         squad goals goals_assists goals_xg_diff goals_no_pens
9  2016-2017  23     Tottenham    29            34            NA            24
10 2017-2018  24     Tottenham    30            32           5.2            28
13 2020-2021  27     Tottenham    23            37           2.9            19
15 2022-2023  29     Tottenham    30            33           8.6            25
16 2023-2024  30 Bayern Munich    36            44           5.4            31
17 2024-2025  31 Bayern Munich    26            35           5.7            17
We will be using the provided CSV file (harry_kane_stats.csv). Click the link below to download the data.
The dataset provides general football metrics on each season Harry Kane has played, from 2010 to 2025.
If you are not sure what the expected goals and assists columns mean, you can find the definitions on the statsperform webpage, under the header ‘Expected Goals & Expected Assists’.
Open an R script file and save it. Make sure to save the dataset in the same folder as your R script file.
Import the csv file into R.
View the dataset you just loaded into R.
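The import and viewing steps might look something like the sketch below. It assumes the CSV sits in R's current working directory (the same folder as the script, if you have set that as the working directory); the data frame name `kane` is only an illustrative choice.

```r
# Assumes harry_kane_stats.csv is in the current working directory.
csv_path <- "harry_kane_stats.csv"

if (file.exists(csv_path)) {
  # read.csv() returns a data frame; `kane` is an arbitrary name.
  kane <- read.csv(csv_path)

  # Inspect the result: the first rows, then column names and types.
  print(head(kane))
  str(kane)

  # In an interactive session (e.g. RStudio), View() opens a table viewer.
  if (interactive()) View(kane)
} else {
  message("File not found; check getwd() and where the CSV was saved.")
}
```

If the file is not found, `getwd()` shows where R is looking and `setwd()` changes it.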
Our dataset is missing some useful information about our player's season statistics. Fortunately, we can calculate it!
Calculate the following metrics and add them as columns to your dataset: goals_assists, goals_xg_diff, and goals_no_pens.
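As a rough sketch of adding calculated columns, the example below builds a three-row toy data frame whose values are back-calculated from the outcome table above, rather than read from the CSV. The input column names `assists`, `xg` and `penalty_goals` are assumptions, as is the reading of goals_xg_diff as goals minus expected goals; check `names()` on your own data frame for the real column names.

```r
# Toy data: three seasons reconstructed from the outcome table above.
# assists, xg and penalty_goals are ASSUMED column names, not necessarily
# the ones used in harry_kane_stats.csv.
kane <- data.frame(
  season        = c("2016-2017", "2017-2018", "2020-2021"),
  goals         = c(29, 30, 23),
  assists       = c(5, 2, 14),
  xg            = c(NA, 24.8, 20.1),
  penalty_goals = c(5, 2, 4)
)

# Assigning to a new name with $ adds a column to the data frame.
kane$goals_assists <- kane$goals + kane$assists        # combined output
kane$goals_xg_diff <- kane$goals - kane$xg             # NA where xg is missing
kane$goals_no_pens <- kane$goals - kane$penalty_goals  # non-penalty goals

print(kane)
```

Arithmetic on columns is vectorised, so each line computes the value for every season at once, and a missing expected-goals value simply carries through as NA.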
To build our career summary we will need to create variables with the metrics we need by performing calculations on columns in the dataset.
The metrics you produce should match what we see in Section 2.1.
Print out the multi-line message that shows the career summary metrics.
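One possible shape for the career summary is sketched below, again on a three-row toy data frame reconstructed from the outcome table, so the totals will not match the full-career figures. The column names `assists` and `xg` are assumptions. Appearances are left out because that column's name is not known here; they would be summed in the same way as goals.

```r
# Toy data: three seasons only, with ASSUMED column names assists and xg.
kane <- data.frame(
  season  = c("2016-2017", "2017-2018", "2020-2021"),
  goals   = c(29, 30, 23),
  assists = c(5, 2, 14),
  xg      = c(NA, 24.8, 20.1)
)

# Store each metric in its own variable.
player_name <- "Harry Kane"
n_seasons   <- nrow(kane)                  # one row per season
total_goals <- sum(kane$goals)
total_ga    <- sum(kane$goals + kane$assists)
avg_xg      <- round(mean(kane$xg, na.rm = TRUE), 1)  # skip missing seasons

# paste() joins its arguments with single spaces by default.
line_1 <- paste("Player:", player_name, "| Seasons:", n_seasons)
line_2 <- paste("Goals:", total_goals, "| Goals and assists:", total_ga,
                "| Average expected goals:", avg_xg)

print(line_1)
print(line_2)
```

Using `na.rm = TRUE` matters here: without it, a single season with no expected-goals value would make the whole average NA.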
Subset your dataset to contain only the seasons where our player's combined goals and assists were greater than 30.
It is good practice to assign the results of your subset to a new data frame under a different name.
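A minimal sketch of the subsetting step using base R logical indexing is shown below. The first two rows come from the outcome table; the third is a made-up placeholder added only so the filter has something to drop. The name `kane_over_30` is an illustrative choice.

```r
# Toy data: the last row is a MADE-UP placeholder, not a real season.
kane <- data.frame(
  season        = c("2016-2017", "2017-2018", "placeholder"),
  goals_assists = c(34, 32, 10)
)

# Keep rows where the condition is TRUE; the empty slot after the comma
# means "all columns". The result goes into a new data frame so the
# original stays untouched.
kane_over_30 <- kane[kane$goals_assists > 30, ]

print(kane_over_30)
```

Because the result has its own name, `kane` still holds every season and the filter can be rerun with a different threshold.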
Select the columns, shown in the second outcome, to make your results more presentable.
You should be able to adjust the code you have written to select the columns. Your outcome should be the same as seen in Section 2.2.
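Selecting columns can reuse the same square-bracket indexing, this time filling in the slot after the comma. The sketch below assumes a filtered data frame called `kane_over_30` (an illustrative name) and uses two rows from the outcome table; the extra `assists` column is an assumed name, included only to show a column being dropped.

```r
# Toy stand-in for the filtered data frame; assists is an ASSUMED column name.
kane_over_30 <- data.frame(
  season        = c("2016-2017", "2017-2018"),
  age           = c(23, 24),
  squad         = c("Tottenham", "Tottenham"),
  goals         = c(29, 30),
  assists       = c(5, 2),
  goals_assists = c(34, 32),
  goals_xg_diff = c(NA, 5.2),
  goals_no_pens = c(24, 28)
)

# Columns to keep, in the order they should appear.
cols <- c("season", "age", "squad", "goals",
          "goals_assists", "goals_xg_diff", "goals_no_pens")

# Empty slot before the comma = all rows; character vector after = these columns.
kane_summary <- kane_over_30[, cols]

print(kane_summary)
```

The row condition and the column vector can also be combined in one call, for example `kane[kane$goals_assists > 30, cols]`.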
We are always looking to improve and iterate our workshops. Follow the link to give your feedback.