Why learn R

There are a lot of reasons to learn R. It is very prevalent in academia, in bioinformatics (the analysis of biological data), and in data analyst and data scientist roles.

A lot of LSE courses that involve statistics or data primarily use R. This is because R is an excellent tool for:

R is an open-source tool, which means you do not need to buy a licence in order to use it. It is also a popular programming language, as shown in the PYPL index from 2023.

Figure: PYPL index 2023, showing the most popular programming languages in 2023, with R in 7th place at the time.

Easy to learn

A key advantage of R is that it is relatively easy to learn, especially in comparison with other programming languages used for data analysis, such as Python. A large part of this is because R was designed to work with data, so it feels natural, and because of the excellent extensions that have been added to R over the years to make it more slick (great examples are the tidyverse and data.table packages).
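
As a small illustration of R being built around data, base R ships with example datasets such as mtcars, and a grouped summary takes a single line. This is a minimal sketch using only base R; the variable name mpg_by_cyl is our own choice for illustration.

```r
# mtcars is an example dataset that comes with R (datasets package)
# Average fuel economy (mpg) for each number of cylinders, using the formula interface
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# One row per cylinder count (4, 6 and 8)
print(mpg_by_cyl)
```

With the tidyverse's dplyr package, the same summary is commonly written using group_by() followed by summarise().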

There is also a vast amount of online resources in the form of tutorials, blogs, online courses and people sharing code and examples, as well as help available through online discussion forums like Stack Overflow and Slack.

Some of the best available resources to learn R are:

Some examples of R being awesome for data visualisation

Animated gif of rainfall and temperature changes over time in Australian cities

Regional map of the UK showing how each area is doing in terms of gender pay equality for each year data has existed, from 2017 onwards.

Fun side note

This webpage was built using R. If you are interested, come to the workshops and chat to one of the instructors about it!

Interactive graphic of Dr Who episodes' IMDb ratings by the actor playing the Doctor

Below is an example of using quanteda, a package for text analysis developed here at LSE by Ken Benoit. It visualises how readable the inaugural speeches of US presidents have been since 1945, using the Flesch readability score. A higher score means the text used in the speech is more readable.

# load libraries
library(quanteda)             # corpus tools and the data_corpus_inaugural dataset
library(quanteda.textstats)   # textstat_readability()
library(ggplot2)              # plotting (the linewidth argument needs ggplot2 >= 3.4.0)

# keep speeches after 1945 and compute a Flesch readability score for each
# (|> is R's native pipe, available from R 4.1)
inaugural_readability <- data_corpus_inaugural |>
  corpus_subset(Year > 1945) |>
  textstat_readability(measure = "Flesch")

# average readability across these speeches (used for the reference line)
avg_readability <- mean(inaugural_readability$Flesch)

# point and label colours
point_col <- "black"
text_col <- "white"

# lollipop chart: one point per speech, a dotted segment to the average, axes flipped
# note: the "Avenir" font family must be available on your system
ggplot(inaugural_readability,
       aes(x = document, y = Flesch)) +
  geom_point(size = 6.5, colour = point_col) +
  geom_hline(yintercept = avg_readability,
             alpha = 0.75, linetype = 5, linewidth = 1.2) +
  geom_segment(aes(xend = document, yend = avg_readability),
               linetype = 3, linewidth = 1.2) +
  geom_text(aes(label = round(Flesch, 1)),
            colour = text_col, size = 2, family = "Avenir") +
  annotate(geom = "text", x = "2005-Bush", y = 69,
           family = "Avenir", size = 4,
           label = paste0("Average readability: ", round(avg_readability, 2))) +
  annotate(geom = "curve", x = 14.5, y = 66, alpha = 0.6,
           xend = "2005-Bush", yend = avg_readability + 0.4,
           curvature = -0.35, arrow = arrow(length = unit(0.15, "inches"))) +
  coord_flip() +
  labs(x = NULL, y = "Readability (Flesch)",
       title = "Readability of inaugural speeches since 1945",
       subtitle = "Higher score means text is more readable") +
  theme_minimal(base_family = "Avenir") +
  theme(plot.title.position = "plot")

Lollipop chart showing how readable the inaugural speeches of US presidents since 1945 are. Truman (1949) has the worst rating, with Biden (2021) and Trump (2025) having the highest ratings.
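
Why does a higher number mean easier text? The Flesch reading ease score is conventionally defined as 206.835 minus 1.015 times the average number of words per sentence, minus 84.6 times the average number of syllables per word, so longer sentences and longer words both pull the score down. The function below is a hypothetical helper written only to illustrate that formula on counts you supply; it is not quanteda's implementation, which also handles splitting the text into sentences and words and counting syllables (see the textstat_readability() documentation for the details).

```r
# Hypothetical helper (for illustration only): Flesch reading ease from pre-computed counts
flesch_reading_ease <- function(n_words, n_sentences, n_syllables) {
  words_per_sentence <- n_words / n_sentences   # average sentence length
  syllables_per_word <- n_syllables / n_words   # average word length in syllables
  206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
}

# Example: a passage of 100 words in 5 sentences, containing 140 syllables
flesch_reading_ease(n_words = 100, n_sentences = 5, n_syllables = 140)
#> [1] 68.095
```

Halving the average sentence length or using shorter words raises the score, which is why plainer speeches sit further to the right in the chart.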