Weather anomalies
Climate change and temperature anomalies
To study climate change, we can use data on the Combined Land-Surface Air and Sea-Surface Water Temperature Anomalies in the Northern Hemisphere from NASA’s Goddard Institute for Space Studies. The tabular data of temperature anomalies is read directly from NASA’s website in the code below.
A temperature anomaly is defined relative to a reference, or base, period. NASA states that it uses the period 1951-1980.
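As a minimal sketch of what an anomaly is, the snippet below subtracts a base-period mean from each observation. The temperatures and the names `temps` and `base_mean` are made up for illustration; they are not NASA data, and NASA's actual procedure involves more steps than this.

```r
# Hypothetical temperatures (degrees C) for a handful of years;
# these values are invented for illustration and are not NASA data.
temps <- data.frame(
  year = c(1950, 1960, 1970, 1980, 2000, 2020),
  temp = c(10.0, 10.2, 9.8, 10.0, 10.6, 11.1)
)

# Mean over the base period 1951-1980 (here: the years 1960, 1970 and 1980)
base_mean <- mean(temps$temp[temps$year >= 1951 & temps$year <= 1980])

# Anomaly = observed value minus the base-period mean
temps$anomaly <- temps$temp - base_mean
temps
```

A positive anomaly means the observation was warmer than the base-period average, a negative one that it was cooler.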
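library(tidyverse) # read_csv(), dplyr verbs, pivot_longer(), ggplot2
library(lubridate) # ymd(), month(), year()

weather <-
  read_csv("https://data.giss.nasa.gov/gistemp/tabledata_v4/NH.Ts+dSST.csv",
           skip = 1,    # skip the title line above the header row
           na = "***")  # missing values are coded as *** in the file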
For each month and year, the dataframe shows the deviation of temperature from the expected (base-period) value. The dataframe is in wide format, with one row per year and one column per month.
Cleaning Data
tidyweather <- weather %>%
  select(1:13) %>%             # keep Year and the twelve monthly columns
  pivot_longer(cols = 2:13,    # reshape the monthly columns into long format
               names_to = "Month",
               values_to = "delta")
glimpse(tidyweather)
## Rows: 1,716
## Columns: 3
## $ Year <dbl> 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880…
## $ Month <chr> "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "…
## $ delta <dbl> -0.39, -0.53, -0.23, -0.30, -0.05, -0.18, -0.21, -0.25, -0.24, -…
Plotting Information
Plotting data using a time-series scatterplot with a trendline.
tidyweather <- tidyweather %>%
mutate(date = ymd(paste(as.character(Year), Month, "1")),
month = month(date, label=TRUE),
year = year(date))
ggplot(tidyweather, aes(x=date, y = delta))+
geom_point()+
geom_smooth(color="red") +
theme_bw() +
labs (
title = "Weather Anomalies",
x = "Year",
y = "Temperature deviation",
caption = "Source: https://data.giss.nasa.gov/gistemp/tabledata_v4/NH.Ts+dSST.txt"
) +
NULL

Producing a scatter plot showing the temperature anomalies by month.
ggplot(tidyweather, aes(x=date, y = delta))+
geom_point()+
geom_smooth(color="red") +
theme_bw() +
labs (
title = "Weather Anomalies",
x = "Year",
y = "Temperature deviation",
caption = "Source: https://data.giss.nasa.gov/gistemp/tabledata_v4/NH.Ts+dSST.txt"
) +
facet_wrap(~month) +
NULL

Grouping data into different time periods to study historical data.
comparison <- tidyweather %>%
filter(Year>= 1881) %>% #remove years prior to 1881
#create new variable 'interval', and assign values based on criteria below:
mutate(interval = case_when(
Year %in% c(1881:1920) ~ "1881-1920",
Year %in% c(1921:1950) ~ "1921-1950",
Year %in% c(1951:1980) ~ "1951-1980",
Year %in% c(1981:2010) ~ "1981-2010",
TRUE ~ "2011-present"
))
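The `case_when()` call above assigns each year to a labelled period. As an alternative sketch (not the method used in this post), base R's `cut()` can do the same binning from a vector of break points. The `years` vector here is a hypothetical example chosen to sit on both sides of each boundary.

```r
# Hypothetical example years on both sides of each period boundary
years <- c(1881, 1920, 1921, 1950, 1951, 1980, 1981, 2010, 2011, 2023)

# cut() uses right-closed bins by default, so (1880, 1920] covers 1881-1920
interval <- cut(
  years,
  breaks = c(1880, 1920, 1950, 1980, 2010, Inf),
  labels = c("1881-1920", "1921-1950", "1951-1980", "1981-2010", "2011-present")
)

data.frame(years, interval)
```

Note that `cut()` returns a factor with ordered levels, whereas `case_when()` with string labels returns a character vector.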
Creating a density plot to study the distribution of monthly deviations grouped by the different time periods.
ggplot(data = comparison, aes(delta)) +
geom_density(aes(fill = interval), alpha = 1/4) +
labs(title = "Distribution of Monthly Temperature Anomalies in Time Intervals",
x = "Monthly Temperature Anomaly",
y = "Density",
caption = "Source: https://data.giss.nasa.gov/gistemp/tabledata_v4/NH.Ts+dSST.txt") +
facet_wrap(~ interval, ncol = 1) +
theme_bw() +
theme(legend.position = "none") +
NULL

Calculating average annual anomalies.
#creating yearly averages
average_annual_anomaly <- tidyweather %>%
group_by(Year) %>% #grouping data by Year
# creating summaries for mean delta
summarise(mean_delta = mean(delta, na.rm=TRUE))
#plotting the data:
ggplot(average_annual_anomaly,
aes (x = Year,
y = mean_delta)) +
geom_point() +
theme_bw() +
geom_smooth(method = "loess") +
labs(title = "Average annual anomalies by year",
y = "Average annual anomalies",
caption = "Source: https://data.giss.nasa.gov/gistemp/tabledata_v4/NH.Ts+dSST.txt") +
NULL

Confidence Interval for delta
NASA points out on its website:
"A one-degree global change is significant because it takes a vast amount of heat to warm all the oceans, atmosphere, and land by that much. In the past, a one- to two-degree drop was all it took to plunge the Earth into the Little Ice Age."
We construct a 95% confidence interval for the average delta since 2011 in two ways: with the t-based formula and with a bootstrap simulation using the infer package. Both are computed from the monthly deltas in the 2011-present interval.
formula_ci <- comparison %>%
filter(interval == "2011-present") %>% # choose the interval 2011-present
filter(!is.na(delta)) %>% # drop NA observations in delta
summarise(count = n(),
t = qt(0.975, count-1), # use qt with probability and degrees of freedom
mean = mean(delta), # calculate mean
sd = sd(delta), # calculate sd
se = sd(delta)/sqrt(count), # calculate se
margin = t * se, # calculate margin of error
lower = mean - margin, # calculate lower bound
upper = mean + margin #calculate upper bound
)
#print out formula_CI
formula_ci
## # A tibble: 1 × 8
## count t mean sd se margin lower upper
## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 140 1.98 1.07 0.265 0.0224 0.0443 1.02 1.11
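The summary above implements the usual t-based interval for a mean. Written out with the rounded values from the output (n = 140, sample mean 1.07, standard deviation s = 0.265), it reads roughly:

```latex
\bar{x} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}
= 1.07 \pm 1.98 \times \frac{0.265}{\sqrt{140}}
\approx 1.07 \pm 0.044
```

This agrees with the reported margin of 0.0443 and, up to rounding of the displayed mean, with the reported bounds of about 1.02 and 1.11.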
library(infer)
set.seed(1234)
boot_ci <- comparison %>%
filter(interval == "2011-present") %>% # choose the interval 2011-present
filter(!is.na(delta)) %>% # drop NA observations in delta
specify(response = delta) %>% # specify the variable of interest
generate(reps = 1000, type = "bootstrap") %>% # extract 1000 bootstrap samples
calculate(stat = "mean") %>% # calculate sample means from each bootstrap sample
get_confidence_interval(level = 0.95, type = "percentile") # calculate confidence interval of this analysis
# Display confidence interval
boot_ci
## # A tibble: 1 × 2
## lower_ci upper_ci
## <dbl> <dbl>
## 1 1.02 1.11
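To make the percentile bootstrap less of a black box, here is a minimal base-R sketch of the idea behind the infer pipeline above. It is an illustration, not a reproduction of infer's internals. The vector `delta_sim` is simulated and stands in for the real monthly deltas, so the resulting numbers will not match the output above exactly.

```r
set.seed(1234)

# Simulated stand-in for the monthly deltas since 2011. These numbers are
# randomly generated for illustration; they are NOT the NASA observations.
delta_sim <- rnorm(140, mean = 1.07, sd = 0.265)

# Draw 1000 bootstrap resamples (same size, with replacement) and
# record the mean of each one
boot_means <- replicate(1000, mean(sample(delta_sim, replace = TRUE)))

# Percentile interval: the 2.5th and 97.5th percentiles of the bootstrap means
boot_ci_manual <- quantile(boot_means, probs = c(0.025, 0.975))
boot_ci_manual
```

The interval comes from the spread of the resampled means, so it does not rely on a t critical value or an explicit standard error.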
Two methods were used to construct the 95% confidence interval in this example. The first filters the 2011-present interval and computes the interval from summary statistics (mean, standard error, and the t critical value). The second uses the ‘infer’ package to bootstrap the mean and takes the percentile interval directly, without any additional summary statistics. The two methods agree: both give an interval of roughly 1.02 to 1.11. The average anomaly since 2011 therefore exceeds 1 degree even at the lower bound of the 95% confidence interval. This suggests that large anomalies are likely to become even more frequent and significant than before.