---
title: "NOAA Buoy Data"
date: "June 2016"
output: pdf_document
---

```{r echo=FALSE,message=FALSE, warning=FALSE}
require(mosaic)
require(ggplot2)
require(tidyr)
require(dplyr)
require(readr)
require(lattice)
```

# Introduction

The National Oceanic and Atmospheric Administration (NOAA) is the American federal agency in charge of collecting information and making decisions related to the oceans and the atmosphere. Throughout North America, it supplies weather stations located both along the coast and in the middle of the ocean (on buoys). Among other variables, the weather stations collect information on wind, humidity, temperature, visibility, and atmospheric pressure. The data are all publicly available on NOAA's website.

# Data information & loading data

All the buoys are listed at [http://www.ndbc.noaa.gov/to_station.shtml](http://www.ndbc.noaa.gov/to_station.shtml). The Santa Monica buoy information is at [http://www.ndbc.noaa.gov/station_page.php?station=46025](http://www.ndbc.noaa.gov/station_page.php?station=46025). The historical data are given at [http://www.ndbc.noaa.gov/station_history.php?station=46025](http://www.ndbc.noaa.gov/station_history.php?station=46025).

```{r eval=TRUE, echo=FALSE}
buoy_url <- "http://www.ndbc.noaa.gov/view_text_file.php?filename=46025h2014.txt.gz&dir=data/historical/stdmet/"

# Read the observations, skipping the two header lines
buoy_data <- read_table(buoy_url, skip=2, col_names=FALSE)

# Read the first header line on its own and split it into column names
temp <- read_table(buoy_url, n_max=1, col_names=FALSE)
temp <- unlist(strsplit(unlist(temp), "\\s+"))
names(buoy_data) <- temp
```

It is always a good idea to look at the data! One thing to notice is that some variables contain values coded as 99, 999, or 9999. From experience, we surmise that those values represent missing data and should be NA. Additionally, if we want to consider only the 2014 data, we should remove any observations from before 2014.

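
Before applying it to the buoy data, the recoding idea can be tried on a toy data frame. The following is a minimal sketch in base R: the values in `toy` are made up, and the pairing of each column with a missing-value code in `sentinels` is an assumption made for illustration, not something taken from NOAA documentation.

```{r}
# Toy rows mimicking a few buoy columns; the values are invented for illustration
toy <- data.frame(
  WVHT = c(1.2, 99, 0.8),
  MWD  = c(270, 999, 265),
  PRES = c(1015.2, 9999, 1013.9)
)

# Assumed missing-value code for each column
sentinels <- c(WVHT = 99, MWD = 999, PRES = 9999)

# Replace each column's code with NA, one column at a time
for (col in names(sentinels)) {
  toy[[col]][toy[[col]] == sentinels[[col]]] <- NA
}

toy
colSums(is.na(toy))
```

Keeping the codes in one named vector makes it easy to see which column is treated as missing at which value, and to add a column without repeating the replacement logic.
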
```{r}
summary(buoy_data)

# Recode the missing-value codes as NA, drop unused columns, keep only 2014
buoy_data <- buoy_data %>%
  mutate(WVHT = ifelse(WVHT==99, NA, WVHT)) %>%
  mutate(DPD = ifelse(DPD==99, NA, DPD)) %>%
  mutate(APD = ifelse(APD==99, NA, APD)) %>%
  mutate(MWD = ifelse(MWD==999, NA, MWD)) %>%
  mutate(PRES = ifelse(PRES==9999, NA, PRES)) %>%
  mutate(DEWP = ifelse(DEWP==99, NA, DEWP)) %>%
  select(-VIS, -TIDE) %>%
  filter(`#YY`==2014)

dim(buoy_data)
summary(buoy_data)
```

# Using dynamic data within a typical classroom

One might be interested in the difference between the water temperature (`WTMP`) and the air temperature (`ATMP`). Generally, the air temperature is cooler than the water temperature, but confidence intervals and prediction intervals allow us to quantify the difference. Note that the data lend themselves nicely to the idea of paired observations acting as a univariate sample.

As expected, a 95% confidence interval for the true mean difference in temperatures runs from 1.25 to 1.31 degrees Celsius. However, the 95% prediction interval shows that individual observations typically have a difference in water and air temperature between -1.5 degrees (air is warmer) and 4.06 degrees (water is warmer).

```{r}
# Paired difference: water temperature minus air temperature
buoy_data$TempDiff <- buoy_data$WTMP - buoy_data$ATMP
densityplot(~TempDiff, data=buoy_data)

# Intercept-only model, so the fitted value is the mean difference
tempdiff.mod <- lm(TempDiff ~ 1, data=buoy_data)
tempdiff.func <- makeFun(tempdiff.mod)
tempdiff.func()
tempdiff.func(interval="prediction")
tempdiff.func(interval="confidence")
```

# Thinking outside the box

The data are nicely set up to think about analyses in the time domain. Indeed, the autocorrelation function shows a clear 24-hour pattern for the wind speed variable.

```{r}
acf(buoy_data$WSPD, main="Series: Wind Speed", xlab="Lag (hours)")
```

Although a full analysis would warrant multiple years of data (so as to understand yearly trends), we can estimate the spectral density of the time series using a smoothed periodogram.

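
To get a feel for how a smoothed periodogram is read, it can help to try one on a series whose cycles are known in advance. The following is a minimal sketch on synthetic data, not on the buoy measurements: the series `x`, its 24-hour and 12-hour components, the noise level, and the choice `spans = 5` are all assumptions made for illustration.

```{r}
set.seed(1)

# 60 days of synthetic hourly observations
hours <- 0:(24 * 60 - 1)

# A strong 24-hour cycle, a weaker 12-hour cycle, and random noise
x <- sin(2 * pi * hours / 24) +
  0.5 * sin(2 * pi * hours / 12) +
  rnorm(length(hours), sd = 0.5)

# Smoothed periodogram, computed without plotting
pg <- spec.pgram(x, spans = 5, plot = FALSE)

# Frequencies are in cycles per observation (here, per hour),
# so the period in hours is one over the frequency
peak_freq <- pg$freq[which.max(pg$spec)]
peak_period <- 1 / peak_freq
peak_period
```

The period at the highest peak should land close to 24 hours, the dominant cycle built into the synthetic series. Smoothing spreads each peak over neighboring frequencies, so the value is approximate rather than exact.
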
```{r}
spec.pgram(buoy_data$WSPD, spans=c(50), xlab="Frequency = 1/period",
           main="Wind Speed, Smoothed Periodogram")

# Mark the frequencies that correspond to 12-hour and 24-hour periods
abline(v=c(1/12,1/24), lty=2)
```

In the smoothed periodogram, the x-axis is the frequency (one over the period) and the y-axis represents the (normalized) correlation between a cosine wave at that frequency and the time series. We can see that wind speed has strong correlation at periods of 12 hours and 24 hours.

### Additional ideas for analysis:

A more sophisticated analysis or longer project could include collecting data from multiple buoys, additional years, and/or additional information on storms: [https://www.ncdc.noaa.gov/stormevents/](https://www.ncdc.noaa.gov/stormevents/).
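
As a starting point for a multi-year or multi-buoy project, the file URLs can be generated rather than typed by hand. The following is a minimal sketch: `buoy_url_for` is a hypothetical helper, and it assumes that files for other stations and years follow the same URL pattern as the 2014 Santa Monica file used in this document, which has not been checked here.

```{r}
# Hypothetical helper: build the historical-file URL for a station and year,
# assuming the same pattern as the 2014 Santa Monica (46025) URL
buoy_url_for <- function(station, year) {
  paste0(
    "http://www.ndbc.noaa.gov/view_text_file.php?filename=",
    station, "h", year, ".txt.gz&dir=data/historical/stdmet/"
  )
}

# URLs for three consecutive years at the Santa Monica buoy
years <- 2012:2014
urls <- vapply(years, function(y) buoy_url_for("46025", y), character(1))
urls
```

Each URL could then be read with the same `read_table()` steps used for 2014 and the results stacked, for example with `dplyr::bind_rows()`. Whether every file has the same header layout and columns is an open question that should be checked before combining them.
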