R Data Analysis Workshop – Efficient and reproducible data analysis techniques for ecologists by Ben Fanson

Lecturer: Dr Ben Fanson
Date: First week of Sept to mid-Nov 2014 (10 weekly meetings)
Time and Place: To be decided closer to start

Abstract:
With the exponential increase in the amount of data being collected, data analysis is quickly becoming a rate-limiting step in scientific discovery. For these larger and more complex datasets, basic data analysis approaches (e.g. Excel, cut/paste, drop-down menu graphing) are inefficient, error-prone, and hard to reproduce.

Consequently, advanced data analysis skills are becoming essential for most researchers. The aim of this workshop is to develop the skills that are fundamental to all analysis approaches. Participants will learn efficient workflows, used by data analysts, that minimize errors and promote reproducibility.

Participants will implement workflows using the most widely used analysis platform in academia: R. By the end of the workshop, participants will understand proper data management, how to clean and summarise data, and methods for visualising data.

Workshop Structure:
The first 10 workshop sessions will be held weekly, lasting up to 75 minutes, and will consist of a ~30-40 min presentation followed by hands-on programming exercises.

These 10 sessions will provide the core concepts and skills for analysing data. As with any language, participants will get the most out of the workshop if they practice these skills outside the sessions. Following these 10 sessions, if there is interest, special topic sessions could be held to discuss more specific analysis approaches and packages (e.g. lmer from the lme4 package in R, geomapping in R).

Tentative Outline of Topics:

  1. Data analysis workflow and management strategies
  2. R fundamentals 1
    • Editors, initiation files, importing/exporting data, R classes
  3. R fundamentals 2
    • Regular expressions, missing values, dates
  4. Data manipulation 1: Basic dataset functions
    • Subsetting, sorting, creating new variables
  5. Data manipulation 2: String functions
    • Concatenating, splitting, trimming, replacing
  6. Data manipulation 3: Advanced dataset functions
    • Merging (inner joins, left joins, outer joins), transposing
  7. Grouping/flow control functions
    • Loops, apply, if-else, by
  8. Writing functions
  9. Visualising data 1: Base graphics
  10. Visualising data 2: ggplot2 graphics

If you have any questions, please feel free to contact Ben Fanson.