Software and Coding Resources

Data, coding and doing statistics with R.  A nowhere-near-exhaustive list of things I have found useful or even invaluable from the many experts generously putting things on the Web.  Here mainly as a reference for myself, but you never know ...

General

  • R for Data Science by Garrett Grolemund and Hadley Wickham.  A getting-started guide that explains the basics of an analysis workflow and then takes the reader through practical examples using R.  This is a great place to start if you are new to R and tackling an actual project from first principles with little experience.
  • Data Science at the Command Line by Jeroen Janssens.  A great resource if you want to leverage the power of Linux (and Unix-alikes) to script large data science analyses.  Covers data cleaning using Unix tools, talking to internet-based databases, and the kinds of file-management scripts one needs for preparing and processing large data sets (e.g. fMRI, where you have large structured directories of data for participants, multiple scans or experiments).

Visualisation

  • Fundamentals of Data Visualization by Claus O. Wilke.  If you want to look at beautiful data, but aren't sure where to start, this book has a gallery and then chapters dedicated to explaining how to produce the plots, with source code for R.
  • An R toolbox called funModeling - covered in the Data Science Live Book by Pablo Casas - contains useful tools for quickly visualising large multi-variable data sets (which, unless you know the lattice, ggplot2, dplyr and reshape packages well, can be time-consuming).

Specific Topics

  • A short introduction to linear regression models in R from Chris Brown's lab - helpful if you want to try out R on a regression problem and understand how your model relates to the code.  It also covers some introductory diagnostics of model performance.
  • When you have data pre-intervention and post-intervention (so common in biomedical work), this tutorial by Keith Goldfeld explains what to do (and how to do it in R).

Experimental Stuff

In short, things I've found that look exciting and worth trying out, but that I've yet to get around to ... Listed here more as a reminder for me to look at them.

  • Robert Kubinec's paper on ideal point models (a kind of latent variable model) using Stan and the idealstan package for R - including their relationship to dimension reduction.