This is a series of tutorials on improving the reproducibility of data analysis for those doing microbial ecology research. Although the materials focus on issues in microbial ecology, the principles are broadly applicable. The series is not designed to teach you R or mothur; the tutorials use both, but you could use other tools (e.g. Python, QIIME) to achieve the same goals. This workshop focuses on command line practices (e.g. bash), scripting languages (e.g. mothur, R), version control (e.g. git), automation (e.g. make), and literate programming (e.g. R Markdown). These are the tools used in the Schloss lab to help improve the reproducibility of our manuscripts. By completing the activities in the tutorials, you will be listed on the Reproducible Research Tutorial Honor Roll, which provides a certification of your training.
To get started, the outline in the Tutorial section below provides links to slides that correspond to each lesson. Hovering your mouse over each tutorial title will reveal links for the slide deck and blue YouTube icons, which will take you to videos of Pat Schloss leading learners through the slide decks. Some tutorials are rather lengthy, so links to "chapters" within the slides and video are provided. Within each slide deck, pressing "p" should open the presenter notes in your browser; these are a transcription of the audio from the videos. Each video runs 30 to 90 minutes; however, you will likely need longer to become comfortable with the material and to do the exercises yourself.
The tutorials are directed primarily towards researchers who are active in microbiome-based research. Throughout the materials there are numerous discussion points and activities that ask learners to engage their research group and group director in conversations and activities. Group directors will also likely find the videos useful as a broad overview of how to handle issues of reproducibility in microbiome research. Although the tutorial series is directed towards microbiome researchers, the topics it covers should be of general interest to most microbiologists and scientists.
Much has been written on reproducibility over the past few years. These short papers provide a useful background for the overall scope of these materials and should be read before starting:
A big pain in making your analysis reproducible is being explicit about the methods used to perform the analysis. The same goes for this series of tutorials! I strongly recommend either setting up an AWS account and launching an instance from an AMI, or learning to use your local high performance computing (HPC) facility, so that you can more easily transition from this tutorial to your future analyses. This is covered in the fourth tutorial, Using high performance computers. Regardless of the operating system you are using, here's what you'll need to work through the tutorials...
Part of the justification for my recommendation to use either AWS or your local HPC is that these will likely already have everything installed. Most of the tools will also already be installed if you are running Linux or Mac OS X. If you are using Mac OS X, Homebrew is your friend for installing various Linux-based programs. For Windows users, running Windows 10 with bash, or installing Git Bash and then installing R and make, will likely get you where you need to be.
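If you are unsure what is already installed on your machine, a quick check from the command line can help. The following is a minimal sketch, not part of the original tutorial materials: the function name `check_tools` is hypothetical, and the list of tools passed to it is an assumption based on the tools named in the introduction (bash, git, make, R, mothur). R Markdown is an R package rather than a command, so it is not checked here.

```shell
#!/bin/sh
# Hypothetical helper: report whether each named tool can be found on PATH.
# Succeeds (exit status 0) only if every tool was found.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Assumed tool list, taken from the tools mentioned in the introduction.
check_tools bash git make R mothur ||
  echo "Install the missing tools before starting the tutorials."
```

Anything reported as missing can then be installed with your system's package manager (for example, Homebrew on Mac OS X) before you start.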
We love to get feedback and improvements from others. That's the idea behind Riffomonas - that we riff on each other's work to make it better! If you would like to contribute to the project or even ask a question, please feel free to file an issue on our GitHub repository.