This site contains lesson materials developed by Justin Kitzes for teaching introductory data science skills to practicing scientists. The lessons introduce the core best practices needed to make computational research efficient, accurate, maintainable, and reproducible. The current version (v1.0) of these materials covers the shell, scientific programming, version control, unit testing, and reproducible workflows.
If you are a beginning data scientist looking for a self-guided tutorial, I suggest that you work through the lessons in the order presented below. You’ll need to install a few software packages as described in the setup page.
If you are a Software Carpentry instructor who would like to use these lessons in a workshop, refer to the GitHub repo associated with this site, which contains instructions for adding these lessons to the standard Software Carpentry workshop repo.
These lessons were developed by me, Justin Kitzes, mostly in the context of teaching two-day Software Carpentry workshops at the novice level. In my day job, I am a quantitative ecologist at the University of California, Berkeley, where I develop new theory and data sets to predict the effects of land use and climate change on biodiversity.
All materials here are licensed under the Creative Commons Attribution 4.0 International License, allowing you to use, share, and adapt them however you see fit as long as you credit me as the original author.