Practical data management takes its show on the road

This title should really read “Practical data management takes its show ACROSS the road” because that’s what I’m actually doing. Next week, I’ll be a co-instructor for the second Data Carpentry bootcamp at MSU’s BEACON Center, which is right across the street from me. Who said agricultural ecologists don’t get to travel?

My part of this session will focus on best practices for spreadsheet users, data validation in spreadsheets, and a bit of higher-order data cleaning using applications like OpenRefine.

I’ve been burying myself in lesson planning for the past few days, and I think I have a workable plan. However, I wanted to crowd-source some commentary on my Spreadsheet lesson plan. The goal of this lesson is to get scientists who are just starting out to design their spreadsheets in ways that will give them (and their future data managers) minimal headaches down the road. Here it is, as a Google doc that you can comment on, but not edit*:

Using Spreadsheet Programs For Scientific Data

So, interested parties: I would like you to tell me if there’s anything major I’ve overlooked, any particular spreadsheet quirks you’d like to see covered, and so on. You might notice that some of the copy looks very familiar; it turns out some of the stuff I’ve already written for this blog came in quite handy.

I’m pretty excited for my first time as a Data Carpentry instructor! I’ll let you know how it goes.

*lest anyone vandalize it with references to body parts or comments about my character or my progenitor’s.


About cbahlai

Hi! I'm Christie and I'm an applied quantitative ecologist and new professor. I am an #otherpeoplesdata wrangler, stats enthusiast, and, of course, a bug counter. I cohabitate with five other vertebrates: one spouse, one first grader, one preschooler and two cats.

5 Responses to Practical data management takes its show on the road

  1. Jon Borrelli says:

    I think these are great notes. As someone else who has seen some pretty bad spreadsheets (although not nearly as bad as some of your examples), I am glad you are teaching those scientists of the future good practices!

  2. I love your best practices recommendations for spreadsheets. They are really useful. The one thing I would add is that people can use a data dictionary to elaborate on variable information. This allows you to use short variable names (concise strings) in the spreadsheet and thoroughly describe the variables (full variable name, general notes, formatting, known errors, etc.) in an external document. Since spreadsheet rows belong to observations, such variable information doesn’t fit into a spreadsheet. Also important to note: data dictionaries really help with data sharing.

  3. Pingback: Practical data management takes its show on the road | Viral Bioinformatics Resource Center

  4. Pingback: Data Dictionaries » Data Ab Initio

  5. citewave says:

    One of the things people (Windows users in particular) often overlook is the trouble caused by leaving spaces in the ID columns of spreadsheets. Glad to see that you included a comment on that in your document. Spaces in column names can cause problems down the road if you are going to import your data into a script or other program, so it's best to always use underscores or camel case. Thanks!
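
To make the point about spaces in column names concrete, here is a minimal Python sketch of one way to tidy headers before handing a spreadsheet export to a script. It is not taken from the lesson plan: the function name `to_snake_case` and the sample headers are made up for illustration, and it uses underscores, though camel case would work just as well.

```python
import re


def to_snake_case(header: str) -> str:
    """Turn a spreadsheet column header into a script-friendly name.

    Hypothetical helper for illustration only.
    """
    # Trim stray leading/trailing whitespace first.
    cleaned = header.strip()
    # Collapse any run of non-alphanumeric characters into one underscore.
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", cleaned)
    # Drop underscores left over at either end, then lowercase.
    return cleaned.strip("_").lower()


# Invented example headers of the kind that tend to trip up imports.
headers = ["Plot ID", " Sample Date ", "Aphid count (per plant)"]
clean_headers = [to_snake_case(h) for h in headers]
print(clean_headers)
```

Renaming headers once, right after export, means every later script can refer to columns by a plain identifier instead of quoting names that contain spaces or parentheses.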
