So, I survived my first time as a Data Carpentry instructor. Yes, it was a month ago, but I thought it would be good to take some time to reflect. Also, I’ve been busy. And travelling. And [excuses].
It was a lot of fun, although stressful. Because this was the first time I, or any of the instructors, had taught the freshly prepared materials, we weren’t sure how they were going to hit, or miss. We had a Mopad where students were able to ask questions, and it was invaluable for getting real-time feedback as the lesson was being taught.* We, as instructors, could see what was hitting, and what was missing.
The lesson I taught focused on using spreadsheets, particularly Excel, for basic data handling and operations prior to analysis, and followed the general outline:
2. Formatting data tables in spreadsheets
3. Common formatting mistakes and the problems they cause
4. Dates as data
5. Basic quality control in spreadsheets
6. Exporting data from spreadsheets
If you read through those lessons, you’ll find a lot of the material looks pretty familiar. As it turns out, sharing my thoughts on data management through this blog provided a very convenient starting point for a lot of that content. Neat!
If you want my humble opinion, I think the material provided in the lesson is fantastic.** All the pieces are there to help a user make a dataset that I would be ecstatic to receive, as a data manager and analyst. However, after delivering the lesson, I realized that there was a major factor I had overlooked: motivating the lesson. I did not explicitly tell my students why adopting these practices would help them, or why they were good, or right. I was so worried about fitting in all the HOW that I forgot to explain any of the WHY. Basically, I realized, I had become the condescending IT guy.
Without providing motivations in the lesson upfront, I was essentially saying, “What, you mean the motivations for reproducible practice aren’t self-evident? You should always organize your data in a way that’s best for me to work with it.”
The motivations seem so bleeding obvious to me. But I’ve spent months of my life trying to make sense of #otherpeoplesdata. I’ve now helped at least a dozen students who came to me with data and analysis problems late in their grad programs, only to realize that their data wasn’t in an analyzable format, and the clock was ticking for thesis completion. But why does this become such a universal problem? I think that when something becomes “obvious” to a person, it gives them blinders to the perspectives of others. Most of my students in this class were grad students fairly early in their respective programs. Almost none had any formal training in data handling or analysis, and many had not gotten beyond the data collection part of their thesis work. These students arranged their data in a way that made sense to them and was practical while they were entering it. Going in, guns blazing, and telling them they’re wrong without first offering a compelling reason why is certainly not going to persuade them to my viewpoint, especially if they haven’t encountered any problems with how they’re doing things. And I was asking them to make big changes in how they did things. Why? If it ain’t broke, don’t fix it.
So, I learned something big teaching this class: we need to emphasize the why before we even get to the how.
Why make your data understandable to others? (I use these practices you say are bad to make my data more understandable to me. What am I hurting?)
Why focus on machine readability in data, possibly even at the expense of human readability?*** (Isn’t the end user of my data a human? Why the focus on making our data fit the needs of programs like R?)
Why do we want to get data out of Excel? (It plots, it does the calculations I need; why the hate?)
Why worry about reproducible data management? (No one’s actually doing reproducible research, are they?)
So, next time I teach this, or anything else, I’m going to have a careful think about WHY I’m teaching it, and also WHY students would want to learn it. And I’m going to front-load my lesson with these motivations. Teaching how is great, but if your students don’t know why, the lesson is going to fall flat. And exciting topics**** like data management should NEVER fall flat.
* Although, also, a bit unnerving. Often, when I’m lecturing, I just get “in the zone,” where I can forget about my deeply debilitating social anxieties, reinforced through years as a nerdy scientist and whatnot. This was a constant reminder: “You are teaching! YOU ARE TEACHING RIGHT NOW! PEOPLE ARE WATCHING YOOOOOUUUU!”
**Signed, the always humble Dr. B.
*** This is actually a surprisingly big issue. People like to see data in cross-tab format rather than list format, because it makes relationships between things easier to see, but most analytic tools further down the line need your data in list format. If you have reasonable Excel skillz, it’s easy to turn a list into a cross-tab, but if you can’t code, it’s A NIGHTMARE to turn a cross-tab into a list. Believe me. I’ve done it. By hand. Back before I became a marginally 1337 R h4Xorz. (*coughdorkcoughcough*)
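For anyone stuck with that cross-tab-to-list chore today, here is a minimal sketch of the conversion in Python, using only the standard library. It assumes the cross-tab was saved from Excel as CSV, with the row variable in the first column and one column per category. The function name `crosstab_to_list` and the beetle and ant counts per site are made up for illustration; this is one way to do it, not the way.

```python
import csv
import io
import sys


def crosstab_to_list(rows, column_name, value_name):
    """Reshape a cross-tab (wide) table into list (long) format.

    rows[0] is the header: [row_label, category_1, category_2, ...].
    Every later row is:    [row_key,   value_1,    value_2,    ...].
    Returns a new header [row_label, column_name, value_name] followed by
    one row per cell of the cross-tab.
    """
    header, *body = rows
    row_label, *categories = header
    long_rows = [[row_label, column_name, value_name]]
    for row in body:
        # Blank lines in an exported sheet come through as empty rows; skip them.
        if not row:
            continue
        row_key, *values = row
        # Ragged rows usually mean merged or stray cells; fail loudly
        # instead of letting zip() silently drop values.
        if len(values) != len(categories):
            raise ValueError(
                f"row {row_key!r} has {len(values)} values, "
                f"expected {len(categories)}"
            )
        # One output row per cell: which row it came from, which
        # column it came from, and the value itself.
        for category, value in zip(categories, values):
            long_rows.append([row_key, category, value])
    return long_rows


# Hypothetical cross-tab, as it might look after saving a sheet as CSV.
CROSS_TAB_CSV = """site,beetles,ants
A,3,5
B,0,2
"""

if __name__ == "__main__":
    rows = list(csv.reader(io.StringIO(CROSS_TAB_CSV)))
    long_rows = crosstab_to_list(rows, column_name="taxon", value_name="count")
    # Write the list-format table back out as CSV on standard output.
    csv.writer(sys.stdout, lineterminator="\n").writerows(long_rows)
```

Run as-is, it should print the header `site,taxon,count` followed by four rows, one per cell of the cross-tab, starting with `A,beetles,3`. Values stay as text because `csv.reader` does no type conversion; turning the counts into numbers is left to the analysis step.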
**** No, really! Data management, insect physiology: things that seem dry at first can actually be quite exciting if you toss in some real-world examples, cool tricks, and interpretive dance for good measure.