I have a three-year-old daughter, so you don’t have to tell me how hard it is for some people to share. It’s hard when you perceive something as yours, when you love it and MOMMY! SHE TOOK IT AND IT’S MINE!!! MINE MINE!!!
But what about data? Who’s the boss of it?
It’s a matter of personal policy that, whenever possible*, the data I’m currently working on is publicly available. In fact, here are the two datasets I’ve been spending most of my time on for the past year or so:
And there they are. Yep.
But it’s not that simple, as it turns out. If it were, everyone would share everything, right? ** So, why aren’t you sharing your data?
This is a real question. And I don’t know the answer to it.
In fact, earlier, I tweeted:
— Christie Bahlai (@cbahlai) January 21, 2014
And I have heard nary a peep since then. Of course, I could be polling the converted, so I wanted to put this out there again.
The best impediments I can come up with from the side of a data-creator,*** real or perceived:
1) Getting scooped, i.e., jerks who take your data, pass it off as their own, get a paper in Nature, and laugh at you from atop their piles of scientific accolades. I’m not aware of any specific examples of this in my field, but we’re mostly laid-back ecology types. There’s lots of bug counting to go around. But this could be my good-natured Canadian naiveté.**** Does this happen?
2) Your data is inappropriate for sharing. Say it’s proprietary, sensitive, personal, or even dangerous. What are examples where this is the case, and how do you publish findings, then?
3) Your data is too messed up for anyone to be able to interpret but you. Well, that’s a solvable problem, and I want to help you fix it. See every other post I’ve written. Heck, call me. Don’t just leave it.
I really want to know: what are your personal impediments to sharing data? I think the key to creating truly open science is to understand what scientists perceive as roadblocks, and to work WITH them to remove the impediments. We can accomplish a lot more working together.
*whenever possible means whenever I’m working with data that I’m primarily responsible for, and my colleagues are okay with me sharing. Usually, they’re okay with it, because usually, they were sharing the data with me in the first place. But when I’m, say, consulting on the analysis for a grad student’s project, I treat those data as if they were confidential, and provide the student with guidance to encourage them to share. In case they’re hit by a bus, their legacy lives on. Nothing convinces a grad student more than a looming bus and the possibility of a legacy.
**I’m not going to go into detail about licensing in this post, but this IS a big issue in data sharing. In fact, it was a discussion with @davidjayharris, @ethanwhite and others that inspired this very post. For more on that, check this out. There are lots of other resources on the web, but that’s a good place to start.
***data-creators, legacy... am I buttering up the grad students enough?
****We’re all in this together, eh?