This post is written by Tasia North, an undergraduate student who’s working with us on the Bad Breakup project. As part of our research plan for this project, we’re identifying barriers to data reuse from publicly shared, NSF-produced data sources. Tasia is working on a project that examines patterns within tri-trophic interactions in long-term data, basically asking: do significant trends move between trophic levels, and if they do, how? But in order to do it, they were first tasked with finding a few representative sets of data. I wanted them to have an authentic experience, so the instructions were simply: data like this exist, here’s a database of datasets, now see what information you can find to support this investigation, and write down what you find. This is Tasia’s first blog post with their reflections on the experience!
I’m the new undergrad working here at the Bahlai lab. If you’ve been following along with the blog, you’re aware of the Bad Breakup project that’s been going on. This project takes a long-term data set and breaks it up into shorter clumps to look at the trends within each one. This will allow us to quantify how often we are wrong when we base conclusions on three- or four-year studies.
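To make the "break it into shorter clumps" idea concrete, here is a minimal sketch in Python. It is not the lab's actual algorithm. It assumes one observation per year, fits a straight line to every run of consecutive years, and counts how often a short window's trend points the opposite way from the full series. The function names and the example numbers are made up for illustration.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def window_slopes(years, values, window=3):
    """Slope for every run of `window` consecutive observations."""
    results = []
    for start in range(len(years) - window + 1):
        xs = years[start:start + window]
        ys = values[start:start + window]
        results.append((xs[0], xs[-1], slope(xs, ys)))
    return results


def fraction_misleading(years, values, window=3):
    """Share of short windows whose trend direction disagrees with the full series."""
    full = slope(years, values)
    short = window_slopes(years, values, window)
    wrong = sum(1 for _, _, s in short if (s > 0) != (full > 0))
    return wrong / len(short)


if __name__ == "__main__":
    # Hypothetical yearly counts, not real LTER data.
    years = list(range(2000, 2012))
    counts = [10, 12, 9, 13, 15, 12, 16, 18, 15, 19, 21, 18]
    print(fraction_misleading(years, counts, window=3))
```

A real analysis would also need to decide what counts as a "significant" trend, which this sketch leaves out by comparing only the sign of the slope.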
We are digging through approximately 58,000 datasets from the US-LTER. My task is to continue the work on tritrophic interactions that Julia started. She had created a list of sites that are likely to have the data needed, and the organisms I can look for. I needed to take that list, sort through the available LTER data to find each data set, determine whether it spans at least 12 years, and check that the data are usable and accessible.
Easy enough, right?
So a few things about me might be useful for context here. As stated, I am an undergraduate student studying Ecology and Conservation Biology. I’ve essentially no experience with large-scale data management, and no experience using the LTER website or getting data from it. In fact, I had to google to find the LTER website, since I had never used it before and thought everyone was saying LTR. In other words, I am a newb at this. However, I am armed with four and a half years of college experience (super seniors represent!), and I’m a millennial with the standard ‘navigating internet and sorting through stuff’ skills that are common to my generation. Someone with my education level, computer skills, and the reasonable level of guidance that I have should be able to navigate this site and find the information that I need. Here’s a step-by-step walkthrough of how successful I was at navigating these sites, what I found, and also some memes to express the feelings that arose during this experience.
The first thing I did was google the LTER website, which takes me to the data portal. Now, Christie wrote in the last blog post that this portal was, erm, less than helpful. But everyone else in the lab was busy when I started working on this and I didn’t want to interrupt anyone. So I got to find out about the data portal all on my own! I start out by clicking on the advanced search option. According to the list Julia gave me, there is probably a survey of small mammals at the Konza Prairie LTER site that is at least 12 years long. So I type in small mammals, select Konza Prairie, and I am presented with . . . this . . .
As you can see, nowhere does it say how many years are included in the data set. It only lists the publication date, which is of no use to me if I want to know how long they studied something. In order to find the length of study, I have to click on the title, scroll down, find the metadata report, click on that, and then scroll down to find the years.
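The "how many years does this cover?" check could in principle be read straight from the metadata file instead of by scrolling. The sketch below assumes the metadata follows the usual EML layout, where the study period sits under `temporalCoverage` as a begin and end `calendarDate`. The sample document, its title, and its dates are invented for illustration, and real files may be organized differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed metadata record (not a real LTER file).
SAMPLE_EML = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.0">
  <dataset>
    <title>Hypothetical small mammal survey</title>
    <coverage>
      <temporalCoverage>
        <rangeOfDates>
          <beginDate><calendarDate>1990-01-01</calendarDate></beginDate>
          <endDate><calendarDate>2005-12-31</calendarDate></endDate>
        </rangeOfDates>
      </temporalCoverage>
    </coverage>
  </dataset>
</eml:eml>"""


def study_years(eml_text):
    """Return (first_year, last_year), or None if no date range is found."""
    root = ET.fromstring(eml_text)
    base = ".//temporalCoverage/rangeOfDates/"
    begin = root.findtext(base + "beginDate/calendarDate")
    end = root.findtext(base + "endDate/calendarDate")
    if begin is None or end is None:
        return None
    # Dates are assumed to start with a four-digit year (YYYY or YYYY-MM-DD).
    return int(begin[:4]), int(end[:4])


def is_long_term(eml_text, min_years=12):
    """True if the record covers at least `min_years` calendar years."""
    span = study_years(eml_text)
    if span is None:
        return False
    first, last = span
    return last - first + 1 >= min_years


if __name__ == "__main__":
    print(study_years(SAMPLE_EML), is_long_term(SAMPLE_EML))
```

Note that a begin and end date only say when the study started and stopped. They do not guarantee that every year in between was actually sampled, so the data file itself still needs a look.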
I have spent literal hours over the last couple of weeks searching for keywords that will hopefully bring up what I need, clicking on a title, then clicking on the metadata report, and then scrolling all the way down just to see something like this:
This was a huge time suck and mildly frustrating, to say the least. I was about ready to take my extensive credentials as a *checks notes* Mildly Annoyed Undergrad™ and march right up to the LTER office and demand they change their name to the Year Long Ecological Research Network. In fact, I was so peeved I took a break to make this extremely niche meme that about 6 people will think is funny.
Finally though, after sifting through what feels like a million data sets, I find one that actually goes for more than a few years. A bit of clicking around brings me to an Excel spreadsheet with the data on it.
Now I just need to determine if this data on small mammals is usable. Thankfully, it looks complete and without any weird blank spaces or scary-looking errors (an earlier Excel file I found had an error code of -99999, and that was a scary-looking data sheet if I’d ever seen one).
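A value like -99999 is usually a placeholder for "no data" and not a real measurement, so it has to be turned into a proper missing value before anyone averages it. Here is a minimal sketch of that check, assuming the spreadsheet has been exported to CSV. The column names, the rows, and the choice of which codes mean "missing" are all assumptions for illustration; the data set's own metadata is the authority on what its codes mean.

```python
import csv
import io

# Assumption: these strings mark a missing value in the file being read.
MISSING_CODES = {"-99999", ""}

# Hypothetical rows, not real LTER data.
SAMPLE_CSV = """year,season,watershed,count
1999,fall,A,4
2000,fall,A,-99999
2001,fall,A,7
"""


def read_counts(text, column="count"):
    """Read CSV text, replacing missing-value codes in `column` with None."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        raw = row[column].strip()
        row[column] = None if raw in MISSING_CODES else float(raw)
        rows.append(row)
    return rows


def count_missing(rows, column="count"):
    """Number of rows where `column` has no usable value."""
    return sum(1 for row in rows if row[column] is None)


if __name__ == "__main__":
    rows = read_counts(SAMPLE_CSV)
    print(count_missing(rows), "of", len(rows), "rows are missing a count")
```

Counting the missing rows up front gives a quick sense of whether a data set is complete enough to be worth cleaning.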
Most of the spreadsheet is logical. There’s the year (this was out of order but clearly labeled, so it’s fine), the season, and a watershed ID number. All of that’s fine. Then we get to the actual data: it’s a whole bunch of acronyms, followed by a series of numbers.
Now, it may be a cool science thing that I’m not privy to, but this spreadsheet is so full of acronyms that it’s essentially illegible to an outsider unfamiliar with the system. Isn’t the goal of these types of data sets to allow future scientists to come in and reuse the data with relative ease? Well, that’s what the metadata is for!
Thankfully, there was an easy-to-find (it was not) and logically labeled (it also was not) file attached, named knb-lter-knz.88.7.txt. This file is not to be confused with knb-lter-knz.88.7.report.xml or knb-lter-knz.88.7.xml. These other two files contain… information (it’s actually probably really important stuff, but I don’t know what any of it means yet). Thankfully, the metadata was mostly legible and explained the acronyms clearly. It took me a couple of extra clicks, but I think this data set will work for what I need!
Next steps are cleaning up the data!