One of the challenges we face in my field is that there is precious little data documenting small numbers of bugs.
In agricultural entomology, especially when you’re dealing with pest insects, we worry much more about too many insects than too few, and thus, most of the controlled, thorough experiments document populations that are high in number. When there are few pest insects, we don’t tend to study them because 1) they’re probably not causing problems (which is a fundamental underlying concept in Integrated Pest Management); 2) your summer students will go batty after spending days in the field, in the hot sun, and seeing only one #%@$#! aphid after processing countless plants; 3) zero-inflated data is a pain in the butt to analyse; and 4) you’re going to have to collect a LOT of data before you can see any patterns, making it expensive or impossible to do the work you want to do.
As I mentioned, in integrated pest management, we mainly care about what’s going on, and what we can do about it, in places where pests are causing, or have the potential to cause, damage. Yet pest populations are variable and patchy, and natural populations don’t always show up. We agricultural entomologists have a few ways of coping with this- we can conduct broad surveys to find hotspots and do our fieldwork there, or we can ‘seed’ research plots with pests on research stations.1 So the upshot is: with many pest species, we know a lot about what’s going on when populations are outbreaking, but we simply don’t study non-outbreaking populations very often. This means there’s lots of #otherpeoplesdata out there documenting lots of bugs, but research rarely systematically documents a lack of bugs.2
But what happens when you’re trying to understand why there are suddenly fewer insects? I was faced with this problem when working on my recent study. My coauthors and I noticed a shift in how ladybeetle populations were being regulated through time and needed a way to connect it to their primary prey, the soybean aphid. I knew, anecdotally, that soybean aphid had undergone a shift in dynamics as well,3 but I didn’t have data that directly showed that. Other researchers throughout the Midwest told similar stories- we could all still find aphid populations if we looked really, really hard, and we could use those fields for our studies of outbreaks- but that didn’t give me a good picture of what was happening with the greater population.
For a while, it seemed like unbiased data on aphid populations at the regional level didn’t exist.4, 5 But then, my coauthors and I realized that not all #otherpeoplesdata is neatly (or not-so-neatly) arranged in a spreadsheet, even if it’s publicly available. Sometimes, #otherpeoplesdata needs some legwork.
Extension reports. Eureka.
Land-grant universities in the US Midwest typically have a co-operative extension program. Agricultural extension educators are responsible for disseminating research information, helping stakeholders translate findings into practice, and providing services that support stakeholder decision-making. Extension agents often prepare materials such as newsletters for circulation to farmers, containing up-to-date crop management recommendations, pest alerts, etc. Their reports rarely publish raw data like the number of insects per plant in a given field; instead, they synthesize regional information and use expert opinion to interpret broad surveys. And Extension personnel were often charged with surveying for soybean aphid over broad geographic regions. Exactly what we needed.
These data came in the form of newsletters, websites, blogs and even podcasts- which means we had to sift through a lot of information in different formats to get what we needed. What we6 did was read/listen to/search through each and every extension news record7. For every mention of the term “aphid” in the context of soybean crop reports for four Midwestern states over a 13-year period, we recorded the date and copied, verbatim, what was being said about the aphid. Since this data came in narrative form, we had to use our understanding of the system and context cues to interpret it- for example, Extension agents described aphid infestations in more alarmist terms early in the aphid’s invasion, likely because back in 2001-2, we didn’t know how many aphids it would take to cause damage. Also, different agents used different language to describe aphid populations, and some were more or less precise than others, but most related infestation to the economic threshold of 250 aphids/plant. After combining all this narrative data, we looked for common elements and decided to classify each comment using a four-point ordinal classification system:
High = many fields in state surpassing economic threshold;
Moderate = some fields in one or more subregion exceeding economic threshold;
Spotty = rare fields exceeding economic threshold;
Low = few or no aphids detected.
Then, for each state in each year, we recorded the highest classification, corresponding to the severity of the aphid outbreak at the seasonal peak in population. When you put it all together, you get something that looks like this:
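If you wanted to sketch this classify-then-take-the-peak step in code, it might look something like the snippet below. To be clear, everything here is illustrative- the keyword rules and sample records are made up, and our real classification relied on expert reading of each comment in context, not keyword matching:

```python
from collections import defaultdict

# Ordinal scale from the four-point system: higher number = more severe.
SEVERITY = {"low": 0, "spotty": 1, "moderate": 2, "high": 3}

def classify(comment):
    """Toy keyword rules standing in for expert interpretation."""
    text = comment.lower()
    if "many fields" in text and "threshold" in text:
        return "high"
    if "some fields" in text and "threshold" in text:
        return "moderate"
    if "rare fields" in text or "isolated" in text:
        return "spotty"
    return "low"

# Hypothetical records: (state, year, verbatim comment from a newsletter).
records = [
    ("IA", 2003, "Many fields across the state are over the 250 aphid/plant threshold."),
    ("IA", 2003, "Isolated reports of aphids in the northeast."),
    ("MN", 2005, "Some fields in the southern region exceed the threshold."),
]

# For each state-year, keep the highest classification seen: the
# severity of the outbreak at its seasonal peak.
peak = defaultdict(lambda: "low")
for state, year, comment in records:
    level = classify(comment)
    if SEVERITY[level] > SEVERITY[peak[(state, year)]]:
        peak[(state, year)] = level

print(dict(peak))
# → {('IA', 2003): 'high', ('MN', 2005): 'moderate'}
```

The ordinal mapping is what makes “take the highest classification” well-defined: comments are compared on the severity scale, not as raw text.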
A lot of work to get it all together, but finally! Here we have, all in one place, a dataset that gives us a bigger-picture look at what’s been going on with a major agricultural pest over a long time series and through much of its invaded range. It’s not a beautiful, count-based dataset, but it provides insight where we just had hunches before. Ah, #otherpeoplesdata! Guess I didn’t need a time machine after all!
So- as it turns out, open sources of #otherpeoplesdata are around, and they’re very, very useful- but they don’t always look like data. In fact, many of the data-creators whose data we used in this study probably didn’t even realize they were practicing a form of open science. To my fellow data-wranglers: ask lots of questions- you never know when you’ll stumble on data-diamonds.
1. Which is great for understanding what’s going on WITHIN an outbreak, but gives a biased sample if you’re trying to interpret what’s going on at a regional level- these are not random samples- they’ve been selected *because* they have pests- so they don’t represent what’s going on with the population as a whole.
2. This is our own publication bias– you don’t see treatment effects if the bugs don’t show, so these studies are not typically publishable in conventional journals. But I’d argue there’s still lots of valuable information in these studies. Heck, that’s what I *am* arguing.
3. I have been working with soybean aphid, in some form, since 2004. In 2007, my life got harder, not just because I started my PhD that year, but because I started several field studies that were dependent on me finding *outbreak* populations of soybean aphid (that is, fields with aphid populations in excess of ~250 aphids per plant, the level where aphids start economically affecting the yield of soybean), and around that time, outbreaks became very rare in southwestern Ontario, where I was doing my studies.
4. And, unfortunately, building a time machine and hiring an army of student workers to send into the past to count all aphids on all soybean plants grown in the Midwest from the year 2000 on was slightly above my research budget. And possibly unethical.
5. This was a dark time. A dark, dataless time.
6. And by ‘we’ I mean our long-suffering technician, Julia, who is worth her weight in gold for suffering through all the crazy ideas for data collection I throw at her.
7. For example, here are some of the Iowa archives