Bad breakups (with your data)

Hi all,

It’s been a long time. It turns out this assistant professoring thing does not leave me with a lot of time. Hmm. Who knew? Since we last spoke, I’ve been building my lab- both the physical space:

and the online infrastructure.

I’m building collaborations with friends and colleagues all over the place, and working to help finish the student projects that I became involved with through my previous positions at Michigan State. I’m also working hard to get my uniquely Bahlai Lab research vision off the ground. A big part of that is people.

I have people now! Julia, my long-suffering technician, puts up with my barrage of ridiculous ideas and helps me bring the vision to reality. She’s also in training to be a butt-kicking librarian. Cheyan, PhD student, is studying how ‘non-traditional’ data sources (with a focus on citizen science) can be used to develop and engage people in long-term ecosystem management. Katie, PhD student, is studying how we can measure insect-mediated ecosystem services and functions in green infrastructure projects. Christian, PhD student, is examining the use cases and factors affecting the quality and quantity of citizen science data, in the context of Odonate conservation under climate and habitat change.* (Yes, you counted right: that’s 3 PhD students already.) And my undergraduate project student, Erin, will be working with me on my next big thing.

Which brings us to IT. My next big thing.

A while back, I had an idea. It wasn’t completely my idea- it came out of conversations with a few people. As you know, I’m interested in big(ish) data- finding trends from patterns we see when we put together a lot of information about a system.** But the big is a combination of a lot of littles.***

When we look at systems for a while we get to see a lot more of the whole, big, messy variability of a system. I’ll illustrate with an example.

Y’all know about the fireflies. No? Okay, it’s been a while and I can remind you about the fireflies. I recorded a video****:

TL;DW: My Reproducible Quantitative Methods class produced a paper about firefly phenology. Fancy people liked it and it got media attention.

During my interview about the piece, the reporter kept going back to the idea of trajectory. Yes, sure, phenology, cool, but what is the *trajectory* of firefly populations? ARE firefly populations in decline?

Here I had one of the longest time series known to science documenting systematic collection of fireflies, and I could not answer this seemingly simple question. For your reference, here are the data from our site, grouped by plant community of capture:


My reply to the reporter was “I don’t know. But if I had less data I would tell you, and I’d be surer of my answer.”

A little tongue in cheek, to be certain, but isn’t that what we’re doing every day in science? One of the fundamental questions we ask in ecology is “where is my system going?”, and we’re making extrapolations based on the data we have available. We know it’s not always the right thing to do, but we do our best, looking at the world through the limited windows available to us. In ecology, the three-year study is pretty much the standard:

We know that this is problematic. This is why the USLTER network exists. People get that. But we’ve still got to do work in the shorter time scales. We gotta graduate students. My grants don’t go on forever. We can learn lots of things from studying systems in the short term.


How do we know when these short term studies are misleading us? What are the effects of the time period we’re looking at, the length of time we’re watching, the type of process, and how often we’re measuring? And how the heck can we test this, if we’re mostly doing short term studies?

Friends, I had an idea. Why not re-analyse long time series data as if they were short term data? Break it up in all sorts of objectively bad ways (THERE! I EXPLAINED THE POST TITLE), analyse using standard statistical methods, collect these statistics up, and look for trends in the conclusions we reach, given different ways of collecting the data?
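Here is a minimal sketch of that breaking-up idea in Python. It is an illustration only, not the project's actual algorithm: the function names are hypothetical, a plain least-squares slope stands in for "standard statistical methods", and the counts are made-up toy numbers, not the firefly data.

```python
from statistics import mean


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def break_up(years, values, window):
    """Yield every run of `window` consecutive observations."""
    if not 2 <= window <= len(years):
        raise ValueError("window must be between 2 and the series length")
    for start in range(len(years) - window + 1):
        stop = start + window
        yield years[start:stop], values[start:stop]


def window_slopes(years, values, window):
    """Return (first year, slope) for each pretend short-term study."""
    return [(yrs[0], ols_slope(yrs, vals))
            for yrs, vals in break_up(years, values, window)]


def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def share_agreeing(years, values, window):
    """Fraction of windows whose trend direction matches the full series."""
    full_direction = sign(ols_slope(years, values))
    slopes = [slope for _, slope in window_slopes(years, values, window)]
    return sum(sign(s) == full_direction for s in slopes) / len(slopes)


# Toy data (invented for this example): a noisy upward trend over 12 years.
years = list(range(2000, 2012))
counts = [10, 12, 9, 13, 11, 15, 12, 16, 14, 18, 15, 19]

for first_year, slope in window_slopes(years, counts, window=3):
    print(first_year, round(slope, 2))
print("share of 3-year windows agreeing with the full series:",
      share_agreeing(years, counts, window=3))
```

Each window plays the role of one three-year study that happened to start in a different year. Comparing only the direction of the slope is a deliberately crude choice to keep the sketch dependency-free; a fuller version would also record standard errors or p-values for each window, and vary the window length and sampling frequency.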

All this would take is a relatively simple algorithm, a whole pile of time series data, and some money, time and patience for the personnel to drop data in and collect the stuff that comes out of the algorithm machine. I can write an algorithm, and hey, the USLTER has lots of data that would be appropriate to get this done, but the latter components are a little harder for a new professor to come by. So, I put it on the back burner.

Anyway, this summer, my friend and collaborator Kaitlin Stack Whitney brought this grant opportunity to my attention.

EAGER proposals for high-risk/high-reward innovative studies that address development and testing of important science and engineering ideas and theories through use of existing data. […..] proposals must:

Involve, for data proposed for use, publicly-available data generated through NSF funding; and

Agree to make public the details about their experiences reusing the data, including especially challenges associated with that reuse.


Hey! I, in fact, am a professional at reusing data produced by NSF projects and publicly documenting my experiences using said data! I [cough] kinda have a blog about it. So me, Kaitlin, and my technician Julia sat down.

We wrote a proposal.

And it got funded.


Three junior women scientists getting an NSF award? No Big Deal.***** We’re getting this project underway, now. My undergraduate, Erin, will be gathering candidate data sets for trial bulk runs of the algorithm over the winter semester. Collaborators Sarah Cusser and Nick Haddad at Michigan State are using the algorithm on a focal dataset to do a deep dive into how patterns of observations affect conclusions in agricultural systems. Basically, we’re going to figure out once and for all- how often are we wrong when we look at our data?

This is going to be big, my friends. Stay tuned.

*Note to self, get Christian on the website!! another item for The List.

**thank you for coming to my TED talk that’s not actually a TED talk.

*** you can put that wisdom on my tombstone

****thank you for coming to my other TED talk that’s not actually a TED talk.

*****This is a big deal and I am pretty excited about it.

About cbahlai

Hi! I'm Christie and I'm a computational ecologist and professor. I am an #otherpeoplesdata wrangler, stats enthusiast, and, of course, a bug counter. I cohabitate with five other vertebrates: one spouse, one spirited grade schooler, one energetic preschooler and two cats.