The calm before the (algorithmic) storm

Kaitlin, Julia and I have been busy. Not that you’d know. It’s been a lot of behind the scenes work so far on this project- it’s started with weekly meetings where we talk out our ideas and approaches. You’ll recall, the bad breakup project is an algorithm I designed to mine data from the perspective of a human who was looking to find statistically significant trends.  Essentially, it asks “Given a snippet of data, what conclusion would a typical biologist applying typical statistics make about these data?”* We were going to use the 58,000 datasets compiled through the US-LTER over its 40 year history as our data fuel- with the longest series (the complete record) serving as a proxy for truth.**

Turns out, this problem that we’re working on is pretty big, so establishing exactly how to approach these 58,000 archival datasets is our main challenge. It looks kinda like this:

A crudely constructed powerpoint flowchart. Top row: Hypothetical workflow- Get data > put data in the thing > ?????? > Profit!  Bottom row: Actual workflow-  ?????? > Put data in the thing > ?????? > Profit!

Our collaborator, Sarah Cusser at Michigan State, has had amazing successes in the context of a ‘deep look’ at a single long term experiment- she has been examining how long we have to watch a system to see treatment differences across several common agricultural practices- and how consistent this effect is. Her findings, in prep right now, can be used to make recommendations about how we make recommendations to farmers- essentially when can we be confident our recommendations are right, and when should we moderate our confidence, when guiding how farmers select practices.

Building on the idea of the deep dive, Kaitlin got an idea- what if we specifically sought out datasets that document tritrophic interactions from the LTER- we could use these focal datasets to examine within-site patterns between trophic levels- i.e. do misleading results travel together between trophic levels? A brilliant idea, so Julia sat down and started combing the organismal abundance/plant biomass data across the LTER- and proceeded to spend a lot of time spinning her wheels.

The challenge, it seems, is also one of LTER’s strengths. LTER data is available through two paths- in data archives set up by individual sites, and centrally, through a portal at DataONE. There is a heck of a lot of stuff going on at each individual LTER site, which is so, so awesome and cool and I love learning from it***, but, given that each site is so unique, navigating between them to try and find information which supports synthetic approaches is…okay, I’m not going to mince words, not awesome****. Julia has been working on LTER related projects for nearly a decade now***** and found that the DataONE catalog was fine if she knew what she was looking for, but it wasn’t easy to browse, or to extract meaning from what she found. She quickly found the individual sites were, generally, superior from a browsing perspective, because each site gave a bit of context about what the site actually is, what its major experiments were, and what data it holds. The data catalogs at each site, though, were all slightly different, and thus varied in their ease of navigation- and the approach changed every time she moved between sites. So she’s still chipping away at this.

Meanwhile, though, something big hit the news. The INSECT DECLINE THING.

Oh boy.  So, I am going to summon the full authority of my position as your friendly local data scientist, insect ecologist and time series expert. This study has some serious design flaws and its results are unreliable******.  Manu Saunders wrote an excellent blog post on the subject so I won’t go into it in detail here.  But our group sat down and discussed the study at one of our regular meetings, and realized we were perhaps in one of the best positions to critically, quantitatively examine these claims- so we got to work. More on this soon.

Soon after we got started, a reporter from Discover magazine reached out to Kaitlin for comment on the story, and a truly excellent article resulted.

To quote a quote from the article:

“You can’t just draw a line through some data points, take it down to zero and say, ‘Right, that’s how long we’ve got,’” Broad says. “That’s not how stats works, it’s not how insect populations work, either.”

Later in the article,  Kaitlin highlights how we’re addressing these problems in the work we’re doing– and examines why the scientists making these dire extrapolations are reaching the conclusions they do.

I’m really excited to see what we find.

——-

*  my long term goal is to replace typical biologists making typical conclusions with about 185 lines of well commented R code by 2027

** It’s a proxy because even the longest time series I have is a snippet of the whole story. Ever since I started partying with sociologists, I find I use phrases like “proxy for truth” in  my day-to-day more than I’d like to admit. Wait till we write our book together.

*** Can you tell that I have a deep, deep, identity-defining love for LTER? Because I do.

**** MAXIMUM CANADIAN SHADE.

***** Just to make you feel old, Juj.

****** DOUBLE MAXIMUM CANADIAN SHADE. NOT MANY CAN SURVIVE THIS LEVEL OF SHADE

Posted in Uncategorized | 1 Comment

Bad breakups (with your data)

Hi all,

It’s been a long time. It turns out this assistant professoring thing does not leave me with a lot of time. Hmm. Who knew? Since we last spoke, I’ve been building my lab- both the physical space:

and the online infrastructure.

I’m building collaborations with friends and colleagues all over the place, and working to help finish the student projects that I became involved with through my previous positions at Michigan State. I’m also working hard to get my uniquely Bahlai Lab research vision off the ground. A big part of that is people.

I have people now! Julia, my long-suffering technician, puts up with my barrage of ridiculous ideas and helps me bring the vision to reality. She’s also in training to be a butt-kicking librarian. Cheyan, PhD student, is studying how ‘non-traditional’ data sources (with a focus on citizen science) can be used to develop and engage people in long term ecosystem management. Katie, PhD Student, is studying how we can measure insect mediated ecosystem services and functions in green infrastructure projects. Christian, PhD Student, is examining the use cases and factors affecting the quality and quantity of citizen science data, in the context of Odonate conservation under climate and habitat change.*  (Yes, you counted right- that’s 3 PhD students already). And, my undergraduate project student, Erin, will be working with me on my next big thing.

Which brings us to IT. My Next big thing.

A while back, I had an idea. It wasn’t completely my idea- it came out of conversations with a few people. As you know, I’m interested in big(ish) data- finding trends from patterns we see when we put together a lot of information about a system.** But the big is a combination of a lot of littles.***

When we look at systems for a while we get to see a lot more of the whole, big, messy variability of a system. I’ll illustrate with an example.

Y’all know about the fireflies. No? Okay, it’s been a while and I can remind you about the fireflies. I recorded a video****:

TL;DW- My Reproducible Quantitative Methods class produced a paper about firefly phenology. Fancy people liked it and it got media attention.

During my interview about the piece, the reporter kept going back to the idea of trajectory. Yes, sure, phenology, cool, but what is the *trajectory* of firefly populations? ARE firefly populations in decline?

Here I had one of the longest time series known to science documenting systematic collection of fireflies, and I could not answer this seemingly simple question. For your reference, here is the data from our site, grouped by plant community of capture:

fireflies

My reply to the reporter was “I don’t know. But if I had less data I would tell you, and I’d be surer of my answer.”

A little tongue in cheek, to be certain, but isn’t that what we’re doing every day in science? One of the fundamental questions we ask in ecology is “where is my system going?” and we’re making extrapolations based on the data we have available. We know it’s not always the right thing to do, but we do our best, looking at the world through the limited windows available to us. In ecology, the three year study is pretty much the standard:

We know that this is problematic. This is why the USLTER network exists. People get that. But we’ve still got to do work in the shorter time scales. We gotta graduate students. My grants don’t go on forever. We can learn lots of things from studying systems in the short term.

But.

How do we know when these short term studies are misleading us? What are the effects of the time period we’re looking at? The length of time we’re watching, and the type of process? How often we’re measuring? And how the heck can we test this, if we’re mostly doing short term studies?

Friends, I had an idea. Why not re-analyse long time series data– as if they were short term data? Break it up in all sorts of objectively bad ways (THERE! I EXPLAINED THE POST TITLE), analyse using standard statistical methods, collect these statistics up, and look for trends in conclusions we reach, given different ways of collecting the data?
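
For the code-minded, here’s a minimal sketch of that idea in Python. To be clear, this is not the actual bad breakup algorithm, just an illustration under some loud assumptions: the function names (ols_slope, break_up, share_misleading) are mine, the counts are made up, and the “standard statistical method” is boiled down to the direction of a least-squares slope, with the full series standing in for the truth. A real version would also need significance tests and some data standardization.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def break_up(years, counts, window):
    """Yield (start_year, slope) for every run of `window` consecutive points."""
    for i in range(len(years) - window + 1):
        yield years[i], ols_slope(years[i:i + window], counts[i:i + window])


def sign(value):
    """Return -1, 0 or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def share_misleading(years, counts, window):
    """Fraction of windows whose trend direction disagrees with the full series."""
    full = sign(ols_slope(years, counts))
    slopes = [slope for _, slope in break_up(years, counts, window)]
    return sum(sign(slope) != full for slope in slopes) / len(slopes)


# Hypothetical counts, invented for illustration: a slow increase with a wobble on top.
years = list(range(2000, 2012))
counts = [10, 14, 9, 13, 17, 12, 16, 20, 15, 19, 23, 18]

for window in (3, 6, 9):
    print(window, round(share_misleading(years, counts, window), 2))
```

With these invented numbers, the three-year window points the wrong way in most of its windows, while the six- and nine-year windows always agree with the full record. Comparing only the direction of the slope keeps the sketch dependency-free; swapping in a proper regression with p-values would be the obvious next step.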

All this would take would be a relatively simple algorithm, a whole pile of time series data, and some money, time and patience for the personnel to drop data in and collect the stuff that comes out of the algorithm machine. I can write an algorithm, and hey, the USLTER has lots of data that would be appropriate to get this done, but the latter components are a little harder for a new professor to come by.  So, I put it on the back burner.

Anyway, this summer, my friend and collaborator Kaitlin Stack Whitney brought this grant opportunity to my attention.

EAGER proposals for high-risk/high-reward innovative studies that address development and testing of important science and engineering ideas and theories through use of existing data. […..] proposals must:

Involve, for data proposed for use, publicly-available data generated through NSF funding; and

Agree to make public the details about their experiences reusing the data, including especially challenges associated with that reuse.

!!!

Hey! I, in fact, am a professional at reusing data produced by NSF projects and publicly documenting my experiences using said data! I [cough] kinda have a blog about it. So me, Kaitlin, and my technician Julia sat down.

We wrote a proposal.

And it got funded.

Tina-Fey-giving-herself-high-five

Three junior women scientists getting an NSF award? No Big Deal.***** We’re getting this project underway, now. My undergraduate, Erin, will be gathering candidate data sets for trial bulk runs of the algorithm over the winter semester. Collaborators Sarah Cusser and Nick Haddad at Michigan State are using the algorithm on a focal dataset to do a deep dive into how patterns of observations affect conclusions in agricultural systems. Basically, we’re going to figure out once and for all- how often are we wrong when we look at our data?

This is going to be big, my friends, stay tuned.

*Note to self, get Christian on the website!! another item for The List.

**thank you for coming to my TED talk that’s not actually a TED talk.

*** you can put that wisdom on my tombstone

****thank you for coming to my other TED talk that’s not actually a TED talk.

*****This is a big deal and I am pretty excited about it.


Official Blog-nouncement

Hi, all.

So I’ve been very quiet lately here on the blog. This is, largely, because: academic job market. My search kicked up a notch this past semester, and 1) I was very busy with travelling for job interviews and 2) it’s considered gauche to talk about interviews while they’re ongoing. I was also busy with other things (y’know, research and teaching), and I’ll get into that in a bit, but for now, here’s the big announcement.

20170508_192020-ANIMATION

Tah-dahhh!

The Bahlai Lab of Applied Quantitative Ecology starts this fall at Kent State University, in the Department of Biological Sciences. It’s not a coincidence that the lab bears my name. It’s my lab.1

In the Bahlai Lab2, we focus on developing tools and metrics for better understanding the functional ecology of communities over time. Our two major areas of research are currently:
1) Studying the responses of insect predator-prey communities to disturbance, such as invasions and climate change
-and-
2) Developing break-point analysis tools to better quantify the impacts of change in long term ecological observations.

But more on that later. Lots, lots more 🙂

Incidentally, now that I’m on the other side, I’m allowed to talk about fight club the academic job market, so I wrote a piece for American Scientist about it. It seems to be resonating with people, so go check it out.

The past semester was very full, as I mentioned. I taught another offering of Reproducible Quantitative Methods, this time to a group of 20. Projects (which are still ongoing) included examining the role of environmental drivers in rockhopper penguin egg size, a community and functional analysis of landscape and climate impacts on Midwest aphid populations, an integrated population model to better understand the trajectory of, and constraints on, American woodcock population dynamics, and a data rescue project to digitally preserve the National Eutrophication database.

In my personal research, I’m also working on building out my technique for detecting regime shifts (i.e., changes to the rules governing population regulation) in dynamic populations. Currently (like, literally, iterations 201-250 are running right now), I’m running simulated data through the model to determine which conditions it works best under. I’ll be applying it to case studies from insect populations, and seeing what we can see.

It’s going to be an exciting few months, as I transition from Mid-Michigan to Northeast Ohio. 🙂

 

1. [barely audible squeal of joy]
2. I like saying it. In the Bahlai Lab… Welcome to the Bahlai lab…


Soft(ware) skills

In case you’ve been missing hearing my voice in your head, I had a post published over at the Data Carpentry blog today.


Inference and being wrong in a post-truth era

There’s been a confluence of recent events that have got me thinking about truth and facts. First of all, I’ve been thinking a lot about how we move forward, as scientists, when there’s this steaming mess going on. Secondly, there’s been a series of really honest blog posts where some amazing, respected scientists talk about the times they’ve found errors in their published work. Thirdly, a student member of an online community of academic women that I’m part of posted about a severe dressing down she got from her adviser for what appeared to be a few minor errors; she worried she’d undermined her professional credibility, and possibly her whole career. Finally, I’ve been preparing for another offering of my quantitative methods course, and as such, am thinking about how to best talk to students about data, inference, and what it all means.

robot

Sci, Robot.

It’s more important now than ever that scientists are willing to stand up against misinformation. We are knee deep in it, and friends, we’re the ones holding the shovels. But we don’t always, because it’s exhausting and often makes us vulnerable. We live in an era where scientists are not particularly trusted. When we’re trusted, we’re seen as inhuman. And we’re certainly not understood- who can forget how our research is openly mocked with reductio ad absurdum arguments by people with political agendas to disprove our work1 or undermine science in general (a situation that, I fear, is only going to get worse over the next few years)? Add into the mix the funding climate leading to cutthroat competition for positions2 and stir in the scientific enterprise’s (essential and important) self-criticism, and it’s no wonder that scientists are hesitant to engage. In short, it’s soul crushing, and the stakes are high.

This puts a lot of pressure on scientists to be right.

But I believe this climate is hurting us- in a lot of ways. Toxic levels of perfectionism. In the example I mentioned above, we have a graduate student worried about her career because of a typo. I’ve often cited the times I’ve observed students I’m working with get stuck on data problems because they’re afraid to do the wrong thing. Heck, I’m struggling to get the words out right now, because I’m afraid that someone will read what I write and think I’m advocating for sloppy science.3

But how much value does science gain when its practitioners are paralyzed into inaction? And even more pressing, how do we encourage more diverse contributions in science when people trying to join the community are shunned when they misstep?

This whole seeing scientists as robots, expecting scientists to be robots, even from within, I believe has a much more insidious consequence as well. It sets up the expectation that science, all science, every preliminary study, is absolute. And we, as practitioners, know that’s not true. And when science is seen as absolute and invariable, it sets itself up as a convenient straw-man for people with anti-science agendas. For example, anthropogenic climate change is happening.4 If you’re reading this blog, you probably agree with this statement. But there’s a controversy, and often cited by team Give-The-Planet-An-Uninvited-Sweaty-Hug5 is the so-called Antarctic cooling, as a point to ‘disprove’ global climate change.6

So, no. Obviously no. Thinking that an overwhelming body of evidence can be brought down by localized variation in trends? And this is not even to mention the role interpretation, statistics choice, even method of observation play. This is a problem, and I can’t help but feel it is up to me, at least in a small part, to fix it. My gut says this is something we can only address by helping people really understand how science is done, and by inviting more people in.

So back to my course- my corner of influence. As this is a course that advocates an open science approach, it forces students into a paradigm where they will be publicly wrong, or at least incomplete and unpolished. As I mentioned, this is something not all scientists take kindly to. But I know I’ve had a lot more luck convincing people of my sometimes controversial conclusions when I open up and show them the steps I took to get there. This is why I applaud the scientists who are open about their missteps and mistakes- these people are teaching the world more about science and process than any ‘perfect’ paper may.

In my own work, I’ve adopted something of a radical openness. The projects I’m leading right now are out there for the world to see, from the time I create the first file and start dumping stream-of-consciousness comments in about what I intend to do, through each bump in the road. It is my hope that through this openness and transparency, people feel invited to build on my work, they can validate the robustness of what I conclude, and they can use these ideas to help, in a small way, understand and buffer change in the world.7

Seventeen students have chosen to join me on this journey next semester. I hope to give them what they need to save the world, too. No pressure.

  1. The blog title makes it easy for you to do this with my work. Since really transitioning to quantitative ecology full time, you might even say I use a computer to count imaginary bugs.
  2. #operationhiremeplease2017
  3. See footnote #2. I’m not. I’m meticulous.
  4. I do hope that NASA article I linked isn’t turned into an ad for a certain prominent brand of luxury hotel in newly coastal Ames, Iowa anytime soon.
  5. This was the most PG  glib nickname for this group I could muster.
  6. Incidentally, I was at a conference in South Africa in October, and Michael Gooseff presented the data from the McMurdo dry valleys that showed this apparent slight cooling trend from 1978-1998, and then showed the rapid increase since then. I was most intrigued by the inflection point, naturally, as a person interested in breakpoint analysis would be, so stay tuned for some upcoming work in that area. I’ve started a new project. 🙂8
  7. And I mean this quite literally. The project I’ve linked to is developing a tool to understand when changes are occurring in dynamic populations from legacy population data. Because it’s really hard to determine when and how the factors regulating a population changed in an unbiased way- and without knowing that, it’s pretty hard to mitigate those factors.
  8. I’m being all secretive about this to create intrigue but if you’re motivated you can check out exactly what I’m doing on my github. 🙂 🙂

Back to it!

Oh, hey you! Blog. I missed you.

I’ve been busy.  Here’s, in general, what I’ve been up to, in the form of an annotated git contribution log:

git-log
I’m working on getting back into a groove, which will involve some more posts here detailing some of these previous events, and where I’m going with some new projects.

In the meantime, though, I recently recorded a 20-minute talk on my course. In case you’re one of the few people who have not been a captive audience for the live version of the talk, you can see it here:

More to come in the near future.


Some Mozilla Science Fellowship FAQ

You still have 14 days to submit your application for the Mozilla Fellows for Science! I’ve been putting the call out there on my networks, and there is lots of interest from the community. I’ve had a lot of questions, and many of them are falling into similar themes, so in the name of openness and fairness (and efficiency!) I thought I’d share them with my answers here, so everyone is working from the same information.

NB: You may also find some of my previous posts on the topic, particularly this one, useful.

Q1. So, the fellowship asks for 80% of your time, reserving only 20% for your research program. Christie, you’re a really research intensive, highly productive scientist, how the heck did you reconcile that? 

First of all, thanks for noticing- I am really into my science1, and as a person in pursuit of a research-intensive tenure track position, I know it’s important to show consistent productivity, so stopping or considerably slowing a research program is not an option. That being said, what you do with the fellowship is fairly open.

I’ve held the fellowship as a postdoc, and it worked well for me because I could adapt and shift my focus while still keeping my research program going, albeit at a slightly2 slower pace for hard publication production- but my work is in LTER data science, data synthesis etc, so it really was just a small departure to make it more about open science training. Of the other fellows, Joey is a MS student who successfully defended while on the fellowship, Richard is a PhD student who is taking a break from his PhD work to go into full time open science advocacy and development work (just recently packed up and moved to Nairobi, the rest of us fellows are going to go visit him at the end of July), and Jason is an associate professor whose work has largely continued as it was, as it was open science aligned, just with the weight of Mozilla behind it now (meaning he gets invited to Very Important Parties in DC now because people listen to him 🙂 ). We all did very different things for our fellowship work- Richard and Joey developed open tools for researchers and engaged in a lot of training activities, I taught a class and developed curriculum, Jason studied and wrote a guidebook on how people were sharing information in participatory medical trials. For me, I spent a lot of time thinking about the gulf that exists between research and training, and how to close that up, using existing scientific infrastructure.

So what I’m saying is the 20% of your own research can mean different things, depending on what your work is now, and where you want to take it for the fellowship. For me, it meant feeling less guilty about pushing the work I was already doing in directions I wanted to see it going, but never really drawing a line between “20% postdoc Christie” and “80% fellow Christie.” It can be intense at times- there is a lot of travel expected which has been hard but also amazing (I’ve got two young kids), but it’s been a super, broadening experience.

Q2. What do you even *do* as a fellow? Are you, like, full time, open science superheroes?

for_science


Yep, this is the sort of thing I do. Professionally. That’s Kaitlin Thaney on the left. Photo by Joey K. Lee. Photoshopping by Richard Smith-Unna

Well. Kinda.

Sometimes, I wear a cape.

My average day as a fellow might look pretty similar to my average day from the before-time. However, I was already involved in the open science community, the data science community, the open and reproducible training community before the fellowship started. See, unh, this blog, for example. I was spending an undefined portion of my time devoted to improving reproducibility in science, particularly as this relates to data and analysis. Both for the good of humanity, and my own selfish reasons3.

The fellowship dialed these activities to 11, but also took me out of my office to meet with more people, and took me out of the academic bubble just enough to see our inefficiencies/issues with a new perspective. Fellows maintain close, if remote, working connections to the MSL staff and each other- we have a fellows chat that we basically keep open at all times- which functions like a water cooler in an office we’re all in. As a result, we’ve become as close as colleagues/friends working in a shared lab.4

I still work on papers, analyse data, help grad students with their projects, read the literature, write grant proposals, go to scientific conferences etc. I also blog, write curriculum, participate in conference calls/video chats, travel to meetings, teach classes. It is simultaneously not different and very, very different. In a good way.

Q3. Can I have the money without doing anything?

No. Well, I don’t make this call. But. No.

Q4. What’s the application process like? Where do I put my detailed sampling plan into this form? There’s no place for my 20 page research proposal. What gives?

The application form is simple. The application is not long. Don’t panic.

The reason for this is that the fellowship is fairly open and does not require you to have a completely, 100% developed idea. The first part of the fellowship is devoted to developing your idea(s), and figuring out the best way to implement them in your community with the help of MSL staff. Think about the goals of the MSL and the fellowship program:

The Mozilla Fellowships for Science present a unique opportunity for researchers who want to influence the future of open science and data sharing within their communities.

We’re looking for researchers with a passion for open source and data sharing, already working to shift research practice to be more collaborative, iterative and open. Fellows will spend 10 months starting September 2016 as community catalysts at their institutions, mentoring the next generation of open data practitioners and researchers and building lasting change in the global open science community.

Think about how the fellowship will help you help make science a more open, more collaborative place. Think about why your community needs it. And then tell us about it.5

Good luck with your application!

1. I *am* charming.
2. 3 4 papers so far this year, but in fairness, only one of them was first authored by me.
3. People ask me to help analyse their data all the time. When their data sucks, it makes my life harder. And it significantly constipates our science. And makes me a sad panda.
4. Real talk: the community we’ve built within the MSL and with the fellows reminds me very much of my time as a(n incredibly socially awkward) teenager who just discovered the internet. Suddenly, a new community that answers to a need I was not finding in my local population. My local scientific peers are awesome, don’t get me wrong, but between postdoc nomadism and being a data geek in a biology lab, it can get lonely.
5. Pro-tip: Use simple, clear language. Avoid jargon. Part of opening up science is becoming better communicators of science. Scientists have a bad habit of excluding people from the club with jargon because it makes us feel smart and a member of the elite. That’s another thing that hinders efforts to diversify the scientific community: it makes people with diverse backgrounds feel like they don’t fit or don’t know where or how to engage, and it also makes established scientists take people who don’t talk like them less seriously. I could rant about this for a long time.
