I have for years wanted, nay, dreamed of teaching a statistics course with this title. In all my years in academia, this really seems to be where people panic the most. Heck, I’ve panicked about statistics. Who hasn’t?
But why are people panicking? For me, my panic was sheer terror at doing something wrong. Because stats, they’re objective! They’re absolute! They can be planned out completely before you even collect data! So there’s only one path to take! And then your results are significant, or they’re not. And if your p-values are less than 0.05, publish; if not, nothing to see here. Bam, done, right? But…people are still panicking.
As an undergraduate physics student, I had very little exposure to statistics per se, although, as you can imagine, we did talk a lot about experimental uncertainty. We used equations to predict the behavior of things (think: a cylinder rolling down a ramp), and we looked at how our measurements differed from our predictions. We then quantified this difference, ascribed a mechanism to it (friction!) and used this quantification to see if we could improve our predictions, and it allowed us to ask additional questions (is the effect of friction consistent between trials, within the bounds of experimental uncertainty?* What factors affect friction? How much?).
We never once asked the question “is the effect of friction significant?” Friction was always there; we were more interested in how it affected the predictions of our model. So when I switched fields,** and started taking statistics courses aimed at biologists***, it represented a big switch in thinking. My formal statistical education focused entirely on a branch of statistics that most (all?) biologists should be familiar with, Null-Hypothesis Significance Testing (NHST). Most biologists don’t call NHST “NHST.” They just call it “statistics” because this branch of stats is all they’re taught. Confused? If the stats you’re doing use a test to reject a null hypothesis (H0 = Treatment 1 is the same as control) and use a p-value to help you decide if you can (p=0.049: reject the null! Your treatment worked! You are successful at science and life! or p=0.051: Treatment 1 and control are the same, hang your head in shame, here is a job application for a position at an undesirable fast-food restaurant), then congratulations: you’re doing NHST.
I bought into this NHST orthodoxy, fully and completely. What can I say? I was young, impressionable, and ready to accept any lesson, assertively delivered, because I was a Good Student. I eagerly applied what I learned to my own research. It was only late in my master’s program, when I took a reading course (we read The Nature of Scientific Evidence, which is a good book, but not a light read), that I was even introduced to the idea that NHST was not “statistics” but one of several statistical ideologies. I still didn’t buy it. At the end of the course, I wrote an essay where I argued that sure, Bayesian and information-theoretic approaches might be useful in certain rare cases, but if you just designed your experiments better, NHST is better, because reasons.
Then, I started my PhD. I was handed an #otherpeoplesdata dataset. It was messy. It was real.**** And the questions I wanted to ask of it didn’t fit a NHST approach. The cracks formed in my NHST foundation.
Now, if you know me well, you’ll know that nothing gets me going more than when people in positions of authority uphold practices or ideas that crumble under logical or empirical scrutiny. As an expert in certain fields, I rely on the expertise of others in their respective fields of specialization to keep the world running. When I see cracks in their foundations, it makes me hot under the collar. *****
So as the cracks led to full-on crumbling of the house my stats classes built, I started to ask questions. First, scientific questions. Instead of asking “Does landscape affect soybean aphid establishment in soybean fields?” because, of course it does, I asked “Which landscape elements are most important in explaining variation in soybean aphid establishment patterns?” It seems a subtle difference, I know. But the second question is so much more powerful- it lets you dig deeper than the first. And this led me to more questions- what questions could I ask about patterns I saw? How did my data fit predictions made by other hypotheses? Why was everyone hung up on testing null hypotheses when there were plenty of great real hypotheses to test and compare? Why the heck had I been taught to think of hypothesis testing in this crazy backwards way, where I assume my hypothesis is true if I can reject something that’s almost definitely not true?
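To make the “compare real hypotheses” idea concrete: one common information-theoretic route is to fit each candidate model and rank them by AIC. Here’s a minimal sketch with entirely made-up stand-in data (the variable names and simulated numbers are mine, for illustration only, not the actual aphid analysis):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in data: "aphid establishment" vs. one landscape variable.
# The true relationship here is linear, with noise.
x = np.linspace(0, 10, 50)
y = 2.0 + 0.8 * x + rng.normal(0, 1.0, size=x.size)

def aic(y, y_hat, k):
    """AIC for a least-squares fit: n*ln(RSS/n) + 2k, where k = number of parameters."""
    n = y.size
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + 2 * k

# Two competing hypotheses, each fit by least squares:
m0 = np.full_like(y, y.mean())            # hypothesis A: no relationship (intercept only)
m1 = np.polyval(np.polyfit(x, y, 1), x)   # hypothesis B: linear relationship

print("intercept-only AIC:", aic(y, m0, k=1))
print("linear fit     AIC:", aic(y, m1, k=2))
```

The model with the lower AIC is the better-supported one, and you can extend the candidate set with as many real, competing hypotheses as you like. No null gets rejected anywhere; the models are compared directly on how well they explain the data, penalized for complexity.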
Yep. It was like a religious conversion. And suddenly, I was no longer panicking about statistics. I, in fact, started to love statistics. I stopped seeing stats as the thing that was going to objectively tell me if I was sciencing good enough and producing significant results and started to see stats as a set of tools to help me understand patterns in my data. So, I’m not saying that NHST is never appropriate, but I am saying that if we teach students to try and solve all problems using a single tool:
Well, you see what happens. Either every problem looks like a nail, or you end up with an almost entirely, but not quite unusable hammer. And sadly, this hammer likes to tell us that it’s actually the only tool.
So, how can we stop the stats-hate among ecologists, before it’s started hurting young brains? My proposal is that we change how we teach it. Rather than front-loading a course with theory, by-hand calculations, and orthodoxy, make it about playing with neat tools. Get a group of students together, give them real data, and work through data cleaning, exploration, and several approaches to analysis and visualization. Encourage students to play with different tools with different foci, and see what patterns emerge. It’s not unlike the first time you used hand lenses, a dissecting scope, and a compound scope. You see different things and can ask different questions at different levels of zoom and focus. You didn’t need to know all the particulars of how the microscope works to start seeing patterns emerging, and you gained a better understanding of using the scope and its limitations through trial and error.
On the KBS-LTER Information Management committee, we’ve discussed something like this being offered as a graduate course, using some of the lesser-used datasets generated by the site as a backbone, and changing the data core with each offering of the course. That way, we might be able to take care of two problems at once: first, helping students develop their practical data management and analytical skills, and second, giving under-utilized data an in-depth examination every now and again with fresh eyes. But talks about this sort of course are in a very preliminary state, and there’s a fairly high activation energy required to develop a stats course with such an unconventional structure. Nevertheless, it’s something that always bounces forward in my brain whenever I hear a grad student lamenting their data, their analyses, their interpretations. Or when a reviewer asks me where my p-values are. But that’s a rant for another day.
Mark my words, friends. One day, “Hey! Let’s all just relax about statistics” will be a real course and I will be teaching it.
* which you could actually get at pretty easily if you have a good guess at the uncertainty of each of your measurements and then use some cool math tricks to propagate that uncertainty through your model.
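One of the simplest of those math tricks is Monte Carlo propagation: resample each measurement within its uncertainty and push the samples through the model. A quick sketch for the rolling-cylinder example (the angle and its uncertainty here are numbers I made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical measurement: ramp angle with its uncertainty (radians)
theta, theta_err = 0.35, 0.01
g = 9.81  # m/s^2

def accel(theta):
    """Predicted acceleration of a solid cylinder rolling without slipping."""
    return (2.0 / 3.0) * g * np.sin(theta)

# Monte Carlo propagation: sample the measurement within its uncertainty,
# then look at the spread of the model's predictions.
samples = rng.normal(theta, theta_err, size=100_000)
a = accel(samples)

print(f"predicted a = {a.mean():.3f} +/- {a.std():.3f} m/s^2")
```

The standard deviation of the propagated predictions is the experimental uncertainty on the model’s output, which is exactly what you’d compare your measured accelerations against to ask whether friction’s effect is consistent between trials.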
** No offense, physics. It’s just, when we started calculating the vacuum energy of the universe, I started feeling very uncomfortable at how intangible it all was. I felt small, insignificant. I decided to go count bugs, because, hey, I could SEE them.
*** The grad stats class I took would probably have best been entitled “Doing ANOVAs in SAS on normalized and continuous data produced by highly controlled crop science trials.”
**** The data was observational, and taken from an unmanipulated system. In agricultural data, it was as wild as it gets.
***** As a result, I have ended up giving fully-referenced rants at 1) university parking authorities, 2) utility companies, and 3) public-health-run child-birthing classes, documenting their flawed or erroneous approaches. I might be insufferable.