I have, for years, wanted, nay, *dreamed* of teaching a statistics course with this title. In all my years in academia, this really seems to be where people panic the most. Heck, I’ve panicked about statistics. Who hasn’t?

But why are people panicking? For me, my panic was sheer terror at doing something wrong. Because stats- they’re objective! They’re absolute! They can be planned out completely before you even collect data! So there’s only one path to take! And then your results are significant, or they’re not. And if your p-values are less than 0.05, publish; if not, nothing to see here. Bam, done, right? But…people are still panicking.

As an undergraduate physics student, I had very little exposure to statistics per se, although, as you can imagine, we did talk a lot about experimental uncertainty. We used equations to predict the behavior of things (think: a cylinder rolling down a ramp), and we looked at how our measurements differed from our predictions. We then quantified this difference, ascribed a mechanism to it (friction!) and used this quantification to see if we could improve our predictions, and it allowed us to ask additional questions (is the effect of friction consistent between trials, within the bounds of experimental uncertainty?* What factors affect friction? How much?).

We never once asked the question “is the effect of friction significant?” Friction was always there; we were more interested in how it affected the predictions of our model. So when I switched fields,** and started taking statistics courses aimed at biologists***, it represented a big switch in thinking. My formal statistical education focused entirely on a branch of statistics that most (all?) biologists should be familiar with, Null-Hypothesis Significance Testing (NHST). Most biologists don’t call NHST, NHST. They just call it “statistics” because this branch of stats is all they’re taught. Confused? If the stats you’re doing use a test to reject a null hypothesis (H0 = Treatment 1 is the same as control) and a p-value to help you decide whether you can (p=0.049: reject the null! Your treatment worked! You are successful at science and life! or p=0.051: Treatment 1 and control are the same; hang your head in shame, here is a job application for a position at an undesirable fast-food restaurant.), then you’re doing NHST.

I bought into this NHST orthodoxy, fully and completely. What can I say? I was young, impressionable, and ready to accept any lesson, assertively delivered, because I was a Good Student. I eagerly applied what I learned to my own research. It was only late in my master’s program, when I took a reading course (We read The Nature Of Scientific Evidence, which is a good book, but not a light read), that I was even introduced to the idea that NHST was not “statistics” but one of several statistical ideologies. I still didn’t buy it. At the end of the course, I wrote an essay where I argued that sure, Bayesian and information theoretic approaches might be useful in certain rare cases, but if you just designed your experiments better, NHST is better, because reasons.

Then, I started my PhD. I was handed an #otherpeoplesdata dataset. It was messy. It was real.**** And the questions I wanted to ask of it didn’t fit an NHST approach. The cracks formed in my NHST foundation.

Now, if you know me well, you’ll know that nothing gets me going more than when people in positions of authority uphold practices or ideas that crumble under logical or empirical scrutiny. As an expert in certain fields, I rely on the expertise of others in their respective fields of specialization to keep the world running. When I see cracks in their foundations, it makes me hot under the collar. *****

So as the cracks led to full-on crumbling of the house my stats classes built, I started to ask questions. First, scientific questions. Instead of asking “Does landscape affect soybean aphid establishment in soybean fields?” because, of course it does, I asked “Which landscape elements are most important in explaining variation in soybean aphid establishment patterns?” It seems a subtle difference, I know. But the second question is so much more powerful- it lets you dig deeper than the first. And this led me to more questions- what questions could I ask about patterns I saw? How did my data fit predictions made by other hypotheses? Why was everyone hung up on testing null hypotheses when there were plenty of great real hypotheses to test and compare? Why the heck had I been taught to think of hypothesis testing in this crazy backwards way, where I assume my hypothesis is true if I can reject something that’s almost definitely not true?

Yep. It was like a religious conversion. And suddenly, I was no longer panicking about statistics. I, in fact, started to love statistics. I stopped seeing stats as the thing that was going to objectively tell me if I was *sciencing good enough* and producing significant results and started to see stats as a set of tools to help me understand patterns in my data. So, I’m not saying that NHST is never appropriate, but I am saying that if we teach students to try and solve all problems using a single tool:

Well, you see what happens. Either every problem looks like a nail, or you end up with an almost entirely, but not quite, unusable hammer. And sadly, this hammer likes to tell us that it’s actually the only tool.

So, how can we stop the stats-hate among ecologists, before it starts hurting young brains? My proposal is that we change how we teach it. Rather than front-loading a course with theory, by-hand calculations, and orthodoxy, make it about playing with neat tools. Get a group of students together, give them real data, and work through data cleaning, exploration, and several approaches to analysis and visualization. Encourage students to play with different tools with different foci, and see what patterns emerge. It’s not unlike the first time you used hand lenses, a dissecting scope, and a compound scope. You see different things and can ask different questions at different levels of zoom and focus. You didn’t need to know all the particulars of how the microscope works to start seeing patterns emerging, and you gained a better understanding of using the scope and its limitations through trial and error.

On the KBS-LTER Information Management committee, we’ve discussed something like this being offered as a graduate course, using some of the lesser-used datasets generated by the site as a backbone, and changing the data core with each offering of the course. That way, we might be able to take care of two problems at once- first, helping students develop their practical data management and analytical skills, and second, giving under-utilized data an in-depth examination every now and again with fresh eyes. But, talks about this sort of course are in a very preliminary state, and there’s a fairly high activation energy required to develop a stats course with such an unconventional structure. Nevertheless, it’s something that always bounces forward in my brain whenever I hear a grad student lamenting their data, their analyses, their interpretations. Or when a reviewer asks me where my p-values are. But that’s a rant for another day.

Mark my words, friends. One day, “Hey! Let’s all just relax about statistics” will be a real course and I will be teaching it.

* which you could actually get at pretty easily if you have a good guess at the uncertainty of each of your measurements and then use some cool math tricks to propagate that uncertainty through your model.
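For the curious, a minimal sketch of that trick- standard first-order propagation of independent measurement uncertainties, added in quadrature. The rolling-cylinder numbers here (angle, angle uncertainty) are made up purely for illustration:

```python
import math

def propagate_uncertainty(partials_and_sigmas):
    """Combine independent uncertainties in quadrature:
    sigma_f = sqrt(sum((df/dx_i * sigma_i)^2))."""
    return math.sqrt(sum((dfdx * sigma) ** 2 for dfdx, sigma in partials_and_sigmas))

# Example: a solid cylinder rolling (without slipping) down a ramp,
# a = (2/3) * g * sin(theta); here the only measured quantity is theta.
g = 9.81
theta = math.radians(30.0)          # hypothetical ramp angle
sigma_theta = math.radians(0.5)     # guessed uncertainty in the angle

a = (2.0 / 3.0) * g * math.sin(theta)
dadtheta = (2.0 / 3.0) * g * math.cos(theta)   # partial derivative da/dtheta
sigma_a = propagate_uncertainty([(dadtheta, sigma_theta)])
# a is about 3.27 m/s^2, with sigma_a about 0.05 m/s^2
```

With more measured inputs (say, a measured g or ramp length), you just add more (partial derivative, sigma) pairs to the list.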
** No offense, physics. It’s just, when we started calculating the vacuum energy of the universe, I started feeling very uncomfortable at how intangible it all was. I felt small, insignificant. I decided to go count bugs, because, hey, I could SEE them.
*** The grad stats class I took would probably have best been entitled “Doing ANOVAS in SAS on normalized and continuous data produced by highly controlled crop science trials.”
**** The data was observational, and taken from an unmanipulated system. In agricultural data, it was as wild as it gets.
***** As a result, I have ended up giving fully-referenced rants at 1) university parking authorities 2) utility companies and 3) public-health-run child-birthing classes, documenting their flawed or erroneous approaches. I might be insufferable.

Pretty sure I took the same stats course as a masters student 🙂

Hi Christie,

Thanks for a cool post with lots of good points (I totally agree with giving students good, bad and ugly data to play with), but having gone through a slightly different route (NHST in psychology then NHST in biology then model comparison and more) I can’t help wondering…

(1) Is it possible that you became more relaxed about stats simply because you became more familiar with them?

(2) How do you know that newbies won’t be just as uncomfortable starting with model comparison approaches?

also…

(3) If model A has lower AIC than model B, but model A still has no significant parameters, we’re back at ~~H0~~ square 1, right?

None of these are rhetorical questions – I’d really appreciate hearing your views! I’m teaching a couple of biology stats courses at different levels and always looking for better ways to get the important stuff across.

Hi Mike:

1) Seeing things from a likelihood perspective rather than a NHST perspective is highly correlated with feeling differently about stats, but unfortunately I don’t have a control, so my assertion is based on correlative evidence alone 🙂

2) I don’t, but I do know it felt like a big jump when I started taking stats to start doing tests on nulls, rather than experimental hypotheses. Using my paper as an example, if I’d used an NHST approach, I would have built a model using landscape parameters and asked if they were significant- i.e., could I show that landscape was not NOT affecting aphid colonization of fields. When I used the information theoretic approach, I could ask the questions much more directly with the stats- i.e., which landscape model best explained the variation in distribution?

3) Not necessarily. Lower AIC just means that model A leaves less leftover variation than model B; if its parameters still aren’t significant, the parameter estimates are highly variable, so it’s probably not a very good predictive model, if it’s all you’ve got. There are probably other, external factors that weren’t accounted for in the model that are driving a lot of the variation. To go back to the example I’ve cited- many of the models used in that soybean aphid paper probably didn’t have statistically significant parameters (and I’m okay with this 🙂 ) because in addition to landscape, weather, population density at the beginning of the season, interactions with predators, etc, etc, can, and probably do, affect the distribution of aphids. Unfortunately, the student who collected the population data was unable to account for all these variables. So, in the discussion of this paper, my co-authors and I discuss how landscape plays a role in distribution, but it likely interacts with all these other factors, and so, you shouldn’t assume you’ll always get a colonized field beside a hedgerow with a lot of aphid overwintering sites, for instance, because of these factors.
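To make the AIC bookkeeping in point 3 concrete, here’s a tiny pure-Python sketch. The data are simulated stand-ins (a weak “landscape” effect buried in noise- not the actual aphid data), and the formula n·log(RSS/n) + 2k is the Gaussian-model AIC up to an additive constant:

```python
import math
import random

random.seed(1)

# Hypothetical data: a response driven mostly by unmeasured factors,
# with only a weak landscape effect.
n = 50
landscape = [random.gauss(0, 1) for _ in range(n)]
counts = [0.2 * x + random.gauss(0, 2) for x in landscape]

def rss_simple_regression(y, x):
    """Residual sum of squares for y ~ intercept + slope * x (OLS)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

def aic(rss, n, k):
    """AIC for a Gaussian model, up to a constant: n * log(RSS / n) + 2k."""
    return n * math.log(rss / n) + 2 * k

mean_y = sum(counts) / n
rss_null = sum((yi - mean_y) ** 2 for yi in counts)
aic_null = aic(rss_null, n, k=1)                                  # intercept only
aic_landscape = aic(rss_simple_regression(counts, landscape), n, k=2)
```

The landscape model will always have the lower RSS (it nests the intercept-only model), but AIC charges it for the extra parameter- and even when it wins on AIC, the slope estimate can still be too noisy to be individually “significant”, which is exactly the situation described above.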

re point 2 – one of the challenges is that you have to be very comfortable with linear regressions, and with mixtures of categorical and continuous covariates as both main and interaction effects, before you can even start to visualise potentially complex models arising from model selection processes. And if you start with linear regression and ANOVA before putting them together into ANCOVA-type GLMs, then you are pretty much teaching NHST, which is why I think they tend to come first…

but see my post below… I really like this blog post, and agree on many levels.

Hi Andrew- this is an absolutely fair criticism- I’ve been thinking a lot lately about the mechanics of *how* to teach this way- namely, this season, I’m applying for a few jobs that emphasize innovative approaches to teaching undergrad and graduate stats in the ads. I don’t want to re-invent the wheel, and the foundation of most stats-for-biologists courses is the NHST approach. I just remember the logical *thud* I felt when I switched fields, and it got me wondering if there was a better way. The problem with starting with NHST isn’t NHST per se…it’s the inflexible thinking it tends to enforce early. I’m wondering if it might make sense to start students out with something a little different- perhaps probability theory, like I learned in finite mathematics as a senior in high school, or error/uncertainty propagation as I was taught early in my physics program. Clearly, I don’t have this nailed down, but I want to make sure my students become critical thinkers about data and evidence, and not so hung up on the idea that P<0.05 = truth. There's a lot of philosophy in statistical interpretation, and I want to make sure I encourage students to appreciate the nuances!

Well my goodness. Thank you for sparking hope in my life for the future of education. I truly hope your plans go through. I appreciate the basic language you used in your writing. It was simple and effective. Good job. I enjoyed the material. Keep adding value to your readers’ lives.

Pingback: Recommended reads #37 | Small Pond Science

Hi Christie

great post – I’ve shared it with my undergrad statistics class for when we get to model selection after covering NHST… I love it.

One of the reasons I think NHST is a good starting point is that it is clean. It demands well-designed, well-thought-out experiments or investigations, and it encourages clear planning and thinking. I insist that students try to graph out their expected results before they collect their data, in order to get them to think through the challenges of collecting the exact data they need, and it helps afterwards with selecting the correct analysis.

I love model selection, information theory and Bayesian inference, but one thing they can encourage is ill-directed, poorly thought out investigations (experiments, field studies, data dredges). Often students, and well-practiced scientists who should know better, will “write everything down”. While sensible, it can then lead to a horrendous spiral down the rabbit-hole as one starts chasing different statistical tests, or trying to cram everything into one model, never mind problems with collinear variables or permutations that generate zero observations.

This problem can of course happen even if one starts from an NHST point of view, but I do find that clarity of thought from the start is so important, especially when having to cut through the harsh reality that is real-world data, that it’s a useful starting point, and one that we should aspire to from the start and use if we can.

Of course it all comes down to the question we want to answer… but even with model selection approaches, coming up with some mechanistic hypotheses is very important, and model selection does not offer a cheap way out… rather it can turn into an analysis from hell very quickly

best wishes

Andrew

Pingback: Dispatches from the field: scattered thoughts edition | Practical Data Management for Bug Counters

Pingback: Why do we make statistics so hard for our students? | Scientist Sees Squirrel

Pingback: Spreading the open science love like it’s my job because it is my job. | Practical Data Management for Bug Counters

Pingback: Reflections: My ride on the Mozilla Fellow Ship | Practical Data Management for Bug Counters