One of the critical elements of our ongoing projects is to identify barriers to reproducible science with public data. Our teammates have previously mentioned examples like missing or irregular data, inconsistent data collection, short sampling periods, and absent metadata. But today I want to focus on accessibility in science that’s trying to be open and reproducible.
“Open science” means a lot of things. I don’t have space in one blog post to get into all the uses, but today I’m concerned with the word “access” when it comes to open science. Even within the open science domain, “access” is used to refer to multiple things! Some people mean cost – as in, open software and hardware are cheaper, so more people can purchase and use the same scientific gear and protocols that better-resourced people and institutions can. Some people mean learning curve – as in, the tools of open science may be easier to learn than some proprietary ones, so it’s easier for nonexperts or nonacademics to participate using the same protocols.
But access also means – and I mean – disability accessibility. Is your project, product, dataset, workflow, team – let alone your code – accessible to collaborators (and would-be collaborators) with disabilities? This work is informed by two researchers and professors I greatly respect – Dr. Jon Henner of UNC Greensboro and Dr. Liz Hare of Dog Genetics. Dr. Henner has previously noted that “access and inclusion” is now often used to indicate those other meanings of “accessible” but NOT disability. And Dr. Hare informed me – and the rest of the twittersphere – about the lack of compatibility between open science software and access technologies. Open science that isn’t accessible isn’t open! So what can practitioners do to contribute to an actually more open science in the public domain?
Number one is (always) follow the lead of disabled scientists and researchers. Disabled scientists are the experts in making their science accessible. So if you’re a nondisabled scientist like me, the first and best thing to do is educate yourself to ensure you’re not erasing, paving over, or counteracting the work of disabled experts, peers, and mentees. The rest of what I’ll type about today uses an example our team realized was a barrier in our own workflow and products. But that guidance above holds constant across any example I could share.
Christie and collaborator Sarah Cusser have been working on an analysis of how the length of the study period impacts the signal and significance of the phenomenon being studied. And with the help of some of Sarah’s other collaborators, they developed a way to visualize their results in what they’re referring to as a pyramid plot. These plots are built into the latest software release, which you can find on the Bahlai Lab github repo here – https://github.com/BahlaiLab/bad_breakup_2. To give you an example, here’s one of the pyramid plots from Rowan’s presentations already online on FigShare:
The figure above is titled “Adult deer ticks in Cary Forest.” The x axis reads “slope,” with a scale from -1.5 to 1, and the y axis reads “number of years in window,” with a scale from 0 to 25 years. There’s also a key labeled significance, with a red X meaning ‘no’ and a black O meaning ‘yes.’ In the plot itself, a dotted trendline parallel to the y axis shows the trend the full-length dataset converges on, with two fainter dotted lines, also parallel to the y axis, indicating a standard error range. For every possible window length up to the total number of years in the dataset, there are Xs and Os showing the signal (positive or negative, based on where the symbol falls on the x axis) and significance (an X for not significant, an O for significant), with the size of the symbol indicating effect size. Symbols outside the error range indicate a misleading result – as in, a window that misrepresents the longer-term pattern shown by the full dataset. In the case of this specific figure, the trend seems to converge after about 15 years, based on the Xs and Os falling within the error bounds from that window length onward.
This is a complicated figure, with a lot of information embedded in the image. And this is just one figure, specific to this dataset, slope, significance, and years included. We want our results to be easily interpreted. So Christie adjusted the plot based on best practices for image accessibility – for example, making sure the symbols aren’t color coded alone to indicate significance differences: they’re also different shapes, and still high contrast. Same with effect size, which is represented by the size of the X or O, not by color shading.
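To make that concrete, here’s a rough sketch of those choices in ggplot2. This is not the actual plotting code in bad_breakup_2 – the data frame, column names, and values here are made up for illustration – but it shows the idea of carrying significance with shape and effect size with symbol size, instead of leaning on color alone:

```r
library(ggplot2)

# hypothetical moving-window results: one row per window
res <- data.frame(
  slope        = c(-0.8, -0.3, 0.10, 0.05, 0.02),
  window_years = c(3, 5, 10, 15, 20),
  significant  = c(FALSE, FALSE, TRUE, TRUE, TRUE),
  effect_size  = c(0.9, 0.5, 0.3, 0.2, 0.1)
)
full_slope <- 0.03  # slope the full-length dataset converges on (made up)
se         <- 0.05  # its standard error (made up)

ggplot(res, aes(x = slope, y = window_years,
                shape = significant, size = effect_size)) +
  # dashed line at the full-dataset trend, dotted lines at +/- one standard error
  geom_vline(xintercept = full_slope, linetype = "dashed") +
  geom_vline(xintercept = full_slope + c(-se, se), linetype = "dotted") +
  geom_point() +
  # significance is carried by the shape itself (X vs O), not by color alone
  scale_shape_manual(values = c("FALSE" = 4, "TRUE" = 1)) +
  labs(x = "slope", y = "number of years in window",
       shape = "significant", size = "effect size")
```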
Yet knowing what Dr. Hare has previously shared about the software tools we’re using, I wanted to make sure our figures and the data within them were represented in multiple ways, to ensure access. Just like Dr. Drew Hasley told everyone at EDSIN – data visualization alone isn’t ideal; multimodal representation of data is – so there are multiple points of entry for collaborators and learners. So Christie has been updating the code to output text tables that contain everything in the image (and several of those functions are now in the latest release!).
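As a sketch of what I mean by a tabular companion to the figure (the function and column names here are hypothetical, not necessarily what’s in the release), the same window-by-window information can come back as a plain data frame that screen readers and other text tools can work with:

```r
# hypothetical helper: return the information shown in the pyramid plot
# as a plain text table, one row per moving window
pyramid_results_table <- function(slope, window_years, significant, effect_size) {
  data.frame(
    years_in_window = window_years,
    slope           = round(slope, 3),
    significant     = ifelse(significant, "yes", "no"),
    effect_size     = round(effect_size, 3)
  )
}

# example with made-up values
pyramid_results_table(
  slope        = c(-0.8, -0.3, 0.10, 0.05, 0.02),
  window_years = c(3, 5, 10, 15, 20),
  significant  = c(FALSE, FALSE, TRUE, TRUE, TRUE),
  effect_size  = c(0.9, 0.5, 0.3, 0.2, 0.1)
)
```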
And since the whole goal of releasing the moving window analysis code is for other people to use it with their data, why not also write image description templates into our code, as annotations right alongside the code that makes the plots and the tables? But I realized I had never, ever come across R code with image description templates for scientific figures.
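Here’s roughly what I have in mind – a minimal sketch with made-up function and argument names, not anything in the current bad_breakup_2 release – where the description lives right next to the plotting code and gets filled in with the dataset-specific pieces:

```r
# hypothetical helper: fill in an image description template for a pyramid plot
describe_pyramid_plot <- function(dataset_name, total_years, full_slope,
                                  converge_years) {
  sprintf(paste(
    "Pyramid plot of moving-window results for %s.",
    "The x axis shows the slope of each window; the y axis shows the number of",
    "years in the window, from 1 to %d.",
    "A dashed vertical line marks the slope of the full dataset (%.2f), with",
    "dotted lines marking its standard error.",
    "Each window is plotted as an X if not significant or an O if significant,",
    "with symbol size showing effect size.",
    "Windows converge on the full-dataset trend at about %d years."
  ), dataset_name, total_years, full_slope, converge_years)
}

# example with made-up values, printed alongside the figure
cat(describe_pyramid_plot("adult deer ticks in Cary Forest",
                          total_years = 25, full_slope = 0.03,
                          converge_years = 15))
```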
So earlier in the summer, I shared a poll on twitter, asking R users whether they had ever come across – or written – image description templates built into code to go with the plots that code creates.
Here’s a screenshot of the poll from my twitter feed:
My poll on August 20, 2019 asked “hey #RStats and other #OA #opensci friends, poll time – have you ever written – or seen – image description templates built into code?” 46 people voted: 83% of respondents had never come across image description templates written into code, 15% had no idea or just wanted to see the results, and 2% (so, one person) had come across or written this (they didn’t clarify which). Unfortunately, I can’t say I’m surprised. But it’s something we’d like to include with our next release – again, in addition to tabular representations of the same information in the figure, to achieve that multimodal representation of the analysis and make it more accessible.
Getting back to listening/reading and educating ourselves, rather than reinventing or breaking the wheel: I’ll be drafting our first attempt at the image description templates for the code using best practices from the National Center for Accessible Media and WebAIM, specifically their guidance on describing complex scientific images. And while image descriptions are supposed to be *specific* to the image, we’re writing a template, so we’ll also need to indicate where in our template the description needs to be adjusted to fit the completed analysis.
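As a starting point, the template might look something like this – again a sketch, not our final wording – with square-bracketed fields marking exactly where the description has to be adjusted to the completed analysis, and a short summary up front before the longer detail:

```r
# sketch of a template string; the [bracketed] fields are the parts that must
# be filled in for each completed analysis
pyramid_plot_description <- paste(
  "Brief description: Pyramid plot of [response variable] at [site],",
  "showing the slope of every moving window against window length.",
  "Long description: The x axis shows slope from [min] to [max]; the y axis",
  "shows the number of years in the window, from 1 to [total years].",
  "A dashed vertical line marks the full-dataset slope of [value], flanked by",
  "dotted lines at plus or minus one standard error.",
  "Each window is an X if not statistically significant and an O if",
  "significant, with symbol size indicating effect size.",
  "Windows of [convergence point] years or more fall within the error bounds,",
  "so the analysis stabilizes on the long-term trend at about that length."
)

cat(pyramid_plot_description)
```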
Know of a team doing this well that we should learn from? Read or listened to a great scientific or complex figure description that we should model descriptions on? We’re very open to feedback as we try to make our work more open – as in, accessible.