Git and the n00b

Do people still say n00b? Well, I do.*

I am not entirely a n00b to open source living. Back in my physics undergrad days, I was, exclusively, a Linux person. Windows was my enemy, because Windows was The Man.** Using open sourced software was very important to me, because CORPORATIONS.*** I put a Linux fish on my car. So it seems a little ironic that when I entered the science world, my work and how I did it became more and more closed source, considering how important I think Open Science is now.

There were two main drivers in my shift in attitude. The first was that using open sourced software made me a ‘difficult collaborator.’ Everyone around me used proprietary software to do everything- MS Word, SAS, Excel, and so on. Even simply drafting a document in Open Office led to strange compatibility issues. I’d send collaborators documents that would develop formatting issues, and the people I was sending things to would not know how to resolve them. It was, frankly, a constant pain in the butt.

The second force driving me into closed-practice science was, well, fear. When I entered graduate school, I became initiated into and indoctrinated by an oral tradition of Bad Things That Can Happen To Your Science.**** Basically, I was told by my peers I’d better hold my cards close to my chest, because if I didn’t, I would get scooped. And if I got scooped, my science would be unpublishable, and I would have spent eight years in graduate school and the only “open” thing I would be employable for is breaking big rocks into smaller rocks in an open pit quarry.*****

And so, I closed my science. I held my cards close. For YEARS. But during my PhD, my attitude started to shift. If you’re a long-term reader, you know why- but the short version is: I found that I could do bigger, more interesting things if I built on the work of others. This meant making it easier for people to build on work that I’d done. It’s only fair. I don’t want my work to be the only endpoint.

I’ve talked lots about data quality control on this blog, because this is where I spend a lot of my time. Making your data usable by others. But I’ve skirted around another very important aspect of open science- making your data manipulation and analysis open and usable to others because, well, frankly, THIS CONCEPT IS NEW AND TERRIFYING TO ME.

Pro Git- it just came in the mail. Yes, this is happening.

I now consider myself a reasonably adept R programmer. I’m not ashamed to say I’m something of an Excel ninja.****** But version control? Git? I have avoided. Why? Learning curves. Actually, no. Giant learning brick walls that need to be scaled.

Open science community: I love you. I really, really love you. But you have to understand that the version control thing is terrifying to the novice because, first and foremost, there is a very specific dialect associated with it. Not only is it hard for a novice to see HOW to use things like Git, the language is opaque enough that your casual bug counter has trouble seeing what you’re even doing with Git, and why.

But the why is pretty compelling, if you can get to it. A platform for backing up your scripts, collaborating, and making your work freely available? As far as fostering open science goes, this seems like a Good Thing.

I came to decide that Git was something I needed to learn the same way I make most radical shifts in thinking. First, by being exposed to the idea by experts/fanatics, and deciding immediately it was ridiculous and not something that would ever be useful to me, nay, anyone. Then, by being exposed to reasonable people, using the new thing in reasonable ways tangentially related to what I’m doing. Then, through the course of my natural activities, realizing this new thing would probably be applicable to my work, but still not finding time to do it. Then, finally, by my brain exploding under a seemingly innocuous blip of peer influence.*******

So, yep. I found myself a tutorial. I interfaced Git with RStudio.********

And I did it. I created a repo. I put the code I’ve been working on in it. And lord help me, the day before yesterday, for the first time, I committed changes to my script, then pushed them to my GitHub repo. I think what this effectively means is that when I made changes to my code, I was able to save them in a way that marked what the changes were. Then I was able to upload them to a free file-sharing service without overwriting my previous file, in a folder devoted to this project on this website. If this is, indeed, what I did, that’s pretty cool.
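
For my fellow n00bs, here is a minimal Python sketch of how I currently picture "commit" and "push." It is a toy model, not how Git actually works under the hood: ToyRepo and its commit and push methods are names I made up for illustration, and real Git tracks far more (parent commits, authors, whole directory trees). The only point is that each commit is a labelled snapshot that never replaces an earlier one, and a push copies over whatever snapshots the other side doesn't have yet.

```python
import hashlib


class ToyRepo:
    """A toy stand-in for a repository: an ordered list of saved snapshots.

    Hypothetical illustration only; this is not Git's real data model.
    """

    def __init__(self):
        self.history = []  # list of (snapshot_id, message, content)

    def commit(self, content, message):
        # Label the snapshot with a short hash of its message and content,
        # then append it. Earlier snapshots are never modified.
        digest = hashlib.sha1((message + content).encode("utf-8")).hexdigest()
        snapshot_id = digest[:7]
        self.history.append((snapshot_id, message, content))
        return snapshot_id

    def push(self, remote):
        # Copy over only the snapshots the remote does not already have.
        known = {snapshot_id for snapshot_id, _, _ in remote.history}
        new = [entry for entry in self.history if entry[0] not in known]
        remote.history.extend(new)
        return len(new)


local = ToyRepo()   # the copy on my laptop
remote = ToyRepo()  # plays the role of the copy on GitHub

local.commit("x <- 1\n", "first version of my script")
local.commit("x <- 1\ny <- x + 1\n", "add y")
pushed = local.push(remote)

# Both versions now exist on the "remote"; the first was not overwritten.
print(pushed, len(remote.history))
```

If this picture is roughly right, pushing a second time without committing anything new should copy nothing, which is what the toy version does.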

A wise man once told me (I’m paraphrasing here) that non-experts make the best teachers because they don’t assume students know things that experts feel are givens. Experts use the word ‘just’ to gloss over important steps they feel are obvious. Well, if this is true, I might be the best qualified person in the world to teach you about GitHub. Yup. Just let me read ahead a bit in this tutorial and we’re golden.

TL;DR, I’m learning to use version control. I will tell you how it goes. Love, Christie.

* Because I’m committed to at least translating the jargon for people who are more n00b than I, here’s the definition of n00b. Now, we can all avoid getting pwned.
** Yet, these days, I find myself pretty consistently interested in the work of the Gates Foundation. Go figure.
*** Back in those days, I also had pink hair, because, like, screw your conventional beauty standards, man. I’m pretty hardcore. Yup.
**** I hear there was this one guy, who couldn’t normalize his data, no matter what transformation he applied, and he DIED.
***** No judgement if this is what you do, it’s just not my thing. The world needs aggregates, probably even more than the world needs quantitative population ecologists.
****** Everyone, and I mean EVERYONE in my field uses Excel for about 90% of their data handling and manipulation.
******* It was Titus Brown’s casual mention of GitHub at Preschool Gymnastics class last Saturday morning. That was my tipping point. Damned peer pressure. Next thing you know, I’ll be smoking behind the portables with these young toughs.
******** If you use R, you should probably also be using RStudio. I also avoided this strange and foreign technology for far too long, but it’s fantastic. I’ll write about that later.

About cbahlai

Hi! I'm Christie and I'm a computational ecologist and professor. I am an #otherpeoplesdata wrangler, stats enthusiast, and, of course, a bug counter. I cohabitate with five other vertebrates: one spouse, one spirited grade schooler, one energetic preschooler and two cats.

6 Responses to Git and the n00b

  1. “using open sourced software made me a ‘difficult collaborator.’ ”

    Sadly, I know how that feels… to the max 😦

    I was horrified when I saw what the first draft of a manuscript I wrote looked like on my supervisor’s Mac, in M$ Word. Shit jumbled everywhere… but it looked fine in LibreOffice on my computers!

    I work exclusively in *buntu (mostly Lubuntu these days), occasionally Windoze in a VM if something really won’t work under wine. It’s definitely a big & on-going problem for a lot of people I think (even between Win & Mac). For our next manuscript I’m definitely thinking of starting it on Authorea or WriteLatex (even though I’m not exactly great at LaTeX myself) – too many annoying problems arise from M$ Word compatibility/formatting issues!!! #grr

    Glad to hear I’m not the only one 😀

    • cbahlai says:

      Yep- the compatibility thing made me feel like I was just being a jerk. This was back around 2000-2004, and I could never get any emulators to work to my satisfaction. I tried to come back to Linux when I wiped my PhD laptop and installed Ubuntu, and I was pleased with how much more user friendly it’s become over the years. Unfortunately, my PhD laptop is creaky and old and so that machine is being used exclusively as a data logger, attached to some of my weird husband’s weird bird monitoring equipment. I’m thinking that I’ll try to make the transition over to linux again when/if I (ever) move to a new position.

  2. joshuarherr says:

    Great to hear you’re learning git and happy to see that you are spreading the word!

    One of the things that I think doesn’t get stressed enough is the difference between git (one mechanism out of many to version control your code/text) and github (first a “hub” to version control shared code and now secondly what has become the place for “social coding”). You don’t need to attach the “open data” label to git — everyone should use git (or any other flavor) for their own version control via their personal computer — it’s much more useful than keeping a million copies of different versions of a data frame, manuscript, etc. The fact that services like github make sharing data easier is just more icing on the git cake.

  3. Pingback: This week in learning to be an open scientist | Practical Data Management for Bug Counters

  4. Jerry Corum says:

    I’m going through the Git learning curve right now. Mostly I just wander up to the programmers I work with and make them tell me how to do a particular task. I figure I’ll eventually accumulate enough one-off knowledge to put it all together and appear as a competent user.

    Good Luck! I look forward to any tutorials and tips!
