Dealing with dates as data in Excel

This post is not about dating, as in romantic entanglements, although lots of scientists I know collect data on the dates they go on.* But like dates of the human variety, dealing with dates, as in those pesky things that denote the passage of time, can be perplexing.

I am a Canadian. I am living in the United States. This alone can cause heaps of frustration. Why? Because each country treats written dates differently.

Take today’s date- the second of July, 2014.

In America, we write this as

7/2/14

But in Canada (and most of the rest of the world), we write:

2/7/14

So, a Canadian would read the American date and think “Oh, yeah, February 7th!” and vice versa. So, if you’re recording dates in a datasheet, and you don’t make it clear what date format you’re using, you’ve added a very, very frustrating element of ambiguity to your data. This is the sort of thing that makes data managers dislike you.

But wait. It gets worse.

Dates in Excel. Holy moly, dates in Excel.

These are some of the 'features' that allow you to 'customize' how you display dates in Excel. Y'know, to avoid ambiguity. And stuff.

These are some of the ‘features’ that allow you to ‘customize’ how you display dates in Excel. Y’know, to avoid ambiguity. And stuff.

The first thing you need to know is that Excel stores dates as a number- see the last column in the above figure. Essentially, it counts the days from a default of December 31, 1899, and thus stores today’s date as 41822.**

But wait. That’s the default on my version of Excel. Turns out there are multiple defaults for different versions of the software. And you can switch between them all willy-nilly. So… say you’re compiling Excel-stored data from multiple sources. There’s dates in each file- excel interprets them as their own internally consistent serial numbers. When you combine the data, excel will take the serial number from the place you’re importing it from, and interpret it using the rule set for the version of excel you’re using.

It’s enough to take years off your life. Four years and a day, to be precise. But don’t worry, it’s not a bug, it’s a feature.

This serial number thing can actually be useful in some circumstances. Say you had a sampling plan where you needed to sample every thirty seven days.*** In another cell, you could type:

=B2+37

And it would return

8-Aug

because it understands the date as a number 41822, and 41822 +37 =41859 which Excel interprets as August 8, 2014. It retains the format (for the most part) of the cell that is being operated upon, unless you did some sort of formatting to the cell before, and then all bets are off.

Which brings us to the many ‘wonderful’ customizations Excel provides in how it displays dates. If you refer to my figure above, you’ll see that there’s many, MANY ways that ambiguity creeps into your data depending on the format you chose when you enter your data, and if you’re not fully cognizant of which format you’re using, you can end up actually entering your data in a way that Excel will heavily misinterpret.

Once, I received a dataset from a colleague representing insect counts that were taken every few days over the summer, and things went something like this:

Ambiguity kills. KNOW YOUR DATE FORMAT.

Ambiguity kills. KNOW YOUR DATE FORMAT.

If excel was to be believed, my colleague had been collecting bugs IN THE FUTURE. Now, I have no doubt this person is highly capable, but I believe time travel was beyond even his grasp.

So, what can be done? Won’t somebody please think of the data managers?

In my own work, I tend to store my dates in two fields: year, and day of year (DOY). Why? Because this is what’s useful to me, and there is practically no possibility for ambiguity creeping in.

The types of statistical models I build usually incorporate year as a factor, to account for year-to-year variation, and then I use DOY to measure the passage of time within a year. Because most of my work involves insects in northern temperate climates, I don’t need to worry about discontinuities with DOY at December 31/January 1 because insects ’round these parts don’t do much to celebrate New Year’s Eve. DOY can be easily converted to photoperiod, it takes out an extra step in converting the month and day of the month to something more useful. And besides all the practical concerns from my own work, the third day of the year is always the third day of the year when you record your dates as DOY, and it will never be mistaken for March 1st.

So, now that I’ve sold you, how are you going to convert all your dates into DOY format? Well, in excel, here’s a handy dandy guide:

Extract year, day of year, and convert back

Extract year, day of year, and convert back

You can pull your day of year and year out of your dates in excel with relative ease. As you can see, the DATE() formula sees things as Year, Month, Day. To get DOY out of your date, take the serial number, and subtract the number associated with the year from your date, and the 0th of January for that year- basically, take today’s date’s number and subtract last New Year’s eve from it. To convert back, give the date formula your year, and then tell it it’s the 183rd day of January. Desired result: returned.

This is also a good way to check if the date you’ve entered is being understood by the computer.

Next time, we can talk about manipulating dates in R, which I recently figured out how to do, after much key mashing (turns out if you spell ‘month’ as ‘moth’ you get errors. Entomologist Freudian slip?)

* Note to non-scientists who are dating scientists: yes, there is a spreadsheet out there with your name on it.
** Has it really been that many days? It only feels like 39822.
*** Your sampling protocol is weird. Good luck with the reviewers on your manuscript

Advertisements

About cbahlai

Hi! I'm Christie and I'm an applied quantitative ecologist. I am an #otherpeoplesdata wrangler, stats enthusiast, and, of course, a bug counter. I cohabitate with five other vertebrates: one spouse, one first grader, one preschooler and two cats.
This entry was posted in Uncategorized and tagged , , , , , , , , , . Bookmark the permalink.

5 Responses to Dealing with dates as data in Excel

  1. Kara Woo says:

    If there’s one thing I’ve angrily muttered at my computer more than anything else it’s “Damnit, Excel, leave my *!&$#@! dates alone!”. I’d much rather force Excel to save my dates as text or put year, month, and day in separate columns. I trust Excel formulas about as far as I can throw them, and IMO the `lubridate` package’s `yday()` function is a much easier way to get Day of Year than setting up an Excel formula.

    • cbahlai says:

      Agreed. Weird #&^#@ happens when excel formats a date, and, well, given that in my field, the majority of data is entered into excel by summer students who don’t know it, it’s the perfect storm for loss of integrity. I’ll get to the nice stuff you can do with dates in R, I promise!

  2. emilyatlas says:

    I use dates all the time for bird migration analyses and realized that Mac and Windows versions of Excel have different defaults – now I just do separate columns for day, month, year and ordinal date (your DOY)… I also have an ordinal date cheat-sheet hanging by my desk for quick conversion, with leap years on one side and non-leap years on the other. It’s rather archaic but at least I know Excel won’t screw up my dates!

  3. Pingback: Two Strategies for Working with Dates in Excel » Data Ab Initio

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s