So, what am I going to do with this specimen list? (hint: Reshape2)

All right, I’m finally back from all my travelling. I’ve seen many a strange thing in my travels.

I found this in Louisiana.


But now, back to reality.* It’s time for some data manipulation.

Where we left off, we had a nice, clean specimen list, with information on the site where the specimen was collected, the date, and the plant community at the site at the time of collection, as well as information about the specimen itself (species identification, taxonomic family, nesting habitat, sociality, and a variable describing membership in various pollen deposition groupings).**

So, what I’ve got is a perfectly serviceable specimen list with a bunch of supplemental information about each of the specimens. But I’m an ecologist. I’m interested in the bee communities by habitat: namely, how the habitat affects the bee community as a whole, how it affects different functional groups (i.e. nesting guilds, sociality structures), and how it affects the community’s ability to pollinate. In order to do these analyses, I need observations on a per-habitat basis, not a per-specimen basis.

Friends, it’s time to get our reshape2 on. ***

First, we need to get my specimen list into R, and into an object called bee.specimens. Then, let’s see what we have:

> colnames(bee.specimens)
[1] "State" "Unique.ID" "Year" "DOY" "Family" "Genus" "Species" "Taxon"
[9] "Group" "Site.ID" "Sample" "Treatment" "Trap.height" "Nest.biology" "Sociality"

For your reference, “DOY” refers to ‘day of year’, “Taxon” is a concatenation of the most precise taxonomic identification of the specimen possible, usually in the form Genus.species, and “Group” is the pollen deposition group of that species. “Unique.ID” refers to the tag number that specimen has in our cupboard, and “Site.ID” is our unique site name (the details of location, landscape parameters, etc. describing that site are stored elsewhere).

As I said before, I want to know how bee diversity, abundance, pollen deposition, nesting guild, etc. vary across site and treatment. Since I’m not really interested in within-season variation, and all sites were sampled the same number of times within a year, I want to reshape my data so I can compute these values on a by-site, by-year basis.

Reshaping data relies on two key commands: melt and cast. You ‘melt’ the data to make it fluid, and then you ‘cast’ it into the shape you want it to take. In reshape2, ‘cast’ takes the form ‘acast’ and ‘dcast’, where the former outputs a vector, matrix, or array, and the latter outputs a data frame. For this application, we want a data frame output, so we’ll use dcast.
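If the melt/cast idea feels abstract, here is a minimal sketch on a made-up table (the data frame `wide` and its counts are hypothetical, not my specimen data). It melts a small site-by-species table into long form, then casts it back with both dcast and acast so you can see the difference in output type:

```r
library(reshape2)

# Hypothetical wide table: one row per site, one column per taxon
wide <- data.frame(
  Site.ID = c("A", "B"),
  Bombus = c(2, 1),
  Apis = c(1, 0),
  stringsAsFactors = FALSE
)

# melt: one row per site-by-taxon combination
long <- melt(wide, id.vars = "Site.ID",
             variable.name = "Taxon", value.name = "Count")

# dcast returns a data frame; acast returns a matrix with dimnames
as.df <- dcast(long, Site.ID ~ Taxon, value.var = "Count")
as.mat <- acast(long, Site.ID ~ Taxon, value.var = "Count")
```

The data frame version keeps Site.ID as an ordinary column, while the matrix version moves it into the row names, which is handy if the next function you call expects a plain numeric matrix.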


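# load the reshape2 package
library(reshape2)
# melt the data so that R understands which headers are needed for moving the data around
# every column is treated as an identifier here, since each row is already one specimen
specimen.list<-melt(bee.specimens, id.vars=colnames(bee.specimens), na.rm=TRUE)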

And now the fun part begins. First, let’s ask for a species list, by year, with number of observations of each species:
#see how many species we have
species.list<-dcast(specimen.list, Taxon~Year, length)

And we get something that looks like this:

Number of each species captured by year


So, a bit about the code: ‘length’ is the function I’ve passed to ‘fun.aggregate’, the aggregation function (it’s also what dcast falls back on, with a message, if you don’t supply one and aggregation is needed). What ‘length’ does is simply count the number of observations of each taxon in each year. In this dataset, one row equals one specimen, but in many cases you’d have a ‘count’ variable for the number of individuals of a given species observed at a time. You could run other operations on that variable: the sum, the mean value of rows meeting your criteria, the standard deviation, and so on. Those functions are all possible.
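To make the ‘count’ variable case concrete, here is a small sketch. The `tally` data frame and its `Count` column are invented for illustration (my real data has one row per specimen and no such column); the point is only that swapping the aggregation function changes what ends up in each cell:

```r
library(reshape2)

# Hypothetical tally sheet: one row per taxon per sampling visit
tally <- data.frame(
  Year = c(2011, 2011, 2011, 2012, 2012),
  Taxon = c("Bombus.impatiens", "Bombus.impatiens", "Apis.mellifera",
            "Bombus.impatiens", "Apis.mellifera"),
  Count = c(3, 5, 2, 4, 6),
  stringsAsFactors = FALSE
)

# Total individuals of each taxon per year
total.by.year <- dcast(tally, Taxon ~ Year,
                       fun.aggregate = sum, value.var = "Count")

# Mean individuals per visit, for each taxon and year
mean.by.year <- dcast(tally, Taxon ~ Year,
                      fun.aggregate = mean, value.var = "Count")
```

Naming `value.var` explicitly tells dcast which column to aggregate, rather than leaving it to guess.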

Now that we know we’ve got our data flowing freely, let’s focus on casting it in some useful ways. We need to create a matrix of bee species by year and site so that we can calculate rarefied richness, diversity, and so on. That goes a little something like this:
#cast a cross-tab by taxon
bee.matrix<-dcast(specimen.list, Year+Site.ID~Taxon, length)

Well, that was easy:

Matrix of species by year and site
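As a rough idea of where a matrix like this leads, here is a sketch that computes abundance, raw species richness, and Shannon diversity per site and year, using base R only. The `toy` data are made up, and this is plain (not rarefied) richness; rarefaction would need something more, which I’m not showing here:

```r
library(reshape2)

# Hypothetical specimen list: one row per specimen
toy <- data.frame(
  Unique.ID = 1:7,
  Year = c(2011, 2011, 2011, 2011, 2012, 2012, 2012),
  Site.ID = c("A", "A", "A", "B", "A", "B", "B"),
  Taxon = c("Bombus.impatiens", "Bombus.impatiens", "Apis.mellifera",
            "Bombus.impatiens", "Apis.mellifera", "Apis.mellifera",
            "Halictus.ligatus"),
  stringsAsFactors = FALSE
)

# Same shape as bee.matrix: one row per year and site, one column per taxon
toy.matrix <- dcast(toy, Year + Site.ID ~ Taxon,
                    fun.aggregate = length, value.var = "Unique.ID")

# Columns 1-2 are Year and Site.ID; the rest are per-taxon counts
counts <- as.matrix(toy.matrix[, -(1:2)])

abundance <- rowSums(counts)       # specimens per site-year
richness <- rowSums(counts > 0)    # taxa present per site-year
props <- counts / abundance        # each row divided by its own total
shannon <- -rowSums(ifelse(props > 0, props * log(props), 0))

community <- cbind(toy.matrix[, 1:2], abundance, richness, shannon)
```

The `ifelse` guards against taking the log of zero for taxa that are absent from a given site-year.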


We can follow through and reshape the data to get values by other classifications too, i.e. groupings, nest biology, etc. I’m going to go do this now. Next time, we can talk about re-aggregating these data into a single matrix.****
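For the curious, a cast by functional group looks much like the cast by taxon. This is only a sketch on invented data: the nesting categories (“hive”, “ground”, “stem”) are placeholders, not necessarily the levels used in my Nest.biology column:

```r
library(reshape2)

# Hypothetical specimen list with a nesting guild for each specimen
toy <- data.frame(
  Unique.ID = 1:6,
  Year = c(2011, 2011, 2011, 2012, 2012, 2012),
  Site.ID = c("A", "A", "B", "A", "B", "B"),
  Nest.biology = c("hive", "ground", "hive", "ground", "ground", "stem"),
  stringsAsFactors = FALSE
)

# Count specimens in each nesting guild, by year and site
nest.matrix <- dcast(toy, Year + Site.ID ~ Nest.biology,
                     fun.aggregate = length, value.var = "Unique.ID")
```

Swapping `Nest.biology` for another grouping column on the right-hand side of the formula should give the equivalent table for that classification.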

* For those of you whose reality includes writing a blog about data management. Although, honestly, if I don’t keep busy, I will climb the walls waiting to hear from the places I’ve interviewed.
** Note that data such as family, nesting habits, etc. could have been better handled in a connected database table, and not repeated over and over again in this original data sheet, because Bombus impatiens is always a bumble bee, and is always a hive nester. But, well, #otherpeoplesdata.
*** An excellent guidebook for the reshape package is available here. It’s not for reshape2, but I found it an incredibly useful resource for learning the basic concepts of reshaping data.
**** And for those of you who can’t wait to get out of R, I might even show you how to export the data. Or maybe I won’t. I haven’t decided.


About cbahlai

Hi! I'm Christie and I'm an applied ecologist and postdoc in the midwestern US. I am an #otherpeoplesdata wrangler, stats enthusiast, and, of course, a bug counter. I cohabitate with five other vertebrates: one spouse, one preschooler, one teeny baby and two cats.

2 Responses to So, what am I going to do with this specimen list? (hint: Reshape2)

  1. Pingback: Git and the n00b | Practical Data Management for Bug Counters

  2. Thank you for writing this, I spent a couple of hours looking for an elegant solution to generating a species by site matrix. Melt FTW!
