Research Data vs Research Data

As I’ve been looking more closely at various requirements for Orbital, as well as other research data management projects, it’s becoming increasingly apparent that Orbital has taken a different tack when it comes to defining what research data actually is. Whilst not a problem, it does lead to a certain disconnect when talking to people with a different idea about what data means. When it comes to storing data the disconnect is even bigger, because people often struggle to separate the transit format of the data from the data itself. In true engineering/computing style, it’s time for an analogy. I’m using sweets because hey, sweets are awesome.

Delicious Coloured Candy — Sugar! Sugar!

Imagine a tube of Smarties (or sugar-coated chocolate beans of choice). When I talk about research data I’m talking about the individual Smarties, the individual nuggets of information. You could tip 100 tubes of Smarties into a bowl and you’d just end up with a big pile of Smarties. You could then go through and sort the Smarties by colour, or perform some other type of organisation. Since you’ve got the individual Smarties out of their containers it’s a lot easier to see a whole overview and work with them all at once.

Taking this approach makes sense to me, because if I want to throw in a couple of bags of Peanut M&Ms I can do so without suddenly having a tube saying “Smarties” which contains nuts. I can still sort my pile of sweets into colours, or into types. I can orient them by the little letters on top. I could throw in a handful of jelly beans and a bar of chocolate broken into squares, and then order by sugar content, colour, and number of artificial flavours. The possibilities are quite literally limited only by my tolerance for sugar highs.

Increasingly what I’m perceiving from the other side of the MRD fence is the notion that research data is the tubes of Smarties and the bags of M&Ms. There is a desire for a system which can tell us what’s inside the tubes and keep the bags filed away for eternity, but never actually bother taking out the individual sweets and describing them. If a researcher wants to find every blue sweet they still have to go get all the individual bags, boxes and wrappers and hunt through them checking each sweet individually. Yes, the system has made it a lot easier to find the containers which contain sweets (as opposed to containing coffee beans) but still lacks the understanding of the individual bits. It’s good for storing your stash of chocolate in readiness for the apocalypse, but not so good if you’re trying to find a different colour every day.

So, back to research data. Where do Smarties fit in? Imagine that an individual Smartie is a row in your spreadsheet of research data, and that the tube is the Excel spreadsheet. Many other projects are putting a lot of time and effort into working out how to keep the entire spreadsheet and all that it contains safe, retrievable and well described so that it can be found again in the future – a goal which is amazingly desirable for all kinds of reasons. What Orbital is trying to do is discard the outer wrapper and instead focus on keeping the individual bits safe, retrievable and well described so that they’re ready to be manipulated at a moment’s notice halfway through a research project.
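
To make the distinction a bit more concrete, here is a minimal Python sketch of the "individual sweets" view. It is not how Orbital actually works under the hood; the file names, column names and the `pool_rows` helper are all invented for illustration, and the two tiny CSV strings simply stand in for spreadsheets.

```python
import csv
import io

# Two hypothetical "containers": small CSV files standing in for spreadsheets.
CONTAINERS = {
    "smarties.csv": "colour,sugar_g\nblue,0.9\nred,0.9\n",
    "peanut_mms.csv": "colour,sugar_g,contains_nuts\nblue,1.3,yes\nyellow,1.3,yes\n",
}


def pool_rows(containers):
    """Tip every container into one pile, remembering where each row came from."""
    pile = []
    for name, text in containers.items():
        for row in csv.DictReader(io.StringIO(text)):
            record = dict(row)
            record["source"] = name  # provenance survives losing the wrapper
            pile.append(record)
    return pile


pile = pool_rows(CONTAINERS)

# Find every blue sweet without reopening the individual containers.
blue = [r for r in pile if r["colour"] == "blue"]

# Sort the whole pile by sugar content, whichever container a row came from.
by_sugar = sorted(pile, key=lambda r: float(r["sugar_g"]))
```

Note that the rows don't need to share the same columns (only the M&Ms know about nuts), which is the point of throwing away the tube: the pile can hold anything, and each row still remembers its source.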

Obviously both bits are really, really important. At the end of a project nobody wants to look at a gigantic pile of random sweets when all they’re interested in are those coloured blue, which is why Orbital is going to include some really nice output tools to make sure that research data can be nicely bundled up, described and filed away.
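
As a rough idea of what "bundled up and described" could look like, here is a small Python sketch that packs a chosen set of rows back into a single CSV and writes a short description alongside it. The `bundle` function and the metadata fields are assumptions made up for this example, not Orbital's actual output tools.

```python
import csv
import io
import json


def bundle(records, description):
    """Return (csv_text, metadata_json) for a set of records.

    Hypothetical helper: packs rows into one CSV and describes them alongside.
    """
    # Union of all field names, in first-seen order, so mixed records fit one table.
    fields = []
    for record in records:
        for key in record:
            if key not in fields:
                fields.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)

    metadata = json.dumps(
        {"description": description, "row_count": len(records), "fields": fields},
        indent=2,
    )
    return buffer.getvalue(), metadata


pile = [
    {"colour": "blue", "type": "Smartie"},
    {"colour": "red", "type": "Smartie"},
    {"colour": "blue", "type": "Peanut M&M", "contains_nuts": "yes"},
]

# Only the blue ones go into the wrapper that gets filed away.
blue_only = [r for r in pile if r["colour"] == "blue"]
csv_text, metadata = bundle(blue_only, "All blue sweets in the pile")
```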