One of the outcomes from the Orbital project that I’m part of is a set of new policies on the subject of research data management. Early on it was decided that this would – in the spirit of open research – be made available under an open licence along with the rest of our resources on the subject (such as training and support materials).
Being the technically minded folk that we are, we wanted to make sure that several of us could work on documentation at the same time without running the risk of overwiting each others changes. We also wanted a comprehensive versioning system to be in place from us putting the first words into the keyboard so that we could see every single change and who made it, something that we think is a big part of making a resource truly open. Finally, we also wanted a mechanism which could allow other people indirectly connected to the project to propose changes. Given our history of using similar systems to manage code there was an obvious choice – the Git source control system.
Git is a system which primarily relies on tracking line-by-line changes, meaning that when we wrote stuff we’d want to use a file format which behaved on a line-by-line basis. This made compiled binary formats such as Microsoft Word or even PDF a bit unsuitable, since a small change could result in a huge set of changes spanning hundreds (or more) of lines. We also wanted to use an open standard which didn’t have prohibitive licence restrictions and which was simple enough to be read and understood by anybody with a basic text editor. There are quite a few standards out there which meet this requirement, but again based on past experience we’re using Markdown for our RDM Policy.
Finally, inspired in no small way by the efforts of the Bundestag to convert their entire body of law to Git we wanted to store policy on a platform which not just allowed community involvement, but which positively encouraged it. GitHub is the world’s largest repository of open development, covering every language under the sun and projects ranging from hardcore low level programming through writing documentation through to communal story writing. Even better, they provide free hosting space for open projects. We already had a University of Lincoln user kicking around from past work, so it was a logical place to stick our Git repository. If you’re interested you can take a look at what we’ve got.
What’s interesting about using open text-based standards to write policy, Git for managing revisions and GitHub as a storage provider is that we’ve inadvertently made it very easy for people to do things that they couldn’t do before.
Where previously the process of creating policy was a bit mysterious, the entire world can now see not only the published version of the document – and by digging through the archives previously published versions – but every single change made in the history of the policy and who made it. Every change from the beginning of the document is trackable on a line-by-line basis, and using Git it’s even possible to see who is responsible for any single line of content. This gives us the immediate benefit of accountability, something which is great for finding out exactly who wrote a bit of a document if it needs clarification.
Another huge benefit of doing things this way is in the maintenance of different versions of a document. We’re not longer restricted by having one ‘published’ copy, but because of the ability of Git repositories to create new branches of documents we can do things like create a draft branch, or have a branch purely with spelling fixes. We can tag our ‘definitive’ versions which people should refer to whilst simultaneously working on the next edition or submitting changes to our ‘working’ version. To help illustrate quite how useful this is, I’ve copied our ICT Acceptable Use Policy to GitHub and reformatted it using an open standard (reStructuredText to be specific, more on this later). You can take a look at the tags which show distinct historical versions, or look at the different branches of the policy. The “draft” branch contains some changes I’ve proposed for the next version (visible for the world to see, but not yet in our “master” copy), and you can even see exactly what I’ve changed or do a line by line comparison of the latest “master” and “draft” copies.
Where Git and GitHub really come to the fore however is their encouragement of open collaboration. Although there is the official copy of the ICT policies, since it’s an open repository anybody in the world can make their own copy (such as this one I made earlier) and start making changes. The official repository remains untouched by this (no point in letting the whole world change it willy-nilly), but through an awesome feature of Git known as pull requests it’s possible for people to propose changes back to the official copy, and through the awesome of GitHub these can be discussed before being either accepted or rejected. Suddenly you’re not just sharing the policy openly, but actively allowing the entire world to suggest changes to it. To illustrate this I’ve made some changes in my own version which I’m proposing. Again, you can see each individual set of changes I’m proposing and make a line-by-line comparison. Pull requests can even be used internally within the same repository, which can be used to add mechanisms such as approval and signing off of a large set of changes (such as might be made when a document is published as a new version).
This all neatly leads on to the subject of publishing – when a document such as a policy is published it’s usually disseminated through a number of channels, most commonly print and digital. The University of Lincoln has a mixed history when it comes to publishing digitally, and more than once I’ve had to go through unnecessary amounts of trouble to read something simply because it’s been unnecessarily released as a Word 2010 document with a bunch of weird macros rather than a simple PDF. Fortunately, since we’re using standards such as Markdown or reStructuredText (much the same as Markdown, but with more power for more complex documents), publishing becomes really simple. Through freely available tools such as pandoc we can take our source document in our clean, open, text-only format and quickly get it formatted as HTML for presentation on the web, PDF for digital archiving and print, an eBook format for those who want to dig through it on their iPad and even as a Word document for those who feel an inexplicable need to read things in Word.
Hopefully the University will start to see more benefit in doing things this way – I’m going to be chatting to ICT to see if they want to consider using this method as their new definitive way of maintaining the AUP, and then hopefully I can bring it up the people in Registry and Secretariat who are responsible for the University Regulations. In the meantime, please feel free to go wild with our RDM repository.