Open Resources and Open Standards

The Orbital project is about a lot more than just developing a cool bit of software. In fact, most of the project's impact comes from policy and training rather than development. However, we think some good practices from software development apply equally to developing documentation around policy and training, specifically revision control.

Throughout the day, as we make changes to the source code which makes up Orbital Bridge, we record significant states of the development in our revision control software (specifically Git). We can then rewind the state of the entire codebase to any one of these states, compare the differences between any two of them, and even pick and choose specific changes to move between states on a line-by-line basis. We can create diverging versions to test new features in isolation and merge them together again with no fear of messing up the working version.

Given that we’re planning to release all of our RDM policy and documentation under an Open licence (specifically CC-BY), it made a lot of sense to use a platform for revision control which makes the most of the community and both allows and encourages people to view our stuff, take it, make changes and even propose changes back to us. Enter GitHub, the most popular source code sharing site in the world. GitHub provides us with a ready-to-go Git hosting platform, as well as a load of really easy-to-use tools to help us and other people make the most of our resources.

At the University of Lincoln we already use GitHub for Open Source software projects from both the Online Services Team and the LNCD development group, so it made sense to use it for our RDM documentation as well. The definitive copy of our RDM policy and training materials can now be viewed in the state it was at any given point in time, branched, merged and so on. But there’s a problem with making documents the Old Fashioned Way that people in the University may be used to. Namely, using Microsoft Word to store a document causes all kinds of problems for revision management, because Word doesn’t just keep the text but a whole load of other stuff, which is then compressed down into a single binary blob. Using Word would mean that although the main features of revision control (versions, branching etc.) would technically work, we’d lose some of the more elegant solutions to problems, such as line-by-line comparisons of versions and merging of different branches.
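To illustrate the kind of line-by-line comparison that plain text makes possible, here is a minimal sketch using Python's standard difflib module. It is not how Git works internally, and the file names and policy wording are made up for the example.

```python
import difflib


def line_diff(old_text, new_text):
    """Return a unified, line-by-line diff between two plain-text versions.

    The file names below are illustrative labels only, not real files.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="policy_v1.md",
            tofile="policy_v2.md",
            lineterm="",
        )
    )


if __name__ == "__main__":
    v1 = "Research data must be stored safely.\nData is kept for 5 years."
    v2 = "Research data must be stored safely.\nData is kept for 10 years."
    # Removed lines are prefixed with "-", added lines with "+".
    print("\n".join(line_diff(v1, v2)))
```

The same comparison on two Word files would only tell us that two binary blobs differ, not which sentence changed.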

A better solution was needed for writing documents, and we ended up with a shortlist of three potential plain-text markup standards. These are ways of marking up a plain text document (such as you’d write in Notepad) with semantic structure and styling so that we can take the document and re-render it in a number of different places. Our three contenders were LaTeX, Markdown and reStructuredText. All three have pros and cons, but have the same basic idea behind the scenes – plain text is surrounded with bits of other plain text that give it meaning. All three result in a document that is fundamentally human readable without the need for any proprietary software, and all three allow for the document to be re-rendered in a form appropriate for the audience.
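As a rough illustration of the idea of plain text marked up with other plain text, the sketch below renders a tiny Markdown-style subset (headings starting with "#", bullets starting with "* ", and paragraphs) to HTML. It is a toy written for this example, not a real Markdown implementation and not the library used in Bridge.

```python
import html


def render_minimal(text):
    """Render a tiny Markdown-like subset (headings, bullets, paragraphs) to HTML."""
    out = []
    in_list = False
    for line in text.splitlines():
        stripped = line.strip()
        is_bullet = stripped.startswith("* ")
        # Close an open list as soon as a non-bullet line is reached.
        if in_list and not is_bullet:
            out.append("</ul>")
            in_list = False
        if not stripped:
            continue
        if stripped.startswith("#"):
            # The number of leading "#" characters gives the heading level.
            level = min(len(stripped) - len(stripped.lstrip("#")), 6)
            title = stripped.lstrip("#").strip()
            out.append(f"<h{level}>{html.escape(title)}</h{level}>")
        elif is_bullet:
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append(f"<li>{html.escape(stripped[2:])}</li>")
        else:
            out.append(f"<p>{html.escape(stripped)}</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)


if __name__ == "__main__":
    source = "# Policy\n\n* Keep data safe\n* Share data\n\nContact us."
    print(render_minimal(source))
```

The source text stays readable in Notepad, while the same file can be re-rendered for the web or another output.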

LaTeX is by far the most powerful of the three, having a background in typesetting complex scientific academic papers. It would allow for policy documents to be rendered for both the web and print, but has the downside of being the most complex to use and having a less user-friendly syntax. We want the policy to be as accessible as possible, without readers needing to understand what a set of tags means.

Markdown and reStructuredText both take a much simpler approach, and use almost identical syntax for most things. However, reStructuredText has a bundle of other markup which makes it better suited to long, structured documents with nested lists. reStructuredText would be ideal if we ever decided to convert the University’s Regulations to a plain text format, but for a simple document such as the RDM policy it doesn’t really have any advantage over Markdown.

The tipping point for our decision then lay in the technical implementation of Markdown over reStructuredText. Fortunately this was an easy call, as reStructuredText is very tightly linked into the Python ecosystem whereas Bridge is built entirely in PHP. We could easily drop a PHP library to do Markdown rendering into Bridge, whereas reStructuredText would need additional work to call an external Python library to do the best job of rendering. Should we decide in the future that we need the extra capability of reStructuredText, then the migration effort, as far as the document is concerned, is virtually non-existent.

You can view our current draft RDM policy in Markdown in our RDM repository on GitHub, as well as fork it and submit pull requests if you want to use it as a basis for your own or propose changes. We will be moving all our training presentations to a Markdown-based in-browser format in the near future.

“Managing Your Research Data” – training for postgrad students

As part of the JISC-funded Orbital project, we are starting to offer introductory training to (initially) postgraduate students, on how to look after their research data.

The first workshop is on 23 January 2013 at 10.00 in the Graduate School classroom, and there are further workshops every couple of weeks throughout 2013.

I’ll be arranging further workshops aimed more at staff in due course.

MANAGING YOUR RESEARCH DATA

The Graduate School – University of Lincoln

Multiple dates throughout 2013

Research data management is an important part of the research process, and a vital part of academic practice. This one-hour workshop will include a presentation and discussion of what you should consider when creating, looking after, and sharing/publishing your research data.

The workshop will cover:

  • What do we mean by research data?
  • Policies affecting your data
  • Data Management Planning (DMP)
  • The research data lifecycle
  • Practical tools for looking after your data
  • Data publishing and citation
  • Where to go for help

Postgraduate students can book a place on a workshop online at: http://uolresearchdata.eventbrite.co.uk/

Orbital Team meeting 13-12-12

Present

Joss
Melanie
Harry
Nick

Apologies

Annalisa
Paul

Previous Actions

  • JW to circulate draft documents for business case and SMT presentation to Orbital team. DONE
  • PS to talk to NJ and HN about ingestion of this content to Bridge. DONE
  • MB to send NJ/HN information on impact recording systems. DONE

Agenda

Policy and Business Case

Business case presentation to SMT has moved to Jan 14th
JW has distributed draft documents for SMT presentation to project team.
Documents & presentation aim to secure the groundwork for a research data management ‘road map’ over next 2 years from end of Orbital. Includes Research Services developer, supporting ePrints, Orbital, bibliometrics, RDM, etc. Also to raise awareness of Data Scientist position.
Action: MB/JW to contact Lisa Mooney regarding review of committee structure.

Training/Documentation

PS and JW met with Mike Neary at Graduate School, with agreement that Orbital would run training workshop with graduates on RDM to refine workshops and documentation. PS has blogged an outline of this training.
Action: PS/MB/JW/AJ to meet regarding training materials.
Joss spoke to Martin Donnelly at the DCC about RDM training and a branded version of DMPOnline. Will arrange a DCC workshop at Lincoln at the end of February.
ACTION: Joss to contact Martin about requirements for a branded version of DMPOnline.

Technical

Orbital Bridge is ‘Researcher Dashboard’. v0.2 released yesterday. Will collect metrics from ePrints, Scopus, Web of Science, Google Analytics, CKAN, etc. Provides researcher with overview of their research profile and impact. Aggregates metrics for the institution.
ACTION: NJ to discuss bibliometrics with PS/HN
Still waiting for access to the AMS. NJ has met with ICT. Still issues around user permissions. John Bark will talk to Worktribe.
ACTION: NJ to organise conference call with Worktribe/ICT
Waiting to hear from DCC about DMPOnline APIs. HN has written an Orbital library for DMPOnline.
ICT Cloud Scoping Study includes Research Data Management requirement. Reports back May 2013.
Nucleus v2 (N2) is ready for production use. Will be a source of data for Researcher profile and store metrics, etc.
OpenStack not yet built. Will spend a day before Christmas looking at this. Joss is being interviewed by David Flanders (ANDS) for a podcast about academic uses of OpenStack.

Dissemination/Outreach/External

Joss is meeting with JISC and OKFN 14th December to discuss CKAN.
Carlos Silva (KAPTUR) is visiting Lincoln in January to discuss our use of CKAN.
Paul has booked to attend the DCC conference in Amsterdam in January: Theme “What is a data scientist?”
Joss attended MRD Benefits and Impact event and discussed the Orbital project.

Budget

Joss is meeting with Jill Hubbard to get budget update.

Orbital deposit of dataset records to the Lincoln Repository: workflow

Further to yesterday’s blog post about linking our CKAN datastore with our EPrints Repository (to allow researchers to deposit permanent, public, citable records of their datasets), here’s a fleshed-out diagram of the proposed dataset deposit workflow process.

At the moment, this assumes a one-time “fire and forget” deposit. At some point, we’re going to have to tackle versioning.

The original diagram is available on Lucidchart. See the table in my previous blog post for details of which data fields are involved in the process (i.e. passed between CKAN, Orbital Bridge, the DataCite API, and EPrints).

This is a proposal and still has to be road-tested. Comments welcome.

Diagram of the dataset deposit process

Stages in the proposed deposit process:

  1. User enters project metadata in AMS
  2. AMS creates project container in CKAN
  3. User creates dataset record in CKAN
  4. Nucleus adds user metadata to CKAN
  5. User deposits data in CKAN
  6. User presses “DEPOSIT DATASET” button in CKAN
  7. Orbital Bridge requests DOI
  8. DataCite API returns DOI
  9. Orbital Bridge adds DOI to dataset record in CKAN
  10. User reviews and approves dataset metadata (making changes if necessary)
  11. Orbital Bridge writes changes back to dataset record in CKAN
  12. Orbital Bridge creates a new EPrints record via SWORD
  13. EPrints confirms existence of new record
  14. Orbital Bridge writes EPrints record URL back to CKAN dataset record
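
Steps 7 to 14 above could be sketched roughly as follows. This is an illustration only, not Orbital Bridge code: the DatasetRecord class, the function names and the three callables standing in for the DataCite API, the user review step and the SWORD deposit to EPrints are all hypothetical, and no real API calls are made.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DatasetRecord:
    """Hypothetical stand-in for a dataset record held in CKAN."""
    title: str
    creator: str
    doi: str = ""
    eprints_url: str = ""
    history: List[str] = field(default_factory=list)


def deposit_dataset(
    record: DatasetRecord,
    request_doi: Callable[[DatasetRecord], str],
    review: Callable[[DatasetRecord], Dict[str, str]],
    create_eprint: Callable[[DatasetRecord], str],
) -> DatasetRecord:
    """Run a one-time "fire and forget" deposit (no versioning)."""
    # Steps 7-9: request a DOI and add it to the dataset record.
    record.doi = request_doi(record)
    record.history.append("doi_added")

    # Steps 10-11: the user reviews the metadata; any changes are written back.
    for name, value in review(record).items():
        setattr(record, name, value)
    record.history.append("metadata_approved")

    # Steps 12-14: create the repository record and store its URL.
    record.eprints_url = create_eprint(record)
    record.history.append("eprints_url_added")
    return record
```

Passing the three external steps in as callables keeps the sketch testable with simple stand-ins, such as a function that returns a placeholder DOI string.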

Orbital: AMS–CKAN–EPrints–DataCite

One important piece of work that we’re undertaking at the moment in Orbital is the facility to deposit the existence of a dataset, from CKAN and the University’s new Awards Management System (AMS), into our (EPrints) Repository via SWORD – at the same time requesting a DOI for the dataset via the DataCite API. The software at the centre of this operation is what we refer to as Orbital Bridge.

Here’s a diagram of how the various systems will need to link together.

Diagram of data flow between systems

The table below shows how fields may be mapped between systems. DataCite properties are taken from the DataCite Metadata Schema (v2.2). This is very much a work in progress! In particular, the red question marks (?) in the “CKAN field” column indicate fields that may not yet exist in the source system (CKAN). It’s in no particular order yet.

The following DataCite properties are optional, and we don’t intend to use them at the moment.

  • 3.1 – TitleType
  • 9 – Language
  • 12 – RelatedIdentifier
  • 12.1 – relatedIdentifierType
  • 12.2 – relationType
  • 13 – Size
  • 15 – Version
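
As a small sketch of how the list above might be applied, the function below drops the optional properties we don't intend to use from a dictionary of DataCite properties before it is sent anywhere. The dictionary layout and the function name are assumptions made for this example, not part of Orbital Bridge or the DataCite API.

```python
# Optional DataCite properties not used at the moment (from the list above).
UNUSED_OPTIONAL = {
    "TitleType",
    "Language",
    "RelatedIdentifier",
    "relatedIdentifierType",
    "relationType",
    "Size",
    "Version",
}


def strip_unused(properties):
    """Return a copy of the properties without the unused optional ones."""
    return {
        name: value
        for name, value in properties.items()
        if name not in UNUSED_OPTIONAL
    }


if __name__ == "__main__":
    # Illustrative values only.
    example = {"Title": "Survey data", "Creator": "A. Researcher", "Language": "en"}
    print(strip_unused(example))
```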