Orbital deposit of dataset records to the Lincoln Repository: workflow

Further to yesterday’s blog post about linking our CKAN datastore with our EPrints Repository (to allow researchers to deposit permanent, public, citable records of their datasets), here’s a fleshed-out diagram of the proposed dataset deposit workflow process.

At the moment, this assumes a one-time “fire and forget” deposit. At some point, we’re going to have to tackle versioning.

The original diagram is available on Lucidchart. See the table in my previous blog post for details of which data fields are involved in the process (i.e. passed between CKAN, Orbital Bridge, the DataCite API, and EPrints).

This is a proposal and still has to be road-tested. Comments welcome.

Diagram of the dataset deposit process

Stages in the proposed deposit process:

  1. User enters project metadata in AMS
  2. AMS creates project container in CKAN
  3. User creates dataset record in CKAN
  4. Nucleus adds user metadata to CKAN
  5. User deposits data in CKAN
  6. User presses “DEPOSIT DATASET” button in CKAN
  7. Orbital Bridge requests DOI
  8. DataCite API returns DOI
  9. Orbital Bridge adds DOI to dataset record in CKAN
  10. User reviews and approves dataset metadata (making changes if necessary)
  11. Orbital Bridge writes changes back to dataset record in CKAN
  12. Orbital Bridge creates a new EPrints record via SWORD
  13. EPrints confirms existence of new record
  14. Orbital Bridge writes EPrints record URL back to CKAN dataset record
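
As a rough illustration of stages 7 to 14, the part of the process driven by Orbital Bridge could be sketched as below. This is only a sketch under assumptions: `DatasetRecord`, `deposit_dataset` and the three callables passed in are hypothetical names standing in for CKAN, the DataCite API, the user's review step and the SWORD deposit. None of them is real Orbital, CKAN, DataCite or EPrints code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical stand-in for a dataset record held in CKAN."""
    title: str
    doi: Optional[str] = None
    eprints_url: Optional[str] = None
    log: List[str] = field(default_factory=list)


def deposit_dataset(
    record: DatasetRecord,
    request_doi: Callable[[DatasetRecord], str],
    review: Callable[[DatasetRecord], Dict[str, str]],
    create_eprint: Callable[[DatasetRecord], str],
) -> DatasetRecord:
    """Run the one-time 'fire and forget' deposit for a single record."""
    # Stages 7-9: request a DOI and store it on the dataset record.
    record.doi = request_doi(record)
    record.log.append("doi_added")

    # Stages 10-11: the user reviews the metadata; any changes are written back.
    for name, value in review(record).items():
        setattr(record, name, value)
    record.log.append("metadata_approved")

    # Stages 12-14: create the repository record and store its URL.
    record.eprints_url = create_eprint(record)
    record.log.append("eprints_url_added")
    return record


if __name__ == "__main__":
    # Placeholder values only: no real DOI or repository URL is implied.
    result = deposit_dataset(
        DatasetRecord(title="Draft title"),
        request_doi=lambda r: "10.5072/example-1",
        review=lambda r: {"title": "Final title"},
        create_eprint=lambda r: "https://eprints.example.org/id/eprint/1",
    )
    print(result)
```

Passing the external systems in as callables keeps the ordering of the stages visible in one place, and makes it easy to try the sequence out before any of the real connections exist.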

Orbital: AMS–CKAN–EPrints–DataCite

One important piece of work that we’re undertaking at the moment in Orbital is the facility to deposit a record of a dataset’s existence, from CKAN and the University’s new Awards Management System (AMS), into our (EPrints) Repository via SWORD – while at the same time requesting a DOI for the dataset via the DataCite API. The software at the centre of this operation is what we refer to as Orbital Bridge.

Here’s a diagram of how the various systems will need to link together.

Diagram of data flow between systems

The table below shows how fields may be mapped between systems. DataCite properties are taken from the DataCite MetaData Schema (v2.2). This is very much a work in progress! In particular, the red question marks (?) in the “CKAN field” column indicate fields that may not yet exist in the source system (CKAN). It’s in no particular order yet.

The following DataCite properties are optional, and we don’t intend to use them at the moment.

  • 3.1 – TitleType
  • 9 – Language
  • 12 – RelatedIdentifier
  • 12.1 – relatedIdentifierType
  • 12.2 – relationType
  • 13 – Size
  • 15 – Version


Research data training at the University of Lincoln

As part of the Orbital project to build a pilot Research Data Management (RDM) infrastructure at the University of Lincoln, I’m looking particularly at support, training and documentation.

We aim to start offering—early in 2013—an introductory 1-hour workshop on managing your research data, aimed at early-career researchers and postgraduate research students. In particular, we want to promote this training through three avenues:

  1. As part of the Lincoln Graduate School’s standard timetable of postgrad training;
  2. Directly, to PhD students in the School of Engineering (our pilot group);
  3. To researchers who completed our Data Asset Framework questionnaire.

The training will be supported by documentation (written and maintained through WordPress and a dedicated RDM reading list), presented through the main Orbital “bridge” site, which we’re starting to treat as a VRE.

Here’s an outline of the initial workshop. I’m meeting the Graduate School this afternoon to agree this.

“Managing your research data”

  1. Definitions, terminology and scope (what do we mean by research data?)
  2. Policies and laws affecting your data
  3. The “research data lifecycle”
  4. Data Management Planning (DMP)
  5. Practical tools for looking after your data
  6. Data publishing and citation
  7. Where to go for further help and support

Comments welcome!

Orbital training and documentation

I’ve been quiet—too quiet—about the Orbital project recently. While I’ve not been blogging, Joss, Nick and Harry have overseen several fairly important developments:

As Orbital-the-product (coherent set of products, really) develops, my own focus between now and the end of the project (March 2013) will be on Orbital-the-service: training, support, documentation, and implementation of RDM policy at the University of Lincoln. I’ll work closely with the Research & Enterprise department on these aspects.

Four-level hierarchy of documentation

As part of this strand of the project (which cuts across workpackages 7, 11, and 12), I want to consider the following:

  1. The current usability of ownCloud, CKAN, EPrints, etc. – what ‘sticking plaster’ help materials do we need to provide right now (if any?).
  2. How the production of documentation fits into the software development release cycle (“change management”?) – particularly so in an agile/iterative environment, and how we ensure we meet our responsibility to ‘leave no feature undocumented’ as well as provide adequate contextual information on RDM. Related: I’m thinking about a four-level hierarchy of documentation (see right): how do the different levels relate to each other (how do we ensure internal consistency?), and how do we ensure all four levels are covered?
  3. [How] should we contribute to an (OKFN-co-ordinated) open research [data] handbook initiative (cf. the Open Data Handbook; Data Journalism Handbook) instead of, or as well as, writing our own operational help guides? Contributing to and re-consuming community-written RDM materials will be more efficient than writing our own guidebook from scratch, but we need to make sure our local documentation is relevant to Lincoln.
  4. I’ve already started collating a list of other people’s RDM help materials (Joss has collected many more) – I’ll publish the list to this blog soon. I’ll be looking to see what we can re-use. There are some very good, openly-licensed training materials available, but I don’t want us to use them uncritically.
  5. How do we use our (still not-yet-accepted) RDM policy as a jumping-off point for training events?
  6. What did we learn from our recent(ish) Data Asset Framework exercise? How can we use researchers’ priorities as identified in the DAF to inform training? Should we re-run the exercise and/or follow it up with more detailed discussions?
  7. It is possible/likely that we will shortly have a new member of staff to work with the Lincoln Repository and the University’s REF submission. What responsibility might that person have for RDM training and support?

Next I need to organise a meeting with the Research & Enterprise department to plan our ‘version 0.1’ training programme, possibly consisting of (i) a discussion of the issues raised in our DAF survey and people’s current RDM practice, (ii) a discussion of the RDM policy, and (iii) presentation of the various VRE tools available (CKAN, ownCloud, EPrints, DataCite, DMPOnline). We’ll probably pilot this on a group of willing PhD students in the School of Engineering.

Notes on Orbital v0.2.1

A few notes on some of the new features in the latest version of Orbital: these were presented to Dr Bingo Wing-Kuen Ling on 15 June 2012.

  1. ‘Your Projects’ now includes an Activity Timeline of comments and file changes aggregated across all projects in Orbital; each project page also displays a timeline for that project.
    Screenshot of the Orbital timeline
  2. Files from the File Archives can be organised using Collections (which are ‘tag-like’ rather than ‘folder-like’: i.e. a file can belong to more than one Collection).
    Screenshot of Orbital project
  3. You can now edit project information and add new members to a Project. To do this, go to the Project within Orbital, click on the ‘edit’ button, and scroll down to Project Members.
    Screenshot of the Orbital project page
    Screenshot of the Orbital add members section
  4. Finally, a bug which was preventing the upload of files using Internet Explorer has now been fixed.
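
The ‘tag-like’ behaviour of Collections described in item 2 can be pictured as a many-to-many relationship between files and Collections. The sketch below is purely illustrative: the `FileArchive` class and its method names are hypothetical, and say nothing about how Orbital actually stores Collections.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class FileArchive:
    """Hypothetical model of tag-like Collections (not Orbital's own code)."""

    def __init__(self) -> None:
        # Collection name -> names of the files tagged with that Collection.
        self._collections: DefaultDict[str, Set[str]] = defaultdict(set)

    def add(self, filename: str, *collections: str) -> None:
        """Tag one file with any number of Collections."""
        for name in collections:
            self._collections[name].add(filename)

    def files_in(self, collection: str) -> List[str]:
        """List the files in a Collection, sorted by name."""
        return sorted(self._collections.get(collection, set()))

    def collections_of(self, filename: str) -> List[str]:
        """List every Collection a file belongs to, sorted by name."""
        return sorted(
            name
            for name, files in self._collections.items()
            if filename in files
        )


if __name__ == "__main__":
    archive = FileArchive()
    # Unlike a folder, the same file can sit in two Collections at once.
    archive.add("run1.csv", "raw", "june")
    archive.add("notes.txt", "june")
    print(archive.collections_of("run1.csv"))
    print(archive.files_in("june"))
```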