A JISC-funded Managing Research Data project

Posts tagged DataCite

Further to yesterday’s blog post about linking our CKAN datastore with our EPrints Repository (to allow researchers to deposit permanent, public, citable records of their datasets), here’s a fleshed-out diagram of the proposed dataset deposit workflow process.

At the moment, this assumes a one-time “fire and forget” deposit. At some point, we’re going to have to tackle versioning.

The original diagram is available on Lucidchart. See the table in my previous blog post for details of which data fields are involved in the process (i.e. passed between CKAN, Orbital Bridge, the DataCite API, and EPrints).

This is a proposal and still has to be road-tested. Comments welcome.

Diagram of the dataset deposit process

Stages in the proposed deposit process:

  1. User enters project metadata in AMS
  2. AMS creates project container in CKAN
  3. User creates dataset record in CKAN
  4. Nucleus adds user metadata to CKAN
  5. User deposits data in CKAN
  6. User presses “DEPOSIT DATASET” button in CKAN
  7. Orbital Bridge requests DOI
  8. DataCite API returns DOI
  9. Orbital Bridge adds DOI to dataset record in CKAN
  10. User reviews and approves dataset metadata (making changes if necessary)
  11. Orbital Bridge writes changes back to dataset record in CKAN
  12. Orbital Bridge creates a new EPrints record via SWORD
  13. EPrints confirms existence of new record
  14. Orbital Bridge writes EPrints record URL back to CKAN dataset record
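
As a rough illustration, stages 7 to 14 could be orchestrated along the following lines. This is only a sketch under stated assumptions, not the Orbital Bridge code: `DatasetRecord`, `deposit_dataset`, `EDITABLE_FIELDS`, and the three callables (standing in for the DataCite API, the user's review step, and the SWORD deposit to EPrints) are hypothetical names, and the DOI and URL are placeholders. No network calls are made.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class DatasetRecord:
    """Hypothetical stand-in for a dataset record held in CKAN."""
    title: str
    creator: str
    doi: Optional[str] = None
    eprints_url: Optional[str] = None


# Fields the user may amend during the review stage (an assumption).
EDITABLE_FIELDS = {"title", "creator"}


def deposit_dataset(
    record: DatasetRecord,
    mint_doi: Callable[[DatasetRecord], str],
    review: Callable[[DatasetRecord], Dict[str, str]],
    create_eprint: Callable[[DatasetRecord], str],
) -> DatasetRecord:
    """Run stages 7-14 of the proposed workflow once for a single record."""
    # "Fire and forget": refuse a second deposit until versioning is tackled.
    if record.doi is not None or record.eprints_url is not None:
        raise ValueError("dataset has already been deposited")

    # Stages 7-9: request a DOI and store it on the dataset record.
    record.doi = mint_doi(record)

    # Stages 10-11: the user reviews the metadata; write any changes back.
    for name, value in review(record).items():
        if name not in EDITABLE_FIELDS:
            raise KeyError(f"field {name!r} cannot be changed at review")
        setattr(record, name, value)

    # Stages 12-14: create the EPrints record and store its URL.
    record.eprints_url = create_eprint(record)
    return record


# Usage with stub callables standing in for DataCite, the user, and EPrints.
record = deposit_dataset(
    DatasetRecord(title="Sensor readings", creator="A. Researcher"),
    mint_doi=lambda r: "10.5072/example-1",  # placeholder DOI
    review=lambda r: {"title": "Sensor readings (cleaned)"},
    create_eprint=lambda r: "https://eprints.example.org/id/eprint/1",
)
print(record)
```

Passing the external services in as callables keeps the one-time deposit rule testable without touching CKAN, DataCite, or EPrints. A real implementation would also need to decide what happens if a later stage fails after the DOI has been issued.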

I’ve been quiet—too quiet—about the Orbital project recently. While I’ve not been blogging, Joss, Nick and Harry have overseen several fairly important developments:

As Orbital-the-product (coherent set of products, really) develops, my own focus between now and the end of the project (March 2013) will be on Orbital-the-service: training, support, documentation, and implementation of RDM policy at the University of Lincoln. I’ll work closely with the Research & Enterprise department on these aspects.

Four level hierarchy of documentation

As part of this strand of the project (which cuts across workpackages 7, 11, and 12), I want to consider the following:

  1. The current usability of ownCloud, CKAN, EPrints, etc. – what ‘sticking plaster’ help materials do we need to provide right now (if any?).
  2. How the production of documentation fits into the software development release cycle (“change management”?) – particularly so in an agile/iterative environment, and how we ensure we meet our responsibility to ‘leave no feature undocumented’ as well as provide adequate contextual information on RDM. Related: I’m thinking about a four-level hierarchy of documentation (see right): how do the different levels relate to each other (how do we ensure internal consistency?), and how do we ensure all four levels are covered?
  3. [How] should we contribute to an (OKFN-co-ordinated) open research [data] handbook initiative (c.f. the Open Data Handbook; Data Journalism Handbook) instead of—or as well as—writing our own operational help guides? Contributing to and re-consuming community-written RDM materials will be more efficient than writing our own guidebook from scratch, but we need to make sure our local documentation is relevant to Lincoln.
  4. I’ve already started collating a list of other people’s RDM help materials (Joss has collected many more) – I’ll publish the list to this blog soon. I’ll be looking to see what we can re-use. There are some very good, openly-licensed training materials available, but I don’t want us to use them uncritically.
  5. How do we use our (still not-yet-accepted) RDM policy as a jumping-off point for training events?
  6. What did we learn from our recent(ish) Data Asset Framework exercise? How can we use researchers’ priorities as identified in the DAF to inform training? Should we re-run the exercise and/or follow it up with more detailed discussions?
  7. It is possible/likely that we will shortly have a new member of staff to work with the Lincoln Repository and the University’s REF submission. What responsibility might that person have for RDM training and support?

Next I need to organise a meeting with the Research & Enterprise department to plan our ‘version 0.1’ training programme, possibly consisting of (i) a discussion of the issues raised in our DAF survey and people’s current RDM practice, (ii) a discussion of the RDM policy, and (iii) presentation of the various VRE tools available (CKAN, ownCloud, EPrints, DataCite, DMPOnline). We’ll probably pilot this with a group of willing PhD students in the School of Engineering.

Agenda & notes

  1. Application update: Orbital v0.3 will be released today. New dynamic dataset features. Import/export/query data. Already auto-importing Siemens’ sensor data for analysis. Integration with MATLAB will be presented to the Siemens/Lincoln research group on August 1st. We have set up ownCloud as an alternative to Dropbox and integrated it with Lincoln SSO. Working on full integration with Orbital for Orbital v0.4 release next month. Nick, Harry and other members of LNCD will take third/fourth weeks in August to set up and build OpenStack private cloud for R&D/Academic Computing. Following that, Orbital servers on Rackspace will be moved in-house to OpenStack. Initially starting with three servers (26 cores, 140GB RAM) and 30TB storage. We intend to provide a Continuous Integration (CI) environment (Gitorious/Jenkins) for staff and student research, as well as support LNCD R&D.
  2. Edinburgh/OR12: Presented a paper at OR12, which was well received. Came away thinking that RDM has ‘arrived’ and that platforms for RDM need to be more integrated into the research process than research output repositories have been. Orbital is well placed for this e.g. ‘dynamic datasets’. Discussed systems with OKFN/CKAN representative at OR12 and have invited OKFN to Lincoln to discuss CKAN and Orbital. Initial Skype meeting Monday 30th July. We have asked OKFN to challenge/persuade us as to why we should adopt CKAN rather than continue to develop Orbital. Need to think about the pros/cons of eight more months of Orbital development vs. eight months contributing to CKAN development. May lead to Lincoln extending CKAN for academic RDM, re-using bits of Orbital. Still not clear. Nick has also contacted Patrick at Soton about his winning proposal for DevCSI developer challenge. Patrick’s proposal is very similar to Orbital and both Nick and Patrick are keen to work on dynamic data and visualisations together with others.
  3. RDM Policy: Joss and Annalisa met with Andrew S, Head of Research & Enterprise Office. Reviewed draft RDM Policy. Will make minor amendments but on the whole agreement on way forward.
  4. Training: Joss, Annalisa, Melanie, Bev, Paul to meet early September and bring together RDM training materials produced by other projects. Evaluate, synthesise, extend and re-package for Lincoln. Agreed to arrange RDM training workshops via HR for staff every three weeks from late September to test and inform the development of this project deliverable. Annalisa/Melanie to arrange meeting and workshops.
  5. Metadata: Discussion about adopting BL’s minimum metadata requirements for DataCite. Agreement that the mandatory and optional attributes should be part of Orbital. Need to talk to/confirm with Bev about this and add relevant tasks to Pivotal Tracker. CKAN (see above) meeting relevant to this. If CKAN is used to publish datasets, need to ensure it meets this requirement.
  6. Research Information/Systems integration: Research Information Management (RIM) at the university is dependent on three systems: new Awards Management System (AMS) (Research and Enterprise Office), Orbital, and EPrints (Library). Need to contact Worktribe about API access to AMS for Orbital. EPrints work planned to add the ‘REF plugin’ will give us better data. SWORD2 deposit from Orbital planned. Again, need to consider in light of CKAN conversations. Other work going on in university to build business intelligence dashboards. Lee to arrange meeting with Registry DBAs and Orbital team to discuss data warehousing and dashboards. Lots of overlap in interest/experience/skills. Not enough talking.
  7. Business Case: Joss still waiting on storage modelling costs from ICT to present to SMT in September. Will start process of writing Business Case following that meeting.
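
The metadata item in the notes above suggests a check that a dataset record carries the mandatory attributes before a DOI is requested. A minimal sketch follows, assuming the five properties that the DataCite Metadata Schema marked as mandatory at the time (Identifier, Creator, Title, Publisher, PublicationYear). The BL's minimum requirements may differ in detail, and the dictionary keys and the `missing_mandatory` function are hypothetical names rather than anything in Orbital or CKAN.

```python
from typing import Dict, List

# Assumed mandatory properties; key names are illustrative only.
MANDATORY_FIELDS = (
    "identifier",
    "creator",
    "title",
    "publisher",
    "publication_year",
)


def missing_mandatory(metadata: Dict[str, object]) -> List[str]:
    """Return the mandatory fields that are absent or blank, in a fixed order."""
    missing = []
    for name in MANDATORY_FIELDS:
        value = metadata.get(name)
        # Treat None, empty strings, and whitespace-only strings as missing.
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Usage: a record with no publisher and a blank title.
draft = {
    "identifier": "10.5072/example-1",  # placeholder DOI
    "creator": "A. Researcher",
    "title": "   ",
    "publication_year": 2012,
}
print(missing_mandatory(draft))
```

An empty list would mean the record is ready for DOI registration. Optional attributes could be handled the same way, with a warning in place of a hard stop.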

It feels like there’s been quite a lot coming together recently. Here’s what we’re working on:

  • A draft RDM policy (WP7)
  • Our Implementation Plan (WP6), which includes our Literature Review (WP4), initial user requirements analysis (WP5), our technical evaluation (WP10), and assessment of data sources (WP9). All of this goes to the Steering Group tomorrow and all being well, will be posted on this project blog a day or two later.
  • A DAF-based survey (23 questions) of researchers’ data management practices and requirements. We’ve asked all Engineering academics to complete this via the Head of School; we’ve sent a request to all Research Directors in the university to encourage their academic colleagues to complete the survey, and have also advertised it to all staff on three occasions this week via the daily all-staff alerts. So far, 28 people have completed it (about 5% of staff on academic contracts). We’ll continue to push this after the Easter break and publish a summary once we think we’ve exhausted our chances of staff filling it in en masse. Probably later in the month.
  • Harry Newton joined us as a second Developer, working with Nick Jackson. Harry graduated with an MSc in Computer Science from Lincoln last year. He’s been ‘bedding in’ this week and started working on adding ‘projects’ functionality to Orbital.
  • The release date for v0.1 of the Orbital software is May 1st. A couple of weeks ago, a Robotics researcher asked us if we could help him publish his datasets (20GB). We did so, offering him server space, guidance on his choice of license and a proxy URL to use for citation. It made us realise that there are probably quite a few researchers like him who just need to get data on the web for citation purposes, so we thought we’d aim to have something permanently in place for the university by May 1st. Functionality for the v0.1 release will be: secure login, basic project creation and deletion, file upload, license picker, publish to permanent URL for citation. We think this is the bare minimum needed for a researcher to publish open research data so that it is permanently citable. From May 1st, we’ll maintain a working system on which to base discussions with users about additional functionality. For the time being, researchers wishing to upload data will have to discuss it with us first.
  • We’re on the list of testers for the new DataCite API and have registered with ORCID, too, which has a mock API for testing against.
  • We’re helping organise the JISC MRD/DevCSI MRD Hackday on May 2nd-3rd, when we hope to be able to demo this work and talk about the implementation in detail. Fingers crossed.