A JISC-funded Managing Research Data project

Posts tagged DAF

I’ve been quiet—too quiet—about the Orbital project recently. While I’ve not been blogging, Joss, Nick and Harry have overseen several fairly important developments:

As Orbital-the-product (a coherent set of products, really) develops, my own focus between now and the end of the project (March 2013) will be on Orbital-the-service: training, support, documentation, and implementation of RDM policy at the University of Lincoln. I’ll work closely with the Research & Enterprise department on these aspects.

Figure: Four-level hierarchy of documentation

As part of this strand of the project (which cuts across workpackages 7, 11, and 12), I want to consider the following:

  1. The current usability of ownCloud, CKAN, EPrints, etc. – what ‘sticking plaster’ help materials do we need to provide right now (if any)?
  2. How the production of documentation fits into the software development release cycle (“change management”?), particularly in an agile/iterative environment, and how we ensure we meet our responsibility to ‘leave no feature undocumented’ as well as provide adequate contextual information on RDM. Related: I’m thinking about a four-level hierarchy of documentation (see the figure): how do the different levels relate to each other (how do we ensure internal consistency?), and how do we ensure all four levels are covered?
  3. [How] should we contribute to an (OKFN-co-ordinated) open research [data] handbook initiative (cf. the Open Data Handbook and the Data Journalism Handbook) instead of, or as well as, writing our own operational help guides? Contributing to and re-consuming community-written RDM materials will be more efficient than writing our own guidebook from scratch, but we need to make sure our local documentation is relevant to Lincoln.
  4. I’ve already started collating a list of other people’s RDM help materials (Joss has collected many more) and will publish the list to this blog soon. I’ll be looking to see what we can re-use. There are some very good, openly licensed training materials available, but I don’t want us to use them uncritically.
  5. How do we use our (still not-yet-accepted) RDM policy as a jumping-off point for training events?
  6. What did we learn from our recent(ish) Data Asset Framework exercise? How can we use researchers’ priorities as identified in the DAF to inform training? Should we re-run the exercise and/or follow it up with more detailed discussions?
  7. It is possible, even likely, that we will shortly have a new member of staff working on the Lincoln Repository and the University’s REF submission. What responsibility might that person have for RDM training and support?

Next I need to organise a meeting with the Research & Enterprise department to plan our ‘version 0.1’ training programme, possibly consisting of (i) a discussion of the issues raised in our DAF survey and people’s current RDM practice, (ii) a discussion of the RDM policy, and (iii) a presentation of the various VRE tools available (CKAN, ownCloud, EPrints, DataCite, DMPOnline). We’ll probably pilot this on a group of willing PhD students in the School of Engineering.

The Orbital project team met today (24 May 2012) and agreed the following:

  • Documentation
      • User documentation will focus on the “why”s of Research Data Management, rather than being a point-and-click guide to the Orbital UI (which should not require detailed explanations).
      • JW will create a changelog (human-readable text file) for each major release of Orbital, so that the documentation for each feature is reviewed if that feature is updated.
      • PS will lead on writing documentation (as HTML pages, stored in the GitHub repository), with documentation for release v0.N completed and available by the launch of v0.N+1.
      • PS will email colleagues from the Library and Research & Enterprise for assistance with writing documentation.
  • Training
      • JW will invite Melanie Bullock and David Sheppard onto the Orbital working group. He is also meeting Annalisa Jones to discuss RDM training for staff.
  • Releases/development
      • Orbital v0.1.1 (including bug fixes) met all of the initial ‘minimum viable product’ requirements specified by Dr Tom Duckett, and also includes the basics of project administration.
      • v0.2 will include improvements to the file upload/management, project management, and license management interfaces, as well as a clearer separation between language files and operating code.
      • NJ demoed the current version of Orbital to Siemens staff. He now has access to Siemens machine data for testing within Orbital.
      • The group discussed the LNCD plans for internal servers/private cloud, and the associated disk space requirements and costs.
  • Integration
      • The current version of the DMPOnline tool has been installed on a test server. The group discussed our approach to integration between external tools/software (such as DMPOnline, R, Gephi) and Orbital.
      • NJ is going to email Adrian Richardson at the DCC to ask when the DMPOnline APIs will become available.
  • RDM policy
      • JW presented the draft policy to the University RIEC committee. The committee have been asked to send comments to Joss. (One comment at the committee meeting was that a policy too closely geared to the requirements of the Research Councils may not be appropriate for Lincoln, which generates a lot of non-RC income. However, it was noted that the good practice specified by the RCs is good practice for managing all research data, whatever the funding source.)
  • Conferences and meetings
      • The group discussed the recent DAF survey which we conducted at the University of Lincoln.
      • JW will convene a sub-group to consider the responses in detail and plan follow-up interviews.
  • Business case
      • JW is currently gathering costs for long-term data storage. This will form the first strand of the Orbital business case, which will be presented to University SMT (along with the agreed RDM policy) in September 2012.

Over the last month, we’ve been asking academic staff to complete a local version of the DAF survey. Here are the anonymised results. Almost every question had a comments box; many staff used these, and the comments proved very useful in understanding researchers’ specific issues.

Click on the image below to download a PDF summary of the survey, which 44 staff completed. This represents about 8% of staff on research or research/teaching contracts.

Figure: DAF survey summary (PDF)

One thought I have at the moment: 50% of staff are holding up to 100GB of data right now, and throughout the comments our researchers ask for better storage again and again. If we were to provide 100GB per person to all 500+ academic staff, that’s over 50TB of online storage, plus backups. I’m told that the university currently has 24TB of online, centrally managed storage in use, with 13TB of this backed up offsite and 1TB ‘archived’. Clearly, introducing research data into the mix will significantly increase the amount of storage that is being managed centrally.
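To make the storage arithmetic explicit, here is a minimal back-of-envelope sketch in Python. The flat 100GB quota and the 24TB of current central storage come from the figures above; the headcount of exactly 500 (a lower bound, since we have 500+ academic staff), decimal units (1TB = 1000GB) and a single full backup copy are illustrative assumptions, not university policy.

```python
# Back-of-envelope estimate of central storage needed for research data.
# Assumptions (illustrative only): every academic gets the same quota,
# 1 TB = 1000 GB (decimal units), and each backup is one full extra copy.

GB_PER_TB = 1000


def required_storage_tb(staff: int, quota_gb: int, backup_copies: int = 1) -> tuple[float, float]:
    """Return (online_tb, total_tb) where total includes backup copies."""
    online_tb = staff * quota_gb / GB_PER_TB
    total_tb = online_tb * (1 + backup_copies)
    return online_tb, total_tb


online, total = required_storage_tb(staff=500, quota_gb=100)
current_central_tb = 24  # centrally managed online storage currently in use

print(f"online: {online:.0f} TB, with one backup copy: {total:.0f} TB")
# prints "online: 50 TB, with one backup copy: 100 TB"
print(f"ratio to current central storage: {online / current_central_tb:.1f}x")
# prints "ratio to current central storage: 2.1x"
```

Even before backups, a uniform 100GB quota would roughly double the centrally managed online storage; counting one backup copy, the total is about four times what is in use today.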

It feels like there’s been quite a lot coming together recently. Here’s what we’re working on:

  • A draft RDM policy (WP7)
  • Our Implementation Plan (WP6), which includes our Literature Review (WP4), initial user requirements analysis (WP5), technical evaluation (WP10), and assessment of data sources (WP9). All of this goes to the Steering Group tomorrow and, all being well, will be posted on this project blog a day or two later.
  • A DAF-based survey (23 questions) of researchers’ data management practices and requirements. We’ve asked all Engineering academics to complete it via the Head of School, sent a request to all Research Directors in the university to encourage their academic colleagues to complete it, and advertised it to all staff on three occasions this week via the daily all-staff alerts. So far, 28 people have completed it (about 5% of staff on academic contracts). We’ll continue to push this after the Easter break and publish a summary once we think we’ve exhausted our chances of staff filling it in en masse, probably later in the month.
  • Harry Newton joined us as a second Developer, working with Nick Jackson. Harry graduated with an MSc in Computer Science from Lincoln last year. He’s been ‘bedding in’ this week and started working on adding ‘projects’ functionality to Orbital.
  • The release date for v0.1 of the Orbital software is May 1st. A couple of weeks ago, a Robotics researcher asked us if we could help him publish his datasets (20GB). We did so, offering him server space, guidance on his choice of license, and a proxy URL to use for citation. It made us realise that there are probably quite a few researchers like him who just need to get data on the web for citation purposes, so we thought we’d aim to have something permanently in place for the university by May 1st. Functionality for the v0.1 release will be: secure login, basic project creation and deletion, file upload, license picker, and publishing to a permanent URL for citation. We think this is the bare minimum a researcher needs to publish open research data so that it is permanently citable. From May 1st, we’ll maintain a working system on which to base discussions with users about additional functionality. For the time being, researchers wishing to upload data will need to discuss it with us first.
  • We’re on the list of testers for the new DataCite API and have registered with ORCID, too, which has a mock API for testing against.
  • We’re helping organise the JISC MRD/DevCSI MRD Hackday on May 2–3, when we hope to demo this work and talk about the implementation in detail. Fingers crossed.

This week sees the formal two-day launch event for the JISC Managing Research Data programme 2011–2013 (the programme which is funding Orbital). It’s being held in the National College for School Leadership, next to the University of Nottingham’s Jubilee Campus.

Unfortunately, after schlepping it from the furthest fringes of Lincolnshire (and then having to go back home for the evening), I was only able to attend a couple of hours of day 1. But it was worth it.

I arrived just in time for a workshop about a number of research data management tools developed/provided by the Digital Curation Centre (DCC). Dr Mansur Darlington, who’s acting as external assessor/consultant to the Orbital project, was also in this workshop and contributed greatly to the discussions. (My Orbital colleagues Joss Winn and Nick Jackson attended the [parallel] workshop on various JANET, Eduserv and UMF SaaS/cloud storage services.)
