A JISC-funded Managing Research Data project

Posts tagged Paul Stainthorp

Introduction

The Orbital Implementation Plan (WP6) is intended to be a synthesis of our initial user requirements gathering (WP5), an assessment of Engineering research data (WP9), an evaluation of standards and technologies (WP10), informed by a literature review of previous work relevant to the Research Data Management (RDM) domain as it relates the discipline of Engineering (WP4).

Therefore, appended to this Implementation Plan is: i) a Technical Specification based on user requirements; ii) a Literature Review; iii) a summary of an institution-wide survey based on the Data Asset Framework; iv) and a draft Research Data Management Policy for the University of Lincoln (WP7), which is currently under-going internal review.

The Implementation Plan has been written at exactly a third of the way into the Orbital project (six months), allowing for a further year of development based on the work brought together in this document. It is worth repeating the objectives of the project, as stated in the Project Plan:

We intend to build on our previous work around the deposit, management and access to university research as well as further existing work in which we are building a platform for data-driven services at the university.

Throughout this undertaking, we aim to improve our understanding of the issues around research data management; develop the requisite skills among the university community to better manage research data; re-use and develop some of the underlying tools we have built to provide an institution-wide service for the ingest, description, preservation and dissemination of research data; improve the way we work on such projects, refining our use of agile methods; build capacity for the local development of academic technologies at the university; develop and implement appropriate institutional policy for the deposit, management and sharing of research data; and develop a Business Plan for the university for the long-term sustainability of our research data.

Our work to-date has pursued many of these objectives closely, reflecting continued effort over the last six months, both inside and outside the project, to build on previous work by using institutional data to drive application development; to improve our methods of access and identity management; and develop an environment that fosters and supports in-house innovation.

This planning document is primarily intended to support the technical implementation of the Orbital application to manage research data at the University of Lincoln. What it does not address is the training to support the use of the application (WP11), nor the Business Case for sustaining the pilot service (WP13), which we are implementing. However, some preliminary work is underway to consider appropriate business models for sustaining Orbital as open source software and we believe that the technical decisions laid out in this Implementation Plan will support the development of a sustainable Business Case for Orbital. This area of work continues and the outcomes are due to be delivered towards the end of the project.

What follows is a brief summary of the appended Technical Specification and Literature Review. I would like to thank Nick Jackson and Paul Stainthorp for their work on these documents, which have brought clarity to the Orbital project and contributed to a much better understanding of RDM at the University of Lincoln.

Joss Winn, Orbital Project Manager, 2nd April 2012.

Literature Review

The management of research data is recognised as one of the most pressing challenges facing the higher education and research sectors. Research data generated by publicly-funded research is seen as a public good and should be available for verification and re-use. In recognition of this principle, all UK Research Councils require their grant holders to manage and retain their research data for re-use, unless there are specific and valid reasons not to do so. (JISC Managing Research Data Programme 2011-13).

To gain a clearer understanding of the more complex and unfamiliar concepts in the emerging discipline of Research Data Management, the Orbital project conducted a review of published literature on the subject (mainly web sites, project reports and guidance documents), with particular reference to RDM in the discipline of engineering.

An online Research Data Management bibliography is being maintained at: http://lncn.eu/bcf6

The project team identified the following nine themes in the literature – for each theme, a recommendation is made which will support the development of RDM infrastructure at the University of Lincoln.

1. Fundamentals of research data and RDM

Researchers are not a homogeneous group, and their data needs are changing as the research landscape becomes more complex. Recommendation: the Orbital project continue its work to assess the storage and other requirements of Lincoln researchers using surveys and interviews.

2. Particular requirements of the discipline of engineering

The ERIM (Engineering Research Information Management) project at the University of Bath has specified the first ever set of RDM principles and terminology designed specifically for engineers. Recommendation: the Orbital continue to work with the Bath team on implementing ERIM’s findings.

3. The behaviour of researchers

What motivates researchers to invest in RDM is not the same as what motivates their institutions. Recommendation: Orbital to use surveys and interviews to understand researchers’ requirements and develop appropriate advocacy materials.

4. RDM policies and legal aspects

All UK Research Councils are introducing mandates for data curation, and in some cases data publication. Recommendation: the Orbital team to support the University’s response to the imminently required EPSRC data policy roadmap and to help develop institutional policies.

5. Data sharing

Research data are at their most useful when they are interoperable with other data. Sharing data leads to a range of real and measurable benefits, and researchers’ interests are protected by a principle of ‘proprietary period’ of privileged access. Recommendation: Orbital work with Research & Enterprise to formulate clear policies on data sharing and licensing.

6. Costs and benefits

The most significant RDM costs for the institution occur at the data acquisition/ingest stage. Institutions that invest in RDM can expect significant benefits including new, unforeseen research activities made possible through the re-use and aggregation of data. Recommendation: Orbital provide guidance to researchers on ensuring RDM is costed into future research funding bids.

7. Curation standards, metadata and citation

Without a system for assigning citations to research data, further curation and sharing is impossible. Recommendation: Orbital incorporate the functionality of DataCite to allow Lincoln researchers to secure a DOI (Digital Object Identifier) for their data objects.

8. Technical considerations

The range of file formats involved in engineering research is a significant area of complexity. Recommendation: Orbital continue to work with Siemens, the School of Engineering, the University of Bath and the DCC to develop expertise in handling engineering data formats.

9. Tools, support and training

A range of immediately re-purposable RDM training kits and planning tools already exists. Recommendation: Orbital review the available material, and use them to design a RDM training programme for the University of Lincoln – also incorporate Data Management Planning (DMP) tools within the Orbital application.

In light of this, the initial objectives of the Orbital project were on the mark, but indicate a broad area of institutional responsibility that goes beyond scholarly communication to affect strategic areas such as recruitment and training, business intelligence and continuity, IP and income generation, as well as future curriculum design and our corresponding investment in infrastructure and estate. No small task.

Technical Approach

Our Project Plan outlined the technical approach that we originally anticipated and six months later this has not fundamentally changed. As detailed in the Technical Specification, we remain convinced of the benefits of pursuing a data-driven, API-centric model of development, using storage and access control methods that support the creation of a modular and scaleable web application that is attractive to both Users and Developers.

As we have learned from our requirements gathering and literature review, Research practices both within and across subject disciplines are varied, suggesting that over the next 12 months, the Orbital project should concentrate on developing an application that remains open and attractive to further development, rather than seeking to design a single workflow for all users’ needs – an impossible task.

We believe this approach best supports the sustained development of Orbital beyond the life of the pilot project, allowing both Researchers and software Developers to create applications for Orbital to suit the requirements of specific research disciplines at a given point in time. Likewise, an API-centric approach will also ensure that our existing and related applications, such as institutional repository software and research information systems can equally be treated as ‘users’ (producers and consumers) of Orbital.

As we outlined in our Project Plan, this approach allows us to benefit from work which continues outside the Orbital project such as that around Access and Identity Management and academic profiles, and the development of data.lincoln.ac.uk. It is also a suitable approach for the development of Orbital as open source software, which should remain simple to develop for specific user’s needs if it is to receive interest and contributions from developers outside the university.

The Technical Specification contains five core functional requirements: Projects, Workspace, Archives, Working Dataset, and Publication. A Project may result in a specific Publication(s), while the Workspace, Archives and Working Dataset allow for three non-sequential methods of data storage, manipulation and analysis. These requirements are loosely coupled to one another, but do not represent a publication workflow. Orbital is not simply intended to be a data repository, but the basis of a flexible collaborative environment for working Researchers.

Each Project acts as a conceptual container for all data and represents the ‘space’ in which administrative, descriptive and contextual metadata is captured and stored, as well as the datasets themselves. It is at the level of a Project that Orbital will interface with other systems, such as an institutional repository or research information system by storing, exchanging and publishing information according to recognised standards, such as CERIF, SWORD2, DOI, etc.

Finally, a core requirement from Orbital is that data should be stored, accessed and transported securely. Being a native web application, we have opted to implement the OAuth 2 protocol to provide secured access to all API functions over HTTPS. As such, all user applications will be treated equally and will be required to access the core Orbital APIs via this popular and mature standard for application authentication on the web. OAuth is increasingly being deployed at the University of Lincoln and work continues outside of the Orbital project to implement it as part of an institution-wide Single Sign On (SSO) architecture.

Related project blog posts

Chosen Methodology

Jenkins, build my software

Pivoting Around

Project Planning: Quality Assurance

Understanding and participating in open source culture

The Toolchain: First Pass

Tracking progress

Literature Review

An Orbital project reading list

Initial User Requirements

Meeting our users, the Engineers

Assessment of Data Sources

Research Data vs Research Data

Let’s Look At Data

Data, Data Everywhere

Gluing people together

Evaluation of standards and technologies

How the National Archives use MongoDB

Forecast: Cloudy

Piloting the cloud

Why Orbital is all about the API

Servers, Servers Everywhere

Eating your own dog food: Building a repository with API-driven development

Hello? Is it me you’re looking for?

Orbital and the OAIS reference model

Attending: Nick Jackson, Annalisa Jones, Bev Jones, Chris Leach, Paul Stainthorp, Joss Winn

Apologies: Lee Mitchell, David Young

Agenda

  1. Review Project Plan and Workpackages
  2. Status updates: Literature Review, User Requirements Analysis, Technology/Standards evaluation
  3. Forthcoming meetings and conferences (Agile method, Open Source policy, ERIM, Engineers, OR12, DCC, Start-up)
  4. Poster, papers, website
  5. Staffing and accommodation
  6. AOB

Notes

Joss Winn (JW) reported in detail:

  • JW reported on the work done to date (mostly relating to workpackage WP1), and reported back on:
    • The successful first meeting with users from the School of Engineering
    • The first Steering Group meeting on 3 November
    • The submission of the project plan
    • The appointment of NJ as lead developer
    • The relocation of NJ and PS (part-time) to CERD’s offices to work on Orbital
  • JW ran through the project outputs and workpackages in detail, identifying deadlines – most notably the Implementation Plan, which must be submitted by February 2012, with the following four pieces of work completed by then:
    • Data sources (NJ/CL)
    • User requirements (NJ)
    • Literature review (PS/BJ/CL)
    • Technical review (NJ/JW)
  • The group discussed the further user-engagement work to be completed in workpackages WP5, including Nick Jackson’s work with the School of Engineering to assess their requirements (through workshops, questionnaires, observation, and use of the Data Asset Framework – DAF), and on a planned round-table meeting about ERIM in late January
  • ACTION (NJ): dates needs to be set for user requirements exercises.
  • ACTION (PS): Date in late January needs to be set for ERIM workshop with Engineers.
  • PS reported on the work that he and NJ have begun to benchmark against the EPrints deposit workflow (WP8). NJ will work closely with BJ on this.
  • The group discussed WP9—the planned assessment of data sources—and CL’s role as library user. There are three obvious areas where Orbital crosses over with the Library’s priorities:
    • Integration with the Library’s Discovery selection & implementation project (CL)
    • Integration with the Repository (BJ)
    • Authentication (CL)
  • The Research & Enterprise office (i.e. AJ) will lead on WP11 – developing training materials & workshops.
  • JW will carry on the work with the University’s IP manager, James Murray on the correct approach to Open Sourcing code from Orbital – WP13.
  • ACTION: JW to follow up contacts with EPrints Services and OSSWatch.
  • Dissemination (WP14):
    • PS has been invited to speak at two events in January/February. The group will aim to have a publishable conference paper ready by Summer 2012. Submit abstract to OR12 by ?.
    • NJ, PS and JW are attending the project startup meeting in Nottingham on 1-2 December; presenting a poster. Also attending the DCC roadshow in Cardiff in mid-December.
  • Any other business:
    • JW is convening a meeting (8th December) about agile software and project development methods.
  • ACTION: as many people as possible from Orbital to attend ‘agile’ meeting.

Long day on the trainI’ve been at the University of Warwick today, for a workshop organised by the Digital Curation Centre (DCC), entitled RDMF7: Incentivising Data Management & Sharing. There appeared to be a wide range of attendees, from data curators & data scientists, ICT/database folk. actual researchers and academics, as well as at least one fellow library/repository rat.

Unfortunately I was only able to attend part of the event (which ran over two days). The following notes have been reconstructed from the Twitter stream (hashtag #RDMF7)!

The first speaker I heard was Ben Ryan of the funding council, the EPSRC. He talked about the “long-established” principles of responsible data management [links below]… this may be my own interpretation of Ben’s presentation, but I don’t think I was imagining undertones of “…so there’s really no excuse!“. He also covered individual and institutional motivations for taking care of data [much more about which later], policy and the enforcement of policy, dataset discoverability/metadata, funding (including the EPSRC’s expectation that institutions will make room in existing budgets to meet the costs of RDM), and embargo periods (inc. researchers’ entitlement to a period of “privileged use of the data they have collected, to enable them to publish” first – important to stress this in order to allay fears/get researchers on board?).

Some links:

Next up was Miggie Pickton, ‘queen bee’ of the University of Northampton‘s repository (and self-described RDM “novice”, indeed!), talking about their participation in the multi-institution, JISC-funded KeepIt project, which aimed to design “not one repository but many that, viewed as a whole, represent all the content types that an institutional repository might present (research papers, science data, arts, teaching materials and theses).” This work lead almost by chance to Northampton’s undertaking of a university-wide audit of its research data management processes using the DCC’s Data Asset Framework (DAF) methodology. This helped them to make the case for an institutional research data management working group and [eventually, and not without resistance] to establish a mandatory, central policy for RDM. (Show of hands at this point: how many other institutions have completed a DAF? I counted perhaps only three, Lincoln certainly not being amongst them. Q. Should the University of Lincoln complete a Data Asset Framework exercise as part of the Orbital project?)

After coffee, we heard a third presentation from Neil Beagrie of (management consultancy partnership) Charles Beagrie Ltd. Neil delivered a very comprehensive explanation of the KRDS (“Keeping Research Data Safe”) project, which has developed both an activity model and a benefits analysis toolkit for the management and preservation-of-access to ‘long-lived data’. I have to come clean here and admit that I was a little bewildered by the detail: much of it went through both ears without sticking to the brain on the way through. I need to go back over the tweets more carefully and have a look at the KRDS toolkit and reports at: beagrie.com/krds.php

The morning’s presentations over, we split into three groups for breakout discussion.

I attached myself to the second of the three groups, led by (JISC programme manager for Orbital) Simon Hodson; our job to consider the question: “What really are the sticks and carrots that will make a long-term difference to the pursuit of structured data management processes?“. After spending some time picking apart the terminology, and what each of the various ‘processes’ might include, we had a wide-ranging (and allocated-time-overrunning) discussion about the things that genuinely motivate scientists, universities, and funding councils(!) to care about RDM; about some of the problems caused by the complexity and inconsistency of metadata for datasets; also about the issue of citations/digital object identifiers for data—how those citations might be treated by publishers and citation data services—and how that relates to any notions of ‘peer review’ in experimental data.

As requested, our group came up with three actions which we believe will help address the question of motivation:

  1. Data citation – publishers should consistently include e.g. DOIs for datasets in final published articles, so that citations of the data can be measured.
  2. Measurement of RDM “maturity” – departments and whole institutions should adopt a standardised quality mark for research data management, to give [potential] researchers, funding bodies, and the public confidence in their ability to handle data appropriately.
  3. Discovery – the research councils (probably) should push for common metadata standards for describing datasets and underlying data-generating research/experimental processes.

Lunch followed, and I had time to hear two more presentations in the afternoon before I had to run for a bus:

Catherine Moyes of the Malaria Atlas Project: in effect, demonstrating what really clear and consistent management of large-scale (geo)data looks like. This seems to consist of an extremely rigorous approach to requesting, tracking, and licensing data from the contributors of the project’s data… and an equally strict (but in a good way) expectation of clarity when dealing with requests from third parties to use the data. If that all comes across as restrictive, I’d point to Catherine’s slide on ‘legalities’ of the data that the Malaria Atlas Project has released openly – it’s about as open as it gets, with no registration needed, no terms & conditions placed on re-use of the published data, and all software/artefacts released under very permissive and free licences (Creative Commons or GNU). N.B. the Orbital project should look at the Malaria Atlas Project’s “data explorer”, available via map.ox.ac.uk, as an example of a really nifty set of applications built on top of openly accessible and re-usable data.

Finally (and I’m sorry I only got to hear part of his presentation), University of So’ton chemistry professor Jeremy Frey on their IDMB (Institutional Data Management Blueprint) Project—southamptondata.org—and some rather funny anecdotes about the underlying knowledge, expectations, and problems faced by researchers managing their own data, which emerged when they were surveyed as part of the above project.

Lots to take in (lots). But some useful suggestions for Orbital, which I’ll be bringing to the next project meeting: and plenty more reading material which I’ll add to the project reading list asap.

Paul Stainthorp, lead researcher on the Orbital project.

The Orbital project formally begins today and I’m pleased to be able to write that we’ve just advertised internally for the post of Lead Web Developer. If all goes well, the chosen candidate should be in place and working on the project by the end of this month. The Lead Developer role is key to the project and will be working closely with myself (Joss Winn – Project Managers) and Paul Stainthorp, Lead Researcher.

This post is established within the Centre for Educational Research and Development (CERD) to work as Lead Developer on the JISC-funded ‘Orbital’ project. The Orbital project has been funded to develop, implement and pilot a new infrastructure for managing research data at the university. Further information on the Orbital project can be found at: http://lncn.eu/t48

Working closely with colleagues in the ICT Online Services team and Library, CERD has been successful in leading a number of innovative research and development projects that improve the use of technology in higher education and the University of Lincoln in particular. The new post of Lead Developer will work alongside colleagues in CERD, ICT and the Library to build on this recent success and contribute to the delivery of the Orbital project objectives.

The role requires extensive knowledge of the web and its attendant technologies and the software development and analytical skills to put this knowledge to good effect. In particular, candidates for the role should have demonstrable experience as both a producer and consumer of RESTful web services for large data stores.