Implementation Plan

Introduction

The Orbital Implementation Plan (WP6) is intended to be a synthesis of our initial user requirements gathering (WP5), an assessment of Engineering research data (WP9), an evaluation of standards and technologies (WP10), informed by a literature review of previous work relevant to the Research Data Management (RDM) domain as it relates the discipline of Engineering (WP4).

Therefore, appended to this Implementation Plan is: i) a Technical Specification based on user requirements; ii) a Literature Review; iii) a summary of an institution-wide survey based on the Data Asset Framework; iv) and a draft Research Data Management Policy for the University of Lincoln (WP7), which is currently under-going internal review.

The Implementation Plan has been written at exactly a third of the way into the Orbital project (six months), allowing for a further year of development based on the work brought together in this document. It is worth repeating the objectives of the project, as stated in the Project Plan:

We intend to build on our previous work around the deposit, management and access to university research as well as further existing work in which we are building a platform for data-driven services at the university.

Throughout this undertaking, we aim to improve our understanding of the issues around research data management; develop the requisite skills among the university community to better manage research data; re-use and develop some of the underlying tools we have built to provide an institution-wide service for the ingest, description, preservation and dissemination of research data; improve the way we work on such projects, refining our use of agile methods; build capacity for the local development of academic technologies at the university; develop and implement appropriate institutional policy for the deposit, management and sharing of research data; and develop a Business Plan for the university for the long-term sustainability of our research data.

Our work to-date has pursued many of these objectives closely, reflecting continued effort over the last six months, both inside and outside the project, to build on previous work by using institutional data to drive application development; to improve our methods of access and identity management; and develop an environment that fosters and supports in-house innovation.

This planning document is primarily intended to support the technical implementation of the Orbital application to manage research data at the University of Lincoln. What it does not address is the training to support the use of the application (WP11), nor the Business Case for sustaining the pilot service (WP13), which we are implementing. However, some preliminary work is underway to consider appropriate business models for sustaining Orbital as open source software and we believe that the technical decisions laid out in this Implementation Plan will support the development of a sustainable Business Case for Orbital. This area of work continues and the outcomes are due to be delivered towards the end of the project.

What follows is a brief summary of the appended Technical Specification and Literature Review. I would like to thank Nick Jackson and Paul Stainthorp for their work on these documents, which have brought clarity to the Orbital project and contributed to a much better understanding of RDM at the University of Lincoln.

Joss Winn, Orbital Project Manager, 2nd April 2012.

Literature Review

The management of research data is recognised as one of the most pressing challenges facing the higher education and research sectors. Research data generated by publicly-funded research is seen as a public good and should be available for verification and re-use. In recognition of this principle, all UK Research Councils require their grant holders to manage and retain their research data for re-use, unless there are specific and valid reasons not to do so. (JISC Managing Research Data Programme 2011-13).

To gain a clearer understanding of the more complex and unfamiliar concepts in the emerging discipline of Research Data Management, the Orbital project conducted a review of published literature on the subject (mainly web sites, project reports and guidance documents), with particular reference to RDM in the discipline of engineering.

An online Research Data Management bibliography is being maintained at: http://lncn.eu/bcf6

The project team identified the following nine themes in the literature – for each theme, a recommendation is made which will support the development of RDM infrastructure at the University of Lincoln.

1. Fundamentals of research data and RDM

Researchers are not a homogeneous group, and their data needs are changing as the research landscape becomes more complex. Recommendation: the Orbital project continue its work to assess the storage and other requirements of Lincoln researchers using surveys and interviews.

2. Particular requirements of the discipline of engineering

The ERIM (Engineering Research Information Management) project at the University of Bath has specified the first ever set of RDM principles and terminology designed specifically for engineers. Recommendation: the Orbital continue to work with the Bath team on implementing ERIM’s findings.

3. The behaviour of researchers

What motivates researchers to invest in RDM is not the same as what motivates their institutions. Recommendation: Orbital to use surveys and interviews to understand researchers’ requirements and develop appropriate advocacy materials.

4. RDM policies and legal aspects

All UK Research Councils are introducing mandates for data curation, and in some cases data publication. Recommendation: the Orbital team to support the University’s response to the imminently required EPSRC data policy roadmap and to help develop institutional policies.

5. Data sharing

Research data are at their most useful when they are interoperable with other data. Sharing data leads to a range of real and measurable benefits, and researchers’ interests are protected by a principle of ‘proprietary period’ of privileged access. Recommendation: Orbital work with Research & Enterprise to formulate clear policies on data sharing and licensing.

6. Costs and benefits

The most significant RDM costs for the institution occur at the data acquisition/ingest stage. Institutions that invest in RDM can expect significant benefits including new, unforeseen research activities made possible through the re-use and aggregation of data. Recommendation: Orbital provide guidance to researchers on ensuring RDM is costed into future research funding bids.

7. Curation standards, metadata and citation

Without a system for assigning citations to research data, further curation and sharing is impossible. Recommendation: Orbital incorporate the functionality of DataCite to allow Lincoln researchers to secure a DOI (Digital Object Identifier) for their data objects.

8. Technical considerations

The range of file formats involved in engineering research is a significant area of complexity. Recommendation: Orbital continue to work with Siemens, the School of Engineering, the University of Bath and the DCC to develop expertise in handling engineering data formats.

9. Tools, support and training

A range of immediately re-purposable RDM training kits and planning tools already exists. Recommendation: Orbital review the available material, and use them to design a RDM training programme for the University of Lincoln – also incorporate Data Management Planning (DMP) tools within the Orbital application.

In light of this, the initial objectives of the Orbital project were on the mark, but indicate a broad area of institutional responsibility that goes beyond scholarly communication to affect strategic areas such as recruitment and training, business intelligence and continuity, IP and income generation, as well as future curriculum design and our corresponding investment in infrastructure and estate. No small task.

Technical Approach

Our Project Plan outlined the technical approach that we originally anticipated and six months later this has not fundamentally changed. As detailed in the Technical Specification, we remain convinced of the benefits of pursuing a data-driven, API-centric model of development, using storage and access control methods that support the creation of a modular and scaleable web application that is attractive to both Users and Developers.

As we have learned from our requirements gathering and literature review, Research practices both within and across subject disciplines are varied, suggesting that over the next 12 months, the Orbital project should concentrate on developing an application that remains open and attractive to further development, rather than seeking to design a single workflow for all users’ needs – an impossible task.

We believe this approach best supports the sustained development of Orbital beyond the life of the pilot project, allowing both Researchers and software Developers to create applications for Orbital to suit the requirements of specific research disciplines at a given point in time. Likewise, an API-centric approach will also ensure that our existing and related applications, such as institutional repository software and research information systems can equally be treated as ‘users’ (producers and consumers) of Orbital.

As we outlined in our Project Plan, this approach allows us to benefit from work which continues outside the Orbital project such as that around Access and Identity Management and academic profiles, and the development of data.lincoln.ac.uk. It is also a suitable approach for the development of Orbital as open source software, which should remain simple to develop for specific user’s needs if it is to receive interest and contributions from developers outside the university.

The Technical Specification contains five core functional requirements: Projects, Workspace, Archives, Working Dataset, and Publication. A Project may result in a specific Publication(s), while the Workspace, Archives and Working Dataset allow for three non-sequential methods of data storage, manipulation and analysis. These requirements are loosely coupled to one another, but do not represent a publication workflow. Orbital is not simply intended to be a data repository, but the basis of a flexible collaborative environment for working Researchers.

Each Project acts as a conceptual container for all data and represents the ‘space’ in which administrative, descriptive and contextual metadata is captured and stored, as well as the datasets themselves. It is at the level of a Project that Orbital will interface with other systems, such as an institutional repository or research information system by storing, exchanging and publishing information according to recognised standards, such as CERIF, SWORD2, DOI, etc.

Finally, a core requirement from Orbital is that data should be stored, accessed and transported securely. Being a native web application, we have opted to implement the OAuth 2 protocol to provide secured access to all API functions over HTTPS. As such, all user applications will be treated equally and will be required to access the core Orbital APIs via this popular and mature standard for application authentication on the web. OAuth is increasingly being deployed at the University of Lincoln and work continues outside of the Orbital project to implement it as part of an institution-wide Single Sign On (SSO) architecture.

Related project blog posts

Chosen Methodology

Jenkins, build my software

Pivoting Around

Project Planning: Quality Assurance

Understanding and participating in open source culture

The Toolchain: First Pass

Tracking progress

Literature Review

An Orbital project reading list

Initial User Requirements

Meeting our users, the Engineers

Assessment of Data Sources

Research Data vs Research Data

Let’s Look At Data

Data, Data Everywhere

Gluing people together

Evaluation of standards and technologies

How the National Archives use MongoDB

Forecast: Cloudy

Piloting the cloud

Why Orbital is all about the API

Servers, Servers Everywhere

Eating your own dog food: Building a repository with API-driven development

Hello? Is it me you’re looking for?

Orbital and the OAIS reference model

News from Orbital: Policy, Implementation Plan and a first release date

It feels like there’s been quite a lot coming together recently. Here’s what we’re working on:

  • A draft RDM policy (WP7)
  • Our Implementation Plan (WP6), which includes our Literature Review (WP4), initial user requirements analysis (WP5), our technical evaluation (WP10), and assessment of data sources (WP9). All of this goes to the Steering Group tomorrow and all being well, will be posted on this project blog a day or two later.
  • A DAF-based survey (23 questions) of researchers’ data management practices and requirements. We’ve asked all Engineering academics to complete this via the Head of School; we’ve sent a request to all Research Directors in the university to encourage their academic colleagues to complete the survey, and have also advertised it to all staff on three occasions this week via the daily all staff alerts. So far, 28 people have completed it (about 5% of staff on academic contacts). We’ll continue to push this after the Easter break and publish a summary once we think we’ve exhausted our chances of staff filling it in on mass. Probably later in the month.
  • Harry Newton joined us as a second Developer, working with Nick Jackson. Harry graduated with an MSc in Computer Science from Lincoln last year. He’s been ‘bedding in’ this week and started working on adding ‘projects’ functionality to Orbital.
  • A release date for v0.1 of the Orbital software is May 1st. A couple of weeks ago, a Robotics researcher asked us if we could help him publish his datasets (20GB). We did so, offering him server space, guidance on his choice of license and a proxy URL to use for citation. It made us realise that there’s probably quite a few researchers like him that just need to get data on the web for citation purposes, so we thought we’d aim to have something permanently in place for the university by May 1st. Functionality for the v0.1 release will be: secure login, basic project creation and deletion, file upload, license picker, publish to permanent URL for citation. We think this is the bare minimum needed for a researcher to publish open research data so that it is permanently citable. From May 1st, we’ll maintain a working system on which to base discussions with users about additional functionality. For the time-being, researchers wishing to upload data will have to discuss it with us first.
  • We’re on the list of testers for the new DataCite API and have registered with ORCID, too, which has a mock API for testing against.
  • We’re helping organise the JISC MRD/DevCSI MRD Hackday on May 2-3rd, when we hope to be able to demo this work and talk about the implementation in detail. Fingers crossed.

It’s like having a whole new research partner…

Orbital, as you know, has many cool features in the pipeline which are designed to make research easier. From keeping tabs on your data and helping you find what you’re looking for through to helping you build your data management plan Orbital is going to be an essential tool in your daily research work. Today we’re pleased to announce that it’s being made even better, making use of some really rather clever machine learning and artificial intelligence to actually do parts of the research for you.

To start with, all you need to do is to upload your research data in whatever format you’ve got it in. If you don’t have research data then just give Orbital a few keywords and it’ll generate research data for you based on over 200 individual variables gathered from (amongst other things) the news, weather, stock markets and punctuality of the rail network. Once your data is loaded Orbital will begin to sift through it, sorting it into a more easily understood form which can be searched and queried with ease. From there Orbital will begin to look for statistically significant patterns of data, pick them out for further analysis and finally output a conclusion for you – complete with any necessary citations – ready for inclusion in your paper.

Yet another example of how Orbital isn’t just a place to keep your data, but an active part of your day-to-day work.

Data, Data Everywhere…

For a project which is essentially about storing data, we’ve not actually done that much talking about it. This may seem sensible to some — after all, everybody knows what data is, don’t they?

It turns out that what people define as ‘data’ is a hugely wide ranging topic (you can find a myriad of research on how different people define it), and what we’re trying to do is basically trying to fit mis-shapen data into a one-size-fits-nothing storage system. Allow me to elaborate.

First of all we had to look at what data was currently available to us. Fortunately we have some awesome project partners in the School of Engineering who provided us with some of what they’re researching on, and thus presented the first problem: The data doesn’t exist in any kind of standardised format. We’ve got to content with flat text database formats, weird (often invalid) XML, Excel spreadsheets, CSV files (again often invalid), folders of images or audio files, proprietary binary formats, non-binary flat files which nonetheless need parsing to be made understandable, plain strings of data, and the occasional random file format which even the source of the data can’t explain.

The solution to this problem is fairly simple in principle, yet complex in practice. First of all when it comes to archive storage of files (ie without any pre-processing) Orbital is designed to be file type agnostic — if you give it a random stream of bytes and say it’s a file then a Orbital will duly store the file as provided, with no further work needed. It doesn’t care if your XML file has no DTD and has unclosed tags, since it doesn’t do any work inside the stream. You will later be able to retrieve the file exactly as it was first loaded into the system without any changes or alterations. It’s worth pointing out, however, this does mean that if Orbital is given a corrupt file to store then it will do so blindly without any attempt at validation.

Continue reading “Data, Data Everywhere…”

Hello? Is it me you’re looking for?

Orbital loves APIs, to the extent that the entire project hangs off them working as expected. One thing that Orbital must also do is ensure that it keeps potentially sensitive or confidential data secure during the process of slinging it over the ether. Fortunately, the fact we’re using HTTP as our transport mechanism of choice means that we can leverage something which has proven to be pretty darned reliable thus far: HTTPS!

The pretty green HTTPS icon in Chrome means that your communication with Orbital is secured.

HTTPS is the secure version of the HTTP standard which drives browsing the web — the ‘green padlock’ which appears on secured websites when you’re browsing. Behind the scenes it’s full of plenty of technological awesomeness including various protocols (SSL and TLS if you’re curious) and complex encryption and certification policies which aim to ensure three things:

  1. Traffic between the client (in most cases your browser) and the server is encrypted, ie it’s useless to anybody who may be able to intercept it.
  2. The connection between the client and the server is secure, ie the only parties who can communicate over that channel are the client and server who initiated it.
  3. The server you’re talking to is verifiably the right one, and not simply pretending in order to intercept data.

Orbital makes use of HTTPS throughout, and offers no option at all for unsecured access. Simply put, this means that all your communication with Orbital is guaranteed to be secure whilst it’s in transit. In fact, we decided to find a tool which can quantifiably express just how secure it is, so we had a look at Qualys. You can view our SSL reports below:

Whilst we always made sure that both aspects of Orbital offered a standard level of security, our initial reports highlighted a few flaws. Our servers were vulnerable to the BEAST attack, and offered some low security ciphers in their list of supported ones. Based on these reports we were able to plug those holes, improving our overall security. We also decided to implement the HSTS draft specification, providing an explicit instruction to browsers that communication with Orbital must always take place over HTTPS with a valid certificate, and any attempts to do otherwise should be blocked as a security risk.

Just a few of the ways we’re making sure that Orbital can be trusted with your sensitive or commercial data.