Orbital project team meeting: notes

Here are the notes of the most recent Orbital project team meeting (31 January 2013).

Present: Nick Jackson, Harry Newton, Paul Stainthorp, Joss Winn.

The project team discussed the following development tasks. The aim is for the following to be completed by the end of February 2013:

  • Demonstratable AMS-CKANEPrints workflow in Orbital Bridge (a minimal but operational RDM infrastructure);
  • Researcher dashboard to include projects and project metadata;
  • Users able to display and create datasets in CKAN from within Orbital Bridge (N.B. need to check changes to CKAN APIs between versions);
  • Demonstrator using the DataCite test API (until a budget is agreed for use of the live DataCite service);
  • Ability to publish dataset metadata to EPrints Repository, with a complete ‘publish’ UI in Orbital Bridge (to be tested on the University’s upgraded EPrints 3.3 Repository in March) – questions over versioning/locking of deposited metadata to be resolved;
  • Researcher dashboard to include analytics fom EPrints, CKAN, AMS, and bibliometric/citation services – add links to external profiles (Scopus, WoS, ORCID, Google Scholar) in the first instance. ACTION: JW to contact Planning to discuss reporting from the researcher dashboard (also data.lincoln.ac.uk; bibiometrics).

JW presented the Orbital business case to the University Senior Management Team on 14th January 2013. JW to work with the Dean of Research (Lisa Mooney) / Deputy V-c (Ieuan Owen) to discuss ongoing resourcing for RDM.

ICT are undertaking a cloud major scoping study, including RDM storage requirements.

The draft RDM policy is to be presented to the Research & Enterprise committee in April.

NJ, HN and PS are working on the display of RDM training and documentation in Orbital Bridge, with versioned text stored as Markdown in Github. Pages in Orbital can be linked to Github.

The next RDM training for postgraduate students will take place on 6th March 2013. ACTION: PS to embed a calendar feed of training events on the Orbital website.

Upcoming events:

Implementation Plan

Introduction

The Orbital Implementation Plan (WP6) is intended to be a synthesis of our initial user requirements gathering (WP5), an assessment of Engineering research data (WP9), an evaluation of standards and technologies (WP10), informed by a literature review of previous work relevant to the Research Data Management (RDM) domain as it relates the discipline of Engineering (WP4).

Therefore, appended to this Implementation Plan is: i) a Technical Specification based on user requirements; ii) a Literature Review; iii) a summary of an institution-wide survey based on the Data Asset Framework; iv) and a draft Research Data Management Policy for the University of Lincoln (WP7), which is currently under-going internal review.

The Implementation Plan has been written at exactly a third of the way into the Orbital project (six months), allowing for a further year of development based on the work brought together in this document. It is worth repeating the objectives of the project, as stated in the Project Plan:

We intend to build on our previous work around the deposit, management and access to university research as well as further existing work in which we are building a platform for data-driven services at the university.

Throughout this undertaking, we aim to improve our understanding of the issues around research data management; develop the requisite skills among the university community to better manage research data; re-use and develop some of the underlying tools we have built to provide an institution-wide service for the ingest, description, preservation and dissemination of research data; improve the way we work on such projects, refining our use of agile methods; build capacity for the local development of academic technologies at the university; develop and implement appropriate institutional policy for the deposit, management and sharing of research data; and develop a Business Plan for the university for the long-term sustainability of our research data.

Our work to-date has pursued many of these objectives closely, reflecting continued effort over the last six months, both inside and outside the project, to build on previous work by using institutional data to drive application development; to improve our methods of access and identity management; and develop an environment that fosters and supports in-house innovation.

This planning document is primarily intended to support the technical implementation of the Orbital application to manage research data at the University of Lincoln. What it does not address is the training to support the use of the application (WP11), nor the Business Case for sustaining the pilot service (WP13), which we are implementing. However, some preliminary work is underway to consider appropriate business models for sustaining Orbital as open source software and we believe that the technical decisions laid out in this Implementation Plan will support the development of a sustainable Business Case for Orbital. This area of work continues and the outcomes are due to be delivered towards the end of the project.

What follows is a brief summary of the appended Technical Specification and Literature Review. I would like to thank Nick Jackson and Paul Stainthorp for their work on these documents, which have brought clarity to the Orbital project and contributed to a much better understanding of RDM at the University of Lincoln.

Joss Winn, Orbital Project Manager, 2nd April 2012.

Literature Review

The management of research data is recognised as one of the most pressing challenges facing the higher education and research sectors. Research data generated by publicly-funded research is seen as a public good and should be available for verification and re-use. In recognition of this principle, all UK Research Councils require their grant holders to manage and retain their research data for re-use, unless there are specific and valid reasons not to do so. (JISC Managing Research Data Programme 2011-13).

To gain a clearer understanding of the more complex and unfamiliar concepts in the emerging discipline of Research Data Management, the Orbital project conducted a review of published literature on the subject (mainly web sites, project reports and guidance documents), with particular reference to RDM in the discipline of engineering.

An online Research Data Management bibliography is being maintained at: http://lncn.eu/bcf6

The project team identified the following nine themes in the literature – for each theme, a recommendation is made which will support the development of RDM infrastructure at the University of Lincoln.

1. Fundamentals of research data and RDM

Researchers are not a homogeneous group, and their data needs are changing as the research landscape becomes more complex. Recommendation: the Orbital project continue its work to assess the storage and other requirements of Lincoln researchers using surveys and interviews.

2. Particular requirements of the discipline of engineering

The ERIM (Engineering Research Information Management) project at the University of Bath has specified the first ever set of RDM principles and terminology designed specifically for engineers. Recommendation: the Orbital continue to work with the Bath team on implementing ERIM’s findings.

3. The behaviour of researchers

What motivates researchers to invest in RDM is not the same as what motivates their institutions. Recommendation: Orbital to use surveys and interviews to understand researchers’ requirements and develop appropriate advocacy materials.

4. RDM policies and legal aspects

All UK Research Councils are introducing mandates for data curation, and in some cases data publication. Recommendation: the Orbital team to support the University’s response to the imminently required EPSRC data policy roadmap and to help develop institutional policies.

5. Data sharing

Research data are at their most useful when they are interoperable with other data. Sharing data leads to a range of real and measurable benefits, and researchers’ interests are protected by a principle of ‘proprietary period’ of privileged access. Recommendation: Orbital work with Research & Enterprise to formulate clear policies on data sharing and licensing.

6. Costs and benefits

The most significant RDM costs for the institution occur at the data acquisition/ingest stage. Institutions that invest in RDM can expect significant benefits including new, unforeseen research activities made possible through the re-use and aggregation of data. Recommendation: Orbital provide guidance to researchers on ensuring RDM is costed into future research funding bids.

7. Curation standards, metadata and citation

Without a system for assigning citations to research data, further curation and sharing is impossible. Recommendation: Orbital incorporate the functionality of DataCite to allow Lincoln researchers to secure a DOI (Digital Object Identifier) for their data objects.

8. Technical considerations

The range of file formats involved in engineering research is a significant area of complexity. Recommendation: Orbital continue to work with Siemens, the School of Engineering, the University of Bath and the DCC to develop expertise in handling engineering data formats.

9. Tools, support and training

A range of immediately re-purposable RDM training kits and planning tools already exists. Recommendation: Orbital review the available material, and use them to design a RDM training programme for the University of Lincoln – also incorporate Data Management Planning (DMP) tools within the Orbital application.

In light of this, the initial objectives of the Orbital project were on the mark, but indicate a broad area of institutional responsibility that goes beyond scholarly communication to affect strategic areas such as recruitment and training, business intelligence and continuity, IP and income generation, as well as future curriculum design and our corresponding investment in infrastructure and estate. No small task.

Technical Approach

Our Project Plan outlined the technical approach that we originally anticipated and six months later this has not fundamentally changed. As detailed in the Technical Specification, we remain convinced of the benefits of pursuing a data-driven, API-centric model of development, using storage and access control methods that support the creation of a modular and scaleable web application that is attractive to both Users and Developers.

As we have learned from our requirements gathering and literature review, Research practices both within and across subject disciplines are varied, suggesting that over the next 12 months, the Orbital project should concentrate on developing an application that remains open and attractive to further development, rather than seeking to design a single workflow for all users’ needs – an impossible task.

We believe this approach best supports the sustained development of Orbital beyond the life of the pilot project, allowing both Researchers and software Developers to create applications for Orbital to suit the requirements of specific research disciplines at a given point in time. Likewise, an API-centric approach will also ensure that our existing and related applications, such as institutional repository software and research information systems can equally be treated as ‘users’ (producers and consumers) of Orbital.

As we outlined in our Project Plan, this approach allows us to benefit from work which continues outside the Orbital project such as that around Access and Identity Management and academic profiles, and the development of data.lincoln.ac.uk. It is also a suitable approach for the development of Orbital as open source software, which should remain simple to develop for specific user’s needs if it is to receive interest and contributions from developers outside the university.

The Technical Specification contains five core functional requirements: Projects, Workspace, Archives, Working Dataset, and Publication. A Project may result in a specific Publication(s), while the Workspace, Archives and Working Dataset allow for three non-sequential methods of data storage, manipulation and analysis. These requirements are loosely coupled to one another, but do not represent a publication workflow. Orbital is not simply intended to be a data repository, but the basis of a flexible collaborative environment for working Researchers.

Each Project acts as a conceptual container for all data and represents the ‘space’ in which administrative, descriptive and contextual metadata is captured and stored, as well as the datasets themselves. It is at the level of a Project that Orbital will interface with other systems, such as an institutional repository or research information system by storing, exchanging and publishing information according to recognised standards, such as CERIF, SWORD2, DOI, etc.

Finally, a core requirement from Orbital is that data should be stored, accessed and transported securely. Being a native web application, we have opted to implement the OAuth 2 protocol to provide secured access to all API functions over HTTPS. As such, all user applications will be treated equally and will be required to access the core Orbital APIs via this popular and mature standard for application authentication on the web. OAuth is increasingly being deployed at the University of Lincoln and work continues outside of the Orbital project to implement it as part of an institution-wide Single Sign On (SSO) architecture.

Related project blog posts

Chosen Methodology

Jenkins, build my software

Pivoting Around

Project Planning: Quality Assurance

Understanding and participating in open source culture

The Toolchain: First Pass

Tracking progress

Literature Review

An Orbital project reading list

Initial User Requirements

Meeting our users, the Engineers

Assessment of Data Sources

Research Data vs Research Data

Let’s Look At Data

Data, Data Everywhere

Gluing people together

Evaluation of standards and technologies

How the National Archives use MongoDB

Forecast: Cloudy

Piloting the cloud

Why Orbital is all about the API

Servers, Servers Everywhere

Eating your own dog food: Building a repository with API-driven development

Hello? Is it me you’re looking for?

Orbital and the OAIS reference model

Tracking progress

Did you know that you can watch our user requirements gathering and see how Orbital development is progressing by following our Github and Pivotal Tracker activity? Here are the key links:

Orbital Manager (the front end) (RSS)

Orbital Core (the back end) (RSS)

Pivotal Tracker (RSS)

Updates are also merged in a single stream of activity on Splendid Bacon.

Internally, we watch all of this activity through Campfire, thanks to Hubot and a bit of plumbing. Commits to Github, new stories and other activity on Pivotal Tracker, fire off API notifications which Hubot (‘Zakia’), delivers to Campfire. Here’s what this afternoon’s activity looked like.

Campfire
Watching Orbital progress on Campfire, using Hubot (Zakia)

Using a mixture of friendly APIs, asynchronous messaging and a chat bot provides us with a handy method of keeping track of what’s going on when we can’t all be in the same room.

Gluing people together

In December, colleagues in the Web Team (who manage the corporate web site in the Department of Marketing and Communications) approached a few of us about building a tool to allow staff to edit their profile for the new version of the lincoln.ac.uk website. We suggested that much of the work was already done and it just needed gluing together. Yesterday we met with the Web Team again to tell them that our part of the work is pretty much complete. Here’s how it works.

Quick sketch of profile building at Lincoln
Quick sketch of profile building at Lincoln

This requires a bit of explanation, but let me tell you, it’s the holy grail as far as I’m concerned and having this in place brings benefits to Orbital and any other new application we might develop. Here’s a clearer rendering.

 

Building staff profiles
Building staff profiles

The chart above strips out the stuff around authentication that you see in the bottom right of the whiteboard photo. That’s for another post – something Alex is better placed to write.

Information about staff at the university starts with the HR database. This feeds the Active Directory, which authenticates people against different web services. Last year, Nick and Alex pulled this data into Nucleus, our MongoDB datastore, and with it built a new, slick staff directory. Then they started bolting things on to it, like research outputs from the repository and blog posts from our WordPress/BuddyPress platform. To illustrate what was possible, they started pulling information from my BuddyPress profile, which I could edit anytime I wanted to. It got to the point where I started using my staff directory link in my email signature because it offered the most comprehensive profile of me anywhere on a Lincoln website.

By the time we first met with the Web Team about the possibility of helping them with staff profiles, Alex and Nick had 80% of the work already done. What remained was to create a richer number of required fields in BuddyPress for staff to edit about themselves and a scheduled XML dump for the Web Team to wrangle into their new templates on www.lincoln.ac.uk.

So the work is nearly done. The XML file is RDF Linked Data, which means that we have a rich aggregation of staff information and some simple relationships, feeding the Staff Directory, being refreshed every three hours and then being output either as HTML, JSON or RDF/XML.

For the Orbital project, all this glue is invaluable. When staff login to Orbital (Nick’s working on this part right now), we’ll already know who they are, which department they work in, what research outputs they’ve deposited in the institutional repository, what their research interests are, what projects they’re working on, the research groups they’re members of, their recent awards and grants, and the keywords they’ve chosen to tag their profile with. It’s our intention that with some simple AI, we’ll be able to make Orbital a space where Researchers find themselves in an environment which already knows quite a bit about their work and the context of the research they’re undertaking. Once Orbital starts collecting specific staff data of its own, it can feed that back into Nucleus, too.

This reminds me of our discussion last month with Mansur Darlington of the ERIM/REDm-MED project. Mansur stressed the importance of gathering data about the context of the research itself, emphasising that without context, research data becomes increasingly meaningless over time. Having rich user profiles in Orbital and ensuring that we record data about the Researcher’s activity while using Orbital, should help provide that context to the research data itself.

Orbital, therefore, becomes an infrastructure not only for storing and managing research data, but also a system for storing and managing data about the research itself.

Pivoting Around

As part of Orbital’s development we need to keep what we’re doing on track, and ensure that what is produced is actually what people are after. We’re building the project using agile development methods, which mean that instead of generating a load of documentation and exacting requirements up front and then building software, we generate a basic set of requirements, start developing and then return to look at new or changed requirements at regular intervals.

Keeping tabs on this kind of thing requires a management tool, and in our case we’re using the wonderful Pivotal Tracker, and here’s why.

Pivotal allows us to break down user requirements (gathered through a variety of means, including meetings, surveys, observation and so-on) into discreet bundles called ‘stories’, each of which represents something that a user needs (or wants) to be able to do with the final product. An example may be “project administrators must be able to assign roles to project users”, or “users must be able to manually add a data point”. By creating these stories it starts to become clearer what actually needs to be done.

From there we can start to fully analyse each of these stories, providing them with information such as a ‘score’ of how difficult to achieve each story will be, or including sub-tasks for actual development purposes. Stories can be assigned to various people based on who needs to be involved, and go through a clearly defined workflow of being started, being finished, being delivered in a product version and being approved by the customer.

On top of this management of user stories we can also pack out Pivotal with higher-level package deliverables and deadlines, along with bug reporting and general project chores. Once we’ve got all these things into the Tracker we’re able to re-order them as priorities shift, giving us an instant overview of what’s happening in the current iteration (a 2-week long development cycle) as well as what’s going to be happening in future iterations. At this point, Pivotal Tracker comes into its own with something called ’emergent planning’.

Emergent planning takes a look at how we’re actually performing in terms of crunching through our list of user stories and dynamically adjusts which stories we’re going to be tackling in upcoming iterations. If we’re doing well we begin to see more points worth of development per iteration, and if we’re slipping then Tracker gives us fewer. Since we’ve told Pivotal what needs to happen before certain deadlines are met (when we ordered stories and tasks), and since Pivotal knows roughly how fast we’re working, it’s easy to see if we’re predicted to hit or miss development milestones.

Want to see what we’re up to? Our Pivotal Tracker project is open for you all to see.