A JISC-funded Managing Research Data project

Posts tagged development

Here are the notes of the most recent Orbital project team meeting (31 January 2013).

Present: Nick Jackson, Harry Newton, Paul Stainthorp, Joss Winn.

The project team discussed the following development tasks. The aim is for the following to be completed by the end of February 2013:

  • Demonstrable AMS-CKAN-EPrints workflow in Orbital Bridge (a minimal but operational RDM infrastructure);
  • Researcher dashboard to include projects and project metadata;
  • Users able to display and create datasets in CKAN from within Orbital Bridge (N.B. need to check changes to CKAN APIs between versions);
  • Demonstrator using the DataCite test API (until a budget is agreed for use of the live DataCite service);
  • Ability to publish dataset metadata to EPrints Repository, with a complete ‘publish’ UI in Orbital Bridge (to be tested on the University’s upgraded EPrints 3.3 Repository in March) – questions over versioning/locking of deposited metadata to be resolved;
  • Researcher dashboard to include analytics from EPrints, CKAN, AMS, and bibliometric/citation services – add links to external profiles (Scopus, WoS, ORCID, Google Scholar) in the first instance. ACTION: JW to contact Planning to discuss reporting from the researcher dashboard (also data.lincoln.ac.uk; bibliometrics).
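For illustration only, creating a dataset through CKAN's action API might look roughly like the sketch below. It builds the HTTP request without sending it. The `package_create` action and the `Authorization` header are part of CKAN's action API; the base URL, API key, dataset fields and function name are placeholder assumptions and not Orbital Bridge code. The API version segment of the path is left as a parameter because it may differ between CKAN versions.

```python
import json
import urllib.request


def build_package_create_request(ckan_url, api_key, name, title, api_version=3):
    """Build (but do not send) a request to CKAN's package_create action.

    ckan_url, api_key, name and title are placeholders supplied by the caller;
    api_version is a parameter because the path may vary between CKAN versions.
    """
    endpoint = f"{ckan_url.rstrip('/')}/api/{api_version}/action/package_create"
    body = json.dumps({"name": name, "title": title}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values; nothing is sent over the network here.
request = build_package_create_request(
    "https://ckan.example.org/", "EXAMPLE-KEY", "robot-telemetry", "Robot telemetry"
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out so the sketch can be run without a CKAN instance.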

JW presented the Orbital business case to the University Senior Management Team on 14th January 2013. JW to work with the Dean of Research (Lisa Mooney) / Deputy Vice-Chancellor (Ieuan Owen) to discuss ongoing resourcing for RDM.

ICT are undertaking a major cloud scoping study, including RDM storage requirements.

The draft RDM policy is to be presented to the Research & Enterprise committee in April.

NJ, HN and PS are working on the display of RDM training and documentation in Orbital Bridge, with versioned text stored as Markdown in Github. Pages in Orbital can be linked to Github.

The next RDM training for postgraduate students will take place on 6th March 2013. ACTION: PS to embed a calendar feed of training events on the Orbital website.

It feels like there’s been quite a lot coming together recently. Here’s what we’re working on:

  • A draft RDM policy (WP7)
  • Our Implementation Plan (WP6), which includes our Literature Review (WP4), initial user requirements analysis (WP5), our technical evaluation (WP10), and assessment of data sources (WP9). All of this goes to the Steering Group tomorrow and all being well, will be posted on this project blog a day or two later.
  • A DAF-based survey (23 questions) of researchers’ data management practices and requirements. We’ve asked all Engineering academics to complete this via the Head of School; we’ve sent a request to all Research Directors in the university to encourage their academic colleagues to complete the survey, and have also advertised it to all staff on three occasions this week via the daily all-staff alerts. So far, 28 people have completed it (about 5% of staff on academic contracts). We’ll continue to push this after the Easter break and publish a summary once we think we’ve exhausted our chances of staff filling it in en masse, probably later in the month.
  • Harry Newton joined us as a second Developer, working with Nick Jackson. Harry graduated with an MSc in Computer Science from Lincoln last year. He’s been ‘bedding in’ this week and started working on adding ‘projects’ functionality to Orbital.
  • The release date for v0.1 of the Orbital software is May 1st. A couple of weeks ago, a Robotics researcher asked us if we could help him publish his datasets (20GB). We did so, offering him server space, guidance on his choice of license and a proxy URL to use for citation. It made us realise that there are probably quite a few researchers like him who just need to get data on the web for citation purposes, so we thought we’d aim to have something permanently in place for the university by May 1st. Functionality for the v0.1 release will be: secure login, basic project creation and deletion, file upload, license picker, publish to permanent URL for citation. We think this is the bare minimum needed for a researcher to publish open research data so that it is permanently citable. From May 1st, we’ll maintain a working system on which to base discussions with users about additional functionality. For the time being, researchers wishing to upload data will have to discuss it with us first.
  • We’re on the list of testers for the new DataCite API and have registered with ORCID, too, which has a mock API for testing against.
  • We’re helping to organise the JISC MRD/DevCSI MRD Hackday on May 2nd-3rd, when we hope to be able to demo this work and talk about the implementation in detail. Fingers crossed.

This is a proposal for a paper at the Open Repositories 2012 conference in July.

The JISC-funded Orbital project is building on earlier work at the University of Lincoln to develop a state-of-the-art research data management infrastructure, piloted with the first purpose-built School of Engineering in the UK in over 20 years.

Orbital (figure c) differs from traditional database applications in three significant ways:

  1. Orbital Core uses MongoDB, a document-oriented, schema-less, so-called ‘NoSQL’ database. MongoDB offers flexibility in that it is capable of accepting an object representing any kind of data (e.g. tabular data, survey results, images) without the need to develop a schema beforehand. MongoDB also includes useful features which can boost performance and resiliency, namely sharding – slicing data across multiple servers so a request may be processed by multiple servers in parallel – and replication – keeping multiple identical copies of data on different servers in case one of them fails. Orbital is also designed to be able to spread the ‘core’ – the application which does the heavy lifting – and the ‘manager’ – the front-end user interface – across multiple servers without causing stress. In our experience MongoDB, combined with the Sphinx search engine to perform full-text searching, is also extremely fast and allows us to develop simple, attractive APIs which we can expose to user applications.
  2. Orbital Core mediates access to the data via an open source OAuth 2 server we have developed and implemented at Lincoln. The use of OAuth 2 allows access to the data from multiple authorised systems, provided that the owner of the data has given permission, instantly opening the Orbital application to third-party extension. This method establishes the identity, authentication and authorisation of users, providing direct access to individual data sets or portions of data sets (e.g. specific rows/columns) through APIs on Orbital Core.
  3. The design and development of Orbital Core is API-driven, resulting in an application that offers 100% of its functionality through APIs, whether to our own Orbital Manager or a third-party application, each of which is treated equally by Orbital Core (figure c). As far as Orbital Core is concerned there is no functional difference between Orbital Manager (the front-end) and an application that a researcher has developed to meet a specific need; they are subject to the exact same access controls, restrictions, sanity checking and limitations. We have therefore eschewed some of the traditional approaches of building a database application, where access to the database is either provided via a stand-alone application (figure a) or via an API bolted on to the database (figure b). Orbital is also designed to be stateless, i.e. all of the API functions are RESTful and thus each represents a complete transaction with no requirement for session affinity. The API functions are also not reliant on SQL features like transactions and joins, and have a reduced requirement for referential integrity.
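The three points above can be sketched together in a few lines of Python. This is a toy model under stated assumptions, not Orbital's implementation: the class name, the `read`/`write` scopes and the in-memory dictionaries are all hypothetical stand-ins for the real OAuth 2 server and MongoDB store. It shows a single entry point that accepts documents of any shape and applies the same token checks to every client.

```python
import uuid


class OrbitalCoreSketch:
    """Toy stand-in for an API-driven core: all access goes through handle()."""

    def __init__(self):
        self._documents = {}  # document id -> arbitrary dict (no fixed schema)
        self._tokens = {}     # access token -> set of permitted actions

    def issue_token(self, scopes):
        """Pretend the data owner has granted these scopes to a client."""
        token = uuid.uuid4().hex
        self._tokens[token] = set(scopes)
        return token

    def handle(self, token, action, payload=None):
        """Single entry point; every client receives identical checks."""
        scopes = self._tokens.get(token)
        if scopes is None:
            return {"status": 401, "error": "unknown token"}
        if action not in scopes:
            return {"status": 403, "error": f"token lacks scope '{action}'"}
        if action == "write":
            if not isinstance(payload, dict):
                return {"status": 400, "error": "payload must be an object"}
            doc_id = uuid.uuid4().hex
            self._documents[doc_id] = payload
            return {"status": 201, "id": doc_id}
        if action == "read":
            doc = self._documents.get(payload)
            if doc is None:
                return {"status": 404, "error": "no such document"}
            return {"status": 200, "document": doc}
        return {"status": 400, "error": f"unknown action '{action}'"}


core = OrbitalCoreSketch()
manager_token = core.issue_token({"read", "write"})  # the 'official' front-end
third_party_token = core.issue_token({"read"})       # a researcher's own tool

# Documents of different shapes are accepted without declaring a schema first.
survey = core.handle(manager_token, "write", {"kind": "survey", "answers": [3, 5, 1]})
core.handle(manager_token, "write", {"kind": "image", "width": 640, "height": 480})

print(core.handle(third_party_token, "read", survey["id"]))
print(core.handle(third_party_token, "write", {"kind": "table"}))  # refused: no write scope
```

Each call carries its own token and is complete in itself, so no session state is kept between requests.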

Under this design, the API is the only way to interface with the data and functionality of the system. This API-driven approach offers several benefits:

  • Architecture is better: We are forced to think about data types and methods early on. Consistent behaviour across the application is easier to achieve.
  • Development is easier: Calling a well-designed API is simple; error messages are cleanly captured by design; APIs encourage code reuse at both the API and application ends.
  • Updates become simpler: We can run two or more API versions concurrently; tweak the API back-end and all front-end applications (‘official’ and 3rd party) benefit at once.
  • The APIs are better: The APIs must include everything we want our application to be able to do. Reliability of the API is now critical, which encourages better design of resiliency and error handling; and usability of the API is essential, which encourages better documentation.
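As a rough illustration of running two API versions concurrently over one back-end, consider the sketch below. The routes, response shapes and sample record are invented for the example and are not Orbital's actual API. Both versions call the same `fetch_dataset` function, so a fix there reaches every client at once.

```python
def fetch_dataset(dataset_id):
    """Shared back-end; a change here reaches every API version at once."""
    # Illustrative sample data only.
    store = {"ds1": {"title": "Robot telemetry", "licence": "CC-BY"}}
    return store.get(dataset_id)


def get_dataset_v1(dataset_id):
    """Older response shape: flat, with a plain-text error."""
    record = fetch_dataset(dataset_id)
    if record is None:
        return {"error": "not found"}
    return {"title": record["title"]}


def get_dataset_v2(dataset_id):
    """Newer response shape: wrapped data and a structured error."""
    record = fetch_dataset(dataset_id)
    if record is None:
        return {"error": {"code": 404, "message": "dataset not found"}}
    return {"data": {"title": record["title"], "licence": record["licence"]}}


# Both versions stay registered side by side.
ROUTES = {
    ("v1", "dataset"): get_dataset_v1,
    ("v2", "dataset"): get_dataset_v2,
}


def dispatch(path, dataset_id):
    """Route a path such as '/v2/dataset' to the handler for that version."""
    parts = path.strip("/").split("/")
    handler = ROUTES.get(tuple(parts)) if len(parts) == 2 else None
    if handler is None:
        return {"error": {"code": 404, "message": f"no route for {path}"}}
    return handler(dataset_id)


print(dispatch("/v1/dataset", "ds1"))
print(dispatch("/v2/dataset", "ds1"))
```

Older clients keep calling the `v1` route unchanged while newer ones move to `v2`.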

The challenges of this approach are that every time we want to build user-facing functionality, we have to assess our APIs and work out where the functionality belongs, while also ensuring that lightweight data transfer and reliable error handling are designed into the application. We also have to double up on some areas of development, writing both the Core and Manager parts of the system.

Illustrations

Figure a: The only way to interact with this application is to either be a user, or pretend to be one (for example via screen-scraping).
Figure b: The most common form of API, consisting of a ‘second view’ on the data and functionality of an application. This style of API often exposes a limited subset of the application’s functionality.
Figure c: In an API-driven model the API is the only way to interface with the application.

I’m very pleased to write that today we recruited Nick Jackson as the Orbital Lead Web Developer. Nick has worked with us on three former JISC projects (Total ReCal, Jerome and Linking You) and was instrumental in developing http://data.lincoln.ac.uk. Nick’s work on developing and implementing a number of the tools we now take for granted at the University will be extremely valuable to the Orbital Project and we look forward to working with him and learning from him in his new role. He formally starts on the project on Monday 31st October.

The Orbital project formally begins today and I’m pleased to be able to write that we’ve just advertised internally for the post of Lead Web Developer. If all goes well, the chosen candidate should be in place and working on the project by the end of this month. The Lead Developer role is key to the project and will involve working closely with me (Joss Winn, Project Manager) and Paul Stainthorp, Lead Researcher.

This post is established within the Centre for Educational Research and Development (CERD) to work as Lead Developer on the JISC-funded ‘Orbital’ project. The Orbital project has been funded to develop, implement and pilot a new infrastructure for managing research data at the university. Further information on the Orbital project can be found at: http://lncn.eu/t48

Working closely with colleagues in the ICT Online Services team and Library, CERD has been successful in leading a number of innovative research and development projects that improve the use of technology in higher education and the University of Lincoln in particular. The new post of Lead Developer will work alongside colleagues in CERD, ICT and the Library to build on this recent success and contribute to the delivery of the Orbital project objectives.

The role requires extensive knowledge of the web and its attendant technologies and the software development and analytical skills to put this knowledge to good effect. In particular, candidates for the role should have demonstrable experience as both a producer and consumer of RESTful web services for large data stores.