A JISC-funded Managing Research Data project

Posts tagged OpenStack

It’s been a while since I gave you an update on the technical side of Orbital, so here’s a lightning-fast overview of what’s going on.

CKAN

We’re still working on fine-tuning CKAN for our needs. Although we’ve made advances in theming, the datastore, HTTPS and a few other tweaks, we’re still plagued by mixed HTTP/HTTPS resources, plugins which are difficult to install, broken sign-in using our OAuth 2 SSO service, a broken search and a complete unwillingness of the Recline preview to work. I suspect a lot of this is down to unfamiliarity with the codebase and with Python in general, although some areas of CKAN do feel like they’re a collection of hacks built on top of some more hacks built on a framework which is built on another framework which is built on a collection of libraries which is built on a hack.

In short, CKAN still needs a lot of work before our deployment can be considered production-ready (hence the “beta” tag). That said, we are already using it to store some research data, and the aspects which we’ve managed to get working are working well. We’re going easy though, because CKAN 1.8 and 2.0 are apparently due to land in the next couple of months.

Orbital Bridge

Our awesomely named Orbital Bridge will serve as the central point for all RDM activity around a project, as well as helping people through the process of general project management by being a springboard to our existing policy and training documentation.

Currently Bridge’s public-facing side is in a very basic state, with only static content, but is serving as a test of our deployment toolchain. However, behind the scenes Harry has been working on ways of shuffling data around between systems using abstraction layers for aspects such as datasets, files, people and projects. Today we sat down with Paul and went through some aspects of minimal metadata which are required to construct things to an acceptable standard, which will lead to additional work both on CKAN and our existing ePrints repository to smooth the transfer of things between them.

AMS

The University’s new Awards Management System is designed to help researchers plan their funded research, walking them through the process of building their bid. The system itself has begun its roll-out across the University, and as soon as we’re given access to the APIs we’ll be integrating the AMS with Orbital Bridge, allowing seamless creation of a research project based on the data in the AMS.

This work also helps to inform stuff we’re doing in Bridge around abstracting the notion of a ‘project’ between all our different systems.
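As a rough illustration of what abstracting a ‘project’ between systems could look like, here is a minimal sketch. It is not the Bridge code: the `Project` fields, the AMS record keys and the dataset stub are all invented for this example, and the real AMS API and CKAN schema will differ.

```python
import re
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Project:
    """System-neutral view of a research project (hypothetical fields)."""
    title: str
    lead_researcher: str
    funder: str = ""
    # Maps a system name (e.g. "ams") to that system's own identifier.
    external_ids: Dict[str, str] = field(default_factory=dict)


def slugify(text: str) -> str:
    """Lower-case the text and join runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def project_from_ams(record: dict) -> Project:
    """Build a Project from an AMS-style record.

    The keys used here are assumptions made for illustration only.
    """
    return Project(
        title=record["project_title"],
        lead_researcher=record["principal_investigator"],
        funder=record.get("funder", ""),
        external_ids={"ams": str(record["id"])},
    )


def project_to_dataset_stub(project: Project) -> dict:
    """Turn a Project into a plain dict that a data catalogue adapter
    could map onto its own schema (the shape is hypothetical)."""
    return {
        "name": slugify(project.title),
        "title": project.title,
        "maintainer": project.lead_researcher,
        "extras": {"funder": project.funder, **project.external_ids},
    }
```

The point of the intermediate `Project` class is that each system only needs one adapter to and from the shared representation, rather than one per pair of systems.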

Kumo

Our ongoing OpenStack project, which will provide the bedrock for our technical infrastructure, is slowly moving closer to a state in which we can begin to develop on it. Tied in with this effort is our continued work on automating our provisioning, configuration, deployment, maintenance, monitoring and scaling.

On Wednesday, we hosted three people from the Open Knowledge Foundation to discuss the Orbital project and their software, CKAN. It was a very engaging and productive day spent with Peter Murray-Rust (on the Advisory Board of OKFN), Mark Wainwright (community co-ordinator) and Ross Jones (core developer). At the start of the day we asked them to challenge us about our technical work on Orbital, and I described the day to them as an opportunity to evaluate our development of the Orbital software so far. We didn’t touch on the other aspects of the Orbital project such as policy development and training for researchers.

To cut to the chase, the Orbital project will be adopting CKAN as the primary platform for further development of the technical infrastructure for RDM at Lincoln. This is subject to approval by the Steering Group, but the reasons are compelling in many ways and I am confident that the Steering Group will accept this recommendation. More importantly, the Implementation Plan that was approved by the Steering Group and submitted to JISC remains unchanged.

The raw notes from our meeting are available here. Remember these are raw notes written throughout the day, primarily for our own record. They probably mean more to us than they do to you! Thanks to Paul Stainthorp for his fanatical note taking :-)

Here’s the list of attendees and our agenda:

Present

Peter Murray-Rust (OKFN)
Mark Wainwright (OKFN)
Ross Jones (OKFN)
Joss Winn (University of Lincoln, CERD)
Nick Jackson (University of Lincoln, CERD)
Harry Newton (University of Lincoln, CERD)
Jamie Mahoney (University of Lincoln, CERD)
Alex Bilbie (University of Lincoln, ICT services)
Paul Stainthorp (University of Lincoln, Library)

Agenda

09.30 Introductions
10.00 Orbital introduction and context: Student as Producer, LNCD; Orbital bid and pilot project; Discussion of Orbital approach, the data we’re using, user needs etc.
10.30 CKAN introduction and context
11.00 Technical discussion – Orbital
12.00 LUNCH
12.30 Technical discussion – CKAN
13.30 Discussion – should Orbital adopt CKAN?
14.00 data[.lincoln].ac.uk
15.00 Next steps; Opportunities for collaboration/funding?

What is probably of most interest to people reading this are the pros & cons of the Orbital project adopting CKAN. I’ll provide more context further into the post, but here’s a summary copied from our notes:

(more…)

Our aim is to release a new version of Orbital every month until the end of the year. Yesterday, we released version 0.3, which, as well as many small improvements and bug fixes, improves the handling of dynamic datasets and begins work on integrating ownCloud with Orbital. Here’s the changelog.

  • Improvements to project activity timelines:
    • Public/private modes
    • Calendar events
  • Improvements to filetype handling and file uploads
  • Improvements to file management, collections and private/public modes
  • Dynamic datasets:
    • A working query builder
    • Queries can be saved and re-run against data
    • CSV output of data for use by external tools e.g. Matlab
  • Working Datasets:
    • Preparation for ownCloud integration (integration with Lincoln SSO, evaluation of product, contact with developers)
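
To give a feel for the dynamic dataset features in the changelog above (saved queries that can be re-run, and CSV output for external tools), here is a minimal sketch in Python. It is not Orbital’s implementation: the `SavedQuery` class, the `to_csv` helper and the sample sensor fields are all assumptions made for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


@dataclass
class SavedQuery:
    """A named filter plus a list of fields to keep (hypothetical design)."""
    name: str
    fields: List[str]
    predicate: Callable[[Row], bool]

    def run(self, rows: Iterable[Row]) -> List[Row]:
        """Re-run the saved query against whatever data is passed in."""
        return [
            {f: row.get(f) for f in self.fields}
            for row in rows
            if self.predicate(row)
        ]


def to_csv(rows: Iterable[Row], fields: List[str]) -> str:
    """Serialise rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample data standing in for imported sensor readings.
readings = [
    {"sensor": "t1", "temp": 410.5, "ts": 1},
    {"sensor": "t2", "temp": 395.0, "ts": 2},
]
hot = SavedQuery("hot", ["sensor", "temp"], lambda r: r["temp"] > 400)
csv_text = to_csv(hot.run(readings), hot.fields)
```

Because the query is stored separately from the data, the same `hot` query can be run again as new readings arrive, and the resulting CSV text can be written to a file for a tool such as Matlab to read.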

The plan for version 0.4 is full ownCloud integration with Orbital via the respective APIs, which will provide the first part of the overall Orbital workflow: ‘Working Data’ -> ‘Dynamic Data’ -> Archive Files. During two weeks in August we’ll also be setting up our own private in-house cloud using OpenStack and moving Orbital in-house from Rackspace.

Agenda & notes

  1. Application update: Orbital v0.3 will be released today. New dynamic dataset features. Import/export/query data. Already auto-importing Siemens’ sensor data for analysis. Integration with MatLab will be presented to the Siemens/Lincoln research group on August 1st. We have set up ownCloud as an alternative to Dropbox and integrated it with Lincoln SSO. Working on full integration with Orbital for Orbital v0.4 release next month. Nick, Harry and other members of LNCD will take third/fourth weeks in August to set up and build OpenStack private cloud for R&D/Academic Computing. Following that, Orbital servers on Rackspace will be moved in-house to OpenStack. Initially starting with three servers (26 cores, 140GB RAM) and 30TB storage. We intend to provide a Continuous Integration (CI) environment (Gitorious/Jenkins) for staff and student research, as well as to support LNCD R&D.
  2. Edinburgh/OR12: Presented a paper at OR12, which was well received. Came away thinking that RDM has ‘arrived’ and that platforms for RDM need to be more integrated into the research process than research output repositories have been. Orbital is well placed for this e.g. ‘dynamic datasets’. Discussed systems with OKFN/CKAN representative at OR12 and have invited OKFN to Lincoln to discuss CKAN and Orbital. Initial Skype meeting Monday 30th July. We have asked OKFN to challenge/persuade us as to why we should adopt CKAN rather than continue to develop Orbital. Need to think about the pros/cons of eight more months of Orbital development vs. eight months contributing to CKAN development. May lead to Lincoln extending CKAN for academic RDM, re-using bits of Orbital. Still not clear. Nick has also contacted Patrick at Soton about his winning proposal for the DevCSI developer challenge. Patrick’s proposal is very similar to Orbital and both Nick and Patrick are keen to work on dynamic data and visualisations together with others.
  3. RDM Policy: Joss and Annalisa met with Andrew S, Head of Research & Enterprise Office. Reviewed draft RDM Policy. Will make minor amendments but on the whole agreement on way forward.
  4. Training: Joss, Annalisa, Melanie, Bev, Paul to meet early September and bring together RDM training materials produced by other projects. Evaluate, synthesise, extend and re-package for Lincoln. Agreed to arrange RDM training workshops via HR for staff every three weeks from late September to test and inform the development of this project deliverable. Annalisa/Melanie to arrange meeting and workshops.
  5. Metadata: Discussion about adopting BL’s minimum metadata requirements for DataCite. Agreement that the mandatory and optional attributes should be part of Orbital. Need to talk to/confirm with Bev about this and add relevant tasks to Pivotal Tracker. CKAN (see above) meeting relevant to this. If CKAN is used to publish datasets, need to ensure it meets this requirement.
  6. Research Information/Systems integration: Research Information Management (RIM) at the university is dependent on three systems: new Awards Management System (AMS) (Research and Enterprise Office), Orbital, and EPrints (Library). Need to contact Worktribe about API access to AMS for Orbital. EPrints work planned to add the ‘REF plugin’ will give us better data. SWORD2 deposit from Orbital planned. Again, need to consider in light of CKAN conversations. Other work going on in university to build business intelligence dashboards. Lee to arrange meeting with Registry DBAs and Orbital team to discuss data warehousing and dashboards. Lots of overlap in interest/experience/skills. Not enough talking.
  7. Business Case: Joss still waiting on storage modelling costs from ICT to present to SMT in September. Will start process of writing Business Case following that meeting.
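
On the metadata item in the notes above (item 5), a check for minimum metadata could be sketched roughly as follows. This assumes the five mandatory properties of the DataCite metadata schema as we understand it (identifier, creator, title, publisher, publication year); the BL’s own requirements should be checked before relying on this list, and the dictionary keys and function name are hypothetical.

```python
from typing import Any, Dict, List

# Assumed mandatory DataCite properties; key names are our own invention.
MANDATORY_FIELDS = (
    "identifier",
    "creator",
    "title",
    "publisher",
    "publication_year",
)


def _is_blank(value: Any) -> bool:
    """Treat None, empty/whitespace strings and empty lists as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple)):
        return len(value) == 0
    return False


def missing_mandatory(record: Dict[str, Any]) -> List[str]:
    """Return the mandatory fields that a dataset record does not fill in."""
    return [name for name in MANDATORY_FIELDS if _is_blank(record.get(name))]
```

A dataset would only be passed on for publication (or for a DOI) when `missing_mandatory(record)` returns an empty list; the same check could sit in front of whichever system ends up publishing the data.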