On Wednesday, we hosted three people from the Open Knowledge Foundation to discuss the Orbital project and their software, CKAN. It was a very engaging and productive day spent with Peter Murray-Rust (on the Advisory Board of OKFN), Mark Wainwright (community co-ordinator) and Ross Jones (core developer). At the start of the day we asked them to challenge us on our technical work so far, and I described the day as an opportunity to evaluate our development of the Orbital software. We didn’t touch on the other aspects of the Orbital project, such as policy development and training for researchers.
To cut to the chase, the Orbital project will be adopting CKAN as the primary platform for further development of the technical infrastructure for RDM at Lincoln. This is subject to approval by the Steering Group, but the reasons are compelling in many ways and I am confident that the Steering Group will accept this recommendation. More importantly, the Implementation Plan that was approved by the Steering Group and submitted to JISC remains unchanged.
The raw notes from our meeting are available here. Remember these are raw notes written throughout the day, primarily for our own record. They probably mean more to us than they do to you! Thanks to Paul Stainthorp for his fanatical note taking 🙂
Here’s the list of attendees and our agenda:
Peter Murray-Rust (OKFN)
Mark Wainwright (OKFN)
Ross Jones (OKFN)
Joss Winn (University of Lincoln, CERD)
Nick Jackson (University of Lincoln, CERD)
Harry Newton (University of Lincoln, CERD)
Jamie Mahoney (University of Lincoln, CERD)
Alex Bilbie (University of Lincoln, ICT services)
Paul Stainthorp (University of Lincoln, Library)
10.00 Orbital introduction and context: Student as Producer, LNCD; Orbital bid and pilot project; Discussion of Orbital approach, the data we’re using, user needs etc.
10.30 CKAN introduction and context
11.00 Technical discussion – Orbital
12.30 Technical discussion – CKAN
13.30 Discussion – should Orbital adopt CKAN?
15.00 Next steps; Opportunities for collaboration/funding?
What is probably of most interest to people reading this are the pros & cons of the Orbital project adopting CKAN. I’ll provide more context further into the post, but here’s a summary copied from our notes:
Discussion – should Orbital adopt CKAN?
Should #orbitalMRD cease development of Orbital software and switch to CKAN?
Pros:
- Should be able to get CKAN up and running v. quickly
- University won’t care (esp. if we can re-skin under Common Web Design)
- More sustainable – less reliance on local developers on fixed term contracts
- Not throwing away what we’ve already learnt (about users, etc.)
- LNCD devs can contribute to the CKAN greater good
- CKAN can benefit other things (data.lincoln.ac.uk) not just Orbital
- More community support. Widely used internationally by public sector organisations.
- Support packages from OKFN
- Opportunities for collaboration & joint bids for funding w/ OKFN
- Not re-inventing the wheel.
- Nice visualisation plugins ready to go.
- JISC would be “excited by a clear, rational change in direction mid project”
- JISC would appreciate a LNCD/OKFN collaboration?
- Siemens probably don’t mind which technology is used.
- We should take the opportunity to become conversant in as many new languages as possible.
- We can contribute code to CKAN instances all over the world inc. data.gov.uk – high profile exposure for LNCD
- Exposure to Linked Data community?
- CKAN interested in MongoDB, SWORD2 extensions – we can contribute
- RCUK response – will be interest in a system which can deal with the ‘long tail’ of scientific data (lots of .csv) – opportunities to build a genuinely open, community-run platform.
- Likely more uptake from other universities – more likely to adopt CKAN-Orbital than Orbital-Orbital.
- Save time to spend on high-value project activities, such as Kumo/OpenStack.
Cons:
- Danger of mismatch between our approaches on Orbital and data.lincoln.ac.uk
- Courageous step mid-project to switch allegiance
- Less flexibility/control? CKAN might be OK for Orbital but might we want to stick with our own approach for other projects?
- Lose out on some of the higher-level VRE functionality (although this can be reproduced if necessary)
- New learning curve (learning Python, CKAN APIs etc)
- Will have to get used to working in a new language (Python) – not our usual
- Mismatch of cultures between academia/university and CKAN?
- Negative feedback from JISC for being too risky (????????? unlikely)
- Fewer opportunities for commercialisation for unilincoln?
- Are there the same opportunities for RDM system-as-data-processing-platform with CKAN as there are with MongoDB? – Probably
- Does CKAN offer the same level of sophistication around security/authentication? – v2 may
- Waste time on newbie errors and re-tooling.
- CKAN documentation not suited to academic community.
As Project Manager, my primary concern is meeting the intended deliverables by the end of the project and ensuring that those deliverables can be sustained. The nine months of technical development on Orbital have produced good, working software, but most importantly, we have learned a lot about RDM, the requirements of our researchers, and how that translates into technological requirements. However, having kept an eye on CKAN development over that period, too, we recognised that we are very much trailing behind what CKAN offers as a drop-in replacement for Orbital. CKAN is a five-year-old project and has many more people contributing to it.
Prior to the meeting this week, we had more-or-less agreed among the team that CKAN could provide the ‘archive/publication’ functionality we were developing in Orbital. We were less confident that CKAN could match Orbital’s ‘dynamic datasets’ functionality – essentially a datastore with APIs for data analysis. However, we were impressed by what is already in place for this in CKAN and were told that the short-term roadmap for CKAN will see a switch from using Elastic Search to PostgreSQL in schemaless mode. We were also reassured and encouraged to pursue the use of MongoDB with CKAN, as that is our preferred NoSQL database technology. Such an extension to CKAN is likely to be widely used by other people.
What CKAN doesn’t provide is the first part of the Orbital workflow, which is networked desktop storage: an ‘academic dropbox’ that is integrated with the dynamic database and archive applications. We remain convinced that ownCloud offers the best technology for achieving this and will work on integrating ownCloud with CKAN under the name of Orbital.
So, while the implementation plan remains the same, the technology is changing as follows:
Workspaces: ownCloud -> Dynamic Datasets: CKAN -> Archive: CKAN. These will all be integrated by and branded as ‘Orbital’.
Integration will happen through bespoke extensions for CKAN as well as appropriate Orbital-specific applications for passing data and users between systems. Our initial ToDo list now looks like this:
- ownCloud and CKAN integration
- MongoDB/CKAN extension
- Lincoln Authentication/CKAN
- SWORD2/CKAN extension
- Awards Management System/CKAN integration
- A CKAN language file for academic use.
We’ll be breaking down this work and moving it into our issue tracker soon.
Initially, Orbital will be developed and piloted on our forthcoming OpenStack cloud, which we’re currently working on. The LNCD group, who are working on the Orbital project, aim to demonstrate to the university a complete RDM system, managed on our own cloud platform, offering researchers a new network storage solution and suite of research tools, including CKAN, which can be used throughout the research process, concluding with the deposit of research data for publication. The LNCD group is increasingly being seen as the innovation group which points the way to potential changes in the role and use of technology at Lincoln and Orbital is our showcase project.
As an evaluation of our work mid-way through the Orbital project, I was pleased with how the day went and the feedback we got from OKFN. They were impressed with Lincoln, especially the Student as Producer initiative, which LNCD, our R&D group, has evolved out of. They were also impressed with the way we are approaching RDM, which is to ensure that the technical infrastructure is embedded in the research process itself, rather than being simply a deposit and publication tool. In the wider context of our work on building a cloud computing platform, tools and services for researchers, and data.lincoln.ac.uk, they commended our work and what we have achieved so far. They pointed out the importance of developing it further and ensuring that our work and our approach is recognised and valued internally as it is increasingly externally.
Discussion of Pros/Cons
For clarity, I now want to discuss some of the pros/cons above.
Some of the points highlight that neither the University of Lincoln nor its researchers are, at this stage, committed to any single technological solution for RDM. We are committed to an approach to RDM, but not to a single technology to achieve that. A switch to CKAN at this stage would be largely transparent and would have very little impact on the two research groups we are working with in the School of Engineering. What the institution will care about is the long-term implications for support and sustainability of our RDM infrastructure. CKAN is a widely used (the most widely used?) technology for ‘managing’ datasets that are intended to be published to the web. It has been adopted by high-level public sector projects across the world, most significantly, http://data.gov.uk. OKFN also offer consultancy and on-going support contracts which we could use, as we have with EPrints Services at Southampton. As technical staff come and go at Lincoln, the RDM platform should remain largely unaffected.
For us, as an R&D team, it is very exciting to be able to contribute to a large and growing open source project aimed at managing and publishing data on the web. We are learning a lot about academic research data management throughout the Orbital project and can bring this to the CKAN/OKFN community as well as learn much from it, too. Being five years old, the CKAN project provides a lot of the functionality we are beginning to develop in Orbital and with the same effort we think we can achieve much more. We also think we will achieve more impact overall by bringing together Orbital and CKAN than by simply going it alone.
We also discussed how it would look for the Orbital project to switch technologies at this stage and we all agreed (rightly or wrongly!) that JISC, our funders, would probably see it as a positive and courageous move, further opening up the potential joint work between the open data community and the research data management community. As I said above, I am confident that our project Steering Group will back this decision, too.
On the down side, the main risk of moving to CKAN is that Harry and Nick will have to start working in Python, a language in which they have limited programming experience. This was a matter of some discussion, but both felt they could do it and would be happy to do it. Ross, a more experienced CKAN developer, encouraged them not to get tied down to any one particular programming language. All agreed that this was not a show-stopper. LNCD are currently implementing OpenStack, which Orbital will initially run on, and it is also written in Python, so becoming comfortable with this language is a necessity anyway.
We initially had concerns about the flexibility of CKAN and whether it could meet our requirements, not just as a data deposit/publication tool, but also as a tool to support the research process. We were assured that most of the additional functionality we require could be developed as CKAN extensions, and other code could run as intermediary services that talk to CKAN and other systems through its API. We were also told that CKAN exposes most (if not all) of its core functionality through APIs. We like APIs.
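To give a flavour of what an ‘intermediary service’ talking to CKAN might look like, here is a minimal sketch in Python. It assumes CKAN's Action API is served under `/api/3/action/<action>` and returns a JSON envelope with `success` and `result` fields (the exact path and authentication header may differ between CKAN versions, so check the documentation for your install). The host name `ckan.example.org` and the helper function names are hypothetical and not part of Orbital or CKAN.

```python
import json
import urllib.request


def build_action_request(base_url, action, params, api_key=None):
    """Build (but do not send) a POST request for a CKAN Action API call.

    Assumes the API lives under /api/3/action/ on the given base URL.
    """
    url = f"{base_url.rstrip('/')}/api/3/action/{action}"
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: the API key is passed in the Authorization header.
        headers["Authorization"] = api_key
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def unwrap_result(raw_body):
    """Return the 'result' field of an Action API response, or raise on failure."""
    envelope = json.loads(raw_body)
    if not envelope.get("success"):
        raise RuntimeError(f"CKAN action failed: {envelope.get('error')}")
    return envelope["result"]


def call_action(base_url, action, params, api_key=None):
    """Send the request and unwrap the response (needs a reachable CKAN instance)."""
    request = build_action_request(base_url, action, params, api_key)
    with urllib.request.urlopen(request) as response:
        return unwrap_result(response.read().decode("utf-8"))
```

With something like this in place, a call such as `call_action("http://ckan.example.org", "package_show", {"id": "some-dataset"})` would fetch a dataset's metadata. Keeping the request-building and response-unwrapping separate from the network call makes the glue code easy to test without a running CKAN.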
CKAN does not have a large uptake within the academic community – yet. Its documentation and user interface do not use the language that we might use either. For example, CKAN doesn’t understand what a ‘project’ is, whereas projects are at the heart of Orbital. However, CKAN has something similar, called ‘groups’, that could be reconceived as projects with project members, and the roadmap for v2 includes the introduction of ‘organisations’ in CKAN, which will allow more project-like functionality. CKAN has been widely translated, too, so we will look at creating a new language file for CKAN for academia.
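As a rough illustration of reconceiving a ‘project’ as a CKAN ‘group’, the sketch below maps a project record onto the kind of payload CKAN's `group_create` action accepts. The shape of the project dictionary (`title`, `summary`, `members`) is purely hypothetical and not Orbital's actual data model, and the `users`/`capacity` keys should be checked against the CKAN version in use. CKAN names are expected to be lowercase and URL-friendly, hence the small slug helper.

```python
import re


def slugify(title):
    """Turn a project title into a lowercase name using only a-z, 0-9, '-' and '_'."""
    slug = re.sub(r"[^a-z0-9_]+", "-", title.lower()).strip("-")
    # Keep names to a sensible length and avoid a trailing hyphen after truncation.
    return slug[:100].strip("-")


def project_to_group(project):
    """Map a (hypothetical) Orbital project record onto a CKAN group payload."""
    return {
        "name": slugify(project["title"]),
        "title": project["title"],
        "description": project.get("summary", ""),
        # Every project member becomes an ordinary group member.
        "users": [
            {"name": username, "capacity": "member"}
            for username in project.get("members", [])
        ],
    }
```

The resulting dictionary could then be posted to the `group_create` action. Once ‘organisations’ arrive in v2, the same mapping idea might apply there instead.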
Security is a big issue for Orbital. As a pilot project with the School of Engineering, the management of commercially sensitive data has been a concern from the start. Nick spent the first few weeks of the project designing and writing security into Orbital Core. CKAN has not been built with security as a priority, but the next major version of CKAN introduces new levels of security alongside the new ‘organisations’ functionality, so we were reassured that granular control over security is being improved and that work will be in place by the end of our project. Perhaps we can contribute to that, too.
We have had a local install of CKAN for a few weeks now and will begin to work on it alongside the Kumo cloud project. We’ve forked the CKAN code and joined the IRC and mailing lists. We’re talking to OKFN about some initial consultancy to help get us up to speed with CKAN development. There is nothing too daunting about this process. We’ll be attending the Open Knowledge conference in Helsinki next month and hope to talk about CKAN as a tool for academic use. Nick and Harry will probably attend the PyCon conference later in September, too. Immersing ourselves in this new community is as essential as digging around in the CKAN code.
On a more personal note, I want to thank everyone at the meeting on Wednesday, especially Harry and Nick, the developers working full-time on Orbital. Together we recognised that our effort over the remaining 8-9 months of the Orbital project would be best directed towards extending CKAN rather than continuing to develop our own software from scratch. They showed the courage to switch direction in the middle of the project, rather than cling on to what they had developed so far. That is not to say that all the code developed for Orbital so far will be abandoned. Some of it can be re-used and some of it ported to CKAN. Most importantly, the code is just an expression of our ideas, and CKAN will now be the vessel for realising those ideas and ambitions.