Orbital Team Meeting 22nd November

Present

Nick Jackson
Melanie Bullock
Joss Winn
Bev Jones
Paul Stainthorp
Harry Newton

Apologies

Annalisa Jones

Agenda

Policy & Business Case
Training
Technical
Dissemination
Budget

Policy & Business Case

Ieuan Owen has taken over senior management of research at the University.

JW met with Lisa Mooney and IO. Agreed policy and business case should be presented as “research services”, and include case studies where these have benefitted researchers.

JW to present business case to SMT on December 17.

Policy should be presented more as collegial support than a university mandate. Policy to be re-worded to this effect.

Presentation to SMT – focus on long tail, benefits to institution, risk and benefits, “publishing of research data helped massively” in gaining new research grant. Roles required for RDM and institutional positions.

RDM proposed to be through existing roles, except for Data Scientist. LM suggested research funding for study into how role will fit into Lincoln.

Action: JW to circulate draft documents for business case and SMT presentation to Orbital team.

Development of policy and business case are on schedule.

Training

PS has developed an outline of a 1 hour training workshop covering RDM. Action: PS to blog this outline.

No success in getting PhD students from engineering to partake in initial sessions, will talk instead to Graduate School.

Training is not technical or software/process specific, instead focussing on high level concepts and best practice for RDM. Target is for training to be prepared by Christmas, even if sessions are not running by that time.

Existing RDM blog is to be used as authoring environment for policy and training documents and syndicated to permanent RDM site. Action: PS to talk to NJ and HN about ingestion of this content to Bridge.

When new Library Repository Officer is appointed, it was agreed they should become part of Orbital team, and involved in RDM.

Technical

Technical development continuing broadly to plan, with exception of OpenStack which has suffered various setbacks.

JW to include conceptual overview of Orbital Bridge in presentation to SMT.

A meeting with ICT to discuss Awards Management System (AMS) integration was planned for last week but was cancelled due to illness; it has been rescheduled. AMS integration is highly desirable, but not essential.

NJ/PS/BJ/HN spent time on mapping concept of a ‘dataset’ within Orbital Bridge to ePrints using the SWORD deposit method.

Most of SWORD mapping is also valid for DataCite specification, which also informs sanity checking of data within Orbital Bridge.
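The sanity checking noted above could look roughly like the sketch below. It assumes a dataset record is held as a plain dictionary and checks it against the five properties that the DataCite schema of the time treats as mandatory (identifier, creator, title, publisher, publication year). The field names and the `missing_datacite_fields` helper are illustrative assumptions, not Orbital Bridge code.

```python
# Hypothetical field names; the real Orbital Bridge schema is not given in these notes.
DATACITE_MANDATORY = ("identifier", "creators", "title", "publisher", "publication_year")


def missing_datacite_fields(dataset):
    """Return the mandatory fields that are absent or empty in a dataset record."""
    return [field for field in DATACITE_MANDATORY if not dataset.get(field)]


# Example record with no identifier yet (invented values for illustration).
example = {
    "title": "Example dataset",
    "creators": ["A. Researcher"],
    "publisher": "University of Lincoln",
    "publication_year": 2012,
}

print(missing_datacite_fields(example))
```

A record that returns an empty list would be ready to map onto either the SWORD deposit or a DataCite record.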

Cost for membership of a DOI service to go into business case to SMT. DOIs should be minted at point of deposit to ePrints (‘publication’), and not at point of original dataset creation.
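As a rough illustration of minting at deposit rather than at creation, the sketch below keeps `doi` empty until the dataset is deposited. The `Dataset` class, the `fake_mint` function and the `10.1234` prefix are all placeholders invented for this example; a real DOI service would be called in place of `fake_mint`.

```python
class Dataset:
    def __init__(self, title):
        self.title = title
        self.doi = None  # no DOI while the dataset is still being worked on

    def deposit(self, mint_doi):
        """Treat the dataset as published and mint its DOI, but only once."""
        if self.doi is None:
            self.doi = mint_doi(self)
        return self.doi


def fake_mint(dataset):
    # Stand-in for a call to a DOI service; the prefix is a placeholder.
    return "10.1234/" + dataset.title.lower().replace(" ", "-")


draft = Dataset("Wind Tunnel Runs")
print(draft.doi)                 # still None before deposit
print(draft.deposit(fake_mint))  # DOI assigned at the point of deposit
```

Depositing a second time returns the same DOI, so a published dataset keeps a stable identifier.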

AMS project IDs are key for collecting items in a project together. AMS could also handle unfunded projects, but this will require extension of the system and is outside its current scope. Orbital avoids reliance on AMS by using the Nucleus data store as the source of primary keys and project IDs.

Dedicated time should be set aside for OpenStack work.

Dissemination

JW and PS went to JISC project meeting in Nottingham. JW presented on adoption of CKAN; as a result, Bristol have adopted CKAN. JISC programme manager is encouraged by seeing us using CKAN.

JW went to DCC forum in Cambridge.

Management of active data is now a high priority in the RDM field.

JW submitted abstract for conference in Cologne on providing critical evaluation of CKAN for academic use. The resultant paper, if accepted, could inform JISC on the use of CKAN in academia.

A member of the Orbital team should attend the DCC conference in Amsterdam, which includes specific themes on “what is a data scientist”. This will help inform a new role at Lincoln.

Budget

Funds remain for hardware and dissemination. It was suggested that some of this might be spent on developing a more permanent hosting solution for the Nucleus data warehousing platform.

Money is also required for a dedicated CKAN server and Orbital Bridge server, as well as possibly a dedicated database server for CKAN’s DataStore.

It is necessary to integrate Orbital with research impact analysis and recording systems. Action: MB to send NJ/HN information on impact recording systems.

A job description is being written for the post of “Research Information Management Developer” in the Library.

Presentations from the JISC MRD Programme Progress Meeting

Below are two short presentations I gave at the JISC programme meeting today. Both concern different aspects and advantages of using CKAN to manage research data. They simply link through to blog posts that have been written here which offer more detailed information. During the presentations, I gave demonstrations of using CKAN in practice.

Orbital at the Open Knowledge Festival #okfest

Harry and I attended the Open Knowledge Festival in Helsinki last week. Harry attended the CKAN sessions, while I was invited to be on a panel discussing ‘Immediate Access to Raw Data from Experiments’, which was part of the Open Research and Education stream of events. None of the panel members gave presentations as such, but you can read my notes and the session was recorded, too. Here’s all 46 minutes of it for your viewing pleasure.

The festival/conference was probably the best conference I’ve ever been to. It was completely sold out with 800 delegates and about 1000 participants in total. It was very international with many participants from outside the EU. It seemed like a genuine effort had been made to ensure that people from Africa, Asia and South America could attend, with some bursaries available. The conference programme, over five days, was largely crowdsourced in the run up to the event, and this made the programme very diverse, reflecting the diversity of interests people have in ‘openness’. It was also reassuring to find that despite the huge enthusiasm for openness in many aspects of public and civil society, people are also keenly aware of the challenges and issues that this raises, too, and ultimately the political ramifications of this endeavour.

The conference also seemed very well funded/sponsored, with support from the Finnish government, among many partners. The event was held at the fantastic Arabia Campus of the Aalto University, School of Art, Design and Architecture. When I visited Helsinki in 2008 for a conference about the design of learning spaces, delegates were bused up to the Arabia campus simply to see what a great place it is!

As well as participating in the above panel, I also got involved in the drafting of the ‘Open Research Data Handbook’, which is a collaborative exercise in writing a handbook aimed at researchers who work with data. It’s my intention that the Orbital project commits some time to this and ultimately produces a Handbook useful for all researchers and possibly a variant for Lincoln researchers, too. I ensured that the authors of the Handbook are all aware of the DCC’s work as well as the various JISC-funded projects to produce training and guidance for researchers and I suspect that the Handbook will largely be a synthesis of sources which are already available.

Finally, I learned about the Panton Fellowships that the Open Knowledge Foundation have awarded this year, and both Fellows presented on their work. I think this is an excellent initiative from the OKFN to create a strong and direct tie with academia and support further research and action in our community. You can see both presentations from the Panton Fellows here and here.

Choosing CKAN for research data management

The switch to CKAN was an important decision for the Orbital project and I’d like to think that it will help raise the profile of CKAN within the academic community. We’d been keeping an eye on CKAN development from earlier on in the year, but it was the opportunity to talk to Mark Wainwright, OKFN Community Co-ordinator, at the Open Repositories 2012 conference that prompted us to really look at the potential of using CKAN as part of Lincoln’s Research Data Management infrastructure. Mark’s OR2012 poster (PDF) provides a nice overview of what CKAN currently offers.

Before I go into more detail about why we think CKAN is suitable for academia, here are some of the feature highlights that we like:

  • Data entry via web UI, APIs or spreadsheet import
  • versioned metadata
  • configurable user roles and permissions
  • data previewing/visualisation
  • user extensible metadata fields
  • a license picker
  • quality assurance indicator
  • organisations, tags, collections, groups
  • unique IDs and cool URIs
  • comprehensive search features
  • geospatial features
  • social: comments, feeds, notifications, sharing, following, activity streams
  • data visualisation (tables, graphs, maps, images)
  • datastore (‘dynamic data’) + file store + catalogue
  • extensible through over 60 extensions and a rich API for all core features
  • can harvest metadata and is harvestable, too
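To give a flavour of the API mentioned in the list above, here is a minimal sketch that builds a search request for CKAN’s `package_search` action. It only constructs the URL and makes no network call. The host name is hypothetical, and the `/api/3/action/` path is the form used by CKAN’s versioned Action API, so an older instance may expose a different path.

```python
from urllib.parse import urlencode


def package_search_url(base_url, query, rows=10):
    """Build a URL for CKAN's package_search action (no request is sent)."""
    params = urlencode({"q": query, "rows": rows})
    return base_url.rstrip("/") + "/api/3/action/package_search?" + params


# ckan.example.org is a placeholder host, not a real instance.
print(package_search_url("https://ckan.example.org/", "wind tunnel"))
# prints https://ckan.example.org/api/3/action/package_search?q=wind+tunnel&rows=10
```

Fetching that URL from a running instance would return a JSON document describing the matching datasets.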

You can take a tour or demo CKAN to get a better idea of its current features. The demo site is running the new/next UI design, too, which looks great.

CKAN’s impact

In its five years of development, CKAN has achieved significant impact across the world. Despite web-scale open data publishing being a relatively recent initiative, CKAN, through the efforts of OKFN, is the de facto standard for the publishing of open data, with more than 40 instances running around the world. How do the UK, Dutch, Norwegian and Brazilian governments make their data publicly accessible? The European Commission? They use CKAN.

On the flip side, CKAN has attracted significant interest from developers with 53 code contributors over 5 years and 60+ extensions.

Major CKAN changes since Orbital project began

When we first bid for the JISC MRD programme funding, CKAN was a less attractive offering to us. Our bid focused on an approach we’ve taken on a number of projects, using MongoDB as a datastore over which we built an application that adds/edits/reads data via a set of APIs we would write. Our bid also focused on security and the confidentiality of commercial engineering data. Since starting the Orbital project these concerns have been addressed or are being addressed by CKAN and the requested features we’ve identified through our engagement with researchers have also been integrated into CKAN, such as activity streams and data visualisation. Reading through the CKAN changelog shows just how much work is going into CKAN and with each release it’s developing into a better tool for RDM. Here are some of the headline features, in order of priority, that have turned our attention to CKAN over the course of the Orbital project.

CKAN in an academic environment

We’ve discussed the idea of a Minimum Viable Product for RDM, and consider it to be authentication, data storage, hosting/publishing, licensing, a persistent URI and analytics. These features alone allow an academic to reliably and permanently publish data to support their research findings and help measure its impact. CKAN meets these requirements ‘out of the box’. Other requirements of a tool for managing research data include the following (you can add more in the comment box – these are based on our own discussions with researchers and a quick scan of other JISC MRD projects)

  • Integration with the institutional research environment (e.g. hooks into CRIS system, Institutional Repository, DMPOnline, networked storage)
  • Capturing the research process/context/activity; notation, not just data
  • Controlled access to non-Lincoln staff e.g. research partners
  • Good, comprehensive search tools
  • Version control for data and metadata
  • Customisable, extensible metadata
  • Adherence to data standards e.g. RDF
  • Multi-level access policies
  • Secure, backed up, scalable file storage for anywhere access to files and file sharing (e.g. Dropbox)
  • Command-line tools and good web UI for deposit/update of data
  • Permanent URIs for citation e.g. DOIs
  • Import/export of common data formats
  • Linking datasets (by project, type, research output, person, etc.)
  • Rights/license management
  • Commercial support/widely used, popular platform (‘community’)

RDM features that are currently lacking in CKAN

During our meeting with OKFN staff last month we identified several areas that need addressing for CKAN to meet our wider requirements for RDM. These are:

  • Security: CKAN is not lacking in security measures, but we need to look at CKAN’s security model more closely (roles, permissions, access, authentication) and also tie it into the university’s Single Sign On environment
  • ‘Projects’ concept: We think that the new ‘organisations’ feature might work conceptually in the same way as this.
  • Academic terminology + documentation for academic use: We need to review CKAN and write documentation for an academic use case as well as provide a modified language file that ‘translates’ certain terminology into that more appropriate for the academic context.
  • Batch edit/upload controls. Certain batch functions are available on the command line, but out of the box, there’s no way to upload and batch edit multiple files, for example.
  • ownCloud integration: CKAN doesn’t provide the network drive storage that researchers (actually pretty much everyone) rely on to organise their files. Increasingly people are using Dropbox because of the synchronisation and sharing features. These are important to researchers, too, and moving data from such a drive to CKAN will be key to researchers adopting it.
  • EPrints integration (SWORD2): A way to create a record of CKAN data in EPrints, thereby joining research outputs with research data.

It’s these features that we’ll be concentrating on in our development on the Orbital project.
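As a very rough idea of what the EPrints integration might involve, SWORD2 deposits can carry metadata as an Atom entry with Dublin Core terms. The sketch below builds such an entry for a CKAN dataset using only the Python standard library. The choice of fields, the use of `dcterms:relation` to point back at the CKAN record, and the example values are all assumptions for illustration; which fields EPrints would actually accept is not something we have settled here.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
DCTERMS = "http://purl.org/dc/terms/"

# Use readable prefixes when the entry is serialised.
ET.register_namespace("atom", ATOM)
ET.register_namespace("dcterms", DCTERMS)


def build_entry(title, creator, dataset_url):
    """Return an Atom entry (as a string) describing a CKAN dataset."""
    entry = ET.Element("{%s}entry" % ATOM)
    ET.SubElement(entry, "{%s}title" % ATOM).text = title
    ET.SubElement(entry, "{%s}creator" % DCTERMS).text = creator
    # Assumption: link back to the CKAN dataset page via dcterms:relation.
    ET.SubElement(entry, "{%s}relation" % DCTERMS).text = dataset_url
    return ET.tostring(entry, encoding="unicode")


# Placeholder values; ckan.example.org is not a real instance.
print(build_entry("Example dataset", "A. Researcher",
                  "https://ckan.example.org/dataset/example"))
```

The resulting XML would then be sent to the repository’s SWORD2 collection URL, which is the part this sketch leaves out.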

Harry and I are attending the Open Knowledge Festival in Helsinki later this month and will talk more about our choice of CKAN for research data. I’d be interested to hear from anyone working in a university who has looked at CKAN in detail and decided against using it for RDM. It seems odd to me that it has such a low profile in academia (or maybe I’m just clueless??) and I do think that the time has come to embrace CKAN and acknowledge the efforts of OKFN more widely. I know there are people like Peter Murray Rust and Mark MacGillivray, who are actively trying to do this and OKFN’s presence at Dev8D and OR2012 this year demonstrates its eagerness to work more closely with the university sector. Perhaps we’re near a tipping point?