Metadata – Orbital

Agenda & notes

Application update: Orbital v0.3 will be released today. New dynamic dataset features. Import/export/query data. Already auto importing Siemens’ sensor data for analysis. Integration with MATLAB will be presented to the Siemens/Lincoln research group on August 1st. We have set up ownCloud as an alternative to Dropbox and integrated it with Lincoln SSO. Working on full integration with Orbital for the Orbital v0.4 release next month. Nick, Harry and other members of LNCD will take the third/fourth weeks in August to set up and build an OpenStack private cloud for R&D/Academic Computing. Following that, Orbital servers on Rackspace will be moved in-house to OpenStack. Initially starting with three servers (26 cores, 140GB RAM) and 30TB storage. We intend to provide a Continuous Integration (CI) environment (Gitorious/Jenkins) for staff and student research, as well as to support LNCD R&D.
Edinburgh/OR12: Presented a paper at OR12, which was well received. Came away thinking that RDM has ‘arrived’ and that platforms for RDM need to be more integrated into the research process than research output repositories have been. Orbital is well placed for this, e.g. with ‘dynamic datasets’. Discussed systems with OKFN/CKAN representative at OR12 and have invited OKFN to Lincoln to discuss CKAN and Orbital. Initial Skype meeting Monday 30th July. We have asked OKFN to challenge/persuade us as to why we should adopt CKAN rather than continue to develop Orbital. Need to think about the pros/cons of eight more months of Orbital development vs. eight months contributing to CKAN development. May lead to Lincoln extending CKAN for academic RDM, re-using bits of Orbital. Still not clear. Nick has also contacted Patrick at Soton about his winning proposal for the DevCSI developer challenge. Patrick’s proposal is very similar to Orbital and both Nick and Patrick are keen to work on dynamic data and visualisations together with others.
RDM Policy: Joss and Annalisa met with Andrew S, Head of Research & Enterprise Office. Reviewed draft RDM Policy. Will make minor amendments but on the whole agreement on way forward.
Training: Joss, Annalisa, Melanie, Bev, Paul to meet early September and bring together RDM training materials produced by other projects. Evaluate, synthesise, extend and re-package for Lincoln. Agreed to arrange RDM training workshops via HR for staff every three weeks from late September to test and inform the development of this project deliverable. Annalisa/Melanie to arrange meeting and workshops.
Metadata: Discussion about adopting BL’s minimum metadata requirements for DataCite. Agreement that the mandatory and optional attributes should be part of Orbital. Need to talk to/confirm with Bev about this and add relevant tasks to Pivotal Tracker. CKAN (see above) meeting relevant to this. If CKAN is used to publish datasets, need to ensure it meets this requirement.
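As a rough sketch of what building the mandatory attributes into Orbital could look like, the check below lists any mandatory fields missing from a dataset record. The five field names follow the mandatory properties of the DataCite Metadata Schema (Identifier, Creator, Title, Publisher, PublicationYear) as we understand them. The plain-dict record layout, the function name and the example values are assumptions made for illustration, not Orbital's actual data model.

```python
# Illustrative only: a plain-dict record is assumed, not Orbital's real model.

# Mandatory DataCite properties, written here as lower-case dict keys.
MANDATORY = ("identifier", "creator", "title", "publisher", "publication_year")


def missing_mandatory(record):
    """Return the mandatory fields that are absent or empty in `record`."""
    return [field for field in MANDATORY if not record.get(field)]


# Hypothetical dataset record with made-up values.
example = {
    "identifier": "10.1234/example",
    "creator": "Example, A.",
    "title": "Example sensor dataset",
    "publisher": "",  # present but empty, so it counts as missing
}

print(missing_mandatory(example))
```

A check like this could run before a dataset is published, whether publication happens through Orbital or through CKAN.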
Research Information/Systems integration: Research Information Management (RIM) at the university is dependent on three systems: new Awards Management System (AMS) (Research and Enterprise Office), Orbital, and EPrints (Library). Need to contact Worktribe about API access to AMS for Orbital. EPrints work planned to add the ‘REF plugin’ will give us better data. SWORD2 deposit from Orbital planned. Again, need to consider in light of CKAN conversations. Other work going on in university to build business intelligence dashboards. Lee to arrange meeting with Registry DBAs and Orbital team to discuss data warehousing and dashboards. Lots of overlap in interest/experience/skills. Not enough talking.
Business Case: Joss still waiting on storage modelling costs from ICT to present to SMT in September. Will start process of writing Business Case following that meeting.

I’ve been at the University of Warwick today, for a workshop organised by the Digital Curation Centre (DCC), entitled RDMF7: Incentivising Data Management & Sharing. There appeared to be a wide range of attendees: data curators & data scientists, ICT/database folk, actual researchers and academics, as well as at least one fellow library/repository rat.

Unfortunately I was only able to attend part of the event (which ran over two days). The following notes have been reconstructed from the Twitter stream (hashtag #RDMF7)!

The first speaker I heard was Ben Ryan of the funding council, the EPSRC. He talked about the “long-established” principles of responsible data management [links below]… this may be my own interpretation of Ben’s presentation, but I don’t think I was imagining undertones of “…so there’s really no excuse!”. He also covered individual and institutional motivations for taking care of data [much more about which later], policy and the enforcement of policy, dataset discoverability/metadata, funding (including the EPSRC’s expectation that institutions will make room in existing budgets to meet the costs of RDM), and embargo periods (inc. researchers’ entitlement to a period of “privileged use of the data they have collected, to enable them to publish” first – important to stress this in order to allay fears/get researchers on board?).

Some links:

Next up was Miggie Pickton, ‘queen bee’ of the University of Northampton’s repository (and self-described RDM “novice”, indeed!), talking about their participation in the multi-institution, JISC-funded KeepIt project, which aimed to design “not one repository but many that, viewed as a whole, represent all the content types that an institutional repository might present (research papers, science data, arts, teaching materials and theses).” This work led almost by chance to Northampton’s undertaking of a university-wide audit of its research data management processes using the DCC’s Data Asset Framework (DAF) methodology. This helped them to make the case for an institutional research data management working group and [eventually, and not without resistance] to establish a mandatory, central policy for RDM. (Show of hands at this point: how many other institutions have completed a DAF? I counted perhaps only three, Lincoln certainly not being amongst them. Q. Should the University of Lincoln complete a Data Asset Framework exercise as part of the Orbital project?)

After coffee, we heard a third presentation from Neil Beagrie of (management consultancy partnership) Charles Beagrie Ltd. Neil delivered a very comprehensive explanation of the KRDS (“Keeping Research Data Safe”) project, which has developed both an activity model and a benefits analysis toolkit for the management and preservation-of-access to ‘long-lived data’. I have to come clean here and admit that I was a little bewildered by the detail: much of it went through both ears without sticking to the brain on the way through. I need to go back over the tweets more carefully and have a look at the KRDS toolkit and reports at: beagrie.com/krds.php

The morning’s presentations over, we split into three groups for breakout discussion.

I attached myself to the second of the three groups, led by (JISC programme manager for Orbital) Simon Hodson; our job was to consider the question: “What really are the sticks and carrots that will make a long-term difference to the pursuit of structured data management processes?”. After spending some time picking apart the terminology, and what each of the various ‘processes’ might include, we had a wide-ranging (and allocated-time-overrunning) discussion about the things that genuinely motivate scientists, universities, and funding councils(!) to care about RDM; about some of the problems caused by the complexity and inconsistency of metadata for datasets; and about the issue of citations/digital object identifiers for data (how those citations might be treated by publishers and citation data services) and how that relates to any notions of ‘peer review’ in experimental data.

As requested, our group came up with three actions which we believe will help address the question of motivation:

Data citation – publishers should consistently include e.g. DOIs for datasets in final published articles, so that citations of the data can be measured.
Measurement of RDM “maturity” – departments and whole institutions should adopt a standardised quality mark for research data management, to give [potential] researchers, funding bodies, and the public confidence in their ability to handle data appropriately.
Discovery – the research councils (probably) should push for common metadata standards for describing datasets and underlying data-generating research/experimental processes.
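To make the first action a little more concrete, here is a small sketch of a dataset citation that carries a DOI. The pattern (Creator (Year): Title. Publisher. Identifier) follows what I understand to be DataCite's recommended citation form. The function name and all the example values, including the DOI, are made up for illustration.

```python
def format_citation(creator, year, title, publisher, doi):
    """Build a dataset citation: Creator (Year): Title. Publisher. DOI link."""
    return f"{creator} ({year}): {title}. {publisher}. https://doi.org/{doi}"


# Hypothetical values; "10.1234/example" is not a registered DOI.
print(format_citation(
    "Example, A.", 2012, "Example sensor dataset",
    "Example University", "10.1234/example",
))
```

Because the DOI appears in a consistent, machine-readable position, a citation data service could in principle count references to the dataset in the same way it counts references to articles.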

Lunch followed, and I had time to hear two more presentations in the afternoon before I had to run for a bus:

Catherine Moyes of the Malaria Atlas Project: in effect, demonstrating what really clear and consistent management of large-scale (geo)data looks like. This seems to consist of an extremely rigorous approach to requesting, tracking, and licensing data from the contributors of the project’s data… and an equally strict (but in a good way) expectation of clarity when dealing with requests from third parties to use the data. If that all comes across as restrictive, I’d point to Catherine’s slide on ‘legalities’ of the data that the Malaria Atlas Project has released openly – it’s about as open as it gets, with no registration needed, no terms & conditions placed on re-use of the published data, and all software/artefacts released under very permissive and free licences (Creative Commons or GNU). N.B. the Orbital project should look at the Malaria Atlas Project’s “data explorer”, available via map.ox.ac.uk, as an example of a really nifty set of applications built on top of openly accessible and re-usable data.

Finally (and I’m sorry I only got to hear part of his presentation), University of So’ton chemistry professor Jeremy Frey on their IDMB (Institutional Data Management Blueprint) Project—southamptondata.org—and some rather funny anecdotes about the underlying knowledge, expectations, and problems faced by researchers managing their own data, which emerged when they were surveyed as part of the above project.

Lots to take in (lots). But there were some useful suggestions for Orbital, which I’ll be bringing to the next project meeting, and plenty more reading material, which I’ll add to the project reading list asap.

—Paul Stainthorp, lead researcher on the Orbital project.


Orbital Team meeting notes 26-07-12
