A JISC-funded Managing Research Data project

Posts tagged research data management

I am very pleased to announce that our Research Data Management policy, which was one of the main objectives of the Orbital project, has been approved by the university’s Research Committee. The process of drafting the policy began in April 2012 as a collaborative effort between Orbital team members from the Centre for Educational Research and Development, The Library and the Research and Enterprise Office. Comments were then solicited from the Director of ICT, the Director of Research and Enterprise and the University Librarian. The draft was then presented to the Research Committee, which requested that the policy be discussed with the Senior Management Team due to its resourcing implications. This meeting took place in October 2012 and, as a result, SMT requested that the College Research Directors be consulted on the Policy and agreed that a Business Case for Research Data Management (to effectively ‘underwrite’ the policy) should be put together. The Business Case was presented to SMT and accepted in January 2013. Following discussion with the Research Directors and further re-drafting, the policy was approved today.

This completes the formal objectives of the Orbital project and places us in a position where we have a Business Case, Policy and a new ‘Research Information Services’ team that is being formed to meet the expectations and aspirations of our researcher community and our funders.

On Wednesday, we hosted three people from the Open Knowledge Foundation to discuss the Orbital project and their software, CKAN. It was a very engaging and productive day spent with Peter Murray-Rust (on the Advisory Board of OKFN), Mark Wainwright (community co-ordinator) and Ross Jones (core developer). At the start of the day we asked them to challenge us about our technical work, and I described the day to them as an opportunity to evaluate our development of the Orbital software so far. We didn’t touch on the other aspects of the Orbital project such as policy development and training for researchers.

To cut to the chase, the Orbital project will be adopting CKAN as the primary platform for further development of the technical infrastructure for RDM at Lincoln. This is subject to approval by the Steering Group, but the reasons are compelling in many ways and I am confident that the Steering Group will accept this recommendation. More importantly, the Implementation Plan that was approved by the Steering Group and submitted to JISC remains unchanged.

The raw notes from our meeting are available here. Remember these are raw notes written throughout the day, primarily for our own record. They probably mean more to us than they do to you! Thanks to Paul Stainthorp for his fanatical note taking :-)

Here’s the list of attendees and our agenda:

Present

Peter Murray-Rust (OKFN)
Mark Wainwright (OKFN)
Ross Jones (OKFN)
Joss Winn (University of Lincoln, CERD)
Nick Jackson (University of Lincoln, CERD)
Harry Newton (University of Lincoln, CERD)
Jamie Mahoney (University of Lincoln, CERD)
Alex Bilbie (University of Lincoln, ICT services)
Paul Stainthorp (University of Lincoln, Library)

Agenda

09.30 Introductions
10.00 Orbital introduction and context: Student as Producer, LNCD; Orbital bid and pilot project; Discussion of Orbital approach, the data we’re using, user needs etc.
10.30 CKAN introduction and context
11.00 Technical discussion – Orbital
12.00 LUNCH
12.30 Technical discussion – CKAN
13.30 Discussion – should Orbital adopt CKAN?
14.00 data[.lincoln].ac.uk
15.00 Next steps; Opportunities for collaboration/funding?

What is probably of most interest to people reading this are the pros & cons of the Orbital project adopting CKAN. I’ll provide more context further into the post, but here’s a summary copied from our notes:

(more…)

I’ve been asked to highlight the benefits of running the Orbital project so far. We’re seven months or so into Orbital, which is an 18 month project. In our initial Project Plan, we identified the anticipated impact of the project and so I’ll use that list in this blog post to reflect on the impact and benefits so far. The headings and text in italics are copied from the Project Plan.

Research practices

Researchers’ data management practices will change, supported by technologies that encourage new processes in the administration and dissemination of data.

We’ve had very little impact in this area so far. It’s too early to impact on researchers’ practices when we’re still developing our own knowledge and the infrastructure to support RDM. Changing researchers’ practices takes time. However, there are indications that it will happen. Engagement with our user groups and ad hoc requests for help from researchers who know we’re working on the project has shown us that researchers do want to change their practices. Our recent DAF survey also told us that researchers know that their practices could be improved and where they need support to do so.

Internal auditing

Greater oversight and analysis of research data created by researchers will be possible.

We’ve had no impact here and won’t until the Orbital software is built. We will be working on this in the next version of Orbital and our recent effort at the MRD Hack Day around activity data was a precursor to this work. We are increasingly seeing activity data as a key component of Orbital and this was underlined early on in the project when Mansur Darlington from the ERIM and REDm-MED projects stressed the importance of capturing contextual metadata.

Research governance

Improved methods of auditing research undertaken by the university will be possible, enabling greater cross-disciplinary work.

This relates to the benefit above around capturing activity data and improving ‘business intelligence’. As yet there has been no real impact, but we still anticipate Orbital being a useful tool for reporting and enabling greater cross-disciplinary collaboration through greater transparency. Related to this is the creation of an RDM Policy, which we have begun and which has resulted in a statement being made for the EPSRC RDM ‘Road Map’.

Integrated services

Research data management will be integrated into existing systems, such as staff profiles, the institutional repository, blogs and calendars. Towards a Virtual Research Environment.

We have had a direct impact on the creation of new staff profiles at the university. Nick Jackson, Lead Developer on Orbital, has been working with Alex Bilbie in the ICT Online Services Team to create an aggregated profile for staff. We blogged about it earlier and you can see my example. Profiles of staff are now aggregated from different systems and stored as RDF Linked Data and we intend to pull in activity data from Orbital to further enrich staff profiles. In this way, our work is being recognised and valued by other teams in the university. Furthermore, the university has recently procured a new ‘Awards Management System’, which will provide data to Orbital about funded research projects, and we intend to couple Orbital with EPrints using SWORD2.

What is clear from our discussions with users is that they expect Orbital to do more than simply store and publish research data. Without using the term, they are effectively asking for a Virtual Research Environment (VRE). This is something we did anticipate, and we have always planned for Orbital to be a tool for both analysing and publishing research data. When discussing ‘Research Data Management’, there is a fine line between DMP planning, research project management, team workspaces and public web publishing, and while we need to be careful that the scope of Orbital does not creep, we are sensitive to on-going user requirements.

FOI compliance

Will make FOI requests easier to respond to or unnecessary.

We’ve had no impact here so far, nor are we yet in a position to.

Open Data

Will promote and enable the production of public data sets.

As I will discuss in a forthcoming blog post about our first release of Orbital, we have had some impact here and have witnessed the benefits. In summary, a researcher contacted us for help with publishing some data, the result of which was an invitation to write a journal article about the data, offers of collaboration and the strengthening of an EU grant application.

Our workshop on open licensing has also led to a further meeting between myself (Joss Winn, Orbital PM), and the university’s IP Manager. A further follow up meeting is planned to draft guidance for staff on the use of open licenses for source code and data. Furthermore, research staff are being directed to the Orbital team for informal advice on open licensing. In this sense, we are beginning to improve the awareness and understanding of open licenses among researchers.

The innovation cycle

Will embed new technologies and culture change among professional staff at the university and lead to further innovation in our services.

We are having some impact here and have had meetings with central ICT staff about integrating our server farm with cloud services. We are currently developing the Orbital software using Rackspace, but have recently ordered hardware, partly paid for by the project, to establish a private cloud, running on OpenStack, for research and development at the university. In addition to this, our development toolchain has changed and we now have tools and processes in place that we did not have six months ago. These are being adopted outside of the Orbital project by staff within the ICT Online Services Team and other projects we are running. We also intend that the changes we make to our own R&D tools and processes are made available to other researchers and students. Over the summer, we will set up and maintain a university-wide Gitorious source code repository service (similar to Github), where staff and students who write code can form teams, manage their source code, and publish it if they wish. We also intend to run a Jenkins server for similar purposes so that all staff and students can benefit from source code control and the quality assurance processes that we have implemented through Orbital. Orbital is now a driver for a general R&D infrastructure for Academic Computing that project members and wider members of LNCD are building.

I will write more about this at a later date because, for someone who manages R&D infrastructure projects at the university and wishes to engage staff and students in our work, I am excited to be able to integrate this into academic programmes and the work of other researchers.

I also want to stress that, as with all of our projects, the benefits, however slight, spill into other aspects of our work. Being a large project, Orbital has allowed us to concentrate on developing our toolchain and development environment across other projects. It has given us time to learn new skills and share our learning with colleagues. In this way, it has been pivotal in the way we work and the future direction of our work.

Recruitment

Will build capacity for local development of innovative services

Orbital has allowed us to recruit two full-time Developers (Nick Jackson and Harry Newton). We are therefore two staff up and it is my intention to try and keep it that way.

Staff skills

Will improve staff skills and experience

Yes, we are benefitting in this way. The Orbital project team are now the RDM ‘experts’ in the university and despite being novices in this regard, over the course of the project staff working in the Library, ICT, Research and Enterprise Office and Centre for Educational Research and Development, are each developing their understanding of the processes and implications of RDM.

Clearly an 18 month project (at least the way I run them!) allows for staff to learn new tools and skills, experiment with new methods of working and disseminate this learning to other staff. This is one constant that I value highly about our project work. Despite the stop and start nature of project work, and the fact that not all of our work eventually makes it into a fully fledged university service, the tools and learning, especially as we engage more with academic programmes, go beyond the confines of the project, and that is most satisfying.

I have written more about my interest in how hackers learn and the university as a hackerspace.

Culture change

Will change the research culture of the university by improving the tools available for managing and sharing data.

From the point-of-view of RDM, this is closely related to the first anticipated impact/benefit. We cannot claim to have any real evidence of benefits or impact at this stage on how we manage research data. However, as I’ve noted several times above, our use of the cloud, our advocacy of open licensing, our implementation of new R&D tools and processes, are also part of ‘culture change’ at the university. Furthermore, due to the DAF survey, the Orbital project is now widely known by researchers beyond our initial user group in the School of Engineering, and through our reporting to the university Research, Innovation and Enterprise Committee, staff at all levels are made aware of our work. Gradually, the idea of ‘research data management’ is being understood.

Technology choices

Will influence future choices in technologies (both locally developed and outsourced).

Yes! See above.

HE sector R&D

Contributes to innovative R&D in the HE sector

Yes, I think we are beginning to do this and the benefits so far are around shared learning among developers across different projects. We were instrumental in early discussions about the DevCSI MRD Hack Day and three of us contributed to the two day event. We blog regularly on this site (around 50 posts so far) and share our work with anyone who is interested (see the links in the sidebar).

Public Sector data management

As yet, the Orbital project can only claim to have resulted in one research dataset being published (again, more on this soon as I want to explain it in more detail). However, Orbital has grown out of our work over the last couple of years around managing and re-using institutional data, resulting in data.lincoln.ac.uk. We are also active members of the data.ac.uk initiative and I chaired the data.ac.uk panel at Dev8D this year.

Efficient re/use of resources

Demonstrably re-uses and builds on previous work, both funded and non-funded projects.

Yes, this was an early benefit of the project. We are building on our previous work and what we have learned from it in past projects. Our use of MongoDB, our work on staff profiles, our use of OAuth, and our API-driven approach to development, all build on past projects, funded and un-funded.

Introduction

The Orbital Implementation Plan (WP6) is intended to be a synthesis of our initial user requirements gathering (WP5), an assessment of Engineering research data (WP9) and an evaluation of standards and technologies (WP10), informed by a literature review of previous work relevant to the Research Data Management (RDM) domain as it relates to the discipline of Engineering (WP4).

Therefore, appended to this Implementation Plan are: i) a Technical Specification based on user requirements; ii) a Literature Review; iii) a summary of an institution-wide survey based on the Data Asset Framework; and iv) a draft Research Data Management Policy for the University of Lincoln (WP7), which is currently undergoing internal review.

The Implementation Plan has been written at exactly a third of the way into the Orbital project (six months), allowing for a further year of development based on the work brought together in this document. It is worth repeating the objectives of the project, as stated in the Project Plan:

We intend to build on our previous work around the deposit, management and access to university research as well as further existing work in which we are building a platform for data-driven services at the university.

Throughout this undertaking, we aim to improve our understanding of the issues around research data management; develop the requisite skills among the university community to better manage research data; re-use and develop some of the underlying tools we have built to provide an institution-wide service for the ingest, description, preservation and dissemination of research data; improve the way we work on such projects, refining our use of agile methods; build capacity for the local development of academic technologies at the university; develop and implement appropriate institutional policy for the deposit, management and sharing of research data; and develop a Business Plan for the university for the long-term sustainability of our research data.

Our work to-date has pursued many of these objectives closely, reflecting continued effort over the last six months, both inside and outside the project, to build on previous work by using institutional data to drive application development; to improve our methods of access and identity management; and develop an environment that fosters and supports in-house innovation.

This planning document is primarily intended to support the technical implementation of the Orbital application to manage research data at the University of Lincoln. What it does not address is the training to support the use of the application (WP11), nor the Business Case for sustaining the pilot service (WP13), which we are implementing. However, some preliminary work is underway to consider appropriate business models for sustaining Orbital as open source software and we believe that the technical decisions laid out in this Implementation Plan will support the development of a sustainable Business Case for Orbital. This area of work continues and the outcomes are due to be delivered towards the end of the project.

What follows is a brief summary of the appended Technical Specification and Literature Review. I would like to thank Nick Jackson and Paul Stainthorp for their work on these documents, which have brought clarity to the Orbital project and contributed to a much better understanding of RDM at the University of Lincoln.

Joss Winn, Orbital Project Manager, 2nd April 2012.

Literature Review

The management of research data is recognised as one of the most pressing challenges facing the higher education and research sectors. Research data generated by publicly-funded research is seen as a public good and should be available for verification and re-use. In recognition of this principle, all UK Research Councils require their grant holders to manage and retain their research data for re-use, unless there are specific and valid reasons not to do so. (JISC Managing Research Data Programme 2011-13).

To gain a clearer understanding of the more complex and unfamiliar concepts in the emerging discipline of Research Data Management, the Orbital project conducted a review of published literature on the subject (mainly web sites, project reports and guidance documents), with particular reference to RDM in the discipline of engineering.

An online Research Data Management bibliography is being maintained at: http://lncn.eu/bcf6

The project team identified the following nine themes in the literature – for each theme, a recommendation is made which will support the development of RDM infrastructure at the University of Lincoln.

1. Fundamentals of research data and RDM

Researchers are not a homogeneous group, and their data needs are changing as the research landscape becomes more complex. Recommendation: the Orbital project continue its work to assess the storage and other requirements of Lincoln researchers using surveys and interviews.

2. Particular requirements of the discipline of engineering

The ERIM (Engineering Research Information Management) project at the University of Bath has specified the first ever set of RDM principles and terminology designed specifically for engineers. Recommendation: the Orbital project continue to work with the Bath team on implementing ERIM’s findings.

3. The behaviour of researchers

What motivates researchers to invest in RDM is not the same as what motivates their institutions. Recommendation: Orbital to use surveys and interviews to understand researchers’ requirements and develop appropriate advocacy materials.

4. RDM policies and legal aspects

All UK Research Councils are introducing mandates for data curation, and in some cases data publication. Recommendation: the Orbital team to support the University’s response to the imminently required EPSRC data policy roadmap and to help develop institutional policies.

5. Data sharing

Research data are at their most useful when they are interoperable with other data. Sharing data leads to a range of real and measurable benefits, and researchers’ interests are protected by a principle of ‘proprietary period’ of privileged access. Recommendation: Orbital work with Research & Enterprise to formulate clear policies on data sharing and licensing.

6. Costs and benefits

The most significant RDM costs for the institution occur at the data acquisition/ingest stage. Institutions that invest in RDM can expect significant benefits including new, unforeseen research activities made possible through the re-use and aggregation of data. Recommendation: Orbital provide guidance to researchers on ensuring RDM is costed into future research funding bids.

7. Curation standards, metadata and citation

Without a system for assigning citations to research data, further curation and sharing is impossible. Recommendation: Orbital incorporate the functionality of DataCite to allow Lincoln researchers to secure a DOI (Digital Object Identifier) for their data objects.

8. Technical considerations

The range of file formats involved in engineering research is a significant area of complexity. Recommendation: Orbital continue to work with Siemens, the School of Engineering, the University of Bath and the DCC to develop expertise in handling engineering data formats.

9. Tools, support and training

A range of immediately re-purposable RDM training kits and planning tools already exists. Recommendation: Orbital review the available material and use it to design an RDM training programme for the University of Lincoln – also incorporate Data Management Planning (DMP) tools within the Orbital application.

In light of this, the initial objectives of the Orbital project were on the mark, but indicate a broad area of institutional responsibility that goes beyond scholarly communication to affect strategic areas such as recruitment and training, business intelligence and continuity, IP and income generation, as well as future curriculum design and our corresponding investment in infrastructure and estate. No small task.

Technical Approach

Our Project Plan outlined the technical approach that we originally anticipated and six months later this has not fundamentally changed. As detailed in the Technical Specification, we remain convinced of the benefits of pursuing a data-driven, API-centric model of development, using storage and access control methods that support the creation of a modular and scalable web application that is attractive to both Users and Developers.

As we have learned from our requirements gathering and literature review, Research practices both within and across subject disciplines are varied, suggesting that over the next 12 months, the Orbital project should concentrate on developing an application that remains open and attractive to further development, rather than seeking to design a single workflow for all users’ needs – an impossible task.

We believe this approach best supports the sustained development of Orbital beyond the life of the pilot project, allowing both Researchers and software Developers to create applications for Orbital to suit the requirements of specific research disciplines at a given point in time. Likewise, an API-centric approach will also ensure that our existing and related applications, such as institutional repository software and research information systems can equally be treated as ‘users’ (producers and consumers) of Orbital.

As we outlined in our Project Plan, this approach allows us to benefit from work which continues outside the Orbital project, such as that around Access and Identity Management and academic profiles, and the development of data.lincoln.ac.uk. It is also a suitable approach for the development of Orbital as open source software, which should remain simple to develop for specific users’ needs if it is to receive interest and contributions from developers outside the university.

The Technical Specification contains five core functional requirements: Projects, Workspace, Archives, Working Dataset, and Publication. A Project may result in a specific Publication(s), while the Workspace, Archives and Working Dataset allow for three non-sequential methods of data storage, manipulation and analysis. These requirements are loosely coupled to one another, but do not represent a publication workflow. Orbital is not simply intended to be a data repository, but the basis of a flexible collaborative environment for working Researchers.

Each Project acts as a conceptual container for all data and represents the ‘space’ in which administrative, descriptive and contextual metadata is captured and stored, as well as the datasets themselves. It is at the level of a Project that Orbital will interface with other systems, such as an institutional repository or research information system by storing, exchanging and publishing information according to recognised standards, such as CERIF, SWORD2, DOI, etc.
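To make the idea of a Project as a container more concrete, here is a minimal sketch in Python. It is illustrative only and is not taken from the Technical Specification: the class names, field names and example values are all assumptions, and the real Orbital data model may look quite different.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dataset:
    """A named collection of records (hypothetical shape)."""
    name: str
    records: List[dict] = field(default_factory=list)


@dataclass
class Publication:
    """A published output of a Project, optionally identified by a DOI."""
    title: str
    doi: str = ""  # left empty until an identifier has been assigned


@dataclass
class Project:
    """Conceptual container for a project's metadata and all of its data."""
    title: str
    # Administrative, descriptive and contextual metadata as simple key/value pairs.
    metadata: Dict[str, str] = field(default_factory=dict)
    # Three independent storage areas; none is a required step before another.
    workspace: List[Dataset] = field(default_factory=list)
    archives: List[Dataset] = field(default_factory=list)
    working_datasets: List[Dataset] = field(default_factory=list)
    # A Project may result in zero or more Publications.
    publications: List[Publication] = field(default_factory=list)


# Example values are invented for illustration.
project = Project(title="Example engineering project")
project.metadata["description"] = "Illustrative project record"
project.working_datasets.append(Dataset(name="sensor-readings"))
project.archives.append(Dataset(name="raw-capture"))
```

The point of the sketch is that the storage areas sit side by side inside the Project rather than forming a pipeline, so data can be added to any of them without passing through the others, and a Project can exist with no Publication at all.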

Finally, a core requirement from Orbital is that data should be stored, accessed and transported securely. Being a native web application, we have opted to implement the OAuth 2 protocol to provide secured access to all API functions over HTTPS. As such, all user applications will be treated equally and will be required to access the core Orbital APIs via this popular and mature standard for application authentication on the web. OAuth is increasingly being deployed at the University of Lincoln and work continues outside of the Orbital project to implement it as part of an institution-wide Single Sign On (SSO) architecture.
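As a rough illustration of what this means for a client application, the sketch below builds (but does not send) an API request carrying an OAuth 2 access token. It assumes the token is presented as a bearer token in the Authorization header, which is a common OAuth 2 pattern but is not confirmed here as Orbital's exact scheme. The base URL, path and token are hypothetical placeholders.

```python
from urllib.request import Request

# Hypothetical base URL; the real Orbital API address is not given in this post.
API_BASE = "https://orbital.example.org/api"


def build_api_request(path: str, access_token: str, base_url: str = API_BASE) -> Request:
    """Build a request for an API endpoint that carries an OAuth 2 bearer token."""
    # Refuse plain HTTP so the token is never sent over an unencrypted connection.
    if not base_url.startswith("https://"):
        raise ValueError("API requests must be made over HTTPS")
    url = f"{base_url.rstrip('/')}/{path.lstrip('/')}"
    request = Request(url)
    # Assumption: the access token is sent as a bearer token.
    request.add_header("Authorization", f"Bearer {access_token}")
    return request


# "example-token" stands in for a token obtained through an OAuth 2 authorisation flow.
request = build_api_request("/projects", "example-token")
```

Because every application goes through the same function, a web front end, a repository connector or a command line tool would all be authenticated in the same way, which is the sense in which all user applications are treated equally.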

Related project blog posts

Chosen Methodology

Jenkins, build my software

Pivoting Around

Project Planning: Quality Assurance

Understanding and participating in open source culture

The Toolchain: First Pass

Tracking progress

Literature Review

An Orbital project reading list

Initial User Requirements

Meeting our users, the Engineers

Assessment of Data Sources

Research Data vs Research Data

Let’s Look At Data

Data, Data Everywhere

Gluing people together

Evaluation of standards and technologies

How the National Archives use MongoDB

Forecast: Cloudy

Piloting the cloud

Why Orbital is all about the API

Servers, Servers Everywhere

Eating your own dog food: Building a repository with API-driven development

Hello? Is it me you’re looking for?

Orbital and the OAIS reference model

This morning, four of us (Bev Jones and Paul Stainthorp, Library; Annalisa Jones, Research Office; and Joss Winn, Centre for Educational Research and Development) met for three hours to draft a Research Data Management Policy for the University.

We began with Paul and Bev summarising their experience of attending the RDM Policy workshop in Leeds last month, and then went on to look at the requirements of UK funding bodies, as summarised by the DCC. We then reviewed the four university RDM policies linked to from the DCC’s institutional policy page and set about creating a draft policy for Lincoln, which will first be reviewed by the Orbital Steering Group later this week and then referred to the Academic Board and Research, Innovation and Enterprise Committee for approval.

Our draft policy is modelled on the Cross Council Policy Overview by the DCC, which broke down funders’ policies as follows:

  • Published outputs: a policy on published outputs e.g. journal articles and conference papers
  • Data: a datasets policy or statement on access to and maintenance of electronic resources
  • Time limits: set timeframes for making content accessible or preserving research outputs
  • Data plan: requirement to consider data creation, management or sharing in the grant application
  • Access/sharing: promotion of OA journals, deposit in repositories, data sharing or reuse
  • Long-term curation: stipulations on long-term maintenance and preservation of research outputs
  • Monitoring: whether compliance is monitored or action taken such as withholding funds
  • Guidance: provision of FAQs, best practice guides, toolkits, and support staff
  • Repository: provision of a repository to make published research outputs accessible
  • Data centre: provision of a data centre to curate unpublished electronic resources or data
  • Costs: a willingness to meet publication fees and data management / sharing costs

We then drew from Edinburgh’s policy to look at how it meets each of these points. Then, we began merging points and writing a policy response, again borrowing from Edinburgh at times.

You can read our draft policy online. If you’re interested in seeing in detail how it was written, go to the File menu, click on See revision history and then at the bottom of the page, click Show more detailed revisions. Amendments to the policy will continue to be made at that location, so we should see the full history of the policy development over time. Sorry, it appears that in read-only mode, Google Docs doesn’t allow access to the document revision history. UPDATE: See the link to a version maintained on Github in the comments below.

I should note that this is intended to be a pithy policy statement, similar to what other institutions have written and will be supported by more detailed written guidance, which we’ll develop over the course of the Orbital project.