A JISC-funded Managing Research Data project

Posts tagged open source software

Following up on our post from the MRDHack day, what follows is an evaluation of ownCloud as an institutional alternative to Dropbox. Our DAF survey showed that researchers at Lincoln require better managed storage space than the current 1GB FTP “H: Drive” provided to each staff member. Many of them are using portable drives, USB sticks and cloud-based services such as Dropbox to store and share their research data. Services like Dropbox offer compelling advantages over more traditional storage. Dropbox provides, for free, double the storage on offer to Lincoln researchers at present. It is always backed up, existing on both the local machine and on Amazon’s servers. It offers versioning for files up to 30 days old with the free account and ‘forever’ for paid accounts. It is accessible from almost all devices, with Linux, OS X, Windows, iOS and Android clients available. Files can be published to the web or shared privately with other Dropbox users.

We know, however, that researchers using Dropbox are doing so without a clear understanding of the terms and conditions of the service and would ideally like a similar service to be provided internally by the University, where we can retain control of the data and its associated security.

We were first made aware of ownCloud when D’Arcy Norman blogged about his initial trial of it. ownCloud is an open source tool, which provides the same features as Dropbox (and more). With the release of version 4 in May, it appears to be a credible alternative to Dropbox for institutions wishing to provide a modern storage solution for their staff and students. D’Arcy’s initial experience with ownCloud was promising but he found issues with the syncing of files. Our recent tests of ownCloud have found that these problems have now been resolved with a recent update and what follows is an evaluation of ownCloud version 4.0.6, looking at it from three perspectives:

  1. ownCloud as a general purpose storage technology for an academic community
  2. ownCloud as a storage technology for research data
  3. ownCloud as a technology for integration with Orbital

Storage for an academic community?

ownCloud is an AGPLv3 licensed open source project, which started in January 2010. The project is run by a company, also called ownCloud, which provides commercial services and support for its software. The company resides in both Germany and the USA. The development of ownCloud is also open and supported by standard tools for open source projects: a source code repository, bug tracker, IRC channel, mailing list, wiki and forum. There are currently 13 core members of the project and 34 contributing developers. Development of the code is currently very active with changes made several times a day. The ownCloud project was started by the KDE community (it has no dependencies on KDE) and therefore benefits from the involvement of experienced open source developers. The software is written in PHP and can use MySQL, PostgreSQL or SQLite for its database. As of version 4, ownCloud has the following features:

  • Web user interface for file uploads and management of account and other features. Files can also be uploaded from an existing URL.
  • Windows/Mac/Linux/Android/iOS synchronisation clients.
  • WebDAV integration for direct access to file storage. ownCloud has its own WebDAV server.
  • Folder and/or file sharing: publish to the web or share with groups or individuals.
  • File versioning.
  • An API for application integration.
  • Previewing for a number of filetypes.
  • Server side file encryption.
  • LDAP integration.
  • Notifications.
  • ownCloud can be installed in a PHP/MySQL environment on both Linux and Windows servers.
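As a rough illustration of the WebDAV access mentioned in the list above, the following Python sketch builds (but does not send) an authenticated upload request. The server address, the `files/webdav.php` path, the file name and the credentials are all assumptions made for illustration; the actual WebDAV URL depends on the ownCloud version and how it is installed.

```python
import base64
import urllib.request


def build_webdav_put(base_url, remote_path, data, user, password):
    """Build (but do not send) an HTTP PUT request that uploads `data` over WebDAV."""
    # Join the WebDAV root and the remote path with exactly one slash.
    url = base_url.rstrip("/") + "/" + remote_path.lstrip("/")
    # HTTP Basic authentication: base64 of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, data=data, method="PUT")
    request.add_header("Authorization", "Basic " + token)
    request.add_header("Content-Type", "application/octet-stream")
    return request


# Hypothetical server, path and credentials, for illustration only.
request = build_webdav_put(
    "https://owncloud.example.ac.uk/files/webdav.php",
    "research/readings.csv",
    b"time,value\n0,1.5\n",
    "researcher",
    "secret",
)
# urllib.request.urlopen(request) would perform the upload against a real server.
```

Because WebDAV is plain HTTP with extra verbs, any HTTP client can write to the file store in this way, which is what makes it useful for integration with other applications.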

ownCloud also provides a number of other applications that can be activated or not, including a calendar, task list, contacts management, a built in text editor, image management, and experimental support for FTP, Google Drive and Dropbox integration. These are all available from the built in ‘app store’ and configurable by administrators. Maximum quotas for each account can be specified on an account-by-account basis, and a maximum file upload size can also be specified if needed.

The roadmap for version 5, due in August 2012, lists the following:

  • Inter-ownCloud Sharing
  • Ajax interface
  • Mozilla Sync Integration
  • Improved permissions
  • Mounting of Dropbox and Google Drive
  • Improved version control


Introduction

The Orbital Implementation Plan (WP6) is intended to be a synthesis of our initial user requirements gathering (WP5), an assessment of Engineering research data (WP9) and an evaluation of standards and technologies (WP10), informed by a literature review of previous work relevant to the Research Data Management (RDM) domain as it relates to the discipline of Engineering (WP4).

Therefore, appended to this Implementation Plan are: i) a Technical Specification based on user requirements; ii) a Literature Review; iii) a summary of an institution-wide survey based on the Data Asset Framework; and iv) a draft Research Data Management Policy for the University of Lincoln (WP7), which is currently undergoing internal review.

The Implementation Plan has been written at exactly a third of the way into the Orbital project (six months), allowing for a further year of development based on the work brought together in this document. It is worth repeating the objectives of the project, as stated in the Project Plan:

We intend to build on our previous work around the deposit, management and access to university research as well as further existing work in which we are building a platform for data-driven services at the university.

Throughout this undertaking, we aim to improve our understanding of the issues around research data management; develop the requisite skills among the university community to better manage research data; re-use and develop some of the underlying tools we have built to provide an institution-wide service for the ingest, description, preservation and dissemination of research data; improve the way we work on such projects, refining our use of agile methods; build capacity for the local development of academic technologies at the university; develop and implement appropriate institutional policy for the deposit, management and sharing of research data; and develop a Business Plan for the university for the long-term sustainability of our research data.

Our work to date has pursued many of these objectives closely, reflecting continued effort over the last six months, both inside and outside the project, to build on previous work by using institutional data to drive application development; to improve our methods of access and identity management; and to develop an environment that fosters and supports in-house innovation.

This planning document is primarily intended to support the technical implementation of the Orbital application to manage research data at the University of Lincoln. What it does not address is the training to support the use of the application (WP11), nor the Business Case for sustaining the pilot service (WP13), which we are implementing. However, some preliminary work is underway to consider appropriate business models for sustaining Orbital as open source software and we believe that the technical decisions laid out in this Implementation Plan will support the development of a sustainable Business Case for Orbital. This area of work continues and the outcomes are due to be delivered towards the end of the project.

What follows is a brief summary of the appended Technical Specification and Literature Review. I would like to thank Nick Jackson and Paul Stainthorp for their work on these documents, which have brought clarity to the Orbital project and contributed to a much better understanding of RDM at the University of Lincoln.

Joss Winn, Orbital Project Manager, 2nd April 2012.

Literature Review

The management of research data is recognised as one of the most pressing challenges facing the higher education and research sectors. Research data generated by publicly-funded research is seen as a public good and should be available for verification and re-use. In recognition of this principle, all UK Research Councils require their grant holders to manage and retain their research data for re-use, unless there are specific and valid reasons not to do so. (JISC Managing Research Data Programme 2011-13).

To gain a clearer understanding of the more complex and unfamiliar concepts in the emerging discipline of Research Data Management, the Orbital project conducted a review of published literature on the subject (mainly web sites, project reports and guidance documents), with particular reference to RDM in the discipline of engineering.

An online Research Data Management bibliography is being maintained at: http://lncn.eu/bcf6

The project team identified the following nine themes in the literature – for each theme, a recommendation is made which will support the development of RDM infrastructure at the University of Lincoln.

1. Fundamentals of research data and RDM

Researchers are not a homogeneous group, and their data needs are changing as the research landscape becomes more complex. Recommendation: the Orbital project continue its work to assess the storage and other requirements of Lincoln researchers using surveys and interviews.

2. Particular requirements of the discipline of engineering

The ERIM (Engineering Research Information Management) project at the University of Bath has specified the first ever set of RDM principles and terminology designed specifically for engineers. Recommendation: the Orbital project continue to work with the Bath team on implementing ERIM’s findings.

3. The behaviour of researchers

What motivates researchers to invest in RDM is not the same as what motivates their institutions. Recommendation: Orbital to use surveys and interviews to understand researchers’ requirements and develop appropriate advocacy materials.

4. RDM policies and legal aspects

All UK Research Councils are introducing mandates for data curation, and in some cases data publication. Recommendation: the Orbital team to support the University’s response to the imminently required EPSRC data policy roadmap and to help develop institutional policies.

5. Data sharing

Research data are at their most useful when they are interoperable with other data. Sharing data leads to a range of real and measurable benefits, and researchers’ interests are protected by a principle of ‘proprietary period’ of privileged access. Recommendation: Orbital work with Research & Enterprise to formulate clear policies on data sharing and licensing.

6. Costs and benefits

The most significant RDM costs for the institution occur at the data acquisition/ingest stage. Institutions that invest in RDM can expect significant benefits including new, unforeseen research activities made possible through the re-use and aggregation of data. Recommendation: Orbital provide guidance to researchers on ensuring RDM is costed into future research funding bids.

7. Curation standards, metadata and citation

Without a system for assigning citations to research data, further curation and sharing is impossible. Recommendation: Orbital incorporate the functionality of DataCite to allow Lincoln researchers to secure a DOI (Digital Object Identifier) for their data objects.

8. Technical considerations

The range of file formats involved in engineering research is a significant area of complexity. Recommendation: Orbital continue to work with Siemens, the School of Engineering, the University of Bath and the DCC to develop expertise in handling engineering data formats.

9. Tools, support and training

A range of immediately re-purposable RDM training kits and planning tools already exists. Recommendation: Orbital review the available material and use it to design an RDM training programme for the University of Lincoln, and also incorporate Data Management Planning (DMP) tools within the Orbital application.

In light of this, the initial objectives of the Orbital project were on the mark, but indicate a broad area of institutional responsibility that goes beyond scholarly communication to affect strategic areas such as recruitment and training, business intelligence and continuity, IP and income generation, as well as future curriculum design and our corresponding investment in infrastructure and estate. No small task.

Technical Approach

Our Project Plan outlined the technical approach that we originally anticipated and six months later this has not fundamentally changed. As detailed in the Technical Specification, we remain convinced of the benefits of pursuing a data-driven, API-centric model of development, using storage and access control methods that support the creation of a modular and scalable web application that is attractive to both Users and Developers.

As we have learned from our requirements gathering and literature review, research practices both within and across subject disciplines are varied. This suggests that over the next 12 months, the Orbital project should concentrate on developing an application that remains open and attractive to further development, rather than seeking to design a single workflow for all users’ needs, which would be an impossible task.

We believe this approach best supports the sustained development of Orbital beyond the life of the pilot project, allowing both Researchers and software Developers to create applications for Orbital to suit the requirements of specific research disciplines at a given point in time. Likewise, an API-centric approach will also ensure that our existing and related applications, such as institutional repository software and research information systems can equally be treated as ‘users’ (producers and consumers) of Orbital.

As we outlined in our Project Plan, this approach allows us to benefit from work which continues outside the Orbital project, such as that around Access and Identity Management and academic profiles, and the development of data.lincoln.ac.uk. It is also a suitable approach for the development of Orbital as open source software, which should remain simple to develop for specific users’ needs if it is to receive interest and contributions from developers outside the university.

The Technical Specification contains five core functional requirements: Projects, Workspace, Archives, Working Dataset, and Publication. A Project may result in a specific Publication(s), while the Workspace, Archives and Working Dataset allow for three non-sequential methods of data storage, manipulation and analysis. These requirements are loosely coupled to one another, but do not represent a publication workflow. Orbital is not simply intended to be a data repository, but the basis of a flexible collaborative environment for working Researchers.

Each Project acts as a conceptual container for all data and represents the ‘space’ in which administrative, descriptive and contextual metadata is captured and stored, as well as the datasets themselves. It is at the level of a Project that Orbital will interface with other systems, such as an institutional repository or research information system by storing, exchanging and publishing information according to recognised standards, such as CERIF, SWORD2, DOI, etc.
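One way to picture the Project as a container is the minimal Python sketch below. It is not taken from the Technical Specification: the class, its field names and the sample values are hypothetical, and the three stores are modelled as plain lists only to show that they can be used in any order.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical container for a project's metadata and data stores."""
    title: str
    # Administrative, descriptive and contextual metadata captured at Project level.
    metadata: dict = field(default_factory=dict)
    # Three independent (non-sequential) stores named in the specification.
    workspace: list = field(default_factory=list)
    archives: list = field(default_factory=list)
    working_dataset: list = field(default_factory=list)
    # A Project may result in zero or more Publications.
    publications: list = field(default_factory=list)


# Placeholder values; items can be added to any store in any order.
project = Project(title="Example engineering project")
project.metadata["funder"] = "Example funder"
project.archives.append("run-01.zip")
project.workspace.append("notes.txt")
```

Keeping the stores as separate attributes of one object reflects the point that they are loosely coupled to one another and do not form a publication workflow.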

Finally, a core requirement from Orbital is that data should be stored, accessed and transported securely. Being a native web application, we have opted to implement the OAuth 2 protocol to provide secured access to all API functions over HTTPS. As such, all user applications will be treated equally and will be required to access the core Orbital APIs via this popular and mature standard for application authentication on the web. OAuth is increasingly being deployed at the University of Lincoln and work continues outside of the Orbital project to implement it as part of an institution-wide Single Sign On (SSO) architecture.
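To make the access model concrete, here is a minimal Python sketch of how a client application might prepare a call to an OAuth 2 protected API over HTTPS. The endpoint URL and token are hypothetical and do not describe the actual Orbital API; the sketch assumes the access token has already been obtained and is presented as a bearer token in the `Authorization` header.

```python
import urllib.parse
import urllib.request


def build_api_request(url, access_token):
    """Build (but do not send) a GET request carrying an OAuth 2 bearer token."""
    # Tokens should never travel over plain HTTP, so refuse anything but HTTPS.
    if urllib.parse.urlparse(url).scheme != "https":
        raise ValueError("API calls must use HTTPS")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", "Bearer " + access_token)
    request.add_header("Accept", "application/json")
    return request


# Hypothetical endpoint and token, for illustration only.
request = build_api_request("https://orbital.example.ac.uk/api/projects", "example-token")
# urllib.request.urlopen(request) would send the call to a real server.
```

Since every client goes through the same token check, a web front end, a desktop tool and an institutional repository would all be treated as equal users of the API.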

Related project blog posts

Chosen Methodology

Jenkins, build my software

Pivoting Around

Project Planning: Quality Assurance

Understanding and participating in open source culture

The Toolchain: First Pass

Tracking progress

Literature Review

An Orbital project reading list

Initial User Requirements

Meeting our users, the Engineers

Assessment of Data Sources

Research Data vs Research Data

Let’s Look At Data

Data, Data Everywhere

Gluing people together

Evaluation of standards and technologies

How the National Archives use MongoDB

Forecast: Cloudy

Piloting the cloud

Why Orbital is all about the API

Servers, Servers Everywhere

Eating your own dog food: Building a repository with API-driven development

Hello? Is it me you’re looking for?

Orbital and the OAIS reference model

There are direct and anticipated outcomes of running relatively big projects like Orbital – outcomes which are integral to the success of the project, such as those listed in our project plan: a technical infrastructure for research data, support and training, an institutional data management policy and a business plan for sustaining the work of the project. There are also outcomes which, to be honest, I didn’t entirely anticipate, such as Orbital becoming the pilot project for how the university tackles integration with the cloud; or the implementation of a new development tool-chain and associated working practices.

Yesterday wasn’t originally anticipated either, as the Orbital project hosted a meeting to raise awareness of ‘open source’ among staff at the university. It’s a term that we hear quite often these days and increasingly it’s being applied to non-software domains, such as hardware, data and education. In effect, it’s being used to refer to a method of participation and collaboration, as much as a legal statement about the ownership of property. In my day-to-day experience, more often than not, it’s a term that’s poorly understood and misused, so an open source software development project like Orbital seemed like a good opportunity to ask the question, “what is open source?” and see if anyone else was interested in learning more about what it means and how it relates to the work of a university. With that in mind, I arranged for Sander van der Waal, from OSSWatch, to lead a meeting where we discussed open source in general, but also began to address some specific issues that I think we need to work on as we continue to both re-use and produce more open source software.

The meeting ran all morning, from 9.30-12, and could have gone on for longer. I kicked things off with the slides below, which were intended to provide a brief overview of the work we’ve been doing over the last four years where the use of open licenses was central, and in particular, give a brief summary of why we undertake the work we do and some of the benefits of ‘openness’. I finished up with a list of things I think we need to address and take forward for further discussion. I was pleased that Dr. James Murray, the IP Manager for the university, was attending and keen to engage in this discussion, too.

Having set the scene, I handed over to Sander, who led the rest of the meeting. As you can see from his slides below, he covered a lot of ground, which we were grateful for, and we intend to draw from them in our next follow-on meeting. I hope that the Orbital project will now act as a catalyst to the development of guidelines on the use and creation of open source software, as well as a clearer understanding of the business case and business models for open source.

On a more personal note, having joined the university in 2007 as Project Officer on the JISC-funded LIROLEM institutional repository project, yesterday felt like a bit of a milestone, when I was able to draw together a lot of our work under the banner of ‘open’, and impress upon colleagues what we’ve learned and achieved and the direction I think we need to go in.

Before too long, I’d like there to be a greater appreciation across the institution of how the open source movement is changing the way some of us think about (intellectual) property and the nature of work and how this is reflected by the environment we work in. Open source (and its open * derivatives) is not a panacea to society’s problems, by any means, but its impact on our lives in just twenty years or so has been quite profound and its impact on the nature of research, teaching and learning is increasingly apparent. Since the development of time-sharing systems fifty years ago, programmers have been building tools with each other that allow them to share their knowledge and their productive capacity across divides in space and time that once presented significant barriers to collaboration. Variations on these tools (hardware, software, legal) are now available to researchers, teachers and students outside Computer Science programmes and present challenges as well as new ways to conceive the organising principles of property and work.

In the future, I’d like institutional projects (not just discrete research projects), such as Orbital, to somehow be tied into curricula for courses where we turn classrooms into hackerspaces, project work into apprenticeships, award degrees on the basis of participation in and learning from open source projects, and help students form start-ups by creating an intensive but supportive learning environment along the same lines that Y Combinator has done. None of this is beyond the capability of our institutions, nor in conflict with the idea of the university. From where I stand, it is the only direction available to us if we wish to remain relevant to young people’s lives and aspirations: on an everyday level, technology is a determining force in society and is determining how we undertake research, teaching and learning, but in response, it’s the hackers who are changing technology and therefore have a role in the future of the university.

If you’re interested in further reading about open source, I recommend the following books:

Benkler, Y. (2006) The wealth of networks: how social production transforms markets and freedom.

Fogel, K. (2006) Producing open source software: how to run a successful free software project.

Lindberg, V. (2008) Intellectual property and open source.

Weber, S. (2005) The success of open source.

Sander van der Waal from OSSWatch, JISC’s open source software advisory service, will lead a meeting on March 7th.

The term ‘open source’ is increasingly being used to refer not only to the development of software, but also in other disciplines, such as design, education and even government.

This meeting is an opportunity for attendees to learn exactly what ‘open source’ means and its effect on our understanding of property and the production of knowledge, goods and services.

In the context of the Orbital project, we will also discuss the use and application of open source licenses by universities and consider how open source can contribute to innovation and the development of new business models.

The meeting will be held on March 7th, 9.30-12pm, in MB1005. Refreshments will be provided. Staff and students wishing to attend should RSVP to Joss Winn.

We met this morning for our first Steering Group meeting of the Orbital Project. Following a discussion about the objectives of the MRD programme in general, the main agenda point was to discuss the Project Plan prior to me sending it to JISC. I will publish the Plan on this website once it has been signed off.

Questions were raised by the Steering Group specific to the research data of Engineers and the confidential and commercial nature of their work. Our School of Engineering was established through a partnership with Siemens and therefore the research undertaken by some of our researchers uses data provided under strict confidentiality agreements. The Orbital project has always been aware of this and it is one of the interesting challenges which we highlighted in our bid to JISC. It raises very important questions over ownership, authenticity, privacy and liability. Further discussions on this topic will be forthcoming.

Another point was raised by Dr. James Murray, our IP Manager, around the use of open licenses for documentation and code and whether the infrastructure we develop might have any commercial value. On a project of this size, it’s an important question and one I had given some thought to. Personally, I admire the way that the University of Southampton has created a commercial service around their open source EPrints software, which we use and subscribe to at Lincoln. I was asked if we might invite someone from EPrints Services to come to discuss their experience with the Steering Group at our next meeting in February. I was pleased that this was brought up at this early stage as developing a Business Case for Orbital is not only vital to the long-term sustainability of our work, but a required output of the project, too. Given the project team’s preference for employing and publishing open source software, I’m keen that a Business Model based on open source software be given thorough consideration. It’s very early days to be thinking about this, but such considerations do take time to work out, too.

Finally, Prof. Andrew Hunter, Head of the College of Science and our Senior User, identified other areas of our STEM research that would benefit from the work of Orbital. This is not something we need to concentrate on right now in this MRD pilot project, but it, too, is an important consideration in planning for the long-term deployment and use of Orbital.