Research data documentation and training materials

The final within-project version of the Orbital Research Data Management training materials is now live on the Orbital Researcher Dashboard website. The materials have been written collaboratively by the Orbital project team, and draw on a lot of existing RDM training and guidance material from across the web (in particular, from the DCC).

We intend that these materials will continue to be maintained and developed as part of the new University-wide research information service mentioned in a previous blog post.

Screenshot of the Researcher Dashboard

The training materials can be accessed at https://orbital.lincoln.ac.uk/ and cover the following areas:

  1. What is research data?
  2. The research data lifecycle
  3. Policies affecting your research data
  4. Data Management Planning (DMP)
  5. Data search and discovery tools
  6. Data storage and security
  7. Legal and ethical issues
  8. Tools for working with your data
  9. Data publishing and citation
  10. Licences for sharing your data
  11. Data curation and preservation
  12. Workshops and training events
  13. Help and support

The source text for each page is stored in an open Github repository (at http://github.com/unilincoln/rdm) in Markdown format. The page admin tools in the Researcher Dashboard can then be used to link to the source document, which is then formatted in the University’s Common Web Design.

These web pages will be used to support the ongoing RDM training for postgraduate students, which will shortly be rolled out to University staff.

A Minimum Viable Product: Orbital v0.1

This is a post about our first release of Orbital.

About a month ago, Dr. Tom Duckett, Reader in the Department of Computing and Informatics, approached the Orbital project because he urgently wanted to publish around 20GB of data for Long-term mobile robot operations. That afternoon, we gave Tom and Feras Dayoub, his Research Assistant, space on one of our servers and they uploaded a bunch of HTML pages and the zipped-up data. We minted a proxy URL for them and advised them on an appropriate data license to choose. We also set up Google Analytics, so they could see how much interest there was in their data.

Job done. For the time being.

What Tom really wanted was to be able to email a link to his data to a robotics mailing list and tell an international community of like-minded researchers and manufacturers that the data was available to use. He says that long-term datasets for mobile robots are quite rare in his community, so there was a good chance people would be interested in them. He also wanted to be able to demonstrate his work when writing an EU bid. There will be a follow-up blog post about what impact this has had on Tom’s research.

That afternoon got us thinking: What is the minimal set of functions that a researcher like Tom requires of a Research Data Management tool?

Tom wanted access (sign in) to a server (hosting) where he could upload his data (storage) and describe it so that other people could understand and download it (publish) under an appropriate license. The URL pointing to the data should be persistent, even if the data itself is migrated from one system to another. The impact (analytics) of the data should also be measurable.
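To make the persistence requirement concrete, here is a minimal sketch of the indirection it implies: a stable identifier that resolves to wherever the data currently lives, with each resolution counted as a crude stand-in for analytics. This is not Orbital's code. The class, the identifier and the example.org hosts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


class UnknownIdentifier(KeyError):
    """Raised when no dataset has been minted under the requested identifier."""


@dataclass
class PersistentResolver:
    """Hypothetical resolver: stable identifiers in, current locations out."""

    locations: dict = field(default_factory=dict)  # identifier -> current URL
    hits: dict = field(default_factory=dict)       # identifier -> resolve count

    def mint(self, identifier: str, location: str) -> None:
        # An identifier is minted once and never reassigned to other data.
        if identifier in self.locations:
            raise ValueError(f"{identifier!r} has already been minted")
        self.locations[identifier] = location
        self.hits[identifier] = 0

    def migrate(self, identifier: str, new_location: str) -> None:
        # The data moves to a new system; the public identifier stays the same.
        if identifier not in self.locations:
            raise UnknownIdentifier(identifier)
        self.locations[identifier] = new_location

    def resolve(self, identifier: str) -> str:
        # Count the request, then return the current location.
        if identifier not in self.locations:
            raise UnknownIdentifier(identifier)
        self.hits[identifier] += 1
        return self.locations[identifier]


# Made-up identifier and hosts, for illustration only.
resolver = PersistentResolver()
resolver.mint("dataset/robot-logs", "https://old-host.example.org/robot-logs.zip")
first = resolver.resolve("dataset/robot-logs")
resolver.migrate("dataset/robot-logs", "https://new-host.example.org/robot-logs.zip")
second = resolver.resolve("dataset/robot-logs")
```

Anyone holding the identifier gets the right data before and after the migration, which is the property Tom needs when he emails a link to a mailing list.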

Tom’s chance intervention in our project made us focus on Orbital v0.1 as the ‘minimum viable product’ for researchers who need to publish open data. We thought his requirements were a great opportunity to release something early and start getting direct user feedback on our product. We decided to set a release date for Orbital v0.1 a month ahead and aim to deliver everything that Tom asked of us in this first release.

A Minimum Viable Product has just those features that allow the product to be deployed, and no more.

Today, we released Orbital v0.1 and it does everything described above. It’s an alpha release, but we’ve been testing it like crazy and we also had Feras test it. We’ve been pushing code through Jenkins since the beginning of the project, so we know it passes our QA checks and we think it’s stable enough for use. From this point forward, Orbital and the URIs it mints will persist, too.

From today, a researcher at the University of Lincoln can sign in to Orbital, create and describe a project, upload their data to the project, choose a license for the data and add a Google Analytics code to measure project analytics (we’re also tracking each button click to better understand how people use Orbital). The data is published at an id.lincoln.ac.uk URI, which will persist indefinitely. At this stage, until we’ve got an approved business case for scaling it up and out to all academics, we’ll be limiting uploads on a case-by-case basis. You can view and request what other features we develop for Orbital on UserVoice, or in more detail on our project tracker. We’ve also written a basic development roadmap.

For developers, here are the basic technical details. You might also want to trawl through our implementation plan and the collected blog posts at the bottom of the plan.

Orbital is written in PHP using the CodeIgniter development framework. It’s split into two main pieces of functionality. Orbital Core (database and APIs) is currently hosted on a Linux box on Rackspace’s cloud. Orbital Manager (the User Interface) is likewise hosted on Rackspace. A user signs in to Orbital Manager via OAuth 2.0 using their university credentials. Orbital Manager is using Twitter’s Bootstrap framework. The project metadata is stored in a MySQL database. Files are uploaded to Rackspace’s cloud files storage using Andrew Valums’s AJAX Uploader. APIs are exposed using Phil Sturgeon’s CodeIgniter REST server.

Orbital is licensed under the GNU Affero GPL 3 license and you can download it, fork it and create pull requests on GitHub:

Orbital Core

Orbital Manager

New contributors to Orbital will be ritually applauded each weekday morning 🙂 Thanks.

Orbital: Impact and benefits

I’ve been asked to highlight the benefits of running the Orbital project so far. We’re seven months or so into Orbital, which is an 18 month project. In our initial Project Plan, we identified the anticipated impact of the project and so I’ll use that list in this blog post to reflect on the impact and benefits so far. The headings and text in italics are copied from the Project Plan.

Research practices

Researchers’ data management practices will change, supported by technologies that encourage new processes in the administration and dissemination of data.

We’ve had very little impact in this area so far. It’s too early to impact on researchers’ practices when we’re still developing our own knowledge and the infrastructure to support RDM. Changing researchers’ practices takes time. However, there are indications that it will happen. Engagement with our user groups and ad hoc requests for help from researchers who know we’re working on the project has shown us that researchers do want to change their practices. Our recent DAF survey also told us that researchers know that their practices could be improved and where they need support to do so.

Internal auditing

Greater oversight and analysis of research data created by researchers will be possible.

We’ve had no impact here and won’t until the Orbital software is built. We will be working on this in the next version of Orbital and our recent effort at the MRD Hack Day around activity data was a precursor to this work. We are increasingly seeing activity data as a key component of Orbital and this was underlined early on in the project when Mansur Darlington from the ERIM and REDm-MED projects stressed the importance of capturing contextual metadata.

Research governance

Improved methods of auditing research undertaken by the university will be possible, enabling greater cross-disciplinary work.

This relates to the benefit above around capturing activity data and improving ‘business intelligence’. As yet there has been no real impact, but we still anticipate Orbital being a useful tool for reporting and for enabling greater cross-disciplinary collaboration through greater transparency. Related to this is the creation of RDM Policy, which we have begun and which has resulted in a statement being made for the EPSRC RDM ‘Road Map’.

Integrated services

Research data management will be integrated into existing systems, such as staff profiles, the institutional repository, blogs and calendars. Towards a Virtual Research Environment.

We have had a direct impact on the creation of new staff profiles at the university. Nick Jackson, Lead Developer on Orbital, has been working with Alex Bilbie in the ICT Online Services Team to create an aggregated profile for staff. We blogged about it earlier and you can see my example. Profiles of staff are now aggregated from different systems and stored as RDF Linked Data, and we intend to pull in activity data from Orbital to further enrich staff profiles. In this way, our work is being recognised and valued by other teams in the university. Furthermore, the university has recently procured a new ‘Awards Management System’, which will provide data to Orbital about funded research projects, and we intend to couple Orbital with EPrints using SWORD2.

What is clear from our discussions with users is that they expect Orbital to do more than simply store and publish research data. Without using the term, they are effectively asking for a Virtual Research Environment (VRE). This is something we did anticipate, and we have always planned for Orbital to be a tool for both analysing and publishing research data. When discussing ‘Research Data Management’, there is a fine line between data management planning, research project management, team workspaces and public web publishing, and while we need to be careful that the scope of Orbital does not creep, we are sensitive to on-going user requirements.

FOI compliance

Will make FOI requests easier to respond to or unnecessary.

We’ve had no impact here so far, nor are we yet in a position to.

Open Data

Will promote and enable the production of public data sets.

As I will discuss in a forthcoming blog post about our first release of Orbital, we have had some impact here and have witnessed the benefits. In summary, a researcher contacted us for help with publishing some data, the result of which was an invitation to write a journal article about the data, offers of collaboration and the strengthening of an EU grant application.

Our workshop on open licensing has also led to a further meeting between me (Joss Winn, Orbital PM) and the university’s IP Manager. A follow-up meeting is planned to draft guidance for staff on the use of open licenses for source code and data. Furthermore, research staff are being directed to the Orbital team for informal advice on open licensing. In this sense, we are beginning to improve the awareness and understanding of open licenses among researchers.

The innovation cycle

Will embed new technologies and culture change among professional staff at the university and lead to further innovation in our services.

We are having some impact here and have had meetings with central ICT staff about integrating our server farm with cloud services. We are currently developing the Orbital software using Rackspace, but have recently ordered hardware, partly paid for by the project, to establish a private cloud, running on OpenStack, for research and development at the university. In addition to this, our development toolchain has changed and we now have tools and processes in place that we did not have six months ago. These are being adopted outside of the Orbital project by staff within the ICT Online Services Team and by other projects we are running. Beyond this, we intend that the changes we make to our own R&D tools and processes are made available to other researchers and students. Over the summer, we will set up and maintain a university-wide Gitorious source code repository service (similar to GitHub), where staff and students who write code can form teams, manage their source code, and publish it if they wish. We also intend to run a Jenkins server for similar purposes so that all staff and students can benefit from source code control and the quality assurance processes that we have implemented through Orbital. Orbital is now a driver for a general R&D infrastructure for Academic Computing that project members and wider members of LNCD are building.

I will write more about this at a later date because, for someone who manages R&D infrastructure projects at the university and wishes to engage staff and students in our work, I am excited to be able to integrate this into academic programmes and the work of other researchers.

I want to also stress that like all of our projects, the benefits, however slight, spill into other aspects of our work. Being a large project, Orbital has allowed us to concentrate on developing our toolchain and development environment across other projects, it’s given us time to learn new skills and share our learning with colleagues. In this way, it has been pivotal in the way we work and the future direction of our work.

Recruitment

Will build capacity for local development of innovative services

Orbital has allowed us to recruit two full-time Developers (Nick Jackson and Harry Newton). We are therefore two staff up and it is my intention to try and keep it that way.

Staff skills

Will improve staff skills and experience

Yes, we are benefitting in this way. The Orbital project team are now the RDM ‘experts’ in the university, despite being novices in this regard. Over the course of the project, staff working in the Library, ICT, Research and Enterprise Office and Centre for Educational Research and Development are each developing their understanding of the processes and implications of RDM.

Clearly, an 18-month project (at least the way I run them!) allows staff to learn new tools and skills, experiment with new methods of working and disseminate this learning to other staff. This is one constant that I value highly about our project work. Despite the stop-and-start nature of project work, and the fact that not all of our work eventually makes it into a fully fledged university service, the tools and learning go beyond the confines of the project, especially as we engage more with academic programmes, and that is most satisfying.

I have written more about my interest in how hackers learn and the university as a hackerspace.

Culture change

Will change the research culture of the university by improving the tools available for managing and sharing data.

From the point-of-view of RDM, this is closely related to the first anticipated impact/benefit. We cannot claim to have any real evidence of benefits or impact at this stage on how we manage research data. However, as I’ve noted several times above, our use of the cloud, our advocacy of open licensing, our implementation of new R&D tools and processes, are also part of ‘culture change’ at the university. Furthermore, due to the DAF survey, the Orbital project is now widely known by researchers beyond our initial user group in the School of Engineering, and through our reporting to the university Research, Innovation and Enterprise Committee, staff at all levels are made aware of our work. Gradually, the idea of ‘research data management’ is being understood.

Technology choices

Will influence future choices in technologies (both locally developed and outsourced).

Yes! See above.

HE sector R&D

Contributes to innovative R&D in the HE sector

Yes, I think we are beginning to do this and the benefits so far are around shared learning among developers across different projects. We were instrumental in early discussions about the DevCSI MRD Hack Day and three of us contributed to the two day event. We blog regularly on this site (around 50 posts so far) and share our work with anyone who is interested (see the links in the sidebar).

Public Sector data management

As yet, the Orbital project can only claim to have resulted in one research dataset being published (again, more on this soon as I want to explain it in more detail). However, Orbital has grown out of our work over the last couple of years around managing and re-using institutional data, resulting in data.lincoln.ac.uk. We are also active members of the data.ac.uk initiative and I chaired the data.ac.uk panel at Dev8D this year.

Efficient re/use of resources

Demonstrably re-uses and builds on previous work, both funded and non-funded projects.

Yes, this was an early benefit of the project. We are building on our previous work and what we have learned from it in past projects. Our use of MongoDB, our work on staff profiles, our use of OAuth, and our API-driven approach to development, all build on past projects, funded and un-funded.

Gluing people together

In December, colleagues in the Web Team (who manage the corporate web site in the Department of Marketing and Communications) approached a few of us about building a tool to allow staff to edit their profile for the new version of the lincoln.ac.uk website. We suggested that much of the work was already done and it just needed gluing together. Yesterday we met with the Web Team again to tell them that our part of the work is pretty much complete. Here’s how it works.

Quick sketch of profile building at Lincoln

This requires a bit of explanation, but let me tell you, it’s the holy grail as far as I’m concerned and having this in place brings benefits to Orbital and any other new application we might develop. Here’s a clearer rendering.

 

Building staff profiles

The chart above strips out the stuff around authentication that you see in the bottom right of the whiteboard photo. That’s for another post – something Alex is better placed to write.

Information about staff at the university starts with the HR database. This feeds the Active Directory, which authenticates people against different web services. Last year, Nick and Alex pulled this data into Nucleus, our MongoDB datastore, and with it built a new, slick staff directory. Then they started bolting things on to it, like research outputs from the repository and blog posts from our WordPress/BuddyPress platform. To illustrate what was possible, they started pulling information from my BuddyPress profile, which I could edit anytime I wanted to. It got to the point where I started using my staff directory link in my email signature because it offered the most comprehensive profile of me anywhere on a Lincoln website.

By the time we first met with the Web Team about the possibility of helping them with staff profiles, Alex and Nick had 80% of the work already done. What remained was to create a richer set of required fields in BuddyPress for staff to edit about themselves and a scheduled XML dump for the Web Team to wrangle into their new templates on www.lincoln.ac.uk.

So the work is nearly done. The XML file is RDF Linked Data, which means that we have a rich aggregation of staff information and some simple relationships, feeding the Staff Directory, being refreshed every three hours and then being output either as HTML, JSON or RDF/XML.
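As a rough illustration of the idea (not the actual Nucleus code), the sketch below merges records from several source systems into one profile and serialises it as either JSON or a very small RDF/XML document, using only the Python standard library. The field names, the example.org URIs and the `ex:department` property are hypothetical. Only `foaf:Person` and `foaf:name` come from a real vocabulary (FOAF).

```python
import json
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"
EX_NS = "http://example.org/terms#"  # hypothetical namespace for local fields


def aggregate(*sources):
    """Merge records in order; later sources overwrite earlier ones,
    but empty values never replace real data."""
    profile = {}
    for source in sources:
        for key, value in source.items():
            if value not in (None, "", []):
                profile[key] = value
    return profile


def to_json(profile):
    """Serialise the aggregated profile as JSON with stable key order."""
    return json.dumps(profile, sort_keys=True)


def to_rdf_xml(profile, uri):
    """Serialise two fields of the profile as a minimal RDF/XML document."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("foaf", FOAF_NS)
    ET.register_namespace("ex", EX_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    person = ET.SubElement(
        root, f"{{{FOAF_NS}}}Person", {f"{{{RDF_NS}}}about": uri}
    )
    ET.SubElement(person, f"{{{FOAF_NS}}}name").text = profile["name"]
    if "department" in profile:
        ET.SubElement(person, f"{{{EX_NS}}}department").text = profile["department"]
    return ET.tostring(root, encoding="unicode")


# Made-up records standing in for the HR database and an editable profile.
hr_record = {"name": "A. Researcher", "department": "School of Engineering"}
editable_profile = {"interests": ["robotics"], "department": ""}

profile = aggregate(hr_record, editable_profile)
profile_uri = "http://example.org/staff/a-researcher"
as_json = to_json(profile)
as_rdf = to_rdf_xml(profile, profile_uri)
```

The same aggregated record feeds every output format, so a new consumer only needs a new serialiser rather than another trip to the source systems.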

For the Orbital project, all this glue is invaluable. When staff log in to Orbital (Nick’s working on this part right now), we’ll already know who they are, which department they work in, what research outputs they’ve deposited in the institutional repository, what their research interests are, what projects they’re working on, the research groups they’re members of, their recent awards and grants, and the keywords they’ve chosen to tag their profile with. It’s our intention that with some simple AI, we’ll be able to make Orbital a space where Researchers find themselves in an environment which already knows quite a bit about their work and the context of the research they’re undertaking. Once Orbital starts collecting specific staff data of its own, it can feed that back into Nucleus, too.

This reminds me of our discussion last month with Mansur Darlington of the ERIM/REDm-MED project. Mansur stressed the importance of gathering data about the context of the research itself, emphasising that without context, research data becomes increasingly meaningless over time. Having rich user profiles in Orbital and ensuring that we record data about the Researcher’s activity while using Orbital, should help provide that context to the research data itself.

Orbital, therefore, becomes an infrastructure not only for storing and managing research data, but also a system for storing and managing data about the research itself.

Why Orbital is all about the API

One of the interesting things about Orbital is its use of an API-driven development approach. In traditional, API-less applications your end-to-end system would look something like this:

The only way to interact with this application is to either be a user, or pretend to be one.

This is all well and good if the only thing you want to be able to interact with your application is a real user, but it’s increasingly a bad idea. Users can interact with your application as intended, but should a machine want to get at your data (which may happen for any one of a hundred reasons) it has to muck about pretending to be a user and scraping data. Everybody is building with APIs nowadays, and if you aren’t then you’re going to be left behind, cold and frightened, in a world which no longer subscribes to the notion that monolithic software can stand on its own and provide useful functionality.

So the next step is to bolt on an API.

APIs like this are notorious for only exposing part of the functionality of an application.

This is the most common form of API around, and consists of a ‘second view’ on the data and functionality of an application. This is a massive step forwards and makes lives much, much easier in most cases. The only downside is that it’s very easy for this kind of API to provide a ‘bare bones’ functionality, such as only providing a list of items when the ‘real’ user interface lets you not only view the list but also edit its contents. It’s better than nothing but not ideal, which is why Orbital is taking the next step:

In an API-driven model the API is the only way to interface with the application

Under this design the API is the only way to interface with the data and functionality of the system. If a user wants to access it they must go through an intermediary to translate their wishes into API calls, and the results back into a nicely human readable form. The plus side is that any other consumer of the service is free to interact with the application on exactly the same terms as the ‘official’ frontend, providing that it has been granted those permissions. As far as Orbital Core (our actual application) is concerned there is no functional difference between Orbital Manager (our frontend) and an application that a researcher has hacked together to give themselves an easier time inputting data — they are subject to the exact same access controls, restrictions, sanity checking and limitations.
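A toy sketch of this arrangement follows. It is not Orbital's PHP code, and every class, action and token name in it is hypothetical. The point it tries to show is that the core exposes a single entry point that checks permissions and validates input, and that the 'official' frontend and a researcher's script are both just clients of that entry point.

```python
class PermissionDenied(Exception):
    """The caller's token has not been granted the requested action."""


class InvalidRequest(Exception):
    """The request failed the core's sanity checks."""


class Core:
    """Stand-in for an API-only core: call() is the only way in."""

    def __init__(self):
        self._projects = []
        self._grants = {}  # token -> set of permitted action names
        self._actions = {
            "create_project": self._create_project,
            "list_projects": self._list_projects,
        }

    def grant(self, token, *actions):
        self._grants.setdefault(token, set()).update(actions)

    def call(self, token, action, **params):
        # Every consumer passes through the same checks, whoever wrote it.
        if action not in self._actions:
            raise InvalidRequest(f"unknown action: {action}")
        if action not in self._grants.get(token, set()):
            raise PermissionDenied(f"{action} not permitted for this token")
        return self._actions[action](**params)

    def _create_project(self, name):
        name = name.strip()
        if not name:
            raise InvalidRequest("project name must not be empty")
        self._projects.append(name)
        return {"id": len(self._projects), "name": name}

    def _list_projects(self):
        return list(self._projects)


class Manager:
    """Stand-in frontend: turns form input into API calls and
    API results into human-readable messages."""

    def __init__(self, core, token):
        self.core = core
        self.token = token

    def submit_new_project_form(self, form):
        result = self.core.call(
            self.token, "create_project", name=form["project_name"]
        )
        return f"Created project #{result['id']}: {result['name']}"


core = Core()
core.grant("manager-token", "create_project", "list_projects")
core.grant("script-token", "list_projects")  # a researcher's read-only script

manager = Manager(core, "manager-token")
message = manager.submit_new_project_form({"project_name": "Robot logs"})
listing = core.call("script-token", "list_projects")
```

Because the frontend has no private route into the core, anything it can do is by construction available to any other client that has been granted the same permissions.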

This means that every time we want to build user-facing functionality we have to stop, look at our APIs and work out where the functionality belongs. This also has the added benefit of making it essential to fully document our APIs for our own sanity, as well as ensuring that we have lightweight data transfer and rock-solid error handling baked right in.

The downside is that we have to double up on some bits of development, writing both the Core and Manager sides. It can also lead to the usual frustrations you get when trying to communicate with APIs, but on the plus side we have the ability to change both ends for the better.

Know of any other API-driven development in the fields of higher education or research data management? We’d love to hear about them, so that we can try to make our APIs as compatible as possible and improve interoperability. Drop us a note in the comments.