A JISC-funded Managing Research Data project

I’ve been asked to highlight the benefits of running the Orbital project so far. We’re seven months or so into Orbital, which is an 18 month project. In our initial Project Plan, we identified the anticipated impact of the project and so I’ll use that list in this blog post to reflect on the impact and benefits so far. The headings and text in italics are copied from the Project Plan.

Research practices

Researchers’ data management practices will change, supported by technologies that encourage new processes in the administration and dissemination of data.

We’ve had very little impact in this area so far. It’s too early to impact on researchers’ practices when we’re still developing our own knowledge and the infrastructure to support RDM. Changing researchers’ practices takes time. However, there are indications that it will happen. Engagement with our user groups and ad hoc requests for help from researchers who know we’re working on the project have shown us that researchers do want to change their practices. Our recent DAF survey also told us that researchers know that their practices could be improved and where they need support to do so.

Internal auditing

Greater oversight and analysis of research data created by researchers will be possible.

We’ve had no impact here and won’t until the Orbital software is built. We will be working on this in the next version of Orbital and our recent effort at the MRD Hack Day around activity data was a precursor to this work. We are increasingly seeing activity data as a key component of Orbital and this was underlined early on in the project when Mansur Darlington from the ERIM and REDm-MED projects stressed the importance of capturing contextual metadata.

Research governance

Improved methods of auditing research undertaken by the university will be possible, enabling greater cross-disciplinary work.

This relates to the benefit above around capturing activity data and improving ‘business intelligence’. As yet there has been no real impact, but we still anticipate Orbital being a useful tool for reporting and enabling greater cross-disciplinary collaboration through greater transparency. Related to this is the creation of RDM Policy, which we have begun and which has resulted in a statement being made for the EPSRC RDM ‘Road Map’.

Integrated services

Research data management will be integrated into existing systems, such as staff profiles, the institutional repository, blogs and calendars. Towards a Virtual Research Environment.

We have had a direct impact on the creation of new staff profiles at the university. Nick Jackson, Lead Developer on Orbital, has been working with Alex Bilbie in the ICT Online Services Team to create an aggregated profile for staff. We blogged about it earlier and you can see my example. Profiles of staff are now aggregated from different systems and stored as RDF Linked Data, and we intend to pull in activity data from Orbital to further enrich staff profiles. In this way, our work is being recognised and valued by other teams in the university. Furthermore, the university has recently procured a new ‘Awards Management System’, which will provide data to Orbital about funded research projects, and we intend to couple Orbital with EPrints using SWORD2.

What is clear from our discussions with users is that they expect Orbital to do more than simply store and publish research data. Without using the term, they are effectively asking for a Virtual Research Environment (VRE). This is something we did anticipate, and we have always planned for Orbital to be a tool for both analysing and publishing research data. When discussing ‘Research Data Management’, there is a fine line between DMP planning, research project management, team workspaces and public web publishing, and while we need to be careful that the scope of Orbital does not creep, we are sensitive to on-going user requirements.

FOI compliance

Will make FOI requests easier to respond to or unnecessary.

We’ve had no impact here so far, nor are we yet in a position to.

Open Data

Will promote and enable the production of public data sets.

As I will discuss in a forthcoming blog post about our first release of Orbital, we have had some impact here and have witnessed the benefits. In summary, a researcher contacted us for help with publishing some data, the result of which was an invitation to write a journal article about the data, offers of collaboration and the strengthening of an EU grant application.

Our workshop on open licensing has also led to a further meeting between myself (Joss Winn, Orbital PM) and the university’s IP Manager. A further follow-up meeting is planned to draft guidance for staff on the use of open licenses for source code and data. Furthermore, research staff are being directed to the Orbital team for informal advice on open licensing. In this sense, we are beginning to improve the awareness and understanding of open licenses among researchers.

The innovation cycle

Will embed new technologies and culture change among professional staff at the university and lead to further innovation in our services.

We are having some impact here and have had meetings with central ICT staff about integrating our server farm with cloud services. We are currently developing the Orbital software using Rackspace, but have recently ordered hardware, partly paid for by the project, to establish a private cloud, running on OpenStack, for research and development at the university. In addition to this, our development toolchain has changed and we now have tools and processes in place that we did not have six months ago. These are being adopted outside of the Orbital project by staff within the ICT Online Services Team and other projects we are running. We also intend that the changes we make to our own R&D tools and processes are made available to other researchers and students. Over the summer, we will set up and maintain a university-wide Gitorious source code repository service (similar to Github), where staff and students who write code can form teams, manage their source code, and publish it if they wish. We also intend to run a Jenkins server for similar purposes so that all staff and students can benefit from source code control and the quality assurance processes that we have implemented through Orbital. Orbital is now a driver for a general R&D infrastructure for Academic Computing that project members and wider members of LNCD are building.

I will write more about this at a later date because, for someone who manages R&D infrastructure projects at the university and wishes to engage staff and students in our work, I am excited to be able to integrate this into academic programmes and the work of other researchers.

I also want to stress that, like all of our projects, the benefits, however slight, spill into other aspects of our work. Being a large project, Orbital has allowed us to concentrate on developing our toolchain and development environment across other projects. It’s given us time to learn new skills and share our learning with colleagues. In this way, it has been pivotal in the way we work and the future direction of our work.

Recruitment

Will build capacity for local development of innovative services

Orbital has allowed us to recruit two full-time Developers (Nick Jackson and Harry Newton). We are therefore two staff up and it is my intention to try and keep it that way.

Staff skills

Will improve staff skills and experience

Yes, we are benefitting in this way. The Orbital project team are now the RDM ‘experts’ in the university, despite being novices in this regard. Over the course of the project, staff working in the Library, ICT, the Research and Enterprise Office and the Centre for Educational Research and Development are each developing their understanding of the processes and implications of RDM.

Clearly an 18 month project (at least the way I run them!) allows for staff to learn new tools and skills, experiment with new methods of working and disseminate this learning to other staff. This is one constant that I value highly about our project work. Despite the stop and start nature of project work, and the fact that not all of our work eventually makes it into a fully fledged university service, the tools and learning, especially as we engage more with academic programmes, go beyond the confines of the project, and this is most satisfying.

I have written more about my interest in how hackers learn and the university as a hackerspace.

Culture change

Will change the research culture of the university by improving the tools available for managing and sharing data.

From the point-of-view of RDM, this is closely related to the first anticipated impact/benefit. We cannot claim to have any real evidence of benefits or impact at this stage on how we manage research data. However, as I’ve noted several times above, our use of the cloud, our advocacy of open licensing, our implementation of new R&D tools and processes, are also part of ‘culture change’ at the university. Furthermore, due to the DAF survey, the Orbital project is now widely known by researchers beyond our initial user group in the School of Engineering, and through our reporting to the university Research, Innovation and Enterprise Committee, staff at all levels are made aware of our work. Gradually, the idea of ‘research data management’ is being understood.

Technology choices

Will influence future choices in technologies (both locally developed and outsourced).

Yes! See above.

HE sector R&D

Contributes to innovative R&D in the HE sector

Yes, I think we are beginning to do this and the benefits so far are around shared learning among developers across different projects. We were instrumental in early discussions about the DevCSI MRD Hack Day and three of us contributed to the two-day event. We blog regularly on this site (around 50 posts so far) and share our work with anyone who is interested (see the links in the sidebar).

Public Sector data management

As yet, the Orbital project can only claim to have resulted in one research dataset being published (again, more on this soon as I want to explain it in more detail). However, Orbital has grown out of our work over the last couple of years around managing and re-using institutional data, resulting in data.lincoln.ac.uk. We are also active members of the data.ac.uk initiative and I chaired the data.ac.uk panel at Dev8D this year.

Efficient re/use of resources

Demonstrably re-uses and builds on previous work, both funded and non-funded projects.

Yes, this was an early benefit of the project. We are building on our previous work and what we have learned from it in past projects. Our use of MongoDB, our work on staff profiles, our use of OAuth, and our API-driven approach to development, all build on past projects, funded and un-funded.

Last month, Nick wrote about how Orbital is being designed as an application to run in the cloud. This week, we met with Andy Powell from Eduserv to discuss the use of their ‘Education Cloud’ for the Orbital project.

In the run up to this meeting, we’d been talking to colleagues in ICT Services about our need for more flexibility and autonomy when we required servers in order to do our work. Outside of work we’re quite used to spinning up servers on Rackspace or AWS to try things out and increasingly we’ve been looking for ways to take control of our servers in this way at work. We’re not the only Researchers who need this flexibility; colleagues in LiSC have also been telling us that for some of their work, the scalability and reliability of cloud services is looking increasingly attractive.

This is not to say that ICT Services is inflexible and unreliable by any means. I’ve always found my colleagues very willing to help where and when they can, but I think we’d all agree that a central ICT department in a university, with the multifarious responsibilities it has, is not the same as a dedicated cloud provider and, in our case, does not offer the resilience or the scalability that Rackspace or Amazon are offering, for example. The availability of resources, the business model and available support are quite different. When I joined the University in 2007, ICT Services were implementing a new VMWare server farm, which has given us more flexibility than having to work with physical boxes in every instance. Typically, if I want a Linux server with 4GB RAM and 100GB of HDD, I put in a request, transfer approx. £1200, and some time later, a virtual server is provided to me at no further cost. If I need more RAM or HDD, I put in another request, transfer some money, and some time later, I get what I need. This process can take weeks or months.

However, our VMWare farm is now almost five years old and nearing ‘end of life’ and I know that ICT are thinking about the next five year cycle and how cloud computing fits into their future plans. Colleagues in the Online Services team have been using Rackspace recently as a CDN for the Common Web Design framework as well as hosting our popular Gateway website, and have been very impressed with the service. The main hurdle was not technical but organisational: billing for the use of the CDN is by credit card and Pay As You Go (PAYG), meaning we don’t know exactly how much it will cost each month. This is in contrast to how departments normally make payments, which are known in advance and invoiced in arrears. Nevertheless, that hurdle has been overcome and has hopefully set a precedent.

So the meeting we had with ICT Services was in light of all this and we recognised and agreed that Orbital was a timely and appropriate project by which the university could pilot a more extensive use of cloud services and look at how we might integrate servers in the cloud with our existing server farm. It would also allow us to think about new business models where the real costs of running a server are more transparent to everyone, rather than being absorbed by ICT as the server ages.

Nick has been setting up the Orbital development environment and basic architecture (more on that in another post) using Rackspace and the Orbital project pays for this each month via our departmental credit card. This works fine if a) the department is happy to use the credit card in this way; and b) we have dedicated project funds for this, but it’s no way to run a long-term service that is to be sustained by the institution. Our interest is not really in whether we use Rackspace or Eduserv for hosting during the period of our project (both offer Linux boxes, after all); rather, we’re interested in working with ICT to ensure that by the end of the project, there are formal processes in place for a) running sustained services in the cloud; and b) providing researchers with the ability to spin up and manage ad hoc servers as and when they are required.

The plan is to evaluate both Rackspace and Eduserv over the coming months, looking at which service fits best with the future plans of ICT Services. Rackspace has a much more mature offering, but we’re really keen to work with Eduserv too, recognising that they’re a new not-for-profit provider of cloud services, running on JANET and with a long history of providing hosting and other technical services to HE and government.

At our meeting with Andy, he went through much the same presentation that Nick and I had seen at the MRD start-up meeting, answering our specific questions along the way. He also demonstrated (for the first time??) the vCloud Director interface for setting up and managing the servers, and this should, in principle, integrate with our existing vSphere system. One of the nice things about the Eduserv offering is that unlike most other cloud providers, they provide the entire vCloud Director application to their customers, including a full API, rather than a cut-down interface. We’ve yet to see how vCloud Director will allow us to create access controls for different types of users, but that’s what the Orbital project will be helping to investigate and I’m pleased that we’re able to work with our ICT department in this way. There are other important questions, too, around data protection and liabilities, and Andy was keen that we review Eduserv’s Terms and Conditions and SLA and feed back our thoughts on them.

This experience will allow me to better understand the business model of the cloud and how to make the business case for developing and running cloud-based services. As Nick previously said, it also allows us to make our costs more transparent, too, so that the actual costs (per Gigabyte and per Gigahertz) of managing research data are clearer to both Researchers and the institution. Having a clearer idea of the costs will help us create a more sustainable service in the long run.
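To make the idea of transparent per-unit costs a little more concrete, here is a minimal sketch in Python of how a monthly pay-as-you-go bill might be built up from per-Gigabyte and per-Gigahertz prices and compared with the one-off payment of roughly £1200 mentioned above. The unit prices, the 2 GHz CPU figure and the names `UnitRates`, `monthly_cost` and `first_month_payg_exceeds` are all assumptions made for illustration; they are not quotes from Rackspace, Eduserv or ICT Services.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitRates:
    """Hypothetical pay-as-you-go prices in GBP per month (illustrative, not real quotes)."""
    per_gb_storage: float
    per_gb_ram: float
    per_ghz_cpu: float


def monthly_cost(rates: UnitRates, storage_gb: float, ram_gb: float, cpu_ghz: float) -> float:
    """Return the monthly cost of one server, built up from its per-unit prices."""
    return (rates.per_gb_storage * storage_gb
            + rates.per_gb_ram * ram_gb
            + rates.per_ghz_cpu * cpu_ghz)


def first_month_payg_exceeds(upfront_gbp: float, monthly_gbp: float) -> int:
    """Return the first month in which cumulative PAYG spending exceeds a one-off payment."""
    if monthly_gbp <= 0:
        raise ValueError("monthly cost must be positive")
    return math.floor(upfront_gbp / monthly_gbp) + 1


if __name__ == "__main__":
    # Assumed prices; only the server size (4GB RAM, 100GB HDD) and the
    # approximate £1200 one-off payment come from the post itself.
    rates = UnitRates(per_gb_storage=0.25, per_gb_ram=5.0, per_ghz_cpu=10.0)
    cost = monthly_cost(rates, storage_gb=100, ram_gb=4, cpu_ghz=2)
    month = first_month_payg_exceeds(upfront_gbp=1200, monthly_gbp=cost)
    print(f"Monthly PAYG cost: £{cost:.2f}")
    print(f"PAYG overtakes the one-off payment in month {month}")
```

A comparison like this leaves out the things that actually matter in the decision, such as staff time, how long provisioning takes and the ability to switch a server off when it is not needed, but it shows the kind of arithmetic that per-unit pricing makes possible for both Researchers and the institution.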