Orbital and the OAIS reference model

In the Orbital project bid, I wrote,

We intend to re-use and develop some of the underlying tools we have built to provide an institution-wide service for the ingest, description, preservation and dissemination of research data, which is informed by the OAIS reference model.

My first encounter with OAIS was about seven years ago, when I was designing a digital archive for Amnesty International’s image, film and video archives. If you begin to do any design work in the digital archiving domain, you come across the OAIS model very quickly. It is the standard and, though somewhat daunting at first, once you get your teeth into it you realise that it does an excellent job of describing what any decent digital archive should be doing anyway.

The common mistake with OAIS is to look at the model and think that you have to create a system that is designed that way, rather than one that functions that way. The OAIS standard is a tool that allows Archivists, Designers and Developers to share a common language when discussing and planning the implementation of a digital archive. It describes what that archive should do, not how it should be designed.

Here is the high-level OAIS model. (Here is the full, composite model).

OAIS Functional Entities

Here is a high-level model of Orbital.

Orbital design

They’re very different because they should be. Remember, the first is a functional model, the second is a logical model of Orbital’s server requirements.

Here’s another model I published recently relating to building staff profiles.

Building staff profiles

The task ahead of the Orbital Team is to consider how our on-going designs, like the two above, relate to the OAIS functional model. For example, the staff profile diagram (which is not simply an abstract model, but a retrospective design document) tells us where some of the OAIS Submission Information Package (SIP) information will be derived from. When a ‘Producer’ (i.e. a Researcher) signs in to Orbital and uploads a dataset, that content, the information we know from their university profile, and any further information that they provide together constitute the SIP.
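To make that composition concrete, here is a minimal sketch of how the three sources of SIP information might be held together. It is written in Python purely for illustration and is not Orbital’s actual code: the class name, the field names and the rule that Producer-supplied values override profile values are all assumptions of mine, not something the OAIS standard prescribes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SubmissionInformationPackage:
    """Hypothetical container for the three SIP sources described above."""

    content_files: List[str]            # the dataset the Producer uploads
    profile_metadata: Dict[str, str]    # what we already know from the university profile
    supplied_metadata: Dict[str, str] = field(default_factory=dict)  # extra detail from the Producer

    def descriptive_metadata(self) -> Dict[str, str]:
        """Merge both metadata sources.

        Assumption: where the same key appears in both, the value the
        Producer supplied at upload time wins over the profile value.
        """
        merged = dict(self.profile_metadata)
        merged.update(self.supplied_metadata)
        return merged


def build_sip(
    files: List[str],
    profile: Dict[str, str],
    supplied: Optional[Dict[str, str]] = None,
) -> SubmissionInformationPackage:
    """Assemble a SIP, refusing a submission that has no content at all."""
    if not files:
        raise ValueError("A SIP needs at least one content file")
    return SubmissionInformationPackage(list(files), dict(profile), dict(supplied or {}))


# Example with made-up values: a Researcher uploads one file and adds a title.
sip = build_sip(
    ["survey.csv"],
    {"name": "A. Researcher", "department": "Library"},
    {"title": "Survey data", "department": "LiSC"},
)
```

Keeping the profile-derived and Producer-supplied information in separate fields, rather than merging them on upload, means we can still tell later where each piece of the SIP came from.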

As I was reading around the OAIS standard recently, I came across a nice piece of work done by a collaborative project between Cornell and Göttingen State universities. Their MathArc project was run using the XP agile project management methodology and, as part of their development process, they broke down the OAIS reference model into a deck of 33 cards and tackled each one as a specific iteration. You can download the cards here [.doc].

Although we’re not planning on 100% OAIS ‘compliance’ during this pilot stage of the Orbital project, it is our intention that Orbital is informed by the OAIS standard and these cards provide a useful and provocative set of functional requirements that we have added to our project tracker. Nick’s currently working on authentication and security for Orbital, or rather, card #3 ‘Provide Security Services’:

Protect sensitive information in the system, including authentication, access control, data integrity, data confidentiality, and non-repudiation services.

Cards #1 and #2 (‘Provide O/S Services’ & ‘Provide Network Services’) are requirements that go beyond the Orbital project itself and are largely met by the wider IT infrastructure that we’re working in at Lincoln. However, our work on ‘piloting the cloud’ also addresses issues relating to the operating environment and networking of our future RDM infrastructure.

Nick and I met a couple of weeks ago to look at the OAIS model in detail and consider it in light of what we’re beginning to implement. As we’d hope, the high-level stuff you see in the diagram at the top of this post is easy to ‘tick off’. You couldn’t really say you were building an infrastructure to manage research data if you weren’t clear on the basic functional entities of OAIS. The lower-level components of the OAIS standard are where the work gets interesting and more demanding, and where the deck of cards is useful. Along the way, we’ll be creating diagrams like those above to show how we’re iteratively addressing the OAIS standard so that by the end of the project, we should have a model of our own that maps reasonably well onto the OAIS composite model.

I’d be very interested to hear from other MRD projects that are looking at the OAIS standard in detail, as well as people from earlier projects that have been through this process before. I know there have been a number of such efforts in the JISC community. A couple of the documents I found useful were Alex Ball’s Briefing Paper and Julie Allinson’s OAIS as a reference model for repositories. The paper I remembered from my time at Amnesty is Brian Lavoie’s Introductory Guide. If you’re new to OAIS, I’d recommend that you at least read Lavoie’s report before tackling the full OAIS standard document (PDF).

Piloting the cloud

Last month, Nick wrote about how Orbital is being designed as an application to run in the cloud. This week, we met with Andy Powell from Eduserv to discuss the use of their ‘Education Cloud’ for the Orbital project.

In the run-up to this meeting, we’d been talking to colleagues in ICT Services about our need for more flexibility and autonomy when we require servers in order to do our work. Outside of work we’re quite used to spinning up servers on Rackspace or AWS to try things out, and increasingly we’ve been looking for ways to take control of our servers in this way at work. We’re not the only Researchers who need this flexibility; colleagues in LiSC have also been telling us that for some of their work, the scalability and reliability of cloud services are looking increasingly attractive.

This is not to say that ICT Services is inflexible and unreliable by any means. I’ve always found my colleagues very willing to help where and when they can, but I think we’d all agree that a central ICT department in a university, with the multifarious responsibilities it has, is not the same as a dedicated cloud provider and, in our case, offers neither the resilience nor the scalability that Rackspace or Amazon offer, for example. The availability of resources, the business model and available support are quite different. When I joined the University in 2007, ICT Services were implementing a new VMware server farm, which has given us more flexibility than having to work with physical boxes in every instance. Typically, if I want a Linux server with 4GB RAM and 100GB of HDD, I put in a request, transfer approx. £1200, and some time later, a virtual server is provided to me at no further cost. If I need more RAM or HDD, I put in another request, transfer some money, and some time later, I get what I need. This process can take weeks or months.

However, our VMware farm is now almost five years old and nearing ‘end of life’ and I know that ICT are thinking about the next five-year cycle and how cloud computing fits into their future plans. Colleagues in the Online Services team have been using Rackspace recently as a CDN for the Common Web Design framework as well as hosting our popular Gateway website, and have been very impressed with the service. The main hurdle was not technical but organisational: billing for the use of the CDN is by credit card and Pay As You Go (PAYG), meaning we don’t know exactly how much it will cost each month. This is in contrast to how departments normally make payments, which are known in advance and invoiced in arrears. Nevertheless, that hurdle has been overcome and has hopefully set a precedent.

So the meeting we had with ICT Services was in light of all this and we recognised and agreed that Orbital was a timely and appropriate project by which the university could pilot a more extensive use of cloud services and look at how we might integrate servers in the cloud with our existing server farm. It would also allow us to think about new business models where the real costs of running a server are more transparent to everyone, rather than being absorbed by ICT as the server ages.

Nick has been setting up the Orbital development environment and basic architecture (more on that in another post) using Rackspace and the Orbital project pays for this each month via our departmental credit card. This works fine if a) the department is happy to use the credit card in this way; and b) we have dedicated project funds for this, but it’s no way to run a long-term service that is to be sustained by the institution. Our interest is not really in whether we use Rackspace or Eduserv for hosting during the period of our project – both offer Linux boxes after all – rather we’re interested in working with ICT to ensure that by the end of the project, there are formal processes in place for a) running sustained services in the cloud; and b) providing researchers with the ability to spin up and manage ad hoc servers as and when they are required.

The plan is to evaluate both Rackspace and Eduserv over the coming months, looking at which service fits best with the future plans of ICT Services. Rackspace has a much more mature offering, but we’re really keen to work with Eduserv too, recognising that they’re a new not-for-profit provider of cloud services, running on JANET and with a long history of providing hosting and other technical services to HE and government.

At our meeting with Andy, he went through much the same presentation that Nick and I had seen at the MRD start-up meeting, answering our specific questions along the way. He also demonstrated (for the first time?) the vCloud Director interface for setting up and managing the servers, and this should, in principle, integrate with our existing vSphere system. One of the nice things about the Eduserv offering is that unlike most other cloud providers, they provide the entire vCloud Director application to their customers, including a full API, rather than a cut-down interface. We’ve yet to see how vCloud Director will allow us to create access controls for different types of users, but that’s what the Orbital project will be helping to investigate and I’m pleased that we’re able to work with our ICT department in this way. There are other important questions, too, around data protection and liabilities, and Andy was keen that we review Eduserv’s Terms and Conditions and SLA and feed back our thoughts on them.

This experience will allow me to better understand the business model of the cloud and how to make the business case for developing and running cloud-based services. As Nick previously said, it also allows us to make our costs more transparent, so that the actual costs (per Gigabyte and per Gigahertz) of managing research data are clearer to both Researchers and the institution. Having a clearer idea of the costs will help us create a more sustainable service in the long run.

How the National Archives use MongoDB

We spoke to staff at 10gen yesterday about our choice of MongoDB for Orbital and were reminded that the National Archives (UK) are using it as the basis of their discovery platform. We plan to stay in touch with 10gen throughout the project and provide a Case Study on our use of Mongo for Managing Research Data. Here are some slides from a recent conference where the National Archives spoke about their work with MongoDB.


USTLG meeting on research data management

Yesterday I was at Clare College, University of Cambridge for a meeting organised by USTLG, the University Science & Technology Librarians Group. The group, which is open to any librarians involved with engineering, science or technology in UK universities, has meetings once or twice a year. The theme of yesterday’s meeting (free to attend, thanks to sponsorship from the IEEE) was data management, with an implied focus on research data.

The meeting consisted of a series of presentations (plus a fantastic lunchtime diversion, below) with plenty of time for networking – there were about 40 people there, all with an interest in research data management – though interestingly, a show of hands suggested very few people were actively engaged in looking after their own institution’s researchers’ data.

As usual, this blog post has been partially reconstructed from the Twitter stream (hashtag #ustlg).

First up, Laura Molloy, substituting for Joy Davidson of the Digital Curation Centre (DCC), on a project called the Data Management Skills Support Initiative (DaMSSI), looking at the [shades of information literacy] skills needed by different people involved in the research data curation process. “DaMSSI aims to facilitate the use of tools like Vitae’s Researcher Development Framework (RDF) and the Seven Pillars of Information Literacy model” developed by SCONUL. Key question: how do you assess the effectiveness of research data management training?
