Servers, Servers Everywhere…

One of the cool things about Orbital from my point of view is that I’m not just responsible for putting together a bit of software that runs on a web server, but also for designing the reference platform which you run those bits of software on.

At this point I could digress into discussing exactly which boxes we’re running Orbital on, but that doesn’t really matter. What is more interesting is how the various servers click together to form the complete Orbital platform, and how those servers help us scale and provide a resilient service.

You’re probably used to thinking of most web applications like this:

A 'traditional' server model

It’s simple. You install what you need to run your application on a server, hook it up to the internet, and off you go. Everything is contained on a single box, which keeps things wonderfully simple and is often a lot more cost-efficient, but you lose scalability. If one day your application has a traffic spike, your Serv-O-Matic 100 may not be able to cope. The solution is to make your server bigger!

Throw more power at it!

This is all well and good, until you start to factor in resiliency as well. Your Serv-O-Matic 500 may be sporting 16 processor cores and 96GB of RAM, but it’s only doing its job until the OS decides it’s going to fall over, or your network card gives up, or somebody knocks the Big Red Switch.


Piloting the cloud

Last month, Nick wrote about how Orbital is being designed as an application to run in the cloud. This week, we met with Andy Powell from Eduserv to discuss the use of their ‘Education Cloud’ for the Orbital project.

In the run-up to this meeting, we’d been talking to colleagues in ICT Services about our need for more flexibility and autonomy when we required servers in order to do our work. Outside of work we’re quite used to spinning up servers on Rackspace or AWS to try things out, and increasingly we’ve been looking for ways to take control of our servers like this at work. We’re not the only researchers who need this flexibility; colleagues in LiSC have also been telling us that for some of their work, the scalability and reliability of cloud services is looking increasingly attractive.

This is not to say that ICT Services are inflexible or unreliable by any means. I’ve always found my colleagues very willing to help where and when they can, but I think we’d all agree that a central ICT department in a university, with the many and varied responsibilities it has, is not the same as a dedicated cloud provider and, in our case, does not offer the resilience or the scalability that Rackspace or Amazon offer, for example. The availability of resources, the business model and the available support are quite different. When I joined the University in 2007, ICT Services were implementing a new VMware server farm, which has given us more flexibility than having to work with physical boxes in every instance. Typically, if I want a Linux server with 4GB RAM and 100GB of disk, I put in a request, transfer approx. £1,200, and some time later a virtual server is provided to me at no further cost. If I need more RAM or disk, I put in another request, transfer some money, and some time later I get what I need. This process can take weeks or months.

However, our VMware farm is now almost five years old and nearing ‘end of life’, and I know that ICT are thinking about the next five-year cycle and how cloud computing fits into their future plans. Colleagues in the Online Services team have recently been using Rackspace as a CDN for the Common Web Design framework, as well as to host our popular Gateway website, and have been very impressed with the service. The main hurdle was not technical but organisational: billing for the use of the CDN is by credit card and Pay As You Go (PAYG), meaning we don’t know exactly how much it will cost each month. This is in contrast to how departments normally make payments, which are known in advance and invoiced in arrears. Nevertheless, that hurdle has been overcome and has hopefully set a precedent.

The meeting we had with ICT Services was in light of all this, and we recognised and agreed that Orbital is a timely and appropriate project through which the university could pilot a more extensive use of cloud services and look at how we might integrate servers in the cloud with our existing server farm. It would also allow us to think about new business models in which the real costs of running a server are more transparent to everyone, rather than being absorbed by ICT as the server ages.

Nick has been setting up the Orbital development environment and basic architecture (more on that in another post) using Rackspace, and the Orbital project pays for this each month via our departmental credit card. This works fine if a) the department is happy to use the credit card in this way; and b) we have dedicated project funds for this, but it’s no way to run a long-term service that is to be sustained by the institution. Our interest is not really in whether we use Rackspace or Eduserv for hosting during the period of our project – both offer Linux boxes, after all – rather, we’re interested in working with ICT to ensure that by the end of the project there are formal processes in place for a) running sustained services in the cloud; and b) providing researchers with the ability to spin up and manage ad hoc servers as and when they are required.

The plan is to evaluate both Rackspace and Eduserv over the coming months, looking at which service fits best with the future plans of ICT Services. Rackspace has a much more mature offering, but we’re really keen to work with Eduserv too, recognising that they’re a new not-for-profit provider of cloud services, running on JANET and with a long history of providing hosting and other technical services to HE and government.

At our meeting with Andy, he went through much the same presentation that Nick and I had seen at the MRD start-up meeting, answering our specific questions along the way. He also demonstrated (for the first time?) the vCloud Director interface for setting up and managing the servers, which should, in principle, integrate with our existing vSphere system. One of the nice things about the Eduserv offering is that, unlike most other cloud providers, they provide the entire vCloud Director application to their customers, including a full API, rather than a cut-down interface. We’ve yet to see how vCloud Director will allow us to create access controls for different types of users, but that’s what the Orbital project will be helping to investigate, and I’m pleased that we’re able to work with our ICT department in this way. There are other important questions, too, around data protection and liabilities, and Andy was keen that we review Eduserv’s Terms and Conditions and SLA and feed back our thoughts.

This experience will allow me to better understand the business model of the cloud and how to make the business case for developing and running cloud-based services. As Nick previously said, it also allows us to make our costs more transparent, so that the actual costs (per gigabyte and per gigahertz) of managing research data are clearer to both researchers and the institution. Having a clearer idea of the costs will help us create a more sustainable service in the long run.

Jenkins, build my software!

Orbital is going to be a big bit of software, with lots of things doing lots of other things. A big part of putting together such a large bit of software – alongside our Pivotal Tracker instance – is the regular process of ‘building’ the software from source code into something that can actually be used, testing it, and getting it onto our development servers so that we can see what it’s doing. As part of Orbital we’re taking a step into what is a relatively unexplored frontier for the development team here at Lincoln – Continuous Integration.

Continuous Integration means that as we develop our software it’s constantly being built, tested and deployed to make sure that it’s behaving as expected. We’re using the popular Jenkins server to manage everything that’s going on as part of this process, as well as provide reports on what’s happened. We’re slowly adding more things to the list of what’s actually happening when the magic starts, but here’s what we’re going to be doing by the end of the project every single time that somebody makes a change to our codebase:

  • Ensure that the source code is available from GitHub.
  • Invoke Phing to do all kinds of additional goodness as part of an automated build, including:
    • Run unit tests on our code using PHPUnit.
    • Verify that the code adheres to certain style standards (we use the CodeIgniter Style Guide) using PHP Code Sniffer; specifically, we’re using Thomas Ernest’s implementation of the guide.
    • Run a whole battery of static analysis that looks for messy code structure and duplicate code.
    • Automatically build the technical documentation using DocBlox. This isn’t the end-user documentation, but it does tell us exactly what all our code is supposed to be doing so that we have a reference.
    • Perform token replacement on the resultant codebase. This means that we can keep the code repository clear of all environment- and institution-specific configuration, since these values are swapped in as we perform a build (see the sketch after this list).
  • Deploy the built codebase to our development and testing platform so that we can actually use it.
  • Tell us the results of all of the above in a variety of pretty graphs and reports.
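
The token replacement step is perhaps the least obvious item in that list, so here’s a minimal sketch of the idea. The file and token names below are hypothetical, but the principle is what Phing’s ReplaceTokens filter (or an equivalent build step) gives us: the repository only ever contains placeholders, and the build swaps them for real, environment-specific values just before deployment.

```php
<?php
// application/config/database.php (sketch; hypothetical token names)
// The repository holds only these @@...@@ placeholders. During the Phing
// build, a ReplaceTokens filter substitutes the values for whichever
// environment we're deploying to, so no real credentials or institution-
// specific settings ever live in version control.

$db['default']['hostname'] = '@@DB_HOSTNAME@@';
$db['default']['username'] = '@@DB_USERNAME@@';
$db['default']['password'] = '@@DB_PASSWORD@@';
$db['default']['database'] = '@@DB_DATABASE@@';
```

The same trick covers anything else that differs between our development and testing environments, which is what lets the codebase itself stay completely generic.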


Forecast: Cloudy

With the Orbital project we’re looking at taking a leap into the brave new world (well, at least as far as university projects go) of cloud hosting. Now, I hate the word "cloud" when used to describe most services, because people bandy it about like it’s some magical world-fixing technology when in actual fact all they mean is "it’s on the internet". We have "cloud" services which are fundamentally no different from doing the same thing on a server kept on a desk somewhere; but Orbital isn’t going to be like that.

I hope.

Instead, Orbital will be a true ‘cloud’ service in that it’s a resource which end users can tap into with no care at all for the underlying technologies. It’ll scale up and down with demand, extending both processing power and storage space as needed. Should one of our servers fail for any reason, it won’t be met with a week of downtime whilst we rebuild things, but instead a seamless transition of work to one of the redundant, load-balanced alternatives. If a process stops working, then instead of the entire system crashing down it’ll adapt, queueing tasks until things are restored. Alongside this, the use of common standards is essential to development: RESTful APIs follow well-understood principles for interacting with data, and authentication using OAuth (the same authentication method used by Twitter, Facebook, Google and Microsoft) is core to how things behave.
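
To give a flavour of what that means for anyone writing against Orbital, here’s a rough sketch of a client call. The endpoint URL, resource path and token are entirely hypothetical (the real API is still being designed); the point is that a consumer needs nothing more exotic than HTTP, JSON and an OAuth access token.

```php
<?php
// Hypothetical example of calling an Orbital-style RESTful API with OAuth.
// None of these URLs or field names are real; this is just the shape of it.

$endpoint = 'https://orbital.example.ac.uk/projects/42/datasets';
$token    = 'an-oauth-access-token-obtained-beforehand';

$ch = curl_init($endpoint);
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER     => array(
        'Authorization: Bearer ' . $token,  // OAuth 2.0 bearer token
        'Accept: application/json',
    ),
));

$response = curl_exec($ch);
curl_close($ch);

$datasets = json_decode($response, true);  // plain JSON comes back
```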

Whilst the Orbital application itself is built to run in a cloudy manner using these loosely coupled methods and Rambo architecture, we’re also going to be hosting the thing in the cloud. This helps us with a few things including the aforementioned scalability, improved resiliency and the ability to properly analyse how the cloud works for higher education.

There’s also an unexpected benefit of this cloudy approach to Orbital: we gain the ability to pin a real-world cost on the storage of research data, since we are quite literally being charged by the GB. At the moment researchers tend to treat storage as a one-off cost – for example, buying a pile of hard disks – with less understanding of what it actually costs to keep them spinning. Since Orbital will know more about the intricacies of the stored data than the researchers, we will be able (for the first time) to offer figures for both how much it is costing to store the data and its estimated carbon impact.
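
To make that concrete, here’s the sort of sum Orbital should be able to do on a researcher’s behalf. Every number below is made up for illustration; the real figures would come from the provider’s per-GB pricing and whatever carbon estimates we can source.

```php
<?php
// Illustrative only: hypothetical rates, not real Rackspace/Eduserv pricing.

$stored_gb          = 250;    // size of a research dataset, in GB
$price_per_gb_month = 0.11;   // assumed hosting cost, £ per GB per month
$kg_co2_per_gb_year = 0.05;   // assumed carbon estimate, kg CO2 per GB per year

$monthly_cost = $stored_gb * $price_per_gb_month;   // £27.50 a month
$annual_cost  = $monthly_cost * 12;                 // £330.00 a year
$annual_co2   = $stored_gb * $kg_co2_per_gb_year;   // 12.5 kg CO2 a year

printf("Storing %dGB costs roughly £%.2f a month (£%.2f a year, ~%.1fkg CO2).\n",
    $stored_gb, $monthly_cost, $annual_cost, $annual_co2);
```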

Both of these numbers are something we want to give researchers to help them understand that hanging on to research data has a cost, but also that it’s probably more efficient to hang on to it in a central, cloud-based platform. Of course, we also want to give people a clean exit strategy, so we’re going to be looking at ways of easily creating ‘hard’ copies for offline, non-cloud storage whilst still maintaining a virtual presence for the purposes of referencing and metadata.

Research Data vs Research Data

As I’ve been looking closer at various requirements for Orbital, as well as at other research data management projects, it’s becoming increasingly apparent that Orbital has taken a different tack when it comes to defining what research data actually is. Whilst not a problem, it does lead to a certain disconnect when talking to people with a different idea of what data means. When it comes to storing data the disconnect is even bigger, caused by the difficulty people have in separating the transit format of the data from the data itself. In true engineering/computing style, it’s time for an analogy. I’m using sweets because hey, sweets are awesome.

Sugar! Sugar!

Imagine a tube of Smarties (or sugar-coated chocolate beans of choice). When I talk about research data I’m talking about the individual Smarties, the individual nuggets of information. You could tip 100 tubes of Smarties into a bowl and you’d just end up with a big pile of Smarties. You could then go through and sort the Smarties by colour, or perform some other type of organisation. Since you’ve got the individual Smarties out of their containers, it’s a lot easier to see the whole collection at once and work with it all together.


Taking this approach makes sense to me, because if I want to throw in a couple of bags of Peanut M&Ms I can do so without suddenly having a tube labelled "Smarties" which contains nuts. I can still sort my pile of sweets into colours, or into types. I can orient them by the little letters on top. I could throw in a handful of jelly beans and a bar of chocolate broken into squares, and then order by sugar content, colour, and number of artificial flavours. The possibilities are quite literally limited only by my tolerance for sugar highs.
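
For the less sugar-inclined, here’s the same idea expressed as data. The field names are invented, but the point stands: once each ‘Smartie’ is a record in its own right, the file it originally arrived in no longer limits how we can sort, filter or combine things.

```php
<?php
// The analogy in code: individual observations rather than container files.
// Field names are hypothetical; 'source' records where each value came from.

$observations = array(
    array('value' => 19.2, 'unit' => 'mg/l', 'source' => 'lab_run_03.csv'),
    array('value' => 17.8, 'unit' => 'mg/l', 'source' => 'field_survey.xls'),
    array('value' => 21.4, 'unit' => 'mg/l', 'source' => 'lab_run_04.csv'),
);

// Sort the whole 'bowl' together, regardless of which 'tube' each value
// arrived in -- just like sorting the mixed pile of sweets by colour.
usort($observations, function ($a, $b) {
    if ($a['value'] == $b['value']) {
        return 0;
    }
    return ($a['value'] < $b['value']) ? -1 : 1;
});
```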
