Today I’ve been kicking around the ICT office with Alex, figuring out how to make Jenkins (our wonderful CI server) build and publish the latest version of the CWD with all the bells and whistles like compilation of CSS using LESS, minification, validation of code and so on. As part of this we managed to fix a couple of bits and pieces which had been bugging me for a while, namely the fact that GitHub commit notifications weren’t working properly (fixed by changing the repository URI in the configuration) and the fact that Campfire integration wasn’t working (fixed by hitting it repeatedly with a hammer).
This brought me to thinking about how our various things tie in together, so I set about charting a few of them up. After a while I realised the chart had basically expanded into a complete flowchart of the various tools and processes that hang together to keep the code flowing in a steady stream from my brain – via my fingers – into an actual deployment on the development server. Since it may be of interest to some of you, here’s a pretty picture:
This is (approximately) the toolchain I currently use for Orbital, including rough details of what is being passed around
The beauty of this is that the vast majority of the lines happen completely by themselves — I get to spend my days living in the small bubble of my local development server and dipping in and out of Pivotal Tracker to update stories. The rest is magically happening as I work, and the constant feedback through all our monitoring and planning systems (take a look at SplendidBacon for an epic high-level overview) means that the rest of the project team and any project clients can see what’s going on at any time.
One of the interesting things about Orbital is its use of an API-driven development approach. In traditional, API-less applications your end-to-end system would look something like this:
This is all well and good if the only thing you ever want interacting with your application is a real user, but it’s increasingly a bad idea. Users can interact with your application as intended, but should a machine want to get at your data (which may happen for any one of a hundred reasons) it’s got to muck about pretending to be a user and scraping data. Everybody is building with APIs nowadays, and if you aren’t then you’re going to be left behind, cold and frightened, in a world which no longer subscribes to the notion that monolithic software can stand on its own and provide useful functionality.
So the next step is to bolt on an API.
This is the most common form of API around, and consists of a ‘second view’ on the data and functionality of an application. This is a massive step forwards and makes lives much, much easier in most cases. The only downside is that it’s very easy for this kind of API to provide only ‘bare bones’ functionality, such as offering a list of items when the ‘real’ user interface lets you not only view the list but also edit its contents. It’s better than nothing but not ideal, which is why Orbital is taking the next step:
Under this design the API is the only way to interface with the data and functionality of the system. If a user wants to access it they must go through an intermediary which translates their wishes into API calls, and the results back into a nicely human-readable form. The plus side is that any other consumer of the service is free to interact with the application on exactly the same terms as the ‘official’ frontend, provided that it has been granted the relevant permissions. As far as Orbital Core (our actual application) is concerned there is no functional difference between Orbital Manager (our frontend) and an application that a researcher has hacked together to give themselves an easier time inputting data — they are subject to the exact same access controls, restrictions, sanity checking and limitations.
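To make that concrete, here's a rough sketch in Python of what any consumer of Orbital Core looks like. The endpoint, project ID and token below are invented for illustration rather than being Orbital Core's real routes, but the shape is the same whether the caller is Orbital Manager or a researcher's own script:

```python
import requests

# Hypothetical base URL and route, purely for illustration.
API_BASE = "https://orbital-core.example.lincoln.ac.uk/v1"

def list_datasets(access_token, project_id):
    """Fetch a project's datasets exactly as the official frontend would:
    same endpoint, same OAuth token check, same permission rules."""
    response = requests.get(
        f"{API_BASE}/projects/{project_id}/datasets",
        headers={"Authorization": f"Bearer {access_token}"},
    )
    response.raise_for_status()
    return response.json()

# A researcher's home-grown import script calls exactly the same thing;
# Orbital Core can't tell (and doesn't care) which client is asking.
datasets = list_datasets("token-issued-to-this-client", 42)
```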
Working this way means that every time we want to build user-facing functionality we have to stop, look at our APIs and work out where the functionality belongs. It has the added benefit of making it essential to fully document our APIs for our own sanity, as well as ensuring that we have lightweight data transfer and rock-solid error handling baked right in.
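One example of the kind of thing this discipline forces you to decide early is a single, predictable error shape that every consumer (our own frontend included) can rely on. The field names below are invented for illustration rather than being Orbital's actual wire format:

```python
# A sketch of a uniform error envelope, with made-up field names.
def error_response(status, code, message):
    """Build the same error body for every failure, so any consumer can
    handle problems the same way regardless of which endpoint it called."""
    return status, {
        "error": {
            "code": code,        # machine-readable, e.g. "dataset_not_found"
            "message": message,  # human-readable explanation
        }
    }

status, body = error_response(
    404, "dataset_not_found",
    "No dataset with that ID exists in this project.",
)
```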
The downside is that we have to double up on some bits of development, writing both the Core and Manager sides. It can also lead to the usual frustrations you get when trying to communicate with APIs, but on the plus side we have the ability to change both ends for the better.
Know of any other API-driven development in the fields of higher education or research data management? We’d love to hear about them, so that we can try to make our APIs as compatible as possible and improve interoperability. Drop us a note in the comments.
One of the cool things about Orbital from my point of view is that I’m not just responsible for putting together a bit of software that runs on a web server, but also for designing the reference platform which you run those bits of software on.
At this point I could digress into discussing exactly what boxes we’re running Orbital on top of, but that doesn’t really matter. What is more interesting is how the various servers click together to build the complete Orbital platform, and how those servers can help us scale and provide a resilient service.
You’re probably used to thinking of most web applications like this:
It’s simple. You install what you need to run your application on a server, hook it up to the internet, and off you go. Everything is contained on a single box, which gives you epic simplicity benefits and is often a lot more cost-efficient, but you lose scalability. If one day your application has a traffic spike your Serv-O-Matic 100 may not be able to cope. The solution is to make your server bigger!
This is all well and good, until you start to factor in resiliency as well. Your Serv-O-Matic 500 may be sporting 16 processor cores and 96GB of RAM, but it’s only doing its job until the OS decides it’s going to fall over, or your network card gives up, or somebody knocks the Big Red Switch.
Orbital is going to be a big bit of software, with lots of things doing lots of other things. A big part of putting together such a large bit of software – alongside our Pivotal Tracker instance – is the regular process of ‘building’ the software from source code into something that can actually be used, testing it and getting it onto our development servers so that we can actually see what it’s doing. As part of Orbital we’re taking a step into what is a relatively unexplored frontier for the development team here at Lincoln – Continuous Integration.
Continuous Integration means that as we develop our software it’s constantly being built, tested and deployed to make sure that it’s behaving as expected. We’re using the popular Jenkins server to manage everything that’s going on as part of this process, as well as provide reports on what’s happened. We’re slowly adding more things to the list of what’s actually happening when the magic starts, but here’s what we’re going to be doing by the end of the project every single time that somebody makes a change to our codebase:
Run a whole battery of analysis that looks for messy code structure and duplicate code.
Automatically build the technical documentation using DocBlox. This isn’t the end-user documentation, but it does tell us exactly what all our code is supposed to be doing so that we have a reference.
Perform token replacement on the resultant codebase. This means that we can keep the code repository clear of all environment- and institution-specific configuration, since the placeholders are swapped for real values as we perform a build (there’s a rough sketch of the idea just after this list).
Deploy the built codebase to our development and testing platform so that we can actually use it.
Tell us the results of all of the above in a variety of pretty graphs and reports.
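To give a flavour of the token replacement step mentioned above, here's a minimal sketch of the general idea. The @token@ syntax, file paths and values are assumptions made purely for illustration, not the exact mechanics of our Jenkins job:

```python
import re
from pathlib import Path

# Environment-specific values, supplied by the build rather than the repository.
TOKENS = {
    "base_url": "https://orbital-dev.example.lincoln.ac.uk/",
    "db_host": "dev-db.example.lincoln.ac.uk",
}

def replace_tokens(text, tokens):
    """Swap @name@ placeholders for real values at build time, so the code
    repository never contains environment or institution specific config."""
    return re.sub(r"@(\w+)@", lambda m: tokens.get(m.group(1), m.group(0)), text)

for path in Path("build/config").glob("*.php"):
    path.write_text(replace_tokens(path.read_text(), TOKENS))
```

The point is simply that the repository only ever holds placeholders, and every build stamps in whatever the target environment actually needs.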
With the Orbital project we’re looking at taking a leap into the brave new world (well, at least as far as university projects go) of cloud hosting. Now, I hate the word “cloud” when used to describe most services because people bandy it about like it’s some magical world-fixing technology when in actual fact all they mean is “it’s on the internet”. We have “cloud” services which are fundamentally no different from doing the same thing on a server kept on a desk somewhere; but Orbital isn’t going to be like that.
I hope.
Instead, Orbital will be a true ‘cloud’ service in that it’s a resource which end users can tap into with no care at all for the underlying technologies. It’ll scale up and down with demand, extending both processing power and storage space as needed. Should one of our servers fail for any reason it won’t be met with a week of downtime whilst we rebuild things, but instead a seamless transition of work to one of the redundant, load-balanced alternatives. If a process stops working then instead of the entire system crashing down it’ll adapt, queueing tasks until things are restored. Alongside this, the use of common standards is essential to development. RESTful APIs follow well-understood principles for interacting with data, and authentication using OAuth (the same authentication method used by Twitter, Facebook, Google and Microsoft) is core to how things behave.
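As a toy illustration of the ‘queueing tasks until things are restored’ idea, here's a sketch of a worker that puts failed jobs back on a queue and waits, rather than falling over. The real system would sit on a proper message broker rather than an in-process queue, but the principle is the same:

```python
import queue
import time

tasks = queue.Queue()

def process(task):
    """Stand-in for a real unit of work, e.g. pushing a file into storage."""

def worker():
    while True:
        task = tasks.get()
        try:
            process(task)
        except ConnectionError:
            # The downstream service is unavailable: requeue the task and wait,
            # rather than letting one failure take the whole system down.
            tasks.put(task)
            time.sleep(30)
```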
Whilst the Orbital application itself is built to run in a cloudy manner using these loosely coupled methods and Rambo architecture, we’re also going to be hosting the thing in the cloud. This helps us with a few things including the aforementioned scalability, improved resiliency and the ability to properly analyse how the cloud works for higher education.
There’s also an unexpected benefit of this cloudy approach to Orbital: we gain the ability to pin a real-world cost on the storage of research data, since we are quite literally being charged by the gigabyte. At the moment researchers tend to treat storage as a one-off cost – for example buying a pile of hard disks – with less understanding of what it actually costs to keep them spinning. Since Orbital will know more about the intricacies of the stored data than the researchers, we will be able (for the first time) to put a number on both how much it costs to store their data and the estimated carbon impact of doing so.
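As a crude example of the sort of sum this makes possible, here's a sketch with made-up per-gigabyte figures; they are placeholders, not our provider's actual prices or carbon factors:

```python
# Assumed illustrative rates, not real provider figures.
PRICE_PER_GB_MONTH = 0.10     # currency units per GB per month
KG_CO2E_PER_GB_MONTH = 0.002  # kg CO2e per GB per month

def storage_report(gigabytes, months):
    """Turn 'a pile of data' into an ongoing cost and a rough carbon estimate."""
    cost = gigabytes * months * PRICE_PER_GB_MONTH
    carbon = gigabytes * months * KG_CO2E_PER_GB_MONTH
    return cost, carbon

cost, carbon = storage_report(gigabytes=500, months=36)
print(f"Keeping 500GB for three years: ~{cost:.2f} in storage fees, ~{carbon:.1f}kg CO2e")
```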
Both of these numbers are things we want to give researchers to help them understand that hanging on to research data has an ongoing cost, but also that it’s probably more efficient to hang on to it in a central, cloud-based platform. Of course, we also want to give people a clean exit strategy, so we’ll be looking at ways of easily creating ‘hard’ copies for offline, non-cloud storage whilst still maintaining a virtual presence for the purposes of referencing and metadata.