Version 0.3 released

Our aim is to release a new version of Orbital every month until the end of the year. Yesterday we released version 0.3, which, alongside many small improvements and bug fixes, improves the handling of dynamic datasets and begins the work of integrating ownCloud with Orbital. Here’s the changelog.

  • Improvements to project activity timelines:
    • Public/private modes
    • Calendar events
  • Improvements to filetype handling and file uploads
  • Improvements to file management, collections and private/public modes
  • Dynamic datasets:
    • A working query builder
    • Queries can be saved and re-run against data
    • CSV output of data for use by external tools, e.g. Matlab (see the sketch after this list)
  • Working Datasets:
    • Preparation for ownCloud integration (integration with Lincoln SSO, evaluation of product, contact with developers)
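
To make the dynamic dataset features a little more concrete, here is a minimal sketch of what re-running a saved query and exporting the result as CSV might look like against the underlying MongoDB store. The connection details, collection and field names are all hypothetical; Orbital’s actual query builder works through its own web interface and APIs.

```python
import csv
from pymongo import MongoClient

# Connect to the MongoDB store backing a dynamic dataset
# (hostname, database and collection names are hypothetical).
client = MongoClient("mongodb://localhost:27017")
db = client["orbital"]

# A "saved query" is essentially a stored filter document that can be
# re-run against the data at any time.
saved_query = {"turbine_id": "T-042", "wind_speed": {"$gt": 12.5}}
rows = db["dynamic_dataset"].find(saved_query)

# Export the results as CSV for external tools such as Matlab.
with open("query_output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["turbine_id", "timestamp", "wind_speed"])
    writer.writeheader()
    for row in rows:
        writer.writerow({k: row.get(k) for k in writer.fieldnames})
```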

The plan for version 0.4 is full ownCloud integration with Orbital via the respective APIs, which will provide the first part of the overall Orbital workflow: ‘Working Data’ -> ‘Dynamic Data’ -> ‘Archive Files’. Over two weeks in August we’ll also be setting up our own private in-house cloud using OpenStack and moving Orbital in-house from Rackspace.

Release The Releases!

One of the things Orbital set out to do was to prove that agile development of software and solutions can arrive at the same outcome as a more traditional ‘waterfall’ method of project planning. A big part of this is the “release often” approach to development: shipping new versions far more quickly than is usual in academia. We’re actually on the slightly slow side of agile development (some companies push updates several times a day), but we’re still aiming to ship a major point release every month for our users to have a play with and comment on.

On top of the point releases we’re also churning out additional maintenance releases with bug fixes and minor features roughly every week, the latest being 0.2.1, which shipped yesterday. These follow a fairly common pattern: releases with an odd final number (0.1.1, 0.2.1, 0.2.3, etc.) are patch releases which fix bugs, while those with an even final number (0.2.2, 0.2.4, etc.) add evolutionary updates to existing functionality.

There are a few major benefits to doing things this way:

  • Orbital never gets the chance to stray far from user requirements – we can’t go off and spend 6 months developing something that doesn’t do what people want, because they can tell us if we’re going wrong at least once a month (and often more frequently than that).
  • Users who report bugs don’t have to wait several months for the next ‘service patch’ which rolls up thousands of changes; the next maintenance release will fix things, and is usually only a few days away.
  • The gap between a feature request and its implementation is dramatically reduced, so the majority of feature requests are delivered whilst the requirement is still fresh in the user’s mind. This results in more immediate usage and feedback.
  • Our code is refactored and refined more often. Instead of building a massive codebase all at once and never going back to improve things, we spend a small amount of time each release making sure that code is clean and sane, interfaces are well defined, and so on.
  • Our continuous integration server won’t let us ship a product which doesn’t meet minimum requirements of code quality, documentation and testing. Being forced through this process on a regular basis means that we never get a chance to build up a significant backlog of problems.
  • The use of our code repository and feature branches (there’s a post on this coming up later) means that every ‘merge’ of development code with our staging code is checked over by a developer other than the one who wrote the feature. When we ‘merge’ the staged and tested code with our production code the changes are checked yet again.
  • More granular releases make it easier to roll back when things go wrong. Moving from v0.2.1 to v0.2.3 doesn’t need any database changes, so if something isn’t working as expected it’s a simple matter to move back to an earlier release. In contrast, if we only ever moved between major releases such as v1 and v2 (which will almost inevitably involve changing a database schema), performing a rollback becomes much more challenging.

Thus far Orbital has made four distinct releases (v0.1, v0.1.1, v0.2 and v0.2.1), with v0.2.2 due out next week. If you’re interested in seeing (roughly) what’s in the pipeline don’t forget our Pivotal Tracker can tell you more.

Notes on Orbital v0.2.1

A few notes on some of the new features in the latest version of Orbital: these were presented to Dr Bingo Wing-Kuen Ling on 15 June 2012.

  1. ‘Your Projects’ now includes an Activity Timeline of comments and file changes aggregated across all projects in Orbital; each project page also displays a timeline for that project.
    [Screenshot: the Orbital timeline]
  2. Files from the File Archives can be organised using Collections, which are ‘tag-like’ rather than ‘folder-like’: a file can belong to more than one Collection (see the sketch after this list).
    [Screenshot: an Orbital project]
  3. You can now edit project information and add new members to a Project. To do this, go to the Project within Orbital, click on the ‘edit’ button, and scroll down to Project Members.
    [Screenshot: the Orbital project page]
    [Screenshot: the Orbital add-members section]
  4. Finally, a bug which was preventing the upload of files using Internet Explorer has now been fixed.
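
To illustrate the ‘tag-like’ nature of Collections mentioned above, here is a hypothetical sketch of how a file record might carry several collection labels at once. The field names and labels are assumptions for illustration, not Orbital’s actual schema.

```python
# A file can belong to any number of Collections, so a Collection behaves
# like a tag on the file record rather than a single containing folder.
# (All names below are illustrative.)
file_record = {
    "filename": "turbine_readings_2012-06.csv",
    "collections": ["Raw Data", "June 2012", "Siemens"],
}

# 'Folder-like' storage would force exactly one location per file;
# 'tag-like' membership just means filtering on the collections list.
files = [file_record]
in_june = [f for f in files if "June 2012" in f["collections"]]
```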

Orbital v0.2 release

Today we released Orbital v0.2, about a month after our v0.1 release. As per the roadmap, Nick and Harry have made good progress on project activity data, user role management and dynamic datasets, and, based on user feedback, we’ve added the ability to organise data into collections. You can read the high-level change log or trawl through the project tracker, if you feel so inclined. Paul has also made some notes, with screenshots, on some of the new features.

You’ll notice that there are now APIs for loading data into Orbital’s MongoDB store and querying it, too. This is now in use on a daily basis, retrieving turbine data from Siemens, loading it into Orbital and then running queries on it. It’s very fast. I might add that updates to the data are being versioned, too, so a researcher can query data as it was stored in the past. There’s much more to be done to make Orbital a versatile platform for data analysis during the research process, but the groundwork is in place.
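
As a rough illustration of what using those APIs looks like, here is a hedged sketch of loading records over HTTP and then querying them, including querying data as it stood at an earlier point in time. The base URL, endpoint paths, parameter names and authentication details are all assumptions for illustration, not Orbital’s documented API.

```python
import requests

BASE = "https://orbital.example.lincoln.ac.uk/api"  # hypothetical base URL
HEADERS = {"Authorization": "Bearer <access token>"}  # token handling omitted

# Load a batch of turbine readings into a dynamic dataset
# (endpoint path and payload shape are illustrative only).
records = [
    {"turbine_id": "T-042", "timestamp": "2012-06-15T09:00:00Z", "wind_speed": 13.1},
    {"turbine_id": "T-042", "timestamp": "2012-06-15T09:10:00Z", "wind_speed": 12.7},
]
requests.post(f"{BASE}/datasets/turbines/records", json=records, headers=HEADERS)

# Query the dataset; the hypothetical "as_at" parameter stands in for the
# idea that updates are versioned, so data can be queried as it was stored
# at an earlier date.
resp = requests.get(
    f"{BASE}/datasets/turbines/records",
    params={"turbine_id": "T-042", "as_at": "2012-06-01T00:00:00Z"},
    headers=HEADERS,
)
print(resp.json())
```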

As we identified in our Implementation Plan, we see a workflow whereby data can be selected from a project workspace (e.g. a network drive), loaded into the dynamic datastore, analysed, and then eventually selected for archiving alongside published research papers.

And Now… Dynamic Data!

You may remember a while back that I blogged about how Orbital thinks of research data, using our “Smarties not tubes” approach. We then went away for a bit, and in our first functional release included a way of storing your tubes, but nothing about the Smarties. This understandably caused some confusion amongst those who had paid close attention to our poster and were expecting a bit more than a file uploader.

The reason behind this seeming disconnect in our plans was simple: a researcher had a bunch of files they wanted to publish through Orbital, and it made a lot more sense to build something that would let them do what they wanted straight out of the gate rather than devote our efforts to breaking up their nice datasets again and storing individual bits and pieces. Fortunately, our next functional release is planned to include our magical Smarty storage system. Here’s a quick overview.

Dynamic Data (as we’re calling it) uses a document storage database to keep tabs on individual data points within a research project. It’s designed specifically so that you can fill it up with fine-grained information as individual records rather than storing a single monolithic file. We think this is the best way to go about storing and managing research data during the lifetime of a research project for a few reasons:

  • It’s easier to find relevant stuff. Instead of trying to remember if the data you were looking for was in 2011-Nov-15_04_v2.xls or 2011-Nov-15_04_v3.xls you can instead just search the Dynamic Dataset.
  • It’s an awful lot easier for us to ensure a Dynamic Dataset is stored reliably than a bunch of files, because databases tend to have good replication and resiliency options.
  • We can scale storage of individual files up to a few tens of gigabytes per file at most before things start to get silly (although we can store a lot of files at that size), whereas a single Dynamic Dataset can scale until we run out of resources.
  • Data can be reproduced in a number of standard ways from a single source. The same source can easily be turned into CSV, an XML document, JSON, or any other format we can write a structured description for.
  • With a little work, data can be reproduced in a number of non-standard ways from a single source. Templating engines can allow researchers to describe their own output formats.
  • Data can be interfaced with at a much more ‘raw’ level with only basic programming skills. Equipment such as sensors can automatically load data to a Dynamic Dataset (as sketched below), survey responses can be captured automatically, and more. Data can be retrieved and analysed in the same way, for example scheduling analysis of a week’s worth of data.
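
To ground those points, here is a minimal sketch, assuming a direct MongoDB connection, of a sensor script appending individual readings to a Dynamic Dataset as they arrive and then finding them again by query. The connection details, collection and field names are illustrative only.

```python
from datetime import datetime, timezone
from pymongo import MongoClient

# Hypothetical setup: each reading is stored as its own document
# (a "Smarty"), rather than accumulating inside one monolithic file.
client = MongoClient("mongodb://localhost:27017")
readings = client["orbital"]["sensor_readings"]  # names are illustrative

def record_reading(sensor_id, value):
    """Store one fine-grained data point as an individual record."""
    readings.insert_one({
        "sensor_id": sensor_id,
        "value": value,
        "recorded_at": datetime.now(timezone.utc),
    })

record_reading("anemometer-3", 12.9)

# Later, find the relevant data points by searching the dataset, instead
# of trying to remember which versioned .xls file they ended up in.
for doc in readings.find({"sensor_id": "anemometer-3"}).sort("recorded_at"):
    print(doc["recorded_at"], doc["value"])
```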

The data in release v0.2 is manipulated purely at an API level through Orbital Core, although upcoming versions will have cool ways of manually entering and querying the data through the web interface. Data is then quickly processed to add things like tracking metadata (upload time etc) and shovelled off to our storage cluster of MongoDB servers.
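
As a final sketch, this is roughly what that processing step might look like; the wrapper function and metadata fields are assumptions for illustration, not Orbital Core’s actual schema.

```python
from datetime import datetime, timezone

def add_tracking_metadata(record, uploader):
    """Wrap an incoming record with tracking metadata before it is sent
    to storage. The field names here are illustrative only."""
    return {
        "data": record,
        "_meta": {
            "uploaded_at": datetime.now(timezone.utc).isoformat(),
            "uploaded_by": uploader,
            "version": 1,  # later updates to the record would increment this
        },
    }

wrapped = add_tracking_metadata({"wind_speed": 13.1}, uploader="nick")
```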