Further to yesterday’s blog post about linking our CKAN datastore with our EPrints repository (to allow researchers to deposit permanent, public, citable records of their datasets), here’s a fleshed-out diagram of the proposed dataset deposit workflow.
At the moment, this assumes a one-time “fire and forget” deposit. At some point, we’re going to have to tackle versioning.
The original diagram is available on Lucidchart. See the table in my previous blog post for details of which data fields are involved in the process (i.e. passed between CKAN, Orbital Bridge, the DataCite API, and EPrints).
This is a proposal and still has to be road-tested. Comments welcome.
Stages in the proposed deposit process:
1. User enters project metadata in AMS
2. AMS creates project container in CKAN
3. User creates dataset record in CKAN
4. Nucleus adds user metadata to CKAN
5. User deposits data in CKAN
6. User presses “DEPOSIT DATASET” button in CKAN
7. Orbital Bridge requests DOI
8. DataCite API returns DOI
9. Orbital Bridge adds DOI to dataset record in CKAN
10. User reviews and approves dataset metadata (making changes if necessary)
11. Orbital Bridge writes changes back to dataset record in CKAN
12. Orbital Bridge creates a new EPrints record via SWORD
13. EPrints confirms existence of new record
14. Orbital Bridge writes EPrints record URL back to CKAN dataset record
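The automated part of the workflow (stages 7 to 14, triggered by the button press at stage 6) can be sketched as a single orchestration function. This is a minimal Python sketch under stated assumptions, not Orbital Bridge's actual code: `FakeCkan`, `FakeDataCite`, `FakeEPrints` and all of their methods are hypothetical in-memory stand-ins for CKAN, the DataCite API and a SWORD deposit into EPrints, and the `review` callback stands in for the stage 10 "review and confirm" screen.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCkan:
    """Hypothetical stand-in for CKAN dataset records (not the CKAN API)."""
    records: dict = field(default_factory=dict)

    def get(self, dataset_id):
        return self.records[dataset_id]

    def update(self, dataset_id, **changes):
        self.records[dataset_id].update(changes)


@dataclass
class FakeDataCite:
    """Hypothetical DOI minter; real DOIs would come from the DataCite API."""
    prefix: str = "10.0000"  # placeholder, not a real DataCite prefix
    issued: int = 0

    def mint_doi(self, metadata):
        self.issued += 1
        return f"{self.prefix}/dataset.{self.issued}"


@dataclass
class FakeEPrints:
    """Hypothetical EPrints endpoint standing in for a SWORD deposit."""
    base_url: str = "https://eprints.example.org"
    deposits: dict = field(default_factory=dict)

    def sword_deposit(self, metadata):
        eprint_id = len(self.deposits) + 1
        self.deposits[eprint_id] = dict(metadata)
        return eprint_id

    def exists(self, eprint_id):
        return eprint_id in self.deposits

    def record_url(self, eprint_id):
        return f"{self.base_url}/{eprint_id}/"


def deposit_dataset(dataset_id, ckan, datacite, eprints, review):
    """Run stages 7-14 once the user has pressed the deposit button (stage 6).

    `review` receives a copy of the metadata (including the new DOI) and
    returns the version the user approved, possibly with edits.
    """
    # Stages 7-9: automatic, no user interaction needed.
    doi = datacite.mint_doi(ckan.get(dataset_id))
    ckan.update(dataset_id, doi=doi)
    # Stage 10: the user reviews and approves the metadata.
    approved = review(dict(ckan.get(dataset_id)))
    # Stage 11: write the approved changes back to CKAN.
    ckan.update(dataset_id, **approved)
    # Stages 12-13: create the EPrints record and confirm that it exists.
    eprint_id = eprints.sword_deposit(ckan.get(dataset_id))
    if not eprints.exists(eprint_id):
        raise RuntimeError(f"EPrints did not confirm deposit of {dataset_id}")
    # Stage 14: write the EPrints record URL back to the CKAN record.
    url = eprints.record_url(eprint_id)
    ckan.update(dataset_id, eprints_url=url)
    return url


ckan = FakeCkan({"ds1": {"title": "Survey data"}})
url = deposit_dataset("ds1", ckan, FakeDataCite(), FakeEPrints(),
                      review=lambda md: {**md, "title": "Survey data v2"})
print(url)  # https://eprints.example.org/1/
```

Passing the review step in as a callback keeps the automatic stages separate from the one point where the user has to act, which matches the idea that the user goes straight from pressing the button to the confirmation screen.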
Paul, interesting ideas. Some questions follow. In this new context I need to backtrack and reconsider question 1, which I may have discussed previously with Joss.
1. Given the full workflow represented here, what role is CKAN playing that EPrints can’t? Is this process appropriate and sufficient, or overly complex?
2. At stage 6 in the workflow, the user instructs “DEPOSIT DATASET”, which implies some finality to the process, but it seems that isn’t the case. How will you ensure the user is still paying attention by stage 10?
3. What checks will be in place to ensure correct completion of stages 11-14?
4. Stage 1, project metadata: have you thought about the possibility of pulling some of this information from a DMP (data management plan)?
5. It’s not clear what the Nucleus stage means or does.
Thanks, Steve. Good questions!
1. CKAN is a data playground, and we expect it will include a lot of material that people are working on but have no intention of ever publishing, depositing or recording officially (in EPrints).
2. Stages 7-9 are automatic and should take place in (near) real time, so as far as user intervention is concerned, users will go straight from stage 6 to stage 10 (stage 10 being a “review and confirm your deposit” screen). We also need to decide on the wording: “DEPOSIT DATASET” may sound a bit too final.
3. Orbital Bridge is the application that looks after the integrity of the data, so I expect it will be the component that [a] checks that the SWORD deposit has completed successfully, reporting an error to the EPrints admin (i.e. me) if it hasn’t, and [b] acts as the control application for changes written back to CKAN. But I’ll need to defer to my developer colleagues to tell you exactly how we’ll ensure that all automated stages complete and are checked.
4. Excellent idea. We’re looking at DMPOnline API integration, but it’s at an early stage. Our AMS (Awards Management System) is the system of record for institutional project / funding data, so we need to make sure we’re not creating data that conflicts with that system.
5. I’ve not gone into it in any detail here, but for the purposes of this project Nucleus provides all “institutional” data: HR data, including people’s names and IDs, and controlled lists of University departments and subjects.
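The check described in answer 3 (Orbital Bridge confirming that the SWORD deposit completed, and alerting the EPrints admin if it didn't) might look something like the minimal Python sketch below. It assumes three hypothetical callables that are not part of any real Orbital Bridge, SWORD or EPrints API: `deposit` performs the SWORD deposit and returns a record ID, `confirm` asks EPrints whether that record exists, and `notify_admin` stands in for whatever alerting is actually used (email, for instance).

```python
import time
from typing import Callable, Optional


def deposit_and_verify(
    deposit: Callable[[], str],
    confirm: Callable[[str], bool],
    notify_admin: Callable[[str], None],
    checks: int = 3,
    delay: float = 1.0,
) -> Optional[str]:
    """Deposit once, then poll for confirmation; alert the admin on failure.

    Returns the record ID if EPrints confirms it, otherwise None.
    """
    try:
        record_id = deposit()
    except Exception as exc:  # e.g. a network error or SWORD error response
        notify_admin(f"SWORD deposit failed: {exc}")
        return None
    for attempt in range(checks):
        if confirm(record_id):
            return record_id
        if attempt < checks - 1:
            time.sleep(delay)  # wait before checking again
    notify_admin(f"Record {record_id} was not confirmed after {checks} checks")
    return None


alerts = []
result = deposit_and_verify(lambda: "123", lambda r: False, alerts.append,
                            checks=2, delay=0)
print(result, alerts)  # None ['Record 123 was not confirmed after 2 checks']
```

The deposit itself is deliberately not retried: if a deposit succeeded but its confirmation was merely slow, re-depositing could create a duplicate EPrints record, so only the confirmation step is repeated.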