digitisation – what’s it all about, eh?

I’ve just been successful in an application to become the Project Officer for the LIFE-SHARE Project, which I’m really thrilled about for a number of reasons. (Including, but not limited to: it’s an area I’m really interested in; it’s a natural progression from my current job; I withdrew from another interview and didn’t apply for another job I think I could have got because I knew this one was on the horizon, so having very much put all my eggs in one basket, it’s a relief that the basket has protected the eggs successfully; and with the upcoming cuts in Higher Education spending, it’ll probably be the last job ever advertised in an academic library anyway! I’ve updated my own library route accordingly.)

The rationale of the project is to look at the life-cycle of digital materials, at what skills and techniques will be needed further along the line, and at what training there is or needs to be, and eventually to provide a bunch of information and guidance on the whole issue of digitisation for the HE community as a whole. Or, to quote the official version, the primary aim is “…identifying, and firmly establishing, institutional and consortial strategies and infrastructure for the curation, creation and preservation of a variety of digital content.” Digitisation in this context refers to more than scanning from print, which is the area I’ve mainly been involved in thus far; it covers digitising audio, video, images and anything else too.

This got me thinking about digitisation, and something my Dad (he has all my best ideas…) said about the way in which we use it. His point was that for thousands of years, music existed as an oral tradition. Nothing was recorded or written down; it was disseminated purely by people passing music on (verbally and vocally) to one another and to future generations. Around a thousand years ago (give or take 150 years) monks started to write their plainchant down with musical notation, and a proper written tradition began.
What is pertinent here is that the oral tradition continued alongside the written one (despite the latter in theory rendering the former superfluous) for several hundred years before finally dying out; people could write music down, but were not necessarily sure what to do with this newfound ability. Similarly, it was a good while after Edison invented the phonograph that people really knew what they wanted from recorded sound. [I’m going to quote thewikidad directly at this point: “…having sort of invented recording but not really knowing what to do with it he went off and spent ten years inventing the light bulb. During which time hundreds of people that might have been recorded (had we known that that's what recording would turn out to be for…) died. I imagine there are parallels with digitisation here too.” He’s right, there is a parallel: while we’ve been using digitisation primarily to increase access – an excellent use of it, certainly – many fragile physical objects may have become so degraded that they can no longer bear digitising, or have been digitised insufficiently well and cannot be redone, so the digital object itself will eventually degrade past usability.]

It is around 50 years since the first scan as we understand the term today. Digitisation is, in technological terms, a relatively new development; we’ve only recently started to digitise stuff in earnest. As with many new technologies, there is initially a somewhat scattergun approach before people focus on what they really want out of it – only now are we taking a step back and looking at what digitisation is for and what it means in the long term. As mentioned above, it’s only fairly recently that preservation has been seen as an important and valid use of the technology, for example.

Digital Preservation is, I bet you a fiver, a much more interesting field than you think it is… For a start, it will affect everyone: even if you don’t work with digital materials now, if you work in an information environment that has any at all then eventually, directly or indirectly, the lifecycle of a digital asset will become relevant. This is because, as the previous LIFE Project discovered, once you take into account everything to do with acquisition and storage, an e-journal will, for example, cost a library £206. This compares with only £19 for a hand-held serial (i.e. your basic journal). However, ten years down the line, LIFE estimated the total lifecycle cost for an e-journal will have been £3,000! (As opposed to only £14 per issue for a hand-held serial.) Multiply that by the number of e-journals most academic libraries subscribe to and the figures become staggering. These are just projections, and will be investigated further, but what is clear is that long-term storage of digital materials is actually going to be a lot more expensive than anybody realised, and that’ll eventually have a knock-on effect on the budgets of all the other departments in a library.
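To make the "multiply that by the number of e-journals" point a bit more concrete, here is a rough back-of-an-envelope sketch in Python. The two per-title figures are the LIFE numbers quoted above; the subscription count is entirely made up for illustration (I’m not claiming any particular library holds that many titles), and the function name is just mine.

```python
# Per-title e-journal figures quoted above from the LIFE Project (in GBP).
EJOURNAL_INITIAL_COST = 206
EJOURNAL_TEN_YEAR_COST = 3000

# ASSUMPTION: a hypothetical number of subscribed titles, for illustration only.
ASSUMED_TITLES = 10_000


def total_cost(cost_per_title: int, number_of_titles: int) -> int:
    """Return the combined cost in pounds across a number of e-journal titles."""
    return cost_per_title * number_of_titles


print(f"Initial cost:  GBP {total_cost(EJOURNAL_INITIAL_COST, ASSUMED_TITLES):,}")
print(f"Ten-year cost: GBP {total_cost(EJOURNAL_TEN_YEAR_COST, ASSUMED_TITLES):,}")
```

Swap in your own library’s subscription count for `ASSUMED_TITLES`; the point is only that a per-title difference of this size scales up very quickly, not that these totals are anyone’s real budget.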

There are all kinds of issues with preservation – if you digitise an old piece of papyrus to preserve its contents, you may only be able to do it once. There’s no refreshing it or doing it again if the file corrupts, as the papyrus won’t be able to withstand repeated scanning. If you’re digitising a sound recording, how do you know who to get permission from? Who owns the rights to some obscure recording of a speech in 1940? Then you’ve got lossy file formats gradually eroding the integrity of your digitised objects, and the challenge of future-proofing something so that it is of a high enough standard for future generations (image resolution is a good example of this – what is considered ‘exceptional’ quality changes basically by the year) while still being small enough in file size to store in a repository today.
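The resolution-versus-file-size tension is easy to see with a bit of arithmetic. The sketch below works out the size of an uncompressed scan as width in pixels × height in pixels × bytes per pixel. The page dimensions (roughly A4), the 24-bit colour depth and the function name are my own assumptions for illustration; real repository files will differ depending on format and compression.

```python
def uncompressed_scan_bytes(width_in: float, height_in: float,
                            dpi: int, bits_per_pixel: int = 24) -> int:
    """Return the size in bytes of an uncompressed scan of a page.

    width_in and height_in are the page dimensions in inches; dpi is the
    scanning resolution; bits_per_pixel defaults to 24-bit colour.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# ASSUMPTION: an A4-ish page, used purely as an example.
PAGE_WIDTH_IN, PAGE_HEIGHT_IN = 8.27, 11.69

for dpi in (300, 600, 1200):
    size_mb = uncompressed_scan_bytes(PAGE_WIDTH_IN, PAGE_HEIGHT_IN, dpi) / 1_000_000
    print(f"{dpi} dpi: about {size_mb:.0f} MB uncompressed")
```

Doubling the resolution quadruples the pixel count, so every step up in what counts as ‘exceptional’ quality costs four times the storage for the same page.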

In short, you have in many cases just one shot at taking something precious, and somehow ensuring that not just in 5 or 10 years but in 200 years’ time and beyond, people are still going to be able to use it and find it of sufficient standard and integrity. There are various ways of achieving this, including the main threads of preservation such as emulation, migration, and technology preservation.

Technology preservation is literally preserving the means to play / view / access the object you wish to preserve. So for example, you can preserve reel-to-reel tapes by ensuring you have a number of reel-to-reel players in good working order. Migration (or refreshing) is the process of transferring something from one format to another (print to PDF, or reel-to-reel to .wav, or whatever). This ensures that even after the original format becomes obsolete, you are still able to make use of your digital object. Emulation is perhaps the most intriguing method, as this involves recreating or appropriating lost or obsolete technology in order to utilise the object in its current state – so the reel-to-reel example would perhaps involve making a new piece of kit which allowed you to play reel-to-reel tapes on a computer.

Anyway, on a related note, what this means is I can now focus on the Digitisation in HE Best Practice Wiki again – yay! You may have noticed this has faded into the background somewhat (you may not even know that this was what the blog was originally intended to document…) but that was because I’ve had a major deadline to get through, and because I knew this job was coming up and wanted to incorporate elements of each into the other if I got the role. I want to expand the Digitisation wiki to include all sorts and kinds of digitisation, rather than just scanning from print, and perhaps to disseminate the knowledge we gain as part of the LIFE-SHARE Project via this medium as well. Thank you to those of you who have expressed an interest in being part of the informal (and strictly email-based) Working Group to sort out how to populate the wiki – I’ll sort this out properly now (and any more volunteers please contact me…). Expect more wiki-news soon.

-  thewikiman