Challenges in using blockchains to build trust in digital archiving

[Image: A view of the server room at The National Archives]

Thu Oct 18, 2018

We've been researching whether blockchains could be part of the answer to ensuring that our public archives are trusted, unaltered and auditable. Our Technical Researcher Jared Robert Keller and Technical Associate Jez Higgins talk us through their discoveries so far.

Setting up a distributed ledger (a class of data infrastructure technology that includes blockchains) is no simple task.

In early 2018, our team at the ODI joined ARCHANGEL, a collaborative research project which aims to understand how a distributed ledger technology-based system might be used as a mechanism to verify that objects have not been altered or adapted while stored in digital archives.

In the months since we announced our partnership with the National Archives and the University of Surrey, the ARCHANGEL project has progressed a good deal, and along the way we have been confronted by a number of practical challenges about how to deploy, structure and govern the ARCHANGEL distributed ledger.

This was to be expected: our recent report on smart contracts advised that distributed ledger technologies (DLTs) can be deployed in a wide range of ways depending on the use case, industry context and, perhaps most importantly, the needs of the various parties involved.

We outlined our approach to this project in a blog post earlier this year, but the basic method is worth rehashing. By applying a hashing algorithm to an object – say, a born-digital spending report or a digital scan of a once-paper memorandum – we can create a hash of that object (in essence, a one-way, reproducible fingerprint of that object) which can then be stored in a transparent and distributed database.

Throughout the life of that digital object (which might be very long indeed, ranging anywhere from a few decades in the case of a government-mandated embargo all the way to the end of the universe) the system can ensure two things: first, that no unauthorised alterations have been made during its time in the digital archive, and second, that for any authorised alterations, there is a transparent, auditable trail.

If, for instance, 20 years on, someone wants to confirm that the contents of the object have not been altered, they can run the hashing process again and compare the resulting hash to the hash that was made when the object was first deposited in the archive. If the object is untouched, the two hashes will be identical; if the object has been altered, the hash will be different, and the object should be treated with suspicion.
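The check described above can be sketched in a few lines of Python. This is only an illustration, not the ARCHANGEL code: it assumes SHA-256 as the hashing algorithm, and the function names `fingerprint` and `is_unaltered` and the sample report text are invented for the example.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hash of an object's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unaltered(data: bytes, deposited_hash: str) -> bool:
    """Re-hash the object and compare with the hash stored at deposit."""
    return fingerprint(data) == deposited_hash


# At deposit time: hash the object and store the hash in the ledger.
original = b"Spending report 2018: total 1,204,000 GBP"
deposited_hash = fingerprint(original)

# Years later: an untouched copy matches, an altered copy does not.
tampered = b"Spending report 2018: total 1,214,000 GBP"
print(is_unaltered(original, deposited_hash))
print(is_unaltered(tampered, deposited_hash))
```

Changing a single character of the object produces a completely different hash, which is why a mismatch is a reliable sign that the object is not the one originally deposited.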

However, in this latter case, if the alteration is an approved alteration – as opposed to a malicious or accidental alteration – it will be possible for an archivist to add a record in the system that documents the changes made to that object. This is a crucial feature considering archives are not static institutions and sometimes have to make updates or alterations to the objects under their care – for instance, in the case of redactions to sensitive material. If, after years in an archive, an object needs to be altered, it will be possible to store a new hash of the altered version of that object alongside metadata detailing when the altered version was uploaded and by whom. It would even be possible to store metadata detailing the exact changes made, if so desired.
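One way to picture such a trail is as an append-only list of version records, each holding a hash plus metadata about who recorded it and when. The sketch below is a simplified, in-memory stand-in rather than a distributed ledger, and every name in it (`VersionRecord`, `AuditTrail`, the field names) is hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class VersionRecord:
    object_id: str
    sha256: str
    recorded_by: str
    recorded_at: datetime
    note: str  # e.g. a description of the changes made


class AuditTrail:
    """Append-only history of approved versions (illustrative only)."""

    def __init__(self) -> None:
        self._records: List[VersionRecord] = []

    def record_version(self, object_id: str, data: bytes,
                       recorded_by: str, note: str) -> VersionRecord:
        record = VersionRecord(
            object_id=object_id,
            sha256=hashlib.sha256(data).hexdigest(),
            recorded_by=recorded_by,
            recorded_at=datetime.now(timezone.utc),
            note=note,
        )
        self._records.append(record)  # records are added, never edited
        return record

    def history(self, object_id: str) -> List[VersionRecord]:
        return [r for r in self._records if r.object_id == object_id]

    def matches_known_version(self, object_id: str,
                              data: bytes) -> Optional[VersionRecord]:
        """Return the approved version this copy matches, if any."""
        digest = hashlib.sha256(data).hexdigest()
        for record in self.history(object_id):
            if record.sha256 == digest:
                return record
        return None


trail = AuditTrail()
trail.record_version("memo-1", b"full text", "archivist_a", "initial deposit")
trail.record_version("memo-1", b"[REDACTED] text", "archivist_b", "redaction")
match = trail.matches_known_version("memo-1", b"[REDACTED] text")
print(match.note if match else "no approved version matches")
```

A copy that matches no recorded version is either corrupted or altered without approval; a copy that matches a later record can be traced back through the history to the original deposit.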

Transparency and control

One of the first requirements we had for the system was that it should be capable of balancing a need for control with a need for transparency. A major goal of the project, after all, is to use the transparent, immutable nature of distributed ledgers to engender trust between Archives and Memory Institutions (AMIs) and the people who rely on them – eg researchers, historians, journalists, the public.

If the system is to prove useful, it will need to be:

publicly readable, to allow citizens to openly verify that objects released from digital archives have not been tampered with
AND write-restricted, so that AMIs can control who is allowed to write new information into the ledger and prevent unauthorised parties from writing fraudulent information to it.

A private ledger – where only certain authorised users have access to the database – would fail on the former condition (public readability), while a public ledger – where anyone may have a copy of the database and anyone may write to it – would fail on the latter (control over who is allowed to write new information).

Our implementation will instead use a ‘permissioned ledger’, thereby allowing anyone to access a copy of the database, but ensuring that only certain authorised parties may update it; the ideal balance, we think, of transparency and control.
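The access rule itself, anyone may read but only authorised parties may write, can be modelled very simply. The following is a toy model under loud assumptions: a real permissioned ledger would rely on cryptographic identities and a consensus protocol rather than a list of names, and the class name `PermissionedLedger` and the writer names are hypothetical.

```python
from typing import Iterable, List, Tuple


class PermissionedLedger:
    """Toy model: open reads, restricted writes (illustrative only)."""

    def __init__(self, authorised_writers: Iterable[str]) -> None:
        self._writers = frozenset(authorised_writers)
        self._entries: List[Tuple[str, str]] = []

    def append(self, writer: str, entry: str) -> None:
        # Only parties on the authorised list may add to the ledger.
        if writer not in self._writers:
            raise PermissionError(f"{writer} is not authorised to write")
        self._entries.append((writer, entry))

    def read(self) -> Tuple[Tuple[str, str], ...]:
        # Anyone may read; an immutable copy is returned so that
        # readers cannot modify the ledger through it.
        return tuple(self._entries)


ledger = PermissionedLedger({"national_archives"})
ledger.append("national_archives", "sha256-of-memo-1")

try:
    ledger.append("unknown_party", "fraudulent-hash")
except PermissionError as err:
    print(err)

print(ledger.read())
```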

By enabling members of the public to openly verify the unadulterated status of objects released from an archive, we believe we can help ensure that AMIs remain trusted custodians of what is becoming an increasingly digital public record.

In previous decades, when archives primarily dealt with physical objects, this trust was underpinned largely by the reputation and prominence of the archival institution. Going forward, as archives are increasingly asked to safeguard born-digital objects, we believe technology, in particular distributed ledgers, will have a role to play as well. Technology will never be capable of completely replacing all other forms of trust, but we believe technologies like distributed ledgers may be able to augment those traditional pillars of trust, and in so doing, help bring transparency to archival processes that have in the past been conducted largely out of sight.

Indexable, searchable and verifiable

Another requirement for the system was that metadata about individual objects – eg notes from the depositing archivist or the date of deposition – should be searchable and thereby indexable and cross-referenceable. This meant that the ledger record holding each object’s hash would also have to include metadata about that object and, furthermore, that the metadata would need to remain searchable while stored in the distributed ledger.

Our implementation will use smart contracts – essentially pieces of executable computer code stored on a distributed ledger that can automatically modify data stored on that ledger if and when certain conditions are met – to search, index, and verify objects within the system.
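To give a feel for the kind of logic such a contract might hold, here is a stand-in written in Python rather than in an Ethereum contract language. It is a sketch of the idea only: the class `ArchiveRegistry`, its method names and the metadata fields are all hypothetical and do not describe the project's actual contracts.

```python
from typing import Dict, List


class ArchiveRegistry:
    """Registry keyed by object hash, with searchable metadata."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, Dict[str, str]] = {}

    def register(self, digest: str, metadata: Dict[str, str]) -> None:
        # Refuse to overwrite an existing entry, so that records
        # cannot be silently replaced.
        if digest in self._by_hash:
            raise ValueError("hash already registered")
        self._by_hash[digest] = dict(metadata)

    def verify(self, digest: str) -> bool:
        """True if this hash was registered by an archive."""
        return digest in self._by_hash

    def search(self, field: str, value: str) -> List[str]:
        """Return the hashes whose metadata field equals the value."""
        return sorted(
            digest
            for digest, metadata in self._by_hash.items()
            if metadata.get(field) == value
        )


registry = ArchiveRegistry()
registry.register("aaa111", {"archivist": "J. Smith", "deposited": "2018-10-18"})
registry.register("bbb222", {"archivist": "J. Smith", "deposited": "2018-11-02"})

print(registry.verify("aaa111"))
print(registry.search("archivist", "J. Smith"))
```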

In light of the crucial role of smart contracts within our system, we chose to create our network using Ethereum – one of the most convenient and easily accessible platforms for executing smart contracts. This does not mean the final implementation will necessarily use Ethereum, however. As the project progresses we may explore using other platforms, such as Hyperledger Fabric.

A cross-AMI model

Finally, blockchain technologies require collaboration, so while DLTs undoubtedly offer trust-related benefits for individual archives or heritage institutions, we believe a distributed ledger system such as ARCHANGEL can benefit the archival community as a whole if AMIs are willing and able to collaborate with each other on a national and international level.

A quick example helps to demonstrate how: although it would be technically feasible to construct a permissioned network wherein all the authorised nodes exist within a single nation or jurisdiction, such a system would be vulnerable to manipulation since a single government or organisation could compel member nodes to make fraudulent changes to the ledger.

On the other hand, a network with member nodes drawn from a wide range of different disciplines and countries would be able to resist similar attempts at manipulation. AMIs would be able to mutually reinforce the integrity of the objects in each other’s archives, further engendering trust between AMIs and the citizens who rely on them. Rather than being asked to trust a single archival or memory institution, citizens would be able to trust the hundreds or even thousands of AMIs within the network.
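The intuition can be shown with a simple majority check across independent nodes. This is a deliberately simplified assumption for illustration: real ledgers use more elaborate consensus protocols, and the function `agreed_hash`, the node names and the majority threshold are all hypothetical.

```python
from collections import Counter
from typing import Dict, Optional


def agreed_hash(reports: Dict[str, str], quorum: float = 0.5) -> Optional[str]:
    """Return the hash reported by more than `quorum` of nodes, if any.

    `reports` maps a node name to the hash that node holds for an object.
    """
    if not reports:
        return None
    digest, votes = Counter(reports.values()).most_common(1)[0]
    return digest if votes > quorum * len(reports) else None


# One coerced node among four cannot change the agreed record.
reports = {
    "ami_uk": "abc123",
    "ami_fr": "abc123",
    "ami_ca": "abc123",
    "ami_coerced": "fff000",
}
print(agreed_hash(reports))

# With no clear majority, no hash is accepted.
print(agreed_hash({"ami_a": "abc123", "ami_b": "fff000"}))
```

If every node sat within one jurisdiction, a single authority could in principle coerce a majority of them at once; spreading nodes across countries and disciplines makes that far harder.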

Though collaboration on this level would require institutional and organisational changes in archival processes, AMIs have a long history of working collaboratively across borders, and collaborating in this way might even make possible new business models for AMIs – something our project partners are currently exploring.

Future plans

Over the next few months we plan to progress the project in a number of practical ways, including:

moving on from standard binary hashing – eg SHA-256 – to specialist hashing for particular object types such as PDFs, or even images and video
developing a publicly accessible demo
and iterating on a user interface that will meet the needs of archivists and citizens alike.

We will even explore something we’ve taken to calling the ‘edge of integrity’. Now there’s something to look forward to.

If you wish to get involved or find out more, email [email protected].

Image credit: CC BY 3.0 by The National Archives (UK)
