AI excitement is everywhere, but we need to talk about data

Nigel Shadbolt seated at the ODI Summit 2022

Fri Mar 17, 2023

The chair of the Open Data Institute, Professor Sir Nigel Shadbolt, recently gave evidence to Parliament on the governance of artificial intelligence. But what does AI have to do with data? At ODI Summit 2022, Sir Nigel discussed AI with Distinguished Professor Genevieve Bell, Director of the School of Cybernetics at the Australian National University. Much of what they covered is relevant to the evidence he gave the Committee.

In the first oral evidence session of the AI Governance Inquiry, the House of Commons Science and Technology Committee heard from one witness who said that in the past, ‘we didn’t really call it AI—it was just called maths, right?’

At the ODI we would swap out the word ‘maths’ for ‘data’. As our chair Sir Nigel Shadbolt told the Committee in February, ‘although we are talking a lot about AI, for the algorithms, their feedstock—their absolute requirement—is data.’ This means that, as the Open Data Institute has argued since its creation, ‘the need to build a trustworthy data ecosystem is paramount’.

AI systems, including the generative AI tools that have received so much attention recently (like ChatGPT, Google Bard, Stable Diffusion and DALL-E), not only require data – they are driven by it. As Sir Nigel told the select committee, and as the ODI’s new strategy also recognises, this data can be biased, which should prompt concern about its trustworthiness. As the select committee heard, such bias can come from the data itself, as well as from the design of algorithmic systems and from how (and by whom) they are implemented. Data assurance will also be vital in helping people assess how much they should trust data and the technologies and techniques that make use of it. New kinds of data institutions may also have a role in overseeing and governing models and the companies that use data.

The importance of openness

As Sir Nigel told the select committee, a data ecosystem that is as open as possible provides the best foundation (bearing in mind that some data on the spectrum needs to be protected). Openness helps with transparency, accountability and understanding where data might be biased – for example, if we know that an AI language model has been trained on social media, then ‘you can immediately imagine what we have got to worry about with that data landscape.’

But openness can also present opportunities and should not be seen as an obstacle. Sir Nigel cited two examples: Google’s original search technology, which harvested webpages containing links that humans had put there in the open; and HuggingFace, whose ‘entire business model is to collect these models together, make them available and have innovation happen around them’. It is also a ‘fascinating thing’ that so much development has been driven by open-source code.

Data literacy

Another area that came up at the select committee was the importance of data literacy. This goes beyond the technical skills needed to work with data; it is ‘the ability to think critically about data in different contexts and examine the impact of different approaches when collecting, using and sharing data and information.’ As Sir Nigel told the committee, ‘we will not all be data scientists on this stuff, but we should have some appreciation of how that landscape can work and operate effectively.’

Many regulators ‘are trying to gear up for this world of AI… trying to understand what technical skills they need and what they would need to put into the process to enable their job to be easier’. Policymakers, too, need to understand this world to make better decisions within it. But literacy is for everyone, and Sir Nigel called for this understanding to be built ‘at every level of society, and into our schools—about the new reality, what data is, what its inherent properties are, how it can work, the various prescriptions you need to put around it, the sensitivities and the insights you can get’. As our strategy notes, this broad data literacy will also be vital when government, industry and others communicate the advantages brought by new data-driven technologies, as well as any risks.

The power of politics

Although technological change, and its advantages and risks, can feel inevitable and predetermined, politics has a role in shaping it to benefit society. For example, Sir Nigel noted that there was some fatalism, a ‘counsel of despair’, about ‘black box’ algorithms whose inner workings will not be revealed to the world. He said:

We absolutely have to demand the process behind these systems and their outputs be made more transparent and more explicable. There is an active area of research in machine learning and deep neural networks that is seeking to build explainable AI systems, and what we should not do is imagine, “we’ve got what we’ve got, we will have to put up with it.”

Sir Nigel Shadbolt

Making such workings transparent also does not mean ‘that you have to throw your secret IP components over the fence’. There might be ways of giving a functional characterisation of how the system is working or performing against a set of benchmarks instead.

AI is sometimes characterised as a good servant but a poor master. We should remember that we humans, including politicians and policymakers, leaders in industry and civil society, as well as individuals and communities, are in a position to make technology serve us. Beyond guarding against risks, this offers real opportunities for better lives. As Sir Nigel told the select committee:

I think the proposition you want to drive with the things you put in place – your public services and the way these machines evolve their algorithms – to be at our service and support human flourishing in an age of AI. That is a phrase I like to use, “human flourishing” [it] is about values. It is about trying to be quite careful to keep revisiting to what extent those values of transparency, equity and access are being maintained.

Sir Nigel Shadbolt
