Data Decade - Crimson

Data Decade: Saving lives with data

Thu Nov 3, 2022

As part of the Data Decade, we are exploring how data surrounds and shapes our world through 10 stories from different data perspectives. The seventh, Saving lives with data, explores how better data sharing and transparency in the health sector can drive innovation and save lives.


By Professor Ben Goldacre, Director of the Bennett Institute for Applied Data Science, and Lisa Allen, Director of Data and Services at the ODI

As part of the Data Decade, at the Open Data Institute (ODI), we are exploring how data surrounds and shapes our world through 10 stories from different data perspectives. The seventh, Saving lives with data, explores how better data sharing and transparency in the health sector can drive innovation and save lives.


You can also listen and subscribe to the ODI’s podcast on your preferred platform: Apple Podcasts | Spotify

Emma Thwaites: Hello and welcome to Data Decade, the podcast from the ODI. I’m Emma Thwaites, and in this series we look at the last 10 years of data, the next decade ahead, and the transformational possibilities for data in the future. So far, we’ve looked at a range of topics from data in arts and culture, and how data is shaping our cities to trust and misinformation, and the relationship between data and technology.


But today we’re looking at how data can help save lives. The recent pandemic has shown how the effective use of data can help to fight diseases on a global scale, enabling clinicians, governments, epidemiologists, and pharmaceutical companies to track emerging variants and design strategies, vaccines and medicines more effectively.


The collection and use of data in health research and clinical care has developed significantly in the last 10 years and holds enormous promise for the decade ahead. But what are the risks and the challenges of doing so? Can better data sharing, innovation and transparency in the health sector save lives?


Let’s find out more. Welcome to Data Decade. 


Emma Thwaites: So this is going to be a great episode. Over the next 30 minutes, we’re going to look at how data can help to save lives. Joining me to talk about this are Ben Goldacre, Director of the Bennett Institute for Applied Data Science. Hello, Ben. 


Ben Goldacre: Hey. Hello.


Emma Thwaites: And Lisa Allen, the Director of Data and Services at the ODI. Hello, Lisa. 


Lisa Allen: Hello.


Emma Thwaites: It’s great to have you with us. So Ben, I’m gonna come to you first. You are a doctor, an academic and broadcaster, and a lot of the work that you do at the Bennett Institute depends on open data. Can you tell us a bit more about that? 


Ben Goldacre: Yeah. So we are quite unusual in the academic world in that we are a truly mixed team, first of all, of software developers working very closely alongside traditional academic researchers and clinicians.


And that means that we’ve got software developers who know about how electronic health records work, what epidemiological research looks like. And they can be a core part of the creative intellectual team developing services and tools as well as research papers. And we’ve also got researchers who know how to do a pull request on GitHub. They know how to be non-infuriating colleagues and collaborators to software developers.


And that’s really critical because really, if you’re doing anything with data, you’re doing it by writing code. And if you’re doing that, you want to have people who are professionals at writing code. That means software developers. And then the second thing that’s a bit different about us is that we produce academic research papers and we get them in the glamorous journals like Nature and BMJ and Lancet and all of the rest. But we’re also interested in taking large datasets and turning them into tools and services to directly improve healthcare or to help other researchers do their research. 


And we started off with open data, in particular because we could see some open opportunities to do good with datasets that were just sort of sitting around waiting to be exploited. But also, secondly, open data allowed us to start innovating for this reasonably new way of working without having to get slowed down with all of the governance that you have to have around very disclosive patient data in healthcare.


Emma Thwaites: Now that’s the really interesting point because I think a lot of the stuff that we hear about health data is related to this issue of personal data and the very sensitive matter of, you know, disclosure.


I’m gonna come to you Lisa now because I want to unpick some of that stuff around trust and sharing data and how challenging that can be when it comes to the health sector in particular. What’s your take on all of that?


Lisa Allen: Yeah, sharing data is one of those areas where actually you can get really big benefits. So like if you look at it in the UK, it’s estimated that you could generate more social and economic benefits. It’s between 1 and 2.5% of GDP actually from the public sector and private sector combined. And if you scale that to the 20 largest economies, and we did some research that looked at that in 2019, they estimated that data sharing could unlock between 700 billion and 1.75 trillion in value globally.


But that unlocking really relies on trustworthy data that’s flowing in well-governed ways around the ecosystem. So it’s drilling into what some of those things you were just talking about there, about that trust. So we’re doing some work on data assurance and we want to explore what does trust look like.


So far we’ve been looking at the organisation itself and that looks at the data practices, making sure it’s well-governed, they’re doing things they say they should be doing, you know, all those things that build up the trust in an organisation. And then it’s actually looking at the data itself.


So we are gonna be going out this Autumn actually, and doing some more user research to build and identify those areas that really people care about to make sure that we can build up this trust to overcome those barriers to data sharing, which often get in the way of the brilliant outcomes that you can deliver.


Emma Thwaites: Because you do have to have the confidence of the public, of individuals, around this, don’t you?


I mean, people still remember the abandoned plan for the NHS patient records system, and you know, of course, as we’ve said, patient data is a particularly knotty issue. Ben, how can we persuade patients that their data is safe?


Ben Goldacre: Well, I think the right way to get public trust is to earn it by putting in appropriate safety measures to prove to the public that you’re protecting their data.


And I think what’s gone wrong in the past, and we talk about this a lot in the Goldacre Review that I did for Secretary of State on how to get better, broader, safer use of health data – available online in all good internet browsers – in that we talked about how there has been an excessive reliance in the past on, first of all, pseudonymization of data.


So the idea has always been you take people’s names and addresses off data, and that is somehow a meaningful protection of patients’ privacy. Now, anybody who works in security research knows that that’s simply not the case. It’s often very straightforward to reidentify somebody in a detailed health data set, even when their name and address is taken off, with information that you already know about them.


For example, if you know when they had their children, when they moved from the London area to the Oxford area, a commonly used example is with Tony Blair, if you know that he had his abnormal heart rhythm reset in, I think, March of his second year as Prime Minister, and then again in October of his second year as Prime Minister. That was in all of the national newspapers. 


If you search for a man of his age who had that particular procedure in those two particular one week windows in the London area in that year, then you’ll probably get a unique match for Tony Blair’s health record, and you’ll be able to see everything else in his health record.
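The re-identification risk described here can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the records are entirely synthetic, and the `Event` fields and the `candidates` helper are hypothetical names, not part of any real health dataset or tool. It shows how two publicly known facts (the same procedure in two separate one-week windows) can narrow a pseudonymised extract down to a single pseudonym.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """One row of a pseudonymised extract: no name or address, only an opaque ID."""
    pseudo_id: str
    sex: str
    birth_year: int
    region: str
    procedure: str
    event_date: date


def candidates(events, sex, birth_year, region, procedure, window_start, window_end):
    """Return the pseudo IDs that have a matching event inside the date window."""
    return {
        e.pseudo_id
        for e in events
        if e.sex == sex
        and e.birth_year == birth_year
        and e.region == region
        and e.procedure == procedure
        and window_start <= e.event_date <= window_end
    }


# Entirely synthetic records, invented for this illustration.
records = [
    Event("p1", "M", 1950, "London", "cardioversion", date(2001, 3, 12)),
    Event("p1", "M", 1950, "London", "cardioversion", date(2001, 10, 9)),
    Event("p2", "M", 1950, "London", "cardioversion", date(2001, 3, 14)),
    Event("p3", "M", 1950, "London", "cardioversion", date(2001, 10, 8)),
    Event("p4", "F", 1950, "London", "cardioversion", date(2001, 3, 13)),
]

# Two publicly known facts: the same procedure in two separate one-week windows.
first = candidates(records, "M", 1950, "London", "cardioversion",
                   date(2001, 3, 10), date(2001, 3, 16))
second = candidates(records, "M", 1950, "London", "cardioversion",
                    date(2001, 10, 5), date(2001, 10, 11))

# Intersecting the two candidate sets leaves only pseudonyms matching both facts.
matches = first & second
```

Each window on its own leaves two candidates in this toy data; the intersection leaves one. Once a pseudonym is singled out this way, every other row carrying that ID is exposed, which is why removing names and addresses alone is a weak safeguard.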


So in the past, first of all, people have overstated the benefits from pseudonymization in terms of preserving patients’ privacy. But at the same time, people working in the system doing information governance have known that taking names and addresses off and then disseminating data out to multiple different locations is not a sufficient privacy mitigation.


And so because of that, we’ve had to have very slow and laborious administrative and governance processes to add on to that a layer of, essentially, trustworthiness in individuals. So you take many, many years often to evaluate whether somebody is trustworthy enough to have the data, and then you get them to sign contracts – which some people would have faith in, some people have concerns about – where they promise that they’ll look after the data very carefully.


Now that approach to managing privacy has two problems. First of all, it’s not an absolute protection of privacy. And secondly, that second step of governance, to manage the privacy problems with pseudonymization and data dissemination, that information governance palaver is absolutely soul destroying for researchers and for analysts in the NHS community and for innovators in commercial environments. We were told during the review of multiple occasions where people would wait years to get access to data, where whole PhDs were derailed because the money ran out before they got their single cut of a dataset that they needed to do that project.


Now the answer to this is not to say, “Come on, hurry up, just become more liberal about where you send the data.” The answer is instead to build in appropriate technical mitigations to protect patient privacy. And the most commonly used of these is to hold the data in a trusted research environment. 


So instead of sending the data out to hundreds or thousands of different people’s own computers, where they can work on it in their own environment but without clear oversight, the detailed granular health data stays put in one secure environment and the analysts come into that environment. They work inside that secure setting where there are additional privacy safeguards, but also logs and audits to make sure that everybody can see everything that’s happening with the data.


And we know from citizens juries, and we also know from the extent to which privacy campaigners and researchers are reassured by these moves. We know that these moves, that using trusted research environments, earns public trust. And so in my review for Secretary of State, we recommend that these TREs are used much more widely, that they become the default for any very large disclosive datasets like for example, putting everybody’s GP records in one place. 


And that’s been, I’m very pleased to say, adopted in the NHS 2022 data strategy, which is called Data Saves Lives. So I think we’re now at the beginning of a great new era where we’ll be able to get many more people working with this kind of disclosive data in much more safe ways.


And when you’re working in safer ways, that means that you can hugely expand the number of people who are working with data. And that means you get much more innovative work. So lastly, the benefits of trusted research environments go beyond just managing privacy risks around pseudonymized data, because trusted research environments are also much better for efficiency and modern open collaborative approaches to working with large datasets.


So the current way of working where you send slightly different copies of the same data out to hundreds or thousands of different locations is very, very duplicative. You duplicate the risks. You duplicate all of the governance work you have to do. You duplicate all of the data management costs. You also duplicate all of the work around data curation because people don’t work on the raw health data in their final graphs or their final statistical regressions.


They boil down the very detailed data about every prescription, every diagnostic code, every referral letter. They boil that down into higher level variables like “this is a child with asthma.” And that’s about 80% of the work in any project using NHS electronic health records. And when everybody works with their own copy of the data on their own machine, everybody does that data curation in a real hotchpotch of different ways.


When people are working in shared environments, then it’s much easier for them to take a shared and collaborative approach to those common tasks. So, trusted research environments are therefore also much, much better for efficient and reproducible ways of working with data based on, ideally, open code.
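The curation step described above, boiling raw coded events down to a higher-level variable such as "this is a child with asthma", can be sketched as one shared function that every analyst reuses instead of re-deriving it in their own way. This is only a sketch under stated assumptions: the codes in `ASTHMA_CODES` are placeholders rather than real clinical codes, the under-18 cut-off is an assumed definition of "child", and the function names are hypothetical.

```python
from datetime import date

# Hypothetical shared codelist: placeholder strings, not real clinical codes.
ASTHMA_CODES = {"ASTHMA_DX_1", "ASTHMA_DX_2"}

# Assumption for this sketch: "child" means under 18 on the index date.
CHILD_AGE_LIMIT = 18


def age_on(birth_date, on_date):
    """Age in whole years on a given date."""
    years = on_date.year - birth_date.year
    if (on_date.month, on_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def is_child_with_asthma(birth_date, coded_events, index_date, codelist=ASTHMA_CODES):
    """Derive the higher-level variable from raw (code, event_date) pairs.

    True when the patient has at least one code from the codelist recorded
    on or before the index date and is under the age limit on that date.
    """
    has_diagnosis = any(
        code in codelist and when <= index_date for code, when in coded_events
    )
    return has_diagnosis and age_on(birth_date, index_date) < CHILD_AGE_LIMIT


# Synthetic example patient, invented for illustration.
events = [("ASTHMA_DX_1", date(2018, 2, 1)), ("OTHER_CODE", date(2019, 5, 3))]
flag = is_child_with_asthma(date(2010, 6, 1), events, date(2020, 1, 1))
```

Keeping the codelist and the derivation in one open, shared place is what makes the variable mean the same thing across projects.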


Emma Thwaites: That’s really fascinating. And actually I was gonna come to Lisa at this point because I guess we could say that a trusted research environment is like one species of the thing that we at the Open Data Institute might call a data institution. Lisa, what are your thoughts around what Ben’s just told us about?


Lisa Allen: Yeah, well I definitely agree. I think, you know, the days have gone of inefficiently moving data, you know, lifting and shifting it and copying it to different environments. It’s just not workable anymore, especially with the volume of data. So I think that secure access that Ben was pointing out there is key.


So, you know, a data institution is one model where people can gather around a common cause. But I think that also means we need data architects to design this secure storage and that trusted sharing across organisations. It also means we need a level of maturity for those organisations taking part with the data to make sure they’ve got all their controls and governance in place, to make sure they understand that.


And the other point I think it’s really good to pick up is about the open code and open data. Cause I think sometimes – and in your report, didn’t you, you highlighted Ben, that open code is different from open data.


Ben Goldacre: Yep.


Lisa Allen: And yeah, it’s reasonable for NHS and governments to do some analysis discreetly without showing all the results in real time. And I think people don’t really see or understand that data exists on a spectrum. 


So at the ODI we’ve obviously got the ODI Data Spectrum showing, you know, from closed to shared to open. And data can sit at different points on there depending on what’s in the data. And obviously with health, for that personal data, it means it is more likely to be on that side of shared or closed depending what’s in there.


So it does put limitations which need to be understood when we are designing these environments.


Emma Thwaites: I just wanna stick with the Goldacre Review for just one more second, because something that you mentioned in your last answer, Ben, really struck a chord with me about how these trusted research environments kind of engender patient or citizen trust in the way that their information about them is being handled. 


You also, one of your recommendations in the review was for a frank public conversation about the commercial use of NHS data for innovation. And I just want to talk about for a second, about that kind of, you know, the role that the public plays and how we have more of an open dialogue with people about the benefits, I guess, of using patient data to improve health outcomes.


Ben Goldacre: So I think trusted research environments are absolutely key here, principally because they allow us to separate out two very different issues. One is privacy, and the other is the more broad ethical or political or cultural question of are you okay with commercial entities turning a profit from access to NHS patient data? 


So, at the moment where the offer tends to be, “Would you like this large company to have access to your data?”, the response tends to be, “Well, they’re selling your health data and it’s dreadful ‘cause commercial companies are benefiting and they can see all of your medical secrets.” 


Now, those are two very separate issues and they can be disaggregated by using trusted research environments. If you can give commercial companies access to data in a trusted environment where you can give absolutely rock solid proof to patients that their privacy, that their medical records are being kept absolutely confidential, then you are left with a completely separate conversation: “What do you think about commercial innovation using NHS data?”


Now, personally, I’m not entirely sure what I think, but I think I think that it’s quite a good thing. I’m actually quite cautious, and I have written very critically about bad behaviour by some parts of the commercial sector with data, but overall, there is no medicine without medicines.


You need commercial companies who develop new treatments to come in and innovate using data. I’m also very sceptical about making ethical adjudications about whether a particular research project or analysis is right or wrong on the basis of the characteristics of the person asking, rather than the characteristics of the research question or the analysis itself.


But most importantly, I don’t think it should be my choice. It’s not my decision. This is a broader public decision, and I think that the state – and it is the state in its sort of great amorphous whole, rather than the NHS who should be making these choices. I think the state has shied away from having these conversations because the ethical and political questions around commercial access to data have always previously been lashed onto the privacy problems around handing out access to data.


And I think with trusted research environments, we can move forward from that. Although, you know, there’s a big caveat, it shouldn’t be my choice. I am not the most interesting person in the room on this topic. But personally I think probably it should go that government gives access to health records on the basis that the state gets a cut of any benefits that come from secondary uses of that health data in a commercial setting.


And I think that’s important because otherwise it’s kind of invisible, the background work that went into collecting this data. If you look at GP data, for example, in the UK we have an unparalleled resource.


I mean, GP data contains a trace about every single clinical activity and outcome, pretty much, of any significance in every citizen’s life, going back many decades: every prescription, every diagnosis, every referral letter, every blood test and its result, every imaging request that comes from primary care. It is a phenomenal resource.


It is also globally unparalleled because although other countries sometimes have similarly detailed data, it’s really only the UK that has got data on such a large and ethnically-diverse population. So it’s another interesting example really of where our diversity in the UK is an enormous strength. Now, because of that, our health data should be regarded as not so much an asset as a resource of just profound importance to the whole of humanity, right? 


This contains signals and secrets yet to be unlocked that could save lives on a biblical scale around the world. And so I think it is our duty to find ways to make that more accessible to people. But that data didn’t arrive for free. That data was collected by millions of healthcare professionals all typing away into dreadful terminals and outdated Windows XP machines on terrible office furniture in GP practices and hospitals around the country. But it also comes from patients sharing their secrets, sharing their most confidential medical information. 


And the cost of producing that data is not the marginal cost of putting it on a USB stick to give it to somebody, or even the marginal cost of provisioning a trusted research environment for somebody to access it.


The cost of sharing that data, of making it accessible, is to a greater or lesser extent the cost of collecting it in the first place, and also the shared cost of managing it, curating it, and understanding it. And so we should be taking a cut back of that, but we should also, I think, be making it available for innovation.


Emma Thwaites: I absolutely love that idea that, you know, that we have the potential within these datasets to discover new medicines, to discover new public health interventions that could have, you know, an impact on a biblical scale as you described it. And I wonder what role proving impact has in the public dialogue, in the discourse around data sharing? 


So of course we saw during the pandemic how sharing data on vaccines, for example, affects decisions made by healthcare providers, policy makers, and the public. And I know you have a tool called Trials Tracker, and I’d quite like to move on to that and why is it so important, for example, that the outcomes of trials are reported? Will you say something about that?


Ben Goldacre: Yeah, so I think this is another interesting example, first of all, of how if you want public trust, you need to earn it through concrete action rather than asserting that you are trustworthy. And actually, just to close off the last part of the discussion, I think one of the things that has gone wrong in public discussions around patient data is that people have relied too much on thinking, “Oh, we just need to tell patients how brilliant the work that we do with their data is.” And I think overall, people appreciate that we do great work with data.


The thing that’s missing is a good description of the privacy mitigations. Anyway, Trials Tracker is a project that we’ve built over the past, I guess, half decade. It started actually with the AllTrials campaign, which began in 2012, 2013.


And this was to address a problem that wasn’t really sufficiently being talked about. So, in medicine we do enormous clinical trials at tremendous financial expense. And we do these big randomised trials comparing one treatment against another in order to find out which treatments work best in which patients and which treatments have adverse effects in which patients. 


They are the gold standard. They are the most fair test of which treatment works best. But there’s a problem, which is that at the end of the process of having run that clinical trial, it’s actually very common to find that the results of clinical trials are left unreported. Now, the problem with that is compounded by the fact that overall, when you look at the research, first of all, what you find – and this is as things stood in, you know, 2000, 2005, 2010 – around half of all the trials that are conducted and completed, when you go and look for results, you can’t find them.


And furthermore, it was very clear that results that are more flattering to the sponsor’s preferences – and that could be commercial sponsor, but it may also be, you know, a particular surgical intervention that a surgeon or a community of surgeons is keen on, or a talking treatment in depression that a community of therapists is particularly keen on. Results that are regarded as positive, are much more likely to be disseminated than results that are regarded as negative. 


So that means that after you invest all of that cash up front to eradicate bias from the literature, to get a good, fair test of what works best, you allow all those biases to flood back in at the final furlong, the final part of the process where the results are actually disseminated.


Now this was very well recognised going back to, you know, first talked about in the 1980s and then in particular the 1990s, and people began to set up things called clinical trial registries. And the idea here was one that will be very familiar to followers of Open Data Institute-type work. The idea was this was sunlight to sort of drive transparency.


So everybody who does a trial has to register it on a public registry. Everybody can see who’s doing it, what they’re doing, who the patients are, what treatment A looks like, what treatment B looks like, what are the outcomes they’re gonna measure at what time points, and so on. And everybody hoped, I think perhaps in retrospect naively, that that would be enough. Because people would know which clinical trials had happened. Then they would know which trials had or hadn’t reported their results.


And that would be sufficient public or professional pressure to address the problem of clinical trial non-reporting. But that didn’t come to pass. And so first of all, with the AllTrials campaign in 2013, we started basically sort of civic activism, but in a rather technical space, calling people to sign up to three premises.


One is all trials must be registered, because we know that registration still isn’t always done. Secondly, that the summary results should be reported within 12 months of completion. And thirdly, that if a clinical study report, which is a very detailed technical document, was created for commercial purposes on the trial, then that should be made publicly available.


Now, that was a big collaborative project across me, the British Medical Journal, Sense About Science, who are a fantastic science campaigning outfit. And then also the Cochrane Collaboration, who are a global collaboration of around 13,000 academics producing systematic reviews, so gold standard summaries of all the clinical trial results on a given topic. So a community doing good work who were also blocked by non-reporting of clinical trial results. 


And very, very quickly it snowballed. So it got to the point where we have 90,000 individuals signed up to support those three premises.


And I think last time I looked, maybe seven or 800 individual organisations including MRC, the German government’s cost effectiveness adjudicator, drug companies like GSK signed up very early on, which was really fantastic to see. Lots of commercial entities, public health bodies, royal colleges, and all of the rest.


So that’s nice and that brings legitimacy and it helps push things forward. But we wanted to do something driven by data. And in particular we wanted to do essentially audit and feedback, but we wanted to do audit and feedback that wasn’t just – I’m not personally a big fan of naming and shaming. I think audit and feedback should be about giving people informative and actionable information that they can use to improve their performance.


So in the past, people would often do these big audits where it was a one-off process. It was all done very manually. So you’d take a list of trials you know had finished, and then some poor research assistant would go off and manually look in all the indexes to try and find the results. These things would get published with a bit of a flourish.


Normally, they wouldn’t identify which institutions or companies were particularly good or bad. But where they did, because it was a one-off project, it was a static ranking and they’d get a bit of news coverage for that static ranking that says that one particular company or one particular university is doing very badly.


But the problem with those one-off static rankings is that the mindset of the institution being criticised when presented with that kind of press release is, “Okay, we’ve got to rubbish this. We’ve gotta just survive it, get through the day. We’ve got to just hope that everybody will forget it and then they’ll move on.”


What we wanted to do was create a live audit and feedback system, driven by automated processes to look at all the trials that have finished and then look at their results and match the results onto the completed trials and produce metrics that could update on a daily basis or at least a monthly basis.


And the idea there was that if a company or an organisation like a university gets metrics saying, “You are not doing very well on this,” if they want to fight that off, if they want to mend their reputation, then the quickest and easiest way for them to do that is by reporting their clinical trial results.
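The live audit-and-feedback idea can be sketched as a small recomputable metric. This is a minimal sketch under stated assumptions, not the Trials Tracker implementation: the `Trial` fields and `sponsor_summary` function are hypothetical, the sample trials are synthetic, and the reporting window reuses the "within 12 months of completion" premise mentioned earlier, approximated here as 365 days. A trial reported late still counts as reported in this simplified version.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumption: "12 months" is approximated as 365 days for this sketch.
REPORTING_WINDOW = timedelta(days=365)


@dataclass(frozen=True)
class Trial:
    trial_id: str
    sponsor: str
    completed: date
    results_posted: Optional[date] = None


def is_due(trial, today):
    """A trial is due once the reporting window after completion has elapsed."""
    return trial.completed + REPORTING_WINDOW <= today


def sponsor_summary(trials, today):
    """Per sponsor: how many trials are due, how many reported, which are overdue."""
    summary = {}
    for t in trials:
        if not is_due(t, today):
            continue
        row = summary.setdefault(t.sponsor, {"due": 0, "reported": 0, "overdue": []})
        row["due"] += 1
        if t.results_posted is not None and t.results_posted <= today:
            row["reported"] += 1
        else:
            # Naming the individual trial is what makes the feedback actionable.
            row["overdue"].append(t.trial_id)
    for row in summary.values():
        row["percent_reported"] = round(100 * row["reported"] / row["due"], 1)
    return summary


# Synthetic trials, invented for illustration.
trials = [
    Trial("T1", "Uni A", date(2019, 1, 1), date(2019, 6, 1)),
    Trial("T2", "Uni A", date(2019, 2, 1)),
    Trial("T3", "Uni A", date(2022, 9, 1)),  # completed recently, not yet due
    Trial("T4", "Co B", date(2020, 5, 1), date(2021, 1, 10)),
]

report = sponsor_summary(trials, today=date(2022, 11, 3))
```

Because the summary is recomputed from the registry data on each run, a sponsor that posts results for an overdue trial sees its figure improve at the next update, which is the incentive described above.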


So that’s exactly what we built with the Trials Tracker service. And there are two big projects that you can still see running today. One tracks all clinical trials since 2000-and-something that were conducted in the European Union. And then there’s another that looks at every clinical trial that completed over the last five years since the American government brought in some very robust legislation. And those results update once a month for the EU trials tracker, once a day or once every working day for the FDA.


And it’s been a tremendous success in the sense that, first of all, I think people often naively think you do that sort of thing and drug companies will sort of, you know, come and try and slip a bomb under your car or something.


Actually what happens is their compliance teams get in touch and they say “Thanks a lot, this is really helpful. We’re gonna go and make sure that we are doing brilliantly.” They come at you and they say, “Hey, there’s a bunch of trials where we think we’ve reported them, and actually the clinical trials registries themselves aren’t updating their data properly. Can you help?” And that’s really great because you want the source data to be accurate too, so you join forces with people from global drug companies’ compliance teams to go and have a crack at trying to get the EU clinical trial registry people to change their processes around updating data. 


But you also start getting emails from universities saying, “Hey look, this is really helpful in particular because I can take this to the dean of my medical school and I can say, look, ‘you’ve tasked me with getting these clinical trials reported here on this website. They’re showing that we’ve got five trials outstanding from four years ago.’” And on our Trials Tracker we are careful to give people feedback about which individual clinical trials are overdue, which individual clinical trials need to be reported. So that’s actionable information ’cause they can go and find the principal investigators on that trial and say, “Look, you are making our university look bad. Please can you help us report?”


And then the last thing that they like is we can give them a heads up. We can show them the list of all of the trials. They can see it for themselves on the open internet. They can see all of the trials that are just about to breach so they can do anticipatory work. So that’s been a great success.


So then the next thing that happened, actually incredibly heartwarming. Science and Tech Select Committee said they were gonna do a big project looking at clinical trial reporting. They took our data and they sent an email out to every university in the country, the ones that were doing well, and the ones that were doing badly.


And they said, “Work from the Data Lab” – as we were called then before we were the Bennett Institute – “shows that you are doing well/badly at reporting all your clinical trials. Please be advised that in six months time we’ll be looking at this again. We’ll be inviting all of the people who have done the best at improving their clinical trial reporting and those who have not improved their clinical trial reporting. We’ll be inviting you to give live oral evidence about your performance on these metrics.”


Now, that’s a sort of mixed thing for someone like me. At the time, I didn’t have a permanent university post. A friend of mine who was on Medical Schools Council sent me a text the next day after these emails went out saying, “I was at Medical Schools Council yesterday. Your name was mud.” 


But, what happened over the next year is that British universities’ clinical trial reporting went from being pretty average in the European context – just over half of all clinical trials reported in compliance with the EU requirements – went up to top eighties, and now I think just over 90%.


Emma Thwaites: Wow. 


Ben Goldacre: So British universities are head and shoulders ahead of every single other country in Europe at reporting their clinical trials appropriately. Now, I think that’s incredibly important because we’ve already talked about our data and how our data can be used. And in particular, we could use our data to do clinical trials much more efficiently.


But clinical trials, again, are a trust business. And if you are a destination where, number one, it’s possible to do trials cheaply. And number two, everybody knows that they can trust the results of clinical trials that come out of the UK because you know that they have the best score by a country mile on the core transparency and trust metric of reporting your clinical trials.


Then I think that’s valuable, not just for people who want to be sanctimonious about good open science. It’s also a core transparency metric, a core trust metric, and therefore a core part of our offer as UK PLC.


Emma Thwaites: I don’t think I’ve ever heard a better case study to demonstrate the transformational power of data than that one.


So thank you for that, Ben. Fascinating. So we’ve looked back over the last 10 years, and beyond actually. But I always ask our guests a final question at the end of the podcast, and I’m really looking forward to the answers to this one. So I’m going to come to you, Lisa Allen, first. Can you gaze into my crystal ball and ask what the future holds for data and health?


Lisa Allen: So for me, I think it’s got to be much more about that community engagement and of data about you. So I think it’s about people really understanding what the data is, who it’s being used by, so they can give that informed choice in how and what the data is used, and that includes some of the stuff that Ben spoke about, about those business models and that transparency. So that’s the first one. 


And the second one is more of a personal one. So my son is a Type 1 diabetic and he has an amazing insulin pump and a continuous glucose monitor that has got some brilliant data. But actually when you look at the privacy policy on there, the data goes back to the company that provides it, and not to the NHS who’s paying for it.


So I’d really like to see that actually those contracts get better. So the value of that data is not just that service provider, but to the NHS and beyond.


Emma Thwaites: Fantastic. Thank you, Lisa. I hope so too. And Ben, you get the last word on this one. What are your hopes for the future of data and health? 


Ben Goldacre: Look, I think we’re at a pivotal moment for this country uniquely. People have been saying forever that British health data is unique in the world.


I think we’ve got some of the best single project researchers in the world, we’ve got some of the best raw data in the world. What we’ve always been missing is the glue between the two to move forwards from having a kind of ramshackle, happenstance, rather manual approach to each individual data project.


We need to move forward to taking a systematic structured approach to building platforms and processes that make a real production line out of working with data. I think if we crack that over the next five years, then that will be historic for this country. I think it’ll be a massive contribution to humanity, but I think it’s gonna be a hard journey because first of all, you need to have extremely clueful people in very senior roles. You need people with deep technical skills in senior and strategic roles to make good choices around that kind of data architecture and those kinds of processes and platforms. 


And secondly, it does require a degree of, essentially, spend to save. I mean, you’re right back there with railway tracks. You know, this is core basic infrastructure. And at the moment we are dragging carts of data through the mud and everything’s breaking and everything’s spilling out everywhere. We’ve gotta move forward to build proper infrastructure for data.


But if we do that, the prize is of incalculable value.


Emma Thwaites: Amazing. Ben Goldacre, Lisa Allen. Thank you so much for joining me on this episode of the podcast. 


Lisa Allen: You’re welcome. 


Ben Goldacre: Thanks all.


Emma Thwaites: That’s all from this episode of Data Decade, and some really fascinating insights as ever. Thanks again to our guests, Ben Goldacre and Lisa Allen.


And if you want to find out more about anything you’ve heard in this episode, head over to where we continue the conversation around the last 10 years of data and what the next decade has in store. Ben and Lisa will also be joining us at the ODI Summit, alongside many other brilliant speakers. So to hear from other leaders in data, head over to our website for tickets.


And if you’ve enjoyed the podcast, please do subscribe for updates.


I’m Emma Thwaites, and this has been Data Decade from the ODI.

The recent pandemic has shown how the effective use of data can help to fight diseases on a global scale, enabling clinicians, governments, epidemiologists and pharmaceutical companies to track emerging variants, and to design strategies, vaccines and medicines more effectively.

The collection and use of data in health research and clinical care has developed significantly in the last 10 years and holds enormous promise for the decade ahead, but what are the risks and challenges in doing so? And can data save lives?

Limitless opportunities

Open data offers multiple opportunities to do good in the healthcare space. Huge datasets are currently unused or underused – datasets which could be employed to create tools and services that directly improve the health of individuals and communities, or help other researchers work more effectively. Open data offers a chance to be progressive and proactive, developing new ways of working in research and data science, without getting slowed down by the governance that necessarily surrounds disclosive patient data in healthcare.

“We could see some open opportunities to do good with datasets that were just sort of sitting around waiting to be exploited.”

– Ben Goldacre, Director of the Bennett Institute for Applied Data Science

If you combine the public and private sectors, Allen explains, it’s estimated that sharing healthcare data could generate social and economic benefits worth between 1% and 2.5% of GDP. When that’s scaled to the 20 largest economies in 2019, it’s estimated it could unlock between £700 billion and £1.75 trillion in value globally.

But unlocking those benefits relies on trustworthy data that’s flowing in well-governed ways around the ecosystem. There are barriers to data sharing, because of perceptions and misconceptions about how personal data might be used. Public trust is paramount. Earning that trust, says Goldacre, comes from both putting in appropriate safety measures to prove you are protecting data, and then telling people about it.

Past problems

On a systemic level, there are differing approaches to, and understanding of, how to adequately and securely treat large datasets.

“Lifting and shifting data, and copying it to different environments is just not workable anymore, especially with the volume of data.”

– Lisa Allen, Director of Data and Services, ODI

There has been a historical over-reliance on pseudonymisation – the removal of names, addresses and other identifiers – followed by disseminating the data to multiple different locations. This is unhelpful. In reality it is often straightforward to re-identify somebody, even in a detailed health dataset, using other information.

There is also a history of slow and laborious administrative and governance processes assessing whether an interested party is trustworthy enough to use the data. This is problematic, as it doesn’t offer absolute protection, and managing privacy can take so long that it derails projects and ruins opportunities for researchers before anyone can reap the benefits.

The solution is not to become more liberal about where you send data, but to build appropriate technical mitigations to protect patient privacy. A good example is trusted research environments.

Trusted data environments

A trusted research environment (TRE) keeps detailed, granular health data in one secure place, and analysts work inside that setting, where additional privacy safeguards, logs and audits ensure transparency. TREs encourage efficiency and modern collaborative approaches to large datasets – avoiding duplication of risk, data management costs, data curation and governance.

These kinds of data institutions reassure privacy campaigners and researchers, and can earn public trust. The Goldacre Review recommended they become the default for any very large, disclosive datasets, and the NHS 2022 data strategy calls for making everybody’s GP records accessible through a TRE.

Working in safer ways increases access to data and prompts innovation and new approaches, but it needs data architects to design secure storage and trusted sharing across organisations, and new levels of maturity for those organisations accessing, using, sharing and storing data, to ensure controls and governance are in place.

Public trust

Healthcare and patient data, like all data, exists on a spectrum – from closed to shared to open, though this isn’t always fully understood, which puts limitations on designing these environments.

So how do we have more of an open dialogue with people about the benefits of using patient data to improve health outcomes? Especially in the UK, where the NHS means the data is unparalleled – not just because of the breadth of GP and hospital data, but also because of the diversity of its population.

Public trust has to be earned through concrete action and understandable examples. During the pandemic, we saw how sharing data on vaccines affected decisions made by healthcare providers, policymakers, and the public. But that’s not enough on its own. In the past, there has been an over-reliance on showing patients the brilliant work done with data, rather than a good description of the privacy mitigations.

Why is the UK’s health data so special?

“[The UK’s] health data… should be regarded as not so much an asset but a resource that contains signals and secrets yet to be unlocked that could save lives on a biblical scale around the world”

– Ben Goldacre, Director of the Bennett Institute for Applied Data Science

At the moment, Goldacre explains, we tend to ask patients, ‘Would you like this large company to have access to your data?’ which implies that companies glean commercial benefits and have access to all of someone’s medical secrets. These two separate issues can be disaggregated by using trusted research environments. If commercial companies access data in an environment where patients’ medical records are absolutely confidential, the conversation changes. It’s complex, but there are clear positives to companies using expansive, rather than restricted, datasets, including developing new treatments.

He argues that as the UK’s data is so rich and unparalleled, it has profound importance to humanity, because of the potentially incalculable benefits it could contain for the future of healthcare. We therefore have a duty to find ways of making that data more open and accessible to the world.

There is an appetite for openness, for example in clinical trials. Though historically it was not always the case, there is momentum for making the data on all trial results available and accessible – not just the data that supports hypotheses or is actioned into workable medical products, services and solutions. For example, the AllTrials campaign called for all clinical trials to be registered and reported, and clinical reports made publicly available. To date, 90,000 individuals and over 700 organisations, including governments, commercial entities and public health bodies have signed up.

“The prize is of incalculable value.”

– Ben Goldacre, Director of the Bennett Institute for Applied Data Science