To be used and shared effectively, Covid-19 models should be well documented and published under an open licence.
This guide summarises the process in four simple steps.
- For publishing Covid-19 data, please see our guide: Publishing open data in times of crisis.
Models and forecasts have been central to governments’ responses to the Covid-19 pandemic. These models are used to study characteristics of the disease, forecast possible future transmission among populations, and understand how different interventions may affect the spread of the disease. Broader models might include impacts on our health system, the wider economy and other elements of our lives. The models do not provide precise predictions of what will happen but are projections of how our actions may affect a range of possible futures.
Given their importance to public health, and to help ensure as wide a use as possible, Covid-19 models should be well documented and published under an open licence. This allows anyone – from scientists to health bodies to the media – to better review, discuss, test, adapt and even improve upon published work. Additionally, when governments are transparent about the science that underpins their health policies it helps build public understanding and trust.
There’s more than one way to publish a pandemic model
There are many ways to publish Covid-19 models. You could release the theory, mathematics and results of a model as a peer-reviewed science paper or report. Or you could write a high-level explainer as a blogpost. You could publish the source code of the model for others to inspect and run. Or publish the projections from a model as machine-readable data. You could publish an interactive visualisation that allows people to change parameters and see results visually. You could even use some combination of all of these.
Opening the research, code and data behind a model allows others to learn from, test and adapt it. It also enables others to use their expertise and resources to present your work in new and innovative ways which may not have been possible otherwise.
We present some high-level guidance on what to consider when openly publishing a model and some tools to help you publish it easily. We are not prescribing how to develop models themselves, but are offering guidance on how to share and publish them effectively using open data principles.
Who will use the model and for what purpose?
Like any data or information you share, it’s important to carefully consider the audience and their needs. Different people have different backgrounds and motivations for examining or using a model. It is important to consider how you can address these needs.
- Scientists may want to study and validate your model or reuse and adapt it for their own research
- Governments may base their policies and actions on the results of your model and might be interested in using it to test different scenarios
- Journalists may want to investigate your model, particularly if it informs government policy. They will aim to explain your work to the broader public.
- The public may want to find out more about your model
For each of these audiences, you should consider how they might explore or use your model and its results. Also, given it would be impossible to address every audience, you may need to decide which audience to prioritise. Think about their specific goals and what model information they need to achieve those goals.
Scientists wishing to validate models may need the specificity of a peer-reviewed research paper, while interested members of the public may be happy with a high-level overview. Publishing in multiple formats – papers, blogs, interactive visualisations – can help reach multiple audiences. It can also improve the relationships and communication between these audiences.
Four steps to sharing models under an open licence
1. Document it
Good supporting model documentation can answer questions such as:
- What is a high-level summary of this model?
- Who created this model? What organisations or institutions are they affiliated with? What were their roles in creating the model? How can the model's creators be contacted?
- What are the assumptions and theory behind this model? Is there a referenced explanation or research paper?
- What are the intended uses of this model? Who should use this model? What does good use of this model look like?
- What version of the model is this? Are previous versions available? How has the model been updated?
- Is a software implementation of the model openly available? If so, where?
- What steps are needed to run and test the software? How much computing power is needed to run it?
- Does the software use source data or generate output data? Where can this dataset be found? Does it follow FAIR data principles?
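The answers to these questions can also be kept in a machine-readable record alongside the written documentation. The following is a minimal sketch in Python. The field names (`summary`, `authors`, `contact` and so on) are hypothetical and simply mirror the checklist above; they are not an established standard, and every value is a placeholder.

```python
# Hypothetical field names mirroring the documentation questions above.
# This is an illustrative structure, not an established standard.
REQUIRED_FIELDS = [
    "summary",
    "authors",
    "contact",
    "assumptions",
    "intended_uses",
    "version",
    "source_code",
    "how_to_run",
    "data",
]


def missing_fields(doc):
    """Return the required fields that are absent or empty in doc."""
    return [field for field in REQUIRED_FIELDS if not doc.get(field)]


# Placeholder values for an imaginary model; replace with your own.
model_doc = {
    "summary": "Example compartmental model of disease transmission.",
    "authors": [
        {
            "name": "Example Author",
            "affiliation": "Example Institute",
            "role": "lead modeller",
        }
    ],
    "contact": "modelling-team@example.org",
    "assumptions": "Described in the accompanying paper (placeholder).",
    "intended_uses": "Comparing intervention scenarios, not precise prediction.",
    "version": "1.0.0",
    "source_code": "https://example.org/model-repository",
    "how_to_run": "See README (placeholder).",
    "data": "",  # deliberately left empty to show the check below
}

print("Missing:", missing_fields(model_doc))  # prints: Missing: ['data']
```

A check like this could be run before each release so that gaps in the documentation are noticed early.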
2. Publish it
Given that models can be a mixture of text, software, mathematics and visualisations, there are many formats for publishing.
- Publishing in a peer-reviewed journal is the best way to share quality science. Please aim to publish your paper in an open-access journal. If waiting for publication, the preprint could be released on arXiv, medRxiv or bioRxiv. The Public Library of Science (PLOS) has created an open-access collection of its COVID-19 content and is also maintaining a registry of scientific outputs from other providers.
- Data and other supporting materials from your research should also be published. Figshare is an open-access repository that allows researchers to share their figures, datasets, images, videos and other materials. DataCite is an organisation that provides Digital Object Identifiers (DOIs) – a type of permanent online identifier – for research data and other research outputs.
- Dataset metadata should be marked up following a standard like schema.org’s Dataset markup or W3C’s Data Catalog Vocabulary (DCAT) format. Google provides guidelines for describing datasets to make them more findable.
- If you have code, you can publish it and work with others to improve it using a code repository such as GitHub. For more interactivity, you can also publish it as a Jupyter Notebook or in R Markdown.
- There are many free-to-use visualisation tools available. For R, there is the Shiny package (example model); for Python, Dash; for JavaScript, Svelte (example model); and Causal, which integrates with Google Sheets (see example model using Causal). Google Data Studio is another option. Or just keep it simple and build your model in an online spreadsheet (see example model using Google Sheets).
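As an illustration of the dataset metadata markup mentioned above, here is a minimal sketch that builds a schema.org `Dataset` description as JSON-LD using Python. The property names (`name`, `description`, `license`, `creator`, `distribution`, `encodingFormat`, `contentUrl`) come from schema.org. Every value, including the URLs, and the helper name `build_dataset_metadata` are placeholders assumed for this example.

```python
import json


def build_dataset_metadata(name, description, licence_url, creator_name, csv_url):
    """Build a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "license": licence_url,
        "creator": {"@type": "Organization", "name": creator_name},
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": "text/csv",
                "contentUrl": csv_url,
            }
        ],
    }


# Placeholder values for an imaginary set of model projections.
metadata = build_dataset_metadata(
    name="Example model projections",
    description="Placeholder description of projections from an example model.",
    licence_url="https://creativecommons.org/licenses/by/4.0/",
    creator_name="Example Institute",
    csv_url="https://example.org/projections.csv",
)

# The resulting JSON can be embedded in a web page inside a
# <script type="application/ld+json"> element.
print(json.dumps(metadata, indent=2))
```

Keeping the metadata in code like this makes it easier to regenerate whenever the projections are updated.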
3. Add an open licence
Without a licence, others cannot lawfully reuse your model in their own work. When you develop a model, you hold certain rights over it, so others need your permission to use it. This permission is best given explicitly in the form of a licence.
There are several off-the-shelf licences available, such as the Open Government Licence, Open Database License (ODbL) and the Creative Commons licences.
For the source code of your model, we recommend the MIT License. This allows others to reuse your code in their projects as long as they include the original copyright notice and the MIT License text.
If releasing figures, equations, tutorials or other written supporting material, we recommend the Creative Commons Attribution licence. This licence allows the user to share the material in any medium or format, and adapt and build on the material for any purpose, even commercially. Research published in open-access journals is typically published under an open licence.
To publish model data, platforms such as Octopub.io can help you publish data using an open licence.
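Licence choices can also be recorded in machine-readable form, so that reusers and tools can see which licence applies to each part of a model. The sketch below assumes a hypothetical mapping from artefact types to SPDX licence identifiers (`MIT`, `CC-BY-4.0`, `ODbL-1.0`). The artefact names, the choice of ODbL for data and the helpers `licence_for` and `spdx_header` are illustrative only.

```python
# SPDX identifiers for licences discussed in this step.
# The artefact categories are hypothetical names chosen for this example.
LICENCES = {
    "source_code": "MIT",
    "documentation": "CC-BY-4.0",
    "figures": "CC-BY-4.0",
    "data": "ODbL-1.0",  # assumption: one of several open data licences you could choose
}


def licence_for(artefact_type):
    """Return the SPDX identifier for an artefact type, or raise if none is set."""
    try:
        return LICENCES[artefact_type]
    except KeyError:
        raise ValueError(f"No licence recorded for {artefact_type!r}") from None


def spdx_header(artefact_type):
    """Return a comment line for the top of a source file (SPDX convention)."""
    return f"# SPDX-License-Identifier: {licence_for(artefact_type)}"


print(spdx_header("source_code"))  # prints: # SPDX-License-Identifier: MIT
```

A header line like this complements, but does not replace, a full licence file in the repository.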
4. Tell people about it
The more your model is talked about and shared, the more people will see it. This may involve blogging about it, listing it in external data collections or sharing on social media.
Community resources we recommend:
- The Coronavirus Tech Handbook: Newspeak House’s crowdsourced library of tools, services and resources relating to the COVID-19 response. It includes a page dedicated to modelling & forecasting, information on funding for modelling and a WhatsApp chat for Covid-19 modelling.
- #Data4Covid19: GovLab’s living repository to build a responsible infrastructure for data-driven pandemic response. It’s mostly data-focused but does include some references to modelling efforts worldwide.
- #opendatasaveslives: A community established by ODI Leeds to help gather useful resources, create things openly, and enable others to engage with data about the crisis.
- ResearchGate provides a forum for the Covid-19 research community, which keeps an index of published research, allows discussion between researchers and provides links to other resources.
Sharing Covid-19 models: examples we like
Many Covid-19 models and forecasts have been published. Here we have been able to highlight just a few. We are not making a judgement on the scientific validity of these models, only on their quality in terms of open data.
- The MRC Centre for Global Infectious Disease Analysis at Imperial College has published the code for its model on GitHub. Initially only available as a research paper, this code now allows anyone (with the skills and computing power) to run, analyse and adapt this model. The code is accompanied by extensive documentation, including an overview, a glossary and an explanation of the inputs and outputs.
- Harvard mathematical biologist Alison Hill created a Covid-19 modelling tool that combines full access to the source code with a very well-designed interactive interface. It is released under a Creative Commons Attribution-ShareAlike 4.0 licence.
- Researchers at the University of Basel created a Covid-19 Scenarios app. This tool ‘uses a mathematical model to simulate a variety of COVID-19 outcomes based on user-defined parameters’. What’s especially notable here is that the researchers embraced the open-source community and accepted contributions from over 60 people in creating the application.
Get in touch
Like many others during this pandemic, we’re still learning as we go and always looking to improve our knowledge. If you know of advice, best practices or standards in documenting and sharing models for Covid-19 which we may have missed, please do get in touch.
Further reading
- How you can help with COVID-19 modelling – Julia R. Gog – Nature Reviews Physics
- What are COVID-19 Models Modeling? – Jimi Adams
- Sharing models between ‘digital twins’ – ODI
- Publishers Guide to Open Data Licensing – ODI
- Marking up your data with DCAT – ODI
- Making source code open and reusable – gov.uk
- Licensing a repository – GitHub
- Software and Open Source Licensing – MIT
About
This guide was produced by the Open Data Institute and published in April 2020. Its lead author is Fionntán O’Donnell, with contributions from Olivier Thereaux, Jeni Tennison and David Tarrant.
This guide is published under the Creative Commons Attribution-ShareAlike 4.0 International licence.