Assuming you can already articulate what data you need for your application, service or project, there are some questions you should ask about the data and how you intend to work with it.

Tom and Ulrich bundled these questions together into a data consumer's checklist, which was the focus of several presentations they gave at a series of TSB workshops. The checklist is reproduced below:

Accessibility

  • is the data already available? If so, where?
  • how can you access it? dumps? API?
  • in what format is the data published? CSV? JSON? PDF?!

Ownership and licensing

  • who publishes the data?
  • are they the originator of the data?
  • under what licence is the data published?
  • is it personal data?

Form

  • how has the data been processed?
  • is it in raw or summary form?
  • how will its form (e.g. granularity) affect your analysis/product/application?
  • what syntactic and semantic transformations will you need to make?
  • is the data compatible with other data sets you have?
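
The compatibility question can be partly answered in code before committing to a data set. The following is a minimal sketch, not part of the original checklist: it assumes both data sets are already loaded as lists of dictionaries and share a join key. The function name `compare_datasets`, the field names and the sample records are all hypothetical.

```python
def compare_datasets(rows_a, rows_b, key):
    """Compare the field names and join-key values of two data sets.

    rows_a and rows_b are lists of dicts (for example from csv.DictReader).
    key is the field both data sets are expected to share (an assumption).
    """
    fields_a = set(rows_a[0]) if rows_a else set()
    fields_b = set(rows_b[0]) if rows_b else set()
    # Only count rows that actually carry a non-empty key value.
    keys_a = {row[key] for row in rows_a if row.get(key)}
    keys_b = {row[key] for row in rows_b if row.get(key)}
    return {
        "shared_fields": sorted(fields_a & fields_b),
        "only_in_a": sorted(fields_a - fields_b),
        "only_in_b": sorted(fields_b - fields_a),
        "key_overlap": len(keys_a & keys_b),
        "unmatched_a": sorted(keys_a - keys_b),
        "unmatched_b": sorted(keys_b - keys_a),
    }


# Hypothetical sample records, invented purely for illustration.
population = [
    {"area_code": "W1", "population": "1200"},
    {"area_code": "W2", "population": "950"},
]
schools = [
    {"area_code": "W2", "schools": "3"},
    {"area_code": "W3", "schools": "1"},
]

result = compare_datasets(population, schools, "area_code")
print(result)
```

A low key overlap or a long list of unmatched keys suggests that identifiers, granularity or coverage differ between the two sources, which is worth knowing before you design any transformations.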

Quality

  • how current is the data?
  • how regularly is it updated?
  • do you understand all the fields and their context?
  • for how long will it be published? what is the commitment by the publisher?
  • what do you know about the accuracy of the data?
  • how is missing data handled?
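
Some of the quality questions, such as how current the data is and how missing values appear, can be explored with a quick profile of a sample. The sketch below is one possible approach, not something prescribed by the checklist. It assumes a CSV source with an ISO-formatted date column; the set of "missing" tokens, the function name `profile_rows` and the sample data are all assumptions made for illustration.

```python
import csv
import io
from datetime import date

# Tokens treated as "missing". This is an assumption: publishers differ,
# so check the documentation of the data set you are using.
MISSING_TOKENS = {"", "na", "n/a", "null", "-"}


def profile_rows(rows, date_field):
    """Count missing values per field and find the most recent date."""
    missing = {}
    latest = None
    total = 0
    for row in rows:
        total += 1
        for field, value in row.items():
            if field is None:
                # csv.DictReader stores surplus columns under the key None.
                continue
            if value is None or value.strip().lower() in MISSING_TOKENS:
                missing[field] = missing.get(field, 0) + 1
        raw = (row.get(date_field) or "").strip()
        try:
            parsed = date.fromisoformat(raw)
        except ValueError:
            # Not an ISO date (or empty); skip it when looking for the latest.
            continue
        if latest is None or parsed > latest:
            latest = parsed
    return {"rows": total, "missing": missing, "latest": latest}


# Hypothetical sample, invented purely for illustration.
SAMPLE = """station,reading,measured_on
A,12.5,2013-01-04
B,,2013-02-11
C,N/A,
"""

report = profile_rows(csv.DictReader(io.StringIO(SAMPLE)), "measured_on")
print(report)
```

A profile like this does not tell you why values are missing or how accurate the remaining ones are, but it gives you concrete figures to raise with the publisher.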

Support

  • how is the data set documented?
  • is there a place you can report errors in the data?
  • does the metadata make sense?
  • does the publisher offer support in any way?