Assuming you can already articulate the data you need for your application, service or project, there are some questions you might want to ask about the data and how you intend to work with it.
Tom and Ulrich bundled these questions together into a data consumer’s checklist, which was the focus of several presentations they gave at a series of TSB workshops. The checklist is reproduced below:
- is the data already available? If so, where?
- how can you access it? Bulk downloads (dumps)? An API?
- in what format is the data published? CSV? JSON? PDF?!
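The format question has a direct practical consequence: CSV and JSON can be loaded with a few lines of code, while a PDF usually cannot. The following minimal Python sketch illustrates this and is not part of the original checklist. It guesses whether a downloaded file is PDF, JSON or CSV and loads the two machine-readable formats into a list of records. The names `detect_format`, `load_records` and `CSV_DELIMITERS` are hypothetical. The content sniffing is a heuristic that assumes UTF-8 text and one of a few common delimiters. A PDF is only flagged, because extracting tables from one needs extra tooling.

```python
import csv
import io
import json

# Hypothetical choice: the delimiters we accept when sniffing CSV.
CSV_DELIMITERS = ",;\t|"


def detect_format(payload: bytes) -> str:
    """Heuristically guess a file's format: 'pdf', 'json', 'csv' or 'unknown'."""
    if payload.startswith(b"%PDF-"):
        return "pdf"  # PDF files begin with the "%PDF-" signature
    text = payload.decode("utf-8-sig", errors="replace")  # assumes UTF-8, strips a BOM
    try:
        json.loads(text)
        return "json"
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        pass
    try:
        # Sniffer raises csv.Error if it cannot find a consistent delimiter.
        csv.Sniffer().sniff(text[:4096], delimiters=CSV_DELIMITERS)
        return "csv"
    except csv.Error:
        return "unknown"


def load_records(payload: bytes) -> list:
    """Load JSON or CSV into a list of records; refuse formats needing extra tooling."""
    fmt = detect_format(payload)
    text = payload.decode("utf-8-sig", errors="replace")
    if fmt == "json":
        data = json.loads(text)
        # Wrap a single JSON object so callers always get a list.
        return data if isinstance(data, list) else [data]
    if fmt == "csv":
        dialect = csv.Sniffer().sniff(text[:4096], delimiters=CSV_DELIMITERS)
        # Each row becomes a dict keyed by the values in the header line.
        return list(csv.DictReader(io.StringIO(text), dialect=dialect))
    raise ValueError(f"cannot load {fmt!r} data with the standard library alone")


if __name__ == "__main__":
    print(detect_format(b"%PDF-1.7\n"))
    print(load_records(b"name,age\nAlice,30\nBob,25\n"))
```

Note that every CSV value arrives as a string, so deciding how to type each field is one of the syntactic transformations the checklist asks about.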
Ownership and licensing
- who publishes the data?
- are they the originator of the data?
- under what licence is the data published?
- is it personal data?
- how has the data been processed?
- is it in raw or summary form?
- how will its form (e.g. granularity) affect your analysis/product/application?
- what syntactic and semantic transformations will you need to make?
- is the data compatible with other data sets you already hold?
- how current is the data?
- how regularly is it updated?
- do you understand all the fields and their context?
- for how long will it be published? What commitment has the publisher made?
- what do you know about the accuracy of the data?
- how are missing values handled?
- how is the data set documented?
- is there a place you can report errors in the data?
- does the metadata make sense?
- does the publisher offer support in any way?
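Several of these questions, such as what the fields mean, how missing values are represented and how accurate the data is, are easier to answer once you have profiled a sample of the data. The following sketch shows one hypothetical way to do that for a CSV file with Python's standard library. For each column it counts the rows, the cells that look missing and the distinct non-missing values. The function name `profile_csv` and the `MISSING_MARKERS` set are assumptions for illustration. The publisher's documentation should tell you which convention the data actually uses.

```python
import csv
import io
from collections import Counter

# Hypothetical placeholders for "no value"; check the data set's own documentation.
MISSING_MARKERS = {"", "na", "n/a", "null", "none", "-"}


def profile_csv(text: str) -> dict[str, dict[str, int]]:
    """Per column: total rows, cells matching a missing marker, distinct other values."""
    reader = csv.DictReader(io.StringIO(text))
    fields = reader.fieldnames or []
    missing = Counter()
    distinct = {field: set() for field in fields}
    rows = 0
    for row in reader:
        rows += 1
        for field in fields:
            # Short rows yield None for absent fields; treat that as empty.
            value = (row.get(field) or "").strip()
            if value.lower() in MISSING_MARKERS:
                missing[field] += 1
            else:
                distinct[field].add(value)
    return {
        field: {"rows": rows, "missing": missing[field], "distinct": len(distinct[field])}
        for field in fields
    }


if __name__ == "__main__":
    sample = "id,region,value\n1,North,12\n2,,7\n3,South,N/A\n4,North,-\n"
    for field, stats in profile_csv(sample).items():
        print(field, stats)
```

For this sample, the `value` column reports two missing cells because both `N/A` and `-` are treated as placeholders. If the publisher actually uses `-` to mean zero, the marker set would need to change. That is exactly the kind of question to raise through the publisher's documentation or support channels.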