When there are multiple datasets that cover similar domains (for example, airports), it's not obvious which dataset is preferable. It depends on your requirements, but one problem is that much of the information on which the decision might be based is unavailable or not easily available, such as:

geographic coverage
temporal coverage
data model
linkage to other datasets

I guess an overarching criterion is veracity over the expected lifespan of the project, for which some of the above serve as surrogates.

What do folks really use and how do they assess these dataset properties?

asked 23 Jan '11, 11:45

kitwallace


edited 23 Jan '11, 11:46

How about "convenience", at least in the sense that:

1) the dataset covers the literals or ranges you need;
2) the dataset is available in a format that you can easily work with (for a novice such as myself, Unicode encodings give me all sorts of problems!);
3) the dataset appears to be relatively friction-free in terms of licensing and/or payment for use;
4) the provenance seems okay and, where appropriate, the data appears to be maintained?

I guess directories such as CKAN could help in this area, for example by supporting trust/reputation metrics?
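
As a rough illustration, the four convenience checks above could be turned into a simple checklist score for comparing candidates. This is only a sketch: the dataset names, the field names and the weights are all made up for the example, and you would substitute whatever criteria matter for your project.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate dataset, assessed by hand against the four checks."""
    name: str
    covers_needed_range: bool         # 1) covers the literals or ranges you need
    usable_format: bool               # 2) available in a format you can work with
    low_friction_licence: bool        # 3) licensing/payment is not an obstacle
    maintained_with_provenance: bool  # 4) provenance looks okay and data is maintained


# Hypothetical integer weights: coverage counts double here, purely as an assumption.
WEIGHTS = {
    "covers_needed_range": 4,
    "usable_format": 2,
    "low_friction_licence": 2,
    "maintained_with_provenance": 2,
}


def convenience_score(candidate: Candidate) -> int:
    """Sum the weights of every check the candidate passes."""
    return sum(w for field, w in WEIGHTS.items() if getattr(candidate, field))


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Order candidates from most to least convenient."""
    return sorted(candidates, key=convenience_score, reverse=True)


# Made-up example assessments of two airport datasets.
candidates = [
    Candidate("airports_b", True, False, True, False),
    Candidate("airports_a", True, True, False, True),
]

for c in rank(candidates):
    print(c.name, convenience_score(c))
```

The score is no substitute for judgement; it just makes the trade-offs explicit so that two people assessing the same datasets can see where they disagree.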


answered 23 Jan '11, 14:32

psychemedia ♦♦

I've just come across this blog item by Stefan Urbanek on Data Quality, which presents a useful discussion of criteria, although not without some terminological difficulties.



answered 26 Jan '11, 15:50

kitwallace




