Wanted: volunteers to join a team of “Data Curators” maintaining “core” datasets (like GDP or ISO codes) in high-quality, easy-to-use and open form.
- What is the project about: Collecting and maintaining important and commonly-used (“core”) datasets in high-quality, standardized and easy-to-use form - in particular, as up-to-date, well-structured Data Packages. The “Core Datasets” effort is part of the broader Frictionless Data initiative.
- What would you be doing: identifying and locating core (public) datasets, cleaning and standardizing the data, and making sure the results are kept up to date and easy to use.
- Who can participate: anyone can contribute. Details on the skills needed are below.
- Get involved: read more below or jump straight to the sign-up section.
What is the Core Datasets effort?
Summary: Collect and maintain important and commonly-used (“core”) datasets in high-quality, reliable and easy-to-use form (as Data Packages).
Core = important and commonly-used datasets e.g. reference data (country codes) and indicators (inflation, GDP)
Curate = take existing data and provide it in high-quality, reliable, and easy-to-use form (standardized, structured, open)
- Full details: see http://data.okfn.org/roadmap/core-datasets, which includes a slide deck.
- Live examples: you can find already-packaged core datasets at http://data.okfn.org/data/ and in “raw” form on GitHub at https://github.com/datasets/
What Roles and Skills Are Needed
We need people in a variety of roles, from identifying new “core” datasets to packaging the data to performing quality control (checking metadata, etc.).
Core Skills - at least one of these will be needed:
- Data Wrangling Experience. Many of our source datasets are not complex (just an Excel file or similar) and can be “wrangled” in a spreadsheet program. We therefore recommend at least one of:
  - Experience with a spreadsheet application such as Excel or (preferably) Google Docs, including formulas and (desirably) macros. You should at least know how to quickly convert a cell containing ‘2014’ to ‘2014-01-01’ across 1000 rows (see the sketch after this list).
  - Coding for data processing (especially scraping) in one or more of Python, JavaScript or Bash.
- Data sleuthing - the ability to dig up data on the web. Specific desirable skills: you know how to search by filetype in Google, you know where the developer tools are in Chrome or Firefox, and you know how to find the URL a form posts to.
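To make the year-to-date example above concrete: in a spreadsheet it is a single formula filled down the column (e.g. `=A2&"-01-01"`), and in code it is a few lines. Here is a minimal Python sketch, using only the standard library; the file names and the `Year` column are hypothetical stand-ins for whatever your source data actually uses.

```python
import csv

# Hypothetical input: a CSV with a "Year" column holding bare years like "2014".
# We rewrite each value as an ISO date ("2014-01-01") and save a cleaned copy.
with open("gdp-raw.csv", newline="") as src, \
     open("gdp-clean.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row["Year"] = row["Year"].strip() + "-01-01"  # '2014' -> '2014-01-01'
        writer.writerow(row)
```

If you can follow (or write) something like this, you have the wrangling skills we are looking for.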
Desirable Skills (the more the better!):
- Data vs Metadata: know the difference between data and metadata
- Familiarity with Git (and Github)
- Familiarity with a command line (preferably bash)
- Know what JSON is
- Mac or Unix is your default operating system (this will make access to the relevant tools that much easier)
- Knowledge of Web APIs and/or HTML
- Use of curl or a similar command line tool for accessing Web APIs or web pages (see the sketch after this list)
- Scraping using a command line tool or (even better) by coding yourself
- Know what a Data Package and a Tabular Data Package are (a minimal example also follows this list)
- Know what a text editor is (e.g. Notepad, TextMate, vim, emacs, …) and know how to use it (useful both for working with data and for editing Data Package metadata)
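To make the Web API point concrete, here is a minimal sketch using only the Python standard library. The URL is a hypothetical placeholder; the curl equivalent is simply `curl <url>` at a command line.

```python
import json
from urllib.request import urlopen

# Placeholder URL: substitute the API endpoint or page you are investigating.
url = "https://example.org/api/countries.json"

# Fetch the raw response body over HTTP.
with urlopen(url) as response:
    body = response.read().decode("utf-8")

# Many Web APIs return JSON; parse it into ordinary Python objects.
data = json.loads(body)
print(type(data), "-", len(body), "bytes fetched")
```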
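And if Data Packages are new to you, there is no magic: a (Tabular) Data Package is just a folder of data files plus a small JSON descriptor called datapackage.json. The sketch below writes an illustrative descriptor from Python so the JSON structure is explicit; the dataset name, paths and fields are made up for this example, and the full spec is linked from http://data.okfn.org/.

```python
import json

# Illustrative metadata for a Tabular Data Package: one CSV resource plus a
# schema describing its columns. The name, path and fields here are made up.
descriptor = {
    "name": "gdp",
    "title": "Country, Regional and World GDP",
    "resources": [
        {
            "name": "gdp",
            "path": "data/gdp.csv",
            "schema": {
                "fields": [
                    {"name": "Country", "type": "string"},
                    {"name": "Year", "type": "date"},
                    {"name": "Value", "type": "number"},
                ]
            },
        }
    ],
}

with open("datapackage.json", "w") as f:
    json.dump(descriptor, f, indent=2)
```

Much of a curator’s work is exactly this: making sure the CSV really matches its schema and the metadata stays accurate as the data is updated.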
Get Involved - Sign Up Now!
We are looking for volunteer contributors to form a “curation team”.
- Time commitment: Members of the team commit to at least 8-16 hours per month. This is an average: if you are especially busy with other things one month and do less, that is fine.
- Schedule: There is no fixed schedule, so you can contribute at any time that suits you: evenings, weekends, lunchtimes, etc.
- Location: All activity is carried out online, so you can be based anywhere in the world.
- Skills: see above
To register your interest, fill in the following form. If you have any questions, please get in touch directly.
Want to Dive Straight In?
Can’t wait to get started as a Data Curator? You can dive straight in and start packaging the already-selected (but not yet packaged) core datasets. Full instructions here: