Labs newsletter: 12 December, 2013

DECEMBER 12, 2013

We’re back after taking a break last week, with a bumper crop of updates. A few things have changed: Labs activities are now coordinated entirely through GitHub. Meanwhile, there have been some updates to the Nomenklatura, Annotator, and Data Protocols projects, and some new posts on the Labs blog.

Migration from Trello to GitHub

For some time now, Labs activities requiring coordination have been organized on Trello—but those days are now over. Labs has moved its organizational setup over to GitHub, coordinating actions and making plans by means of GitHub issues. This change comes as a big relief to the many Labs members who already use GitHub as their main platform for collaboration.

General Labs-related activities are now tracked on the Labs site’s issues, and activities around individual projects are managed (as before!) through those projects’ own issues.

New Bad Data

New examples of bad data continue to roll in—and we invite even more new submissions.

Bad datasets added since the last newsletter include the UK’s Greater London Authority spend data (65+ files with 25+ different structures!), Nature Magazine’s supplementary data (an awful PDF jumble), and more.

Nomenklatura: new alpha

As we’ve previously noted, Labs member Friedrich Lindenberg has been thinking about producing “a fairly radical re-framing” of the Nomenklatura data reconciliation service.

Friedrich has now released an alpha version of the new Nomenklatura at nk-dev.pudo.org. The major changes in this alpha include:

  • A fully JavaScript-driven frontend
  • String matching now happens inside the PostgreSQL database
  • Better introductory text explaining what Nomenklatura does
  • “entity” and “alias” domain objects have been merged into “entity”

Friedrich is keen to hear what people think about this prototype—so jump in, give it a try, and leave your comments at the Nomenklatura repo.

Annotator v1.2.9

A new maintenance release of Annotator came out ten days ago. This new version is intended to be one of the last in the v1.2.x series—indeed, v1.2.8 itself was intended to be the last, but that version had some significant issues that this new release corrects.

Fixes in this version include:

  • Fixed a major packaging error in v1.2.8. Annotator no longer exports an excessive number of tokens to the page namespace.
  • Notification display bugfixes. Notification levels are now correctly removed after notifications are hidden.

The new Annotator is available, as always, from GitHub.

Data Protocols updates

Data Protocols is a project to develop simple protocols and formats for working with open data. Rufus Pollock wrote a cross-post to the list about several new developments with Data Protocols of interest to Labs. These included:

  • Close to final agreement on a spec for adding “primary keys” to the JSON Table Schema (discussion)
  • Close to consensus on spec for “foreign keys” (discussion)
  • Proposal for a JSON spec for views of data, e.g. graphs or maps (discussion)

For more, check out Rufus’s message and the Data Protocols issues.
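
To make the primary key idea concrete, here is a minimal sketch of what a keyed schema and a uniqueness check over rows might look like. The newsletter notes the spec was still being finalized, so the "primaryKey" property name, the example fields, and the duplicate_keys helper are all assumptions for illustration, not the agreed spec.

```python
# Hypothetical JSON Table Schema with a primary key.
# ASSUMPTION: the property is called "primaryKey" and may be a single
# field name or a list of field names; the spec was not final.
schema = {
    "fields": [
        {"name": "id", "type": "integer"},
        {"name": "name", "type": "string"},
    ],
    "primaryKey": "id",
}


def duplicate_keys(rows, schema):
    """Return the primary-key values that occur more than once in rows."""
    key = schema["primaryKey"]
    # Normalize a single field name to a one-element list (composite keys).
    names = [key] if isinstance(key, str) else list(key)
    seen = set()
    dupes = []
    for row in rows:
        value = tuple(row[n] for n in names)
        if value in seen:
            dupes.append(value)
        seen.add(value)
    return dupes


# Illustrative rows only; not taken from any real dataset.
rows = [
    {"id": 1, "name": "Alpha"},
    {"id": 2, "name": "Beta"},
    {"id": 1, "name": "Gamma"},
]
print(duplicate_keys(rows, schema))
```

A check like this is the kind of thing a declared primary key makes possible: a consumer can reject or flag a table whose key values repeat without knowing anything else about the data.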

On the blog

Labs members have added a couple of new posts to the blog since the last newsletter. Yours truly (with extensive help from Rufus) posted on using Data Pipes to view a CSV. Michael Bauer, meanwhile, wrote about the new Reconcile-CSV service he developed while working on education data in Tanzania. Look to the Labs blog for the full scoop.

Get involved

If you have some spare time this holiday season, why not spend it helping out with Labs? We’re always looking for new people to join the community. Visit the Labs issues and the Ideas Page for ideas on how you can join in.