Shuttleworth Fellowship Application: Using the Open Data Ecosystem

JULY 10, 2012

My application this year is under the title “Using the Open Data Ecosystem” (aka “Generating Value from the Open Data Ecosystem: Using and Applying Open Data”).

Describe the world as it is.

Over the last several decades the world has seen an explosion of digital technologies which have the potential to transform the way knowledge is disseminated.

This ongoing technological revolution has enabled – and implies – the creation of an open data ecosystem in which information is freely used, extended and built on.

When I started work in this area approximately a decade ago, the ideas of open data and open knowledge were still very new, with little awareness and even less uptake.

Today, 10 years on, the landscape has changed dramatically – in some, even substantial, part thanks to the efforts of the Open Knowledge Foundation. Increasing amounts of open data are being made available, with significant commitment from a variety of organizations and communities, including governments, to release data and knowledge openly.

We should not, of course, underestimate how much more there is still to do – much data released is still, too often, of the “toy” variety – locations of park benches, user satisfaction with government websites – rather than the core kind (e.g. postcodes, full geodata, detailed financial and transport information – see for more). Nevertheless, it is reasonable to feel that the hardest part of the road is now behind us.

[we are now on the downward slope?]

What change do you want to make?

I want to see substantially greater use and application of open data to solve immediate real-world problems.

Specifically, now that we have increasing amounts of open data available, it is time to focus our efforts on uses and applications of this data to deliver tangible and visible benefits.

It is worth recalling here that openness is not an end in itself: open data and open knowledge are means to an end and must be used if they are to create value, whether in helping us understand and prevent climate change, speeding the development of life-saving drugs, helping us find a better way to work, or in any number of other areas where data and information play a significant role.1

What do you want to explore?

I want to explore what the most effective applications for open content and data are that we, and others, can make. Additionally (and somewhat secondarily), I want to explore what additional lightweight tools are needed to assist with this.

More specifically, I want to engage with non-technical and non-open-data communities and organizations to identify problems and issues they have where data and data applications would be of high value. I then want to work with these communities to develop applications that address their real needs.

I also want to explore the development of ultra-lightweight tools and services for processing and managing data (services that can be prototyped in a few days to a week or two) and to explore generalizing these into the simple standards and protocols needed for the open data community to interoperate and scale.

Finally, I want to explore how to build a really significant distributed community of data wranglers and hacktivists (focusing probably on a specific area like civic information or finances).

What are you going to do to get there?

I plan to divide my work into 3 distinct but complementary strands.

The first will focus on the development and showcasing of open data applications. This will involve several (iterable) steps. First will be a process to identify areas where there is a good match between a problem (and community) on the one hand and data (and skills) on the other. We already have a good number of communities and organizations to work with, so this process need not start from scratch. Second, I plan to run a very short sprint process to produce a prototype (on the order of 2-4 weeks). Third, we will announce and evaluate the prototype and decide what to do next. In general I hope that taking the implementation to the next stage will be something done by the organization or community we are working with. The aim will be to do between 4 and 6 of these types of activities in the next year. Lastly, I would also emphasize that I (and the Open Knowledge Foundation) do not always have to be involved – encouraging and guiding others in this process would be equally productive.

Second, I want to work on developing a small suite of data micro-services. Each micro-service would be focused on one specific area, and a current list would be: data conversion (e.g. csv to json), data modelling (e.g. this column is a string, and represents a country code), data validation (this dataset fails to have required columns), data issues (reporting and logging of issues e.g. column X at row Y has wrong type). See this diagram for an overview. While each micro-service can run standalone they would also form a natural whole. In addition, these services form a natural cluster around the DataHub and will likely integrate with it for authentication and data storage. Lastly, I plan to use these micro-services as prototypes for the creation of a set of simple, lightweight protocols and formats as part of the Data Protocols project 1.
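To make the four areas above a little more concrete, here is a minimal sketch in Python of what conversion, validation and issue reporting might look like as plain functions. Everything in it is an assumption made for illustration: the function names, the schema format (a dict mapping column name to a type name) and the shape of the issue records are hypothetical, and none of it is taken from the DataHub or the Data Protocols project.

```python
import csv
import io
import json

# Hypothetical mapping from schema type names to Python casts.
CASTS = {"string": str, "integer": int, "number": float}


def csv_to_json(csv_text):
    """Data conversion: turn CSV text into a JSON array of row objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows)


def missing_columns(rows, required):
    """Data validation: list the required columns the dataset lacks."""
    present = set(rows[0].keys()) if rows else set()
    return [name for name in required if name not in present]


def type_issues(rows, schema):
    """Data issues: report every cell that does not fit its declared type.

    `schema` is the (assumed) output of the modelling step, for example
    {"country": "string", "population": "integer"}.
    """
    issues = []
    for row_number, row in enumerate(rows, start=1):
        for column, type_name in schema.items():
            value = row.get(column)
            if value is None:
                continue  # absent columns are reported by missing_columns
            try:
                CASTS[type_name](value)
            except ValueError:
                issues.append({
                    "row": row_number,
                    "column": column,
                    "value": value,
                    "expected": type_name,
                })
    return issues


# A tiny dataset with one deliberately bad cell.
sample = "country,population\nGB,63000000\nFR,not-known\n"
rows = json.loads(csv_to_json(sample))
schema = {"country": "string", "population": "integer"}

print(missing_columns(rows, ["country", "year"]))
print(type_issues(rows, schema))
```

Here the validation step reports `year` as missing and the issues step flags the `population` cell in row 2. Each function could sit behind its own small HTTP endpoint, which is one way to read "standalone but forming a natural whole": the output of the conversion step is exactly the input the other two expect.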

Third, I plan to develop and expand the work of the School of Data and OKFN Labs as a way of engaging with and developing a broader community of data wranglers and data hackers both within the Open Knowledge Foundation community and more widely. Here, we have already begun but can and will do much, much more.

  1. We can, to some extent, see a value in open “knowledge” and open data in and of themselves. Nevertheless, especially for the latter I think this is reasonably limited. ↩︎