Last year I had the great fortune to be supported, in the role of a Fellow, by the Shuttleworth Foundation in my work to help Bootstrap the Open Data Ecosystem.
Not only was it great to have the resource to dedicate myself full-time to this area, but as a Fellow you get the (rare) chance to be part of a small, tight-knit group of really smart, interested and, well, just incredibly nice people who share your interests and are able to offer constructive, critical feedback and support on what you are doing – the other place I know like this is the Open Knowledge Foundation ;-). You can read more about what I got up to in the section on “Year in Review” below.
(By the way, the deadline for Spring 2012 Fellowship applications is the 1st of Nov and it’s really easy to apply, so if you’re interested it’s not too late: just head over to the Shuttleworth Fellowship application page right now.)
I’ve really enjoyed my time as a Fellow over the last year and I’m delighted to report that Shuttleworth have been kind enough to accept my reapplication and renew my fellowship for a further year. Reflecting the evolution and maturing of the open data ecosystem, this year’s proposal was entitled “Scaling the Open Data Ecosystem” to contrast with last year’s “Bootstrapping”.
In a follow-up post next week I’ll be outlining what I’m planning to get up to in the coming year with Shuttleworth’s support. See below for what I’m planning to do (also x-posted on the Open Knowledge Foundation blog at https://blog.okfn.org/2011/10/31/scaling-the-open-data-ecosystem/).
Scaling the Open Data Ecosystem
Describe the world as it is.
Over the last several decades the world has seen an explosion of digital technologies which have the potential to transform the way knowledge is disseminated. This world is rapidly evolving and one of its more striking possibilities is the creation of an open data ecosystem in which information is freely used, extended and built on. The resulting open data ‘commons’ is valuable in and of itself, but also, and perhaps even more importantly, because of the social and commercial benefits it generates, whether in helping us to understand climate change, speeding the development of life-saving drugs, or improving governance and public services.
In developing this open data ecosystem three key things are needed: material, tools and people. This is a key point: open information without tools and communities to utilise it is not enough. After all, openness isn’t an end in itself – open material has no value if it isn’t used. We need therefore to have widely available the capabilities for utilising open material, for processing, analysing and sharing it, especially on a large scale. Relevant tools need to be freely and openly available and the related infrastructure (after all, tools need somewhere to run, and data needs somewhere to be stored) should be capable of effective deployment by distributed communities.
Over the last few years we’ve started to see increasing amounts of open material made available, with release of open data really starting to take off in the last couple of years. But the (open) tools and the communities to use them are still very limited — we’re just starting to see the first self-identified “data wranglers / data hackers / data scientists” (note how the terms have not settled yet!). Key architectural elements of the ecosystem, such as how we create and share data in an open componentized way, are only just beginning to be worked through. We are therefore at a key moment where we transition from just ‘getting the data’ (and building the app) to a real data ecosystem in which data is transformed, shared and reintegrated and we replace a ‘data pipeline’ with ‘data cycles’.
What change do you want to make?
I want to see a world in which open data – data that can be freely shared and used without restriction – is ubiquitous and in which that data is used to improve the world around us, whether by finding you a better route to work, helping us to prevent climate change, or improving reportage. I want open data to allow us to build the tools and systems to help us navigate and manage the increasingly complex information-based world in which we now live.
Specifically, I want to help grow the emerging open data ecosystem. While part of this involves supporting and expanding the ongoing release of material (building on the major progress of the last few years), the biggest change I want to make is to develop the tools and communities so that we can make effective use of the increasing amounts of open data now becoming available.
Particular changes I want to make are:
- Development of real ‘data cycles’ (especially for government data). By data cycles I mean a process whereby material is released, it’s used and improved by the community and then that work finds its way back to the data source.
- Greater connection of open data to journalists and other types of reporters/analysts who can use this data and bring it to a wider audience.
- Development of an active and globally-connected community of open data wranglers.
- Development of better open tools and infrastructure for working with data, especially in a distributed community using a componentization approach that allows us to scale rapidly and efficiently.
What do you want to explore?
I’m interested in learning more about the actual and potential user communities for open data. I want to explore what they want (in relation to both tools and data) and also their awareness of what is already out there. I’m especially interested in areas like journalism, government, and the general civic hacker community.
I want to explore the processes around ‘data refining’ (obtaining, cleaning and transforming source data into something more useful) and around data ‘analysis’, which is usually a closely related task. I’m especially interested in existing business activity in this area, often labelled with headings like business intelligence and data warehousing. I want to see what we could learn from business regarding tools and process that could be used in the wider open data community, as well as how the business community can take advantage of open data.
I want to explore how we can connect together the distributed community of data wranglers and hacktivists, focusing on a specific area like civic information or finances. How do we allow for loose networks across different locations and different organisations while sharing information and collaborating on the development of tools?
Lastly, I want to explore the tools and processes needed to support decentralised, collaborative, and componentised development of data. How can we build robust and scalable infrastructures? How can we build the technology to allow people to combine multiple sources of official data in a wiki-like manner – so that changes can be tracked, and provenance can be traced? How can we break down data into smaller manageable components, and then successfully recombine them again? How can we ‘package’ data and create knowledge APIs to enable automated distribution and reuse of datasets? How can we achieve real read/write status for official information – not just access alone?
What are you going to do to get there?
I want to focus my efforts in this next year on 3 key areas, breaking new ground but also building on existing work I’ve been doing with the Open Knowledge Foundation.
First, I want to build out the CKAN software and community from a registry to a data hub – a platform for working with data, not just listing it. The last year has seen very significant uptake of CKAN, with dozens of CKAN instances around the world including several official government and institutional deployments. Improving and expanding CKAN will allow us to capitalize on this success to make CKAN into an essential tool and platform for open data “development”.
The most important aspect of the software side of this will be the development of a datastore component supporting the processing and visualization of data within CKAN. With features like these CKAN can become a valuable tool not just for tech-savvy data ‘geeks’ but for the more general users of data such as journalists and civil servants. Engaging this wider, “non-techy” audience is a key part of scaling up the ecosystem. It is important to emphasize that this won’t just be about developing software but about understanding and engaging with a variety of data-user communities, exploring how they work, what they want and how they can be helped.
Second, I want to build out the OpenSpending platform and community. OpenSpending is Where Does My Money Go globalized: a worldwide project to ‘map the money’. Following the successful launch of Where Does My Money Go last autumn in the UK, in the last 6 months we have dramatically expanded coverage, with data now from more than 15 countries (in May our work on Italy received coverage in La Stampa, the Guardian and other major newspapers).
Working with OpenSpending complements work on CKAN because it is a chance to act as a data user and refiner (we already have some integration with CKAN, but it’s still very basic). Furthermore, OpenSpending presents the chance to develop a specific data wrangler / data user community, and one which can and should have close links with users and analysts of data including journalists and civic ‘hacker’ groups. In this way OpenSpending can act as a microcosm and prototype for developments in the wider open data community.
Third, I want to develop the OKF Open Data Labs. Much like the “Google Labs” for Google’s web services, Mozilla Labs for the Web, and the “Sunlight Labs” for US transparency websites, I would like the “Open Data Labs” to be a place for coders and data wranglers to collaborate, experiment, share ideas and prototypes, and ultimately build a new generation of open source tools and services for working with open data. The labs would form a natural complement to my other activities with CKAN and OpenSpending – the Labs could build on material and tools from those projects while simultaneously acting as an incubator for new extensions and ideas useful both there and elsewhere.
Year in Review (2010 - Bootstrapping the Open Data Ecosystem)
In my fellowship proposal last year I set out 4 key milestones:
- 10 active working groups, each promoting open knowledge in a different key area
- 10 actively used instances of CKAN for open data in different countries
- A major international workshop on open government data
- Reaching version 1.0 with two major open data projects: Where Does My Money Go? and Open Biblio
For each of these milestones I have achieved the stated goal – and in several cases surpassed it. And my work has gone well beyond these specific targets. I have given more than 30 presentations and workshops over the last 9 months everywhere from Brasilia to Sofia. As one of the four members of the UK Government’s Transparency Board I have been advising the UK government on open data and transparency and helping the UK continue its pioneering role in this area. I have overseen, since last September, a rapid expansion in the Open Knowledge Foundation’s activities with a threefold increase in core staff.
Working groups
The Foundation now has more than a dozen working groups working in areas from science to literature, and archaeology to government data. In the last year we have seen significant growth both in the number and activity of working groups.
To give a couple of specific examples:
- Our Open Government Data working group now has over 500 members on its mailing list and 100 official (invited) members. It has representatives from dozens of countries and its membership includes most of the key people working in this area both inside and outside of government. The working group’s website http://opengovernmentdata.org/ is the top hit on Google for “open government data”. As an example of the strength of the working group we were able to subtitle our #opendata film into more than twenty languages in just a few days after a call for help on the mailing list.
- Our economics working group only entered incubation last Autumn but in January members of the working group conducted a hectic 2-day sprint to create an app for submission to the World Bank apps for development competition. http://yourtopia.net/ was the result and it went on to win third place and a $5000 prize that is going towards expanding the working group’s activities.
CKAN
There are now more than 30 CKAN instances world-wide – see the (still incomplete) list at http://wiki.ckan.net/Instances. In the last 9 months specifically we’ve helped set up several new ‘official’ government or institutional data catalogs using CKAN, for example the IATI (International Aid Transparency Initiative) Registry http://iatiregistry.org/, for the Helsinki Region http://data.hri.fi/ and for Greater Manchester in the UK http://datagm.org.uk/.
In addition there are a large number of community instances that have been set up, often with our help, with many of these now containing hundreds of datasets. There has also been growth in existing instances with http://ckan.net/, our main community instance, now having nearly 2000 data packages. The community has also grown in significant ways with CKAN now adopted as the official registry for Linked Open Data by the maintainers of the LOD cloud.
[Image placeholder: http://iatiregistry.org/ and http://ckan.net/stats]
Open Government Data Camp and Open Government Data
“The UK Open Government Data Camp is one of the most exciting events I have seen in years. The energy and interaction is great.”
Joey Hutcherson, Deputy Director of Open Government, U.S. Department of Commerce
Where Does My Money Go and OpenSpending
Where Does My Money Go had its 1.0 release in November 2010 to coincide with the UK Government’s announcement that spending over £25k would be published. A measure of the success and recognition of the project has been its frequent citation by others as an exemplar open project app. To give two examples: Google’s dataviz challenge used Where Does My Money Go? as one of its two examples to inspire contributors, and Rohan Silva, a member of the Prime Minister’s team on transparency in the UK, cited Where Does My Money Go? as an exemplar in his talk at the 2011 Personal Democracy Forum conference in New York in June.
The most significant update though is our work to transform Where Does My Money Go? into a global project under the name OpenSpending. We already have project participants from more than 15 countries and have had substantial success getting open finance data from other countries. For example, in May we were able to obtain a set of Italian budget data which, in a hectic 2-day sprint, we were able to load into OpenSpending and visualize, with the result receiving coverage in several major newspapers including La Stampa and the Guardian.
We have also been able to scale up the platform to deal with the increasing amount of open data becoming available (thanks often to our efforts). For example in the UK, every government department is now releasing monthly data detailing every financial transaction over £25,000. In the eight months since the announcement of this policy at the Open Government Data Camp last Autumn, more than 1.8 million transactions have been made available.
Open Bibliographic Data
Very good progress has been made on Open Bibliographic Data over the last year. Highlights:
- The Open Bibliography working group is now very active with several hundred members on its mailing list. The working group drafted a set of open bibliography principles, launched earlier this year, which have received significant sign-on.
- Last Autumn, working in collaboration with the JISC OpenBib project, I obtained the release of 3m open bibliographic records by the British Library, the largest release of its kind to date and a major milestone for open bibliographic data. All of this data is available on our http://bibliographica.org/ site, and the data has already been used to build apps for Wikipedia and to power our own Public Domain Works project http://publicdomainworks.net/
Talks and Community Engagement
I have given more than 30 presentations and workshops over the last 9 months everywhere from Brasilia to Sofia in Bulgaria.
- http://rufuspollock.org/2011/03/18/shuttleworth-fellowship-activity-in-the-last-3-months/
- http://rufuspollock.org/2010/12/01/progress-in-the-last-3-months/
- http://rufuspollock.org/2011/05/05/shuttleworth-highlights-for-april-2011/