This post continues the work begun in this earlier post on “Estimating Information Production and the Size of the Public Domain”. **Update: 2009-07-17** there is now a follow-up post.

Having already obtained estimates of the number of items (publications) produced each year from library catalogue data, our next step is to convert these into an estimate of the “size” of the public domain. (NB: as already discussed, “size” could mean several different things. Here, at least to start with, we’re going to take the simplest and crudest approach and equate size with the number of publications/items.)

The natural, and most obvious, approach here is to go through our 1 million+ items and compute their public domain status (as discussed in this earlier post). Unfortunately, as detailed there, this is problematic because we often have insufficient information in library catalogues with which to compute PD status with certainty – in particular, author death dates are frequently absent. Thus, it will be necessary to fall back on some approximate method.

For example, we can base PD status on simple publication dates: if a book was published, say, 140 years ago, it is very likely to be in the public domain. For it still to be in copyright, its author must have lived more than 70 years after the book came out (remember copyright lasts for life plus 70 years in the EU)! Conversely, any publication less than 70 years old is almost certainly *not* in the public domain. For periods in between we can assume that some proportion of publications are PD, starting close to zero for more recent items and rising towards one for older ones. A calculation along those lines is provided in the following table:

Start | End | Items | % PD | Number PD |
---|---|---|---|---|
1400 | 1870 | 389291 | 100 | 389291 |
1870 | 1880 | 50564 | 95 | 48035 |
1880 | 1890 | 66857 | 90 | 60171 |
1890 | 1900 | 66883 | 80 | 53506 |
1900 | 1910 | 70360 | 50 | 35180 |
1910 | 1920 | 60489 | 30 | 18146 |
1920 | 1930 | 78670 | 10 | 7867 |
1930 | 1940 | 90576 | 5 | 4528 |
Total | | 873690 | 71 | 616724 |

So, based on the assumptions regarding PD proportions given in the table, there are somewhat over 600 thousand PD books according to the holdings of Cambridge University Library (of which roughly 63%, approx 390k, are from before 1870). The British Library dataset is approx 4x as big as that of Cambridge University Library, and the numbers scale up roughly proportionately, giving a total of over **2.4 million** PD items.
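For anyone who wants to check the arithmetic or try different PD proportions, the calculation can be sketched in a few lines of Python. This is only an illustration: the item counts and percentages are copied from the table, the names (`BANDS`, `pd_count`, `summarise`, `BL_SCALE`) are made up for this example, and the factor of 4 for the British Library is the rough ratio mentioned above rather than a measured value.

```python
# Each band: (start_year, end_year, items, assumed_pd_percent).
# Figures are copied from the table; the percentages are assumptions, not measurements.
BANDS = [
    (1400, 1870, 389291, 100),
    (1870, 1880, 50564, 95),
    (1880, 1890, 66857, 90),
    (1890, 1900, 66883, 80),
    (1900, 1910, 70360, 50),
    (1910, 1920, 60489, 30),
    (1920, 1930, 78670, 10),
    (1930, 1940, 90576, 5),
]


def pd_count(items, pd_percent):
    """Estimated number of PD items in one band.

    Integer division rounds down, which matches the figures in the table.
    """
    return items * pd_percent // 100


def summarise(bands):
    """Return (total items, total estimated PD items, overall PD share)."""
    total_items = sum(items for _, _, items, _ in bands)
    total_pd = sum(pd_count(items, pct) for _, _, items, pct in bands)
    return total_items, total_pd, total_pd / total_items


total_items, total_pd, share = summarise(BANDS)

# Assumption: the British Library dataset is roughly 4x the size of the
# Cambridge one and has a similar distribution over publication dates.
BL_SCALE = 4
bl_estimate = total_pd * BL_SCALE

print(f"Cambridge: {total_pd} of {total_items} items PD ({share:.0%})")
print(f"British Library (scaled by {BL_SCALE}): about {bl_estimate} items PD")
```

Changing the percentages in `BANDS` shows how sensitive the total is to the assumed proportions for the 1870 to 1940 bands.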

Of course, this is a fairly crude approach based purely on publication date, and it could be improved in a variety of ways, most notably by using the authorial birth date information which is usually present in catalogue data (we can also use death date information where present). This will be the subject of the next post. (**2009-07-17** the post is up here).