FLOSS 2008 Workshop on Free/Open Source Software

JUNE 30, 2008

Last week I attended FLOSS 2008, the second international workshop/network meeting on FLOSS (Free/Libre/Open Source Software), in Rennes, France. I presented my paper Innovation and Imitation with and without Intellectual Property Rights (I would also have offered discussant comments, but the author of the paper I was scheduled to discuss had to pull out at the last minute). I also got to hear a variety of interesting talks. For some of these I was able to take notes, which I have included below for the ‘delectation’ of anyone else who is interested.

Mikko Valimaki: IPR and Open Source Software

  • Goodman and Myers (2005) – the 3G standard.
  • Leveque and Meniere 2007: what does RAND mean
    • reasonable royalty is R = c (v1-v2)p where c is the incremental cost of licensing, v1-v2 is the gain from using this patent over the second-best.
  • Other questions for royalty-setting
    • quality or volume of patents
    • early or late innovators
    • cumulative royalties or one-time fees
  • But all models he knows of have non-zero royalty fees
    • [ed]: not surprising given that you will always get interior solutions
  • Windows/Samba discussion
    • specific sets of terms
    • provide RF for the open source community
  • Commission Decision para 783
    • “On balance, the possible negative impact of an order to supply on Microsoft’s incentives to innovate is outweighed by its positive impact on the level of innovation of the whole industry.”
  • Nokia to acquire Symbian:
    • “a full platform will be available … under a royalty-free license … from the Foundation’s first day of operations … the Foundation will make selected components available as open source at launch.”
    • [ed]: Motivation here is clear: Nokia care about the hardware and for them software is a complementary good – which they therefore wish to be as cheap as possible. But this raises the question of what is being made open: is it hardware patents or pure software patents (and if so how big a deal is this)?

Stefan Koch: Efficiency of FLOSS Production

  • Question of efficiency of open source development
  • How much software did we get for our effort
    • Is OS a waste of resources?
  • Discussion without much empirical basis
    • Claim: fast and cheap, high quality, finding bugs late is inefficient (actually large effort) – see IEEE Software 1999
  • Completely unknown as no-one keeps time-sheets. So
    • Effort based on participation data
    • Effort based on product – look at software and ask how much effort would be needed in commercial environment
  • Empirical research in open source
    • Mainly case studies
    • Helpful but need proper large-scale analysis
  • Mined software repositories [ed: cf. today FLOSSMetrics, FLOSSmole]
    • 8,261 projects
    • 7,734,082 commits
    • 663M LOCs
    • resources and output are skewed: top decile of programmers: 79% of code base, second decile: 11%
  • Effort estimation based on actual participation
    • active programmer months (define active as committing in a given month)
    • high correlation with LOC added in month
  • Cumulate this number for each project
    • But not equal to a commercial person-month
    • How do we scale: use 18.4 h/w taken from stats for committers on Linux kernel
    • [ed:] this is the key assumption. The whole point is that FLOSS effort is not observed, so they are using a measure of output (committing) and trying to infer actual activity
  • Manpower function modelling:
    • Norden-Rayleigh model (1960)
    • Some set of problems N (unknown but finite)
    • Probs are solved independently and randomly (following Poisson)
    • This fits ok but has eventual decline in participation which does not occur
    • Modify this: in particular to allow introduction of new problems
      • Introduce in prop to original no. problems, in prop to current set of problems etc
      • Also have different learning rates
      • [ed: but isn’t the setup a little different? Really it is a question of success vs. non-success in terms of acquiring users + some kind of bound on amount of participation due either to fission or complexity]
  • Product-based estimation
    • COCOMO 81 and COCOMO 2
  • Results:
    • Comparison COCOMO - Norden-Rayleigh
    • For COCOMO 81 cannot find parameters favourable enough to explain Norden-Rayleigh curve
    • For COCOMO 2 can find parameters but very favourable
    • Suggest (roughly) that FLOSS very efficient (but not very rigorous)
  • More formal estimation using all models etc
    • Norden-Rayleigh significantly below product-based estimates (factor of 8 in mean)
  • Interpretation
    • FLOSS v. efficient (self-selection for tasks etc)
    • Extremely high amount of non-programmer participation (1:7 relation …)
  • [ed]: not sure about this generous view. Other explanations
    • No quality measurement (also mentioned by Koch)
      • OK: lot of code but low quality
    • (Related) Many sourceforge projects are incomplete, easy bit at the start
      • Later comes a lot of refactoring/writing documentation. This may display significant diminishing returns
    • Many FLOSS projects come from what were originally commercial projects. In that case:
      • code may have already been written
      • conceptual components have been done already
    • Trade-off of time vs. productivity
      • May be more productive to only work 10h a week but then product might not be ready for 10 years
  • From discussion
    • interesting point: Nokia thinking of moving to more FLOSS in-house because they can’t manage their 5-10k programmers centrally any more
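
To make the participation-based estimate and the Norden-Rayleigh curve from Koch's talk a bit more concrete, here is a minimal Python sketch. It is my own illustration, not Koch's code: the commit-tuple layout, the function names and the 40-hour commercial week are assumptions, and only the 18.4 h/week figure comes from the talk. The Norden-Rayleigh functions use the usual textbook form, with K the total effort and a a shape parameter.

```python
import math
from collections import defaultdict

HOURS_PER_WEEK_FLOSS = 18.4      # figure quoted in the talk (Linux kernel committers)
HOURS_PER_WEEK_FULL_TIME = 40.0  # assumption: a commercial working week


def active_programmer_months(commits):
    """Count active programmer months per project.

    `commits` is an iterable of (project, author, "YYYY-MM") tuples
    (a hypothetical layout). An author is "active" in a month if they
    committed at least once in that month.
    """
    seen = defaultdict(set)
    for project, author, month in commits:
        seen[project].add((author, month))
    return {project: len(pairs) for project, pairs in seen.items()}


def to_person_months(active_months):
    """Scale active programmer months to full-time-equivalent person-months."""
    return active_months * HOURS_PER_WEEK_FLOSS / HOURS_PER_WEEK_FULL_TIME


def norden_rayleigh_cumulative(t, total_effort, a):
    """Cumulative effort E(t) = K * (1 - exp(-a * t^2))."""
    return total_effort * (1.0 - math.exp(-a * t * t))


def norden_rayleigh_manpower(t, total_effort, a):
    """Manpower m(t) = dE/dt = 2 * K * a * t * exp(-a * t^2)."""
    return 2.0 * total_effort * a * t * math.exp(-a * t * t)


if __name__ == "__main__":
    log = [
        ("proj", "alice", "2008-01"),
        ("proj", "alice", "2008-01"),  # second commit in the same month
        ("proj", "bob", "2008-01"),
        ("proj", "alice", "2008-02"),
    ]
    months = active_programmer_months(log)["proj"]
    print(months, to_person_months(months))
```

Note that the manpower curve rises, peaks at t = 1/sqrt(2a) and then declines towards zero. That eventual decline is the feature the notes say does not match observed FLOSS participation, which is why the talk modified the model to allow new problems to be introduced.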

Mickael Vicente: Shift to Competences Model: A Social Network Analysis of Open Source Professional Developers

  • Robles 2007
    • Statistics on Debian showing increasing corporate involvement
  • Social network extraction
    • Get repo logs
    • Create link between 2 developers if they have committed on the same file (non-directed graph)
      • Simplification: the best collaboration of each developer (directed graph) – pick other developer with whom they have committed most files in common
    • Longitudinal analysis
      • extract clusters
  • Correlation with professional career
    • CV collected on Internet, personal web page etc (96% collected)
  • Interesting data
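
The network-extraction step described above can be sketched roughly as follows. This is my own reconstruction from the notes, not the author's code: the (developer, file) input format, the function names and the tie-breaking rule (first pair in sorted order wins) are all assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def shared_file_counts(commits):
    """Undirected graph: weight of edge (a, b) = number of files both committed to.

    `commits` is an iterable of (developer, file_path) pairs taken from
    repository logs (a hypothetical layout). Keys are sorted pairs, a < b.
    """
    devs_by_file = defaultdict(set)
    for dev, path in commits:
        devs_by_file[path].add(dev)
    counts = Counter()
    for devs in devs_by_file.values():
        for a, b in combinations(sorted(devs), 2):
            counts[(a, b)] += 1
    return counts


def best_collaborators(counts):
    """Directed simplification: each developer points at the peer with
    whom they share the most files. Ties keep the first pair seen in
    sorted order (an arbitrary choice made for this sketch).
    """
    best = {}
    for (a, b), n in sorted(counts.items()):
        for src, dst in ((a, b), (b, a)):
            if src not in best or n > best[src][1]:
                best[src] = (dst, n)
    return {dev: peer for dev, (peer, _) in best.items()}


if __name__ == "__main__":
    log = [("ann", "a.c"), ("bob", "a.c"),
           ("ann", "b.c"), ("bob", "b.c"), ("cat", "b.c")]
    edges = shared_file_counts(log)
    print(dict(edges))
    print(best_collaborators(edges))
```

The directed graph has exactly one outgoing edge per developer, which makes it much sparser than the undirected one and presumably easier to cluster over time.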

Nicholas Radtke: What Makes FLOSS Projects Successful: An Agent-Based Model of FLOSS Projects

  • Positive Characteristics of FLOSS
    • High quality (Low defect count: Chelf 2006)
    • Rapid development
    • Violates Brooks law (Rossi 2004)
  • Risky Business
    • For every successful FLOSS project there are dozens of unsuccessful projects
  • Corporate IT manager survey (2002)
    • 41% mention inability to hold someone responsible for software
  • Attempts at Simulating FLOSS
    • SimCode (Dalle and David 2004)
    • OSsim (Wagstrom et al 2005)
    • K-Means stuff
  • Simulate across landscape
    • Not social network
    • Focus on developer decision to join/contribute to projects (Agent-Based Modelling)
  • Defining Success and Failure
    • Traditional metrics do not work well (on budget?)
    • Completion (Crowston et al. 2003)
    • Progression through maturity stages (Crowston and Scozzi 2002)
    • Number of developers
    • Mailing list activity
    • Project outdegree, Active developer count (Wang 2007)
  • The Model Universe
    • Agents and projects
    • Agents:
      • Consumption: 0-1
      • Producer: 0-1
      • Resource: 0-1.5 (1=40h)
      • Memory: agents only aware of some subset of projects
      • Needs vector (preferences)
      • utility: linear sum of: similarity match + current popularity (current resources) + cumulative resources + download + f(maturity)
    • Projects:
      • resources needed
      • current resources
      • cumulative resources
      • download count
      • preferences: same as agent but converges towards those had by agents working on it
  • Agents choose between projects each time period
    • have some randomness in that use multinomial logit: prob choose project i ~ exp(mu * Utility of project i)
  • Results
    • Simulate over 250 time steps ~ 4 years
    • calibrate [ed: in a way I was not quite clear about]
    • compare simulation with empirical data from sourceforge
      • developers per project
      • projects per developer
    • Find that (from simulation data) downloads and cumulative resources are not important
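
The project-choice rule in the model (multinomial logit over project utilities) can be illustrated with a short Python sketch. The function names and example utilities are mine and purely illustrative. The talk's utility function is a linear sum of several terms whose weights I did not note, so the sketch simply takes utilities as given.

```python
import math
import random


def choice_probabilities(utilities, mu):
    """Multinomial logit: P(i) = exp(mu * U_i) / sum_j exp(mu * U_j)."""
    scaled = [mu * u for u in utilities]
    top = max(scaled)  # subtract the max before exponentiating, for stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def choose_project(utilities, mu, rng):
    """Draw one project index according to the logit probabilities."""
    probs = choice_probabilities(utilities, mu)
    return rng.choices(range(len(utilities)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    utilities = [0.2, 0.5, 0.9]  # hypothetical utilities for three projects
    print(choice_probabilities(utilities, mu=5.0))
    print(choose_project(utilities, mu=5.0, rng=rng))
```

The parameter mu controls the randomness: mu = 0 makes every project equally likely, while a large mu makes agents pick the highest-utility project almost deterministically.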

Fabio Manenti: Dual Licensing in Open Source Software Markets

  • Benefits of Going Open Source
    • feedback from community
    • network effects (usage)
    • competitive pressures (e.g. Netscape) [ed: not sure this is a benefit]
  • Dual-licensing
    • Kosky (2007): 6% of representative sample of European OSS business firms employ DL strategies

Alexia Gaudeul: Blogs and the Economics of Reciprocal (In-)Attention

  • What blogs are
  • Reasons for blogging
  • Question: do you befriend (link) because of content produced or do you produce content because of friends
  • General points
    • Market interactions only part of wider class of reciprocal relations
    • Time vs. money economics
    • Unique dataset, very detailed and complete, to test networked relations
  • Model – but left out due to time
  • Dataset: livejournal 2006
    • Sociology: teenagers to young adults (15 to 23), female (67%), Americans (70%)
    • Fast growth: created in 1999, 8M accounts, 1.3M active
    • FLOSS but for-profit (SaaS)
    • Large part of the content is self-referential
    • Lively: 4 comments per post on average
    • Federated by communities: no. of communities per person 15
    • Journals updated for more than 2 years on avg
    • 70% have posted in last 2 months
    • No. of entries: 1 every 2 days
    • No. of friends: 50 avg
    • Balance between friends and friends of
    • Balance between comments received / made
  • Friendship patterns
    • May be balance but does not explain no. of friends of diff. individuals
    • Need to distinguish
      • Norm of reciprocity: more promiscuous bloggers accumulate friends
      • Content attractiveness
        1. Quality/freq. of posts
        2. Interactivity (comments per post)
  • Regressions
    • Reciprocity: No. blogs read (friend) = b * number of readers (friend of) + error
    • Activity: No. readers = cX + error – X = matrix of ind. variables
    • Endogeneity issues [ed: all over the place]
    • Regress: ln(Friends) = ln(Friend of) + … (instrumenting Friends Of on Activity to solve endogeneity issues)
      • Saturation around 400 friends seemingly (few with more)
    • Max no. of friendships when your no. friends = no. friends of (maybe)
      • A norm of reciprocity
    • Issues with endogeneity of activity (which was used to instrument friends of)
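
To show why instrumenting matters in the reciprocity regression, here is a small Python sketch on synthetic data. Nothing in it comes from the LiveJournal dataset: the data-generating process, the coefficient values and all names are assumptions made for illustration. In particular, it assumes that activity is a valid instrument, which is exactly what the last bullet above questions.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def ols_slope(x, y):
    """OLS slope of y on x (single regressor with intercept)."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """IV slope with one instrument z for one regressor x."""
    return covariance(z, y) / covariance(z, x)


def simulate(n, beta, seed):
    """Synthetic bloggers: an unobserved 'sociability' trait drives both
    ln(friend of) and ln(friends), which biases OLS upwards."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        activity = rng.gauss(0, 1)     # instrument (assumed exogenous here)
        sociability = rng.gauss(0, 1)  # unobserved confounder
        ln_friend_of = 0.8 * activity + sociability
        ln_friends = beta * ln_friend_of + 2.0 * sociability + rng.gauss(0, 1)
        z.append(activity)
        x.append(ln_friend_of)
        y.append(ln_friends)
    return z, x, y


if __name__ == "__main__":
    z, x, y = simulate(n=20000, beta=0.5, seed=1)
    print("OLS:", ols_slope(x, y))
    print("IV: ", iv_slope(z, x, y))
```

In this setup the OLS slope lands well above the true coefficient of 0.5, while the IV slope should be close to it. If activity were itself correlated with the unobserved trait, the IV estimate would be biased too.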

Sylvain Dejean

  • Does ICT (the Internet) lead to a global village or to cyber-balkanization?
  • What leads to the emergence of virtual communities?
  • Is the heterogeneity of contributions an impediment to self-organization?
  • How to manage virtual communities
  • Agent-based model:
    • Individuals defined by some characteristics
    • Herfindahl index measures degree of self-organization [ed: why self-organization]
    • Communities change via selection and variation
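
For reference, the Herfindahl index is the sum of squared shares. The sketch below assumes it is computed over community sizes, which is my guess at how it was used in the model rather than something stated in the talk.

```python
def herfindahl(sizes):
    """Herfindahl index: sum of squared shares, between 1/n and 1."""
    total = sum(sizes)
    if total <= 0:
        raise ValueError("sizes must sum to a positive number")
    return sum((s / total) ** 2 for s in sizes)


if __name__ == "__main__":
    print(herfindahl([25, 25, 25, 25]))  # four equal communities
    print(herfindahl([97, 1, 1, 1]))     # one dominant community
```

A value near 1 means almost everyone sits in one community, and a value near 1/n means individuals are spread evenly over n communities.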