Q. How is social network analysis used in studies of open source?
A. Social network analysis (SNA) can be used to study online communities, including free/libre open source software (F/LOSS) developer teams. SNA techniques provide insight into these communities and enable researchers to make predictions based on these insights. They can be used to model the nature and patterns of interactions, which can in turn serve as predictors of group behaviour, trust, knowledge generation, and information diffusion (Crowston et al., 2010). SNA can also be used to make predictions about networks other than purely social ones, such as networks based on relationships between code artifacts.
In this article, we answer the question of how SNA has been used to study open source. We begin by describing social networks and how they can be deconstructed to examine the relationships between entities within them. Next, we discuss social networks within F/LOSS communities and describe how SNA gives insights into the various actors and groups acting within networks. Finally, we provide an overview of common SNA measures used to study open source, including examples of how they have been used to provide insights about F/LOSS communities.
A social network is made up of individuals or organizations that are linked. It can be viewed as a network of nodes and links, where the nodes are actors (such as individuals or organizations) and the links represent some kind of connection between actors. This connection could represent a variety of ties, such as affiliation or membership in an organization, dependency, social relationships, information flow, or interactions (Crowston et al., 2010). All these types of ties can be represented in a single network, which can be illustrated graphically. For example, Figure 1 shows developers and their relationships with multiple projects, where actors such as projects and developers are represented as nodes and developer interactions and affiliations with projects are represented as ties.
Figure 1. A Social Network*
*Adapted from Michael Weiss (2010, "SYSC5801: Open Source Business," Carleton University).
In this example, developers are linked if they belong to the same project, and projects are linked if a developer works on both projects. Even with just two types of ties as shown in Figure 1 (i.e., developer-project relationships and project-project relationships), the network can quickly become difficult to analyze. The analysis is improved by modeling developer-developer ties and project-project ties as two different social networks, as shown in Figure 2. Projects P1 and P2 are related because developer D2 (from Figure 1) is involved with both projects. Similarly, developer D1 is related to developer D2 because they both work on project P1. Developers D2 and D3 are also related because they work together on project P2.
Figure 2. Deconstructing a Social Network*
*Adapted from Michael Weiss (2010, "SYSC5801: Open Source Business," Carleton University).
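The deconstruction shown in Figure 2 can be sketched in a few lines of Python. This is only an illustration: the membership table below is inferred from the description of Figure 1 given above (D1 on P1, D2 on P1 and P2, D3 on P2), and the function names are hypothetical.

```python
from itertools import combinations

# Developer -> projects, as inferred from the text describing Figure 1
# (an assumption; the figure itself may contain more detail).
memberships = {
    "D1": {"P1"},
    "D2": {"P1", "P2"},
    "D3": {"P2"},
}

def developer_ties(memberships):
    """Link two developers when they share at least one project."""
    ties = set()
    for a, b in combinations(sorted(memberships), 2):
        if memberships[a] & memberships[b]:
            ties.add((a, b))
    return ties

def project_ties(memberships):
    """Link two projects when at least one developer works on both."""
    ties = set()
    for projects in memberships.values():
        for p, q in combinations(sorted(projects), 2):
            ties.add((p, q))
    return ties

dev_network = developer_ties(memberships)
proj_network = project_ties(memberships)

print(sorted(dev_network))   # [('D1', 'D2'), ('D2', 'D3')]
print(sorted(proj_network))  # [('P1', 'P2')]
```

Each resulting network contains only one type of actor, so a measure computed on it has a single, unambiguous interpretation.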
SNA examines the relationships between these actors, the characteristics of these relationships, and their impact on the actors. It provides a means to formalize social properties and processes by providing testable models of social concepts. SNA has been used for studying relationships between people, groups, organizations, and other social actors, including relationships within F/LOSS communities.
F/LOSS Communities and Social Networks
F/LOSS communities exhibit properties of social networks in that they consist of actors who are linked by some interdependency. SNA techniques have been used by researchers to understand the dynamics of such communities. For example, Madey and colleagues (2004) studied almost 60,000 F/LOSS projects hosted by SourceForge and applied SNA measures to detect the presence of certain properties of social networks in the SourceForge developer community. They found that the SourceForge community showed properties of being a social network in that: i) it has hub actors, who are key to information flow within the network and also tie separate parts of the network together; and ii) it is a self-organizing system that forms "patterns of connectivity, that emerge from bottom up process based on local interactions."
The use of SNA in open source is not limited to using people or projects as the actors in a network. Nguyen and colleagues (2010) modeled the Eclipse project as a dependency network of software packages and used various network analysis measures to predict post-release failures in Eclipse projects.
Contexts for SNA
Social network analysis gives us insight into the various roles and groupings in a network. Most research asks the following types of questions:
Who are the information hubs within the network and who bridges different groups or clusters together?
Who is important in the network and who has influence over the network?
What is the level of activity in the network?
Where in the network is there a need for improved communication?
To answer these questions, identifying the types of actors is particularly important. Certain actors hold privileged positions within the network, which enables them to have greater influence over the network or earlier awareness of new information relative to other members of the network. For example, in a study of the spread of the H1N1 virus, Christakis and Fowler (2010) found that, by monitoring the health of central actors (rather than the usual approach of monitoring a random sample from the population), health professionals could detect the spread of the virus up to 16 days earlier in central actors than in the general population. Identifying central actors will enable organizations involved in F/LOSS projects to react to changes within the community faster and more aptly.
Another area where insights from SNA are important is organizational mergers. When organizations merge, challenges arise when combining the formal structures of operations. There is also an issue of merging distinct corporate cultures. Cultures are created, maintained, and shared through interactions between people in networks. Just after the merger, the new organization consists of two virtually separate social networks. If the social networks of the organization remain separate, so will their culture and the flow of communication between the people. Thus, efforts early on should be directed toward identifying central actors and combining the networks. To track the progress of the merger, snapshots of the organization-wide network should be taken at different points in time to measure the connectedness of the network and where gaps remain.
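One simple way to track connectedness across snapshots is to count the separate components in each one. The sketch below assumes each snapshot is a list of ties between people; the names, the two snapshots, and the function connected_components are all hypothetical and are not drawn from any study cited here.

```python
def connected_components(nodes, ties):
    """Return the groups of nodes that are reachable from one another."""
    neighbours = {n: set() for n in nodes}
    for a, b in ties:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, frontier = set(), [start]
        while frontier:
            node = frontier.pop()
            if node in component:
                continue
            component.add(node)
            frontier.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components

# Hypothetical staff of two merged organizations, A and B.
staff = ["a1", "a2", "a3", "b1", "b2", "b3"]

# Snapshot taken just after the merger: no ties cross the old boundary.
merger_snapshot = [("a1", "a2"), ("a2", "a3"), ("b1", "b2"), ("b2", "b3")]

# Later snapshot: one tie now links the two former organizations.
later_snapshot = merger_snapshot + [("a2", "b2")]

print(len(connected_components(staff, merger_snapshot)))  # 2
print(len(connected_components(staff, later_snapshot)))   # 1
```

A component count that stays above one indicates that gaps remain. Component counts are a coarse measure, so in practice they would likely be combined with the centrality measures described in the following section.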
The following SNA measures have been used to study F/LOSS communities:
1. Betweenness centrality: this measure identifies information hubs within a network, which act to bridge or "glue together" different parts of a network that would otherwise be apart (Martinez-Romo et al., 2008).
Madey and colleagues (2004) used betweenness centrality to study F/LOSS projects hosted on SourceForge. The study modeled the developer community as a collaborative network. The study demonstrated that “linchpin” or hub developers play a central role in linking fragmented developer communities in a F/LOSS community.
Martinez-Romo and colleagues (2008) used betweenness centrality to measure positions of developer leadership in a study of company involvement in an open source project. They showed that actors with high values of betweenness centrality are on paths that provide opportunities to others, even if they are not directly connected to those benefiting from the opportunities. By identifying the leaders and information controllers in the network, the study was able to show that company employees held leadership positions with a low degree of turnover.
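To make the idea of a "linchpin" concrete, the following sketch computes betweenness centrality for a toy collaboration network using Brandes' algorithm for unweighted, undirected graphs. The network is hypothetical (two small developer groups joined through a single developer, H) and is not data from either study; the studies' own tooling is not described here.

```python
from collections import deque

def build_graph(ties):
    """Turn a list of undirected ties into a node -> neighbours mapping."""
    graph = {}
    for a, b in ties:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def betweenness_centrality(graph):
    """Unnormalized betweenness (Brandes' algorithm, unweighted graph)."""
    centrality = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search, counting shortest paths from the source.
        order = []
        preds = {v: [] for v in graph}
        paths = {v: 0 for v in graph}
        paths[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Accumulate dependencies, starting from the farthest nodes.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # Each undirected pair was counted once from each end.
    return {v: c / 2 for v, c in centrality.items()}

# Hypothetical network: groups {A, B, C} and {E, F, G} joined only via H.
ties = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "H"),
        ("H", "E"), ("E", "F"), ("E", "G"), ("F", "G")]
scores = betweenness_centrality(build_graph(ties))

print(scores["H"], scores["C"], scores["A"])  # 9.0 8.0 0.0
```

Developer H scores highest because every shortest path between the two groups passes through H, even though H has only two direct ties.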
2. Eigenvector centrality: this measure identifies positions of importance and influence within a network. In the study of company involvement in an open source project, Martinez-Romo and colleagues (2008) used this measure to identify developers of high influence. Nguyen and colleagues (2010) used eigenvector centrality as a component measure to identify post-release failures in the Eclipse project.
Betweenness centrality and eigenvector centrality identify different forms of leadership within a network. Betweenness centrality identifies information hubs; eigenvector centrality identifies nodes that have influence over the network. Martinez-Romo and colleagues (2008) showed that it is harder to gain a position of influence than to become an information hub.
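Eigenvector centrality can be approximated by power iteration, in which each node's score is repeatedly updated from its neighbours' scores. The sketch below is a minimal illustration on a hypothetical network, not the procedure used in the cited studies. It iterates on the adjacency matrix plus the identity (each node keeps its own score before adding its neighbours' scores), an assumption made here to help convergence. The ranking of nodes is unaffected.

```python
import math

def build_graph(ties):
    """Turn a list of undirected ties into a node -> neighbours mapping."""
    graph = {}
    for a, b in ties:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def eigenvector_centrality(graph, max_iter=1000, tol=1e-10):
    """Approximate eigenvector centrality by power iteration."""
    scores = {v: 1.0 for v in graph}
    for _ in range(max_iter):
        # Keep the node's own score and add the scores of its neighbours.
        updated = {v: scores[v] + sum(scores[w] for w in graph[v])
                   for v in graph}
        # Rescale so that the scores have unit Euclidean length.
        norm = math.sqrt(sum(x * x for x in updated.values()))
        updated = {v: x / norm for v, x in updated.items()}
        change = sum(abs(updated[v] - scores[v]) for v in graph)
        scores = updated
        if change < tol:
            break
    return scores

# Hypothetical network: a tightly knit core {A, B, C, D}, with H attached
# to D and a peripheral developer E attached only to H.
ties = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"),
        ("C", "D"), ("D", "H"), ("H", "E")]
scores = eigenvector_centrality(build_graph(ties))

ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[0], ranking[-1])  # D E
```

In this toy network, H lies on every path between E and the core, yet the core members A, B, and C score higher than H on eigenvector centrality because they are tied to other well-connected nodes. This mirrors the distinction drawn above between information hubs and positions of influence.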
3. Coordination degree: this is a measure of the ability of a vertex to interchange information. It shows the ability of a node to receive information from the network and capture information about activity in a project (Martinez-Romo et al., 2008).
Martinez-Romo and colleagues (2008) used coordination degree to measure the role of a company in an open source project. They found that periodic, time-based releases of code increased developer activity more than feature-based code releases. Using a slightly different measure, the average coordination degree, the study identified phases in which the network structure was efficient and phases in which it was not. Comparing these phases with levels of corporate involvement, the study showed that corporate involvement in F/LOSS projects led to more efficient development, but only if both the company and the F/LOSS community cooperated in the development efforts. There was less activity when there was no corporate involvement or when the company chose not to engage the F/LOSS community.
SNA provides a set of measures well suited to analyzing networks, including F/LOSS communities and other types of online networks. It allows researchers to visualize relationships within complex networks and provides insights into these communities.
For a detailed analysis of the use of SNA measures in studying online communities, including the limitations of this approach and recommendations for researchers, see:
"Validity Issues in the Use of Social Network Analysis for the Study of Online Communities" by Kevin Crowston, James Howison, and Andrea Wiggins (2010)