Sunday, June 20, 2010

Detecting cohesive subgroups through SNA

Social Network Analysis (SNA) is a formal method, or rather a family of methods, that can be used to examine differentiated patterns of interaction between actors. Most social research methods work with attributional data. They measure the individual actor’s personal attributes (such as age, gender or socio-economic status), then try to group together individuals possessing similar profiles of attributes, and conclude that social behavior is the result of these common attributes. By contrast, SNA works with relational data. It adopts as its unit of analysis the ties or relations between actors, and tries to explain social behavior as a result of the patterns of strong and weak ties between actors, and the resulting constraints on social behavior. Once ties are defined and measured, no assumption needs to be made about the spatial position or other characteristics of individual actors.

The fact that SNA only requires relational data made it well suited to analyzing newsgroups, where the personal attributes of participants are invisible and not readily disclosed. But one thing that was easy to observe in newsgroups is who posted to whom. Even after following a newsgroup for just a couple of days, patterns started to emerge: some people posted a lot, some people's posts drew many replies (for better or for worse), some people seemed to be ignored, and so on. Since my research interest was to detect stable communities within newsgroups, I decided to look for subsets of participants who displayed high levels of interaction among themselves. In other words, using SNA terminology, I was looking for cohesive subgroups within the total population of the newsgroup.
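One common way to make the idea of a cohesive subgroup concrete is the k-core: the largest set of actors in which everyone is tied to at least k others in that same set. The sketch below is only an illustration in Python, not the routine I ran in my own analysis. The function name k_core, the sample names and the choice to treat ties as undirected are all assumptions made for the example.

```python
from collections import defaultdict


def k_core(ties, k):
    """Return the actors in the k-core of an undirected network.

    ties is an iterable of (actor_a, actor_b) pairs. Direction is ignored
    here, which is a simplifying assumption for this sketch.
    """
    neighbours = defaultdict(set)
    for a, b in ties:
        if a != b:  # ignore self-ties
            neighbours[a].add(b)
            neighbours[b].add(a)

    members = set(neighbours)
    changed = True
    while changed:
        changed = False
        for actor in list(members):
            # Count only ties to actors still inside the candidate subgroup.
            if len(neighbours[actor] & members) < k:
                members.discard(actor)
                changed = True
    return members


# Hypothetical ties: three people who all interact, plus one peripheral poster.
ties = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"), ("cat", "dan")]
core = k_core(ties, 2)
```

Raising k makes the definition stricter, so the subgroup shrinks until only the most densely connected participants remain, or nobody does.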

To do this, I needed a complete listing (or at least a large sample) of the messages posted to the newsgroup, and I needed to record every instance of one actor replying to another's post. Fortunately, Usenet messages have a standard format that made this task relatively easy. I used the From and References headers of each message. The first told me who had posted the message; the second told me which message it was a follow-up to. Using a simple BASIC routine, I imported these headers into an Access database, and then used a query to obtain a listing of all From-To combinations. In effect, this listing was the social network that participants had created over a specific period of time through their online interactions. I then imported it into Ucinet, an SNA program, and was able to identify the people who formed the most cohesive subgroups in each of the 19 newsgroups I was focusing on. In other words, I could see who the most active members of the virtual community in each newsgroup were, and just how active they were.
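For readers who want to try the extraction step without BASIC and Access, here is a rough Python sketch of the same idea. It is not the routine from my own study. It assumes each message is available as raw text, that the Message-ID header can be used to look up the author of the message being replied to, and that the last entry in References is the direct parent, which is the usual Usenet convention. The function name reply_edges and the sample messages are made up for illustration.

```python
from collections import Counter
from email import message_from_string
from email.utils import parseaddr


def reply_edges(raw_messages):
    """Count (replier, original_poster) pairs in a list of raw messages."""
    parsed = [message_from_string(raw) for raw in raw_messages]

    # First pass: remember who wrote each message, keyed by Message-ID.
    author_of = {}
    for msg in parsed:
        msg_id = (msg.get("Message-ID") or "").strip()
        if msg_id:
            author_of[msg_id] = parseaddr(msg.get("From", ""))[1].lower()

    # Second pass: link each follow-up to the author of its parent message.
    edges = Counter()
    for msg in parsed:
        refs = (msg.get("References") or "").split()
        if not refs:
            continue  # a thread starter, not a reply
        sender = parseaddr(msg.get("From", ""))[1].lower()
        target = author_of.get(refs[-1])  # assumed: last entry is the parent
        if target and target != sender:  # skip unknown parents and self-replies
            edges[(sender, target)] += 1
    return edges


# Hypothetical four-message thread between two posters.
sample = [
    "From: Alice <alice@example.com>\nMessage-ID: <1@example.com>\n\nFirst post",
    "From: Bob <bob@example.com>\nMessage-ID: <2@example.com>\n"
    "References: <1@example.com>\n\nReply to Alice",
    "From: Alice <alice@example.com>\nMessage-ID: <3@example.com>\n"
    "References: <1@example.com> <2@example.com>\n\nReply to Bob",
    "From: Bob <bob@example.com>\nMessage-ID: <4@example.com>\n"
    "References: <3@example.com>\n\nReply to Alice again",
]
edges = reply_edges(sample)
```

The resulting counts play the role of the From-To listing: each key is a directed tie, and each value is how often that tie occurred in the sample. Replies whose parent message is missing from the sample are silently dropped, which is one reason a complete listing matters.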

I recently wrote a book chapter describing this technique and included as an appendix the BASIC routine I used. The book will come out later this year. Here is the reference:

Murillo, E. (2010). Using social network analysis to guide theoretical sampling in an ethnographic study of a virtual community. In B. K. Daniel (Ed.), Handbook of Research on Methods and Techniques for Studying Virtual Communities: Paradigms and Phenomena. Hershey, PA: IGI Global.

Sunday, June 6, 2010

Systematically searching Usenet

When I first started looking at newsgroups for evidence of VCoP traits, I simply subscribed to different groups, guided by their names, downloaded threads and messages, and examined how long the threads were, how many participants were involved and how knowledgeable they seemed. Of course, this was far from systematic. Although I did pick up on a few groups that would later turn out to be true VCoPs, I needed a formal procedure to search Usenet and zero in on newsgroups that were good candidates for my research.

Luckily, I came upon Marc Smith's (1999) excellent study of Usenet and read about Netscan, a newsgroup analyzer he built to map Usenet activity and to gather newsgroup interaction measures. Using Netscan, I was able to obtain interaction statistics from complete hierarchies, such as sci.*, comp.* and misc.*. This gave me an efficient means of comparing hundreds of newsgroups at a time and detecting those that were more likely to display CoP-like traits.

Specifically, I was searching for groups with a high volume of posting, a low poster-to-post ratio, a low thread-to-post ratio and a low percentage of cross-posting. Taken together, these indicated that a newsgroup was active, that a small core of participants sustained most of the discussion in relatively long threads, and that it had a relatively strong topical focus. Furthermore, I concentrated on newsgroups focused on professional topics, as opposed to hobbies or fan groups.
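To show how these four criteria combine, here is a small Python sketch of the screening logic. It does not reproduce Netscan's actual output format, and the cut-off values are hypothetical placeholders rather than the thresholds I used. The class GroupStats, its field names and the example groups are likewise invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class GroupStats:
    name: str
    posts: int        # total messages in the period
    posters: int      # distinct authors
    threads: int      # distinct threads
    crossposted: int  # messages also sent to other newsgroups


def is_candidate(g, min_posts=500, max_poster_ratio=0.3,
                 max_thread_ratio=0.3, max_crosspost_share=0.2):
    """Apply the four screening criteria; all thresholds are hypothetical."""
    if g.posts == 0 or g.posts < min_posts:
        return False  # not active enough (also avoids dividing by zero)
    return (
        g.posters / g.posts <= max_poster_ratio          # small core of posters
        and g.threads / g.posts <= max_thread_ratio      # long threads
        and g.crossposted / g.posts <= max_crosspost_share  # topical focus
    )


# Made-up figures for three imaginary newsgroups.
groups = [
    GroupStats("example.group.a", posts=1200, posters=150, threads=200, crossposted=60),
    GroupStats("example.group.b", posts=900, posters=700, threads=600, crossposted=50),
    GroupStats("example.group.c", posts=40, posters=10, threads=8, crossposted=0),
]
shortlist = [g.name for g in groups if is_candidate(g)]
```

In this made-up data only the first group passes: the second has nearly as many posters as posts, and the third is too quiet to count as active.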

These selection criteria allowed me to discard a large majority of newsgroups and focus my search on 41 very good candidates: active groups focused on a professional topic. To further narrow the field, I looked for what I called institutional documents, such as a newsgroup FAQ, a home page, formal posting guidelines, or a moderation policy (for moderated newsgroups). By focusing on newsgroups that had developed high-quality institutional documents, I was able to narrow the field of candidate newsgroups to 19. I then examined these 19 with a more intensive method, known as Social Network Analysis, which I'll explain in the next post.

Smith, M. A. (1999). Invisible crowds in cyberspace: mapping the social structure of the Usenet. In M. A. Smith, & P. Kollock (Eds.), Communities in Cyberspace (pp. 195-219). New York: Routledge.