An Example of Social Network Analysis with R using Package igraph

This post presents an example of social network analysis with R using package igraph.

The data to analyze is Twitter text data of @RDataMining used in the example of Text Mining, and it can be downloaded as file “termDocMatrix.rdata” at the Data webpage. Putting it in a general scenario of social networks, the terms can be taken as people and the tweets as groups on LinkedIn, and the term-document matrix can then be taken as the group membership of people. We will build a network of terms based on their co-occurrence in the same tweets, which is similar to a network of people based on their group memberships.

First, a term-document matrix, termDocMatrix, is loaded into R. It is then transformed into a term-term adjacency matrix, from which a graph is built. Finally, we plot the graph to show the relationships between frequent terms, and make the graph more readable by setting the colors, font sizes and transparency of vertices and edges.

Load Data

> # load termDocMatrix
> load("data/termDocMatrix.rdata")
> # inspect part of the matrix
> termDocMatrix[5:10,1:20]

             Docs
Terms        1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
data         1 1 0 0 2 0 0 0 0  0  1  2  1  1  1  0  1  0  0  0
examples     0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
introduction 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  1
mining       0 0 0 0 0 0 0 0 0  0  0  1  1  0  1  0  0  0  0  0
network      0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  1  0  1  1  1
package      0 0 0 1 1 0 0 0 0  0  0  1  0  0  0  0  0  0  0  0
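
For readers who want to see how such a matrix is laid out, here is a minimal sketch that builds a small term-document matrix in base R. The three toy "tweets" and the names docs, long and toyTDM are made up for illustration; the real termDocMatrix comes from the Text Mining example, not from this code.

```r
# Toy stand-in for tokenized tweets (hypothetical data)
docs <- list(
  c("r", "data", "mining"),
  c("data", "mining", "examples"),
  c("r", "package", "data", "data")
)

# Long format: one row per occurrence of a term in a document
long <- data.frame(
  Terms = unlist(docs),
  Docs  = rep(seq_along(docs), lengths(docs))
)

# Cross-tabulate into a matrix of counts: terms in rows, documents in columns
toyTDM <- unclass(table(long))
toyTDM
```

As in the output above, rows are terms, columns are documents, and each entry counts how often a term occurs in a document.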

Transform Data into an Adjacency Matrix

> # change it to a Boolean matrix
> termDocMatrix[termDocMatrix>=1] <- 1
> # transform into a term-term adjacency matrix
> termMatrix <- termDocMatrix %*% t(termDocMatrix)
> # inspect terms numbered 5 to 10
> termMatrix[5:10,5:10]

             Terms
Terms        data examples introduction mining network package
data         53          5            2     34       0       7
examples      5         17            2      5       2       2
introduction  2          2           10      2       2       0
mining       34          5            2     47       1       5
network       0          2            2      1      17       1
package       7          2            0      5       1      21
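
To see why this matrix product counts co-occurrences, here is a small worked example. The Boolean matrix m and its values are hypothetical; only the two steps (Boolean conversion, then multiplying by the transpose) mirror the code above.

```r
# Toy Boolean term-document matrix (hypothetical values): 3 terms, 4 tweets
m <- matrix(c(1, 1, 0, 1,
              1, 0, 0, 1,
              0, 0, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(Terms = c("data", "mining", "network"),
                            Docs = as.character(1:4)))

# Entry (i, j) sums, over all tweets, the product of two 0/1 indicators,
# so it counts the tweets that contain both term i and term j
co <- m %*% t(m)
co

# The diagonal therefore counts the tweets that contain each term at all
diag(co)
rowSums(m)
```

Here "data" and "mining" appear together in tweets 1 and 4, so their entry is 2. The diagonal is a term's own tweet count, not an edge, which is why the loops are removed when the graph is built.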

Build a Graph

Now we have built a term-term adjacency matrix, where the rows and columns represent terms, and every entry is the number of co-occurrences of two terms. Next we can build a graph with graph.adjacency() from package igraph.

> library(igraph)
> # build a graph from the above matrix
> g <- graph.adjacency(termMatrix, weighted=TRUE, mode="undirected")
> # remove loops
> g <- simplify(g)
> # set labels and degrees of vertices
> V(g)$label <- V(g)$name
> V(g)$degree <- degree(g)
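
Recent releases of igraph name the same function graph_from_adjacency_matrix() and mark graph.adjacency() as deprecated. Below is a minimal sketch on a toy co-occurrence matrix, assuming a reasonably recent igraph is installed. The matrix co and the name g2 are made up for illustration, and diag = FALSE is used here as an alternative way to avoid loops, in place of simplify().

```r
library(igraph)

# Toy term-term co-occurrence matrix (hypothetical values)
terms <- c("data", "mining", "network")
co <- matrix(c(3, 2, 1,
               2, 2, 1,
               1, 1, 2),
             nrow = 3, byrow = TRUE, dimnames = list(terms, terms))

# weighted = TRUE stores the matrix entries in the edge attribute "weight";
# diag = FALSE ignores the diagonal, so no self-loops are created
g2 <- graph_from_adjacency_matrix(co, mode = "undirected",
                                  weighted = TRUE, diag = FALSE)

V(g2)$name     # vertex names come from the matrix dimnames
E(g2)$weight   # co-occurrence counts of the connected term pairs
degree(g2)     # number of edges at each vertex
```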

Plot a Graph

> # set seed to make the layout reproducible
> set.seed(3952)
> layout1 <- layout.fruchterman.reingold(g)
> plot(g, layout=layout1)

A different layout can be generated with the first line of code below. The second line produces an interactive plot, which allows us to manually rearrange the layout. Details about other layout options can be obtained by running ?igraph::layout in R.

> plot(g, layout=layout.kamada.kawai)
> tkplot(g, layout=layout.kamada.kawai)

Make it Look Better

Next, we will set the label size of vertices based on their degrees, to make important terms stand out. Similarly, we also set the width and transparency of edges based on their weights. This is useful in applications where graphs are crowded with many vertices and edges. In the code below, the vertices and edges are accessed with V() and E(). Function rgb(red, green, blue, alpha) defines a color, with an alpha transparency. We plot the graph in the same layout as the above figure.

> V(g)$label.cex <- 2.2 * V(g)$degree / max(V(g)$degree)+ .2
> V(g)$label.color <- rgb(0, 0, .2, .8)
> V(g)$frame.color <- NA
> egam <- (log(E(g)$weight)+.4) / max(log(E(g)$weight)+.4)
> E(g)$color <- rgb(.5, .5, 0, egam)
> E(g)$width <- egam
> # plot the graph in layout1
> plot(g, layout=layout1)
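
The two scaling formulas above can be checked on a few numbers. This is a small sketch with made-up degrees and weights (deg, w and the derived names are hypothetical), meant only to show the ranges the formulas produce.

```r
# Hypothetical vertex degrees and edge weights
deg <- c(1, 5, 10, 20)
w   <- c(1, 2, 5, 34)

# Label size: grows linearly with degree, from just above 0.2 up to 2.4
label_cex <- 2.2 * deg / max(deg) + .2
label_cex

# Edge alpha/width: log-scaled so a few heavy edges do not dominate;
# the +.4 offset keeps weight-1 edges (log(1) = 0) faintly visible
egam <- (log(w) + .4) / max(log(w) + .4)
round(egam, 2)

# rgb() is vectorised over alpha, giving one colour string per edge
edge_col <- rgb(.5, .5, 0, egam)
edge_col
```

The heaviest edge always gets alpha and width 1, and every other edge gets a value strictly between 0 and 1.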

More Examples

More examples on social network analysis with R and other data mining techniques can be found in my book “R and Data Mining: Examples and Case Studies”, which is downloadable as a PDF file at the link.

About Yanchang Zhao

I am a data scientist, using R for data mining applications. My work on R and data mining: RDataMining.com; Twitter; Group on Linkedin; and Group on Google.


20 Responses to An Example of Social Network Analysis with R using Package igraph

Joel Cadwell says:

May 18, 2012 at 9:18 am

Well done. Very straightforward example that is easy to follow. Not just for text mining, but also showing how to use igraph.

I use the gRbase package to convert Rgraphviz objects from the pcalg package into igraphs:
ig<-as(rgraphviz@graph, "igraph").

Your reference to tkplot helped me clean up the graphs and save for insertion into Excel or Powerpoint. Thanks.

Asif says:

May 22, 2013 at 1:28 pm

Hi ,
I want to create a network data file for further analysis using igraph.
I want to create a network of ingredients.
The current data format that I have looks like this :
Column1 Column2
Food1 ingredient1, ingredient2 , ingredient3
Food2 ingredient2, ingredient3
Food3 ingredient3, ingredient4

From this file I want to create an adjacency matrix of ingredients . My final goal is to find out
which ingredients go together.
Could you please give me suggestion how to create adjacency matrix from the above datafile.
Thanks

- Yanchang Zhao says:
  
  May 22, 2013 at 7:04 pm
  
  First, create a data frame with 2 columns: food and ingredient. You might use stack() or unstack() for that. In your case, the data frame will look like:
  Column1 Column2
  Food1 ingredient1
  Food1 ingredient2
  Food1 ingredient3
  Food2 ingredient2
  Food2 ingredient3
  …
  
  Second, convert it to graph with graph.data.frame(YourDataFrame)
  
  For future questions, please post them to my RDataMining group on LinkedIn at http://group.rdatamining.com/
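
The two steps in the reply above can be sketched in base R as follows. This is one possible approach under stated assumptions, not the author's own code: the data frame raw mirrors the format in the question, strsplit() is used in place of stack(), and the names raw, parts, long, incidence and ingredientMatrix are made up for illustration.

```r
# Data in the format described in the question (values copied from it)
raw <- data.frame(
  food = c("Food1", "Food2", "Food3"),
  ingredients = c("ingredient1, ingredient2 , ingredient3",
                  "ingredient2, ingredient3",
                  "ingredient3, ingredient4"),
  stringsAsFactors = FALSE
)

# Step 1: one row per (food, ingredient) pair
parts <- strsplit(raw$ingredients, ",")
long <- data.frame(
  food       = rep(raw$food, lengths(parts)),
  ingredient = trimws(unlist(parts))
)

# Step 2: ingredient-by-food Boolean matrix, then ingredient co-occurrence
incidence <- unclass(table(long$ingredient, long$food))
incidence[incidence >= 1] <- 1
ingredientMatrix <- incidence %*% t(incidence)
ingredientMatrix
```

The resulting matrix plays the same role as termMatrix in the post, so it can be passed to graph.adjacency() in the same way. Passing long to graph.data.frame() instead gives a graph that links foods to their ingredients.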
  
sarah says:

July 22, 2013 at 6:41 am

Hi,

I am trying to do network analysis for my dataset but am facing a few difficulties. I would be very grateful for some help.

My dataset has 5 columns and 110 rows and I am trying to build link which rows are more likely to be connected to the columns. I want to use Jaccard similarity coefficient as the weight for my network. I am doing this in igraph library so far but would be happy to use any other library package.

I am not sure how to import my dataset into R for network analysis purposes and I also can’t seem to add weights to my network and would be very grateful if someone could guide me as to how can I do it in R.

Thanks
Sarah

- Yanchang Zhao says:
  
  July 23, 2013 at 9:50 pm
  
  You can set values to E(g)$weight. Alternatively, you can use graph.adjacency() by setting parameter “weighted”. See igraph documentation at http://cran.r-project.org/web/packages/igraph/igraph.pdf for details.
  
  For future questions, please post them to my RDataMining group on LinkedIn at http://group.rdatamining.com/.
  
Dan says:

September 17, 2013 at 11:20 am

Hi,

This is really fantastic. By my way of thinking (and please correct me if I’m wrong), the diagonal on the adjacency matrix should dictate the value for the vertex, while values on the upper or lower side of the diagonal (depending on whether mode is set to ‘upper’ or ‘lower’) will dictate the values for the edges?

If this is the case, then the values for the vertex from the following code:

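> V(g)$label.cex <- 2.2 * V(g)$degree / max(V(g)$degree)+ .2
> V(g)$label.color <- rgb(0, 0, .2, .8)
> V(g)$frame.color <- NA
> egam <- (log(E(g)$weight)+.4) / max(log(E(g)$weight)+.4)
> E(g)$color <- rgb(.5, .5, 0, egam)
> E(g)$width <- egam
> # plot the graph in layout1
> plot(g, layout=layout1)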

do not correspond to the values presented in the adjacency matrix. E.g., the diagonal value for the term ‘r’ is 70 (the largest in the adjacency table), yet ‘mining’ ends up with the larger label, despite its diagonal value being 47.

I’m inclined to think I am misinterpreting something, but could you please confirm this for me?

Dan says:

September 17, 2013 at 11:50 am

Sorry, I see where I have gone wrong – ‘degree’ refers to the number of edges from each vertex, as opposed to the size of the vertex. Apologies.

S Ram says:

December 11, 2014 at 7:23 pm

Thanks for this example. I have one question on the specific purpose of this type of analysis. Based on the above, I have created a network for my case and now what does this convey and what is the next step. Could you please guide?

Yanchang Zhao says:

December 15, 2014 at 9:40 pm

You might do community detection by finding groups of closely connected vertices. You can also find brokers, which are important vertices that connect two parts of a network. You may also simply filter vertices by degree and/or links by weight to find interesting clusters.

Please feel free to post your questions to my RDataMining group on LinkedIn at http://group.rdatamining.com/, where you will get answers from many group members.
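
The three suggestions above can be sketched with igraph on a toy graph. The graph, its weights and all variable names are hypothetical, and cluster_fast_greedy() and betweenness() are just one possible choice for community detection and for spotting brokers. The sketch assumes a recent igraph.

```r
library(igraph)

# Toy weighted graph: two triangles joined by one weak edge (hypothetical)
edges <- data.frame(
  from   = c("a", "a", "b", "c", "d", "d", "e"),
  to     = c("b", "c", "c", "d", "e", "f", "f"),
  weight = c(3, 2, 4, 1, 3, 2, 5)
)
g3 <- graph_from_data_frame(edges, directed = FALSE)

# 1. Community detection: groups of closely connected vertices
comm <- cluster_fast_greedy(g3)
membership(comm)

# 2. Brokers: vertices lying on many shortest paths between other vertices.
#    weights = NA ignores the weights, which are strengths, not distances
btw <- betweenness(g3, weights = NA)
names(btw)[btw == max(btw)]

# 3. Filtering: drop weak links and see which clusters remain connected
g4 <- delete_edges(g3, E(g3)[weight < 2])
components(g4)$no
```

On this toy graph the two triangles form the communities, the two vertices on the joining edge act as brokers, and removing the weak edge splits the graph in two.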

libin says:

December 29, 2016 at 12:25 pm

Hi!
Assume that I have a Chinese text and I have done word segmentation, I just want to know how can I get the ‘termDocMatrix’?

Rodrigo Badilla says:

April 10, 2018 at 8:11 am

Dear Sir,
In this example you load a tdm,
How did you previously save the termDocMatrix (tdm)? What format and command did you use?

- Yanchang Zhao says:
  
  April 10, 2018 at 8:57 am
  
  The data to analyze is Twitter text data of @RDataMining used in the example of Text Mining, and it can be downloaded as file “termDocMatrix.rdata” at the Data webpage http://www.rdatamining.com/data
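
On the question of format and command: load() reads files written by R's save(), so a file like this can be produced along the following lines. This is a minimal sketch under assumptions. The toy matrix and the temporary path are made up, and the author's actual file may have been created with different options.

```r
# Toy matrix standing in for the real term-document matrix (hypothetical)
toy <- matrix(c(1, 0, 2, 1), nrow = 2,
              dimnames = list(Terms = c("data", "mining"),
                              Docs = c("1", "2")))
termDocMatrix <- toy

# save() writes the named object(s) to a file in R's binary data format
path <- file.path(tempdir(), "termDocMatrix.rdata")
save(termDocMatrix, file = path)

# load() restores the object under the same name it was saved with
rm(termDocMatrix)
load(path)
termDocMatrix
```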
