Practical 3

James Hollway

Packages for plotting

There is a host of packages for plotting in R, and for plotting networks in R. Plotting in R typically follows one of two main approaches: the ‘base’ approach that R uses by default, and the ‘grid’ approach made popular by the famous and very flexible {ggplot2} package.[1] Approaches to plotting graphs or networks in R can be similarly divided. The two classic packages are {igraph} and {sna}, both building upon the base R graphics engine, while the newer packages {ggnetwork} and {ggraph} build upon the grid approach.[2] {migraph} builds upon the {ggplot2}/{ggraph} engine for plotting.
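
As a rough illustration of the two approaches, the sketch below draws the same small network once with {igraph}'s base-graphics plot method and once with {ggraph}. The four-node toy network is made up for illustration and is not one of the course datasets; the sketch assumes {igraph} and {ggraph} are installed.

```r
library(igraph)
library(ggraph)  # attaches ggplot2 as well

# A made-up toy network (hypothetical, for illustration only)
toy <- graph_from_literal(A - B, B - C, C - A, C - D)

# Base approach: igraph's plot method draws straight to the graphics device
plot(toy)

# Grid approach: ggraph builds a ggplot object layer by layer
p <- ggraph(toy, layout = "stress") +
  geom_edge_link() +
  geom_node_point(size = 4) +
  theme_void()
p
```

Because the {ggraph} result is an ordinary ggplot object, titles, extra layers, and ggsave() work on it as on any other ggplot.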

library(migraph)
brandes
#> # A tbl_graph: 11 nodes and 24 edges
#> #
#> # A directed simple graph with 1 component
#> #
#> # Node Data: 11 × 0 (active)
#> # … with 5 more rows
#> #
#> # Edge Data: 24 × 2
#>    from    to
#>   <int> <int>
#> 1     1     3
#> 2     2     3
#> 3     3     1
#> # … with 21 more rows
autographr(brandes)

For this exercise, we’ll use the brandes dataset. This dataset is in a ‘tidygraph’ format, but migraph makes it easy to coerce this into other forms to be compatible with other packages.

as_igraph(brandes)
#> IGRAPH 0d8eae5 D--- 11 24 -- 
#> + edges from 0d8eae5:
#>  [1]  1-> 3  2-> 3  3-> 1  3-> 2  3-> 4  4-> 3  4-> 5  4-> 6  5-> 4  5-> 7
#> [11]  6-> 4  6-> 7  6-> 8  7-> 5  7-> 6  7-> 9  8-> 6  8-> 9  9-> 7  9-> 8
#> [21]  9->10  9->11 10-> 9 11-> 9
as_network(brandes)
#>  Network attributes:
#>   vertices = 11 
#>   directed = FALSE 
#>   hyper = FALSE 
#>   loops = FALSE 
#>   multiple = FALSE 
#>   bipartite = FALSE 
#>   total edges= 12 
#>     missing edges= 0 
#>     non-missing edges= 12 
#> 
#>  Vertex attribute names: 
#>     vertex.names 
#> 
#> No edge attributes
mat <- as_matrix(brandes)

Calculating different centrality measures

Let’s start with calculating degree, as it is easy to calculate yourself. Just sum the rows or columns of the matrix!

(degrees <- rowSums(mat))
#>  [1] 1 1 3 3 2 3 3 2 4 1 1
rowSums(mat) == colSums(mat)
#>  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# Are they all equal? Why?
# You can also just use a built-in function in migraph:
node_degree(brandes)
#>  [1] 1 1 3 3 2 3 3 2 4 1 1
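
To see the row-sum calculation without any network package, here is a minimal base R sketch. It rebuilds the adjacency matrix from the edge list printed by as_igraph(brandes) above, keeping each reciprocated pair once; the object names ties and adj are introduced here for illustration only.

```r
# Ties read off the as_igraph(brandes) printout, one row per reciprocated pair
ties <- rbind(c(1, 3), c(2, 3), c(3, 4), c(4, 5), c(4, 6), c(5, 7),
              c(6, 7), c(6, 8), c(7, 9), c(8, 9), c(9, 10), c(9, 11))

adj <- matrix(0, nrow = 11, ncol = 11)
adj[ties] <- 1           # fill the i -> j cells
adj[ties[, 2:1]] <- 1    # fill the j -> i cells

isSymmetric(adj)         # a useful hint for the question above
rowSums(adj)             # degree of each node
```

The row sums reproduce the degree scores printed above.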

Often we are interested in the distribution of (degree) centrality in a network. {migraph} offers a way to get a pretty good first look at this distribution, though there are more elaborate ways to do this in base and grid graphics.

ggdistrib(brandes, node_degree)
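
A minimal base-graphics alternative, as a sketch only: tabulate the degree scores and draw the counts as a bar plot. The degrees vector is copied from the output above so that the example runs on its own.

```r
# Degree scores copied from the node_degree(brandes) output above
degrees <- c(1, 1, 3, 3, 2, 3, 3, 2, 4, 1, 1)

# Count how many nodes have each degree
degree_counts <- table(degrees)
degree_counts

# Base R bar plot of the distribution
barplot(degree_counts, xlab = "Degree", ylab = "Number of nodes")
```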

Other measures of centrality can be a little trickier to calculate by hand. Fortunately, we can use functions from migraph to help:

node_betweenness(brandes)
#>  [1]  0.00000  0.00000 34.00000 43.66667 12.00000 30.33333 27.33333 11.00000
#>  [9] 35.66667  0.00000  0.00000
node_closeness(brandes)
#>  [1] 0.02777778 0.02777778 0.03703704 0.04545455 0.04347826 0.04761905
#>  [7] 0.04545455 0.04166667 0.04000000 0.02941176 0.02941176
node_eigenvector(brandes)
#>  [1] 0.07499458 0.07499458 0.19563116 0.36033505 0.30930935 0.43503091
#>  [7] 0.44653052 0.32795777 0.42048101 0.16119006 0.16119006
# TASK: Can you plot the distribution of each of these measures?
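
To give a sense of what "by hand" involves, the sketch below computes closeness in base R. It assumes closeness is defined as the reciprocal of a node's summed geodesic distances to all other nodes, which reproduces the values printed above for this network, and it uses the Floyd-Warshall algorithm as one simple way to obtain the distances; this is not necessarily how {migraph} computes it. The names ties, adj, and geo are introduced for illustration.

```r
# Rebuild the adjacency matrix from the edge list printed earlier
ties <- rbind(c(1, 3), c(2, 3), c(3, 4), c(4, 5), c(4, 6), c(5, 7),
              c(6, 7), c(6, 8), c(7, 9), c(8, 9), c(9, 10), c(9, 11))
n <- 11
adj <- matrix(0, n, n)
adj[ties] <- 1
adj[ties[, 2:1]] <- 1

# Start from direct ties (distance 1); unconnected pairs are Inf for now
geo <- ifelse(adj == 1, 1, Inf)
diag(geo) <- 0

# Floyd-Warshall: allow each node k in turn to serve as an intermediate step
for (k in seq_len(n)) {
  geo <- pmin(geo, outer(geo[, k], geo[k, ], "+"))
}

# Closeness as the reciprocal of total distance to all other nodes
closeness <- 1 / rowSums(geo)
round(closeness, 8)
```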

Note that there is an enormous number of centrality measures available in R across a range of packages. In addition to the four main measures used here, {igraph} includes:

strength()  # graph.strength() is the older, deprecated name
alpha_centrality()
power_centrality()
page_rank()
eccentricity()
hub_score()
authority_score()
subgraph_centrality()
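
A few of these can be tried on the same network. The sketch below rebuilds the ties as an undirected igraph object from the edge list printed earlier, an assumption made to keep the example self-contained (as_igraph(brandes) would be the more direct route), and assumes {igraph} is installed.

```r
library(igraph)

# Ties read off the as_igraph(brandes) printout, one row per reciprocated pair
ties <- rbind(c(1, 3), c(2, 3), c(3, 4), c(4, 5), c(4, 6), c(5, 7),
              c(6, 7), c(6, 8), c(7, 9), c(8, 9), c(9, 10), c(9, 11))
g <- graph_from_edgelist(ties, directed = FALSE)

strength(g)      # equals degree here because there are no edge weights
eccentricity(g)  # longest geodesic distance from each node

# page_rank() returns a list; the scores are in the $vector element
pr <- page_rank(g)$vector
round(pr, 3)
```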

{sna} implements most of these too, plus a few extra:

flowbet()
loadcent()
gilschmidt()
infocent()
stresscent()

There are also some dedicated centrality packages, such as {centiserve}, {CINNA}, {influenceR}, and {keyplayer}, and pretty exhaustive discussions of centrality measures can be found online.

Plotting different centrality measures

There is also a function in {migraph} that plots the network and highlights the node with the maximum score on a given measure (e.g. degree). This does the same as the original script, but wrapped in a single function:

ggidentify(brandes, node_degree)
#> Using `stress` as default layout

ggidentify(brandes, node_betweenness)
#> Using `stress` as default layout

ggidentify(brandes, node_closeness)
#> Using `stress` as default layout

ggidentify(brandes, node_eigenvector)
#> Using `stress` as default layout

How neat!
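
To check which node each plot should be highlighting, the scores can be inspected directly. This base R sketch copies the four score vectors from the output above and uses which.max(); it only illustrates the idea of picking the top scorer and is not a description of how ggidentify() works internally.

```r
# Scores copied from the node_*() output above
scores <- list(
  degree      = c(1, 1, 3, 3, 2, 3, 3, 2, 4, 1, 1),
  betweenness = c(0, 0, 34, 43.66667, 12, 30.33333, 27.33333, 11,
                  35.66667, 0, 0),
  closeness   = c(0.02777778, 0.02777778, 0.03703704, 0.04545455,
                  0.04347826, 0.04761905, 0.04545455, 0.04166667,
                  0.04000000, 0.02941176, 0.02941176),
  eigenvector = c(0.07499458, 0.07499458, 0.19563116, 0.36033505,
                  0.30930935, 0.43503091, 0.44653052, 0.32795777,
                  0.42048101, 0.16119006, 0.16119006)
)

# Index of the highest-scoring node on each measure
# (which.max() returns only the first node if several tie for the maximum)
top_nodes <- sapply(scores, which.max)
top_nodes
```

Note that the four measures each single out a different node in this network.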

Calculating centralization

{migraph} also implements centralization functions. Here we are no longer interested in scores for individual nodes but in a single score for the whole graph, so the functions are prefixed graph_ instead of node_:

graph_degree(brandes)
#> [1] 0.2
graph_betweenness(brandes)
#> [1] 0.32
graph_closeness(brandes)
#> [1] 0.23
graph_eigenvector(brandes) # note that graph_eigenvector() is not yet implemented for two-mode networks
#> [1] 0.48
graph_eigenvector(brandes, digits = 4)
#> [1] 0.4838
graph_eigenvector(brandes, digits = FALSE)
#> [1] 0.483777

By default, these scores are rounded to 2 decimal places, but you can change this with the digits argument or turn rounding off with digits = FALSE.
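
Degree centralization can be approximated by hand with Freeman's formula: sum the gaps between the largest degree and every node's degree, then divide by the largest such sum possible in a network of the same size. Which maximum {migraph} uses is not stated here, so both denominators below are assumptions: (n - 1)(n - 2) is the standard value for an undirected star, and (n - 1)^2 is the value for in- or out-degree in a directed star without loops.

```r
# Degree scores copied from the node_degree(brandes) output above
degrees <- c(1, 1, 3, 3, 2, 3, 3, 2, 4, 1, 1)
n <- length(degrees)

# Total gap between the most central node and every node
gap <- sum(max(degrees) - degrees)

# Normalise by the maximum possible gap (two candidate denominators)
undirected <- gap / ((n - 1) * (n - 2))
directed   <- gap / ((n - 1)^2)

round(c(undirected = undirected, directed = directed), 4)
```

The second normalisation gives 0.2, the value printed above. That would be consistent with brandes being stored as a directed graph, but treat it as an observation rather than a description of the package internals.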

Exporting plots to PDF

We can print the plots we have made to PDF by point-and-click by selecting ‘Save as PDF…’ from under the ‘Export’ dropdown menu in the plots panel tab of RStudio.

If you want to do this programmatically, say because you want to record how you have saved it so that you can e.g. make some changes to the parameters at some point, this is also not too difficult.

After running the (gg-based) plot you want to save, use the command ggsave("my_filename.pdf") to save your plot as a PDF to your working directory. If you want to save it somewhere else, you will need to specify the file path (or change the working directory, but that might be more cumbersome). If you want to save it as a different filetype, replace .pdf with e.g. .png or .jpeg. See ?ggsave for more.
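
A minimal sketch of saving programmatically, assuming {ggplot2} is installed. The scatterplot is only a stand-in for whichever gg-based plot you have made, and the temporary folder is a hypothetical location chosen so that the example does not write into your working directory.

```r
library(ggplot2)

# Stand-in plot; any gg-based plot object could be used instead
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Hypothetical output location: a temporary folder
out_file <- file.path(tempdir(), "my_filename.pdf")

# Passing the plot explicitly avoids relying on "the last plot" default;
# width and height are in inches unless units is changed
ggsave(out_file, plot = p, width = 6, height = 4)
file.exists(out_file)
```

Keeping the width and height in the script means the figure can be regenerated at the same size after any change to the plot.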

But what if we want to have a single image/figure with multiple plots? This can be a little tricky with gg-based plots, but fortunately the ‘gridExtra’ package is here to help.

library(ggplot2)   # for ggtitle()
library(gridExtra) # for grid.arrange()
gd <- ggidentify(brandes, node_degree) +
  ggtitle("Degree", subtitle = graph_degree(brandes))
#> Using `stress` as default layout
gc <- ggidentify(brandes, node_closeness) + 
  ggtitle("Closeness", subtitle = round(graph_closeness(brandes), 2))
#> Using `stress` as default layout
gb <- ggidentify(brandes, node_betweenness) + 
  ggtitle("Betweenness", subtitle = round(graph_betweenness(brandes), 2))
#> Using `stress` as default layout
ge <- ggidentify(brandes, node_eigenvector) + 
  ggtitle("Eigenvector")
#> Using `stress` as default layout
grid.arrange(gd, gb, gc, ge, ncol = 2)

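# ggsave() saves only the last ggplot by default, so pass the arranged figure explicitly:
# ggsave("brandes-centralities.pdf", gridExtra::arrangeGrob(gd, gb, gc, ge, ncol = 2))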

Tasks

  1. Import the drugnet data from Moodle. Since it is an RData file, you can just load it: load("drugnet.RData")

  2. Name a plausible research question you could ask of this data for each of the four main centrality measures (degree, betweenness, closeness, eigenvector). You may want to add these as titles or subtitles to each plot.

  3. How centralized is the network?
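
One practical point for the first task: load() restores objects under the names they were saved with, so it helps to capture what it returns. The sketch below demonstrates this with a placeholder object and a temporary file, both hypothetical; the real drugnet.RData comes from Moodle and its contents are not shown here.

```r
# Placeholder standing in for the course data (hypothetical)
placeholder_net <- matrix(0, 3, 3)
demo_file <- file.path(tempdir(), "demo.RData")
save(placeholder_net, file = demo_file)
rm(placeholder_net)

# load() invisibly returns the names of the objects it restored
loaded <- load(demo_file)
loaded
```

With the real file, the same pattern (loaded <- load("drugnet.RData")) tells you which object name to pass to the functions used above.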


  1. ‘gg’ stands for the Grammar of Graphics.

  2. Others include: ‘Networkly’ for creating 2-D and 3-D interactive networks that can be rendered with plotly and can be easily integrated into shiny apps or markdown documents; ‘visNetwork’ interacts with javascript (vis.js) to make interactive networks (http://datastorm-open.github.io/visNetwork/); and ‘networkD3’ interacts with javascript (D3) to make interactive networks (https://www.r-bloggers.com/2016/10/network-visualization-part-6-d3-and-r-networkd3/).