14 Performances

The graphTweets has already seen great performance improvements and can now build large graphs.

The sigmajs library needs some explanation, namely on layout, and renderers.

14.1 Read

You might have noticed that we reduced the number of tweets we visualised in the last four chapters, fetching only 400 odd tweets. This is because dynamic graphs are quite draining for the website (more on that later in the chapter). There is however a remedy to it, or part of it anyway. Instead of using sg_add_nodes and sg_add_edges we can use the sg_read_* family of functions.

We’ll collect some tweets setting type as mixed to ensure we get tweets that spread over a couple of days.

# TK <- readRDS(file = "token.rds")
tweets <- search_tweets("#rstats", n = 3500, token = TK, include_rts = FALSE, type = "mixed")
## Searching for tweets...
## Finished collecting tweets!

With the tweets we collected we’ll build a dynamic network where nodes and edges appear on the day they are created.

graph <- tweets %>% 
    gt_edges(screen_name, mentions_screen_name, created_at) %>% 
    gt_nodes() %>% 
    gt_dyn() %>% 
    gt_collect()

c(edges, nodes) %<-% graph

we’ll start by changing the created_at from a datetime to a date then a numeric. Also, for a change, we’ll rescale with the scales package (???) to rescale the nodes and edges date of appearance, the scales package makes it much more convenient.

You can install scales from CRAN with install.packages(“scales”)

Note that we also scales package to color the nodes according to their size. One additional thing, remember how, if not specified, sg_nodes randomly assigns coordinates? Perhaps we ought to initialise x and y in a circle to make the visualisation visually more appealing; it’ll make the nodes appear from the outer edges of the graph.

nodes <- nodes %>% 
    sg_get_layout(edges, layout = igraph::layout_in_circle) %>% # get coordinates
    dplyr::arrange(start) %>% 
    dplyr::mutate(
        start = as.Date(start),
        start = as.integer(start),
        start = scales::rescale(start, to = c(1, 10000)),
        size = n,
        color = scales::col_numeric(c("#0075a0", "#c0deed"), domain = NULL,)(start)
    ) 

edges <- edges %>% 
    dplyr::arrange(created_at) %>% 
    dplyr::mutate(
        created_at = as.Date(created_at),
        created_at = as.integer(created_at),
        created_at = scales::rescale(created_at, to = c(1, 10000)),
        id = 1:dplyr::n()
    )

Now that nodes and edges are ready we can use our new functions, rest assured they work in very much the same way. Finally we’ll add a date tracker with sg_progress as we have done in previous chapters.

progress <- tweets %>% 
    dplyr::mutate(created_at = as.Date(created_at)) %>% 
    dplyr::distinct(created_at) %>% 
    dplyr::arrange(created_at) %>% 
    dplyr::pull(created_at) %>% 
    dplyr::tibble(
        date = .,
        delay = unique(nodes$start)
    ) %>% 
    dplyr::mutate(text = format(date, "%d %b"))

sigmajs() %>% 
    sg_force_start() %>% 
    sg_read_nodes(nodes, id = nodes, size, color, x, y, delay = start) %>% 
    sg_read_edges(edges, id, source, target, delay = created_at) %>% 
    sg_read_exec() %>% 
    sg_force_stop(11000) %>% 
    sg_progress(progress, delay, date, position = "bottom") %>% 
    sg_button(c("read_exec", "force_stop", "progress"), "Add nodes and edges") %>% 
  sg_settings(
    minNodeSize = 1,
    maxNodeSize = 4,
    edgeColor = "default",
    defaultEdgeColor = "#d3d3d3"
  )

14.2 Renderers

The sigmajs package actually comes with three renderers out-of-the-box; canvas, the default, svg and webgl. svg makes for slightly nicer looking graphs but at performance costs so only use it for the smaller graphs. In contrast, webgl is extremely performant.

Let’s plot huge graph using webgl to demonstrate, below we graph 25,000 nodes.

data <- sg_make_nodes_edges(25000) # make 25,000 nodes

sigmajs("webgl") %>% # set to webgl
  sg_nodes(data$nodes, id, size) %>% 
  sg_edges(data$edges, id, source ,target) %>% 
  sg_layout() %>% 
  sg_settings(
    nodeColor = "default",
    defaultNodeColor = "#328983",
    edgeColor = "default",
    defaultEdgeColor = "#b9b9b9"
  )

14.3 Layout

As you might have observed in the previous chapter on ephemeral graphs our last graphs is not layed out quite as nicely as we would like to.

How forceAtlas2 works in sigmajs

When you pass sg_force or sg_force_start to your graph, on load (or on the button click, if linked), forceAtlas2 kicks in and starts to layout the graph. We saw that since this is draining the browser we can then pass sg_force_stop specifying a delay in milliseconds when we want the layout to stop.

What happens when we actually run the above but adding or dropping nodes or edges? In actual fact the layout does not run constantly, as it would not take into account added or dropped nodes or edges and would continue running as if new edges or nodes were not added. Therefore a worker is created and the forceAtlas2 layout is killed then started after every node and or edge is added or dropped. This can be quite difficult for the browser to handle.

Let’s make a dummy example below to show how it works smoothly for the smaller graphs.

nodes <- sg_make_nodes(colors = c("#0084b4", "#00aced", "#1dcaff", "#c0deed"))
edges <- sg_make_edges(nodes)

# add random delay between .5 and 2 seconds
edges$delay <- runif(nrow(edges), 500, 1000) 

sigmajs(height = 300) %>%
  sg_force_start() %>%
  sg_nodes(nodes, id, size, color) %>%
  sg_add_edges(edges, delay, id, source, target, refresh = TRUE) %>% # read delay documentation
  sg_button(c("add_edges"), "Add edges") %>% 
  sg_settings(
    mouseEnabled = FALSE,
      touchEnabled = FALSE,
    minNodeSize = 1,
    maxNodeSize = 4
  )

The above is much less draining on the browser and runs smoothly.