14 Performances

The graphTweets package has already seen great performance improvements and can now build large graphs.

The sigmajs library needs some explanation, namely on layout and renderers.

14.1 Read

You might have noticed that we reduced the number of tweets we visualised in the last four chapters, fetching only 400-odd tweets. This is because dynamic graphs are quite taxing for the web page (more on that later in the chapter). There is, however, a remedy, or at least a partial one: instead of using sg_add_nodes and sg_add_edges we can use the sg_read_* family of functions.

We’ll collect some tweets, setting type to mixed to ensure we get tweets spread over a couple of days.

# TK <- readRDS(file = "token.rds")
tweets <- search_tweets("#rstats", n = 3500, token = TK, include_rts = FALSE, type = "mixed")
## Searching for tweets...
## Finished collecting tweets!

With the tweets we collected we’ll build a dynamic network where nodes and edges appear on the day they are created.

graph <- tweets %>% 
    gt_edges(screen_name, mentions_screen_name, created_at) %>% 
    gt_nodes() %>% 
    gt_dyn() %>% 
    gt_collect() # return the edges and nodes as a list

c(edges, nodes) %<-% graph # unpack the list (zeallot)
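
The %<-% operator comes from the zeallot package and assigns by position, so the order of the names on the left has to match the order of the elements in the list. Below is a minimal base R sketch of the same unpacking. The toy list is a made-up stand-in, assumed to hold edges first and nodes second as the assignment above implies; it is not the real graphTweets output.

```r
# Hypothetical stand-in for the collected graph: edges first, nodes second
graph <- list(
  edges = data.frame(source = c("a", "b"), target = c("b", "c")),
  nodes = data.frame(nodes = c("a", "b", "c"), n = c(1, 2, 1))
)

# Positional unpacking written out by hand
edges <- graph[[1]]
nodes <- graph[[2]]

nrow(edges) # number of edges in the toy graph
nrow(nodes) # number of nodes in the toy graph
```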

We’ll start by converting created_at from a datetime to a date, then to a numeric. Also, for a change, we’ll use the scales package to rescale the nodes’ and edges’ dates of appearance, which makes it much more convenient.

You can install scales from CRAN with install.packages("scales").

Note that we also use the scales package to color the nodes according to when they appear. One additional thing: remember how, if not specified, sg_nodes randomly assigns coordinates? Perhaps we ought to initialise x and y in a circle to make the visualisation more appealing; it’ll make the nodes appear from the outer edges of the graph.

nodes <- nodes %>% 
    sg_get_layout(edges, layout = igraph::layout_in_circle) %>% # get coordinates
    dplyr::arrange(start) %>% 
    dplyr::mutate(
        start = as.Date(start), # datetime to date
        start = as.integer(start), # date to number of days
        start = scales::rescale(start, to = c(1, 10000)), # rescale to 1 - 10,000
        size = n,
        color = scales::col_numeric(c("#0075a0", "#c0deed"), domain = NULL)(start)
    )

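edges <- edges %>% 
    dplyr::arrange(created_at) %>% 
    dplyr::mutate(
        created_at = as.Date(created_at), # datetime to date
        created_at = as.integer(created_at), # date to number of days
        created_at = scales::rescale(created_at, to = c(1, 10000)), # rescale to 1 - 10,000
        id = 1:dplyr::n() # unique edge id
    )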
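
To see what the date conversion and rescaling do, here is a minimal base R sketch on made-up timestamps. The helper rescale_linear is hypothetical and written out by hand; it is assumed to mirror the default linear behaviour of scales::rescale for finite input.

```r
# Made-up creation times spread over three days
created_at <- as.POSIXct(
  c("2019-01-01 09:00:00", "2019-01-01 18:30:00",
    "2019-01-02 12:00:00", "2019-01-03 07:45:00"),
  tz = "UTC"
)

# datetime -> date -> whole number of days since 1970-01-01
days <- as.integer(as.Date(created_at))

# Hypothetical helper: linear rescale of x onto the interval `to`
rescale_linear <- function(x, to = c(1, 10000)) {
  rng <- range(x)
  (x - rng[1]) / (rng[2] - rng[1]) * (to[2] - to[1]) + to[1]
}

delay <- rescale_linear(days)
delay
# 1.0 1.0 5000.5 10000.0
```

Items created on the same day share a delay, the first day maps to 1 and the last day to 10000, which is why nodes and edges appear in daily batches.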

Now that nodes and edges are ready we can use our new functions; rest assured they work in very much the same way. Finally, we’ll add a date tracker with sg_progress as we have done in previous chapters.

progress <- tweets %>% 
    dplyr::mutate(created_at = as.Date(created_at)) %>% 
    dplyr::distinct(created_at) %>% 
    dplyr::arrange(created_at) %>% 
    dplyr::pull(created_at) %>% 
    tibble::tibble(
        date = ., # one row per day
        delay = unique(nodes$start) # matching delay for each day
    ) %>% 
    dplyr::mutate(text = format(date, "%d %b"))
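
The progress table pairs each distinct date with a distinct delay, so the two vectors need the same length. That presumably holds only when every day that has tweets also has at least one node starting on it. A minimal base R sketch of that check on made-up values (the dates and delays below are hypothetical):

```r
# Made-up tweet dates and node delays, one delay per distinct day
tweet_dates <- as.Date(c("2019-01-01", "2019-01-01", "2019-01-02", "2019-01-03"))
node_start <- c(1, 1, 5000.5, 10000)

dates <- sort(unique(tweet_dates))
delays <- sort(unique(node_start))

# Fail early if some day has no matching delay
stopifnot(length(dates) == length(delays))

progress <- data.frame(date = dates, delay = delays)
progress$text <- format(progress$date, "%d %b") # label such as day and month
progress
```

If the check fails on real data, one option is to derive the dates from the nodes rather than from the tweets, so both columns come from the same source.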

sigmajs() %>% 
    sg_force_start() %>% 
    sg_read_nodes(nodes, id = nodes, size, color, x, y, delay = start) %>% 
    sg_read_edges(edges, id, source, target, delay = created_at) %>% 
    sg_read_exec() %>% 
    sg_force_stop(11000) %>% 
    sg_progress(progress, delay, date, position = "bottom") %>% 
    sg_button(c("read_exec", "force_stop", "progress"), "Add nodes and edges") %>% 
    sg_settings(
        minNodeSize = 1,
        maxNodeSize = 4,
        edgeColor = "default",
        defaultEdgeColor = "#d3d3d3"
    )

14.2 Renderers

The sigmajs package actually comes with three renderers out of the box: canvas (the default), svg and webgl. svg makes for slightly nicer-looking graphs but at a performance cost, so only use it for smaller graphs. In contrast, webgl is extremely performant.

Let’s plot a huge graph using webgl to demonstrate; below we graph 25,000 nodes.

data <- sg_make_nodes_edges(25000) # make 25,000 nodes

sigmajs("webgl") %>% # set to webgl
  sg_nodes(data$nodes, id, size) %>% 
  sg_edges(data$edges, id, source, target) %>% 
  sg_layout() %>% 
  sg_settings(
    nodeColor = "default",
    defaultNodeColor = "#328983",
    edgeColor = "default",
    defaultEdgeColor = "#b9b9b9"
  )
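
For comparison, a small graph can be drawn with the svg renderer in the same way. This is a minimal sketch, assuming the renderer is selected by passing "svg" as the first argument just as "webgl" is above; the graph size of 50 is an arbitrary choice for illustration.

```r
library(sigmajs)
library(magrittr) # for %>%, loaded explicitly to keep the sketch self-contained

small <- sg_make_nodes_edges(50) # a small random graph suits svg better

g <- sigmajs("svg") %>% # assumed renderer name, by analogy with "webgl"
  sg_nodes(small$nodes, id, size) %>%
  sg_edges(small$edges, id, source, target) %>%
  sg_layout()

# print g in an interactive session or a document to render it
```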