Chapter 14 Performances
The graphTweets package has already seen great performance improvements and can now build large graphs.
The sigmajs package needs some explanation, namely on layouts and renderers.
You might have noticed that we reduced the number of tweets we visualised in the last four chapters, fetching only 400-odd tweets. This is because dynamic graphs are quite demanding on the browser (more on that later in the chapter). There is, however, a remedy, or at least a partial one: instead of using sg_add_edges we can use the sg_read_* family of functions.
We’ll collect some tweets, setting type to "mixed" to ensure we get tweets that are spread over a couple of days.
# TK <- readRDS(file = "token.rds")
tweets <- search_tweets(
  "#rstats",
  n = 3500,
  token = TK,
  include_rts = FALSE,
  type = "mixed"
)
## Searching for tweets...
## Finished collecting tweets!
With the tweets we collected we’ll build a dynamic network where nodes and edges appear on the day they are created.
graph <- tweets %>%
  gt_edges(screen_name, mentions_screen_name, created_at) %>%
  gt_nodes() %>%
  gt_dyn() %>%
  gt_collect()

c(edges, nodes) %<-% graph
We’ll start by changing created_at from a datetime to a date, then to a numeric. Also, for a change, we’ll use the scales package to rescale the nodes’ and edges’ dates of appearance; it makes this much more convenient.
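To make the conversion concrete, here is a minimal sketch on three made-up timestamps (the values are illustrative, not taken from the collected tweets). It assumes the scales package is installed and shows the datetime to date to integer to rescaled delay chain described above.

```r
# Three hypothetical tweet timestamps, one per day.
created_at <- as.POSIXct(
  c("2019-01-01 09:30:00", "2019-01-02 18:45:00", "2019-01-03 07:15:00"),
  tz = "UTC"
)

# Datetime -> date -> integer (number of days since 1970-01-01).
days <- as.integer(as.Date(created_at))

# Map the observed range of days linearly onto 1..10000 (milliseconds of delay).
delay <- scales::rescale(days, to = c(1, 10000))
delay
# 1.0  5000.5  10000.0
```

The earliest day maps to the lower bound and the latest to the upper bound, so the whole animation fits in roughly ten seconds regardless of how many days the tweets span.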
You can install scales from CRAN with install.packages("scales").
Note that we also use the scales package to color the nodes according to their date of appearance (start). One additional thing: remember how, if not specified, sg_nodes randomly assigns coordinates? Perhaps we ought to initialise x and y in a circle to make the visualisation more appealing; it’ll make the nodes appear from the outer edges of the graph.
nodes <- nodes %>%
  sg_get_layout(edges, layout = igraph::layout_in_circle) %>% # get coordinates
  dplyr::arrange(start) %>%
  dplyr::mutate(
    start = as.Date(start),
    start = as.integer(start),
    start = scales::rescale(start, to = c(1, 10000)),
    size = n,
    color = scales::col_numeric(c("#0075a0", "#c0deed"), domain = NULL)(start)
  )

edges <- edges %>%
  dplyr::arrange(created_at) %>%
  dplyr::mutate(
    created_at = as.Date(created_at),
    created_at = as.integer(created_at),
    created_at = scales::rescale(created_at, to = c(1, 10000)),
    id = 1:dplyr::n()
  )
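For intuition about what initialising coordinates on a circle means, the base R sketch below spaces n points evenly on the unit circle. The helper circle_layout is hypothetical and written only for illustration; it is not part of sigmajs or igraph, and igraph's own circular layout may differ in details such as ordering.

```r
# Hypothetical helper: place n nodes evenly on the unit circle.
circle_layout <- function(n) {
  angle <- 2 * pi * (seq_len(n) - 1) / n  # n equally spaced angles in [0, 2*pi)
  data.frame(x = cos(angle), y = sin(angle))
}

coords <- circle_layout(4)
round(coords, 2)

# Every point lies at distance 1 from the origin.
radius <- sqrt(coords$x^2 + coords$y^2)
radius
```

Because every node starts on the rim, the force layout then pulls connected nodes inwards, which is why they appear to arrive from the outer edge of the graph.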
Now that nodes and edges are ready we can use our new functions; rest assured they work in very much the same way. Finally, we’ll add a date tracker with sg_progress, as we have done in previous chapters.
progress <- tweets %>%
  dplyr::mutate(created_at = as.Date(created_at)) %>%
  dplyr::distinct(created_at) %>%
  dplyr::arrange(created_at) %>%
  dplyr::pull(created_at) %>%
  dplyr::tibble(
    date = .,
    delay = unique(nodes$start)
  ) %>%
  dplyr::mutate(text = format(date, "%d %b"))

sigmajs() %>%
  sg_force_start() %>%
  sg_read_nodes(nodes, id = nodes, size, color, x, y, delay = start) %>%
  sg_read_edges(edges, id, source, target, delay = created_at) %>%
  sg_read_exec() %>%
  sg_force_stop(11000) %>%
  sg_progress(progress, delay, date, position = "bottom") %>%
  sg_button(c("read_exec", "force_stop", "progress"), "Add nodes and edges") %>%
  sg_settings(
    minNodeSize = 1,
    maxNodeSize = 4,
    edgeColor = "default",
    defaultEdgeColor = "#d3d3d3"
  )
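The progress table pairs each distinct tweet date with a distinct delay row by row, which only works if both vectors have the same length. The base R sketch below shows that pairing on toy data with an explicit length check; tweet_dates and node_start are hypothetical stand-ins for the tweets' dates and for nodes$start, not real data.

```r
# Toy stand-ins (hypothetical values, not real tweets).
tweet_dates <- as.Date(c("2019-01-01", "2019-01-01", "2019-01-02", "2019-01-03"))
node_start  <- c(1, 1, 5000.5, 10000)  # rescaled start values, as in nodes$start

dates  <- sort(unique(tweet_dates))
delays <- sort(unique(node_start))

# One delay per date is required to pair them row by row.
stopifnot(length(dates) == length(delays))

progress <- data.frame(
  date  = dates,
  delay = delays,
  text  = format(dates, "%d %b")  # month abbreviation depends on the locale
)
progress
```

If some day has tweets but no new node, the two lengths would differ and the check would fail, which may be preferable to a mislabelled date tracker.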
The sigmajs package actually comes with three renderers out of the box: canvas, the default; svg, which makes for slightly nicer-looking graphs but at a performance cost, so only use it for smaller graphs; and webgl, which, in contrast, is extremely performant.
Let’s plot a huge graph using webgl to demonstrate; below we graph 25,000 nodes.
data <- sg_make_nodes_edges(25000) # make 25,000 nodes

sigmajs("webgl") %>% # set to webgl
  sg_nodes(data$nodes, id, size) %>%
  sg_edges(data$edges, id, source, target) %>%
  sg_layout() %>%
  sg_settings(
    nodeColor = "default",
    defaultNodeColor = "#328983",
    edgeColor = "default",
    defaultEdgeColor = "#b9b9b9"
  )