Chapter 13 Ephemeral

At this stage we have covered temporal graphs, in which nodes and edges appear on the graph over time. This already comes closer to reflecting reality, but it assumes that tweets are everlasting. In practice tweets have a lifespan: a tweet from 2015 is unlikely to be seen today. Therefore, nodes and edges on our graph should appear and then disappear after some time.

13.1 Collect

Let’s collect some tweets, just as we did previously.

## Searching for tweets...
## Finished collecting tweets!

13.2 Build

Just as we did in the temporal chapter, we pass created_at so that we know when tweets are created (i.e. when edges and nodes should appear), with one difference: we also specify a lifetime in the gt_dyn function.

We also want to ease the work of the browser, since constantly adding and removing nodes and edges can be draining, so we round the time to the nearest hour.
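To see why rounding helps, here is a minimal Python sketch (not the chapter's R code) that rounds timestamps to the nearest hour. The timestamps are hypothetical. Tweets sent within the same hour collapse onto a single time, so the browser has fewer distinct add and drop steps to process.

```python
from datetime import datetime, timedelta

def round_to_hour(ts: datetime) -> datetime:
    """Round a datetime to the nearest hour (30 minutes or more rounds up)."""
    floored = ts.replace(minute=0, second=0, microsecond=0)
    if ts - floored >= timedelta(minutes=30):
        floored += timedelta(hours=1)
    return floored

# Hypothetical tweet creation times.
tweets = [
    datetime(2018, 8, 16, 6, 42, 10),
    datetime(2018, 8, 16, 7, 12, 55),
    datetime(2018, 8, 15, 17, 59, 59),
]
rounded = [round_to_hour(t) for t in tweets]

for original, r in zip(tweets, rounded):
    print(original, "->", r)
```

The first two tweets end up at the same hour, so three timestamps become two distinct time steps.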

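Above, we set the lifetime argument to 60 * 60 * 6 = 21,600. This is 6 hours expressed in seconds, not milliseconds (21,600 milliseconds would be only 21.6 seconds): the value is added to created_at, which R stores in seconds. As you might expect, we will rescale the timeframe as we did before, but here the lifetime of a tweet is set before doing so, on the original time scale.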
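The arithmetic can be checked with a short Python sketch. It assumes, as the 6-hour differences in this section's table suggest, that the lifetime is simply added to created_at in seconds. The edges are copied from that table, and the code does not use graphTweets.

```python
from datetime import datetime, timedelta

# Lifetime of a tweet, assumed to be in seconds: 60 * 60 * 6 = 21,600.
LIFETIME_SECONDS = 60 * 60 * 6
lifetime = timedelta(seconds=LIFETIME_SECONDS)

# (source, target, created_at), taken from the table in this section.
edges = [
    ("_colinfay", "holomarked", datetime(2018, 8, 16, 7)),
    ("_colinfay", "tudosgar", datetime(2018, 8, 15, 18)),
    ("_hannahjbw", "r4dscommunity", datetime(2018, 8, 15, 20)),
]

# Each edge disappears `lifetime` after it appears.
with_end = [(s, t, start, start + lifetime) for s, t, start in edges]

for s, t, start, end in with_end:
    print(s, t, start, end, end - start)
```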

It follows that the difference between the appearance and the disappearance of any edge should be 6 hours.

source      target          created_at           n  end                  difference
_colinfay   holomarked      2018-08-16 07:00:00  1  2018-08-16 13:00:00  6 hours
_colinfay   tudosgar        2018-08-15 18:00:00  1  2018-08-16 00:00:00  6 hours
_hannahjbw  r4dscommunity   2018-08-15 20:00:00  1  2018-08-16 02:00:00  6 hours
_mirkwood   thomasp85       2018-08-15 21:00:00  1  2018-08-16 03:00:00  6 hours
_stuartlee  jessicahullman  2018-08-14 19:00:00  1  2018-08-15 01:00:00  6 hours
_stuartlee  thomasp85       2018-08-14 19:00:00  1  2018-08-15 01:00:00  6 hours

But how would this apply to nodes? Let’s plot the distribution of the lifespans of nodes (in milliseconds): the difference between their appearance and disappearance.

We see that, unlike edges, which are all present for 6 hours, nodes are not all present on the graph for the same amount of time. The reason is simple: if a user has tweeted at two (or more) different times in our dataset, the node is present from its first tweet to its last tweet (+ 6 hours).
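The rule above can be sketched in Python. The sketch assumes that a node appears with its earliest edge and disappears when its latest edge expires, and that this applies to both sources and targets. This is inferred from the description, not taken from gt_dyn's source, and the users are hypothetical.

```python
from datetime import datetime, timedelta

LIFETIME = timedelta(hours=6)

# Hypothetical edges: (source, target, created_at).
edges = [
    ("alice", "bob", datetime(2018, 8, 14, 19)),
    ("alice", "carol", datetime(2018, 8, 15, 21)),
    ("dave", "bob", datetime(2018, 8, 15, 22)),
]

# Assumed rule: a node lives from its first edge to its last edge + LIFETIME.
nodes = {}
for source, target, created_at in edges:
    end = created_at + LIFETIME
    for user in (source, target):
        first, last = nodes.get(user, (created_at, end))
        nodes[user] = (min(first, created_at), max(last, end))

for user, (start, end) in sorted(nodes.items()):
    print(user, start, end, end - start)
```

Users involved in a single edge live exactly 6 hours, while alice and bob, who appear on two different days, stay on the graph much longer.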

13.3 Visualise

To tackle the visualisation, let’s bring back our rescaling function. We will tweak it this time around because we need more precision: we add and drop nodes and edges, which sit in different data frames. If we were to rescale each table using its own (local) minimum and maximum, the two tables would no longer be in sync. Therefore, we use the same minimum and maximum to rescale both the nodes and the edges.
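The following Python sketch shows the desynchronisation on hypothetical times in seconds. `rescale` here is a plain linear mapping onto 0 to 60,000 ms, standing in for the chapter's R rescaling function. A node and an edge that appear at the same real time land at different positions when each table is rescaled with its own range, and at the same position when the range is shared.

```python
def rescale(x, lo, hi, to=(0.0, 60_000.0)):
    """Linearly map x from [lo, hi] onto `to` (default 0 to 60,000 ms)."""
    return to[0] + (x - lo) / (hi - lo) * (to[1] - to[0])

# Hypothetical appearance times, in seconds since the earliest event.
node_starts = [0, 3_600, 7_200, 21_600]
edge_starts = [3_600, 7_200]

# Local ranges: each table rescaled with its own min and max.
local_nodes = [rescale(x, min(node_starts), max(node_starts)) for x in node_starts]
local_edges = [rescale(x, min(edge_starts), max(edge_starts)) for x in edge_starts]

# Shared range: both tables use the same min and max.
lo = min(node_starts + edge_starts)
hi = max(node_starts + edge_starts)
shared_nodes = [rescale(x, lo, hi) for x in node_starts]
shared_edges = [rescale(x, lo, hi) for x in edge_starts]

# node_starts[1] and edge_starts[0] are the same real time (3,600 s).
print("local: ", local_nodes[1], local_edges[0])
print("shared:", shared_nodes[1], shared_edges[0])
```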

As a reminder, this ensures that the nodes do not take 1.74 to come and go, but rather 60 seconds (60,000 milliseconds), as specified by the t argument.

Next, we prepare the data. We do something similar to what we did previously, except that we also rescale end. There was no need to do so before, since nodes and edges only appeared on the graph and never disappeared.

Here is the logic we apply to rescaling both nodes and edges:

  1. Convert the date-time (POSIXct) to numeric.
  2. Compute the minimum (see point 3 for why).
  3. Since the converted numeric value is the time elapsed since January 1st, 1970, we subtract the minimum from the start time, i.e. start - min.
  4. Compute the minimum and maximum of these offsets for the rescaling function.
  5. Finally, call our rescale function, passing the minimum and maximum computed at point 4.
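The five steps can be sketched in Python as follows. The date-times are hypothetical, `to_offsets` and `rescale` are illustrative helpers rather than functions from the chapter, and Python's `timestamp()` plays the role of converting POSIXct to numeric. The same origin and the same range are applied to every start and end column, so nodes and edges share one timeline.

```python
from datetime import datetime, timedelta, timezone

def to_offsets(times, origin):
    """Steps 1 and 3: convert date-times to numbers, then subtract the minimum."""
    return [t.timestamp() - origin for t in times]

def rescale(x, lo, hi, to=(0.0, 60_000.0)):
    """Step 5: map x from [lo, hi] onto `to` (0 to 60,000 ms by default)."""
    return to[0] + (x - lo) / (hi - lo) * (to[1] - to[0])

UTC = timezone.utc
lifetime = timedelta(hours=6)

# Hypothetical edge times; ends follow from the 6-hour lifetime.
edge_start = [datetime(2018, 8, 14, 19, tzinfo=UTC), datetime(2018, 8, 15, 20, tzinfo=UTC)]
edge_end = [t + lifetime for t in edge_start]
# Hypothetical node times (here, one node per edge for simplicity).
node_start = list(edge_start)
node_end = list(edge_end)

# Step 2: a single minimum across both tables.
origin = min(t.timestamp() for t in node_start + edge_start)

columns = {
    "node_start": to_offsets(node_start, origin),
    "node_end": to_offsets(node_end, origin),
    "edge_start": to_offsets(edge_start, origin),
    "edge_end": to_offsets(edge_end, origin),
}

# Step 4: shared minimum and maximum over every offset.
all_offsets = [x for col in columns.values() for x in col]
lo, hi = min(all_offsets), max(all_offsets)

# Step 5: rescale every column with the same range.
rescaled = {name: [rescale(x, lo, hi) for x in col] for name, col in columns.items()}
print(rescaled)
```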

You will notice that we subtract 100 milliseconds from each node's appearance time and add 400 milliseconds to its disappearance time. The first ensures that a node is already present when any edge connected to it is created; the second ensures that nodes still exist when we drop their edges.
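Here is a small Python check of that padding on hypothetical rescaled times in milliseconds. Without padding, a node and its first edge share the same time, and we would rather not rely on which of the two is handled first. With padding, every node exists strictly before its edges are added and strictly after they are dropped.

```python
# Hypothetical rescaled times (ms) on the 0 to 60,000 ms scale.
edges = [
    {"id": "e1", "source": "a", "target": "b", "start": 0.0, "end": 20_000.0},
    {"id": "e2", "source": "b", "target": "c", "start": 25_000.0, "end": 45_000.0},
]
nodes = {
    "a": {"start": 0.0, "end": 20_000.0},
    "b": {"start": 0.0, "end": 45_000.0},
    "c": {"start": 25_000.0, "end": 45_000.0},
}

NODE_LEAD_MS = 100  # node appears 100 ms before its first edge
NODE_LAG_MS = 400   # node disappears 400 ms after its last edge

padded = {
    nid: {"start": n["start"] - NODE_LEAD_MS, "end": n["end"] + NODE_LAG_MS}
    for nid, n in nodes.items()
}

# Every edge's endpoints must exist before the edge is added and after it is dropped.
for e in edges:
    for nid in (e["source"], e["target"]):
        assert padded[nid]["start"] < e["start"], (nid, e["id"])
        assert padded[nid]["end"] > e["end"], (nid, e["id"])
print("all endpoints present for the lifetime of their edges")
```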

Finally, onto the visualisation. We again use sg_add_nodes and sg_add_edges, but this time, since we want nodes and edges to also disappear, we also use sg_drop_nodes and sg_drop_edges. The latter two only remove elements from the graph, so we just need to specify their respective ids.
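To picture what the browser ends up replaying, this Python sketch merges hypothetical add and drop events into a single timeline sorted by delay. It is only an illustration of the ordering and not how sigmajs represents events internally. The drop events carry nothing but an id, which mirrors why sg_drop_nodes and sg_drop_edges only need ids.

```python
# Hypothetical padded, rescaled times (ms): node -> (start, end).
nodes = {"a": (-100.0, 20_400.0), "b": (-100.0, 20_400.0)}
# edge -> (source, target, start, end)
edges = {"e1": ("a", "b", 0.0, 20_000.0)}

events = []
for nid, (start, end) in nodes.items():
    events.append((start, "add_node", nid))
    events.append((end, "drop_node", nid))  # a drop only needs the id
for eid, (source, target, start, end) in edges.items():
    events.append((start, "add_edge", eid))
    events.append((end, "drop_edge", eid))  # a drop only needs the id

# Sort by delay; ties are broken by action name, then id.
events.sort()
for delay, action, item_id in events:
    print(f"{delay:>9.1f} ms  {action:<9}  {item_id}")
```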

As before, we specify the x and y coordinates of the nodes, as well as their color, which we base on clusters to make the graph look better.

## Found # 96 clusters

13.4 Dynamic layout

An issue you may observe has to do with the layout. The layout is calculated on the full graph, but we never have the full graph on screen, only a subgraph at each time step. A better approach is a dynamic layout that adjusts as nodes and edges appear and disappear.

The forceAtlas2 layout algorithm does just that. However, we cannot simply launch forceAtlas2 as we would on a static graph; we have to update it at regular intervals.

We will cover why we need to do so in the next chapter.