Sunday, March 3, 2019

Offline visualization of geolocation data from Statcounter logs with R

Statcounter is a nice web traffic analysis tool. It collects ISP and geolocation data of the visitors of a tracked site. The data is logged on the Statcounter site and can be downloaded by the tracked site’s owner in XLSX or CSV format. In this article I want to show how I managed to visualize geolocation data from the CSV log using R. First of all, I have to admit that I am a newbie in R: I’ve been using Statcounter for years, but have only one month of experience in R. All my older scripts were written in awk, perl and bash: they could download and merge Statcounter logs and do some basic data visualization such as plotting bar charts with Gnuplot. I will refer to some of them below in this article; you can find them in my project statcounter-utils.

I discovered R when I decided to put all visits to my blog on a world map. I checked out a plethora of complete GIS solutions, but they all seemed unnecessarily heavyweight and rigid to me. Then I read somewhere on Stackoverflow about Leaflet and R. This was an excellent finding, because it promised programming, and I love to program! Below is the annotated solution for putting the visits on a world map, with examples (in statcounter-utils, the R code is located in a file named cities.r).
 1  library(leaflet)
 2  library(htmltools)
 3  library(plyr)
 4  library(dplyr)
 5  library(tidyr)
 6
 7  cities <- function(gcities, geocode, len = as.integer(.Machine$integer.max),
 8                     FUN = function(x) TRUE) {
 9      if (!is.data.frame(gcities)) {
10          gcities <- read.csv(gcities, header = TRUE, sep = ";", as.is = TRUE)
11      }
12      geocode <- read.csv(geocode, header = TRUE, sep = ";", as.is = TRUE,
13                          na.strings = "null")
14
15      d <- merge(gcities, geocode, 1:3)
16      d <- d[order(-d$Count), ]
17      d <- d[!is.na(d$Longitude) & FUN(d), ]
18
19      m <- leaflet() %>% addTiles()
20
21      dh <- head(d, len)
22
23      nrow <- nrow(dh)
24      if (nrow == 0) {
25          print("No cities to render", quote = FALSE)
26          return(m)
27      }
28
29      color <- c("#FF3300", "#FF9900", "#0033FF", "#666666")
30      #           Country    Region     City       Unknown location
31
32      dh$nc <- case_when(
33                   nzchar(dh$City) ~ paste0(htmlEscape(dh$City), color[3]),
34                   nzchar(dh$Region) ~ paste0(htmlEscape(dh$Region), color[2]),
35                   nzchar(dh$Country) ~ paste0(htmlEscape(dh$Country), color[1]),
36                   TRUE ~ paste0(htmlEscape("<UNKNOWN LOCATION>"), color[4]))
37
38      dh <- separate(dh, "nc", c("Name", "Color"), -7)
39
40      m <- addCircleMarkers(m, lng = dh$Longitude, lat = dh$Lattitude,
41                            color = dh$Color, radius = 5 * log(dh$Count, 10),
42                            popup = paste(dh$Name, ",", dh$Count),
43                            label = dh$Name)
44
45      m <- addLegend(m, "bottomright",
46                     colors = c(circle_marker_to_legend_color(color[3]),
47                                circle_marker_to_legend_color(color[2]),
48                                circle_marker_to_legend_color(color[1])),
49                     labels = c("City", "Region", "Country"),
50                     opacity = 0.5)
51
52      print(paste(nrow, "cities rendered"), quote = FALSE)
53
54      return(m)
55  }
56
57
58  cities_df <- function(statcounter_log_csv, cities_spells_filter_awk = NULL,
59                        warn_suspicious = TRUE, type = "page view") {
60      df <- read.csv(`if`(is.null(cities_spells_filter_awk),
61                          statcounter_log_csv,
62                          pipe(paste("awk -f", cities_spells_filter_awk,
63                                     `if`(warn_suspicious,
64                                          "-v warn_suspicious=yes", NULL),
65                                     statcounter_log_csv))),
66                     header = TRUE, sep = ",", quote = "\"", as.is = TRUE)
67
68      if (!is.null(type)) {
69          df <- df[df$Type == type, ]
70      }
71
72      return(df)
73  }
74
75  gcities <- function(cs) {
76      d <- plyr::count(cs, c("Country", "Region", "City"))
77      names(d)[4] <- "Count"
78
79      return(d[order(-d$Count), ])
80  }
81
82  circle_marker_to_legend_color <- function(color,
83                                            marker_opacity = 0.3,
84                                            stroke_opacity = 0.7,
85                                            stroke_width = "medium") {
86      c <- col2rgb(color)
87      cv <- paste("rgba(", c[1], ", ", c[2], ", ", c[3], ", ", sep = "")
88
89      return(paste(cv, marker_opacity, "); border-radius: 50%; border: ",
90                   stroke_width, " solid ", cv, stroke_opacity,
91                   "); box-sizing: border-box", sep = ""))
92  }
In lines 1–5 all required libraries are loaded: leaflet for the map, htmltools for function htmlEscape, plyr for function count, dplyr for function case_when, and tidyr for function separate.

Function cities (lines 7–55) renders locations (not only cities, but also regions and countries, as the legend of the map will hint) on an interactive world map which opens in a browser window. The function accepts a list of cities gcities tagged with counts; gcities can be a file name or a data frame. Geolocation data for the locations found in gcities is expected in parameter geocode: this can only be a CSV file. The other two parameters, len and FUN, define how many top cities to put on the map and a custom subsetting function. Files for parameters gcities and geocode can be obtained with bash utility group_cities, which extracts geolocation data from a Statcounter log and groups cities by count. For geocoding, group_cities makes use of the Python Geocoder. Script group_cities can be found in statcounter-utils.
group_cities -f cities_spells_fix.awk StatCounter-Log.csv > gcities.csv
group_cities -g -f cities_spells_fix.awk StatCounter-Log.csv > geocode.csv
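For illustration, here are hypothetical sample rows of gcities.csv and geocode.csv (the real contents depend on the log):

Country;Region;City;Count
Russia;Moskva;Moscow;512
Germany;Bayern;Munich;48

Country;Region;City;Longitude;Lattitude
Russia;Moskva;Moscow;37.61556;55.75222
Germany;Bayern;Munich;11.57549;48.13743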
A sample script cities_spells_fix.awk can also be found in statcounter-utils. This is a manually crafted database of city and region synonyms, various transcriptions, misspellings, and apparent errors met in Statcounter logs: the script collapses all variants of a single location into a single value. Files gcities.csv and geocode.csv have the schemes with headers Country;Region;City;Count and Country;Region;City;Longitude;Lattitude respectively, as the samples above show.

In lines 15–16 the data get merged by the first 3 fields (Country, Region, and City) and ordered by field Count from gcities. Then, in line 17, cities with wrong geocode data (more specifically, with field Longitude from geocode equal to null) get filtered out, and the custom subsetting function FUN is applied. Basic leaflet construction takes place in line 19. Later, in line 21, the top len cities from the set that survived all the previous filters are picked. In lines 29–43 cities (as well as regions if there is no city in the record, and countries if there is neither a city nor a region in the record) are marked by circle markers and annotated with popups containing the name and the count. The size of a circle marker is a logarithmic function of the count, whereas its color depends on whether the location is a city, a region, or a country. In lines 45–50 a legend with color circles is added to show the user why circle markers have different colors. Putting circles on a legend is not a trivial task: function circle_marker_to_legend_color (lines 82–92) accomplishes this using the fact that the Leaflet legend’s parameter colors is hackable by supplying specially crafted HTML code.

Selecting cities from a Statcounter log and grouping them by count is a trivial task for R. In other words, there is no need to pass a pre-crafted file gcities.csv; instead, it makes sense to create a data frame inside R. This also makes it possible to apply more sophisticated subsettings to the original data, because now we access all the fields of the log directly from R. But remember that we still have to apply the cities spells database cities_spells_fix.awk. This seems to be the only complication for function cities_df, defined in lines 58–73. This function reads a Statcounter log and returns the desired data frame with grouped cities. Its obscure parameters warn_suspicious and type control whether the awk script should print suspicious replacements on stderr, and what type of visits to select from the Statcounter log: the default value “page view” is what a user normally expects. A data frame returned from cities_df can be further subset by a custom function, as it contains all the original data fields. Function gcities (lines 75–80) collapses the data frame fields to the scheme with headers Country;Region;City;Count compatible with the input of function cities, and orders the data by count.

Let’s run a few examples in an R shell. For all of them, we have to load script cities.r and collect all page view visits from a Statcounter log StatCounter-Log.csv.
source("cities.r")
pv <- cities_df("StatCounter-Log.csv", "cities_spells_fix.awk")
pvC <- gcities(pv)
Now let’s render all collected cities on a world map.
cities(pvC, "geocode.csv")
Here is how it looks in my browser.

The map seems cluttered with myriads of circle markers (function cities printed [1] 1920 cities rendered). No problem! The map is interactive (though not in this blog) and can be zoomed (look at the buttons in the top-left corner). The legend in the bottom-right corner shows why the markers have different colors. Let’s put the cities from the Moscow region on a map.
pvMosk <- pv[grepl("^(Moskva|Moscow)", pv$Region), ]
pvMoskC <- gcities(pvMosk)
cities(pvMoskC, "geocode.csv")

In the next examples I won’t show the maps any longer, to avoid cluttering the article. Render cities from the Russian Federation only.
cities(pvC, "geocode.csv", FUN = function(x) grepl("Russia", x$Country))
Render top 10 cities all over the world with total visits from 10 to 20.
cities(pvC, "geocode.csv", 10, FUN = function(x) x$Count %in% 10:20)
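Conditions inside FUN can be combined as needed; for instance, this illustrative query (assuming the log contains visits from Germany) renders German cities with at least 5 visits.
cities(pvC, "geocode.csv", FUN = function(x) grepl("Germany", x$Country) & x$Count >= 5)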
Render all the cities with visits in year 2018.
pv2018 <- pv[grepl("^2018", pv$Date.and.Time), ]
cities(gcities(pv2018), "geocode.csv")
Function cities looks good to me, except that it seems to make sense to create the geocode data frame separately and pass it to the function like the gcities data frame. Perhaps I will implement this in the future. Other improvements may include using library leaflet.extras for searching marked cities. (Update: both improvements were implemented in statcounter-utils.) Now let’s turn to bar charts of cities. I said that I used Gnuplot for that, but R is capable of making them as well! The following solution (the rest of cities.r) makes use of ggplot2 and plotly. As such, the lines
library(ggplot2)
library(plotly)
must be put at the top of the script.
 1  gcities.compound <- function(cs) {
 2      d <- plyr::count(cs, c("Country", "Region", "City"))
 3      d$City <- paste(d$Country, "/", d$Region, "/", d$City)
 4      names(d)[4] <- "Count"
 5
 6      return(d[order(-d$Count), c("City", "Count")])
 7  }
 8
 9  gcountries <- function(cs) {
10      d <- plyr::count(cs, c("Country"))
11      names(d)[2] <- "Count"
12
13      return(d[order(-d$Count), ])
14  }
15
16  cities.plot <- function(cs, title = NULL, tops = NULL, width = NULL) {
17      w0 <- 1200
18      wf <- if (is.null(width)) 1 else w0 / width
19      mf <- wf * (max(cs$Count) / 10000)
20      cw <- 21
21      to <- (cw * nchar(cs$Count) + 300) * mf
22      ym <- cs[1, ][["Count"]] + to[1] * 2
23      nrow <- nrow(cs)
24
25      p <- ggplot(cs, aes(reorder(cs[[1]], cs$Count), cs$Count)) +
26          scale_x_discrete(limits = rev(cs[[1]])) +
27          scale_y_continuous(expand = c(0, 50 * mf, 0, 300 * mf),
28                             limits = c(0, NA)) +
29          coord_flip() +
30          geom_col(fill = "darkseagreen", alpha = 1.0) +
31          geom_text(aes(label = cs$Count, y = cs$Count + to, alpha = 0.75),
32                    size = 3.4) +
33          theme(axis.ticks.y = element_blank(),
34                axis.ticks.x = element_blank(),
35                axis.text.x = element_blank(),
36                panel.grid.major = element_blank(),
37                panel.grid.minor = element_blank(),
38                panel.background = element_blank()
39                ) +
40          labs(title = title, x = NULL, y = NULL)
41
42      if (is.null(tops)) {
43          p <- p + annotate("rect", xmin = 0.1, xmax = 0.9, ymin = 0, ymax = ym,
44                            fill = alpha("green", 0.0))
45      } else {
46          cur <- 0
47          ac <- 0.1
48          for (i in 1:length(tops)) {
49              if (is.na(tops[i]) || tops[i] > nrow) {
50                  tops[i] <- nrow
51              }
52              p <- p + annotate("rect",
53                                xmin = nrow - tops[i] + 0.5,
54                                xmax = nrow - cur + 0.5,
55                                ymin = 0, ymax = ym,
56                                fill = alpha("green", ac),
57                                color = alpha("firebrick1", 0.4),
58                                size = 0.4, linetype = "solid") +
59                       annotate("text",
60                                x = nrow - tops[i] + 1,
61                                y = ym - 300 * mf, color = "blue",
62                                label = tops[i], size = 3.0, alpha = 0.5)
63              cur <- tops[i]
64              ac <- ac / 2
65              if (tops[i] == nrow) {
66                  break
67              }
68          }
69      }
70
71      # Cairo limits linear canvas sizes to 32767 pixels!
72      height <- min(25 * nrow, 32600)
73      p <- ggplotly(p, height = height, width = width)
74      p <- config(p, toImageButtonOptions =
75                  list(filename = `if`(is.null(title), "cities",
76                                       gsub("[^[:alnum:]_\\-]", "_", title)),
77                       height = height,
78                       width = `if`(is.null(width), w0, width), scale = 1))
79
80      print(paste(nrow, "cities plotted"), quote = FALSE)
81
82      return(p)
83  }
Functions gcities.compound and gcountries (lines 1–14) accept an original data frame read from a Statcounter log and return a data frame with two columns: a location (either a compound location pasted together from the Country, Region and City columns, or just the Country) and the count. They prepare input for function cities.plot, which renders a plot in a browser window. Besides the data, cities.plot accepts a title, a tops, and a width. The title and the width correspond to the title and the width of the plot; both can be unset. The tops is a set of counts which define a number of emphasis boxes that help to emphasize top cities on the plot.

In the preamble of cities.plot (lines 17–23) a number of somewhat empirical factors are configured. The value w0 is the standard width used for saving a plot as a PNG image and for calculating factors for text positions of bar labels, wf is the width factor, mf is the ultimate width factor which takes into account not only the width of the plot but also the extent of the Count axis (which depends on the length of the top bar), cw stands for character width, to is the text offset, that is the final y (i.e. Count) position of a text label on the plot, and ym is the y extent of the emphasis boxes. For instance, with width = 1200 (hence wf = 1) and a maximum count of 10000, mf equals 1, and the label of a 3-digit count gets offset by 21 * 3 + 300 = 363 count units. All these complicated calculations are needed because ggplot2 seems to have no tool for binding a text label to a bar: there is an appropriate package ggrepel, but it is not supported in plotly.

The first iteration of the plot creation takes place in lines 25–40. We bind the data directly to the plot (cs is an argument of ggplot), flip the coordinates (line 29) since the bars must lie horizontally, declare the x axis discrete (line 26) to enable setting exact positions of the emphasis box borders, reverse the x scale (on the same line) to put cities with the same count alphabetically from top to bottom, and reorder cs by count (the aes settings in line 25). Then we put the bars (line 30), the bar labels (lines 31–32), and the title (line 40). In lines 33–39 unneeded elements of the x and y axes, as well as the grid, get removed.

In lines 42–69 the emphasis boxes get applied to the plot as annotations. Notice that, being annotations, they lie on top of the bars and the bar labels like colored glass sheets. This means that the boxes must be very transparent so as not to obscure the bars and the labels under them, and the colors of the bars must be close to the colors of the boxes. When tops is not specified (lines 42–44), a single fully transparent box is still applied: this ensures that the lengths of the bars stay the same whether tops is specified or not. The alpha channels of the emphasis boxes decrease steadily from the value 0.1 (line 47) by a factor of 2 (line 64). Each box is additionally annotated with a number at its bottom-right corner which corresponds to the position of the lowest bar it covers (lines 59–62). When the value NA (or any number greater than the number of all the bars) is met in tops, the current emphasis box gets extended to the bottom of the plot, and the loop over tops stops (lines 49–51 and 65–67).

The height of the plot is limited by the maximum linear size allowed in Cairo (lines 71–73). Finally, in lines 74–78, the plotly toolbar gets configured. In particular, button toImage (Download plot as a png when hovered) no longer proposes to save the image as newplot.png (lines 75–76): instead, it proposes a name based on the title of the plot.
Besides this, when the image is saved as PNG, its width defaults to 1200 pixels if it has not been specified via the width parameter of cities.plot (line 78). Now let’s render a bar chart of all visits from the Moscow region with top 10, top 40, and top-to-the-end emphasis boxes.
pvMoskCc <- gcities.compound(pvMosk)
cities.plot(pvMoskCc, paste("Moscow region", format(Sys.time(), "(%F %R)")), c(10, 40, NA), 1200)

I cut out a piece of the image (hence the white lacuna across its lower part) to conform to the limitations on image sizes in this blog.
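By the way, function gcountries has not been shown in action: its output is consumable by cities.plot in exactly the same way. An illustrative run (with an arbitrary title) might be:
pvCn <- gcountries(pv)
cities.plot(pvCn, "Countries", c(10, NA), 1200)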

Saturday, February 9, 2019

Masking async tasks in the wild

In nginx-haskell-module, async threads from module Control.Concurrent.Async are used for running tasks with unpredictable lifetime. A new task gets spawned synchronously from the C code and immediately returns a handle: an object of type Async () wrapped inside a StablePtr. The spawned thread is not waited on from anywhere, thus it runs in the wild. Instead, the returned stable pointer is used for further communication with the task from the C code. Indeed, having a handle to an async task ensures that the task stays referable, while a stable pointer to this handle ensures that the handle won’t be garbage-collected itself after returning from the Haskell code to C.

Every async task returns a result: data buffers collected in a lazy bytestring, and some meta-information with details on whether the task finished correctly and the like. This information gets poked into raw pointers that have been passed from C, then a few bytes get written into an event channel (an eventfd or a pipe) to signal Nginx that the task has finished. The event channel is attached to an event handler in the C code, where the written data gets consumed and the async task finally gets freed by deleting the stable pointer to its handle. Then, in the case of a persistent task (or a service in terms of the module), a new task of the same kind gets spawned again. The whole lifetime of a single async task can be depicted in the following pseudocode.
asyncTask customAction rawPointersFromC =
    async
    (do
        result <- customAction `catchAll` returnResultOnException
        pokeAll rawPointersFromC result
        signalEventChannel
    ) >>= newStablePtr
Here, customAction is the payload of the async task which returns result when there was no exception, rawPointersFromC is a collection of pointers where the result will be put by pokeAll, catchAll is the usual catch tuned to catch everything, including asynchronous exceptions, and signalEventChannel writes into the event channel to signal Nginx that the task has finished. Let’s suppose that pokeAll and signalEventChannel cannot raise exceptions themselves (which is normal for such functions, at least for pokeAll). Is this pseudocode safe then?

To answer this question, we should first sum up what can make asyncTask unsafe. I see two obvious dangers: writing an inconsistent result in pokeAll, in which case the C code may simply segfault by using wrong addresses, and a failed signalEventChannel, in which case Nginx won’t know that the task has finished, and therefore the user request (or the service) will stall. But we noted that pokeAll and signalEventChannel must be safe per se, whereas customAction is well protected by catchAll. So are we safe? Yes… Oops… No! I forgot about asynchronous exceptions! By the way, two kinds of them are used in the module: exception WorkerProcessIsExiting is used to nicely and synchronously shut down services when the worker process is exiting, and exception ServiceHookInterrupt signals a running service that it must restart. They can easily break pokeAll and signalEventChannel and ruin the service task. They can even seep into these critical functions from the exception handler returnResultOnException!

Let’s model how asynchronous exceptions can break the safety of asyncTask. For this, I will use uninterruptibleMask_ threadDelay in the exception handler and in the critical section, to reliably hit them when needed. Function threadDelay is interruptible, and letting it be hit by an asynchronous exception without wrapping it in uninterruptibleMask_ would immediately spoil the protection.
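All the models below are self-contained and can be run in GHCi; they assume only the following imports (a minimal set sufficient for the snippets, not the module’s actual imports):
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (async, asyncThreadId, wait)
import Control.Exception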
  • Model 1 (obviously bad)
    asyncTask1 = async $ do
        result <- return 10 `catch`
            (const $ return 20 :: SomeException -> IO Int)
        uninterruptibleMask_ (threadDelay 2000000) >> print result
    
    Here I skipped details like passing parameters. The payload function is return 10, and the exception handler is located after the catch on the third line. The critical section is on the fourth line; it lasts for 2 seconds. We will try to break it with asynchronous exception ThreadKilled raised 1 second after starting asyncTask1. A successful break shall mean a failure to reach print result. Running
    a1 <- asyncTask1
    threadDelay 1000000
    throwTo (asyncThreadId a1) ThreadKilled
    wait a1
    
    will print
    asyncTest: thread killed
    
    as expected.
  • Model 2 (surprisingly bad) I will show the same model, but in this case ThreadKilled will seep into the critical section from the exception handler.
    asyncTask2 = async $ do
        result <-
            (threadDelay 2000000 >> return 10) `catch`
                (const $ putStrLn "Caught!" >>
                    uninterruptibleMask_ (threadDelay 2000000) >>
                        return 20 ::
                    SomeException -> IO Int
                )
        print result
    
    Running
    a2 <- asyncTask2
    threadDelay 1000000
    throwTo (asyncThreadId a2) ThreadKilled
    wait a2
    
    prints
    Caught!
    20
    
    It’s OK: the critical print was reached, but I promised a failure. Voilà!
    a2 <- asyncTask2
    threadDelay 1000000
    throwTo (asyncThreadId a2) ThreadKilled
    threadDelay 1000000
    throwTo (asyncThreadId a2) ThreadKilled
    wait a2
    
    Caught!
    asyncTest: thread killed
    
    So what has happened here? The payload function was interrupted by ThreadKilled, which was caught by the exception handler. So far so good. However, the exception handler was slow, and we sent another ThreadKilled while it was working. What happened then? The documentation says that asynchronous exceptions are masked inside the handler of catch. This means that they won’t break the exception handler but instead get postponed, thus effectively becoming synchronous. As soon as the exception handler finishes, the postponed exception is raised right at the beginning of the critical section.
  • Model 3 (arguably predictably bad) Let’s wrap the critical section in mask_.
    asyncTask3 = async $ do
        result <-
            (threadDelay 2000000 >> return 10) `catch`
                (const $ putStrLn "Caught!" >>
                    uninterruptibleMask_ (threadDelay 2000000) >>
                        return 20 ::
                    SomeException -> IO Int
                )
        mask_ $ print result
    
    a3 <- asyncTask3
    threadDelay 1000000
    throwTo (asyncThreadId a3) ThreadKilled
    threadDelay 1000000
    throwTo (asyncThreadId a3) ThreadKilled
    wait a3
    
    Caught!
    asyncTest: thread killed
    
    All the same. Masking print adjacently to the catch creates only an illusion of adjacency. We know that adjacent lines in do-notation always desugar into one of the monadic bind operators: (>>) or (>>=). Here, the last two lines of asyncTask3 desugar into (... `catch` ...) >>= \result -> mask_ (print result), where the entry into the lambda is not covered by any mask. This useless adjacent masking is sometimes referred to as a wormhole, meaning that an asynchronous exception which was generated and postponed in the upper masked block will inevitably seep into the hole between the two blocks.
  • Model 4 (good?) We should finally try the classical mask / restore idiom. In this approach, a mask applies to a whole block of code without leaving wormholes. The restore is a function-argument of mask which re-opens a smaller block inside the masked block to asynchronous exceptions.
    asyncTask4 = async $ mask $ \restore -> do
        result <-
            restore (threadDelay 2000000 >> return 10) `catch`
                (const $ putStrLn "Caught!" >>
                    uninterruptibleMask_ (threadDelay 2000000) >>
                        return 20 ::
                    SomeException -> IO Int
                )
        uninterruptibleMask_ (threadDelay 2000000) >> print result
    
    a4 <- asyncTask4
    threadDelay 1000000
    throwTo (asyncThreadId a4) ThreadKilled
    threadDelay 1000000
    throwTo (asyncThreadId a4) ThreadKilled
    threadDelay 1000000
    throwTo (asyncThreadId a4) ThreadKilled
    wait a4
    
    Caught!
    20
    
    Nice! I tried to break both the exception handler and the critical section but failed to do so.
  • Model 5 (certainly good) I am still not sure about the safety of asyncTask4. Let’s look at the definition of async.
    async = inline asyncUsing rawForkIO
    
    asyncUsing doFork = \action -> do
       var <- newEmptyTMVarIO
       t <- mask $ \restore ->
              doFork $ try (restore action) >>= atomically . putTMVar var
       return (Async t (readTMVar var))
    
    It uses the same mask / restore idiom, where action corresponds to our async task. What if… I do not know if it’s possible in principle, but… what if async returned the Async handle before doFork actually started action in the restored state? Then the C code could send an asynchronous exception to a task which has not yet started. Probably this is not a problem in the pure Haskell world, but in our case we would get a broken async task which fails to respond via the event channel and ruins a user request or a service! Fortunately, restore doesn’t simply unmask asynchronous exceptions: it restores the masking state that was in effect when the corresponding mask was entered. So the final solution is masking around async rather than around its action: in this case async’s own restore won’t unmask anything, and we are certainly safe. (A generalized sketch of this pattern is given after the model.)
    asyncTask5 = mask $ \restore -> async $ do
        result <-
            restore (threadDelay 2000000 >> return 10) `catch`
                (const $ putStrLn "Caught!" >>
                    uninterruptibleMask_ (threadDelay 2000000) >>
                        return 20 ::
                    SomeException -> IO Int
                )
        uninterruptibleMask_ (threadDelay 2000000) >> print result
    
    a5 <- asyncTask5
    threadDelay 1000000
    throwTo (asyncThreadId a5) ThreadKilled
    threadDelay 1000000
    throwTo (asyncThreadId a5) ThreadKilled
    threadDelay 1000000
    throwTo (asyncThreadId a5) ThreadKilled
    wait a5
    
    Caught!
    20
    
The source code for the tests can be found here.
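As a closing note, the pattern of Model 5 can be condensed into a small reusable helper. This is only a sketch of the idea, not the module’s actual code: the names safeAsync and finalize are made up for illustration.
import Control.Concurrent.Async (Async, async)
import Control.Exception (SomeException, mask, try)

-- The payload runs with asynchronous exceptions restored (i.e. enabled,
-- provided the caller was not masked), while the finalizer runs masked:
-- the forked thread inherits the masked state, and async's internal
-- restore does not unmask it because the enclosing mask was entered first.
safeAsync :: IO a -> (Either SomeException a -> IO ()) -> IO (Async ())
safeAsync payload finalize = mask $ \restore ->
    async $ try (restore payload) >>= finalize
Here finalize plays the role of pokeAll plus signalEventChannel: it is protected from asynchronous exceptions for its whole duration, though interruptible operations inside it would still need uninterruptibleMask_, as the models above have shown.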