Saturday, February 9, 2019

Masking async tasks in the wild

In nginx-haskell-module, async threads from module Control.Concurrent.Async are used for running tasks with unpredictable lifetime. A new task gets spawned synchronously from C code and immediately returns a handle: an object of type Async () wrapped inside a StablePtr. Nothing waits for the spawned thread, so it runs in the wild. Instead, the returned stable pointer is used for further communication with the task from the C code. Indeed, holding a handle to an async task ensures that the task stays referable, while a stable pointer to this handle ensures that the handle itself won’t be garbage-collected after returning from the Haskell code to C.

Every async task returns a result: data buffers collected in a lazy bytestring and some meta-information, such as whether the task finished correctly. This information gets poked into raw pointers that have been passed from C, then a few bytes get written into an event channel (eventfd or a pipe) to signal Nginx that the task has finished. The event channel is attached to an event handler in the C code, where the written data gets consumed and the async task finally gets freed by deleting the stable pointer to its handle. Then, in case of a persistent task (or a service in terms of the module), a new task of the same kind gets spawned again. The whole lifetime of a single async task can be depicted by the following pseudocode.
asyncTask customAction rawPointersFromC =
    async
    (do
        result <- customAction `catchAll` returnResultOnException
        pokeAll rawPointersFromC result
        signalEventChannel
    ) >>= newStablePtr
Here, customAction is the payload of the async task that returns result when there was no exception, rawPointersFromC is a collection of pointers where pokeAll will put the result, catchAll is the usual catch that catches everything, including asynchronous exceptions, and signalEventChannel writes into the event channel to signal Nginx that the task has finished. Let’s suppose that pokeAll and signalEventChannel cannot raise exceptions themselves (which is normal for such functions, at least for pokeAll).

Is this pseudocode safe at this point? To answer this question, we should first sum up what can make asyncTask unsafe. I see two obvious dangers: pokeAll writing an inconsistent result, in which case the C code may simply segfault on wrong addresses, and signalEventChannel failing, in which case Nginx won’t know that the task has finished and therefore the user request (or the service) will stall. But we noted that pokeAll and signalEventChannel must be safe per se, whereas customAction is well protected by catchAll. So are we safe? Yes… Ooops… No! I forgot about asynchronous exceptions! By the way, two kinds of them are used in the module: exception WorkerProcessIsExiting is used to nicely and synchronously shut down services when the worker process is exiting, and exception ServiceHookInterrupt is used to signal a running service that it must restart. Being thrown from another thread, they can arrive after customAction has already returned, break pokeAll and signalEventChannel, and ruin the service task. They can even seep into these critical functions from the exception handler returnResultOnException!

Let’s model how asynchronous exceptions can break the safety of asyncTask. For this, I will put threadDelay wrapped in uninterruptibleMask_ into the exception handler and the critical section, to make them last long enough to be reliably hit when needed. Function threadDelay is interruptible: without the uninterruptibleMask_ wrapper, an asynchronous exception would interrupt it even inside a masked block and immediately spoil the protection.
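
The last point about threadDelay can be checked in isolation. The following is a minimal sketch that uses only base; the helper name delayCompletes and the shortened delays (0.3 s for the delay, 0.1 s before the throw) are illustrative and not part of the module. It runs a delay under a given masking function, throws ThreadKilled at it, and reports whether the delay ran to completion.

```haskell
import Control.Concurrent
import Control.Exception
import Data.IORef

-- Hypothetical helper: runs a 0.3 s delay wrapped by the given masking
-- function in a new thread, throws ThreadKilled at that thread after 0.1 s,
-- and reports whether the code right after the delay was reached while
-- still inside the wrapper.
delayCompletes :: (IO () -> IO ()) -> IO Bool
delayCompletes wrap = do
    completed <- newIORef False
    finished  <- newEmptyMVar
    tid <- forkIO $
        wrap (threadDelay 300000 >> writeIORef completed True)
            -- swallow only ThreadKilled so that the thread ends quietly
            `catch` (\e -> case e of
                               ThreadKilled -> return ()
                               other        -> throwIO other)
            `finally` putMVar finished ()
    threadDelay 100000
    throwTo tid ThreadKilled
    takeMVar finished
    readIORef completed

main :: IO ()
main = do
    r1 <- delayCompletes mask_
    r2 <- delayCompletes uninterruptibleMask_
    putStrLn $ "mask_: " ++ show r1
    putStrLn $ "uninterruptibleMask_: " ++ show r2
```

This should print mask_: False and uninterruptibleMask_: True: a plain mask_ does not protect a blocked threadDelay, while under uninterruptibleMask_ the exception is postponed until the wrapper exits.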
  • Model 1 (obviously bad)
    asyncTask1 = async $ do
        result <- return 10 `catch`
            (const $ return 20 :: SomeException -> IO Int)
        uninterruptibleMask_ (threadDelay 2000000) >> print result
    
    Here I skipped details like passing parameters. The payload function is return 10, and the exception handler follows the catch on the third line. The critical section is on the fourth line and lasts for 2 seconds. We will try to break it with an asynchronous exception ThreadKilled raised 1 second after asyncTask1 starts. A successful break means failure to reach print result. Running
    a1 <- asyncTask1
    threadDelay 1000000
    throwTo (asyncThreadId a1) ThreadKilled
    wait a1
    
    will print
    asyncTest: thread killed
    
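    as expected: ThreadKilled arrived while the delay was uninterruptibly masked, got postponed, and was raised as soon as the mask was lifted, right before print result.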
  • Model 2 (surprisingly bad)
    This is the same model, but in this case ThreadKilled will seep into the critical section from the exception handler.
    asyncTask2 = async $ do
        result <-
            (threadDelay 2000000 >> return 10) `catch`
                (const $ putStrLn "Caught!" >>
                    uninterruptibleMask_ (threadDelay 2000000) >>
                        return 20 ::
                    SomeException -> IO Int
                )
        print result
    
    Running
    a2 <- asyncTask2
    threadDelay 1000000
    throwTo (asyncThreadId a2) ThreadKilled
    wait a2
    
    prints
    Caught!
    20
    
    That’s fine: the critical print was reached, but I promised a failure. Voilà! Let’s send a second ThreadKilled while the exception handler is still running.
    a2 <- asyncTask2
    threadDelay 1000000
    throwTo (asyncThreadId a2) ThreadKilled
    threadDelay 1000000
    throwTo (asyncThreadId a2) ThreadKilled
    wait a2
    
    Caught!
    asyncTest: thread killed
    
    So what happened here? The payload function was interrupted by ThreadKilled, which was then caught by the exception handler. So far so good. However, the exception handler was slow, and we sent another ThreadKilled while it was still running. What happened then? The documentation says that asynchronous exceptions are masked inside the handler of catch. It means that they do not break the exception handler but get postponed instead, thus behaving as if they were synchronous. As soon as the exception handler finishes and its mask is lifted, the postponed exception is raised right at the beginning of the critical section.
  • Model 3 (arguably predictable to be bad)
    Let’s wrap the critical section in mask_.
    asyncTask3 = async $ do
        result <-
            (threadDelay 2000000 >> return 10) `catch`
                (const $ putStrLn "Caught!" >>
                    uninterruptibleMask_ (threadDelay 2000000) >>
                        return 20 ::
                    SomeException -> IO Int
                )
        mask_ $ print result
    
    a3 <- asyncTask3
    threadDelay 1000000
    throwTo (asyncThreadId a3) ThreadKilled
    threadDelay 1000000
    throwTo (asyncThreadId a3) ThreadKilled
    wait a3
    
    Caught!
    asyncTest: thread killed
    
    All the same. Masking print right after the catch gives only an illusion of adjacency. We know that adjacent lines in do-notation always desugar into one of the monadic bind operators, (>>) or (>>=), and the mask of the catch handler is lifted before mask_ is applied. Such useless adjacent masking is sometimes referred to as a wormhole, meaning that an asynchronous exception which was raised and postponed in the upper masked block will inevitably seep through the hole between the two blocks.
  • Model 4 (good?)
    We should finally try the classical mask / restore idiom. In this approach, a mask applies to a whole block of code, leaving no wormholes. The restore is a function argument of mask which opens a smaller block inside the masked block where asynchronous exceptions are let in again.
    asyncTask4 = async $ mask $ \restore -> do
        result <-
            restore (threadDelay 2000000 >> return 10) `catch`
                (const $ putStrLn "Caught!" >>
                    uninterruptibleMask_ (threadDelay 2000000) >>
                        return 20 ::
                    SomeException -> IO Int
                )
        uninterruptibleMask_ (threadDelay 2000000) >> print result
    
    a4 <- asyncTask4
    threadDelay 1000000
    throwTo (asyncThreadId a4) ThreadKilled
    threadDelay 1000000
    throwTo (asyncThreadId a4) ThreadKilled
    threadDelay 1000000
    throwTo (asyncThreadId a4) ThreadKilled
    wait a4
    
    Caught!
    20
    
    Nice! I tried to break both the exception handler and the critical section but failed to do so.
  • Model 5 (certainly good)
    I am still not sure about the safety of asyncTask4. Let’s look at the definition of async.
    async = inline asyncUsing rawForkIO
    
    asyncUsing doFork = \action -> do
       var <- newEmptyTMVarIO
       t <- mask $ \restore ->
              doFork $ try (restore action) >>= atomically . putTMVar var
       return (Async t (readTMVar var))
    
    It uses the same mask / restore idiom around action, which corresponds to our async task. What if… I do not know if it’s possible in principle, but… What if async returned the Async handle before the forked thread, being already in the restored state, really starts action? Then the C code could send an asynchronous exception to a task which had not yet started. This is probably not a problem in the clean Haskell world, but in our case we would get a broken async task which fails to respond via the event channel and ruins a user request or a service! Fortunately, restore doesn’t unmask asynchronous exceptions but returns to the previous masking state. So the final solution is masking around async rather than around its action: in this case async’s restore won’t unmask, and we are certainly safe.
    asyncTask5 = mask $ \restore -> async $ do
        result <-
            restore (threadDelay 2000000 >> return 10) `catch`
                (const $ putStrLn "Caught!" >>
                    uninterruptibleMask_ (threadDelay 2000000) >>
                        return 20 ::
                    SomeException -> IO Int
                )
        uninterruptibleMask_ (threadDelay 2000000) >> print result
    
    a5 <- asyncTask5
    threadDelay 1000000
    throwTo (asyncThreadId a5) ThreadKilled
    threadDelay 1000000
    throwTo (asyncThreadId a5) ThreadKilled
    threadDelay 1000000
    throwTo (asyncThreadId a5) ThreadKilled
    wait a5
    
    Caught!
    20
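
Two claims used in the models can be observed directly with getMaskingState: that the handler of catch runs masked, and that restore returns to the previous masking state rather than unmasking. The sketch below depends only on base. The function forkLikeAsync is a hypothetical stand-in that mirrors the shape of asyncUsing quoted in Model 5, with forkIO and an MVar in place of rawForkIO and a TMVar; it is not the real async.

```haskell
import Control.Concurrent
import Control.Exception

-- Hypothetical stand-in for async: like the quoted asyncUsing, it forks the
-- action under mask and wraps it in restore. The returned MVar receives
-- the result of the action.
forkLikeAsync :: IO a -> IO (MVar (Either SomeException a))
forkLikeAsync action = do
    var <- newEmptyMVar
    _ <- mask $ \restore ->
        forkIO $ try (restore action) >>= putMVar var
    return var

main :: IO ()
main = do
    -- Spawned from an unmasked caller: restore brings the action back
    -- to the unmasked state.
    v1 <- forkLikeAsync getMaskingState
    takeMVar v1 >>= print
    -- Spawned from inside a masked block, as in Model 5: restore returns
    -- to the masked state of the caller, so the action starts masked.
    v2 <- mask_ $ forkLikeAsync getMaskingState
    takeMVar v2 >>= print
    -- The handler of catch runs in the masked (interruptible) state.
    s <- throwIO ThreadKilled
             `catch` (\e -> const getMaskingState (e :: SomeException))
    print s
```

The expected output is Right Unmasked, then Right MaskedInterruptible, then MaskedInterruptible. The second line is the property Model 5 relies on: with the mask placed around the spawn, there is no window in which the new thread runs unmasked before its own protection is in place.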
    
The source code for the tests can be found here.
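
For completeness, here is how the Model 5 pattern might be carried back to the asyncTask pseudocode from the beginning. This is only a sketch under loud assumptions, not the module’s actual code: it uses forkIO and a ThreadId instead of Async (), a single Ptr Int instead of rawPointersFromC, an MVar instead of the eventfd or pipe, and -1 as a made-up error result. The name asyncTaskSketch is hypothetical as well.

```haskell
import Control.Concurrent
import Control.Exception
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr)
import Foreign.StablePtr
import Foreign.Storable (peek, poke)

-- Hypothetical sketch: the mask is placed around the fork, so the new
-- thread starts masked, and only the payload runs inside restore.
asyncTaskSketch :: IO Int -> Ptr Int -> MVar () -> IO (StablePtr ThreadId)
asyncTaskSketch customAction resultPtr eventChannel =
    mask $ \restore -> do
        tid <- forkIO $ do
            result <- restore customAction `catch` returnResultOnException
            poke resultPtr result      -- stands for pokeAll
            putMVar eventChannel ()    -- stands for signalEventChannel
        newStablePtr tid
  where
    -- Made-up error result for illustration.
    returnResultOnException :: SomeException -> IO Int
    returnResultOnException _ = return (-1)

main :: IO ()
main = alloca $ \resultPtr -> do
    eventChannel <- newEmptyMVar
    handle <- asyncTaskSketch (threadDelay 300000 >> return 10)
                              resultPtr eventChannel
    tid <- deRefStablePtr handle
    threadDelay 100000
    throwTo tid ThreadKilled       -- interrupts only the payload
    takeMVar eventChannel          -- the task still signals completion
    peek resultPtr >>= print
    freeStablePtr handle
```

Here the exception interrupts the payload, the handler supplies the error result, and the task still pokes it and signals the channel, so the program prints -1. Note that the stand-in critical section does not block; a real signalEventChannel that may block would need the same care as the uninterruptibleMask_ sections in the models.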