Saturday, December 29, 2018

Type annotations vs partial type signatures vs visible type applications

In Haskell, function calls must sometimes be annotated. One well-known example is reading arbitrary types: read "5" :: Int. Without the type annotation :: Int, the compiler cannot decide what type the user wants to read. In point-free expressions, type annotations may grow in length, say read :: String -> Int. Often, when an expression is wrapped inside an appropriate type context, the compiler is able to infer the type, and the type annotation is no longer needed. Say, in [1, 2] ++ read "[3, 4]", the type of read’s result can only be a list of numbers. Let’s consider an example where the type annotations are essential. For this, we will implement specialized read functions inside a standalone module ReadFromByteString.
{-# LANGUAGE TypeFamilies, EmptyDataDecls #-}

module ReadFromByteString (readFromByteString
                          ,readFromByteStringAsJSON
                          ,readFromByteStringWithRPtr
                          ,readFromByteStringWithRPtrAsJSON
                          ) where

import           Foreign.Ptr
import           Foreign.Storable
import qualified Data.ByteString as B
import           Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as C8
import qualified Data.ByteString.Lazy as L
import           Data.Binary.Get
import           Data.Aeson
import           Control.Arrow
import           Safe

data Readable a
data ReadableAsJSON a

class FromByteString a where
    type WrappedT a
    fromByteString :: Maybe a -> ByteString -> Maybe (WrappedT a)

instance Read a => FromByteString (Readable a) where
    type WrappedT (Readable a) = a
    fromByteString = const $ readMay . C8.unpack

instance FromJSON a => FromByteString (ReadableAsJSON a) where
    type WrappedT (ReadableAsJSON a) = a
    fromByteString = const decodeStrict

readFromByteString :: Read a => ByteString -> Maybe a
readFromByteString = fromByteString (Nothing :: Maybe (Readable a))

readFromByteStringAsJSON :: FromJSON a => ByteString -> Maybe a
readFromByteStringAsJSON = fromByteString (Nothing :: Maybe (ReadableAsJSON a))

readFromByteStringWithRPtr :: Read a => ByteString -> (Ptr (), Maybe a)
readFromByteStringWithRPtr = readRPtr &&& readFromByteString . skipRPtr

readFromByteStringWithRPtrAsJSON :: FromJSON a =>
    ByteString -> (Ptr (), Maybe a)
readFromByteStringWithRPtrAsJSON =
    readRPtr &&& readFromByteStringAsJSON . skipRPtr

readRPtr :: ByteString -> Ptr ()
readRPtr = wordPtrToPtr . fromIntegral . runGet getWordhost . L.fromStrict

skipRPtr :: ByteString -> ByteString
skipRPtr = B.drop $ sizeOf (undefined :: Word)
I leveraged GHC extensions TypeFamilies and EmptyDataDecls to build robust and type-safe functions whose names start with readFromByteString. They can read only custom types that derive or implement Read or FromJSON. This restriction comes from the fact that type class FromByteString is not exported from the module and provides only instances constrained by type classes Read and FromJSON. Empty data types Readable and ReadableAsJSON are used for indexing the instances, while type WrappedT serves as a selector for the return type of fromByteString, simply unwrapping it from Readable or ReadableAsJSON. Function fromByteString expects an argument of type Maybe a which is not actually used inside: without it, the module would not compile because type variable a must be referred to elsewhere in the type signature besides the return type. I wrapped this argument in Maybe to simplify implementations of the exported functions. Notice also that the exported functions with WithRPtr in their names expect a raw pointer in front of the serialized data. Let’s compile this and start using it.
ghc --make ReadFromByteString.hs 
[1 of 1] Compiling ReadFromByteString ( ReadFromByteString.hs, ReadFromByteString.o )
I want to create two custom data types Conf and ConfJSON.
{-# LANGUAGE DeriveGeneric #-}

module TestReadFromByteString where

import           ReadFromByteString

import           Data.ByteString (ByteString)
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as C8L
import           Data.Aeson
import           GHC.Generics

showAsLazyByteString :: Show a => a -> L.ByteString
showAsLazyByteString = C8L.pack . show

newtype Conf = Conf Int deriving (Read, Show)

data ConfJSON = ConfJSONCon1 Int
              | ConfJSONCon2 deriving (Generic, Show)
instance FromJSON ConfJSON
This is the header of a new module TestReadFromByteString, with all the expected language extensions and imports in place. Let’s implement two simple converters using functions from module ReadFromByteString.
testReadConf :: ByteString -> L.ByteString
testReadConf = showAsLazyByteString . readFromByteString

testReadConfJSON :: ByteString -> L.ByteString
testReadConfJSON = showAsLazyByteString . readFromByteStringAsJSON
Apparently, this won’t compile: neither function has any notion of what type must be read. Therefore we must add type annotations. Since the functions are implemented in point-free style, the annotations will be rather lengthy.
testReadConf :: ByteString -> L.ByteString
testReadConf = showAsLazyByteString .
    (readFromByteString :: ByteString -> Maybe Conf)

testReadConfJSON :: ByteString -> L.ByteString
testReadConfJSON = showAsLazyByteString .
    (readFromByteStringAsJSON :: ByteString -> Maybe ConfJSON)
This time it compiles.
ghc --make TestReadFromByteString.hs 
[2 of 2] Compiling TestReadFromByteString ( TestReadFromByteString.hs, TestReadFromByteString.o )
We can load this in GHCi, and play a bit.
*TestReadFromByteString> :set -XOverloadedStrings 
*TestReadFromByteString> testReadConf "Conf 8"
"Just (Conf 8)"
*TestReadFromByteString> testReadConfJSON "{\"tag\":\"ConfJSONCon2\"}"
"Just ConfJSONCon2"
Now let’s implement similar converters for functions that expect a raw pointer.
testReadConfWithRPtr :: ByteString -> L.ByteString
testReadConfWithRPtr = showAsLazyByteString .
    (readFromByteStringWithRPtr :: ByteString -> (Ptr (), Maybe Conf))

testReadConfWithRPtrJSON :: ByteString -> L.ByteString
testReadConfWithRPtrJSON = showAsLazyByteString .
    (readFromByteStringWithRPtrAsJSON :: ByteString -> (Ptr (), Maybe ConfJSON))
So long and tedious! Even worse, it won’t compile!
ghc --make TestReadFromByteString.hs 
[2 of 2] Compiling TestReadFromByteString ( TestReadFromByteString.hs, TestReadFromByteString.o )

TestReadFromByteString.hs:41:51: error:
    Not in scope: type constructor or class ‘Ptr’
   |
41 |     (readFromByteStringWithRPtr :: ByteString -> (Ptr (), Maybe Conf))
   |                                                   ^^^

TestReadFromByteString.hs:45:57: error:
    Not in scope: type constructor or class ‘Ptr’
   |
45 |     (readFromByteStringWithRPtrAsJSON :: ByteString -> (Ptr (), Maybe ConfJSON))
   |                                                         ^^^
Yes, we must import module Foreign.Ptr only to satisfy the type annotation!
import           Foreign.Ptr
Now that compilation succeeds, let’s play in GHCi again.
*TestReadFromByteString> testReadConfWithRPtr "blahblahConf 90"
"(0x68616c6268616c62,Just (Conf 90))"
*TestReadFromByteString> testReadConfWithRPtrJSON "blahblah{\"tag\":\"ConfJSONCon1\",\"contents\":7}"
"(0x68616c6268616c62,Just (ConfJSONCon1 7))"
Here blahblah plays the role of a raw 8-byte pointer, why not? :) OK, let’s think about what we can do better. Let’s give the Partial Type Signatures extension a try. At least the Foreign.Ptr import can be removed, because we do not actually use pointers anywhere in the code, while the corresponding type annotations get replaced by placeholders.
{-# LANGUAGE DeriveGeneric, PartialTypeSignatures #-}

-- ...

--import           Foreign.Ptr  -- no longer needed

-- ...

testReadConf :: ByteString -> L.ByteString
testReadConf = showAsLazyByteString .
    (readFromByteString :: _ -> Maybe Conf)

testReadConfJSON :: ByteString -> L.ByteString
testReadConfJSON = showAsLazyByteString .
    (readFromByteStringAsJSON :: _ -> Maybe ConfJSON)

testReadConfWithRPtr :: ByteString -> L.ByteString
testReadConfWithRPtr = showAsLazyByteString .
    (readFromByteStringWithRPtr :: _ -> (_, Maybe Conf))

testReadConfWithRPtrJSON :: ByteString -> L.ByteString
testReadConfWithRPtrJSON = showAsLazyByteString .
    (readFromByteStringWithRPtrAsJSON :: _ -> (_, Maybe ConfJSON))
It compiles, but now we get many warnings.
ghc --make TestReadFromByteString.hs 
[2 of 2] Compiling TestReadFromByteString ( TestReadFromByteString.hs, TestReadFromByteString.o )

TestReadFromByteString.hs:35:28: warning: [-Wpartial-type-signatures]
    • Found type wildcard ‘_’ standing for ‘ByteString’
    • In an expression type signature: _ -> Maybe Conf
      In the second argument of ‘(.)’, namely
        ‘(readFromByteString :: _ -> Maybe Conf)’
      In the expression:
        showAsLazyByteString . (readFromByteString :: _ -> Maybe Conf)
    • Relevant bindings include
        testReadConf :: ByteString -> C8L.ByteString
          (bound at TestReadFromByteString.hs:34:1)
   |
35 |     (readFromByteString :: _ -> Maybe Conf)
   |                            ^
Of course, we could add the option -Wno-partial-type-signatures to the GHC command line, but it looks like we’re using a feature that was not designed for this case. Nevertheless, the GHCi tests will run as expected (but I won’t show GHCi sessions anymore). Let’s try the third option: visible type applications, available in GHC since version 8.0.1. They have very interesting semantics which meets our intention very well: the applied types get substituted in place of the type variables in the function’s signature. It means that we can finally stop building long type annotations!
{-# LANGUAGE DeriveGeneric, TypeApplications #-}

-- ...

testReadConf :: ByteString -> L.ByteString
testReadConf = showAsLazyByteString .
    readFromByteString @Conf

testReadConfJSON :: ByteString -> L.ByteString
testReadConfJSON = showAsLazyByteString .
    readFromByteStringAsJSON @ConfJSON

testReadConfWithRPtr :: ByteString -> L.ByteString
testReadConfWithRPtr = showAsLazyByteString .
    readFromByteStringWithRPtr @Conf

testReadConfWithRPtrJSON :: ByteString -> L.ByteString
testReadConfWithRPtrJSON = showAsLazyByteString .
    readFromByteStringWithRPtrAsJSON @ConfJSON
This looks really nice! So let me draw a small conclusion from this, admittedly a little contrived, example.
  1. Type annotations for an external function may require not only its type structure, but also references to the type names or implementations (i.e. imports of the modules where these types are declared).
  2. Partial type signatures still require the type structure of the function (making use of the scaffold of the type signature).
  3. Visible type applications are the tersest and cleanest way to annotate an external function’s type. They do not require rebuilding the type signature in case of ambiguity.
The original source code can be found in module NgxExport.Tools. The specialized read functions are used there to facilitate typed exchange between Haskell handlers and directives in Nginx configuration files. See also examples in the documentation for the module.

Sunday, March 4, 2018

Signaling all worker processes in Nginx via an event channel

A running Nginx instance basically consists of one master process and a number of worker processes. This architecture suits fail-safety very well: when a worker dies for some reason, for example because a faulty 3rd-party module crashes, the master process simply starts a new worker to replace the dead one. The workers are independent OS processes, yet they can communicate via shared memory, which is well supported in Nginx. Cross-worker communication is rarely needed in simple scenarios where workers merely service user requests. However, there are cases when it is essential, for example when you want to implement an API to change some internal state that must be known to all worker processes, say a dynamic upstream configuration shared between all workers via a dedicated shared memory zone.
Shared memory, nevertheless, is not so good for reliable communication in some cases, because it is passive: you still need some mechanism to propagate messages to workers. Basically, the propagation gets triggered in a worker by a user request that runs a special handler which does all the work. This approach, let’s call it request-driven, has a number of downsides. First, the changes do not apply immediately: a specific worker may keep the older state for a very long time, until a user request reaches it. This is not an issue for dynamic upstreams, as their contents are only meaningful in conjunction with a user request, which effectively always triggers their update; however, immediate propagation of a new state to all workers is still important in many other cases, and I’ll soon show such a case. Second, a developer must create a handler that applies the changes and make sure that it runs in all Nginx locations that must trigger the changes. Third, the handler has to look into the shared memory on every request to check for changes, which is not a cheap action because it requires synchronization with other worker processes.
In this article I want to show how to use the Nginx event mechanism for immediate signaling to all worker processes. This approach does not have the downsides of shared memory communication. Of course, you still have to write a handler, an event handler this time, but it will trigger immediately in all workers, it does not need to be associated with a user request and a location in the Nginx configuration file, and it won’t look into shared memory on every user request. The approach only works when Nginx uses the epoll or kqueue event engines; however, I didn’t test it on kqueue because I didn’t have FreeBSD or MacOS at hand.
Let me describe my own use case. I needed a signaling mechanism to implement an API (I called it service hooks) for services in nginx-haskell-module. Services are asynchronous tasks written in Haskell and launched by the Haskell RTS in every Nginx worker process. They share the same memory space and the data buffers for their results with the C part of the module, being as such a mechanism for lightweight worker-driven (as opposed to request-driven) background tasks. Services are launched in the worker’s initialization function. When a service has a result, it signals via an event channel (yes, services use the Nginx event engine too), and in the event handler on the C side of the module the buffers of the service’s result (a Haskell lazy ByteString) get referenced in a configuration storage that a dedicated Nginx variable handler uses to access the service variable. Then the event handler starts the service again.
Besides normal per-worker services, there are shared services that have only one active instance running on one of the worker processes, while the others wait on an advisory file lock: when the worker with the active service instance dies, one of the inactive instances takes over the work. Shared services hold their results in shared memory. Here is an example of a service declaration in an Nginx configuration.
user                    nginx;
worker_processes        4;

events {
    worker_connections  1024;
}

http {
    default_type        application/octet-stream;
    sendfile            on;

    haskell load /var/lib/nginx/test.so;

    haskell_run_service getUrlService $hs_service_httpbin "http://httpbin.org";

    haskell_service_var_in_shm httpbin 512k /tmp $hs_service_httpbin;

    server {
        listen          8010;
        server_name     main;

        location /httpbin {
            echo $hs_service_httpbin;
        }
    }
}
The compiled Haskell library gets loaded from /var/lib/nginx/test.so. Service getUrlService periodically fetches data from URL http://httpbin.org in the background. The service is shared because it stores its result in the named shared memory zone httpbin with a size of 512k; the file locks reside in directory /tmp. The service result is accessible from location /httpbin. The implementation of getUrlService is pretty simple.
{-# LANGUAGE TemplateHaskell #-}

module NgxHaskellUserRuntime where

import           NgxExport
import           Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as C8
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as C8L
import           Control.Concurrent
import           Control.Monad
import           Network.HTTP.Client
import           Control.Exception
import           System.IO.Unsafe

-- ...

httpManager :: Manager
httpManager = unsafePerformIO $ newManager defaultManagerSettings
{-# NOINLINE httpManager #-}

getUrl :: ByteString -> IO C8L.ByteString
getUrl url = catchHttpException $ getResponse url $ flip httpLbs httpManager
    where getResponse u = fmap responseBody . (parseRequest (C8.unpack u) >>=)

catchHttpException :: IO C8L.ByteString -> IO C8L.ByteString
catchHttpException = (`catch` \e ->
        return $ C8L.pack $ "HTTP EXCEPTION: " ++ show (e :: HttpException))

getUrlService :: ByteString -> Bool -> IO L.ByteString
getUrlService url first_run = do
    unless first_run $ threadDelay $ 20 * 1000000
    getUrl url
ngxExportServiceIOYY 'getUrlService
It fetches new contents from httpbin.org every 20 seconds, and the service variable $hs_service_httpbin gets updated in the Nginx module. Every single request to location /httpbin is very fast, because it is not forwarded to httpbin.org: the data already downloaded by the service in the background is returned instead. Let’s test this.
curl 'http://localhost:8010/httpbin'
<!DOCTYPE html>
<html>
<head>
  <meta http-equiv='content-type' value='text/html;charset=utf8'>
  <meta name='generator' value='Ronn/v0.7.3 (http://github.com/rtomayko/ronn/tree/0.7.3)'>
  <title>httpbin(1): HTTP Client Testing Service</title>

  ...
Let’s get back to service hooks. What if I want to stop the service or change its argument (http://httpbin.org) at run time, from a dedicated Nginx API location, say /httpbin/url?
    haskell_service_hooks_zone hooks 32k;

    # ...

        location /httpbin/url {
            allow 127.0.0.1;
            deny all;
            haskell_service_hook getUrlServiceHook $hs_service_httpbin $arg_v;
        }
Here I want to change the URL for getUrlService to the value of variable $arg_v. At the top of the configuration, directive haskell_service_hooks_zone declares a shared memory zone for temporary storage of the service hook data such as the value of $arg_v; this is solely a detail of the implementation that follows. Currently getUrlService does not support changing the URL at run time, but this is not a problem: we can easily implement it. The real problem is that a request to location /httpbin/url is accepted by only one of the worker processes. For sure, this worker can change the state of its own service immediately, but how will the others learn about the change? They must restart their services immediately too, or, in the case of shared services, the active service, which may reside in any of the workers, must restart immediately. So, I want to accomplish this task using the Nginx event engine working atop epoll or kqueue, and here is my plan.
  1. Create file descriptors (one when eventfd is used, or two when a pipe is used) for an event channel in the master process: the file descriptors will be inherited by the forked processes, i.e. by all workers.
  2. Declare Nginx content handlers for every service hook. The handlers will merely write to the write-end of the event channel and return HTTP status 200 OK.
  3. Create Nginx events using ngx_add_event() and associate the inherited file descriptors with them in every worker process on its start. Every event will have a dedicated event handler that Nginx will run when the event occurs.
  4. When a content handler associated with a service hook writes into the event channel, the dedicated event handlers get called in all worker processes (this is what we rely upon when choosing an event engine; see the standalone sketch after this list).
  5. The event handler calls a service hook synchronously, and then calls a special function that interrupts the service using a Haskell asynchronous exception. In this context, a service hook is a small and fast Haskell handler which changes some global state and returns immediately. After the restart, the service reads the new state and changes its behavior accordingly.
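The essence of steps 1 through 4 can be demonstrated outside of Nginx. Below is a minimal standalone sketch of my own (not module code) that uses plain Linux eventfd and epoll directly, with error handling omitted: a descriptor created before fork() is shared by all children, and a single write by the parent wakes up every child polling it.
/* Standalone illustration: one eventfd inherited by all forked "workers",
 * a single write signals every one of them. Linux only, no error handling. */
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>
#include <sys/epoll.h>
#include <sys/wait.h>

int main(void)
{
    int       efd = eventfd(0, EFD_NONBLOCK);   /* the "event channel" */
    uint64_t  v = 1;
    int       i;

    for (i = 0; i < 4; i++) {
        if (fork() == 0) {                      /* a "worker" */
            struct epoll_event  ev, out;
            int                 ep = epoll_create1(0);

            ev.events = EPOLLIN | EPOLLET;      /* edge-triggered mode, as Nginx uses with epoll */
            ev.data.fd = efd;
            epoll_ctl(ep, EPOLL_CTL_ADD, efd, &ev);

            epoll_wait(ep, &out, 1, -1);        /* every "worker" wakes up here */
            printf("worker %d signaled\n", (int) getpid());
            _exit(0);
        }
    }

    sleep(1);                                   /* let the "workers" start polling */
    write(efd, &v, sizeof(v));                  /* one write signals them all */

    for (i = 0; i < 4; i++) {
        wait(NULL);
    }

    return 0;
}
Nginx does essentially the same thing, except that the workers register the inherited descriptor with the event engine via ngx_add_event() and the write happens in a content handler.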
Where could the file descriptors be stored? Perhaps in the module’s main configuration (which I will call mcf later)? Unfortunately not: the older mcf is not available when Nginx reloads its configuration on SIGHUP, but we want to close the older file descriptors before we create new ones (their contents may change when the Nginx configuration changes). I decided that a global static variable defined at the top of the module’s source code is a good choice. Below I will show parts of the nginx-haskell-module source code which may contain details irrelevant to this article.
typedef struct {
    /* ... */

    void                                     (*service_hook_interrupt)
                                                                (HsStablePtr);
    HsBool                                   (*rts_has_thread_support)(void);
    ngx_array_t                                service_code_vars;

    /* ... */

    ngx_array_t                                service_hooks;
    ngx_shm_zone_t                            *service_hooks_shm_zone;
    ngx_uint_t                                 code_loaded:1;
    ngx_uint_t                                 module_failed:1;

    /* ... */
} ngx_http_haskell_main_conf_t;

/* ... */

typedef struct {
    /* an ngx_connection_t stub to allow using c->fd as the event ident */
    void                                             *data;
    ngx_event_t                                      *read;
    ngx_event_t                                      *write;
    ngx_fd_t                                          fd;
} ngx_http_haskell_async_event_stub_t;

typedef struct {
    ngx_http_haskell_async_event_stub_t               s;
    ngx_cycle_t                                      *cycle;
    struct ngx_http_haskell_service_hook_s           *hook;
} ngx_http_haskell_service_hook_event_t;


struct ngx_http_haskell_service_hook_s {
    ngx_event_t                                       event;
    ngx_http_haskell_service_hook_event_t             hev;
    ngx_fd_t                                          event_channel[2];
    ngx_int_t                                         handler;
    ngx_int_t                                         service_hook_index;
    ngx_int_t                                         service_code_var_index;
    ngx_uint_t                                        update_hook;
    struct ngx_http_haskell_service_code_var_data_s  *service_code_var;
};

typedef struct ngx_http_haskell_service_hook_s ngx_http_haskell_service_hook_t;

/* ... */

static struct {
    size_t      size;
    ngx_fd_t  (*elts)[2];
} service_hook_fd;

/* ... */

ngx_module_t  ngx_http_haskell_module = {
    NGX_MODULE_V1,
    &ngx_http_haskell_module_ctx,            /* module context */
    ngx_http_haskell_module_commands,        /* module directives */
    NGX_HTTP_MODULE,                         /* module type */
    NULL,                                    /* init master */
    ngx_http_haskell_init_module,            /* init module */
    ngx_http_haskell_init_worker,            /* init process */
    NULL,                                    /* init thread */
    NULL,                                    /* exit thread */
    ngx_http_haskell_exit_worker,            /* exit process */
    ngx_http_haskell_exit_master,            /* exit master */
    NGX_MODULE_V1_PADDING
};
The ngx_http_haskell_main_conf_t is the type of our mcf. Besides other data, it contains an array of our service hooks of type ngx_http_haskell_service_hook_t, which holds the event description for the Nginx event engine. The service_hook_fd is the global static variable that holds the file descriptors for the events. The ngx_http_haskell_module is a standard Nginx module declaration that lists the module’s handlers. It names four handlers related to initialization and deinitialization of service_hook_fd: ngx_http_haskell_init_module, ngx_http_haskell_init_worker, ngx_http_haskell_exit_worker and ngx_http_haskell_exit_master. In the module initialization we close the older file descriptors and then create new ones. Here is the source code of this handler.
static ngx_int_t
ngx_http_haskell_init_module(ngx_cycle_t *cycle)
{
    ngx_uint_t                                 i;
    ngx_http_haskell_main_conf_t              *mcf;
    ngx_event_conf_t                          *ecf;
    void                                    ***cf;
    ngx_http_haskell_service_hook_t           *service_hooks;

    ngx_http_haskell_cleanup_service_hook_fd(cycle);

    mcf = ngx_http_cycle_get_module_main_conf(cycle, ngx_http_haskell_module);
    if (mcf == NULL || !mcf->code_loaded) {
        return NGX_OK;
    }

    if (ngx_http_haskell_has_async_tasks(mcf)) {
        cf = ngx_get_conf(cycle->conf_ctx, ngx_events_module);
        ecf = (*cf)[ngx_event_core_module.ctx_index];

        if (ngx_strcmp(ecf->name, "epoll") != 0
            && ngx_strcmp(ecf->name, "kqueue") != 0)
        {
            ngx_log_error(NGX_LOG_EMERG, cycle->log, 0,
                          "event engine \"%s\" is not compatible with "
                          "implementation of async tasks and services, "
                          "only \"epoll\" and \"kqueue\" are currently "
                          "supported", ecf->name);
            goto module_failed;
        }
    }

    if (mcf->service_hooks.nelts == 0) {
        return NGX_OK;
    }

    service_hook_fd.size = mcf->service_hooks.nelts;
    service_hook_fd.elts = ngx_alloc(service_hook_fd.size * sizeof(ngx_fd_t[2]),
                                     cycle->log);

    if (service_hook_fd.elts == NULL) {
        ngx_log_error(NGX_LOG_EMERG, cycle->log, 0,
                      "failed to allocate fd storage for service hooks");
        goto module_failed;
    }

    service_hooks = mcf->service_hooks.elts;
    for (i = 0; i < mcf->service_hooks.nelts; i++) {
        if (ngx_http_haskell_open_async_event_channel(
                                            service_hooks[i].event_channel))
        {
            ngx_log_error(NGX_LOG_EMERG, cycle->log, 0,
                          "failed to open event channel for service hook");
            goto module_failed;
        }
        service_hook_fd.elts[i][0] = service_hooks[i].event_channel[0];
        service_hook_fd.elts[i][1] = service_hooks[i].event_channel[1];
    }

    return NGX_OK;

module_failed:

    mcf->module_failed = 1;

    return NGX_OK;
}
I won’t dig into the details of this code. Instead, I’ll show a number of auxiliary functions that open, close, signal, and consume from the event channels.
static void
ngx_http_haskell_cleanup_service_hook_fd(ngx_cycle_t *cycle)
{
    ngx_uint_t                                 i;

    for (i = 0; i < service_hook_fd.size; i++) {
        if (service_hook_fd.elts[i][0] != NGX_ERROR) {
            ngx_http_haskell_close_async_event_channel(cycle->log,
                                                       service_hook_fd.elts[i]);
        }
    }
    ngx_free(service_hook_fd.elts);
    service_hook_fd.size = 0;
    service_hook_fd.elts = NULL;
}

ngx_int_t
ngx_http_haskell_open_async_event_channel(ngx_fd_t fd[2])
{
#if (NGX_HAVE_EVENTFD)
#if (NGX_HAVE_SYS_EVENTFD_H)
    fd[0] = fd[1] = eventfd(0, EFD_NONBLOCK);
#else
    fd[0] = fd[1] = syscall(323, O_NONBLOCK);
#endif
    return fd[0] == NGX_INVALID_FILE ? NGX_ERROR : NGX_OK;
#else
    if (pipe(fd) == -1) {
        return NGX_ERROR;
    }
    if (fcntl(fd[0], F_SETFL, O_NONBLOCK) == -1
        || fcntl(fd[1], F_SETFL, O_NONBLOCK) == -1)
    {
        ngx_http_haskell_close_async_event_channel(NULL, fd);
        return NGX_ERROR;
    }
    return NGX_OK;
#endif
}

void
ngx_http_haskell_close_async_event_channel(ngx_log_t *log, ngx_fd_t fd[2])
{
    ngx_int_t  i;

    for (i = 0; i < (fd[0] == fd[1] ? 1 : 2); i++) {
        if (close(fd[i]) == -1 && log != NULL) {
            ngx_log_error(NGX_LOG_CRIT, log, ngx_errno,
                          "failed to close file descriptor of "
                          "async event channel");
        }
    }
}

ssize_t
ngx_http_haskell_signal_async_event_channel(ngx_fd_t fd)
{
#if (NGX_HAVE_EVENTFD)
        uint64_t  v = 1;

        return write(fd, &v, sizeof(uint64_t));
#else
        uint8_t  v = 1;

        return write(fd, &v, sizeof(uint8_t));
#endif
}

ssize_t
ngx_http_haskell_consume_from_async_event_channel(ngx_fd_t fd)
{
#if (NGX_HAVE_EVENTFD)
        uint64_t  v;

        return read(fd, &v, sizeof(uint64_t));
#else
        uint8_t  v;

        return read(fd, &v, sizeof(uint8_t));
#endif
}
The ngx_http_haskell_exit_master merely calls ngx_http_haskell_cleanup_service_hook_fd(). The ngx_http_haskell_init_worker initializes all hooks via ngx_http_haskell_init_service_hook(). The latter is an interesting function because it creates an event for a hook.
static ngx_event_t  dummy_write_event;

ngx_int_t
ngx_http_haskell_init_service_hook(ngx_cycle_t *cycle,
                                   ngx_array_t *service_code_vars,
                                   ngx_http_variable_t *cmvars,
                                   ngx_http_haskell_service_hook_t *hook)
{
    ngx_uint_t                                 i;
    ngx_http_haskell_service_code_var_data_t  *service_code_vars_elts;
    ngx_event_t                               *event;
    ngx_http_haskell_service_hook_event_t     *hev;

    service_code_vars_elts = service_code_vars->elts;
    for (i = 0; i < service_code_vars->nelts; i++) {
        if (hook->service_code_var_index
            == service_code_vars_elts[i].data->index)
        {
            hook->service_code_var = &service_code_vars_elts[i];
            break;
        }
    }

    if (hook->service_code_var == NULL) {
        ngx_log_error(NGX_LOG_ERR, cycle->log, 0,
                      "service hook will not be enabled because "
                      "variable \"%V\" is not a service variable",
                      &cmvars[hook->service_code_var_index].name);
        hook->service_code_var_index = NGX_DECLINED;
        return NGX_OK;
    }

    if (hook->update_hook) {
        if (hook->service_code_var->shm_index == NGX_ERROR) {
            ngx_log_error(NGX_LOG_ERR, cycle->log, 0,
                          "service update hook will not be enabled because "
                          "variable \"%V\" is not in shm",
                          &cmvars[hook->service_code_var_index].name);
            hook->service_code_var_index = NGX_DECLINED;
            return NGX_OK;
        }
        hook->service_code_var->has_update_hooks = 1;
    }

    event = &hook->event;
    hev = &hook->hev;

    ngx_memzero(event, sizeof(ngx_event_t));
    event->data = hev;
    event->handler = ngx_http_haskell_service_hook_event;
    event->log = cycle->log;

    ngx_memzero(hev, sizeof(ngx_http_haskell_service_hook_event_t));
    hev->cycle = cycle;
    hev->hook = hook;

    hev->s.read = event;
    hev->s.write = &dummy_write_event;
    hev->s.fd = hook->event_channel[0];

    if (ngx_add_event(event, NGX_READ_EVENT, NGX_CLEAR_EVENT) != NGX_OK) {
        ngx_log_error(NGX_LOG_ERR, cycle->log, 0,
                      "failed to add event for service hook");
        ngx_http_haskell_close_async_event_channel(cycle->log,
                                                   hook->event_channel);
        return NGX_ERROR;
    }

    return NGX_OK;
}
Notice that we are using flag NGX_CLEAR_EVENT in ngx_add_event(): this makes epoll use edge-triggered mode (EPOLLET), in which a readiness change of a file descriptor gets reported only once. In kqueue, NGX_CLEAR_EVENT corresponds to flag EV_CLEAR. Static variable dummy_write_event is not really used in our scenario, but it must have all its fields zeroed (and static variables always meet this condition) because it takes part in the Nginx event engine’s machinery. And finally, the ngx_http_haskell_exit_worker closes all service hooks using the following function.
void
ngx_http_haskell_close_service_hook(ngx_cycle_t *cycle,
                                    ngx_http_haskell_service_hook_t *hook)
{
    if (hook->service_code_var_index == NGX_DECLINED) {
        return;
    }

    hook->service_code_var_index = NGX_AGAIN;

    if (ngx_del_event(&hook->event, NGX_READ_EVENT, 0) == NGX_ERROR) {
        ngx_log_error(NGX_LOG_ERR, cycle->log, 0,
                      "failed to delete service hook event data");
    }

    ngx_http_haskell_close_async_event_channel(cycle->log,
                                               hook->event_channel);
}
Ugh! So many handlers, and C is so wordy! And this is still not the end. There are also the content handler and the event handler. The content handler gets bound to a hook in the handler of the Nginx directive haskell_service_hook, while the event handler gets bound to the hook’s event in ngx_http_haskell_init_service_hook. The content handler, ngx_http_haskell_service_hook, reads from the read-end of the event channel and then writes to the write-end. Since a hook may have optional data, like $arg_v in the getUrlService example, it updates the hook’s data in the hook’s shared memory before writing into the channel.
ngx_int_t
ngx_http_haskell_service_hook(ngx_http_request_t *r)
{
    ngx_http_haskell_main_conf_t             *mcf;
    ngx_http_haskell_loc_conf_t              *lcf;
    ngx_http_complex_value_t                 *args;
    ngx_str_t                                 arg = ngx_null_string;
    ngx_http_haskell_service_hook_t          *service_hooks;

    lcf = ngx_http_get_module_loc_conf(r, ngx_http_haskell_module);
    mcf = ngx_http_get_module_main_conf(r, ngx_http_haskell_module);

    if (lcf->service_hook_index == NGX_ERROR
        || mcf->service_hooks.nelts < (ngx_uint_t) lcf->service_hook_index)
    {
        ngx_log_error(NGX_LOG_CRIT, r->connection->log, 0,
                      "unexpected service hook index %ui",
                      lcf->service_hook_index);
        return NGX_HTTP_INTERNAL_SERVER_ERROR;
    }

    service_hooks = mcf->service_hooks.elts;

    if (service_hooks[lcf->service_hook_index].service_code_var_index
        == NGX_DECLINED)
    {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0,
                      "service hook was disabled because of inappropriate "
                      "variable handler");
        return NGX_HTTP_INTERNAL_SERVER_ERROR;
    }

    if (service_hooks[lcf->service_hook_index].service_code_var_index
        == NGX_AGAIN)
    {
        return NGX_HTTP_SERVICE_UNAVAILABLE;
    }

    args = lcf->content_handler->args;

    if (args && ngx_http_complex_value(r, args, &arg) != NGX_OK) {
        return NGX_HTTP_INTERNAL_SERVER_ERROR;
    }

    if (arg.len > 0 && mcf->service_hooks_shm_zone == NULL) {
        ngx_log_error(NGX_LOG_CRIT, r->connection->log, 0,
                      "service hook provides data, but service hooks shm zone "
                      "was not initialized");
        return NGX_HTTP_INTERNAL_SERVER_ERROR;
    }

    if (mcf->service_hooks_shm_zone != NULL
        && ngx_http_haskell_update_service_hook_data(r, lcf->service_hook_index,
                                                     arg)
        != NGX_OK)
    {
        ngx_log_error(NGX_LOG_CRIT, r->connection->log, 0,
                      "service hook data failed to update");
        return NGX_HTTP_INTERNAL_SERVER_ERROR;
    }

    if (ngx_http_haskell_consume_from_async_event_channel(
                    service_hooks[lcf->service_hook_index].event_channel[0])
        == -1)
    {
        if (ngx_errno != NGX_EAGAIN) {
            ngx_log_error(NGX_LOG_CRIT, r->connection->log, ngx_errno,
                          "failed to read from service hook event channel");
        }
    }

    if (ngx_http_haskell_signal_async_event_channel(
                    service_hooks[lcf->service_hook_index].event_channel[1])
        == -1)
    {
        ngx_log_error(NGX_LOG_CRIT, r->connection->log, ngx_errno,
                      "failed to write to service hook event channel");
    }

    r->header_only = 1;
    r->headers_out.content_type_lowcase = NULL;
    r->headers_out.status = NGX_HTTP_OK;
    r->headers_out.content_length_n = 0;

    return ngx_http_send_header(r);
}

ngx_int_t
ngx_http_haskell_update_service_hook_data(ngx_http_request_t *r,
                                          ngx_int_t hook_index, ngx_str_t data)
{
    ngx_http_haskell_main_conf_t                *mcf;
    ngx_slab_pool_t                             *shpool;
    ngx_http_haskell_shm_service_hook_handle_t  *shm_vars;
    ngx_str_t                                   *hook_data;
    u_char                                      *hook_data_data;

    mcf = ngx_http_get_module_main_conf(r, ngx_http_haskell_module);

    shpool = (ngx_slab_pool_t *) mcf->service_hooks_shm_zone->shm.addr;
    shm_vars = shpool->data;

    ngx_shmtx_lock(&shpool->mutex);

    hook_data = &shm_vars[hook_index].data;

    if (hook_data->data != NULL) {
        ngx_slab_free_locked(shpool, hook_data->data);
    }
    ngx_str_null(hook_data);

    hook_data_data = ngx_slab_alloc_locked(shpool, data.len);
    if (hook_data_data == NULL) {
        ngx_shmtx_unlock(&shpool->mutex);
        ngx_log_error(NGX_LOG_CRIT, r->connection->log, 0,
                      "failed to allocate memory to store service hook data");
        return NGX_ERROR;
    }

    ngx_memcpy(hook_data_data, data.data, data.len);

    hook_data->len = data.len;
    hook_data->data = hook_data_data;

    ngx_shmtx_unlock(&shpool->mutex);

    return NGX_OK;
}
And finally, the event handler. It must read the hook’s data, call the service hook (i.e. the Haskell handler) synchronously, and then call the Haskell service interruption function. Its code is rather long and contains details irrelevant to this article, so I won’t show it here. Let’s get back to our service getUrlService. First, I have to make it read the URL from a global state, and also write a hook for changing that state.
import qualified Data.ByteString as B
import           Data.IORef
import           Data.Maybe

-- ...

getUrlServiceLink :: IORef (Maybe ByteString)
getUrlServiceLink = unsafePerformIO $ newIORef Nothing
{-# NOINLINE getUrlServiceLink #-}

getUrlServiceLinkUpdated :: IORef Bool
getUrlServiceLinkUpdated = unsafePerformIO $ newIORef True
{-# NOINLINE getUrlServiceLinkUpdated #-}

getUrlService :: ByteString -> Bool -> IO L.ByteString
getUrlService url = const $ do
    url' <- fromMaybe url <$> readIORef getUrlServiceLink
    updated <- readIORef getUrlServiceLinkUpdated
    atomicWriteIORef getUrlServiceLinkUpdated False
    unless updated $ threadDelay $ 20 * 1000000
    getUrl url'
ngxExportServiceIOYY 'getUrlService

getUrlServiceHook :: ByteString -> IO L.ByteString
getUrlServiceHook url = do
    writeIORef getUrlServiceLink $ if B.null url
                                       then Nothing
                                       else Just url
    atomicWriteIORef getUrlServiceLinkUpdated True
    return $ if B.null url
                 then C8L.pack "getUrlService reset URL"
                 else L.fromChunks [C8.pack "getUrlService set URL ", url]
ngxExportServiceHook 'getUrlServiceHook
The hook, which is called getUrlServiceHook here, updates two global references: getUrlServiceLink and getUrlServiceLinkUpdated. The new URL gets used in getUrlService after the service restarts due to the interruption from the event handler. Let’s test it with curl.
curl 'http://localhost:8010/httpbin'
<!DOCTYPE html>
<html>
<head>
  <meta http-equiv='content-type' value='text/html;charset=utf8'>
  <meta name='generator' value='Ronn/v0.7.3 (http://github.com/rtomayko/ronn/tree/0.7.3)'>
  <title>httpbin(1): HTTP Client Testing Service</title>

  ...
curl 'http://localhost:8010/httpbin/url?v=http://example.com'
curl 'http://localhost:8010/httpbin'
<!doctype html>
<html>
<head>
    <title>Example Domain</title>

    <meta charset="utf-8" />

  ...
After the second request, you’ll find the following lines in the Nginx error log.
2018/03/04 16:51:38 [alert] 696#0: service hook reported "getUrlService set URL http://example.com"
2018/03/04 16:51:38 [alert] 699#0: service hook reported "getUrlService set URL http://example.com"
2018/03/04 16:51:38 [alert] 698#0: service hook reported "getUrlService set URL http://example.com"
2018/03/04 16:51:38 [alert] 697#0: service hook reported "getUrlService set URL http://example.com"
2018/03/04 16:51:38 [alert] 698#0: an exception was caught while getting value of service variable "hs_service_httpbin": "Service was interrupted by a service hook", using old value
All 4 workers (with PIDs 696, 697, 698 and 699) reported the change of the URL, and the service itself was interrupted. The log also says that the old value (i.e. the contents of httpbin.org) is still being used, but do not be deceived by this: the contents will indeed change after the service restarts. Let’s reset the URL back to httpbin.org.
curl 'http://localhost:8010/httpbin/url'
curl 'http://localhost:8010/httpbin'
<!DOCTYPE html>
<html>
<head>
  <meta http-equiv='content-type' value='text/html;charset=utf8'>
  <meta name='generator' value='Ronn/v0.7.3 (http://github.com/rtomayko/ronn/tree/0.7.3)'>
  <title>httpbin(1): HTTP Client Testing Service</title>

  ...
Now turn back to example.com,
curl 'http://localhost:8010/httpbin/url?v=http://example.com'
curl 'http://localhost:8010/httpbin'
<!doctype html>
<html>
<head>
    <title>Example Domain</title>

    <meta charset="utf-8" />

  ...
and do something really interesting: kill all the workers. Let it be SIGKILL. The master process must restart the dead workers, correct?
kill -KILL 696 697 698 699
ps -ef | grep nginx
root       695     1  0 14:14 ?        00:00:00 nginx: master process ...
nginx    21203   695  6 17:08 ?        00:00:00 nginx: worker process
nginx    21204   695  6 17:08 ?        00:00:00 nginx: worker process
nginx    21205   695  6 17:08 ?        00:00:00 nginx: worker process
nginx    21209   695  5 17:08 ?        00:00:00 nginx: worker process
They were all restarted, that’s good. Let’s see what contents the service has now.
curl 'http://localhost:8010/httpbin'
<!doctype html>
<html>
<head>
    <title>Example Domain</title>

    <meta charset="utf-8" />

  ...
Wow! The latest service contents were recreated in the new workers! In the Nginx logs we’ll find the following.
2018/03/04 17:08:48 [alert] 695#0: worker process 697 exited on signal 9
2018/03/04 17:08:48 [alert] 695#0: worker process 696 exited on signal 9
2018/03/04 17:08:48 [alert] 695#0: worker process 699 exited on signal 9
2018/03/04 17:08:48 [alert] 695#0: worker process 698 exited on signal 9
2018/03/04 17:08:49 [alert] 21203#0: service hook reported "getUrlService set URL http://example.com"
2018/03/04 17:08:49 [alert] 21205#0: service hook reported "getUrlService set URL http://example.com"
2018/03/04 17:08:49 [alert] 21209#0: service hook reported "getUrlService set URL http://example.com"
2018/03/04 17:08:49 [alert] 21204#0: service hook reported "getUrlService set URL http://example.com"
2018/03/04 17:08:49 [alert] 21203#0: an exception was caught while getting value of service variable "hs_service_httpbin": "Service was interrupted by a service hook", using old value
The value of the hook’s argument (http://example.com) was held in the hook’s shared memory zone and survived despite the misfortune of the workers. But how could Nginx know that the new services must be updated? Remember that in the content handler we first read from the event channel’s read-end and only then write to the write-end? This means that the event persists until the next request to the service hook consumes its data. The OS kernel (i.e. Linux in the case of epoll) will keep signaling every new potential consumer of the event (every new Nginx worker) via the file descriptor inherited from the master process; see the sketch at the end of this post. I am not sure whether kqueue works like this, and it would be great if someone tested this on FreeBSD or MacOS! Here are the links to the Haskell source code and the Nginx configuration used in the example: test.hs and test.conf. There is also a tutorial in PDF format with details on the usage of nginx-haskell-module as well as listings of the files test.hs and test.conf.
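To illustrate the last point, here is another minimal standalone sketch of my own (again plain Linux eventfd and epoll, not module code, no error handling): a value written to an eventfd keeps the descriptor readable until somebody reads it, so even an epoll instance created after the write, like the one in a freshly restarted worker, immediately sees the pending event.
/* Standalone illustration: the eventfd counter persists until consumed,
 * so a late consumer still gets signaled. Linux only, no error handling. */
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>
#include <sys/epoll.h>

int main(void)
{
    int                 efd = eventfd(0, EFD_NONBLOCK);
    uint64_t            v = 1;
    struct epoll_event  ev, out;
    int                 ep, n;

    write(efd, &v, sizeof(v));            /* signal while nobody is listening yet */

    ep = epoll_create1(0);                /* a "late" worker starts polling now */
    ev.events = EPOLLIN | EPOLLET;
    ev.data.fd = efd;
    epoll_ctl(ep, EPOLL_CTL_ADD, efd, &ev);

    n = epoll_wait(ep, &out, 1, 0);
    printf("events seen by the late consumer: %d\n", n);   /* expect 1 */

    read(efd, &v, sizeof(v));             /* consume the event */

    n = epoll_wait(ep, &out, 1, 0);
    printf("events after consuming: %d\n", n);              /* expect 0 */

    return 0;
}
This is exactly the situation of the killed and restarted workers above: the unconsumed event survived in the kernel and was delivered to the new epoll instances created by the new workers.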