Monday, August 14, 2017

Passing ByteString contents reliably into C code

The question was raised a week ago on Reddit. Recently I implemented the elaborated solution in nginx-haskell-module. In this article I will explain why I wanted to pass ByteString contents directly, what obstacles stood in the way, and what turned out to be the best way to implement this.

The answer to the question why is simple. Many Haskell handlers from ngx-export, including all asynchronous and content handlers, pass data from lazy ByteStrings to the C side, where it is further used in Nginx variable or content handlers. Before version 1.3 of nginx-haskell-module, the data was simply copied into a single buffer (for variable handlers) or into multiple buffers (for content handlers). Both historical functions are shown below.
toSingleBuffer :: L.ByteString -> IO (Maybe CStringLen)
toSingleBuffer EmptyLBS =
    return $ Just (nullPtr, 0)
toSingleBuffer s = do
    let I l = L.length s
    t <- safeMallocBytes l
    if t /= nullPtr
        then do
            void $ L.foldlChunks
                (\a s -> do
                    off <- a
                    let l = B.length s
                    B.unsafeUseAsCString s $ flip (copyBytes $ plusPtr t off) l
                    return $ off + l
                ) (return 0) s
            return $ Just (t, l)
        else return Nothing

toBuffers :: L.ByteString -> IO (Maybe (Ptr NgxStrType, Int))
toBuffers EmptyLBS =
    return $ Just (nullPtr, 0)
toBuffers s = do
    t <- safeMallocBytes $
        L.foldlChunks (const . succ) 0 s * sizeOf (undefined :: NgxStrType)
    l <- L.foldlChunks
        (\a s -> do
            off <- a
            maybe (return Nothing)
                (\off -> do
                    let l = B.length s
                    -- l cannot be zero at this point because intermediate
                    -- chunks of a lazy ByteString cannot be empty which is
                    -- the consequence of Monoid laws applied when it grows
                    dst <- safeMallocBytes l
                    if dst /= nullPtr
                        then do
                            B.unsafeUseAsCString s $ flip (copyBytes dst) l
                            pokeElemOff t off $ NgxStrType (fromIntegral l) dst
                            return $ Just $ off + 1
                        else do
                            mapM_
                                (peekElemOff t >=> \(NgxStrType _ x) -> free x)
                                [0 .. off - 1]  -- [0 .. -1] makes [], so wise!
                            free t
                            return Nothing
                ) off
        ) (return $ if t /= nullPtr then Just 0 else Nothing) s
    return $ l >>= Just . (,) t
The helper safeMallocBytes and the pattern synonyms used above are defined as follows.
safeMallocBytes :: Int -> IO (Ptr a)
safeMallocBytes =
    flip catchIOError (const $ return nullPtr) . mallocBytes

pattern I i                 <- (fromIntegral -> i)
pattern PtrLen s l          <- (s, I l)
pattern PtrLenFromMaybe s l <- (fromMaybe (nullPtr, -1) -> PtrLen s l)
pattern EmptyLBS            <- (L.null -> True)
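The pattern synonyms above are view patterns in disguise, and they can be tried out in isolation. The following is only a minimal sketch: the helpers toCResult and isEmptyLBS are hypothetical names invented for this illustration and are not part of ngx-export.

```haskell
{-# LANGUAGE PatternSynonyms, ViewPatterns #-}

module Main where

import qualified Data.ByteString.Lazy as L
import           Data.Maybe (fromMaybe)
import           Foreign.C.Types (CInt)
import           Foreign.Ptr (Ptr, nullPtr)

-- The pattern synonyms from the article, repeated so that the sketch
-- compiles on its own.
pattern I i                 <- (fromIntegral -> i)
pattern PtrLen s l          <- (s, I l)
pattern PtrLenFromMaybe s l <- (fromMaybe (nullPtr, -1) -> PtrLen s l)
pattern EmptyLBS            <- (L.null -> True)

-- Hypothetical helper: the final step of a handler, which turns the result
-- of a copying function into the (pointer, length) pair handed over to C.
-- Nothing (an allocation failure) becomes (nullPtr, -1).
toCResult :: Maybe (Ptr (), Int) -> (Ptr (), CInt)
toCResult (PtrLenFromMaybe p l) = (p, l)

-- Hypothetical helper: EmptyLBS matches exactly when L.null holds.
isEmptyLBS :: L.ByteString -> Bool
isEmptyLBS EmptyLBS = True
isEmptyLBS _        = False

main :: IO ()
main = do
    print $ snd (toCResult Nothing)              -- allocation failure: -1
    print $ snd (toCResult (Just (nullPtr, 0)))  -- empty input: 0
    print $ isEmptyLBS L.empty                   -- True
```

Note that the pattern I converts the length with fromIntegral while matching, which is why an Int on the Haskell side can be bound as a CInt here.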
The required imports and the definition of NgxStrType can be found in the original module. The functions toSingleBuffer and toBuffers first allocate storage: either a single buffer that holds all the buffers of the original ByteString (toSingleBuffer), or a storage of storages, one for every single buffer of the ByteString (toBuffers). If an allocation fails, the functions return Nothing. The caller (which is still a Haskell function that represents a specific Haskell handler on the C side) interprets this as an allocation error and, via the pattern PtrLenFromMaybe, returns a tuple (nullPtr, -1) of the form (bufs, n_bufs). In both functions, the data from the original ByteString is copied with copyBytes, and the user of the functions must free the copies at some point.

Now imagine the following simple version of toBuffers that passes the ByteString buffers to the C side directly.
toBuffers :: L.ByteString -> IO (Ptr NgxStrType, Int)
toBuffers EmptyLBS =
    return (nullPtr, 0)
toBuffers s = do
    t <- safeMallocBytes $
        L.foldlChunks (const . succ) 0 s * sizeOf (undefined :: NgxStrType)
    if t == nullPtr
        then return (nullPtr, -1)
        else (,) t <$>
                L.foldlChunks
                    (\a s -> do
                        off <- a
                        B.unsafeUseAsCStringLen s $
                            \(s, l) -> pokeElemOff t off $
                                NgxStrType (fromIntegral l) s
                        return $ off + 1
                    ) (return 0) s
Now we return the tuple directly and put in its first element (the storage of storages, which now becomes a storage of references) the references to the original ByteString buffers. Later, the data gets passed to the C side. Very simple: no extra allocations, no copying, no obligation to free data on the C side. Is this really possible? How does the lifetime of the internal ByteString data correspond to the desired lifetime of the data on the C side? The answer is simple: there is no correspondence between them! After a Haskell handler returns to the C side, the passed (or better to say poked) contents of the ByteString can easily be freed by the Haskell garbage collector, because nothing refers to the ByteString on the Haskell side anymore. This is bad news! Nginx uses epoll() (or a similar mechanism for feeding tasks when epoll() is not available), which means that we may need the references to stay valid for an unpredictable period of time. How could we ensure the validity of the references? StablePtr to the rescue!
A stable pointer is a reference to a Haskell expression that is guaranteed not
to be affected by garbage collection, i.e., it will neither be deallocated nor
will the value of the stable pointer itself change during garbage collection
(ordinary references may be relocated during garbage collection). Consequently,
stable pointers can be passed to foreign code, which can treat it as an opaque
reference to a Haskell value.
So, a stable pointer guarantees that the object it points to stays alive until the pointer is freed (read the docs further for details). We could pass a StablePtr to the ByteString to the C code along with the references to the ByteString buffers, and free the pointer by calling a special Haskell handler exported to C like
foreign export ccall ngxExportReleaseLockedByteString ::
    StablePtr L.ByteString -> IO ()

ngxExportReleaseLockedByteString :: StablePtr L.ByteString -> IO ()
ngxExportReleaseLockedByteString = freeStablePtr
when the references are no longer needed (e.g. at the end of the HTTP request). But the documentation on StablePtr says nothing about possible relocations of the object it points to. Normally, the garbage collector moves live objects to a new heap when the old heap gets removed (see a brilliant article here). Oh, no! Is StablePtr not a cure after all? Relax… We do not actually need the ByteString itself. Remember? We need only its buffers! Digging into the ByteString implementation reveals that the buffers are allocated in special pinned byte arrays using the function newPinnedByteArray#. The docs say about it (more precisely, about newPinnedByteArray, but it merely calls the former):
Create a pinned byte array of the specified size. The garbage collector is
guaranteed not to move it.
Thus, the Haskell garbage collector will leave the internal ByteString buffers in their original places. On the other hand, the StablePtr guarantees that the ByteString (and hence its non-relocatable buffers) stays alive until the pointer is freed. This is all that we really need, and this seems to be the best solution.

Update. The function ngxExportReleaseLockedByteString is not really necessary, because freeStablePtr is actually an imported C function, hs_free_stable_ptr(), declared in HsFFI.h, which can be called from C code directly.
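To make the combined scheme concrete, here is a minimal sketch of it in plain Haskell. It is not the code from nginx-haskell-module: the function lockChunks is a hypothetical name, and a Haskell list of CStringLen pairs stands in for the array of NgxStrType records so that the example stays self-contained. It also assumes that the chunks were allocated by the bytestring library itself (as B.pack does), which is the pinned-array case discussed above.

```haskell
module Main where

import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Unsafe as B
import           Foreign.C.String (CStringLen)
import           Foreign.StablePtr (StablePtr, newStablePtr, freeStablePtr)
import           System.Mem (performGC)

-- Hypothetical helper: collect raw references to the chunk buffers of a lazy
-- ByteString together with a StablePtr that keeps the ByteString alive.
-- The references stay valid until the StablePtr is freed, because the
-- buffers are pinned and are therefore never moved by the collector.
lockChunks :: L.ByteString -> IO ([CStringLen], StablePtr L.ByteString)
lockChunks s = do
    locked <- newStablePtr s
    -- Deliberately leak the pointers out of unsafeUseAsCStringLen: keeping
    -- them valid is now the job of the stable pointer.
    refs <- mapM (\c -> B.unsafeUseAsCStringLen c return) (L.toChunks s)
    return (refs, locked)

main :: IO ()
main = do
    -- Only the stable pointer refers to this ByteString afterwards.
    (refs, locked) <- lockChunks $
        L.fromChunks [B.pack [104, 105], B.pack [33]]
    performGC                             -- force a garbage collection
    copies <- mapM B.packCStringLen refs  -- read the buffers back
    print $ B.concat copies == B.pack [104, 105, 33]  -- True
    freeStablePtr locked                  -- now the buffers may be collected
```

In the real module the last step would happen on the C side, for instance by calling hs_free_stable_ptr() at the end of the HTTP request.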