вторник, 13 октября 2020 г.

ScopedTypeVariables to fix type safety flaws

Recently I struggled to implement a simple converter from a type representing an integer number of seconds to NominalDiffTime from module Data.Time.Clock. For simplicity, let the original type be a simple Integer value counting a number of seconds. It felt like the solution was straightforward. Indeed, the module provides a dead simple conversion function secondsToNominalDiffTime of type Pico -> NominalDiffTime. Type Pico is a fixed-precision value from module Data.Fixed with twelve digits after the dot. Below is the original definition of conversion function toNominalDiffTime in GHCi REPL.

GHCi, version 8.10.2: https://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /home/lyokha/.ghci
Prelude> import Data.Fixed as F
Prelude F> import Data.Time.Clock as TC
Prelude F TC> toNominalDiffTime = secondsToNominalDiffTime . MkFixed . (resolution (undefined :: Pico) *)
Prelude F TC> :t toNominalDiffTime
toNominalDiffTime :: Integer -> NominalDiffTime
Prelude F TC> toNominalDiffTime 23
23s
Prelude F TC> toNominalDiffTime 2383747764736464646
2383747764736464646s

Notice that the original value must be multiplied by the resolution of Pico (i.e. 1000000000000), otherwise the converted value will be one picosecond.

Looks good, does it? Nope… This explicit type annotation of undefined is frustrating. What if I make a mistake and put another type instead?

Prelude F TC> toNominalDiffTime = secondsToNominalDiffTime . MkFixed . (resolution (undefined :: Nano) *)
Prelude F TC> toNominalDiffTime 23
0.023s

In many contexts, this is a catastrophe! How do I get rid of the explicit type declaration? Why did I put it here after all? The root of the problem is that resolution gets fed with an Integer value, while it requires a Fixed value. This is why we created an undefined object of type Pico which is a type synonym of Fixed E12. After getting the resolution from this object and multiplying this by the original value, the calibrated value gets piped to constructor MkFixed which expects an integer value calibrated with resolution of Pico because function secondsToNominalDiffTime requires this concrete type. At this moment, we cannot check correctness of the calibration because the temporary undefined object is already unreachable. In other words, the Fixed type context gets broken between resolution and MkFixed, which I would regard as a type safety flaw.

How can we fix this? Let’s first make a Fixed value and then get its resolution.

Prelude F TC> import Control.Arrow as A
Prelude F TC A> asIntegerPart = uncurry (*) . (MkFixed . (^2) . resolution &&& id) . MkFixed
Prelude F TC A> :t asIntegerPart 
asIntegerPart :: HasResolution a => Integer -> Fixed a
Prelude F TC A> toNominalDiffTime = secondsToNominalDiffTime . asIntegerPart 
Prelude F TC > :t toNominalDiffTime
toNominalDiffTime :: Integer -> NominalDiffTime
Prelude F TC A> toNominalDiffTime 23
23s

This works fine and is type safe. Except the algorithm has changed and become less clear. Here we created a fixed-precision value from the original Integer value. Say, if the original value was 23 then MkFixed must produce 0.000000000023 from this. Then we made a fixed-precision value from the quadratic resolution of Pico and multiplied this by value 0.000000000023. Is it clear why quadratic? The answer is simple: the single resolution would give us 1.000000000000 as the multiplier, while the quadratic gives 1000000000000.000000000000. Also notice that there are no explicit type annotations here because (*) expects two arguments of the same type and returns a value of this very same type.

Prelude F TC A> :t (*)
(*) :: Num a => a -> a -> a

As soon as secondsToNominalDiffTime expects Pico, all fixed-precision values in asIntegerPart must be Pico.

Let’s turn back to our simple algorithm and try to make it type safe. A solution could involve using wrapping type classes, GADTs, etc. However, surprisingly, there is a very simple and straightforward solution!

So, what do we want after all? Something like in the following snippet.

Prelude F TC A> :{
Prelude F TC A| asIntegerPart :: HasResolution a => Integer -> Fixed a
Prelude F TC A| asIntegerPart = MkFixed . (resolution (undefined :: Fixed a) *)
Prelude F TC A| :}

Sadly, this does not compile.

<interactive>:36:28: error:
Could not deduce (HasResolution a0)
        arising from a use ofresolution
      from the context: HasResolution a
        bound by the type signature for:
                   asIntegerPart :: forall a. HasResolution a => Integer -> Fixed a
        at <interactive>:35:1-54
      The type variablea0is ambiguous
      These potential instances exist:
        7 instances involving out-of-scope types
        (use -fprint-potential-instances to see them all)
In the first argument of(*), namely
resolution (undefined :: Fixed a)
      In the second argument of(.), namely
(resolution (undefined :: Fixed a) *)
      In the expression: MkFixed . (resolution (undefined :: Fixed a) *)

Obviously, type variables a in the signature and in the declaration of the function denote different entities, but we wanted them to be of the same type. This is where ScopedTypeVariables extension must help!

Prelude F TC A> :set -XScopedTypeVariables 
Prelude F TC A> :{
Prelude F TC A| asIntegerPart :: HasResolution a => Integer -> Fixed a
Prelude F TC A| asIntegerPart = MkFixed . (resolution (undefined :: Fixed a) *)
Prelude F TC A| :}

<interactive>:40:28: error:
Could not deduce (HasResolution a0)
        arising from a use ofresolution
      from the context: HasResolution a
        bound by the type signature for:
                   asIntegerPart :: forall a. HasResolution a => Integer -> Fixed a
        at <interactive>:41:1-54
      The type variablea0is ambiguous
      These potential instances exist:
        7 instances involving out-of-scope types
        (use -fprint-potential-instances to see them all)
In the first argument of(*), namely
resolution (undefined :: Fixed a)
      In the second argument of(.), namely
(resolution (undefined :: Fixed a) *)
      In the expression: MkFixed . (resolution (undefined :: Fixed a) *)

Hmm, no change. But we know that extension ScopedTypeVariables expects explicit forall quantifiers in the signature!

Prelude F TC A> :{
Prelude F TC A| asIntegerPart :: forall a. HasResolution a => Integer -> Fixed a
Prelude F TC A| asIntegerPart = MkFixed . (resolution (undefined :: Fixed a) *)
Prelude F TC A| :}
Prelude F TC A> toNominalDiffTime = secondsToNominalDiffTime . asIntegerPart
Prelude F TC A> :t toNominalDiffTime
toNominalDiffTime :: Integer -> NominalDiffTime
Prelude F TC A> toNominalDiffTime 23
23s
Prelude F TC A> toNominalDiffTime 73643545463262772
73643545463262772s

Finally, with ScopedTypeVariables, the algorithm gets working and type safe, while still being straightforward as in the first try.

воскресенье, 12 апреля 2020 г.

Recursive calculations in Nginx with recursive subrequests from NgxExport.Tools.Subrequest

Disclaimer. Do not use recursive calculations demonstrated below in production code as they use Nginx and system resources very extensively. Recursion is always attractive but dangerous when implemented naively! Here I want to show something weird and beautiful. Module NgxExport.Tools.Subrequest provides natural HTTP subrequests running inside normal client requests. Being directed towards the Nginx server itself (e.g. with 127.0.0.1), they can make recursive flows. It means… Yes, we can calculate things like factorials and Fibonacci numbers simply by doing nested subrequests to the server! Let’s calculate them. Firstly, we have to install the latest version of package ngx-export-tools-extra with Cabal.
cabal v1-install ngx-export-tools-extra
(You may prefer the new-style v2-install instead.) Then write a bit of Haskell code.
{-# OPTIONS_GHC -Wno-unused-imports #-}

{-# LANGUAGE TemplateHaskell, TypeApplications #-}

module NgxSubrequestRecursion where

import           NgxExport
import           NgxExport.Tools
import           NgxExport.Tools.Subrequest

import           Data.ByteString (ByteString)
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as C8L

showAsLazyByteString :: Show a => a -> L.ByteString
showAsLazyByteString = C8L.pack . show

type IntegerTuple = (Integer, Integer)

multiply :: ByteString -> L.ByteString
multiply = showAsLazyByteString .
    maybe 0 (uncurry (*)) . readFromByteString @IntegerTuple

ngxExportYY 'multiply

add :: ByteString -> L.ByteString
add = showAsLazyByteString .
    maybe 0 (uncurry (+)) . readFromByteString @IntegerTuple

ngxExportYY 'add

prev :: ByteString -> L.ByteString
prev = showAsLazyByteString .
    maybe 0 pred . readFromByteString @Integer

ngxExportYY 'prev
Besides the asynchronous handler makeSubrequest imported from module NgxExport.Tools.Subrequest, module NgxSubrequestRecursion exports a number of simple arithmetic handlers such as multiply, add, and prev: they will be used as auxiliary functions in the recursive algorithms. Let’s compile the module.
ghc -O2 -dynamic -shared -fPIC -lHSrts_thr-ghc$(ghc --numeric-version) ngx_subrequest_recursion.hs -o ngx_subrequest_recursion.so
Gather all dependent libraries (see Utility hslibdeps).
hslibdeps -t /var/lib/nginx/x86_64-linux-ghc-8.8.3 ngx_subrequest_recursion.so
And finally, install all the artifacts into directory /var/lib/nginx being a superuser.
cp .hslibs/libHS* /var/lib/nginx/x86_64-linux-ghc-8.8.3/
cp ngx_subrequest_recursion.so /var/lib/nginx/
Now let’s turn to calculation of factorials and Fibonacci numbers based on recursive subrequests spawned right out of the Nginx configuration. Remember how recursive equations for factorials and Fibonacci numbers are defined? Factorial. F0=1 F_0 = 1 Fn=Fn1n F_{n} = F_{n-1} \cdot n Fibonacci numbers. F0=0 F_0 = 0 F1=1 F_1 = 1 Fn=Fn1+Fn2 F_{n} = F_{n-1} + F_{n-2} We are going to encode them inside the configuration!
user                    nobody;
worker_processes        2;

events {
    worker_connections  1024;
}

http {
    default_type        application/octet-stream;
    sendfile            on;

    haskell load /var/lib/nginx/ngx_subrequest_recursion.so;

    haskell_var_empty_on_error $hs_subrequest, $hs_subrequest2;

    server {
        listen       8010;
        server_name  main;
        error_log    /tmp/nginx-test-haskell-error.log;
        access_log   /tmp/nginx-test-haskell-access.log;

        location = /Factorial {
            rewrite ^ /Calculate/Factorial last;
        }

        location = /Fibonacci {
            rewrite ^ /Calculate/Fibonacci last;
        }

        location ~ ^/Calculate/(.*) {
            internal;

            set $algorithm $1;

            if ($arg_v !~ '^\d+$') {
                return 400;
            }

            if ($arg_v = 1) {
                echo 1;
                break;
            }

            set $check0 $arg_v$algorithm;

            if ($check0 = 0Factorial) {
                echo 1;
                break;
            }

            if ($check0 = 0Fibonacci) {
                echo 0;
                break;
            }

            rewrite ^ /Internal/$algorithm last;
        }

        location = /Internal/Factorial {
            internal;

            haskell_run prev $hs_prev_v $arg_v;

            haskell_run_async makeSubrequest $hs_subrequest
                    '{"uri": "http://127.0.0.1:8010/Factorial?v=$hs_prev_v"}';

            haskell_run multiply $hs_result '($arg_v, $hs_subrequest)';

            if ($hs_result = 0) {
                echo_status 500;
                echo "Error while getting factorial $arg_v";
                break;
            }

            echo $hs_result;
        }

        location = /Internal/Fibonacci {
            internal;

            haskell_run prev $hs_prev_v $arg_v;
            haskell_run prev $hs_prev2_v $hs_prev_v;

            haskell_run_async makeSubrequest $hs_subrequest
                    '{"uri": "http://127.0.0.1:8010/Fibonacci?v=$hs_prev_v"}';
            haskell_run_async makeSubrequest $hs_subrequest2
                    '{"uri": "http://127.0.0.1:8010/Fibonacci?v=$hs_prev2_v"}';

            haskell_run add $hs_result '($hs_subrequest, $hs_subrequest2)';

            if ($hs_result = 0) {
                echo_status 500;
                echo "Error while getting Fibonacci number $arg_v";
                break;
            }

            echo $hs_result;
        }
    }
}
There are two entry points: locations /Factorial and /Fibonacci. They rewrite to an internal location /Calculate which filters bad requests (when the value of variable arg_v is not a natural number) and defines base cases for the recursion rules. If the number passed in arg_v is big enough, then the request gets rewritten to locations /Internal/Factorial or /Internal/Fibonacci where calls to makeSubrequest perform recursion over the same locations until the base cases get reached. Let’s calculate basic factorials and Fibonacci numbers.
curl 'http://localhost:8010/Factorial?v=0'
1
curl 'http://localhost:8010/Factorial?v=1'
1
curl 'http://localhost:8010/Fibonacci?v=0'
0
curl 'http://localhost:8010/Fibonacci?v=1'
1
Factorials of big numbers.
time curl 'http://localhost:8010/Factorial?v=10'
3628800

real    0m0,015s
user    0m0,005s
sys     0m0,006s
If you look into the access log /tmp/nginx-test-haskell-access.log, you will see 10 lines which correspond to 10 subrequests.
127.0.0.1 - - [11/Apr/2020:23:42:29 +0300] "GET /Factorial?v=1 HTTP/1.1" 200 12 "-" "-"
127.0.0.1 - - [11/Apr/2020:23:42:29 +0300] "GET /Factorial?v=2 HTTP/1.1" 200 12 "-" "-"
127.0.0.1 - - [11/Apr/2020:23:42:29 +0300] "GET /Factorial?v=3 HTTP/1.1" 200 12 "-" "-"
127.0.0.1 - - [11/Apr/2020:23:42:29 +0300] "GET /Factorial?v=4 HTTP/1.1" 200 13 "-" "-"
127.0.0.1 - - [11/Apr/2020:23:42:29 +0300] "GET /Factorial?v=5 HTTP/1.1" 200 14 "-" "-"
127.0.0.1 - - [11/Apr/2020:23:42:29 +0300] "GET /Factorial?v=6 HTTP/1.1" 200 14 "-" "-"
127.0.0.1 - - [11/Apr/2020:23:42:29 +0300] "GET /Factorial?v=7 HTTP/1.1" 200 15 "-" "-"
127.0.0.1 - - [11/Apr/2020:23:42:29 +0300] "GET /Factorial?v=8 HTTP/1.1" 200 16 "-" "-"
127.0.0.1 - - [11/Apr/2020:23:42:29 +0300] "GET /Factorial?v=9 HTTP/1.1" 200 17 "-" "-"
127.0.0.1 - - [11/Apr/2020:23:42:29 +0300] "GET /Factorial?v=10 HTTP/1.1" 200 18 "-" "curl/7.66.0"
time curl 'http://localhost:8010/Factorial?v=100'
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000

real    0m0,124s
user    0m0,011s
sys     0m0,010s
Subrequests are spawned sequentially (time to calculate factorial 100 is 10 times bigger than time to calculate factorial 10), but they are still asynchronous. So, when run in a bunch, they’ll probably run out of limit of open file descriptors.
for i in {1..8} ; do curl 'http://localhost:8010/Factorial?v=100' & done
Error while getting factorial 100
Error while getting factorial 100
Error while getting factorial 100
Error while getting factorial 100
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
If you look into the error log /tmp/nginx-test-haskell-error.log, you will find a zillion number of HttpExceptionRequest errors with messages Error while getting factorial …, but more interesting are messages with failed to create async event channel for future async result, skipping IO task (24: Too many open files). There should be four such messages: they are the real cause of the four errors. We can evaluate the number of open file descriptors needed for running 8 simultaneous calculations of factorial 100. Every request creates 99 nested subrequests (1+99=1001 + 99 = 100), each of them creates 2 TCP sockets for the Haskell HTTP client and the Nginx HTTP server, add to this 1 eventfd channel for Haskell/Nginx communication. Altogether 3100=3003 \cdot 100 = 300 open file descriptors for a single request at the peak, or 8300=24008 \cdot 300 = 2400 open file descriptors for 8 simultaneous calculations at the peak which is more than 2 times bigger than the value of worker_connections we set (1024). The same story with Fibonacci numbers: each sub-calculation is sequential and nevertheless asynchronous. This means that the bigger the number, the longer time to complete the whole calculation, on the other hand, independent calculations work in parallel and may exhaust open file descriptors as fast as factorial calculations.
time curl 'http://localhost:8010/Fibonacci?v=10'
55

real    0m0,113s
user    0m0,016s
sys     0m0,005s
time curl 'http://localhost:8010/Fibonacci?v=20'
6765

real    0m7,539s
user    0m0,007s
sys     0m0,004s
The parallel computation shall bring a real TCP storm.
for i in {1..8} ; do curl 'http://localhost:8010/Fibonacci?v=20' & done
6765
6765
6765
6765
6765
6765
6765
6765
This added to the access log 175128 new records! Notice that this time there were no errors as far as depth 20 is not enough to exhaust open file descriptors.