[CakeML-dev] Let's remove max_app

Sun Mar 5 09:15:32 UTC 2017

Just to add a bit to my previous message:

An intermediate solution:
 - leave the semantics as it is
 - have max_app be something huge, e.g. 1000000
 - alter clos_to_bvl as suggested in my previous email

Cheers,
Magnus

On 5 March 2017 at 08:21, Magnus Myreen <magnus.myreen at gmail.com> wrote:
> Hi all,
>
> We should get rid of max_app. Reasons:
>
>  1. The benchmarks show that one hits a significant performance drop
>     if the compiled code includes curried functions with more
>     arguments than max_app. I believe the bootstrapped compiler would
>     run a significant constant factor faster if it was compiled with a
>     high (or non-existent) max_app value, since many functions take
>     more than 5 arguments. (I was e.g. looking at assign_def in
>     data_to_word which is run on every Op in the dataLang
>     programs. The assign function takes 7 arguments).
>
>  2. The current compilation strategy produces too much stub code in
>     clos_to_bvl.  If we really must have max_app (I argue below that
>     we don't), then it should be something like 10 or 15. With the
>     current compilation strategy, the binary produced for max_app=9 is
>     176 kb larger than the binary for max_app=3. I wanted to try
>     compiling with max_app=10, but that was taking more than 5
>     minutes. Compiling with max_app=3 takes 7 seconds for me. I
>     believe the slow down in compilation speed with differing max_app
>     values is caused by the rapid increase in code size (number of
>     stubs in clos_to_bvl) as max_app gets larger. My measurements
>     are below while compiling factorial (and the basis library):
>
>       max_app=1:  10.810s  (* very little room for opts in basis *)
>       max_app=2:  8.724s
>       max_app=3:  7.872s  (* allows optimisations in basis code *)
>       max_app=4:  8.935s
>       max_app=5:  11.608s
>       max_app=6:  21.771s
>       max_app=7:  42.932s
>       max_app=8:  1m24.872s
>       max_app=9:  3m9.745s
>
>     One of the reasons the bootstrapped compilation times are slow is
>     that closLang (and subsequent) optimisations could not fire due to
>     clos_mti not doing enough (due to the low max_app) when
>     compiling the compiler in HOL. As we know, if clos_mti doesn't
>     do enough, a lot of other optimisations don't do much.
>
> So the question is: why do we have max_app currently? I believe it is
> only there because of the way clos_to_bvl wants to jump to specialised
> code for dealing with applying n arguments to an m-argument
> closure. Since n and m need to be <= max_app, we have max_app *
> max_app stubs (each being a non-trivial piece of code).
>
> My suggestion is to:
>
>  1. Remove max_app completely from the semantics, compiler and
>     proofs. This should make various things neater and simpler.
>
>  2. Have only one general-purpose stub (gen_app_stub) in clos_to_bvl
>     for dealing with implementing the semantics of applying an
>     arbitrary closure value to any number of arguments. Currently the
>     compilation of applications of unknown closures (case: loc_opt =
>     NONE) is as follows:
>
>   (compile_exps max_app [App loc_opt x1 xs2] aux =
>      let (c1,aux1) = compile_exps max_app [x1] aux in
>      let (c2,aux2) = compile_exps max_app xs2 aux1 in
>        ([dtcase loc_opt of
>          | NONE =>
>              Let (c2++c1) (mk_cl_call (Var (LENGTH c2)) (GENLIST Var
> (LENGTH c2)))
>          | SOME loc =>
>              (Call (LENGTH c2 - 1) (SOME (loc + (num_stubs max_app)))
> (c2 ++ c1))],
>         aux2)) /\
>
>   mk_cl_call cl args =
>     If (Op Equal [mk_const (LENGTH args - 1); mk_el cl (mk_const 1)])
>        (Call (LENGTH args - 1) NONE (args ++ [cl] ++ [mk_el cl (mk_const 0)]))
>        (Call 0 (SOME (LENGTH args - 1)) (args ++ [cl]))
>
>     I suggest we change the last line above to something that tuples
>     up the given arguments and senda them to the gen_app_stub,
>     i.e. something like:
>
>        (Call 0 (SOME gen_app_stub_location)
>           [cl; Op (Cons 0) args; Op (Const (& (LENGTH args))) []])
>
>     The code above passes in LENGTH args to save gen_app_stub from
>     having to compute it from the tuple. It should probably also be
>     given the previously computed value of `mk_el cl (mk_const 1)` so
>     that it can avoid looking up the expected number of arguments from
>     the closure value. (This is a very minor optimisation.)
>
>     The code for gen_app_stub can of course include a few tests for
>     common cases at the top, e.g. it should probably first check
>     whether this is a two-argument closure applied to one argument (my
>     guess is that this is the most common case of a partial
>     application) and then run special-purpose code only for that
>     common case, but all other cases would be dealt with using generic
>     code.
>
> Thoughts? Am I missing some reason for why we have max_app? I suggest
> we remove it.
>
> Cheers,
> Magnus