[CakeML-dev] Let's remove max_app

Sun Mar 5 08:21:43 UTC 2017

Hi all,

We should get rid of max_app. Reasons:

 1. The benchmarks show that one hits a significant performance drop
    if the compiled code includes curried functions with more
    arguments than max_app. I believe the bootstrapped compiler would
    run a significant constant factor faster if it was compiled with a
    high (or non-existent) max_app value, since many functions take
    more than 5 arguments. (I was e.g. looking at assign_def in
    data_to_word which is run on every Op in the dataLang
    programs. The assign function takes 7 arguments).

 2. The current compilation strategy produces too much stub code in
    clos_to_bvl.  If we really must have max_app (I argue below that
    we don't), then it should be something like 10 or 15. With the
    current compilation strategy, the binary produced for max_app=9 is
    176 kb larger than the binary for max_app=3. I wanted to try
    compiling with max_app=10, but that was taking more than 5
    minutes. Compiling with max_app=3 takes 7 seconds for me. I
    believe the slow down in compilation speed with differing max_app
    values is caused by the rapid increase in code size (number of
    stubs in clos_to_bvl) as max_app gets larger. My measurements
    are below while compiling factorial (and the basis library):

      max_app=1:  10.810s  (* very little room for opts in basis *)
      max_app=2:  8.724s
      max_app=3:  7.872s  (* allows optimisations in basis code *)
      max_app=4:  8.935s
      max_app=5:  11.608s
      max_app=6:  21.771s
      max_app=7:  42.932s
      max_app=8:  1m24.872s
      max_app=9:  3m9.745s

    One of the reasons the bootstrapped compilation times are slow is
    that closLang (and subsequent) optimisations could not fire due to
    clos_mti not doing enough (due to the low max_app) when
    compiling the compiler in HOL. As we know, if clos_mti doesn't
    do enough, a lot of other optimisations don't do much.

So the question is: why do we have max_app currently? I believe it is
only there because of the way clos_to_bvl wants to jump to specialised
code for dealing with applying n arguments to an m-argument
closure. Since n and m need to be <= max_app, we have max_app *
max_app stubs (each being a non-trivial piece of code).

My suggestion is to:

 1. Remove max_app completely from the semantics, compiler and
    proofs. This should make various things neater and simpler.

 2. Have only one general-purpose stub (gen_app_stub) in clos_to_bvl
    for dealing with implementing the semantics of applying an
    arbitrary closure value to any number of arguments. Currently the
    compilation of applications of unknown closures (case: loc_opt =
    NONE) is as follows:

  (compile_exps max_app [App loc_opt x1 xs2] aux =
     let (c1,aux1) = compile_exps max_app [x1] aux in
     let (c2,aux2) = compile_exps max_app xs2 aux1 in
       ([dtcase loc_opt of
         | NONE =>
             Let (c2++c1) (mk_cl_call (Var (LENGTH c2)) (GENLIST Var
(LENGTH c2)))
         | SOME loc =>
             (Call (LENGTH c2 - 1) (SOME (loc + (num_stubs max_app)))
(c2 ++ c1))],
        aux2)) /\

  mk_cl_call cl args =
    If (Op Equal [mk_const (LENGTH args - 1); mk_el cl (mk_const 1)])
       (Call (LENGTH args - 1) NONE (args ++ [cl] ++ [mk_el cl (mk_const 0)]))
       (Call 0 (SOME (LENGTH args - 1)) (args ++ [cl]))

    I suggest we change the last line above to something that tuples
    up the given arguments and senda them to the gen_app_stub,
    i.e. something like:

       (Call 0 (SOME gen_app_stub_location)
          [cl; Op (Cons 0) args; Op (Const (& (LENGTH args))) []])

    The code above passes in LENGTH args to save gen_app_stub from
    having to compute it from the tuple. It should probably also be
    given the previously computed value of `mk_el cl (mk_const 1)` so
    that it can avoid looking up the expected number of arguments from
    the closure value. (This is a very minor optimisation.)

    The code for gen_app_stub can of course include a few tests for
    common cases at the top, e.g. it should probably first check
    whether this is a two-argument closure applied to one argument (my
    guess is that this is the most common case of a partial
    application) and then run special-purpose code only for that
    common case, but all other cases would be dealt with using generic
    code.

Thoughts? Am I missing some reason for why we have max_app? I suggest
we remove it.

Cheers,
Magnus