[CakeML-dev] Iterated bootstrap theorem?

Sun Feb 12 07:55:54 UTC 2017

On 11 February 2017 at 22:06, Yong Kiam <tanyongkiam at gmail.com> wrote:
> Thanks for clarifying, I was thinking of:
>
> 1) There is a compiler in the logic, e.g. compile :ast -> bytes (ignoring
> configurations)
>
> 2) Translate this compiler, to get some cakeml AST (in HOL) [..., val
> compile = ... ]
>
> 3) Add new FFI/IO decls to the AST (as we do now) to get the full compiler
> [..., val compile = ..., val main = ... ]
>
> 4) Define a new constant for the AST in 3),  |- cml_ast =  [..., val compile
> = ..., val main = ... ]
>
> 5) Translate cml_ast to get the deeply embedded version of the AST: ``val
> cml_ast  = [ ... ]``
>
> 6) Stick cml_ast onto the back of 2): [ ..., val compile = ..., val cml_ast
> = [ ...], val main = print (compile cml_ast)]
>
> 7) Run the in-logic compiler on 6 (doesn't need to be the same as 1)
>
> I think Q2 would be 4) and Q1 would be 2).
>
> How does it speed things up (maybe?): 7) would run a "fast" version of the
> compiler in the logic, but it produces a program that runs a "slow" version
> of the bootstrap outside the logic.
>
> RE: translation bottleneck, it won't get past it, because we still need to
> translate the whole compiler anyhow. The main purpose would be to speedup
> bootstrap compilation, but I am not sure if the "fast" in-logic compiler
> exists though.

This is a nice idea. It could cut down on the evaluation time quite a
lot, if we had a verified compiler that evaluates quickly in
logic. The translation time would not be improved.

Regarding quick to evaluate compilers: version 1 was much faster than
the current one to a large extent because it was simple to compile via
the bytecode: each bytecode instruction compiled to a fixed list of
bytes, which avoids all of the heavy instruction encoding and all of
the wordLang/stackLang phases.

So the question is: could we have a compiler that's fast to evaluate?

A different idea: how about writing an interpreter for CakeML source
syntax (we already have one, i.e. the functional semantics) and use
that to produce the bootstrapped compiler outside of the logic
quickly? I'm thinking of the following steps:

 1. Produce a CakeML program that is the following

       fun sexp_to_ast = (* the sexpression parser translated *)
       fun evaluate ast = (* translation of functional big-step semantics *);
       fun read_all () = (* reads everything from STDIN *);
       fun print r = (* prints result to STDOUT *);
       val _ = (print (evaluate (sexp_to_ast (read_all ()))));

 2. Compile the above program in the logic. This should be fast
    because nothing is large.

 3. In the logic evaluate (sexp_to_string o ast_to_sexp) on the full
    bootstrap-translated compiler.

 4. Outside of the logic: run the binary produced by step 2 with the
    string from 3 fed in via STDIN.

For all of this to hang together, we need the following theorem to be
the identity:

  (sexp_to_ast o string_to_sexp o sexp_to_string o ast_to_sexp)

I think this is a cool use of the functional big-step semantics.

Cheers,
Magnus

> On Sat, Feb 11, 2017 at 6:56 AM, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
> wrote:
>>
>> This looks like an interesting idea. I haven't understood it fully.
>> Let me try to clarify.
>> Here are 2 questions I have about your 2-fold example, which I'll copy
>> below:
>>
>> val prog2fold = process_topdecs`
>> val (compile,conf) = ^Q1;
>> val cml_ast = ^Q2;
>> val main = print(compile conf cml_ast);`
>>
>> My questions are about what Q1 and Q2 should be.
>>
>> I assume the above program, prog2fold, is going to be used as input
>> for proof-grounded evaluation of the compiler, i.e., we will produce a
>> theorem like this by EVAL: |- compile ^prog2fold = <... some bytes
>> ...>
>>
>> I guess Q2 is produced by translator (i.e., what is currently in
>> compiler/bootstrap/translation). I understand that part, if so.
>>
>> Is Q1 also produced by the translator? i.e., we roughly have Q1 = Q2?
>> Or what is it?
>>
>> I think this would be interesting if it provided a principled way to
>> substantially increase the speed for bootstrapping. It also seems like
>> it would interact interestingly with the eval primitive. It seems to
>> have no effect on the translation-time bottleneck for bootstrap speed,
>> though.
>>
>> On 11 February 2017 at 16:34, Yong Kiam <tanyongkiam at gmail.com> wrote:
>> > Hi dev,
>> >
>> > This is more of a random question than anything. I was wondering if it
>> > is
>> > possible to get a theorem about k-fold bootstrap of the compiler, i.e.
>> > using
>> > the compiler to iterate compilation on itself once in the logic and k-1
>> > times (outside the logic).
>> >
>> > Concretely, 1-fold bootstrap would be what we have right now, so the
>> > code
>> > being compiled in the logic looks like:
>> >
>> > ...
>> > val main str = print (compile conf str);
>> >
>> > A 2-fold bootstrap could look like:
>> >
>> > ...
>> > val cml_ast = ... //from an iterated translation of the compiler AST in
>> > HOL
>> > into a translated AST
>> > val main = print(compile conf (cml_ast));
>> >
>> > i.e. the output is a program, that when run, compiles the compiler AST.
>> >
>> > My sense is that such a k-fold bootstrap theorem could probably be done
>> > with
>> > what we have now (replace cml_ast with the appropriate n-1 th
>> > translation).
>> > The corresponding top-level correctness theorem would have some weird
>> > nested
>> > assumptions about each time the bootstrap is ran.
>> >
>> > One possible use of 2-fold bootstrap: one could evaluate in HOL using a
>> > fast
>> > (but less optimizing) version of the compiler, then turn on all the
>> > optimizations inside cml_ast. It probably doesn't matter for our
>> > bootstrap
>> > speed though.
>> >
>> >
>> > _______________________________________________
>> > Developers mailing list
>> > Developers at cakeml.org
>> > https://lists.cakeml.org/listinfo/developers
>> >
>
>
>
> _______________________________________________
> Developers mailing list
> Developers at cakeml.org
> https://lists.cakeml.org/listinfo/developers
>