[CakeML-dev] Iterated bootstrap theorem?

Magnus Myreen magnus.myreen at gmail.com
Sun Feb 12 08:41:29 UTC 2017


>   (sexp_to_ast o string_to_sexp o sexp_to_string o ast_to_sexp)

By the way, it might be nicest to use a non-standard s-expression form
to make parsing of concrete syntax trivial. I prefer Thomas Sewell's
style where the syntax has no parenthesis and instead every AST
element has a unique prefix, e.g. what would be in standard syntax:

  ((1 . 2) . (4 . 5))

could be:

  dot dot int 1 int 2 dot int 4 int 5

One can of course choose prefixes that make the representation much shorter.

Cheers,
Magnus

On 12 February 2017 at 08:55, Magnus Myreen <magnus.myreen at gmail.com> wrote:
> On 11 February 2017 at 22:06, Yong Kiam <tanyongkiam at gmail.com> wrote:
>> Thanks for clarifying, I was thinking of:
>>
>> 1) There is a compiler in the logic, e.g. compile :ast -> bytes (ignoring
>> configurations)
>>
>> 2) Translate this compiler, to get some cakeml AST (in HOL) [..., val
>> compile = ... ]
>>
>> 3) Add new FFI/IO decls to the AST (as we do now) to get the full compiler
>> [..., val compile = ..., val main = ... ]
>>
>> 4) Define a new constant for the AST in 3),  |- cml_ast =  [..., val compile
>> = ..., val main = ... ]
>>
>> 5) Translate cml_ast to get the deeply embedded version of the AST: ``val
>> cml_ast  = [ ... ]``
>>
>> 6) Stick cml_ast onto the back of 2): [ ..., val compile = ..., val cml_ast
>> = [ ...], val main = print (compile cml_ast)]
>>
>> 7) Run the in-logic compiler on 6 (doesn't need to be the same as 1)
>>
>> I think Q2 would be 4) and Q1 would be 2).
>>
>> How does it speed things up (maybe?): 7) would run a "fast" version of the
>> compiler in the logic, but it produces a program that runs a "slow" version
>> of the bootstrap outside the logic.
>>
>> RE: translation bottleneck, it won't get past it, because we still need to
>> translate the whole compiler anyhow. The main purpose would be to speedup
>> bootstrap compilation, but I am not sure if the "fast" in-logic compiler
>> exists though.
>
> This is a nice idea. It could cut down on the evaluation time quite a
> lot, if we had a verified compiler that evaluates quickly in
> logic. The translation time would not be improved.
>
> Regarding quick to evaluate compilers: version 1 was much faster than
> the current one to a large extent because it was simple to compile via
> the bytecode: each bytecode instruction compiled to a fixed list of
> bytes, which avoids all of the heavy instruction encoding and all of
> the wordLang/stackLang phases.
>
> So the question is: could we have a compiler that's fast to evaluate?
>
> A different idea: how about writing an interpreter for CakeML source
> syntax (we already have one, i.e. the functional semantics) and use
> that to produce the bootstrapped compiler outside of the logic
> quickly? I'm thinking of the following steps:
>
>  1. Produce a CakeML program that is the following
>
>        fun sexp_to_ast = (* the sexpression parser translated *)
>        fun evaluate ast = (* translation of functional big-step semantics *);
>        fun read_all () = (* reads everything from STDIN *);
>        fun print r = (* prints result to STDOUT *);
>        val _ = (print (evaluate (sexp_to_ast (read_all ()))));
>
>  2. Compile the above program in the logic. This should be fast
>     because nothing is large.
>
>  3. In the logic evaluate (sexp_to_string o ast_to_sexp) on the full
>     bootstrap-translated compiler.
>
>  4. Outside of the logic: run the binary produced by step 2 with the
>     string from 3 fed in via STDIN.
>
> For all of this to hang together, we need the following theorem to be
> the identity:
>
>   (sexp_to_ast o string_to_sexp o sexp_to_string o ast_to_sexp)
>
> I think this is a cool use of the functional big-step semantics.
>
> Cheers,
> Magnus
>
>
>> On Sat, Feb 11, 2017 at 6:56 AM, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
>> wrote:
>>>
>>> This looks like an interesting idea. I haven't understood it fully.
>>> Let me try to clarify.
>>> Here are 2 questions I have about your 2-fold example, which I'll copy
>>> below:
>>>
>>> val prog2fold = process_topdecs`
>>> val (compile,conf) = ^Q1;
>>> val cml_ast = ^Q2;
>>> val main = print(compile conf cml_ast);`
>>>
>>> My questions are about what Q1 and Q2 should be.
>>>
>>> I assume the above program, prog2fold, is going to be used as input
>>> for proof-grounded evaluation of the compiler, i.e., we will produce a
>>> theorem like this by EVAL: |- compile ^prog2fold = <... some bytes
>>> ...>
>>>
>>> I guess Q2 is produced by translator (i.e., what is currently in
>>> compiler/bootstrap/translation). I understand that part, if so.
>>>
>>> Is Q1 also produced by the translator? i.e., we roughly have Q1 = Q2?
>>> Or what is it?
>>>
>>> I think this would be interesting if it provided a principled way to
>>> substantially increase the speed for bootstrapping. It also seems like
>>> it would interact interestingly with the eval primitive. It seems to
>>> have no effect on the translation-time bottleneck for bootstrap speed,
>>> though.
>>>
>>> On 11 February 2017 at 16:34, Yong Kiam <tanyongkiam at gmail.com> wrote:
>>> > Hi dev,
>>> >
>>> > This is more of a random question than anything. I was wondering if it
>>> > is
>>> > possible to get a theorem about k-fold bootstrap of the compiler, i.e.
>>> > using
>>> > the compiler to iterate compilation on itself once in the logic and k-1
>>> > times (outside the logic).
>>> >
>>> > Concretely, 1-fold bootstrap would be what we have right now, so the
>>> > code
>>> > being compiled in the logic looks like:
>>> >
>>> > ...
>>> > val main str = print (compile conf str);
>>> >
>>> > A 2-fold bootstrap could look like:
>>> >
>>> > ...
>>> > val cml_ast = ... //from an iterated translation of the compiler AST in
>>> > HOL
>>> > into a translated AST
>>> > val main = print(compile conf (cml_ast));
>>> >
>>> > i.e. the output is a program, that when run, compiles the compiler AST.
>>> >
>>> > My sense is that such a k-fold bootstrap theorem could probably be done
>>> > with
>>> > what we have now (replace cml_ast with the appropriate n-1 th
>>> > translation).
>>> > The corresponding top-level correctness theorem would have some weird
>>> > nested
>>> > assumptions about each time the bootstrap is ran.
>>> >
>>> > One possible use of 2-fold bootstrap: one could evaluate in HOL using a
>>> > fast
>>> > (but less optimizing) version of the compiler, then turn on all the
>>> > optimizations inside cml_ast. It probably doesn't matter for our
>>> > bootstrap
>>> > speed though.
>>> >
>>> >
>>> > _______________________________________________
>>> > Developers mailing list
>>> > Developers at cakeml.org
>>> > https://lists.cakeml.org/listinfo/developers
>>> >
>>
>>
>>
>> _______________________________________________
>> Developers mailing list
>> Developers at cakeml.org
>> https://lists.cakeml.org/listinfo/developers
>>



More information about the Developers mailing list