[CakeML-dev] Iterated bootstrap theorem?

Sun Feb 12 09:47:00 UTC 2017

I just want to add a quick clarification.

>>        fun sexp_to_ast = (* the sexpression parser translated *)
>>        fun evaluate ast = (* translation of functional big-step semantics *);
>>        fun read_all () = (* reads everything from STDIN *);
>>        fun print r = (* prints result to STDOUT *);
>>        val _ = (print (evaluate (sexp_to_ast (read_all ()))));

>>  3. In the logic evaluate (sexp_to_string o ast_to_sexp) on the full
>>    bootstrap-translated compiler.

For the entire scheme to make sense, "the full bootstrap-translated
compiler" refers to a program that applies the compiler to a deep
embedding of itself, i.e.

  ... (* decls that define compiler_backend *)
  val prog = ... (* constructors that produce value of AST for
parse_infer_compile *)
  val _ = ref (compiler_backend prog)

Here I'm assuming that the `evaluate` function from above will return
the final value assigned to the store.

In summary, the general idea is to just move everything we currently
do with EVAL in the logic, i.e. roughly EVAL ``compiler_backend
entire_compiler``, into an external program. The external program is
must faster to produce because it is small and we have a theorem about
the external program in the logic. In Yong Kiam's terminology, this is
a two-fold bootstrap and the correctness theorem would say that by
running the external program, one gets another external program that
has the behaviour of the verified compiler.

Cheers,
Magnus

On 12 February 2017 at 09:41, Magnus Myreen <magnus.myreen at gmail.com> wrote:
>>   (sexp_to_ast o string_to_sexp o sexp_to_string o ast_to_sexp)
>
> By the way, it might be nicest to use a non-standard s-expression form
> to make parsing of concrete syntax trivial. I prefer Thomas Sewell's
> style where the syntax has no parenthesis and instead every AST
> element has a unique prefix, e.g. what would be in standard syntax:
>
>   ((1 . 2) . (4 . 5))
>
> could be:
>
>   dot dot int 1 int 2 dot int 4 int 5
>
> One can of course choose prefixes that make the representation much shorter.
>
> Cheers,
> Magnus
>
> On 12 February 2017 at 08:55, Magnus Myreen <magnus.myreen at gmail.com> wrote:
>> On 11 February 2017 at 22:06, Yong Kiam <tanyongkiam at gmail.com> wrote:
>>> Thanks for clarifying, I was thinking of:
>>>
>>> 1) There is a compiler in the logic, e.g. compile :ast -> bytes (ignoring
>>> configurations)
>>>
>>> 2) Translate this compiler, to get some cakeml AST (in HOL) [..., val
>>> compile = ... ]
>>>
>>> 3) Add new FFI/IO decls to the AST (as we do now) to get the full compiler
>>> [..., val compile = ..., val main = ... ]
>>>
>>> 4) Define a new constant for the AST in 3),  |- cml_ast =  [..., val compile
>>> = ..., val main = ... ]
>>>
>>> 5) Translate cml_ast to get the deeply embedded version of the AST: ``val
>>> cml_ast  = [ ... ]``
>>>
>>> 6) Stick cml_ast onto the back of 2): [ ..., val compile = ..., val cml_ast
>>> = [ ...], val main = print (compile cml_ast)]
>>>
>>> 7) Run the in-logic compiler on 6 (doesn't need to be the same as 1)
>>>
>>> I think Q2 would be 4) and Q1 would be 2).
>>>
>>> How does it speed things up (maybe?): 7) would run a "fast" version of the
>>> compiler in the logic, but it produces a program that runs a "slow" version
>>> of the bootstrap outside the logic.
>>>
>>> RE: translation bottleneck, it won't get past it, because we still need to
>>> translate the whole compiler anyhow. The main purpose would be to speedup
>>> bootstrap compilation, but I am not sure if the "fast" in-logic compiler
>>> exists though.
>>
>> This is a nice idea. It could cut down on the evaluation time quite a
>> lot, if we had a verified compiler that evaluates quickly in
>> logic. The translation time would not be improved.
>>
>> Regarding quick to evaluate compilers: version 1 was much faster than
>> the current one to a large extent because it was simple to compile via
>> the bytecode: each bytecode instruction compiled to a fixed list of
>> bytes, which avoids all of the heavy instruction encoding and all of
>> the wordLang/stackLang phases.
>>
>> So the question is: could we have a compiler that's fast to evaluate?
>>
>> A different idea: how about writing an interpreter for CakeML source
>> syntax (we already have one, i.e. the functional semantics) and use
>> that to produce the bootstrapped compiler outside of the logic
>> quickly? I'm thinking of the following steps:
>>
>>  1. Produce a CakeML program that is the following
>>
>>        fun sexp_to_ast = (* the sexpression parser translated *)
>>        fun evaluate ast = (* translation of functional big-step semantics *);
>>        fun read_all () = (* reads everything from STDIN *);
>>        fun print r = (* prints result to STDOUT *);
>>        val _ = (print (evaluate (sexp_to_ast (read_all ()))));
>>
>>  2. Compile the above program in the logic. This should be fast
>>     because nothing is large.
>>
>>  3. In the logic evaluate (sexp_to_string o ast_to_sexp) on the full
>>     bootstrap-translated compiler.
>>
>>  4. Outside of the logic: run the binary produced by step 2 with the
>>     string from 3 fed in via STDIN.
>>
>> For all of this to hang together, we need the following theorem to be
>> the identity:
>>
>>   (sexp_to_ast o string_to_sexp o sexp_to_string o ast_to_sexp)
>>
>> I think this is a cool use of the functional big-step semantics.
>>
>> Cheers,
>> Magnus
>>
>>
>>> On Sat, Feb 11, 2017 at 6:56 AM, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
>>> wrote:
>>>>
>>>> This looks like an interesting idea. I haven't understood it fully.
>>>> Let me try to clarify.
>>>> Here are 2 questions I have about your 2-fold example, which I'll copy
>>>> below:
>>>>
>>>> val prog2fold = process_topdecs`
>>>> val (compile,conf) = ^Q1;
>>>> val cml_ast = ^Q2;
>>>> val main = print(compile conf cml_ast);`
>>>>
>>>> My questions are about what Q1 and Q2 should be.
>>>>
>>>> I assume the above program, prog2fold, is going to be used as input
>>>> for proof-grounded evaluation of the compiler, i.e., we will produce a
>>>> theorem like this by EVAL: |- compile ^prog2fold = <... some bytes
>>>> ...>
>>>>
>>>> I guess Q2 is produced by translator (i.e., what is currently in
>>>> compiler/bootstrap/translation). I understand that part, if so.
>>>>
>>>> Is Q1 also produced by the translator? i.e., we roughly have Q1 = Q2?
>>>> Or what is it?
>>>>
>>>> I think this would be interesting if it provided a principled way to
>>>> substantially increase the speed for bootstrapping. It also seems like
>>>> it would interact interestingly with the eval primitive. It seems to
>>>> have no effect on the translation-time bottleneck for bootstrap speed,
>>>> though.
>>>>
>>>> On 11 February 2017 at 16:34, Yong Kiam <tanyongkiam at gmail.com> wrote:
>>>> > Hi dev,
>>>> >
>>>> > This is more of a random question than anything. I was wondering if it
>>>> > is
>>>> > possible to get a theorem about k-fold bootstrap of the compiler, i.e.
>>>> > using
>>>> > the compiler to iterate compilation on itself once in the logic and k-1
>>>> > times (outside the logic).
>>>> >
>>>> > Concretely, 1-fold bootstrap would be what we have right now, so the
>>>> > code
>>>> > being compiled in the logic looks like:
>>>> >
>>>> > ...
>>>> > val main str = print (compile conf str);
>>>> >
>>>> > A 2-fold bootstrap could look like:
>>>> >
>>>> > ...
>>>> > val cml_ast = ... //from an iterated translation of the compiler AST in
>>>> > HOL
>>>> > into a translated AST
>>>> > val main = print(compile conf (cml_ast));
>>>> >
>>>> > i.e. the output is a program, that when run, compiles the compiler AST.
>>>> >
>>>> > My sense is that such a k-fold bootstrap theorem could probably be done
>>>> > with
>>>> > what we have now (replace cml_ast with the appropriate n-1 th
>>>> > translation).
>>>> > The corresponding top-level correctness theorem would have some weird
>>>> > nested
>>>> > assumptions about each time the bootstrap is ran.
>>>> >
>>>> > One possible use of 2-fold bootstrap: one could evaluate in HOL using a
>>>> > fast
>>>> > (but less optimizing) version of the compiler, then turn on all the
>>>> > optimizations inside cml_ast. It probably doesn't matter for our
>>>> > bootstrap
>>>> > speed though.
>>>> >
>>>> >
>>>> > _______________________________________________
>>>> > Developers mailing list
>>>> > Developers at cakeml.org
>>>> > https://lists.cakeml.org/listinfo/developers
>>>> >
>>>
>>>
>>>
>>> _______________________________________________
>>> Developers mailing list
>>> Developers at cakeml.org
>>> https://lists.cakeml.org/listinfo/developers
>>>