[CakeML-dev] New string/bytearray operations

Mon Mar 20 07:20:58 UTC 2017

On 20 March 2017 at 08:11, Magnus Myreen <magnus.myreen at gmail.com> wrote:
> On 20 March 2017 at 07:59, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk> wrote:
>> I've just committed semantics for the primitives for strings/bytearrays,
>> which you can see here:
>> https://github.com/CakeML/cakeml/blob/strcat/semantics/semanticPrimitives.lem#L509
>>
>> I'd really like to get these right before pushing them through, since it's
>> quite a lot of work.
>> I have a few concerns currently:
>>
>> 1. It's not obvious that the bounds checking can be done efficiently. The
>> BoundsCheckByte primitive doesn't work neatly with checking whether an
>> offset+length pair is correct, so I'd have to do some special casing, e.g.,
>> when the length = 0. Should we have a different semantics which allows
>> negative or too long lengths and does something reasonable with them? Or
>> could there be another efficient operation like BoundsCheckByte for checking
>> an offset+length?
>
> If negative lengths are allowed, then the bounds check for such copies could be:
>
>   BoundsCheckByte offset andalso BoundsCheckByte (offset+length)
>
> otherwise:
>
>   BoundsCheckByte (offset + length) andalso 0 <= offset * length

I wrote the above too quickly. It should of course have been:

BoundsCheckByte (offset + length - 1)

and the entire operation should be wrapped in check whether length is
0, i.e. the check above would prefix the body of the non-zero case.
The zero case should not do much.

>> 2. It takes work to add new primitives (update type system, inferencer, all
>> backend do_app proofs..), and these 4 are reasonably complicated despite
>> being quite similar to each other. Could we get away with a smaller or
>> simpler set? E.g., maybe we only really need CopyAw8Aw8 but give it a
>> flexible semantics taking either bytearrays or strings? I don't know if that
>> can work though.
>>
>> If we add similar primitives for polymorphic arrays and vectors, it would be
>> good to factor out as much commonality as possible in the semantics... Sadly
>> it looks like a lot of copy-paste-tweaking is inevitable... I'd rather only
>> add the polymorphic ones after the byte ones are done, just to avoid too
>> much explosion of work.
>
> The difference between byte and value vectors/arrays is the values
> they carry, but the indexing and chopping the input is the same for
> all of them. Could you not move that logic into a helper function that
> is then used throughout many intermediate languages and also for both
> byte and value vectors/arrays?
>
> Cheers,
> Magnus
>
>> On 20 March 2017 at 17:32, Magnus Myreen <magnus.myreen at gmail.com> wrote:
>>>
>>> Sounds good to me. One thing to note: the semantics should read all
>>> the values before writing them. This is so that we give the obviously
>>> correct semantics to copying inside the same array. The implementation
>>> will need to check whether it should do left-to-right or right-to-left
>>> copying so that the code doesn't accidentally read from what it just
>>> wrote.
>>>
>>> Also, as I pointed out in my other email, I think these memcpy
>>> primitives should also exist for normal value arrays.
>>>
>>> Cheers,
>>> Magnus
>>>
>>> On 20 March 2017 at 06:29, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk> wrote:
>>> > I've been rethinking these primitives after the discussion at the last
>>> > hangout, and have come up with a different set altogether. Can you see a
>>> > simpler or more elegant approach to the one described below?
>>> >
>>> > Here is the new approach I am considering:
>>> >
>>> > 4 copying primitives in the source language going from string/bytearray
>>> > to
>>> > string/bytearray.
>>> > The source comes with an offset and a length to copy.
>>> > If the destination is a string, a new string is created. If the
>>> > destination
>>> > is a bytearray, it must be provided.
>>> >
>>> > Concatenation (of lists of strings/arrays) and conversions between
>>> > (whole)
>>> > strings and (whole) bytearrays can be implemented in the basis library
>>> > in
>>> > terms of these primitives. And the primitives should be efficiently
>>> > implementable in terms of a byte-based memcpy primitive further down.
>>> > (There
>>> > will need to be bounds checking in the source-level semantics (i.e.,
>>> > Subscript exception can be raised), and this will sometimes be
>>> > unfortunate
>>> > (i.e., when obviously in bounds), but I don't think this is too costly.)
>>> >
>>> > On 14 March 2017 at 16:52, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
>>> > wrote:
>>> >>
>>> >> Hi all,
>>> >>
>>> >> I've started adding string/bytearray conversion and concatenation
>>> >> primitives (issues 244 and 245). Before getting too deep into updating
>>> >> the compiler etc., may I request a review of the semantics? Here they
>>> >> are:
>>> >>
>>> >>
>>> >>
>>> >> https://github.com/CakeML/cakeml/commit/67dd15bbd03f516be618ba72f1d56a2764209263
>>> >>
>>> >> I noticed that v_to_char_list might be better as vs_to_char_list, to
>>> >> be run after v_to_list (rather than duplicating its
>>> >> list-deconstruction functionality). But I leave such refactoring for
>>> >> another time.
>>> >>
>>> >> Cheers,
>>> >> Ramana
>>> >
>>> >
>>> >
>>> > _______________________________________________
>>> > Developers mailing list
>>> > Developers at cakeml.org
>>> > https://lists.cakeml.org/listinfo/developers
>>> >
>>
>>