[CakeML-dev] New string/bytearray operations

Mon Mar 20 07:11:05 UTC 2017

On 20 March 2017 at 07:59, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk> wrote:
> I've just committed semantics for the primitives for strings/bytearrays,
> which you can see here:
> https://github.com/CakeML/cakeml/blob/strcat/semantics/semanticPrimitives.lem#L509
>
> I'd really like to get these right before pushing them through, since it's
> quite a lot of work.
> I have a few concerns currently:
>
> 1. It's not obvious that the bounds checking can be done efficiently. The
> BoundsCheckByte primitive doesn't work neatly with checking whether an
> offset+length pair is correct, so I'd have to do some special casing, e.g.,
> when the length = 0. Should we have a different semantics which allows
> negative or too long lengths and does something reasonable with them? Or
> could there be another efficient operation like BoundsCheckByte for checking
> an offset+length?

If negative lengths are allowed, then the bounds check for such copies could be:

  BoundsCheckByte offset andalso BoundsCheckByte (offset+length)

otherwise:

  BoundsCheckByte (offset + length) andalso 0 <= offset * length

> 2. It takes work to add new primitives (update type system, inferencer, all
> backend do_app proofs..), and these 4 are reasonably complicated despite
> being quite similar to each other. Could we get away with a smaller or
> simpler set? E.g., maybe we only really need CopyAw8Aw8 but give it a
> flexible semantics taking either bytearrays or strings? I don't know if that
> can work though.
>
> If we add similar primitives for polymorphic arrays and vectors, it would be
> good to factor out as much commonality as possible in the semantics... Sadly
> it looks like a lot of copy-paste-tweaking is inevitable... I'd rather only
> add the polymorphic ones after the byte ones are done, just to avoid too
> much explosion of work.

The difference between byte and value vectors/arrays is the values
they carry, but the indexing and chopping the input is the same for
all of them. Could you not move that logic into a helper function that
is then used throughout many intermediate languages and also for both
byte and value vectors/arrays?

Cheers,
Magnus

> On 20 March 2017 at 17:32, Magnus Myreen <magnus.myreen at gmail.com> wrote:
>>
>> Sounds good to me. One thing to note: the semantics should read all
>> the values before writing them. This is so that we give the obviously
>> correct semantics to copying inside the same array. The implementation
>> will need to check whether it should do left-to-right or right-to-left
>> copying so that the code doesn't accidentally read from what it just
>> wrote.
>>
>> Also, as I pointed out in my other email, I think these memcpy
>> primitives should also exist for normal value arrays.
>>
>> Cheers,
>> Magnus
>>
>> On 20 March 2017 at 06:29, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk> wrote:
>> > I've been rethinking these primitives after the discussion at the last
>> > hangout, and have come up with a different set altogether. Can you see a
>> > simpler or more elegant approach to the one described below?
>> >
>> > Here is the new approach I am considering:
>> >
>> > 4 copying primitives in the source language going from string/bytearray
>> > to
>> > string/bytearray.
>> > The source comes with an offset and a length to copy.
>> > If the destination is a string, a new string is created. If the
>> > destination
>> > is a bytearray, it must be provided.
>> >
>> > Concatenation (of lists of strings/arrays) and conversions between
>> > (whole)
>> > strings and (whole) bytearrays can be implemented in the basis library
>> > in
>> > terms of these primitives. And the primitives should be efficiently
>> > implementable in terms of a byte-based memcpy primitive further down.
>> > (There
>> > will need to be bounds checking in the source-level semantics (i.e.,
>> > Subscript exception can be raised), and this will sometimes be
>> > unfortunate
>> > (i.e., when obviously in bounds), but I don't think this is too costly.)
>> >
>> > On 14 March 2017 at 16:52, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
>> > wrote:
>> >>
>> >> Hi all,
>> >>
>> >> I've started adding string/bytearray conversion and concatenation
>> >> primitives (issues 244 and 245). Before getting too deep into updating
>> >> the compiler etc., may I request a review of the semantics? Here they
>> >> are:
>> >>
>> >>
>> >>
>> >> https://github.com/CakeML/cakeml/commit/67dd15bbd03f516be618ba72f1d56a2764209263
>> >>
>> >> I noticed that v_to_char_list might be better as vs_to_char_list, to
>> >> be run after v_to_list (rather than duplicating its
>> >> list-deconstruction functionality). But I leave such refactoring for
>> >> another time.
>> >>
>> >> Cheers,
>> >> Ramana
>> >
>> >
>> >
>> > _______________________________________________
>> > Developers mailing list
>> > Developers at cakeml.org
>> > https://lists.cakeml.org/listinfo/developers
>> >
>
>