[CakeML-dev] New string/bytearray operations

Mon Mar 20 07:29:12 UTC 2017

On 20 March 2017 at 08:23, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk> wrote:
> I think the bounds checks are more complicated than that because
> BoundsCheckByte checks strict < the array length, but for offset+length you
> want <=. Could there easily be a version that checks less or equal instead?

My previous email corrected it to offset+length-1. I don't think it
would be hard to have a flag to BoundsCheck which determines whether
it does < or <=. However, this is just optimising away one `-1`, which
doesn't seem worth it considering that a successful bounds check will
always be followed by a loop with a lot of conditionals.

> Yes I'll try to factor out a helper for the take/drop logic. The difference
> between chars and bytes makes it more complicated than you might have hoped
> though. I guess I should map everything into the target type before calling
> the helper.
>
> On 20 Mar 2017 6:11 pm, "Magnus Myreen" <magnus.myreen at gmail.com> wrote:
>>
>> On 20 March 2017 at 07:59, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk> wrote:
>> > I've just committed semantics for the primitives for strings/bytearrays,
>> > which you can see here:
>> >
>> > https://github.com/CakeML/cakeml/blob/strcat/semantics/semanticPrimitives.lem#L509
>> >
>> > I'd really like to get these right before pushing them through, since
>> > it's
>> > quite a lot of work.
>> > I have a few concerns currently:
>> >
>> > 1. It's not obvious that the bounds checking can be done efficiently.
>> > The
>> > BoundsCheckByte primitive doesn't work neatly with checking whether an
>> > offset+length pair is correct, so I'd have to do some special casing,
>> > e.g.,
>> > when the length = 0. Should we have a different semantics which allows
>> > negative or too long lengths and does something reasonable with them? Or
>> > could there be another efficient operation like BoundsCheckByte for
>> > checking
>> > an offset+length?
>>
>> If negative lengths are allowed, then the bounds check for such copies
>> could be:
>>
>>   BoundsCheckByte offset andalso BoundsCheckByte (offset+length)
>>
>> otherwise:
>>
>>   BoundsCheckByte (offset + length) andalso 0 <= offset * length
>>
>> > 2. It takes work to add new primitives (update type system, inferencer,
>> > all
>> > backend do_app proofs..), and these 4 are reasonably complicated despite
>> > being quite similar to each other. Could we get away with a smaller or
>> > simpler set? E.g., maybe we only really need CopyAw8Aw8 but give it a
>> > flexible semantics taking either bytearrays or strings? I don't know if
>> > that
>> > can work though.
>> >
>> > If we add similar primitives for polymorphic arrays and vectors, it
>> > would be
>> > good to factor out as much commonality as possible in the semantics...
>> > Sadly
>> > it looks like a lot of copy-paste-tweaking is inevitable... I'd rather
>> > only
>> > add the polymorphic ones after the byte ones are done, just to avoid too
>> > much explosion of work.
>>
>> The difference between byte and value vectors/arrays is the values
>> they carry, but the indexing and chopping the input is the same for
>> all of them. Could you not move that logic into a helper function that
>> is then used throughout many intermediate languages and also for both
>> byte and value vectors/arrays?
>>
>> Cheers,
>> Magnus
>>
>> > On 20 March 2017 at 17:32, Magnus Myreen <magnus.myreen at gmail.com>
>> > wrote:
>> >>
>> >> Sounds good to me. One thing to note: the semantics should read all
>> >> the values before writing them. This is so that we give the obviously
>> >> correct semantics to copying inside the same array. The implementation
>> >> will need to check whether it should do left-to-right or right-to-left
>> >> copying so that the code doesn't accidentally read from what it just
>> >> wrote.
>> >>
>> >> Also, as I pointed out in my other email, I think these memcpy
>> >> primitives should also exist for normal value arrays.
>> >>
>> >> Cheers,
>> >> Magnus
>> >>
>> >> On 20 March 2017 at 06:29, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
>> >> wrote:
>> >> > I've been rethinking these primitives after the discussion at the
>> >> > last
>> >> > hangout, and have come up with a different set altogether. Can you
>> >> > see a
>> >> > simpler or more elegant approach to the one described below?
>> >> >
>> >> > Here is the new approach I am considering:
>> >> >
>> >> > 4 copying primitives in the source language going from
>> >> > string/bytearray
>> >> > to
>> >> > string/bytearray.
>> >> > The source comes with an offset and a length to copy.
>> >> > If the destination is a string, a new string is created. If the
>> >> > destination
>> >> > is a bytearray, it must be provided.
>> >> >
>> >> > Concatenation (of lists of strings/arrays) and conversions between
>> >> > (whole)
>> >> > strings and (whole) bytearrays can be implemented in the basis
>> >> > library
>> >> > in
>> >> > terms of these primitives. And the primitives should be efficiently
>> >> > implementable in terms of a byte-based memcpy primitive further down.
>> >> > (There
>> >> > will need to be bounds checking in the source-level semantics (i.e.,
>> >> > Subscript exception can be raised), and this will sometimes be
>> >> > unfortunate
>> >> > (i.e., when obviously in bounds), but I don't think this is too
>> >> > costly.)
>> >> >
>> >> > On 14 March 2017 at 16:52, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
>> >> > wrote:
>> >> >>
>> >> >> Hi all,
>> >> >>
>> >> >> I've started adding string/bytearray conversion and concatenation
>> >> >> primitives (issues 244 and 245). Before getting too deep into
>> >> >> updating
>> >> >> the compiler etc., may I request a review of the semantics? Here
>> >> >> they
>> >> >> are:
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> https://github.com/CakeML/cakeml/commit/67dd15bbd03f516be618ba72f1d56a2764209263
>> >> >>
>> >> >> I noticed that v_to_char_list might be better as vs_to_char_list, to
>> >> >> be run after v_to_list (rather than duplicating its
>> >> >> list-deconstruction functionality). But I leave such refactoring for
>> >> >> another time.
>> >> >>
>> >> >> Cheers,
>> >> >> Ramana
>> >> >
>> >> >
>> >> >
>> >> > _______________________________________________
>> >> > Developers mailing list
>> >> > Developers at cakeml.org
>> >> > https://lists.cakeml.org/listinfo/developers
>> >> >
>> >
>> >