[CakeML-dev] New string/bytearray operations

Mon Mar 20 07:23:57 UTC 2017

I think the bounds checks are more complicated than that because
BoundsCheckByte checks strict < the array length, but for offset+length you
want <=. Could there easily be a version that checks less or equal instead?

Yes I'll try to factor out a helper for the take/drop logic. The difference
between chars and bytes makes it more complicated than you might have hoped
though. I guess I should map everything into the target type before calling
the helper.

On 20 Mar 2017 6:11 pm, "Magnus Myreen" <magnus.myreen at gmail.com> wrote:

> On 20 March 2017 at 07:59, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk> wrote:
> > I've just committed semantics for the primitives for strings/bytearrays,
> > which you can see here:
> > https://github.com/CakeML/cakeml/blob/strcat/semantics/
> semanticPrimitives.lem#L509
> >
> > I'd really like to get these right before pushing them through, since
> it's
> > quite a lot of work.
> > I have a few concerns currently:
> >
> > 1. It's not obvious that the bounds checking can be done efficiently. The
> > BoundsCheckByte primitive doesn't work neatly with checking whether an
> > offset+length pair is correct, so I'd have to do some special casing,
> e.g.,
> > when the length = 0. Should we have a different semantics which allows
> > negative or too long lengths and does something reasonable with them? Or
> > could there be another efficient operation like BoundsCheckByte for
> checking
> > an offset+length?
>
> If negative lengths are allowed, then the bounds check for such copies
> could be:
>
>   BoundsCheckByte offset andalso BoundsCheckByte (offset+length)
>
> otherwise:
>
>   BoundsCheckByte (offset + length) andalso 0 <= offset * length
>
> > 2. It takes work to add new primitives (update type system, inferencer,
> all
> > backend do_app proofs..), and these 4 are reasonably complicated despite
> > being quite similar to each other. Could we get away with a smaller or
> > simpler set? E.g., maybe we only really need CopyAw8Aw8 but give it a
> > flexible semantics taking either bytearrays or strings? I don't know if
> that
> > can work though.
> >
> > If we add similar primitives for polymorphic arrays and vectors, it
> would be
> > good to factor out as much commonality as possible in the semantics...
> Sadly
> > it looks like a lot of copy-paste-tweaking is inevitable... I'd rather
> only
> > add the polymorphic ones after the byte ones are done, just to avoid too
> > much explosion of work.
>
> The difference between byte and value vectors/arrays is the values
> they carry, but the indexing and chopping the input is the same for
> all of them. Could you not move that logic into a helper function that
> is then used throughout many intermediate languages and also for both
> byte and value vectors/arrays?
>
> Cheers,
> Magnus
>
> > On 20 March 2017 at 17:32, Magnus Myreen <magnus.myreen at gmail.com>
> wrote:
> >>
> >> Sounds good to me. One thing to note: the semantics should read all
> >> the values before writing them. This is so that we give the obviously
> >> correct semantics to copying inside the same array. The implementation
> >> will need to check whether it should do left-to-right or right-to-left
> >> copying so that the code doesn't accidentally read from what it just
> >> wrote.
> >>
> >> Also, as I pointed out in my other email, I think these memcpy
> >> primitives should also exist for normal value arrays.
> >>
> >> Cheers,
> >> Magnus
> >>
> >> On 20 March 2017 at 06:29, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
> wrote:
> >> > I've been rethinking these primitives after the discussion at the last
> >> > hangout, and have come up with a different set altogether. Can you
> see a
> >> > simpler or more elegant approach to the one described below?
> >> >
> >> > Here is the new approach I am considering:
> >> >
> >> > 4 copying primitives in the source language going from
> string/bytearray
> >> > to
> >> > string/bytearray.
> >> > The source comes with an offset and a length to copy.
> >> > If the destination is a string, a new string is created. If the
> >> > destination
> >> > is a bytearray, it must be provided.
> >> >
> >> > Concatenation (of lists of strings/arrays) and conversions between
> >> > (whole)
> >> > strings and (whole) bytearrays can be implemented in the basis library
> >> > in
> >> > terms of these primitives. And the primitives should be efficiently
> >> > implementable in terms of a byte-based memcpy primitive further down.
> >> > (There
> >> > will need to be bounds checking in the source-level semantics (i.e.,
> >> > Subscript exception can be raised), and this will sometimes be
> >> > unfortunate
> >> > (i.e., when obviously in bounds), but I don't think this is too
> costly.)
> >> >
> >> > On 14 March 2017 at 16:52, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
> >> > wrote:
> >> >>
> >> >> Hi all,
> >> >>
> >> >> I've started adding string/bytearray conversion and concatenation
> >> >> primitives (issues 244 and 245). Before getting too deep into
> updating
> >> >> the compiler etc., may I request a review of the semantics? Here they
> >> >> are:
> >> >>
> >> >>
> >> >>
> >> >> https://github.com/CakeML/cakeml/commit/
> 67dd15bbd03f516be618ba72f1d56a2764209263
> >> >>
> >> >> I noticed that v_to_char_list might be better as vs_to_char_list, to
> >> >> be run after v_to_list (rather than duplicating its
> >> >> list-deconstruction functionality). But I leave such refactoring for
> >> >> another time.
> >> >>
> >> >> Cheers,
> >> >> Ramana
> >> >
> >> >
> >> >
> >> > _______________________________________________
> >> > Developers mailing list
> >> > Developers at cakeml.org
> >> > https://lists.cakeml.org/listinfo/developers
> >> >
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.cakeml.org/pipermail/developers/attachments/20170320/40cf3ed1/attachment-0001.html>