[CakeML-dev] New string/bytearray operations

Ramana Kumar Ramana.Kumar at cl.cam.ac.uk
Thu Mar 23 07:54:28 UTC 2017


I'll add the flag to BoundsCheckByte and see if it makes things simpler. I
can take care of the other changes if so. This will be on the strcat branch.

On 23 March 2017 at 18:49, Magnus Myreen <magnus.myreen at gmail.com> wrote:

> On 23 March 2017 at 09:45, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk> wrote:
> > The SML semantics for when to raise exceptions is a little weird:
> >
> > "If di < 0 or if |dst| < di+|src|, then the Subscript exception is
> raised."
> >
> > This means if I want to copy an empty array into (non-existent) index 3
> of a
> > length 3 array, there should be no exception.
> > This is hard to accomplish with our BoundsCheckByte semantics which
> ensures
> > a strict less-than the length.
> >
> > How bad is it to use an explicit LessEq with LengthByte, rather than
> > BoundsCheckByte?
> > Or alternatively, can we add that flag to BoundsCheckByte to change its
> > comparison to less-or-equal?
>
> Adding this flag should be easy. I'm happy to update data_to_word for
> this if someone else takes care of the other changes for the new flag.
>
> > Or do we want to not follow the SML semantics?
> >
> > On 21 March 2017 at 11:01, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
> wrote:
> >>
> >> We don't have any higher-order primitives currently... I imagine they'd
> be
> >> quite a bit harder to verify and implement efficiently... You would want
> >> clos_call/known type optimisations to specialise a primitive like
> unfold.
> >>
> >> I think I'm currently leaning towards reinstating the concat primitives
> >> alongside the Copy{Str,Aw8}{Str,Aw8} primitives. The concat ones
> eventually
> >> get implemented in terms of the latter, but only once it's possible to
> do so
> >> without the extra copying.
> >>
> >> On 21 March 2017 at 09:53, <Michael.Norrish at data61.csiro.au> wrote:
> >>>
> >>> Rather than having to initialise an empty string and then copy into it,
> >>> why not provide something like a tabulate or unfold function that
> generates
> >>> the string?  The function passed in would then be able to write
> directly
> >>> into the string’s memory.  I’m not sure how to set it up to write whole
> >>> strings at a time (the helper function would have to return strings,
> which
> >>> would almost certainly require a new allocation), but if done a
> character at
> >>> a time, the char values might get to stay in registers:
> >>>
> >>>
> >>>
> >>>   String.tabulate : int * (int -> char) -> string
> >>>
> >>>
> >>>
> >>> If you had substrings (implemented as string * offset * length triples
> >>> and sharing the underlying string), then something like
> >>>
> >>>
> >>>
> >>>   String.unfold : ‘a * (‘a -> (substring * ‘a) option) * int -> string
> >>>
> >>>
> >>>
> >>> could be done, where the ‘a is a generic “state” parameter (the list of
> >>> strings to concat), and where the int is maximum length of the result.
> >>>
> >>>
> >>>
> >>> I think String.concat could be implemented with the latter without any
> >>> redundant copying.
> >>>
> >>>
> >>>
> >>> Michael
> >>>
> >>>
> >>>
> >>> From: Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
> >>> Date: Monday, 20 March 2017 at 16:29
> >>> To: "developers at cakeml.org" <developers at cakeml.org>
> >>> Subject: Re: [CakeML-dev] New string/bytearray operations
> >>>
> >>>
> >>>
> >>> I've been rethinking these primitives after the discussion at the last
> >>> hangout, and have come up with a different set altogether. Can you see
> a
> >>> simpler or more elegant approach to the one described below?
> >>>
> >>> Here is the new approach I am considering:
> >>>
> >>> 4 copying primitives in the source language going from string/bytearray
> >>> to string/bytearray.
> >>>
> >>> The source comes with an offset and a length to copy.
> >>>
> >>> If the destination is a string, a new string is created. If the
> >>> destination is a bytearray, it must be provided.
> >>>
> >>> Concatenation (of lists of strings/arrays) and conversions between
> >>> (whole) strings and (whole) bytearrays can be implemented in the basis
> >>> library in terms of these primitives. And the primitives should be
> >>> efficiently implementable in terms of a byte-based memcpy primitive
> further
> >>> down. (There will need to be bounds checking in the source-level
> semantics
> >>> (i.e., Subscript exception can be raised), and this will sometimes be
> >>> unfortunate (i.e., when obviously in bounds), but I don't think this
> is too
> >>> costly.)
> >>>
> >>>
> >>>
> >>> On 14 March 2017 at 16:52, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
> >>> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> I've started adding string/bytearray conversion and concatenation
> >>> primitives (issues 244 and 245). Before getting too deep into updating
> >>> the compiler etc., may I request a review of the semantics? Here they
> >>> are:
> >>>
> >>>
> >>> https://github.com/CakeML/cakeml/commit/67dd15bbd03f516be618ba72f1d56a
> 2764209263
> >>>
> >>> I noticed that v_to_char_list might be better as vs_to_char_list, to
> >>> be run after v_to_list (rather than duplicating its
> >>> list-deconstruction functionality). But I leave such refactoring for
> >>> another time.
> >>>
> >>> Cheers,
> >>> Ramana
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Developers mailing list
> >>> Developers at cakeml.org
> >>> https://lists.cakeml.org/listinfo/developers
> >>>
> >>
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.cakeml.org/pipermail/developers/attachments/20170323/985e0558/attachment.html>


More information about the Developers mailing list