[CakeML-dev] New string/bytearray operations

Ramana Kumar Ramana.Kumar at cl.cam.ac.uk
Thu Mar 23 07:45:25 UTC 2017


The SML semantics for when to raise exceptions is a little weird:

"If di < 0 or if |dst| < di+|src|, then the Subscript
<http://sml-family.org/Basis/general.html#SIG:GENERAL.Subscript:EXN:SPEC>
exception is raised."

This means if I want to copy an empty array into (non-existent) index 3 of
a length 3 array, there should be no exception.
This is hard to accomplish with our BoundsCheckByte semantics which ensures
a strict less-than the length.

How bad is it to use an explicit LessEq with LengthByte, rather than
BoundsCheckByte?
Or alternatively, can we add that flag to BoundsCheckByte to change its
comparison to less-or-equal?

Or do we want to not follow the SML semantics?

On 21 March 2017 at 11:01, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk> wrote:

> We don't have any higher-order primitives currently... I imagine they'd be
> quite a bit harder to verify and implement efficiently... You would want
> clos_call/known type optimisations to specialise a primitive like unfold.
>
> I think I'm currently leaning towards reinstating the concat primitives
> alongside the Copy{Str,Aw8}{Str,Aw8} primitives. The concat ones eventually
> get implemented in terms of the latter, but only once it's possible to do
> so without the extra copying.
>
> On 21 March 2017 at 09:53, <Michael.Norrish at data61.csiro.au> wrote:
>
>> Rather than having to initialise an empty string and then copy into it,
>> why not provide something like a tabulate or unfold function that generates
>> the string?  The function passed in would then be able to write directly
>> into the string’s memory.  I’m not sure how to set it up to write whole
>> strings at a time (the helper function would have to return strings, which
>> would almost certainly require a new allocation), but if done a character
>> at a time, the char values might get to stay in registers:
>>
>>
>>
>>   String.tabulate : int * (int -> char) -> string
>>
>>
>>
>> If you had substrings (implemented as string * offset * length triples
>> and sharing the underlying string), then something like
>>
>>
>>
>>   String.unfold : ‘a * (‘a -> (substring * ‘a) option) * int -> string
>>
>>
>>
>> could be done, where the ‘a is a generic “state” parameter (the list of
>> strings to concat), and where the int is maximum length of the result.
>>
>>
>>
>> I think String.concat could be implemented with the latter without any
>> redundant copying.
>>
>>
>>
>> Michael
>>
>>
>>
>> *From: *Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
>> *Date: *Monday, 20 March 2017 at 16:29
>> *To: *"developers at cakeml.org" <developers at cakeml.org>
>> *Subject: *Re: [CakeML-dev] New string/bytearray operations
>>
>>
>>
>> I've been rethinking these primitives after the discussion at the last
>> hangout, and have come up with a different set altogether. Can you see a
>> simpler or more elegant approach to the one described below?
>>
>> Here is the new approach I am considering:
>>
>> 4 copying primitives in the source language going from string/bytearray
>> to string/bytearray.
>>
>> The source comes with an offset and a length to copy.
>>
>> If the destination is a string, a new string is created. If the
>> destination is a bytearray, it must be provided.
>>
>> Concatenation (of lists of strings/arrays) and conversions between
>> (whole) strings and (whole) bytearrays can be implemented in the basis
>> library in terms of these primitives. And the primitives should be
>> efficiently implementable in terms of a byte-based memcpy primitive further
>> down. (There will need to be bounds checking in the source-level semantics
>> (i.e., Subscript exception can be raised), and this will sometimes be
>> unfortunate (i.e., when obviously in bounds), but I don't think this is too
>> costly.)
>>
>>
>>
>> On 14 March 2017 at 16:52, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
>> wrote:
>>
>> Hi all,
>>
>> I've started adding string/bytearray conversion and concatenation
>> primitives (issues 244 and 245). Before getting too deep into updating
>> the compiler etc., may I request a review of the semantics? Here they
>> are:
>>
>> https://github.com/CakeML/cakeml/commit/67dd15bbd03f516be618
>> ba72f1d56a2764209263
>>
>> I noticed that v_to_char_list might be better as vs_to_char_list, to
>> be run after v_to_list (rather than duplicating its
>> list-deconstruction functionality). But I leave such refactoring for
>> another time.
>>
>> Cheers,
>> Ramana
>>
>>
>>
>> _______________________________________________
>> Developers mailing list
>> Developers at cakeml.org
>> https://lists.cakeml.org/listinfo/developers
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.cakeml.org/pipermail/developers/attachments/20170323/d213b277/attachment.html>


More information about the Developers mailing list