[CakeML-dev] New string/bytearray operations

Magnus Myreen magnus.myreen at gmail.com
Thu Mar 23 07:49:00 UTC 2017


On 23 March 2017 at 09:45, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk> wrote:
> The SML semantics for when to raise exceptions is a little weird:
>
> "If di < 0 or if |dst| < di+|src|, then the Subscript exception is raised."
>
> This means if I want to copy an empty array into (non-existent) index 3 of a
> length 3 array, there should be no exception.
> This is hard to accomplish with our BoundsCheckByte semantics which ensures
> a strict less-than the length.
>
> How bad is it to use an explicit LessEq with LengthByte, rather than
> BoundsCheckByte?
> Or alternatively, can we add that flag to BoundsCheckByte to change its
> comparison to less-or-equal?

Adding this flag should be easy. I'm happy to update data_to_word for
this if someone else takes care of the other changes for the new flag.

> Or do we want to not follow the SML semantics?
>
> On 21 March 2017 at 11:01, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk> wrote:
>>
>> We don't have any higher-order primitives currently... I imagine they'd be
>> quite a bit harder to verify and implement efficiently... You would want
>> clos_call/known type optimisations to specialise a primitive like unfold.
>>
>> I think I'm currently leaning towards reinstating the concat primitives
>> alongside the Copy{Str,Aw8}{Str,Aw8} primitives. The concat ones eventually
>> get implemented in terms of the latter, but only once it's possible to do so
>> without the extra copying.
>>
>> On 21 March 2017 at 09:53, <Michael.Norrish at data61.csiro.au> wrote:
>>>
>>> Rather than having to initialise an empty string and then copy into it,
>>> why not provide something like a tabulate or unfold function that generates
>>> the string?  The function passed in would then be able to write directly
>>> into the string’s memory.  I’m not sure how to set it up to write whole
>>> strings at a time (the helper function would have to return strings, which
>>> would almost certainly require a new allocation), but if done a character at
>>> a time, the char values might get to stay in registers:
>>>
>>>
>>>
>>>   String.tabulate : int * (int -> char) -> string
>>>
>>>
>>>
>>> If you had substrings (implemented as string * offset * length triples
>>> and sharing the underlying string), then something like
>>>
>>>
>>>
>>>   String.unfold : ‘a * (‘a -> (substring * ‘a) option) * int -> string
>>>
>>>
>>>
>>> could be done, where the ‘a is a generic “state” parameter (the list of
>>> strings to concat), and where the int is maximum length of the result.
>>>
>>>
>>>
>>> I think String.concat could be implemented with the latter without any
>>> redundant copying.
>>>
>>>
>>>
>>> Michael
>>>
>>>
>>>
>>> From: Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
>>> Date: Monday, 20 March 2017 at 16:29
>>> To: "developers at cakeml.org" <developers at cakeml.org>
>>> Subject: Re: [CakeML-dev] New string/bytearray operations
>>>
>>>
>>>
>>> I've been rethinking these primitives after the discussion at the last
>>> hangout, and have come up with a different set altogether. Can you see a
>>> simpler or more elegant approach to the one described below?
>>>
>>> Here is the new approach I am considering:
>>>
>>> 4 copying primitives in the source language going from string/bytearray
>>> to string/bytearray.
>>>
>>> The source comes with an offset and a length to copy.
>>>
>>> If the destination is a string, a new string is created. If the
>>> destination is a bytearray, it must be provided.
>>>
>>> Concatenation (of lists of strings/arrays) and conversions between
>>> (whole) strings and (whole) bytearrays can be implemented in the basis
>>> library in terms of these primitives. And the primitives should be
>>> efficiently implementable in terms of a byte-based memcpy primitive further
>>> down. (There will need to be bounds checking in the source-level semantics
>>> (i.e., Subscript exception can be raised), and this will sometimes be
>>> unfortunate (i.e., when obviously in bounds), but I don't think this is too
>>> costly.)
>>>
>>>
>>>
>>> On 14 March 2017 at 16:52, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
>>> wrote:
>>>
>>> Hi all,
>>>
>>> I've started adding string/bytearray conversion and concatenation
>>> primitives (issues 244 and 245). Before getting too deep into updating
>>> the compiler etc., may I request a review of the semantics? Here they
>>> are:
>>>
>>>
>>> https://github.com/CakeML/cakeml/commit/67dd15bbd03f516be618ba72f1d56a2764209263
>>>
>>> I noticed that v_to_char_list might be better as vs_to_char_list, to
>>> be run after v_to_list (rather than duplicating its
>>> list-deconstruction functionality). But I leave such refactoring for
>>> another time.
>>>
>>> Cheers,
>>> Ramana
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Developers mailing list
>>> Developers at cakeml.org
>>> https://lists.cakeml.org/listinfo/developers
>>>
>>
>



More information about the Developers mailing list