[CakeML-dev] New string/bytearray operations

Michael.Norrish at data61.csiro.au Michael.Norrish at data61.csiro.au
Mon Mar 20 22:53:29 UTC 2017


Rather than having to initialise an empty string and then copy into it, why not provide something like a tabulate or unfold function that generates the string?  The function passed in would then be able to write directly into the string’s memory.  I’m not sure how to set it up to write whole strings at a time (the helper function would have to return strings, which would almost certainly require a new allocation), but if done a character at a time, the char values might get to stay in registers:

  String.tabulate : int * (int -> char) -> string

If you had substrings (implemented as string * offset * length triples and sharing the underlying string), then something like

  String.unfold : ‘a * (‘a -> (substring * ‘a) option) * int -> string

could be done, where the ‘a is a generic “state” parameter (the list of strings to concat), and where the int is maximum length of the result.

I think String.concat could be implemented with the latter without any redundant copying.

Michael

From: Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
Date: Monday, 20 March 2017 at 16:29
To: "developers at cakeml.org" <developers at cakeml.org>
Subject: Re: [CakeML-dev] New string/bytearray operations

I've been rethinking these primitives after the discussion at the last hangout, and have come up with a different set altogether. Can you see a simpler or more elegant approach to the one described below?
Here is the new approach I am considering:
4 copying primitives in the source language going from string/bytearray to string/bytearray.
The source comes with an offset and a length to copy.
If the destination is a string, a new string is created. If the destination is a bytearray, it must be provided.
Concatenation (of lists of strings/arrays) and conversions between (whole) strings and (whole) bytearrays can be implemented in the basis library in terms of these primitives. And the primitives should be efficiently implementable in terms of a byte-based memcpy primitive further down. (There will need to be bounds checking in the source-level semantics (i.e., Subscript exception can be raised), and this will sometimes be unfortunate (i.e., when obviously in bounds), but I don't think this is too costly.)

On 14 March 2017 at 16:52, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk<mailto:Ramana.Kumar at cl.cam.ac.uk>> wrote:
Hi all,

I've started adding string/bytearray conversion and concatenation
primitives (issues 244 and 245). Before getting too deep into updating
the compiler etc., may I request a review of the semantics? Here they
are:

https://github.com/CakeML/cakeml/commit/67dd15bbd03f516be618ba72f1d56a2764209263

I noticed that v_to_char_list might be better as vs_to_char_list, to
be run after v_to_list (rather than duplicating its
list-deconstruction functionality). But I leave such refactoring for
another time.

Cheers,
Ramana

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.cakeml.org/pipermail/developers/attachments/20170320/bdde1405/attachment.html>


More information about the Developers mailing list