[CakeML-dev] New string/bytearray operations

Ramana Kumar Ramana.Kumar at cl.cam.ac.uk
Mon Mar 20 09:21:19 UTC 2017


Sorry, I should also mention the other alternative: that this extra
copying for concat is worthwhile. It's a factor of two increase in the
amount of copying (each byte is copied into the bytearray and then
again into the result string)... We probably need goals and
benchmarks to decide.
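
To make the factor of two concrete, here is a minimal sketch that
models the two primitives concat needs on top of the SML Basis Library
and counts the bytes they move. The lowercase names copyStrAw8 and
copyAw8Str, the copied counter, and concatViaBytes are assumptions for
illustration only, not the CakeML definitions.

```sml
(* Sketch only: models of two of the proposed copy primitives, built on
   the SML Basis Library. Names and the counter are assumptions. *)
val copied = ref 0  (* total bytes moved by the model primitives *)

(* Copy len chars of s, starting at off, into dst at dstOff.
   Raises Subscript if either range is out of bounds. *)
fun copyStrAw8 (s : string) (off : int) (len : int)
               (dst : Word8Array.array) (dstOff : int) : unit =
  ( Byte.packString (dst, dstOff, Substring.substring (s, off, len))
  ; copied := !copied + len )

(* Copy len bytes of src, starting at off, into a fresh string. *)
fun copyAw8Str (src : Word8Array.array) (off : int) (len : int) : string =
  let val s = Byte.unpackString (Word8ArraySlice.slice (src, off, SOME len))
  in copied := !copied + len; s end

(* concat via a scratch bytearray: one pass in, one pass out. *)
fun concatViaBytes (ls : string list) : string =
  let
    val total = List.foldl (fn (s, acc) => String.size s + acc) 0 ls
    val dst = Word8Array.array (total, 0w0)
    fun fill [] _ = ()
      | fill (s :: rest) n =
          let val l = String.size s
          in copyStrAw8 s 0 l dst n; fill rest (n + l) end
  in
    fill ls 0;
    copyAw8Str dst 0 total
  end

val result = concatViaBytes ["foo", "bar", "baz"]
val bytesCopied = !copied
```

For the three 3-character strings above, result is "foobarbaz" and
bytesCopied is 18 for a 9-byte result. A primitive concat could write
each byte once.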

(Yet another option could be to have a protostring type that is like a
mutable string but can be finalized (destructively) into an immutable
one. But that either breaks the type system or requires too many
runtime checks on strings to ensure they're finalized.)
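
For what it's worth, the runtime-check variant of the protostring idea
might look roughly like the sketch below. Everything in it (the
protostring record, the exception names, protoAlloc, protoWrite,
protoSub) is hypothetical and written against the SML Basis Library,
not anything that exists in CakeML.

```sml
(* Sketch only: a hypothetical protostring with a runtime "finalized" flag. *)
exception Finalized
exception NotFinalized

type protostring = { bytes : Word8Array.array, final : bool ref }

(* Allocate an unfinalized buffer of n zero bytes. *)
fun protoAlloc (n : int) : protostring =
  { bytes = Word8Array.array (n, 0w0), final = ref false }

(* Writes are only legal before finalization. *)
fun protoWrite (p : protostring) (di : int) (s : string) : unit =
  if !(#final p) then raise Finalized
  else Byte.packString (#bytes p, di, Substring.full s)

(* Destructive finalization: no copy, just flip the flag. *)
fun finalize (p : protostring) : unit = #final p := true

(* A string-style read has to check the flag first. *)
fun protoSub (p : protostring) (i : int) : char =
  if !(#final p) then Byte.byteToChar (Word8Array.sub (#bytes p, i))
  else raise NotFinalized

val p = protoAlloc 6
val () = protoWrite p 0 "foo"
val () = protoWrite p 3 "bar"
val readBeforeFinal = (protoSub p 0; false) handle NotFinalized => true
val () = finalize p
val c = protoSub p 3
val writeAfterFinal = (protoWrite p 0 "x"; false) handle Finalized => true
```

The check in protoSub is the cost in question: under this scheme
something like it would sit in front of every string operation, unless
the type system separated the two states.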

On 20 March 2017 at 20:12, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk> wrote:
> I think you're right. Because I can't copy into an existing string,
> the result would need to be copied at the end... hmm, so maybe concat
> should be primitive again? Or maybe there's another better set of
> primitives? The SML basis doesn't have concat on arrays (only on
> strings and vectors), but arrays do have copy and copyVec, and the
> same on slices. (We don't have slices/substrings...)
>
> To think about this it helped me to try writing concat, and I want to
> keep it here for reference later.
>
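> (* Total length of all the strings; acc is the running sum. *)
> fun sumlengths [] acc = acc
>   | sumlengths (a::rest) acc = sumlengths rest (length a + acc)
> (* Copy each string into dst, starting at offset n; returns dst. *)
> fun copy [] dst _ = dst
>   | copy (a::rest) dst n =
>     let val l = length a
>         val () = CopyStrAw8 a 0 l dst n
>     in copy rest dst (n+l) end
> (* Fill a bytearray, then copy it out again as the result string. *)
> fun concat ls = let
>   val l = sumlengths ls 0
>   val dst = alloc l 0w0
> in CopyAw8Str (copy ls dst 0) 0 l end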
>
>
> On 20 March 2017 at 19:56, Scott Owens <S.A.Owens at kent.ac.uk> wrote:
>> I don’t see how you implement string concat without extra copying.
>>
>> Scott
>>
>>> On 2017/03/20, at 05:29, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk> wrote:
>>>
>>> I've been rethinking these primitives after the discussion at the last hangout, and have come up with a different set altogether. Can you see a simpler or more elegant approach than the one described below?
>>>
>>> Here is the new approach I am considering:
>>>
>>> 4 copying primitives in the source language going from string/bytearray to string/bytearray.
>>> The source comes with an offset and a length to copy.
>>> If the destination is a string, a new string is created. If the destination is a bytearray, it must be provided.
>>>
>>> Concatenation (of lists of strings/arrays) and conversions between (whole) strings and (whole) bytearrays can be implemented in the basis library in terms of these primitives. And the primitives should be efficiently implementable in terms of a byte-based memcpy primitive further down. (There will need to be bounds checking in the source-level semantics (i.e., Subscript exception can be raised), and this will sometimes be unfortunate (i.e., when obviously in bounds), but I don't think this is too costly.)
>>>
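As a rough illustration of the four primitives described above, here
is a minimal sketch that models them with SML Basis Library functions.
The lowercase names, the argument order (source, offset, length, then
destination and destination offset), and the helpers strToAw8 and
aw8ToStr are assumptions. Subscript here comes from the Basis bounds
checks, standing in for the source-level semantics.

```sml
(* Sketch only: the four copies modelled on the SML Basis Library. *)

(* string -> new string *)
fun copyStrStr (s : string) (off : int) (len : int) : string =
  String.substring (s, off, len)

(* string -> existing bytearray *)
fun copyStrAw8 (s : string) (off : int) (len : int)
               (dst : Word8Array.array) (dstOff : int) : unit =
  Byte.packString (dst, dstOff, Substring.substring (s, off, len))

(* bytearray -> new string *)
fun copyAw8Str (src : Word8Array.array) (off : int) (len : int) : string =
  Byte.unpackString (Word8ArraySlice.slice (src, off, SOME len))

(* bytearray -> existing bytearray *)
fun copyAw8Aw8 (src : Word8Array.array) (off : int) (len : int)
               (dst : Word8Array.array) (dstOff : int) : unit =
  Word8ArraySlice.copy
    { src = Word8ArraySlice.slice (src, off, SOME len), dst = dst, di = dstOff }

(* Whole-value conversions written in terms of the copies. *)
fun strToAw8 (s : string) : Word8Array.array =
  let val a = Word8Array.array (String.size s, 0w0)
  in copyStrAw8 s 0 (String.size s) a 0; a end

fun aw8ToStr (a : Word8Array.array) : string =
  copyAw8Str a 0 (Word8Array.length a)

val roundTrip = aw8ToStr (strToAw8 "cake")
val outOfBounds = (copyStrStr "cake" 2 5; false) handle Subscript => true
```

Here roundTrip is "cake" again, and outOfBounds is true because the
requested range runs past the end of the 4-character source.
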
>>> On 14 March 2017 at 16:52, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk> wrote:
>>> Hi all,
>>>
>>> I've started adding string/bytearray conversion and concatenation
>>> primitives (issues 244 and 245). Before getting too deep into updating
>>> the compiler etc., may I request a review of the semantics? Here they
>>> are:
>>>
>>> https://github.com/CakeML/cakeml/commit/67dd15bbd03f516be618ba72f1d56a2764209263
>>>
>>> I noticed that v_to_char_list might be better as vs_to_char_list, to
>>> be run after v_to_list (rather than duplicating its
>>> list-deconstruction functionality). But I leave such refactoring for
>>> another time.
>>>
>>> Cheers,
>>> Ramana
>>>
>>> _______________________________________________
>>> Developers mailing list
>>> Developers at cakeml.org
>>> https://lists.cakeml.org/listinfo/developers
>>


