[CakeML-dev] memcpy in wordLang?

Mon Mar 20 06:29:52 UTC 2017

On 17 March 2017 at 12:28, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk> wrote:
> Hi Magnus,
>
> Thanks for the heads up. I should say that by "primitive in wordLang"
> I think my language was confused: I never meant a wordLang primitive,
> I mean a primitive in BVL that is implemented in wordLang.

Ah, OK.

> Out of the discussion we had today, one point came up for when you
> implement your ConsExtend stubs: it would be best to have a generic
> memcpy (on words) that can be re-used to make an efficient memcpy on
> bytes. In other words, I hope your stub won't fuse the functionality
> required for ConsExtend with the basic copying primitive.

There should be two memcpy stubs one for words and one for bytes. The
one on words can assume that all the words are word aligned. The one
on bytes can use the word version if the addresses and lengths it
copies happen to fit various constraints.

> Also, Scott suggested at one point looking at how GNU libc implements
> memcpy for ideas on an efficient implementation... it's a
> sophisticated piece of assembly code and highly arch dependent; I
> don't think we're at that level of complexity yet.. see e.g.
> https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S;hb=HEAD

An efficient memcpy is a very important tool and I'm sure a lot of
effort has gone into them. I think we should eventually have carefully
crafted stubs, but we are restricted by our target-neutral asm
language. For word memcpy, I bet the best we can do is unroll the
copying loop a few times but not so much that we cause register
spilling. The byte memcpy is tricker because one wants to use
word-sized memory access even in cases where the addresses aren't
aligned, but our asm semantics requires all word-sized access to be
aligned. This is possible to implement efficiently with our
target-neutral asm language, but will be more than just a few hours of
work.

If I am to implement the byte memcpy, then the first version won't be
optimised to do word-sized memory accesses for byte memcpy.

I see from Ramana's recent email that there was discussion about
exposing these in the source language. I think that's a good thing,
and that they should also be available for normal value arrays.
(Ramana's email was only talking about byte arrays and strings).

Cheers,
Magnus

> On 17 March 2017 at 16:48, Magnus Myreen <magnus.myreen at gmail.com> wrote:
>> Hi Ramana,
>>
>> I'm on holiday at the moment, but want to write a short reply.
>>
>> I will be implementing a mem copy stub in data-to-word early next week for
>> Scott's ConsExtend. That mem copy will copy word for word.
>>
>> Mem copy should not be primitive in wordlang. Having it as a primitive
>> doesn't buy you anything. This also applies to byte by byte mem copy.
>>
>> For efficiency, you want to implement these copy routines as stubs in
>> wordlang as opposed to stubs in higher levels like BVL, BVI or DataLang.
>>
>> Cheers,
>> Magnus
>> On Fri, 17 Mar 2017 at 06:07, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
>> wrote:
>>>
>>> Hi developers,
>>>
>>> This question is especially for those familiar with wordLang.
>>>
>>> Would it be reasonable to implement a memcpy primitive in wordLang? In
>>> particular, I would want to add a primitive to BVL/BVI that given a
>>> byte array, an offset, and another byte array, copies the contents of
>>> the latter into the former starting at the offset.
>>>
>>> The question is whether it is possible to do this efficiently in
>>> wordLang even if the offset is not word aligned.
>>>
>>> Obviously I can already write a byte-by-byte copying routine in BVL
>>> (or even higher). I'm trying to figure out how to actually be more
>>> efficient than that when implementing concatenation and
>>> string/bytearray conversion primitives.
>>>
>>> (Another annoying thing I noticed is that currently the only way to
>>> create a byte array forces you to write some initial dummy replicated
>>> value into it, even if you're going to overwrite them all right after.
>>> But I don't know what a good primitive to use instead would look like
>>> - some super create-and-copy-with-offsets primitive maybe, but that's
>>> pretty complicated.)
>>>
>>> Cheers,
>>> Ramana
>>>
>>> _______________________________________________
>>> Developers mailing list
>>> Developers at cakeml.org
>>> https://lists.cakeml.org/listinfo/developers