[CakeML-dev] memcpy in wordLang?

Ramana Kumar Ramana.Kumar at cl.cam.ac.uk
Sat Mar 25 03:50:05 UTC 2017


I'm down to BVI on the strcat branch now, and to go further I will soon
need the memcpy primitives/stubs in wordLang, to implement the new CopyByte
op.

We may need to coordinate this. On which branch are you planning to
implement those stubs in wordLang?

On 20 March 2017 at 17:29, Magnus Myreen <magnus.myreen at gmail.com> wrote:

> On 17 March 2017 at 12:28, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk> wrote:
> > Hi Magnus,
> >
> > Thanks for the heads up. I should say that by "primitive in wordLang"
> > I think my language was confused: I never meant a wordLang primitive,
> > I mean a primitive in BVL that is implemented in wordLang.
>
> Ah, OK.
>
> > Out of the discussion we had today, one point came up for when you
> > implement your ConsExtend stubs: it would be best to have a generic
> > memcpy (on words) that can be re-used to make an efficient memcpy on
> > bytes. In other words, I hope your stub won't fuse the functionality
> > required for ConsExtend with the basic copying primitive.
>
> There should be two memcpy stubs one for words and one for bytes. The
> one on words can assume that all the words are word aligned. The one
> on bytes can use the word version if the addresses and lengths it
> copies happen to fit various constraints.
>
> > Also, Scott suggested at one point looking at how GNU libc implements
> > memcpy for ideas on an efficient implementation... it's a
> > sophisticated piece of assembly code and highly arch dependent; I
> > don't think we're at that level of complexity yet.. see e.g.
> > https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=
> sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S;hb=HEAD
>
> An efficient memcpy is a very important tool and I'm sure a lot of
> effort has gone into them. I think we should eventually have carefully
> crafted stubs, but we are restricted by our target-neutral asm
> language. For word memcpy, I bet the best we can do is unroll the
> copying loop a few times but not so much that we cause register
> spilling. The byte memcpy is tricker because one wants to use
> word-sized memory access even in cases where the addresses aren't
> aligned, but our asm semantics requires all word-sized access to be
> aligned. This is possible to implement efficiently with our
> target-neutral asm language, but will be more than just a few hours of
> work.
>
> If I am to implement the byte memcpy, then the first version won't be
> optimised to do word-sized memory accesses for byte memcpy.
>
> I see from Ramana's recent email that there was discussion about
> exposing these in the source language. I think that's a good thing,
> and that they should also be available for normal value arrays.
> (Ramana's email was only talking about byte arrays and strings).
>
> Cheers,
> Magnus
>
>
> > On 17 March 2017 at 16:48, Magnus Myreen <magnus.myreen at gmail.com>
> wrote:
> >> Hi Ramana,
> >>
> >> I'm on holiday at the moment, but want to write a short reply.
> >>
> >> I will be implementing a mem copy stub in data-to-word early next week
> for
> >> Scott's ConsExtend. That mem copy will copy word for word.
> >>
> >> Mem copy should not be primitive in wordlang. Having it as a primitive
> >> doesn't buy you anything. This also applies to byte by byte mem copy.
> >>
> >> For efficiency, you want to implement these copy routines as stubs in
> >> wordlang as opposed to stubs in higher levels like BVL, BVI or DataLang.
> >>
> >> Cheers,
> >> Magnus
> >> On Fri, 17 Mar 2017 at 06:07, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
> >> wrote:
> >>>
> >>> Hi developers,
> >>>
> >>> This question is especially for those familiar with wordLang.
> >>>
> >>> Would it be reasonable to implement a memcpy primitive in wordLang? In
> >>> particular, I would want to add a primitive to BVL/BVI that given a
> >>> byte array, an offset, and another byte array, copies the contents of
> >>> the latter into the former starting at the offset.
> >>>
> >>> The question is whether it is possible to do this efficiently in
> >>> wordLang even if the offset is not word aligned.
> >>>
> >>> Obviously I can already write a byte-by-byte copying routine in BVL
> >>> (or even higher). I'm trying to figure out how to actually be more
> >>> efficient than that when implementing concatenation and
> >>> string/bytearray conversion primitives.
> >>>
> >>> (Another annoying thing I noticed is that currently the only way to
> >>> create a byte array forces you to write some initial dummy replicated
> >>> value into it, even if you're going to overwrite them all right after.
> >>> But I don't know what a good primitive to use instead would look like
> >>> - some super create-and-copy-with-offsets primitive maybe, but that's
> >>> pretty complicated.)
> >>>
> >>> Cheers,
> >>> Ramana
> >>>
> >>> _______________________________________________
> >>> Developers mailing list
> >>> Developers at cakeml.org
> >>> https://lists.cakeml.org/listinfo/developers
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.cakeml.org/pipermail/developers/attachments/20170325/911eed0c/attachment.html>


More information about the Developers mailing list