[CakeML-Dev] Bootstrap evaluation timings

Ramana Kumar Ramana.Kumar at cl.cam.ac.uk
Thu Nov 17 22:09:28 UTC 2016


I will put the timings from various runs of the bootstrap on this web page
so we can more easily refer to them: https://cakeml.org/bootstrap-timing.txt
.
I've added partial timings for a recent revision (still waiting on the
rest).
There is currently a problem with word_to_stack, which is now taking 5
hours where it used to take 1.
And given that it seems to be stuck on "sec_ok" I'm wondering how much the
asm_ok Asm check removal has really helped... I guess it was supposed to
bring it down to 6h, so we'll see...

On 29 September 2016 at 10:47, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
wrote:

> I have tried running the "sec_ok" part of the bootstrap with the asm_ok
> checks for Asm lines turned off. Two things to report:
>
> The time is significantly improved:
> runtime: 6h02m19s,    gctime: 30m06s,     systime: 3m11s.
> (compared to previously: 10h46m09s,    gctime: 1h07m37s,     systime:
> 5m21s.)
>
> Secondly, this timing information is bogus: the wall clock time was
> definitely under 2 hours (I didn't check precisely). I expect timing for
> threads in parallel execution are added sequentially. So for the parts of
> the bootstrap run in parallel, the timings probably need to be divided by
> up to 8 to get an accurate picture. Perhaps I should annotate which ones
> are in parallel and which not.
>
> I think it's worthwhile removing the asm_ok checks. I can help doing so if
> someone else sets up some cheated theorems to fill in.
>
> On 8 September 2016 at 14:33, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
> wrote:
>
>> Here are some timings now including the last step (x64Bootstrap).
>> (The commit number doesn't exactly match the output labels, since I
>> changed some of that (about to be committed) in the middle.)
>>
>> Here we can see that the biggest problem (>10 hours) is the checking that
>> all the lines in the final ASM code satisfy line_ok_light (i.e., the
>> "sec_ok" thing below). But there are a few steps that take 1-2 hours each,
>> and then a few more that take 20-40 mins, so ... definitely more than an
>> overnight job still...
>>
>> 1a939d3
>>
>> eval to_mod: runtime: 1m49s,    gctime: 3.4s,     systime: 0.35667s.
>> eval to_con: runtime: 8m56s,    gctime: 5.5s,     systime: 0.84000s.
>> eval to_dec: runtime: 7.4s,    gctime: 2.4s,     systime: 0.15667s.
>> eval to_exh: runtime: 3m32s,    gctime: 8.9s,     systime: 0.68667s.
>> eval to_pat: runtime: 14.1s,    gctime: 1.4s,     systime: 0.06667s.
>> eval to_clos: runtime: 5.3s,    gctime: 1.1s,     systime: 0.05667s.
>> eval to_bvl: runtime: 2m36s,    gctime: 12.6s,     systime: 0.88000s.
>> eval to_bvi:
>> ... inline: runtime: 6.1s,    gctime: 1.2s,     systime: 0.09333s.
>> ... optimise: runtime: 24m53s,    gctime: 1m53s,     systime: 17.1s.
>> ... compile: runtime: 2m27s,    gctime: 11.9s,     systime: 1.5s.
>> eval to_data: runtime: 8m11s,    gctime: 21.4s,     systime: 1.2s.
>> eval data_to_word: runtime: 8m28s,    gctime: 30.4s,     systime: 2.4s.
>>
>> word_to_word: runtime: 1h00m52s,    gctime: 5m53s,     systime: 34.1s.
>>
>> clash: runtime: 5m06s,    gctime: 22.3s,     systime: 2.8s.
>> (external oracle) runtime: 2m31s,    gctime: 1.4s,     systime: 0.77667s.
>>
>> chunk: runtime: 1h08m52s,    gctime: 6m06s,     systime: 28.2s.
>> (chunk results) runtime: 11m49s,    gctime: 1m13s,     systime: 7.2s.
>>
>> eval word_to_stack: runtime: 1h35m38s,    gctime: 3m53s,     systime:
>> 9.4s.
>>
>> stack_alloc: runtime: 23m27s,    gctime: 45.7s,     systime: 6.7s.
>> stack_remove: runtime: 19m15s,    gctime: 21.5s,     systime: 4.7s.
>> stack_names: runtime: 15m20s,    gctime: 24.9s,     systime: 3.2s.
>> stack_to_lab: runtime: 42m12s,    gctime: 9m27s,     systime: 10.4s.
>>
>> filter_skip: runtime: 3m56s,    gctime: 24.7s,     systime: 3.2s.
>> ffi_limit: runtime: 2m33s,    gctime: 1m39s,     systime: 3.2s.
>> enc_sec: runtime: 25m01s,    gctime: 1m40s,     systime: 11.0s.
>> sec_length: runtime: 4m11s,    gctime: 48.9s,     systime: 3.5s.
>>
>> compute_labels: runtime: 1h17m12s,    gctime: 13m50s,     systime: 58.7s.
>>
>> enc_secs_again: runtime: 2h00m15s,    gctime: 24m27s,     systime: 1m46s.
>>
>> upd_lab_len: runtime: 5m51s,    gctime: 55.9s,     systime: 3.4s.
>> sec_length2: runtime: 3m32s,    gctime: 8.4s,     systime: 1.2s.
>>
>> compute_labels2: runtime: 1h27m09s,    gctime: 22m29s,     systime: 1m17s.
>>
>> enc_secs_again2: runtime: 2h37m41s,    gctime: 1h03m15s,     systime:
>> 3m01s.
>>
>> pad_sectionruntime: 8m47s,    gctime: 35.8s,     systime: 3.8s.
>>
>> sec_ok: runtime: 10h46m09s,    gctime: 1h07m37s,     systime: 5m21s.
>>
>> all_secs_ok: runtime: 4.5s,    gctime: 0.07667s,     systime: 0.00333s.
>> prog_to_bytes: runtime: 7m27s,    gctime: 2m23s,     systime: 6.3s.
>>
>> flat_bytes: runtime: 45m21s,    gctime: 8m25s,     systime: 8.2s.
>> expand_defs: runtime: 49m25s,    gctime: 31m53s,     systime: 1.6s.
>>
>> On 7 September 2016 at 08:43, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
>> wrote:
>>
>>> More timing for a later commit. I believe this one includes Magnus's
>>> changes to add compile_seqs, which seem not to have had much effect. (I'm
>>> hoping I haven't messed up the timing somehow...  it's surprising how
>>> stable they are...)
>>>
>>> commit: 83b3aaf
>>>
>>> eval to_mod: runtime: 1m50s,    gctime: 5.8s,     systime: 1.0s.
>>> eval to_con: runtime: 8m41s,    gctime: 4.2s,     systime: 0.65000s.
>>> eval to_dec: runtime: 11.5s,    gctime: 6.4s,     systime: 0.46333s.
>>> eval to_exh: runtime: 3m27s,    gctime: 6.8s,     systime: 0.89000s.
>>> eval to_pat: runtime: 14.7s,    gctime: 1.7s,     systime: 0.11667s.
>>> eval to_clos: runtime: 5.5s,    gctime: 1.1s,     systime: 0.06333s.
>>> eval to_bvl: runtime: 2m29s,    gctime: 5.6s,     systime: 0.28333s.
>>> eval to_bvi:
>>> ... inline: runtime: 6.3s,    gctime: 1.3s,     systime: 0.06333s.
>>> ... optimise: runtime: 23m26s,    gctime: 28.8s,     systime: 6.3s.
>>> ... compile: runtime: 2m18s,    gctime: 4.1s,     systime: 0.25000s.
>>> eval to_data: runtime: 8m23s,    gctime: 44.3s,     systime: 3.2s.
>>> eval data_to_word: runtime: 8m23s,    gctime: 39.3s,     systime: 3.1s.
>>>
>>> word_to_word: runtime: 58m18s,    gctime: 4m53s,     systime: 30.1s.
>>>
>>> clash: runtime: 5m01s,    gctime: 18.4s,     systime: 2.5s.
>>> (external oracle) runtime: 2m31s,    gctime: 1.9s,     systime: 0.20667s.
>>>
>>> chunk: runtime: 1h01m18s,    gctime: 1m13s,     systime: 22.9s.
>>> (chunk results) runtime: 14m26s,    gctime: 5m57s,     systime: 9.1s.
>>>
>>> eval word_to_stack: runtime: 1h29m05s,    gctime: 4m38s,     systime:
>>> 14.4s.
>>>
>>> stack_alloc: runtime: 23m03s,    gctime: 58.3s,     systime: 5.8s.
>>> stack_remove: runtime: 20m09s,    gctime: 1m29s,     systime: 6.5s.
>>> stack_names: runtime: 15m05s,    gctime: 41.6s,     systime: 3.9s.
>>> stack_to_lab: runtime: 35m11s,    gctime: 2m54s,     systime: 12.0s.
>>>
>>>
>>> On 6 September 2016 at 10:11, Yong Kiam <tanyongkiam at gmail.com> wrote:
>>>
>>>> Hmm, that is surprising. I expected the main function to have a good
>>>> number of If-s (which my change should have helped with), but it seems like
>>>> that didn't affect the time at all.
>>>>
>>>> Here are two possibilities:
>>>>
>>>> 1) Caching lookups into the colouring function for the chunk step. The
>>>> function is defined as an option-lookup into an oracle sptree:
>>>> https://github.com/CakeML/cakeml/blob/9937677d34446adace9f29
>>>> d5719131f9a5b4aeac/compiler/backend/reg_alloc/reg_allocScript.sml#L110
>>>> I'm not sure how much that will help though, since the variable names
>>>> should be in SSA already, so it is mostly looking up different things each
>>>> time.
>>>>
>>>> 2) Make splitting more aggressive, i.e. the main function should be
>>>> divided into more pieces so that its colouring sptree isn't too big (it
>>>> gets larger when there are more variables too).
>>>>
>>>> Also, are you getting some kind of effect where every parallel thread
>>>> is just waiting for one to finish (namely the one working on the main
>>>> function)? 2) might help divide the work more evenly.
>>>>
>>>>
>>>> On Mon, Sep 5, 2016 at 7:45 PM, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk
>>>> > wrote:
>>>>
>>>>> Some more timings, for two recent commits. Not a big difference, it
>>>>> seems.
>>>>>
>>>>> commit: 69ac2f9
>>>>>
>>>>> eval to_mod: runtime: 1m45s,    gctime: 3.8s,     systime: 0.36667s.
>>>>> eval to_con: runtime: 8m36s,    gctime: 5.5s,     systime: 0.90667s.
>>>>> eval to_dec: runtime: 7.3s,    gctime: 2.3s,     systime: 0.17000s.
>>>>> eval to_exh: runtime: 3m27s,    gctime: 9.5s,     systime: 1.1s.
>>>>> eval to_pat: runtime: 13.9s,    gctime: 1.5s,     systime: 0.10333s.
>>>>> eval to_clos: runtime: 5.2s,    gctime: 1.1s,     systime: 0.07000s.
>>>>> eval to_bvl: runtime: 2m28s,    gctime: 5.5s,     systime: 0.22667s.
>>>>> eval to_bvi: ... inline: runtime: 6.0s,    gctime: 1.3s,     systime:
>>>>> 0.09000s.
>>>>> ... optimise:runtime: 23m27s,    gctime: 6.9s,     systime: 3.2s.
>>>>> ... compile: runtime: 2m33s,    gctime: 19.3s,     systime: 1.2s.
>>>>> eval to_data: runtime: 8m29s,    gctime: 51.5s,     systime: 3.4s.
>>>>> eval data_to_word: runtime: 8m41s,    gctime: 50.2s,     systime: 3.4s.
>>>>>
>>>>> word_to_word: runtime: 58m21s,    gctime: 4m28s,     systime: 28.0s.
>>>>>
>>>>> clash: runtime: 4m58s,    gctime: 15.5s,     systime: 2.7s.
>>>>> (external oracle) runtime: 2m32s,    gctime: 1.8s,     systime:
>>>>> 0.24000s.
>>>>>
>>>>> chunk: runtime: 1h13m05s,    gctime: 6m06s,     systime: 26.9s.
>>>>> (chunk results) runtime: 13m31s,    gctime: 2m47s,     systime: 14.4s.
>>>>>
>>>>> eval word_to_stack: runtime: 1h32m52s,    gctime: 6m12s,     systime:
>>>>> 12.7s.
>>>>>
>>>>> stack_alloc: runtime: 23m03s,    gctime: 42.2s,     systime: 5.8s.
>>>>> stack_remove: runtime: 19m08s,    gctime: 22.7s,     systime: 4.3s.
>>>>> stack_names: runtime: 15m45s,    gctime: 1m05s,     systime: 5.3s.
>>>>> stack_to_lab: runtime: 41m59s,    gctime: 9m22s,     systime: 8.6s.
>>>>>
>>>>>
>>>>> commit: 9937677
>>>>>
>>>>> eval to_mod: runtime: 1m51s,    gctime: 5.9s,     systime: 0.90000s.
>>>>> eval to_con: runtime: 8m48s,    gctime: 4.4s,     systime: 0.71667s.
>>>>> eval to_dec: runtime: 7.2s,    gctime: 2.3s,     systime: 0.15667s.
>>>>> eval to_exh: runtime: 3m30s,    gctime: 9.5s,     systime: 1.2s.
>>>>> eval to_pat: runtime: 14.3s,    gctime: 1.6s,     systime: 0.09000s.
>>>>> eval to_clos: runtime: 5.2s,    gctime: 0.95667s,     systime:
>>>>> 0.07333s.
>>>>> eval to_bvl: runtime: 2m28s,    gctime: 5.2s,     systime: 0.32667s.
>>>>> eval to_bvi: ... inline: runtime: 5.8s,    gctime: 0.88000s,
>>>>> systime: 0.05333s.
>>>>> ... optimise: runtime: 23m24s,    gctime: 5.8s,     systime: 3.0s.
>>>>> ... compile: runtime: 2m27s,    gctime: 12.5s,     systime: 0.78333s.
>>>>> eval to_data: runtime: 8m18s,    gctime: 34.2s,     systime: 3.0s.
>>>>> eval data_to_word: runtime: 8m24s,    gctime: 37.4s,     systime: 2.8s.
>>>>>
>>>>> word_to_word: runtime: 57m40s,    gctime: 4m56s,     systime: 30.1s.
>>>>>
>>>>> clash: runtime: 5m12s,    gctime: 33.6s,     systime: 3.9s.
>>>>> (external oracle) runtime: 2m31s,    gctime: 1.8s,     systime:
>>>>> 0.15667s.
>>>>>
>>>>> chunk: runtime: 1h07m54s,    gctime: 6m07s,     systime: 29.5s.
>>>>> (chunk results) runtime: 13m21s,    gctime: 2m54s,     systime: 14.0s.
>>>>>
>>>>> eval word_to_stack: runtime: 1h34m00s,    gctime: 7m32s,     systime:
>>>>> 8.1s.
>>>>>
>>>>> stack_alloc: runtime: 22m35s,    gctime: 18.7s,     systime: 3.7s.
>>>>> stack_remove: runtime: 19m04s,    gctime: 16.7s,     systime: 3.9s.
>>>>> stack_names: runtime: 15m13s,    gctime: 37.8s,     systime: 4.0s.
>>>>> stack_to_lab: runtime: 36m21s,    gctime: 3m29s,     systime: 13.9s.
>>>>>
>>>>>
>>>>> On 1 September 2016 at 09:04, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
>>>>> wrote:
>>>>>
>>>>>> Also, updated timings for the middle bit, as of 4d0d56b (which is
>>>>>> also for the front half timings in my previous email).
>>>>>>
>>>>>> eval data_to_word: runtime: 7m42s,    gctime: 35.7s,     systime:
>>>>>> 2.7s.
>>>>>>
>>>>>> word_to_word: runtime: 56m07s,    gctime: 4m48s,     systime: 28.7s.
>>>>>>
>>>>>> clash: runtime: 4m48s,    gctime: 16.0s,     systime: 2.2s.
>>>>>>
>>>>>> (external oracle) runtime: 2m21s,    gctime: 1.7s,     systime:
>>>>>> 0.10667s.
>>>>>>
>>>>>> chunk: runtime: 1h10m00s,    gctime: 6m06s,     systime: 27.1s.
>>>>>>
>>>>>> (use chunk) runtime: 12m26s,    gctime: 2m29s,     systime: 11.0s.
>>>>>>
>>>>>> eval word_to_stack: runtime: 1h29m28s,    gctime: 7m23s,     systime:
>>>>>> 8.0s.
>>>>>>
>>>>>> stack_alloc: runtime: 21m13s,    gctime: 24.3s,     systime: 3.9s.
>>>>>> stack_remove: runtime: 18m12s,    gctime: 46.6s,     systime: 4.3s.
>>>>>> stack_names: runtime: 14m08s,    gctime: 32.1s,     systime: 4.1s.
>>>>>> stack_to_lab: runtime: 36m41s,    gctime: 1m19s,     systime: 10.6s.
>>>>>>
>>>>>> It doesn't look like 91d539b made a huge difference.
>>>>>>
>>>>>> On 1 September 2016 at 08:42, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk
>>>>>> > wrote:
>>>>>>
>>>>>>> I have some timings for the front half of the backend now:
>>>>>>> eval to_mod: runtime: 1m45s,    gctime: 9.6s,     systime: 0.40000s.
>>>>>>> eval to_con: runtime: 7m00s,    gctime: 5.4s,     systime: 0.80667s.
>>>>>>> eval to_dec: runtime: 7.0s,    gctime: 2.3s,     systime: 0.13333s.
>>>>>>> eval to_exh: runtime: 3m18s,    gctime: 4.5s,     systime: 0.67000s.
>>>>>>> eval to_pat: runtime: 15.9s,    gctime: 3.7s,     systime: 0.50667s.
>>>>>>> eval to_clos: runtime: 5.0s,    gctime: 1.1s,     systime: 0.07000s.
>>>>>>> eval to_bvl: runtime: 2m18s,    gctime: 5.2s,     systime: 0.36000s.
>>>>>>> eval to_bvi:
>>>>>>> ... inline: runtime: 5.8s,    gctime: 1.1s,     systime: 0.07667s.
>>>>>>> ... optimise: runtime: 22m52s,    gctime: 30.1s,     systime: 6.1s.
>>>>>>> ... compile: runtime: 2m05s,    gctime: 6.9s,     systime: 0.72000s.
>>>>>>> eval to_data: runtime: 5m44s,    gctime: 28.8s,     systime: 2.1s.
>>>>>>>
>>>>>>> I'm not sure why there are suspicious timings for things like
>>>>>>> stack_alloc. One possibility is that the "time" function adds up times
>>>>>>> across all parallel threads. In that case, it would make sense to divide
>>>>>>> the time by 8. I would need to investigate whether this is actually what's
>>>>>>> going on...
>>>>>>>
>>>>>>> On 30 August 2016 at 19:36, Magnus Myreen <magnus.myreen at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> My guess is that evaluation times get slower once we pass dataLang
>>>>>>>> because the programs become a lot larger once the data abstraction
>>>>>>>> has
>>>>>>>> been removed.
>>>>>>>>
>>>>>>>> There are however some suspicious looking timings here. I note that
>>>>>>>> stack_alloc does near to nothing, but still takes 31 minutes to run.
>>>>>>>> Here is stack_alloc's main compiler function:
>>>>>>>>
>>>>>>>> https://github.com/CakeML/cakeml/blob/master/compiler/backen
>>>>>>>> d/stack_allocScript.sml#L166
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Magnus
>>>>>>>>
>>>>>>>> On 30 August 2016 at 03:43, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
>>>>>>>> wrote:
>>>>>>>> > I have some information now on where slow parts of the bootstrap
>>>>>>>> evaluation
>>>>>>>> > are, which we could use to direct effort in case we want to speed
>>>>>>>> things up.
>>>>>>>> >
>>>>>>>> > Currently I have timings for between dataLang and labLang. (I
>>>>>>>> know the
>>>>>>>> > phases above dataLang are mostly quick, and the ones after
>>>>>>>> labLang are
>>>>>>>> > mostly slow, but I will have to get actual numbers later)
>>>>>>>> >
>>>>>>>> > eval data_to_word: runtime: 7m48s,    gctime: 21.9s,     systime:
>>>>>>>> 2.1s.
>>>>>>>> >
>>>>>>>> > word_to_word: runtime: 55m07s,    gctime: 5m31s,     systime:
>>>>>>>> 30.9s.
>>>>>>>> >
>>>>>>>> > clash: runtime: 5m33s,    gctime: 34.4s,     systime: 4.4s.
>>>>>>>> >
>>>>>>>> > (external oracle) runtime: 2m30s,    gctime: 12.1s,     systime:
>>>>>>>> 1.5s
>>>>>>>> >
>>>>>>>> > chunk: runtime: 1h04m14s,    gctime: 1m37s,     systime: 20.8s.
>>>>>>>> > (use results of chunk) runtime: 11m21s,    gctime: 2m00s,
>>>>>>>>  systime:
>>>>>>>> > 10.1s.
>>>>>>>> >
>>>>>>>> > eval word_to_stack: runtime: 1h34m21s,    gctime: 9m42s,
>>>>>>>>  systime: 9.4s.
>>>>>>>> >
>>>>>>>> > stack_alloc: runtime: 31m55s,    gctime: 30.3s,     systime: 5.4s.
>>>>>>>> >
>>>>>>>> > stack_remove: runtime: 25m43s,    gctime: 1m32s,     systime:
>>>>>>>> 6.9s.
>>>>>>>> >
>>>>>>>> > stack_names: runtime: 22m03s,    gctime: 3m01s,     systime: 6.6s.
>>>>>>>> >
>>>>>>>> > stack_to_lab: runtime: 56m00s,    gctime: 4m38s,     systime:
>>>>>>>> 23.8s.
>>>>>>>> >
>>>>>>>> > What exactly these different phases are doing can be found by
>>>>>>>> looking at
>>>>>>>> > https://github.com/CakeML/cakeml/blob/master/compiler/bootst
>>>>>>>> rap/evaluation/to_lab_x64BootstrapScript.sml.
>>>>>>>> > I think the one confusingly called "chunk" means checking the
>>>>>>>> results of the
>>>>>>>> > external reg. alloc oracle.
>>>>>>>> >
>>>>>>>> > _______________________________________________
>>>>>>>> > Dev mailing list
>>>>>>>> > Dev at cakeml.org
>>>>>>>> > https://lists.cakeml.org/listinfo/dev
>>>>>>>> >
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Dev mailing list
>>>>> Dev at cakeml.org
>>>>> https://lists.cakeml.org/listinfo/dev
>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.cakeml.org/pipermail/developers/attachments/20161118/b6e97c1a/attachment-0001.html>


More information about the Developers mailing list