[CakeML-Dev] Bootstrap evaluation timings

Ramana Kumar Ramana.Kumar at cl.cam.ac.uk
Thu Nov 24 00:27:45 UTC 2016


I think there may also be a problem since data_to_word got merged (i.e., as
of 277495c) with the inst,ssa,two-reg part - it's taking much longer than
expected.

On 23 November 2016 at 11:45, Yong Kiam <tanyongkiam at gmail.com> wrote:

> I went through the commit history on github, and here is a list of changes
> since 1a939d3 that looked like they might affect something (based on the
> title):
>
> 55d5fc0: Update unverified register allocator
> 63cbd66: Removed asm_ok
> 98188ad: Changed default bvl_to_bvi configurations
> 66ddd3d: exh_reorder
> 8c2f5ae: load_opt
>
> I also found that the eval/x64 directory could probably be deleted (the
> default configs for every backend is also defined in targets/configTheory).
> The only difference in the x64_compiler_config defined there and the one in
> configTheory is pad_bits for data_to_word though. I think it is only there
> because otherwise all the backends have to be built to eval anything using
> the x64 configuration.
>
> I also reduced the value for exp_cut locally on the nqueens benchmark, but
> I didn't any meaningful difference in compile times (the benchmark is
> probably too small though).
>
>
>
> On Tue, Nov 22, 2016 at 4:42 PM, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
> wrote:
>
>> Preliminary results with inline_size_limit set back down to 3 indicate
>> that word_to_stack is still taking longer than before, specifically 3.5
>> hours as opposed to the 1.5 hours of two months ago. Could the exp_cut
>> parameter also be relevant?
>>
>> On 22 November 2016 at 16:30, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
>> wrote:
>>
>>> Some updated times (now included program sizes) are up (
>>> https://cakeml.org/bootstrap-timing.txt). Things are still surprisingly
>>> slow (word_to_stack, sec_ok) - guess is still that it's because of inline
>>> limit, so the next thing to try is changing the inline limit..
>>>
>>> Is conf_ok required if I change the inline_limit just for the bootstrap?
>>> Isn't it easy to show that conf_ok isn't affected by that config field?
>>>
>>> On 18 November 2016 at 10:39, Yong Kiam <tanyongkiam at gmail.com> wrote:
>>>
>>>> It shouldn't be very hard... I just thought it wouldn't be high
>>>> priority since the remaining ones are only in LabAsms.
>>>>
>>>> Changing the config there seems right. You won't get a conf_ok proof
>>>> about it though.
>>>>
>>>> On Thu, Nov 17, 2016 at 5:46 PM, Ramana Kumar <
>>>> Ramana.Kumar at cl.cam.ac.uk> wrote:
>>>>
>>>>> I've seen the word_to_stack slowdown twice (although it only shows up
>>>>> in the timings document once - I didn't record the other data).
>>>>>
>>>>> I think removing more asm_ok is always a good idea, but I'm not sure
>>>>> how much priority it should be given - how hard is it to remove more?
>>>>>
>>>>> I guess after the current run finishes I could try changing the inline
>>>>> limit (I would do this by using a different config here
>>>>> https://github.com/CakeML/cakeml/blob/master/compiler/bootst
>>>>> rap/evaluation/to_lab_x64BootstrapScript.sml#L83 - does that seem
>>>>> right?)
>>>>>
>>>>> On 18 November 2016 at 09:36, Magnus Myreen <magnus.myreen at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I think the inline_size_limit of 10 is potentially risky. I suggest
>>>>>> the bootstrap is run with a lower setting, e.g. 2 or 3. -- Magnus
>>>>>>
>>>>>> On 18 November 2016 at 09:26, Yong Kiam <tanyongkiam at gmail.com>
>>>>>> wrote:
>>>>>> > Argh, that's a shame. Would moving more of asm_ok up help? There
>>>>>> are still
>>>>>> > some things in LabAsms that can be removed.
>>>>>> >
>>>>>> > I'm not sure know why word_to_stack -specifically- would slow down
>>>>>> by so
>>>>>> > much. Could it be a one-off thing or did you hit that slowdown
>>>>>> multiple
>>>>>> > times already?
>>>>>> >
>>>>>> > Another possibility: changes to the default compiler configuration
>>>>>> could
>>>>>> > have slowed it down.
>>>>>> >
>>>>>> > For example this:
>>>>>> > https://github.com/CakeML/cakeml/commit/98188addc34e5c3a0345
>>>>>> 72c10eca7b3b41aff71c
>>>>>> >
>>>>>> > (e.g. maybe more things got inlined)
>>>>>> >
>>>>>> > On Thu, Nov 17, 2016 at 5:09 PM, Ramana Kumar <
>>>>>> Ramana.Kumar at cl.cam.ac.uk>
>>>>>> > wrote:
>>>>>> >>
>>>>>> >> I will put the timings from various runs of the bootstrap on this
>>>>>> web page
>>>>>> >> so we can more easily refer to them:
>>>>>> >> https://cakeml.org/bootstrap-timing.txt.
>>>>>> >> I've added partial timings for a recent revision (still waiting on
>>>>>> the
>>>>>> >> rest).
>>>>>> >> There is currently a problem with word_to_stack, which is now
>>>>>> taking 5
>>>>>> >> hours where it used to take 1.
>>>>>> >> And given that it seems to be stuck on "sec_ok" I'm wondering how
>>>>>> much the
>>>>>> >> asm_ok Asm check removal has really helped... I guess it was
>>>>>> supposed to
>>>>>> >> bring it down to 6h, so we'll see...
>>>>>> >>
>>>>>> >> On 29 September 2016 at 10:47, Ramana Kumar <
>>>>>> Ramana.Kumar at cl.cam.ac.uk>
>>>>>> >> wrote:
>>>>>> >>>
>>>>>> >>> I have tried running the "sec_ok" part of the bootstrap with the
>>>>>> asm_ok
>>>>>> >>> checks for Asm lines turned off. Two things to report:
>>>>>> >>>
>>>>>> >>> The time is significantly improved:
>>>>>> >>> runtime: 6h02m19s,    gctime: 30m06s,     systime: 3m11s.
>>>>>> >>> (compared to previously: 10h46m09s,    gctime: 1h07m37s,
>>>>>>  systime:
>>>>>> >>> 5m21s.)
>>>>>> >>>
>>>>>> >>> Secondly, this timing information is bogus: the wall clock time
>>>>>> was
>>>>>> >>> definitely under 2 hours (I didn't check precisely). I expect
>>>>>> timing for
>>>>>> >>> threads in parallel execution are added sequentially. So for the
>>>>>> parts of
>>>>>> >>> the bootstrap run in parallel, the timings probably need to be
>>>>>> divided by up
>>>>>> >>> to 8 to get an accurate picture. Perhaps I should annotate which
>>>>>> ones are in
>>>>>> >>> parallel and which not.
>>>>>> >>>
>>>>>> >>> I think it's worthwhile removing the asm_ok checks. I can help
>>>>>> doing so
>>>>>> >>> if someone else sets up some cheated theorems to fill in.
>>>>>> >>>
>>>>>> >>> On 8 September 2016 at 14:33, Ramana Kumar <
>>>>>> Ramana.Kumar at cl.cam.ac.uk>
>>>>>> >>> wrote:
>>>>>> >>>>
>>>>>> >>>> Here are some timings now including the last step (x64Bootstrap).
>>>>>> >>>> (The commit number doesn't exactly match the output labels,
>>>>>> since I
>>>>>> >>>> changed some of that (about to be committed) in the middle.)
>>>>>> >>>>
>>>>>> >>>> Here we can see that the biggest problem (>10 hours) is the
>>>>>> checking
>>>>>> >>>> that all the lines in the final ASM code satisfy line_ok_light
>>>>>> (i.e., the
>>>>>> >>>> "sec_ok" thing below). But there are a few steps that take 1-2
>>>>>> hours each,
>>>>>> >>>> and then a few more that take 20-40 mins, so ... definitely more
>>>>>> than an
>>>>>> >>>> overnight job still...
>>>>>> >>>>
>>>>>> >>>> 1a939d3
>>>>>> >>>>
>>>>>> >>>> eval to_mod: runtime: 1m49s,    gctime: 3.4s,     systime:
>>>>>> 0.35667s.
>>>>>> >>>> eval to_con: runtime: 8m56s,    gctime: 5.5s,     systime:
>>>>>> 0.84000s.
>>>>>> >>>> eval to_dec: runtime: 7.4s,    gctime: 2.4s,     systime:
>>>>>> 0.15667s.
>>>>>> >>>> eval to_exh: runtime: 3m32s,    gctime: 8.9s,     systime:
>>>>>> 0.68667s.
>>>>>> >>>> eval to_pat: runtime: 14.1s,    gctime: 1.4s,     systime:
>>>>>> 0.06667s.
>>>>>> >>>> eval to_clos: runtime: 5.3s,    gctime: 1.1s,     systime:
>>>>>> 0.05667s.
>>>>>> >>>> eval to_bvl: runtime: 2m36s,    gctime: 12.6s,     systime:
>>>>>> 0.88000s.
>>>>>> >>>> eval to_bvi:
>>>>>> >>>> ... inline: runtime: 6.1s,    gctime: 1.2s,     systime:
>>>>>> 0.09333s.
>>>>>> >>>> ... optimise: runtime: 24m53s,    gctime: 1m53s,     systime:
>>>>>> 17.1s.
>>>>>> >>>> ... compile: runtime: 2m27s,    gctime: 11.9s,     systime: 1.5s.
>>>>>> >>>> eval to_data: runtime: 8m11s,    gctime: 21.4s,     systime:
>>>>>> 1.2s.
>>>>>> >>>> eval data_to_word: runtime: 8m28s,    gctime: 30.4s,
>>>>>>  systime: 2.4s.
>>>>>> >>>>
>>>>>> >>>> word_to_word: runtime: 1h00m52s,    gctime: 5m53s,     systime:
>>>>>> 34.1s.
>>>>>> >>>>
>>>>>> >>>> clash: runtime: 5m06s,    gctime: 22.3s,     systime: 2.8s.
>>>>>> >>>> (external oracle) runtime: 2m31s,    gctime: 1.4s,     systime:
>>>>>> >>>> 0.77667s.
>>>>>> >>>>
>>>>>> >>>> chunk: runtime: 1h08m52s,    gctime: 6m06s,     systime: 28.2s.
>>>>>> >>>> (chunk results) runtime: 11m49s,    gctime: 1m13s,     systime:
>>>>>> 7.2s.
>>>>>> >>>>
>>>>>> >>>> eval word_to_stack: runtime: 1h35m38s,    gctime: 3m53s,
>>>>>>  systime:
>>>>>> >>>> 9.4s.
>>>>>> >>>>
>>>>>> >>>> stack_alloc: runtime: 23m27s,    gctime: 45.7s,     systime:
>>>>>> 6.7s.
>>>>>> >>>> stack_remove: runtime: 19m15s,    gctime: 21.5s,     systime:
>>>>>> 4.7s.
>>>>>> >>>> stack_names: runtime: 15m20s,    gctime: 24.9s,     systime:
>>>>>> 3.2s.
>>>>>> >>>> stack_to_lab: runtime: 42m12s,    gctime: 9m27s,     systime:
>>>>>> 10.4s.
>>>>>> >>>>
>>>>>> >>>> filter_skip: runtime: 3m56s,    gctime: 24.7s,     systime: 3.2s.
>>>>>> >>>> ffi_limit: runtime: 2m33s,    gctime: 1m39s,     systime: 3.2s.
>>>>>> >>>> enc_sec: runtime: 25m01s,    gctime: 1m40s,     systime: 11.0s.
>>>>>> >>>> sec_length: runtime: 4m11s,    gctime: 48.9s,     systime: 3.5s.
>>>>>> >>>>
>>>>>> >>>> compute_labels: runtime: 1h17m12s,    gctime: 13m50s,
>>>>>>  systime:
>>>>>> >>>> 58.7s.
>>>>>> >>>>
>>>>>> >>>> enc_secs_again: runtime: 2h00m15s,    gctime: 24m27s,
>>>>>>  systime:
>>>>>> >>>> 1m46s.
>>>>>> >>>>
>>>>>> >>>> upd_lab_len: runtime: 5m51s,    gctime: 55.9s,     systime: 3.4s.
>>>>>> >>>> sec_length2: runtime: 3m32s,    gctime: 8.4s,     systime: 1.2s.
>>>>>> >>>>
>>>>>> >>>> compute_labels2: runtime: 1h27m09s,    gctime: 22m29s,
>>>>>>  systime:
>>>>>> >>>> 1m17s.
>>>>>> >>>>
>>>>>> >>>> enc_secs_again2: runtime: 2h37m41s,    gctime: 1h03m15s,
>>>>>>  systime:
>>>>>> >>>> 3m01s.
>>>>>> >>>>
>>>>>> >>>> pad_sectionruntime: 8m47s,    gctime: 35.8s,     systime: 3.8s.
>>>>>> >>>>
>>>>>> >>>> sec_ok: runtime: 10h46m09s,    gctime: 1h07m37s,     systime:
>>>>>> 5m21s.
>>>>>> >>>>
>>>>>> >>>> all_secs_ok: runtime: 4.5s,    gctime: 0.07667s,     systime:
>>>>>> 0.00333s.
>>>>>> >>>> prog_to_bytes: runtime: 7m27s,    gctime: 2m23s,     systime:
>>>>>> 6.3s.
>>>>>> >>>>
>>>>>> >>>> flat_bytes: runtime: 45m21s,    gctime: 8m25s,     systime: 8.2s.
>>>>>> >>>> expand_defs: runtime: 49m25s,    gctime: 31m53s,     systime:
>>>>>> 1.6s.
>>>>>> >>>>
>>>>>> >>>> On 7 September 2016 at 08:43, Ramana Kumar <
>>>>>> Ramana.Kumar at cl.cam.ac.uk>
>>>>>> >>>> wrote:
>>>>>> >>>>>
>>>>>> >>>>> More timing for a later commit. I believe this one includes
>>>>>> Magnus's
>>>>>> >>>>> changes to add compile_seqs, which seem not to have had much
>>>>>> effect. (I'm
>>>>>> >>>>> hoping I haven't messed up the timing somehow...  it's
>>>>>> surprising how stable
>>>>>> >>>>> they are...)
>>>>>> >>>>>
>>>>>> >>>>> commit: 83b3aaf
>>>>>> >>>>>
>>>>>> >>>>> eval to_mod: runtime: 1m50s,    gctime: 5.8s,     systime: 1.0s.
>>>>>> >>>>> eval to_con: runtime: 8m41s,    gctime: 4.2s,     systime:
>>>>>> 0.65000s.
>>>>>> >>>>> eval to_dec: runtime: 11.5s,    gctime: 6.4s,     systime:
>>>>>> 0.46333s.
>>>>>> >>>>> eval to_exh: runtime: 3m27s,    gctime: 6.8s,     systime:
>>>>>> 0.89000s.
>>>>>> >>>>> eval to_pat: runtime: 14.7s,    gctime: 1.7s,     systime:
>>>>>> 0.11667s.
>>>>>> >>>>> eval to_clos: runtime: 5.5s,    gctime: 1.1s,     systime:
>>>>>> 0.06333s.
>>>>>> >>>>> eval to_bvl: runtime: 2m29s,    gctime: 5.6s,     systime:
>>>>>> 0.28333s.
>>>>>> >>>>> eval to_bvi:
>>>>>> >>>>> ... inline: runtime: 6.3s,    gctime: 1.3s,     systime:
>>>>>> 0.06333s.
>>>>>> >>>>> ... optimise: runtime: 23m26s,    gctime: 28.8s,     systime:
>>>>>> 6.3s.
>>>>>> >>>>> ... compile: runtime: 2m18s,    gctime: 4.1s,     systime:
>>>>>> 0.25000s.
>>>>>> >>>>> eval to_data: runtime: 8m23s,    gctime: 44.3s,     systime:
>>>>>> 3.2s.
>>>>>> >>>>> eval data_to_word: runtime: 8m23s,    gctime: 39.3s,
>>>>>>  systime: 3.1s.
>>>>>> >>>>>
>>>>>> >>>>> word_to_word: runtime: 58m18s,    gctime: 4m53s,     systime:
>>>>>> 30.1s.
>>>>>> >>>>>
>>>>>> >>>>> clash: runtime: 5m01s,    gctime: 18.4s,     systime: 2.5s.
>>>>>> >>>>> (external oracle) runtime: 2m31s,    gctime: 1.9s,     systime:
>>>>>> >>>>> 0.20667s.
>>>>>> >>>>>
>>>>>> >>>>> chunk: runtime: 1h01m18s,    gctime: 1m13s,     systime: 22.9s.
>>>>>> >>>>> (chunk results) runtime: 14m26s,    gctime: 5m57s,     systime:
>>>>>> 9.1s.
>>>>>> >>>>>
>>>>>> >>>>> eval word_to_stack: runtime: 1h29m05s,    gctime: 4m38s,
>>>>>>  systime:
>>>>>> >>>>> 14.4s.
>>>>>> >>>>>
>>>>>> >>>>> stack_alloc: runtime: 23m03s,    gctime: 58.3s,     systime:
>>>>>> 5.8s.
>>>>>> >>>>> stack_remove: runtime: 20m09s,    gctime: 1m29s,     systime:
>>>>>> 6.5s.
>>>>>> >>>>> stack_names: runtime: 15m05s,    gctime: 41.6s,     systime:
>>>>>> 3.9s.
>>>>>> >>>>> stack_to_lab: runtime: 35m11s,    gctime: 2m54s,     systime:
>>>>>> 12.0s.
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> On 6 September 2016 at 10:11, Yong Kiam <tanyongkiam at gmail.com>
>>>>>> wrote:
>>>>>> >>>>>>
>>>>>> >>>>>> Hmm, that is surprising. I expected the main function to have
>>>>>> a good
>>>>>> >>>>>> number of If-s (which my change should have helped with), but
>>>>>> it seems like
>>>>>> >>>>>> that didn't affect the time at all.
>>>>>> >>>>>>
>>>>>> >>>>>> Here are two possibilities:
>>>>>> >>>>>>
>>>>>> >>>>>> 1) Caching lookups into the colouring function for the chunk
>>>>>> step. The
>>>>>> >>>>>> function is defined as an option-lookup into an oracle sptree:
>>>>>> >>>>>> https://github.com/CakeML/cakeml/blob/9937677d34446adace9f29
>>>>>> d5719131f9a5b4aeac/compiler/backend/reg_alloc/reg_allocScrip
>>>>>> t.sml#L110
>>>>>> >>>>>> I'm not sure how much that will help though, since the
>>>>>> variable names
>>>>>> >>>>>> should be in SSA already, so it is mostly looking up different
>>>>>> things each
>>>>>> >>>>>> time.
>>>>>> >>>>>>
>>>>>> >>>>>> 2) Make splitting more aggressive, i.e. the main function
>>>>>> should be
>>>>>> >>>>>> divided into more pieces so that its colouring sptree isn't
>>>>>> too big (it gets
>>>>>> >>>>>> larger when there are more variables too).
>>>>>> >>>>>>
>>>>>> >>>>>> Also, are you getting some kind of effect where every parallel
>>>>>> thread
>>>>>> >>>>>> is just waiting for one to finish (namely the one working on
>>>>>> the main
>>>>>> >>>>>> function)? 2) might help divide the work more evenly.
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>> On Mon, Sep 5, 2016 at 7:45 PM, Ramana Kumar
>>>>>> >>>>>> <Ramana.Kumar at cl.cam.ac.uk> wrote:
>>>>>> >>>>>>>
>>>>>> >>>>>>> Some more timings, for two recent commits. Not a big
>>>>>> difference, it
>>>>>> >>>>>>> seems.
>>>>>> >>>>>>>
>>>>>> >>>>>>> commit: 69ac2f9
>>>>>> >>>>>>>
>>>>>> >>>>>>> eval to_mod: runtime: 1m45s,    gctime: 3.8s,     systime:
>>>>>> 0.36667s.
>>>>>> >>>>>>> eval to_con: runtime: 8m36s,    gctime: 5.5s,     systime:
>>>>>> 0.90667s.
>>>>>> >>>>>>> eval to_dec: runtime: 7.3s,    gctime: 2.3s,     systime:
>>>>>> 0.17000s.
>>>>>> >>>>>>> eval to_exh: runtime: 3m27s,    gctime: 9.5s,     systime:
>>>>>> 1.1s.
>>>>>> >>>>>>> eval to_pat: runtime: 13.9s,    gctime: 1.5s,     systime:
>>>>>> 0.10333s.
>>>>>> >>>>>>> eval to_clos: runtime: 5.2s,    gctime: 1.1s,     systime:
>>>>>> 0.07000s.
>>>>>> >>>>>>> eval to_bvl: runtime: 2m28s,    gctime: 5.5s,     systime:
>>>>>> 0.22667s.
>>>>>> >>>>>>> eval to_bvi: ... inline: runtime: 6.0s,    gctime: 1.3s,
>>>>>>  systime:
>>>>>> >>>>>>> 0.09000s.
>>>>>> >>>>>>> ... optimise:runtime: 23m27s,    gctime: 6.9s,     systime:
>>>>>> 3.2s.
>>>>>> >>>>>>> ... compile: runtime: 2m33s,    gctime: 19.3s,     systime:
>>>>>> 1.2s.
>>>>>> >>>>>>> eval to_data: runtime: 8m29s,    gctime: 51.5s,     systime:
>>>>>> 3.4s.
>>>>>> >>>>>>> eval data_to_word: runtime: 8m41s,    gctime: 50.2s,
>>>>>>  systime:
>>>>>> >>>>>>> 3.4s.
>>>>>> >>>>>>>
>>>>>> >>>>>>> word_to_word: runtime: 58m21s,    gctime: 4m28s,     systime:
>>>>>> 28.0s.
>>>>>> >>>>>>>
>>>>>> >>>>>>> clash: runtime: 4m58s,    gctime: 15.5s,     systime: 2.7s.
>>>>>> >>>>>>> (external oracle) runtime: 2m32s,    gctime: 1.8s,
>>>>>>  systime:
>>>>>> >>>>>>> 0.24000s.
>>>>>> >>>>>>>
>>>>>> >>>>>>> chunk: runtime: 1h13m05s,    gctime: 6m06s,     systime:
>>>>>> 26.9s.
>>>>>> >>>>>>> (chunk results) runtime: 13m31s,    gctime: 2m47s,
>>>>>>  systime:
>>>>>> >>>>>>> 14.4s.
>>>>>> >>>>>>>
>>>>>> >>>>>>> eval word_to_stack: runtime: 1h32m52s,    gctime: 6m12s,
>>>>>>  systime:
>>>>>> >>>>>>> 12.7s.
>>>>>> >>>>>>>
>>>>>> >>>>>>> stack_alloc: runtime: 23m03s,    gctime: 42.2s,     systime:
>>>>>> 5.8s.
>>>>>> >>>>>>> stack_remove: runtime: 19m08s,    gctime: 22.7s,     systime:
>>>>>> 4.3s.
>>>>>> >>>>>>> stack_names: runtime: 15m45s,    gctime: 1m05s,     systime:
>>>>>> 5.3s.
>>>>>> >>>>>>> stack_to_lab: runtime: 41m59s,    gctime: 9m22s,     systime:
>>>>>> 8.6s.
>>>>>> >>>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>> commit: 9937677
>>>>>> >>>>>>>
>>>>>> >>>>>>> eval to_mod: runtime: 1m51s,    gctime: 5.9s,     systime:
>>>>>> 0.90000s.
>>>>>> >>>>>>> eval to_con: runtime: 8m48s,    gctime: 4.4s,     systime:
>>>>>> 0.71667s.
>>>>>> >>>>>>> eval to_dec: runtime: 7.2s,    gctime: 2.3s,     systime:
>>>>>> 0.15667s.
>>>>>> >>>>>>> eval to_exh: runtime: 3m30s,    gctime: 9.5s,     systime:
>>>>>> 1.2s.
>>>>>> >>>>>>> eval to_pat: runtime: 14.3s,    gctime: 1.6s,     systime:
>>>>>> 0.09000s.
>>>>>> >>>>>>> eval to_clos: runtime: 5.2s,    gctime: 0.95667s,     systime:
>>>>>> >>>>>>> 0.07333s.
>>>>>> >>>>>>> eval to_bvl: runtime: 2m28s,    gctime: 5.2s,     systime:
>>>>>> 0.32667s.
>>>>>> >>>>>>> eval to_bvi: ... inline: runtime: 5.8s,    gctime: 0.88000s,
>>>>>> >>>>>>> systime: 0.05333s.
>>>>>> >>>>>>> ... optimise: runtime: 23m24s,    gctime: 5.8s,     systime:
>>>>>> 3.0s.
>>>>>> >>>>>>> ... compile: runtime: 2m27s,    gctime: 12.5s,     systime:
>>>>>> 0.78333s.
>>>>>> >>>>>>> eval to_data: runtime: 8m18s,    gctime: 34.2s,     systime:
>>>>>> 3.0s.
>>>>>> >>>>>>> eval data_to_word: runtime: 8m24s,    gctime: 37.4s,
>>>>>>  systime:
>>>>>> >>>>>>> 2.8s.
>>>>>> >>>>>>>
>>>>>> >>>>>>> word_to_word: runtime: 57m40s,    gctime: 4m56s,     systime:
>>>>>> 30.1s.
>>>>>> >>>>>>>
>>>>>> >>>>>>> clash: runtime: 5m12s,    gctime: 33.6s,     systime: 3.9s.
>>>>>> >>>>>>> (external oracle) runtime: 2m31s,    gctime: 1.8s,
>>>>>>  systime:
>>>>>> >>>>>>> 0.15667s.
>>>>>> >>>>>>>
>>>>>> >>>>>>> chunk: runtime: 1h07m54s,    gctime: 6m07s,     systime:
>>>>>> 29.5s.
>>>>>> >>>>>>> (chunk results) runtime: 13m21s,    gctime: 2m54s,
>>>>>>  systime:
>>>>>> >>>>>>> 14.0s.
>>>>>> >>>>>>>
>>>>>> >>>>>>> eval word_to_stack: runtime: 1h34m00s,    gctime: 7m32s,
>>>>>>  systime:
>>>>>> >>>>>>> 8.1s.
>>>>>> >>>>>>>
>>>>>> >>>>>>> stack_alloc: runtime: 22m35s,    gctime: 18.7s,     systime:
>>>>>> 3.7s.
>>>>>> >>>>>>> stack_remove: runtime: 19m04s,    gctime: 16.7s,     systime:
>>>>>> 3.9s.
>>>>>> >>>>>>> stack_names: runtime: 15m13s,    gctime: 37.8s,     systime:
>>>>>> 4.0s.
>>>>>> >>>>>>> stack_to_lab: runtime: 36m21s,    gctime: 3m29s,     systime:
>>>>>> 13.9s.
>>>>>> >>>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>> On 1 September 2016 at 09:04, Ramana Kumar
>>>>>> >>>>>>> <Ramana.Kumar at cl.cam.ac.uk> wrote:
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> Also, updated timings for the middle bit, as of 4d0d56b
>>>>>> (which is
>>>>>> >>>>>>>> also for the front half timings in my previous email).
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> eval data_to_word: runtime: 7m42s,    gctime: 35.7s,
>>>>>>  systime:
>>>>>> >>>>>>>> 2.7s.
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> word_to_word: runtime: 56m07s,    gctime: 4m48s,
>>>>>>  systime: 28.7s.
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> clash: runtime: 4m48s,    gctime: 16.0s,     systime: 2.2s.
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> (external oracle) runtime: 2m21s,    gctime: 1.7s,
>>>>>>  systime:
>>>>>> >>>>>>>> 0.10667s.
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> chunk: runtime: 1h10m00s,    gctime: 6m06s,     systime:
>>>>>> 27.1s.
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> (use chunk) runtime: 12m26s,    gctime: 2m29s,     systime:
>>>>>> 11.0s.
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> eval word_to_stack: runtime: 1h29m28s,    gctime: 7m23s,
>>>>>> >>>>>>>> systime: 8.0s.
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> stack_alloc: runtime: 21m13s,    gctime: 24.3s,     systime:
>>>>>> 3.9s.
>>>>>> >>>>>>>> stack_remove: runtime: 18m12s,    gctime: 46.6s,
>>>>>>  systime: 4.3s.
>>>>>> >>>>>>>> stack_names: runtime: 14m08s,    gctime: 32.1s,     systime:
>>>>>> 4.1s.
>>>>>> >>>>>>>> stack_to_lab: runtime: 36m41s,    gctime: 1m19s,
>>>>>>  systime: 10.6s.
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> It doesn't look like 91d539b made a huge difference.
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> On 1 September 2016 at 08:42, Ramana Kumar
>>>>>> >>>>>>>> <Ramana.Kumar at cl.cam.ac.uk> wrote:
>>>>>> >>>>>>>>>
>>>>>> >>>>>>>>> I have some timings for the front half of the backend now:
>>>>>> >>>>>>>>> eval to_mod: runtime: 1m45s,    gctime: 9.6s,     systime:
>>>>>> >>>>>>>>> 0.40000s.
>>>>>> >>>>>>>>> eval to_con: runtime: 7m00s,    gctime: 5.4s,     systime:
>>>>>> >>>>>>>>> 0.80667s.
>>>>>> >>>>>>>>> eval to_dec: runtime: 7.0s,    gctime: 2.3s,     systime:
>>>>>> 0.13333s.
>>>>>> >>>>>>>>> eval to_exh: runtime: 3m18s,    gctime: 4.5s,     systime:
>>>>>> >>>>>>>>> 0.67000s.
>>>>>> >>>>>>>>> eval to_pat: runtime: 15.9s,    gctime: 3.7s,     systime:
>>>>>> >>>>>>>>> 0.50667s.
>>>>>> >>>>>>>>> eval to_clos: runtime: 5.0s,    gctime: 1.1s,     systime:
>>>>>> >>>>>>>>> 0.07000s.
>>>>>> >>>>>>>>> eval to_bvl: runtime: 2m18s,    gctime: 5.2s,     systime:
>>>>>> >>>>>>>>> 0.36000s.
>>>>>> >>>>>>>>> eval to_bvi:
>>>>>> >>>>>>>>> ... inline: runtime: 5.8s,    gctime: 1.1s,     systime:
>>>>>> 0.07667s.
>>>>>> >>>>>>>>> ... optimise: runtime: 22m52s,    gctime: 30.1s,
>>>>>>  systime: 6.1s.
>>>>>> >>>>>>>>> ... compile: runtime: 2m05s,    gctime: 6.9s,     systime:
>>>>>> >>>>>>>>> 0.72000s.
>>>>>> >>>>>>>>> eval to_data: runtime: 5m44s,    gctime: 28.8s,
>>>>>>  systime: 2.1s.
>>>>>> >>>>>>>>>
>>>>>> >>>>>>>>> I'm not sure why there are suspicious timings for things
>>>>>> like
>>>>>> >>>>>>>>> stack_alloc. One possibility is that the "time" function
>>>>>> adds up times
>>>>>> >>>>>>>>> across all parallel threads. In that case, it would make
>>>>>> sense to divide the
>>>>>> >>>>>>>>> time by 8. I would need to investigate whether this is
>>>>>> actually what's going
>>>>>> >>>>>>>>> on...
>>>>>> >>>>>>>>>
>>>>>> >>>>>>>>> On 30 August 2016 at 19:36, Magnus Myreen <
>>>>>> magnus.myreen at gmail.com>
>>>>>> >>>>>>>>> wrote:
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> My guess is that evaluation times get slower once we pass
>>>>>> dataLang
>>>>>> >>>>>>>>>> because the programs become a lot larger once the data
>>>>>> abstraction
>>>>>> >>>>>>>>>> has
>>>>>> >>>>>>>>>> been removed.
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> There are however some suspicious looking timings here. I
>>>>>> note
>>>>>> >>>>>>>>>> that
>>>>>> >>>>>>>>>> stack_alloc does near to nothing, but still takes 31
>>>>>> minutes to
>>>>>> >>>>>>>>>> run.
>>>>>> >>>>>>>>>> Here is stack_alloc's main compiler function:
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> https://github.com/CakeML/cake
>>>>>> ml/blob/master/compiler/backend/stack_allocScript.sml#L166
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> Cheers,
>>>>>> >>>>>>>>>> Magnus
>>>>>> >>>>>>>>>>
>>>>>> >>>>>>>>>> On 30 August 2016 at 03:43, Ramana Kumar
>>>>>> >>>>>>>>>> <Ramana.Kumar at cl.cam.ac.uk> wrote:
>>>>>> >>>>>>>>>> > I have some information now on where slow parts of the
>>>>>> bootstrap
>>>>>> >>>>>>>>>> > evaluation
>>>>>> >>>>>>>>>> > are, which we could use to direct effort in case we want
>>>>>> to
>>>>>> >>>>>>>>>> > speed things up.
>>>>>> >>>>>>>>>> >
>>>>>> >>>>>>>>>> > Currently I have timings for between dataLang and
>>>>>> labLang. (I
>>>>>> >>>>>>>>>> > know the
>>>>>> >>>>>>>>>> > phases above dataLang are mostly quick, and the ones
>>>>>> after
>>>>>> >>>>>>>>>> > labLang are
>>>>>> >>>>>>>>>> > mostly slow, but I will have to get actual numbers later)
>>>>>> >>>>>>>>>> >
>>>>>> >>>>>>>>>> > eval data_to_word: runtime: 7m48s,    gctime: 21.9s,
>>>>>> >>>>>>>>>> > systime: 2.1s.
>>>>>> >>>>>>>>>> >
>>>>>> >>>>>>>>>> > word_to_word: runtime: 55m07s,    gctime: 5m31s,
>>>>>>  systime:
>>>>>> >>>>>>>>>> > 30.9s.
>>>>>> >>>>>>>>>> >
>>>>>> >>>>>>>>>> > clash: runtime: 5m33s,    gctime: 34.4s,     systime:
>>>>>> 4.4s.
>>>>>> >>>>>>>>>> >
>>>>>> >>>>>>>>>> > (external oracle) runtime: 2m30s,    gctime: 12.1s,
>>>>>>  systime:
>>>>>> >>>>>>>>>> > 1.5s
>>>>>> >>>>>>>>>> >
>>>>>> >>>>>>>>>> > chunk: runtime: 1h04m14s,    gctime: 1m37s,     systime:
>>>>>> 20.8s.
>>>>>> >>>>>>>>>> > (use results of chunk) runtime: 11m21s,    gctime: 2m00s,
>>>>>> >>>>>>>>>> > systime:
>>>>>> >>>>>>>>>> > 10.1s.
>>>>>> >>>>>>>>>> >
>>>>>> >>>>>>>>>> > eval word_to_stack: runtime: 1h34m21s,    gctime: 9m42s,
>>>>>> >>>>>>>>>> > systime: 9.4s.
>>>>>> >>>>>>>>>> >
>>>>>> >>>>>>>>>> > stack_alloc: runtime: 31m55s,    gctime: 30.3s,
>>>>>>  systime:
>>>>>> >>>>>>>>>> > 5.4s.
>>>>>> >>>>>>>>>> >
>>>>>> >>>>>>>>>> > stack_remove: runtime: 25m43s,    gctime: 1m32s,
>>>>>>  systime:
>>>>>> >>>>>>>>>> > 6.9s.
>>>>>> >>>>>>>>>> >
>>>>>> >>>>>>>>>> > stack_names: runtime: 22m03s,    gctime: 3m01s,
>>>>>>  systime:
>>>>>> >>>>>>>>>> > 6.6s.
>>>>>> >>>>>>>>>> >
>>>>>> >>>>>>>>>> > stack_to_lab: runtime: 56m00s,    gctime: 4m38s,
>>>>>>  systime:
>>>>>> >>>>>>>>>> > 23.8s.
>>>>>> >>>>>>>>>> >
>>>>>> >>>>>>>>>> > What exactly these different phases are doing can be
>>>>>> found by
>>>>>> >>>>>>>>>> > looking at
>>>>>> >>>>>>>>>> >
>>>>>> >>>>>>>>>> > https://github.com/CakeML/cake
>>>>>> ml/blob/master/compiler/bootstrap/evaluation/to_lab_x64Boots
>>>>>> trapScript.sml.
>>>>>> >>>>>>>>>> > I think the one confusingly called "chunk" means
>>>>>> checking the
>>>>>> >>>>>>>>>> > results of the
>>>>>> >>>>>>>>>> > external reg. alloc oracle.
>>>>>> >>>>>>>>>> >
>>>>>> >>>>>>>>>> > _______________________________________________
>>>>>> >>>>>>>>>> > Dev mailing list
>>>>>> >>>>>>>>>> > Dev at cakeml.org
>>>>>> >>>>>>>>>> > https://lists.cakeml.org/listinfo/dev
>>>>>> >>>>>>>>>> >
>>>>>> >>>>>>>>>
>>>>>> >>>>>>>>>
>>>>>> >>>>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>> _______________________________________________
>>>>>> >>>>>>> Dev mailing list
>>>>>> >>>>>>> Dev at cakeml.org
>>>>>> >>>>>>> https://lists.cakeml.org/listinfo/dev
>>>>>> >>>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>
>>>>>> >>>>
>>>>>> >>>
>>>>>> >>
>>>>>> >>
>>>>>> >> _______________________________________________
>>>>>> >> Developers mailing list
>>>>>> >> Developers at cakeml.org
>>>>>> >> https://lists.cakeml.org/listinfo/developers
>>>>>> >>
>>>>>> >
>>>>>> >
>>>>>> > _______________________________________________
>>>>>> > Developers mailing list
>>>>>> > Developers at cakeml.org
>>>>>> > https://lists.cakeml.org/listinfo/developers
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.cakeml.org/pipermail/developers/attachments/20161124/7f078eee/attachment-0001.html>


More information about the Developers mailing list