[CakeML-Dev] Bootstrap evaluation timings

Yong Kiam tanyongkiam at gmail.com
Sun Nov 27 00:09:42 UTC 2016


Here's a summary of the issue:

The latest change to put Equal into data_to_word currently produces
unreachable code (GiveUp).

These unreachable bits of code goes through wordLang, but the allocator and
SSA don't touch it because the code refers to variables that are
"uninitialized". This ended up producing massive stack frames in the
bootstrap (upwards of 1000 variables on stack) going into word_to_stack
because the variables were just passed in as is.

My latest commit fixes this in wordLang, and hopefully the time for
word_to_stack goes back down to something more reasonable (untested).


On Sat, Nov 26, 2016 at 6:11 PM, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
wrote:

> It looks like exp_cut did make a difference. Latest timings here:
> https://cakeml.org/bootstrap-timing.txt
>
> Also, the asm_ok check removal has made a difference: sec_ok is only
> ~6hours (previously it has always been >10 and sometimes >18), as
> originally hoped.
>
> The longest thing in the latest timings is word_to_stack (8hrs), but as
> we've been discussing separately, there are fixes to that coming soon...
>
> Other things to consider: do any passes look like they're taking
> unexpectedly long, or there might be relatively easy tweaks to speed them
> up? Could other tweaks to the configuration parameters improve the speed?
> (e.g., set inline even lower)
>
>
> On 23 November 2016 at 11:45, Yong Kiam <tanyongkiam at gmail.com> wrote:
>
>> I went through the commit history on github, and here is a list of
>> changes since 1a939d3 that looked like they might affect something (based
>> on the title):
>>
>> 55d5fc0: Update unverified register allocator
>> 63cbd66: Removed asm_ok
>> 98188ad: Changed default bvl_to_bvi configurations
>> 66ddd3d: exh_reorder
>> 8c2f5ae: load_opt
>>
>> I also found that the eval/x64 directory could probably be deleted (the
>> default configs for every backend is also defined in targets/configTheory).
>> The only difference in the x64_compiler_config defined there and the one in
>> configTheory is pad_bits for data_to_word though. I think it is only there
>> because otherwise all the backends have to be built to eval anything using
>> the x64 configuration.
>>
>> I also reduced the value for exp_cut locally on the nqueens benchmark,
>> but I didn't any meaningful difference in compile times (the benchmark is
>> probably too small though).
>>
>>
>>
>> On Tue, Nov 22, 2016 at 4:42 PM, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
>> wrote:
>>
>>> Preliminary results with inline_size_limit set back down to 3 indicate
>>> that word_to_stack is still taking longer than before, specifically 3.5
>>> hours as opposed to the 1.5 hours of two months ago. Could the exp_cut
>>> parameter also be relevant?
>>>
>>> On 22 November 2016 at 16:30, Ramana Kumar <Ramana.Kumar at cl.cam.ac.uk>
>>> wrote:
>>>
>>>> Some updated times (now included program sizes) are up (
>>>> https://cakeml.org/bootstrap-timing.txt). Things are still
>>>> surprisingly slow (word_to_stack, sec_ok) - guess is still that it's
>>>> because of inline limit, so the next thing to try is changing the inline
>>>> limit..
>>>>
>>>> Is conf_ok required if I change the inline_limit just for the
>>>> bootstrap? Isn't it easy to show that conf_ok isn't affected by that config
>>>> field?
>>>>
>>>> On 18 November 2016 at 10:39, Yong Kiam <tanyongkiam at gmail.com> wrote:
>>>>
>>>>> It shouldn't be very hard... I just thought it wouldn't be high
>>>>> priority since the remaining ones are only in LabAsms.
>>>>>
>>>>> Changing the config there seems right. You won't get a conf_ok proof
>>>>> about it though.
>>>>>
>>>>> On Thu, Nov 17, 2016 at 5:46 PM, Ramana Kumar <
>>>>> Ramana.Kumar at cl.cam.ac.uk> wrote:
>>>>>
>>>>>> I've seen the word_to_stack slowdown twice (although it only shows up
>>>>>> in the timings document once - I didn't record the other data).
>>>>>>
>>>>>> I think removing more asm_ok is always a good idea, but I'm not sure
>>>>>> how much priority it should be given - how hard is it to remove more?
>>>>>>
>>>>>> I guess after the current run finishes I could try changing the
>>>>>> inline limit (I would do this by using a different config here
>>>>>> https://github.com/CakeML/cakeml/blob/master/compiler/bootst
>>>>>> rap/evaluation/to_lab_x64BootstrapScript.sml#L83 - does that seem
>>>>>> right?)
>>>>>>
>>>>>> On 18 November 2016 at 09:36, Magnus Myreen <magnus.myreen at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I think the inline_size_limit of 10 is potentially risky. I suggest
>>>>>>> the bootstrap is run with a lower setting, e.g. 2 or 3. -- Magnus
>>>>>>>
>>>>>>> On 18 November 2016 at 09:26, Yong Kiam <tanyongkiam at gmail.com>
>>>>>>> wrote:
>>>>>>> > Argh, that's a shame. Would moving more of asm_ok up help? There
>>>>>>> are still
>>>>>>> > some things in LabAsms that can be removed.
>>>>>>> >
>>>>>>> > I'm not sure know why word_to_stack -specifically- would slow down
>>>>>>> by so
>>>>>>> > much. Could it be a one-off thing or did you hit that slowdown
>>>>>>> multiple
>>>>>>> > times already?
>>>>>>> >
>>>>>>> > Another possibility: changes to the default compiler configuration
>>>>>>> could
>>>>>>> > have slowed it down.
>>>>>>> >
>>>>>>> > For example this:
>>>>>>> > https://github.com/CakeML/cakeml/commit/98188addc34e5c3a0345
>>>>>>> 72c10eca7b3b41aff71c
>>>>>>> >
>>>>>>> > (e.g. maybe more things got inlined)
>>>>>>> >
>>>>>>> > On Thu, Nov 17, 2016 at 5:09 PM, Ramana Kumar <
>>>>>>> Ramana.Kumar at cl.cam.ac.uk>
>>>>>>> > wrote:
>>>>>>> >>
>>>>>>> >> I will put the timings from various runs of the bootstrap on this
>>>>>>> web page
>>>>>>> >> so we can more easily refer to them:
>>>>>>> >> https://cakeml.org/bootstrap-timing.txt.
>>>>>>> >> I've added partial timings for a recent revision (still waiting
>>>>>>> on the
>>>>>>> >> rest).
>>>>>>> >> There is currently a problem with word_to_stack, which is now
>>>>>>> taking 5
>>>>>>> >> hours where it used to take 1.
>>>>>>> >> And given that it seems to be stuck on "sec_ok" I'm wondering how
>>>>>>> much the
>>>>>>> >> asm_ok Asm check removal has really helped... I guess it was
>>>>>>> supposed to
>>>>>>> >> bring it down to 6h, so we'll see...
>>>>>>> >>
>>>>>>> >> On 29 September 2016 at 10:47, Ramana Kumar <
>>>>>>> Ramana.Kumar at cl.cam.ac.uk>
>>>>>>> >> wrote:
>>>>>>> >>>
>>>>>>> >>> I have tried running the "sec_ok" part of the bootstrap with the
>>>>>>> asm_ok
>>>>>>> >>> checks for Asm lines turned off. Two things to report:
>>>>>>> >>>
>>>>>>> >>> The time is significantly improved:
>>>>>>> >>> runtime: 6h02m19s,    gctime: 30m06s,     systime: 3m11s.
>>>>>>> >>> (compared to previously: 10h46m09s,    gctime: 1h07m37s,
>>>>>>>  systime:
>>>>>>> >>> 5m21s.)
>>>>>>> >>>
>>>>>>> >>> Secondly, this timing information is bogus: the wall clock time
>>>>>>> was
>>>>>>> >>> definitely under 2 hours (I didn't check precisely). I expect
>>>>>>> timing for
>>>>>>> >>> threads in parallel execution are added sequentially. So for the
>>>>>>> parts of
>>>>>>> >>> the bootstrap run in parallel, the timings probably need to be
>>>>>>> divided by up
>>>>>>> >>> to 8 to get an accurate picture. Perhaps I should annotate which
>>>>>>> ones are in
>>>>>>> >>> parallel and which not.
>>>>>>> >>>
>>>>>>> >>> I think it's worthwhile removing the asm_ok checks. I can help
>>>>>>> doing so
>>>>>>> >>> if someone else sets up some cheated theorems to fill in.
>>>>>>> >>>
>>>>>>> >>> On 8 September 2016 at 14:33, Ramana Kumar <
>>>>>>> Ramana.Kumar at cl.cam.ac.uk>
>>>>>>> >>> wrote:
>>>>>>> >>>>
>>>>>>> >>>> Here are some timings now including the last step
>>>>>>> (x64Bootstrap).
>>>>>>> >>>> (The commit number doesn't exactly match the output labels,
>>>>>>> since I
>>>>>>> >>>> changed some of that (about to be committed) in the middle.)
>>>>>>> >>>>
>>>>>>> >>>> Here we can see that the biggest problem (>10 hours) is the
>>>>>>> checking
>>>>>>> >>>> that all the lines in the final ASM code satisfy line_ok_light
>>>>>>> (i.e., the
>>>>>>> >>>> "sec_ok" thing below). But there are a few steps that take 1-2
>>>>>>> hours each,
>>>>>>> >>>> and then a few more that take 20-40 mins, so ... definitely
>>>>>>> more than an
>>>>>>> >>>> overnight job still...
>>>>>>> >>>>
>>>>>>> >>>> 1a939d3
>>>>>>> >>>>
>>>>>>> >>>> eval to_mod: runtime: 1m49s,    gctime: 3.4s,     systime:
>>>>>>> 0.35667s.
>>>>>>> >>>> eval to_con: runtime: 8m56s,    gctime: 5.5s,     systime:
>>>>>>> 0.84000s.
>>>>>>> >>>> eval to_dec: runtime: 7.4s,    gctime: 2.4s,     systime:
>>>>>>> 0.15667s.
>>>>>>> >>>> eval to_exh: runtime: 3m32s,    gctime: 8.9s,     systime:
>>>>>>> 0.68667s.
>>>>>>> >>>> eval to_pat: runtime: 14.1s,    gctime: 1.4s,     systime:
>>>>>>> 0.06667s.
>>>>>>> >>>> eval to_clos: runtime: 5.3s,    gctime: 1.1s,     systime:
>>>>>>> 0.05667s.
>>>>>>> >>>> eval to_bvl: runtime: 2m36s,    gctime: 12.6s,     systime:
>>>>>>> 0.88000s.
>>>>>>> >>>> eval to_bvi:
>>>>>>> >>>> ... inline: runtime: 6.1s,    gctime: 1.2s,     systime:
>>>>>>> 0.09333s.
>>>>>>> >>>> ... optimise: runtime: 24m53s,    gctime: 1m53s,     systime:
>>>>>>> 17.1s.
>>>>>>> >>>> ... compile: runtime: 2m27s,    gctime: 11.9s,     systime:
>>>>>>> 1.5s.
>>>>>>> >>>> eval to_data: runtime: 8m11s,    gctime: 21.4s,     systime:
>>>>>>> 1.2s.
>>>>>>> >>>> eval data_to_word: runtime: 8m28s,    gctime: 30.4s,
>>>>>>>  systime: 2.4s.
>>>>>>> >>>>
>>>>>>> >>>> word_to_word: runtime: 1h00m52s,    gctime: 5m53s,     systime:
>>>>>>> 34.1s.
>>>>>>> >>>>
>>>>>>> >>>> clash: runtime: 5m06s,    gctime: 22.3s,     systime: 2.8s.
>>>>>>> >>>> (external oracle) runtime: 2m31s,    gctime: 1.4s,     systime:
>>>>>>> >>>> 0.77667s.
>>>>>>> >>>>
>>>>>>> >>>> chunk: runtime: 1h08m52s,    gctime: 6m06s,     systime: 28.2s.
>>>>>>> >>>> (chunk results) runtime: 11m49s,    gctime: 1m13s,     systime:
>>>>>>> 7.2s.
>>>>>>> >>>>
>>>>>>> >>>> eval word_to_stack: runtime: 1h35m38s,    gctime: 3m53s,
>>>>>>>  systime:
>>>>>>> >>>> 9.4s.
>>>>>>> >>>>
>>>>>>> >>>> stack_alloc: runtime: 23m27s,    gctime: 45.7s,     systime:
>>>>>>> 6.7s.
>>>>>>> >>>> stack_remove: runtime: 19m15s,    gctime: 21.5s,     systime:
>>>>>>> 4.7s.
>>>>>>> >>>> stack_names: runtime: 15m20s,    gctime: 24.9s,     systime:
>>>>>>> 3.2s.
>>>>>>> >>>> stack_to_lab: runtime: 42m12s,    gctime: 9m27s,     systime:
>>>>>>> 10.4s.
>>>>>>> >>>>
>>>>>>> >>>> filter_skip: runtime: 3m56s,    gctime: 24.7s,     systime:
>>>>>>> 3.2s.
>>>>>>> >>>> ffi_limit: runtime: 2m33s,    gctime: 1m39s,     systime: 3.2s.
>>>>>>> >>>> enc_sec: runtime: 25m01s,    gctime: 1m40s,     systime: 11.0s.
>>>>>>> >>>> sec_length: runtime: 4m11s,    gctime: 48.9s,     systime: 3.5s.
>>>>>>> >>>>
>>>>>>> >>>> compute_labels: runtime: 1h17m12s,    gctime: 13m50s,
>>>>>>>  systime:
>>>>>>> >>>> 58.7s.
>>>>>>> >>>>
>>>>>>> >>>> enc_secs_again: runtime: 2h00m15s,    gctime: 24m27s,
>>>>>>>  systime:
>>>>>>> >>>> 1m46s.
>>>>>>> >>>>
>>>>>>> >>>> upd_lab_len: runtime: 5m51s,    gctime: 55.9s,     systime:
>>>>>>> 3.4s.
>>>>>>> >>>> sec_length2: runtime: 3m32s,    gctime: 8.4s,     systime: 1.2s.
>>>>>>> >>>>
>>>>>>> >>>> compute_labels2: runtime: 1h27m09s,    gctime: 22m29s,
>>>>>>>  systime:
>>>>>>> >>>> 1m17s.
>>>>>>> >>>>
>>>>>>> >>>> enc_secs_again2: runtime: 2h37m41s,    gctime: 1h03m15s,
>>>>>>>  systime:
>>>>>>> >>>> 3m01s.
>>>>>>> >>>>
>>>>>>> >>>> pad_sectionruntime: 8m47s,    gctime: 35.8s,     systime: 3.8s.
>>>>>>> >>>>
>>>>>>> >>>> sec_ok: runtime: 10h46m09s,    gctime: 1h07m37s,     systime:
>>>>>>> 5m21s.
>>>>>>> >>>>
>>>>>>> >>>> all_secs_ok: runtime: 4.5s,    gctime: 0.07667s,     systime:
>>>>>>> 0.00333s.
>>>>>>> >>>> prog_to_bytes: runtime: 7m27s,    gctime: 2m23s,     systime:
>>>>>>> 6.3s.
>>>>>>> >>>>
>>>>>>> >>>> flat_bytes: runtime: 45m21s,    gctime: 8m25s,     systime:
>>>>>>> 8.2s.
>>>>>>> >>>> expand_defs: runtime: 49m25s,    gctime: 31m53s,     systime:
>>>>>>> 1.6s.
>>>>>>> >>>>
>>>>>>> >>>> On 7 September 2016 at 08:43, Ramana Kumar <
>>>>>>> Ramana.Kumar at cl.cam.ac.uk>
>>>>>>> >>>> wrote:
>>>>>>> >>>>>
>>>>>>> >>>>> More timing for a later commit. I believe this one includes
>>>>>>> Magnus's
>>>>>>> >>>>> changes to add compile_seqs, which seem not to have had much
>>>>>>> effect. (I'm
>>>>>>> >>>>> hoping I haven't messed up the timing somehow...  it's
>>>>>>> surprising how stable
>>>>>>> >>>>> they are...)
>>>>>>> >>>>>
>>>>>>> >>>>> commit: 83b3aaf
>>>>>>> >>>>>
>>>>>>> >>>>> eval to_mod: runtime: 1m50s,    gctime: 5.8s,     systime:
>>>>>>> 1.0s.
>>>>>>> >>>>> eval to_con: runtime: 8m41s,    gctime: 4.2s,     systime:
>>>>>>> 0.65000s.
>>>>>>> >>>>> eval to_dec: runtime: 11.5s,    gctime: 6.4s,     systime:
>>>>>>> 0.46333s.
>>>>>>> >>>>> eval to_exh: runtime: 3m27s,    gctime: 6.8s,     systime:
>>>>>>> 0.89000s.
>>>>>>> >>>>> eval to_pat: runtime: 14.7s,    gctime: 1.7s,     systime:
>>>>>>> 0.11667s.
>>>>>>> >>>>> eval to_clos: runtime: 5.5s,    gctime: 1.1s,     systime:
>>>>>>> 0.06333s.
>>>>>>> >>>>> eval to_bvl: runtime: 2m29s,    gctime: 5.6s,     systime:
>>>>>>> 0.28333s.
>>>>>>> >>>>> eval to_bvi:
>>>>>>> >>>>> ... inline: runtime: 6.3s,    gctime: 1.3s,     systime:
>>>>>>> 0.06333s.
>>>>>>> >>>>> ... optimise: runtime: 23m26s,    gctime: 28.8s,     systime:
>>>>>>> 6.3s.
>>>>>>> >>>>> ... compile: runtime: 2m18s,    gctime: 4.1s,     systime:
>>>>>>> 0.25000s.
>>>>>>> >>>>> eval to_data: runtime: 8m23s,    gctime: 44.3s,     systime:
>>>>>>> 3.2s.
>>>>>>> >>>>> eval data_to_word: runtime: 8m23s,    gctime: 39.3s,
>>>>>>>  systime: 3.1s.
>>>>>>> >>>>>
>>>>>>> >>>>> word_to_word: runtime: 58m18s,    gctime: 4m53s,     systime:
>>>>>>> 30.1s.
>>>>>>> >>>>>
>>>>>>> >>>>> clash: runtime: 5m01s,    gctime: 18.4s,     systime: 2.5s.
>>>>>>> >>>>> (external oracle) runtime: 2m31s,    gctime: 1.9s,     systime:
>>>>>>> >>>>> 0.20667s.
>>>>>>> >>>>>
>>>>>>> >>>>> chunk: runtime: 1h01m18s,    gctime: 1m13s,     systime: 22.9s.
>>>>>>> >>>>> (chunk results) runtime: 14m26s,    gctime: 5m57s,
>>>>>>>  systime: 9.1s.
>>>>>>> >>>>>
>>>>>>> >>>>> eval word_to_stack: runtime: 1h29m05s,    gctime: 4m38s,
>>>>>>>  systime:
>>>>>>> >>>>> 14.4s.
>>>>>>> >>>>>
>>>>>>> >>>>> stack_alloc: runtime: 23m03s,    gctime: 58.3s,     systime:
>>>>>>> 5.8s.
>>>>>>> >>>>> stack_remove: runtime: 20m09s,    gctime: 1m29s,     systime:
>>>>>>> 6.5s.
>>>>>>> >>>>> stack_names: runtime: 15m05s,    gctime: 41.6s,     systime:
>>>>>>> 3.9s.
>>>>>>> >>>>> stack_to_lab: runtime: 35m11s,    gctime: 2m54s,     systime:
>>>>>>> 12.0s.
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> On 6 September 2016 at 10:11, Yong Kiam <tanyongkiam at gmail.com>
>>>>>>> wrote:
>>>>>>> >>>>>>
>>>>>>> >>>>>> Hmm, that is surprising. I expected the main function to have
>>>>>>> a good
>>>>>>> >>>>>> number of If-s (which my change should have helped with), but
>>>>>>> it seems like
>>>>>>> >>>>>> that didn't affect the time at all.
>>>>>>> >>>>>>
>>>>>>> >>>>>> Here are two possibilities:
>>>>>>> >>>>>>
>>>>>>> >>>>>> 1) Caching lookups into the colouring function for the chunk
>>>>>>> step. The
>>>>>>> >>>>>> function is defined as an option-lookup into an oracle sptree:
>>>>>>> >>>>>> https://github.com/CakeML/cakeml/blob/9937677d34446adace9f29
>>>>>>> d5719131f9a5b4aeac/compiler/backend/reg_alloc/reg_allocScrip
>>>>>>> t.sml#L110
>>>>>>> >>>>>> I'm not sure how much that will help though, since the
>>>>>>> variable names
>>>>>>> >>>>>> should be in SSA already, so it is mostly looking up
>>>>>>> different things each
>>>>>>> >>>>>> time.
>>>>>>> >>>>>>
>>>>>>> >>>>>> 2) Make splitting more aggressive, i.e. the main function
>>>>>>> should be
>>>>>>> >>>>>> divided into more pieces so that its colouring sptree isn't
>>>>>>> too big (it gets
>>>>>>> >>>>>> larger when there are more variables too).
>>>>>>> >>>>>>
>>>>>>> >>>>>> Also, are you getting some kind of effect where every
>>>>>>> parallel thread
>>>>>>> >>>>>> is just waiting for one to finish (namely the one working on
>>>>>>> the main
>>>>>>> >>>>>> function)? 2) might help divide the work more evenly.
>>>>>>> >>>>>>
>>>>>>> >>>>>>
>>>>>>> >>>>>> On Mon, Sep 5, 2016 at 7:45 PM, Ramana Kumar
>>>>>>> >>>>>> <Ramana.Kumar at cl.cam.ac.uk> wrote:
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> Some more timings, for two recent commits. Not a big
>>>>>>> difference, it
>>>>>>> >>>>>>> seems.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> commit: 69ac2f9
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> eval to_mod: runtime: 1m45s,    gctime: 3.8s,     systime:
>>>>>>> 0.36667s.
>>>>>>> >>>>>>> eval to_con: runtime: 8m36s,    gctime: 5.5s,     systime:
>>>>>>> 0.90667s.
>>>>>>> >>>>>>> eval to_dec: runtime: 7.3s,    gctime: 2.3s,     systime:
>>>>>>> 0.17000s.
>>>>>>> >>>>>>> eval to_exh: runtime: 3m27s,    gctime: 9.5s,     systime:
>>>>>>> 1.1s.
>>>>>>> >>>>>>> eval to_pat: runtime: 13.9s,    gctime: 1.5s,     systime:
>>>>>>> 0.10333s.
>>>>>>> >>>>>>> eval to_clos: runtime: 5.2s,    gctime: 1.1s,     systime:
>>>>>>> 0.07000s.
>>>>>>> >>>>>>> eval to_bvl: runtime: 2m28s,    gctime: 5.5s,     systime:
>>>>>>> 0.22667s.
>>>>>>> >>>>>>> eval to_bvi: ... inline: runtime: 6.0s,    gctime: 1.3s,
>>>>>>>  systime:
>>>>>>> >>>>>>> 0.09000s.
>>>>>>> >>>>>>> ... optimise:runtime: 23m27s,    gctime: 6.9s,     systime:
>>>>>>> 3.2s.
>>>>>>> >>>>>>> ... compile: runtime: 2m33s,    gctime: 19.3s,     systime:
>>>>>>> 1.2s.
>>>>>>> >>>>>>> eval to_data: runtime: 8m29s,    gctime: 51.5s,     systime:
>>>>>>> 3.4s.
>>>>>>> >>>>>>> eval data_to_word: runtime: 8m41s,    gctime: 50.2s,
>>>>>>>  systime:
>>>>>>> >>>>>>> 3.4s.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> word_to_word: runtime: 58m21s,    gctime: 4m28s,
>>>>>>>  systime: 28.0s.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> clash: runtime: 4m58s,    gctime: 15.5s,     systime: 2.7s.
>>>>>>> >>>>>>> (external oracle) runtime: 2m32s,    gctime: 1.8s,
>>>>>>>  systime:
>>>>>>> >>>>>>> 0.24000s.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> chunk: runtime: 1h13m05s,    gctime: 6m06s,     systime:
>>>>>>> 26.9s.
>>>>>>> >>>>>>> (chunk results) runtime: 13m31s,    gctime: 2m47s,
>>>>>>>  systime:
>>>>>>> >>>>>>> 14.4s.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> eval word_to_stack: runtime: 1h32m52s,    gctime: 6m12s,
>>>>>>>  systime:
>>>>>>> >>>>>>> 12.7s.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> stack_alloc: runtime: 23m03s,    gctime: 42.2s,     systime:
>>>>>>> 5.8s.
>>>>>>> >>>>>>> stack_remove: runtime: 19m08s,    gctime: 22.7s,
>>>>>>>  systime: 4.3s.
>>>>>>> >>>>>>> stack_names: runtime: 15m45s,    gctime: 1m05s,     systime:
>>>>>>> 5.3s.
>>>>>>> >>>>>>> stack_to_lab: runtime: 41m59s,    gctime: 9m22s,
>>>>>>>  systime: 8.6s.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> commit: 9937677
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> eval to_mod: runtime: 1m51s,    gctime: 5.9s,     systime:
>>>>>>> 0.90000s.
>>>>>>> >>>>>>> eval to_con: runtime: 8m48s,    gctime: 4.4s,     systime:
>>>>>>> 0.71667s.
>>>>>>> >>>>>>> eval to_dec: runtime: 7.2s,    gctime: 2.3s,     systime:
>>>>>>> 0.15667s.
>>>>>>> >>>>>>> eval to_exh: runtime: 3m30s,    gctime: 9.5s,     systime:
>>>>>>> 1.2s.
>>>>>>> >>>>>>> eval to_pat: runtime: 14.3s,    gctime: 1.6s,     systime:
>>>>>>> 0.09000s.
>>>>>>> >>>>>>> eval to_clos: runtime: 5.2s,    gctime: 0.95667s,
>>>>>>>  systime:
>>>>>>> >>>>>>> 0.07333s.
>>>>>>> >>>>>>> eval to_bvl: runtime: 2m28s,    gctime: 5.2s,     systime:
>>>>>>> 0.32667s.
>>>>>>> >>>>>>> eval to_bvi: ... inline: runtime: 5.8s,    gctime: 0.88000s,
>>>>>>> >>>>>>> systime: 0.05333s.
>>>>>>> >>>>>>> ... optimise: runtime: 23m24s,    gctime: 5.8s,     systime:
>>>>>>> 3.0s.
>>>>>>> >>>>>>> ... compile: runtime: 2m27s,    gctime: 12.5s,     systime:
>>>>>>> 0.78333s.
>>>>>>> >>>>>>> eval to_data: runtime: 8m18s,    gctime: 34.2s,     systime:
>>>>>>> 3.0s.
>>>>>>> >>>>>>> eval data_to_word: runtime: 8m24s,    gctime: 37.4s,
>>>>>>>  systime:
>>>>>>> >>>>>>> 2.8s.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> word_to_word: runtime: 57m40s,    gctime: 4m56s,
>>>>>>>  systime: 30.1s.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> clash: runtime: 5m12s,    gctime: 33.6s,     systime: 3.9s.
>>>>>>> >>>>>>> (external oracle) runtime: 2m31s,    gctime: 1.8s,
>>>>>>>  systime:
>>>>>>> >>>>>>> 0.15667s.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> chunk: runtime: 1h07m54s,    gctime: 6m07s,     systime:
>>>>>>> 29.5s.
>>>>>>> >>>>>>> (chunk results) runtime: 13m21s,    gctime: 2m54s,
>>>>>>>  systime:
>>>>>>> >>>>>>> 14.0s.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> eval word_to_stack: runtime: 1h34m00s,    gctime: 7m32s,
>>>>>>>  systime:
>>>>>>> >>>>>>> 8.1s.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> stack_alloc: runtime: 22m35s,    gctime: 18.7s,     systime:
>>>>>>> 3.7s.
>>>>>>> >>>>>>> stack_remove: runtime: 19m04s,    gctime: 16.7s,
>>>>>>>  systime: 3.9s.
>>>>>>> >>>>>>> stack_names: runtime: 15m13s,    gctime: 37.8s,     systime:
>>>>>>> 4.0s.
>>>>>>> >>>>>>> stack_to_lab: runtime: 36m21s,    gctime: 3m29s,
>>>>>>>  systime: 13.9s.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> On 1 September 2016 at 09:04, Ramana Kumar
>>>>>>> >>>>>>> <Ramana.Kumar at cl.cam.ac.uk> wrote:
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Also, updated timings for the middle bit, as of 4d0d56b
>>>>>>> (which is
>>>>>>> >>>>>>>> also for the front half timings in my previous email).
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> eval data_to_word: runtime: 7m42s,    gctime: 35.7s,
>>>>>>>  systime:
>>>>>>> >>>>>>>> 2.7s.
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> word_to_word: runtime: 56m07s,    gctime: 4m48s,
>>>>>>>  systime: 28.7s.
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> clash: runtime: 4m48s,    gctime: 16.0s,     systime: 2.2s.
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> (external oracle) runtime: 2m21s,    gctime: 1.7s,
>>>>>>>  systime:
>>>>>>> >>>>>>>> 0.10667s.
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> chunk: runtime: 1h10m00s,    gctime: 6m06s,     systime:
>>>>>>> 27.1s.
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> (use chunk) runtime: 12m26s,    gctime: 2m29s,     systime:
>>>>>>> 11.0s.
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> eval word_to_stack: runtime: 1h29m28s,    gctime: 7m23s,
>>>>>>> >>>>>>>> systime: 8.0s.
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> stack_alloc: runtime: 21m13s,    gctime: 24.3s,
>>>>>>>  systime: 3.9s.
>>>>>>> >>>>>>>> stack_remove: runtime: 18m12s,    gctime: 46.6s,
>>>>>>>  systime: 4.3s.
>>>>>>> >>>>>>>> stack_names: runtime: 14m08s,    gctime: 32.1s,
>>>>>>>  systime: 4.1s.
>>>>>>> >>>>>>>> stack_to_lab: runtime: 36m41s,    gctime: 1m19s,
>>>>>>>  systime: 10.6s.
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> It doesn't look like 91d539b made a huge difference.
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> On 1 September 2016 at 08:42, Ramana Kumar
>>>>>>> >>>>>>>> <Ramana.Kumar at cl.cam.ac.uk> wrote:
>>>>>>> >>>>>>>>>
>>>>>>> >>>>>>>>> I have some timings for the front half of the backend now:
>>>>>>> >>>>>>>>> eval to_mod: runtime: 1m45s,    gctime: 9.6s,     systime:
>>>>>>> >>>>>>>>> 0.40000s.
>>>>>>> >>>>>>>>> eval to_con: runtime: 7m00s,    gctime: 5.4s,     systime:
>>>>>>> >>>>>>>>> 0.80667s.
>>>>>>> >>>>>>>>> eval to_dec: runtime: 7.0s,    gctime: 2.3s,     systime:
>>>>>>> 0.13333s.
>>>>>>> >>>>>>>>> eval to_exh: runtime: 3m18s,    gctime: 4.5s,     systime:
>>>>>>> >>>>>>>>> 0.67000s.
>>>>>>> >>>>>>>>> eval to_pat: runtime: 15.9s,    gctime: 3.7s,     systime:
>>>>>>> >>>>>>>>> 0.50667s.
>>>>>>> >>>>>>>>> eval to_clos: runtime: 5.0s,    gctime: 1.1s,     systime:
>>>>>>> >>>>>>>>> 0.07000s.
>>>>>>> >>>>>>>>> eval to_bvl: runtime: 2m18s,    gctime: 5.2s,     systime:
>>>>>>> >>>>>>>>> 0.36000s.
>>>>>>> >>>>>>>>> eval to_bvi:
>>>>>>> >>>>>>>>> ... inline: runtime: 5.8s,    gctime: 1.1s,     systime:
>>>>>>> 0.07667s.
>>>>>>> >>>>>>>>> ... optimise: runtime: 22m52s,    gctime: 30.1s,
>>>>>>>  systime: 6.1s.
>>>>>>> >>>>>>>>> ... compile: runtime: 2m05s,    gctime: 6.9s,     systime:
>>>>>>> >>>>>>>>> 0.72000s.
>>>>>>> >>>>>>>>> eval to_data: runtime: 5m44s,    gctime: 28.8s,
>>>>>>>  systime: 2.1s.
>>>>>>> >>>>>>>>>
>>>>>>> >>>>>>>>> I'm not sure why there are suspicious timings for things
>>>>>>> like
>>>>>>> >>>>>>>>> stack_alloc. One possibility is that the "time" function
>>>>>>> adds up times
>>>>>>> >>>>>>>>> across all parallel threads. In that case, it would make
>>>>>>> sense to divide the
>>>>>>> >>>>>>>>> time by 8. I would need to investigate whether this is
>>>>>>> actually what's going
>>>>>>> >>>>>>>>> on...
>>>>>>> >>>>>>>>>
>>>>>>> >>>>>>>>> On 30 August 2016 at 19:36, Magnus Myreen <
>>>>>>> magnus.myreen at gmail.com>
>>>>>>> >>>>>>>>> wrote:
>>>>>>> >>>>>>>>>>
>>>>>>> >>>>>>>>>> My guess is that evaluation times get slower once we pass
>>>>>>> dataLang
>>>>>>> >>>>>>>>>> because the programs become a lot larger once the data
>>>>>>> abstraction
>>>>>>> >>>>>>>>>> has
>>>>>>> >>>>>>>>>> been removed.
>>>>>>> >>>>>>>>>>
>>>>>>> >>>>>>>>>> There are however some suspicious looking timings here. I
>>>>>>> note
>>>>>>> >>>>>>>>>> that
>>>>>>> >>>>>>>>>> stack_alloc does near to nothing, but still takes 31
>>>>>>> minutes to
>>>>>>> >>>>>>>>>> run.
>>>>>>> >>>>>>>>>> Here is stack_alloc's main compiler function:
>>>>>>> >>>>>>>>>>
>>>>>>> >>>>>>>>>>
>>>>>>> >>>>>>>>>> https://github.com/CakeML/cake
>>>>>>> ml/blob/master/compiler/backend/stack_allocScript.sml#L166
>>>>>>> >>>>>>>>>>
>>>>>>> >>>>>>>>>> Cheers,
>>>>>>> >>>>>>>>>> Magnus
>>>>>>> >>>>>>>>>>
>>>>>>> >>>>>>>>>> On 30 August 2016 at 03:43, Ramana Kumar
>>>>>>> >>>>>>>>>> <Ramana.Kumar at cl.cam.ac.uk> wrote:
>>>>>>> >>>>>>>>>> > I have some information now on where slow parts of the
>>>>>>> bootstrap
>>>>>>> >>>>>>>>>> > evaluation
>>>>>>> >>>>>>>>>> > are, which we could use to direct effort in case we
>>>>>>> want to
>>>>>>> >>>>>>>>>> > speed things up.
>>>>>>> >>>>>>>>>> >
>>>>>>> >>>>>>>>>> > Currently I have timings for between dataLang and
>>>>>>> labLang. (I
>>>>>>> >>>>>>>>>> > know the
>>>>>>> >>>>>>>>>> > phases above dataLang are mostly quick, and the ones
>>>>>>> after
>>>>>>> >>>>>>>>>> > labLang are
>>>>>>> >>>>>>>>>> > mostly slow, but I will have to get actual numbers
>>>>>>> later)
>>>>>>> >>>>>>>>>> >
>>>>>>> >>>>>>>>>> > eval data_to_word: runtime: 7m48s,    gctime: 21.9s,
>>>>>>> >>>>>>>>>> > systime: 2.1s.
>>>>>>> >>>>>>>>>> >
>>>>>>> >>>>>>>>>> > word_to_word: runtime: 55m07s,    gctime: 5m31s,
>>>>>>>  systime:
>>>>>>> >>>>>>>>>> > 30.9s.
>>>>>>> >>>>>>>>>> >
>>>>>>> >>>>>>>>>> > clash: runtime: 5m33s,    gctime: 34.4s,     systime:
>>>>>>> 4.4s.
>>>>>>> >>>>>>>>>> >
>>>>>>> >>>>>>>>>> > (external oracle) runtime: 2m30s,    gctime: 12.1s,
>>>>>>>  systime:
>>>>>>> >>>>>>>>>> > 1.5s
>>>>>>> >>>>>>>>>> >
>>>>>>> >>>>>>>>>> > chunk: runtime: 1h04m14s,    gctime: 1m37s,
>>>>>>>  systime: 20.8s.
>>>>>>> >>>>>>>>>> > (use results of chunk) runtime: 11m21s,    gctime:
>>>>>>> 2m00s,
>>>>>>> >>>>>>>>>> > systime:
>>>>>>> >>>>>>>>>> > 10.1s.
>>>>>>> >>>>>>>>>> >
>>>>>>> >>>>>>>>>> > eval word_to_stack: runtime: 1h34m21s,    gctime: 9m42s,
>>>>>>> >>>>>>>>>> > systime: 9.4s.
>>>>>>> >>>>>>>>>> >
>>>>>>> >>>>>>>>>> > stack_alloc: runtime: 31m55s,    gctime: 30.3s,
>>>>>>>  systime:
>>>>>>> >>>>>>>>>> > 5.4s.
>>>>>>> >>>>>>>>>> >
>>>>>>> >>>>>>>>>> > stack_remove: runtime: 25m43s,    gctime: 1m32s,
>>>>>>>  systime:
>>>>>>> >>>>>>>>>> > 6.9s.
>>>>>>> >>>>>>>>>> >
>>>>>>> >>>>>>>>>> > stack_names: runtime: 22m03s,    gctime: 3m01s,
>>>>>>>  systime:
>>>>>>> >>>>>>>>>> > 6.6s.
>>>>>>> >>>>>>>>>> >
>>>>>>> >>>>>>>>>> > stack_to_lab: runtime: 56m00s,    gctime: 4m38s,
>>>>>>>  systime:
>>>>>>> >>>>>>>>>> > 23.8s.
>>>>>>> >>>>>>>>>> >
>>>>>>> >>>>>>>>>> > What exactly these different phases are doing can be
>>>>>>> found by
>>>>>>> >>>>>>>>>> > looking at
>>>>>>> >>>>>>>>>> >
>>>>>>> >>>>>>>>>> > https://github.com/CakeML/cake
>>>>>>> ml/blob/master/compiler/bootstrap/evaluation/to_lab_x64Boots
>>>>>>> trapScript.sml.
>>>>>>> >>>>>>>>>> > I think the one confusingly called "chunk" means
>>>>>>> checking the
>>>>>>> >>>>>>>>>> > results of the
>>>>>>> >>>>>>>>>> > external reg. alloc oracle.
>>>>>>> >>>>>>>>>> >
>>>>>>> >>>>>>>>>> > _______________________________________________
>>>>>>> >>>>>>>>>> > Dev mailing list
>>>>>>> >>>>>>>>>> > Dev at cakeml.org
>>>>>>> >>>>>>>>>> > https://lists.cakeml.org/listinfo/dev
>>>>>>> >>>>>>>>>> >
>>>>>>> >>>>>>>>>
>>>>>>> >>>>>>>>>
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> _______________________________________________
>>>>>>> >>>>>>> Dev mailing list
>>>>>>> >>>>>>> Dev at cakeml.org
>>>>>>> >>>>>>> https://lists.cakeml.org/listinfo/dev
>>>>>>> >>>>>>>
>>>>>>> >>>>>>
>>>>>>> >>>>>
>>>>>>> >>>>
>>>>>>> >>>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> _______________________________________________
>>>>>>> >> Developers mailing list
>>>>>>> >> Developers at cakeml.org
>>>>>>> >> https://lists.cakeml.org/listinfo/developers
>>>>>>> >>
>>>>>>> >
>>>>>>> >
>>>>>>> > _______________________________________________
>>>>>>> > Developers mailing list
>>>>>>> > Developers at cakeml.org
>>>>>>> > https://lists.cakeml.org/listinfo/developers
>>>>>>> >
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.cakeml.org/pipermail/developers/attachments/20161126/ddb5824a/attachment-0001.html>


More information about the Developers mailing list