<div dir="ltr">I think there may also be a problem since data_to_word got merged (i.e., as of 277495c) with the inst,ssa,two-reg part - it's taking much longer than expected.<br></div><div class="gmail_extra"><br><div class="gmail_quote">On 23 November 2016 at 11:45, Yong Kiam <span dir="ltr"><<a href="mailto:tanyongkiam@gmail.com" target="_blank">tanyongkiam@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div><div><div><div>I went through the commit history on github, and here is a list of changes since 1a939d3 that looked like they might affect something (based on the title):<br><br></div></div></div></div></div>55d5fc0: Update unverified register allocator<br></div><div>63cbd66: Removed asm_ok<br></div>98188ad: Changed default bvl_to_bvi configurations<br></div><div>66ddd3d: exh_reorder<br></div>8c2f5ae: load_opt<br><div><div><div><div><div><div><div><div><br></div><div>I also found that the eval/x64 directory could probably be deleted (the default configs for every backend is also defined in targets/configTheory). The only difference in the x64_compiler_config defined there and the one in configTheory is pad_bits for data_to_word though. I think it is only there because otherwise all the backends have to be built to eval anything using the x64 configuration.<br><br></div><div>I also reduced the value for exp_cut locally on the nqueens benchmark, but I didn't any meaningful difference in compile times (the benchmark is probably too small though).<br><br></div><div><br></div></div></div></div></div></div></div></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Nov 22, 2016 at 4:42 PM, Ramana Kumar <span dir="ltr"><<a href="mailto:Ramana.Kumar@cl.cam.ac.uk" target="_blank">Ramana.Kumar@cl.cam.ac.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Preliminary results with inline_size_limit set back down to 3 indicate that word_to_stack is still taking longer than before, specifically 3.5 hours as opposed to the 1.5 hours of two months ago. Could the exp_cut parameter also be relevant?<br></div><div class="m_4046598807805666937HOEnZb"><div class="m_4046598807805666937h5"><div class="gmail_extra"><br><div class="gmail_quote">On 22 November 2016 at 16:30, Ramana Kumar <span dir="ltr"><<a href="mailto:Ramana.Kumar@cl.cam.ac.uk" target="_blank">Ramana.Kumar@cl.cam.ac.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Some updated times (now included program sizes) are up (<a href="https://cakeml.org/bootstrap-timing.txt" target="_blank">https://cakeml.org/<span class="m_4046598807805666937m_-3216512473160185274m_4821345270521482447il">bootstrap</span>-<span class="m_4046598807805666937m_-3216512473160185274m_4821345270521482447il"><wbr>timing</span>.txt</a>). Things are still surprisingly slow (word_to_stack, sec_ok) - guess is still that it's because of inline limit, so the next thing to try is changing the inline limit..<br><br></div>Is conf_ok required if I change the inline_limit just for the bootstrap? Isn't it easy to show that conf_ok isn't affected by that config field?<br></div><div class="m_4046598807805666937m_-3216512473160185274HOEnZb"><div class="m_4046598807805666937m_-3216512473160185274h5"><div class="gmail_extra"><br><div class="gmail_quote">On 18 November 2016 at 10:39, Yong Kiam <span dir="ltr"><<a href="mailto:tanyongkiam@gmail.com" target="_blank">tanyongkiam@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>It shouldn't be very hard... I just thought it wouldn't be high priority since the remaining ones are only in LabAsms.<br><br></div>Changing the config there seems right. You won't get a conf_ok proof about it though.<br></div><div class="m_4046598807805666937m_-3216512473160185274m_4821345270521482447HOEnZb"><div class="m_4046598807805666937m_-3216512473160185274m_4821345270521482447h5"><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Nov 17, 2016 at 5:46 PM, Ramana Kumar <span dir="ltr"><<a href="mailto:Ramana.Kumar@cl.cam.ac.uk" target="_blank">Ramana.Kumar@cl.cam.ac.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div>I've seen the word_to_stack slowdown twice (although it only shows up in the timings document once - I didn't record the other data).<br><br></div>I think removing more asm_ok is always a good idea, but I'm not sure how much priority it should be given - how hard is it to remove more?<br><br></div>I guess after the current run finishes I could try changing the inline limit (I would do this by using a different config here <a href="https://github.com/CakeML/cakeml/blob/master/compiler/bootstrap/evaluation/to_lab_x64BootstrapScript.sml#L83" target="_blank">https://github.com/CakeML/cake<wbr>ml/blob/master/compiler/bootst<wbr>rap/evaluation/to_lab_x64Boots<wbr>trapScript.sml#L83</a> - does that seem right?)<br></div><div class="m_4046598807805666937m_-3216512473160185274m_4821345270521482447m_-5954245915293851245HOEnZb"><div class="m_4046598807805666937m_-3216512473160185274m_4821345270521482447m_-5954245915293851245h5"><div class="gmail_extra"><br><div class="gmail_quote">On 18 November 2016 at 09:36, Magnus Myreen <span dir="ltr"><<a href="mailto:magnus.myreen@gmail.com" target="_blank">magnus.myreen@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I think the inline_size_limit of 10 is potentially risky. I suggest<br>
the bootstrap is run with a lower setting, e.g. 2 or 3. -- Magnus<br>
<div class="m_4046598807805666937m_-3216512473160185274m_4821345270521482447m_-5954245915293851245m_-2552985646971575345HOEnZb"><div class="m_4046598807805666937m_-3216512473160185274m_4821345270521482447m_-5954245915293851245m_-2552985646971575345h5"><br>
On 18 November 2016 at 09:26, Yong Kiam <<a href="mailto:tanyongkiam@gmail.com" target="_blank">tanyongkiam@gmail.com</a>> wrote:<br>
> Argh, that's a shame. Would moving more of asm_ok up help? There are still<br>
> some things in LabAsms that can be removed.<br>
><br>
> I'm not sure know why word_to_stack -specifically- would slow down by so<br>
> much. Could it be a one-off thing or did you hit that slowdown multiple<br>
> times already?<br>
><br>
> Another possibility: changes to the default compiler configuration could<br>
> have slowed it down.<br>
><br>
> For example this:<br>
> <a href="https://github.com/CakeML/cakeml/commit/98188addc34e5c3a034572c10eca7b3b41aff71c" rel="noreferrer" target="_blank">https://github.com/CakeML/cake<wbr>ml/commit/98188addc34e5c3a0345<wbr>72c10eca7b3b41aff71c</a><br>
><br>
> (e.g. maybe more things got inlined)<br>
><br>
> On Thu, Nov 17, 2016 at 5:09 PM, Ramana Kumar <<a href="mailto:Ramana.Kumar@cl.cam.ac.uk" target="_blank">Ramana.Kumar@cl.cam.ac.uk</a>><br>
> wrote:<br>
>><br>
>> I will put the timings from various runs of the bootstrap on this web page<br>
>> so we can more easily refer to them:<br>
>> <a href="https://cakeml.org/bootstrap-timing.txt" rel="noreferrer" target="_blank">https://cakeml.org/bootstrap-t<wbr>iming.txt</a>.<br>
>> I've added partial timings for a recent revision (still waiting on the<br>
>> rest).<br>
>> There is currently a problem with word_to_stack, which is now taking 5<br>
>> hours where it used to take 1.<br>
>> And given that it seems to be stuck on "sec_ok" I'm wondering how much the<br>
>> asm_ok Asm check removal has really helped... I guess it was supposed to<br>
>> bring it down to 6h, so we'll see...<br>
>><br>
>> On 29 September 2016 at 10:47, Ramana Kumar <<a href="mailto:Ramana.Kumar@cl.cam.ac.uk" target="_blank">Ramana.Kumar@cl.cam.ac.uk</a>><br>
>> wrote:<br>
>>><br>
>>> I have tried running the "sec_ok" part of the bootstrap with the asm_ok<br>
>>> checks for Asm lines turned off. Two things to report:<br>
>>><br>
>>> The time is significantly improved:<br>
>>> runtime: 6h02m19s,    gctime: 30m06s,     systime: 3m11s.<br>
>>> (compared to previously: 10h46m09s,    gctime: 1h07m37s,     systime:<br>
>>> 5m21s.)<br>
>>><br>
>>> Secondly, this timing information is bogus: the wall clock time was<br>
>>> definitely under 2 hours (I didn't check precisely). I expect timing for<br>
>>> threads in parallel execution are added sequentially. So for the parts of<br>
>>> the bootstrap run in parallel, the timings probably need to be divided by up<br>
>>> to 8 to get an accurate picture. Perhaps I should annotate which ones are in<br>
>>> parallel and which not.<br>
>>><br>
>>> I think it's worthwhile removing the asm_ok checks. I can help doing so<br>
>>> if someone else sets up some cheated theorems to fill in.<br>
>>><br>
>>> On 8 September 2016 at 14:33, Ramana Kumar <<a href="mailto:Ramana.Kumar@cl.cam.ac.uk" target="_blank">Ramana.Kumar@cl.cam.ac.uk</a>><br>
>>> wrote:<br>
>>>><br>
>>>> Here are some timings now including the last step (x64Bootstrap).<br>
>>>> (The commit number doesn't exactly match the output labels, since I<br>
>>>> changed some of that (about to be committed) in the middle.)<br>
>>>><br>
>>>> Here we can see that the biggest problem (>10 hours) is the checking<br>
>>>> that all the lines in the final ASM code satisfy line_ok_light (i.e., the<br>
>>>> "sec_ok" thing below). But there are a few steps that take 1-2 hours each,<br>
>>>> and then a few more that take 20-40 mins, so ... definitely more than an<br>
>>>> overnight job still...<br>
>>>><br>
>>>> 1a939d3<br>
>>>><br>
>>>> eval to_mod: runtime: 1m49s,    gctime: 3.4s,     systime: 0.35667s.<br>
>>>> eval to_con: runtime: 8m56s,    gctime: 5.5s,     systime: 0.84000s.<br>
>>>> eval to_dec: runtime: 7.4s,    gctime: 2.4s,     systime: 0.15667s.<br>
>>>> eval to_exh: runtime: 3m32s,    gctime: 8.9s,     systime: 0.68667s.<br>
>>>> eval to_pat: runtime: 14.1s,    gctime: 1.4s,     systime: 0.06667s.<br>
>>>> eval to_clos: runtime: 5.3s,    gctime: 1.1s,     systime: 0.05667s.<br>
>>>> eval to_bvl: runtime: 2m36s,    gctime: 12.6s,     systime: 0.88000s.<br>
>>>> eval to_bvi:<br>
>>>> ... inline: runtime: 6.1s,    gctime: 1.2s,     systime: 0.09333s.<br>
>>>> ... optimise: runtime: 24m53s,    gctime: 1m53s,     systime: 17.1s.<br>
>>>> ... compile: runtime: 2m27s,    gctime: 11.9s,     systime: 1.5s.<br>
>>>> eval to_data: runtime: 8m11s,    gctime: 21.4s,     systime: 1.2s.<br>
>>>> eval data_to_word: runtime: 8m28s,    gctime: 30.4s,     systime: 2.4s.<br>
>>>><br>
>>>> word_to_word: runtime: 1h00m52s,    gctime: 5m53s,     systime: 34.1s.<br>
>>>><br>
>>>> clash: runtime: 5m06s,    gctime: 22.3s,     systime: 2.8s.<br>
>>>> (external oracle) runtime: 2m31s,    gctime: 1.4s,     systime:<br>
>>>> 0.77667s.<br>
>>>><br>
>>>> chunk: runtime: 1h08m52s,    gctime: 6m06s,     systime: 28.2s.<br>
>>>> (chunk results) runtime: 11m49s,    gctime: 1m13s,     systime: 7.2s.<br>
>>>><br>
>>>> eval word_to_stack: runtime: 1h35m38s,    gctime: 3m53s,     systime:<br>
>>>> 9.4s.<br>
>>>><br>
>>>> stack_alloc: runtime: 23m27s,    gctime: 45.7s,     systime: 6.7s.<br>
>>>> stack_remove: runtime: 19m15s,    gctime: 21.5s,     systime: 4.7s.<br>
>>>> stack_names: runtime: 15m20s,    gctime: 24.9s,     systime: 3.2s.<br>
>>>> stack_to_lab: runtime: 42m12s,    gctime: 9m27s,     systime: 10.4s.<br>
>>>><br>
>>>> filter_skip: runtime: 3m56s,    gctime: 24.7s,     systime: 3.2s.<br>
>>>> ffi_limit: runtime: 2m33s,    gctime: 1m39s,     systime: 3.2s.<br>
>>>> enc_sec: runtime: 25m01s,    gctime: 1m40s,     systime: 11.0s.<br>
>>>> sec_length: runtime: 4m11s,    gctime: 48.9s,     systime: 3.5s.<br>
>>>><br>
>>>> compute_labels: runtime: 1h17m12s,    gctime: 13m50s,     systime:<br>
>>>> 58.7s.<br>
>>>><br>
>>>> enc_secs_again: runtime: 2h00m15s,    gctime: 24m27s,     systime:<br>
>>>> 1m46s.<br>
>>>><br>
>>>> upd_lab_len: runtime: 5m51s,    gctime: 55.9s,     systime: 3.4s.<br>
>>>> sec_length2: runtime: 3m32s,    gctime: 8.4s,     systime: 1.2s.<br>
>>>><br>
>>>> compute_labels2: runtime: 1h27m09s,    gctime: 22m29s,     systime:<br>
>>>> 1m17s.<br>
>>>><br>
>>>> enc_secs_again2: runtime: 2h37m41s,    gctime: 1h03m15s,     systime:<br>
>>>> 3m01s.<br>
>>>><br>
>>>> pad_sectionruntime: 8m47s,    gctime: 35.8s,     systime: 3.8s.<br>
>>>><br>
>>>> sec_ok: runtime: 10h46m09s,    gctime: 1h07m37s,     systime: 5m21s.<br>
>>>><br>
>>>> all_secs_ok: runtime: 4.5s,    gctime: 0.07667s,     systime: 0.00333s.<br>
>>>> prog_to_bytes: runtime: 7m27s,    gctime: 2m23s,     systime: 6.3s.<br>
>>>><br>
>>>> flat_bytes: runtime: 45m21s,    gctime: 8m25s,     systime: 8.2s.<br>
>>>> expand_defs: runtime: 49m25s,    gctime: 31m53s,     systime: 1.6s.<br>
>>>><br>
>>>> On 7 September 2016 at 08:43, Ramana Kumar <<a href="mailto:Ramana.Kumar@cl.cam.ac.uk" target="_blank">Ramana.Kumar@cl.cam.ac.uk</a>><br>
>>>> wrote:<br>
>>>>><br>
>>>>> More timing for a later commit. I believe this one includes Magnus's<br>
>>>>> changes to add compile_seqs, which seem not to have had much effect. (I'm<br>
>>>>> hoping I haven't messed up the timing somehow...  it's surprising how stable<br>
>>>>> they are...)<br>
>>>>><br>
>>>>> commit: 83b3aaf<br>
>>>>><br>
>>>>> eval to_mod: runtime: 1m50s,    gctime: 5.8s,     systime: 1.0s.<br>
>>>>> eval to_con: runtime: 8m41s,    gctime: 4.2s,     systime: 0.65000s.<br>
>>>>> eval to_dec: runtime: 11.5s,    gctime: 6.4s,     systime: 0.46333s.<br>
>>>>> eval to_exh: runtime: 3m27s,    gctime: 6.8s,     systime: 0.89000s.<br>
>>>>> eval to_pat: runtime: 14.7s,    gctime: 1.7s,     systime: 0.11667s.<br>
>>>>> eval to_clos: runtime: 5.5s,    gctime: 1.1s,     systime: 0.06333s.<br>
>>>>> eval to_bvl: runtime: 2m29s,    gctime: 5.6s,     systime: 0.28333s.<br>
>>>>> eval to_bvi:<br>
>>>>> ... inline: runtime: 6.3s,    gctime: 1.3s,     systime: 0.06333s.<br>
>>>>> ... optimise: runtime: 23m26s,    gctime: 28.8s,     systime: 6.3s.<br>
>>>>> ... compile: runtime: 2m18s,    gctime: 4.1s,     systime: 0.25000s.<br>
>>>>> eval to_data: runtime: 8m23s,    gctime: 44.3s,     systime: 3.2s.<br>
>>>>> eval data_to_word: runtime: 8m23s,    gctime: 39.3s,     systime: 3.1s.<br>
>>>>><br>
>>>>> word_to_word: runtime: 58m18s,    gctime: 4m53s,     systime: 30.1s.<br>
>>>>><br>
>>>>> clash: runtime: 5m01s,    gctime: 18.4s,     systime: 2.5s.<br>
>>>>> (external oracle) runtime: 2m31s,    gctime: 1.9s,     systime:<br>
>>>>> 0.20667s.<br>
>>>>><br>
>>>>> chunk: runtime: 1h01m18s,    gctime: 1m13s,     systime: 22.9s.<br>
>>>>> (chunk results) runtime: 14m26s,    gctime: 5m57s,     systime: 9.1s.<br>
>>>>><br>
>>>>> eval word_to_stack: runtime: 1h29m05s,    gctime: 4m38s,     systime:<br>
>>>>> 14.4s.<br>
>>>>><br>
>>>>> stack_alloc: runtime: 23m03s,    gctime: 58.3s,     systime: 5.8s.<br>
>>>>> stack_remove: runtime: 20m09s,    gctime: 1m29s,     systime: 6.5s.<br>
>>>>> stack_names: runtime: 15m05s,    gctime: 41.6s,     systime: 3.9s.<br>
>>>>> stack_to_lab: runtime: 35m11s,    gctime: 2m54s,     systime: 12.0s.<br>
>>>>><br>
>>>>><br>
>>>>> On 6 September 2016 at 10:11, Yong Kiam <<a href="mailto:tanyongkiam@gmail.com" target="_blank">tanyongkiam@gmail.com</a>> wrote:<br>
>>>>>><br>
>>>>>> Hmm, that is surprising. I expected the main function to have a good<br>
>>>>>> number of If-s (which my change should have helped with), but it seems like<br>
>>>>>> that didn't affect the time at all.<br>
>>>>>><br>
>>>>>> Here are two possibilities:<br>
>>>>>><br>
>>>>>> 1) Caching lookups into the colouring function for the chunk step. The<br>
>>>>>> function is defined as an option-lookup into an oracle sptree:<br>
>>>>>> <a href="https://github.com/CakeML/cakeml/blob/9937677d34446adace9f29d5719131f9a5b4aeac/compiler/backend/reg_alloc/reg_allocScript.sml#L110" rel="noreferrer" target="_blank">https://github.com/CakeML/cake<wbr>ml/blob/9937677d34446adace9f29<wbr>d5719131f9a5b4aeac/compiler/ba<wbr>ckend/reg_alloc/reg_allocScrip<wbr>t.sml#L110</a><br>
>>>>>> I'm not sure how much that will help though, since the variable names<br>
>>>>>> should be in SSA already, so it is mostly looking up different things each<br>
>>>>>> time.<br>
>>>>>><br>
>>>>>> 2) Make splitting more aggressive, i.e. the main function should be<br>
>>>>>> divided into more pieces so that its colouring sptree isn't too big (it gets<br>
>>>>>> larger when there are more variables too).<br>
>>>>>><br>
>>>>>> Also, are you getting some kind of effect where every parallel thread<br>
>>>>>> is just waiting for one to finish (namely the one working on the main<br>
>>>>>> function)? 2) might help divide the work more evenly.<br>
>>>>>><br>
>>>>>><br>
>>>>>> On Mon, Sep 5, 2016 at 7:45 PM, Ramana Kumar<br>
>>>>>> <<a href="mailto:Ramana.Kumar@cl.cam.ac.uk" target="_blank">Ramana.Kumar@cl.cam.ac.uk</a>> wrote:<br>
>>>>>>><br>
>>>>>>> Some more timings, for two recent commits. Not a big difference, it<br>
>>>>>>> seems.<br>
>>>>>>><br>
>>>>>>> commit: 69ac2f9<br>
>>>>>>><br>
>>>>>>> eval to_mod: runtime: 1m45s,    gctime: 3.8s,     systime: 0.36667s.<br>
>>>>>>> eval to_con: runtime: 8m36s,    gctime: 5.5s,     systime: 0.90667s.<br>
>>>>>>> eval to_dec: runtime: 7.3s,    gctime: 2.3s,     systime: 0.17000s.<br>
>>>>>>> eval to_exh: runtime: 3m27s,    gctime: 9.5s,     systime: 1.1s.<br>
>>>>>>> eval to_pat: runtime: 13.9s,    gctime: 1.5s,     systime: 0.10333s.<br>
>>>>>>> eval to_clos: runtime: 5.2s,    gctime: 1.1s,     systime: 0.07000s.<br>
>>>>>>> eval to_bvl: runtime: 2m28s,    gctime: 5.5s,     systime: 0.22667s.<br>
>>>>>>> eval to_bvi: ... inline: runtime: 6.0s,    gctime: 1.3s,     systime:<br>
>>>>>>> 0.09000s.<br>
>>>>>>> ... optimise:runtime: 23m27s,    gctime: 6.9s,     systime: 3.2s.<br>
>>>>>>> ... compile: runtime: 2m33s,    gctime: 19.3s,     systime: 1.2s.<br>
>>>>>>> eval to_data: runtime: 8m29s,    gctime: 51.5s,     systime: 3.4s.<br>
>>>>>>> eval data_to_word: runtime: 8m41s,    gctime: 50.2s,     systime:<br>
>>>>>>> 3.4s.<br>
>>>>>>><br>
>>>>>>> word_to_word: runtime: 58m21s,    gctime: 4m28s,     systime: 28.0s.<br>
>>>>>>><br>
>>>>>>> clash: runtime: 4m58s,    gctime: 15.5s,     systime: 2.7s.<br>
>>>>>>> (external oracle) runtime: 2m32s,    gctime: 1.8s,     systime:<br>
>>>>>>> 0.24000s.<br>
>>>>>>><br>
>>>>>>> chunk: runtime: 1h13m05s,    gctime: 6m06s,     systime: 26.9s.<br>
>>>>>>> (chunk results) runtime: 13m31s,    gctime: 2m47s,     systime:<br>
>>>>>>> 14.4s.<br>
>>>>>>><br>
>>>>>>> eval word_to_stack: runtime: 1h32m52s,    gctime: 6m12s,     systime:<br>
>>>>>>> 12.7s.<br>
>>>>>>><br>
>>>>>>> stack_alloc: runtime: 23m03s,    gctime: 42.2s,     systime: 5.8s.<br>
>>>>>>> stack_remove: runtime: 19m08s,    gctime: 22.7s,     systime: 4.3s.<br>
>>>>>>> stack_names: runtime: 15m45s,    gctime: 1m05s,     systime: 5.3s.<br>
>>>>>>> stack_to_lab: runtime: 41m59s,    gctime: 9m22s,     systime: 8.6s.<br>
>>>>>>><br>
>>>>>>><br>
>>>>>>> commit: 9937677<br>
>>>>>>><br>
>>>>>>> eval to_mod: runtime: 1m51s,    gctime: 5.9s,     systime: 0.90000s.<br>
>>>>>>> eval to_con: runtime: 8m48s,    gctime: 4.4s,     systime: 0.71667s.<br>
>>>>>>> eval to_dec: runtime: 7.2s,    gctime: 2.3s,     systime: 0.15667s.<br>
>>>>>>> eval to_exh: runtime: 3m30s,    gctime: 9.5s,     systime: 1.2s.<br>
>>>>>>> eval to_pat: runtime: 14.3s,    gctime: 1.6s,     systime: 0.09000s.<br>
>>>>>>> eval to_clos: runtime: 5.2s,    gctime: 0.95667s,     systime:<br>
>>>>>>> 0.07333s.<br>
>>>>>>> eval to_bvl: runtime: 2m28s,    gctime: 5.2s,     systime: 0.32667s.<br>
>>>>>>> eval to_bvi: ... inline: runtime: 5.8s,    gctime: 0.88000s,<br>
>>>>>>> systime: 0.05333s.<br>
>>>>>>> ... optimise: runtime: 23m24s,    gctime: 5.8s,     systime: 3.0s.<br>
>>>>>>> ... compile: runtime: 2m27s,    gctime: 12.5s,     systime: 0.78333s.<br>
>>>>>>> eval to_data: runtime: 8m18s,    gctime: 34.2s,     systime: 3.0s.<br>
>>>>>>> eval data_to_word: runtime: 8m24s,    gctime: 37.4s,     systime:<br>
>>>>>>> 2.8s.<br>
>>>>>>><br>
>>>>>>> word_to_word: runtime: 57m40s,    gctime: 4m56s,     systime: 30.1s.<br>
>>>>>>><br>
>>>>>>> clash: runtime: 5m12s,    gctime: 33.6s,     systime: 3.9s.<br>
>>>>>>> (external oracle) runtime: 2m31s,    gctime: 1.8s,     systime:<br>
>>>>>>> 0.15667s.<br>
>>>>>>><br>
>>>>>>> chunk: runtime: 1h07m54s,    gctime: 6m07s,     systime: 29.5s.<br>
>>>>>>> (chunk results) runtime: 13m21s,    gctime: 2m54s,     systime:<br>
>>>>>>> 14.0s.<br>
>>>>>>><br>
>>>>>>> eval word_to_stack: runtime: 1h34m00s,    gctime: 7m32s,     systime:<br>
>>>>>>> 8.1s.<br>
>>>>>>><br>
>>>>>>> stack_alloc: runtime: 22m35s,    gctime: 18.7s,     systime: 3.7s.<br>
>>>>>>> stack_remove: runtime: 19m04s,    gctime: 16.7s,     systime: 3.9s.<br>
>>>>>>> stack_names: runtime: 15m13s,    gctime: 37.8s,     systime: 4.0s.<br>
>>>>>>> stack_to_lab: runtime: 36m21s,    gctime: 3m29s,     systime: 13.9s.<br>
>>>>>>><br>
>>>>>>><br>
>>>>>>> On 1 September 2016 at 09:04, Ramana Kumar<br>
>>>>>>> <<a href="mailto:Ramana.Kumar@cl.cam.ac.uk" target="_blank">Ramana.Kumar@cl.cam.ac.uk</a>> wrote:<br>
>>>>>>>><br>
>>>>>>>> Also, updated timings for the middle bit, as of 4d0d56b (which is<br>
>>>>>>>> also for the front half timings in my previous email).<br>
>>>>>>>><br>
>>>>>>>> eval data_to_word: runtime: 7m42s,    gctime: 35.7s,     systime:<br>
>>>>>>>> 2.7s.<br>
>>>>>>>><br>
>>>>>>>> word_to_word: runtime: 56m07s,    gctime: 4m48s,     systime: 28.7s.<br>
>>>>>>>><br>
>>>>>>>> clash: runtime: 4m48s,    gctime: 16.0s,     systime: 2.2s.<br>
>>>>>>>><br>
>>>>>>>> (external oracle) runtime: 2m21s,    gctime: 1.7s,     systime:<br>
>>>>>>>> 0.10667s.<br>
>>>>>>>><br>
>>>>>>>> chunk: runtime: 1h10m00s,    gctime: 6m06s,     systime: 27.1s.<br>
>>>>>>>><br>
>>>>>>>> (use chunk) runtime: 12m26s,    gctime: 2m29s,     systime: 11.0s.<br>
>>>>>>>><br>
>>>>>>>> eval word_to_stack: runtime: 1h29m28s,    gctime: 7m23s,<br>
>>>>>>>> systime: 8.0s.<br>
>>>>>>>><br>
>>>>>>>> stack_alloc: runtime: 21m13s,    gctime: 24.3s,     systime: 3.9s.<br>
>>>>>>>> stack_remove: runtime: 18m12s,    gctime: 46.6s,     systime: 4.3s.<br>
>>>>>>>> stack_names: runtime: 14m08s,    gctime: 32.1s,     systime: 4.1s.<br>
>>>>>>>> stack_to_lab: runtime: 36m41s,    gctime: 1m19s,     systime: 10.6s.<br>
>>>>>>>><br>
>>>>>>>> It doesn't look like 91d539b made a huge difference.<br>
>>>>>>>><br>
>>>>>>>> On 1 September 2016 at 08:42, Ramana Kumar<br>
>>>>>>>> <<a href="mailto:Ramana.Kumar@cl.cam.ac.uk" target="_blank">Ramana.Kumar@cl.cam.ac.uk</a>> wrote:<br>
>>>>>>>>><br>
>>>>>>>>> I have some timings for the front half of the backend now:<br>
>>>>>>>>> eval to_mod: runtime: 1m45s,    gctime: 9.6s,     systime:<br>
>>>>>>>>> 0.40000s.<br>
>>>>>>>>> eval to_con: runtime: 7m00s,    gctime: 5.4s,     systime:<br>
>>>>>>>>> 0.80667s.<br>
>>>>>>>>> eval to_dec: runtime: 7.0s,    gctime: 2.3s,     systime: 0.13333s.<br>
>>>>>>>>> eval to_exh: runtime: 3m18s,    gctime: 4.5s,     systime:<br>
>>>>>>>>> 0.67000s.<br>
>>>>>>>>> eval to_pat: runtime: 15.9s,    gctime: 3.7s,     systime:<br>
>>>>>>>>> 0.50667s.<br>
>>>>>>>>> eval to_clos: runtime: 5.0s,    gctime: 1.1s,     systime:<br>
>>>>>>>>> 0.07000s.<br>
>>>>>>>>> eval to_bvl: runtime: 2m18s,    gctime: 5.2s,     systime:<br>
>>>>>>>>> 0.36000s.<br>
>>>>>>>>> eval to_bvi:<br>
>>>>>>>>> ... inline: runtime: 5.8s,    gctime: 1.1s,     systime: 0.07667s.<br>
>>>>>>>>> ... optimise: runtime: 22m52s,    gctime: 30.1s,     systime: 6.1s.<br>
>>>>>>>>> ... compile: runtime: 2m05s,    gctime: 6.9s,     systime:<br>
>>>>>>>>> 0.72000s.<br>
>>>>>>>>> eval to_data: runtime: 5m44s,    gctime: 28.8s,     systime: 2.1s.<br>
>>>>>>>>><br>
>>>>>>>>> I'm not sure why there are suspicious timings for things like<br>
>>>>>>>>> stack_alloc. One possibility is that the "time" function adds up times<br>
>>>>>>>>> across all parallel threads. In that case, it would make sense to divide the<br>
>>>>>>>>> time by 8. I would need to investigate whether this is actually what's going<br>
>>>>>>>>> on...<br>
>>>>>>>>><br>
>>>>>>>>> On 30 August 2016 at 19:36, Magnus Myreen <<a href="mailto:magnus.myreen@gmail.com" target="_blank">magnus.myreen@gmail.com</a>><br>
>>>>>>>>> wrote:<br>
>>>>>>>>>><br>
>>>>>>>>>> My guess is that evaluation times get slower once we pass dataLang<br>
>>>>>>>>>> because the programs become a lot larger once the data abstraction<br>
>>>>>>>>>> has<br>
>>>>>>>>>> been removed.<br>
>>>>>>>>>><br>
>>>>>>>>>> There are however some suspicious looking timings here. I note<br>
>>>>>>>>>> that<br>
>>>>>>>>>> stack_alloc does near to nothing, but still takes 31 minutes to<br>
>>>>>>>>>> run.<br>
>>>>>>>>>> Here is stack_alloc's main compiler function:<br>
>>>>>>>>>><br>
>>>>>>>>>><br>
>>>>>>>>>> <a href="https://github.com/CakeML/cakeml/blob/master/compiler/backend/stack_allocScript.sml#L166" rel="noreferrer" target="_blank">https://github.com/CakeML/cake<wbr>ml/blob/master/compiler/backen<wbr>d/stack_allocScript.sml#L166</a><br>
>>>>>>>>>><br>
>>>>>>>>>> Cheers,<br>
>>>>>>>>>> Magnus<br>
>>>>>>>>>><br>
>>>>>>>>>> On 30 August 2016 at 03:43, Ramana Kumar<br>
>>>>>>>>>> <<a href="mailto:Ramana.Kumar@cl.cam.ac.uk" target="_blank">Ramana.Kumar@cl.cam.ac.uk</a>> wrote:<br>
>>>>>>>>>> > I have some information now on where slow parts of the bootstrap<br>
>>>>>>>>>> > evaluation<br>
>>>>>>>>>> > are, which we could use to direct effort in case we want to<br>
>>>>>>>>>> > speed things up.<br>
>>>>>>>>>> ><br>
>>>>>>>>>> > Currently I have timings for between dataLang and labLang. (I<br>
>>>>>>>>>> > know the<br>
>>>>>>>>>> > phases above dataLang are mostly quick, and the ones after<br>
>>>>>>>>>> > labLang are<br>
>>>>>>>>>> > mostly slow, but I will have to get actual numbers later)<br>
>>>>>>>>>> ><br>
>>>>>>>>>> > eval data_to_word: runtime: 7m48s,    gctime: 21.9s,<br>
>>>>>>>>>> > systime: 2.1s.<br>
>>>>>>>>>> ><br>
>>>>>>>>>> > word_to_word: runtime: 55m07s,    gctime: 5m31s,     systime:<br>
>>>>>>>>>> > 30.9s.<br>
>>>>>>>>>> ><br>
>>>>>>>>>> > clash: runtime: 5m33s,    gctime: 34.4s,     systime: 4.4s.<br>
>>>>>>>>>> ><br>
>>>>>>>>>> > (external oracle) runtime: 2m30s,    gctime: 12.1s,     systime:<br>
>>>>>>>>>> > 1.5s<br>
>>>>>>>>>> ><br>
>>>>>>>>>> > chunk: runtime: 1h04m14s,    gctime: 1m37s,     systime: 20.8s.<br>
>>>>>>>>>> > (use results of chunk) runtime: 11m21s,    gctime: 2m00s,<br>
>>>>>>>>>> > systime:<br>
>>>>>>>>>> > 10.1s.<br>
>>>>>>>>>> ><br>
>>>>>>>>>> > eval word_to_stack: runtime: 1h34m21s,    gctime: 9m42s,<br>
>>>>>>>>>> > systime: 9.4s.<br>
>>>>>>>>>> ><br>
>>>>>>>>>> > stack_alloc: runtime: 31m55s,    gctime: 30.3s,     systime:<br>
>>>>>>>>>> > 5.4s.<br>
>>>>>>>>>> ><br>
>>>>>>>>>> > stack_remove: runtime: 25m43s,    gctime: 1m32s,     systime:<br>
>>>>>>>>>> > 6.9s.<br>
>>>>>>>>>> ><br>
>>>>>>>>>> > stack_names: runtime: 22m03s,    gctime: 3m01s,     systime:<br>
>>>>>>>>>> > 6.6s.<br>
>>>>>>>>>> ><br>
>>>>>>>>>> > stack_to_lab: runtime: 56m00s,    gctime: 4m38s,     systime:<br>
>>>>>>>>>> > 23.8s.<br>
>>>>>>>>>> ><br>
>>>>>>>>>> > What exactly these different phases are doing can be found by<br>
>>>>>>>>>> > looking at<br>
>>>>>>>>>> ><br>
>>>>>>>>>> > <a href="https://github.com/CakeML/cakeml/blob/master/compiler/bootstrap/evaluation/to_lab_x64BootstrapScript.sml" rel="noreferrer" target="_blank">https://github.com/CakeML/cake<wbr>ml/blob/master/compiler/bootst<wbr>rap/evaluation/to_lab_x64Boots<wbr>trapScript.sml</a>.<br>
>>>>>>>>>> > I think the one confusingly called "chunk" means checking the<br>
>>>>>>>>>> > results of the<br>
>>>>>>>>>> > external reg. alloc oracle.<br>
>>>>>>>>>> ><br>
>>>>>>>>>> > ______________________________<wbr>_________________<br>
>>>>>>>>>> > Dev mailing list<br>
>>>>>>>>>> > <a href="mailto:Dev@cakeml.org" target="_blank">Dev@cakeml.org</a><br>
>>>>>>>>>> > <a href="https://lists.cakeml.org/listinfo/dev" rel="noreferrer" target="_blank">https://lists.cakeml.org/listi<wbr>nfo/dev</a><br>
>>>>>>>>>> ><br>
>>>>>>>>><br>
>>>>>>>>><br>
>>>>>>>><br>
>>>>>>><br>
>>>>>>><br>
>>>>>>> ______________________________<wbr>_________________<br>
>>>>>>> Dev mailing list<br>
>>>>>>> <a href="mailto:Dev@cakeml.org" target="_blank">Dev@cakeml.org</a><br>
>>>>>>> <a href="https://lists.cakeml.org/listinfo/dev" rel="noreferrer" target="_blank">https://lists.cakeml.org/listi<wbr>nfo/dev</a><br>
>>>>>>><br>
>>>>>><br>
>>>>><br>
>>>><br>
>>><br>
>><br>
>><br>
>> ______________________________<wbr>_________________<br>
>> Developers mailing list<br>
>> <a href="mailto:Developers@cakeml.org" target="_blank">Developers@cakeml.org</a><br>
>> <a href="https://lists.cakeml.org/listinfo/developers" rel="noreferrer" target="_blank">https://lists.cakeml.org/listi<wbr>nfo/developers</a><br>
>><br>
><br>
><br>
> ______________________________<wbr>_________________<br>
> Developers mailing list<br>
> <a href="mailto:Developers@cakeml.org" target="_blank">Developers@cakeml.org</a><br>
> <a href="https://lists.cakeml.org/listinfo/developers" rel="noreferrer" target="_blank">https://lists.cakeml.org/listi<wbr>nfo/developers</a><br>
><br>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>