<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Paul Bone</title>
    <description>Paul Bone is a software engineer working on Firefox&apos;s JavaScript engine, in particular the garbage collector. His interests include programming languages, declarative programming, programming language implementation, parallelism and concurrency.</description>
    <link>https://paul.bone.id.au/</link>
    <atom:link href="https://paul.bone.id.au/blog-planet-mozilla.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Sun, 08 Feb 2026 01:05:40 +1100</pubDate>
    <lastBuildDate>Sun, 08 Feb 2026 01:05:40 +1100</lastBuildDate>
    <generator>Jekyll v4.4.1</generator>
    
      <item>
        <title>The right amount of poison</title>
        <description>&lt;div id=&quot;preamble&quot;&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Oh, you don&amp;#8217;t want any poison in your porridge.
But how about in your computer&amp;#8217;s memory?&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;sect1&quot;&gt;
&lt;h2 id=&quot;papa-bear-too-much-poison&quot;&gt;Papa Bear - too much poison&lt;/h2&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Papa Bear likes his chair hard, his porridge hot and his browser written in
a memory safe language that helps engineers avoid memory bugs like
&lt;em&gt;buffer overruns&lt;/em&gt; and &lt;em&gt;use after frees&lt;/em&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;But even Papa Bear has to compromise: part of Firefox is written in a
memory safe language and the rest is written in C++.  When using
C++ there are a variety of defenses programmers can take to help
catch memory errors.  One of those is called &lt;em&gt;memory poisoning&lt;/em&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;&lt;code&gt;mozjemalloc&lt;/code&gt;, the memory allocator built into Firefox, will &lt;em&gt;poison&lt;/em&gt; memory
by calling &lt;code&gt;memset(aPtr, 0xE5, size);&lt;/code&gt; before freeing it.
Any memory containing the pattern &lt;code&gt;0xE5E5E5E5&lt;/code&gt; is therefore very likely to be
memory that&amp;#8217;s already been freed.
This has two and a half benefits:
If some code were to free &lt;strong&gt;and then dereference&lt;/strong&gt; some memory
(&lt;em&gt;a use after free bug&lt;/em&gt;)
it would most likely cause the browser to crash, which is much better
than a potentially exploitable bug allowing Goldilocks to steal Papa Bear&amp;#8217;s
banking credentials!
The other benefit is that when Firefox does crash due to such a
use-after-free, the presence of this pattern in the crash report allows
engineers to see the type of error that occurred and hopefully fix the
mistake.&lt;/p&gt;
&lt;/div&gt;
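&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;As a rough illustration of why the pattern is easy to spot, here is a minimal JavaScript sketch.
It is not mozjemalloc code: the typed array merely stands in for a freed allocation, and the
&lt;code&gt;poison&lt;/code&gt; and &lt;code&gt;looksPoisoned&lt;/code&gt; helpers are hypothetical names chosen for this example.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;
```javascript
// Illustrative stand-in for a freed allocation; not mozjemalloc code.
const POISON_BYTE = 0xe5;

function poison(bytes) {
  // Equivalent in spirit to memset(aPtr, 0xE5, size).
  bytes.fill(POISON_BYTE);
}

function looksPoisoned(bytes, offset) {
  // Read a 32-bit word and compare it with the repeated poison byte.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return view.getUint32(offset) === 0xe5e5e5e5;
}

const cell = new Uint8Array(16);
cell.set([1, 2, 3, 4]); // Some "live" data.
const before = looksPoisoned(cell, 0);
poison(cell); // The allocation is "freed".
const after = looksPoisoned(cell, 0);
console.log(before, after); // prints "false true"
```
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;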
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Note that back in March 2023 we
&lt;a href=&quot;https://bugzilla.mozilla.org/show_bug.cgi?id=1609478&quot;&gt;moved the poison
operation outside of the arena lock&amp;#8217;s critical section&lt;/a&gt;;
which improved performance in some tests.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;sect1&quot;&gt;
&lt;h2 id=&quot;mama-bear-no-poisoning&quot;&gt;Mama Bear - no poisoning&lt;/h2&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;You probably figured out by now that I&amp;#8217;m going to persist with this
metaphor.
Mama Bear likes her chair soft, her porridge cold (and congealed (yuck)),
and her browser fast.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;But how much faster is Mama Bear&amp;#8217;s experience?
This is the question that was raised recently when
Randell Jesup was benchmarking various memory allocators in Firefox.
He noted that while mozjemalloc performs poisoning, many of the other
allocators do not and to compare the performance of the allocators more
fairly they should either all perform poisoning or none of them should.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;And so Randell noted that, depending on the test,
Firefox could be
&lt;a href=&quot;https://bugzilla.mozilla.org/show_bug.cgi?id=1850008#c3&quot;&gt;between 0.5% and 4%
faster&lt;/a&gt;
with poisoning disabled.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Here are some results I collected.  The &quot;sp2&quot; (Speedometer 2) and &quot;sp3&quot;
(Speedometer 3) tests are browser benchmarks - larger numbers indicate
better performance.
The amazon and instagram tests are pageload tests measured in seconds with
the &lt;em&gt;ContentfulSpeedIndex&lt;/em&gt; metric - smaller numbers indicate better
performance.&lt;/p&gt;
&lt;/div&gt;
&lt;table class=&quot;tableblock frame-all grid-all stretch&quot;&gt;
&lt;colgroup&gt;
&lt;col style=&quot;width: 20%;&quot;&gt;
&lt;col style=&quot;width: 20%;&quot;&gt;
&lt;col style=&quot;width: 20%;&quot;&gt;
&lt;col style=&quot;width: 20%;&quot;&gt;
&lt;col style=&quot;width: 20%;&quot;&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;/th&gt;
&lt;th class=&quot;tableblock halign-left valign-top&quot;&gt;sp2 (score)&lt;/th&gt;
&lt;th class=&quot;tableblock halign-left valign-top&quot;&gt;sp3 (score)&lt;/th&gt;
&lt;th class=&quot;tableblock halign-left valign-top&quot;&gt;amazon (sec)&lt;/th&gt;
&lt;th class=&quot;tableblock halign-left valign-top&quot;&gt;instagram (sec)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;Poison&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;178.84 ± 0.84&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;13.32 ± 1.03&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;243.2 ± 1.96&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;419.43 ± 1.04&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;No poisoning&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;179.42 ± 0.48&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;13.39 ± 0.31&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;237.55 ± 2.6&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;414.5 ± 0.8&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;The speedometer figures are pretty close and these are the best pageload
figures (the others showed very little difference but nothing regressed, yes
I&amp;#8217;m aware I&amp;#8217;ve cherry-picked data).&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;This means that if it weren&amp;#8217;t for the lack of security and debuggability,
Mama Bear would have the right approach.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;sect1&quot;&gt;
&lt;h2 id=&quot;baby-bear&quot;&gt;Baby Bear&lt;/h2&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Baby Bear loves a compromise, they want their computer to be safe from
Goldilocks&apos; hacking attempts but also love performance improvements.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;One compromise may be to probabilistically poison memory some of the time, e.g.
a roughly 5% chance of poisoning.
That&amp;#8217;s more complex and involves a memory write anyway to keep the &quot;time
until poison&quot; counter updated.
We didn&amp;#8217;t investigate it.
But it&amp;#8217;s worth noting that it would be similar in spirit to the
&lt;a href=&quot;https://groups.google.com/g/mozilla.dev.platform/c/AyECjDNsqUE/m/Jd7Jr4cXAgAJ?pli=1&quot;&gt;Probabilistic Heap Checker (PHC)&lt;/a&gt;
that&amp;#8217;s
&lt;a href=&quot;https://groups.google.com/a/mozilla.org/g/dev-platform/c/C1LcRpii-cI&quot;&gt;rolling out&lt;/a&gt;
in Firefox or the similar &lt;a href=&quot;https://arxiv.org/abs/2311.09394&quot;&gt;GWP-ASan&lt;/a&gt;
capability in Chrome.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Instead we tested &quot;what if we poison only the first cache line of a memory
cell&quot;.
Andrew McCreight and Olli Pettay
&lt;a href=&quot;https://bugzilla.mozilla.org/show_bug.cgi?id=1850008#c9&quot;&gt;pointed out that&lt;/a&gt;
Element, a common DOM structure, is 128 bytes long and poisoning it is
useful to detect memory errors in DOM code, as a lot of DOM code will
involve Element.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;We tested poisoning the first 64, 128 and 256 bytes of each structure.
We assume that management of cache and writing cache lines back to RAM is
going to be the dominant cost.  Therefore we round up our writes to the next
cache line boundary.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;For example, on a computer with 64-byte cache lines, suppose a 96-byte object is
allocated so that its first 32 bytes are in one cache line and the next
64 bytes are in another.  Our 64-byte write would cover two halves of
different cache lines.  In this case we will poison all 96 bytes because
doing so writes to the same number of cache lines as the original 64-byte
write.&lt;/p&gt;
&lt;/div&gt;
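&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;The rounding in this example can be sketched as below.
This is only a sketch under stated assumptions and not the code in mozjemalloc:
&lt;code&gt;poisonLength&lt;/code&gt; and its parameters are hypothetical names, and addresses are
modelled as plain numbers.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;
```javascript
// Hypothetical helper: how many bytes to poison for an object of `size`
// bytes starting at `addr`, when we want to poison at most `maxPoison`
// bytes but extend the write to the end of the last cache line it touches.
function poisonLength(addr, size, maxPoison, cacheLine) {
  // Where the write would end without any rounding.
  const end = addr + Math.min(size, maxPoison);
  // Round that end up to the next cache line boundary.
  const roundedEnd = Math.ceil(end / cacheLine) * cacheLine;
  // Never poison past the end of the object itself.
  return Math.min(size, roundedEnd - addr);
}

// The 96-byte object that starts 32 bytes into a 64-byte cache line.
console.log(poisonLength(32, 96, 64, 64)); // prints 96
// The same object aligned to a cache line only needs the first 64 bytes.
console.log(poisonLength(0, 96, 64, 64)); // prints 64
```
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;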
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Let&amp;#8217;s add these options to our table of results.&lt;/p&gt;
&lt;/div&gt;
&lt;table class=&quot;tableblock frame-all grid-all stretch&quot;&gt;
&lt;colgroup&gt;
&lt;col style=&quot;width: 20%;&quot;&gt;
&lt;col style=&quot;width: 20%;&quot;&gt;
&lt;col style=&quot;width: 20%;&quot;&gt;
&lt;col style=&quot;width: 20%;&quot;&gt;
&lt;col style=&quot;width: 20%;&quot;&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;/th&gt;
&lt;th class=&quot;tableblock halign-left valign-top&quot;&gt;sp2 (score)&lt;/th&gt;
&lt;th class=&quot;tableblock halign-left valign-top&quot;&gt;sp3 (score)&lt;/th&gt;
&lt;th class=&quot;tableblock halign-left valign-top&quot;&gt;amazon (sec)&lt;/th&gt;
&lt;th class=&quot;tableblock halign-left valign-top&quot;&gt;instagram (sec)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;Poison&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;178.84 ± 0.84&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;13.32 ± 1.03&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;243.20 ± 1.96&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;419.43 ± 1.04&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;Poison 256&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;179.50 ± 0.55&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;13.35 ± 0.33&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;240.47 ± 2.82&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;415.28 ± 1.30&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;Poison 128&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;179.19 ± 0.43&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;13.35 ± 0.59&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;241.62 ± 3.05&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;414.95 ± 1.15&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;Poison 64&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;179.09 ± 0.87&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;13.33 ± 0.83&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;242.13 ± 2.56&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;414.11 ± 0.91&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;No poisoning&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;179.42 ± 0.48&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;13.39 ± 0.31&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;237.55 ± 2.60&lt;/p&gt;&lt;/td&gt;
&lt;td class=&quot;tableblock halign-left valign-top&quot;&gt;&lt;p class=&quot;tableblock&quot;&gt;414.5 ± 0.8&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;As above, sp2 and sp3 are scores - bigger numbers are better - while amazon
and instagram are page load tests where smaller numbers are better.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;As expected the partial poisoning results fall between full and no
poisoning. But what&amp;#8217;s a little bit surprising is that in some tests (sp2 and
amazon) poisoning a larger amount of memory made things faster.
This could be because the &lt;code&gt;memset()&lt;/code&gt; routine or the hardware itself is able
to optimise larger writes more effectively.
That said it&amp;#8217;s important to acknowledge that the standard deviation is
fairly high and doing the right statistical analysis is beyond this blog post.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;sect1&quot;&gt;
&lt;h2 id=&quot;just-right&quot;&gt;Just right&lt;/h2&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Since poisoning more memory isn&amp;#8217;t &lt;em&gt;much&lt;/em&gt; slower and in some
cases is faster than poisoning a little memory, then we might as well choose to
poison
&lt;a href=&quot;https://searchfox.org/mozilla-central/rev/9013524d23da6523a7ec4479b5682407a1323f6c/memory/build/mozjemalloc.cpp#1484&quot;&gt;256 bytes&lt;/a&gt;
which comfortably covers the Element object and most
others and for the others it likely covers many of their most-often accessed
fields.
We&amp;#8217;re confident that this is enough to help us catch many errors that can be
caught with poisoning,
while also performing well enough, especially for the pageload tests where
it is closer to the performance available with poisoning disabled.
We think that Baby Bear would agree, it is &lt;em&gt;Just Right&lt;/em&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;sect1&quot;&gt;
&lt;h2 id=&quot;it-gets-better&quot;&gt;It gets better&lt;/h2&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;With the
&lt;a href=&quot;https://groups.google.com/g/mozilla.dev.platform/c/AyECjDNsqUE/m/Jd7Jr4cXAgAJ?pli=1&quot;&gt;Probabilistic Heap Checker (PHC)&lt;/a&gt;
rolling out soon we will have an even greater ability to catch information
related to memory errors.
I&amp;#8217;ll be writing about this in the future.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;sect1&quot;&gt;
&lt;h2 id=&quot;why-papa-bear-is-safe-and-mama-bear-is-secure&quot;&gt;Why Papa Bear is safe and Mama Bear is secure?&lt;/h2&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;In some ways it feels more natural to lean in to (negative) gender
stereotypes where Papa Bear wants things fast and Mama Bear is the
cautious one.  I considered this; however, it&amp;#8217;s
easier to explain poisoning before explaining turning poisoning off, and the
nursery tale describes Papa Bear&amp;#8217;s preferences first,
so that&amp;#8217;s the order I introduced them here.
Flipping the script on gender stereotypes was accidental.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</description>
        <pubDate>Tue, 13 Feb 2024 00:00:00 +1100</pubDate>
        <link>https://paul.bone.id.au/blog/2024/02/13/poisoning-firefox-memory/</link>
        <guid isPermaLink="true">https://paul.bone.id.au/blog/2024/02/13/poisoning-firefox-memory/</guid>
        
        <category>planet-mozilla</category>
        
        <category>Firefox</category>
        
        <category>poison</category>
        
        <category>memory</category>
        
        <category>jemalloc</category>
        
        <category>mozjemalloc</category>
        
        
        <category>blog</category>
        
      </item>
    
      <item>
        <title>Waiting for web content to do something in a Firefox mochitest</title>
        <description>&lt;div id=&quot;preamble&quot;&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;It&amp;#8217;s not unusual for a Firefox test to have to wait for various things such
as a tab loading.
But recently I needed to write a test that loaded a content tab with a
web worker and wait for that before observing the result in a different tab.
I am writing this for my own reference in the future,
and if it helps someone else, that&amp;#8217;s extra good.
But I don&amp;#8217;t think it will be of much interest if you don&amp;#8217;t work on Firefox
as the problem I&amp;#8217;m solving won&amp;#8217;t be relevant and the APIs won&amp;#8217;t be familiar.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;I don&amp;#8217;t think of myself as a JavaScript programmer - I&amp;#8217;m learning what I
need to know when I need to know it, but mainly to write tests.
So I&amp;#8217;m not sure I&amp;#8217;ll pitch this article at any particular level of JS
knowledge, sorry.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;sect1&quot;&gt;
&lt;h2 id=&quot;web-workers&quot;&gt;Web Workers&lt;/h2&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;&lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers&quot;&gt;Web Workers&lt;/a&gt;
provide web pages a way to execute long-running JavaScript tasks in a
separate &lt;em&gt;thread&lt;/em&gt;, where it won&amp;#8217;t block the main event loop.
They solve the same problem as threads, allowing a page to use &lt;em&gt;concurrency&lt;/em&gt;.
However their programming model is more like &lt;em&gt;processes&lt;/em&gt;
because they don&amp;#8217;t share state (global variables or even functions) and
communicate by
&lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers#sending_messages_to_and_from_a_dedicated_worker&quot;&gt;sending
and receiving messages&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;I realise this is a tangent but it&amp;#8217;s a topic I like and you may have the
same questions I did:
So if workers are supposed to solve the same problems as threads do in other
languages, why are they more like processes?
Furthermore, at least in Firefox, each worker instantiates another copy of
the JavaScript engine (the &lt;code&gt;JSRuntime&lt;/code&gt; class) with its own instantiation of
JIT, GC etc.
Isn&amp;#8217;t this fairly &lt;strong&gt;heavy&lt;/strong&gt; just to add concurrency?&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;It is, but there are benefits:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;ulist&quot;&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;I&amp;#8217;m not certain, but I think
this was the easiest way to retrofit concurrency to JavaScript (the
language standard) without breaking backwards compatibility with existing
web sites.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Message-passing concurrency makes the boundary between threads very
clear.  This makes it a simpler programming model, especially if you&amp;#8217;re
working on some code that is isolated from the concurrency happening
elsewhere.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It worked for Erlang, although Erlang likely shares bytecode caches
and some other systems.  But not garbage collection.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Anyway, the point is that Web Workers are concurrent &quot;process like&quot; things
that communicate through message-passing.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;sect1&quot;&gt;
&lt;h2 id=&quot;aboutperformance&quot;&gt;about:performance&lt;/h2&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Firefox has a number of &lt;code&gt;about:&lt;/code&gt; pages, used for diagnostics and tweaking.
&lt;code&gt;about:config&lt;/code&gt; is probably the most infamous (if you touch those settings
you can break your browser or make it insecure).
&lt;code&gt;about:support&lt;/code&gt; is interesting too; it contains diagnostic information about
Firefox on your computer.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Today we&amp;#8217;re looking at &lt;code&gt;about:performance&lt;/code&gt;, which is useful when you are
thinking &quot;Firefox seems slow, I wonder why..&quot;.  &lt;code&gt;about:performance&lt;/code&gt; will
show your busiest tabs, how much CPU time/power and memory they&amp;#8217;re using.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Measuring memory usage can be tricky at the best of times
(more on this in an upcoming article).
We can&amp;#8217;t afford to count every allocation since that is too slow for a page
like &lt;code&gt;about:performance&lt;/code&gt;.  Although &lt;code&gt;about:memory&lt;/code&gt; comes closer to doing
this.
For about:performance we can ask major subsystems how much
memory they&amp;#8217;re using and rely on their counters.
This isn&amp;#8217;t accurate but it&amp;#8217;s good enough.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;I noticed two major things that weren&amp;#8217;t counted:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;ulist&quot;&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Malloc memory used by JS objects was not counted.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Web workers were not counted.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;I fixed them in
&lt;a href=&quot;https://bugzilla.mozilla.org/show_bug.cgi?id=1760920&quot;&gt;Bug 1760920&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;So I wanted to write a test that would verify that we are indeed counting
memory belonging to web workers.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;sect1&quot;&gt;
&lt;h2 id=&quot;my-web-worker&quot;&gt;My web worker&lt;/h2&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;To make it easier to see if we&amp;#8217;re counting a component&amp;#8217;s memory, it&amp;#8217;s great
if our test causes that component to use a lot of memory; then we can test
for that.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Here&amp;#8217;s a Web Worker that uses about 40MB of memory using an array with 4
million elements.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;var big_array = [];
var n = 0;

onmessage = function(e) {
  var sum = 0;
  if (n == 0) {
    for (let i = 0; i &amp;lt; 4 * 1024 * 1024; i++) {
      big_array[i] = i * i;
    }
  } else {
    for (let i = 0; i &amp;lt; 4 * 1024 * 1024; i++) {
      sum += big_array[i];
      big_array[i] += 1;
    }
  }
  self.postMessage(`Iter: ${n}, sum: ${sum}`);
  n++;
};&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;It registers an &lt;code&gt;onmessage&lt;/code&gt; event handler.  When the page sends it a message it
will execute the anonymous function.
The first time this happens the function will create the array, the next
time it will manipulate the array.
Since the array is a global and is also
captured by the handler I doubt the GC would free it.
But I also want to prevent an optimiser (now or in the future) from reducing the
whole program to a large summation, or caching an answer.
Which is why the array is manipulated each time the event handler is called.
It doesn&amp;#8217;t matter that it&amp;#8217;s ridiculous - it&amp;#8217;s a test - just that it uses
&quot;enough&quot; memory.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;From the main page it can be started like this:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;  var worker = new Worker(&quot;workers_memory_script.js&quot;);
  worker.postMessage(n);&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;But that&amp;#8217;s not enough to make a working test.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;sect1&quot;&gt;
&lt;h2 id=&quot;the-test&quot;&gt;The test&lt;/h2&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Our test needs to open this page in one tab, and in another tab look at
&lt;code&gt;about:performance&lt;/code&gt; and observe that the memory is being used.
Opening and managing multiple tabs is standard fare for a browser test,
but what we need is for our test to wait for the tab with the worker to be
&lt;em&gt;ready&lt;/em&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Waiting for a tab to be loaded is also very easy, which means that the tab
will have executed &lt;code&gt;worker.postMessage(n)&lt;/code&gt; by the time the test code checks.
But that doesn&amp;#8217;t mean that the worker &lt;strong&gt;has received the message&lt;/strong&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;So we need to make our test wait for the worker to start and complete one
iteration (creating its array).&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;In the test we can add code such as:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;  let tabContent = BrowserTestUtils.addTab(gBrowser, url);

  // Wait for the browser to load the tab.
  await BrowserTestUtils.browserLoaded(tabContent.linkedBrowser);

  // For some of these tests we have to wait for the test to consume some
  // computation or memory.
  await SpecialPowers.spawn(tabContent.linkedBrowser, [], async () =&amp;gt; {
    await content.wrappedJSObject.waitForTestReady();
  });&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;The last three lines here are the interesting ones.  &lt;code&gt;SpecialPowers.spawn&lt;/code&gt;
allows us to execute code in the context of the tab, where we wait on a
&lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise&quot;&gt;promise&lt;/a&gt; that the test is ready.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Now we need to add this promise to the page that owns the worker:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;  var result = document.querySelector(&apos;#result&apos;);
  var worker = new Worker(&quot;workers_memory_script.js&quot;);
  var n = 1;

  var waitPromise = new Promise(ready =&amp;gt; {
    worker.onmessage = function(event) {
      result.textContent = event.data;
      ready();

      // We seem to need to keep the worker doing something to keep the
      // memory usage up.
      setTimeout(() =&amp;gt; {
        n++;
        worker.postMessage(n);
      }, 1000);
    };
  });

  worker.postMessage(n);

  window.waitForTestReady = async () =&amp;gt; {
    await waitPromise;
  };&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Starting at the bottom.
For some reason I had to wrap the promise up in a function, I can&amp;#8217;t remember
why!
I&amp;#8217;m tempted to complain about JavaScript and its inconsistent rules here,
but it could also be my limited understanding preventing me from getting it.
What I do know is that this function must be in the &lt;code&gt;window&lt;/code&gt; object so that
the test code above can find it in &lt;code&gt;wrappedJSObject&lt;/code&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;The promise wrapped here (&lt;code&gt;waitPromise&lt;/code&gt;; I could have picked a better name)
is resolved when &lt;code&gt;ready()&lt;/code&gt; is called, which happens after we receive the
worker&amp;#8217;s response.
Finally we use &lt;code&gt;setTimeout()&lt;/code&gt; to post another message to keep memory usage
up.
I don&amp;#8217;t know why this was necessary either.  Was the worker completely
terminated without it?&lt;/p&gt;
&lt;/div&gt;
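&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;One detail worth spelling out: the handler calls &lt;code&gt;ready()&lt;/code&gt; on every reply, but a promise
settles only once, so only the first reply matters to the waiting test.
The sketch below shows this outside the browser.
&lt;code&gt;makeFakeWorker&lt;/code&gt; is a hypothetical stand-in for a real &lt;code&gt;Worker&lt;/code&gt; that simply
replies asynchronously; it is an assumption made for this example and not part of the test.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;
```javascript
// Hypothetical stand-in for a Worker: replies asynchronously to each message.
function makeFakeWorker() {
  const worker = {
    onmessage: null,
    postMessage(n) {
      setTimeout(function () {
        if (worker.onmessage) {
          worker.onmessage({ data: "Iter: " + n });
        }
      }, 0);
    },
  };
  return worker;
}

const worker = makeFakeWorker();
const replies = [];

// Resolves on the first reply; later calls to ready() have no effect.
const waitPromise = new Promise(function (ready) {
  worker.onmessage = function (event) {
    replies.push(event.data);
    ready(event.data);
  };
});

worker.postMessage(1);
worker.postMessage(2);

// Same shape as the function in the page: a wrapper around the promise.
async function waitForTestReady() {
  return waitPromise;
}

waitForTestReady().then(function (first) {
  console.log(first); // prints "Iter: 1"
});
```
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;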
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;sect1&quot;&gt;
&lt;h2 id=&quot;one-more-thing&quot;&gt;One more thing&lt;/h2&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Our test almost works.
For whatever reason when the test accesses the right part of the
&lt;code&gt;about:performance&lt;/code&gt; page there&amp;#8217;s no value for how much memory is being used.
Waiting for a single update fixes this:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;  if (!memCell.innerText) {
    info(&quot;There&apos;s no text yet, wait for an update&quot;);
    await new Promise(resolve =&amp;gt; {
      let observer = new row.ownerDocument.ownerGlobal.MutationObserver(() =&amp;gt; {
        observer.disconnect();
        resolve();
      });
      observer.observe(memCell, { childList: true });
    });
  }
  let text = memCell.innerText;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;For the complete code for this test check out
&lt;a href=&quot;https://bugzilla.mozilla.org/show_bug.cgi?id=1760920&quot;&gt;Bug 1760920&lt;/a&gt;
and
&lt;a href=&quot;https://searchfox.org/mozilla-central/source/toolkit/components/aboutperformance/tests/browser&quot;&gt;toolkit/components/aboutperformance/tests/browser&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;sect1&quot;&gt;
&lt;h2 id=&quot;theres-things-i-dont-know&quot;&gt;There&amp;#8217;s things I don&amp;#8217;t know&lt;/h2&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;There are three places here where I&amp;#8217;ve said &quot;it needs this code, I don&amp;#8217;t know
why&quot;.
I hate programming like this, and I feel ashamed writing it in a blog post
and calling myself an engineer.
I don&amp;#8217;t want to spin it as a joke on JavaScript, or myself: &quot;lol, that&amp;#8217;s
programming!  AMIRITE?!&quot;
There are obviously some further subtleties I don&amp;#8217;t know the rules for, and
JavaScript does have some pretty
&lt;a href=&quot;https://www.destroyallsoftware.com/talks/wat&quot;&gt;inconsistent rules&lt;/a&gt;.
Throw in a browser, two tabs and a web worker, and feeling like you don&amp;#8217;t
know how something works is relatable.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Do I wish I knew?  Sure, I&amp;#8217;m uncomfortable not knowing, but I&amp;#8217;ve already
spent enough time on this.  But this is also why I wrote down what I &lt;strong&gt;do&lt;/strong&gt;
know.
Next time I&amp;#8217;ll be able to find this much and solve my problem quicker.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</description>
        <pubDate>Fri, 25 Nov 2022 00:00:00 +1100</pubDate>
        <link>https://paul.bone.id.au/blog/2022/11/25/how-to-wait-for-content-in-a-test/</link>
        <guid isPermaLink="true">https://paul.bone.id.au/blog/2022/11/25/how-to-wait-for-content-in-a-test/</guid>
        
        <category>planet-mozilla</category>
        
        <category>Firefox</category>
        
        <category>tests</category>
        
        <category>mochitest</category>
        
        <category>promises</category>
        
        <category>race</category>
        
        <category>javascript</category>
        
        <category>concurrency</category>
        
        
        <category>blog</category>
        
      </item>
    
      <item>
        <title>Running the AWSY benchmark in the Firefox profiler</title>
        <description>&lt;div id=&quot;preamble&quot;&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;The Are We Slim Yet (AWSY) benchmark measures memory usage.
Recently when I made a
&lt;a href=&quot;https://bugzilla.mozilla.org/show_bug.cgi?id=1728273&quot;&gt;simple change to
Firefox&lt;/a&gt; and expected it might save a bit of memory,
it actually
&lt;a href=&quot;https://bugzilla.mozilla.org/show_bug.cgi?id=1729890&quot;&gt;increased memory usage&lt;/a&gt;
on the AWSY benchmark.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;We have lots of tools to hunt down memory usage problems.  But to see an
almost &quot;log&quot; of when garbage collection and cycle collection occurs, the
&lt;a href=&quot;https://profiler.firefox.com&quot;&gt;Firefox profiler&lt;/a&gt; is amazing.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;I wanted to profile the AWSY benchmark to try and understand what was
happening with GC scheduling.  But it didn&amp;#8217;t work out-of-the-box.
This is one of those blog posts that I&amp;#8217;m writing so that next time this
happens, to me or anyone else (although I am selfish),
and I websearch for &quot;AWSY and Firefox Profiler&quot;, this will be the number
1 result and help me (or someone else) out.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;sect1&quot;&gt;
&lt;h2 id=&quot;the-normal-instructions&quot;&gt;The normal instructions&lt;/h2&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;First you need a build with profiling enabled.  Put this in your &lt;code&gt;mozconfig&lt;/code&gt;:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;ac_add_options --enable-debug
ac_add_options --enable-debug-symbols
ac_add_options --enable-optimize
ac_add_options --enable-profiling&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;The instructions to get the profiler to run came from Ted Campbell.  Thanks
Ted.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Ted&amp;#8217;s instructions disabled stack sampling; we didn&amp;#8217;t care about that since
the data we need comes from profile markers.  I can also run a reduced AWSY
test because 10 entities is enough to reproduce the problem.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;export MOZ_PROFILER_STARTUP=1
export MOZ_PROFILER_SHUTDOWN=awsy-profile.json
export MOZ_PROFILER_STARTUP_FEATURES=&quot;nostacksampling&quot;
./mach awsy-test --tp6 --headless --iterations 1 --entities 10&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;But it crashes due to
&lt;a href=&quot;https://bugzilla.mozilla.org/show_bug.cgi?id=1710408&quot;&gt;Bug 1710408&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;So I can&amp;#8217;t use &lt;code&gt;nostacksampling&lt;/code&gt;, which would have been nice to save some
memory/disk space, never mind.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;So I removed that option, but then I got profiles that were too short.  The
profiler records into a circular buffer so if that buffer is too small it&amp;#8217;ll
discard the earlier information.  In this case I want the earlier
information because I think something at the beginning is the problem.
So I need to add this to get a bigger buffer.  The default is 4 million
entries (32MB).&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;export MOZ_PROFILER_STARTUP_ENTRIES=$((200*1024*1024))&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
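&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;As a rough sanity check on that number, here is a small Python sketch (it is not part of the AWSY or profiler tooling).
It assumes 8 bytes per entry, which is what the default of 4 million entries (32MB) implies, and it uses a tiny &lt;code&gt;deque&lt;/code&gt; to illustrate how a circular buffer drops its oldest entries.
The names &lt;code&gt;BYTES_PER_ENTRY&lt;/code&gt; and &lt;code&gt;buffer_size_mib&lt;/code&gt; are made up for illustration.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;
```python
from collections import deque

# Assumption: 8 bytes per entry, inferred from "4 million entries (32MB)".
BYTES_PER_ENTRY = 8


def buffer_size_mib(entries):
    """Approximate memory needed for a profiler buffer with this many entries."""
    return entries * BYTES_PER_ENTRY / (1024 * 1024)


default_entries = 4 * 1024 * 1024
bigger_entries = 200 * 1024 * 1024  # same value as $((200*1024*1024))

print(buffer_size_mib(default_entries))  # 32.0
print(buffer_size_mib(bigger_entries))   # 1600.0

# A circular buffer keeps only the newest entries once it is full.
ring = deque(maxlen=3)
for marker in ["startup", "gc-1", "gc-2", "gc-3"]:
    ring.append(marker)
print(list(ring))  # ['gc-1', 'gc-2', 'gc-3'], "startup" was discarded
```
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Under that assumption the bigger buffer is roughly 1.6GB, which is a lot of data to write out at shutdown.&lt;/p&gt;
&lt;/div&gt;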
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;But now the profiles are too big and Firefox shutdown times out (over 70
seconds) so the marionette test driver kills Firefox before it can write out
the profile.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;sect1&quot;&gt;
&lt;h2 id=&quot;the-solution&quot;&gt;The solution&lt;/h2&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;So we hack
&lt;code&gt;testing/marionette/client/marionette_driver/marionette.py&lt;/code&gt;
to replace shutdown_timeout with &lt;code&gt;300&lt;/code&gt; in some places.
Setting &lt;code&gt;DEFAULT_SHUTDOWN_TIMEOUT&lt;/code&gt; and also &lt;code&gt;self.shutdown_timeout&lt;/code&gt; to 300
will do.
There&amp;#8217;s probably a way to pass a parameter, but I didn&amp;#8217;t find it yet.
So after making that change and running &lt;code&gt;./mach build&lt;/code&gt; the invocation is now:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;export MOZ_PROFILER_STARTUP=1
export MOZ_PROFILER_SHUTDOWN=awsy-profile.json
export MOZ_PROFILER_STARTUP_FEATURES=&quot;&quot;
export MOZ_PROFILER_STARTUP_ENTRIES=$((200*1024*1024))
./mach awsy-test --tp6 --headless --iterations 1 --entities 10&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;And it writes an &lt;code&gt;awsy-profile.json&lt;/code&gt; into the root directory of the project.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Hurray!&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;sect1&quot;&gt;
&lt;h2 id=&quot;follow-up&quot;&gt;Follow-up&lt;/h2&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Whimboo says that setting &lt;code&gt;toolkit.asyncshutdown.crash_timeout&lt;/code&gt; might help.
But that may have to wait until some other work has been implemented:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;quoteblock&quot;&gt;
&lt;blockquote&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;a solution here should also be to extend the toolkit.asyncshutdown.crash_timeout value&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;oh wait. actually we haven&amp;#8217;t fixed that yet, but only use it via
geckodriver&lt;/p&gt;
&lt;/div&gt;
&lt;/blockquote&gt;
&lt;div class=&quot;attribution&quot;&gt;
&amp;#8212; Whimboo
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</description>
        <pubDate>Sat, 18 Sep 2021 00:00:00 +1000</pubDate>
        <link>https://paul.bone.id.au/blog/2021/09/18/how-to-profile-awsy/</link>
        <guid isPermaLink="true">https://paul.bone.id.au/blog/2021/09/18/how-to-profile-awsy/</guid>
        
        <category>planet-mozilla</category>
        
        <category>Firefox</category>
        
        <category>Profiler</category>
        
        <category>AWSY</category>
        
        
        <category>blog</category>
        
      </item>
    
      <item>
        <title>Avoiding large immediate values</title>
        <description>&lt;div id=&quot;preamble&quot;&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;We&amp;#8217;re often told that we shouldn&amp;#8217;t worry about the small details in
optimisation, that either &quot;premature optimisation is the root of all evil&quot;
or &quot;the compiler is smarter than you&quot;.  These things are true, in general.
Which is why if you had asked me about 10 years ago whether I thought I would be
using knowledge of machine code (not just assembly!) to improve a browser&amp;#8217;s
benchmark score by 2.5%, I wouldn&amp;#8217;t have believed you.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;First off, I&amp;#8217;m sorry (not sorry) for the gloating, and for what it&amp;#8217;s worth
the optimisation isn&amp;#8217;t really that clever, and wasn&amp;#8217;t even my idea.
What I&amp;#8217;m finding almost funny is that younger-me would not have believed that
such low level details mattered this much.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;sect1&quot;&gt;
&lt;h2 id=&quot;bump-pointer-allocation&quot;&gt;Bump-pointer allocation&lt;/h2&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;SpiderMonkey (Firefox&amp;#8217;s JavaScript engine) separates its garbage-collected heap
into two areas, the nursery and the tenured heap.  New objects are typically
allocated first in the nursery; when the nursery is collected, an object
is moved into the tenured heap if it is still alive.  Collecting the
nursery is faster than the whole heap since less data needs to be scanned,
and most objects die when they are young.  This is a fairly standard way to
manage a garbage collector and is called generational garbage collection.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Allocating something in either heap should be fast, but since nursery
allocation is more common it needs to be VERY fast.  When JITing JavaScript
code, allocation code is JITed right into the execution paths in each place
it is needed.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;I was working on a change to this code; I wanted to count the number of
tenured and nursery allocations.
And above all, I had to avoid adding too much of a performance impact.
That work is
&lt;a href=&quot;https://bugzilla.mozilla.org/show_bug.cgi?id=1473213&quot;&gt;Bug 1473213&lt;/a&gt; and isn&amp;#8217;t
actually the topic of this post, it&amp;#8217;s just what drew my attention.
(TL;DR: the work this post describes is
&lt;a href=&quot;https://bugzilla.mozilla.org/show_bug.cgi?id=1479360&quot;&gt;Bug 1479360&lt;/a&gt;.)&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;The nursery fast-path looked like this, I&amp;#8217;ve simplified it for easier
reading, mostly by removing unnecessary things.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;Register result(...), temp(...);
CompileZone* zone = GetJitContext()-&amp;gt;realm-&amp;gt;zone();
size_t totalSize = ...
void *ptrNurseryPosition = zone-&amp;gt;addressOfNurseryPosition();
const void *ptrNurseryCurrentEnd = zone-&amp;gt;addressOfNurseryCurrentEnd();

loadPtr(AbsoluteAddress(ptrNurseryPosition), result);
computeEffectiveAddress(Address(result, totalSize), temp);
branchPtr(Assembler::Below, AbsoluteAddress(ptrNurseryCurrentEnd), temp,
    fail);
storePtr(temp, AbsoluteAddress(ptrNurseryPosition));&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;That probably didn&amp;#8217;t read right for most readers.
What we&amp;#8217;re looking at here is the code generator of the JIT compiler, this
is not the allocation code itself, but the code that creates the machine
code that does the allocation.  I&amp;#8217;ve broken it into two sections,
the first five lines prepare some values and have absolutely zero runtime
cost.
The last five lines generate the code that does the bump pointer allocation.
Function calls like loadPtr generate one or more machine code instructions:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;dlist&quot;&gt;
&lt;dl&gt;
&lt;dt class=&quot;hdlist1&quot;&gt;loadPtr(AbsoluteAddress(ptrNurseryPosition), result)&lt;/dt&gt;
&lt;dd&gt;
&lt;p&gt;Read a pointer-sized value from memory at ptrNurseryPosition and store
it in the register result.  ptrNurseryPosition points to a pointer
that points to the next free cell in the heap.  So this places the pointer
of the next free cell into the result register.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt class=&quot;hdlist1&quot;&gt;computeEffectiveAddress(Address(result, totalSize), temp)&lt;/dt&gt;
&lt;dd&gt;
&lt;p&gt;Use an lea or similar instruction to add totalSize (a displacement) to
the contents of the result register, store the result of this addition
into temp.  After executing this temp will contain the pointer to the
&lt;em&gt;next&lt;/em&gt; free cell once we perform the current allocation.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt class=&quot;hdlist1&quot;&gt;branchPtr(..., AbsoluteAddress(ptrNurseryCurrentEnd), temp, fail)&lt;/dt&gt;
&lt;dd&gt;
&lt;p&gt;Compare the temp register&amp;#8217;s contents against the contents of the memory
at ptrNurseryCurrentEnd and if temp is higher, branch to the fail
label.  This compares the next value for the allocation pointer to the
end of the heap, if the allocation would go beyond the end of the nursery
then fail.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt class=&quot;hdlist1&quot;&gt;storePtr(temp, AbsoluteAddress(ptrNurseryPosition))&lt;/dt&gt;
&lt;dd&gt;
&lt;p&gt;Store the new value for the next free cell (temp) into the memory at
ptrNurseryPosition.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/div&gt;
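&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;The four steps above can be modelled in a few lines of Python.
This is only a sketch of the bump-pointer idea, not SpiderMonkey code: the &lt;code&gt;Nursery&lt;/code&gt; class, the &lt;code&gt;NurseryFull&lt;/code&gt; exception and the addresses are made up for illustration, and raising the exception stands in for branching to the fail label.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;
```python
class NurseryFull(Exception):
    """Stands in for the branch to the fail label."""


class Nursery:
    def __init__(self, start, size):
        self.position = start            # next free cell (ptrNurseryPosition)
        self.current_end = start + size  # end of the nursery (ptrNurseryCurrentEnd)

    def allocate(self, total_size):
        result = self.position           # loadPtr
        temp = result + total_size       # computeEffectiveAddress
        if temp > self.current_end:      # branchPtr(..., fail)
            raise NurseryFull()
        self.position = temp             # storePtr
        return result


nursery = Nursery(start=0x1000, size=0x100)
first = nursery.allocate(0x60)
second = nursery.allocate(0x60)
print(hex(first), hex(second))  # 0x1000 0x1060
```
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;A third allocation of 0x60 bytes would run past the end of this toy nursery and take the failure path.&lt;/p&gt;
&lt;/div&gt;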
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Unfortunately this isn&amp;#8217;t as efficient as it could be.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;sect1&quot;&gt;
&lt;h2 id=&quot;immediates-and-displacements&quot;&gt;Immediates and displacements&lt;/h2&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;I&amp;#8217;ve recently written about
&lt;a href=&quot;../../../09/05/x86-addressing&quot;&gt;addressing in x86&lt;/a&gt;
where I wrote that instructions refer to operands and these operands may be
registers, memory locations or immediate values.
To recap, there are two main situations where some value can follow the
instruction, it&amp;#8217;s either as an immediate value or as a displacement for a
memory operand.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;dlist&quot;&gt;
&lt;dl&gt;
&lt;dt class=&quot;hdlist1&quot;&gt;Displacement&lt;/dt&gt;
&lt;dd&gt;
&lt;p&gt;A displacement may be either 8 or 32 bits (on x86 running in 32 or 64 bit
mode).&lt;/p&gt;
&lt;/dd&gt;
&lt;dt class=&quot;hdlist1&quot;&gt;Immediate&lt;/dt&gt;
&lt;dd&gt;
&lt;p&gt;An immediate value depends on the size of the operation, and may be 8, 16,
32 or 64 bits.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;The point here is that displacements cannot store a 64 bit value, so:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;branchPtr(Assembler::Below, AbsoluteAddress(ptrNurseryCurrentEnd), temp,
    fail);&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Cannot directly use a 64 bit displacement (ptrNurseryCurrentEnd) for its
memory operand,
and requires an extra instruction to first load this value into a scratch
register from an immediate (which can be 64 bit) before doing the comparison.
This operation will now need three instructions rather than two (compare and
jump are already separate instructions).&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Intel provides a special exception to these rules about displacements for
move instructions.  There are four special opcodes for move that allow it to
work with a 64-bit &lt;em&gt;moffset&lt;/em&gt;.  So:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;loadPtr(AbsoluteAddress(ptrNurseryPosition), result);&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Can &lt;em&gt;almost&lt;/em&gt; be represented.  But these opcodes hard code result to the
ax or eax registers, which is not suitable for a 64-bit value as this
is.  Therefore using 64-bit addresses also makes these loadPtr and
storePtr operations use two instructions rather than one.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Here&amp;#8217;s the
&lt;a href=&quot;../../../08/07/dissassembling-jit-code-in-gdb/&quot;&gt;disassembled&lt;/a&gt;
code that this generates.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;movabs $0x7ffff5d1b618,%r11
mov    (%r11),%rbx
lea    0x60(%rbx),%rbp
movabs $0x7ffff5d1b630,%r11
cmp    %rbp,(%r11)
jb     0x1f2f3ed1a351
movabs $0x7ffff5d1b618,%r11
mov    %rbp,(%r11)&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;This sequence, rather than being five instructions long is now eight
instructions long (and 49 bytes) and makes more use of a scratch register
(which may impact instruction-level parallelism).&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;sect1&quot;&gt;
&lt;h2 id=&quot;the-instruction-cache&quot;&gt;The instruction cache&lt;/h2&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Instructions aren&amp;#8217;t the only cost.  This code sequence contains four 64-bit
addresses, that&amp;#8217;s a total of 32 bytes in the instruction stream (including
the target for the jump on failed allocations).
That takes up room in the CPU&amp;#8217;s caches and other resources in the processor
front-end.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;The front-end of a processor&amp;#8217;s pipeline must fetch and decode instructions
before they&amp;#8217;re queued, scheduled, executed and retired.
Processor front-ends have changed a lot, and there are multiple levels of
caching and buffering.
Let&amp;#8217;s use the Intel Core Microarchitecture as an
example, it&amp;#8217;s new enough to be in common use and things got more complex in
the next microarchitecture due to having two different front-end pathways.
The resource for this information is
Intel&amp;#8217;s
&lt;a href=&quot;https://software.intel.com/en-us/articles/intel-sdm#optimization&quot;&gt;optimisation
reference manual&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Instructions are fetched 16 bytes at a time, and immediately following the
fetch a pre-decode pass occurs: a fast calculation of instruction lengths.
Once the processor knows the lengths (and boundaries) of the instructions
within the 16-bytes, they&amp;#8217;re written into a buffer (the instruction queue)
six at a time, if there are more than six instructions in the 16-byte block,
then more cycles are used to pre-decode the remaining instructions.
If fewer than six instructions were in the 16 bytes, or a read of less than
16 bytes occurred due to alignment or branching, then the full bandwidth of
the pre-decode is not being utilised.  If this happens often the instruction
queue may starve.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;The instruction queue is 18 instructions deep (but I think it&amp;#8217;s shared by
hyper-threading).  Instructions are decoded from this queue four or five at a
time by the four decoders.
One of the decoders is special and can handle some pairs of instructions
turning them into a single operation.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Our instruction sequence above contains eight instructions, in 49 bytes.
Assuming alignment is in our favour this will take four pre-decode
steps, averaging 2 instructions per pre-decode cycle; less than the
CPU is capable of.
(I don&amp;#8217;t know how this behaves when an instruction crosses the 16-byte
boundary, but back-of-the-envelope reasoning tells me it&amp;#8217;s not a problem.)&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;This low instruction density might not be a problem in many situations, such
as when the instruction cache already contains plenty of instructions and
this &lt;em&gt;bubble&lt;/em&gt; does not affect overall throughput.
However in a loop or when other things already affect the processor&amp;#8217;s
pipeline, it could definitely be an issue.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;sect1&quot;&gt;
&lt;h2 id=&quot;the-change&quot;&gt;The change&lt;/h2&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;My colleague &lt;a href=&quot;https://blog.mozilla.org/sfink/&quot;&gt;sfink&lt;/a&gt; had
&lt;a href=&quot;https://hg.mozilla.org/mozilla-central/file/FIREFOX_NIGHTLY_62_END/js/src/jit/MacroAssembler.cpp#l1049&quot;&gt;left
a comment&lt;/a&gt;
in the nursery string allocation path where he attempted to experiment with
this in the past.
His solution was eventually removed because it was a little bit fiddly, but
it was the inspiration for my
&lt;a href=&quot;https://hg.mozilla.org/mozilla-central/rev/a914cedebde5&quot;&gt;eventual change&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;The code (tidied up) now looks like:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;CheckedInt&amp;lt;int32_t&amp;gt; endOffset = (CheckedInt&amp;lt;uintptr_t&amp;gt;(uintptr_t(curEndAddr)) -
    CheckedInt&amp;lt;uintptr_t&amp;gt;(uintptr_t(posAddr))).toChecked&amp;lt;int32_t&amp;gt;();
MOZ_ASSERT(endOffset.isValid(),
    &quot;Position and end pointers must be nearby&quot;);

movePtr(ImmPtr(posAddr), temp);
loadPtr(Address(temp, 0), result);
addPtr(Imm32(totalSize), result);
branchPtr(Assembler::Below, Address(temp, endOffset.value()), result, fail);
storePtr(result, Address(temp, 0));
subPtr(Imm32(size), result);&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;This loads a 64-bit address once and uses a relative address to describe the
end of the nursery (the Address argument to the branchPtr call),
then can re-use the original address when updating the
current pointer (storePtr).
We have to add the object size to result and subtract it later because we
can&amp;#8217;t easily get guaranteed access to another register with the way the code
generator is written.
So there are six operations in this sequence, let&amp;#8217;s see the machine code:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;movabs $0x7ffff5d1b618,%rbp
mov    0x0(%rbp),%rbx
add    $0x60,%rbx
cmp    %rbx,0x18(%rbp)
jb     0x164f300ea154
mov    %rbx,0x0(%rbp)
sub    $0x60,%rbx&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
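&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;The 0x18 displacement in the cmp above can be reproduced by hand.
Here is a minimal Python sketch, assuming the two addresses from the earlier disassembly are the position and end pointers.
&lt;code&gt;fits_in_int32&lt;/code&gt; is a made-up helper standing in for the kind of range check that the &lt;code&gt;CheckedInt&lt;/code&gt; conversion performs; it is not SpiderMonkey code.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;
```python
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def fits_in_int32(value):
    """True if value can be encoded as a signed 32-bit displacement."""
    return INT32_MAX >= value >= INT32_MIN


pos_addr = 0x7ffff5d1b618      # first movabs in the disassembly
cur_end_addr = 0x7ffff5d1b630  # second movabs in the original sequence

end_offset = cur_end_addr - pos_addr
print(hex(end_offset))            # 0x18
print(fits_in_int32(end_offset))  # True
print(fits_in_int32(pos_addr))    # False
```
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;The absolute address is far too large for a displacement, but the distance between the two pointers fits easily.&lt;/p&gt;
&lt;/div&gt;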
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Seven instructions long rather than eight, and 36 bytes rather than 49.  This
can be retrieved in three 16-byte transfers, rather than four.
The number of instructions per fetch is now 2 1/3 rather than 2.&lt;/p&gt;
&lt;/div&gt;
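&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;That back-of-the-envelope arithmetic, written out as a small Python sketch.
It assumes ideal alignment and 16-byte fetches as described for the Core microarchitecture above; &lt;code&gt;fetch_stats&lt;/code&gt; is just an illustrative helper name.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;
```python
import math
from fractions import Fraction

FETCH_BYTES = 16  # bytes fetched at a time, per the description above


def fetch_stats(instructions, code_bytes):
    """Return (16-byte fetches needed, instructions per fetch), assuming ideal alignment."""
    fetches = math.ceil(code_bytes / FETCH_BYTES)
    return fetches, Fraction(instructions, fetches)


print(fetch_stats(8, 49))  # (4, Fraction(2, 1))
print(fetch_stats(7, 36))  # (3, Fraction(7, 3))
```
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;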
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;sect1&quot;&gt;
&lt;h2 id=&quot;results&quot;&gt;Results&lt;/h2&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;It doesn&amp;#8217;t look like a huge improvement, seven instructions compared with
eight?!
But now it uses one less 16-byte fetch, which means one less cycle to fill
the pipeline for these instructions;
in the right loop that could make a huge difference.
It did make Firefox perform about 2.5% faster on the Speedometer
benchmark when tested on my laptop (Intel Core i7-6600U, Skylake).
Sadly we didn&amp;#8217;t see any noticeable difference in our performance testing
infrastructure (arewefastyet or perfherder).
This could be because our CI systems have different CPUs that behave
differently with regard to instruction lengths/density.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;My examples above were for the simpler Core microarchitecture, whereas my
testing was on a Skylake CPU, which will behave quite differently.
Starting with Sandy Bridge there are two paths for code to take through the
CPU front end, and which one is used depends on multiple conditions.
To simplify it, on tight enough loops the CPU is able to cache decoded
instructions and execute them out of a &amp;mu;op cache.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;sect1&quot;&gt;
&lt;h2 id=&quot;macro-fusion&quot;&gt;Macro-fusion&lt;/h2&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Another difference is that with an absolute address used in the cmp
instruction it could behave differently with regard to macro-fusion
(being fused with the conditional jump to execute as a single operation).
I&amp;#8217;m not sure if large displacements affect macro-fusion.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;sect1&quot;&gt;
&lt;h2 id=&quot;update-2018-09-18&quot;&gt;Update 2018-09-18&lt;/h2&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;I received some feedback from
&lt;a href=&quot;http://www.ocallahan.org/&quot;&gt;Robert O&amp;#8217;Callahan&lt;/a&gt;, who wrote with three
suggestions.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;ulist&quot;&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Allocate all JIT code and globals within a single 2GB region and use
RIP-relative addressing (x86-64), so that addresses will not be larger
than 32bits.  This is a good idea and I
&lt;a href=&quot;https://bugzilla.mozilla.org/show_bug.cgi?id=1491202&quot;&gt;considered this&lt;/a&gt; for
the jump instruction in that sequence which still uses a 64 bit address
(because the jump is created before the label, and so the address is
written after, it must leave 64bits of space for now).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Using known bit patterns in the nursery address range we could test for
overflow by checking the value of the bits, avoiding an extra memory
read.  This is a great idea but will require some other work first.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The final subtraction might be skippable if the caller can handle an
address to the end of the structure and use negative offsets, eg by
filling in slots in the object using negative offsets.  I&amp;#8217;m skeptical that
this will provide much benefit compared to the effort required to avoid
the subtraction; at best it would probably only delay it.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</description>
        <pubDate>Fri, 14 Sep 2018 00:00:00 +1000</pubDate>
        <link>https://paul.bone.id.au/blog/2018/09/14/large-immediate-values/</link>
        <guid isPermaLink="true">https://paul.bone.id.au/blog/2018/09/14/large-immediate-values/</guid>
        
        <category>assembly</category>
        
        <category>firefox</category>
        
        <category>performance</category>
        
        <category>x86</category>
        
        <category>immediate</category>
        
        <category>64bit</category>
        
        <category>planet-mozilla</category>
        
        
        <category>blog</category>
        
      </item>
    
      <item>
        <title>Good First Bugs</title>
        <description>&lt;div id=&quot;preamble&quot;&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;One great way (of many) to get started in software development, particularly
in open source, is to find good first bugs.
This is a class of software bugs (which should be called issues, since
they&amp;#8217;re not always bugs) that are easy to fix with little experience.
It can also be a great way, once you have software development skills, to
learn a new domain or set of tools.
Many projects, even well funded ones, are very happy to receive community
contributions, if nothing else it&amp;#8217;s one other way they can provide
opportunities to the community.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;At Mozilla we use Bugzilla to track our bugs, and use the
&lt;a href=&quot;https://bugzilla.mozilla.org/buglist.cgi?keywords=good-first-bug&amp;amp;keywords_type=allwords&amp;amp;list_id=14293547&amp;amp;resolution=---&amp;amp;query_format=advanced&quot;&gt;good first bug keyword&lt;/a&gt;
to identify such bugs.
You&amp;#8217;re welcome to contribute patches for these bugs, and potentially
have your work included in Firefox.
You can also search by component, so the list of open good first bugs for
the garbage collector is
&lt;a href=&quot;https://bugzilla.mozilla.org/buglist.cgi?keywords=good-first-bug&amp;amp;keywords_type=allwords&amp;amp;list_id=14293527&amp;amp;resolution=---&amp;amp;query_format=advanced&amp;amp;bug_status=UNCONFIRMED&amp;amp;bug_status=NEW&amp;amp;component=JavaScript%3A%20GC&amp;amp;product=Core&quot;&gt;here&lt;/a&gt;
and I&amp;#8217;d be happy to help with any of these.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;sect1&quot;&gt;
&lt;h2 id=&quot;good-second-bugs&quot;&gt;Good second bugs&lt;/h2&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;As far as I know I created the concept of good second bugs.  They&amp;#8217;re not
really second in the sense that you solve one good first bug then move on to
a second bug.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;To me this means that the contributor already has a fair amount of
development experience but isn&amp;#8217;t familiar with the domain.
So let&amp;#8217;s say you know C but you don&amp;#8217;t know how to write a garbage collector
or the theory behind it.
A good second bug would be a bug filed against something like a garbage
collector that does not require any GC knowledge, but probably does require C
knowledge and roughly 5 years of development experience.  It might take such
a person a couple of hours to solve, rather than 5 minutes.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;The intention is that it can help someone get into contributing to a
particular project or learn some new type of programming.
Particularly when those topics are generally regarded as deep or complex
(but all topics are deep/complex, I don&amp;#8217;t think GC is special, but that&amp;#8217;s
another topic).&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;I have created both a
&lt;a href=&quot;https://github.com/PlasmaLang/plasma/issues?q=is%3Aopen+is%3Aissue+no%3Aassignee+label%3Agood-first-bug&quot;&gt;good first bug&lt;/a&gt;
and a
&lt;a href=&quot;https://github.com/PlasmaLang/plasma/issues?utf8=%E2%9C%93&amp;amp;q=is%3Aopen+is%3Aissue+no%3Aassignee+label%3Agood-second-bug+&quot;&gt;good
second bug&lt;/a&gt;
tag for &lt;a href=&quot;https://plasmalang.org&quot;&gt;Plasma&lt;/a&gt; (my side project),
based on this idea.
Until there are more contributors I&amp;#8217;m not sure if this distinction is
useful; it has not been tested.
I&amp;#8217;ve also created labels for skills that each issue may require,
knowing that most people probably don&amp;#8217;t know
&lt;a href=&quot;https://mercurylang.org&quot;&gt;Mercury&lt;/a&gt; which Plasma is written in.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</description>
        <pubDate>Thu, 23 Aug 2018 00:00:00 +1000</pubDate>
        <link>https://paul.bone.id.au/blog/2018/08/23/good-first-bugs/</link>
        <guid isPermaLink="true">https://paul.bone.id.au/blog/2018/08/23/good-first-bugs/</guid>
        
        <category>good-first-bug</category>
        
        <category>open-source</category>
        
        <category>planet-mozilla</category>
        
        
        <category>blog</category>
        
      </item>
    
      <item>
        <title>Disassembling JIT Code in GDB</title>
        <description>&lt;div id=&quot;preamble&quot;&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;I&amp;#8217;ve been making changes to the JIT in SpiderMonkey, and sometimes I get a
SEGFAULT.  Okay, so I open it in gdb, and then this happens:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;Thread 1 &quot;js&quot; received signal SIGSEGV, Segmentation fault.
0x0000129af35af5e9 in ?? ()&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Not helpful, maybe there&amp;#8217;s something in the stack?&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;(gdb) backtrace
#0  0x0000129af35af5e9 in  ()
#1  0x0000129af35b107d in  ()
#2  0xfff9800000000000 in  ()
#3  0xfff8800000000002 in  ()
#4  0xfff8800000000002 in  ()&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Still not helpful.  I&amp;#8217;m reasonably confident the crash is in JITed code, which
has no debugging symbols or other info.  So I don&amp;#8217;t know what it was actually
executing when it crashed.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;In case it&amp;#8217;s not apparent, this is a short blog post where I can make notes
of one way to get some more information when debugging JITed code.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;First of all, those really large addresses (frames 2, 3 and 4) look
suspicious.  I&amp;#8217;m not sure what causes that.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Now, I know the change I made to the JIT, so it&amp;#8217;s likely that that&amp;#8217;s the
code that&amp;#8217;s crashing, I just don&amp;#8217;t know why.  It would help to see what code
is being executed:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;(gdb) disassemble
No function contains program counter for selected frame.&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;What it&amp;#8217;s trying to say is that the current program counter at this level
in the backtrace does not correspond to any function in the C++ program
(SpiderMonkey).  So unless we did a call or jump to something invalid, we&amp;#8217;re
probably executing JITed code.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Let&amp;#8217;s get more info:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;(gdb) info registers
rax            0x7ffff54b30c0   140737308733632
rbx            0xe4e4e4e400000891       -1953184670468274031
rcx            0xc      12
rdx            0x7ffff54c1058   140737308790872
rsi            0xa      10
rdi            0x7ffff54c1040   140737308790848
rbp            0x7fffffff9438   0x7fffffff9438
rsp            0x7fffffff9418   0x7fffffff9418
r8             0x7fffffff9088   140737488326792
r9             0x8      8
r10            0x7fffffff9068   140737488326760
r11            0x7ffff5d2f128   140737317630248
r12            0x0      0
r13            0x0      0
r14            0x7ffff54a0040   140737308655680
r15            0x0      0
rip            0x129af35af5e9   0x129af35af5e9
eflags         0x10202  [ IF RF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;These are the values in the CPU registers.  The debugger uses the rip (program
counter), rsp (stack pointer) and rbp (frame pointer) registers to
know what it&amp;#8217;s executing and to read the stack, including the calls that
led to this one.  We can use these too: we&amp;#8217;re going to use rip to figure
out what&amp;#8217;s being executed.  Its current value is 0x129af35af5e9.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;(gdb) dump memory code.raw 0x129af35af5e9 0x129af35af600&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Then in a shell:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;$ hexdump -C code.raw
00000000  83 03 01 c7 02 4b 00 00  00 e9 82 00 00 00 49 bb  |.....K........I.|
00000010  a8 ab d1 f5 ff 7f 00                              |.......|&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;I have asked gdb to write the contents of memory starting at the instruction
pointer to a file named code.raw.  Note that on x86-64 you need to write at least 15
bytes, as a single instruction can be that long; I have 23 bytes.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;I&amp;#8217;d normally disassemble code using the objdump program:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;$ objdump -d code.raw
objdump: code.raw: File format not recognised&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;In this case it needs extra clues about the raw data in this file.  We tell it the file format (&quot;binary&quot;) and the machine (&quot;i386&quot;), and give the disassembler more information about the machine (&quot;x86-64&quot;).&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;$ objdump -b binary -m i386 -M x86-64 -D code.raw

code.raw:     file format binary


Disassembly of section .data:

00000000 &amp;lt;.data&amp;gt;:
   0:   83 03 01                addl   $0x1,(%rbx)
   3:   c7 02 4b 00 00 00       movl   $0x4b,(%rdx)
   9:   e9 82 00 00 00          jmpq   0x90
   e:   49                      rex.WB
   f:   bb a8 ab d1 f5          mov    $0xf5d1aba8,%ebx
  14:   ff                      (bad)
  15:   7f 00                   jg     0x17&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Yay.  I can see the instruction it crashed on.  Adding the number 1 to the
32-bit value stored at the address pointed to by rbx.  I&amp;#8217;d like some more
context, so I have to get the instructions that lead to this.  Note that after
the jmpq instruction nothing makes sense, that&amp;#8217;s okay since that jump is
always taken.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;(gdb) dump memory code.raw 0x2ce07c3895e6 0x2ce07c3895f7
...
$ objdump -b binary -m i386 -M x86-64 -D code.raw

code.raw:     file format binary


Disassembly of section .data:

00000000 &amp;lt;.data&amp;gt;:
   0:   49 8b 1b                mov    (%r11),%rbx
   3:   83 03 01                addl   $0x1,(%rbx)
   6:   c7 02 4b 00 00 00       movl   $0x4b,(%rdx)
   c:   e9 82 00 00 00          jmpq   0x93&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;When I go back three bytes I get lucky and find another valid instruction that
also makes sense.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;(gdb) dump memory code.raw 0x2ce07c3895e5 0x2ce07c3895f7
...
$ objdump -b binary -m i386 -M x86-64 -D code.raw

code.raw:     file format binary


Disassembly of section .data:

00000000 &amp;lt;.data&amp;gt;:
   0:   00 49 8b                add    %cl,-0x75(%rcx)
   3:   1b 83 03 01 c7 02       sbb    0x2c70103(%rbx),%eax
   9:   4b 00 00                rex.WXB add %al,(%r8)
   c:   00 e9                   add    %ch,%cl
   e:   82                      (bad)
   f:   00 00                   add    %al,(%rax)
        ...&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Gibberish.  Unfortunately I just have to guess which byte an instruction might
begin on, or go back byte-by-byte until I find instructions that make sense.  There
was quite a bit of experimentation, and a lot more gibberish, until I found:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;(gdb) dump memory code.raw 0x2ce07c3895dd 0x2ce07c3895f7
...
$ objdump -b binary -m i386 -M x86-64 -D code.raw

code.raw:     file format binary


Disassembly of section .data:

00000000 &amp;lt;.data&amp;gt;:
   0:   bb 28 f1 d2 f5          mov    $0xf5d2f128,%ebx
   5:   ff                      (bad)
   6:   7f 00                   jg     0x8
   8:   00 49 8b                add    %cl,-0x75(%rcx)
   b:   1b 83 03 01 c7 02       sbb    0x2c70103(%rbx),%eax
  11:   4b 00 00                rex.WXB add %al,(%r8)
  14:   00 e9                   add    %ch,%cl
  16:   82                      (bad)
  17:   00 00                   add    %al,(%rax)
        ...&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;This is almost correct (except for all the gibberish).  At least it starts
on an instruction that kind of makes sense, with a valid-looking memory address.
But wait, that instruction uses ebx, a 32-bit register, which is not what I&amp;#8217;m
expecting since the code I&amp;#8217;m JITing works with 64-bit memory addresses.  And all
that gibberish could be part of a memory address: it has bytes like 0xff and
0x7f in it!&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;I go back one more byte:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;(gdb) dump memory code.raw 0x2ce07c3895dc 0x2ce07c3895f7
...
$ objdump -b binary -m i386 -M x86-64 -D code.raw

code.raw:     file format binary


Disassembly of section .data:

00000000 &amp;lt;.data&amp;gt;:
   0:   49 bb 28 f1 d2 f5 ff    movabs $0x7ffff5d2f128,%r11
   7:   7f 00 00
   a:   49 8b 1b                mov    (%r11),%rbx
   d:   83 03 01                addl   $0x1,(%rbx)
  10:   c7 02 4b 00 00 00       movl   $0x4b,(%rdx)
  16:   e9 82 00 00 00          jmpq   0x9d&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Got it, now that we have the extra byte at the beginning.  That&amp;#8217;s a long
instruction (which I&amp;#8217;ll talk more about in my next article).  x86 has &lt;em&gt;prefix
bytes&lt;/em&gt; for some instructions which can override some things about the
instruction.  In this case 0x49 is saying this instruction operates on
64-bit data (well, 0x48 says that, and the +1 is the extra high bit of the
register number, which selects r11 rather than rbx).&lt;/p&gt;
&lt;/div&gt;
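&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;To make the prefix byte concrete, here is a minimal Python sketch.  It is not
part of the debugging session, and the helper names decode_rex and
mov_imm_register are made up for illustration.  It splits a REX prefix into its
four flag bits and shows how the B bit turns register 3 (rbx) into register 11
(r11) for the 0xbb opcode seen above.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;
```python
def decode_rex(prefix):
    """Split an x86-64 REX prefix byte into its W, R, X and B flag bits."""
    bits = format(prefix, "08b")
    # A REX prefix always has the fixed high nibble 0100 (0x40 to 0x4f).
    if bits[:4] != "0100":
        raise ValueError("not a REX prefix: " + hex(prefix))
    return {
        "W": int(bits[4]),  # 1 = 64-bit operand size
        "R": int(bits[5]),  # extends the ModRM reg field
        "X": int(bits[6]),  # extends the SIB index field
        "B": int(bits[7]),  # extends ModRM r/m, SIB base or the opcode's register
    }


def mov_imm_register(opcode, rex_b):
    """Register number targeted by a B8+r 'mov imm' opcode byte."""
    # The low three bits of the opcode pick the register; REX.B supplies bit 3.
    return (opcode - 0xB8) + 8 * rex_b


rex = decode_rex(0x49)
print(rex)
print(mov_imm_register(0xBB, rex["B"]))  # register 11, i.e. r11
```
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Without the prefix the same 0xbb byte targets register 3, which is why the
earlier attempt disassembled as a 32-bit mov into ebx.&lt;/p&gt;
&lt;/div&gt;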
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;And there&amp;#8217;s the bug (3rd line).  I&amp;#8217;m dereferencing the address that I
load into r11 twice: once in the mov, and then again during the addl.  I should only
dereference it once.  The cause was that I misunderstood SpiderMonkey&amp;#8217;s
macro assembler&amp;#8217;s mnemonics.&lt;/p&gt;
&lt;/div&gt;
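&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;The byte-by-byte search is mechanical, so it could be scripted.  The following
is only a sketch in Python: the function names are hypothetical, the crash
address is the rip value from the first session above, and the 23-byte window
matches the first dump.  It prints the gdb commands to try, plus the objdump
invocation used throughout this post.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;
```python
def candidate_dumps(crash_pc, max_back, length=23, filename="code.raw"):
    """Yield one gdb 'dump memory' command per candidate start address."""
    for back in range(1, max_back + 1):
        start = crash_pc - back
        # Always dump through to the same end so the crashing instruction
        # stays inside the window.
        end = crash_pc + length
        yield "dump memory {} {:#x} {:#x}".format(filename, start, end)


def objdump_command(filename="code.raw"):
    """The objdump invocation used in this post for a raw x86-64 blob."""
    return ["objdump", "-b", "binary", "-m", "i386", "-M", "x86-64", "-D", filename]


for command in candidate_dumps(0x129AF35AF5E9, 3):
    print(command)
print(" ".join(objdump_command()))
```
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Each candidate still has to be judged by eye: a start address is plausible
when the disassembly reaches the crashing addl without any (bad) entries.&lt;/p&gt;
&lt;/div&gt;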
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;sect1&quot;&gt;
&lt;h2 id=&quot;update-2018-08-07&quot;&gt;Update 2018-08-07&lt;/h2&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;One response to this pointed out that I could have just used:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;(gdb) disassemble 0x12345, +0x100&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;To disassemble a range of memory,
and wouldn&amp;#8217;t have had the
&quot;No function contains program counter for selected frame.&quot; error.
They even suggested I could use something like:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;(gdb) disassemble $rip-50, +0x100&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;I&amp;#8217;ll definitely try these next time.  They might not be the exact syntax;
I haven&amp;#8217;t tested them.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;sect1&quot;&gt;
&lt;h2 id=&quot;update-2018-08-18&quot;&gt;Update 2018-08-18&lt;/h2&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Another tip is to use: x/20i $pc&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;That&amp;#8217;s the whole command. x means that GDB should use the $pc as a
memory location and not as a literal; /20i means &quot;treat that memory
location as containing instructions and show 20 of them&quot;&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;You can also use this with display, like in display x/4i $pc so that every
time you stepi, it will auto-print the next 4 instructions.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</description>
        <pubDate>Tue, 07 Aug 2018 00:00:00 +1000</pubDate>
        <link>https://paul.bone.id.au/blog/2018/08/07/dissassembling-jit-code-in-gdb/</link>
        <guid isPermaLink="true">https://paul.bone.id.au/blog/2018/08/07/dissassembling-jit-code-in-gdb/</guid>
        
        <category>disassemble</category>
        
        <category>jit</category>
        
        <category>gdb</category>
        
        <category>objdump</category>
        
        <category>planet-mozilla</category>
        
        
        <category>blog</category>
        
      </item>
    
      <item>
        <title>Static Assert Type In Cplusplus</title>
        <description>&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Static type checking helps us be more confident that our software does what we
think it does.
But it can&amp;#8217;t see everything, and this post was originally going to share a
neat C++ feature that might have helped me be a little more confident about
the code I&amp;#8217;m writing.
However just after I started writing this I found that it&amp;#8217;s not necessary
and there is (doh, I should have known) a nice C way to get the same check.
However I still want to write it because it might be handy to remember this
in the future.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;I&amp;#8217;m trying to add a counter to SpiderMonkey to count how many nursery and
tenured allocations there are.
I&amp;#8217;ve implemented this for the non-JIT paths and now it&amp;#8217;s time to implement
the same for the JIT paths.
Here&amp;#8217;s the code for the JIT during a nursery allocation:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;add32(Imm32(1), AbsoluteAddress(zone-&amp;gt;addressOfNurseryAllocCount()));&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;This code isn&amp;#8217;t what runs at runtime (kinda); it&amp;#8217;s part of the JIT and is
executed when we want to generate code that performs a nursery allocation.
In other words, it doesn&amp;#8217;t perform the allocation itself; the machine code it
generates will.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;This causes the JIT to write instruction(s) (usually 1) that perform a
32-bit add.  They add the immediate value 1, to the value contained at the
address provided by the call zone-&amp;gt;addressOfNurseryAllocCount().
This returns a pointer to a uint32_t value.
However the AbsoluteAddress constructor will cast this to a void pointer,
before writing the 32-bit add instruction(s) using it.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;This means that if in 6 months&amp;#8217; time I decide that we need 64-bit counters,
or want to save so much memory that 16-bit counters would be better, and the
type of CompileZone::addressOfNurseryAllocCount() changes from
uint32_t* to uint64_t*, we&amp;#8217;d have a problem.  (Yes we would: on little-endian
platforms like x86 the 32-bit add would only touch the low half of the counter
and lose any carry, and on a big-endian platform it would add to the wrong bytes.)&lt;/p&gt;
&lt;/div&gt;
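&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Here is a small Python model of one such failure.  It is an illustration
only, not SpiderMonkey code: add32 and read64 are hypothetical helpers that
emulate a 32-bit add instruction and a 64-bit read on a little-endian machine,
with a bytearray standing in for memory.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;
```python
def add32(memory, address, value):
    """Emulate a 32-bit little-endian add at the given offset in memory."""
    old = int.from_bytes(memory[address:address + 4], "little")
    new = (old + value) % 2**32  # the carry out of bit 31 is discarded
    memory[address:address + 4] = new.to_bytes(4, "little")


def read64(memory, address):
    """Read a 64-bit little-endian counter."""
    return int.from_bytes(memory[address:address + 8], "little")


# A 64-bit counter holding the largest value that fits in 32 bits.
memory = bytearray((2**32 - 1).to_bytes(8, "little"))
add32(memory, 0, 1)
print(read64(memory, 0))  # 0, not 4294967296: the upper half never sees the carry
```
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;For small counts the result looks right, which is exactly what makes this
kind of mismatch easy to miss.&lt;/p&gt;
&lt;/div&gt;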
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;So I wanted some kind of check here that if someone did make this change,
they&amp;#8217;d get a compiler error and change the add32 to an add64 for
example.  So I used:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;static_assert(mozilla::IsSame&amp;lt;uint32_t*,
    decltype(zone-&amp;gt;addressOfNurseryAllocCount())&amp;gt;::value,
    &quot;JIT expects this to be a 32bit counter&quot;);
add32(Imm32(1), AbsoluteAddress(zone-&amp;gt;addressOfNurseryAllocCount()));&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Note that mozilla::IsSame is like std::is_same, it uses the template
system to substitute in a different value for its value member depending on
if the substituted types are the same.
If they are, then the value is non-zero and the static_assert is accepted.
But if the type of this function were to change then the assertion would
fail, exactly what we want!&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;But there&amp;#8217;s a simpler way.
Just create a local variable of the desired type and assign to it first.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;uint32_t *allocCount = zone-&amp;gt;addressOfNurseryAllocCount();
add32(Imm32(1), AbsoluteAddress(allocCount));&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;The compilation will fail if the coercion implied by the assignment isn&amp;#8217;t
possible or safe.
This won&amp;#8217;t prevent all coercions (hence my confusion):
a uint32_t may be coerced to a uint64_t, but a uint32_t* cannot be coerced to a
uint64_t*.
So the simple solution is all we need here.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;I&amp;#8217;ve been through much of the existing JIT code today and made this type of
improvement in many places
(&lt;a href=&quot;https://bugzilla.mozilla.org/show_bug.cgi?id=1476500&quot;&gt;Bug 1476500&lt;/a&gt;).&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;I wonder if dependent types can ensure that a code generator generates
(more) type correct code?&lt;/p&gt;
&lt;/div&gt;</description>
        <pubDate>Wed, 18 Jul 2018 00:00:00 +1000</pubDate>
        <link>https://paul.bone.id.au/blog/2018/07/18/static-assert-type-in-cplusplus/</link>
        <guid isPermaLink="true">https://paul.bone.id.au/blog/2018/07/18/static-assert-type-in-cplusplus/</guid>
        
        <category>Paul Bone</category>
        
        <category>Static Typeing</category>
        
        <category>static_assert</category>
        
        <category>JIT</category>
        
        <category>GC</category>
        
        <category>SpiderMonkey</category>
        
        <category>C++</category>
        
        <category>planet-mozilla</category>
        
        
        <category>blog</category>
        
      </item>
    
      <item>
        <title>icecc and ccache - Compiling lots of C++ quickly</title>
        <description>&lt;div id=&quot;preamble&quot;&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;openblock floatright&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;div class=&quot;title&quot;&gt;XKCD #303&lt;/div&gt;
&lt;p&gt;&lt;a href=&quot;https://xkcd.com/303/&quot;&gt;&lt;span class=&quot;image&quot;&gt;&lt;img src=&quot;https://imgs.xkcd.com/comics/compiling.png&quot; alt=&quot;XKCD #303&quot;&gt;&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Copyright Randall Munroe.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Firefox is a big project and takes quite some time to compile.
If you&amp;#8217;re working on such a large project, you make a change, recompile,
accidentally touch a header, recompile then lose a lot of time waiting and
resort to checking social media or bouts of wheelie-chair jousting.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Within my home office,
I&amp;#8217;ve set up icecc and ccache on Linux Mint (similar to Ubuntu) on amd64
using GCC.
I haven&amp;#8217;t yet tried clang but probably will soon.  Instructions should be
similar for other OSs and compilers, but YMMV.
I&amp;#8217;m going to give instructions for setting up both tools at the same time.
They can be used independently, but if you want to compile large C/C++ projects
often, you probably want both of them.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;icecc (pronounced ice cream) is a tool to distribute C compiler jobs among
a network of peers for parallel compilation.  Think make -j but across
multiple computers.  You may have heard of distcc; it&amp;#8217;s like that but
smarter at scheduling jobs.  Use icecc (or distcc if that&amp;#8217;s your thing) if
you have some spare (or used) computers to distribute compilation across.
Some Mozillians (not sure I like that word yet, labels etc) have set up icecc
groups within the Mozilla offices.
I&amp;#8217;m told by someone who tried that it&amp;#8217;s not worth connecting to these
from home.
I&amp;#8217;m also avoiding WiFi for this reason and more.
Also, be aware that connecting to other networks could be a security
issue: a peer could replace your code with something nasty and trick you
into running it.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;ccache is a C compiler cache.  If you&amp;#8217;re recompiling the same project often,
ccache will remember the .o file generated previously and return it
rather than running the compiler again.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;sudo aptitude install ccache icecc&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;And on your workstation you can also install icecc-monitor.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;icecc uses two daemons, iceccd accepts jobs and runs them locally on each
node, it connects to icecc-scheduler which manages the jobs for a group of
machines and distributes them.  I believe it&amp;#8217;s supposed to work if you have
multiple icecc-schedulers on your network, but I found that this would
easily create two separate smaller clusters as my laptop came and went from
the network.  Instead it was simple to disable icecc-scheduler on all but
one of the nodes.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;sudo update-rc.d icecc-scheduler disable
sudo service icecc-scheduler stop&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph floatright&quot;&gt;
&lt;div class=&quot;title&quot;&gt;icemon idle&lt;/div&gt;
&lt;p&gt;&lt;span class=&quot;image&quot;&gt;&lt;img src=&quot;/assets/img/icecc-ccache/icemon1-small.png&quot; alt=&quot;icemon idle&quot;&gt;&lt;/span&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;icecc-monitor is a GUI application that will let me visualise the cluster
and its jobs.
It was useful at this point to confirm that things were working; start it as
part of your usual desktop environment.
You should see the nodes of your cluster sitting idle.
In the image you can see my two nodes, &quot;fluorine&quot; and &quot;oxygen&quot;.
I will be adding &quot;neon&quot; soon and will then have 16 cores/threads at my disposal.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;If you&amp;#8217;re not seeing this then I&amp;#8217;ve found that:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;ulist&quot;&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;You may have to disable your firewall,
I use a firewall on my laptop when I&amp;#8217;m out-and-about but find I have to
turn it off to use icecc in my home office.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Restart the iceccd daemons to get them to connect to the scheduler.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Close and reopen icemon any time you restart the scheduler.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;We could use icecc on its own; it&amp;#8217;s as simple as adding /usr/lib/icecc/bin to
your path.
Instead of doing that we&amp;#8217;ll add ccache.
ccache likes a lot of hard disk space: 15-20GB is suitable if you&amp;#8217;re working
on Firefox, which usually uses about 5GB per build
(reported by shu, I didn&amp;#8217;t measure myself).
I use btrfs, with which I like to use snapshots, but there&amp;#8217;s no point
snapshotting my ccache.  Instead I created a new logical volume, used ext4
(remember to use noatime or relatime in your mount options (for any FS))
and mounted that at /mnt/ccache.  Depending on how your system is configured
these steps could be quite different, or you might not use a separate
filesystem at all.  (I wish installers would let me name the volume group.)&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Make the filesystem:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;sudo lvcreate -L 20G -n ccache mint-vg
sudo mkfs.ext4 /dev/mint-vg/ccache&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Put this in /etc/fstab:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;/dev/mapper/mint--vg-ccache /mnt/ccache ext4    errors=remount-ro,noatime 0 2&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Mount the filesystem and make one directory per user in it; that&amp;#8217;s
probably just one directory:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;sudo mkdir /mnt/ccache
sudo mount /mnt/ccache
sudo mkdir /mnt/ccache/paul
sudo chown paul:paul /mnt/ccache/paul&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;And put this in your user&amp;#8217;s ~/.ccache/ccache.conf, set the size here, and
the filesystem size appropriately.
Most filesystems run more smoothly with some free space, your SSD may be
happier with some free space too.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;max_size = 17G
cache_dir = /mnt/ccache/paul&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;One more thing, tell ccache to use icecc to run the compiler.  I put this in
my ~/.bashrc.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;export CCACHE_PREFIX=icecc&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Time to test it out.
I&amp;#8217;m not sure how this works generally, but for SpiderMonkey (JS shell only)
builds you simply add --with-ccache to your ./configure arguments, then
build with make -j12 (since I have 12 cores/threads in my two machines).
For Firefox itself a build is normally configured by placing a mozconfig
file in the project root directory.  Add to that file:&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;ac_add_options --with-ccache=/usr/bin/ccache
mk_add_options MOZ_MAKE_FLAGS=&quot;-j12&quot;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;I haven&amp;#8217;t measured the effects of using either ccache or icecc, but I&amp;#8217;ve
definitely noticed that ccache can speed up repeated builds.
I also suspect that some parallel slackness (adding more tasks than there
are cores) could help speed things up to cover some latency introduced by
ccache.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;The icecc-monitor program has a number of different views.  The image above
was &quot;Star view&quot;; I think my favorite is &quot;Gantt view&quot; (below).&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;div class=&quot;title&quot;&gt;icemon running&lt;/div&gt;
&lt;p&gt;&lt;span class=&quot;image&quot;&gt;&lt;img src=&quot;/assets/img/icecc-ccache/icemon2-small.png&quot; alt=&quot;icemon running&quot;&gt;&lt;/span&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;sect1&quot;&gt;
&lt;h2 id=&quot;update-2017-08-17&quot;&gt;Update 2017-08-17&lt;/h2&gt;
&lt;div class=&quot;sectionbody&quot;&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Most of ccache&amp;#8217;s options can be controlled by either configuration options in
~/.ccache/ccache.conf or by environment variables.
Therefore I have removed CCACHE_PREFIX from my ~/.bashrc file and instead
added it to ccache.conf.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;I have also learnt that when -g (or similar) is on the command line ccache
will hash the directory name (I think the working directory) and incorporate
that into its cache.  This ensures that any path references in debugging
symbols resolve correctly and don&amp;#8217;t mislead you during a debugging session.
Which is a good idea, but if most of your builds use -g and you use
multiple workspaces for the same projects it can lead to more cache misses.
If you don&amp;#8217;t use a debugger often, and promise to turn directory hashing back
on when you do (or be confused by references to different source files), then
this can be disabled with the hash_dir option (or the CCACHE_NOHASHDIR
environment variable).&lt;/p&gt;
&lt;/div&gt;
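&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;To see why hashing the directory costs cache hits, here is a toy model in
Python.  It is a deliberate simplification and not how ccache actually computes
its keys: cache_key is a hypothetical function, and the two workspace paths are
made up.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;
```python
import hashlib


def cache_key(source, flags, working_dir, hash_dir=True):
    """Toy cache key: hash the inputs, optionally including the directory."""
    parts = [source, " ".join(flags)]
    if hash_dir and "-g" in flags:
        parts.append(working_dir)
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()


source = "int main() { return 0; }"
flags = ["-g", "-O2"]

with_dir_a = cache_key(source, flags, "/home/paul/firefox-a")
with_dir_b = cache_key(source, flags, "/home/paul/firefox-b")
print(with_dir_a == with_dir_b)  # False: each workspace gets its own entry

no_dir_a = cache_key(source, flags, "/home/paul/firefox-a", hash_dir=False)
no_dir_b = cache_key(source, flags, "/home/paul/firefox-b", hash_dir=False)
print(no_dir_a == no_dir_b)  # True: both workspaces share one entry
```
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;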
&lt;div class=&quot;paragraph&quot;&gt;
&lt;p&gt;Now my ccache.conf looks like&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;listingblock&quot;&gt;
&lt;div class=&quot;content&quot;&gt;
&lt;pre&gt;max_size = 17G
cache_dir = /mnt/ccache/paul
prefix_command=icecc

# Might get confusing for debugging
hash_dir = false&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</description>
        <pubDate>Fri, 04 Aug 2017 00:00:00 +1000</pubDate>
        <link>https://paul.bone.id.au/blog/2017/08/04/icecc-and-ccache/</link>
        <guid isPermaLink="true">https://paul.bone.id.au/blog/2017/08/04/icecc-and-ccache/</guid>
        
        <category>planet-mozilla</category>
        
        <category>icecc</category>
        
        <category>ccache</category>
        
        <category>Firefox</category>
        
        <category>SpiderMonkey</category>
        
        <category>C</category>
        
        <category>C++</category>
        
        <category>C/C++</category>
        
        
        <category>blog</category>
        
      </item>
    
  </channel>
</rss>
