adding the async assembly article

2025-12-08 12:39:44 -05:00 · 2025-12-08 12:39:44 -05:00 · 00fc46587a
commit 00fc46587a
parent dd68256440
2 changed files with 519 additions and 2 deletions
--- a/articles/lcdt/async.article.html
+++ b/articles/lcdt/async.article.html
@@ -0,0 +1,509 @@
+<article>
+	<h1 id="an-asynchronous-assembly-adventure-on-the-game-boy-" class="title">an asynchronous assembly adventure, on the game boy!</h1>
+	<p><strong>By:</strong> Shoofle</p>
+	<p>As a preface: This article assumes some familiarity with assembly, or a willingness to pick it up on the fly. The super ultra quick crash course on game boy sm83 assembly is this: </p>
+	<ul>
+		<li><code>ld a, b</code> means &quot;copy the value from register <code>b</code> into register <code>a</code>&quot; (note the order!)</li>
+		<li><code>ld [vCurrentSelection], a</code> means &quot;copy the value from register <code>a</code> into RAM at memory address <code>vCurrentSelection</code>&quot;.</li>
+		<li>Terms such as <code>vCurrentSelection</code> are statically allocated memory addresses, so they get substituted with raw immediate numbers at assembly time.</li>
+		<li>A line beginning with, for ex., <code>MenuSetup:</code> is a label, which is essentially a constant that gets replaced at assembly time by the memory address of the following line.</li>
+		<li>There are a few registers: <code>a</code>, <code>b</code>, <code>c</code>, <code>d</code>, <code>e</code>, <code>h</code>, and <code>l</code> are all 8-bit registers that can be used for various operations. You can also sometimes use the pairs <code>bc</code>, <code>de</code>, and <code>hl</code> as 16-bit registers, and occasionally the special 16-bit registers <code>af</code> (<code>a</code> plus the processor&#39;s flag bits), <code>sp</code> (stack pointer), and <code>pc</code> (program counter).</li>
+		<li>There&#39;s also a stack. The 16-bit register <code>sp</code> can be set and read, and interacts with the <code>push</code>, <code>pop</code>, <code>call</code>, and <code>ret</code> instructions.</li>
+		<li>Most of the time I stick to passing variables through registers - a subroutine call requires loading values into the appropriate registers, then executing a <code>call</code> instruction to jump to the subroutine. Then the subroutine uses the <code>ret</code> instruction to return to the call site.</li>
+		<li><code>; semicolons for comments</code></li>
+	</ul>
+	<p>Picture it: You&#39;re writing a program for the nintendo game boy, in raw assembly, as you do. You want to initialize the menu screen, by setting a variable and loading graphical data to the screen. You&#39;ve got two subroutines to use. <code>CopyTilesToMap</code> is used to copy a tile map (a list of tile IDs) from the ROM into the dedicated screen memory, so that the right tiles will be displayed on the screen. <code>CopyRange</code> is used to copy the pixel data from the ROM into the specific spot in VRAM so the game boy knows how to draw the tile IDs we copied before. They just walk over a range of bytes, copying them one by one to their specified destination. It&#39;s most important to know that these are functions we use to copy a range of data into VRAM.</p>
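+	<p>The two helpers aren&#39;t spelled out here, but for concreteness, here&#39;s a minimal sketch of what they might look like. This is an assumption, not the code behind this article. It takes its register conventions from the call sites below: <code>hl</code> = source, <code>de</code> = destination, and <code>bc</code> = byte count for <code>CopyRange</code>; <code>b</code> = rows and <code>c</code> = columns for <code>CopyTilesToMap</code>. It also assumes the source tile map is stored row by row with no padding. The background map in VRAM is 32 tiles wide no matter how wide the visible screen is, which is why <code>CopyTilesToMap</code> skips <code>de</code> ahead after each row.</p>
+	<pre><code>; CopyRange: copy bc bytes from [hl] to [de]
+CopyRange:
+	ld a, b
+	or c             ; z is set only when bc == 0
+	ret z
+	ld a, [hli]      ; read a source byte and advance hl
+	ld [de], a       ; write it to the destination
+	inc de
+	dec bc           ; 16-bit dec does not touch the flags, hence the or above
+	jr CopyRange
+
+; CopyTilesToMap: copy b rows of c tile IDs from [hl] into the map at [de]
+CopyTilesToMap:
+.row:
+	push bc          ; remember the width, since c counts down below
+.column:
+	ld a, [hli]
+	ld [de], a
+	inc de
+	dec c
+	jr nz, .column
+	pop bc           ; c = width again (b was not modified)
+	; the map is 32 tiles wide: skip the 32 - c entries we did not write
+	ld a, 32
+	sub c
+	add a, e         ; a = e + (32 - c), carry set on overflow
+	ld e, a
+	adc a, d         ; a = e + d + carry
+	sub e            ; a = d + carry
+	ld d, a
+	dec b
+	jr nz, .row
+	ret
+	</code></pre>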
+	<p>So, you write this code:</p>
+	<pre><code>MenuSetup:
+	; set up whatever variables and memory the screen needs
+	ld a, 0
+	ld [vCurrentSelection], a
+
+	; load the tile IDs into the background map
+	ld hl, Menu.UITileMap ; source tile map location in rom
+	ld de, _SCRN0         ; destination is the start of the screen in memory
+	ld b, 18              ; height (in tiles)
+	ld c, 20              ; width (in tiles) (takes up the full screen)
+	call CopyTilesToMap
+
+	; load the data for all the tiles used for drawing the screen
+	ld hl, Menu.UITileData                      ; source
+	ld de, _VRAM + $1000                        ; destination
+	ld bc, Menu.UITileDataEnd - Menu.UITileData ; length of data
+	call CopyRange
+
+	ret
+	</code></pre><p>It&#39;s simple enough. First you set up whatever variables you need for the screen, then you use <code>CopyTilesToMap</code> to load the menu&#39;s tilemap, then you use <code>CopyRange</code> to load the data for what those tiles should look like. Seems good, right?</p>
+	<p>Wrong. The problem comes up immediately: the game boy CPU can&#39;t read from or write to VRAM while the PPU is drawing pixels to the screen. The one big window where VRAM is free is the v-blank period, an extra ten scanlines&#39; worth of processor time (scanlines 144 through 153) between every frame. There are brief openings within each scanline too, but they&#39;re tiny, so v-blank is the practical place to load a screen&#39;s worth of data into VRAM.</p>
+	<p>If you&#39;re like me, your first thought is &quot;Okay, I&#39;ll make new versions of <code>CopyTilesToMap</code> and <code>CopyRange</code> that will safely restrict their activity to v-blank.&quot; They&#39;ll check between each byte transfer whether it&#39;s safe to copy data to VRAM, and otherwise they&#39;ll spin their wheels.</p>
+	<p>So if <code>CopyRange</code> looks like this:</p>
+	<pre><code>CopyRange:
+	if the length to copy is zero, return
+	copy the byte at the source address to the destination
+	step the source address forward
+	step the destination address forward
+	decrease the length to copy
+	and jump to CopyRange
+</code></pre>
+	<p>Then your new <code>CopyRangeSafely</code> will look like this:</p>
+	<pre><code>CopyRangeSafely:
+	if the length to copy is zero, return
+	copy the byte at the source address to the destination
+	step the source address forward
+	step the destination address forward
+	decrease the length to copy
+.checkIfDone:
+	check if the game boy is in v-blank.
+		if it is in v-blank, jump to CopyRangeSafely
+		if it's not in v-blank, wait a few cycles and jump to .checkIfDone
+	</code></pre><p>Checking for v-blank ultimately increases the number of instructions executed per byte by at least 30%, between juggling registers, making fetches, and making comparisons. Much worse, this solution freezes up the entire handheld in a busy loop until it&#39;s done copying! You struggle onwards for a bit before realizing this isn&#39;t tenable at all. So you sit and think.</p>
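+	<p>In real sm83 assembly, that pseudocode might come out something like the sketch below. It uses the same register conventions as <code>CopyRange</code> and assumes hardware.inc&#39;s <code>rLY</code>, the register holding the scanline currently being drawn. Scanlines 144 through 153 are v-blank, so the wait loop spins until <code>LY</code> reaches 144. Even when it never has to wait, it adds three instructions to every byte copied.</p>
+	<pre><code>; CopyRangeSafely: like CopyRange, but only writes during v-blank
+CopyRangeSafely:
+	ld a, b
+	or c
+	ret z            ; bc == 0: nothing left to copy
+.waitForVBlank:
+	ldh a, [rLY]     ; which scanline is the PPU on?
+	cp 144           ; carry set means LY is below 144: still drawing
+	jr c, .waitForVBlank
+	ld a, [hli]
+	ld [de], a
+	inc de
+	dec bc
+	jr CopyRangeSafely
+	</code></pre>
+	<p>This version checks before each write rather than after, so even the first byte waits for v-blank. The busy loop is still there, though: while <code>LY</code> is below 144, the CPU does nothing useful at all.</p>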
+	<p>If only there were some way to write the same code for v-blank-safe memory transfers as for other memory transfers. This <code>CopyRangeSafely</code> function is really eating at you. If only you could take any block of code and guarantee that it only executes during v-blank, pausing otherwise. What you&#39;d really like is to be able to run game update code every frame outside of v-blank, and let your transfer code run during every v-blank until it&#39;s complete.</p>
+	<p>But that sounds like having two different threads. Multi-threaded code is hard, right? And the game boy processor is famously weak; it doesn&#39;t even have a modulo instruction! And this is just a hobby project. And you have no experience writing OS-level code...</p>
+	<p>But there&#39;s no harm in trying, eh? Maybe you could start by chipping away at the efficiency at least. So, what have you got available to you?</p>
+	<h2 id="a-first-attempt">a first attempt</h2>
+	<p>The &quot;safe&quot; versions of the copy functions were slower, in part because they have to check - once for every byte copied - whether it&#39;s safe to do so. Is there any way you could get the processor to start and stop that behavior on its own, without having to do the checks yourself? Does the game boy processor even have a way to do that?</p>
+	<p>Well, the game boy has interrupts. Under certain conditions, as the processor is running, it will execute a <code>call</code> to a hard-coded address determined by the conditions: <code>$0060</code> if the joypad was pressed, <code>$0058</code> if a byte was received from the serial port, <code>$0050</code> if a timer rolled over... These addresses have just enough space to call or jump somewhere else to react to the interrupt. It just so happens that one of these interrupts (at <code>$0040</code>) fires when the game boy enters v-blank. Another, the configurable STAT interrupt (at <code>$0048</code>), can be set up to fire when the game boy starts drawing line zero and thus exits v-blank. It also supports a <code>halt</code> instruction, which suspends the CPU until an interrupt fires.</p>
+	<p>Maybe you can use these somehow.</p>
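+	<p>Hooking those hard-coded addresses up to your own code usually means placing a jump at each vector. Here&#39;s a sketch in RGBDS syntax for the two interrupts this article cares about: v-blank at <code>$0040</code> and STAT at <code>$0048</code>. Each vector gets 8 bytes before the next one starts, which is plenty for a 3-byte <code>jp</code>. The section names are arbitrary; the handler names match the ones used below.</p>
+	<pre><code>SECTION "VBlank vector", ROM0[$0040]
+	jp VBlankInterrupt   ; the CPU calls $0040 when v-blank starts
+
+SECTION "STAT vector", ROM0[$0048]
+	jp STATInterrupt     ; the CPU calls $0048 on the configured STAT condition
+	</code></pre>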
+	<p>For a start, you can use the interrupts to at least stop needing to check when the transfer is safe. Here&#39;s how it works: first, configure the v-blank interrupt to wake up the processor. Then <code>halt</code> execution. When v-blank starts, the interrupt wakes the processor, and it continues running your code. Later, when the device exits v-blank and it&#39;s no longer safe to copy, another interrupt executes another <code>halt</code> instruction to put the processor to sleep until the next v-blank period.</p>
+	<p>Maybe that could work. Then you could at least get rid of the slow and ugly <code>CopyRangeSafely</code> functions. So how does this look?</p>
+	<p>We want to run the <code>MenuSetup</code> subroutine from above. The changes here are simple:</p>
+	<pre><code>MenuSetup:
+	; set up some statically allocated screen variables
+	ld a, 0
+	ld [vCurrentSelection], a
+
+	call SetUpInterrupts ; turn on interrupts and set up the handler
+	halt                 ; wait until the next interrupt
+
+	ld hl, Menu.UITileMap ; tile map location in rom
+	ld de, _SCRN0         ; draw it starting at 0,0 on the screen
+	ld b, 18              ; height (in tiles)
+	ld c, 20              ; width (in tiles) (takes up the full screen)
+	call CopyTilesToMap
+
+	; load the data for all the tiles used for drawing the screen
+	ld hl, Menu.UITileData                      ; source
+	ld de, _VRAM + $1000                        ; destination
+	ld bc, Menu.UITileDataEnd - Menu.UITileData ; length of data
+	call CopyRange
+
+	call TearDownInterrupts
+
+	ret
+	</code></pre><p><code>SetUpInterrupts</code> will clear the interrupt flags, enable the specific interrupts we want, enable interrupts globally, set the STAT interrupt to fire on scanline zero, and do whatever busywork needs to happen to connect the interrupt vectors (those hardcoded ROM addresses for each interrupt) to the code that you&#39;re writing. The <a href="https://gbdev.io/pandocs/Interrupts.html">pandocs</a> will help you here. Likewise, <code>TearDownInterrupts</code> will disable the two interrupt handlers we&#39;re using to restore the regular flow of code.</p>
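+	<p>As a rough sketch, those two helpers might look like the following. This assumes hardware.inc&#39;s register names (<code>rIE</code>, <code>rIF</code>, <code>rSTAT</code>, <code>rLYC</code>) and writes the bit positions out by hand from the pandocs. The STAT interrupt is aimed at scanline zero by setting the LY-compare register <code>LYC</code> to 0 and enabling the &quot;LYC == LY&quot; source, which is bit 6 of <code>STAT</code>. Clearing <code>IF</code> after those writes is a precaution: any interrupt request raised while reconfiguring gets discarded.</p>
+	<pre><code>SetUpInterrupts:
+	di                   ; no interrupts while we rewire things
+	xor a
+	ldh [rLYC], a        ; compare against scanline 0
+	ld a, %01000000      ; STAT bit 6: interrupt when LY == LYC
+	ldh [rSTAT], a
+	xor a
+	ldh [rIF], a         ; drop any interrupt requests already pending
+	ldh a, [rIE]
+	or %00000011         ; IE bit 0: v-blank, IE bit 1: STAT
+	ldh [rIE], a
+	ei                   ; enable interrupts globally
+	ret
+
+TearDownInterrupts:
+	ldh a, [rIE]
+	and %11111100        ; stop listening for v-blank and STAT
+	ldh [rIE], a
+	ret
+	</code></pre>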
+	<p>Then you&#39;ll have the two interrupt handlers, one for when we&#39;re entering v-blank and one for when we&#39;re exiting, which get hooked up by <code>SetUpInterrupts</code>. After that <code>halt</code> instruction, the processor is going to wait until an interrupt happens. What should the interrupt handlers look like?</p>
+	<p>When the device enters the v-blank period, nothing needs to happen, because the interrupt firing at all will wake the processor up. So it just needs to enable interrupts again (the processor turns off interrupts globally when it starts handling an interrupt) and return the processor to its previous work.</p>
+	<pre><code>VBlankInterrupt:
+	ei  ; enable interrupts
+	ret ; return to wherever the processor was before the interrupt fired
+</code></pre><p>When it reaches scanline zero, the <a href="https://gbdev.io/pandocs/Interrupt_Sources.html#int-48--stat-interrupt">configurable STAT interrupt</a> (set up in <code>SetUpInterrupts</code>) fires. It should enable interrupts, then execute a <code>halt</code> instruction to put the CPU to sleep.</p>
+	<pre><code>STATInterrupt:
+	ei   ; enable interrupts
+	halt ; sleep processor until next interrupt
+	ret  ; return to wherever the processor was before the interrupt fired
+	</code></pre><p>So the execution goes like this:</p>
+	<ol>
+		<li>Our <code>MenuSetup</code> routine runs the first bit, doing normal synchronous code: setting up variables and anything else that doesn&#39;t need to touch VRAM.</li>
+		<li>It calls <code>SetUpInterrupts</code>, which does the busywork of setting and enabling the v-blank interrupt and STAT interrupt.</li>
+		<li>It then halts, which puts the processor to sleep until...</li>
+		<li>The v-blank interrupt we set up in step 2 fires, waking up the processor. It immediately returns...</li>
+		<li>And starts executing the code in <code>MenuSetup</code> that touches VRAM. That code runs for a bit until...</li>
+		<li>When the game boy starts drawing scanline zero of the screen, the STAT interrupt we also set up in step 2 fires, which executes a <code>halt</code> to put the processor to sleep, until...</li>
+		<li>The v-blank interrupt fires again, waking up the processor. It immediately returns...</li>
+		<li>Continuing execution of the <code>MenuSetup</code> code from where we left off in step 5, until...</li>
+		<li>The STAT interrupt fires, putting the processor to sleep until...</li>
+		<li>The v-blank interrupt fires again, waking up the processor. It returns...</li>
+		<li>Continuing execution of the <code>MenuSetup</code> code from where it left off in step 8, until...</li>
+		<li><em>And so on!</em></li>
+	</ol>
+	<p>So what did all this (a few helper functions and two interrupts) net you? Well, now you don&#39;t need to have special <code>CopyRangeSafely</code> functions, and it&#39;ll run much faster without the overhead of checking all the time whether it&#39;s safe. I think we can feel pretty good about that! </p>
+	<p>But most of all, you&#39;ve learned a bit about the idea of using interrupts to enter and exit a specific &quot;safe&quot; period in the game loop. We&#39;re using the v-blank interrupt to enable our code to run, and the STAT interrupt to take control away and stop it again, so that the code that needs to run exclusively in v-blank can look the same as code that can run whenever, without changes!</p>
+	<h2 id="a-second-attempt">a second attempt</h2>
+	<p>But that first attempt doesn&#39;t solve the problem of interleaving other code with the <code>CopyRange</code> operation. Your program will now sleep whenever it&#39;s not able to copy data. But you wonder: is it possible to use that time, when the program is sleeping, to run something else at the same time? To use the interrupts to switch between two simultaneous &quot;threads&quot; of code being executed?</p>
+	<p>Well, what&#39;s the state of the processor at any given moment? Ignore the RAM for now, which should be shared between threads.</p>
+	<ol>
+		<li>There are the registers it uses to pass information around: <code>af</code>, <code>bc</code>, <code>de</code>, and <code>hl</code>.</li>
+		<li>There&#39;s the program counter <code>pc</code>, which indicates the specific line being executed. </li>
+		<li>And there&#39;s the stack pointer <code>sp</code>, which holds the address of the end of the stack: the region of memory used to store pushed data and return locations.</li>
+	</ol>
+	<p>Could we somehow... keep two copies of all of those? You could certainly define dedicated memory locations to store all the registers.</p>
+	<p>The issue is the stack and the program counter. The stack is used for holding some data (<code>push</code> and <code>pop</code> will put register pairs on and take them off) and for tracking the call stack - used to remember where to resume when you <code>ret</code>urn from a subroutine, or from an interrupt.</p>
+	<p>In my experiments (particular to my code) the call stack only got four or five calls deep, and I wasn&#39;t ever putting much data on it. So it&#39;d be easy enough to allocate space for a second call stack, and then freely set the stack pointer <code>sp</code> to whatever you want.</p>
+	<p>The program counter isn&#39;t generally manipulated directly, except by <code>jp</code> (jump), <code>call</code>, and <code>ret</code> instructions. Pretend we can just read from and write to it.</p>
+	<p>Maybe you could do as before, and write a v-blank handler to swap all of that context out, and a STAT handler to swap back... That might work! First, some starting assumptions:</p>
+	<p>Your goal is to be able to say &quot;hey, processor, run this other subroutine whenever it&#39;s safe to do so&quot;, and then the processor will handle scheduling its execution while you can go on and continue doing other stuff. You&#39;ll have to write a function to set up the interrupts to execute our asynchronous code. The point of this is to be able to write normal-looking code, so we&#39;ll make a new function <code>RunInVBlank</code> that will execute a specified subroutine (passed in <code>hl</code>) in the &quot;safe&quot; part of each frame. </p>
+	<p>So your new <code>MenuSetup</code> subroutine would break up into a part that runs immediately:</p>
+	<pre><code>MenuSetup:
+	; do whatever synchronous stuff we want to do in the setup
+	; like initializing variables for this screen.
+	ld a, 0
+	ld [vCurrentSelection], a ; example!
+
+	ld hl, MenuSetupVRAMPart ; pass the subroutine we want as an argument
+	call RunInVBlank
+	ret
+	</code></pre><p>And a second part, which gets scheduled to only be running when VRAM is safe to access:</p>
+	<pre><code><span class="hljs-symbol">MenuSetupVRAMPart:</span>
+		<span class="hljs-keyword">ld</span> hl, Menu.UITileMap <span class="hljs-comment">; tile map location in rom</span>
+		<span class="hljs-keyword">ld</span> de, _SCRN0         <span class="hljs-comment">; draw it starting at 0,0 on the screen</span>
+		<span class="hljs-keyword">ld</span> b, <span class="hljs-number">18</span>              <span class="hljs-comment">; height (in tiles) </span>
+		<span class="hljs-keyword">ld</span> c, <span class="hljs-number">20</span>              <span class="hljs-comment">; width (in tiles) (takes up the full screen)</span>
+		<span class="hljs-keyword">call</span> CopyTilesToMap
+
+		<span class="hljs-comment">; load the data for all the tiles used for drawing the screen</span>
+		<span class="hljs-keyword">ld</span> hl, Menu.UITileData                      <span class="hljs-comment">; source</span>
+		<span class="hljs-keyword">ld</span> de, _VRAM + $<span class="hljs-number">1000</span>                        <span class="hljs-comment">; destination</span>
+		<span class="hljs-keyword">ld</span> bc, Menu.UITileDataEnd - Menu.UITileData <span class="hljs-comment">; length of data</span>
+		<span class="hljs-keyword">call</span> CopyRange
+
+		<span class="hljs-keyword">ret</span>
+	</code></pre><p>Then the normal flow of execution is that <code>MenuSetup</code> does its stuff, updates variables, calls <code>RunInVBlank</code> to schedule its subroutine for execution in v-blank, and then returns to do whatever else the main game loop wants done. When the v-blank period arrives, an interrupt will fire and switch contexts to execute a bit of the <code>MenuSetupVRAMPart</code>. When the v-blank period ends, another interrupt fires and context switches back to the main game loop, and things continue in this way, switching back and forth between the &quot;main thread&quot; and the execution of <code>MenuSetupVRAMPart</code>.</p>
+	<p>Now it&#39;s a matter of figuring out what that mysterious <code>RunInVBlank</code> subroutine will do. First off, you need to keep a separate copy of our registers. Define some static memory addresses wherever you do that: <code>vAsyncAF</code>, <code>vAsyncBC</code>, <code>vAsyncDE</code>, <code>vAsyncHL</code>, <code>vAsyncSP</code>, <code>vAsyncPC</code>. Next up, the stack: by default, <code>sp</code> starts at <code>$FFFE</code> and the stack grows down from there. If your async stack starts at <code>$FFBF</code>, that will leave roughly 64 bytes of the special HRAM memory region (<code>$FF80</code>-<code>$FFFE</code>) for each stack. (Note: If we wanted to, we could configure our stacks to be anywhere in RAM, which would enable them to grow much bigger. I opted not to do this, because I&#39;m a silly goose.)</p>
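+	<p>In RGBDS syntax, reserving those addresses might look like the sketch below. The <code>vMain</code> slots hold the main thread&#39;s copy of the same state, which the v-blank handler below uses. The section name is arbitrary, and <code>ASYNC_STACK_START</code> is a hypothetical name for the async stack&#39;s starting address. Each slot is two bytes, stored low byte first like every 16-bit value on this CPU.</p>
+	<pre><code>DEF ASYNC_STACK_START EQU $FFBF ; hypothetical name; async stack grows down from here
+
+SECTION "Thread contexts", WRAM0
+vAsyncAF: ds 2
+vAsyncBC: ds 2
+vAsyncDE: ds 2
+vAsyncHL: ds 2
+vAsyncSP: ds 2
+vAsyncPC: ds 2
+vMainAF:  ds 2
+vMainBC:  ds 2
+vMainDE:  ds 2
+vMainHL:  ds 2
+vMainSP:  ds 2
+vMainPC:  ds 2
+	</code></pre>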
+	<p><code>RunInVBlank</code> needs to set up that parallel execution environment (give all those registers starting values), and then enable the handler for entering v-blank. (Note: I&#39;m also going to take some liberties and pretend there are a few extra instructions the game boy doesn&#39;t actually support, like using <code>ld</code> to put two-byte values into memory addresses. Rewriting this to use the available asm instructions is a pain but it&#39;s doable.)</p>
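+	<p>To give a sense of what that rewriting involves, here&#39;s a sketch of how a few of the pseudo-instructions below might be expressed with real sm83 instructions. Each 16-bit value gets stored as two bytes, low byte first. These sequences clobber <code>a</code> (and some clobber <code>hl</code>), so a real <code>RunInVBlank</code> would need to save things in a careful order. Conveniently, <code>ld [vMainSP], sp</code> turns out to be a genuine instruction; loading <code>sp</code> back from memory is not.</p>
+	<pre><code>; ld [vAsyncHL], hl  becomes two byte stores
+	ld a, l
+	ld [vAsyncHL], a
+	ld a, h
+	ld [vAsyncHL + 1], a
+
+; ld [vAsyncSP], $FFBF  becomes two byte stores of a constant
+	ld a, LOW($FFBF)
+	ld [vAsyncSP], a
+	ld a, HIGH($FFBF)
+	ld [vAsyncSP + 1], a
+
+; ld [vAsyncAF], af  has to route af through the stack into another pair
+	push af
+	pop hl               ; h = a, l = flags; then store hl as above
+
+; ld [vMainSP], sp  is real (LD [n16], SP), so no rewrite is needed
+	ld [vMainSP], sp
+
+; ld sp, [vAsyncSP]  is not, so go through hl and LD SP, HL
+	ld a, [vAsyncSP]
+	ld l, a
+	ld a, [vAsyncSP + 1]
+	ld h, a
+	ld sp, hl
+	</code></pre>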
+	<pre><code><span class="hljs-symbol">RunInVBlank:</span>
+	<span class="hljs-comment">; store starting values for the registers</span>
+	<span class="hljs-keyword">ld</span> [vAsyncAF], af
+	<span class="hljs-keyword">ld</span> [vAsyncBC], bc
+	<span class="hljs-keyword">ld</span> [vAsyncDE], de
+	<span class="hljs-keyword">ld</span> [vAsyncHL], hl
+
+	<span class="hljs-comment">; store starting value for the stack pointer</span>
+	<span class="hljs-keyword">ld</span> [vAsyncSP], $FFBF
+
+	<span class="hljs-comment">; store starting value for the program counter, passed as arg in hl</span>
+	<span class="hljs-keyword">ld</span> [vAsyncPC], hl
+
+	<span class="hljs-comment">; enable v-blank interrupt</span>
+	<span class="hljs-keyword">ld</span> hl, rIE            <span class="hljs-comment">; target the interrupt enable flag</span>
+	<span class="hljs-keyword">set</span> B_IE_VBLANK, [hl] <span class="hljs-comment">; set the bit to enable the v-blank interrupt</span>
+	ei                    <span class="hljs-comment">; enable interrupts globally</span>
+
+		<span class="hljs-keyword">ret</span>
+	</code></pre><p>So you store the starting values for the registers, you set the starting value for the stack pointer, and you store the program counter we want to start from. Ezpz! Then what&#39;s the v-blank handler look like? It&#39;s gotta stash all the context info from the main thread, and unstash all the context info for the async thread.</p>
+	<pre><code><span class="hljs-symbol">VBlankInterrupt:</span>
+	<span class="hljs-comment">; store current values of the registers</span>
+	<span class="hljs-keyword">ld</span> [vMainAF], af <span class="hljs-comment">; stash af registers</span>
+	<span class="hljs-keyword">ld</span> [vMainBC], bc 
+	<span class="hljs-keyword">ld</span> [vMainDE], de
+	<span class="hljs-keyword">ld</span> [vMainHL], hl
+
+	<span class="hljs-comment">; store current value of the stack pointer</span>
+	<span class="hljs-keyword">ld</span> [vMainSP], sp
+
+	<span class="hljs-comment">; store the current program counter</span>
+	<span class="hljs-keyword">ld</span> [vMainPC], pc <span class="hljs-comment">; hmm....</span>
+
+	<span class="hljs-comment">; get last values of the async registers</span>
+	<span class="hljs-keyword">ld</span> af, [vAsyncAF]
+	<span class="hljs-keyword">ld</span> bc, [vAsyncBC]
+	<span class="hljs-keyword">ld</span> de, [vAsyncDE]
+	<span class="hljs-keyword">ld</span> hl, [vAsyncHL]
+
+	<span class="hljs-comment">; get last value of the stack pointer</span>
+	<span class="hljs-keyword">ld</span> sp, [vAsyncSP]
+
+	<span class="hljs-comment">; get last program counter</span>
+	<span class="hljs-keyword">ld</span> pc, [vAsyncPC] <span class="hljs-comment">; hmm...</span>
+
+	<span class="hljs-keyword">ret</span>
+	</code></pre><p>And then you can write a <code>STATInterrupt</code> that should do the inverse, storing the async registers and fetching the main registers. These are context-switching interrupts! When the interrupt fires to signal the game boy is in the &quot;safe&quot; period, it switches context from main to async, and when the interrupt fires to signal we&#39;re out of the safe period, it switches context back.</p>
+	<p>But there&#39;s a big problem: we&#39;ve been very cavalier with the program counter. On the first line I&#39;ve commented <code>hmm</code>, we read from the program counter to save the state of the main thread. But if <code>VBlankInterrupt</code> reads the current address of execution, it won&#39;t get the place where the main thread should resume; it&#39;ll get an address inside <code>VBlankInterrupt</code> itself! Ditto for the second <code>hmm</code> line: writing to the program counter is effectively a jump, so execution would leap straight into the async thread and the <code>ret</code> after it would never run. When you want to interact with the program counter, you really need to use <code>jp</code> or <code>call</code> or <code>ret</code> instructions.</p>
+	<p>One more try.</p>
+	<h2 id="a-third-attempt">a third attempt</h2>
+	<p>The problems with <code>pc</code> are big. The approach above falls apart completely and is a huge pain to implement. Fret not, though, for we are valiant. The issue is in getting and storing information about where the processor is currently executing: you can&#39;t just read and write <code>pc</code> willy-nilly. But how does the processor handle that information? Well, it puts it on the stack! It&#39;s time to talk about the call stack, how it interacts with interrupts, and you.</p>
+	<p>The call stack works like this: the stack pointer <code>sp</code> always contains a memory address. It&#39;s initialized to <code>$FFFE</code>, the second-to-last memory address, at processor start-up. Whenever a <code>push</code> instruction is executed (<code>push hl</code>), the stack pointer <code>sp</code> is decreased by two, and the register pair is copied into the new location <code>sp</code> points to (like <code>ld [sp], hl</code>). When a <code>pop</code> instruction is executed (<code>pop hl</code>), the two bytes at <code>sp</code> are copied into the register pair (like <code>ld hl, [sp]</code>), and <code>sp</code> is <em>increased</em> by two. Similarly, the <code>call Subroutine</code> instruction effectively pushes the address of the next instruction to execute (after <code>call Subroutine</code>) onto the stack, and jumps to <code>Subroutine</code>; <code>ret</code> likewise pops an address off the stack and jumps to it.</p>
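+	<p>To make that concrete, here&#39;s a short sequence annotated with what each instruction does to <code>sp</code> and to memory. <code>push</code> only accepts register pairs, so each value is loaded into one first. Each push writes the high byte at the higher address and the low byte just below it, and each <code>pop</code> hands back the most recently pushed value.</p>
+	<pre><code>; a sketch, starting from where the boot ROM leaves the stack pointer
+	ld sp, $FFFE
+	ld hl, $BEEF
+	push hl          ; sp = $FFFC; [$FFFD] = $BE, [$FFFC] = $EF
+	ld de, $B0DE
+	push de          ; sp = $FFFA; [$FFFB] = $B0, [$FFFA] = $DE
+	pop bc           ; bc = $B0DE, sp = $FFFC
+	pop bc           ; bc = $BEEF, sp = $FFFE
+	</code></pre>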
+	<p>Tragically, talking about the &quot;top&quot; and &quot;bottom&quot; of the stack, which is normally quite a sensible metaphor for a stack (you can only interact with the top of the stack, and change what the stack holds by putting things on or taking them off), is now hopelessly confusing due to the stack growing backwards, and thus confusion about whether we talk about the &quot;top&quot; as the end or the beginning of the region of memory, which has opposite sense from the end or beginning of the values placed on the stack, and before you know it you&#39;re going <code>@_@</code> and are totally lost. </p>
+	<p>I&#39;m going to adopt my own convention. When I talk about the stack, I&#39;ll try to refer to the &quot;earliest&quot; and &quot;latest&quot; values: the &quot;earliest&quot; value on the stack (as an organization of information) is the first value that was pushed there chronologically. The &quot;latest&quot; is the last value that was pushed there. If you pop data off the stack, you&#39;re getting the latest value, and the stack shortens; if you push data on the stack you&#39;re changing the latest and the stack grows. If you pushed <code>$BEEF</code>, then <code>$B0DE</code>, then <code>$1337</code> (in real code via register pairs, since <code>push</code> doesn&#39;t take immediate values), the stack would look like this, listed from earliest to latest:</p>
+	<table>
+		<thead>
+			<tr>
+				<th>stack</th>
+			</tr>
+		</thead>
+		<tbody>
+			<tr>
+				<td>[ whatever was on the stack previously ]</td>
+			</tr>
+			<tr>
+				<td><code>$BEEF</code></td>
+			</tr>
+			<tr>
+				<td><code>$B0DE</code></td>
+			</tr>
+			<tr>
+				<td><code>$1337</code></td>
+			</tr>
+			<tr>
+				<td>{ <strong>stack ends here</strong> }</td>
+			</tr>
+		</tbody>
+	</table>
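+	<p>Here are those three pushes replayed in the same kind of toy model. Again, this is a sketch: <code>push</code> here is a made-up helper, and memory is a plain dictionary. Walking upward from <code>sp</code> visits the <em>latest</em> value first, because it sits at the lowest address. That&#39;s exactly why &quot;top&quot; and &quot;bottom&quot; get so slippery.</p>
```python
def push(mem, sp, value):
    """Model of SM83 `push`: move sp down two bytes, store value little-endian."""
    sp -= 2
    mem[sp] = value % 0x100        # low byte at the lower address
    mem[sp + 1] = value // 0x100   # high byte just above it
    return sp


mem = {}
sp = 0xFFFE                        # the start-up value
for value in (0xBEEF, 0xB0DE, 0x1337):
    sp = push(mem, sp, value)

# Walk from sp up to $FFFE: latest value first, earliest value last.
addr = sp
while addr != 0xFFFE:
    print(f"${addr:04X}: ${mem[addr + 1] * 0x100 + mem[addr]:04X}")
    addr += 2
# $FFF8: $1337
# $FFFA: $B0DE
# $FFFC: $BEEF
```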
+	<p>In this diagram I&#39;ve written in [ brackets ] to suggest some amount of data <em>previously</em> pushed onto the stack. In { <strong>curly braces</strong> } is a placeholder marking the end of the stack: <code>sp</code> points at the latest value, and the next push will land just past it. Most of the time, though, the stack is used as a call stack. When you execute a <code>call Subroutine</code> instruction, the address of the next line gets pushed onto the stack, and the processor jumps to <code>Subroutine</code>. When you execute a <code>ret</code> instruction, that address gets popped off the stack, and the processor jumps to it. So the stack stores a record of all the locations in memory it should return to!</p>
+	<p>So you have a stack pointer <code>sp</code> representing the end of a stack in memory. It holds data you put there, as well as the &quot;call stack&quot; featuring the locations successive <code>ret</code> instructions should return to. How does it interact with interrupts, though? Well, when an interrupt is handled, the processor effectively executes a <code>call InterruptHandler</code> instruction: it pushes the next address to execute onto the stack and jumps to <code>InterruptHandler</code>. It also disables interrupts while doing so, which is why the handlers below re-enable them with <code>ei</code> before returning. Then, when that code does a <code>ret</code>, it will resume computation from where we were before the interrupt.</p>
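+	<p>Here&#39;s that dispatch-then-return dance as a toy Python model. It&#39;s a sketch: the <code>CPU</code> class, its method names, and the starting <code>pc</code> of <code>$0150</code> are made up, and it skips the <code>IF</code>/<code>IE</code> register bookkeeping. The v-blank interrupt&#39;s entry point (its vector) is address <code>$0040</code>.</p>
```python
class CPU:
    """Toy SM83 model: just pc, sp, IME, and memory for the stack."""

    def __init__(self):
        self.mem = bytearray(0x10000)
        self.pc = 0x0150   # somewhere in the main program (made up)
        self.sp = 0xFFFE
        self.ime = True    # interrupt master enable

    def push(self, value):
        self.sp -= 2
        self.mem[self.sp + 1], self.mem[self.sp] = divmod(value, 0x100)

    def pop(self):
        value = self.mem[self.sp + 1] * 0x100 + self.mem[self.sp]
        self.sp += 2
        return value

    def dispatch_interrupt(self, vector):
        # What the hardware does: like `call vector`, plus IME is cleared
        # so the handler isn't itself interrupted.
        self.ime = False
        self.push(self.pc)
        self.pc = vector

    def ei_ret(self):
        # The handler's `ei` followed by `ret`: pop the saved pc and resume.
        self.ime = True
        self.pc = self.pop()


cpu = CPU()
cpu.dispatch_interrupt(0x0040)   # v-blank vector
print(hex(cpu.pc), cpu.ime)      # 0x40 False
cpu.ei_ret()
print(hex(cpu.pc), hex(cpu.sp))  # 0x150 0xfffe
```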
+	<p>Here&#39;s a theoretical interrupt we might write, and a marked line to pay attention to:</p>
+	<pre><code><span class="hljs-symbol">VBlankInterrupt:</span>
+	<span class="hljs-keyword">nop </span><span class="hljs-comment">; do nothing</span>
+	<span class="hljs-comment">;;;;;; What's the stack look like here?</span>
+	<span class="hljs-keyword">ei </span><span class="hljs-comment">; enable interrupts</span>
+	ret
+	</code></pre><p>At the marked line, the stack has the following stuff on it, from earliest to latest:</p>
+	<table>
+		<thead>
+			<tr>
+				<th>call stack</th>
+			</tr>
+		</thead>
+		<tbody>
+			<tr>
+				<td>[ ... ]</td>
+			</tr>
+			<tr>
+				<td>[ various stuff from before the interrupt fires ]</td>
+			</tr>
+			<tr>
+				<td>[ more of that stuff ... ]</td>
+			</tr>
+			<tr>
+				<td>the address of the instruction that would have run next, placed on the stack by the CPU when reacting to the interrupt.</td>
+			</tr>
+			<tr>
+				<td>{ <strong>stack ends here</strong> }</td>
+			</tr>
+		</tbody>
+	</table>
+	<p>Here I&#39;ve written in [ brackets ] some placeholder data, which could be <code>call</code> stack data, or could be data that was <code>push</code>ed onto the stack previously. But the return address (the next instruction the interrupted code would have executed) got placed on the stack by the processor itself when it dispatched the interrupt.</p>
+	<p>When the <code>ret</code> is executed in the <code>VBlankInterrupt</code>, it pops the latest value off the stack and jumps execution to that address. But if, perchance, the latest value on the stack were different from the one there when this interrupt started, it would jump to a totally new spot...</p>
+	<p>Bear with me now: suppose you have two stacks. The &quot;main thread stack&quot; is currently in use, and, elsewhere in memory, there is an &quot;async thread stack&quot; whose latest value is the address where the async thread should resume. They look like this before the interrupt fires, from earliest to latest:</p>
+	<table>
+		<thead>
+			<tr>
+				<th>main thread stack</th>
+				<th>async thread stack</th>
+			</tr>
+		</thead>
+		<tbody>
+			<tr>
+				<td>[ various data ... ]</td>
+				<td>[ various data ... ]</td>
+			</tr>
+			<tr>
+				<td>{ <strong>stack ends here</strong> }</td>
+				<td>async thread program counter</td>
+			</tr>
+		</tbody>
+	</table>
+	<p>When the interrupt fires, it pushes the main thread&#39;s program counter onto the stack:</p>
+	<table>
+		<thead>
+			<tr>
+				<th>main thread stack</th>
+				<th>async thread stack</th>
+			</tr>
+		</thead>
+		<tbody>
+			<tr>
+				<td>[ various data ... ]</td>
+				<td>[ various data ... ]</td>
+			</tr>
+			<tr>
+				<td>main thread program counter</td>
+				<td>async thread program counter</td>
+			</tr>
+			<tr>
+				<td>{ <strong>stack ends here</strong> }</td>
+				<td></td>
+			</tr>
+		</tbody>
+	</table>
+	<p>What if the interrupt now swapped our stack pointer from the main stack to the async stack?</p>
+	<table>
+		<thead>
+			<tr>
+				<th>main thread stack</th>
+				<th>async thread stack</th>
+			</tr>
+		</thead>
+		<tbody>
+			<tr>
+				<td>[ various data ... ]</td>
+				<td>[ various data ... ]</td>
+			</tr>
+			<tr>
+				<td>main thread program counter</td>
+				<td>async thread program counter</td>
+			</tr>
+			<tr>
+				<td></td>
+				<td>{ <strong>stack ends here</strong> }</td>
+			</tr>
+		</tbody>
+	</table>
+	<p>Then at the end, it would <code>ret</code> and resume execution in the async thread.</p>
+	<table>
+		<thead>
+			<tr>
+				<th>main thread stack</th>
+				<th>async thread stack</th>
+			</tr>
+		</thead>
+		<tbody>
+			<tr>
+				<td>[ various data ... ]</td>
+				<td>[ various data ... ]</td>
+			</tr>
+			<tr>
+				<td>main thread program counter</td>
+				<td>{ <strong>stack ends here</strong> }</td>
+			</tr>
+		</tbody>
+	</table>
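+	<p>The four tables above can be replayed in a toy Python model. It&#39;s a sketch: the <code>push</code>/<code>pop</code> helpers are made up, and the async stack&#39;s location at <code>$DFFE</code> and both program counters are invented for illustration. The only thing the handler does is swap which stack <code>sp</code> refers to; <code>ret</code> does the rest.</p>
```python
mem = bytearray(0x10000)


def push(sp, value):
    """SM83 `push`: move sp down two bytes, then store value little-endian."""
    sp -= 2
    mem[sp + 1], mem[sp] = divmod(value, 0x100)
    return sp


def pop(sp):
    """SM83 `pop`: read the value at sp, then move sp up two bytes."""
    return mem[sp + 1] * 0x100 + mem[sp], sp + 2


# Made-up addresses: where each thread is in its code.
MAIN_PC, ASYNC_PC = 0x0200, 0x0400
async_sp = push(0xDFFE, ASYNC_PC)   # the async stack already ends with its pc

# The interrupt fires while the main thread runs: the CPU pushes main's pc.
sp = push(0xFFFE, MAIN_PC)
# The handler swaps stacks: remember where main's stack ends, load async's.
saved_main_sp, sp = sp, async_sp
# `ret` pops the latest value off the *current* stack and jumps there.
pc, sp = pop(sp)

print(hex(pc))                     # 0x400: now running the async thread
print(hex(pop(saved_main_sp)[0]))  # 0x200: main's pc waits on its own stack
```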
+	<p>That&#39;s very simple! All you need in your handler to achieve this is the following:</p>
+	<pre><code><span class="hljs-symbol">VBlankInterrupt:</span>
+	<span class="hljs-comment">; save main thread stack pointer</span>
+	ld [vMainSP], <span class="hljs-built_in">sp</span> 
+
+	<span class="hljs-comment">; load side thread stack pointer</span>
+	ld <span class="hljs-built_in">sp</span>, [vAsyncSP]
+
+	<span class="hljs-keyword">ei</span>
+	ret
+	</code></pre><p>and then a matching STAT interrupt handler to switch back to the main thread:</p>
+	<pre><code><span class="hljs-symbol">STATInterrupt:</span>
+	<span class="hljs-comment">; save async thread stack pointer</span>
+	ld [vAsyncSP], <span class="hljs-built_in">sp</span>
+
+	<span class="hljs-comment">; load main thread stack pointer</span>
+	ld <span class="hljs-built_in">sp</span>, [vMainSP]
+
+	<span class="hljs-keyword">ei</span>
+	ret
+	</code></pre><p>This switches the stack context beautifully and avoids having to do any difficult manipulation of the program counter <code>pc</code> - it&#39;s all handled by the call stacks! </p>
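+	<p>Here&#39;s a quick round-trip check of a v-blank/STAT handler pair, as a toy Python model. It&#39;s a sketch: <code>vMainSP</code> and <code>vAsyncSP</code> become dictionary entries, and every address is made up. Each handler saves the stack pointer of the thread being paused and loads the one being resumed, so each <code>ret</code> lands back where that thread left off.</p>
```python
mem = bytearray(0x10000)
ram = {}   # stands in for the vMainSP / vAsyncSP variables


def push(sp, value):
    sp -= 2
    mem[sp + 1], mem[sp] = divmod(value, 0x100)
    return sp


def pop(sp):
    return mem[sp + 1] * 0x100 + mem[sp], sp + 2


def interrupt(sp, pc, handler):
    """CPU pushes pc, the handler runs, then its `ret` pops the new pc."""
    sp = handler(push(sp, pc))
    return pop(sp)                     # (pc we land at, sp afterwards)


def vblank_handler(sp):
    ram["vMainSP"] = sp                # ld [vMainSP], sp
    return ram["vAsyncSP"]             # ld sp, [vAsyncSP]


def stat_handler(sp):
    ram["vAsyncSP"] = sp               # ld [vAsyncSP], sp
    return ram["vMainSP"]              # ld sp, [vMainSP]


ram["vAsyncSP"] = push(0xDFFE, 0x0400)  # the async thread will start at $0400

pc, sp = interrupt(0xFFFE, 0x0200, vblank_handler)  # main was at $0200
print(hex(pc))  # 0x400: in the async thread
pc, sp = interrupt(sp, 0x0410, stat_handler)        # async got to $0410
print(hex(pc))  # 0x200: back in main, right where it left off
pc, sp = interrupt(sp, 0x0234, vblank_handler)      # main moved on to $0234
print(hex(pc))  # 0x410: the async thread picks up where it paused
```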
+	<p>But now you aren&#39;t saving the registers: whatever values the async thread leaves in <code>a</code>, <code>hl</code>, and friends are what the main thread sees when it resumes, and vice versa. In the last attempt, you had to write <code>ld [vAsyncAF], af</code> and the like, and I mentioned that those instructions don&#39;t actually exist and brushed over them. You can do it, but it&#39;s slow and ugly. But! It turns out the stack can help you here as well! Just push all the registers onto the stack before switching, and then pop them off after. ggez!</p>
+	<p>Here&#39;s the new approach: the interrupt handler to switch contexts should do the following sequence:</p>
+	<ol>
+		<li>push all the registers onto the stack </li>
+		<li>save the stack pointer for the old context</li>
+		<li>fetch the stack pointer for the new context </li>
+		<li>and then pop all the registers off the stack.</li>
+	</ol>
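+	<p>Those four steps, sketched as a toy Python model. The helper names and every register value are made up. <code>pc</code> is left out because the interrupt&#39;s own push and <code>ret</code> already handle it. Since registers come off in the reverse of the order they went on, the pops run <code>hl</code>, <code>de</code>, <code>bc</code>, <code>af</code>.</p>
```python
mem = bytearray(0x10000)
ram = {}                              # stands in for vMainSP / vAsyncSP
ORDER = ("af", "bc", "de", "hl")      # push order; pops run in reverse


def push(sp, value):
    sp -= 2
    mem[sp + 1], mem[sp] = divmod(value, 0x100)
    return sp


def pop(sp):
    return mem[sp + 1] * 0x100 + mem[sp], sp + 2


def switch_context(regs, sp, save_to, load_from):
    for name in ORDER:                # 1. push all the registers
        sp = push(sp, regs[name])
    ram[save_to] = sp                 # 2. save the old context's sp
    sp = ram[load_from]               # 3. fetch the new context's sp
    for name in reversed(ORDER):      # 4. pop hl, de, bc, af
        regs[name], sp = pop(sp)
    return sp


# An async stack holding made-up register values, pushed af first.
# (The low nibble of f is always zero on hardware, so af values keep it zero.)
async_sp = 0xDFFE
for value in (0x1100, 0x2222, 0x3333, 0x4444):
    async_sp = push(async_sp, value)
ram["vAsyncSP"] = async_sp

regs = {"af": 0xAA00, "bc": 0xBBBB, "de": 0xCCCC, "hl": 0xDDDD}
sp = switch_context(regs, 0xFFFE, "vMainSP", "vAsyncSP")   # into async
print({k: hex(v) for k, v in regs.items()})
# {'af': '0x1100', 'bc': '0x2222', 'de': '0x3333', 'hl': '0x4444'}
sp = switch_context(regs, sp, "vAsyncSP", "vMainSP")       # back to main
print({k: hex(v) for k, v in regs.items()})
# {'af': '0xaa00', 'bc': '0xbbbb', 'de': '0xcccc', 'hl': '0xdddd'}
```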
+	<p>So here&#39;s what our &quot;enter async thread&quot; interrupt handler looks like now:</p>
+	<pre><code><span class="hljs-symbol">VBlankInterrupt</span>: 
+	<span class="hljs-keyword">push </span>af
+	<span class="hljs-keyword">push </span><span class="hljs-keyword">bc</span>
+	<span class="hljs-keyword">push </span>de
+	<span class="hljs-keyword">push </span>hl 
+
+	<span class="hljs-comment">; save main thread stack pointer</span>
+	ld [vMainSP], <span class="hljs-built_in">sp</span> 
+
+	<span class="hljs-comment">; load async thread stack pointer</span>
+	ld <span class="hljs-built_in">sp</span>, [vAsyncSP]
+
+	<span class="hljs-keyword">pop </span>hl 
+	<span class="hljs-keyword">pop </span>de 
+	<span class="hljs-keyword">pop </span><span class="hljs-keyword">bc</span>
+	<span class="hljs-keyword">pop </span>af 
+
+	ei
+	ret
+	</code></pre><p>And then a matching interrupt handler to fire on the STAT interrupt when we hit scanline zero:</p>
+	<pre><code><span class="hljs-symbol">STATInterrupt</span>: 
+	<span class="hljs-keyword">push </span>af
+	<span class="hljs-keyword">push </span><span class="hljs-keyword">bc</span>
+	<span class="hljs-keyword">push </span>de
+	<span class="hljs-keyword">push </span>hl 
+
+	<span class="hljs-comment">; save async thread stack pointer</span>	:w
+
+	ld [vAsyncSP], <span class="hljs-built_in">sp</span> 
+
+	<span class="hljs-comment">; load main thread stack pointer</span>
+	ld <span class="hljs-built_in">sp</span>, [vMainSP]
+
+	<span class="hljs-keyword">pop </span>hl 
+	<span class="hljs-keyword">pop </span>de 
+	<span class="hljs-keyword">pop </span><span class="hljs-keyword">bc</span>
+	<span class="hljs-keyword">pop </span>af 
+
+	ei
+	ret
+	</code></pre><p>Pleasingly symmetric, no? This is quite close to the code I wrote in my project. There are two steps left: First, clean up one loose end, then, write a <code>RunInVBlank</code> subroutine to work with this stack-centric approach. Time for you to trim that loose end:</p>
+	<p>What happens when the subroutine in our thread returns the final time? Its last <code>ret</code> pops whatever two bytes happen to sit just past the end of the async stack and jumps there. This is dang ol&#39; game boy assembly, so there&#39;s no error handling but what you write yourself. The solution to this is very simple: we write a handler for when the subroutine returns, and put its address on the stack first! When the subroutine returns, it&#39;ll execute the &quot;early return&quot; handler, and that can clean up and turn off the interrupts itself.</p>
+	<p>This &quot;early return&quot; handler is pretty simple: it just needs to turn off the interrupts, and maybe have some places to put other bookkeeping we might add in the future.</p>
+	<pre><code><span class="hljs-symbol">EarlyReturn:</span>
+	<span class="hljs-keyword">di </span><span class="hljs-comment">; disable interrupts globally, because this would result in very strange </span>
+	   <span class="hljs-comment">; behavior otherwise if an interrupt somehow fired during it</span>
+
+	<span class="hljs-comment">; turn off the specific interrupts we've been using</span>
+	ld hl, rIE            <span class="hljs-comment">; target the hardware register controlling interrupts</span>
+	res <span class="hljs-keyword">B_IE_VBLANK, </span>[hl] <span class="hljs-comment">; reset the bit to turn off the v-blank interrupt</span>
+	res <span class="hljs-keyword">B_IE_STAT, </span>[hl]   <span class="hljs-comment">; reset the bit to turn off the STAT interrupt</span>
+
+	<span class="hljs-comment">; [do any other bookkeeping necessary here]</span>
+
+	ld <span class="hljs-built_in">sp</span>, [vMainSP] <span class="hljs-comment">; restore the main thread's stack</span>
+
+	<span class="hljs-comment">; get all the registers off the stack, because there's no longer</span>
+	<span class="hljs-comment">; going to be a STAT interrupt to restore them</span>
+	pop hl
+	pop de
+	pop <span class="hljs-keyword">bc</span>
+	pop af
+
+	<span class="hljs-keyword">ei </span><span class="hljs-comment">; re-enable interrupts globally at the end</span>
+
+	ret <span class="hljs-comment">; return execution to the main thread context</span>
+	</code></pre>
+	<p>Now, to incorporate it. You&#39;ll put this on your stack first when you&#39;re preparing your <code>RunInVBlank</code> function. Then the subroutine you want to run goes on the stack next, and then the registers. Define a couple of constants for the memory locations these live at. Let&#39;s write the final <code>RunInVBlank</code> function, fully using the stack and early return!</p>
+	<pre><code>def ASYNC_STACK_TOP = $FFBF <span class="hljs-comment">; the top of the stack will be at this address</span>
+def ASYNC_STACK_EARLY_RETURN = ASYNC_STACK_TOP - <span class="hljs-number">2</span> <span class="hljs-comment">; allocate two bytes to hold the early return handler address</span>
+def ASYNC_STACK_FUNCTION = ASYNC_STACK_EARLY_RETURN - <span class="hljs-number">2</span> <span class="hljs-comment">; two more bytes for where the async thread should resume from when it's called for the first time</span>
+<span class="hljs-symbol">RunInVBlank:</span> 
+	ld [vMainSP], <span class="hljs-built_in">sp</span> <span class="hljs-comment">; store the stack pointer so we can restore after using it </span>
+
+	<span class="hljs-comment">; make sure we've got the early return handle at the base of the stack</span>
+	ld [ASYNC_STACK_EARLY_RETURN], EarlyReturn
+
+	<span class="hljs-comment">; now we want to build our stack. the first thing on it will be the function</span>
+	<span class="hljs-comment">; we're running in the thread, so it can resume. so point the stack pointer </span>
+	<span class="hljs-comment">; at it</span>
+	ld <span class="hljs-built_in">sp</span>, ASYNC_STACK_FUNCTION 
+
+	push hl <span class="hljs-comment">; the argument to RunInVBlank is a subroutine address in hl.</span>
+			<span class="hljs-comment">; so it goes on the stack first, at the location we just set sp to</span>
+
+	push af <span class="hljs-comment">; then we put all the registers in the right order </span>
+	push <span class="hljs-keyword">bc </span><span class="hljs-comment">; so that when the program switches context into the async thread,</span>
+	push de <span class="hljs-comment">; it can get them out</span>
+	push hl
+
+	<span class="hljs-comment">; and now our async stack is set up! we just need to store it and </span>
+	<span class="hljs-comment">; restore the main thread stack</span>
+
+	ld [vAsyncSP], <span class="hljs-built_in">sp</span>
+	ld <span class="hljs-built_in">sp</span>, [vMainSP]
+
+	<span class="hljs-comment">; enable the interrupts</span>
+	ld hl, rIE            <span class="hljs-comment">; target the interrupt enable flag</span>
+	set <span class="hljs-keyword">B_IE_VBLANK, </span>[hl] <span class="hljs-comment">; set the bit to enable the v-blank interrupt</span>
+	set <span class="hljs-keyword">B_IE_STAT, </span>[hl]   <span class="hljs-comment">; set the bit to enable the STAT interrupt</span>
+	<span class="hljs-keyword">ei </span>                   <span class="hljs-comment">; enable interrupts globally</span>
+
+	ret
+	</code></pre>
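+	<p>Because <code>push</code> moves <code>sp</code> down <em>before</em> writing, it&#39;s easy to end up one slot off when building a stack by hand. This toy Python model replays the stack setup in <code>RunInVBlank</code> from a chosen starting <code>sp</code>. It then follows the stack the way the CPU would and reports where the thread starts and where its final <code>ret</code> goes. It&#39;s a sketch: the addresses of <code>EarlyReturn</code> and the subroutine are made up, the initial <code>af</code>/<code>bc</code>/<code>de</code> values are zeroes, and all other memory starts zeroed.</p>
```python
ASYNC_STACK_TOP = 0xFFBF
ASYNC_STACK_EARLY_RETURN = ASYNC_STACK_TOP - 2       # $FFBD
ASYNC_STACK_FUNCTION = ASYNC_STACK_EARLY_RETURN - 2  # $FFBB
EARLY_RETURN, SUBROUTINE = 0x0300, 0x0400            # made-up addresses


def push(mem, sp, value):
    sp -= 2                                          # decrement first...
    mem[sp + 1], mem[sp] = divmod(value, 0x100)      # ...then write
    return sp


def pop(mem, sp):
    return mem[sp + 1] * 0x100 + mem[sp], sp + 2


def trace_thread(start_sp):
    """Build the async stack like RunInVBlank, then follow it like the CPU.

    Returns (where the thread starts, where its final `ret` goes)."""
    mem = bytearray(0x10000)                         # fresh, all-zero memory
    # ld [ASYNC_STACK_EARLY_RETURN], EarlyReturn  (stored little-endian)
    mem[ASYNC_STACK_EARLY_RETURN + 1], mem[ASYNC_STACK_EARLY_RETURN] = divmod(EARLY_RETURN, 0x100)
    sp = push(mem, start_sp, SUBROUTINE)             # push hl (the subroutine)
    for value in (0, 0, 0, SUBROUTINE):              # push af, bc, de, hl
        sp = push(mem, sp, value)
    for _ in range(4):                               # VBlankInterrupt pops hl, de, bc, af
        _, sp = pop(mem, sp)
    entry, sp = pop(mem, sp)                         # its `ret` enters the subroutine
    exit_to, sp = pop(mem, sp)                       # the subroutine's final `ret`
    return entry, exit_to


print([hex(a) for a in trace_thread(ASYNC_STACK_EARLY_RETURN)])  # ['0x400', '0x300']
print([hex(a) for a in trace_thread(ASYNC_STACK_FUNCTION)])      # ['0x400', '0x0']
```
+	<p>Starting from <code>ASYNC_STACK_EARLY_RETURN</code>, the subroutine address lands exactly at <code>ASYNC_STACK_FUNCTION</code>, and the final <code>ret</code> reaches <code>EarlyReturn</code>. Starting from <code>ASYNC_STACK_FUNCTION</code> instead, the subroutine address lands two bytes lower. The final <code>ret</code> then reads the unfilled two-byte gap in between (zero in this model, garbage on hardware).</p>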
+	<p>And that&#39;s more or less the same as the code I wrote! At any time, you can pass a subroutine address via <code>hl</code> to the <code>RunInVBlank</code> function. It will then be executed in the background, only running between the v-blank and STAT interrupts. When it finishes by executing a <code>ret</code> instruction, it&#39;ll clean itself up, turn the interrupts off, and restore flow to the main thread. I think it&#39;s a pretty clean interface, and very usable. I&#39;ve used it extensively in my year-long game boy project, the Liquid Crystal Dreams tarot deck. (Look for it soon on Kickstarter!) I use this async function whenever I want to load graphics data, so I don&#39;t ever have to worry about when there&#39;s time to do it safely. It&#39;s all scheduled by interrupts and a couple of assembly-time constants!</p>
+	<p>Thanks for coming on this little journey with me. It was really fun to reinvent the wheel like this, especially because OS-level code is such a black box to me most of the time. But here I am, writing the assembly for a context-switching thread management system.</p>
+	<p>There&#39;s a handful of additional tasks which you might find interesting to think through, if you&#39;ve been following along and want some more:</p>
+	<ul>
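+		<li>You don&#39;t actually have some of the instructions I used, like loading <code>sp</code> directly from memory (<code>ld sp, [vMainSP]</code>) or storing a 16-bit constant straight into memory (<code>ld [ASYNC_STACK_EARLY_RETURN], EarlyReturn</code>). Can you write performant replacements for them?</li>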
+		<li>It&#39;s probably possible to combine the two interrupts into one.</li>
+		<li>Use the STAT interrupt for both the &quot;switch from main context to async context&quot; and &quot;switch from async context to main context&quot; cases. This requires the handler code to reconfigure what handler code is being used! Self-modifying as heck!</li>
+		<li>What if the interrupts are needed for other functionality? Could you swap out interrupt handlers based on the state of the processor? How does this work with more interrupts?</li>
+		<li>Can you use the same technique to write code that executes during the h-blank period? Why not?</li>
+		<li>How would you pass information between the two threads? How would you store information about the state of the threaded code? How would you work with that information to make sure that all the code that needs to get executed does get executed?</li>
+		<li>What happens when the async code returns?</li>
+		<li>What do you do if you want to cancel the async thread?</li>
+	</ul>
+	<p>And finally, some disclaimers and warnings:</p>
+	<p>Variable names have been changed to protect the innocent. There are some layers of indirection I&#39;ve skipped, such as the interrupt vector jumping into a specified RAM address that gets edited to make interrupts configurable. I could have avoided that if I had known about the <code>jp hl</code> instruction, but the RAM approach is probably faster anyway. Concurrent access to RAM is the same headache as in modern code, and it has caused some truly perplexing bugs. There&#39;s a very small chance that a context switch will happen between writing the first byte of an important memory address and the second byte, which can wreak havoc. I found it was usually sufficient to temporarily disable interrupts to make operations atomic: surround a memory access with <code>di</code> and <code>ei</code> to turn interrupts off and back on. Any interrupt requested in between will be handled once they&#39;re re-enabled. I am not an expert. In fact I know shockingly little about the conventional wisdom. This does not constitute legal or medical advice.</p>
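+	<p>Here&#39;s the torn-write problem in miniature, as a toy Python model. It&#39;s a sketch: the pointer variable, the values, and the low-byte-first write order are all assumptions for illustration. An interrupt that fires between the two single-byte writes lets the other thread see a value that is neither the old pointer nor the new one. Wrapping the writes in <code>di</code>/<code>ei</code> defers the interrupt until both bytes are in place.</p>
```python
shared = bytearray(2)   # a 16-bit pointer in RAM, stored little-endian


def read_pointer():
    return shared[1] * 0x100 + shared[0]


def write_pointer(value, interrupt_between=None):
    """Two separate one-byte writes, low byte first (an assumed order)."""
    high, low = divmod(value, 0x100)
    shared[0] = low
    if interrupt_between:       # a context switch sneaks in right here...
        interrupt_between()     # ...and the other thread reads the pointer
    shared[1] = high


def write_pointer_atomic(value, pending_interrupt=None):
    """di; write both bytes; ei: the interrupt is only serviced afterwards."""
    write_pointer(value)
    if pending_interrupt:
        pending_interrupt()


seen = []
shared[:] = bytes([0xFF, 0x40])                      # pointer is $40FF
write_pointer(0x4100, lambda: seen.append(read_pointer()))
shared[:] = bytes([0xFF, 0x40])                      # reset to $40FF
write_pointer_atomic(0x4100, lambda: seen.append(read_pointer()))
print([hex(v) for v in seen])   # ['0x4000', '0x4100']
```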
+	<p><em>I can be reached as &quot;shoofle&quot; wherever the internet is sold - most frequently these days I&#39;m on <a href="https://beach.city/@shoofle">the fediverse as @shoofle@beach.city</a>, or on <a href="https://bsky.app/profile/shoofle.bsky.social">bluesky as shoofle.bsky.social</a>, or on <a href="https://ada-adorable.tumblr.com">tumblr as ada-adorable</a>. Gimme a holler if you read this!</em></p>
+</article>
--- a/index.html
+++ b/index.html
@ -17,6 +17,12 @@
 		<div class="row-fluid">
 			<div class="span8">
 				<div class="row-fluid">
+					<div class="span4">
+						<div class="small_project art">
+							<p class="description"><a href="articles/lcdt/async/">LCDT: Async Assembly Adventure</a></p>
+							<p class="name">I wrote about developing asynchronous execution in assembly on the nintendo game boy.</p>
+						</div>
+					</div>
 					<div class="span4">
 						<div class="small_project web">
 							<p class="description">The shoofle.net constellation!</p>
@ -29,20 +35,22 @@
 							<p class="name">I created an artsy phonetic writing system for making intricate circles.</p>
 						</div>
 					</div>
+				</div>
+				<div class="row-fluid">
 					<div class="span4">
 						<div class="small_project game">
 							<p class="description"><a href="articles/atelier-phoebe">Atelier Phoebe</a></p>
 							<p class="name">An Atelier fangame made for the <a href="https://www.lexaloffle.com/pico-8.php">Pico-8</a>.</p>
 						</div>
 					</div>
-				</div>
-				<div class="row-fluid">
 					<div class="span6">
 						<div class="small_project game">
 							<p class="description"><a href="articles/mindjail-engine">A 2D Physics-Based Game Engine in Python with OpenGL</a></p>
 							<p class="name">The Mindjail Engine, my hand-crafted 2D game engine!</p>
 						</div>
 					</div>
+				</div>
+				<div class="row-fluid">
 					<div class="span3">
 						<div class="small_project art">
 							<p class="description"><a href="articles/mirror-to-yesterday">the Mirror to Yesterday</a></p>