The Retroputer's 6516 CPU executes instructions using the typical fetch, decode, and execute phases present in most CPUs. Additionally, the CPU features an instruction cache and a microtask queue. Together, these let the CPU decode instructions well ahead of the one that is currently executing, so at any given moment each phase may be operating on a different instruction.

The following logic is used on each clock cycle:

Because of how this is implemented, a single-byte instruction can execute within a single clock cycle. For example, consider clearing the CARRY flag:

CLR C

The above instruction is only a single byte long (B1). Once fetched, it can immediately be decoded into its requisite tasks (which the CPU proceeds to do):

CLEAR_FLAG_IMM 1

Because this instruction maps directly to a single microtask instruction, the CPU can immediately execute it (assuming there are no other microtasks in the queue).
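The single-cycle case can be sketched as a tiny model. This is only an illustration, not the 6516's actual implementation: the `TinyCpu` class, the order of the phases within `tick`, and the reading of the `CLEAR_FLAG_IMM` operand as a bit index into a flags byte are all assumptions. Only the mapping from byte B1 to `CLEAR_FLAG_IMM 1` comes from the text above.

```python
from collections import deque

# Decode table: cached instruction bytes -> microtasks.
# Only this one entry (CLR C) is taken from the text.
DECODE_TABLE = {(0xB1,): [("CLEAR_FLAG_IMM", 1)]}


class TinyCpu:
    """Hypothetical model: fetch, decode, and execute all happen in one tick."""

    def __init__(self, memory):
        self.memory = memory
        self.pc = 0
        self.flags = 0xFF      # start with every flag set so the clear is visible
        self.cache = []        # instruction cache: bytes fetched so far
        self.queue = deque()   # microtask queue
        self.cycles = 0

    def tick(self):
        # Fetch: one byte per clock cycle.
        self.cache.append(self.memory[self.pc])
        self.pc += 1
        # Decode: if the cached bytes form a known instruction, queue its microtasks.
        tasks = DECODE_TABLE.get(tuple(self.cache))
        if tasks is not None:
            self.queue.extend(tasks)
            self.cache.clear()
        # Execute: at most one microtask per clock cycle.
        if self.queue:
            name, operand = self.queue.popleft()
            if name == "CLEAR_FLAG_IMM":
                # Assumption: the operand is a bit index into the flags byte.
                self.flags &= ~(1 << operand) & 0xFF
        self.cycles += 1


cpu = TinyCpu(bytes([0xB1]))
cpu.tick()
print(cpu.cycles, hex(cpu.flags))  # 1 0xfd
```

Because the queue was empty when the instruction was decoded, its only microtask runs in the same tick that fetched it.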

Typically, however, an instruction will map to four or more microtasks (sometimes as many as twenty-four), so the CPU is busy decoding instructions and adding microtasks to the queue while the queue itself is being serviced. Depending on the instructions in the execution stream, the microtask queue can grow quite large, because the CPU may have decoded several instructions while waiting for the current instruction to complete.
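To see how the queue can grow, here is a rough model of its depth over time. It assumes every instruction has the same length and the same microtask count, that fetch, decode, and execute all happen within one cycle in that order, and that the queue never fills up. The function name `queue_depths` and its parameters are made up for this illustration.

```python
def queue_depths(bytes_per_instr, tasks_per_instr, cycles):
    """Return the microtask queue depth at the end of each clock cycle."""
    cached = 0    # bytes currently in the instruction cache
    depth = 0     # microtasks waiting in the queue
    history = []
    for _ in range(cycles):
        cached += 1                     # fetch one byte per cycle
        if cached == bytes_per_instr:   # instruction complete: decode it
            depth += tasks_per_instr
            cached = 0
        if depth > 0:                   # execute one microtask per cycle
            depth -= 1
        history.append(depth)
    return history


# Two-byte instructions that each decode into four microtasks.
print(queue_depths(2, 4, 6))  # [0, 3, 2, 5, 4, 7]
```

In this example the decoder adds four microtasks every two cycles while only two are executed, so the backlog grows by two microtasks per instruction.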

This efficiency comes at a severe cost, however, when a branch is taken. The CPU does not attempt to do any form of branch prediction, and as such, it has to dump both its cache and microtask queue in order to service the branch.
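The cost of a taken branch can be sketched as follows. The `Pipeline` class and its `take_branch` method are hypothetical names; the only behavior taken from the text is that both the instruction cache and the microtask queue are discarded before execution continues at the branch target.

```python
from collections import deque


class Pipeline:
    """Hypothetical model of the state that a taken branch throws away."""

    def __init__(self, pc=0):
        self.pc = pc
        self.cache = bytearray()   # bytes fetched but not yet decoded
        self.queue = deque()       # microtasks decoded but not yet executed

    def take_branch(self, target):
        """Discard all work done ahead of the branch and refetch from target."""
        discarded = (len(self.cache), len(self.queue))
        self.cache.clear()
        self.queue.clear()
        self.pc = target
        return discarded


p = Pipeline(pc=0x2010)
p.cache.extend(b"\x01\x02")    # two bytes of a partially fetched instruction
p.queue.extend(["task"] * 6)   # six microtasks decoded ahead of the branch
print(p.take_branch(0x3000))   # (2, 6)
```

Everything counted in the returned tuple is work the CPU did for nothing, and the first instruction at the target still has to be fetched and decoded before anything executes again.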

The Microtask Queue

The microtask queue is 1024 bytes long. Each microtask is four bytes wide, so the queue can hold at most 256 microtasks. Microtasks are added to the queue as instructions are decoded.
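A queue with these dimensions could be modeled as a ring buffer over a 1024-byte array. This is a minimal sketch under stated assumptions: the `MicrotaskQueue` class is hypothetical, and the text does not say what happens when the queue is full, so `push` simply reports failure (as if the decoder had to stall).

```python
QUEUE_BYTES = 1024
MICROTASK_BYTES = 4
CAPACITY = QUEUE_BYTES // MICROTASK_BYTES   # 256 microtasks


class MicrotaskQueue:
    """Hypothetical fixed-size FIFO of four-byte microtasks."""

    def __init__(self):
        self.buf = bytearray(QUEUE_BYTES)
        self.head = 0     # slot of the next microtask to execute
        self.count = 0    # number of microtasks currently queued

    def push(self, task):
        if len(task) != MICROTASK_BYTES:
            raise ValueError("microtasks are four bytes wide")
        if self.count == CAPACITY:
            return False  # assumption: a full queue makes the decoder wait
        slot = (self.head + self.count) % CAPACITY
        start = slot * MICROTASK_BYTES
        self.buf[start:start + MICROTASK_BYTES] = task
        self.count += 1
        return True

    def pop(self):
        if self.count == 0:
            return None
        start = self.head * MICROTASK_BYTES
        task = bytes(self.buf[start:start + MICROTASK_BYTES])
        self.head = (self.head + 1) % CAPACITY
        self.count -= 1
        return task


q = MicrotaskQueue()
q.push(bytes([1, 0, 0, 0]))
q.push(bytes([2, 0, 0, 0]))
print(CAPACITY, q.pop())  # 256 b'\x01\x00\x00\x00'
```

The byte layout of a microtask is not described here, so the four-byte values in the usage example are placeholders.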

One microtask is executed per clock cycle. This means that some instructions may take several clock cycles to execute. In the worst case, the BR instruction may take twenty-four cycles to complete.

The Instruction Cache

The instruction cache serves as temporary storage for bytes fetched from memory until the instruction can be completely decoded. Only one byte is fetched per clock cycle, so a four-byte instruction could take four clock cycles to fully fetch, decode, and start executing.

<aside> ⚠️ Should the instruction cache grow beyond four bytes, an invalid instruction has been encountered, and the processor halts.

</aside>
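The fetch loop and the halt condition can be sketched together. Everything named here is an assumption made for illustration: the `fetch_until_decoded` function, the `CpuHalted` exception, and the four-byte entry in the decode table, which is a made-up encoding. Only the B1 entry and the four-byte limit come from the text.

```python
MAX_INSTRUCTION_BYTES = 4


class CpuHalted(Exception):
    """Raised when the cached bytes cannot form a valid instruction."""


DECODE_TABLE = {
    b"\xB1": [("CLEAR_FLAG_IMM", 1)],           # CLR C, from the text
    b"\x10\x20\x30\x40": [("HYPOTHETICAL", 0)],  # made-up four-byte instruction
}


def fetch_until_decoded(memory, decode_table=DECODE_TABLE):
    """Fetch one byte per cycle until the cache decodes; return (tasks, cycles)."""
    cache = bytearray()
    for cycles, byte in enumerate(memory, start=1):
        cache.append(byte)
        if len(cache) > MAX_INSTRUCTION_BYTES:
            raise CpuHalted("invalid instruction: " + cache.hex())
        tasks = decode_table.get(bytes(cache))
        if tasks is not None:
            return tasks, cycles
    return None, len(memory)   # ran out of bytes before decoding anything


print(fetch_until_decoded(b"\xB1"))              # ([('CLEAR_FLAG_IMM', 1)], 1)
print(fetch_until_decoded(b"\x10\x20\x30\x40"))  # ([('HYPOTHETICAL', 0)], 4)
```

Feeding the function five bytes that match no table entry raises `CpuHalted` on the fifth fetch, mirroring the halt described in the warning.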