Writing a Chip 8 Emulator (Part 2)

In a previous post, I wrote about the basic CPU structure needed to emulate a Chip 8. I also wrote about memory access, and how to define bytes and words. In this post, I will talk about how to interpret Chip 8 instructions, as well as how to set up a main emulator loop.

The Execution Cycle

Every computer executes a simple loop over and over again while it is powered up. This loop consists of the following steps:

  • Fetch
  • Decode
  • Execute

During the fetch portion of the loop, the CPU will grab the contents of memory where the Program Counter is currently pointing. This memory content indicates the instruction that the CPU should execute. For example, let’s say that the program counter points at memory location $0200, and that the data $1100 is stored there (technically, $11 is stored in location $0200, and $00 is stored in location $0201). The fetch portion of the cycle will load the CPU with the instruction $1100 (more on what this instruction actually does a little later).

The decode portion of the loop translates the instruction into a form that the CPU understands. Each instruction has a specific number of operands that are associated with it. These are essentially parameters that describe what registers or memory locations the instruction should actually be executed on. In the case of the Chip 8, the operands are all part of the 16-bit word that is read from the Program Counter. For example, with the instruction $1100, the $1 indicates that the instruction is a JUMP instruction, while the 100 is the operand that relates to the address that the CPU needs to jump to.

The execute portion of the loop actually executes the instruction on the operands that were previously decoded. In the case of the decoded JUMP instruction $1100, the CPU will advance the program counter to the memory location $100.

The computer will continue performing this loop over and over again until the power is turned off. Now that we understand what we need to do in the main loop, let’s look at how we decode the different instructions.

Anatomy of a Chip 8 Instruction

Chip 8 instructions are actually really nice to work with. Each instruction that the CPU can execute is exactly two bytes long, making them a 16-bit word in length. All of the data needed to execute the instruction is stored within this 16-bit word. Let’s take a look at a simple instruction and dissect how the operation works.

The Jump Instruction

The Chip 8 JUMP instruction tells the CPU to move the Program Counter to the address specified. The JUMP instruction is encoded as follows:


The n characters are used to denote 4-bit numbers. When combined, the numbers form a 12-bit address that the CPU should jump to. For example, if the CPU encountered the following instruction:


The CPU would set the program counter to $234. During the next loop of the CPU, the next instruction would be fetched from location $234.

The Load Operation

Let’s look at another example. The LOAD instruction tells the CPU to load a specific number into a register. The LOAD instruction is encoded as follows:


The s character is used to denote a 4-bit value that specifies a register to store the number in. The N characters denote 4-bit numbers that, when combined, indicate the 8-bit value that should be loaded into that register. For example, if the CPU encountered the instruction:


The CPU would load the value $48 into register 2.

Chip 8 Operands

We can start to see patterns in the way a Chip 8 instruction is encoded. The first hex digit of the instruction usually provides a hint at what major operation is about to occur. The next three hex digits encode numeric information, or the registers that the operations work on. Here is a mostly complete set of Chip 8 instructions:

Opcode Operands Description
00E0 0 Clear the screen
00EE 0 Return from subroutine
1nnn 1 Jump to address nnn
2nnn 1 Call routine at address nnn
3snn 2 Skip next instruction if register s equals nn
4snn 2 Do not skip next instruction if register s equals nn
5st0 2 Skip if register s equals register t
6snn 2 Load register s with value nn
7snn 2 Add value nn to register s
8st0 2 Move value from register s to register t
8st1 2 Perform logical OR on register s and t and store in t
8st2 2 Perform logical AND on register s and t and store in t
8st3 2 Perform logical XOR on register s and t and store in t
8st4 2 Add s to t and store in s – register F set on carry
8st5 2 Subtract s from t and store in s – register F set on !borrow
8s06 1 Shift bits in register s 1 bit to the right – bit 0 shifts to register F
8s0E 1 Shift bits in register s 1 bit to the left – bit 7 shifts to register F
9st0 2 Skip next instruction if register s not equal register t
Annn 1 Load index with value nnn
Bnnn 1 Jump to address nnn + index
Ctnn 2 Generate random number between 0 and nn and store in t
Dstn 3 Draw n byte sprite at x location reg s, y location reg t
Ft07 1 Move delay timer value into register t
Ft0A 1 Wait for keypress and store in register t
Fs15 1 Load delay timer with value in register s
Fs18 1 Load sound timer with value in register s
Fs1E 1 Add value in register s to index
Fs29 1 Load index with sprite from register s
Fs33 1 Store the binary coded decimal value of register s at index
Fs55 1 Store the values of register s registers at index
Fs65 1 Read back the stored values at index into registers

Bit Shifting and Bit Masks

So, when writing an emulator, how do you separate out the various operands? The simple way is through bit masks and bit shifting. A bit mask is a binary pattern, usually used to turn certain bits on or off, or used to determine if a particular bit pattern is activated. Bit shifting is where the bits in a pattern are moved left or right by a certain number of spaces. Let’s work through a simple mask and shift example.

Let’s say that we had the hex number $23. In binary, this would be the following:

0010 0011

The 0010 corresponds to the 2, while the 0011 corresponds to the 3. Now let’s say you want to get just the hex value of the first digit. What we do is apply a bit mask with a logical AND operation, and shift the bits.

To break this down into a few steps, first what we want to do is isolate the high 4-bits in the pattern, and blank everything else out. We do that by applying a bit mask that has all ones in the high 4-bits like so:

1111 0000

That number in hex is $F0. Now that we know the mask value we need to apply, the next thing we do is apply a logical AND operation to $23 and $F0. The way an AND works is if both digits are a 1, then a 1 is the result, otherwise, the value is a 0. When we apply the AND:

0010 0011
1111 0000
--------- &
0010 0000

The result we are left with is 0010 0000, which is the hex value of $20. Now that we have blanked everything else out, we perform the second step, which is the logical shift. We want to move all of the bits of the result 4 places to the left. In a language like C, this is done with the >> operator.

0010 0000
--------- >> 4
0000 0010

This leaves us with 0000 0010 – the hex value of $2, which is exactly what we wanted!

Shifting and Masking Chip 8 Instructions

Let’s take a look at the Chip 8 instruction to move a value of one register into another. This corresponds to the instruction 8st0. This says to move the value in register s into the register t. An example of this would be 8620, which would make the Chip 8 CPU move the value in register 6 into register 2.

The first thing we need is the value for s. Using a bit mask, we want the following:

1000 0110 0010 0000
0000 1111 0000 0000
------------------- &
0000 0110 0000 0000

Once again, the magic hex value is $F for the 4-bit pattern 1111. The same operation in hex gives us:

$ 8 6 2 0
$ 0 F 0 0
  ------- &
$ 0 6 0 0

This gives us the result $0600. Now we need to shift the bits 8 places to the left this time:

0000 0110 0000 0000
------------------- >> 8
0000 0000 0000 0110

The same operation in hex gives us:

$ 0 6 0 0
  ------- >> 8
$ 0 0 0 6

So now we know that the s value should be $6! We do the same thing for the t register, except this time we only need to perform a shift of 4 places to the left. The hex operations are as follows:

$ 8 6 2 0
$ 0 0 F 0 
  ------- &
$ 0 0 2 0
  ------- >> 4
$ 0 0 0 2

This tells us that the t value should be $2!

The key points to take away here are:

  • Groups of 4-bits correspond to a single hex number. For example, 1001 corresponds to the hex digit $9.
  • Bit masks allow you to figure out specific bit patterns in a larger bit sequence.
  • When performing bit shifting hex values, always shift in multiples of 4. So if you want to get the $6 in $8620, you need to shift 2 hex digits to the left, which is 8 binary bits to the left (2 x 4).
  • If you are decoding the JUMP instruction, which is 1nnn, you don’t need to shift anything! Just apply the bit mask of $0FFF and you have the address you need to jump to.


In this post, we looked at the execution loop, and examined how Chip 8 instructions are encoded. We also looked at bit shifting and bit masks, and how those two simple tools can be used during the decoding phase. In a future post, I’ll look at some of the Chip 8 instructions in more detail, as well as how to code the execution loop in a language like C.

Writing a Chip 8 Emulator (Part 1)

While I was in the middle of my PhD, I needed a coding project to keep my skills sharp. Even though my area was Computer Science, a lot of my work with Natural Language Generation and Semantics was theoretical (read: my Computer Science degree did not involve a lot of programming). I needed something fun that was not related to the many papers and books I was reading – something that I really enjoyed for no practical reason. Enter nostalgia for 1980’s vintage computers.

The CoCo

My first computer was a TRS-80 Color Computer. The Color Computer (CoCo) was a Motorola 6809-based machine that ran at 0.89 MHz, and was Tandy’s solution for at home personal computing. The Color Computer line was quite different from the Zilog 80-based business machines that earned the nickname “Trash 80”. The only similarity between those lines are the TRS-80 moniker and awesome chrome plating. My experience with the CoCo at the time consisted of punching in programs from various computer magazines (such as the amazing Rainbow publication), as well as typing school reports using Scripsit, and trying my hand at designing my own video games.

At the time, I didn’t appreciate the power of the CoCo. While it lacked dedicated sound and sprite hardware that made the Commodore 64 an awesome games platform, the 6809 processor was quite advanced at that point in time. Now – having more experience with computing theory and knowledge of computing hardware in general – I wanted to learn more about it. So, I decided to write a virtual version of one. However, having not written an emulator before, I needed to test out what techniques would work first on a smaller project.

Enter the Chip 8

The Chip 8 isn’t technically a hardware device – it’s actually an interpreted language. The whole point of the Chip 8 was to create a language which would have a standardized execution profile across different hardware platforms (much like the Java language specification and Java Virtual Machine). According to sources such as Wikipedia, Chip 8 virtual machines were written for several different platforms from the late 70’s to the early 90’s – the most notable being for HP graphics calculators.

What makes writing a Chip 8 emulator a good learning project is it’s simplicity. There are roughly 40 instructions, each of which is composed of 2 bytes. Compared to other architectures, the Chip 8 has only a single addressing mode (inherent), a simple memory structure, and straightforward I/O routines. There is also a wealth of knowledge readily available about the Chip 8, and many other implementations of it available for reference if you need to know how something should work.

Writing the Emulator

There are many sources out there that describe the Chip 8 and how to emulate it in more detail (for example, here, here, and here). I’ll touch upon some of the high points. For my first shot at writing a Chip 8 emulator, I decided to use C (later I went on to re-write it in both Python and Java. One of my reasons for doing so was due to C’s control over primitive types and memory allocation.

Defining Bytes and Words

There are only two real types that we need to define for the Chip 8 – a byte and a word. A byte is simply a structure or data type capable of storing an 8-bit unsigned integer. A simple way to represent a byte is to create a type definition for one using C’s char, which is usually 8-bits in length:

typedef unsigned char byte;

The other type that we need to define is a word – two bytes put together (in other words, a word is a 16-bit unsigned integer). However, we often want to access either the high or low 8-bits individually. While we could use a byte mask and shift values to accomplish this task, it’s much easier to define a structure that will allow us this type of access for free. C provides a union type that allows us to do this quite easily:

typedef union {
   unsigned short int WORD;
   struct {
         byte high, low;
         byte low, high;
   } BYTE;
} word;

The union essentially defines a word by saying there are two components – a WORD, which is an unsigned short integer (16-bits long), as well as a structure called a BYTE that has high and low components in it. So, to access the full 16-bit value, assuming you had a word called w, you would use w.WORD. Similarly, if you wanted the high 8-bit value of the word, it’s just w.BYTE.high, and w.BYTE.low for the lower 8-bits.

The #ifdef WORDS_LITTLE_ENDIAN statement is way of dealing with byte ordering. In machines that deal with data larger than 8-bits, there are two ways to store the data. The most significant byte (MSB) can be stored in the first memory location, or it can be stored last. For example, if we have the 16-bit value FFEE, the most significant byte is FF, and the least significant byte is EE. In memory, two locations (say 54 and 55) are used to represent this value. The endianness of the architecture determines what memory location the MSB is stored in. For big-endian machines, FF would be stored in location 54, while EE would be stored in location 55. With little-endian machines, the locations are reversed – EE would be stored in location 54, while FF would be stored in location 55.

Notice that in the union above, we define two different orderings for the byte interpretation of the wordlow, high in the case of big-endian, and high, low in the case of little-endian. By defining a word in this way, we can change the endianness of the emulator by defining a single constant. Notice also that the definition of big- and little-endian appear to be backwards – this is because I’m developing on an x86-based system, which is little-endian. In essence, this statement converts from a little-endian architecture into whatever architecture is required.


The Chip 8 is usually defined with 4 kilobytes (4K) of memory. The first 512 bytes are reserved for the interpreter, with the first 80 bytes being reserved for sprite information relating to the characters 0-9 and A-Z. Chip 8 programs typically start at memory address 0x200.

The memory model for the Chip 8 is quite simple. Some computer architectures at the time – such as the CoCo – had memory mapped I/O routines. This meant that to run specific input or output routines, you needed to write a byte sequence into a particular memory address. Because the Chip 8 I/O routines are captured as actual CPU instructions, there is no need to trap memory accesses and pass them to an I/O handler. If you want to, you could easily represent its memory as a fixed length array of bytes.

For my emulator, I chose to represent memory as a global variable in the form of a pointer to a malloced block of bytes. I then created simple helper functions to initialize memory. Note that the helper function takes the size of memory as an argument. The reason behind this is that I wanted to reuse as many components for the Chip 8 in a CoCo emulator, so being able to specify memory size was useful:

memory_init(int memorysize)
   memory = (byte *)malloc(sizeof (byte) * memorysize);
   return memory != NULL;

I also created helper functions to read and write to memory locations:

inline byte
memory_read(register int address)
   return memory[address];
inline void
memory_write(register word address, register byte value)
   memory[address.WORD] = value;

The purpose of these helper functions is again for the future. Because of the simplicity of the Chip 8 memory structure, you don’t really need functions to read and write from memory. However, for the CoCo emulator, because memory is I/O mapped, you need to filter the reads and writes so that you can perform I/O functions if need be. It was just as easy to create the functions here and get used to using them. To help with performance, I used the inline statement to ask the compiler to expand the function where it is called from, instead of jumping to the function. Likewise, the register function asks the compiler to keep the specified variables on a CPU register, again in an effort to make the functions faster.

The Virtual CPU

General Purpose Registers

The Chip 8 CPU contains 16 general-purpose 8-bit registers. The registers – from 0 through F – can store values from 0 to 255. The registers are all unsigned, meaning that the Chip 8 does not differentiate between positive and negative numbers. The F register is usually used to store information relating to carries or borrows when addition or subtraction operations are performed. The Chip 8 registers can easily be represented in an array (note – from here on, I will be using hexadecimal instead of decimal):

byte v[0x10];

The Index Register

In addition to the general purpose register, the Chip 8 contains a 16-bit index register I. The index register is used when iterating over arrays or tables. Its value is typically added to some other register to provide an address into memory.

word i;

The Stack Pointer

A special register called the stack pointer SP is used to store the current top of the hardware stack. The stack pointer is not directly accessible to the programmer. For example, the stack pointer is used by instructions such as RTS (return from subroutine) and STOR (store registers). In the case of RTS, the return address is stored in memory at the location pointed to by SP, and is then incremented. Similarly for STOR, the contents of the registers are stored in memory at the address of SP, and is incremented. Because the stack pointer can point anywhere in memory, it needs to be represented by a 16-bit value:

word sp;

Delay and Sound Timers

The Chip 8 defines two different timers – the delay timer register DT and the sound timer register ST. Both timers fire 60 times a second. When fired, the value in the timer registers is decremented by 1 until they reach zero. For the sound timer, a beep is emitted by the Chip 8 whenever the value is not zero. Both timers can store values from 0 – 255, making them byte sized.

byte st;
byte dt;

Program Counter

Perhaps the most important register is the program counter PC. The program counter keeps track of where in memory the next instruction should be fetched from. Because this value can come from anywhere in memory, it needs to store a 16-bit integer, making it word sized.

word pc;

Representing the CPU

To make all of these registers easier to keep track of, I defined a single structure:

typedef struct {
    byte v[0x10];      /**< Registers                 */
    word i;            /**< Index register            */
    word pc;           /**< Program Counter register  */
    word sp;           /**< Stack Pointer register    */
    byte dt;           /**< Delay Timer register      */
    byte st;           /**< Sound Timer register      */
    word operand;      /**< The current operand       */
} chip8regset;

This allowed me to create a single global variable called cpu that is a chip8regset. One of the items that I haven’t yet described is the operand. This is essentially the instruction that was fetched by the processor from the current address pointed to by the PC. The reason I have this here is because you often need to perform actions on it, and having it in one place is very convenient.

End of Part 1

In this article, I discussed the memory structure of the Chip 8, and described the elements that go into representing a Chip 8 CPU. In the next article, I will discuss the main emulator loop, as well as how to handle keyboard and screen interactions. The full source code of my Chip 8 emulator is available on GitHub. If C isn’t your language, you can also feel free to check out my Python implementation (which is meant specifically as a teaching tool), or my Java implementation.