RSP Interpreter User's Manual

  1. Installation
    1. System Requirements
      1. Do I really need to install Microsoft Windows?
      2. Why do I need SSE2? Why not SSE1?
      3. What emulators support this RSP plugin system?
    2. File Placement
  2. Program Usage
    1. Using the Right Plugins
      1. Available Graphics LLE Plugins
      2. Available Audio LLE Plugins
    2. Using Other Plugins
      1. Available Graphics HLE Plugins
      2. Available Audio HLE Plugins
    3. What could happen if I use the wrong plugins?
    4. Configuring the RSP Interpreter Settings
      1. Send display lists to the graphics plugin
      2. Send audio lists to the audio plugin
      3. Force CPU-RSP signals synchronization
      4. Support CPU-RSP semaphore lock
    5. Can't I just use the Use High Level GFX and Use High Level Audio options in Project64?
    6. Error Messages
    7. If I set HLE for both graphics and audio, won't this plugin be useless?
  3. About this Software
    1. What is this software?
    2. Why is this emulator an interpreter?
    3. What is so accurate in this RSP interpreter over Project64's?
    4. How was this software developed?
  4. Build Instructions
    1. Software Needed to Build
    2. Build Configuration
    3. Build Maintenance
    4. Benchmarking RSP Build Speeds
    5. Technical RSP Information
  5. History
    1. Credits
    2. Change Log

1: Installation

1.1: System Requirements

The following system specifications are required to use the binaries provided in this distribution. To run this emulator straight out of the box, you will need:

1.1.1: Do I really need to install Microsoft Windows?

Certainly not. As a matter of fact, in the absence of any operating-system-specific code, a direct port to the Mupen64Plus variation on Linux was nearly effortless, but that fork is not in the scope of this document since it is not guaranteed to sync all features of the master branch. Any operating system supporting the C language could run this program, although some targets will inevitably be more favorable than others. See section 4 for build details and instructions.

1.1.2: Why do I need SSE2? Why not SSE1?

Technically, you need neither. From a developer's perspective, the GNU compiler suite is intelligent enough to vectorize all of the ANSI C code into any variation of SSE. (Of course, the actual C source code itself needed a lot of revision to suit optimal, automatic SSE code generation.) The price of this is that compiling this ANSI C but SSE-style code without any SSE code generation at all will yield poorer run-time performance than code written with a pure scalar (no SSE) compiler target in mind. At the very least, attempt an MMX build for your machine. SSE1, on the other hand, offers nothing for RSP emulation because its extensions revolve around 32-bit floating-point data elements, whereas RSP data elements are 16-bit integers.
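
To illustrate what "ANSI C but SSE-style" code might look like, here is a minimal sketch of an element-wise saturating add over eight 16-bit lanes. The function names are hypothetical and not taken from this plugin's source; the sketch also assumes that `short` is 16 bits wide and `int` is at least 32 bits wide, as on the usual x86 targets.

```c
#define LANES 8 /* a 128-bit vector register holds eight 16-bit elements */

/* Clamp a wider intermediate sum to the signed 16-bit range. */
static short clamp_s16(int x)
{
    if (x > 32767)
        return 32767;
    if (x < -32768)
        return -32768;
    return (short)x;
}

/*
 * Element-wise saturating add, written as a plain fixed-count loop with no
 * intrinsics. A vectorizing compiler is free to turn a loop of this shape
 * into packed 16-bit integer instructions; without vectorization it still
 * compiles, just as eight scalar iterations.
 */
void vector_add_clamped(short *vd, const short *vs, const short *vt)
{
    int i;

    for (i = 0; i < LANES; i++)
        vd[i] = clamp_s16((int)vs[i] + (int)vt[i]);
}
```

Whether a given compiler actually vectorizes a loop like this depends on its version and flags, so the generated assembly is the only reliable way to tell.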

1.1.3: What emulators support this RSP plugin system?

The "common plugin specifications" by zilmar were somewhat influenced by the closed-source Nemu64 plugin system, but only a select few emulators went so far as to actually integrate this system for loading plugins of the RSP type. As of the time of this writing, they are 1964, Mupen64, and Project64. Any other emulators, or older versions of Mupen64 and 1964, rely largely on high-level simulation of specific RSP microcode tasks, statically pre-translated to Intel x86 code for efficient but inaccurate imitation of the 3-D graphics and audio (and other multimedia) task algorithms. This is due to the long-time lack of stable and complete documentation about the RSP hardware in the code on which those emulators were based, apart from zilmar's reverse-engineering of the RSP in Project64.

1.2: File Placement

In addition to the files listed above, you might also eventually encounter the following files while using the plugin. None of them are currently provided in the downloads archive, of course.

2: Program Usage

2.1: Using the Right Plugins

Project Reality's signal processor is ultimately the Nintendo 64's communicator. The emulator will not just work randomly with carelessly selected graphics and audio plugins. In fact, by default, this RSP emulator assumes that the user's chosen graphics and audio plugins are low-level emulation (LLE) plugins, realistically operating on the hardware (low) level of the actual Nintendo 64 system, for more universal compatibility and accuracy. If this is not the case for either your currently selected graphics or audio plugin, you may need to change plugins!

2.1.1: Available Graphics LLE Plugins

The sudden leak of documentation on the RDP has paved the way for the introduction of a few LLE graphics plugins. As of the time of this writing, they are Jabo's Direct3D8 (supports both HLE and LLE in one plugin and uses whatever the RSP tells it to), ziggy's OpenGL backport of the MAME team's RDP reverse-engineering, and angrylion's software-rendering, per-pixel-accurate video interface, which branches off MooglyGuy's MAME contributions into a zilmar plugin system port. The first option, Jabo's Direct3D, only has near-full RDP emulation as of versions 1.7.x in LLE, and even then was never very complete in implementing what little was known, but nonetheless is perhaps the most stable choice. The second option, ziggy's "z64gl", is much less stable in several areas than Jabo's Direct3D, but has an overall better quality of RDP implementation and triangle rendering and, when configured properly, is perhaps the fastest possible choice currently for emulating graphics in LLE. The third option, angrylion's `nocomment` repository, despite many valiant optimization efforts by angrylion and even sometimes the author of this plugin, pays the definitive price of terrible speed for its extraordinary precision of RDP commands and VI rendering accuracy on the per-pixel level of the "real" graphical filtering on the hardware.

2.1.2: Available Audio LLE Plugins

The vast majority, by far, of audio plugins emulate sound on the low level with the RSP. You should have way more trouble finding an audio plugin which is not LLE (such as 1964's new audio plugin and LaC's outstanding audio plugin for Nemu64), before you should end up trying to find one that is LLE. Even Azimer's audio plugins, although dedicated to HLE, easily support LLE as well within the same plugin if the RSP should request executing this mode.

2.2: Using Other Plugins

Say you do not necessarily want LLE accuracy. You are satisfied using HLE plugins instead, as long as they work with the variety of games you favor. If this is the case, it is possible for you to use other plugins with this RSP emulator besides the "right" ones. This, however, requires changing the settings of this program, which will be discussed in section 2.4. For now, let's take a quick look at the available HLE plugins that you could use with this interpreter.

2.2.1: Available Graphics HLE Plugins

The vast majority, by far, of graphics plugins simulate the RDP on a high level. You most likely are familiar with the most compatible ones already as of the time of this writing—Jabo's Direct3D, glN64 and Direct64, Glide64 (and its upcoming successor in OpenGL), and Rice's Video Plugin versions 6.1.0 and earlier.

2.2.2: Available Audio HLE Plugins

Azimer took the optimizing approach of statically executing the pre-translated RSP audio microcodes within his audio interface emulator, which supplemented the rest of Mupen64's high-level RSP translations for miscellaneous task types by Hacktarux. Possibly a more accurate, or at least automated, implementation of audio HLE is the 1964 Audio Plugin's use of microcode de-compilation to the C language (and re-compilation, of course, to the code for our processors) of the audio microcodes discovered throughout almost all of the Nintendo 64 games, including MusyX ones. As of the time of this writing, there are no other known, legitimate implementations of audio HLE in zilmar-spec plugins.

2.3: What could happen if I use the wrong plugins?

If the RSP emulator is set to treat graphics as HLE while you use an LLE graphics plugin, or if it is set to treat them as LLE in conjunction with an HLE graphics plugin, you might get a few raw colored frame buffer pixel maps on the screen which are independent of RDP commands on RSP-processed data, but you will most likely just see nothing (worse yet, you may also hear nothing if this problem stalls the CPU host). As for mixing HLE/LLE audio on the RSP side with an LLE/HLE audio plugin, hope for your own good that you don't hear anything at all. If silence is what you want, use zilmar's No Sound plugin while processing audio RSP tasks as HLE within this RSP plugin. The next section discusses how to arrange that.

2.4: Configuring the RSP Interpreter Settings

The basic idea is the same for 1964, Mupen64 and Project64: "[Configure] RSP Plugin/Settings..." from under the "Options" menu of these host CPU emulators. Alternatively, launch Garteal's sp_cfgui.exe GUI directly from where you installed it. There are currently only four settings you can change. All settings will take effect immediately without delay.

2.4.1: Send display lists to the graphics plugin

Each time the VR4300 pokes the RSP to execute a graphics task, the RSP will instead forward this list to the user's selected graphics plugin for high-level preprocessing. Obviously, this is not a very helpful option if your currently selected graphics plugin only supports LLE.

2.4.2: Send audio lists to the audio plugin

Each time the VR4300 pokes the RSP to execute an audio task, the RSP will instead forward this list to the user's selected audio plugin for high-level preprocessing. For any decent implementation on the audio plugin's side, this will effectively be audio HLE, though there is no absolute guarantee that the audio plugin didn't just copy an RSP interpreter inside of it to basically still do LLE.
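
The effect of these two settings can be pictured as a small routing decision made once per task. The following is only a sketch: the structure, the function, and the numeric task type values are hypothetical illustrations, not definitions quoted from the plugin specifications or from this plugin's source.

```c
/* Hypothetical task type codes, assumed for this sketch only. */
enum { TASK_GRAPHICS = 1, TASK_AUDIO = 2 };

/* Where a task ends up. */
typedef enum {
    ROUTE_INTERPRET,    /* execute the microcode here, in LLE */
    ROUTE_GFX_PLUGIN,   /* hand the display list to the graphics plugin */
    ROUTE_AUDIO_PLUGIN  /* hand the audio list to the audio plugin */
} task_route;

/* The two user settings from sections 2.4.1 and 2.4.2. */
typedef struct {
    int send_display_lists; /* nonzero: graphics tasks are treated as HLE */
    int send_audio_lists;   /* nonzero: audio tasks are treated as HLE */
} rsp_settings;

/* Decide what to do with a task the host CPU has just started. */
task_route route_task(const rsp_settings *cfg, unsigned long task_type)
{
    if (task_type == TASK_GRAPHICS && cfg->send_display_lists)
        return ROUTE_GFX_PLUGIN;
    if (task_type == TASK_AUDIO && cfg->send_audio_lists)
        return ROUTE_AUDIO_PLUGIN;
    return ROUTE_INTERPRET; /* everything else stays in the interpreter */
}
```

Note that any task type other than graphics or audio falls through to the interpreter in this sketch, regardless of the two settings.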

2.4.3: Force CPU-RSP signals synchronization

In the absence of cycle-accurate CPU emulation of the Nintendo 64's VR4300 core, Project Reality's multimedia coprocessor (the RCP) also is not cycle-accurate. This never really matters for any commercial games except for Gauntlet Legends, Stunt Racer 64, World Driver Championship, and possibly in the steep depths of playability of a few other rare gems. Try this option if you find that, either while initially booting a ROM image or while in the middle of running one, the emulation thread seems to be frozen in a permanent loop, perhaps waiting on a specific signal to be set by the host CPU and read by the RSP in what should become an infinite semaphore wait loop. Unfortunately, this option requires special emulator host support which, as of yet, is present only in Project64 2.x. The use of this option with any other emulators may be unpleasant, so make sure that it is off by default.
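
One way to picture the kind of wait loop this option targets is a counter that notices when the RSP keeps reading the same status value without any change. The sketch below is purely illustrative: the threshold, the names, and the idea of yielding after a fixed number of identical reads are all assumptions, and say nothing about how this plugin or Project64 2.x actually cooperate.

```c
#define WAIT_LIMIT 10 /* hypothetical number of identical reads tolerated */

typedef struct {
    unsigned long last_status; /* value seen on the previous read */
    int repeats;               /* consecutive reads with no change */
} wait_detector;

/*
 * Call once per RSP read of the status register. Returns 1 when the RSP
 * appears to be spinning on an unchanged value and should hand control
 * back to the host CPU so the awaited signal can be set; 0 otherwise.
 */
int should_yield(wait_detector *d, unsigned long status)
{
    if (status != d->last_status) {
        d->last_status = status; /* something changed: start counting again */
        d->repeats = 0;
        return 0;
    }
    if (++d->repeats < WAIT_LIMIT)
        return 0;
    d->repeats = 0;
    return 1;
}
```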

2.4.4: Support CPU-RSP semaphore lock

The incorrect treatment of the RCP's SP_SEMAPHORE_REG by host emulation cores 1964, Mupen64, and even Project64 stems all the way back to the time of Project Unreality, the very first known Nintendo 64 emulator featuring bpoint's reverse-engineering of the RSP. It was not until recently that the surviving author of Project64, zilmar, received confirmation about the correct treatment of this register from the RSP-CPU viewpoint. The semaphore lock can now function accurately with this option enabled under Project64 2.x, although the only known, as of yet, noticeable impact it will have is showing Mario no Photopie graphics on start-up. Most of the time, it merely affects the flow of audio microcodes or the NUS-CIC-6105 boot task.
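
For readers unfamiliar with hardware semaphores, the behavior usually attributed to a register of this kind is a test-and-set: a read returns the old value and takes the lock, and a write releases it. The sketch below models only that general idea. This manual does not spell out the register's exact semantics, so treat the details, and all of the names, as assumptions.

```c
/* Minimal stand-in for the one register this sketch cares about. */
typedef struct {
    unsigned long semaphore; /* 0 = free, 1 = held */
} sp_regs;

/*
 * Read side: report whether the lock was already held, and take it either
 * way. A result of 0 means the reader has just acquired the lock.
 */
unsigned long sp_semaphore_read(sp_regs *sp)
{
    unsigned long old = sp->semaphore;

    sp->semaphore = 1;
    return old;
}

/* Write side: any write releases the lock (assumed behavior). */
void sp_semaphore_write(sp_regs *sp, unsigned long value)
{
    (void)value; /* the written value is ignored in this sketch */
    sp->semaphore = 0;
}
```

An emulator that instead treats the register as plain storage would let both the CPU and the RSP believe they hold the lock at once, which is the sort of mistake this option is meant to avoid.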

2.5: Can't I just use the Use High Level GFX and Use High Level Audio options in Project64?

No. Those options are integrated into the newer, unpublished version #1.2 of the RSP plugin specifications, in which the CPU host interface, rather than the RSP plugin itself, controls what the RSP will communicate. Reasons for this proposal are not entirely known, but the only open way to emulate the RSP is currently to use the design applied to version #1.1 of the plugin spec (the only published and available version, in fact). Unless you are using the one and only RSP plugin which conforms to #1.2 (RSP 1.7.x by zilmar), the two options in question will have no effect at all on the emulation thread. You must use the options provided within this RSP emulator to dictate what it will treat as LLE or not.

2.6: Error Messages

Even if you understand everything in this manual, there are undiscovered situations where you may possibly run into error messages. In this case, the major purpose of having such error messages is to notify the user (more to the point, to notify any vendors maintaining the source code of this plugin) that some game may have exploited an implemented (or even perfected) but untested feature of this RSP emulator. A significant problem to the original developer of the plugin was the lack of accessibility to do real hardware reverse-engineering, so all accuracy was perfected from the point of view of unanimous agreement amongst all other certain sources of information (including some official knowledge, as well as MAME's successful reverse-engineering to test with zilmar's). In fact, sometimes to optimize the performance of an accurate interpreter, some pseudo-re-compiler strategies were the only option. That may mean a few holes of untested code, yet to be reached to entirely validate the implementation solely from what was documented, in which case, error messages must be thrown to alert. In any case, whoever is to blame for the message you find, here is a list of all possible error messages within the DLL.

When reporting error messages to any vendors of this plugin, first make sure that they are consistently reproducible. If they happen inconsistently, then that makes it a lot harder to test and isolate the bug. In fact, it may not even be an RSP bug. If you are playing a game on a VR4300 re-compiler core (the default of all the major emulators), the host CPU might have mistranslated something at a most inopportune moment, causing incorrect data to be sent to the RSP. While such RSP instructions and cases could still be implemented, chances are that you would not want execution to continue from that point, especially if the RSP task receiving the corrupted CPU data was an audio task, in which case, I hope for your sake that you are not wearing headphones turned up real loud. Besides, there is no real way to know that an implementation of that RSP edge case would be correct solely by testing it with a ROM that only exploited it because of some re-compiler bug in the host CPU. Make sure that any active RAM or ROM hacks, such as GameShark cheat codes, are also disabled while you are trying to verify the consistency of this bug's occurrence. Once you feel sure that it must be a bug in this RSP interpreter, then please report it.

2.7: If I set HLE for both graphics and audio, won't this plugin be useless?

Since the only HLE implementation in this RSP plugin is to request a plugin of a different type (graphics or audio only) to conduct the pre-translated RSP microcode, it would most of the time make this plugin an empty shell for nothing more than forwarding most work to other plugins. However, while SGI intended the RCP as a "graphics and audio coprocessor", it was forward-extensible enough to allow for other task types, such as MPEG video, JPEG decompression, HVQM video decompression as used in Pokémon Puzzle League, and even other unique possibilities. Since zilmar's plugin system does not directly handle any of these miscellaneous possibilities, they must either be executed in normal LLE or be simulated with HLE code from inside the RSP plugin itself. (A good example of this is Hacktarux's HLE code for Mupen64.) Since these task types are almost never encountered, there really is nothing to lose with executing them in LLE and sacrificing speed for accuracy for something applied so seldom. But what happens if you play only the common games for this system, which do not need any of these other tasks? Then there would be no unique need for this RSP plugin over any of the others supporting requests in HLE. The next chapter will discuss goals, reasons, and uses behind this plugin.

3: About this Software

3.1: What is this software?

It is an interpreter for the Nintendo 64 (codename: Project Reality) media signal processor, the barrier of all communications between the console's VR4300 CPU core (based on revision 4300i of the MIPS architecture) and SGI's drawings display processor, the RDP.

3.2: Why is this emulator an interpreter?

It is an interpreter because that is as accurate as it can get. (Of course, a re-compiler can be just as accurate as an interpreter in theory, but it is nowhere near as pure, small, simple, portable, and forward-extensible to the possibility of adding in cycle-accuracy.) It was created for the primary goal of surpassing Project64's RSP emulation in both speed and accuracy. The goal was successfully met for zilmar's already accurate RSP interpreter, which applied a few fixes discovered through the development of this plugin. The "speed" half of the goal, however, was not uncontestedly met for Jabo's dynamic re-compiler extension to zilmar's RSP interpreter, within the same RSP plugin for Project64. To this day, the Project64 RSP re-compiler remains anywhere between 0 and 10 percent faster than this RSP interpreter, depending on the level of RCP complexity for the current part of the game. Even so, many games infamous in the progress of Nintendo 64 emulation do not work with the current RSP re-compiler, due to surviving bugs, and its speed is easily outclassed by doing the RSP in HLE instead of executing it at all, via dynamic re-compilation or otherwise.

3.3: What is so accurate in this RSP interpreter over Project64's?

To this day, the parts of Project64's RSP interpreter which may be exploited as bugs, incorrect implementation, or faulty programming survive but stand few and far apart, far enough apart that they are unlikely ever to cause end users a bug while actually playing a game in an emulator. This software has, however, been found to survive in games with faulty SP DMA requests sent by the host CPU's re-compiler, whereas zilmar's RSP interpreter is tied to a halting complaint. It emulates all of the system control registers and vector control registers (the RSP flags) in ways that correct the exploits of zilmar's RSP interpreter. It also, of course, accurately fights the RSP vector operations with Intel SIMD vector operations in its own right, while Jabo's RSP re-compiler code is limited to MMX and lacks many of the special intrinsics needed for core vector management such as vector shuffling and staticizations of vector compare conditions. Hence the name of this plugin: "Static Interpreter". Even the need for branch weighing and jumping has been minimized where beneficial in the primary CPU loop control.

3.4: How was this software developed?

It was always developed in the C language, as there never were any opposing alternatives. Automatic intrinsics by compiling the source as C++ proved only to be detrimental, but the possibility is open to experiment with object-oriented templates and other intrinsic ideas of C++ in the future interest of this plugin by any other vendors. For the time frame in which it was appropriate, it was compiled and built using Microsoft Visual Studio 2010, but that is no longer a wise option. The C code is now so strongly styled in a way that adjusts to auto-vectorizer intelligence in the compiler, that the use of Microsoft Visual Studio and probably any compilers besides GCC is simply masochistic. The next chapter will discuss vendor concerns for building the software and maintaining it.

4: Build Instructions

4.1: Software Needed to Build

The GNU compiler collection is needed to properly build this plugin. On Microsoft Windows, any attempts to build the solution via Microsoft Visual Studio are not supported and, if successful, will be detrimental to the performance of the resulting software because of several compile-time optimization problems that still apply to versions of Visual Studio today. Instead, when compiling on Windows, use the MinGW suite. The easiest way to get it is to follow their instructions for doing an automatic installation, although the original, masochistic author of this plugin usually chooses the long-winded manual installation. As of the latest release of this software, it was compiled by GCC 4.8.1-4.

4.2: Build Configuration

When compiling a DLL on Windows, the command line invocation should look something like this for a machine which supports Intel's Supplemental Streaming SIMD Extensions 3 (SSSE3):

GCC.EXE -S -O3 -DARCH_MIN_SSSE3 -mssse3 -mstackrealign -Wall -pedantic -o ../rsp/rsp.s ../../rsp/rsp.c
AS.EXE --statistics -o ../rsp/rsp.o ../rsp/rsp.s
GCC.EXE --shared -s -o ../rsp/rsp.dll ../rsp/rsp.o

If, of course, your system supports beyond SSSE3, you have the benefit of compiling the plugin to factor in extra support with -msse4, -mavx, -mfma, or even higher. The option that should always succeed in at least building, on the other hand, is -march=native.

When compiling a DLL without SSSE3 support, please attempt at least a build with the much more widely available SSE2, using a configuration like this:

GCC.EXE -S -O3 -DARCH_MIN_SSE2 -msse2 -mstackrealign -Wall -pedantic -o ../rsp/rsp.s ../../rsp/rsp.c
AS.EXE --statistics -o ../rsp/rsp.o ../rsp/rsp.s
GCC.EXE --shared -s -o ../rsp/rsp.dll ../rsp/rsp.o
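
The `-DARCH_MIN_SSSE3` and `-DARCH_MIN_SSE2` definitions in the commands above suggest compile-time selection of a code path. How the plugin's source actually consumes them is not described in this manual, so the following is only a guess at the general pattern; `RSP_VECTOR_PATH` and `rsp_vector_path` are hypothetical names.

```c
/*
 * Pick the widest instruction set the build flags promise. The matching
 * -mssse3 or -msse2 option must accompany the definition, as in the
 * command lines above, so that the intrinsic headers are usable.
 */
#if defined(ARCH_MIN_SSSE3)
#include <tmmintrin.h> /* SSSE3 intrinsics, e.g. byte shuffles */
#define RSP_VECTOR_PATH "SSSE3"
#elif defined(ARCH_MIN_SSE2)
#include <emmintrin.h> /* SSE2 intrinsics, packed 16-bit integers */
#define RSP_VECTOR_PATH "SSE2"
#else
#define RSP_VECTOR_PATH "scalar" /* plain C only; rely on the compiler */
#endif

/* Report which path this build was compiled for. */
const char *rsp_vector_path(void)
{
    return RSP_VECTOR_PATH;
}
```

Building this sketch with neither definition selects the scalar branch, which mirrors the manual's point that the code can be compiled without any SSE support at all.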

4.3: Build Maintenance

On occasion, new releases of GCC will be made and eventually ported to MinGW on Windows. Sometimes, the new release will be beneficial to the optimized vectorization and other code generation for this program, but other times, it will be in a slight bit of a beta state. Even though GCC is the only real option for properly building the code here, it is not nearly as orthodox in its behavior as other compilers. At times, the performance may be even worse than the previous build. In fact, before the latest rewrite to the entire software source tree, it was much better off compiled using the stable 4.7.2 release of GCC than any of the 4.8.1 experiments. Incidentally, the converse came true after a point of restyling the CPU loop's natural flow to be more direct and resorting to implicit optimizations on using function pointer tables with smallish-bulky argument call stacks, resulting in a surprising overall boost in performance. You should check the rsp.s assembly output file generated by the compiler for precise details on comparing the code generation between compiler upgrades. You can also keep track of the manual downloads and updates of the MinGW port of GCC here.

4.4: Benchmarking RSP Build Speeds

The good news is that this software has its own built-in benchmark application for timing the latency of each RSP vector operation, for which SSE speed-ups are the most definitely applicable. The benchmark is initiated in the form of a call to the zilmar plugin system's `DllTest` procedure, which goes largely unused but was inherent since the Nemu64 generation of plugins off which the newer more common specifications had been based. Currently, the only main CPU emulator which enables you to call the `DllTest` procedure today is Mupen64 as of version 0.5, in the form of clicking the "Test" button while changing global plugin settings. This will log a text file of how many seconds it takes to execute most RSP vector operations 16,777,216 times. You can use this feature to help test changes to the program and verify either your updates to the source code or your use of updated versions of the GNU compiler collection.
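
The idea behind such a benchmark is simple enough to sketch. The code below is not the plugin's `DllTest` implementation; `time_operation`, `sample_vector_op`, and `lanes` are hypothetical stand-ins, and the standard `clock` function is used only because it is portable.

```c
#include <time.h>

#define BENCH_ITERATIONS 16777216UL /* the repeat count quoted above */
#define LANES 8

/* volatile keeps the compiler from optimizing the work away */
static volatile unsigned short lanes[LANES];

/* Hypothetical stand-in for one RSP vector operation. */
void sample_vector_op(void)
{
    int i;

    for (i = 0; i < LANES; i++)
        lanes[i] = (unsigned short)(lanes[i] + 1);
}

/* Run `op` back to back `count` times; return elapsed CPU seconds. */
double time_operation(void (*op)(void), unsigned long count)
{
    clock_t start, end;
    unsigned long i;

    start = clock();
    for (i = 0; i < count; i++)
        op();
    end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

A call such as `time_operation(sample_vector_op, BENCH_ITERATIONS)` would then give one figure to compare across compiler versions or source changes.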

4.5: Technical RSP Information

It is not easy to maintain an emulator for a system about which you know little. Although an essayed documentary of the inside features is beyond the scope of this document, as well as anything openly available on the Internet, what little can legally be said is this. It is a slave processor attached to its master, the Nintendo 64 VR4300 CPU core. It was based entirely on the MIPS R4000 model of the MIPS architecture, but remember that it is a slave processor and must execute any instruction, even illegal or reserved ones, to maintain its fast exception-free processing speed at efficiently conducting thousands to millions of 3-D graphics and audio computations in less than one second. For many varying reasons, different instructions originally available on the actual MIPS R4000 processor model were removed from the op-code matrix of the RSP and sit there as reserved instructions. (The standard floating-point coprocessor, CP1, and the entire `COP1` primary operation code were also subsetted out of the RSP's design.) For example, it is impossible to do any multiplies or divides on the RSP without using the Vector Unit (VU) operation codes under `COP2`. The basic template of the RSP VU's operation codes matrix boils down to this:

        000      001      010      011      100      101      110      111
     +--------+--------+--------+--------+--------+--------+--------+--------+
000  |MULT SP |MULT SP |DCT     |IQ      |MULT DP |MULT DP |MULT DP |MULT DP |
     +--------+--------+--------+--------+--------+--------+--------+--------+
001  |MULT SP |MULT SP |DCT     |IQ      |MULT DP |MULT DP |MULT DP |MULT DP | # acc
     +--------+--------+--------+--------+--------+--------+--------+--------+
010  |ADD     |ADD     |ADD     |ADD     |ADD     |ADD     |ADD     |ADD     |
     +--------+--------+--------+--------+--------+--------+--------+--------+
011  |ADD     |ADD     |ADD     |ADD     |ADD     |ADD     |ADD     |ADD     |
     +--------+--------+--------+--------+--------+--------+--------+--------+
100  |SELECT  |SELECT  |SELECT  |SELECT  |SELECT  |SELECT  |SELECT  |SELECT  |
     +--------+--------+--------+--------+--------+--------+--------+--------+
101  |LOGICAL |LOGICAL |LOGICAL |LOGICAL |LOGICAL |LOGICAL |LOGICAL |LOGICAL |
     +--------+--------+--------+--------+--------+--------+--------+--------+
110  |DIVIDE  |DIVIDE  |DIVIDE  |DIVIDE  |DIVIDE  |DIVIDE  |DIVIDE  |DIVIDE  |
     +--------+--------+--------+--------+--------+--------+--------+--------+
111  |PACK    |PACK    |PACK    |PACK    |PACK    |PACK    |PACK    |PACK    |
     +--------+--------+--------+--------+--------+--------+--------+--------+

Generally speaking, these operations take three operands (destination, source, source) of 128-bit vector register specifiers, from zero to thirty-one, and a fourth operand from 0 to 15 which dictates the decoding of the sixteen-bit-precise element shuffling with the second vector source operand. The exact operations and behavior are left as an exercise to the adventurer who will read across Michael Tedder's RSP notes in the very first known Nintendo 64 emulator, Project Unreality, and zilmar's compilation of the RCP details in anarko's "n64ops" archive. Even more information, of course, is in the source code to this plugin and the MAME team's RSP emulator.
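
As a rough illustration of the operand layout and the matrix above, the sketch below pulls the four operands out of a 32-bit instruction word and names the matrix group of its operation code. The bit positions are an assumption based on the MIPS-style layout commonly given in public RSP notes, not something this manual specifies, and `vu_fields`, `decode_vu`, and `vu_group` are hypothetical names.

```c
typedef struct {
    unsigned e;     /* 0..15: element specifier for the second source */
    unsigned vt;    /* 0..31: second source vector register */
    unsigned vs;    /* 0..31: first source vector register */
    unsigned vd;    /* 0..31: destination vector register */
    unsigned funct; /* 0..63: index into the operation code matrix */
} vu_fields;

/* Split an instruction word into its fields (assumed bit positions). */
vu_fields decode_vu(unsigned long inst)
{
    vu_fields f;

    f.e     = (unsigned)((inst >> 21) & 0xF);
    f.vt    = (unsigned)((inst >> 16) & 0x1F);
    f.vs    = (unsigned)((inst >> 11) & 0x1F);
    f.vd    = (unsigned)((inst >>  6) & 0x1F);
    f.funct = (unsigned)(inst & 0x3F);
    return f;
}

/* Name the matrix cell: high three bits pick the row, low three the column. */
const char *vu_group(unsigned funct)
{
    static const char *const rows[8] = {
        "MULT", "MULT", "ADD", "ADD", "SELECT", "LOGICAL", "DIVIDE", "PACK"
    };
    unsigned row = (funct >> 3) & 7;
    unsigned col = funct & 7;

    if (row < 2) { /* the two multiply rows have mixed columns */
        if (col == 2)
            return "DCT";
        if (col == 3)
            return "IQ";
        return (col < 2) ? "MULT SP" : "MULT DP";
    }
    return rows[row];
}
```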

Lastly, this should be self-explanatory to any academically interested developer, but information on the RSP insofar as the standard scalar operations inherited from the original MIPS R4000 CPU can be found in great, open detail in the official manual for the MIPS R4000 CPU published by MIPS Technologies themselves.
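
The exception-free handling of reserved instructions described earlier in this section maps naturally onto a table of function pointers indexed by the primary operation code. The following is a minimal sketch under stated assumptions: treating a reserved code as a silent no-op is a simplification, only one real instruction (the standard MIPS `ORI`) is filled in, and every name here is hypothetical rather than taken from this plugin's source.

```c
#define NUM_PRIMARY_OPS 64

typedef void (*op_handler)(unsigned long inst);

static unsigned long regs[32];              /* scalar register file */
static op_handler primary[NUM_PRIMARY_OPS]; /* one slot per primary code */

/* Reserved or removed code: do nothing and raise nothing (assumed). */
static void op_reserved(unsigned long inst)
{
    (void)inst;
}

/* ORI rt, rs, imm: the one real handler in this sketch. */
static void op_ori(unsigned long inst)
{
    unsigned rs = (unsigned)((inst >> 21) & 0x1F);
    unsigned rt = (unsigned)((inst >> 16) & 0x1F);

    regs[rt] = regs[rs] | (inst & 0xFFFFUL);
    regs[0] = 0; /* register 0 always reads as zero */
}

/* Fill every slot so that no instruction word can fall outside the table. */
void init_primary_table(void)
{
    int i;

    for (i = 0; i < NUM_PRIMARY_OPS; i++)
        primary[i] = op_reserved;
    primary[0x0D] = op_ori; /* 0x0D is ORI in the standard MIPS encoding */
}

/* Execute one instruction: a single indexed call, with no validity checks. */
void execute(unsigned long inst)
{
    primary[(inst >> 26) & 0x3F](inst);
}
```

Because every slot holds a valid handler, the dispatch needs no branch to test for illegal codes, so a word with the removed `COP1` primary code simply lands on the reserved handler.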

5: History

5.1: Credits

The following people have had an impact on the direction of this software.

5.2: Change Log

  1. 2013.04.14—public release 1
  2. 2013.05.13—public release 2
  3. 2013.05.15—public release 3
  4. 2013.06.09—public release 4
  5. 2013.12.04—public release 5
  6. 2013.12.12—public release 6