The following system specifications are required to use the binaries provided in this distribution. To run this emulator straight out of the box, you will need:
Certainly not. As a matter of fact, in the absence of any operating-system-specific code, a direct port to the Mupen64Plus variation on Linux was nearly effortless, but that fork is not in the scope of this document since it is not guaranteed to sync all features of the master branch. Any operating system supporting the C language could run this program, although some targets will inevitably be more favorable than others. See section 4 for build details and instructions.
Technically, you need neither. From a developer's perspective, the GNU compiler suite is intelligent enough to vectorize all of the ANSI C code into any variation of SSE. (Of course, the actual C source code itself needed a lot of revision to suit optimal, automatic SSE code generation.) The price of this is that compiling this ANSI C but SSE-style code without any SSE code generation at all yields poorer run-time performance than code written with a pure scalar (no SSE) compiler target in mind. At the very least, attempt an MMX build for your machine. SSE1, on the other hand, offers nothing for RSP emulation because its extensions revolve around 32-bit floating-point data elements, whereas RSP data elements are 16-bit integers.
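To illustrate what "SSE-style" ANSI C can look like, here is a minimal sketch, not taken from the plugin's source. The function names are hypothetical. The idea is that a fixed-count loop over eight 16-bit elements, with no data-dependent control flow beyond simple selects, is the kind of code an auto-vectorizer can usually map onto one 128-bit integer operation.

```c
#include <stdint.h>

#define N 8 /* one RSP vector register holds eight 16-bit elements */

/* Element-wise AND over eight 16-bit lanes (hypothetical helper name).
 * A fixed trip count and a body without side effects give the compiler
 * the best chance of emitting a single packed instruction. */
void vec_and(int16_t *vd, const int16_t *vs, const int16_t *vt)
{
    int i;

    for (i = 0; i < N; i++)
        vd[i] = vs[i] & vt[i];
}

/* Element-wise add with signed 16-bit clamping (hypothetical helper name).
 * The sum is formed in 32 bits and then clamped with two simple selects,
 * which mirrors what a packed saturating add does per lane. */
void vec_add_clamp(int16_t *vd, const int16_t *vs, const int16_t *vt)
{
    int i;

    for (i = 0; i < N; i++) {
        int32_t sum = (int32_t)vs[i] + (int32_t)vt[i];

        sum = (sum > 32767) ? 32767 : sum;
        sum = (sum < -32768) ? -32768 : sum;
        vd[i] = (int16_t)sum;
    }
}
```

Whether a given compiler version actually vectorizes these loops depends on its flags and heuristics, so inspecting the generated assembly is the only reliable check.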
The "common plugin specifications" by zilmar were somewhat influenced by the closed-source Nemu64 plugin system, but only a select few emulators went so far as to actually integrate this system for loading plugins of the RSP type. As of the time of this writing, they are 1964, Mupen64, and Project64. Any other emulators, or older versions of Mupen64 and 1964, revolve largely around only the high-level simulation of just specific RSP microcode tasks, statically pre-translated to Intel x86 code for efficient but inaccurate imitation of the 3-D graphics and audio (and other multimedia) task algorithms, due to the long-time lack of stable and complete documentation about the RSP hardware in the code off of which those emulators were based, apart from zilmar's reverse-engineering of the RSP in Project64.
bin/rsp.dll
—The SP emulator plugin, which goes in the CPU host emulator's plugins folder. Do not install and use this version of the plugin if your CPU lacks SSSE3 support, or it will almost certainly crash with a Win32 reserved instruction exception.
bin/rsp_sse2.dll
—The SSE2 build of the Ultra64 SP interpreter. If your CPU supports SSSE3 or higher, just use `rsp.dll`. There really was no way to allocate vector register 16-bit element shuffling using static SSE2 code, without some kind of x86 re-compiler injection or deficient jump table, but, other than that, this version should be almost as fast as `rsp.dll`.
bin/sp_cfgui.exe
—The SP settings configuration user interface, which goes in the CPU host emulator's main working directory. (Note that some emulator versions, such as Project64 2.0, but not 2.1 and later, might end up attempting to read base-directory files from somewhere other than the "EXE folder".)
rsp_conf.cfg
—This is Garteal's text-based settings configuration file, which should be generated if necessary upon running his `sp_cfgui`. To be safe, include this from the downloads archive to the same folder as where you installed his `sp_cfgui.exe`.
manual.html
—The current HTML2 user help manual.
In addition to the files listed above, you might also eventually encounter the following files while using the plugin. None of them are currently provided in the downloads archive, of course.
rcpcache.dhex
—This is just a normal binary dump of the RCP's data memory, created and updated per each request to configure the RSP plugin during the emulation thread.
rcpcache.ihex
—This is just a normal binary dump of the RCP's RSP instruction memory, created and updated per each request to configure the RSP plugin during the emulation thread.
rsp_task.txt
—For diagnostic purposes, like the binary dumps. This, however, is essentially `rcpcache.ihex` translated into a text disassembly of the RSP machine code for the current microcode task being executed.
SP_STATE.TXT
—Everything else about the RCP's communication with the host VR4300 CPU will be listed in this file.
sp_bench.txt
—Vector operation benchmark results, recorded by using the `DllTest` feature via an emulator interface like Mupen64.
Project Reality's signal processor is ultimately the Nintendo 64's communicator. The emulator will not just work randomly with carelessly selected graphics and audio plugins. In fact, by default, this RSP emulator assumes that the user's chosen graphics and audio plugins are low-level emulation (LLE) plugins, realistically operating on the hardware (low) level of the actual Nintendo 64 system, for more universal compatibility and accuracy. If this is not the case for either your currently selected graphics or audio plugin, you may need to change plugins!
The sudden leak of documentation on the RDP has paved the way for the introduction of a few LLE graphics plugins. As of the time of this writing, they are Jabo's Direct3D8 (supports both HLE and LLE in one plugin and uses whatever the RSP tells it to), ziggy's OpenGL backport of the MAME team's RDP reverse-engineering, and angrylion's software-rendering, per-pixel-accurate video interface, which branches off MooglyGuy's MAME contributions into a zilmar plugin system port. The first option, Jabo's Direct3D, only has near-full RDP emulation in LLE as of versions 1.7.x, and even then was never very complete in implementing what little was known, but it is nonetheless perhaps the most stable choice. The second option, ziggy's "z64gl", is much less stable in several areas than Jabo's Direct3D, but has an overall better quality of RDP implementation and triangle rendering and, when configured properly, is perhaps the fastest choice currently available for emulating graphics in LLE. The third option, angrylion's `nocomment` repository, despite many valiant optimization efforts by angrylion and sometimes even the author of this plugin, pays the definitive price of terrible speed for its extraordinary precision in RDP commands and its per-pixel VI rendering accuracy, reproducing the "real" graphical filtering of the hardware.
The vast majority, by far, of audio plugins emulate sound on the low level with the RSP. You should have way more trouble finding an audio plugin which is not LLE (such as 1964's new audio plugin and LaC's outstanding audio plugin for Nemu64), before you should end up trying to find one that is LLE. Even Azimer's audio plugins, although dedicated to HLE, easily support LLE as well within the same plugin if the RSP should request executing this mode.
Say, you do not necessarily want LLE accuracy. You are satisfied using HLE plugins instead, as long as they work with the variety of games you favor. If this is the case, it is possible for you to use other plugins with this RSP emulator besides the "right" ones. This, however, requires changing the settings of this program, which will be discussed in section 2.4. For now, let's take a quick look at the available HLE plugins that you could use with this interpreter.
The vast majority, by far, of graphics plugins simulate the RDP on a high level. You most likely are familiar with the most compatible ones already as of the time of this writing—Jabo's Direct3D, glN64 and Direct64, Glide64 (and its upcoming successor in OpenGL), and Rice's Video Plugin versions 6.1.0 and earlier.
Azimer took the optimizing approach of statically running the pre-translated RSP audio microcodes within his audio interface emulator, which supplemented the rest of Mupen64's high-level RSP translations for miscellaneous task types by Hacktarux. A possibly more accurate, or at least more automated, implementation of audio HLE is the 1964 Audio Plugin's use of microcode de-compilation to the C language (and re-compilation, of course, to code for our processors) of the audio microcodes discovered throughout almost all Nintendo 64 games, including MusyX ones. As of the time of this writing, there are no other known, legitimate implementations of audio HLE in zilmar-spec plugins.
If the RSP emulator is set to treat graphics as HLE while you use a LLE graphics plugin, or if it is set to treat it as LLE in conjunction with a HLE graphics plugin, you might get a few raw colored frame buffer pixel maps onto the screen which are independent of RDP commands on RSP-processed data, but you will most likely just see nothing (even worse yet, may also hear nothing if this problem stalls the CPU host). As for mixing HLE/LLE audio on the RSP side with a LLE/HLE audio plugin, hope for your own good that you don't hear anything at all. If this is what you want, use zilmar's No Sound plugin while processing audio RSP tasks as HLE within this RSP plugin. The next section discusses how to arrange that.
The basic idea is the same for 1964, Mupen64 and Project64: "[Configure] RSP Plugin/Settings..." from under the "Options" menu of these host CPU emulators. Alternatively, launch Garteal's `sp_cfgui.exe` GUI directly from where you installed it. There are currently only four settings you can change. All settings take effect immediately.
Each time the VR4300 pokes the RSP to execute a graphics task, the RSP will instead forward this list to the user's selected graphics plugin for high-level preprocessing. Obviously, this is not a very helpful option if your currently selected graphics plugin only supports LLE.
Each time the VR4300 pokes the RSP to execute an audio task, the RSP will instead forward this list to the user's selected audio plugin for high-level preprocessing. For any decent implementation on the audio plugin's side, this will effectively be audio HLE, though there is no absolute guarantee that the audio plugin didn't just copy an RSP interpreter inside of it to basically still do LLE.
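The two forwarding options above amount to a routing decision made each time a task starts. The following is a minimal sketch of that decision, not the plugin's actual code. It assumes, as several open-source RSP plugins do, that the task type is a 32-bit word stored at offset 0xFC0 of DMEM, with 1 meaning graphics and 2 meaning audio. The names `route_task`, `hle_gfx`, and `hle_audio` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define TASK_TYPE_OFFSET 0xFC0 /* assumed location of the task type in DMEM */
#define TASK_GFX         1     /* assumed type code for a graphics task */
#define TASK_AUDIO       2     /* assumed type code for an audio task */

enum route { RUN_LLE, FORWARD_GFX, FORWARD_AUDIO };

/* Decide what to do with the task whose header sits in DMEM.
 * hle_gfx and hle_audio stand for the two settings described above. */
enum route route_task(const uint8_t *dmem, int hle_gfx, int hle_audio)
{
    uint32_t type;

    /* Assumes the host emulator keeps DMEM words in host byte order. */
    memcpy(&type, dmem + TASK_TYPE_OFFSET, sizeof type);

    if (type == TASK_GFX && hle_gfx)
        return FORWARD_GFX;   /* hand the display list to the graphics plugin */
    if (type == TASK_AUDIO && hle_audio)
        return FORWARD_AUDIO; /* hand the audio list to the audio plugin */
    return RUN_LLE;           /* everything else is interpreted normally */
}
```

Any task type other than graphics or audio falls through to the interpreter in this sketch, whatever the two settings are.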
In the absence of cycle-accurate CPU emulation of the Nintendo 64's VR4300 core, Project Reality's multimedia coprocessor (the RCP) also is not cycle-accurate. This never really matters for any commercial games except for Gauntlet Legends, Stunt Racer 64, World Driver Championship, and possibly in the steep depths of playability of a few other rare gems. Try this option if you find that, either while initially booting a ROM image to start executing or while in the middle of running a ROM image, the emulation thread seems to be frozen in a permanent loop, perhaps waiting on a specific signal to be set by the host CPU and read by the RSP in what should become an infinite semaphore wait loop. Unfortunately, this option requires special emulator host support which, as of yet, only is present in Project64 2.x. The use of this option with any other emulators may be unpleasant, so make sure that it is off by default.
The incorrect treatment of the RCP's `SP_SEMAPHORE_REG` by host emulation cores 1964, Mupen64, and even Project64 stems all the way back to the time of Project Unreality, the very first known Nintendo 64 emulator featuring bpoint's reverse-engineering of the RSP. It was not until recently that the surviving author of Project64, zilmar, received confirmation about the correct treatment of this register from the RSP-CPU viewpoint. The semaphore lock can now function accurately with this option enabled under Project64 2.x, although the only known, as of yet, noticeable impact it will have is showing Mario no Photopie graphics on start-up. Most of the time, it merely affects the flow of audio microcodes or the NUS-CIC-6105 boot task.
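This manual does not spell out what the correct treatment is. A behaviour commonly described in publicly circulated hardware notes is that reading the register returns its current value and then sets it, while any write clears it. The sketch below assumes exactly that model and nothing more. The variable and function names are hypothetical and are not taken from this plugin.

```c
#include <stdint.h>

/* Hypothetical stand-in for the emulated SP_SEMAPHORE_REG. */
static uint32_t sp_semaphore = 0;

/* Read access: return the old value, then mark the semaphore as taken.
 * A reader that gets 0 back has acquired the lock (assumed model). */
uint32_t semaphore_read(void)
{
    uint32_t old = sp_semaphore;

    sp_semaphore = 1;
    return old;
}

/* Write access: any write releases the semaphore (assumed model). */
void semaphore_write(uint32_t value)
{
    (void)value; /* the written value is ignored in this model */
    sp_semaphore = 0;
}
```

Under this model, microcode that spins on the register until it reads zero will loop forever unless the other processor's write is actually delivered, which matches the frozen wait loop described above.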
No. Those options are integrated into the newer, unpublished version #1.2 of the RSP plugin specifications, in which the CPU host interface, rather than the RSP plugin itself, controls what the RSP will communicate. Reasons for this proposal are not entirely known, but the only open way to emulate the RSP is currently to use the design applied to version #1.1 of the plugin spec (the only published and available version, in fact). Unless you are using the one and only RSP plugin which conforms to #1.2 (RSP 1.7.x by zilmar), the two options in question will have no effect on the emulation thread. You must use the options provided within this RSP emulator to dictate what it will treat as LLE or not.
Even if you understand everything in this manual, there are undiscovered situations where you may possibly run into error messages. In this case, the major purpose of having such error messages is to notify the user (more to the point, to notify any vendors maintaining the source code of this plugin) that some game may have exploited an implemented (or even perfected) but untested feature of this RSP emulator. A significant problem to the original developer of the plugin was the lack of accessibility to do real hardware reverse-engineering, so all accuracy was perfected from the point of view of unanimous agreement amongst all other certain sources of information (including some official knowledge, as well as MAME's successful reverse-engineering to test with zilmar's). In fact, sometimes to optimize the performance of an accurate interpreter, some pseudo-re-compiler strategies were the only option. That may mean a few holes of untested code, yet to be reached to entirely validate the implementation solely from what was documented, in which case, error messages must be thrown to alert. In any case, whoever is to blame for the message you find, here is a list of all possible error messages within the DLL.
Failed to read config.
—The configuration settings file, `rsp_conf.cfg`, could not be located in the emulator's working directory, or it is currently locked against read access by a pre-existing process which is using it in some way. Try launching Garteal's `sp_cfgui.exe` executable directly from Windows Explorer instead of through the emulator, and it should safely create the configuration file if it was not already found.
SP_STATUS_HALT
—Weird. For some inconceivable reason, the host emulator called this RSP emulation plugin to start a task, but forgot to clear the HALT status so that microcode execution wouldn't be frozen. Nothing I can do about that....
SP_SET_HALT
—While fetching the next instruction in the RSP CPU execution loop, an unexpected HALT status was abruptly encountered, without the typical BROKE status set from a `BREAK` operation.
DPC_CLR_FREEZE
—While initiating a request to forward the upcoming graphics task to a high-level graphics emulation plugin, the RDP command buffer unexpectedly had FREEZE status set. zilmar's RSP interpreter makes sure this is NOT set when doing graphics in HLE, for reasons he himself no longer remembers, so we lack the information on when or how this could occur, or what really to do for sure.
RESERVED\nSee SP_STATE.TXT.
—It would appear that you have managed to encounter an invalid primary R4000 or other RSP scalar operation code. Reserved instructions on the RSP definitely do write something, since the RCP has no exceptions, but we have absolutely no source of information as to what. Since this is extremely unlikely to occur in any legitimate and unhacked ROM images, it more likely indicates that you may have attempted to initiate the RSP emulator in LLE with a HLE graphics or audio plugin, or vice versa. Check your settings file for the program. SP_STATE.TXT is information useful to vendors maintaining the source code of this plugin, in case there really is something to blame on the RSP emulator.
C2\nRESERVED
—It would appear that you have managed to encounter an invalid RSP vector operation code. Reserved instructions on the RSP do write something (in Michael Tedder's notes for the alleged `VSUT` op-code, zero) to the destination scalar or vector, but there appears to be no information out there whatsoever as to what specifically should happen. As in the case above with reserved scalar instructions, this sort of thing is extremely unlikely to occur unless your configuration of the emulator and RSP in conjunction with other plugins is to blame. Make sure that the HLE/LLE settings in this plugin match with the plugins you choose!
VMUL IQ
—Not only did you manage to encounter a reserved RSP vector operation, as in the case just above, but it was one of the vector op-codes which originally were, in fact, valid on the Ultra64 RSP prototype but removed in the final RCP product: either `VMULQ`, `VRNDN` or `VRNDP`. Perhaps you have loaded an executable to the emulation thread which adheres to the original Ultra64 CPU and RCP prototype. As such, perhaps someone could implement these instructions as a custom extension, although they do not exist on the true Nintendo 64's RSP, since discrete cosine transformation (MPEG DCT) did not ultimately survive for that project.
MTC0\nSP_STATUS
—A reserved and unused bit (or, the unimplemented `SP_STATUS_SSTEP` bit) was encountered in the source scalar while accessing `SP_STATUS_REG` in write mode.
MTC0\nCMD_STATUS
—A reserved and unused bit was encountered while accessing `DPC_STATUS_REG` in write mode, or the RSP is trying to clear the RDP command buffer/pipe busy registers.
MTC0\nCMD_CLOCK
—According to some references, this could just be another one of the read-only RSP system control registers, but, according to others, there are implications of trying to write to it. We are not sure; we need to see a ROM try it.
MTC0\nCMD_START or MTC0\nCMD_END
—The RDP command buffer had BUSY status at the time that a scalar requested the RSP to overwrite either of these registers. However, it is probably impossible that this should occur, because it is pertinent to a cycle-accurate CPU host for running on top of the RSP, which we currently lack.
MTC0\nInvalid write attempt.\nSR[%i] = 0x%08X
—Somehow, you have encountered a point where the RSP read in a scalar that requests writing over the contents of one of the read-only system control registers.
LFV $v%i[%01X], 0x%03X($%i) or SWV $v%i[%01X], 0x%03X($%i)
—Very rare ?WC2 operations which have never once been encountered while testing a ROM image.
L/S?V\nIllegal element.
—This is an illegal RSP instruction, although there are no exceptions. We have yet to encounter an iteration of this instruction using an illegal element. Sometimes, it means the element is nonzero when it should be zero. For other op-codes, it means the element was not aligned on the same boundary as the offset coefficient for the load or store.
L/S?V\nIllegal addr.
—This is an illegal RSP instruction, although there are no exceptions. We have yet to encounter an iteration of this instruction using an illegal effective address, of which any of bits one to three are set (`addr & 0xE`).
L/S?V\nWeird addr. or L/S?V\nOdd addr.
—This is a legal RSP instruction in which the effective address has a legal but perverse alignment that destroys most of the parallel data writing possibilities attached to the pseudo-re-compiler speed-ups intended for the current `?WC2` operation.
L/S?V\nOdd element.
—This is either a legal or an illegal RSP instruction in which the target vector element is not divisible by two. What's more? Whether it is legal or not, the pointer magic of C is insufficient for a direct approach at simultaneously fetching more data in parallel.
VMACQ\nUnimplemented.
—It is the surviving op-code of the MPEG-DCT- or inverse-quantization-designated RSP circuitries, and we may have it correctly implemented with the more widely agreed-upon algorithm rather than the one in zilmar's RSP interpreter, but the opportunity to test it has never once presented itself.
VSAW\nInvalid mask.
—The four-bit vector element was set to something other than either 0x8 (high), 0x9 (middle), or 0xA (low). How to handle this is a fairly certain matter, but again, untested.
VRSQ\nUntested.
—Here, the implementation is almost certainly correct, as it is directly proportional to the `VRCP` operation in many references. However, it is a very curious thing that, after all of this time, no game has ever been found to execute it. As such, it is still untested.
Cannot run RSP tests while playing!
—You hit the "Test" button in Mupen64, or otherwise invoked the DllTest procedure, but the RSP emulator detected traces of open RSP state information suggesting that an emulation thread might still be active. The simplicity of the RSP vector op-codes benchmark design feeds on the assumption that no emulation thread may currently be running.
Virtual host map noncontiguity.
—Unbelievable. You are using a CPU host emulator, probably besides Project64, 1964 and Mupen64, which does not correctly map the RCP's DMEM and IMEM to the simple 4,096-byte distance proposed in the standard RCP memory map. Why is this an error? Because it means I have to make the SP DMA algorithm less accurate and, in addition, even slower than it naturally is, and its optimizations assuming proper allocation on the CPU host's part contributed a significant edge in the interpreter's speed.
When reporting error messages to any vendors of this plugin, first make sure that they are consistently reproducible. If they happen inconsistently, then that makes it a lot harder to test and isolate the bug. In fact, it may not even be an RSP bug. If you are playing a game on a VR4300 re-compiler core (the default of all the major emulators), the host CPU might have mistranslated something at a most inopportune moment, causing incorrect data to be sent to the RSP. While such RSP instructions and cases could still be implemented, chances are that you would not want execution to continue from that point, especially if the RSP task receiving the corrupted CPU data was an audio task, in which case, I hope for your sake that you are not wearing headphones turned up real loud. Besides, there is no real way to know whether any implementation of that RSP edge case would be correct, solely by testing it with a ROM that only exploited it because of some re-compiler bug in the host CPU. Make sure that any active RAM or ROM hacks, such as GameShark cheat codes, are also disabled while you are trying to verify the consistency of this bug's occurrence. Once you feel sure that it must be a bug in this RSP interpreter, then please report it.
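The "Virtual host map noncontiguity" condition in the list above comes down to one pointer comparison. Here is a minimal sketch of such a check, assuming the host hands the plugin one pointer for DMEM and one for IMEM. The function name is hypothetical and the plugin's real test may differ.

```c
#include <stddef.h>
#include <stdint.h>

#define SP_MEM_BANK 0x1000 /* 4,096 bytes each for DMEM and IMEM */

/* Returns 1 when IMEM starts exactly 4,096 bytes after DMEM in host memory,
 * so that both banks can be addressed as one 8,192-byte block; 0 otherwise. */
int sp_mem_is_contiguous(const uint8_t *dmem, const uint8_t *imem)
{
    return (uintptr_t)imem - (uintptr_t)dmem == SP_MEM_BANK;
}
```

When the check holds, a DMA transfer can treat an address with bit 12 set as IMEM simply by indexing past the end of DMEM, which is the kind of shortcut the error message says is lost otherwise.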
Since the only HLE implementation in this RSP plugin is to request a plugin of a different type (graphics or audio only) to conduct the pre-translated RSP microcode, it would most of the time make this plugin an empty shell for nothing more than forwarding most work to other plugins. However, while SGI intended the RCP as a "graphics and audio coprocessor", it was forward-extensible enough that they would eventually allocate for the possibility of other task types, such as MPEG video, JPEG decompression, HVQM video decompression as used in Pokémon Puzzle League, and even other unique possibilities. Since zilmar's plugin system does not directly handle any of these miscellaneous possibilities, they must either be executed in normal LLE or be simulated with HLE code from inside the RSP plugin itself. (A good example of this is Hacktarux's HLE code for Mupen64.) Since these task types are almost never encountered, there really is nothing to lose with executing them in LLE and sacrificing speed for accuracy for something applied so seldomly. But what happens if you play only the common games for this system, which do not need any of these other tasks? Then there would be no unique need for this RSP plugin over any of the others supporting requests in HLE. The next chapter will discuss goals, reasons, and uses behind this plugin.
It is an interpreter for the Nintendo 64 (codename: Project Reality) media signal processor, the barrier of all communications between the console's VR4300 CPU core (based on revision 4300i of the MIPS architecture) and SGI's drawings display processor, the RDP.
It is an interpreter, because that is as accurate as it can get. (Of course, a re-compiler can be just as accurate as an interpreter in theory, but it is nowhere near as pure, small, simple, portable, and forward-extensible to the possibility of adding in cycle-accuracy.) It was created for the primary goal of surpassing Project64's RSP emulation in both speed and accuracy. The goal was successfully met for zilmar's already accurate RSP interpreter, which applied a few fixes discovered through the development of this plugin. The "speed" half of the goal, however, was not uncontestedly met for Jabo's dynamic re-compiler extension to zilmar's RSP interpreter, within the same RSP plugin for Project64. To this day, the Project64 RSP re-compiler remains between 0 and 10 percent faster than this RSP interpreter, depending on the level of RCP complexity for the current part of the game. Even so, many games infamous in the history of Nintendo 64 emulation do not work with the current RSP re-compiler, due to surviving bugs, and its speed is easily outclassed by doing the RSP in HLE instead of executing it at all, via dynamic re-compilation or otherwise.
To this day, the parts of Project64's RSP interpreter which may be exploited as bugs, incorrect implementation, or faulty programming, survive but stand few and far apart—far apart enough, that, it is unlikely that it will ever pose any end users a bug while actually playing a game in an emulator. This software has, however, been found to survive in games with faulty SP DMA requests sent by the host CPU's re-compiler, whereas zilmar's RSP interpreter is tied to a halting complaint. It emulates all of the system control registers and vector control registers (the RSP flags) in ways that correct the exploits of zilmar's RSP interpreter. It also, of course, accurately fights the RSP vector operations, with Intel SIMD vector operations in its own right, while Jabo's RSP re-compiler code is limited to MMX and lacks many of the special intrinsics needed for core vector management such as vector shuffling and staticizations of vector compare conditions. Hence the name of this plugin: "Static Interpreter". Even the need for branch weighing and jumping has been minimized where beneficial in the primary CPU loop control.
It was always developed in the C language, as there never were any opposing alternatives. Automatic intrinsics by compiling the source as C++ proved only to be detrimental, but the possibility is open to experiment with object-oriented templates and other intrinsic ideas of C++ in the future interest of this plugin by any other vendors. For the time frame in which it was appropriate, it was compiled and built using Microsoft Visual Studio 2010, but that is no longer a wise option. The C code is now so strongly styled in a way that adjusts to auto-vectorizer intelligence in the compiler, that the use of Microsoft Visual Studio and probably any compilers besides GCC is simply masochistic. The next chapter will discuss vendor concerns for building the software and maintaining it.
The GNU compiler collection is needed to properly build this plugin. On Microsoft Windows, any attempts to build the solution via Microsoft Visual Studio are not supported and, if successful, will be detrimental to the performance of the resulting software, owing to several compile-time optimization problems that still apply to versions of Visual Studio today. Instead, when compiling on Windows, use the MinGW suite. The easiest way to get it is to follow their instructions for doing an automatic installation, although the original, masochistic author of this plugin usually chooses the long-winded manual installation. As of the latest release of this software, it was compiled by GCC 4.8.1-4.
When compiling a DLL on Windows, the command line invocation should look something like this for a machine which supports Intel's Supplemental Streaming SIMD Extensions 3 (SSSE3):
    GCC.EXE -S -O3 -DARCH_MIN_SSSE3 -mssse3 -mstackrealign -Wall -pedantic -o ../rsp/rsp.s ../../rsp/rsp.c
    AS.EXE --statistics -o ../rsp/rsp.o ../rsp/rsp.s
    GCC.EXE --shared -s -o ../rsp/rsp.dll ../rsp/rsp.o
If, of course, your system supports beyond SSSE3, you have the benefit of compiling the plugin to factor in extra support with `-msse4`, `-mavx`, `-mfma`, or even higher. The option that should always succeed in at least building, on the other hand, is `-march=native`.
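If you are unsure which of these flags your own machine can run, GCC can report CPU features at run time. The sketch below is not part of this distribution. It relies on GCC's x86 built-ins `__builtin_cpu_init` and `__builtin_cpu_supports`, and the function name `suggest_flag` is hypothetical. On other compilers or targets it falls back to `-march=native`.

```c
/* Suggest the highest of the flags discussed above that the host CPU
 * reports. Only a subset of the flags is probed in this sketch. */
const char *suggest_flag(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init(); /* harmless if already initialized */

    if (__builtin_cpu_supports("avx"))
        return "-mavx";
    if (__builtin_cpu_supports("sse4.2"))
        return "-msse4";
    if (__builtin_cpu_supports("ssse3"))
        return "-mssse3";
    if (__builtin_cpu_supports("sse2"))
        return "-msse2";
#endif
    /* Unknown compiler or target: let the compiler decide. */
    return "-march=native";
}
```

Printing the result with `puts(suggest_flag())` on the target machine gives a starting point. A binary built with a flag the CPU lacks can crash with a reserved instruction exception, as noted for `rsp.dll`.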
When compiling a DLL without SSSE3 support, please attempt at least a build with the much more widely available SSE2, using a configuration like this:
    GCC.EXE -S -O3 -DARCH_MIN_SSE2 -msse2 -mstackrealign -Wall -pedantic -o ../rsp/rsp.s ../../rsp/rsp.c
    AS.EXE --statistics -o ../rsp/rsp.o ../rsp/rsp.s
    GCC.EXE --shared -s -o ../rsp/rsp.dll ../rsp/rsp.o
On occasion, new releases of GCC will be made and eventually ported to MinGW on Windows. Sometimes, the new release will be beneficial to the optimized vectorization and other code generation for this program, but other times, it will be in a slight bit of a beta state. Even though GCC is the only real option for properly building the code here, it is not nearly as orthodox in its behavior as other compilers. At times, the performance may be even worse than with the previous build. In fact, before the latest rewrite of the entire software source tree, it was much better off compiled using the stable 4.7.2 release of GCC than any of the 4.8.1 experiments. Incidentally, the converse came true after restyling the CPU loop's natural flow to be more direct and resorting to implicit optimizations on using function pointer tables with smallish-bulky argument call stacks, resulting in a surprising overall boost in performance. You should check the `rsp.s` assembly output file generated by the compiler for precise details when comparing the code generation between compiler upgrades. You can also keep track of the manual downloads and updates of the MinGW port of GCC.
The good news is that this software has its own built-in benchmark application for timing the latency of each RSP vector operation, for which SSE speed-ups are the most definitely applicable. The benchmark is initiated in the form of a call to the zilmar plugin system's `DllTest` procedure, which goes largely unused but was inherent since the Nemu64 generation of plugins off which the newer more common specifications had been based. Currently, the only main CPU emulator which enables you to call the `DllTest` procedure today is Mupen64 as of version 0.5, in the form of clicking the "Test" button while changing global plugin settings. This will log a text file of how many seconds it takes to execute most RSP vector operations 16,777,216 times. You can use this feature to help test changes to the program and verify either your updates to the source code or your use of updated versions of the GNU compiler collection.
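The timing idea behind such a benchmark can be sketched in a few lines. This is an illustration only, not the plugin's `DllTest` code. The vector operation here is a hypothetical stand-in (an element-wise OR), and the names `op_vor` and `time_op` are assumptions.

```c
#include <stdint.h>
#include <time.h>

#define ITERATIONS 16777216L /* 2^24 runs, the count quoted above */

/* volatile keeps the compiler from discarding the stores as dead code */
static volatile int16_t VD[8];
static int16_t VS[8] = {1, 2, 3, 4, 5, 6, 7, 8};
static int16_t VT[8] = {8, 7, 6, 5, 4, 3, 2, 1};

/* Hypothetical stand-in for one RSP vector operation (element-wise OR). */
static void op_vor(void)
{
    int i;

    for (i = 0; i < 8; i++)
        VD[i] = (int16_t)(VS[i] | VT[i]);
}

/* Time `count` back-to-back calls of `op`; returns elapsed CPU seconds. */
double time_op(void (*op)(void), long count)
{
    clock_t start = clock();
    long n;

    for (n = 0; n < count; n++)
        op();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `time_op(op_vor, ITERATIONS)` before and after a source or compiler change gives two numbers to compare. Because `clock()` has coarse resolution on some systems, small differences between runs should not be over-interpreted.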
It is not easy to maintain an emulator for a system about which you know little. Although a full written account of the internals is beyond the scope of this document, as well as of anything openly available on the Internet, what little can legally be said is this. The RSP is a slave processor attached to its master, the Nintendo 64 VR4300 CPU core. It was based entirely on the MIPS R4000 model of the MIPS architecture, but remember that it is a slave processor and must execute any instruction, even illegal or reserved ones, to maintain its fast, exception-free processing speed while conducting thousands to millions of 3-D graphics and audio computations in less than one second. For varying reasons, different instructions originally available on the actual MIPS R4000 processor model were removed from the op-code matrix of the RSP and sit there as reserved instructions. (The standard floating-point coprocessor, CP1, along with the entire `COP1` primary operation code, was also left out of the RSP's design.) For example, it is impossible to do any multiplies or divides on the RSP without using the Vector Unit (VU) operation codes under `COP2`. The basic template of the RSP VU's operation codes matrix boils down to this:
       000      001      010      011      100      101      110      111
    +--------+--------+--------+--------+--------+--------+--------+--------+
000 |MULT SP |MULT SP |DCT     |IQ      |MULT DP |MULT DP |MULT DP |MULT DP |
    +--------+--------+--------+--------+--------+--------+--------+--------+
001 |MULT SP |MULT SP |DCT     |IQ      |MULT DP |MULT DP |MULT DP |MULT DP | # acc
    +--------+--------+--------+--------+--------+--------+--------+--------+
010 |ADD     |ADD     |ADD     |ADD     |ADD     |ADD     |ADD     |ADD     |
    +--------+--------+--------+--------+--------+--------+--------+--------+
011 |ADD     |ADD     |ADD     |ADD     |ADD     |ADD     |ADD     |ADD     |
    +--------+--------+--------+--------+--------+--------+--------+--------+
100 |SELECT  |SELECT  |SELECT  |SELECT  |SELECT  |SELECT  |SELECT  |SELECT  |
    +--------+--------+--------+--------+--------+--------+--------+--------+
101 |LOGICAL |LOGICAL |LOGICAL |LOGICAL |LOGICAL |LOGICAL |LOGICAL |LOGICAL |
    +--------+--------+--------+--------+--------+--------+--------+--------+
110 |DIVIDE  |DIVIDE  |DIVIDE  |DIVIDE  |DIVIDE  |DIVIDE  |DIVIDE  |DIVIDE  |
    +--------+--------+--------+--------+--------+--------+--------+--------+
111 |PACK    |PACK    |PACK    |PACK    |PACK    |PACK    |PACK    |PACK    |
    +--------+--------+--------+--------+--------+--------+--------+--------+
Generally speaking, these operations take three operands (destination, source, source) of 128-bit vector register specifiers, each from 0 to 31, and a fourth operand from 0 to 15 which dictates the decoding of the sixteen-bit-precise element shuffling applied to the second vector source operand. The exact operations and behavior are left as an exercise to the adventurer who will read through Michael Tedder's RSP notes in the very first known Nintendo 64 emulator, Project Unreality, and zilmar's compilation of the RCP details in anarko's "n64ops" archive. Even more information, of course, is in the source code to this plugin and the MAME team's RSP emulator.
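As a rough illustration of the operand layout and the matrix above, the following sketch splits a vector instruction word into its fields and looks up the operation class. It assumes the field positions used by common open-source RSP emulators (element in bits 24..21, second source in 20..16, first source in 15..11, destination in 10..6, function in 5..0), and it assumes that the matrix rows correspond to the upper three bits of the function field. All names are hypothetical.

```c
#include <stdint.h>

/* Classes for rows 010..111 of the matrix; rows 000 and 001 vary by column. */
static const char *const row_class[8] = {
    "", "", "ADD", "ADD", "SELECT", "LOGICAL", "DIVIDE", "PACK"
};

/* Classes for the columns of the two multiply rows (000 and 001). */
static const char *const mult_class[8] = {
    "MULT SP", "MULT SP", "DCT", "IQ",
    "MULT DP", "MULT DP", "MULT DP", "MULT DP"
};

struct vu_op {
    unsigned element; /* 0..15, selects the element shuffle of vt */
    unsigned vt;      /* second source vector register, 0..31 */
    unsigned vs;      /* first source vector register, 0..31 */
    unsigned vd;      /* destination vector register, 0..31 */
    unsigned row;     /* upper 3 bits of the function field (assumed) */
    unsigned column;  /* lower 3 bits of the function field (assumed) */
};

/* Split a 32-bit COP2 vector instruction word into its fields. */
struct vu_op decode_vu(uint32_t inst)
{
    struct vu_op op;

    op.element = (inst >> 21) & 0xF;
    op.vt      = (inst >> 16) & 0x1F;
    op.vs      = (inst >> 11) & 0x1F;
    op.vd      = (inst >>  6) & 0x1F;
    op.row     = (inst >>  3) & 0x7;
    op.column  = inst & 0x7;
    return op;
}

/* Name the operation class according to the matrix above. */
const char *vu_class(const struct vu_op *op)
{
    return (op->row < 2) ? mult_class[op->column] : row_class[op->row];
}
```

For example, a function field of 0x0B lands in row 001, column 011, which the matrix labels IQ with accumulation. The sketch only classifies an instruction and does not say what any individual operation computes.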
Lastly, this should be self-explanatory to any academically interested developer, but information on the RSP's standard scalar operations, inherited from the original MIPS R4000 CPU, can be found in great, open detail in the official manual for the MIPS R4000 CPU published by MIPS Technologies themselves.
The following people have had an impact on the direction of this software.