Project Overview

This is a project to describe the file formats used in certain games developed by Zipper Interactive™:

  • the Recoil™ game (1999)
  • the MechWarrior 3™ base game (1999)
  • the MechWarrior 3 Pirate's Moon™ expansion (1999)
  • the Crimson Skies™ game (2000)

Zipper Interactive™ was a trademark or registered trademark of Sony Computer Entertainment America LLC. Other trademarks belong to the respective rightsholders.

The main focus is MechWarrior 3.

This documentation can be used as a whitepaper for a clean room implementation to extract most MechWarrior 3 assets, or for reference for existing projects. Note that this project discusses the file structures, and not necessarily the contents of the files.

Terms and abbreviations

  • MW or MW3: MechWarrior 3; this usually means the base game and not the expansion.
  • PM: Pirate's Moon, the expansion.
  • RC: Recoil.
  • CS: Crimson Skies.

License

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Pseudo-code conventions

All data types are little endian, unless noted otherwise. All strings are 8-bit US-ASCII, unless noted otherwise (i.e. a character occupies a byte, but only the lower 7 bits are used, the most significant bit is always zero).

All data types and structures are specified in pseudo-Rust code. Even if you do not know Rust, the notation should still be readable.

Unsigned types are designated with u<bits>:

  • u8 is uint8_t or unsigned char/byte in C
  • u16 is uint16_t
  • u32 is uint32_t
  • u64 is uint64_t

Signed types are designated with i<bits>:

  • i8 is int8_t or signed char/char in C
  • i16 is int16_t
  • i32 is int32_t
  • i64 is int64_t

Floating point types are designated with f<bits>:

  • f32 is a single-precision IEEE 754 floating point number, float in C
  • f64 is a double-precision IEEE 754 floating point number, double in C

Note that for many integer data types, we don't know the exact bit size, or even if they are signed or unsigned, unless e.g. an obviously signed value was observed.
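As a rough illustration of how these conventions translate into a reader, here is a minimal sketch using Python's standard struct module. The helper names (FORMATS, read_value, read_ascii) and the sample bytes are made up for this example and are not part of any game file.

```python
import struct

# Mapping from the pseudo-Rust type names to `struct` format strings.
# The "<" prefix selects little-endian byte order without implicit padding.
FORMATS = {
    "u8": "<B", "u16": "<H", "u32": "<I", "u64": "<Q",
    "i8": "<b", "i16": "<h", "i32": "<i", "i64": "<q",
    "f32": "<f", "f64": "<d",
}


def read_value(data: bytes, offset: int, type_name: str):
    """Read one value of the given type; return (value, next_offset)."""
    fmt = FORMATS[type_name]
    (value,) = struct.unpack_from(fmt, data, offset)
    return value, offset + struct.calcsize(fmt)


def read_ascii(data: bytes, offset: int, length: int):
    """Read a fixed-length string; return (text, next_offset).

    Decoding as ASCII raises an error if the most significant bit of any
    byte is set, which matches the 7-bit assumption stated above.
    """
    raw = data[offset : offset + length]
    return raw.decode("ascii"), offset + length


# Hypothetical sample bytes: a u32 of 1, an i16 of -2, then the text "MW3".
sample = b"\x01\x00\x00\x00\xfe\xffMW3"
value, pos = read_value(sample, 0, "u32")
signed, pos = read_value(sample, pos, "i16")
text, pos = read_ascii(sample, pos, 3)
```

When the signedness of a field is unknown, reading it as unsigned first and checking for suspiciously large values is one way to spot fields that are probably signed.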

Fixed-length and variable length arrays are designated with [<type>; <length>], where the length may not be a valid Rust definition (for example, if it depends on another field).

Constants are aliases for a certain value that makes it convenient to reference by name. Constants will always have a data type specified.

const EXAMPLE: u32 = 1;

Structures are basically memory views/instructions on how to interpret a block of memory. Assume a C-compatible layout and 32-bit alignment (discussed more shortly). They have a name, and then list fields by name followed by a type. An example:

struct Example {
    foo: u32,
    bar: [f32; foo - 4],
}

This means: read a 32-bit (4-byte) unsigned integer foo, and then read foo - 4 floating point numbers of 32 bits (4 bytes) each.
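To make the reading order concrete, the following sketch parses the Example structure with Python's struct module. The function name read_example and the sample bytes are assumptions for illustration; the length rule (foo - 4) is taken from the pseudo-code above.

```python
import struct


def read_example(data: bytes, offset: int = 0):
    """Parse `Example`: a u32 `foo`, followed by `foo - 4` f32 values.

    Returns the parsed fields and the offset just past the structure.
    """
    (foo,) = struct.unpack_from("<I", data, offset)
    offset += 4
    count = foo - 4
    if count < 0:
        raise ValueError(f"foo must be at least 4, got {foo}")
    # A repeat count in the format string reads `count` consecutive floats.
    bar = struct.unpack_from(f"<{count}f", data, offset)
    offset += 4 * count
    return {"foo": foo, "bar": list(bar)}, offset


# Hypothetical input: foo = 6, so two floats follow.
sample = struct.pack("<I2f", 6, 1.5, -2.0)
example, end = read_example(sample)
```

Validating the derived length before reading is worthwhile, since a wrong guess about a field's meaning usually shows up as an implausible count.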

For structures where the use of a field isn't known, the field is designated with "unk" and its offset in the structure, e.g. unk08. Because MechWarrior 3 is a 32-bit executable and most likely written in C++ (based on the dependencies), the structures the game actually uses will follow those padding rules. The structures provided will either be already 32-bit aligned, or will have explicit padding fields, designated with "pad" and the offset.

Tuples are sequences of types/elements. This is similar to structures, except that the fields aren't named:

tuple Example(f32, f32, f32);

This means a structure of 3 floating point values where the field names/usages aren't considered important. I will try to avoid tuples, but they are occasionally useful. You may always translate tuples into structures by naming the fields.

Enumerations are exclusive values, so only a single value is valid. The enumeration will have an integer type that indicates its size when read. Zero (0) is generally not a valid value unless explicitly named:

enum Example: u16 {
    A = 1,
    B = 2,
}
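One possible way to model such an enumeration when reading is Python's enum.IntEnum, which rejects values that have no named variant (including zero here). This is only a sketch; the function name read_example_enum is made up for illustration.

```python
import enum
import struct


class Example(enum.IntEnum):
    A = 1
    B = 2


def read_example_enum(data: bytes, offset: int = 0) -> Example:
    """Read a little-endian u16 and convert it to an `Example` variant.

    `Example(raw)` raises ValueError for any value without a named variant.
    """
    (raw,) = struct.unpack_from("<H", data, offset)
    return Example(raw)


a = read_example_enum(b"\x01\x00")

# Zero is not named in this enumeration, so it is rejected.
try:
    read_example_enum(b"\x00\x00")
    zero_accepted = True
except ValueError:
    zero_accepted = False
```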

Bitflags are similar to enumerations, but can have multiple values set or unset:

bitflags Example: u16 {
    A = 1 << 0, // 0x1
    B = 1 << 1, // 0x2
}

This means that zero (0) is generally valid (all flags unset). For the example, valid values are:

  • 0
  • 1 = A
  • 2 = B
  • 3 = A | B

Bitflags may also contain aliases of common flag combinations.
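Bitflags can be sketched in a similar way with Python's enum.IntFlag. Because IntFlag may accept undefined bits depending on the Python version, this sketch checks them explicitly against a mask; the names ExampleFlags, VALID_MASK, and read_example_flags are assumptions for illustration.

```python
import enum
import struct


class ExampleFlags(enum.IntFlag):
    A = 1 << 0  # 0x1
    B = 1 << 1  # 0x2


# All bits that have a defined meaning in this example.
VALID_MASK = int(ExampleFlags.A | ExampleFlags.B)


def read_example_flags(data: bytes, offset: int = 0) -> ExampleFlags:
    """Read a little-endian u16 and reject any bits outside the known flags."""
    (raw,) = struct.unpack_from("<H", data, offset)
    if raw & ~VALID_MASK:
        raise ValueError(f"unknown flag bits set: {raw:#06x}")
    return ExampleFlags(raw)


both = read_example_flags(b"\x03\x00")  # A | B
none = read_example_flags(b"\x00\x00")  # zero is valid: nothing set
```

Rejecting unknown bits is a deliberate choice for investigation work: a failure points at a flag that has not been documented yet.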

Introduction

To skip the rambling, go straight to the overview.

MechWarrior 3 history

If you want a more entertaining and complete history, Chase "Scharmers" Dahl has an awesome review called Fifteen Years of Giant Robots (specifically MechWarrior 3). Or if video is your thing, The Examined Life (of Gaming)'s MechWarrior Retrospective series (specifically MechWarrior 3) is slightly crude, but otherwise well researched.

I recommend both, and with reason. It's helpful to understand the development history around MechWarrior 3, which is complicated. And the time-frame allows us to put an upper bound on the hardware, software, and techniques available at the time.

The short version: In what is now typical fashion, MechWarrior 3 isn't the third instalment of the MechWarrior series, but the third generation. It was published in May 1999, with a new engine. It received an expansion pack, Pirate's Moon, and a Gold Edition release in September 1999. Due to the troubled development, the fourth generation released quite quickly afterwards, with MechWarrior 4: Vengeance in late 2000 in North America.

The engine seems to be largely developed by Zipper Interactive. Some people have had success using information in this project for other Zipper games, notably Recoil and Crimson Skies. The reverse was sadly not possible, since to my knowledge, no investigation of those games was published.

Why bother?

MechWarrior 4 certainly offers a more balanced, tactical approach with e.g. weapon hard-points to differentiate chassis. So why this game? In my mind, none of these games came close to the campaign of MechWarrior 3. Future campaigns have you starting off as a scrappy lance, but quickly growing and often being able to pick missions for different factions - which I never ended up caring about. MechWarrior 3 is different. Nothing comes close to having to complete an entire operation that goes wrong from the start, with limited supplies and out-of-date tactical information. Despite the troubled development which can be felt in lacking graphics for the time, barren landscapes, and lance mates you hear over the radio more often than you see them, the story shines. This is why it sticks in my head.

Seems I'm not the only one, as there are hundreds of posts trying to get it to work on modern Windows. The most promising approach is dgVoodoo 2, "a wrapper for old graphics API's for Windows Vista/7/8/10". There are still issues with the physics on today's fast processors though.

There is also a preservation aspect. Video game preservation should be important. After all, video games are the medium that has influenced me and many others the most. Preserving music, film, and television is comparatively simple. The day may come when we can emulate a Windows XP PC well, but currently, it's hard to experience MechWarrior 3 at all. Being able to understand the assets is the first step.

As an aside, the German localisation is outstanding. Everything was localised, including the intro cinematic, the mission briefing, and in-game dialog. This was a huge selling point for me at the time (my English wasn't quite as polished yet), along with the kick-ass box art (seriously, that Mad Cat). The German CD cover is also gorgeous. Apparently though, the German version was censored. This rings true, as Germany has always had strict rules for video games certification via the USK and JuSchG. For example, the terrorists in "Command & Conquer: Generals" were replaced by robots. The gibs are quite gruesome when stomping on infantry, and seem largely unnecessary in a 'mech focused game. I will discuss the different versions shortly.

The MechWarrior 3 community

There still exists a modding community, and people still play MW3 online. This sounds ideal. When I reached out a few years ago, there was significant trepidation, since understanding game files could make cheating easier. Initially, I would have loved to build on the work of MW3 legends like Finnegan McCool (whom I didn't know at the time, and who may have given me a warm welcome). But this is how it goes. And in retrospect, I think this was a blessing in disguise - I would've never started my project!

In the long run, not putting the information out there only hurts the community. People have to rely on out-of-date tools, into which they have no insight. No new tools can be written, and no progress can be made if the original authors leave. I hope my open approach changes this, and there are still enough people who care. There's a hard-core group out there thanks mainly to AncientxFreako, and it's just so great to be able to revitalise interest for a game I treasure.

Also, thanks to sarna.net for keeping all things BattleTech around in such a wonderfully accessible way (including patches).

MechWarrior 3 versions

Base game

In the US, there seem to have been a few releases: version 1.0, 1.1, 1.2, and Gold Edition. They can all be patched to 1.2. Presumably there was also a 1.1 patch (which I have not been able to find). In a weird quirk, the Gold Edition Readme says it is version 1.2, but it is still missing two multiplayer maps, zbd/c3/readermp3.zbd and zbd/c3/readermp4.zbd. Applying the 1.2 patch will install these.

Localisations and versions:

  • English (US): 1.0, 1.1, 1.2, Gold Edition
  • German (DE): 1.0, 1.2 patch exists
  • French (FR): 1.0, 1.2 patch exists
  • Italian (IT): Unconfirmed
  • Japanese (JA): 1.2 (メックウォリア3)
  • Taiwanese (TW): An extremely believable big box edition exists on eBay, but is horrendously expensive (機甲爭霸戰3, see BattleTech on zh.wikipedia.org or chiuinan.github.io)
  • Chinese/Hong Kong: Unconfirmed (Simplified: 机甲战士3, Traditional: 機甲戰士3, see BattleTech on zh.wikipedia.org or chiuinan.github.io)
  • English (GB): Unconfirmed if this is different than US, although redumps exist
  • Russian (RU): Unconfirmed, possibly a bootleg/fan translation only

Please do reach out if you have a version I'm missing. I would love to confirm the information holds for all versions.

I have installed all versions in a virtual machine, gathered the files, patched the versions to 1.2, and gathered the files again. This has allowed me to find differences, but also check that the structures, value-ranges, and methods should hold.

Expansion

I know a lot less about the Pirate's Moon expansion. For one, I never played it, as it was never released in German.

My focus has also been mainly on the base game, and there's still enough unknown information in that. I also only own a single US version of PM. Still, the code from the base game was easy enough to apply to Pirate's Moon, so some things could be discovered. When Pirate's Moon-specific information is known, it is noted in this project.

System requirements

MechWarrior 3 only runs on Windows, and requires DirectX 6.1. It is probably a 32-bit executable, given the time frame. And it was likely programmed in C++, specifically Microsoft Visual C++ based on the dependencies. MechWarrior 3 came on a standard CD-ROM.

| Spec | Minimum | Recommended |
|---|---|---|
| Operating system (OS) | Windows 95 | Windows 98 |
| Processor (CPU) | Intel Pentium 166 MHz | Intel Pentium 200 MHz |
| System memory (RAM) | 32 MB | 64 MB |
| Hard disk drive (HDD) | 240 MB | 390 MB |
| Video card (GPU) | 2 MB of VRAM | 8 MB of VRAM |

DRM

The PC Gaming Wiki claims MechWarrior 3 is protected by Macrovision's SafeDisc DRM. At the time MW3 was released, only SafeDisc version 1 was available. Instructions from CD Media World on how to detect SafeDisc protection:

The following files should exist on every the original CD: 00000001.TMP, CLCD16.DLL, CLCD32.DLL, CLOKSPL.EXE, DPLAYERX.DLL

There is always a GAME.EXE and GAME.ICD file where the .ICD is the original game executable (in encrypted form) and the .EXE is a loader containing a parts of the SafeDisc protection.

(Formatting edited for readability.) The Wine mailing list largely agrees; sometimes SECDRV.SYS and DRVMGT.DLL are also found.
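The file-name check described above can be sketched as a small script that only inspects a directory listing and does not touch any protection. The constant and function names here are made up for illustration, and the listing is a hypothetical example rather than the contents of a real disk.

```python
import os

# File names from the CD Media World description quoted above.
SAFEDISC_FILES = {
    "00000001.TMP", "CLCD16.DLL", "CLCD32.DLL", "CLOKSPL.EXE", "DPLAYERX.DLL",
}
# Additional names that are reportedly found sometimes.
SAFEDISC_EXTRA_FILES = {"SECDRV.SYS", "DRVMGT.DLL"}


def find_safedisc_files(file_names):
    """Return the known SafeDisc file names present in a listing (case-insensitive)."""
    present = {os.path.basename(name).upper() for name in file_names}
    return sorted(present & (SAFEDISC_FILES | SAFEDISC_EXTRA_FILES))


def find_icd_pairs(file_names):
    """Return base names that have both an .EXE and an .ICD file."""
    by_ext = {}
    for name in file_names:
        stem, ext = os.path.splitext(os.path.basename(name).upper())
        by_ext.setdefault(ext, set()).add(stem)
    return sorted(by_ext.get(".EXE", set()) & by_ext.get(".ICD", set()))


# Hypothetical listing, for illustration only.
listing = ["Mech3.exe", "Mech3.icd", "CLCD32.DLL", "readme.txt"]
markers = find_safedisc_files(listing)
pairs = find_icd_pairs(listing)
```

A real listing could be gathered with os.listdir on the CD root and on the install directory; a hit is only an indication, not proof, of the protection being present.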

None of the US versions I own have any of these files, although the German version does. It is possible the US versions have an earlier variant of SafeDisc copy protection, based on the earlier SafeAudio copy protection. It uses weak sectors to detect when a disk has been copied. (For more information, see this CD Freaks/Myce article on SafeDisc 2.)

There are indications something odd is present on US disks. When I list the video directory, the date of the parent directory (..) is always mangled:

Version 1.00 (DE):

04/06/1999  02:25    <DIR>          .
04/06/1999  02:25    <DIR>          ..

Version 1.00 (US):

12/05/1999  02:18    <DIR>          .
The parameter is incorrect.
<0x16>?      <DIR>          ..

Version 1.1 (US):

09/07/1999  12:01    <DIR>          .
The parameter is incorrect.

?      <DIR>                              ..

Version 1.2 (US):

05/10/1999  08:35    <DIR>          .
<0x11>?      <DIR>          ..

SafeDisc itself is a liability, as the driver contains a buffer overflow vulnerability (CVE-2007-5587).

I don't want to comment too much on DRM, although as a customer, it has always been an annoyance and a hindrance for me. It is a concern for any effort legally examining the game. Some countries allow circumventing DRM for abandoned products or legitimate fair use. Some don't. This is why I've approached the project by installing the game, and then working on binary files. No DRM is bypassed.

MechWarrior 3 files overview

Installer

The MW3 installer is quite flexible, allowing selection of only some features to save hard drive space. The components and sub-components listed for a custom installation are:

  • Program files
    • Codec Files
  • AVI files
  • Software Render Files
    • Low Detail
    • Medium Detail
    • Best Detail
  • 3D Accelerator Files
    • 2 MB Card
    • 4 MB Card
    • 8 MB Card+
  • Sound
    • High Fidelity
    • Low Fidelity

Some files not directly installed that are discussed are ambient tracks and save games.

Please note that while many files have the ending .zbd, this does not mean they are in any way similar. Different .zbd files need to be parsed differently (they aren't even all archive files). It's possible .zbd stands for Zipper Binary Data.

Ambient tracks

The ambient tracks are never installed, and always streamed from the CD.

AVI files

If the AVI/video files are not installed, they will be read from the CD. These are the game intro, and cut scenes/mission briefings.

Sound

The high fidelity and low fidelity options install soundsH.zbd and soundsL.zbd to the zbd directory, respectively. These are both sound archives. The demo only ships with medium fidelity sounds (soundsM.zbd). Additionally, the 1.2 patch installs some loose .wav files into the zbd directory.

Software render files

The software render files component installs textures for the software rendering to the zbd directory. They are largely campaign-specific.

For low detail c1\texture1.zbd, c2\texture1.zbd, c3\texture1.zbd, c4\texture1.zbd, c4b\texture1.zbd, and t1\texture1.zbd are installed.

For medium detail c1\texture2.zbd, c2\texture2.zbd, c3\texture2.zbd, c4\texture2.zbd, c4b\texture2.zbd, and t1\texture2.zbd are installed.

For best detail c1\texture.zbd, c2\texture.zbd, c3\texture.zbd, c4\texture.zbd, c4b\texture.zbd, and t1\texture.zbd are installed.

In each case, the 'mech textures rmechtexs.zbd are also installed.

All of these files are texture packages. The textures for software rendering are largely palette-based.

3D accelerator files

The 3D accelerator files component installs textures for the hardware rendering to the zbd directory. They are largely campaign-specific.

For 2 MB cards c1\rtexture2.zbd, c2\rtexture2.zbd, c3\rtexture2.zbd, c4\rtexture2.zbd, c4b\rtexture2.zbd, and t1\rtexture2.zbd are installed.

For 4 MB cards c1\rtexture3.zbd, c2\rtexture3.zbd, c3\rtexture3.zbd, c4\rtexture3.zbd, c4b\rtexture3.zbd, and t1\rtexture3.zbd are installed.

For 8 MB+ cards c1\rtexture.zbd, c2\rtexture.zbd, c3\rtexture.zbd, c4\rtexture.zbd, c4b\rtexture.zbd, and t1\rtexture.zbd are installed.

In the 2 MB case, the 'mech textures rmechtex16.zbd are also installed; otherwise, the 'mech textures rmechtex.zbd are also installed.

All of these files are texture packages. The textures for 3D accelerator rendering are not palette-based, but do have a reduced bit depth.
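The naming pattern of the texture packages for both renderers is regular enough to be generated. The sketch below builds the expected file list for a given installer choice; the function name texture_files and the keys ("software", "hardware", "low", "2mb", and so on) are labels invented for this example, while the file names come from the lists above.

```python
CHAPTERS = ["c1", "c2", "c3", "c4", "c4b", "t1"]

# Per-chapter texture package name for each installer sub-component.
SOFTWARE = {"low": "texture1.zbd", "medium": "texture2.zbd", "best": "texture.zbd"}
HARDWARE = {"2mb": "rtexture2.zbd", "4mb": "rtexture3.zbd", "8mb": "rtexture.zbd"}


def texture_files(renderer: str, level: str):
    """Return paths (relative to the zbd directory) for one installer choice."""
    if renderer == "software":
        name = SOFTWARE[level]
        mech = "rmechtexs.zbd"
    elif renderer == "hardware":
        name = HARDWARE[level]
        # Only the 2 MB option uses the 16-suffixed 'mech textures.
        mech = "rmechtex16.zbd" if level == "2mb" else "rmechtex.zbd"
    else:
        raise ValueError(f"unknown renderer: {renderer!r}")
    return [f"{chapter}\\{name}" for chapter in CHAPTERS] + [mech]


low_detail = texture_files("software", "low")
two_mb = texture_files("hardware", "2mb")
```

A list like this can be compared against an actual install to check which components were selected.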

Program files

The program files component installs the following files to the specified install location:

  • force_eff.ifr: Probably force-feedback effects. I think this was a technology developed by the Immersion Corporation. The file extension .ifr stands for "Immersion Force Resource", which are pre-built effects authored in a tool called Immersion Studio. It's not clear how the game engine used these, and they deserve more investigation.
  • Mech3.exe: The main game engine executable. Not further discussed.
  • Mech3.icd: Only present for the German version, probably related to the SafeDisc DRM. Discussed tangentially in the introduction; otherwise not further discussed.
  • Mech3Msg.dll: A resource dynamic link library (DLL), which contains localised messages. Discussed in message table/translations.
  • MSN Gaming Zone.url: A Windows Internet Shortcut file, presumably to the MSN Gaming Zone, now known as MSN Games. Not further discussed.
  • ReadMe.doc or readme.doc, ReadMe.txt or readme.txt: The READMEs for the game in both Microsoft Word (.doc) and plain text (.txt) format. Not further discussed.
  • Uninstl.ddl and Uninstall.isu: Support files for the InstallShield uninstaller. Not further discussed.

These files are also installed on the system:

  • arial.ttf, impact.ttf, and lucon.ttf: Font files the game engine needs.
  • IFORCE2.dll: Probably force-feedback effects, see force_eff.ifr.
  • MSVCRT.DLL, msvcirt.dll, MSVCRT40.DLL, and MSVCP50.DLL: Support the Microsoft Visual C/C++ Runtime. These could be used to determine which MSVC version was used. Not further discussed.
  • MFC40.DLL and MFC42.DLL: Microsoft Foundation Class Library (MFC) dependencies. Not further discussed.

The codec sub-component also installs Ir50_32.dll. This video codec is relevant for the AVI files.

The program files component also installs all the necessary game files to the zbd directory in the specified install location. These are called database files, and have their own section below.

Database files

Database files are installed by the program files component. There are a lot of data files, and they can be grouped into various categories. In general, database files are either:

  • global, in the root zbd directory
  • operation or chapter specific. The sub-directories c1, c2, c3, c4, c4b, and t1 seem to correspond to the operations of the campaign: t1 for the training operation, and c1 to c4 for the main campaign's operations/chapters. One oddity is c4b, which is possibly split off because the third and fourth operations (c3/c4) had six missions each (instead of four), and there was some kind of game engine limitation.
  • mission specific. Multiplayer or instant action scenarios are also "missions" associated with a specific operation/chapter. These are identified by the file name's suffix, e.g. m1 for mission 1, mp1 for multiplayer map 1, and ia1 for instant action scenario 1.

Texture packages

The rimage.zbd provides globally-used images, such as UI elements, menu backgrounds, and more. This file is a texture package, and can be read in the same way software render files and 3D accelerator files are read.

Reader archives

Reader archives contain game configuration. They can be global (reader.zbd), campaign-specific (c1\reader.zbd, c2\reader.zbd, c3\reader.zbd, c4\reader.zbd, c4b\reader.zbd, t1\reader.zbd), mission-specific (<chapter directory>\readerm*.zbd), multiplayer maps (<chapter directory>\readermp*.zbd), or instant action scenarios (<chapter directory>\readeria*.zbd). This is the full list:

  • reader.zbd
  • c1\reader.zbd
  • c2\reader.zbd
  • c3\reader.zbd
  • c4\reader.zbd
  • c4b\reader.zbd
  • t1\reader.zbd
  • c1\readeria1.zbd
  • c1\readeria2.zbd
  • c1\readeria3.zbd
  • c1\readerm1.zbd
  • c1\readerm2.zbd
  • c1\readerm3.zbd
  • c1\readerm4.zbd
  • c1\readermp1.zbd
  • c1\readermp2.zbd
  • c2\readeria1.zbd
  • c2\readeria2.zbd
  • c2\readeria3.zbd
  • c2\readerm1.zbd
  • c2\readerm2.zbd
  • c2\readerm3.zbd
  • c2\readerm4.zbd
  • c2\readermp1.zbd
  • c2\readermp2.zbd
  • c3\readeria1.zbd
  • c3\readeria2.zbd
  • c3\readeria3.zbd
  • c3\readerm1.zbd
  • c3\readerm2.zbd
  • c3\readerm3.zbd
  • c3\readerm4.zbd
  • c3\readerm5.zbd
  • c3\readerm6.zbd
  • c3\readermp1.zbd
  • c3\readermp2.zbd
  • c4\readeria1.zbd
  • c4\readeria2.zbd
  • c4\readeria3.zbd
  • c4\readerm1.zbd
  • c4\readerm2.zbd
  • c4\readerm3.zbd
  • c4\readermp1.zbd
  • c4\readermp2.zbd
  • c4b\readerm4.zbd
  • c4b\readerm5.zbd
  • c4b\readerm6.zbd
  • t1\readeria1.zbd
  • t1\readerm1.zbd
  • t1\readerm2.zbd
  • t1\readerm3.zbd
  • t1\readerm4.zbd
  • t1\readermp1.zbd

Two more multiplayer maps are provided by the 1.2 patch: c3\readermp3.zbd and c3\readermp4.zbd.
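The naming scheme of the reader archives listed above can be captured in a regular expression. This is a sketch under the assumption that the names follow the pattern seen in the list; the function name classify_reader and the returned dictionary layout are made up for illustration.

```python
import re

# Optional chapter directory, then "reader", then an optional kind and number.
# "mp" must be tried before "m" so that "readermp1" is not read as mission "p1".
READER_RE = re.compile(
    r"^(?:(?P<chapter>c4b|c[1-4]|t1)[\\/])?"
    r"reader(?:(?P<kind>mp|ia|m)(?P<number>\d+))?\.zbd$",
    re.IGNORECASE,
)
KINDS = {"m": "mission", "mp": "multiplayer", "ia": "instant action"}


def classify_reader(path: str):
    """Classify a reader archive path, or return None if it does not match."""
    match = READER_RE.match(path)
    if match is None:
        return None
    chapter = match["chapter"]
    kind = match["kind"]
    if kind is None:
        scope = "global" if chapter is None else "chapter"
        return {"scope": scope, "chapter": chapter, "number": None}
    return {
        "scope": KINDS[kind.lower()],
        "chapter": chapter,
        "number": int(match["number"]),
    }


patched_map = classify_reader("c3\\readermp3.zbd")
```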

Interpreter scripts

The interpreter scripts (interp.zbd) drive how the game engine loads the game data/worlds.

Mechlib archive

A single mechlib archive is installed, mechlib.zbd. This contains 'mech and mechlib model data.

Motion archive

A single motion archive is installed, motion.zbd. This contains the animation data for 'mech motion (e.g. walking).

Game world data

The game world data is called gamez.zbd, and so these are also known as GameZ files. Each operation/chapter has its own game world data in its sub-directory:

  • c1\gamez.zbd
  • c2\gamez.zbd
  • c3\gamez.zbd
  • c4\gamez.zbd
  • c4b\gamez.zbd
  • t1\gamez.zbd

Animation definition archives

While animation definitions are provided in some reader archives, they are also present in a compiled form in animation definition files, called anim.zbd. These correspond to each game world:

  • c1\anim.zbd
  • c2\anim.zbd
  • c3\anim.zbd
  • c4\anim.zbd
  • c4b\anim.zbd
  • t1\anim.zbd

Save games

TODO

Ambient tracks

Ambient tracks are music tracks, longer than sound effects. There are two 3 minute tracks for the base version of MechWarrior 3, and one 9.5 minute track for the Pirate's Moon expansion. They are never installed, and so must be retrieved from the CD. They are used as background music during missions.

Investigation

When I insert a MechWarrior 3 CD into a Mac, iTunes opens. When I insert a MechWarrior 3 CD into a Windows PC, this message shows:

"Select to choose what happens with enhanced audio CDs."

An enhanced audio CD contains data and audio on the same disk. So the ambient tracks are simply CD audio, which are presumably streamed from the CD during gameplay. A re-implementation should also be able to do this.

There are only two ambient/background tracks, roughly three minutes each. Using a tool such as ExactAudioCopy (EAC) [1], it is possible to copy the audio tracks as Waveform Audio files (WAV, *.wav) where it is legal to do so.

For individuals wanting to enjoy these tracks, it's worth noting these WAV files are rather large. For preservation, a lossless compression like FLAC uses about 40% of the storage space. Since the tracks slightly differ between the different versions, for general use a lossy format like AAC with a bitrate of 128 kb/s (kilobits per second) or above should be plenty. This produces file sizes around 10% of the original.
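The size figures can be checked with a little arithmetic, assuming standard CD audio (44100 Hz, stereo, 16-bit) and ignoring container headers. The function names and the 3:06 track length used here are only for illustration.

```python
CD_SAMPLE_RATE = 44_100  # Hz
CD_CHANNELS = 2
CD_BITS_PER_SAMPLE = 16


def cd_audio_kbps() -> float:
    """Bitrate of uncompressed CD audio in kilobits per second."""
    return CD_SAMPLE_RATE * CD_CHANNELS * CD_BITS_PER_SAMPLE / 1000


def stream_size_mib(seconds: float, kbps: float) -> float:
    """Approximate size of a constant-bitrate stream in MiB."""
    return seconds * kbps * 1000 / 8 / 2**20


track_seconds = 3 * 60 + 6  # a track of roughly 3:06
wav_mib = stream_size_mib(track_seconds, cd_audio_kbps())
aac_mib = stream_size_mib(track_seconds, 128)
ratio = aac_mib / wav_mib  # fraction of the original size at 128 kb/s
```

At 128 kb/s the ratio comes out at roughly 9%, which is in line with the "around 10%" estimate.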

[1] EAC is Windows only. Options on macOS are RIP, Max, XLD, or iTunes. There are many options on Linux; I suggest Morituri.

In-game use

To my knowledge, the ambient tracks do not play in the menus, only during gameplay. I don't know how the engine uses these tracks:

  1. Does the engine select a random track, or always start on the first (audio) track?
  2. Does the engine loop the tracks once they finish playing, or is there simply silence after a mission time of over six minutes?

Appendix 1: Detailed version comparison

Between the versions, all the tracks had different CRC codes. Another oddity is that the tracks aren't in the same order on different versions. I'm unsure why this is. The difference in the audio data could be the result of the manufacturing process. For the German version, it could be due to the SafeDisc DRM (the longer track is 2 seconds longer than on the US versions). But they all sound indistinguishable to me, and the waveforms look the same, so it's probably fine.

A waveform plot of the shorter tracks, showing virtually indistinguishable waveforms. The data was resampled to mono, but the amplitude is not re-normalised. It may appear slightly quieter than it would be in stereo.

This is the detailed track information for all MechWarrior 3 versions I own:

v1.0 US

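| Track | Start | Length | Start sector | End sector | Size | CRC |
|---|---|---|---|---|---|---|
| 1 | 0:00.00 | 59:12.45 | 0 | 266444 | 597.64 MiB | |
| 2 | 59:12.45 | 3:11.69 | 266445 | 280838 | 32.28 MiB | 515BECAE |
| 3 | 62:24.39 | 3:06.06 | 280839 | 294794 | 31.30 MiB | 45D64143 |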

CTDB TOCID: hUJiDDh7s2IYPP1GpLfGVYpIWxE-

v1.0 DE

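| Track | Start | Length | Start sector | End sector | Size | CRC |
|---|---|---|---|---|---|---|
| 1 | 0:00.00 | 62:48.56 | 0 | 282655 | 634.00 MiB | |
| 2 | 62:48.56 | 3:06.06 | 282656 | 296611 | 31.30 MiB | EDCC302C |
| 3 | 65:54.62 | 3:13.69 | 296612 | 311155 | 32.62 MiB | A262C28B |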

CTDB TOCID: vPmoaaMWAdaLNkVSMLqK2HZxmaE-

v1.1 US

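| Track | Start | Length | Start sector | End sector | Size | CRC |
|---|---|---|---|---|---|---|
| 1 | 0:00.00 | 59:12.20 | 0 | 266419 | 597.59 MiB | |
| 2 | 59:12.20 | 3:06.06 | 266420 | 280375 | 31.30 MiB | 825686B5 |
| 3 | 62:18.26 | 3:11.69 | 280376 | 294769 | 32.28 MiB | 21627377 |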

CTDB TOCID: WJdZLalC42N4VOtU.QQx5GDfvqI-

v1.2 US

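| Track | Start | Length | Start sector | End sector | Size | CRC |
|---|---|---|---|---|---|---|
| 1 | 0:00.00 | 59:12.34 | 0 | 266433 | 597.62 MiB | |
| 2 | 59:12.34 | 3:06.06 | 266434 | 280389 | 31.30 MiB | DB5F6872 |
| 3 | 62:18.40 | 3:11.69 | 280390 | 294783 | 32.28 MiB | 61502511 |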

CTDB TOCID: WJdZLalC42N4VOtU.QQx5GDfvqI-

v1.2 PM

| Track | Start | Length | Start sector | End sector | Size | CRC |
|---|---|---|---|---|---|---|
| 1 | 0:00.00 | 22:18.11 | 0 | 100360 | | |
| 2 | 22:18.11 | 9:22.61 | 100361 | 142571 | | 535BD032 |

CTDB TOCID: Y1qrr8eDEKTsSDhgyfHah6MGKzA-
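The Start, Length, and sector values in the track listings above appear to be related by the usual CD audio addressing: 75 sectors (frames) per second, with the digits after the dot counting frames rather than hundredths of a second, and 2352 bytes of audio per sector. The following sketch shows the conversion under that assumption; the function names are made up for illustration.

```python
FRAMES_PER_SECOND = 75    # CD audio sectors ("frames") per second
BYTES_PER_SECTOR = 2352   # raw audio bytes per sector


def msf_to_sectors(text: str) -> int:
    """Convert "minutes:seconds.frames" to a sector count."""
    minutes, rest = text.split(":")
    seconds, frames = rest.split(".")
    return (int(minutes) * 60 + int(seconds)) * FRAMES_PER_SECOND + int(frames)


def track_extent(start: str, length: str):
    """Return (start sector, end sector, size in MiB) for a track."""
    first = msf_to_sectors(start)
    count = msf_to_sectors(length)
    size_mib = count * BYTES_PER_SECTOR / 2**20
    return first, first + count - 1, size_mib


# Track 2 of the v1.0 US disk, using the Start and Length values listed above.
first, last, size_mib = track_extent("59:12.45", "3:11.69")
```

For this track the computed sectors match the listing exactly, and the size agrees to within rounding of the second decimal place.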

Appendix 2: How the waveform plots were made

I used SciPy to read WAV file data, and matplotlib to plot them:

import numpy as np
from scipy.io import wavfile
from scipy.signal import resample

import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker


def time_ticker_format(x, pos=None):
    mins, secs = divmod(x, 60)
    return "{:.0f}:{:02.0f}".format(abs(mins), secs)


def plot_waveforms(tracks, save_name=None, resample_factor=40):
    """This function makes assumptions about the input data: stereo 44100 Hz 16-bit signed PCM"""
    data = []
    rates = []
    for track in tracks:
        rate, stereo = wavfile.read(track, mmap=True)
        samples, channels = stereo.shape
        assert channels == 2, "expecting stereo"
        mono = stereo.mean(1)
        # this is to make the data more reasonable to plot
        resampled = resample(mono, int(np.ceil(mono.size / resample_factor)))
        data.append(resampled)
        rates.append(rate)

    rate = rates[0]
    assert all(rate == r for r in rates)

    count = len(tracks)
    # squeeze=False always returns a 2D array of axes, even for a single track
    fig, axes = plt.subplots(count, 1, figsize=(16, 4 * count), squeeze=False)

    for ax, mono, name in zip(axes[:, 0], data, tracks):
        samples = mono.size
        length = samples / rate
        time = np.linspace(0, length, num=samples)

        ax.plot(time, mono)
        ax.set_xlim(0, length)
        ax.xaxis.set_major_formatter(ticker.FuncFormatter(time_ticker_format))
        ax.xaxis.set_major_locator(ticker.MultipleLocator(20))
        ax.set_ylim(-(1 << 15), (1 << 15))  # signed 16-bit
        ax.yaxis.set_major_locator(ticker.NullLocator())
        ax.xaxis.set_label_text(name)

    fig.tight_layout()
    if save_name:
        plt.savefig(save_name)
        plt.close(fig)

AVI files

The MechWarrior 3 intro and campaign videos are found in the video directory on the CD. They can also optionally be installed to the hard drive.

Investigation (MW3)

They are AVI containers (*.avi). The video codec is known from the installation, but we can confirm that and gather more information using ffmpeg, specifically ffprobe. This is for campaign.avi, information on all English video files can be found in the appendix:

Input #0, avi, from 'Campaign.avi':
  Duration: 00:03:24.27, start: 0.000000, bitrate: 3320 kb/s
  Stream #0:0: Video: indeo5 (IV50 / 0x30355649), yuv410p, 640x480, 3020 kb/s, 15 fps, 15 tbr, 15 tbn, 15 tbc
  Stream #0:1: Audio: pcm_u8 ([1][0][0][0] / 0x0001), 22050 Hz, 1 channels, u8, 176 kb/s

The video streams are encoded using Intel's Indeo codec (version 5, FourCC IV50). They are all 640x480 at 15 frames per second, although the bitrates vary from 3020 kb/s to 1260 kb/s. The audio streams are raw pulse-code modulation (PCM) at 22050 Hz, so uncompressed.
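To gather the same information for many files, ffprobe can also emit JSON, which is easier to process than the text output. The sketch below assumes ffprobe is on the PATH; the function names are made up for illustration, and the sample dictionary is a hand-written stand-in modelled on the output above, not captured tool output.

```python
import json
import subprocess


def probe_streams(path: str) -> dict:
    """Run ffprobe on a file and return its stream information as a dict."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def summarize(probe: dict):
    """Return one short description per video or audio stream."""
    lines = []
    for stream in probe.get("streams", []):
        kind = stream.get("codec_type")
        codec = stream.get("codec_name")
        if kind == "video":
            lines.append(f"video {codec} {stream.get('width')}x{stream.get('height')}")
        elif kind == "audio":
            lines.append(f"audio {codec} {stream.get('sample_rate')} Hz")
    return lines


# Hypothetical stand-in for probe_streams("Campaign.avi").
sample = {
    "streams": [
        {"codec_type": "video", "codec_name": "indeo5", "width": 640, "height": 480},
        {"codec_type": "audio", "codec_name": "pcm_u8", "sample_rate": "22050"},
    ]
}
summary = summarize(sample)
```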

For the German version, these have the metadata "Sound Forge 4.0 Audio" attached. Sound Forge is a sound editing program (published by Sonic Foundry at the time), which was probably used by the localisation team.

These codecs were no doubt chosen because they could be decoded with very little CPU, not because of their quality. This is especially true if they had to be streamed from the CD. Codecs have come far since then, with ubiquitous hardware support. Indeo has at least one vulnerability, meaning the codec is unlikely to be installed on modern systems. Realistically, the best option is to re-encode at least the video using existing software (ffmpeg). Installing the old codec is obviously inadvisable, and reverse engineering the codec is complicated and unnecessary.

The file checksums between the US versions 1.0, 1.1, and 1.2 are exactly the same (on the CD - I don't think the patch affects the video files, simply based on the size, but haven't checked).

Re-encoding

TL;DR:

for f in *.avi
do
    ffmpeg \
        -i "$f" \
        -codec:v "libx264" \
        -preset "medium" \
        -crf "30" \
        -codec:a "aac" \
        -b:a "64k" \
        "${f%.*}.mp4"
done

To compress the audio, there are several options. If supported, Advanced Audio Coding (AAC) is excellent at low bitrates, and for mainly speech, using 64 kb/s is fine without any concerns of quality loss. The command line options are -codec:a aac -b:a 64k [1]. AAC is patented and not all game engines support it. This is generally problematic for good audio codecs. A viable alternative is to not alter the audio and just copy it using -codec:a copy, as raw PCM support is ubiquitous.

As mentioned, I definitely wanted to re-encode the video because of known Indeo vulnerabilities. H.264/x264 is widely supported. Quality-wise, it's a bit trickier than the audio, because it's more subjective in comparisons. The original video is highly compressed, with visible compression artefacts - please keep this in mind, the re-encoded file can't be better than the original. So personally, I find the video re-encoded with a low bitrate fine. In fact, choosing a low bitrate smooths some of the original, block-y compression artefacts out (the smoothing could be done via processing at higher bitrates). But you can decide for yourself, in a minute I'll show how to compare the re-encoded to the original. And worst case, files can be re-encoded from the original again.

My recommendation is to use a fairly quick encoding to test things out, and a low quality factor. Something like -codec:v libx264 -preset medium -crf 28. It's worth reading the ffmpeg H.264 encoding guide if you wish to change these parameters. Choosing a slower preset should deliver the same quality at a lower bitrate, at the expense of encoding time. Choosing a lower crf value will increase the bitrate, which in theory increases quality. Given the source material, that probably won't do much though. Once you're happy with the parameters, I'd suggest using a slower preset for the final encoding, like veryslow, since processing power is cheap and these videos are short and have a tiny resolution (generally, the preset doesn't affect quality very much).

For a container format with maximum compatibility, I've chosen MPEG-4 (*.mp4), although if supported by your use-case, the open standard Matroska (*.mkv) is an excellent choice.

[1] libfdk might be slightly higher in quality, and if your build of ffmpeg was compiled with libfdk support you could try using the libfdk_aac codec. That also enables the use of variable bit rate. However, I don't think it's worth the effort. The input isn't exactly high quality in the first place, and the built-in AAC encoder is pretty good.

Comparing results

The MPV media player can play two (or more) videos side-by-side, which is great for comparing the encoded video.

mpv --lavfi-complex="[vid1][vid2]hstack[vo]" intro.avi --external-file=intro.mp4

In-game use

The introduction is played when the game is loading. The campaign videos are played when the campaign is started, and between missions.

Appendix 1: Modern codec performance

It's interesting to see just how far codecs have come. With the settings described above, the average reduction in size is about 86% for the US version and almost 89% for the German version!

video/v1.0-us

Filename       Original     Compressed   Reduction
intro.avi      78.36 MiB     5.47 MiB    93.0%
Campaign.avi   80.85 MiB    12.45 MiB    84.6%
c1.avi         14.50 MiB     1.36 MiB    90.6%
c1m1.avi        8.75 MiB     0.97 MiB    88.9%
c1m2.avi        5.96 MiB     0.77 MiB    87.0%
c1m3.avi        5.21 MiB     0.74 MiB    85.7%
c1m4.avi        9.17 MiB     1.16 MiB    87.4%
c2.avi         10.79 MiB     1.67 MiB    84.6%
c2m1.avi        4.77 MiB     0.65 MiB    86.4%
c2m2.avi       10.41 MiB     1.22 MiB    88.3%
c2m3.avi        6.31 MiB     0.75 MiB    88.2%
c2m4.avi        7.68 MiB     0.79 MiB    89.7%
c3.avi          5.48 MiB     1.62 MiB    70.5%
c3m1.avi        5.93 MiB     1.06 MiB    82.1%
c3m2.avi        6.24 MiB     1.02 MiB    83.6%
c3m4.avi        7.45 MiB     1.12 MiB    84.9%
c3m5.avi        9.49 MiB     1.08 MiB    88.6%
c3m6.avi        5.73 MiB     0.84 MiB    85.3%
c4win.avi      23.98 MiB     1.49 MiB    93.8%

Average reduction: 86.5%

video/v1.0-de

Filename       Original     Compressed   Reduction
intro.avi      76.00 MiB     5.33 MiB    93.0%
Campaign.avi   77.76 MiB    11.35 MiB    85.4%
c1.avi         13.45 MiB     1.36 MiB    89.9%
c1m1.avi       10.88 MiB     0.97 MiB    91.1%
c1m2.avi        7.44 MiB     0.77 MiB    89.6%
c1m3.avi        6.50 MiB     0.74 MiB    88.5%
c1m4.avi       11.38 MiB     1.16 MiB    89.8%
c2.avi         13.32 MiB     1.67 MiB    87.5%
c2m1.avi        5.95 MiB     0.65 MiB    89.1%
c2m2.avi       12.86 MiB     1.22 MiB    90.5%
c2m3.avi        7.86 MiB     0.75 MiB    90.5%
c2m4.avi        9.46 MiB     0.79 MiB    91.6%
c3.avi          6.88 MiB     1.62 MiB    76.5%
c3m1.avi        7.38 MiB     1.06 MiB    85.6%
c3m2.avi        7.75 MiB     1.02 MiB    86.8%
c3m4.avi        9.27 MiB     1.13 MiB    87.9%
c3m5.avi       11.80 MiB     1.08 MiB    90.9%
c3m6.avi        7.15 MiB     0.84 MiB    88.3%
c4win.avi      23.98 MiB     1.50 MiB    93.8%

Average reduction: 88.7%

Appendix 2: English video file information

Input #0, avi, from 'Campaign.avi':
  Duration: 00:03:24.27, start: 0.000000, bitrate: 3320 kb/s
  Stream #0:0: Video: indeo5 (IV50 / 0x30355649), yuv410p, 640x480, 3020 kb/s, 15 fps, 15 tbr, 15 tbn, 15 tbc
  Stream #0:1: Audio: pcm_u8 ([1][0][0][0] / 0x0001), 22050 Hz, 1 channels, u8, 176 kb/s
Input #0, avi, from 'c1.avi':
  Duration: 00:00:58.00, start: 0.000000, bitrate: 2096 kb/s
  Stream #0:0: Video: indeo5 (IV50 / 0x30355649), yuv410p, 640x480, 1236 kb/s, 15 fps, 15 tbr, 15 tbn, 15 tbc
  Stream #0:1: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, 2 channels, s16, 705 kb/s
Input #0, avi, from 'c1m1.avi':
  Duration: 00:00:46.00, start: 0.000000, bitrate: 1595 kb/s
  Stream #0:0: Video: indeo5 (IV50 / 0x30355649), yuv410p, 640x480, 1275 kb/s, 15 fps, 15 tbr, 15 tbn, 15 tbc
  Stream #0:1: Audio: pcm_u8 ([1][0][0][0] / 0x0001), 22050 Hz, 1 channels, u8, 176 kb/s
Input #0, avi, from 'c1m2.avi':
  Duration: 00:00:31.67, start: 0.000000, bitrate: 1577 kb/s
  Stream #0:0: Video: indeo5 (IV50 / 0x30355649), yuv410p, 640x480, 1263 kb/s, 15 fps, 15 tbr, 15 tbn, 15 tbc
  Stream #0:1: Audio: pcm_u8 ([1][0][0][0] / 0x0001), 22050 Hz, 1 channels, u8, 176 kb/s
Input #0, avi, from 'c1m3.avi':
  Duration: 00:00:27.73, start: 0.000000, bitrate: 1577 kb/s
  Stream #0:0: Video: indeo5 (IV50 / 0x30355649), yuv410p, 640x480, 1258 kb/s, 15 fps, 15 tbr, 15 tbn, 15 tbc
  Stream #0:1: Audio: pcm_u8 ([1][0][0][0] / 0x0001), 22050 Hz, 1 channels, u8, 176 kb/s
Input #0, avi, from 'c1m4.avi':
  Duration: 00:00:48.33, start: 0.000000, bitrate: 1591 kb/s
  Stream #0:0: Video: indeo5 (IV50 / 0x30355649), yuv410p, 640x480, 1268 kb/s, 15 fps, 15 tbr, 15 tbn, 15 tbc
  Stream #0:1: Audio: pcm_u8 ([1][0][0][0] / 0x0001), 22050 Hz, 1 channels, u8, 176 kb/s
Input #0, avi, from 'c2.avi':
  Duration: 00:00:56.67, start: 0.000000, bitrate: 1596 kb/s
  Stream #0:0: Video: indeo5 (IV50 / 0x30355649), yuv410p, 640x480, 1265 kb/s, 15 fps, 15 tbr, 15 tbn, 15 tbc
  Stream #0:1: Audio: pcm_u8 ([1][0][0][0] / 0x0001), 22050 Hz, 1 channels, u8, 176 kb/s
Input #0, avi, from 'c2m1.avi':
  Duration: 00:00:25.27, start: 0.000000, bitrate: 1584 kb/s
  Stream #0:0: Video: indeo5 (IV50 / 0x30355649), yuv410p, 640x480, 1270 kb/s, 15 fps, 15 tbr, 15 tbn, 15 tbc
  Stream #0:1: Audio: pcm_u8 ([1][0][0][0] / 0x0001), 22050 Hz, 1 channels, u8, 176 kb/s
Input #0, avi, from 'c2m2.avi':
  Duration: 00:00:54.40, start: 0.000000, bitrate: 1605 kb/s
  Stream #0:0: Video: indeo5 (IV50 / 0x30355649), yuv410p, 640x480, 1275 kb/s, 15 fps, 15 tbr, 15 tbn, 15 tbc
  Stream #0:1: Audio: pcm_u8 ([1][0][0][0] / 0x0001), 22050 Hz, 1 channels, u8, 176 kb/s
Input #0, avi, from 'c2m3.avi':
  Duration: 00:00:33.33, start: 0.000000, bitrate: 1587 kb/s
  Stream #0:0: Video: indeo5 (IV50 / 0x30355649), yuv410p, 640x480, 1270 kb/s, 15 fps, 15 tbr, 15 tbn, 15 tbc
  Stream #0:1: Audio: pcm_u8 ([1][0][0][0] / 0x0001), 22050 Hz, 1 channels, u8, 176 kb/s
Input #0, avi, from 'c2m4.avi':
  Duration: 00:00:39.80, start: 0.000000, bitrate: 1618 kb/s
  Stream #0:0: Video: indeo5 (IV50 / 0x30355649), yuv410p, 640x480, 1286 kb/s, 15 fps, 15 tbr, 15 tbn, 15 tbc
  Stream #0:1: Audio: pcm_u8 ([1][0][0][0] / 0x0001), 22050 Hz, 1 channels, u8, 176 kb/s
Input #0, avi, from 'c3.avi':
  Duration: 00:00:29.27, start: 0.000000, bitrate: 1570 kb/s
  Stream #0:0: Video: indeo5 (IV50 / 0x30355649), yuv410p, 640x480, 1264 kb/s, 15 fps, 15 tbr, 15 tbn, 15 tbc
  Stream #0:1: Audio: pcm_u8 ([1][0][0][0] / 0x0001), 22050 Hz, 1 channels, u8, 176 kb/s
Input #0, avi, from 'c3m1.avi':
  Duration: 00:00:31.47, start: 0.000000, bitrate: 1579 kb/s
  Stream #0:0: Video: indeo5 (IV50 / 0x30355649), yuv410p, 640x480, 1260 kb/s, 15 fps, 15 tbr, 15 tbn, 15 tbc
  Stream #0:1: Audio: pcm_u8 ([1][0][0][0] / 0x0001), 22050 Hz, 1 channels, u8, 176 kb/s
Input #0, avi, from 'c3m2.avi':
  Duration: 00:00:33.20, start: 0.000000, bitrate: 1575 kb/s
  Stream #0:0: Video: indeo5 (IV50 / 0x30355649), yuv410p, 640x480, 1252 kb/s, 15 fps, 15 tbr, 15 tbn, 15 tbc
  Stream #0:1: Audio: pcm_u8 ([1][0][0][0] / 0x0001), 22050 Hz, 1 channels, u8, 176 kb/s
Input #0, avi, from 'c3m4.avi':
  Duration: 00:00:39.53, start: 0.000000, bitrate: 1580 kb/s
  Stream #0:0: Video: indeo5 (IV50 / 0x30355649), yuv410p, 640x480, 1260 kb/s, 15 fps, 15 tbr, 15 tbn, 15 tbc
  Stream #0:1: Audio: pcm_u8 ([1][0][0][0] / 0x0001), 22050 Hz, 1 channels, u8, 176 kb/s
Input #0, avi, from 'c3m5.avi':
  Duration: 00:00:50.07, start: 0.000000, bitrate: 1590 kb/s
  Stream #0:0: Video: indeo5 (IV50 / 0x30355649), yuv410p, 640x480, 1270 kb/s, 15 fps, 15 tbr, 15 tbn, 15 tbc
  Stream #0:1: Audio: pcm_u8 ([1][0][0][0] / 0x0001), 22050 Hz, 1 channels, u8, 176 kb/s
Input #0, avi, from 'c3m6.avi':
  Duration: 00:00:30.40, start: 0.000000, bitrate: 1582 kb/s
  Stream #0:0: Video: indeo5 (IV50 / 0x30355649), yuv410p, 640x480, 1266 kb/s, 15 fps, 15 tbr, 15 tbn, 15 tbc
  Stream #0:1: Audio: pcm_u8 ([1][0][0][0] / 0x0001), 22050 Hz, 1 channels, u8, 176 kb/s
Input #0, avi, from 'c4win.avi':
  Duration: 00:01:12.20, start: 0.000000, bitrate: 2786 kb/s
  Stream #0:0: Video: indeo5 (IV50 / 0x30355649), yuv410p, 640x480, 2481 kb/s, 15 fps, 15 tbr, 15 tbn, 15 tbc
  Stream #0:1: Audio: pcm_u8 ([1][0][0][0] / 0x0001), 22050 Hz, 1 channels, u8, 176 kb/s
Input #0, avi, from 'intro.avi':
  Duration: 00:03:02.47, start: 0.000000, bitrate: 3602 kb/s
  Stream #0:0: Video: indeo5 (IV50 / 0x30355649), yuv410p, 640x480, 2764 kb/s, 15 fps, 15 tbr, 15 tbn, 15 tbc
  Stream #0:1: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, 2 channels, s16, 705 kb/s

Archive files

For both the base game and expansion, archive files can be recognised by a table of contents (TOC) at the end of the .zbd file. This is a common strategy to be able to easily add entries to an archive without rewriting the entire archive. The new entry is written at the end, i.e. it overwrites the TOC, and then the TOC is written out fully with the new entry. This avoids having to rewrite the rest of the entries.

Known archive files are sound archives, reader archives, motion archives, mechlib archives, and save games. Other .zbd files may also contain multiple files, but are not archive-based (for example interpreter scripts, texture files).

Investigation (MW3)

The sound archives are good candidates to follow along, since their contents make it obvious that the entry data is written from the start of the file (so the TOC must be at the end), and once extracted, you get .wav files that are easily validated to be correct (by listening to them).

For the base game, there are two fields at the end of the file:

struct Footer {
    version: u32, // always 1
    count: u32,
}

These are the version of the TOC (u32, at offset -8 from the end of the file) and the number of entries in the TOC (u32, at offset -4). The version will always be 1.

Each entry in the TOC is 148 bytes long:

struct Entry {
    start: u32,
    length: u32,
    name: [u8; 64], // zero-terminated/padded
    garbage: [u8; 76],
}

The start of the TOC is found by calculating the length of the TOC (number of entries * 148), adding the length of the TOC "footer" (version and count, 8 bytes) to that, and subtracting the result from the length of the file. Equivalently, seek backwards by that amount from the end of the file. Then read the entries.
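The steps above can be sketched in Rust as follows. This is a minimal sketch that assumes the whole archive has been read into memory; the names TocEntry, read_u32 and read_toc_v1 are hypothetical, and the 76 "garbage" bytes of each entry are simply skipped.

```rust
const ENTRY_SIZE: usize = 148;
const FOOTER_SIZE: usize = 8; // version (u32) + count (u32)

/// One TOC entry (hypothetical type); the trailing 76 bytes are ignored.
#[derive(Debug, PartialEq)]
pub struct TocEntry {
    pub start: u32,
    pub length: u32,
    pub name: String,
}

/// Reads a little-endian u32 at `offset`, or None if out of bounds.
fn read_u32(data: &[u8], offset: usize) -> Option<u32> {
    let b = data.get(offset..offset.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Parses the TOC of a version 1 (base game) archive held in memory.
/// Returns None if the data is too short or the version is not 1.
pub fn read_toc_v1(data: &[u8]) -> Option<Vec<TocEntry>> {
    let footer = data.len().checked_sub(FOOTER_SIZE)?;
    let version = read_u32(data, footer)?;
    let count = read_u32(data, footer + 4)? as usize;
    if version != 1 {
        return None;
    }
    // TOC start = file length - footer - (count * 148)
    let toc_start = footer.checked_sub(count.checked_mul(ENTRY_SIZE)?)?;
    let mut entries = Vec::with_capacity(count);
    for i in 0..count {
        let base = toc_start + i * ENTRY_SIZE;
        let raw_name = data.get(base + 8..base + 72)?;
        // The name is zero-terminated and padded with zeros.
        let end = raw_name.iter().position(|&b| b == 0).unwrap_or(raw_name.len());
        entries.push(TocEntry {
            start: read_u32(data, base)?,
            length: read_u32(data, base + 4)?,
            name: String::from_utf8_lossy(&raw_name[..end]).into_owned(),
        });
    }
    Some(entries)
}
```

The entry data for each file is then the byte range start..start + length of the same buffer.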

Each entry specifies the start of the entry's data in the file, the length of the entry's data in the file, the name of the entry (zero-terminated, and padded with null bytes), and a field I've called "garbage". This can largely be ignored. It was supposed to be flags, a comment and the file time:

struct Entry {
    start: u32,
    length: u32,
    name: [u8; 64],
    flags: u32,
    comment: [u8; 64],
    time: u64,
}

Where the time is actually a Windows FILETIME structure. Ignore the low and high parts in the documentation, the easiest way to read this is as a 64-bit value, which is then "the number of 100-nanosecond intervals that have elapsed since January 1, 1601, Coordinated Universal Time (UTC)." (i.e. the Windows epoch).
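As an illustration, a FILETIME read as a single u64 can be converted to a Unix timestamp by subtracting the number of 100-nanosecond intervals between 1601 and 1970. The function name is hypothetical; treating values before 1970 as invalid is an assumption of this sketch, chosen because such values are more likely to be leftover memory than real file times.

```rust
/// Number of 100-nanosecond intervals between 1601-01-01 and 1970-01-01 (UTC).
const FILETIME_UNIX_DIFF: u64 = 116_444_736_000_000_000;

/// Converts a FILETIME (read as one little-endian u64) to whole seconds
/// since the Unix epoch. Returns None for values before 1970.
pub fn filetime_to_unix_secs(filetime: u64) -> Option<u64> {
    filetime
        .checked_sub(FILETIME_UNIX_DIFF)
        .map(|intervals| intervals / 10_000_000)
}
```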

Unfortunately, in some files (like the mechlib), the entry data was not properly zeroed out, and so this contains random memory.

Another trap is that entries are not necessarily deduplicated. There can be two or more entries with the same name. In all the files I have, entries with the same name contain the same data, but this isn't a guarantee.

How the entry data is interpreted depends on the archive type.

Investigation (PM)

The Pirate's Moon archives are similar to the base game, but there are three fields at the end of the file, and they do not have a backwards-compatible layout:

struct Footer {
    version: u32, // always 2
    count: u32,
    checksum: u32,
}

These are the version of the TOC (u32, at -12), the number of entries in the TOC (u32, at -8), and a checksum of the file data (u32, at -4). The version will always be 2. If they had left the version at -8, this would have made reading the file easier.

The new field is the checksum. For archive types other than reader archives, it will be 0. Maybe it was too time-intensive to calculate the checksum for the bigger archives, or maybe they only introduced it to prevent cheating by modifying the reader files, which are relatively easy to understand. It's unclear why it wasn't made backwards compatible though, or why the other archives didn't keep using version 1.

The checksum is an incorrectly implemented cyclic redundancy check (CRC32). It seems to be based on Ross Williams's A Painless Guide To CRC Error Detection Algorithms, specifically the "Roll Your Own Table-Driven Implementation" section. As noted in Michael Pohoreski's (aka. Michaelangel007) excellent CRC32 Demystified, for the code given the bits in each data byte aren't reversed. Also of note are the initialization value of 0x00000000, and the fact that the final value isn't inverted/xor'd with 0xFFFFFFFF, as some other implementations do. Based on this information, I have managed to write code for calculating the Pirate's Moon checksums using a pre-calculated table. The pre-calculated table used is a standard CRC32 with the polynomial 0x04C11DB7, roughly:

// CRC32_TABLE is a [u32; 256] array
for index in 0..256u32 {
    let mut crc = index << 24;
    // process the 8 bits of the index, most significant bit first
    for _ in 0..8 {
        if (crc & 0x80000000) != 0 {
            crc = (crc << 1) ^ 0x04C11DB7;
        } else {
            crc <<= 1;
        }
    }
    CRC32_TABLE[index as usize] = crc;
}

A running CRC32 can then easily be calculated for arbitrary input, starting with the initial value:

pub const CRC32_INIT: u32 = 0x00000000;

fn crc32_update(crc: u32, buf: &[u8]) -> u32 {
    let mut crc = crc;
    for byte in buf {
        let index = (crc >> 24) ^ (*byte as u32);
        crc = CRC32_TABLE[index as usize] ^ (crc << 8);
    }
    crc
}

The CRC32 of an archive is calculated over all the entry data in the archive, in the order they are listed in the TOC, but does not include the TOC itself.
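Putting the pieces together, the following is a self-contained sketch of the whole calculation. The function names and the (start, length) pairs passed to archive_checksum are hypothetical; in practice they would come from the TOC entries, in TOC order.

```rust
/// Builds the 256-entry lookup table for polynomial 0x04C11DB7 (MSB first).
pub fn crc32_table() -> [u32; 256] {
    let mut table = [0u32; 256];
    for (index, slot) in table.iter_mut().enumerate() {
        let mut crc = (index as u32) << 24;
        for _ in 0..8 {
            crc = if crc & 0x8000_0000 != 0 {
                (crc << 1) ^ 0x04C1_1DB7
            } else {
                crc << 1
            };
        }
        *slot = crc;
    }
    table
}

/// Feeds `buf` into a running checksum. Start with 0; the result is
/// not inverted at the end.
pub fn crc32_update(table: &[u32; 256], mut crc: u32, buf: &[u8]) -> u32 {
    for &byte in buf {
        let index = ((crc >> 24) ^ u32::from(byte)) as usize;
        crc = table[index] ^ (crc << 8);
    }
    crc
}

/// Checksums the entry data in the order given, skipping the TOC itself.
/// Returns None if an entry lies outside `data`.
pub fn archive_checksum(data: &[u8], entries: &[(usize, usize)]) -> Option<u32> {
    let table = crc32_table();
    let mut crc = 0u32;
    for &(start, length) in entries {
        let end = start.checked_add(length)?;
        crc = crc32_update(&table, crc, data.get(start..end)?);
    }
    Some(crc)
}
```

Because the checksum is a running value, feeding the entries one after another gives the same result as feeding their concatenated data in one call.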

There is one more oddity for motion archives in PM. For these, the entry length in the TOC will always be 1. The real length can be calculated from the start of the following entry: for example, sort the entries by start, and take each entry's end to be the next entry's start, using the start of the TOC as the end of the last entry. Or, since the motion reading code can be made self-limiting, code can simply jump to the start and read the motion data.
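The first workaround could look roughly like this. The function name is hypothetical, and the sketch assumes that entries neither overlap nor share data, so that every entry ends exactly where the next one (by start offset) begins.

```rust
/// Derives entry lengths from the start offsets alone: each entry ends where
/// the next one (by start offset) begins, and the last one ends at the TOC.
/// Lengths are returned in the same order as `starts`.
pub fn infer_lengths(starts: &[u32], toc_start: u32) -> Option<Vec<u32>> {
    // Indices of the entries, sorted by their start offset.
    let mut order: Vec<usize> = (0..starts.len()).collect();
    order.sort_by_key(|&i| starts[i]);

    let mut lengths = vec![0u32; starts.len()];
    let mut end = toc_start;
    // Walk backwards from the TOC towards the start of the file.
    for &i in order.iter().rev() {
        lengths[i] = end.checked_sub(starts[i])?;
        end = starts[i];
    }
    Some(lengths)
}
```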

Sound archives

Sound archives hold sound effects, used throughout the game in menus and in missions.

Investigation

Sound archives are the easiest type of archive to investigate in my opinion. Their contents make it obvious how archive files are read.

The two hints as to what data these archives contain are that a) the 1.2 patch installs loose Waveform Audio Files, aka. WAVE or .wav into the zbd directory, and b) the starting data in the archives is b"RIFF \xe0\x02\x00WAVEfmt ", which is the magic RIFF header (Resource Interchange File Format), and a WAVE format.

There isn't much else to say about these files, since the hard part is reading the archive, and that code is common with other archives.

Maybe of interest for parsing the WAVE files to read the raw sound data as floating point values is that they are all mono or stereo files, and use only 8 or 16 bit samples. RIFF or WAVE parsing is out of scope for this documentation, but I have had no problems with parsing the sound files.

Another thing to remember is that as mentioned, the patch installs loose WAVE files in the zbd directory, which also need to be loaded to have all sound effects present.

In-game use

Sound effects are used throughout the game in menus and in missions. They are global, so it's easy to load them once and use them as needed throughout. With modern RAM sizes, this isn't a problem. The high fidelity sound archive is less than 100 MiB, and WAVE files are already uncompressed. Even if the sound data is parsed to floating point values, this should be less than 400 MiB.

Reader archives / binary reader files

Reader archives hold most of the game's configuration in a Lisp-like list structure. Fair warning though that some of this information is duplicated inside anim.zbd files!

Binary and text reader files have the file extension .zrd, which could stand for Zipper Reader. Until 2022, I only knew of binary reader files. However, there exist text reader files, for example DefaultCtlConfig.zrd.

Investigation (MW3)

Once it is known how to read archive files (from e.g. the sound archives), the reader data is easy to figure out, since the binary structure is very simple and consistent.

To read a value, first a u32 (or i32) is read. This is the type of value, where 1 means integer (i32), 2 means float (f32), 3 means string, and 4 means list. No other types are seen. You can also think of the values as a tagged/discriminated union or a sum type.

For reading string values, read a u32 (or i32), which is the number of bytes in the string. Then read that many bytes. There is no zero-termination! One trap is that the string encoding is not exactly known. It could depend on the system's codepage. Interpreting the string as ASCII (0-127) seems to be the safest option, and the reader files never use values outside of ASCII. Another option would be to use codepage 1252 for the encoding.

For reading list values, simply read a u32 (or i32), which is the number of values in the list plus one (!). Then, read count - 1 items. Empty lists do exist, and list values can be of different types (so it is more like a tuple).

struct Integer {
    type_: u32, // always 1
    value: i32,
}

struct Float {
    type_: u32, // always 2
    value: f32,
}

struct String {
    type_: u32, // always 3
    length: u32,
    value: [u8; length], // not zero-terminated/padded
}

struct List {
    type_: u32, // always 4
    count: u32,
    values: [Integer/Float/String/List; count - 1],
}

The outermost value in a reader file seems to always be a list, so the data structure is self-terminating. This makes it easy to read the entire file.
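A recursive reader for this structure can be sketched as below. The Value enum and the function names are hypothetical. Rejecting non-ASCII strings and a list count of zero are assumptions of this sketch rather than documented engine behaviour, and recursion depth is not limited.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i32),
    Float(f32),
    Str(String),
    List(Vec<Value>),
}

/// Splits `n` bytes off the front of the cursor, or None if too short.
fn take<'a>(data: &mut &'a [u8], n: usize) -> Option<&'a [u8]> {
    if data.len() < n {
        return None;
    }
    let (head, tail) = data.split_at(n);
    *data = tail;
    Some(head)
}

fn read_u32(data: &mut &[u8]) -> Option<u32> {
    let b = take(data, 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Reads one tagged value, advancing the cursor past it.
pub fn read_value(data: &mut &[u8]) -> Option<Value> {
    match read_u32(data)? {
        1 => Some(Value::Int(read_u32(data)? as i32)),
        2 => Some(Value::Float(f32::from_bits(read_u32(data)?))),
        3 => {
            let length = read_u32(data)? as usize;
            let bytes = take(data, length)?;
            if !bytes.is_ascii() {
                return None; // assumption: only ASCII is expected
            }
            Some(Value::Str(String::from_utf8(bytes.to_vec()).ok()?))
        }
        4 => {
            // The stored count is the number of items plus one.
            let count = read_u32(data)?.checked_sub(1)?;
            let mut values = Vec::new();
            for _ in 0..count {
                values.push(read_value(data)?);
            }
            Some(Value::List(values))
        }
        _ => None,
    }
}
```

Calling read_value once on the start of a reader file should consume the whole file if the outermost value is a list, which makes a leftover, non-empty cursor a useful sanity check.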

While the binary structure is simple and consistent, the end result is not necessarily easy to consume. First, "keyed" data is annoying to look up by modern standards. There is no dictionary/map/object type. This means it's necessary to find the key in the list, and then the next index could be the data. There is no requirement that a key is unique in a list. There is no requirement that a key is followed by only one value. Sometimes, the following values are contained in a list (of size 1), sometimes not:

[
    "key1",
    ["value1"],
    "key2",
    0.5,
    "key3",
    0.3,
    0.4,
]

Some lists are clearly a certain data type in the engine, but might contain different numbers of items, e.g. just a node name ["target_node"], a node name and translation ["target_node", 0.0, 0.0, 0.0], and potentially more forms.

So it seems like data lookup/interpretation is completely custom. Still, with a bit of care, it's possible to infer this and write code that uses the data.
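To illustrate what such custom lookup code might look like, the sketch below returns everything between a key and the next string item. This is a hypothetical heuristic only: it assumes the next string in the list is the next key, which will be wrong for lists where the values themselves are bare strings. The Value enum and the function name are likewise assumptions.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i32),
    Float(f32),
    Str(String),
    List(Vec<Value>),
}

/// Returns the values that follow the first occurrence of `key`, up to (but
/// not including) the next string item or the end of the list.
pub fn values_after_key<'a>(items: &'a [Value], key: &str) -> Option<&'a [Value]> {
    let pos = items
        .iter()
        .position(|v| matches!(v, Value::Str(s) if s.as_str() == key))?;
    let rest = &items[pos + 1..];
    // Heuristic: treat the next string as the start of the next key.
    let len = rest
        .iter()
        .position(|v| matches!(v, Value::Str(_)))
        .unwrap_or(rest.len());
    Some(&rest[..len])
}
```

With the example list above, looking up "key3" yields the two floats, while "key1" yields a single nested list that the caller still has to unwrap.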

Investigation (PM)

In Pirate's Moon, reader archives gained a checksum. They are the only archive type this is used for. Presumably, this was to make game modification harder, maybe to curb cheating online? Otherwise, they haven't changed.

In-game use

Reader files configure most of the game. However, animation definition archives (anim.zbd) contain the same animation definitions as the reader files, but compiled into better-defined C structures. So modifying an animation definition in a reader file may not change the game's behaviour. It's likely this was done because there are many animation definitions, and parsing them from the relatively unstructured reader files would make load times very long.

Converting reader files to animation definition archives faces the same problem as interpreting the reader data (custom code required). It's likely the development team had a tool to do this, or maybe the engine could dump animation definition archives from the loaded reader data (since the anim.zbd files look a lot like memory dumps with e.g. pointer values serialised).

Motion archives

Motion archives hold 'mech motion animation data, so how a 'mech model moves when it e.g. walks. However, the association of motion data with a 'mech model is determined by a reader file. Some 'mechs share motions/animations, and some motions are seemingly unused.

Investigation (MW3)

Motion archives are archive files. Each motion file is named <mech>_<motion>, so for example "bushwhacker_jump". Motion files begin with a header:

struct Header {
    version: u32, // always 4
    loop_time: f32, // > 0.0
    frame_count: u32,
    part_count: u32,
    unk16: f32, // always -1.0
    unk20: f32, // always 1.0
}

The version field will always be four (4). The loop time is a positive floating point value that describes how long the motion plays for. The frame count is the number of frames in the motion, which is inclusive. This means there are actually frame count + 1 frames of data to read. The last frame is always the same as the first frame. Apparently, this is a common technique to make looping animations easier. The part count is the number of parts of the model that will be animated. The last two fields are unknown, but are always set to negative one (-1.0) and positive one (1.0). Maybe they describe the coordinate system?

Next, part count parts are read:

struct Part {
    name_length: u32,
    name: [u8; name_length], // not zero-terminated
    flags: PartFlags, // always Translation + Rotation
    translations: [Vector3; frame_count + 1],
    rotations: [Quaternion; frame_count + 1],
}

bitflags PartFlags: u32 {
    Scale = 1 << 1,       // 0x02
    Rotation = 1 << 2,    // 0x04
    Translation = 1 << 3, // 0x08
}

struct Vector3 {
    x: f32,
    y: f32,
    z: f32,
}

struct Quaternion {
    w: f32,
    x: f32,
    y: f32,
    z: f32,
}

Each part begins with a variable-length string (ASCII). There is no zero-termination. This is the part of the 'mech model that the motion affects. The flags field always specifies translation (8) and rotation (4), and never scale (2) for obvious reasons (scaling any part would look weird on 'mechs). So it will always be twelve (12).

Then, the translations are read sequentially, and then the rotations are read sequentially. Again, there is one more frame to read than frame count indicates, and the first and last values will be the same. I believe the quaternion order is wxyz, since the quaternions work fine in Blender, but not in Unity, which uses xyzw order.
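Reading a whole motion file could be sketched as follows. The Motion and Part types and all function names are hypothetical, and the sketch simply rejects any file whose constant fields differ from the values described above.

```rust
pub struct Part {
    pub name: String,
    pub translations: Vec<[f32; 3]>, // x, y, z
    pub rotations: Vec<[f32; 4]>,    // w, x, y, z
}

pub struct Motion {
    pub loop_time: f32,
    pub frame_count: u32,
    pub parts: Vec<Part>,
}

fn take<'a>(data: &mut &'a [u8], n: usize) -> Option<&'a [u8]> {
    if data.len() < n {
        return None;
    }
    let (head, tail) = data.split_at(n);
    *data = tail;
    Some(head)
}

fn read_u32(data: &mut &[u8]) -> Option<u32> {
    let b = take(data, 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

fn read_f32(data: &mut &[u8]) -> Option<f32> {
    read_u32(data).map(f32::from_bits)
}

pub fn read_motion(mut data: &[u8]) -> Option<Motion> {
    let d = &mut data;
    let version = read_u32(d)?;
    let loop_time = read_f32(d)?;
    let frame_count = read_u32(d)?;
    let part_count = read_u32(d)?;
    let (unk16, unk20) = (read_f32(d)?, read_f32(d)?);
    if version != 4 || unk16 != -1.0 || unk20 != 1.0 {
        return None;
    }
    // The frame count is inclusive, so one extra frame is stored.
    let frames = frame_count.checked_add(1)?;
    let mut parts = Vec::new();
    for _ in 0..part_count {
        let name_length = read_u32(d)? as usize;
        let name = String::from_utf8(take(d, name_length)?.to_vec()).ok()?;
        if read_u32(d)? != 12 {
            return None; // flags: Translation (8) + Rotation (4)
        }
        let mut translations = Vec::new();
        for _ in 0..frames {
            translations.push([read_f32(d)?, read_f32(d)?, read_f32(d)?]);
        }
        let mut rotations = Vec::new();
        for _ in 0..frames {
            rotations.push([read_f32(d)?, read_f32(d)?, read_f32(d)?, read_f32(d)?]);
        }
        parts.push(Part { name, translations, rotations });
    }
    Some(Motion { loop_time, frame_count, parts })
}
```

Since the reader stops after the last part, it never needs the entry length from the archive's TOC.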

Investigation (PM)

Motion archive data doesn't change significantly in the expansion, but the archive does. Motion archives do not use checksumming; the checksum is always set to zero (0). Additionally, for some bizarre reason, the length of the data in the archive's TOC is always set to one (1). This can be highly inconvenient depending on the way archive entries are being read. A workaround is described in archive files.

In-game use

Motions are used to animate 'mech models during missions. Which motion is used for which 'mech model is specified in the reader files (dfn_<mech>.zrd in zbd/reader.zbd). Some 'mechs share motions, and some motions are unused.

Interpreter scripts

The interpreter scripts drive how the game engine loads the game data/worlds. They are all contained in a single file, interp.zbd.

Investigation

This is a quite short file, which is good. It is not an archive file.

struct Header {
    signature: u32, // always 0x08971119
    version: u32, // always 7
    count: u32,
}

The file starts with a signature (u32, magic number 0x08971119), a version (u32, always 7), and the number of scripts/count (u32). A table of contents (TOC) with script entries follows, which is easy to read since the count is known:

struct Entry {
    path: [u8; 120], // zero-terminated/padded
    last_modified: u32,
    offset: u32,
}

type Entries = [Entry; count];

The entry path seems to be a 120-byte ASCII string, which is zero-terminated and padded with zeros/nulls. This can contain backslashes. The paths have the file extensions .gw and .gs, which one could guess to be game world and game script, respectively.

I have had success interpreting the last modified value as a timestamp, which gives datetimes around 1999 (for the v1.2 version). However, they may be some local timezone, and not UTC.

Finally, the offset is simply where the interpreter script data starts in the file. The script data is written in the same order as the entries in the TOC, with no padding, so for reading all the data (instead of jumping to a script), the offset isn't strictly necessary. The script data must also be self-terminating, since the length isn't recorded in the TOC.
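The header and TOC described so far can be read with a sketch like the one below. ScriptEntry and the function names are hypothetical; the last modified value is returned as a raw u32, since its exact interpretation (including the timezone) is uncertain.

```rust
const SIGNATURE: u32 = 0x0897_1119;
const VERSION: u32 = 7;
const HEADER_SIZE: usize = 12;
const PATH_SIZE: usize = 120;
const ENTRY_SIZE: usize = PATH_SIZE + 8; // path + last_modified + offset

#[derive(Debug, PartialEq)]
pub struct ScriptEntry {
    pub path: String,
    pub last_modified: u32,
    pub offset: u32,
}

/// Reads a little-endian u32 at `offset`, or None if out of bounds.
fn read_u32(data: &[u8], offset: usize) -> Option<u32> {
    let b = data.get(offset..offset.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Validates the header and reads all script entries from the TOC.
pub fn read_script_toc(data: &[u8]) -> Option<Vec<ScriptEntry>> {
    if read_u32(data, 0)? != SIGNATURE || read_u32(data, 4)? != VERSION {
        return None;
    }
    let count = read_u32(data, 8)? as usize;
    let mut entries = Vec::new();
    for i in 0..count {
        let base = i.checked_mul(ENTRY_SIZE)?.checked_add(HEADER_SIZE)?;
        let raw = data.get(base..base.checked_add(PATH_SIZE)?)?;
        // The path is zero-terminated and padded with zeros.
        let end = raw.iter().position(|&b| b == 0).unwrap_or(raw.len());
        entries.push(ScriptEntry {
            path: String::from_utf8_lossy(&raw[..end]).into_owned(),
            last_modified: read_u32(data, base + PATH_SIZE)?,
            offset: read_u32(data, base + PATH_SIZE + 4)?,
        });
    }
    Some(entries)
}
```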

And indeed, immediately after the TOC the script data follows. Each script contains several lines. First, the size/length of the line (u32) is read. If it is zero (0), then the script is complete. Next, the token count of the line is read (u32). This indicates how many tokens the line contains.

The line is exactly size bytes. It contains exactly token count zero/null bytes (\0). These delineate the tokens, so for a command with two arguments, there are three tokens in a line: CommandName\0Argument1\0Argument2\0. The line should always end with a null byte (zero-terminated). There is no extra padding.

Null bytes or characters were probably chosen because they make splitting/tokenising the line trivial in C. However, since the command name and arguments don't contain spaces, it seems to be safe to convert the null bytes to spaces (if this is more convenient), and strip the final null byte.

struct Line {
    length: u32,
    token_count: u32,
    line: [u8; length],
}

Decoding the line as ASCII is safe, as is any ASCII-compatible encoding such as codepage 1251 or UTF-8. Encoding should probably be limited to ASCII though.
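The line format can be read with a sketch like this one, which returns each line as a list of tokens. The function names are hypothetical. It assumes the terminator is a lone zero length with no token count after it (as described above), and it treats a mismatch between the token count and the number of null bytes as an error.

```rust
/// Reads a little-endian u32 at `offset`, or None if out of bounds.
fn read_u32(data: &[u8], offset: usize) -> Option<u32> {
    let b = data.get(offset..offset.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Reads the lines of one script starting at `offset`, stopping at the
/// zero-length terminator. Each line is split into its tokens.
pub fn read_script(data: &[u8], offset: usize) -> Option<Vec<Vec<String>>> {
    let mut pos = offset;
    let mut lines = Vec::new();
    loop {
        let length = read_u32(data, pos)? as usize;
        if length == 0 {
            return Some(lines);
        }
        let token_count = read_u32(data, pos + 4)? as usize;
        let raw = data.get(pos + 8..(pos + 8).checked_add(length)?)?;
        // The line must end with a null byte; strip it before splitting.
        let (&last, body) = raw.split_last()?;
        if last != 0 {
            return None;
        }
        let tokens: Vec<String> = body
            .split(|&b| b == 0)
            .map(|token| String::from_utf8_lossy(token).into_owned())
            .collect();
        if tokens.len() != token_count {
            return None;
        }
        lines.push(tokens);
        pos += 8 + length;
    }
}
```

Joining the tokens of a line with spaces gives the human-readable form of the command.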

In-game use

Although the workings of the interpreter are obviously game engine internals, the commands are all human readable and self-describing. Presumably, the interpreter is driven by these scripts, and so they affect how most of the data is loaded. This can be seen from e.g. c1.gs:

ifdef USEZBD
GameZReadZBDFile %GAMEZ%
endif
ifndef USEZBD
... world setup code
endif

This looks like the interpreter scripts enabled prototyping of worlds before the assets were packed into a gamez.zbd file, probably for faster game development iteration. It also gives a bit of insight in how the game data is structured. There are several references to nodes, which indicates world data is maybe represented as a tree-like structure.

A comprehensive study of the filepaths in the interpreter scripts could maybe reveal how the game engine loaded unpacked/loose asset files, and make modding the existing engine easier.

String DLLs/translations

This file is known as:

  • messages.dll in Recoil
  • Mech3Msg.dll in MechWarrior 3 and Pirate's Moon
  • Strings.dll in Crimson Skies (this one is different)

These files contain localised strings that are used by the game engine. Some of these strings are referred to by message keys (MSG_) in e.g. reader files.

Investigation (MW3)

Mech3Msg.dll has a single export:

$ rabin2 -E Mech3Msg.dll
[Exports]

nth paddr      vaddr      bind   type size lib          name
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
1   0x00000b20 0x10001720 GLOBAL FUNC 0    Mech3Msg.dll ZLocGetID

This is somewhat unusual for a DLL that is ~120 KB in size. It also doesn't use many functions:

$ rabin2 -s Mech3Msg.dll
[Symbols]

nth paddr      vaddr      bind   type size lib          name
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
1   0x00000b20 0x10001720 GLOBAL FUNC 0    Mech3Msg.dll ZLocGetID
1   0x00001000 0x10002000 NONE   FUNC 0    MSVCRT.dll   imp._initterm
2   0x00001004 0x10002004 NONE   FUNC 0    MSVCRT.dll   imp.malloc
3   0x00001008 0x10002008 NONE   FUNC 0    MSVCRT.dll   imp._adjust_fdiv
4   0x0000100c 0x1000200c NONE   FUNC 0    MSVCRT.dll   imp.free

It also only links to msvcrt.dll (rabin2 -l Mech3Msg.dll), which is Microsoft's Visual C Runtime (MSVCRT). This hints that the DLL does not contain much functionality code-wise.

Printing the sections (rabin2 -S Mech3Msg.dll) shows the .rsrc section is the biggest, followed by .data. Printing the strings (rabin2 -z Mech3Msg.dll) shows that there are a lot of strings in both of these sections. Printing the resources shows that it contains a message table:

$ rabin2 -U Mech3Msg.dll
Resource 0
  name: 1
  timestamp: Thu Jan  1 00:00:00 1970
  vaddr: 0x1000e060
  size: 64.9K
  type: MESSAGETABLE
  language: LANG_ENGLISH

The German version predictably has the language LANG_GERMAN. This isn't an uncommon way of handling localisation, and is known as a resource-only DLL. Microsoft describes a similar approach to "localizing message strings". What is uncommon is the export, since resource-only DLLs usually contain no code.

The message table accounts for the strings in .rsrc, but not in .data.

However, the strings in the .data section all begin with the same prefix: MSG_. This also provides some indication of what the ZLocGetID function does. After simply trying some different arguments, it becomes apparent that when ZLocGetID is passed one of those message keys, it returns an unsigned 32-bit integer which corresponds to the entry ID in the table. So ZLocGetID and the .data section map human-readable strings to message table entry IDs. In Python - but only using a 32-bit version of Python and on Windows - this can be done as follows:

import ctypes
import ctypes.wintypes

lib = ctypes.CDLL("Mech3Msg.dll")
ZLocGetID = lib.ZLocGetID
ZLocGetID.argtypes = [ctypes.c_char_p]
ZLocGetID.restype = ctypes.c_int32

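# message_name must be a bytes object holding one of the MSG_ keys from .data
message_name = b"MSG_GAME_NAME_DEBUG_VER"
message_id = ZLocGetID(message_name)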

Of course, enumerating the message keys via ZLocGetID is also not easy; a brute-force approach could take a long time. So message keys still need to be extracted from the .data section (see below).

The internal workings of Mech3Msg.dll are otherwise not interesting to this project. I think the DLL probably uses binary search to be able to quickly look up the entry IDs by message keys (at least, that's how I would've done it in 1999 and with C). Binary search requires the message keys to be sorted, which could be done at compile time, or run time. For a replacement Mech3Msg.dll, with a modern language, a hash-table/dictionary lookup would be more than sufficient. Or using C on a modern processor, a linear search would be fast enough.

Bonus facts:

  1. Not all messages are looked up by the message key! See below in "in-game use".
  2. Not all messages have corresponding values in the message table - it was probably easier to leave them in, knowing they're unused in the engine than recreate this data.
  3. Some messages are zeroed out by the patch, for example MSG_GAME_NAME_DEBUG_VER. Rather interesting.

Investigation (CS)

Initially, it seems like Strings.dll is very similar to Mech3Msg.dll:

$ rabin2 -E Strings.dll
[Exports]

nth paddr      vaddr      bind   type size lib         name
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
1   0x00001010 0x10001010 GLOBAL FUNC 0    Strings.dll ZLocGetStringID

Note the export is named ZLocGetStringID, and not ZLocGetID. The DLL also links to KERNEL32.dll instead of MSVCRT.dll, and references many more functions. The .data and .rsrc sections are still the biggest.

The most notable change is the type of resources:

$ rabin2 -U Strings.dll
Resource 0
  name: 7
  timestamp: Tue Jan  1 00:00:00 1980
  vaddr: 0x100135b8
  size: 636
  type: STRING
  language: LANG_ENGLISH
<truncated>
Resource 111
  name: 1072
  timestamp: Tue Jan  1 00:00:00 1980
  vaddr: 0x1001d1d4
  size: 238
  type: STRING
  language: LANG_ENGLISH
Resource 112
  name: 1
  timestamp: Tue Jan  1 00:00:00 1980
  vaddr: 0x1001d2c4
  size: 944
  type: VERSION
  language: LANG_ENGLISH
Resource 113
  name: 1
  timestamp: Tue Jan  1 00:00:00 1980
  vaddr: 0x1001d674
  size: 4
  type: UNKNOWN (255)
  language: LANG_ENGLISH

This means instead of using a message table, it uses a string table to store the message texts. In practice, this is a small change, but does require the resource section to be parsed differently.

As seen above, the DLL also includes a VERSION and UNKNOWN resource. It is not necessary to parse these to recover the messages.

In-game use

Some messages are looked up directly by entry ID. I found this out when I didn't preserve the entry IDs in a replacement DLL, and the "insert CD" message was incorrect. Even though new messages are added and old messages are removed in the new versions/patches, they preserve entry ID numbering between versions. A replacement DLL should also do this. A re-implementation doesn't have to.

Presumably, most messages are looked up by message key by the engine. Some reader files also reference message keys, which are presumably dynamically looked up when interpreting reader files.

Reading the message table

Luckily, Windows resources are somewhat well documented, either by Microsoft or third-parties. There are two options. On Windows, it is possible to use Windows APIs to read these resources, via LoadLibraryEx, and then FindResource/LoadResource, or FormatMessage specifically for message tables. The problem with the former functions is that they are less helpful for message tables, as the raw message table still needs to be parsed. The problem with the latter function is that it requires a message ID to load a specific message. Alternatively, it's trivial to read the entire message table on any platform/operating system.

There exist many libraries for parsing Portable Executables (PE), which is the "file format for executables, object code, DLLs and others used in 32-bit and 64-bit versions of Windows operating systems". They can often parse resource definitions also. So getting the raw message table data should be easy, especially since there is only one resource in the DLL. If a library doesn't support this, then the best approach is to parse the .rsrc section and look for RT_MESSAGETABLE = 11 (0x000B), and the appropriate locale ID en_US = 1033 (0x0409).

The format of the message table is described by MESSAGE_RESOURCE_DATA, MESSAGE_RESOURCE_BLOCK, and MESSAGE_RESOURCE_ENTRY, although they are pseudo-structures. Note that since MechWarrior 3 is a 32-bit application (as discussed in the introduction), the alignment for data is 32-bit or 4 bytes.

First, the number of blocks is read (u32). Next, the blocks are read, which are the low ID (u32), high ID (u32), and the offset to entries (u32):

#![allow(unused)]
fn main() {
struct Block {
    low_id: u32,
    high_id: u32,
    offset_to_entries: u32,
}

struct Data {
    count: u32,
    blocks: [Block; count],
}
}

Finally, the entries are read by iterating over the blocks, the most complicated step (but still easy).

For each block, its offset from the start of the message table data is given. Blocks should be sequential, so it should be possible to simply iterate through the data, but I would recommend seeking to the position anyway. Since the entries are grouped into blocks, the entries from low ID (inclusive) to high ID (inclusive!) are read per block. The inclusive high ID can be a bit of a trap. It's very easy to not read the highest ID in a block by being off-by-one. For a block with only one message, the low ID and high ID are the same. For a block with two messages, the low ID could be e.g. 1 and the high ID would be e.g. 2. So in Python, the entry ID would be: for entry_id in range(low_id, high_id + 1).
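As a rough sketch of the steps so far, assuming the raw message table resource has already been extracted into a byte slice: the helper names read_u32, read_blocks, and entry_ids are made up for this example and are not part of any library.

```rust
/// Read a little-endian u32 at `pos`, or None if it is out of bounds.
fn read_u32(data: &[u8], pos: usize) -> Option<u32> {
    let bytes = data.get(pos..pos.checked_add(4)?)?;
    Some(u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

#[derive(Debug, PartialEq)]
struct Block {
    low_id: u32,
    high_id: u32,
    offset_to_entries: u32,
}

/// Read the block count (u32), then `count` block descriptors of 12 bytes each.
fn read_blocks(data: &[u8]) -> Option<Vec<Block>> {
    let count = read_u32(data, 0)? as usize;
    let mut blocks = Vec::new();
    for i in 0..count {
        let pos = 4 + i * 12;
        blocks.push(Block {
            low_id: read_u32(data, pos)?,
            high_id: read_u32(data, pos + 4)?,
            offset_to_entries: read_u32(data, pos + 8)?,
        });
    }
    Some(blocks)
}

/// The entry IDs of a block. Note the inclusive upper bound (`..=`).
fn entry_ids(block: &Block) -> std::ops::RangeInclusive<u32> {
    block.low_id..=block.high_id
}
```

Using an inclusive range makes the off-by-one trap hard to fall into: a block with low ID 1 and high ID 2 yields exactly the IDs 1 and 2.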

#![allow(unused)]
fn main() {
struct Entry {
    length: u16,
    flags: u16,
    message: [u8; length - 4], // zero-terminated/padded
}
}

For each entry, the length is read first (u16), which is the length of the entire entry. Then the Unicode flags are read (u16). Expect this to be 0, since the messages are not Unicode (which in Microsoft-land means UTF-16 LE). Instead, the messages are encoded using the codepage appropriate for the language of the message table (aka. locale ID). Luckily for extraction, the English, German, and French locale IDs map to the same codepage (1252). This means that the messages simply need to be read with the codepage encoding, and they will decode properly (I have tested this on the German strings). So it is simply a matter of reading length - 4 bytes (remember, the length includes itself and the flags field) to get the message data, which is not quite the same as the message.

Messages are padded to be 32-bit aligned with null bytes (\0). Even though the length is known, messages have at least one null byte at the end (zero-terminated), presumably for C interoperability. Since codepage 1252 shares the first 128 characters with ASCII, these can be safely stripped before decoding the string (i.e. in byte form), or afterwards.

Additionally, even single line messages are terminated with the DOS/Windows line ending \r\n (this isn't always the case, but common and true in this case). As long as they are at the end of the message, you may wish to also strip these for convenience. Messages can also contain DOS/Windows newlines within the message, which should be preserved.
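A minimal sketch of reading one entry and trimming it, under the assumption that the message table bytes are in memory and pos is the entry's offset. The Entry type here and the read_entry function are hypothetical names for this example. The sketch leaves the message as bytes, because decoding the codepage needs a lookup table or a third-party library that the standard library doesn't provide.

```rust
/// One entry, with the zero padding and a trailing \r\n already removed.
struct Entry<'a> {
    flags: u16,
    message: &'a [u8],
    /// Total entry length in bytes, for advancing to the next entry.
    length: usize,
}

fn read_entry(data: &[u8], pos: usize) -> Option<Entry<'_>> {
    let header = data.get(pos..pos.checked_add(4)?)?;
    let length = u16::from_le_bytes([header[0], header[1]]) as usize;
    let flags = u16::from_le_bytes([header[2], header[3]]);
    if length < 4 {
        return None; // the length includes the 4 header bytes
    }
    let mut message = data.get(pos + 4..pos.checked_add(length)?)?;
    // Strip the zero terminator and any alignment padding.
    while let [rest @ .., 0] = message {
        message = rest;
    }
    // Strip a single trailing DOS/Windows line ending; inner ones are kept.
    if let [rest @ .., b'\r', b'\n'] = message {
        message = rest;
    }
    Some(Entry { flags, message, length })
}
```

To walk a block, start at its offset and add each entry's length to get the position of the next entry.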

It's also worth pointing out that some of the messages contain formatting placeholders, that are specific to those messages. There is no way of knowing what values were intended, other than looking for the format placeholders (e.g. %1, %2) and inferring this from the context of the message (or reverse-engineering the engine, which this project does not encourage).

Reading the string table

This is analogous to reading message tables. It is possible to use Windows APIs, or to parse the resources using a PE library/by hand.

Raymond Chen has a post about "The format of string resources" on his blog "The Old New Thing". Roughly speaking, string tables are split by the resource compiler into blocks of 16 contiguous IDs. This is why the DLL contains 112 STRING resources (RT_STRING = 6). The resource name gives the block ID.

Notice how similar this is to a message table, except that while the message table is a single resource that contains blocks, the string table effectively makes the blocks available to be loaded separately, without parsing the entire string table. That is the resource data entries give the data offset/size of a single block of strings. One extra complication is that a single block could have multiple resource data entries for each language, but this doesn't happen.

From the block ID, the string IDs can be derived:

#![allow(unused)]
fn main() {
let block_min = (block_id - 1) * 16;
let block_max = block_id * 16; // exclusive, the last ID in the block is block_max - 1
}

I'm sure there's a Unicode flag somewhere in the resource information; for Crimson Skies the messages are always "Unicode". This is Microsoft-speak for UTF-16 little-endian, and a whole other can of worms. I digress. The strings are not zero-terminated. Instead, first a u16 value is read, which is the "length" of the string. To be pedantic, it is not the length, but the number of WCHAR/u16 values which comprise the string. If you want to know more, see "surrogate pairs", Unicode codepoints, and the meta-question of what the length of a string should be (bytes, codepoints, grapheme clusters, etc).

Because the blocks are contiguous, missing entries are zero-length strings, so a zero length should be interpreted as missing.

In any case, for a given length > 0:

#![allow(unused)]
fn main() {
// u16 values must be read as little endian on all systems!
let wchars = [u16; length];
// or using bytes, but note these also must be byte-swapped on big endian systems!
let bytes = [u8; length * 2];
}
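Putting the string table steps together, here is a sketch that parses a single STRING resource. It assumes the resource data is already available as bytes and that block_id is the numeric resource name. The function name read_string_block is invented for this example.

```rust
/// Parse one STRING resource into (string ID, text) pairs.
/// Zero-length entries are treated as missing and skipped.
fn read_string_block(block_id: u32, data: &[u8]) -> Option<Vec<(u32, String)>> {
    // First string ID of the block, as derived above.
    let first_id = block_id.checked_sub(1)?.checked_mul(16)?;
    let mut strings = Vec::new();
    let mut pos = 0usize;
    for i in 0..16u32 {
        // The "length" is the number of u16 values, not bytes.
        let len_bytes = data.get(pos..pos + 2)?;
        let length = u16::from_le_bytes([len_bytes[0], len_bytes[1]]) as usize;
        pos += 2;
        let raw = data.get(pos..pos + length * 2)?;
        pos += length * 2;
        if length == 0 {
            continue;
        }
        // Always decode as little endian, regardless of the host.
        let wchars: Vec<u16> = raw
            .chunks_exact(2)
            .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
            .collect();
        let text = String::from_utf16(&wchars).ok()?;
        strings.push((first_id.checked_add(i)?, text));
    }
    Some(strings)
}
```

String::from_utf16 takes care of surrogate pairs, and returns an error for unpaired surrogates, which this sketch simply treats as a parse failure.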

Reading the message keys

Presumably, you'll be using a PE parsing library. Start from the .data section. The first bytes are not important to understand. They are part of the C runtime (CRT) initialisation, generally called .CRT$XCA/__xc_a, .CRT$XCU_, and .CRT$XCZ/__xc_z. For MechWarrior 3 or Pirate's Moon, simply skip or read these four (4) u32 values (16 bytes). They should all be zero. For Recoil or Crimson Skies, skip 48 bytes. They are not all zero.

The data that follows are clearly constants defined in the original code. There is a sort of entry table for the message keys, that consists of the absolute memory offset of the message key string (u32), and the corresponding message table entry ID (u32). There is no easy way of knowing when the table has fully been read. I suggest checking if the offset is in the bounds of the .data section, since the string data produces values outside this range when accidentally interpreted as an integer.

Given the memory offset of the start of the .data section, the relative offset of the message key is easy to determine by subtracting the start offset from the absolute offset read previously. Seek to that position, and read the message key until encountering a null byte (\0). All message keys will be ASCII.
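Here is a sketch of this procedure, assuming the raw contents of the .data section are in a byte slice. The function name and parameters are hypothetical: section_vaddr is the absolute address at which the section is loaded (which I assume a PE library reports as image base plus the section's relative virtual address), and skip is the number of CRT bytes to skip (16 or 48, as described above).

```rust
/// Read (message key, entry ID) pairs from the raw `.data` section.
fn read_message_keys(data: &[u8], section_vaddr: u32, skip: usize) -> Vec<(String, u32)> {
    let mut keys = Vec::new();
    let mut pos = skip;
    while let Some(record) = data.get(pos..pos + 8) {
        let absolute = u32::from_le_bytes([record[0], record[1], record[2], record[3]]);
        let entry_id = u32::from_le_bytes([record[4], record[5], record[6], record[7]]);
        // Stop at the first offset that does not point into the section;
        // this is the heuristic for detecting the end of the table.
        let relative = match absolute.checked_sub(section_vaddr) {
            Some(rel) if (rel as usize) < data.len() => rel as usize,
            _ => break,
        };
        // The key is a zero-terminated ASCII string.
        let tail = &data[relative..];
        let end = tail.iter().position(|&b| b == 0).unwrap_or(tail.len());
        keys.push((String::from_utf8_lossy(&tail[..end]).into_owned(), entry_id));
        pos += 8;
    }
    keys
}
```

The bounds check works because the bytes of a key such as MSG_ read as a u32 give a value far outside any plausible section address.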

For manual verification, it's possible to use e.g. rabin2 to extract the strings, filter only the ones beginning with MSG_, and compare that to the result of parsing the .data section.

Texture packages

Texture packages hold textures or images, used throughout the game.

Investigation

I've had to awkwardly name the texture files "packages". They contain several textures/images, but are not archive-based. Most of them are for textures, but textures are simply images mapped to 3D surfaces. Since all textures are images, but not all images are textures, I'll call the data an image, not a texture.

RC, MW, PM, and CS texture packages are read in exactly the same way. The only difference is that in the base game, no package uses global palettes.

File structure

Packages start with a header:

#![allow(unused)]
fn main() {
struct Header {
    unk00: u32, // always 0
    unk04: u32, // always 1
    global_palette_count: i32, // or u32
    image_count: u32, // or i32
    unk16: u32, // always 0
    unk20: u32, // always 0
}
}

Only two fields in the header are useful. The global palette count (i32 or u32) indicates how many global palettes are used. The base game doesn't use them, so this will be zero (0). The expansion does for some packages. It's recommended to read this as an i32, because the per-image global palette index (described below) is signed and uses -1 to signify that no global palette is used. The image count (u32 or i32) is self-explanatory, and should be at least one (1). Next there is a table of contents, with image count entries:

#![allow(unused)]
fn main() {
struct Entry {
    name: [u8; 32], // zero-terminated/padded
    start_offset: u32,
    global_palette_index: i32,
}
}

The name of the image is a 32 byte string; assume ASCII encoding. It is zero-terminated and padded with zeros/nulls. The start offset (u32) is the offset of the image data in the package. This means the image data must be self-describing/self-terminating. The global palette index indicates if/which global palette is used. Images that don't use a global palette have this set to -1; otherwise the index is between 0 (inclusive) and global palette count (exclusive).
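The header and table of contents could be read along these lines. This is only a sketch under the assumption that the whole package is in memory. TocEntry and read_toc are names invented for this example, and the sizes (24 bytes for the header, 40 bytes per entry) follow from the structures above.

```rust
fn read_u32(data: &[u8], pos: usize) -> Option<u32> {
    let bytes = data.get(pos..pos.checked_add(4)?)?;
    Some(u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

#[derive(Debug, PartialEq)]
struct TocEntry {
    name: String,
    start_offset: u32,
    global_palette_index: i32,
}

/// Returns the global palette count and the table of contents.
fn read_toc(data: &[u8]) -> Option<(i32, Vec<TocEntry>)> {
    let global_palette_count = read_u32(data, 8)? as i32;
    let image_count = read_u32(data, 12)? as usize;
    let mut entries = Vec::new();
    for i in 0..image_count {
        let pos = 24 + i * 40;
        let raw = data.get(pos..pos + 32)?;
        // The name ends at the first null byte; the rest is padding.
        let end = raw.iter().position(|&b| b == 0).unwrap_or(raw.len());
        let name = String::from_utf8_lossy(&raw[..end]).into_owned();
        let start_offset = read_u32(data, pos + 32)?;
        let global_palette_index = read_u32(data, pos + 36)? as i32;
        // Either -1 (no global palette), or a valid palette index.
        if global_palette_index != -1
            && !(0..global_palette_count).contains(&global_palette_index)
        {
            return None;
        }
        entries.push(TocEntry { name, start_offset, global_palette_index });
    }
    Some((global_palette_count, entries))
}
```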

If there are any global palettes, they are read next. Global palettes are always 512 bytes long, or 256 * u16 packed colour values in RGB565 format. How to interpret and unpack these values is described a bit later.

#![allow(unused)]
fn main() {
struct GlobalPalette {
    values: [u16; 256],
}
// alternatively
struct GlobalPalette {
    values: [u8; 256 * 2],
}
}

Next, the image data is read in the same order as in the TOC. The data is read contiguously, so the start offset isn't needed. Or, it can be used for verification that the image data has been read completely, since the length of the image data isn't known from the TOC.

Each image starts with a header of information:

#![allow(unused)]
fn main() {
struct ImageInfo {
    flags: ImageFlags,
    width: u16,
    height: u16,
    unk08: u32, // always 0
    palette_count: u16,
    stretch: ImageStretch,
}

enum ImageStretch: u16 {
    None = 0,
    Vertical = 1,
    Horizontal = 2,
    Both = 3,
    /// Crimson Skies only
    Unk4 = 4,
    /// Crimson Skies only
    Unk7 = 7,
    /// Crimson Skies only
    Unk8 = 8,
}

bitflags ImageFlags: u32 {
    ColorDepth = 1 << 0,    // 0x01
    HasAlpha = 1 << 1,      // 0x02
    NoAlpha = 1 << 2,       // 0x04
    FullAlpha = 1 << 3,     // 0x08
    GlobalPalette = 1 << 4, // 0x10
    ImageLoaded = 1 << 5,   // 0x20
    AlphaLoaded = 1 << 6,   // 0x40
    PaletteLoaded = 1 << 7, // 0x80
}
}

First, the flags. The first flag, which is assumed to be related to colour depth, is always set and isn't further important - the colour depth is always 16 bit/2 bytes per pixel.

Next are the alpha channel flags, which are a mess. If "no alpha" is set, then "has alpha" and "full alpha" must not be set. This indicates the image has no alpha channel. If "no alpha" is unset, then "has alpha" must be set. This indicates the image has an alpha channel. If "full alpha" is set, then the alpha channel data is 8 bits/1 byte per pixel; otherwise, the alpha channel/transparency is derived from the colour information and there is no alpha channel data. The exact way the alpha channel is loaded is discussed with the image data.

The global palette flag is set if and only if the entry in the TOC specified a global palette index.

Finally, the last three flags are assumed to be some indication of what data the game engine has loaded. They can be safely ignored for interpreting the image data, but do occur in the files.
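The flag rules can be condensed into a small validation step. The following is a sketch only: the Alpha enum and both function names are my own invention for this example, and the constants simply repeat the bit values from the listing above.

```rust
const HAS_ALPHA: u32 = 1 << 1;
const NO_ALPHA: u32 = 1 << 2;
const FULL_ALPHA: u32 = 1 << 3;
const GLOBAL_PALETTE: u32 = 1 << 4;

#[derive(Debug, PartialEq)]
enum Alpha {
    /// No alpha channel at all.
    None,
    /// Transparency is derived from the colour information.
    Simple,
    /// Alpha channel data with one byte per pixel follows.
    Full,
}

/// Classify the alpha flags; None means the combination breaks the rules.
fn classify_alpha(flags: u32) -> Option<Alpha> {
    let no_alpha = (flags & NO_ALPHA) != 0;
    let has_alpha = (flags & HAS_ALPHA) != 0;
    let full_alpha = (flags & FULL_ALPHA) != 0;
    match (no_alpha, has_alpha, full_alpha) {
        (true, false, false) => Some(Alpha::None),
        (false, true, false) => Some(Alpha::Simple),
        (false, true, true) => Some(Alpha::Full),
        _ => None,
    }
}

/// The global palette flag must agree with the index from the TOC.
fn global_palette_flag_matches(flags: u32, global_palette_index: i32) -> bool {
    ((flags & GLOBAL_PALETTE) != 0) == (global_palette_index != -1)
}
```

The remaining flags are not inspected, so the sketch accepts any values for the colour depth and "loaded" bits.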

The width (u16) and height (u16) are obvious. The next value (u32) is unknown, but always zero (0). The palette count (u16) specifies how many colour values the palette contains. Images that aren't palette-based have this set to zero (0). Importantly, this applies to both global and local palettes. So even though global palettes have enough data for 256 colour values, fewer colours may be used when interpreting image data.

Lastly, the stretch field indicates if an image should be stretched after it has been decoded/before it is displayed. This seems to be used for e.g. environment textures that require more vertical resolution than horizontal resolution, possibly to save space but still have the image be square (I think square textures used to provide a performance benefit for some graphics cards/operations).

Image data

Colour image pixel data (not palette-based)

Colour images are images with a zero palette count. The colour data is read first. It is a bitmap with two (2) bytes per pixel of size width * height (so width * height * 2 bytes in total).

#![allow(unused)]
fn main() {
struct ColorData {
    values: [u16; width * height],
}
// alternatively
struct ColorData {
    values: [u8; width * height * 2],
}
}

Each pixel is 2 bytes/16 bits, and is a packed RGB format known as 565. This was determined by trying out different packed RGB formats and seeing if the colours look correct. The RGB565 format means red has 5 bits, green has 6 bits, and blue has 5 bits of information. This is the layout in memory, where each cell is a byte/u8:

|GGGBBBBB|RRRRRGGG|
|7      0|7      0|

If read as one little-endian u16 (the default on x86), this is the layout:

|BBBBBGGG GGGRRRRR|
|^    ^      ^   ^|
|0    5 7 8  11 15|

While it is important to know the bit patterns, there's a temptation to extract the individual colour values. But in my experience, this isn't a good approach. Let's take a minute to think about how to map an RGB565 encoded pixel to the standard RGB888 encoding (where each colour value occupies 1 byte). Simply shifting a 5 or 6 bit value doesn't produce full brightness:

#![allow(unused)]
fn main() {
let value = (0b11111 << 3);
value == 0b11111000; // => true
value < 0b11111111; // => true
}

So simple shifting produces a darker than usual image. Instead, the values have to be interpolated. I don't know enough about computer graphics to say if it is important to apply gamma correction when mapping from RGB565 to RGB888, so I've assumed the assets are stored in linear RGB and therefore linear interpolation is correct.

5 bit values range from 0 to 31 (inclusive), 6 bit values range from 0 to 63 (inclusive), and 8 bit values range from 0 to 255 (inclusive). This means that the values can be mapped to a floating point value in the range of 0.0 to 1.0 (inclusive) by dividing by the maximum (either 31 or 63), and then the floating point value can be mapped to the 8 bit range by multiplying by the maximum (255). For floating point accuracy reasons, I believe it's best to multiply first, and then divide. The result should be the same.

Finally, the floating point value must be converted to an integer. Rounding should be considered, as converting to an integer often simply truncates the fractional/decimal part. But rounding is also complicated, and there are several strategies like banker's rounding/rounding half to even. Given the input is limited in precision, I've simply chosen to round half up, with a nice trick: adding 0.5 to a (positive) floating point value before truncating rounds it to the nearest integer, with halves rounded up.

With this, it's easy to build a lookup table to map any RGB565 colour value to an RGB888 value, which is much faster than doing this conversion for each pixel. A Rust implementation could look like this:

#![allow(unused)]
fn main() {
let rgb888: Vec<u32> = (u16::MIN..=u16::MAX)
    .map(|rgb565| {
        let red_bits = (rgb565 >> 11) & 0b11111;
        assert!(red_bits <= 31, "r5 {:#b}", red_bits);
        let red_lerp = ((red_bits as f64) * 255.0 / 31.0 + 0.5) as u32;
        assert!(red_lerp < 256, "r8 {:#b}", red_lerp);

        let green_bits = (rgb565 >> 5) & 0b111111;
        assert!(green_bits <= 63, "g6 {:#b}", green_bits);
        let green_lerp = ((green_bits as f64) * 255.0 / 63.0 + 0.5) as u32;
        assert!(green_lerp < 256, "g8 {:#b}", green_lerp);

        let blue_bits = (rgb565 >> 0) & 0b11111;
        assert!(blue_bits <= 31, "b5 {:#b}", blue_bits);
        let blue_lerp = ((blue_bits as f64) * 255.0 / 31.0 + 0.5) as u32;
        assert!(blue_lerp < 256, "b8 {:#b}", blue_lerp);

        (red_lerp << 16) | (green_lerp << 8) | (blue_lerp << 0)
    })
    .collect();

// black
assert_eq!(rgb888[0b0000000000000000], 0x000000);
// white
assert_eq!(rgb888[0b1111111111111111], 0xFFFFFF);
// red
assert_eq!(rgb888[0b1111100000000000], 0xFF0000);
// green
assert_eq!(rgb888[0b0000011111100000], 0x00FF00);
// blue
assert_eq!(rgb888[0b0000000000011111], 0x0000FF);
// green + blue
assert_eq!(rgb888[0b0000011111111111], 0x00FFFF);
// red + blue
assert_eq!(rgb888[0b1111100000011111], 0xFF00FF);
// red + green
assert_eq!(rgb888[0b1111111111100000], 0xFFFF00);
}

The same approach can be used for decoding colour image data and palette colour data.
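For example, raw colour data could be decoded as follows. This sketch converts each pixel directly instead of going through a lookup table, purely to keep the example short. The function names are hypothetical, and the rounding matches the interpolation described above.

```rust
/// Convert one RGB565 value to [red, green, blue] with 8 bits each.
fn rgb565_to_rgb888(rgb565: u16) -> [u8; 3] {
    // Multiply first, then divide, then round half up by adding 0.5.
    let lerp = |bits: u16, max: f64| (f64::from(bits) * 255.0 / max + 0.5) as u8;
    [
        lerp((rgb565 >> 11) & 0b11111, 31.0),
        lerp((rgb565 >> 5) & 0b111111, 63.0),
        lerp(rgb565 & 0b11111, 31.0),
    ]
}

/// Decode raw colour (or palette) data, two little-endian bytes per value.
fn decode_colour_data(raw: &[u8]) -> Vec<[u8; 3]> {
    raw.chunks_exact(2)
        .map(|pair| rgb565_to_rgb888(u16::from_le_bytes([pair[0], pair[1]])))
        .collect()
}
```

Reading the bytes with u16::from_le_bytes keeps the result independent of the host's endianness.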

Colour image simple alpha

The alpha channel for a colour image with simple alpha (so not full alpha) is derived from the colour data. A completely black pixel (0x0000) is 0% opaque/100% transparent (usually 0), any other colour is 100% opaque/0% transparent (usually 255).
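As a small sketch, assuming the colour data has already been read as u16 values (the function name is made up for this example):

```rust
/// Derive a simple alpha channel: black is transparent, everything else opaque.
fn simple_alpha(pixels: &[u16]) -> Vec<u8> {
    pixels
        .iter()
        .map(|&pixel| if pixel == 0x0000 { 0 } else { 255 })
        .collect()
}
```

Note the comparison is on the raw RGB565 value, before any conversion to RGB888.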

Colour image full alpha data

For an image with full alpha, the alpha channel data is read after the image data. It is a bitmap with one (1) byte per pixel of size width * height (so width * height bytes in total).

#![allow(unused)]
fn main() {
struct FullAlphaData {
    values: [u8; width * height],
}
}

The values range from 0, which is 0% opaque/100% transparent, to 255, which is 100% opaque/0% transparent.

Palette-based image pixel data

Palette-based images are images with a greater-than zero palette count. This means the image data is an array of palette indices, that are then mapped to colours via the palette. Palette-based images can either use a predefined global palette, or a palette specific to the image (local palette).

The palette index data is read first. It is a bitmap with one (1) byte per pixel of size width * height (so width * height bytes in total).

#![allow(unused)]
fn main() {
struct PaletteIndexData {
    values: [u8; width * height],
}
}

I'll shortly discuss how to map this palette index data to colour data.

Palette-based image simple alpha

It currently isn't known how to derive a simple alpha channel for palette-based images. This is due to a lack of interest. Since the palette-based images are more limited in colour due to palette quantisation (a maximum of 256 distinct colours), there is little reason to use them on modern PCs. Consequently, it hasn't been investigated. A common strategy for simple transparency in other palette-based image formats is to designate one index as transparent (e.g. likely the first, possibly the last, but some allow any index to be the transparent one).

Palette-based image full alpha data

This is exactly like the colour image. It is a bitmap with one (1) byte per pixel of size width * height (so width * height bytes in total).

Palette-based image palette colour data

If the image isn't using a global palette, the palette colour data is read after the palette index data and full alpha data (if any).

#![allow(unused)]
fn main() {
struct LocalPalette {
    values: [u16; palette_count],
}
// alternatively
struct LocalPalette {
    values: [u8; palette_count * 2],
}
}

Just like colour image data and global palette data, these are RGB565 format colour values.

If the image is using a global palette, then that must be restricted to the number of colour values indicated by palette count.
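Mapping the palette indices to colour values can be sketched like this. The function name is hypothetical, and I've assumed that an index at or beyond palette count should be treated as an error, which the files may or may not guarantee. The same function works for local palettes (which already have palette count values) and for global palettes (which have 256).

```rust
/// Map palette indices to RGB565 values, using only the first
/// `palette_count` colours of the palette.
fn apply_palette(indices: &[u8], palette: &[u16], palette_count: usize) -> Option<Vec<u16>> {
    let palette = palette.get(..palette_count)?;
    indices
        .iter()
        .map(|&index| palette.get(usize::from(index)).copied())
        .collect()
}
```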

Using palette-based image data

There are several options here. Some image formats support palette-based images. However, few support palette-based colour channels and a full alpha channel. For preservation, the best strategy might be to output the image data as a palette PNG and the alpha data as a grey scale PNG. Alternatively, it's obviously possible to map each pixel to RGB888 via the palette, and optionally store the palette separately.

Recap

  • All images have an image header
  • For colour images:
    • Read the image data
    • Read the full alpha channel (if the image has one)
  • For palette-based images:
    • Read the palette index data
    • Read the full alpha channel (if the image has one)
    • Read the local palette colour data (if not using a global palette)
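The recap also gives the number of bytes that follow each image header, which is handy for checking the result against the next start offset in the TOC. A sketch, with a hypothetical function name, and parameters taken from the image header and flags:

```rust
/// Number of data bytes after the 16 byte image header.
fn image_data_len(
    width: u16,
    height: u16,
    palette_count: u16,
    full_alpha: bool,
    global_palette: bool,
) -> usize {
    let pixels = usize::from(width) * usize::from(height);
    let alpha = if full_alpha { pixels } else { 0 };
    if palette_count == 0 {
        // Colour image: two bytes per pixel, then optional full alpha.
        pixels * 2 + alpha
    } else {
        // Palette-based image: one index byte per pixel, optional full
        // alpha, and a local palette unless a global palette is used.
        let local_palette = if global_palette {
            0
        } else {
            usize::from(palette_count) * 2
        };
        pixels + alpha + local_palette
    }
}
```

If the start offset of an image plus 16 plus this length doesn't equal the start offset of the next image, something was misread.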

In-game use

Textures and images are used basically everywhere.

Mechlib archives

Mechlib archives hold detailed and low-resolution 'mech models, 'mech cockpit models, and mechlib model data.

Investigation (MW3)

Mechlib archives are archive files. They contain three unique files: format, version, and materials. All other files are models with the ending .flt.

Format and version

Both of these files are four (4) bytes long, and can be read as either a u32 or i32. The format value is always one (1). The version value is 27 for the base game.

Materials

The materials file is very similar to but slightly different than materials information in GameZ files.

The difference in the Mechlib is that the texture_ident field is a pointer, not the index. In the GameZ file, since the texture names are written first, the field holds the texture index, which is then replaced with a pointer to the texture. In the mechlib archive, this is the raw pointer value, since the texture name is written after the structure (discussed shortly).

So, in the material file, the number of materials in the file (count, u32 or i32) comes first. Next, count materials are read. Additionally, if the material is textured, a variable string that is the texture name follows the material structure immediately:

#![allow(unused)]
fn main() {
struct MaterialName {
    length: u32,
    name: [u8; length], // not zero-terminated
}
}

Assume this is ASCII. There is no zero-termination, so if this is required, allocate length + 1 bytes.
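Reading such a variable string could look like the sketch below. The function name is hypothetical, and it returns the position after the string so the caller can continue with the next material.

```rust
/// Read a length-prefixed, non-terminated ASCII string at `pos`.
/// Returns the string and the position of the data following it.
fn read_material_name(data: &[u8], pos: usize) -> Option<(String, usize)> {
    let len_bytes = data.get(pos..pos.checked_add(4)?)?;
    let length =
        u32::from_le_bytes([len_bytes[0], len_bytes[1], len_bytes[2], len_bytes[3]]) as usize;
    let start = pos + 4;
    let raw = data.get(start..start.checked_add(length)?)?;
    if !raw.is_ascii() {
        return None;
    }
    Some((String::from_utf8_lossy(raw).into_owned(), start + length))
}
```

Rust strings carry their own length, so no extra byte for a terminator is needed here.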

Textured materials

Textured materials are the same as GameZ, with the following exceptions:

  • Mechlib materials cannot be cycled, so the Cycled flag (0x04) is never set, and the cycle pointer field is always zero (0)/null.
  • As described, the texture_ident field is not an index, but a pointer. The pointer value is - as always - garbage from when the memory was dumped.
  • The terrain/soil type is always Default (0).

Coloured materials

Untextured or coloured materials are the same as GameZ.

Model files

Like the materials file, model files are very similar to but slightly different than models in gamez.zbd files. Model files are also quite complex.

First, some background. MechWarrior 3 uses so-called "nodes" to represent information in the engine. There are hints to this in the reader files and interpreter scripts. In mechlib.zbd, the only allowed node type is a 3D object node. GameZ files can contain other nodes.

I describe all nodes separately, since the structures are rather large. As a quick refresher, all nodes share a base structure, and then have node type specific data.

I also describe mesh data structures in GameZ, since they are largely the same.

Investigation (PM)

The expansion files are similar to the base game, however many data structures around the nodes have changed.

Format and version

Both of these files are four (4) bytes long, and can be read as either a u32 or i32. The format value is always one (1). The version value is 41 for the expansion.

Materials

Materials are read exactly the same as the base game.

Model files

Model files are read the same way as the base game, but many data structures are different. Note that while in the base game, only 3D object nodes are allowed, in the expansion both 3D object nodes and LOD (level of detail) nodes are present.

I describe all nodes separately, since the structures are rather large and shared with GameZ files.

I also describe mesh data structures in GameZ, since they are largely the same.

In-game use

These models are used in-game and in the mechlab screen.

GameZ files

GameZ files hold the game's world assets (except for 'mech models).

Investigation (MW3)

GameZ files begin with a header, which is a mish-mash of information:

#![allow(unused)]
fn main() {
struct HeaderMw {
    signature: u32, // always 0x02971222
    version: u32, // always 27
    texture_count: u32,
    textures_offset: u32,
    materials_offset: u32,
    meshes_offset: u32,
    node_array_size: u32,
    node_count: u32,
    nodes_offset: u32,
}
}

The signature (u32) is the magic number 0x02971222. The version (u32) is always 27, which matches the mechlib archives version.

The other values are used for accessing the four big blocks of information: textures, materials, meshes, and nodes. This is also not so different from the mechlib archives, although there are significant differences in the way the data is read/written. It isn't known why this is. The offsets aren't strictly necessary for parsing, since the data is written without padding, and so can be used for verifying the different parsing stages were successful/parsed all the information.
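A sketch of reading and checking this header, assuming the file is in memory. The function name read_header_mw is made up, and rejecting any other signature or version is my own choice of sanity check for base game files.

```rust
#[derive(Debug, PartialEq)]
struct HeaderMw {
    texture_count: u32,
    textures_offset: u32,
    materials_offset: u32,
    meshes_offset: u32,
    node_array_size: u32,
    node_count: u32,
    nodes_offset: u32,
}

/// Read the 36 byte header, verifying the signature and version.
fn read_header_mw(data: &[u8]) -> Option<HeaderMw> {
    let mut fields = [0u32; 9];
    for (i, field) in fields.iter_mut().enumerate() {
        let bytes = data.get(i * 4..i * 4 + 4)?;
        *field = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    }
    if fields[0] != 0x02971222 || fields[1] != 27 {
        return None;
    }
    Some(HeaderMw {
        texture_count: fields[2],
        textures_offset: fields[3],
        materials_offset: fields[4],
        meshes_offset: fields[5],
        node_array_size: fields[6],
        node_count: fields[7],
        nodes_offset: fields[8],
    })
}
```

After each block has been parsed, the current position can be compared with the next offset from the header to verify nothing was skipped.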

Textures

Reading the texture infos uses the texture count from the header. Expect this to be less than 4096 textures for sanity checking (if desired). There is no header, instead simply read texture count texture information structures:

#![allow(unused)]
fn main() {
struct TextureInfo {
    unk00: u32, // always 0
    unk04: u32, // always 0
    texture: [u8; 20], // suffixed
    usage: TextureUsage, // always Used (2)
    index: u32, // always 0
    unk36: i32, // always -1
}

enum TextureUsage: u32 {
    Unused = 0,
    Unknown1 = 1,
    Used = 2,
    Unknown3 = 3,
}

type TextureInfos = [TextureInfo; texture_count];
}

As with many structures, this seems to be a memory dump of an in-engine structure. So most of these fields are unimportant for simply reading the game data.

The only important field is the texture name, which is interesting to parse. Assume ASCII encoding. Firstly, it is shorter than most other fixed-length strings in game data (20 bytes, instead of the usual 32 bytes).

Secondly, it is suffixed. Basically, the name will be texture\0tif\0\0, that is the name of the texture/image as it appears in the texture packages, followed by a null byte, followed by the suffix/file extension tif (usually), finally padded with more null bytes until the length of 20 bytes. So it seems like the assets were Tag Image File Format (TIFF) images, and then the GameZ generation code didn't strip the file extension, but simply overwrote the period of the file extension with a null byte.

For code that only wants to read the texture name, this doesn't matter. Simply read until the first null byte and discard the rest. For code that wishes to e.g. round-trip this information in a binary-accurate way, it's more complicated. In every case, there will be an initial null byte. The suffix and further padding may be cut off by the 20 byte limit. Any padding after the suffix will also be only more null bytes. So restoring the period and therefore the file extension is a feasible approach.
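To illustrate, here is a sketch that splits the 20 byte field into the texture name and the (possibly cut off) suffix. The function name is hypothetical. Code that only needs the name can ignore the second value.

```rust
/// Split a suffixed texture name field into (name, suffix).
/// The suffix is empty if it was cut off by the 20 byte limit.
fn split_texture_name(raw: &[u8; 20]) -> (String, String) {
    // The name runs until the first null byte.
    let name_end = raw.iter().position(|&b| b == 0).unwrap_or(raw.len());
    let name = String::from_utf8_lossy(&raw[..name_end]).into_owned();
    // The suffix follows that null byte, and runs until the next one.
    let rest = raw.get(name_end + 1..).unwrap_or(&[]);
    let suffix_end = rest.iter().position(|&b| b == 0).unwrap_or(rest.len());
    let suffix = String::from_utf8_lossy(&rest[..suffix_end]).into_owned();
    (name, suffix)
}
```

Joining the two parts with a period then gives a file name such as texture.tif, which is one way of keeping the information needed for round-tripping.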

Not much else is known about the other fields. unk00 (u32?) is always zero (0), and could've been a pointer. unk04 (u32?) is always zero (0). I'm told it could cause the engine to execute additional dynamic code on loading. The usage field (u32?) seems to track whether the texture is still in use by the engine, or can be removed from memory. It will always be two (2) in the file, which corresponds to "Used". The index field (u32 or i32) tracks the texture's index in the global texture array. It will always be zero (0) in the file, since no index has been assigned until it is loaded. unk36 (i32) is always negative one (-1).

Materials

Materials header

The materials block does have a header:

#![allow(unused)]
fn main() {
struct MaterialHeader {
    array_size: i32, // always >= 0, <= 0xFFFF
    count: i32, // always >= 0, <= array_size
    index_max: i32, // always == count
    unk12: i32, // always -1
}
}

The field unk12 is unknown, but is always negative one (-1).

The other fields are interdependent. The material array size indicates how big the material array for this world is expected to be in the worst case. This allows the engine to allocate more or less memory depending on the world. Expect this to be zero (0) or greater, and no greater than 65535/0xFFFF. Next is the actual count of materials in the file. Naturally, this must be zero (0) or greater, and no greater than the array size. Finally, there is the maximum index or next index, which is used to track which index to use for any further materials. This will always be the same as the material count, since they are loaded at once, producing contiguous indices. Shortly, we'll see that the material indices are i16 values. It's unclear why the values in the header are aligned to 32 bits/4 bytes. This is why I've indicated them to be read as i32, with additional bounds checking. Per C structure packing rules, you'd expect that if they were i16, the header would be smaller/more tightly packed.
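The bounds checking could be sketched as follows, assuming data starts at the materials offset. The function and method names are invented for this example.

```rust
#[derive(Debug, PartialEq)]
struct MaterialHeader {
    array_size: i32,
    count: i32,
    index_max: i32,
    unk12: i32,
}

/// Read the 16 byte materials header.
fn read_material_header(data: &[u8]) -> Option<MaterialHeader> {
    let mut fields = [0i32; 4];
    for (i, field) in fields.iter_mut().enumerate() {
        let bytes = data.get(i * 4..i * 4 + 4)?;
        *field = i32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    }
    Some(MaterialHeader {
        array_size: fields[0],
        count: fields[1],
        index_max: fields[2],
        unk12: fields[3],
    })
}

impl MaterialHeader {
    /// Check the interdependent fields against the observed constraints.
    fn is_valid(&self) -> bool {
        (0..=0xFFFF).contains(&self.array_size)
            && (0..=self.array_size).contains(&self.count)
            && self.index_max == self.count
            && self.unk12 == -1
    }
}
```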

Materials are read in three phases. The valid materials first, then zeroed-out materials, and then material cycle data.

Materials information

Next, count materials are read. Each material has a main structure, which is the same structure as the Mechlib materials, but is read and interpreted slightly differently. Unlike the Mechlib materials, material indices are also read. First, the structures:

#![allow(unused)]
fn main() {
struct Material {
    alpha: u8,
    flags: MaterialFlags,
    rgb: u16,
    red: f32,
    green: f32,
    blue: f32,
    texture_ident: u32,
    unk20: f32, // always 0.0
    unk24: f32, // always 0.5
    unk28: f32, // always 0.5
    soil: u32,
    cycle_ptr: u32,
}

bitflags MaterialFlags: u8 {
    Textured = 1 << 0, // 0x01
    Unknown = 1 << 1,  // 0x02
    Cycled = 1 << 2,   // 0x04
    Always = 1 << 3,   // 0x08
    Never = 1 << 4,    // 0x10
}

struct MaterialIndices {
    index1: i16,
    index2: i16,
}
}

First, read the material information. Then read the material indices. Repeat until count materials have been read.

A lot isn't known about the material information. It seems to be a dump of an in-game structure, as it contains what seem to be pointers. Some fields are always set to the same value. The unk20 field is always 0.0, the unk24 and unk28 fields are always 0.5.

The Always flag (0x08) is always set, the Never flag (0x10) is never set. The most important flag is the Textured flag. This indicates whether the material has a texture or not.

Terrain/soil type

The terrain/soil index indicates how polygons that use the material are classified and behave in the engine.

In Recoil, the following types are hard-coded in the executable:

[
    "default",      # 0
    "water",        # 1
    "seafloor",     # 2
    "quicksand",    # 3
    "lava",         # 4
    "fire",         # 5
]

The range of values is 0..5, although 2/seafloor does not seem to be used.

These types are also hard-coded in MechWarrior 3, but the range of values is 0..13. In the soils.zrd file, the following types are defined:

[
    "dirt",         # 6
    "mud",          # 7
    "grass",        # 8
    "concrete",     # 9
    "snow",         # 10
    "mech",         # 11
    "silt",         # 12
    "noslip",       # 13
]

As indicated, these seem to be concatenated with the hard-coded list. The value 11/mech does not seem to be used in GameZ (or the Mechlib).
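Assuming the two lists really are concatenated as described, a soil index could be resolved to a name for MechWarrior 3 roughly like this. The constant and function names are hypothetical:

```rust
/// Soil types hard-coded in the executable (indices 0 to 5).
const HARD_CODED_SOILS: [&str; 6] = ["default", "water", "seafloor", "quicksand", "lava", "fire"];

/// Soil types from the MechWarrior 3 soils.zrd file (indices 6 to 13).
const ZRD_SOILS_MW3: [&str; 8] = [
    "dirt", "mud", "grass", "concrete", "snow", "mech", "silt", "noslip",
];

/// Look up a soil name, treating the two lists as one concatenated list.
fn soil_name(index: u32) -> Option<&'static str> {
    HARD_CODED_SOILS
        .iter()
        .chain(ZRD_SOILS_MW3.iter())
        .nth(index as usize)
        .copied()
}
```

Out-of-range values yield None, which is a convenient place to reject a malformed material.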

For Crimson Skies, soils.zrd is also different.

Textured materials

#![allow(unused)]
fn main() {
struct Material {
    alpha: u8, // always 0xFF/255
    // always: 0x01/Textured
    // variable: 0x02/Unknown
    // variable: 0x04/Cycled
    // always: 0x08/Always (except for RC)
    // never: 0x10/Never
    flags: MaterialFlags,
    rgb: u16, // always 0x7FFF/32767
    red: f32, // always 255.0
    green: f32, // always 255.0
    blue: f32, // always 255.0
    texture_ident: u32,
    unk20: f32, // always 0.0
    unk24: f32, // always 0.5
    unk28: f32, // always 0.5
    soil: u32, // 0..13
    cycle_ptr: u32,
}
}

Textured materials always have alpha set to 255/0xFF, since textures can include their own alpha data. The rgb field is set to 32767/0x7FFF, and the red, green, and blue fields are set to 255.0 (which is white). The unknown flag may or may not be set.

Textured materials can have the cycled flag set, which indicates that the material has multiple textures that are cycled through, creating an animated effect. Note that Mechlib textured materials cannot be cycled. If this flag is set, the cycle pointer should be non-zero/non-null. If the flag is unset, the cycle pointer field is always zero (0)/null.

In the GameZ file, the texture_ident field is an index to the texture info. This index must be less than the texture count.

Coloured materials

#![allow(unused)]
fn main() {
struct Material {
    alpha: u8,
    // never: 0x01/Textured
    // never: 0x02/Unknown
    // never: 0x04/Cycled
    // always: 0x08/Always (except for RC)
    // never: 0x10/Never
    flags: MaterialFlags,
    rgb: u16, // always 0x0000/0
    red: f32,
    green: f32,
    blue: f32,
    texture_ident: u32, // always 0
    unk20: f32, // always 0.0
    unk24: f32, // always 0.5
    unk28: f32, // always 0.5
    soil: u32, // 0..13
    cycle_ptr: u32, // always 0
}
}

Untextured or coloured materials always have no flags set except for the "Always" flag (0x08).

The rgb field is always zero (0/0x0000). This deserves a bit of discussion. Textures use a packed colour value format known as RGB565, and textured materials have their colour set to white. For textured materials, rgb is set to 0x7FFF, which corresponds to white in the RGB555 format. So I have assumed this field was intended to be used as a packed colour, but for some reason wasn't used.
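To make the RGB555 observation concrete, here is a small sketch that unpacks a 16-bit value into three 5-bit channels. The channel order (red in the highest bits) is the conventional one and is an assumption here; for white and black it makes no difference. The function name is hypothetical:

```rust
/// Unpack an RGB555 value into (red, green, blue), each in the range 0..=31.
/// The most significant bit is ignored.
fn unpack_rgb555(value: u16) -> (u8, u8, u8) {
    let red = ((value >> 10) & 0x1F) as u8;
    let green = ((value >> 5) & 0x1F) as u8;
    let blue = (value & 0x1F) as u8;
    (red, green, blue)
}
```

With this, 0x7FFF unpacks to the maximum value in every channel, i.e. white, and 0x0000 to black.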

The red, green, and blue fields indicate the colour of the material, in a range of 0.0 .. 255.0.

The texture_ident field is always 0. Since the Cycled flag (0x04) is never set, the cycle pointer is always zero (0)/null.

Material indices

The expected indices can be calculated from the material index when reading. Say index is the value from 0 to count when reading the materials. The expected value for index1 and index2 are:

#![allow(unused)]
fn main() {
let mut expected_index1 = index + 1;
if expected_index1 >= count {
    expected_index1 = -1;
}
let mut expected_index2 = index - 1;
if expected_index2 < 0 {
    expected_index2 = -1;
}
}

So basically, index1 is the next index, and index2 is the previous. It seems like these are used for bookkeeping. Since they are so easy to calculate, discarding them is fine.

Zeroed-out materials

If there is a difference between the material count and the array size, then there will be array size - count zeroed-out material structures. This means all bytes/fields will be zero. You can basically loop from count to array size, and this is in fact advisable since the material indices will not be zeroed out. In fact, they will be the reverse of the filled in materials:

#![allow(unused)]
fn main() {
let mut expected_index1 = index - 1;
if expected_index1 < count {
    expected_index1 = -1;
}
let mut expected_index2 = index + 1;
if expected_index2 >= array_size {
    expected_index2 = -1;
}
}

This especially indicates these files are just dumps of in-engine data, if the (assumed) raw pointer values weren't enough evidence. It really does seem like this is just a dump of some internal array, since there is really no reason to write these zeroed-out structures (they contain no real information, so space could have been saved here).
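The two index rules (filled-in and zeroed-out materials) can be folded into one helper that covers the whole array. This is only a sketch: the function name is hypothetical, and the values are widened to i32 for convenience even though they are stored as i16:

```rust
/// Expected (index1, index2) for the material at `index`.
/// Filled-in materials (index < count) link next/previous,
/// zeroed-out materials (count <= index < array_size) link previous/next.
fn expected_indices(index: i32, count: i32, array_size: i32) -> (i32, i32) {
    if index < count {
        let next = if index + 1 >= count { -1 } else { index + 1 };
        let prev = if index - 1 < 0 { -1 } else { index - 1 };
        (next, prev)
    } else {
        let prev = if index - 1 < count { -1 } else { index - 1 };
        let next = if index + 1 >= array_size { -1 } else { index + 1 };
        (prev, next)
    }
}
```

For example, with count 3 and array size 5, the last filled-in material (index 2) is expected to carry (-1, 1), and the first zeroed-out one (index 3) is expected to carry (-1, 4).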

Material cycle data

Finally, after the materials information and the zeroed-out materials, the material cycle data is read. This is basically in-order, so loop through all the previously read non-zeroed-out materials, and if they have the cycled flag set/cycle pointer non-null, read the cycle information:

#![allow(unused)]
fn main() {
struct CycleInfo {
    unk00: u32, // always 0 or 1 (boolean)
    unk04: u32,
    unk08: u32, // always 0
    unk12: f32, // always >= 2.0 and <= 16.0
    count1: u32,
    count2: u32, // always == count1
    data_ptr: u32, // always != 0
}
}

Not much is known about this structure, again it is probably used for keeping track of the material's cycle data. unk00 is always zero (0) or one (1), so a Boolean. unk04 is variable. unk08 is always zero (0). unk12 is a floating point value always greater or equal to 2.0, and less than or equal to 16.0. The two count values are always equal, and indicate the cycle length/number of textures in the cycle. Finally, the pointer is always non-zero, presumably this pointed to a block of memory that held the texture indices or pointers for the cycle, which are read next.

The important piece of information is the cycle count. Read this many u32 after the cycle information, which are the cycle's texture indices, basically:

#![allow(unused)]
fn main() {
struct CycleTextures {
    texture_index: [u32; count1],
}
}

Again, all of these values should be less than the total texture count. As far as I can see, the texture index (texture_ident) from the materials information isn't used for cycled textures, instead it's only these.
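Reading one cycle could look roughly like the sketch below. It assumes the cycle information is 28 bytes (seven 32-bit fields) with count1 at offset 16 and count2 at offset 20, as follows from the field list. The function names are hypothetical:

```rust
/// Size of the cycle information structure in bytes (seven 32-bit fields).
const CYCLE_INFO_SIZE: usize = 28;

/// Read a little-endian u32 at the given offset, or None if out of bounds.
fn read_u32(buf: &[u8], offset: usize) -> Option<u32> {
    let bytes = buf.get(offset..offset + 4)?;
    Some(u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

/// Read one cycle (information plus texture indices) from the start of `buf`.
/// Returns the texture indices and the number of bytes consumed, or None if
/// the data is truncated or violates the observed invariants.
fn read_cycle(buf: &[u8], texture_count: u32) -> Option<(Vec<u32>, usize)> {
    let count1 = read_u32(buf, 16)?;
    let count2 = read_u32(buf, 20)?;
    if count1 != count2 {
        return None;
    }
    let mut indices = Vec::new();
    let mut offset = CYCLE_INFO_SIZE;
    for _ in 0..count1 {
        let index = read_u32(buf, offset)?;
        if index >= texture_count {
            return None;
        }
        indices.push(index);
        offset += 4;
    }
    Some((indices, offset))
}
```

The returned byte count tells the caller where the next material's cycle data starts.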

Meshes

From the main header, meshes_offset gives the offset to the meshes header, which looks like this:

#![allow(unused)]
fn main() {
struct MeshesHeader {
    array_size: i32, // always >= 0, <= 0xFFFF
    count: i32, // always >= 0, <= array_size
    index_max: i32, // always == count
}
}

This is very similar to the materials header. The fields are interdependent. The mesh array size indicates how big the mesh array for this world is expected to be in the worst case. Expect this to be zero (0) or greater, and at most 65535/0xFFFF. Next is the actual count of meshes in the file. Naturally, this must be zero (0) or greater, and less than or equal to the array size. Finally, there is the maximum index or next index, which is used to track which index to use for any further meshes. This will always be the same as the mesh count.
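The bounds in the structure comments can be checked while reading. The following is a minimal sketch that parses the 12 header bytes and applies those checks. The function name and the error strings are hypothetical:

```rust
#[derive(Debug, PartialEq)]
struct MeshesHeader {
    array_size: i32,
    count: i32,
    index_max: i32,
}

/// Parse and validate the meshes header from 12 raw bytes.
fn parse_meshes_header(buf: &[u8; 12]) -> Result<MeshesHeader, String> {
    // All fields are little-endian i32 values.
    let field = |i: usize| i32::from_le_bytes([buf[i], buf[i + 1], buf[i + 2], buf[i + 3]]);
    let header = MeshesHeader {
        array_size: field(0),
        count: field(4),
        index_max: field(8),
    };
    if !(0..=0xFFFF).contains(&header.array_size) {
        return Err(format!("array size {} out of range", header.array_size));
    }
    if !(0..=header.array_size).contains(&header.count) {
        return Err(format!("count {} out of range", header.count));
    }
    if header.index_max != header.count {
        return Err(format!("index max {} != count {}", header.index_max, header.count));
    }
    Ok(header)
}
```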

Meshes are read in three phases. The valid mesh headers or mesh information first, then zeroed-out mesh headers/information, and then mesh data.

Mesh information

The mesh information is a large structure of 92 bytes:

#![allow(unused)]
fn main() {
struct MeshInfoMw {
    unk00: u32, // always 0 or 1 (bool)
    unk04: u32, // always 0 or 1
    unk08: u32,
    parent_count: u32,  // 12, always > 0
    polygon_count: u32, // 16
    vertex_count: u32,  // 20
    normal_count: u32,  // 24
    morph_count: u32,   // 28
    light_count: u32,   // 32
    unk36: u32, // always 0
    unk40: f32,
    unk44: f32,
    unk48: u32, // always 0
    polygons_ptr: u32, // 52
    vertices_ptr: u32, // 56
    normals_ptr: u32,  // 60
    lights_ptr: u32,   // 64
    morphs_ptr: u32,   // 68
    unk72: f32,
    unk76: f32,
    unk80: f32,
    unk84: f32,
    unk88: u32, // always 0
}

type MeshOffset = u32; // or i32
type MeshIndex = i32;
type MeshInfosMw = [(MeshInfoMw, MeshOffset); count];
type ZeroInfosMw = [(MeshInfoMw, MeshIndex); (array_size - count)];
}

The most important piece of information is the polygon count. If this is zero (0), then the vertex count, normal count, and morph count will all be zero (0). Note that the counts can also be zero if the polygon count is non-zero. You might expect the light count to also be zero, and this would make sense, but is not true in at least one case.

Pointers will be zero/null if the corresponding count is zero (0), and will be non-zero/non-null if the corresponding count is positive.

The fields unk00 and unk04 will always be zero (0) or one (1). In Pirate's Moon, unk04 can also be two (2), so it's assumed this is not a boolean.

The parent count will always be greater than zero. The fields unk36, unk48, and unk88 will always be zero (0). The other fields are unknown.

The mechlib archive has a similar data structure, which does not include the final member. The mesh data offset (MeshOffset above) indicates the absolute offset of the mesh data in the GameZ file. Since the mesh data is written in order, each mesh data offset must be greater than the previous one (or, for the first mesh, come after all the mesh information and zeroed-out mesh information), and less than the offset of the next block (the nodes).

As an aside, internally this is probably used as the next mesh index, just like the materials did.

Zeroed-out mesh information

If there is a difference between the meshes count and the array size, then there will be array size - count zeroed-out mesh information structures. This means all bytes/fields will be zero. You can basically loop from count to array size, and this is in fact advisable since in this case, the mesh data offset is instead the mesh index. The mesh index wants to be loaded as an i32, not a u32 as might be more useful for the mesh data offset:

#![allow(unused)]
fn main() {
let mut expected_index: i32 = index + 1;
if expected_index >= array_size {
    expected_index = -1;
}
}

Mesh data

Next, the mesh data is read for any filled in mesh information (not zeroed-out). The offset of the start of this data should match the previously read mesh data offset, but can be read sequentially without seeking.

Reading the mesh data is dynamic, based on the counts:

  • Read vertex count vertices (where each is a vector of three f32)
  • Read normal count normals (where each is a vector of three f32)
  • Read morph count morphs(?) (where each is a vector of three f32)
  • Read the lights
  • Read the polygons

#![allow(unused)]
fn main() {
struct Vec3 {
    x: f32,
    y: f32,
    z: f32,
}

struct Vertices {
    vertices: [Vec3; vertex_count],
}

struct Normals {
    normals: [Vec3; normal_count],
}

struct Morphs {
    morphs: [Vec3; morph_count],
}
}

Light information and data

The light information is largely unexplored and read in two phases. First, light count light information structures are read, each of 76 bytes in size:

#![allow(unused)]
fn main() {
struct LightInfoMw {
    unk00: u32,
    unk04: u32,
    unk08: u32,
    extra_count: u32,
    unk16: u32,
    unk20: u32,
    unk24: u32,
    unk28: f32,
    unk32: f32,
    unk36: f32,
    unk40: f32,
    ptr: u32,
    unk48: f32,
    unk52: f32,
    unk56: f32,
    unk60: f32,
    unk64: f32,
    unk68: f32,
    unk72: f32,
}

// probably good to combine lights + extras
// in real code
struct Lights {
    lights: [LightInfoMw; light_count],
    // pseudo-code: extra_count is variable!
    extras: [[Vec3; extra_count]; light_count],
}
}

The important field here is at offset 12, which is a u32 or i32 and indicates how much extra data to read. This data is read after all the light information. In this case, loop over the light information, and read extra count vertices (where each is a vector of three f32).
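Since the extra data follows all light information structures, the total size of the lights block can be computed once the structures are available. A sketch, assuming the extra count is read as a u32 and each extra is 12 bytes. The function name is hypothetical, and the structure size is passed in (76 bytes for the structure above):

```rust
/// Size of one extra vertex (three f32).
const VEC3_SIZE: usize = 12;
/// Byte offset of the extra count inside a light information structure.
const EXTRA_COUNT_OFFSET: usize = 12;

/// Total byte size of the lights block (all light information structures,
/// followed by the extras of each light), or None if `buf` is too short.
fn lights_block_size(buf: &[u8], light_count: usize, info_size: usize) -> Option<usize> {
    let mut total = light_count.checked_mul(info_size)?;
    for i in 0..light_count {
        let start = i * info_size + EXTRA_COUNT_OFFSET;
        let bytes = buf.get(start..start + 4)?;
        let extra_count = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) as usize;
        total = total.checked_add(extra_count.checked_mul(VEC3_SIZE)?)?;
    }
    Some(total)
}
```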

More research is needed on what the lights do.

Polygon information and data

The polygon information structure is 36 bytes:

#![allow(unused)]
fn main() {
struct PolygonInfoMw {
    vertex_info: u32, // always <= 0x3FF
    unk04: u32, // always >= 0, <= 20
    vertices_ptr: u32, // always != 0
    normals_ptr: u32,
    uvs_ptr: u32,
    colors_ptr: u32, // always != 0
    unk_ptr: u32, // always != 0
    material_index: u32,
    material_info: u32,
}

type PolygonInfosMw = [PolygonInfoMw; polygon_count];
}

The vertex info field is a compound field, and could also be read as u8 values. The lower byte can be masked via vertex_info & 0xFF, and provides the number of vertices in the polygon. This must be greater than or equal to three (3), since every polygon must have at least three vertices, and therefore the vertices pointer, colours pointer, and an unknown pointer are also non-zero/non-null.

There are additionally two flags, an unknown flag masked with (vertex_info & 0x100) != 0 and the normals flag masked with (vertex_info & 0x200) != 0. The use of the unknown flag is predictably unknown. The normals flag indicates whether the polygon has normals. Additionally, whether the polygon has UVs is determined by whether the UV pointer is non-zero/non-null. It's unclear why the normals pointer doesn't do this and a flag was used.
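The masks above can be wrapped in a small decoder. This sketch rejects values that break the observed bounds. The struct and function names are hypothetical:

```rust
/// Decoded form of the base game vertex_info field.
#[derive(Debug, PartialEq)]
struct VertexInfo {
    vertex_count: u32,
    unk_flag: bool,
    has_normals: bool,
}

/// Split vertex_info into the vertex count and the two flags.
/// Returns None if unknown bits are set or the polygon has fewer than 3 vertices.
fn decode_vertex_info(vertex_info: u32) -> Option<VertexInfo> {
    if vertex_info > 0x3FF {
        return None;
    }
    let vertex_count = vertex_info & 0xFF;
    if vertex_count < 3 {
        return None;
    }
    Some(VertexInfo {
        vertex_count,
        unk_flag: vertex_info & 0x100 != 0,
        has_normals: vertex_info & 0x200 != 0,
    })
}
```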

The material index indicates which material the polygon uses. The material info is currently unknown.

After all the polygon information has been read, the polygon data is read.

The data is based on the number of vertices in the polygon (vertex count). For each polygon:

  • The vertex indices are always read, which are u32 that index the mesh's vertices. Read vertex count of these.
  • The normal indices are only read if the flag is set, and are u32 that index the mesh's normals. Read vertex count of these.
  • The UV coordinates are only read if the UV pointer is non-zero/non-null. Each UV coordinate is two f32 (u, v). Read vertex count UVs.
  • The vertex colours are always read. Each colour is three f32 (r, g, b), the same structure as Vec3. Read vertex count colours.

With this information and the mesh information, the polygons can be reconstructed.
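As a cross-check while reading, the byte size of one polygon's data follows directly from the list above. A sketch for the base game layout, with a hypothetical function name:

```rust
/// Byte size of the data of one base game polygon.
fn polygon_data_size(vertex_count: usize, has_normals: bool, has_uvs: bool) -> usize {
    // Vertex indices: one u32 per vertex, always present.
    let mut size = vertex_count * 4;
    // Normal indices: one u32 per vertex, only if the normals flag is set.
    if has_normals {
        size += vertex_count * 4;
    }
    // UV coordinates: two f32 per vertex, only if the UV pointer is non-null.
    if has_uvs {
        size += vertex_count * 8;
    }
    // Vertex colours: three f32 per vertex, always present.
    size + vertex_count * 12
}
```

Summing this over all polygons of a mesh should account for every byte between the polygon information and the next mesh's data.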

Nodes

Finally, the nodes block. If you thought the previous information was complex to read, the nodes turn this up to eleven.

Because the node data is very complicated, I describe all nodes separately. Please refer to that document for detailed information. I will however go over how to read the data here.

In principle, this works a lot like the other blocks. The node count and node array size were given by the GameZ header. The nodes are also read in a phased manner, and also have zeroed-out nodes.

Unfortunately, to me it seems the node count is wildly inaccurate for some files. Since this seems like a memory dump, it's possible that only node count nodes should actually be read. But the nodes between count and the array size may not be zeroed out. So I resorted to reading all the node base structures until I found a zeroed out one, and then stopped. That allowed me to get the actual count.

Node base structures

Because the node count is inaccurate, a strategy is needed. Either look for the first zeroed-out nodes while reading the base structures and break out of the loop (all further nodes will be zeroed-out), or read all of them and e.g. ignore the zeroed out nodes when reading node data. To detect zeroed out nodes, a good indication is if the first byte of the name is zero (0).

In both cases, read array size node base structures.

Next, read a u32 value. For empty node types, this is the parent index (!). For other node types, this is the offset of their type-specific data in the file. For zeroed-out nodes, this is:

#![allow(unused)]
fn main() {
let mut expected_index = index + 1;
if expected_index >= array_size {
    // we'll never know why???
    expected_index = 0xFFFFFF;
}
}

And indeed, it's unclear why this isn't 0xFFFFFFFF (-1 for i32), or even 0xFFFF (-1 for i16). But that's what it is.

Optionally assert the node index rules for GameZ files:

  • There can only be a single world node, and it must be the first node in the file (index 0)
  • There can only be a single window node, and it must be the second node in the file (index 1)
  • There can only be a single camera node, and it must be the third node in the file (index 2)
  • There is at least one display node, and it must be the fourth node in the file (index 3). If there is another display node, it must be the fifth node in the file (index 4)
  • There can only be a single light node, although its position in the file is variable
  • Zeroed out nodes must be at the end of the array, and contiguous.
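The position rules can be asserted once the node types of all non-zeroed-out nodes are known. The following is a sketch only: the enum is a local stand-in for the node type, the function name is hypothetical, and "a single light node" is interpreted as "at most one", which is an assumption:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum NodeType {
    Empty,
    Camera,
    World,
    Window,
    Display,
    Object3d,
    Lod,
    Light,
}

/// Check the GameZ node position rules for the non-zeroed-out nodes.
fn check_node_positions(types: &[NodeType]) -> Result<(), String> {
    use NodeType::*;
    // The first four nodes have fixed types.
    let fixed = [World, Window, Camera, Display];
    for (i, expected) in fixed.iter().enumerate() {
        if types.get(i) != Some(expected) {
            return Err(format!("node {} must be {:?}", i, expected));
        }
    }
    for (i, ty) in types.iter().enumerate().skip(4) {
        match ty {
            World | Window | Camera => {
                return Err(format!("unexpected {:?} node at index {}", ty, i));
            }
            // A second display node is only allowed at index 4.
            Display if i != 4 => {
                return Err(format!("unexpected Display node at index {}", i));
            }
            _ => {}
        }
    }
    let lights = types.iter().filter(|ty| **ty == Light).count();
    if lights > 1 {
        return Err(format!("expected at most one light node, found {}", lights));
    }
    Ok(())
}
```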

Zeroed-out nodes

Zeroed-out nodes will be all zero, except for the mesh index, which will be negative one (-1).

Node type-specific data

Then, read the type-specific data. Empty nodes do not have node data, and zeroed-out nodes don't either. Otherwise, the data is read in the same order as the base structure, based on the node type.

If a node had a non-zero parent count and/or child count on the node base structure, then these indices are read after the node's data. In the base game, these are the only nodes that have non-zero counts:

  • LOD: Always one parent, always multiple children
  • Object3d: Zero or one parent, sometimes children
  • World: No parent, always children

But the logic could be generic simply based on the count. After the type data, the parent indices (u32) are read first, then the child indices (u32). Then the next node's type data follows.
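The generic, count-based logic could be sketched as follows, reading from a buffer that starts right after the node's type data. Both function names are hypothetical:

```rust
/// Read `count` little-endian u32 values starting at `offset`.
fn read_u32_array(buf: &[u8], offset: usize, count: usize) -> Option<Vec<u32>> {
    let end = offset.checked_add(count.checked_mul(4)?)?;
    let bytes = buf.get(offset..end)?;
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

/// Read the parent indices first, then the child indices.
fn read_relations(
    buf: &[u8],
    parent_count: usize,
    child_count: usize,
) -> Option<(Vec<u32>, Vec<u32>)> {
    let parents = read_u32_array(buf, 0, parent_count)?;
    let children = read_u32_array(buf, parents.len() * 4, child_count)?;
    Some((parents, children))
}
```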

Node relationships

As a final step, the linearly arranged nodes could be transformed into a graph/tree structure if this is more convenient.

Investigation (PM)

The data structures differ slightly for Pirate's Moon. For the main header, there is a new unknown, 32-bit integer:

#![allow(unused)]
fn main() {
struct HeaderPm {
    signature: u32, // always 0x02971222?
    version: u32, // always 27?
    unk08: u32, // new
    texture_count: u32,
    textures_offset: u32,
    materials_offset: u32,
    meshes_offset: u32,
    node_array_size: u32,
    node_count: u32,
    nodes_offset: u32,
}
}

Textures

Assumed to be the same as the base game?

Materials

Assumed to be the same as the base game, since in the mechlib they are.

Meshes

Mesh information

The mesh information has changed, and is now 100 bytes (+8):

#![allow(unused)]
fn main() {
struct MeshInfoPm {
    unk00: u32, // always 0 or 1 (bool)
    unk04: u32, // always 0, 1, 2
    unk08: u32,
    parent_count: u32,  // 12, always > 0
    polygon_count: u32, // 16
    vertex_count: u32,  // 20
    normal_count: u32,  // 24
    morph_count: u32,   // 28
    light_count: u32,   // 32
    unk36: u32, // always 0
    unk40: f32,
    unk44: f32,
    unk48: u32, // always 0
    polygons_ptr: u32, // 52
    vertices_ptr: u32, // 56
    normals_ptr: u32,  // 60
    lights_ptr: u32,   // 64
    morphs_ptr: u32,   // 68
    unk72: f32,
    unk76: f32,
    unk80: f32,
    unk84: f32,
    unk88: u32, // always 0
    unk_count: u32,
    unk_ptr: u32,
}
}

The unk04 field (u32) used to be 0 or 1, but can now be 0, 1, or 2.

Of interest are the new fields unk_count (u32) and unk_ptr (u32). So far, we don't know what this is, but it behaves similarly to other mesh information (e.g. the vertices). If this count is zero (0), then the pointer will be null (0). Otherwise, if the count is greater than zero, the pointer will be non-null. As we will shortly see, this unknown data is 12 bytes per count (maybe a Vec3?), and read after the polygon data.

Mesh data

Next, the mesh data is read for any filled in mesh information (not zeroed-out). The offset of the start of this data should match the previously read mesh data offset, but can be read sequentially without seeking.

Reading the mesh data is dynamic, based on the counts:

  • Read vertex count vertices (where each is a vector of three f32)
  • Read normal count normals (where each is a vector of three f32)
  • Read morph count morphs(?) (where each is a vector of three f32)
  • Read the lights
  • Read the polygons
  • Read the unknown data, which is unknown count * 12 bytes (possibly a vector of three f32?)

#![allow(unused)]
fn main() {
struct Vec3 {
    x: f32,
    y: f32,
    z: f32,
}

struct Vertices {
    vertices: [Vec3; vertex_count],
}

struct Normals {
    normals: [Vec3; normal_count],
}

struct Morphs {
    morphs: [Vec3; morph_count],
}

struct Unknowns {
    unknowns: [Vec3; unk_count],
}
}

Light information and data

The light information is largely unexplored and read in two phases. First, light count light information structures are read, each of 80 bytes in size:

#![allow(unused)]
fn main() {
struct LightInfoPm {
    unk00: u32,
    unk04: u32,
    unk08: u32,
    extra_count: u32,
    unk16: u32,
    unk20: u32,
    unk24: u32,
    unk28: f32,
    unk32: f32,
    unk36: f32,
    unk40: f32,
    ptr: u32,
    unk48: f32,
    unk52: f32,
    unk56: f32,
    unk60: f32,
    unk64: f32,
    unk68: f32,
    unk72: f32,
    unk76: f32,
}

// probably good to combine lights + extras
// in real code
struct Lights {
    lights: [LightInfoPm; light_count],
    // pseudo-code: extra_count is variable!
    extras: [[Vec3; extra_count]; light_count],
}
}

The important field here is at offset 12, which is a u32 or i32 and indicates how much extra data to read. This data is read after all the light information. In this case, loop over the light information, and read extra count vertices (where each is a vector of three f32).

More research is needed on what the lights do.

Polygon information and data

The polygon information structure is 40 bytes:

#![allow(unused)]
fn main() {
struct PolygonInfoPm {
    vertex_info: u32, // always <= 0x3FF
    unk04: u32, // always >= 0, <= 20
    vertices_ptr: u32, // always != 0
    normals_ptr: u32,
    unk16: u32, // always 1
    uvs_ptr: u32, // always != 0
    colors_ptr: u32, // always != 0
    unk28: u32, // always != 0
    unk32: u32, // always != 0
    unk36: u32, // always 0xFFFFFF00
}

bitflags PolygonFlags: u32 {
    Unk2 = 1 << 2,
    Normals = 1 << 4,
    TriStrip = 1 << 5,
}

type PolygonInfosPm = [PolygonInfoPm; polygon_count];
}

Note that this structure has significantly changed from the base game.

The vertex info field is a compound field, and could also be read as u8 values. The lower byte can be masked via vertex_info & 0xFF, and provides the number of vertices in the polygon. This must be greater than or equal to three (3), since every polygon must have at least three vertices, and therefore the vertices pointer, colours pointer, and an unknown pointer are also non-zero/non-null.

The second byte can be masked via (vertex_info & 0xFF00) >> 8; this is the polygon flags. In the Mechlib, these are much better behaved than the base game.

The flag Unk2 is predictably unknown, so far no correlation to polygon data has been found. Normals indicates whether the polygon has normals data.

Finally, the newest addition is whether the polygon is a triangle strip. This was found by Skyfaller in his investigation of the Pirate's Moon data. Triangle strips so far also always require normals data. For reading the polygon information, nothing changes for a triangle strip. What does change is how the polygon faces must be constructed by programs displaying the polygon data.
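For programs displaying the data, a triangle strip can be expanded into individual triangles. The sketch below uses the conventional strip expansion, where every second triangle has its first two vertices swapped to keep a consistent winding order. Whether the game uses exactly this winding is an assumption and has not been verified. The function name is hypothetical:

```rust
/// Expand triangle strip vertex indices into individual triangles.
/// Strips with fewer than three indices produce no triangles.
fn strip_to_triangles(indices: &[u32]) -> Vec<[u32; 3]> {
    let mut triangles = Vec::new();
    for i in 0..indices.len().saturating_sub(2) {
        let (a, b, c) = (indices[i], indices[i + 1], indices[i + 2]);
        if i % 2 == 0 {
            triangles.push([a, b, c]);
        } else {
            // Swap the first two vertices to preserve the winding order.
            triangles.push([b, a, c]);
        }
    }
    triangles
}
```

A strip of n vertices therefore yields n - 2 triangles.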

The field unk04 (u32) is always greater than or equal to zero (0), and less than or equal to twenty (20).

The vertices index pointer (vertices_ptr), UV coordinate pointer (uvs_ptr), and vertex color (colors_ptr) are always non-null/non-zero. The normals index pointer is always non-null/non-zero if the normals flag is set; otherwise, it is always zero (0).

The field unk16 (u32) is always one (1). The fields unk28 (u32) and unk32 (u32) look like pointers, and are always non-null/non-zero.

The field unk36 (u32) is always 0xFFFFFF00.

Note that unlike in MechWarrior 3, the texture/material index is not present in the polygon info - it is read later.

After all the polygon information has been read, the polygon data is read.

The data is based on the number of vertices in the polygon (vertex count). For each polygon:

  • The vertex indices are always read, which are u32 that index the mesh's vertices. Read vertex count of these.
  • The normal indices are only read if the flag is set, and are u32 that index the mesh's normals. Read vertex count of these.
  • The texture index is always read. This is a single u32.
  • The UV coordinates are always read. Each UV coordinate is two f32 (u, v). Read vertex count UVs.
  • The vertex colours are always read. Each colour is three f32 (r, g, b), the same structure as Vec3. Read vertex count colours.

With this information and the mesh information, the polygons can be reconstructed.

Nodes

Unexplored.

In-game use

These models are used in-game and in the mechlab screen.

Nodes

Nodes are how the world data is organised and structured. Nodes appear in GameZ files and mechlib archives. There are eight known node types in GameZ files:

  • Camera
  • Display
  • Empty
  • Light
  • LOD (level of detail)
  • Object3d
  • Window
  • World

In MechWarrior 3, the only valid node type in the mechlib archive is Object3d. In Pirate's Moon, the only valid node types in the mechlib archive are Object3d and LOD.

We also think there are other node types from the animations:

  • Sequence
  • Animate or Animation
  • Sound
  • Switch (i.e. flow control)

Each node type has the same base structure, although some node types do not seem to use all the information in the base structure. The node types also have node-specific structures/information.

Node organisation/relationships

Each node can have several parents, and several children. In fact, each node tracks both the children and the parents, and there doesn't seem to be a way of ensuring this data is consistent other than careful coding (e.g. when a child is removed, also remove its reference to the parent).

In principle, this results in a directed graph structure. Cycles are also absolutely possible. Again this was presumably carefully avoided because a cyclic graph is not useful for most processing. Let's assume therefore that a valid representation of nodes inside the engine is a directed acyclic graph (DAG) at the very least.

In reality, the nodes are usually tree-like, although in a "tree" in the computer science sense, there can only be one root, and each node has exactly one parent. From what I can see, this isn't necessarily the case for MW3. Otherwise, why allow a node to have multiple parents?

However, when loading nodes from the mechlib or GameZ files, the nodes indeed only have either zero (0) or one (1) parent (at load time). We'll discuss further restrictions on the different node types shortly.

Common data types

#![allow(unused)]
fn main() {
tuple Vec3(f32, f32, f32);
tuple Color(f32, f32, f32);
tuple Matrix(f32, f32, f32, f32, f32, f32, f32, f32, f32);

const MATRIX_EMPTY: Matrix = Matrix(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0);
const MATRIX_IDENTITY: Matrix = Matrix(1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0);
}

Node data structures

Node parents and children

In principle, all nodes could have multiple parents, and multiple children. In practice, no nodes have multiple parents, and as described in the node organisation and in the game-specific node data structures:

  • Camera nodes don't have a parent or children
  • Display nodes don't have a parent or children
  • Empty nodes don't have a parent or children (at least not for the purposes of this part)
  • Light nodes don't have a parent or children
  • LOD nodes always have a parent, and always have children
  • Object3d nodes can have a parent and children
  • Window nodes don't have a parent or children
  • World nodes don't have a parent, but do have children

The reason I describe this in such detail is that it helps understand how the game nodes are structured.

In the general case, both the parent and children indices are dynamic arrays. Read parent count u32 values first for the parent index/indices, and then read child count u32 values next for the child indices. (Obviously, if the count is zero, it isn't necessary to read anything.)

Node positions in the GameZ file

There are also restrictions on which nodes can appear where in a GameZ file. Mechlib archives can only contain certain nodes, so this does not apply.

When loading a GameZ file:

  • There can only be a single world node, and it must be the first node in the file (index 0)
  • There can only be a single window node, and it must be the second node in the file (index 1)
  • There can only be a single camera node, and it must be the third node in the file (index 2)
  • There is at least one display node, and it must be the fourth node in the file (index 3). If there is another display node, it must be the fifth node in the file (index 4)
  • There can only be a single light node, although its position in the file is variable
  • Zeroed out nodes must be at the end of the array, and contiguous.

Nodes (MW3)

Nodes are how the world data is organised and structured. Please see the general node overview first. This page describes node data structures for MechWarrior 3 only.

Node base/shared structure

This is the structure used by all nodes, and is 208 bytes in size:

#![allow(unused)]
fn main() {
struct NodeMw {
    name: [u8; 36],
    flags: NodeFlags,
    unk040: u32, // always 0
    unk044: u32,
    zone_id: u32,
    node_type: NodeType,
    data_ptr: u32,
    mesh_index: i32,
    environment_data: u32, // always 0
    action_priority: u32, // always 1
    action_callback: u32, // always 0
    area_partition_x: i32, // -1, or >= 0, <= 64
    area_partition_y: i32, // -1, or >= 0, <= 64
    parent_count: u32, // always 0 or 1
    parent_array_ptr: u32,
    children_count: u32,
    children_array_ptr: u32,
    unk100: u32, // always 0
    unk104: u32, // always 0
    unk108: u32, // always 0
    unk112: u32, // always 0
    unk116: Box3d,
    unk140: Box3d,
    unk164: Box3d,
    unk188: u32, // always 0
    unk192: u32, // always 0
    unk196: u32,
    unk200: u32, // always 0
    unk204: u32, // always 0
}

tuple Box3d(f32, f32, f32, f32, f32, f32);

enum NodeType: u32 {
    Empty = 0,
    Camera = 1,
    World = 2,
    Window = 3,
    Display = 4,
    Object3d = 5,
    Lod = 6,
    // Sequence = 7,
    // Animate = 8,
    Light = 9,
    // Sound = 10,
    // Switch = 11,
}

bitflags NodeFlags: u32 {
    // Unk00 = 1 << 0,
    // Unk01 = 1 << 1,
    Active = 1 << 2,
    AltitudeSurface = 1 << 3,
    IntersectSurface = 1 << 4,
    IntersectBbox = 1 << 5,
    // Proximity = 1 << 6,
    Landmark = 1 << 7,
    Unk08 = 1 << 8,
    HasMesh = 1 << 9,
    Unk10 = 1 << 10,
    // Unk11 = 1 << 11,
    // Unk12 = 1 << 12,
    // Unk13 = 1 << 13,
    // Unk14 = 1 << 14,
    Terrain = 1 << 15,
    CanModify = 1 << 16,
    ClipTo = 1 << 17,
    // Unk18 = 1 << 18,
    TreeValid = 1 << 19,
    // Unk20 = 1 << 20,
    // Unk21 = 1 << 21,
    // Unk22 = 1 << 22,
    // Override = 1 << 23,
    IdZoneCheck = 1 << 24,
    Unk25 = 1 << 25,
    // Unk26 = 1 << 26,
    // Unk27 = 1 << 27,
    Unk28 = 1 << 28,
    // Unk29 = 1 << 29,
    // Unk30 = 1 << 30,
    // Unk31 = 1 << 31,

    Base = Active | TreeValid | IdZoneCheck,
    Default = Base | AltitudeSurface | IntersectSurface,
}

const DEFAULT_ZONE_ID: u32 = 255;
}

I'm pretty sure the name is 36 bytes long, not the usual 32 bytes and another field. Assume ASCII. The "padding" for the name is also odd. It seems like all nodes are initialised with the name set to Default_node_name (padded with zeros/nulls to 36 bytes). Then, when the name is filled in, it is overwritten with the node name (zero/null terminated). This is likely not important when only reading the data, but is important when trying to write a binary-accurate replica.
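The name handling described above can be sketched in both directions. The encoder reproduces the assumed initialise-then-overwrite behaviour, so left-over bytes of Default_node_name remain after the terminator for short names. All names in this sketch are hypothetical:

```rust
const NAME_SIZE: usize = 36;
const DEFAULT_NAME: &[u8] = b"Default_node_name";

/// Encode a node name as the engine appears to: start from the zero-padded
/// default name, then overwrite it with the name and a null terminator.
fn encode_node_name(name: &str) -> Option<[u8; NAME_SIZE]> {
    // Leave room for the terminator.
    if !name.is_ascii() || name.len() >= NAME_SIZE {
        return None;
    }
    let mut raw = [0u8; NAME_SIZE];
    raw[..DEFAULT_NAME.len()].copy_from_slice(DEFAULT_NAME);
    raw[..name.len()].copy_from_slice(name.as_bytes());
    raw[name.len()] = 0;
    Some(raw)
}

/// Decode a node name by reading up to the first null byte.
fn decode_node_name(raw: &[u8; NAME_SIZE]) -> Option<&str> {
    let end = raw.iter().position(|&b| b == 0).unwrap_or(raw.len());
    let name = std::str::from_utf8(&raw[..end]).ok()?;
    if name.is_ascii() {
        Some(name)
    } else {
        None
    }
}
```

When only reading, the bytes after the first null can simply be ignored.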

Many flags are unknown in their functionality. Which flags are valid for a node also depends on the node type; this is described further in the sub-sections. The following information is invariant, i.e. does not depend on the node type.

The fields unk040, unk100, unk104, unk108, unk112, unk188, unk192, unk200, and unk204 are always zero (0).

The field environment_data is always zero (0). The field action_callback is always zero (0)/null (this is possibly a pointer). The field action_priority is always one (1).

The area partition values are tied to the world structure. Either both values are negative one (-1), which indicates no area partition is assigned to the node, or both values are greater than or equal to zero (0) and less than or equal to 64 (this upper bound is arbitrarily chosen based on usual area partition sizes), which indicates an area partition is assigned to the node. Once the world node data is loaded, these can be properly validated. Some node types can have stricter validation on this.
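A minimal sketch of this check, assuming the two values have been read as i32. The function name `check_area_partition` and the constant are hypothetical, and the bound of 64 is the arbitrary one from the text, not a format constant.

```rust
/// Upper bound chosen arbitrarily in the text above; not a format constant.
const AREA_PARTITION_MAX: i32 = 64;

/// Hypothetical helper: Ok(None) means unassigned (-1, -1),
/// Ok(Some((x, y))) means an area partition is assigned.
fn check_area_partition(x: i32, y: i32) -> Result<Option<(i32, i32)>, String> {
    let in_range = |v: i32| (0..=AREA_PARTITION_MAX).contains(&v);
    match (x, y) {
        (-1, -1) => Ok(None),
        (x, y) if in_range(x) && in_range(y) => Ok(Some((x, y))),
        // Mixed values such as (-1, 0) are rejected.
        _ => Err(format!("invalid area partition ({}, {})", x, y)),
    }
}
```

Once the world data is available, the assigned case could be tightened to the actual partition counts instead of the arbitrary bound.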

During loading, the parent count is always zero (0) or one (1). Some node types can have stricter validation on this. If the parent count is zero, then the parent array pointer is zero/null, otherwise it is non-zero/non-null. The child count is usually less than or equal to 64 (this upper bound is arbitrarily chosen based on usual child counts). Some node types can have stricter validation on this. If the child count is zero, then the child array pointer is zero/null, otherwise it is non-zero/non-null.
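The count and pointer rules above amount to the same check applied twice. A rough sketch, where the helper names and the parameter names for the parent fields are assumptions made for this example, and the child limit of 64 is the arbitrary bound from the text:

```rust
/// Hypothetical helper: a count must not exceed `max_count`, and the
/// array pointer must be null exactly when the count is zero.
fn check_array(count: u32, ptr: u32, max_count: u32) -> bool {
    count <= max_count && (count == 0) == (ptr == 0)
}

/// During loading, a node has zero or one parent.
fn check_parent(parent_count: u32, parent_array_ptr: u32) -> bool {
    check_array(parent_count, parent_array_ptr, 1)
}

/// The limit of 64 children is arbitrary (based on usual child counts).
fn check_children(child_count: u32, children_array_ptr: u32) -> bool {
    check_array(child_count, children_array_ptr, 64)
}
```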

We currently think the fields unk116, unk140, and unk164 are values of six floating point numbers that specify a box in three dimensions. They are likely some kind of bounding boxes.

Therefore, for any node in a GameZ file, after filtering the invariant data, the variable data is the name, the flags, unk044, the zone ID, the data pointer, the mesh index, the area partition values, the parent count (i.e. whether the node has a parent) and the parent array pointer, the child count and the child array pointer, unk116, unk140, unk164, and unk196.

Camera nodes base structure

Since there can only be one camera node, the node name is always camera1. The flags will always be the default node flags. The field unk044 will always be zero (0). The zone ID will always be the default zone ID (255). Camera nodes always have data associated with them, so the data pointer will always be non-zero/non-null. The mesh index will always be negative one (-1). The area partition will always be unassigned (-1, -1). There will be no parents, and therefore the parent array pointer is zero/null. There will be no children, and therefore the child array pointer is zero/null. The fields unk116, unk140, and unk164 will always be zeros (0.0). The field unk196 will always be zero (0).

Therefore, the variable data is the data pointer.

Display nodes base structure

There can be one or two display nodes, which always have the name display. The flags will always be the default node flags. The field unk044 will always be zero (0). The zone ID will always be the default zone ID (255). Display nodes always have data associated with them, so the data pointer will always be non-zero/non-null. The mesh index will always be negative one (-1). The area partition will always be unassigned (-1, -1). There will be no parents, and therefore the parent array pointer is zero/null. There will be no children, and therefore the child array pointer is zero/null. The fields unk116, unk140, and unk164 will always be zeros (0.0). The field unk196 will always be zero (0).

Therefore, the variable data is the data pointer.

Empty nodes base structure

The field unk044 will be 1, 3, 5, or 7. The zone ID will be either one (1) or the default zone ID (255). Empty nodes don't have data associated with them, so the data pointer will always be zero/null. The mesh index will always be negative one (-1). The area partition will always be unassigned (-1, -1). There will be no parents, and therefore the parent array pointer is zero/null. There will be no children, and therefore the child array pointer is zero/null. The field unk196 will always be zero (0).

Therefore, the variable data is the name, flags, unk044, the zone ID, unk116, unk140, and unk164. Additionally, empty nodes do have a parent index, but when using a GameZ and mechlib-compatible base structure, this is stored outside the base structure. This will be discussed during loading in more detail, but it might be useful to include a field for this here.

Light nodes base structure

Since there is only one light node, the node name is always sunlight. The flags will always be the default node flags and Unk08 (0x100). The field unk044 will always be zero (0). The zone ID will always be the default zone ID (255). Light nodes always have data associated with them, so the data pointer will always be non-zero/non-null. The mesh index will always be negative one (-1). The area partition will always be unassigned (-1, -1). There will be no parents, and therefore the parent array pointer is zero/null. There will be no children, and therefore the child array pointer is zero/null. The field unk116 will always have the values (1.0, 1.0, -2.0, 2.0, 2.0, -1.0). The fields unk140 and unk164 will always be zeros (0.0). The field unk196 will always be zero (0).

Therefore, the variable data is the data pointer.

LOD nodes base structure

The field unk044 will always be one (1). The zone ID will be either the default zone ID (255), or a value greater than or equal to one (1) and less than or equal to 80 (this upper bound is arbitrarily chosen based on usual zone IDs). LOD nodes always have data associated with them, so the data pointer will always be non-zero/non-null. The mesh index will always be negative one (-1). There will be one parent, and therefore the parent array pointer is non-zero/non-null. There will be at least one child, and therefore the child array pointer is non-zero/non-null. The field unk116 will be unequal to (0.0, 0.0, 0.0, 0.0, 0.0, 0.0), and the field unk164 will be equal to unk116. The field unk140 will always be zeros (0.0). The field unk196 will always be 160.

Therefore, the variable data is the name, flags, the zone ID, the data pointer, the area partition values, the parent array pointer, the child count, the child array pointer, and unk116.

Object3d nodes base structure

The field unk044 will always be one (1). The zone ID will be either the default zone ID (255), or a value greater than or equal to one (1) and less than or equal to 80 (this upper bound is arbitrarily chosen based on usual zone IDs). Object3d nodes always have data associated with them, so the data pointer will always be non-zero/non-null.

The mesh index depends on the HasMesh flag, and whether the node is in a GameZ file or a mechlib archive. For a GameZ file, the mesh index is an index. So if the flag is set, then the index is greater than or equal to zero (0). If the flag is unset, then the index is always negative one (-1). For a mechlib archive, the mesh index is actually a pointer value, since the data is already stored hierarchically. So if the flag is set, this is non-zero/non-null. If the flag is unset, this is zero/null. Note that for the non-null case, if you are loading the value as a signed integer (i32), the memory on 32-bit machines was limited. In practice, it won't be greater than 2147483647 bytes, so you can also check if the value is greater than zero.

In short:

  • IsMechlib && !HasMesh => mesh_index == 0 (null ptr)
  • IsMechlib && HasMesh => mesh_index != 0 (non-null ptr)
  • IsGameZ && !HasMesh => mesh_index == -1 (invalid index)
  • IsGameZ && HasMesh => mesh_index > -1 (valid index)
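The four cases above can be written as a single match. This is a sketch only; the `Container` enum and the function name are invented for this example, and the mesh index is assumed to have been read as i32.

```rust
/// Hypothetical marker for where the node was read from.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Container {
    GameZ,
    Mechlib,
}

/// Returns true if the mesh index is consistent with the HasMesh flag.
fn check_mesh_index(container: Container, has_mesh: bool, mesh_index: i32) -> bool {
    match (container, has_mesh) {
        // Mechlib: the value is a pointer, null when there is no mesh.
        (Container::Mechlib, false) => mesh_index == 0,
        (Container::Mechlib, true) => mesh_index != 0,
        // GameZ: the value is an index, -1 when there is no mesh.
        (Container::GameZ, false) => mesh_index == -1,
        (Container::GameZ, true) => mesh_index > -1,
    }
}
```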

The field unk196 will always be 160.

Therefore, the variable data is the name, flags, the zone ID, the data pointer, the area partition values, the parent count, the parent array pointer, the child count, the child array pointer, unk116, unk140, and unk164.

Window nodes base structure

Since there can only be one window node, the node name is always window1. The flags will always be the default node flags. The field unk044 will always be zero (0). The zone ID will always be the default zone ID (255). Window nodes always have data associated with them, so the data pointer will always be non-zero/non-null. The mesh index will always be negative one (-1). The area partition will always be unassigned (-1, -1). There will be no parents, and therefore the parent array pointer is zero/null. There will be no children, and therefore the child array pointer is zero/null. The fields unk116, unk140, and unk164 will always be zeros (0.0). The field unk196 will always be zero (0).

Therefore, the variable data is the data pointer.

World nodes base structure

Since there can only be one world node, the node name is always world1. The flags will always be the default node flags. The field unk044 will always be zero (0). The zone ID will always be the default zone ID (255). World nodes always have data associated with them, so the data pointer will always be non-zero/non-null. The mesh index will always be negative one (-1). The area partition will always be unassigned (-1, -1). There will be no parents, and therefore the parent array pointer is zero/null. There will be at least one child, and therefore the child array pointer is non-zero/non-null. The fields unk116, unk140, and unk164 will always be zeros (0.0). The field unk196 will always be zero (0).

Therefore, the variable data is the data pointer, the child count, and the child array pointer.

Node type data structures

All nodes except empty nodes have extra, type-specific data associated with them.

Camera data

#![allow(unused)]
fn main() {
struct Camera {
    world_index: i32, // always 0
    window_index: i32, // always 1
    focus_node_xy: i32, // always -1
    focus_node_xz: i32, // always -1
    flags: u32, // always 0
    translation: Vec3, // always 0.0
    rotation: Vec3, // always 0.0
    world_translate: Vec3, // always 0.0
    world_rotate: Vec3, // always 0.0
    mtw_matrix: Matrix, // always 0.0
    unk104: Vec3, // always 0.0
    view_vector: Vec3, // always 0.0
    matrix: Matrix, // always 0.0
    alt_translate: Vec3, // always 0.0
    clip_near_z: f32,
    clip_far_z: f32,
    zero184: [u8; 24], // always 0
    lod_multiplier: f32, // always 1.0
    lod_inv_sq: f32, // always 1.0
    fov_h_zoom_factor: f32, // always 1.0
    fov_v_zoom_factor: f32, // always 1.0
    fov_h_base: f32,
    fov_v_base: f32,
    fov_h: f32,
    fov_v: f32,
    fov_h_half: f32,
    fov_v_half: f32,
    unk248: u32, // always 1
    zero252: [u8; 60], // always 0
    unk312: u32, // always 1
    zero316: [u8; 72], // always 0
    unk388: u32, // always 1
    zero392: [u8; 72], // always 0
    unk464: u32, // always 0
    fov_h_cot: f32,
    fov_v_cot: f32,
    stride: i32, // always 0
    zone_set: i32, // always 0
    unk484: i32, // always -256
}
}

The size of the camera structure is 488 bytes. This is large, but considering there is only one camera, it probably made sense to trade a bit of memory for storing intermediate results to speed up computation.

We understand a lot of the camera structure, although most of the information when loaded from a file is zeroed out, and is then initialised after loading (possibly by the interpreter).

The important fields are the near Z (f32) and far Z (f32) clipping values at offset 176, and the horizontal (f32) and vertical (f32) field of view values (FoV) at offset 232. The clipping near Z must be greater than 0.0, and the far Z must be greater than the near Z.

Many of the other FoV-related values are directly derived from the FoV. The FoV base values are equal to the FoV, because the zoom factor is one (1.0). The FoV half values are equal to the FoV divided by two (2.0). And the FoV cotangent values are derived from the cotangent of the FoV half values.
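As a rough illustration of these relationships, the derived values for one axis could be recomputed like this. It assumes the FoV is stored in radians, that the zoom factor is 1.0 as observed, and that the cotangent is computed as 1 / tan; none of this is confirmed to reproduce the stored values bit for bit. The struct and function names are made up for the example.

```rust
/// Hypothetical container for the values derived from one FoV axis.
#[derive(Debug, Clone, Copy)]
struct FovDerived {
    base: f32,
    half: f32,
    cot: f32,
}

/// Derive the base, half, and cotangent values from a FoV
/// (assumed to be in radians, with a zoom factor of 1.0).
fn derive_fov(fov: f32) -> FovDerived {
    let base = fov; // equal to the FoV because the zoom factor is 1.0
    let half = fov / 2.0;
    let cot = 1.0 / half.tan(); // assumption: cot(x) = 1 / tan(x)
    FovDerived { base, half, cot }
}
```

The same function would apply to both the horizontal and the vertical FoV.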

Therefore, for loading a level, the clipping and FoV values are the only important parts.

Display data

#![allow(unused)]
fn main() {
const CLEAR_COLOR: Color = Color(
    0.3919999897480011,
    0.3919999897480011,
    1.0
);

struct Display {
    origin_x: u32, // always 0
    origin_y: u32, // always 0
    resolution_x: u32, // always 640
    resolution_y: u32, // always 400
    clear_color: Color, // always CLEAR_COLOR
}
}

The size of the display structure is 28 bytes.

The display data is completely constant when loading. The origin x and y values (u32 or i32) are always zero (0). The resolution x and y values (u32 or i32) are always 640 and 400, respectively. The clear colour is always 0.3919999897480011, 0.3919999897480011, and 1.0, which is a blue-ish colour (#6464ff).
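The hex notation for the clear colour can be checked with a small conversion. Scaling by 255 and rounding is an assumption of this sketch, as is the function name; the file itself only stores the three f32 values.

```rust
/// Hypothetical helper: convert floating point colour components
/// in the range 0.0..=1.0 to an HTML-style hex string.
fn color_to_hex(r: f32, g: f32, b: f32) -> String {
    // Assumption: components map to bytes by scaling with 255 and rounding.
    let to_byte = |c: f32| (c.clamp(0.0, 1.0) * 255.0).round() as u8;
    format!("#{:02x}{:02x}{:02x}", to_byte(r), to_byte(g), to_byte(b))
}
```

With the constant values from the display structure, `color_to_hex(0.3919999897480011, 0.3919999897480011, 1.0)` gives #6464ff.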

Empty data

Empty nodes do not have data.

Light data

#![allow(unused)]
fn main() {
struct Light {
    direction: Vec3,
    translation: Vec3, // always 0.0
    zero024: [u8; 112], // always 0
    unk136: f32, // always 1.0
    unk140: f32, // always 0.0
    unk144: f32, // always 0.0
    unk148: f32, // always 0.0
    unk152: f32, // always 0.0
    diffuse: f32, // always >= 0.0, <= 1.0
    ambient: f32, // always >= 0.0, <= 1.0
    color: Color, // always 1.0
    flags: LightFlags, // always Default
    range_near: f32, // always > 0.0
    range_far: f32,
    range_near_sq: f32,
    range_far_sq: f32,
    range_inv: f32,
    unk200: u32, // always 1
    unk204: u32, // always != 0
    // Possibly not part of the light structure
    unk208: u32, // always 0
}

// Also used for light state events in Anim
bitflags LightFlags: u32 {
    Inactive = 0;
    TranslationAbs = 1 << 0;
    Translation = 1 << 1;
    Rotation = 1 << 2;
    Range = 1 << 3;
    Color = 1 << 4;
    Ambient = 1 << 5;
    Diffuse = 1 << 6;
    Directional = 1 << 7;
    Saturated = 1 << 8;
    Subdivide = 1 << 9;
    Static = 1 << 10;

    Default = TranslationAbs
    | Translation
    | Range
    | Directional
    | Saturated
    | Subdivide;
}
}

The size of the light structure is either 208 bytes, or 212 bytes (more on this shortly).

What's known about the light structure comes largely from the animations. There is a vast block of the structure at offset 24 with a length of 112 bytes that is completely unknown and zeroed out.

The direction (Vec3) of the light is given. The translation (Vec3) is always zero (0.0). The diffuseness of the light (f32) is greater than or equal to zero (0.0) and less than or equal to one (1.0). The ambient value (f32) is greater than or equal to zero (0.0) and less than or equal to one (1.0). It isn't quite clear what this does, since the only colour in the structure is white (1.0, 1.0, 1.0). The flags indicate which members of the structure are valid, although it is always set to the default alias (TranslationAbs, Translation, Range, Directional, Saturated, and Subdivide). The near range (f32) is always greater than zero (0.0), while the far range (f32) is always greater than the near range. The squared range values are simply that, the near and far range values squared. The inverse range value is one over the range difference or delta (far minus near), so 1.0 / (range_far - range_near).
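The derived range fields can be recomputed from the near and far range. This is a sketch under the constraints stated above; the struct and function names are hypothetical, and whether plain f32 arithmetic matches the stored values exactly has not been verified here.

```rust
/// Hypothetical container for the derived light range values.
#[derive(Debug, Clone, Copy, PartialEq)]
struct LightRange {
    near_sq: f32,
    far_sq: f32,
    inv: f32,
}

/// Returns None if the observed constraints (near > 0.0, far > near)
/// are violated.
fn derive_light_range(range_near: f32, range_far: f32) -> Option<LightRange> {
    if !(range_near > 0.0 && range_far > range_near) {
        return None;
    }
    Some(LightRange {
        near_sq: range_near * range_near,
        far_sq: range_far * range_far,
        // One over the range delta (far minus near).
        inv: 1.0 / (range_far - range_near),
    })
}
```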

I've been told the last three fields are something to do with the light's parent. The current theory is that it is a dynamic array. The unk200 field is a count, and unk204 is an array of size count with node indices or pointers. That would make unk208 a dump of the array, and variable/not part of the light structure. If this is the case, then the light structure is 208 bytes in size. If the count is zero (0) - which it never is - then presumably the pointer would be zero/null, otherwise the pointer would be non-zero/non-null (which we do see). And then after the light structure is read, count u32 or i32 values would be read (but since count is always 1, it's only one value), which then indicates the indices of the parents. Since this is always zero (0), the light is parented to the world. This seems nuts; it isn't clear why lights don't use the default parent fields on the node base structure. It doesn't matter for MW3, but might be useful for PM. We'll also see similar indications of dynamic arrays in other structures (e.g. the world data).

LOD data

#![allow(unused)]
fn main() {
struct LodMw {
    level: u32, // always 0 or 1
    range_near_sq: f32,
    range_far: f32,
    range_far_sq: f32,
    zero16: [u8; 44], // always 0
    unk60: f32,
    unk64: f32,
    unk68: u32, // always 1
    unk72: u32, // always 0 or 1 (bool)
    unk76: u32,
}
}

The size of the LOD structure is 80 bytes.

The level field (u32) is always zero (0) or one (1). Usually, this would make it a Boolean, but I think it corresponds to the level of detail setting, so e.g. low and high (hence the name). The near range value (f32) is always greater than or equal to zero (0.0) and less than or equal to 1000.0 squared, so it's assumed this is the near range squared. The far range value is stored as the base value (f32), which is always greater than zero (0.0), and why I suspect this is the far range, and as a squared value (f32). These are guesses at best.

The unk60 field (f32) is greater than or equal to zero (0.0), while the unk64 field (f32) is this value squared. The unk68 field (u32) is always one (1). The unk72 field (u32) is either zero (0) or one (1), a Boolean. If unk72 is zero/false, then the unk76 field (u32) is also zero (0). If unk72 is one/true, then the unk76 field is non-zero/non-null, which makes it likely a pointer.

Object3d data

#![allow(unused)]
fn main() {
struct Object3d {
    flags: Object3dFlags,
    opacity: f32, // always 0.0
    unk008: f32, // always 0.0
    unk012: f32, // always 0.0
    unk016: f32, // always 0.0
    unk020: f32, // always 0.0
    rotation: Vec3,
    scale: Vec3, // always 1.0
    rot_matrix: Matrix,
    translation: Vec3,
    zero096: [u8; 48], // always 0
}

bitflags Object3dFlags: u32 {
    HasOpacity = 1 << 2, // 0x02
    NoCoordinates = 1 << 3, // 0x08
    Unk20 = 1 << 5, // 0x20
}
}

The size of the Object3d structure is 144 bytes. This is a surprisingly large overhead, because there are many objects in a game world. It's also unclear why Euler angles and a matrix were used instead of Quaternions (which the motions use).

The flags (u32) are basically unknown. Only two values occur, 32 or 40. So an unknown flag (Unk20, 0x20) is always set, and then a flag I've named "NoCoordinates" (0x08) can either be set or unset. From some of the animation work and testing, it seems like there is a flag for if the object has opacity (0x02). Since this is always unset in GameZ files and mechlib archives, opacity (f32) is always zero (0.0), otherwise we can probably expect opacity to be greater than or equal to zero (0.0) and less than or equal to one (1.0). There are four fields that are always zero (0.0); because of this, we don't even strictly know if they are floating point (f32).

Next follows the rotation (Vec3), presumably the scale (Vec3) which is always one (1.0), a matrix (Matrix, 3x3), and the translation (Vec3). If the no coordinates flag is set, then the rotation and translation will be zeros (0.0), and the matrix will be the identity matrix (MATRIX_IDENTITY). If the no coordinates flag is unset, then the rotation components will each be greater than or equal to negative Pi and less than or equal to positive Pi, and the translation, while otherwise unconstrained, should be used. In most cases, the matrix can be calculated from the rotation, which is the x, y, z Euler angles:

#![allow(unused)]
fn main() {
fn euler_to_matrix(rotation: &Vec3) -> Matrix {
    let x = -rotation.0;
    let y = -rotation.1;
    let z = -rotation.2;

    let (sin_x, cos_x) = x.sin_cos();
    let (sin_y, cos_y) = y.sin_cos();
    let (sin_z, cos_z) = z.sin_cos();

    // optimized m(z) * m(y) * m(x)
    Matrix(
        cos_y * cos_z,
        sin_x * sin_y * cos_z - cos_x * sin_z,
        cos_x * sin_y * cos_z + sin_x * sin_z,
        cos_y * sin_z,
        sin_x * sin_y * sin_z + cos_x * cos_z,
        cos_x * sin_y * sin_z - sin_x * cos_z,
        -sin_y,
        sin_x * cos_y,
        cos_x * cos_y,
    )
}
}

In 2% of all Object3d nodes, this calculation is slightly off. This seems like either a bug or inaccuracy in the written data.

An additional trap for bit-perfect gamez.zbd writing is that negative zero (-0.0) and positive zero (+0.0) floating point values have different bit patterns per IEEE 754, even though -0.0 compares equal to 0.0. So for bit-perfect round-tripping, it is necessary to preserve the zero signs, even in the case where the no coordinates flag is set.
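The difference is easy to demonstrate. In this small sketch, `same_bits` is a hypothetical helper that compares the raw bit patterns instead of the floating point values, which is the comparison a bit-perfect writer needs.

```rust
/// Compare two f32 values by their IEEE 754 bit patterns.
/// Unlike `==`, this distinguishes -0.0 from +0.0.
fn same_bits(a: f32, b: f32) -> bool {
    a.to_bits() == b.to_bits()
}

/// Round-trip a value through its little-endian byte representation,
/// as it would be stored in the file. The sign of zero survives this.
fn round_trip_le(value: f32) -> f32 {
    f32::from_le_bytes(value.to_le_bytes())
}
```

So `-0.0f32 == 0.0f32` is true, but `same_bits(-0.0, 0.0)` is false. A replica that normalises the rotation or translation to positive zero when the no coordinates flag is set would therefore differ from the original file.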

Window data

#![allow(unused)]
fn main() {
struct Window {
    origin_x: u32, // always 0
    origin_y: u32, // always 0
    resolution_x: u32, // always 320
    resolution_y: u32, // always 200
    zero016: [u8; 212], // always 0
    buffer_index: i32, // always -1
    buffer_ptr: u32, // always 0
    unk236: u32, // always 0
    unk240: u32, // always 0
    unk244: u32, // always 0
}
}

The size of the Window structure is 248 bytes.

The origin x (u32) and y (u32) are always set to zero (0). The resolution x (u32) and y (u32) are always set to 320 and 200, respectively. Observant readers will note this is half the default display node resolution. Most of the rest of the structure from offset 16 with a length of 212 bytes is zero. The next non-zero value is at offset 228, which is what we think is the buffer index (i32), and is always negative one (-1). The next field is the buffer pointer, and this is always zero/null. Finally, the next three values (e.g. u32) are all zero (0).

World data

#![allow(unused)]
fn main() {
struct World {
    unk000: u32, // always 0
    area_partition_used: u32, // always 0
    area_partition_count: u32,
    area_partition_ptr: u32,
    fog_state: u32, // always 1
    fog_color: Color, // always 0.0
    fog_range_near: f32, // always 0.0
    fog_range_far: f32, // always 0.0
    fog_altitude_high: f32, // always 0.0
    fog_altitude_low: f32, // always 0.0
    fog_density: f32, // always 0.0
    area_left: f32,
    area_bottom: f32,
    area_width: f32,
    area_height: f32,
    area_right: f32,
    area_top: f32,
    unk076: u32, // always 16
    virtual_partition: u32, // always 1
    virt_partition_x_min: u32, // always 1
    virt_partition_y_min: u32, // always 1
    virt_partition_x_max: u32,
    virt_partition_y_max: u32,
    virt_partition_x_size: f32, // always +256.0
    virt_partition_y_size: f32, // always -256.0
    virt_partition_x_half: f32, // always +128.0
    virt_partition_y_half: f32, // always -128.0
    virt_partition_x_inv: f32, // always 1.0 / +256.0
    virt_partition_y_inv: f32, // always 1.0 / -256.0
    virt_partition_diag: f32, // always -192.0
    partition_inclusion_tol_low: f32, // always 3.0
    partition_inclusion_tol_high: f32, // always 3.0
    virt_partition_x_count: u32,
    virt_partition_y_count: u32,
    virt_partition_ptr: u32,
    unk148: f32, // always 1.0
    unk152: f32, // always 1.0
    unk156: f32, // always 1.0
    unk160: u32, // always 1
    unk164: u32, // always != 0
    unk168: u32, // always != 0
    unk172: u32, // always 0
    unk176: u32, // always 0
    unk180: u32, // always 0
    unk184: u32, // always 0
    unk188: u32,
}
}

The size of the World structure is 188 or 192 bytes.

World structure

The first field unk000 (u32) is always zero (0).

The area partition information is partially derived from later fields. At load time, the used count (u32) is always zero (0). The count (u32) can be validated later, from the virtual partition information. The pointer (u32) is always non-zero/non-null.

The fog state (u32) is always one (1), which corresponds to a linear fog. Exponential fog is two (2), but is never set. The fog colour is always zero/black (0.0, 0.0, 0.0). The fog near and far range values (f32) and the fog altitude high and low values (f32) are always zero (0.0), as well as the fog density (f32). This can be set by the interpreter when loading the world, or by the corresponding anim.zbd.

The area values describe the area of the game world. Although these are floating point numbers, they are truncated, and can be converted to integers. The right coordinate must be larger than the left coordinate, and the bottom coordinate must be larger than the top. The width and height can be calculated from the right/left and top/bottom values, respectively.

The field unk076 (u32) is always 16.

The virtual partition information is fairly regular. It's not clear why this is called "virtual partition", except that the interpreter has a command for it, for example WorldSetVirtualPartition on, which is why the virtual partition field (u32) is always one (1). The minimum x and y values (u32) are always one (1). The maximum x and y values (u32) give the partition size. The x size (f32) is always 256.0, and the y size (f32) is always -256.0. The half x size (f32) is predictably 128.0, and the half y size is -128.0. The inverse x size (f32) is 1.0 / 256.0, and the inverse y size (f32) is 1.0 / -256.0. The partition diagonal half size is always -192.0. It's a bit of an odd calculation: likely the square root of the sum of the squared x and y sizes, divided by two (2.0), or alternatively times 0.5. But if the x and y sizes are actually used with an exact square root, it comes out as -181.0. As far as I can see, this is the result of a poor square root approximation that is a well-known bit hack. For example, I have found it referenced in a paper named "A benchmark for C program verification" (arXiv:1904.01009v1), or in a thread from 2014 titled "Floating Point Hacks" on the dark bit factory forums. Here is a reproduction of the paper's C code:

float
sqrt_approx(float x)
{
    union { float x; unsigned i; } u;
    u.x = x;
    u.i = (u.i >> 1) + 0x1fc00000;
    return u.x;
}

Translated to Rust:

fn approx_sqrt(value: f32) -> f32 {
    let cast = i32::from_ne_bytes(value.to_ne_bytes());
    let approx = (cast >> 1) + 0x1FC00000;
    f32::from_ne_bytes(approx.to_ne_bytes())
}

fn main() {
    let x_size = 256.0f32;
    let y_size = -256.0f32;
    let size = x_size * x_size + y_size * y_size;
    let diag_good = size.sqrt() * 0.5;
    let diag_poor = approx_sqrt(size) * 0.5;
    println!("{} {}", diag_good, diag_poor);
}

This prints 181.01933 and 192, respectively, so a good fit. It isn't clear why an approximate square root was needed here (what's the speed reason?). But we will see this approximate square root function in the partition code later.

The partition inclusion low and high tolerance (f32) are always three (3.0), this also matches the values set in interp.zbd.

The virtual partition x count (u32) is the number of steps from area left to area right in x size (256) steps or increments, so roughly (area_right - area_left) / 256 (this may need to be rounded up). The virtual partition y count (u32) is the number of steps from area bottom to area top in y size (-256) steps/increments. This is therefore inverted! So roughly (area_top - area_bottom) / -256 (this may need to be rounded down?). Also, the virtual partition x max is equal to the virtual partition x count minus one (1), and the virtual partition y max is equal to the virtual partition y count minus one (1).

The virtual partition total count (not part of the structure) can also now be calculated, and the area partition count will be equal to this, except for the T1 world (the training), where it is the count minus one (1).

The virtual partition pointer (u32) is always non-zero/non-null. The fields unk148, unk152, and unk156 (f32) are always one (1.0).

The field unk160 (u32) is always one (1), and the fields unk164 and unk168 (u32) are always non-zero/non-null - likely pointers. The fields unk172, unk176, unk180, and unk184 (u32, maybe) are always zero (0). Finally, the field unk188 (u32) is variable.

Just like the lights structure, it seems like the fields unk160, unk164, unk168, and possibly unk172 could be dynamic arrays. This would make the world structure 188 bytes, and then e.g. unk160 indicates how many values to read.

In short, the variable data is the area partition count and pointer, the area (although only 4 values are needed), the virtual partition x and y counts (since the maximum extent can be calculated from this), the virtual partition pointer, and the fields unk164, unk168, and unk188.

The area ranges (left to right, bottom to top) are also needed to read the partitions.

World partitions

The partitions depend on the area. Specifically, partitions are read in a nested loop, roughly:

#![allow(unused)]
fn main() {
let mut y = area_bottom;
while y >= area_top {
    let mut x = area_left;
    while x <= area_right {
        read_partition(x, y);
        x += 256;
    }
    y += -256;
}
}

I'm not 100% sure the maths is correct, but you get the idea.

#![allow(unused)]
fn main() {
struct Partition {
    unk00: i32, // always 256/0x100
    unk04: i32, // always -1
    part_x: f32, // always x
    part_y: f32, // always y
    x_min: f32, // always x
    z_min: f32,
    y_min: f32, // always y + -256.0
    x_max: f32, // always x + 256.0
    z_max: f32,
    y_max: f32, // always y
    x_mid: f32, // always x + 128.0
    z_mid: f32,
    y_mid: f32, // always y + -128.0
    diagonal: f32,
    unk56: u16, // always 0
    count: u16,
    ptr: u32,
    unk64: u32, // always 0
    unk68: u32, // always 0
}
}

The size of a partition structure is 72 bytes.

The first field (i32?) could be the partition x size, but could also be bit flags. It is always 256/0x100. The second field (i32) is always negative one (-1), so this could be the partition y scaling. It's just an odd way to store this information.

The partition x and y are the same as the area x and y from the loop, but as floating point numbers.

The next fields give the minimum, maximum, and mean x, z, and y values (all f32). Because of the step values, x_min is always equal to x, and y_min is always equal to y + -256.0 (or y - 256.0). x_max is always equal to x + 256.0, and y_max is always equal to y. I am not sure how z_min or z_max is determined, possibly from the geometry of the partition.

Therefore, the mid-points can easily be calculated. First, division is usually avoided, especially on old CPUs, since it was slower than multiplication. We can write x / 2.0 as x * 0.5. The average is then (max + min) * 0.5. The x and y calculations simplify further.

Since x_min = x and x_max = x + 256.0:

  1. x_mid = (x_max + x_min) * 0.5
  2. x_mid = (x_min + x_max) * 0.5
  3. x_mid = (x + (x + 256.0)) * 0.5
  4. x_mid = (2.0 * x + 256.0) * 0.5
  5. x_mid = x + 128.0

Since y_min = y + -256.0 and y_max = y:

  1. y_mid = (y_max + y_min) * 0.5
  2. y_mid = (y + (y + -256.0)) * 0.5
  3. y_mid = (2.0 * y + -256.0) * 0.5
  4. y_mid = y + -128.0

Obviously, simplification isn't possible for z_mid, because z_min and z_max are derived from the geometry. z_mid is even more frustrating though:

#![allow(unused)]
fn main() {
let z_mid = (z_max + z_min) * 0.5;
}

If we attempt the calculation with single-precision floating point, out of the total 22016 partitions from all versions, 21812 match this exactly, and 204 do not match exactly, only closely. I've seen another formulation of the average calculation that is rumoured to help with accuracy, but this is disputed (see "Rounding error in computing average" from StackOverflow).

#![allow(unused)]
fn main() {
let z_mid = z_min + (z_max - z_min) * 0.5;
}

This is actually worse, failing in 2068 cases. Only when using double-precision does it produce the same result. The previous calculation does not change when using double-precision.

For most use-cases, this doesn't really matter, although it does affect the diagonal calculation, which is the next field (f32). Effectively, this is the square root of the square of the sides:

#![allow(unused)]
fn main() {
let x_side = (x_max - x_min) * 0.5;
let z_side = (z_max - z_min) * 0.5;
let y_side = (y_max - y_min) * 0.5;
let diagonal = (x_side * x_side + z_side * z_side + y_side * y_side).sqrt();
}

Naturally, x_side simplifies to 128.0, and so does y_side (and due to the squaring, the sign would not matter anyway). Also note that because of the squaring, any error in z_side compounds quickly, so I've found it necessary to cast z_max and z_min to f64, and perform the entire calculation up to the square root as double-precision:

#![allow(unused)]
fn main() {
let z_side = (z_max as f64 - z_min as f64) * 0.5;
let temp = 2.0 * 128.0 * 128.0 + z_side * z_side;
}

But this is where it gets silly. The partitions also use the (poor) approximate square root discussed above for the world structure (approx_sqrt). So all the precision is "lost", although it is still required to produce the same result in my testing.
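Putting the pieces together, the diagonal could be sketched as below. The square root is passed in as a closure because approx_sqrt is not reproduced here; the closure's signature (f64 in, f32 out) is an assumption made for this example, not something taken from the game.

```rust
/// Sum of the squared half-sides of a partition, in double precision.
/// The x and y half-sides are both 128.0 because of the fixed step.
fn diagonal_squared(z_min: f32, z_max: f32) -> f64 {
    let z_side = (z_max as f64 - z_min as f64) * 0.5;
    2.0 * 128.0 * 128.0 + z_side * z_side
}

/// Applies a caller-supplied square root to the double-precision sum.
/// The game appears to use `approx_sqrt` here; any stand-in can be passed.
fn diagonal(z_min: f32, z_max: f32, sqrt: impl Fn(f64) -> f32) -> f32 {
    sqrt(diagonal_squared(z_min, z_max))
}
```

For example, `diagonal(z_min, z_max, |v| v.sqrt() as f32)` uses the exact square root, which will not necessarily match the stored value.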

Moving on, the field unk56 (u16) is always zero (0), and the fields unk64 and unk68 (u32) are also always zero (0).

The count and pointer fields are part of a dynamic array. If the count is zero (0), then the pointer is zero/null. If the count is greater than zero, then the pointer is non-zero/non-null. In this case, read count u32 values after the structure. These should be indices of nodes in the given partition.
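A minimal sketch of this rule follows. The function name is hypothetical, the count is taken as a u32 for simplicity (the actual field width may differ), and data is assumed to start right after the partition structure:

```rust
/// Reads the node indices that follow a partition structure.
fn read_node_indices(count: u32, ptr: u32, data: &[u8]) -> Result<Vec<u32>, String> {
    // The count and pointer must agree on whether the array is empty.
    if (count == 0) != (ptr == 0) {
        return Err(format!("count {count} and pointer {ptr:#x} disagree"));
    }
    let len = count as usize * 4;
    let bytes = data.get(..len).ok_or("not enough data for node indices")?;
    // Each index is a little-endian u32.
    Ok(bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}
```

The pointer value itself is not dereferenced; it is only checked for consistency with the count.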

Nodes (PM)

Nodes are how the world data is organised and structured. Please see the general node overview first. This page describes node data structures for Pirate's Moon only. Refer also to MechWarrior 3 nodes.

Node base/shared structure

Only analysed in the mechlib.

This is the structure used by all nodes, and is 208 bytes in size. Please note that while this is the same size as NodeMw, the layout is different! Refer to the base game for any other types.

#![allow(unused)]
fn main() {
struct NodePm {
    name: [u8; 36],
    flags: NodeFlags,
    unk040: u32, // always 0
    unk044: u32,
    zone_id: u32, // always 255 (mechlib only?)
    node_type: NodeType,
    data_ptr: u32,
    mesh_index: i32,
    environment_data: u32, // always 0
    action_priority: u32, // always 1
    action_callback: u32, // always 0
    area_partition_x: i32, // -1, or >= 0, <= 64
    area_partition_y: i32, // -1, or >= 0, <= 64
    parent_count: u16, // always 0 or 1
    children_count: u16,
    parent_array_ptr: u32,
    children_array_ptr: u32,
    unk096: u32, // always 0
    unk100: u32, // always 0
    unk104: u32, // always 0
    unk108: u32, // always 0
    unk112: u32, // 0, 1, 2 (mechlib?)
    unk116: Box3d,
    unk140: Box3d,
    unk164: Box3d,
    unk188: u32, // always 0
    unk192: u32, // always 0
    unk196: u32, // always 0x000000A0 (mechlib?)
    unk200: u32, // always 0
    unk204: u32, // always 0
}
}

Preliminary analysis of nodes in the mechlib indicates this data structure is largely the same as the base game. The biggest change is around offset 84:

-    parent_count: u32, // always 0 or 1
-    parent_array_ptr: u32,
-    children_count: u32,
-    children_array_ptr: u32,
+    parent_count: u16, // always 0 or 1
+    children_count: u16,
+    parent_array_ptr: u32,
+    children_array_ptr: u32,
+    unk096: u32, // always 0

I.e. the parent_count and children_count have been changed from u32 values to u16 values. This has shifted the parent and child array pointers, and has introduced an extra field, unk096 (u32), which is always zero (0).

Additionally, the field unk112 (u32) is now variable, but always 0, 1, or 2.
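To make the changed layout concrete, here is a sketch that pulls just these fields out of a raw 208-byte Pirate's Moon node. The helper names and the NodeLinksPm type are invented for this example; the offsets follow from the NodePm structure above.

```rust
/// Parent/child bookkeeping of a Pirate's Moon node (offsets 84 to 100).
#[derive(Debug, PartialEq)]
struct NodeLinksPm {
    parent_count: u16,
    children_count: u16,
    parent_array_ptr: u32,
    children_array_ptr: u32,
    unk096: u32,
}

fn u16_at(buf: &[u8], offset: usize) -> u16 {
    u16::from_le_bytes([buf[offset], buf[offset + 1]])
}

fn u32_at(buf: &[u8], offset: usize) -> u32 {
    u32::from_le_bytes([buf[offset], buf[offset + 1], buf[offset + 2], buf[offset + 3]])
}

/// `node` must hold the full 208-byte node structure.
fn read_links_pm(node: &[u8; 208]) -> NodeLinksPm {
    NodeLinksPm {
        parent_count: u16_at(node, 84),
        children_count: u16_at(node, 86),
        parent_array_ptr: u32_at(node, 88),
        children_array_ptr: u32_at(node, 92),
        unk096: u32_at(node, 96),
    }
}
```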

Camera nodes base structure

Not analysed yet.

Display nodes base structure

Not analysed yet.

Empty nodes base structure

Not analysed yet.

Light nodes base structure

Not analysed yet.

LOD nodes base structure

Preliminary analysis of nodes in the mechlib indicates LOD nodes always have the node flags:

  • BASE
  • UNK08
  • UNK10
  • ALTITUDE_SURFACE
  • INTERSECT_SURFACE
  • UNK25

The field unk044 will always be one (1).

The zone ID will be the default zone ID (255), but this is probably down to the mechlib. Assuming the same behaviour as the base game, the zone ID will be either the default zone ID (255), or a value greater than or equal to one (1) and less than or equal to 80 (this upper bound is arbitrarily chosen based on usual zone IDs).

LOD nodes always have data associated with them, so the data pointer will always be non-zero/non-null.

Although LOD nodes cannot have a mesh, the mesh index does depend on whether the node is in a GameZ file or a mechlib archive. For a GameZ file, the mesh index is an index, so it is always negative one (-1). For a mechlib archive, the mesh index is actually a pointer value, since the data is already stored hierarchically. So it is always zero (0). See Object3d nodes for more information.

There will be one parent, and therefore the parent array pointer is non-zero/non-null. There will be at least one child, and therefore the child array pointer is non-zero/non-null.

The fields unk116 and unk140 will always be zeros (0.0). The field unk164 will never be all zeros, i.e. it is unequal to (0.0, 0.0, 0.0, 0.0, 0.0, 0.0).

The field unk112 will always be 2. The field unk196 will always be 160.

Object3d nodes base structure

The field unk044 will be either 1 or 45697.

The zone ID will be the default zone ID (255), but this is probably down to the mechlib. Assuming the same behaviour as the base game, the zone ID will be either the default zone ID (255), or a value greater than or equal to one (1) and less than or equal to 80 (this upper bound is arbitrarily chosen based on usual zone IDs).

Object3d nodes always have data associated with them, so the data pointer will always be non-zero/non-null.

The mesh index depends on the HasMesh flag, and whether the node is in a GameZ file or a mechlib archive. For a GameZ file, the mesh index is an index. So if the flag is set, then the index is greater than or equal to zero (0). If the flag is unset, then the index is always negative one (-1). For a mechlib archive, the mesh index is actually a pointer value, since the data is already stored hierarchically. So if the flag is set, this is non-zero/non-null. If the flag is unset, this is zero/null. Note that for the non-null case, if you are loading the value as a signed integer (i32), a large pointer value could in principle appear negative. However, memory on 32-bit machines was limited, so in practice the pointer won't be greater than 2147483647 (the largest i32 value), and you can also check if the value is greater than zero.

In short:

  • IsMechlib && !HasMesh => mesh_index == 0 (null ptr)
  • IsMechlib && HasMesh => mesh_index != 0 (non-null ptr)
  • IsGameZ && !HasMesh => mesh_index == -1 (invalid index)
  • IsGameZ && HasMesh => mesh_index > -1 (valid index)
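The four cases above translate directly into a match. The Source enum and function name are made up for this sketch:

```rust
/// Where a node was read from.
#[derive(Clone, Copy)]
enum Source {
    GameZ,
    Mechlib,
}

/// Checks the mesh index against the HasMesh flag, following the rules above.
fn mesh_index_is_valid(source: Source, has_mesh: bool, mesh_index: i32) -> bool {
    match (source, has_mesh) {
        (Source::Mechlib, false) => mesh_index == 0, // null pointer
        (Source::Mechlib, true) => mesh_index != 0,  // non-null pointer
        (Source::GameZ, false) => mesh_index == -1,  // invalid index
        (Source::GameZ, true) => mesh_index > -1,    // valid index
    }
}
```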

The field unk196 will always be 160.

Other fields have not been analysed in detail, since they are liable to change outside the mechlib.

Window nodes base structure

Not analysed yet.

World nodes base structure

Not analysed yet.

Node type data structures

All nodes except empty nodes have extra, type-specific data associated with them.

Camera data

Not analysed yet.

Display data

Not analysed yet.

Empty data

Not analysed yet.

Light data

Not analysed yet.

LOD data

Only analysed in the mechlib.

#![allow(unused)]
fn main() {
struct LodPm {
    level: u32, // always 0 or 1
    range_near_sq: f32,
    range_far: f32,
    range_far_sq: f32,
    zero16: [u8; 44], // always 0
    unk60: f32, // always == 0.0
    unk64: f32, // always >= 0.0
    unk68: f32, // always == unk64 * unk64
    unk72: f32, // always >= 0.0
    unk76: f32, // always == unk72 * unk72
    unk80: u32, // always 1
    unk84: u32, // always 0
    unk88: u32, // always 0
}
}

The size of the LOD structure is 92 bytes.

The level field (u32) is always zero (0) or one (1). Usually, this would make it a Boolean, but I think it corresponds to the level of detail setting, so e.g. low and high (hence the name). The near range value (f32) is always greater than or equal to zero (0.0) and less than or equal to 1000.0 squared, so it's assumed this is the near range squared. The far range value is stored as the base value (f32), which is always greater than zero (0.0), and why I suspect this is the far range, and as a squared value (f32). These are guesses at best.

The unk60 field (f32) is always zero (0.0). The unk64 field (f32) is greater than or equal to zero (0.0), while the unk68 field (f32) is this value squared. The unk72 field (f32) is greater than or equal to zero (0.0), while the unk76 field (f32) is this value squared. The unk80 field (u32) is always one (1). The unk84 field (u32) and unk88 field (u32) are both always zero (0).
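These relationships are easy to assert when reading. The following sketch checks only the unknown float fields, with offsets taken from the LodPm structure; the function names are invented, and exact floating point equality is assumed because that is what was observed in the mechlib.

```rust
fn f32_at(buf: &[u8], offset: usize) -> f32 {
    f32::from_le_bytes([buf[offset], buf[offset + 1], buf[offset + 2], buf[offset + 3]])
}

/// Checks the documented relationships between the unknown LOD floats.
/// `lod` must hold the full 92-byte LOD structure.
fn lod_unknowns_are_consistent(lod: &[u8; 92]) -> bool {
    let unk60 = f32_at(lod, 60);
    let unk64 = f32_at(lod, 64);
    let unk68 = f32_at(lod, 68);
    let unk72 = f32_at(lod, 72);
    let unk76 = f32_at(lod, 76);
    unk60 == 0.0
        && unk64 >= 0.0
        && unk68 == unk64 * unk64
        && unk72 >= 0.0
        && unk76 == unk72 * unk72
}
```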

Object3d data

Only analysed in the mechlib.

This seems to be the same as the base game.

Window data

Not analysed yet.

World data

Not analysed yet.

Animation definition files

Animation definition files (anim files) hold compiled animation definitions for a game world.

The initial animation definitions are in the reader archives, but they are quite free form and so probably complicated and slow to parse. I think this proved so slow that load times were unacceptable, and the solution the development team came up with was to load the reader files into the engine, and then dump out the in-memory representations of the parsed animation definitions.

It isn't known - because it hasn't been investigated - if the release version is capable of loading the animation definitions from the readers directly, or how to trigger this (for example, by removing the anim.zbd files).

Investigation (MW3)

Header and TOC

Anim files begin with a simple header:

#![allow(unused)]
fn main() {
struct Header {
    signature: u32, // always 0x08170616
    version: u32, // always 39
    entry_count: u32,
}
}

The signature (u32) is the magic number 0x08170616. The version (u32) is always 39, which is different from the version of the mechlib archives and GameZ files. The entry count (u32) indicates how many animation definition reader files are in the TOC that follows. This is basically a list of the raw animation definition file paths:

#![allow(unused)]
fn main() {
struct Entry {
    path: [u8; 80],
    unk80: u32,
}

type Entries = [Entry; entry_count];
}

The path is an ASCII-encoded, zero-terminated string of up to 80 bytes. It is usually a relative path pointing to a .zrd file, such as ..\data\common\zrdr\commonAnim.zrd (backslashes not escaped). Again, this points to a close connection to the various reader archives, which include matching files. Please note that the path data may occasionally contain non-zero bytes after the zero-termination, for example:

00000000  2e 2e 5c 64 61 74 61 5c  63 6f 6d 6d 6f 6e 5c 7a  |..\data\common\z|
00000010  72 64 72 5c 63 6f 6d 6d  6f 6e 41 6e 69 6d 2e 7a  |rdr\commonAnim.z|
00000020  72 64 00 02 90 02 3e 02  90 3d 3e 02 20 3e 3e 02  |rd....>..=>. >>.|
00000030  50 3e 3e 02 c8 bb 01 02  00 ff ff ff 04 02 00 00  |P>>.............|
00000040  00 00 00 00 c0 41 3e 02  d0 41 3e 02 90 43 3e 02  |.....A>..A>..C>.|
00000050  6a d8 95 37                                       |j..7|

Bytes from 0x00 (0) to 0x22 (34, exclusive) are the path, byte 0x22 (34) is the zero terminator, bytes from 0x23 (35) to 0x50 (80, exclusive) are garbage data from overwritten memory, and the four bytes from 0x50 (80) to 0x54 (84, exclusive) are an unknown integer (u32?). Given that for many entries, the trailing data is zero, it seems like this memory wasn't zeroed out properly in some cases.
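When parsing an entry, it is therefore safest to stop at the first zero byte and ignore the rest of the path field. A minimal sketch, with a made-up function name, assuming the 84-byte entry layout above:

```rust
/// Parses one TOC entry into the path (up to the zero terminator) and the
/// unknown integer. Returns None if the path is unterminated or not ASCII.
fn parse_entry(entry: &[u8; 84]) -> Option<(String, u32)> {
    let path_bytes = &entry[..80];
    // Everything after the first zero byte may be stale memory, so ignore it.
    let end = path_bytes.iter().position(|&b| b == 0)?;
    let path = &path_bytes[..end];
    if !path.is_ascii() {
        return None;
    }
    let unk80 = u32::from_le_bytes([entry[80], entry[81], entry[82], entry[83]]);
    Some((String::from_utf8_lossy(path).into_owned(), unk80))
}
```

Note that a tool that wants to write the file back byte-for-byte would also have to keep the garbage bytes, which this sketch discards.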

Animation definitions information

Following the TOC, there is some kind of information or book-keeping structure:

#![allow(unused)]
fn main() {
struct Info {
    unk00: u32, // always 0
    unk04: u32, // always 0
    unk08: u16, // always 0
    count: u16,
    unk12: u32, // always != 0, ptr?
    unk16: u32, // always 0
    unk20: u32, // always 0
    unk24: u32, // always != 0, ptr?
    gravity: f32,
    unk32: u32, // always 0
    unk36: u32, // always 0
    unk40: u32, // always 0
    unk44: u32, // always 0
    unk48: u32, // always 0
    unk52: u32, // always 0
    unk56: u32, // always 0
    unk60: u32, // always 1
    unk64: u32, // always 0
}

const GRAVITY: f32 = -9.8;

}

Most of the structure is zeroes, except for:

  • The animation count (u16) at offset 10, which is greater than zero
  • The two u32 values at offset 12 and 24, which are probably pointers and non-zero/non-null
  • An f32 value at offset 28, which seems to be the gravity (of the world?) used for animation calculations, but is always set to -9.8 (0xC11CCCCD; or bytes 0xCD 0xCC 0x1C 0xC1).
  • The u32 value at offset 60, which is always one (1)
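As a quick sanity check of the byte pattern, the two interesting values can be read like this. The function names are hypothetical, and the 68-byte size is simply the sum of the fields in the Info structure above:

```rust
const GRAVITY: f32 = -9.8;

/// Reads the animation definition count (offset 10).
fn read_count(info: &[u8; 68]) -> u16 {
    u16::from_le_bytes([info[10], info[11]])
}

/// Reads the gravity field (offset 28).
fn read_gravity(info: &[u8; 68]) -> f32 {
    f32::from_le_bytes([info[28], info[29], info[30], info[31]])
}
```

Comparing the gravity with == is fine here, because the stored bytes are exactly the single-precision representation of -9.8.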

Animation definition structures

I'll describe the structures in full, before describing how to read animation definitions. The base animation definition structure is 316 bytes:

#![allow(unused)]
fn main() {
struct AnimDef {
    anim_name: [u8; 32],
    name: [u8; 32],
    anim_ptr: u32, // always != 0
    anim_root: [u8; 32],
    anim_root_ptr: u32,
    unk104: [u8; 44], // always 0
    flags: AnimDefFlags,
    unk152: u8, // always 0
    activation: AnimActivation,
    unk154: u8, // always 4
    unk155: u8, // always 2
    exec_by_range_min: f32,
    exec_by_range_max: f32,
    reset_time: f32,
    unk168: f32, // always 0
    max_health: f32,
    cur_health: f32,
    unk180: u32, // always 0
    unk184: u32, // always 0
    unk188: u32, // always 0
    unk192: u32, // always 0
    sequence_definitions_ptr: u32,
    reset_state: SequenceDefinition,
    sequence_definition_count: u8,
    object_count: u8,
    node_count: u8,
    light_count: u8,
    puffer_count: u8,
    dynamic_sound_count: u8,
    static_sound_count: u8,
    unknown_count: u8, // always zero
    activ_prereq_count: u8,
    activ_prereq_min_to_satisfy: u8,
    anim_ref_count: u8,
    unk275: u8, // always 0
    objects_ptr: u32,
    nodes_ptr: u32,
    lights_ptr: u32,
    puffers_ptr: u32,
    dynamic_sounds_ptr: u32,
    static_sounds_ptr: u32,
    unknown_ptr: u32,
    activ_prereqs_ptr: u32,
    anim_refs_ptr: u32,
    unk312: u32, // always 0
}

bitflags AnimDefFlags: u32 {
    ExecutionByRange = 1 << 1;
    ExecutionByZone = 1 << 3;
    HasCallbacks = 1 << 4;
    ResetTime = 1 << 5;
    NetworkLogSet = 1 << 10;
    NetworkLogOn = 1 << 11;
    SaveLogSet = 1 << 12;
    SaveLogOn = 1 << 13;
    AutoResetNodeStates = 1 << 16;
    ProximityDamage = 1 << 20;
}

enum AnimActivation: u8 {
    WeaponHit = 0,
    CollideHit = 1,
    WeaponOrCollideHit = 2,
    OnCall = 3,
    OnStartup = 4,
}
}

This is going to get complicated.

The first field is called "animation name" in the reader files, and is a 32-byte, zero-terminated ASCII string with possible un-zeroed memory after the terminator. The second field is called simply "name" in the reader files, and is a 32-byte, zero-terminated ASCII string (although this seems to only have zeros after the terminator). The next field is some kind of pointer (u32), possibly pointing to the engine-internal animation structure, and always non-zero/non-null. The third name is what I've called the "animation root". This is also a 32-byte, zero-terminated ASCII string with possible un-zeroed memory after the terminator, and seems to be related to the object or node the animation is applied to. The next field is some kind of pointer (u32), possibly pointing to the engine-internal animation root, and always non-zero/non-null.

From what I could determine, if the .flt extension is stripped from the name, then if this matches the animation root name, the animation root pointer and animation pointer will be equal; otherwise, the animation root pointer and animation pointer will be unequal.

There are 44 zero bytes from offset 104 to 148 (exclusive).

At offset 148 are the flags, which indicate which optional features/values/fields the animation definition uses. I know of 10 of these:

  • Execution by range (EXECUTION_BY_RANGE in reader files), likely that the animation definition is triggered if something (only the player?) is within range. Associated with two fields. If execution by range is set, execution by zone isn't set.
  • Execution by zone (EXECUTION_BY_ZONE in reader files), a very uncommon trigger only appearing eleven times in all reader files. It isn't known how this works, since in the reader files the value to this key is an empty list. If execution by zone is set, execution by range isn't set.
  • Has callbacks, set if any of the animation definition's sequences include a callback sequence event; otherwise unset (so this is derived, and not explicitly mentioned in the reader files). Probably to speed up callback look-ups?
  • Reset time, likely whether the animation has a reset time. Definitely associated with one field, maybe two.
  • Network log set and network log on. These work in tandem. In the reader files, if the NETWORK_LOG key is present, the "set" flag is set and the "on" flag is valid. The "on" flag is set if the NETWORK_LOG value is ON; if it is OFF the flag is unset. If the "set" flag isn't set, then the "on" flag isn't set either. These flags seem to control whether an animation definition is considered for transmission in a network/multiplayer game, and if it is sent.
  • Save log set and save log on. Similar to the network flags, these work in tandem. In the reader files, if the SAVE_LOG key is present, the "set" flag is set and the "on" flag is valid. The "on" flag is set if the SAVE_LOG value is ON; if it is OFF the flag is unset. If the "set" flag isn't set, then the "on" flag isn't set either. These flags seem to control whether an animation definition is considered for inclusion in a save game file, and if it is saved.
  • Auto reset node states, or AUTO_RESET_NODE_STATES in the reader files might control whether the animation nodes or animation root is reset when the animation is reset or not. This seems to be the default behaviour, as the key AUTO_RESET_NODE_STATES is mostly followed by the value OFF in reader files.
  • Proximity damage (PROXIMITY_DAMAGE in the reader files) is uncommon, and used 22 times in the reader files. The key has a value in the reader files, but it is always 0, so I haven't been able to confirm an associated field in the structure.
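The constraints between the flags can be written down as a small check. This is a sketch only: the constant names mirror the AnimDefFlags values above, and the function names are made up.

```rust
const EXECUTION_BY_RANGE: u32 = 1 << 1;
const EXECUTION_BY_ZONE: u32 = 1 << 3;
const NETWORK_LOG_SET: u32 = 1 << 10;
const NETWORK_LOG_ON: u32 = 1 << 11;
const SAVE_LOG_SET: u32 = 1 << 12;
const SAVE_LOG_ON: u32 = 1 << 13;

fn has(flags: u32, flag: u32) -> bool {
    flags & flag != 0
}

/// Checks the observed relationships between the flags.
fn flags_are_consistent(flags: u32) -> bool {
    // Execution by range and execution by zone are mutually exclusive.
    let range_and_zone = has(flags, EXECUTION_BY_RANGE) && has(flags, EXECUTION_BY_ZONE);
    // An "on" flag is only valid if the matching "set" flag is set.
    let network_ok = has(flags, NETWORK_LOG_SET) || !has(flags, NETWORK_LOG_ON);
    let save_ok = has(flags, SAVE_LOG_SET) || !has(flags, SAVE_LOG_ON);
    !range_and_zone && network_ok && save_ok
}
```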

The field at offset 152 is unknown (u8), and is always zero (0). Next is the animation activation (ACTIVATION in the reader files), which can be:

  • WEAPON_HIT, rare, 28 occurrences
  • COLLIDE_HIT, uncommon, 119 occurrences
  • WEAPON_OR_COLLIDE_HIT, uncommon, 108 occurrences
  • ON_CALL, most common, 3026 occurrences
  • ON_STARTUP, rare, 58 occurrences

The field at offset 154 is unknown (u8), and is always four (4). It could be related to a concept in the engine called action priority, but this isn't certain. The field at offset 155 is unknown (u8), and is always two (2).

The next two fields are the execution by range minimum (f32) and maximum (f32) range. If the execution by range flag is set, the minimum value is greater than or equal to 0.0 and the maximum value is greater than or equal to the minimum value; otherwise both values are zero (0.0).

Next is the reset time (f32). If the reset time flag is set, this value seems to range from -1.0 to 4.0 (-1.0, 0.0, 0.3, 0.65, 0.714, 1.0, 2.0, 3.0, 4.0). If the flag is unset, this value is always negative one (-1.0). This is followed by an unknown value, which I have typed as f32 based on the surrounding values, even though it could be anything. It is always zero (0.0). Interestingly in the reader files, there is at least one instance of a RESET_TIME key with two values. It could also track the "current" animation time - whatever that is.

The maximum health value (f32) is greater than or equal to zero (0.0), while the current health value (f32) is equal to the maximum. So these could be swapped. The reader files only mention HEALTH.

The next four fields (u32/i32/f32) are always zero (0).

I'll talk more about the sequence definition pointer value (u32) when discussing the sequence definitions. This is always non-zero/non-null, but then all animation definitions have at least one sequence definition.

Next follows the reset state sequence definition (thanks Skyfaller for the analysis). This will be read later separately again (see reset sequence). This might seem odd, but the reset state can contain a variable number of events, and so must be read after the animation definition. Likely they just used the generic sequence definition serialisation/deserialisation functions here, so the data is duplicated.

Several counts of things associated with the animation definition follow. They are all u8 values:

  • The number of sequence definitions
  • The number of objects (Object3d nodes)
  • The number of other nodes
  • The number of lights
  • The number of puffers
  • The number of dynamic sounds
  • The number of static sounds
  • The number of an unknown thing, always zero (0)
  • The number of activation prerequisite conditions
  • The minimum number of activation prerequisites necessary for activation, either 0, 1, or 2 in the files, but could be higher. Has to be less than or equal to the number of conditions.
  • The number of animation references
  • Likely a padding byte at offset 275, always zero (0)

These are immediately followed with pointers for these things (u32), except for the sequence definitions (this pointer was at offset 196):

  • The objects array pointer
  • The nodes array pointer
  • The lights array pointer
  • The puffers array pointer
  • The dynamic sounds array pointer
  • The static sounds array pointer
  • The unknown things array pointer, always zero (0)
  • The activation prerequisite conditions array pointer
  • The animation references array pointer

As a general rule, if the count is zero (0), then the pointer will be zero/null; otherwise, the pointer will be non-zero/non-null. These also trigger extra reads.

The final field at offset 312 (u32/i32) is unknown, and is always zero (0).

Animation definition reading

Animation definitions are read sequentially. The number of animation definitions to read was provided in the info structure. Also, when reading the animation definition array, the first item will always be zeroed out. This is a common occurrence for dynamic arrays in the anim file. Except not quite in this case! The activation value at offset 153 won't be zero (0), but instead five (5), which corresponds to the on call activation.

After each animation definition structure is read, further reads based on the counts can be triggered (described in the following sections). This is also the case for the zeroed out item! It also has a zeroed out reset state!

Object3d nodes

If the object count was greater than zero, the object array is read. Each item is a 96 byte structure. When reading the array, the first item will always be zeroed out.

#![allow(unused)]
fn main() {
struct ObjectRef {
    name: [u8; 36],
    unk36: [u8; 60],
}
}

The name is a node name for an Object3d node, and so is 36 bytes long. Assume ASCII. The "padding" for the name is also odd. It seems like all nodes are initialised with the name Default_node_name (padded with zeros/nulls to 36 bytes). Then, when the name is filled in, it is overwritten with the node name (zero/null terminated).
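The odd padding can be reproduced with a sketch like the following. Both function names are invented, and the overwrite behaviour is my reading of the data, not confirmed engine behaviour:

```rust
const DEFAULT_NODE_NAME: &[u8] = b"Default_node_name";

/// Extracts the name up to the first zero byte, ignoring whatever follows.
fn node_name(raw: &[u8; 36]) -> &[u8] {
    let end = raw.iter().position(|&b| b == 0).unwrap_or(raw.len());
    &raw[..end]
}

/// Rebuilds the 36-byte field: start from the zero-padded default name,
/// then overwrite it with the zero-terminated node name.
fn pad_node_name(name: &[u8]) -> Option<[u8; 36]> {
    if name.len() >= 36 {
        return None; // no room for the zero terminator
    }
    let mut raw = [0u8; 36];
    raw[..DEFAULT_NODE_NAME.len()].copy_from_slice(DEFAULT_NODE_NAME);
    raw[..name.len()].copy_from_slice(name);
    raw[name.len()] = 0;
    Some(raw)
}
```

For a short name such as abc, the field then contains abc, a zero byte, and the tail of the default name (ult_node_name), which is the pattern seen in the files.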

I haven't been able to figure out what the rest of the data (60 bytes) does.

Other nodes

If the nodes count was greater than zero, the nodes array is read. Each item is a 40 byte structure. When reading the array, the first item will always be zeroed out.

#![allow(unused)]
fn main() {
struct NodeRef {
    name: [u8; 36],
    pointer: u32,
}
}

The name is a node name for a node, and so is 36 bytes long. Assume ASCII. The "padding" for the name is also odd. It seems like all nodes are initialised with the name Default_node_name (padded with zeros/nulls to 36 bytes). Then, when the name is filled in, it is overwritten with the node name (zero/null terminated).

The pointer (u32) is always non-zero/non-null, except for the first item.

Light nodes

If the lights count was greater than zero, the lights array is read. Each item is a 44 byte structure. When reading the array, the first item will always be zeroed out. This structure is also used for other things.

#![allow(unused)]
fn main() {
struct ThingRef {
    name: [u8; 36],
    pointer: u32,
    unk40: u32, // always 0
}
}

The name is a node name for a node, and so is 36 bytes long. Assume ASCII. The "padding" for the name is also odd. It seems like all nodes are initialised with the name Default_node_name (padded with zeros/nulls to 36 bytes). Then, when the name is filled in, it is overwritten with the node name (zero/null terminated).

The pointer (u32) is always non-zero/non-null, except for the first item. The unknown field (u32?) is always zero (0).

Puffer nodes?

If the puffer count was greater than zero, the puffer array is read. Each item is a 44 byte structure. When reading the array, the first item will always be zeroed out.

#![allow(unused)]
fn main() {
struct PufferRef {
    name: [u8; 32],
    unk32: u32,
    pointer: u32,
    unk40: u32, // always 0
}
}

Since puffers don't seem to be nodes, the name is 32 bytes long. Assume ASCII. The name is padded/filled with zeros after the zero terminator.

The first unknown field (u32) is very strange. The lower three bytes are always zero (0), so unk32 & 0x00FFFFFF == 0. The high byte is sometimes non-zero.
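In code, the check on the lower three bytes is a mask test. A small sketch with a hypothetical function name:

```rust
/// Returns the high byte of the unknown puffer field, or None if any of
/// the lower three bytes is set.
fn puffer_unk32_high_byte(unk32: u32) -> Option<u8> {
    if unk32 & 0x00FF_FFFF != 0 {
        return None;
    }
    Some((unk32 >> 24) as u8)
}
```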

The pointer (u32) is always non-zero/non-null, except for the first item. The second unknown field (u32?) is always zero (0).

Dynamic sounds/sound nodes

If the dynamic sounds count was greater than zero, the dynamic sounds array is read. Each item is a 44 byte structure. When reading the array, the first item will always be zeroed out. This is the same structure used by the lights.

Static sounds

If the static sounds count was greater than zero, the static sounds array is read. Each item is a 36 byte structure. When reading the array, the first item will always be zeroed out.

#![allow(unused)]
fn main() {
struct StaticSoundRef {
    name: [u8; 32],
    unk32: u32, // always 0
}
}

Since static sounds don't seem to be nodes, the name is 32 bytes long. Assume ASCII. The name is not cleanly zero-filled after the zero terminator. The unknown field (u32) is always zero (0).

Unknown items

Since the unknown count is always zero, this is never read. I presume - based on the other fields/ordering - that it would be read here. Since no such items are read, I don't know what structure this might have.

Activation prerequisite conditions

If the activation prerequisite conditions (APC) count was greater than zero, the APC array is read. Each item is a 48 byte structure. Unlike other arrays, the first item is not zeroed out!

This is by far the most complicated to read. There are essentially three types of APCs. Based on the type, the data read is interpreted differently (i.e. it has the same size, but different types/layout). Let me first describe the opaque layout:

#![allow(unused)]
fn main() {
// APC = Activation prerequisite condition
struct Apc {
    optional: u32, // always 0 or 1 (bool)
    type: ApcType,
    type_dependent: [u8; 40],
}

enum ApcType: u32 {
    Animation = 1,
    Object = 2,
    Parent = 3,
}
}

The optional field (u32) is always zero (0) or one (1), a Boolean, and signifies whether the APC is required or optional for activation. Animation-type APCs seem to be always required, i.e. not optional. The type field (u32) is an enumeration, where:

  • One (1) means the data is interpreted as an animation-type APC
  • Two (2) means the data is interpreted as an object-type APC
  • Three (3) means the data is interpreted as an object-type APC in the parent role

Next, the type dependent data:

#![allow(unused)]
fn main() {
struct ApcAnim {
    name: [u8; 32],
    unk32: u32, // always 0
    unk36: u32, // always 0
}

struct ApcObject {
    active: u32, // always 0 or 1 (bool)
    name: [u8; 32],
    pointer: u32,
}
}

For animation-type APCs, the name is 32 bytes, ASCII, zero-terminated, and padded with zeros/properly zeroed-out. The next two fields are 4 bytes in size each (u32?), and always zero (0).

For object-type APCs, the active field (u32) is always zero (0) or one (1), a Boolean. However, for object-type APCs with the parent role, they are always inactive (0). The name is also 32 bytes, ASCII, zero-terminated, and padded with zeros/properly zeroed-out. Finally, the pointer (u32) is always non-null/non-zero.
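One way to model this in code is to read the 48 bytes and then dispatch on the type. The sketch below uses an invented Rust enum rather than the raw layout, treats the parent role as a Boolean on the object variant, and drops the always-zero fields; these are all choices made for the example.

```rust
#[derive(Debug, PartialEq)]
enum Apc {
    Animation { optional: bool, name: String },
    Object { optional: bool, active: bool, name: String, pointer: u32, parent: bool },
}

fn u32_at(buf: &[u8], offset: usize) -> u32 {
    u32::from_le_bytes([buf[offset], buf[offset + 1], buf[offset + 2], buf[offset + 3]])
}

/// Reads a 32-byte, zero-terminated name starting at `offset`.
fn name_at(buf: &[u8], offset: usize) -> String {
    let field = &buf[offset..offset + 32];
    let end = field.iter().position(|&b| b == 0).unwrap_or(field.len());
    String::from_utf8_lossy(&field[..end]).into_owned()
}

fn bool_at(buf: &[u8], offset: usize) -> Option<bool> {
    match u32_at(buf, offset) {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}

/// Interprets one 48-byte APC. Returns None for unexpected values.
fn parse_apc(raw: &[u8; 48]) -> Option<Apc> {
    let optional = bool_at(raw, 0)?;
    match u32_at(raw, 4) {
        // Type-dependent data starts at offset 8.
        1 => Some(Apc::Animation { optional, name: name_at(raw, 8) }),
        ty @ (2 | 3) => Some(Apc::Object {
            optional,
            active: bool_at(raw, 8)?,
            name: name_at(raw, 12),
            pointer: u32_at(raw, 44),
            parent: ty == 3,
        }),
        _ => None,
    }
}
```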

I haven't explored if there is any ordering to APCs, e.g. how parent APCs know which APCs are their children.

Animation references

If the animation references count was greater than zero, the animation references array is read. Each item is a 72 byte structure. Unlike other arrays, the first item is not zeroed out!

#![allow(unused)]
fn main() {
struct AnimRef {
    name: [u8; 64],
    unk64: u32, // always 0
    unk68: u32, // always 0
}
}

I'm not sure if the name field is actually 64 bytes long. Some values are properly zero-terminated at 32 bytes and beyond, but not all. Again, this is possibly a lack of zeroing out the memory. In any case, it's a zero-terminated, ASCII string. The next two fields are 4 bytes in size each (u32?), and always zero (0).

There's one animation reference per CALL_ANIMATION sequence event, and there may be duplicate references to the same animation, since multiple calls might be needed.

Reset sequence

The reset sequence is read next, and is read unconditionally, i.e. every animation definition has a reset sequence - even the zeroed out first animation definition!

The reset sequence is special in that it always has the same name (RESET_SEQUENCE), and a separate reference to it is kept. Otherwise, it is largely the same as any other sequence:

#![allow(unused)]
fn main() {
struct SequenceDefinition {
    name: [u8; 32],
    flags: u32, // always 0 or 0x0303
    unk36: [u8; 20], // always 0
    pointer: u32,
    size: u32,
}

enum SequenceActivation: u32 {
    Initial = 0,
    OnCall = 3,
}
}

For any sequence, the name is 32 bytes long, ASCII, zero-terminated, and properly zeroed out.

The flags can either be zero (0) or 0x0303. This corresponds to the activation of either initial (0) or on call (3). But there are likely others we don't know about because they don't appear in the file.

The next 20 bytes (at offset 36) are unknown and always zero (0). Finally, the pointer (u32) and size (u32). If the size is zero (0), then the pointer will be zero/null, and no further data is read. This indicates an empty reset sequence. Otherwise, size bytes of sequence event data is read. I'll describe how to read sequence event data shortly.

For the reset sequence, the name will always be RESET_SEQUENCE. The flags will always be zero, an initial activation. It will always match the reset state in the animation definition.

Sequence definitions

If the sequence definitions count was greater than zero, the sequence definitions are read. Please see the reset sequence section for the sequence definitions structure.

Sequence events

Is this file not complicated enough yet? Sequence events will fix that. It starts easy. The size of the sequence event data (in bytes) is known from the sequence definition. Simply keep reading the events until that many bytes have been read. Each event starts with a header:

#![allow(unused)]
fn main() {
struct EventHeader {
    event_type: u8,
    start_offset: StartOffset,
    pad02: u16, // always 0
    size: u32,
    start_time: f32,
}

enum StartOffset: u8 {
    Animation = 1,
    Sequence = 2,
    Event = 3,
}
}

The event type (u8) indicates just that. We'll get to these. The start offset (u8) can either be animation (1), sequence (2), or event (3). The explicit padding at offset 2 is always zero (0). The size indicates the size of this event's total data (including the header). The start time indicates the event's start time relative to the start offset/parent (probably).
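The reading loop could be sketched as follows. It only splits the data into headers and raw payloads, without interpreting the event types; the function name and error messages are invented, and the start offset is kept as a raw u8 for brevity.

```rust
struct EventHeader {
    event_type: u8,
    start_offset: u8,
    size: u32,
    start_time: f32,
}

const HEADER_SIZE: usize = 12;

/// Splits sequence event data into (header, payload) pairs.
fn split_events(mut data: &[u8]) -> Result<Vec<(EventHeader, &[u8])>, String> {
    let mut events = Vec::new();
    while !data.is_empty() {
        if data.len() < HEADER_SIZE {
            return Err("truncated event header".to_string());
        }
        let header = EventHeader {
            event_type: data[0],
            start_offset: data[1],
            // Bytes 2 and 3 are the explicit padding.
            size: u32::from_le_bytes([data[4], data[5], data[6], data[7]]),
            start_time: f32::from_le_bytes([data[8], data[9], data[10], data[11]]),
        };
        // The size includes the header, so it can never be smaller than it.
        let size = header.size as usize;
        if size < HEADER_SIZE || size > data.len() {
            return Err(format!("bad event size {size}"));
        }
        let (event, rest) = data.split_at(size);
        events.push((header, &event[HEADER_SIZE..]));
        data = rest;
    }
    Ok(events)
}
```

Here data is the block of size bytes from the sequence definition, so the loop ends exactly when all events have been consumed.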

There are 33 known event types, and they are described separately in sequence events. Fun story, these each require parsing.

Sequence events

Events by index

Events by name

Sequence events

struct EventHeader {
    event_type: u8, // could also be an enum
    start_offset: StartOffset,
    pad02: u16, // always 0
    size: u32,
    start_time: f32,
}

enum StartOffset: u8 {
    Animation = 1,
    Sequence = 2,
    Event = 3,
}

The event type (u8) indicates just that, the type of the event, and therefore how to interpret the data following the header. The start offset (u8) can either be animation (1), sequence (2), or event (3). The explicit padding at offset 2 is always zero (0). The size indicates the size of this event's entire data (including the header). The start time indicates the event's start time relative to the start offset/parent (probably).

The event structures and their sizes specified in this document are all without the header, for convenience. Subtract 12 bytes (the size of the header) from the size in the header to get the event sizes specified.
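Because most events have a fixed size, a parser can cross-check the header's size against the type before interpreting the data. The sketch below collects the type numbers and header-less sizes listed for each event in this document into one table; the names `KNOWN_EVENTS`, `HEADER_SIZE`, and `check_event_size` are hypothetical.

```rust
/// Size of the event header in bytes.
const HEADER_SIZE: u32 = 12;
/// Minimum size of the variable-length OBJECT_MOTION_SI_SCRIPT event (type 12).
const SI_SCRIPT_MIN_SIZE: u32 = 24;

/// (event type, size without header); `None` marks a variable size.
const KNOWN_EVENTS: [(u8, Option<u32>); 33] = [
    (1, Some(16)), (2, Some(60)), (4, Some(120)), (5, Some(100)),
    (6, Some(8)), (7, Some(20)), (8, Some(16)), (9, Some(20)),
    (10, Some(320)), (11, Some(132)), (12, None), (13, Some(12)),
    (14, Some(24)), (15, Some(4)), (17, Some(8)), (18, Some(76)),
    (19, Some(68)), (22, Some(36)), (23, Some(36)), (24, Some(68)),
    (25, Some(36)), (26, Some(36)), (27, Some(36)), (28, Some(68)),
    (30, Some(8)), (31, Some(12)), (32, Some(0)), (33, Some(12)),
    (34, Some(0)), (35, Some(4)), (36, Some(52)), (41, Some(24)),
    (42, Some(580)),
];

/// Returns the size of the event data without the header, or an error if the
/// type is unknown or the size does not match the documented one.
fn check_event_size(event_type: u8, header_size: u32) -> Result<u32, String> {
    let payload = header_size
        .checked_sub(HEADER_SIZE)
        .ok_or_else(|| format!("size {} is smaller than the header", header_size))?;
    let (_, expected) = KNOWN_EVENTS
        .iter()
        .find(|(ty, _)| *ty == event_type)
        .ok_or_else(|| format!("unknown event type {}", event_type))?;
    match expected {
        Some(size) if *size != payload => Err(format!(
            "event type {}: expected {} bytes, got {}",
            event_type, size, payload
        )),
        None if payload < SI_SCRIPT_MIN_SIZE => Err(format!(
            "event type {}: expected at least {} bytes, got {}",
            event_type, SI_SCRIPT_MIN_SIZE, payload
        )),
        _ => Ok(payload),
    }
}
```

For example, a SOUND event (type 1) carries a header size of 28, so `check_event_size(1, 28)` returns `Ok(16)`.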

Index lookups

Sequence events can refer to information in their associated animation definition, for example:

  • Object3d nodes
  • Sound nodes (dynamic sounds)
  • Other nodes (just called nodes)
  • Sounds (static sounds)
  • Lights
  • Puffers

Based on the packing of some structures and the general size of the arrays in GameZ, I assume node indices are 2 bytes/16 bits, so u16 or i16. We do see negative numbers, so I assume it's i16, leaving a maximum index of 32767 - still a lot larger than the usual array sizes.

As mentioned, there are negative numbers that seem to have special meanings. For example, if the reader file says INPUT_NODE, this is translated to the index -200.

const INPUT_NODE_INDEX: i16 = -200;

It's unknown if this is allowed for all node indices, or only some.
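One way to handle this when parsing is to decode the raw i16 into an explicit type. This is only a sketch under the assumptions above: `NodeRef` and `decode_node_index` are hypothetical names, and any negative value other than -200 is rejected so that other special values (if they exist) surface as errors instead of being misread.

```rust
const INPUT_NODE_INDEX: i16 = -200;

#[derive(Debug, PartialEq)]
enum NodeRef {
    /// The special INPUT_NODE value (-200).
    Input,
    /// A regular index into the animation definition's node list.
    Index(usize),
}

/// Decodes a raw node index, checking it against the number of nodes in the
/// associated animation definition.
fn decode_node_index(raw: i16, node_count: usize) -> Result<NodeRef, String> {
    if raw == INPUT_NODE_INDEX {
        return Ok(NodeRef::Input);
    }
    if raw < 0 {
        return Err(format!("unexpected negative node index {}", raw));
    }
    let index = raw as usize;
    if index >= node_count {
        return Err(format!("node index {} out of range (count {})", index, node_count));
    }
    Ok(NodeRef::Index(index))
}
```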

Sound

Reader name: SOUND, Type: 1, Size: 16

Also called "static sound" in this project.

struct Sound {
    sound_index: i16,
    node_index: i16,
    translation: Vec3,
}

The sound index (i16) is used to look up the static sound in the animation definition. The node index (i16) is used to look up the parent/at node in the animation definition. The translation (Vec3) is presumably the sound's translation from the node.

Sound node

Reader name: SOUND_NODE, Type: 2, Size: 60

Also called "dynamic sound" in this project.

struct SoundNode {
    name: [u8; 32],
    unk32: u32, // always 1
    flags: SoundNodeFlags,
    active_state: u32, // always 0 or 1 (bool)
    node_index: i16,
    pad46: u16, // always 0
    translation: Vec3,
}

bitflags SoundNodeFlags: u32 {
    InheritTranslation = 1 << 1; // 0x2
}

The sound node's name (32 bytes) is zero-terminated and zero padded. It's unclear why dynamic sounds aren't looked up by index, maybe this event creates a new node? The next field (u32) is always one (1). The flags field (u32) seems to be a bit field and is either zero (0) or two (2). The active state (u32) is either zero (0, false) or one (1, true). The node index (i16) is used to look up the parent/at node in the animation definition. The next field (u16) is padding and will always be zero (0). The translation (Vec3) is presumably the sound's translation from the node. If the inherit translation flag (1 << 1 or 0x2) is unset, then the node index and the translation will be zero (0/0.0).
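The two checks this paragraph implies, a zero-terminated and zero-padded name and the flag-dependent fields, could be written roughly as below. The function names are hypothetical, and the translation is passed as a plain `[f32; 3]` instead of the document's Vec3 to keep the sketch self-contained.

```rust
const INHERIT_TRANSLATION: u32 = 1 << 1;

/// Reads a 32-byte name that must be zero-terminated and zero-padded.
fn read_padded_name(raw: &[u8; 32]) -> Result<String, String> {
    let end = raw
        .iter()
        .position(|&b| b == 0)
        .ok_or("name is not zero-terminated")?;
    if raw[end..].iter().any(|&b| b != 0) {
        return Err("name padding is not zero".to_string());
    }
    let name = &raw[..end];
    if !name.is_ascii() {
        return Err("name is not ASCII".to_string());
    }
    Ok(String::from_utf8_lossy(name).into_owned())
}

/// Checks that only the known flag is used, and that the node index and
/// translation are zero when the inherit translation flag is unset.
fn check_sound_node(flags: u32, node_index: i16, translation: [f32; 3]) -> Result<(), String> {
    if (flags & !INHERIT_TRANSLATION) != 0 {
        return Err(format!("unknown sound node flags {:#x}", flags));
    }
    let inherit = (flags & INHERIT_TRANSLATION) != 0;
    if !inherit && (node_index != 0 || translation != [0.0; 3]) {
        return Err("node index/translation set without the flag".to_string());
    }
    Ok(())
}
```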

Light state

Reader name: LIGHT_STATE, Type: 4, Size: 120

struct LightState {
    name: [u8; 32],
    light_index: i16,
    pad34: u16, // always 0
    flags: LightFlags,
    active_state: u32, // always 0 or 1 (bool)
    point_source: u32, // always 1
    directional: u32, // always 0 or 1 (bool)
    saturated: u32, // always 0 or 1 (bool)
    subdivide: u32, // always 0 or 1 (bool)
    static: u32, // always 0 or 1 (bool)
    node_index: i32,
    translation: Vec3,
    rotation: Vec3,
    range_near: f32,
    range_far: f32,
    color: Color,
    ambient: f32,
    diffuse: f32,
}

// Also used for light nodes in GameZ
bitflags LightFlags: u32 {
    // This flag never occurs in sequence events
    TranslationAbs = 1 << 0; // 0x001
    Translation = 1 << 1;    // 0x002
    // This flag never occurs in sequence events
    Rotation = 1 << 2;       // 0x004
    Range = 1 << 3;          // 0x008
    Color = 1 << 4;          // 0x010
    Ambient = 1 << 5;        // 0x020
    Diffuse = 1 << 6;        // 0x040
    Directional = 1 << 7;    // 0x080
    Saturated = 1 << 8;      // 0x100
    Subdivide = 1 << 9;      // 0x200
    Static = 1 << 10;        // 0x400

    Inactive = 0;
    Default = TranslationAbs
        | Translation
        | Range
        | Directional
        | Saturated
        | Subdivide;
}

The light node's name (32 bytes) is zero-terminated and zero padded. The light node's index (i16) is used to look up the light in the animation definition. It's unclear why the light state contains both the light node's name and index. When looked up by index, that name matches the name in this structure. The next field (u16) is padding and will always be zero (0).

The light flags (u32) are also used for light nodes in GameZ, and indicate which further fields/states are valid and should be set. The TranslationAbs flag (1 << 0, 0x001) is never set in sequence events/in anim.zbd that we have.

The active state (u32) is always zero (0, false) or one (1, true). The point source field (u32) indicates whether the light is directed (0, never occurs) or a point source (1, always). The directional field (u32) is always zero (0, false) or one (1, true). If the directional flag (1 << 7, 0x080) is unset, this is always false. The saturated field (u32) is always zero (0, false) or one (1, true). If the saturated flag (1 << 8, 0x100) is unset, this is always false. The subdivide field (u32) is always zero (0, false) or one (1, true). If the subdivide flag (1 << 9, 0x200) is unset, this is always false. The static field (u32) is always zero (0, false) or one (1, true). If the static flag (1 << 10, 0x400) is unset, this is always false.

The node index (i32) is used to look up the parent/at node in the animation definition. This is sometimes set to the special input node value. The translation (Vec3) is presumably the light's translation from the node. If the translation flag (1 << 1, 0x002) is unset, then both the node index and translation will be zero (0/0.0).

The rotation or direction (Vec3) is always zero (0.0), because the rotation flag (1 << 2, 0x004) is never set in sequence events/in anim.zbd that we have.

The near range (f32) and far range (f32) likely indicate the light's range. The near range is greater than or equal to zero (0.0), and the far range is greater than or equal to the near range. If the range flag (1 << 3, 0x008) is unset, then both are zero (0.0).

The colour (Color) is the RGB value of the light, and all values are between zero (0.0) and one (1.0), inclusive of both. If the colour flag (1 << 4, 0x010) is unset, all values are zero (0.0). Finally, the ambient (f32) and diffuse (f32) control two aspects of lighting used in computer graphics. Both values are between zero (0.0, inclusive) and one (1.0, inclusive). If the ambient flag (1 << 5, 0x020) or diffuse flag (1 << 6, 0x040) is unset, the respective value will be zero (0.0).
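The flag-dependent invariants described for the light state lend themselves to strict validation while parsing. The following is a sketch covering only a subset of the fields; `LightFields`, `check_bool`, `check_unit`, and `check_light_state` are hypothetical names, the colour is a plain `[f32; 3]`, and the bit values are copied from the LightFlags definition.

```rust
const RANGE: u32 = 1 << 3;
const COLOR: u32 = 1 << 4;
const AMBIENT: u32 = 1 << 5;
const DIFFUSE: u32 = 1 << 6;
const DIRECTIONAL: u32 = 1 << 7;

/// The subset of LightState fields checked in this sketch.
struct LightFields {
    flags: u32,
    directional: u32,
    range_near: f32,
    range_far: f32,
    color: [f32; 3],
    ambient: f32,
    diffuse: f32,
}

/// A u32 boolean must be 0 or 1, and 0 if its flag is unset.
fn check_bool(name: &str, value: u32, flag_set: bool) -> Result<(), String> {
    if value > 1 || (!flag_set && value != 0) {
        return Err(format!("{}: unexpected value {}", name, value));
    }
    Ok(())
}

/// A float must be within 0.0..=1.0 if its flag is set, and 0.0 otherwise.
fn check_unit(name: &str, value: f32, flag_set: bool) -> Result<(), String> {
    let ok = if flag_set { (0.0..=1.0).contains(&value) } else { value == 0.0 };
    if ok { Ok(()) } else { Err(format!("{}: unexpected value {}", name, value)) }
}

fn check_light_state(light: &LightFields) -> Result<(), String> {
    let has = |flag: u32| light.flags & flag != 0;
    check_bool("directional", light.directional, has(DIRECTIONAL))?;
    let range_ok = if has(RANGE) {
        0.0 <= light.range_near && light.range_near <= light.range_far
    } else {
        light.range_near == 0.0 && light.range_far == 0.0
    };
    if !range_ok {
        return Err(format!("range: unexpected {}..{}", light.range_near, light.range_far));
    }
    for &channel in &light.color {
        check_unit("color", channel, has(COLOR))?;
    }
    check_unit("ambient", light.ambient, has(AMBIENT))?;
    check_unit("diffuse", light.diffuse, has(DIFFUSE))
}
```

The saturated, subdivide, and static fields would follow the same pattern as `directional`.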

Light animation

Reader name: LIGHT_ANIMATION, Type: 5, Size: 100

Object active state

Reader name: OBJECT_ACTIVE_STATE, Type: 6, Size: 8

Object translate state

Reader name: OBJECT_TRANSLATE_STATE, Type: 7, Size: 20

Object scale state

Reader name: OBJECT_SCALE_STATE, Type: 8, Size: 16

Object rotate state

Reader name: OBJECT_ROTATE_STATE, Type: 9, Size: 20

Object motion

Reader name: OBJECT_MOTION, Type: 10, Size: 320

Object motion from to

Reader name: OBJECT_MOTION_FROM_TO, Type: 11, Size: 132

Object motion SI script

Reader name: OBJECT_MOTION_SI_SCRIPT, Type: 12, Size: Variable, at least 24

Object opacity state

Reader name: OBJECT_OPACITY_STATE, Type: 13, Size: 12

Object opacity from to

Reader name: OBJECT_OPACITY_FROM_TO, Type: 14, Size: 24

Object add child

Reader name: OBJECT_ADD_CHILD, Type: 15, Size: 4

Object cycle texture

Reader name: OBJECT_CYCLE_TEXTURE, Type: 17, Size: 8

Object connector

Reader name: OBJECT_CONNECTOR, Type: 18, Size: 76

Call object connector

Reader name: CALL_OBJECT_CONNECTOR, Type: 19, Size: 68

Call sequence

Reader name: CALL_SEQUENCE, Type: 22, Size: 36

Stop sequence

Reader name: STOP_SEQUENCE, Type: 23, Size: 36

Call animation

Reader name: CALL_ANIMATION, Type: 24, Size: 68

Stop animation

Reader name: STOP_ANIMATION, Type: 25, Size: 36

Reset animation

Reader name: RESET_ANIMATION, Type: 26, Size: 36

Invalidate animation

Reader name: INVALIDATE_ANIMATION, Type: 27, Size: 36

Fog state

Reader name: FOG_STATE, Type: 28, Size: 68

Loop

Reader name: LOOP, Type: 30, Size: 8

If

Reader name: IF, Type: 31, Size: 12

Else

Reader name: ELSE, Type: 32, Size: 0

Elseif

Reader name: ELSEIF, Type: 33, Size: 12

Endif

Reader name: ENDIF, Type: 34, Size: 0

Callback

Reader name: CALLBACK, Type: 35, Size: 4

FBFX color from to

Reader name: FBFX_COLOR_FROM_TO, Type: 36, Size: 52

Presumably, FBFX stands for "frame buffer effect".

Detonate weapon

Reader name: DETONATE_WEAPON, Type: 41, Size: 24

Puffer state

Reader name: PUFFER_STATE, Type: 42, Size: 580

struct PufferState {
    name: [u8; 32],
    puffer_index: i16,
    pad34: u16, // always 0
    flags: PufferStateFlags,
    active_state: i32,
    node_index: u32,
    translation: Vec3,
    local_velocity: Vec3,
    world_velocity: Vec3,
    min_random_velocity: Vec3,
    max_random_velocity: Vec3,
    world_acceleration: Vec3,
    interval_type: u32,
    interval_value: f32,
    size_range: Vec2,
    lifetime_range: Vec2,
    start_age_range: Vec2,
    deviation_distance: f32,
    unk156: f32, // always 0.0
    unk160: f32, // always 0.0
    fade_range: Vec2,
    friction: f32,
    unk176: u32, // always 0
    unk180: u32, // always 0
    unk184: u32, // always 0
    unk188: u32, // always 0
    tex192: [u8; 36],
    tex228: [u8; 36],
    tex264: [u8; 36],
    tex300: [u8; 36],
    tex336: [u8; 36],
    tex372: [u8; 36],
    unk408: [u8; 120], // always 0
    unk528: u32,
    unk532: u32, // always 0
    unk536: f32,
    unk540: f32,
    growth_factor: f32,
    unk548: [u8; 32], // always 0
}

bitflags PufferStateFlags: u32 {
    // this might not be right?
    Translate = 1 << 0;          // 0x00001
    GrowthFactor = 1 << 1;       // 0x00002
    // this might not be right?
    State = 1 << 2;              // 0x00004
    LocalVelocity = 1 << 3;      // 0x00008
    WorldVelocity = 1 << 4;      // 0x00010
    MinRandomVelocity = 1 << 5;  // 0x00020
    MaxRandomVelocity = 1 << 6;  // 0x00040
    IntervalType = 1 << 7;       // 0x00080
    // this might not be right?
    IntervalValue = 1 << 8;      // 0x00100
    SizeRange = 1 << 9;          // 0x00200
    LifetimeRange = 1 << 10;     // 0x00400
    DeviationDistance = 1 << 11; // 0x00800
    FadeRange = 1 << 12;         // 0x01000
    Active = 1 << 13;            // 0x02000
    CycleTexture = 1 << 14;      // 0x04000
    StartAgeRange = 1 << 15;     // 0x08000
    WorldAcceleration = 1 << 16; // 0x10000
    Friction = 1 << 17;          // 0x20000

    Inactive = 0;
}

The puffer's name (32 bytes) is zero-terminated and zero padded. The puffer's index (i16) is used to look up the puffer in the animation definition. It's unclear why the puffer state contains both the puffer's name and index. When looked up by index, that name matches the name in this structure. The next field (u16) is padding and will always be zero (0).

The puffer state's flags (u32) indicate which further fields/states are valid and should be set. If the state flag (1 << 2, 0x00004) is unset, then no other flags are set in the sequence events/in anim.zbd that we have. This seems to indicate whether the puffer is disabled/inactive. At least, that's the best guess. However, there's also an active flag and an active state, which seem to be slightly different.

The active or lifetime state (i32) seems to allow for a range of values. If the active flag is set, then the active state will be greater than or equal to one (1), and less than or equal to five (5). If the active flag (1 << 13, 0x02000) is unset, then the active state is always negative one (-1).
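These observations can be turned into a strict check. This is a sketch only: `check_puffer_state` is a hypothetical name, and the bit values are taken from the PufferStateFlags definition, where the state flag is itself marked as uncertain.

```rust
const STATE: u32 = 1 << 2; // marked "this might not be right?" in the definition
const ACTIVE: u32 = 1 << 13;

/// Checks the observed relationship between the flags and the active state.
fn check_puffer_state(flags: u32, active_state: i32) -> Result<(), String> {
    // If the state flag is unset, no other flag was observed to be set.
    if flags & STATE == 0 && flags != 0 {
        return Err(format!("flags {:#07x} set without the state flag", flags));
    }
    let ok = if flags & ACTIVE != 0 {
        // Active flag set: the active state is between 1 and 5, inclusive.
        (1..=5).contains(&active_state)
    } else {
        // Active flag unset: the active state is always -1.
        active_state == -1
    };
    if ok {
        Ok(())
    } else {
        Err(format!("unexpected active state {} for flags {:#07x}", active_state, flags))
    }
}
```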

TODO

Text reader files

Text reader files have the file extension .zrd, which could stand for Zipper Reader. Until 2022, I only knew of binary reader files. However, there exist text reader files, for example DefaultCtlConfig.zrd.

Investigation (MW3)

Although it was assumed the reader files were Lisp-like from the binary reader files, the text reader files confirm this:

(
  ⇥ KEYS (
  ⇥   ⇥ (CMD_ALPHASTRIKE  ⇥ keya(0x9c)  ⇥ joybtn(0x6))
  ⇥   ⇥ (CMD_AMS_TOGGLE  ⇥ keya(0x1e))
...
  ⇥ )
  ⇥ AXES (
  ⇥   ⇥ (Throttle  ⇥ joystick(Z)  ⇥ slope(-0.500000)  ⇥ intercept(0.500000)  ⇥ deadzone(0.050000))
  ⇥   ⇥ (Twist  ⇥ joystick(Rz)  ⇥ slope(-1.000000)  ⇥ intercept(0.000000)  ⇥ deadzone(0.000000))
  ⇥   ⇥ (Pitch  ⇥ joystick(Y)  ⇥ slope(1.000000)  ⇥ intercept(0.000000)  ⇥ deadzone(0.000000))
  ⇥   ⇥ (LR  ⇥ joystick(X)  ⇥ slope(-1.000000)  ⇥ intercept(0.000000)  ⇥ deadzone(0.000000))
  ⇥ )
)

Note that the whitespace delimiter used is a tab (indicated as ⇥ above).

There are a lot of interesting quirks with this Lisp dialect. First, the whitespace delimiters are definitely tab, carriage return (CR), and line feed (LF), i.e. CR+LF don't seem to have a syntactic value. This is not unusual, but it isn't clear if a space is a valid delimiter. This also ties into the fact that strings don't seem to be quoted.

From the binary reader files, we know there are only four data types:

  • Integers (i32)
  • Floating-point numbers (f32, "floats")
  • Strings
  • Lists

Interestingly, the text reader files hint that at least mentally, there were more. For example, it seems like strings are always upper-case, and lower-case strings are symbols. This also leads to a concept of a "function" data type in the text reader, for example joybtn(0x6). In other Lisps, this would've been written as (joybtn 0x6). Also, maps/dictionaries are simply lists with implicit key-value pairs.

We don't know how the text reader files are precisely lexed. If I had to guess from binary reader files, the example above would be expressed in pseudo-JSON as follows:

[
  "KEYS",
  [
    ["CMD_ALPHASTRIKE", "keya", [0x9c], "joybtn", [0x6]],
    ["CMD_AMS_TOGGLE", "keya", [0x1e]],
...
  ],
  "AXES",
  [
    ["Throttle", "joystick", ["Z"], "slope", [-0.5], "intercept", [0.5], "deadzone", [0.05]],
    ["Twist", "joystick", ["Rz"], "slope", [-1.0], "intercept", [0.0], "deadzone", [0.0]],
    ["Pitch", "joystick", ["Y"], "slope", [1.0], "intercept", [0.0], "deadzone", [0.0]],
    ["LR", "joystick", ["X"], "slope", [-1.0], "intercept", [0.0], "deadzone", [0.0]]
  ]
]

I believe the engine has an implicit schema, in that it tries to find string values by index, and then any information/arguments it needs are retrieved from index + 1.

There are still questions. For example, what happens if we mess with the order of "AXES"? Presumably when parsing, it looks at list index 0 to figure out what to put where in already existing data structures in the engine.
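To make the guess above concrete, here is a speculative tokenizer that produces the same shape as the pseudo-JSON. It is not based on the engine's lexer, which is unknown: `Token` and `tokenize` are hypothetical names, atoms are left as unparsed text, and treating a space as a delimiter is an assumption.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Open,
    Close,
    /// Any run of characters between delimiters; not yet classified as an
    /// integer, float, or string.
    Atom(String),
}

/// Splits text reader input into parentheses and atoms.
fn tokenize(text: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut atom = String::new();
    for c in text.chars() {
        match c {
            // Tab, CR, and LF are known delimiters; space is an assumption.
            '(' | ')' | '\t' | '\r' | '\n' | ' ' => {
                if !atom.is_empty() {
                    tokens.push(Token::Atom(std::mem::take(&mut atom)));
                }
                match c {
                    '(' => tokens.push(Token::Open),
                    ')' => tokens.push(Token::Close),
                    _ => {}
                }
            }
            _ => atom.push(c),
        }
    }
    if !atom.is_empty() {
        tokens.push(Token::Atom(atom));
    }
    tokens
}
```

With this approach, `keya(0x1e)` becomes the atom `keya` followed by a one-element list, which matches the `"keya", [0x1e]` pairing guessed above.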

Control configuration

The MechWarrior 3 engine uses DirectInput for controls. This matches the key codes (keya) in DefaultCtlConfig.zrd, which are DirectInput key codes.
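As an illustration, a few DirectInput key codes (the `DIK_*` constants from DirectInput's `dinput.h`) can be mapped back to names with a small lookup. The table below is deliberately partial, and `parse_key_code` and `dik_name` are hypothetical helper names.

```rust
/// Parses the hexadecimal argument of a keya(...) entry, e.g. "0x9c".
fn parse_key_code(text: &str) -> Option<u8> {
    let digits = text.strip_prefix("0x")?;
    u8::from_str_radix(digits, 16).ok()
}

/// Looks up the DirectInput constant name for a key code.
/// Only a handful of codes are listed here.
fn dik_name(code: u8) -> Option<&'static str> {
    match code {
        0x01 => Some("DIK_ESCAPE"),
        0x1C => Some("DIK_RETURN"),
        0x1E => Some("DIK_A"),
        0x39 => Some("DIK_SPACE"),
        0x9C => Some("DIK_NUMPADENTER"),
        _ => None,
    }
}
```

Under this mapping, the `keya(0x9c)` binding for CMD_ALPHASTRIKE in the example file is the Enter key on the numeric keypad, and `keya(0x1e)` for CMD_AMS_TOGGLE is the A key.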

Beginner's guide to hex viewing

You'll need a hex viewer or editor. On Windows, I strongly recommend HxD. This guide is specifically for 32-bit inspection, so 64-bit values are unlikely.

Endianness

Endianness is an important concept, but complicated. I'll cover the minimum necessary. x86 CPUs all use little endian. This means if you have a 32-bit value, for example 0xDEADBEEF, it is stored in memory as [0xEF, 0xBE, 0xAD, 0xDE].

0xDEADBEEF
  | | | |  (mem)
  | | | +- 0xEF
  | | +--- 0xBE
  | +----- 0xAD
  +------- 0xDE

This is slightly unintuitive, but luckily, most hex viewers will be able to display the decoded values.
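If you prefer to check this in code, Rust's standard library has conversions for both byte orders. The two wrapper functions below are hypothetical names added only for illustration.

```rust
/// Returns the bytes of `value` in the order they appear in a little-endian file.
fn to_file_order(value: u32) -> [u8; 4] {
    value.to_le_bytes()
}

/// Decodes four bytes read from a little-endian file.
fn from_file_order(bytes: [u8; 4]) -> u32 {
    u32::from_le_bytes(bytes)
}
```

`to_file_order(0xDEADBEEF)` gives `[0xEF, 0xBE, 0xAD, 0xDE]`. Decoding those same bytes with the wrong byte order (`u32::from_be_bytes`) gives 0xEFBEADDE instead.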

Integers

Integers can be either signed or unsigned. Unsigned integers can be zero, or positive. Signed integers can be negative, zero, or positive. Zero has no sign (there is only 0, not +0 and -0). Both signed and unsigned integers have a size; generally 8, 16, 32, or 64 bits (1, 2, 4, or 8 bytes).

Unsigned

Size     Min (dec)  Min (hex)   Max (dec)   Max (hex)
8 bits   0          0x00        255         0xFF
16 bits  0          0x0000      65535       0xFFFF
32 bits  0          0x00000000  4294967295  0xFFFFFFFF

Signed

Size     Min (dec)    Min (hex)   -1 (hex)    0 (hex)     Max (dec)   Max (hex)
8 bits   -128         0x80        0xFF        0x00        127         0x7F
16 bits  -32768       0x8000      0xFFFF      0x0000      32767       0x7FFF
32 bits  -2147483648  0x80000000  0xFFFFFFFF  0x00000000  2147483647  0x7FFFFFFF

As you can see, for signed integers, the sign is encoded in the top-most bit (most significant bit or MSB). The negative values are also not intuitive, since they are encoded in two's complement. It's helpful to know this; but again most hex viewers can decode signed integers.

In general, unless you see an obviously signed value (for example, anything at or above 0x80000000, which as an unsigned value would be greater than 2147483647 and so implausibly large), it's impossible to tell if the type is signed or unsigned from the reverse engineering. Also, due to little endian storage, if you see the bytes [0x7F, 0x00, 0x00, 0x00] (7F000000), you cannot tell if this is a) a 32-bit integer with the value 127, b) two 16-bit integers with the values 127 and 0, or even c) four 8-bit integers with the values 127, 0, 0, 0.

For this reason, if you write any parsing code, you may want to strictly check the bounds of values. This then makes it easier to catch unexpected values earlier.
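The ambiguity, and one form of strict checking, can be shown with a short sketch. The function names are hypothetical; `read_bool32` stands in for any field that has only ever been observed as 0 or 1.

```rust
/// Decodes the same four bytes as one i32, one u32, two i16s, and four i8s.
fn interpretations(bytes: [u8; 4]) -> (i32, u32, [i16; 2], [i8; 4]) {
    (
        i32::from_le_bytes(bytes),
        u32::from_le_bytes(bytes),
        [
            i16::from_le_bytes([bytes[0], bytes[1]]),
            i16::from_le_bytes([bytes[2], bytes[3]]),
        ],
        [bytes[0] as i8, bytes[1] as i8, bytes[2] as i8, bytes[3] as i8],
    )
}

/// Strictly reads a 32-bit boolean, rejecting anything other than 0 or 1.
fn read_bool32(bytes: [u8; 4]) -> Result<bool, String> {
    match u32::from_le_bytes(bytes) {
        0 => Ok(false),
        1 => Ok(true),
        other => Err(format!("expected 0 or 1, got {}", other)),
    }
}
```

For `[0x7F, 0x00, 0x00, 0x00]`, every interpretation is plausible: 127, 127, [127, 0], and [127, 0, 0, 0]. For `[0xFF, 0xFF, 0xFF, 0xFF]`, the signed reading (-1) is far more plausible than the unsigned one (4294967295).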

Quiz

The quiz001 file contains some values, all of the same type. Can you tell what type of integer (and therefore how many they are), and what the values are?

Reveal answer

There were ten 32-bit signed integers: 111, 9999, 10, 2000, 10, -200, 10, 0, 1, 100000

Floating point values

Floating point values basically encode a number in scientific notation, see IEEE 754. The information of a sign bit, the exponent, and the fraction is encoded into either 32 or 64 bits (called single or double precision, respectively). For example, a 32-bit floating point value is packed as follows:

0 01111100 01000000000000000000000 = 0.15625
| |      | |                     |
| exponent      fraction
sign

A normal human can't be expected to decode this; the hex viewer will help here also. However, with a small amount of practice, you can recognise some values.

  • 1.0 is 0x3F800000
  • -1.0 is 0xBF800000
  • 10.0 is 0x41200000
  • -10.0 is 0xC1200000
  • 100.0 is 0x42C80000
  • -100.0 is 0xC2C80000

However, 0.0 is 0x00000000, and so indistinguishable from an integer! In this documentation, I try to denote floating point values with a decimal point and at least one place to distinguish them from integers, e.g. 10 is an integer, 10.0 is a float.
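For the curious, the packing can be taken apart by hand. This sketch only handles normal numbers (not zero, subnormals, infinities, or NaN), and `split_f32` and `value_of_normal` are hypothetical names; in practice `f32::from_bits` or `f32::from_le_bytes` does all of this for you.

```rust
/// Splits the raw bits into (sign, exponent, fraction).
fn split_f32(bits: u32) -> (u32, u32, u32) {
    (bits >> 31, (bits >> 23) & 0xFF, bits & 0x007F_FFFF)
}

/// Rebuilds the value of a normal number from its three parts:
/// (-1)^sign * (1 + fraction / 2^23) * 2^(exponent - 127)
fn value_of_normal(bits: u32) -> f64 {
    let (sign, exponent, fraction) = split_f32(bits);
    let mantissa = 1.0 + fraction as f64 / (1u32 << 23) as f64;
    let magnitude = mantissa * 2f64.powi(exponent as i32 - 127);
    if sign == 1 { -magnitude } else { magnitude }
}
```

The bit pattern from the diagram above is 0x3E200000: the sign is 0, the exponent is 124, and the fraction is 0x200000, so the value is 1.25 × 2^-3 = 0.15625.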

Quiz

The quiz002 file contains a mix of 32-bit integers and 32-bit floats. Can you tell what the values are?

Reveal answer

The values were 9999, 0.5, -0.5, 1, -1, 200, 200.0, and indeterminate (could have been 0/integer or 0.0/float).

Strings

Strings in C are basically arrays of ASCII/ANSI characters. Each character has a numeric value (see Wikipedia for an ASCII table). This is why a lot of hex viewers also show an ASCII view next to the hex view, and simply skip non-printable characters. Because each character is a byte, you do not need to worry about endianness for ASCII strings:

b"Hello world" = 48 65 6c 6c 6f 20 77 6f 72 6c 64

Strings in C are usually terminated with a null or zero character (\0, 0x00); this is called zero-terminated. So "Hello world" would actually be 48656c6c6f20776f726c6400.

Strings are either stored as fixed length or with a known length encoded before the string. Fixed length strings are usually padded with zeros, so encoding "Hello world" as a 16-byte string is:

b"Hello world\0\0\0\0\0" = 48656c6c6f20776f726c640000000000

However, the padding can also be garbage if the programmer forgets to zero the memory, so this is also "Hello world": 48656c6c6f20776f726c6400DEADBEEF (note this is still zero-terminated). For a known-length string, "Hello world" could be either:

# Zero terminated
b"Hello world\0" = length 12 = 0c000000 48656c6c6f20776f726c6400
# (b'\x0c\x00\x00\x00Hello world\x00' in Python)
# Not terminated
b"Hello world" = length 11 = 0b000000 48656c6c6f20776f726c64
# (b'\x0b\x00\x00\x00Hello world' in Python)

Assuming the length is encoded as a 32-bit integer.
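Both layouts can be read with a few lines of code. This is a sketch with hypothetical function names; each function returns the string bytes and the remaining data, and the length prefix is assumed to be a 32-bit little endian integer as above.

```rust
/// Reads a fixed-length string of `len` bytes, stopping at the first zero.
/// Anything after the terminator (zeros or garbage) is ignored.
fn read_fixed(data: &[u8], len: usize) -> Option<(&[u8], &[u8])> {
    if data.len() < len {
        return None;
    }
    let (field, rest) = data.split_at(len);
    let end = field.iter().position(|&b| b == 0).unwrap_or(len);
    Some((&field[..end], rest))
}

/// Reads a string prefixed by a 32-bit length. If the counted bytes end in a
/// zero terminator, it is dropped from the result.
fn read_prefixed(data: &[u8]) -> Option<(&[u8], &[u8])> {
    if data.len() < 4 {
        return None;
    }
    let len = u32::from_le_bytes([data[0], data[1], data[2], data[3]]) as usize;
    let body = &data[4..];
    if body.len() < len {
        return None;
    }
    let (text, rest) = body.split_at(len);
    let text = match text.split_last() {
        Some((&0, head)) => head,
        _ => text,
    };
    Some((text, rest))
}
```

Both `b"\x0c\x00\x00\x00Hello world\x00"` and `b"\x0b\x00\x00\x00Hello world"` decode to the same 11 bytes with `read_prefixed`, and the garbage-padded example decodes to "Hello world" with `read_fixed(data, 16)`.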

Quiz

The quiz003 file contains several strings. They are separated by DEADBEEFDEADBEEFDEADBEEF. Can you tell what type of strings they are, and what the values were?

Reveal answer

  1. Fixed length of 32: "Lorem Ipsum"
  2. Variable length, not terminated: "The quick brown fox"
  3. Fixed length of 16: "DEADBEEF" (note that the string here is encoded in ASCII, which is not the same as 0xDE, 0xAD, 0xBE, 0xEF)
  4. Fixed length of 16: "Hello world", with padding of "Padx"
  5. Variable length, zero terminated: "The quick brown fox\0"

Structures

This is a vast oversimplification, but structures basically describe a view/block of memory that makes it easier to work with in code. They are usually a collection of fields, although the field names are identifiers in the source code only, and not present in the actual memory. For example, given this C structure:

struct Foo {
    uint32_t a;
    float b;
};

Then Foo { a = 100, b = 100.0 } would be encoded as:

64000000 0000c842

When reverse engineering, the structure definitions are what we're usually trying to recover. I'll be using pseudo-Rust code to describe structures, as in the rest of the documentation (as opposed to C code).
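As an example of going the other way, the eight bytes shown above can be decoded back into the structure. This is real Rust, not the pseudo-Rust used for the format descriptions, and `parse_foo` is a hypothetical name.

```rust
#[derive(Debug, PartialEq)]
struct Foo {
    a: u32,
    b: f32,
}

/// Decodes a Foo from eight little endian bytes: a u32 followed by an f32.
fn parse_foo(data: &[u8; 8]) -> Foo {
    Foo {
        a: u32::from_le_bytes([data[0], data[1], data[2], data[3]]),
        b: f32::from_le_bytes([data[4], data[5], data[6], data[7]]),
    }
}
```

Calling `parse_foo` on the bytes 64000000 0000c842 gives `Foo { a: 100, b: 100.0 }`.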

Quiz

Reverse engineer the structure of quiz004. All data types are 32-bit/32-bit aligned.

Reveal answer

The structure was:

struct Quiz004 {
    a: f32,
    b: [u8; 16],
    c: i32, // or u32
}

And the value was Quiz004 { a: 1.5, b: "You can do it", c: 8888 }.