Thursday, August 31, 2006

Mac Mini and WP Mac file-formats

Mac Mini donated by unnamed on behalf of unnamed :-)

Thanks to a Mac Mini given to me for the purposes of libwpd development by a person who does not wish to be mentioned, as part of a bigger donation by a company that does not wish to be mentioned either (I promised to be discreet; am I?), I was able to do some more work on the support of WordPerfect™ for Mac file-formats inside libwpd. As of version 0.8.6, libwpd supports the WP Mac 2.x file-format, and CVS HEAD has initial support for the oldest WP Mac file-format, 1.x.

Since I searched for the specifications of these two file-formats for some time without success, I would like to share the information I have:

  • WP Mac 2.x is basically the WP Mac 3.x file-format, except that it does not support tables. Instead of tables, it uses parallel text columns (a WP speciality?). From the point of view of an import filter, this means that the code reading WP Mac 3.x files also works with WP Mac 2.x.

  • The WP Mac 1.x file-format is a special one. It is manifestly based on WP Dos 4.2, with some elements that one finds later in WP Mac 2.x and 3.x. Since, once again, we have no documentation besides a working WordPerfect 1.0.5 (could a company whose name starts with "M" and ends with "icrosoft" show us how functional their word processor from 1988 is?), I will describe what I managed to understand of this format by reverse-engineering. That way, people who are looking for the specifications without success may find this blog and earn me some money by clicking on the ads :-)

WordPerfect for Macintosh 1.0 file-format

WP Mac 1.x is similar to WP Dos 4.2 in that it has no header whatsoever, just plain text with WP functions inserted. In my observation, the two are similar to the extent that a function with a certain purpose in WP 4.2 has the same, or at least a very very very similar, purpose in WP Mac 1.x, although the function lengths may differ between the two formats.

The codes from 0x80 to 0xBF are Single-byte functions, and they are identical in the two file-formats (at least I have not found any difference so far).
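A minimal sketch of how an importer might sort the bytes of a WP Mac 1.x stream into the ranges described here. The names `ByteKind` and `classify` are hypothetical, and lumping everything below 0x80 into "other" (text or control codes) is a simplification of this sketch, not something established about the format:

```cpp
#include <cassert>
#include <cstdint>

// Rough byte categories of a WP Mac 1.x stream (hypothetical names).
enum class ByteKind { Other, SingleByteFunction, MultiByteFunction };

ByteKind classify(std::uint8_t b) {
    if (b >= 0x80 && b <= 0xBF) return ByteKind::SingleByteFunction;  // same as WP Dos 4.2
    if (b >= 0xC0) return ByteKind::MultiByteFunction;  // 0xC0..0xFF, upper bound still uncertain
    return ByteKind::Other;  // assumption: plain text or control codes
}
```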

The codes from 0xC0 to 0xFF (?) are Multi-byte functions, which can have different sizes. A more or less accurate account of the sizes (including the opening and closing gates) can be found here, and it will be updated as I advance in my study of the file-format. Functions whose length is given as a positive number are Fixed-length functions; those with "-1" instead of a length are Variable-length functions. The differences between the WP Dos 4.2 and WP Mac 1.x file-formats lie in both of these subgroups of Multi-byte functions.
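A size table following the convention above (positive = total length of a Fixed-length function, "-1" = Variable-length function) could drive decoding roughly as sketched below. The real values live in the table mentioned above and are not reproduced here; `SizeTable`, `kindOf`, and the use of 0 for "not yet identified" are assumptions of this sketch:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// One entry per multi-byte code 0xC0..0xFF (hypothetical layout).
// > 0  : total length of a fixed-length function, gates included
// -1   : variable-length function
//  0   : code not yet identified (assumption of this sketch)
using SizeTable = std::array<int, 0x40>;

enum class FunctionKind { Fixed, Variable, Unknown };

FunctionKind kindOf(const SizeTable& sizes, std::uint8_t code) {
    assert(code >= 0xC0);  // only multi-byte codes are in the table
    const int size = sizes[code - 0xC0];
    if (size > 0) return FunctionKind::Fixed;
    if (size == -1) return FunctionKind::Variable;
    return FunctionKind::Unknown;
}
```

For a Fixed-length function the importer can then simply skip `sizes[code - 0xC0]` bytes; a Variable-length one needs the size fields described further down.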

One of the differences is that where WP Dos 4.2 measures margins, positions and other lengths in “text columns” (an 8-bit value), WP Mac 1.x uses points, stored as a 16-bit (unsigned?) big-endian number (72 points = 1 inch). This contributes to the difference in the length of Fixed-length functions between the two formats.
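Reading such a value is a matter of assembling the two bytes most significant first and dividing by 72. A short sketch, treating the value as unsigned (the open question above); `readU16BE` and `pointsToInches` are hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// Reads a 16-bit big-endian value (most significant byte first),
// interpreted as unsigned.
std::uint16_t readU16BE(const std::uint8_t* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// WP Mac 1.x lengths are in points; 72 points = 1 inch.
double pointsToInches(std::uint16_t points) {
    return points / 72.0;
}
```

For example, the bytes `0x01 0x20` give 288 points, i.e. a 4-inch position.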

As for the Variable-length functions, the difference between WP Dos 4.2 and WP Mac 1.x is big. Whereas in the WP Dos 4.2 file-format one has to scan for the closing gate to know the length of a function, WP Mac 1.x gives the size of the data as a 32-bit big-endian unsigned number just after the opening gate and again just before the closing gate. The function length is therefore this number plus 10 bytes: 4 bytes for each of the two copies of the size, plus 1 byte each for the opening and the closing gate. Those who are used to working with Corel documentation may find this clearer:

  <function code>    ...    1 byte
  {size of data}     ...    32-bit big-endian unsigned integer
  <data>             ...    size of data bytes
  {size of data}     ...    32-bit big-endian unsigned integer
  <function code>    ...    1 byte
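Given this layout, an importer can compute the length of a Variable-length function directly instead of scanning, and use the duplicated size and closing gate as a sanity check. A minimal sketch, assuming the pointer sits on the opening gate; the function names and the choice to reject mismatches (rather than, say, fall back to scanning) are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Reads a 32-bit big-endian (most significant byte first) unsigned integer.
std::uint32_t readU32BE(const std::uint8_t* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// buf points at the opening gate, avail is the number of bytes left.
// Returns size of data + 10 when the bytes match
//   <code> {size} <data...> {size} <code>
// and std::nullopt if the function is truncated or the copies disagree.
std::optional<std::size_t> variableFunctionLength(const std::uint8_t* buf,
                                                  std::size_t avail) {
    assert(buf != nullptr);
    if (avail < 10) return std::nullopt;            // too short for gates + sizes
    const std::uint8_t code = buf[0];
    const std::uint32_t size = readU32BE(buf + 1);  // leading copy of the size
    if (size > avail - 10) return std::nullopt;     // data would run past the buffer
    const std::size_t total = std::size_t(size) + 10;
    if (readU32BE(buf + 5 + size) != size) return std::nullopt;  // trailing copy
    if (buf[total - 1] != code) return std::nullopt;             // closing gate
    return total;
}
```

With 2 bytes of data, for instance, the whole function occupies 2 + 10 = 12 bytes.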

When reverse-engineering the function codes, it is wise to take inspiration from both the WP Dos 4.2 and the WP Mac 3.x file-formats. Certain functions follow the WP Dos 4.2 specification very closely (with the above-mentioned differences in length and position information), while others follow the WP Mac 3.x logic (for instance, the Tab Set function contains condensed tab tables that follow the logic of the WP3 Set Tabs function).

Help me!

I also ran into quite a big problem with the font information. The file-format stores the font information as a 16-bit number, which hub identified as corresponding to the Macintosh fontID. According to newer Apple documentation, and also according to common sense (no more appropriate link here :-) ), identifying fonts by fontID is broken by design. Nevertheless, on the way towards libwpd's world domination, it would be a mistake not to at least try to convert the font information. If any of you knows a way to get the font name unambiguously from a fontID (libwpd is used on many platforms, so the Apple API cannot be assumed to be present), I will not refuse such information. OTOH, a table mapping 0xFFFF+1 numbers to strings would bloat libwpd a bit. But if there is a way to place the fontIDs into groups and map each group to one representative font, it shouldn't be too difficult to convince me to try to implement it :-)
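Short of a real solution, a stopgap could resolve only the handful of fixed, well-known family IDs from Apple's classic Font Manager constants (for example 20 = Times, 22 = Courier) and hand everything else to a caller-chosen fallback. This is only a sketch: it is not the grouping asked for above, it assumes the fontIDs in WP Mac 1.x files match these constants, and `fontNameFromMacID` and the fallback parameter are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Maps a few well-known classic Mac OS font family IDs to names; any other
// ID (which can differ from one Mac to another) gets the caller's fallback.
std::string fontNameFromMacID(std::uint16_t id, const std::string& fallback) {
    switch (id) {
        case 2:  return "New York";
        case 3:  return "Geneva";
        case 4:  return "Monaco";
        case 20: return "Times";
        case 21: return "Helvetica";
        case 22: return "Courier";
        case 23: return "Symbol";
        default: return fallback;  // unknown or machine-specific ID
    }
}
```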

World domination under way

Having said this, it looks like libwpd 0.8.7 (whenever we release it) will probably support, to a certain extent, ALL WordPerfect™ file-formats on this planet. Did WP Dos 1.x ever exist?