Friday, June 29, 2007

Hackweek: The devil is in the details (or ODF generation woes)

Yesterday, I was polluting planets with my story of converting images included in WordPerfect files. There were still some missing pieces, so it is nice to give you another screenshot. Christian Lippka, a supreme intelligence of Draw and Impress fame, helped me examine the OpenDocument Drawing objects that I was creating. The problem was that they were not scaling to the frame they were supposed to sit in. And we found the cause. (OK, he found it!) The solution is to include this XML snippet in the drawing document:

  <config:config-item-set config:name="ooo:view-settings">
    <config:config-item config:name="VisibleAreaTop" config:type="int">0</config:config-item>
    <config:config-item config:name="VisibleAreaLeft" config:type="int">0</config:config-item>
    <config:config-item config:name="VisibleAreaWidth" config:type="int">7941</config:config-item>
    <config:config-item config:name="VisibleAreaHeight" config:type="int">19124</config:config-item>
  </config:config-item-set>

The two magic numbers are the width and height of the image in hundredths of a millimeter. BTW, I am curious to see what happens when KWord also starts using this code for its WordPerfect importer. But here I will leave Ariya the pleasure of handling possible implementation differences.
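To make the unit explicit, here is a minimal Python sketch of how such a view-settings block could be produced from an image size. It is only an illustration, not the code that lives in the converter: the function names `inches_to_hundredths_mm` and `view_settings_xml` are made up for this post, and the assumption is that the image size is known in inches before conversion.

```python
MM_PER_INCH = 25.4


def inches_to_hundredths_mm(inches: float) -> int:
    """Convert a length in inches to 1/100 mm, rounded to the nearest integer."""
    return round(inches * MM_PER_INCH * 100)


def view_settings_xml(width: int, height: int) -> str:
    """Build the ooo:view-settings item set for a drawing.

    width and height are the image dimensions in 1/100 mm.
    (Hypothetical helper, named for illustration only.)
    """
    items = [
        ("VisibleAreaTop", 0),
        ("VisibleAreaLeft", 0),
        ("VisibleAreaWidth", width),
        ("VisibleAreaHeight", height),
    ]
    lines = ['<config:config-item-set config:name="ooo:view-settings">']
    for name, value in items:
        lines.append(
            f'  <config:config-item config:name="{name}" '
            f'config:type="int">{value}</config:config-item>'
        )
    lines.append("</config:config-item-set>")
    return "\n".join(lines)


if __name__ == "__main__":
    # The two magic numbers from the snippet above.
    print(view_settings_xml(7941, 19124))
```

Keeping the conversion in one small function means the rounding happens in exactly one place, which should help if different consumers of the output turn out to be picky about the values.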

And so, thanks to the light of Christian, this is what the document looks like now. OK, I have to confess to a little cheating. The dimensions of the frame are, for the time being, not read from the file by a parser. I was too lazy to start parsing this information, so it is hardcoded for now. Parsing it should not be conceptually difficult, just boring to death.

I would also like to mention another person who gave me a useful tool that I was using this week. Far from being anything close to an XSL(T) fan, I found Svante Schubert's transforms extremely useful: they make it possible to load and export files in the OpenDocument flat XML format. This makes it a lot easier to do little experiments with a document without having to run zip and unzip a zillion times.
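For comparison, this is roughly the zip round-trip that the flat XML format saves you from, sketched in Python with the standard `zipfile` module. The helper names (`read_member`, `replace_member`, `make_demo_package`) are invented for this example, and it assumes that the view settings live in a package member called `settings.xml`. The demo package is a tiny stand-in, not a complete valid drawing document.

```python
import io
import zipfile


def read_member(package: bytes, name: str) -> str:
    """Return the text of one member of a zipped package."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.read(name).decode("utf-8")


def replace_member(package: bytes, name: str, text: str) -> bytes:
    """Return a copy of the package with one member replaced by new text."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            if info.filename == name:
                data = text.encode("utf-8")
            else:
                data = src.read(info.filename)
            # Keep the 'mimetype' member stored uncompressed; deflate the rest.
            if info.filename == "mimetype":
                method = zipfile.ZIP_STORED
            else:
                method = zipfile.ZIP_DEFLATED
            dst.writestr(info.filename, data, compress_type=method)
    return out.getvalue()


def make_demo_package() -> bytes:
    """Build a tiny stand-in package for experiments (not a full document)."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w") as zf:
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.graphics",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("settings.xml", "<old/>",
                    compress_type=zipfile.ZIP_DEFLATED)
    return out.getvalue()


if __name__ == "__main__":
    package = replace_member(make_demo_package(), "settings.xml", "<new/>")
    print(read_member(package, "settings.xml"))
```

With a flat XML file, all of this collapses into opening a single file in a text editor, which is exactly why the transforms are so handy for quick experiments.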

So, what remains to be done? Naturally, writing the parser of the box information for the different WP file formats inside libwpd. This is something I find a really boring and thankless task. So, if you want to be my personal hero, send me a patch.

For those WordPerfect users who have a load of documents with images: the documentation says that one "Graphics Filename" prefix packet can point to several "Graphics Cached File Data" packets that contain the graphic information. Nevertheless, the documentation does not say how the data is split in this case. Is it split into several full-blown WPG streams, or is it one WPG stream whose chunks are stored in different prefix packets? I was unable to create a document with several pointers in the "Graphics Filename" prefix packet, so if you have such documents, I would be glad to receive them. Or even better, send me a patch for their handling in libwpd.