Thursday, June 28, 2007

Hackweek @ Novell

It was said, it was repeated and repeated. This week is hackfest week at Novell. We are hacking (programming, for those who do not like the word) on different projects and ideas that are close to our hearts. Naturally, TrainedMonkey(tm) could not remain behind. And guess what projects he chose to hack on... Naturally, libwpd, libwpg and their OpenOffice.org-related filters. It is really nice to be the upstream maintainer of several things like that: when given a week of free hacking, one is not likely to lack ideas about what to do with the time.

The import of embedded pictures is one of the most overdue features in our WordPerfect(tm) converter, so I jumped on the task when this opportunity came. Fortunately, not everything had to be coded from scratch. Ariya, within a Google Summer of Code project, gave a decisive push to libwpg, and he has kept working on improving it. We have come to the point where standalone WordPerfect Graphics files can be nicely converted to SVG or ODG, including embedded bitmaps. So, what steps had to be taken?

First of all, it was necessary to port libwpd/writerperfect to generate OpenDocument instead of the legacy OpenOffice.org 1.0 file format. This was needed because of the nice way one can incorporate binary objects in OpenDocument represented as flat XML. This was accomplished on Monday, and the conversion of all documents from the libwpd regression suite produces a nice ODF stream that validates against the OpenDocument 1.1 strict schema.
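To illustrate why flat XML is convenient here, below is a minimal Python sketch (not the writerperfect code, which is C++) of an image embedded inline as base64 text in an office:binary-data element inside draw:image. The helper names q and image_frame and the placeholder bytes are assumptions made up for the example; the output is not a complete document and was not run through the schema.

```python
import base64
import xml.etree.ElementTree as ET

# Namespace URIs as defined by OpenDocument.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
    "svg": "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, local):
    """Hypothetical helper: build a namespace-qualified name for ElementTree."""
    return "{%s}%s" % (NS[prefix], local)


def image_frame(data, width, height):
    """Hypothetical helper: wrap raw image bytes in draw:frame/draw:image.

    The bytes travel inside the XML itself as base64 text, so no separate
    file in a zip package is needed.
    """
    frame = ET.Element(
        q("draw", "frame"),
        {q("svg", "width"): width, q("svg", "height"): height},
    )
    image = ET.SubElement(frame, q("draw", "image"))
    binary = ET.SubElement(image, q("office", "binary-data"))
    binary.text = base64.b64encode(data).decode("ascii")
    return frame


# Placeholder bytes stand in for real bitmap data.
frame = image_frame(b"\x89PNG placeholder bytes", "5cm", "3cm")
xml_text = ET.tostring(frame, encoding="unicode")
print(xml_text)
```

The point of the design is that the whole document, pictures included, can be produced as a single XML stream.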

The next step was to actually parse the embedded image data in WordPerfect documents. A nice discovery is that in old WP5.x documents, the images are stored without the WPG header. So, one had to hack into libwpd the possibility of forcing the parsing of a document even if the library does not recognize it. This was done on Tuesday.
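The idea of forcing the parse can be sketched roughly as follows. This is an illustration in Python and not the libwpd API: the function names and the "normal"/"forced"/"rejected" labels are invented for the example. The only format fact it relies on is that WordPerfect-family files normally start with the byte 0xFF followed by "WPC".

```python
# Magic bytes that open a regular WordPerfect-family file header.
WPC_MAGIC = b"\xffWPC"


def has_wpc_magic(data: bytes) -> bool:
    """Return True when the data starts with the usual file header magic."""
    return data[:4] == WPC_MAGIC


def parse_mode(data: bytes, force: bool = False) -> str:
    """Hypothetical decision helper, not a libwpd function.

    "normal":   the header is recognized, parse as usual.
    "forced":   no header, but the caller knows what the data is
                (for example an image block cut out of a WP5.x document).
    "rejected": no header and nobody vouches for the data.
    """
    if has_wpc_magic(data):
        return "normal"
    if force:
        return "forced"
    return "rejected"


print(parse_mode(b"\xffWPC" + b"\x00" * 12))       # normal
print(parse_mode(b"headerless image records"))       # rejected
print(parse_mode(b"headerless image records", True))  # forced
```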

The same day in the evening, I started to hack on passing the data libwpd gives us to libwpg, processing its output and incorporating the images into the OpenDocument text stream. This work continued on Wednesday and resulted in finally seeing an image. The scaling and anchoring were not good, but at least one could see something other than lines of cheesy C++. But it was really hard to make the generated documents (although valid according to the ODF schema) load nicely and display the pictures well. I poked some people whose brains should be a bit better than the brain of TrainedMonkey(tm), but did not manage to get more intelligent. And since a soup is never eaten as hot as it is cooked, I left it there for the night.

The night brought some rest for the mind, and I started my day with a nice chat with the guy who is the expert on pictures embedded in OpenOffice.org Writer documents. I came to the conclusion that the best approach would be to include those images directly as <draw:object> inside a <draw:frame>. The advantage is that this would keep the images as editable pictures, the same way Corel customers can edit them inside the WordPerfect Office(tm) applications. This meant throwing away a big chunk of Wednesday's code in favour of a more elegant solution. For those familiar with the libwp* world, the OdgExporter class was copied to writerperfect too :-) (It is now part of Novell's WPG import filter for OpenOffice.org, of the wpg2odg tool, of the perfectspot graphics viewer, and of writerperfect.) There is room for improvement, since it should probably not be very difficult to join wpg2odg and writerperfect in one source package.
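For readers who have not looked at the markup, here is a rough Python sketch of the nesting involved: a draw:frame holding a draw:object, which in flat XML carries a whole embedded drawing document inline. It is not the OdgExporter code; the helper names and the single rectangle are assumptions for illustration, and the result was not validated against the schema.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as defined by OpenDocument.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
    "svg": "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

# MIME type of OpenDocument drawings.
ODG_MIMETYPE = "application/vnd.oasis.opendocument.graphics"


def q(prefix, local):
    """Hypothetical helper: build a namespace-qualified name for ElementTree."""
    return "{%s}%s" % (NS[prefix], local)


def embedded_drawing_frame(shapes, width, height):
    """Hypothetical helper: nest shapes as an editable drawing in a frame."""
    frame = ET.Element(
        q("draw", "frame"),
        {q("svg", "width"): width, q("svg", "height"): height},
    )
    obj = ET.SubElement(frame, q("draw", "object"))
    # The embedded document root states what kind of document it is.
    doc = ET.SubElement(
        obj, q("office", "document"), {q("office", "mimetype"): ODG_MIMETYPE}
    )
    body = ET.SubElement(doc, q("office", "body"))
    drawing = ET.SubElement(body, q("office", "drawing"))
    page = ET.SubElement(drawing, q("draw", "page"))
    for shape in shapes:
        page.append(shape)
    return frame


# One rectangle stands in for the shapes a WPG conversion would produce.
rect = ET.Element(
    q("draw", "rect"),
    {
        q("svg", "x"): "0cm",
        q("svg", "y"): "0cm",
        q("svg", "width"): "2cm",
        q("svg", "height"): "1cm",
    },
)
frame = embedded_drawing_frame([rect], "5cm", "3cm")
print(ET.tostring(frame, encoding="unicode"))
```

Because the shapes stay shapes instead of being flattened into a bitmap, the embedded picture remains editable.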

I spent some time trying to debug my code, since the images were not showing. Then, by chance, I discovered that the devil was in getting the office:mimetype attribute of the embedded object right :-(. So, now one can see the images inside a frame. The scaling is still not correct, and I am not sure at this moment whether I will be able to scale the image as a whole or will have to scale every single shape on import, but it results in something that is editable. And that was the goal of the rewrite.
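The "scale every single shape on import" option boils down to simple arithmetic. The Python sketch below assumes, purely for illustration, that a shape is a list of (x, y) points and that the source picture size and the target frame size are known in the same unit; neither the function names nor this data layout come from libwpg.

```python
def scale_factors(src_width, src_height, dst_width, dst_height):
    """Return the horizontal and vertical factors mapping source to frame."""
    return dst_width / src_width, dst_height / src_height


def scale_points(points, sx, sy):
    """Scale every (x, y) point of one shape by the given factors."""
    return [(x * sx, y * sy) for x, y in points]


# A 4 x 2 picture has to fit a 2 x 1 frame (same unit for both).
sx, sy = scale_factors(4.0, 2.0, 2.0, 1.0)
outline = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0)]
print(scale_points(outline, sx, sy))
```

A real importer would also have to scale things like stroke widths, which this sketch leaves out.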

Tomorrow, I will try to rewrite the parser of the box information, so that we extract the box anchoring, size and position from the file. This promises to be quite boring stuff, but it is a necessary missing link.

BTW, libwpd and libwpg will not refuse any fine hacker who would like to contribute to enhancing our conversion feature set while still keeping our stability and document import rate.