Thursday, December 07, 2006

Benefits of getting lost

I do not know whether you have ever got lost, only to discover an exciting new landscape of your town, parts that you had never entered before. And so, even though you wasted some time, you feel lucky.

A document crashing OpenOffice.org

This is exactly what happened to me last week. Everything started with issue 71487. OpenOffice.org was crashing with the document, but libwpd's internal wpd2raw tool was spitting out a well-formed document. AbiWord was opening the document without complaining loudly, so the quick conclusion was that the culprit had to be the writerperfect code. Writerperfect is the base of a nice tool, wpd2sxw, that uses libwpd to read a WordPerfect document and produces an SXW. The core code of writerperfect is also used in the import filters of KWord and OpenOffice.org Writer.

Nevertheless, writerperfect produced a nice well-formed flat XML in the SXW file format from this document. Since the generated flat XMLs never really validated against the OpenOffice.org 1.0 DTD, the well-formedness test is so far the only tool that we are able to use to track regressions.

Valgrinding writerperfect

Things started to get a little tougher. Gdb was giving a partially corrupted trace and the resulting crash was outside the writerperfect code. So, in a desperate attempt to get the issue fixed, I tried to use valgrind to check whether writerperfect had somewhere a jump or a branch dependent on an uninitialized variable. I ran all documents of our regression suite through valgrind and realized that writerperfect had a lot, a lot, a lot of memory problems. It took me a whole weekend and more to remove all the memory leaks (partially destroyed objects due to non-virtual destructors in base classes, a container of pointers going out of scope,...). This work led to the inclusion of a writerperfect valgrind test in our regression test-suite and to the removal of all detected memory problems.
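To make the two kinds of leak mentioned above concrete, here is a minimal C++ sketch. The names (DocumentElement, TagOpenElement, liveTagCount, renderAndRelease) are hypothetical illustrations, not the actual writerperfect classes, and the sketch uses std::unique_ptr where older code would need an explicit delete loop before the container goes out of scope.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical names for illustration; not the real writerperfect classes.
static int liveTagCount = 0;  // number of TagOpenElement objects currently alive

class DocumentElement {
public:
    // Without "virtual" here, deleting a derived object through a
    // DocumentElement pointer is undefined behaviour; in practice the
    // derived members (such as m_name below) are usually never destroyed.
    virtual ~DocumentElement() {}
    virtual void write(std::string &out) const = 0;
};

class TagOpenElement : public DocumentElement {
public:
    explicit TagOpenElement(const std::string &name) : m_name(name) { ++liveTagCount; }
    ~TagOpenElement() override { --liveTagCount; }
    void write(std::string &out) const override { out += "<" + m_name + ">"; }

private:
    std::string m_name;  // heap-allocating member that a partial destruction would leak
};

std::string renderAndRelease() {
    std::string out;
    {
        // The container owns its elements. A std::vector<DocumentElement *>
        // going out of scope would free only the pointers, not the objects.
        std::vector<std::unique_ptr<DocumentElement>> body;
        body.push_back(std::unique_ptr<DocumentElement>(new TagOpenElement("text:p")));
        body.push_back(std::unique_ptr<DocumentElement>(new TagOpenElement("text:span")));
        for (const auto &element : body) {
            element->write(out);
        }
    }  // every element is fully destroyed here, through the virtual destructor
    assert(liveTagCount == 0);
    return out;
}
```

Running a small program that calls renderAndRelease() under valgrind with --leak-check=full is one way to confirm that nothing is left behind.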

Problems with SXW generation on x86_64

After all this cleanup work, I tried to load the document again. And OpenOffice.org crashed again, exactly as before. Moreover, on my home machine that runs Ubuntu Edgy amd64, the command-line tool wpd2sxw produced a clean XML in SXW format on standard output, but the content.xml in the zip file contained some garbage characters. On another machine, running Ubuntu Edgy i386, the garbage was not there.

The amazing permissiveness of the WordPerfect file-format

A bit disgusted, I asked Mathias Bauer whether he could see anything. He took the document and confirmed that the trace (the part I could not see because of the corruption) originated from the WordPerfect importer. So, the last desperate move was to boot the Windows partition and examine the document in WordPerfect itself. I know, this should perhaps have been the first step, but given that I have only one Windows partition and that one is on my wife's laptop... Side-by-side examination of the original document opened in WordPerfect and the converted document opened in AbiWord showed that AbiWord did open the document, but imported only half of it. So, the time came for a nice eye-to-eye session with ghex2. Close examination of the document in the hexadecimal viewer showed that the document contains a footnote that itself contains 3 footnotes. WordPerfect reacts to such cases by completely ignoring the nested footnotes, but it leaves the functions in the stream unchanged. I am always puzzled by the way WordPerfect preserves users' errors for future generations. So the solution was simply to instruct libwpd to ignore foot/endnotes if it is already parsing a foot/endnote. A simple fix of a few lines, and the document loads correctly in OpenOffice.org, AbiWord and KWord.
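The idea of the fix can be sketched in a few lines of C++. This is only an illustration under my own assumptions: the Token type, the TokenType values and the flattenNotes function are hypothetical and do not correspond to the real libwpd parser or listener interfaces. The sketch keeps a nesting depth and drops everything that belongs to a note opened inside another note.

```cpp
#include <string>
#include <vector>

// Hypothetical token stream; not the real libwpd data structures.
enum class TokenType { OpenNote, CloseNote, Text };

struct Token {
    TokenType type;
    std::string text;  // only meaningful for TokenType::Text
};

// Renders the stream as text, marking top-level notes as "[note:...]"
// and silently skipping notes nested inside another note.
std::string flattenNotes(const std::vector<Token> &tokens) {
    std::string out;
    int depth = 0;  // 0 = body text, 1 = inside a note, >1 = inside a nested note
    for (const Token &token : tokens) {
        switch (token.type) {
        case TokenType::OpenNote:
            if (depth == 0) {
                out += "[note:";
            }
            ++depth;  // nested opens are counted so their closes can be matched
            break;
        case TokenType::CloseNote:
            if (depth == 0) {
                break;  // stray close without an open: ignore it
            }
            --depth;
            if (depth == 0) {
                out += "]";
            }
            break;
        case TokenType::Text:
            if (depth <= 1) {
                out += token.text;  // text of nested notes is dropped
            }
            break;
        }
    }
    return out;
}
```

Counting the depth instead of using a single boolean flag means that the close of a nested note is not mistaken for the close of the outer one.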

gsf_output_printf and long strings on x86_64

The wpd2sxw command-line tool uses the libgsf abstraction for writing the different streams into the SXW zip-file. So, after having audited our code thoroughly and not finding anything that could be wrong, my attention turned to libgsf, more precisely to the gsf_output_printf function. I discovered that this function uses, among others, the g_vsnprintf call. Knowing the woes that one can have due to the different implementations of vsnprintf out there, I replaced the gsf_output_printf calls by gsf_output_puts calls and...

... the resulting SXW loaded into OpenOffice.org without any problem. Instead of bugging Jody, who should by now have had enough of me for some weeks (so much did I try to suck knowledge out of his brain in my H.Opeless phase), I spoke to Dom Lachowicz about the problem and, eventually, Morten fixed it in libgsf CVS. It turned out to be the same problem that affected GsfOutputMemory about a year ago.
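For illustration only, here is one classic vsnprintf pitfall that shows exactly these symptoms (fine on i386, garbage with long strings on x86_64): passing the same va_list to vsnprintf twice, once to measure and once to format. Whether this is precisely what was wrong in libgsf is an assumption on my part. The formatString helper below is hypothetical and is not libgsf code. On x86_64 a va_list is consumed by the first call, so the second call needs its own copy made with va_copy.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper for illustration; not the libgsf implementation.
std::string formatString(const char *format, ...) {
    char small[16];  // deliberately tiny so that long output takes the second path

    va_list args;
    va_start(args, format);

    // First pass works on a copy: after vsnprintf returns, the va_list it
    // was given has an indeterminate value and must not be reused.
    va_list probe;
    va_copy(probe, args);
    const int needed = std::vsnprintf(small, sizeof small, format, probe);
    va_end(probe);

    std::string result;
    if (needed >= 0) {
        if (static_cast<std::size_t>(needed) < sizeof small) {
            result.assign(small, static_cast<std::size_t>(needed));
        } else {
            // Output did not fit: format again into a buffer of the right
            // size, using the still untouched original va_list.
            std::vector<char> big(static_cast<std::size_t>(needed) + 1);
            std::vsnprintf(big.data(), big.size(), format, args);
            result.assign(big.data(), static_cast<std::size_t>(needed));
        }
    }
    va_end(args);
    return result;
}
```

Short strings never reach the second vsnprintf call, which is why a bug of this kind would show up only with long strings, and only on platforms where va_list is not a plain pointer passed by value.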

Positive externalities of being lost

It is true that the real fix for issue 71487 did not take more than a few lines of code, but being lost and H.Opeless for some time had nice positive externalities. Thanks to this situation, it came to my mind to start to valgrind the writerperfect code (it was not on my todo list, at this time at least) and the memory problems got solved. A bug in the libgsf code was triggered and fixed.

And so, in spite of the fact that I lost quite a lot of time on this, I feel lucky when I look at the positive externalities.