Monday, November 26, 2012

LibreOffice CorelDraw import filter: improvements by user input

It has been a long time since I last communicated with the distinguished readership of my blog. There was a hard decision to be made between producing code and producing literature. Until now, the code won. But I have now found the time to lift my head up from the coding, so the literature is back.

Many of you might be wondering what has happened since my post about the text support in CorelDraw files from last June. Things are going pretty well. Since the CorelDraw import filter was released with LibreOffice 3.6, users have started to use the feature and report bugs. We have been working on fixing them and improving the quality of libcdr.

Quick overview of the reverse-engineering process

From my discussions with our users and developers on-line and at some of the conferences that I attended, I have realized that there is a slight misunderstanding among the general public about how reverse-engineering works. So, here are some thoughts that may help to understand it a bit better:

At the beginning of the process, there is a file-format. We don't know anything about its internal structure, and there is no documentation whatsoever about it. One generates a file in this file-format and examines it in a hexadecimal viewer. Next, one makes some small change in the document and examines what changed in the file itself. Eventually, after many iterations, one might find regularities and some structure that help to divide the file into several sections or blocks of more manageable size. It is essential in this phase to encode this information into some kind of introspection tool, since a plain hexadecimal viewer is not a very productive tool in the long run. For the introspection of documents we use Valek Filippov's oletoy, a Python tool that stores our knowledge about the structure of different file-formats.
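The "change one thing and look at what moved" step can be illustrated with a minimal Python sketch. This is not code from oletoy or libcdr; the function name diff_bytes is made up for this example, and it naively compares the two files offset by offset, which only works as long as the change does not insert or remove bytes in the middle of the file.

```python
def diff_bytes(old: bytes, new: bytes):
    """Return (offset, old_chunk, new_chunk) for each run of differing bytes.

    Hypothetical helper for illustration only: it compares byte by byte
    at the same offsets, so it assumes the edit did not shift the data.
    """
    runs = []
    start = None
    common = min(len(old), len(new))
    for i in range(common):
        if old[i] != new[i]:
            if start is None:
                start = i  # a new run of differences begins here
        elif start is not None:
            runs.append((start, old[start:i], new[start:i]))
            start = None
    if start is not None:
        # the last run of differences reaches the end of the shorter file
        runs.append((start, old[start:common], new[start:common]))
    if len(old) != len(new):
        # whatever remains in the longer file is reported as one final run
        runs.append((common, old[common:], new[common:]))
    return runs


if __name__ == "__main__":
    before = bytes.fromhex("0001020304")
    after = bytes.fromhex("0001fffe04")
    for offset, a, b in diff_bytes(before, after):
        print(f"0x{offset:08x}: {a.hex()} -> {b.hex()}")
```

In practice one would read the two saved documents with open(path, "rb").read() and feed them to such a function; the offsets that change consistently with one kind of edit are the candidates for a field worth describing in the introspection tool.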

Once there is enough information about how to parse the document structure, the next goal is to get some visible results. In order to get them in a short time, libraries such as libcdr or libvisio all use libwpg's interface. Reusing this interface saves a considerable amount of time, since there are already working generators of ODG and SVG driven by the callbacks of this interface. Having visible results early in the development/reverse-engineering cycle also makes it possible to visually assess the import results and correct them if necessary. Eventually, one may realize that some necessary information is missing and go back to reverse-engineering to find it.

Users' feedback is essential

The support of reverse-engineered file-formats is a constant work in progress, a subtle dance between implementation and information digging. In this process, user feedback is an essential element. The theories about the meaning of some information inside a file hold only until a file comes along to falsify them. Even a complex file generated by a developer is easily beaten by real-life documents, and each file that shows a "weird" bug advances the understanding of the file-format. Let us look at this example:

After the release of LibreOffice 3.6.1, we got a not-so-good assessment of the quality of the CorelDraw import filter in the c't review. Those of you who understand German can delight in the nuanced evaluation:

Ein neuer Import-Filter in Draw öffnet jetzt auch CorelDraw-Dateien, was uns im Test allerdings nur mit sehr einfachen Zeichnungen fehlerfrei gelang. In dieser Form ist er schlicht unbrauchbar.

Which can be mildly translated into English (given the understatements so common in en-GB):

A new import filter in Draw now also opens CorelDraw files, which in our test we managed to do without errors only with very simple drawings. In this form, it is rather unusable.

Since we are really concerned about the quality of our software, we are thankful for any bug report, whether it is brought to us in a friendly manner or otherwise. This specific bug report helped us to understand how chains of matrix transforms are stored in newer CorelDraw files. And since a picture speaks louder than a thousand words, compare the document c't was referring to as opened in LibreOffice 3.6.2 and then in LibreOffice 3.6.3, after we fixed the position bits.

[Two screenshots: the file opened in LibreOffice 3.6.2, and the same file opened in LibreOffice 3.6.3]
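For readers wondering why a chain of matrix transforms can put objects in the wrong place, here is a small Python sketch. It does not reflect how CorelDraw stores transforms or how libcdr reads them; the six-number layout (the same convention SVG uses for its matrix) and the names apply, compose and compose_chain are assumptions chosen for this illustration. It only shows that composing the same transforms in a different order moves a point to a different position.

```python
from functools import reduce

# A 2D affine transform as (a, b, c, d, e, f), meaning
#   x' = a*x + c*y + e
#   y' = b*x + d*y + f
# This layout is an assumption for the example, not the CDR on-disk format.
IDENTITY = (1, 0, 0, 1, 0, 0)


def apply(m, point):
    """Apply transform m to the point (x, y)."""
    a, b, c, d, e, f = m
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)


def compose(first, second):
    """Return one transform equal to applying `first`, then `second`."""
    a1, b1, c1, d1, e1, f1 = first
    a2, b2, c2, d2, e2, f2 = second
    return (
        a2 * a1 + c2 * b1,
        b2 * a1 + d2 * b1,
        a2 * c1 + c2 * d1,
        b2 * c1 + d2 * d1,
        a2 * e1 + c2 * f1 + e2,
        b2 * e1 + d2 * f1 + f2,
    )


def compose_chain(transforms):
    """Collapse a chain of transforms, applied in list order, into one."""
    return reduce(compose, transforms, IDENTITY)


if __name__ == "__main__":
    scale = (2, 0, 0, 2, 0, 0)       # scale by 2 in both directions
    translate = (1, 0, 0, 1, 10, 5)  # move by (10, 5)
    print(apply(compose_chain([scale, translate]), (1, 1)))
    print(apply(compose_chain([translate, scale]), (1, 1)))
```

The two print calls give different coordinates for the same point, which is the kind of misplacement one sees when the order or nesting of such a chain is misunderstood.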

So feel encouraged to submit bugs against the CorelDraw import filter, or, even better, send us patches for your favorite itch.