ODT/XML first aid

If you work with tex4ht to convert LaTeX to OpenDocument (for subsequent Word conversion in NeoOffice, say), you may find yourself wanting to doctor an .odt file. At least I did; sometimes tex4ht outputs an odt with problems or syntax errors. But here’s the nice thing, if you need some quick odt first aid: as I learned from this article by Maarten Wisse, an odt is really just a zip archive.

> unzip test.odt
Archive:  test.odt
   creating: META-INF/
  inflating: META-INF/manifest.xml
   creating: Pictures/
  inflating: content.xml
  inflating: meta.xml
  inflating: settings.xml
  inflating: styles.xml
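If you only want to peek inside without unpacking anything, a few lines of Python will do it. This is a minimal sketch using the standard zipfile module; the function name list_odt_members is my own invention, not something tex4ht or NeoOffice provides.

```python
import zipfile


def list_odt_members(odt_path):
    """Return the names of the files stored inside an .odt (zip) archive."""
    with zipfile.ZipFile(odt_path) as archive:
        # namelist() returns member names in the order they are stored.
        return archive.namelist()
```

Calling list_odt_members("test.odt") on a healthy file should give you names like content.xml and styles.xml. If they are missing, the archive is probably not a proper OpenDocument package.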

Mirabile dictu, those xml files are pretty easy to read. All the good stuff is in content.xml and styles.xml. You can burrow into these files with a text editor, modifying style parameters or the way tex4ht has tried to tag your content. And when you’re done:

> zip test.odt content.xml
updating: content.xml (deflated 81%)
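The same swap can be scripted, if you find yourself doing it often. Here is a rough sketch in Python, with the caveat that, as far as I know, the standard zipfile module cannot overwrite a member in place the way zip does, so this version copies the whole archive to a temporary file and substitutes the edited member along the way. The function name replace_member and its parameters are hypothetical.

```python
import os
import tempfile
import zipfile


def replace_member(odt_path, member_name, new_file_path):
    """Rewrite odt_path so that member_name holds the bytes of new_file_path."""
    with open(new_file_path, "rb") as f:
        new_data = f.read()

    # Build the new archive next to the original, then swap it into place.
    target_dir = os.path.dirname(os.path.abspath(odt_path))
    fd, tmp_path = tempfile.mkstemp(suffix=".odt", dir=target_dir)
    os.close(fd)
    try:
        replaced = False
        with zipfile.ZipFile(odt_path) as src, zipfile.ZipFile(tmp_path, "w") as dst:
            for info in src.infolist():
                if info.filename == member_name:
                    data = new_data
                    replaced = True
                else:
                    data = src.read(info.filename)
                # Reusing the ZipInfo keeps each member's name, order,
                # and compression method as they were.
                dst.writestr(info, data)
        if not replaced:
            raise KeyError(member_name + " not found in " + odt_path)
        os.replace(tmp_path, odt_path)
    finally:
        # Clean up the temporary file if the swap did not happen.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

So replace_member("test.odt", "content.xml", "content.xml") would do roughly what the zip command above does. Work on a copy of the odt until you trust it.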

That’s all! If NeoOffice gives you an error when it tries to open a generated odt, it will tell you the line number of the syntax problem, and you can just fix it by hand.
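You can also hunt for the syntax problem without opening NeoOffice at all. The sketch below is one way to do it in Python, assuming the trouble is plain XML well-formedness (an unclosed or mismatched tag, say). It uses the standard library's XML parser rather than anything NeoOffice itself does, so the two may not always agree, and the function name find_xml_error is made up for the example.

```python
import xml.etree.ElementTree as ET
import zipfile


def find_xml_error(odt_path, member_name="content.xml"):
    """Return None if the member parses as XML, else (line, column, message)."""
    with zipfile.ZipFile(odt_path) as archive:
        data = archive.read(member_name)
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        line, column = err.position
        return line, column, str(err)
    return None
```

If find_xml_error("test.odt") returns a tuple, the first element is the line to look at in content.xml. Note that this only catches well-formedness errors; a file can parse cleanly and still be something the word processor dislikes.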

All right, I know, kludge city. But very, very useful in a pinch.



Filed under Conversion, kludgetastic, tex4ht, Word

2 responses to “ODT/XML first aid”

  1. Alex Roberts

    I wonder if I am doing this wrong. When I took the buggy ODT file and unzipped it, I got different results: I typed “unzip Untitled.odt”, and the following appeared:

    Archive: Untitled.odt
    inflating: Untitled-blx.bib
    inflating: Untitled-manifest.4of
    inflating: Untitled-meta.4ot
    inflating: Untitled-settings.4os
    inflating: Untitled-styles.4oy
    inflating: Untitled.4ct
    extracting: Untitled.4od
    inflating: Untitled.4tc
    inflating: Untitled.aux
    inflating: Untitled.bbl
    inflating: Untitled.blg
    inflating: Untitled.dvi
    inflating: Untitled.html
    inflating: Untitled.idv
    inflating: Untitled.lg
    inflating: Untitled.log
    inflating: Untitled.pdf
    inflating: Untitled.tex
    inflating: Untitled.tmp
    inflating: Untitled.xref
    inflating: missfont.log

    No XML files. Did I do something wrong?


    • Andrew Goldstone

      One more reply, Alex: also try the real TeXsperts around tug.org or tex.stackexchange.com, as well as the tex4ht mailing list, which is also hosted at tug.org. This could be a problem with the behavior of your version of zip. Try renaming Untitled.odt to Untitled.zip, moving it to a clean directory, and unzipping it with a graphical unzipper. See what you get. If you still get all the cruft produced by pdflatex and tex4ht, then you’ll have to dig deeper, double-check you have the most up-to-date versions of everything, etc.
