Collaborating on Documents Avoiding MSoft Word

I hate Word.  I first used it in 1990 (on Windows 3.0).  I still have to use it every day because a) I work in the public sector and it seems we have decided to adopt document interchange using a proprietary product/format and b) everyone I collaborate with ... uses Word.  The latter is the only reason I still have a computer with Windows installed on it.  I have also learned not to argue with people about the benefits or hazards of Word (either on technical or political grounds).

For producing documents with attractive figures, layouts, tables, bibliographies and mathematics, I primarily use LaTeX.  Then I need to send a readable, text-editable version to colleagues.

Goal: take a LaTeX document (that looks great compiled to PDF) and provide an editable copy in Word format for colleagues to edit, comment on and review.

Here's some sample content I want converted gracefully to a Word-compatible format, shown as it appears in the PDF produced from the original LaTeX document.


Use pandoc

Pandoc offers conversion between many formats, including .tex to .docx.

Trying the default settings didn't do a great job: the tables weren't formatted well, equation environments (e.g. LaTeX's align environment) appeared as code, the bibliography was absent, and citations and cross-references to figures and tables were left as markdown syntax rather than converted to actual text.
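For reference, the default conversion is a single pandoc call, sketched here as a small shell function (the function name and the optional second argument are hypothetical). Pandoc infers the input and output formats from the file extensions. Its `--citeproc` and `--bibliography` options are not enabled by default and are one plausible route to the missing bibliography, but that branch is an untested assumption, not part of the comparison above.

```shell
# Hypothetical helper: convert sourcefile.tex to sourcefile.docx with pandoc.
# Formats are inferred from the .tex and .docx extensions.
pandoc_tex_to_docx() {
    local src="$1"          # LaTeX source, e.g. sourcefile.tex
    local bib="${2:-}"      # optional BibTeX file (assumption: e.g. refs.bib)
    command -v pandoc >/dev/null 2>&1 || { echo "pandoc not found" >&2; return 1; }
    if [ -n "$bib" ]; then
        # Untested assumption: let pandoc's citeproc render citations and the
        # bibliography (--citeproc needs pandoc 2.11 or later).
        pandoc "$src" --citeproc --bibliography="$bib" -o "${src%.tex}.docx"
    else
        # Default settings: no citation processing.
        pandoc "$src" -o "${src%.tex}.docx"
    fi
}
```

Usage: `pandoc_tex_to_docx sourcefile.tex`, or `pandoc_tex_to_docx sourcefile.tex refs.bib` to try the citation-processing variant.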

One suggestion was to use pandoc to convert .tex to markdown (.md), clean up the resulting .md file, then convert the .md to .docx.  The resulting .md looked to me to be missing the bibliography, and it needed a fair bit of work before I could get to a usable .docx document.
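The two-step route via markdown can be sketched as two small shell functions (names hypothetical). The `-s` (standalone) flag on the first step should keep document metadata such as the title as a YAML header in the .md file; the manual clean-up in between is, of course, not scriptable.

```shell
# Step 1 (hypothetical helper): LaTeX -> markdown, for manual clean-up.
tex_to_md() {
    command -v pandoc >/dev/null 2>&1 || { echo "pandoc not found" >&2; return 1; }
    pandoc "$1" -s -o "${1%.tex}.md"
}

# Step 2 (hypothetical helper), after editing the .md by hand: markdown -> Word.
md_to_docx() {
    command -v pandoc >/dev/null 2>&1 || { echo "pandoc not found" >&2; return 1; }
    pandoc "$1" -o "${1%.md}.docx"
}

# Usage:
#   tex_to_md sourcefile.tex     # writes sourcefile.md
#   (clean up sourcefile.md by hand)
#   md_to_docx sourcefile.md     # writes sourcefile.docx
```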

I suspect I need to invest more time with pandoc to really make use of it properly, but for a quick file conversion to enable collaboration with Word users, it's not going to help.


Use latex2rtf

This tool converts a .tex to a .rtf file that you can open and edit in Word.  Some initial experiments showed that it does a pretty good job.

Some observations:

  • it manages to insert figures, scaled and displayed properly, with their captions and inline cross-references intact; however, the caption is orphaned in the example above.
  • any text formatting (e.g. \textbf, \emph) is converted appropriately
  • citations appear, looking neat and consistent with the bibliography (although the line-spacing for the bibliography needs tweaking)
  • it struggles with formatted tables (depending on the LaTeX tables package being used) but see below for solutions
  • it can't handle margin notes (they end up in the main text at the point where they occur in the source), but footnotes are handled gracefully
  • equation handling is variable: there's some missing cross-referencing in the example above, but generally it's pretty good.

After some experimenting with the default parameters, I found the following gave a quick, one-shot conversion:

  • for tables: use the -t2 option to convert them to bitmaps (so they're not editable in the .rtf, but are faithfully reproduced from the version in the PDF)
  • for margin notes: convert them to footnotes in the LaTeX source, re-run LaTeX, then re-run latex2rtf
  • for equations: use the -M12 option to convert both inline and display-math equations to bitmaps and insert them into the .rtf (so again, they're not editable, but at least they are faithfully reproduced).
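The margin-note workaround can be scripted with sed. This is a minimal sketch assuming the notes are written as plain `\marginpar{...}` (no optional `[...]` argument, no other margin-note packages); the function names and the `-footnotes.tex` suffix are hypothetical. It writes a converted copy, so the original source keeps its margin notes for the PDF.

```shell
# Hypothetical filter: rewrite every \marginpar{ as \footnote{ (stdin -> stdout).
# The closing brace is untouched, so the note text carries over unchanged,
# and commands such as \marginparwidth are not matched.
marginpar_to_footnote() {
    sed 's/\\marginpar{/\\footnote{/g'
}

# Hypothetical wrapper: sourcefile.tex -> sourcefile-footnotes.tex.
convert_margin_notes() {
    marginpar_to_footnote < "$1" > "${1%.tex}-footnotes.tex"
}
```

Usage: `convert_margin_notes sourcefile.tex`, then re-run LaTeX and latex2rtf on `sourcefile-footnotes.tex`.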

Preferred solution

Use latex2rtf with the -M12 and -t2 options

    latex2rtf -M12 -t2 sourcefile.tex
  

Then edit in Word to fix the few inconsistencies that remain.
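latex2rtf reads cross-references and citations from the auxiliary files that LaTeX produces (the .aux and, for BibTeX, the .bbl), so it helps to run a full LaTeX build first. Below is a sketch of the whole one-shot conversion as a shell function, assuming a BibTeX-based document compiled with pdflatex; the function name is hypothetical, and biblatex/biber users would swap the bibtex step.

```shell
# Hypothetical helper: build the document so the .aux and .bbl files are
# current, then convert it with equations (-M12) and tables (-t2) as bitmaps.
# Assumes pdflatex + BibTeX; adjust the build steps for other toolchains.
tex_to_rtf() {
    local src="$1"              # e.g. sourcefile.tex
    local base="${src%.tex}"    # e.g. sourcefile
    for tool in pdflatex bibtex latex2rtf; do
        command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool" >&2; return 1; }
    done
    # Two further LaTeX passes pick up the bibliography and settle references.
    pdflatex -interaction=nonstopmode "$src" &&
    bibtex "$base" &&
    pdflatex -interaction=nonstopmode "$src" &&
    pdflatex -interaction=nonstopmode "$src" &&
    latex2rtf -M12 -t2 "$src"   # writes sourcefile.rtf
}
```

Usage: `tex_to_rtf sourcefile.tex`, then open `sourcefile.rtf` in Word.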
