     DOC2TXT(1)                                             DOC2TXT(1)

     NAME
          doc2txt, olefs, mswordstrings - extract printable strings
          from Microsoft Word documents

     SYNOPSIS
          doc2txt [ file.doc ]
          aux/olefs [ -m mtpt ] file.doc
          aux/mswordstrings /mnt/doc/WordDocument

     DESCRIPTION
          Doc2txt is a shell script that uses olefs and mswordstrings
          to extract the printable text from the body of a Microsoft
          Word document.

          Microsoft Office documents are stored in OLE (Object Linking
          and Embedding) format, which is a scaled-down version of
          Microsoft's FAT file system.  Olefs presents the contents of
          an Office document as a file system on mtpt, which defaults
          to /mnt/doc.  Mswordstrings parses the WordDocument file
          inside an Office document, extracting the text stream.
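
          As a rough illustration of the container format described
          above, the following C sketch tests whether a buffer begins
          with the eight-byte signature found at the start of an OLE
          compound file, and reads the sector size from the header.
          It is not taken from olefs.c: the helper names is_ole and
          ole_sector_size are hypothetical, and the code is written
          in portable ISO C rather than the Plan 9 dialect.

```c
#include <stddef.h>
#include <string.h>

/* The eight bytes that begin every OLE compound file. */
static const unsigned char ole_magic[8] = {
	0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1
};

/* Hypothetical helper: return 1 if buf starts with the OLE signature. */
int
is_ole(const unsigned char *buf, size_t n)
{
	return n >= sizeof ole_magic
		&& memcmp(buf, ole_magic, sizeof ole_magic) == 0;
}

/*
 * Hypothetical helper: return the sector size in bytes, or 0 if buf
 * does not look like an OLE header.  The sector shift is stored as a
 * little-endian 16-bit value at offset 30; the size is 2^shift.
 */
unsigned long
ole_sector_size(const unsigned char *buf, size_t n)
{
	unsigned int shift;

	if (n < 32 || !is_ole(buf, n))
		return 0;
	shift = buf[30] | ((unsigned int)buf[31] << 8);
	if (shift > 16)		/* arbitrary sanity limit for this sketch */
		return 0;
	return 1UL << shift;
}
```

          A caller would read the first 32 bytes of file.doc into a
          buffer and pass them to these helpers; 512 bytes is the
          usual sector size for documents of this vintage.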

     SOURCE
          /sys/src/cmd/aux/mswordstrings.c
          /sys/src/cmd/aux/olefs.c
          /rc/bin/doc2txt

     SEE ALSO
          strings(1)
          ``Microsoft Word 97 Binary File Format'', available online
          at Microsoft's developer home page.
          ``LAOLA Binary Structures'',
          snake.cs.tu-berlin.de:8081/~schwartz/pmh.