

     MAN(1)                                                     MAN(1)

     NAME
          mktags, looktags, tagfiles, rdtrie, qhash, tagfs - file
          indexing and searching tools

     SYNOPSIS
          [ DB= dbpath ] looktags [ -n ] tag ...

          mktags [ -d ] dbpath file ...

          tagfiles [ -d ] triepath file ...

          rdtrie triepath [ tag ... ]

          qhash [ -dv ] hashpath [ qid ... ]

          qhash [ -dv ] -a hashpath [ qid path ... ]

          qhash [ -dv ] -c hashpath file ...

          tagfs [ -abcD ] [ -s srv ] [ -m mnt ] triepath

     DESCRIPTION
          These tools index files by content and perform word
          searches using the resulting databases. The first two
          programs, mktags and looktags, are Rc scripts that provide
          the primary user interface. The other programs do the
          actual indexing and searching.

          Mktags creates a database named dbpath that maps tags
          (words) to file names. Only the given files are indexed,
          descending into subdirectories. Every word in the path name
          of a file, and every word contained in the file (for most
          files), is a valid search tag for the file. A database is
          made of two files: a trie and a hash table. The name of the
          trie has the suffix .trie.db and the name of the hash table
          has the suffix .hash.db. The path of these files without
          the suffix is the name of the database.

          By convention, there is a system-wide database at /lib/sys
          (that is, /lib/sys.trie.db and /lib/sys.hash.db) and a
          per-user database at $home/lib/$user (that is,
          $home/lib/$user.trie.db and $home/lib/$user.hash.db).
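
          As a minimal sketch in C (not taken from the mktags
          source), the file names of a database follow from its name
          by appending a suffix. The helper names dbfile and userdb,
          and the sample home directory /usr/glenda, are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define TRIESUFFIX ".trie.db"
#define HASHSUFFIX ".hash.db"

/* Write "<dbname><suffix>" into buf; return 0, or -1 if it does not fit. */
int dbfile(char *buf, size_t len, const char *dbname, const char *suffix)
{
	int n = snprintf(buf, len, "%s%s", dbname, suffix);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write the per-user database name, $home/lib/$user, into buf. */
int userdb(char *buf, size_t len, const char *home, const char *user)
{
	int n = snprintf(buf, len, "%s/lib/%s", home, user);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

          For example, the database named /usr/glenda/lib/glenda is
          stored in /usr/glenda/lib/glenda.trie.db and
          /usr/glenda/lib/glenda.hash.db.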

          Looktags searches the system and the user databases for
          files that match the query given by its arguments. By
          default, only file names are printed. The -n flag makes
          looktags run grep(1) to print some of the matching lines.

          A query is made of lists of tags separated by the ":"
          character, each tag and each ":" being a distinct argument.
          A file matches the query if it is associated with (contains)
          all the tags in at least one of the lists. For example,
               looktags a b c : d e
          would search for files matching either all of a, b, and c,
          or all of d and e.
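
          A minimal sketch in C of the matching rule just described,
          assuming the tags of a file can be tested one at a time
          through a callback. The names querymatch, HasTag, and inlist
          are hypothetical, not part of looktags.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Callback: does the file described by ctx carry this tag? */
typedef bool (*HasTag)(const void *ctx, const char *tag);

/*
 * Evaluate a query given as arguments: tags separated by ":".
 * The file matches if it has every tag of at least one list.
 */
bool querymatch(int argc, char **argv, HasTag has, const void *ctx)
{
	bool all = true;	/* every tag so far in this list is present */
	int ntags = 0;		/* number of tags seen in this list */

	for (int i = 0; i <= argc; i++) {
		if (i == argc || strcmp(argv[i], ":") == 0) {
			/* end of a list; an empty list matches nothing here */
			if (ntags > 0 && all)
				return true;
			all = true;
			ntags = 0;
			continue;
		}
		ntags++;
		if (!has(ctx, argv[i]))
			all = false;
	}
	return false;
}

/* A simple HasTag over a NULL-terminated array of tag strings. */
bool inlist(const void *ctx, const char *tag)
{
	for (const char *const *t = ctx; *t != NULL; t++)
		if (strcmp(*t, tag) == 0)
			return true;
	return false;
}
```

          Each list is checked independently, so a file tagged d, e,
          and x matches the query a b c : d e through its second list.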

          Looktags can be instructed to use a different database by
          defining the DB environment variable to contain a list of
          names for the databases to be used (without any file name
          suffixes).

          To speed up searches, the trie part of the database can be
          kept in memory using tagfs. When using a database named
          /a/b/dbname, looktags first looks for a file named
          /srv/dbname.tagfs (a server holding an in-memory copy of the
          trie part of the database) and uses it if available.
          Otherwise, looktags looks up the host identified by $search
          in the ndb(6) database; if such a host is found, looktags
          imports its /srv and looks for /srv/dbname.tagfs there. This
          lets several machines on a network share one in-memory
          database. Only as a last resort does looktags read the
          database itself to execute the query.
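
          The lookup order above can be sketched in C as follows. The
          three flags passed to choosesource stand in for the real
          checks (looking for the file in /srv, consulting ndb(6), and
          importing the remote /srv), which are not shown; srvname and
          choosesource are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Where the trie is taken from, in order of preference. */
enum Source { LocalSrv, RemoteSrv, ReadDb };

/* Map a database name such as /a/b/dbname to /srv/dbname.tagfs. */
int srvname(char *buf, size_t len, const char *dbpath)
{
	const char *base = strrchr(dbpath, '/');

	base = base != NULL ? base + 1 : dbpath;
	int n = snprintf(buf, len, "/srv/%s.tagfs", base);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Local /srv first, then the /srv of the host found via $search,
 * and only then the database files themselves.
 */
enum Source choosesource(int localsrv, int havesearch, int remotesrv)
{
	if (localsrv)
		return LocalSrv;
	if (havesearch && remotesrv)
		return RemoteSrv;
	return ReadDb;
}
```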

          Tagfiles tags every file given as an argument (recursing
          into directories), using the Trie stored in the file
          triepath, which must include the .trie.db suffix, if any.
          Mktags relies on this program.

          For each file indexed, tagfiles uses every word in its path
          name as a tag to search for the file. Tagfiles also looks
          at the file name suffix and uses file(1) to determine the
          type of the file and pick an indexing method. For text
          files, tagfiles reads the entire file and associates each
          word it contains with the file as a search tag. For other
          file types, tagfiles tries to run an external program to
          extract the list of tags for the file. If the appropriate
          external program does not exist, tagfiles still tries to
          index the file as text when that is appropriate.
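
          A sketch of how path-name tags might be extracted, assuming
          a word is a maximal run of ASCII letters and digits. This
          page does not define word boundaries, and the sketch does
          not treat non-ASCII UTF-8 letters as word characters;
          pathwords is a hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum { Maxword = 64 };

/*
 * Split path into words (runs of letters and digits), storing up to
 * max of them in w. Longer words are truncated to Maxword-1 bytes.
 * Returns the number of words stored.
 */
int pathwords(const char *path, char w[][Maxword], int max)
{
	int n = 0;
	const char *p = path;

	while (*p != '\0' && n < max) {
		/* skip separators such as '/', '.', and '-' */
		while (*p != '\0' && !isalnum((unsigned char)*p))
			p++;
		if (*p == '\0')
			break;
		size_t len = 0;
		while (isalnum((unsigned char)p[len]))
			len++;
		size_t keep = len < (size_t)Maxword - 1 ? len : (size_t)Maxword - 1;
		memcpy(w[n], p, keep);
		w[n][keep] = '\0';
		n++;
		p += len;
	}
	return n;
}
```

          Under this assumption, /usr/prof/notes.txt yields the tags
          usr, prof, notes, and txt.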

          The following programs may be executed by tagfiles to obtain
          tags for files. They are expected to write tags for the file
          given as an argument, one per line:

          tagc      to tag C source.
          taglimbo  to tag Limbo source.
          taghtml   to tag HTML files.
          tagman    to tag manual pages.
          tagrc     to tag Rc scripts.
          tagtroff  to tag roff source.
          tagdoc    to tag Microsoft Office documents, including rich
                    text format.
          tagpdf    to tag Adobe PDF files.
          tageps    to tag Adobe EPS files.
          tagps     to tag PostScript files.
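
          A sketch of choosing one of these programs by file name
          suffix. The suffix table is an assumption for illustration:
          the real tagfiles also uses file(1), and tagman is left out
          because this page does not say how manual pages are
          recognized. A null result means "index as text, if
          appropriate"; tagger is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical suffix-to-program table; .eps must precede .ps. */
static const struct {
	const char *suffix;
	const char *prog;
} taggers[] = {
	{ ".c", "tagc" },	{ ".h", "tagc" },
	{ ".b", "taglimbo" },	{ ".html", "taghtml" },
	{ ".rc", "tagrc" },	{ ".ms", "tagtroff" },
	{ ".doc", "tagdoc" },	{ ".rtf", "tagdoc" },
	{ ".pdf", "tagpdf" },	{ ".eps", "tageps" },
	{ ".ps", "tagps" },
};

/* Return the tagging program for name, or NULL to fall back to text. */
const char *tagger(const char *name)
{
	size_t nlen = strlen(name);

	for (size_t i = 0; i < sizeof taggers / sizeof taggers[0]; i++) {
		size_t slen = strlen(taggers[i].suffix);
		/* require a non-empty base name before the suffix */
		if (nlen > slen && strcmp(name + nlen - slen, taggers[i].suffix) == 0)
			return taggers[i].prog;
	}
	return NULL;
}
```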

          Rdtrie can be used to inspect and query the Trie in the
          database. The Trie data structure keeps all known tags in a
          trie, maintaining a list of Qids for each tag. Without any
          tag arguments, rdtrie reads and prints the entire Trie file,
          triepath. Otherwise, rdtrie reads triepath and interprets
          the remaining arguments as a query. The Qids matching the
          query are printed on standard output. See above for the
          query syntax. Looktags relies on this program to execute
          its queries.
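
          A minimal in-memory trie sketch in C: one node per byte of a
          tag, each node holding the Qid.path values of the files that
          carry the tag ending there. The node layout (a 256-way array
          of children) is an assumption chosen for simplicity, not the
          file format used by rdtrie or tagfs; all names are
          hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct Node Node;
struct Node {
	Node *kid[256];		/* child for each possible next byte */
	uint64_t *qids;		/* Qid.path of files with this tag */
	size_t nqids;
};

Node *newnode(void)
{
	return calloc(1, sizeof(Node));
}

/* Associate qid with tag, ignoring duplicates. Returns 0, or -1 on failure. */
int trieadd(Node *root, const char *tag, uint64_t qid)
{
	Node *n = root;

	for (const unsigned char *p = (const unsigned char *)tag; *p; p++) {
		if (n->kid[*p] == NULL && (n->kid[*p] = newnode()) == NULL)
			return -1;
		n = n->kid[*p];
	}
	for (size_t i = 0; i < n->nqids; i++)
		if (n->qids[i] == qid)
			return 0;
	uint64_t *q = realloc(n->qids, (n->nqids + 1) * sizeof q[0]);
	if (q == NULL)
		return -1;
	q[n->nqids++] = qid;
	n->qids = q;
	return 0;
}

/* Return the Qids for tag, with the count in *np; NULL if there are none. */
const uint64_t *triefind(const Node *root, const char *tag, size_t *np)
{
	const Node *n = root;

	for (const unsigned char *p = (const unsigned char *)tag; *p && n; p++)
		n = n->kid[*p];
	*np = n != NULL ? n->nqids : 0;
	return n != NULL ? n->qids : NULL;
}
```

          A prefix of a known tag (appe for append) is an internal
          node with no Qids, so it finds nothing; only whole tags
          match.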

          Qhash maintains the file name hash table in the database.
          This data structure translates Qids into file names. In what
          follows, qid actually means the Qid.path field of the file's
          Qid, written in base 16. The hashpath argument is mandatory
          and must be the path of the hash file in the database,
          including the .hash.db suffix.

          The first invocation syntax (without flags -a or -c)
          retrieves the path names for the qids given on the command
          line. Looktags uses it to retrieve the paths of matching
          files.

          With flag -a, qhash adds the following argument pairs (each
          a qid and a path) to the hash file.

          With flag -c, qhash retrieves the Qids and absolute path
          names of the files given as arguments (recursing into
          directories) and adds them to the database. Mktags uses this
          to create or update the hash file in the database.
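
          A sketch of a qid-to-path table using chained hashing, plus
          parsing of base-16 qid arguments as qhash expects them. The
          bucket count, the names, and the choice to replace the path
          when a qid is added again are assumptions; the layout of the
          .hash.db file is not described here.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { Nbucket = 1024 };	/* assumed number of hash buckets */

typedef struct Entry Entry;
struct Entry {
	uint64_t qid;		/* the file's Qid.path */
	char *path;		/* absolute path name */
	Entry *next;		/* next entry in the same bucket */
};

typedef struct {
	Entry *b[Nbucket];
} Qhash;

/* Parse a base-16 qid such as "8345f"; return 0, or -1 if malformed. */
int parseqid(const char *s, uint64_t *qid)
{
	char *end;

	if (!isxdigit((unsigned char)s[0]))
		return -1;
	errno = 0;
	unsigned long long v = strtoull(s, &end, 16);
	if (*end != '\0' || errno != 0)
		return -1;
	*qid = v;
	return 0;
}

/* Map qid to a copy of path, replacing any earlier path for qid. */
int qhashput(Qhash *h, uint64_t qid, const char *path)
{
	size_t len = strlen(path) + 1;
	char *p = malloc(len);

	if (p == NULL)
		return -1;
	memcpy(p, path, len);
	Entry **bp = &h->b[qid % Nbucket];
	for (Entry *e = *bp; e != NULL; e = e->next)
		if (e->qid == qid) {
			free(e->path);
			e->path = p;
			return 0;
		}
	Entry *e = malloc(sizeof *e);
	if (e == NULL) {
		free(p);
		return -1;
	}
	e->qid = qid;
	e->path = p;
	e->next = *bp;
	*bp = e;
	return 0;
}

/* Return the path for qid, or NULL if it is not in the table. */
const char *qhashget(const Qhash *h, uint64_t qid)
{
	for (const Entry *e = h->b[qid % Nbucket]; e != NULL; e = e->next)
		if (e->qid == qid)
			return e->path;
	return NULL;
}
```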

          In all of the programs above, flags -d and -v (where
          available) enable debug messages that help track down
          problems.

          Tagfs can be used to update a Trie, and it is an alternative
          to rdtrie for searching that keeps the entire Trie in
          memory. It is a file server that by default serves the pipe
          at /srv/triename.tagfs (where triename is the base name of
          triepath without suffixes) and mounts itself at /mnt/tags.
          Flags -s and -m make tagfs serve srv or mount itself at mnt
          instead.

          The single directory served contains a ctl file that can be
          read to gather statistics about the Trie and written to
          modify it. Writing the string sync writes the in-memory
          database back to its file. A write of the form tag qidpath
          tag ... adds each tag to qidpath in the trie (but does not
          update the on-disk database).
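
          A sketch of parsing one write to the ctl file, covering the
          two requests described above. The names and the limit of 64
          tags per write are assumptions; the sketch uses strtok, so
          it is not reentrant.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { Maxtok = 64 };
enum Ctl { Cbad, Csync, Ctag };

typedef struct {
	uint64_t qid;		/* qidpath, given in base 16 */
	char *tags[Maxtok];	/* tags to add; they point into the buffer */
	int ntags;
} Tagreq;

/* Parse one ctl write, splitting buf in place at white space. */
enum Ctl parsectl(char *buf, Tagreq *r)
{
	char *tok[Maxtok + 2];	/* "tag", qidpath, and up to Maxtok tags */
	int n = 0;

	for (char *t = strtok(buf, " \t\n"); t != NULL; t = strtok(NULL, " \t\n")) {
		if (n == Maxtok + 2)
			return Cbad;
		tok[n++] = t;
	}
	if (n == 1 && strcmp(tok[0], "sync") == 0)
		return Csync;
	if (n >= 3 && strcmp(tok[0], "tag") == 0) {
		char *end;

		if (!isxdigit((unsigned char)tok[1][0]))
			return Cbad;
		errno = 0;
		r->qid = strtoull(tok[1], &end, 16);
		if (*end != '\0' || errno != 0)
			return Cbad;
		r->ntags = n - 2;
		for (int i = 2; i < n; i++)
			r->tags[i - 2] = tok[i];
		return Ctag;
	}
	return Cbad;
}
```

          The trailing newline that echo(1) appends, as in the
          EXAMPLES below, is treated as white space.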

          A query can be made by creating a file, writing the query
          into it (taking care to separate tags and : characters with
          white space), and then reading the list of matching qids
          from the same file. The query file is removed as soon as it
          is closed after having been read.
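
          Under the query semantics given earlier, the answer is the
          union, over all lists, of the intersection of the qid lists
          of each list's tags. Assuming each tag's qids are kept
          sorted and free of duplicates (this page does not say how
          tagfs stores them), both operations are linear merges, as in
          this C sketch; qidand and qidor are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* out = qids in both a and b (sorted inputs); out may be a. Returns count. */
size_t qidand(const uint64_t *a, size_t na, const uint64_t *b, size_t nb,
	uint64_t *out)
{
	size_t i = 0, j = 0, n = 0;

	while (i < na && j < nb) {
		if (a[i] < b[j])
			i++;
		else if (a[i] > b[j])
			j++;
		else {
			out[n++] = a[i];
			i++;
			j++;
		}
	}
	return n;
}

/* out = qids in a or b (sorted inputs); out holds na+nb, aliases neither. */
size_t qidor(const uint64_t *a, size_t na, const uint64_t *b, size_t nb,
	uint64_t *out)
{
	size_t i = 0, j = 0, n = 0;

	while (i < na || j < nb) {
		if (j == nb || (i < na && a[i] < b[j]))
			out[n++] = a[i++];
		else if (i == na || b[j] < a[i])
			out[n++] = b[j++];
		else {
			/* same qid in both lists: emit it once */
			out[n++] = a[i++];
			j++;
		}
	}
	return n;
}
```

          For the query list append : queue append, an evaluator would
          intersect the qids for list and append, intersect those for
          queue and append, and unite the two results.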

     EXAMPLES
          Create the per-user and the system database:
               ; mktags $home/lib/$user $home /mail/box/$user/msgs
               ; mktags /lib/sys /cfg /rc /sys

          Look for files mentioning either list append or queue
          append, then repeat the query using an alternate database
          kept at /lib/other.trie.db and /lib/other.hash.db:
               ; looktags list append : queue append
               ; DB=/lib/other looktags list append : queue append

          Add (or update!) tags for files under /usr/prof to the per-
          sonal database:
               ; tagfiles $home/lib/$user.trie.db /usr/prof
               ; qhash -c $home/lib/$user.hash.db /usr/prof

          Place the system database in memory so that looktags runs
          faster, add the tag yoyoba to the file with qid 8345f, and
          write the change back to the database file:
               ; tagfs /lib/sys.trie.db
               ; echo tag 8345f yoyoba >/mnt/tags/ctl
               ; echo sync >/mnt/tags/ctl

          Make the system database at whale.lsub.org available to
          other hosts: First, edit /lib/ndb/local to contain
          search=whale.lsub.org for the network entry. Second, at
          whale:
               whale% tagfs /lib/sys.trie.db
               whale% chmod a+rw /srv/sys.tagfs
          Now looktags on other hosts may use whale's in-memory
          database.

     FILES
          /sys/src/cmd/tags/updatetags
               Example script to update the user database each minute.

          /lib/sys.{trie,hash}.db
               Per system database files.

          $home/lib/$user.{trie,hash}.db
               Per user database files.

     SOURCE
          /sys/src/cmd/tags

     SEE ALSO
          grep(1), file(1), ndb(6)

     BUGS
          There is no clear way to remove tags from a file.  The data-
          base is expected to be updated daily (at night) to reflect
          changes during the day, and tagfs has to be restarted to see
          the effects.
