MAN(1)                                                          MAN(1)

NAME
     mktags, looktags, tagfiles, rdtrie, qhash, tagfs - file indexing
     and searching tools

SYNOPSIS
     [ DB=dbpath ] looktags [ -n ] tag ...
     mktags [ -d ] dbpath file ...
     tagfiles [ -d ] triepath file ...
     rdtrie triepath [ tag ... ]
     qhash [ -dv ] hashpath [ qid ... ]
     qhash [ -dv ] -a hashpath [ qid path ... ]
     qhash [ -dv ] -c hashpath file ...
     tagfs [ -abcD ] [ -s srv ] [ -m mnt ] triepath

DESCRIPTION
     These tools can be used to index files based on content and to
     perform word searches using the resulting databases. The first
     two programs are Rc scripts providing the primary user interface.
     The other programs provide the actual software for indexing and
     searching.

     Mktags creates a database named dbpath that maps from tags
     (words) to file names. Only the given files are indexed;
     directories among them are indexed along with their
     subdirectories. Any word in the path name of a file, and any word
     contained in the file (for most files), is a valid search tag for
     the file. A database is made of two files: a trie and a hash
     table. The name of the trie has the suffix .trie.db and the name
     of the hash has the suffix .hash.db. The path to the database
     files without any suffix is considered the name of the database.

     By convention, there is a system-wide database at /lib/sys (that
     is, /lib/sys.trie.db and /lib/sys.hash.db) and a per-user
     database at $home/lib/$user (that is, $home/lib/$user.trie.db and
     $home/lib/$user.hash.db).

     Looktags searches the system and the user databases for files
     that match the query specified by its arguments. By default, only
     file names are printed. Flag -n instructs looktags to run grep(1)
     to print some of the matching lines.

     A query is made of lists of tags separated by the ":" character,
     each given as a distinct argument. A file matches the query if it
     is associated with (contains) all the tags in at least one of the
     lists. For example,

          looktags a b c : d e

     would search for files either matching all of a, b, and c or
     matching all of d and e.
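
The matching rule for queries can be sketched in a few lines of Python. This is only an illustration of the semantics described above, not the code used by looktags or rdtrie; the function names parse_query and matches are hypothetical, and skipping empty lists is an assumption.

```python
def parse_query(args):
    """Split looktags-style arguments into tag lists at each ":" argument."""
    lists, current = [], []
    for arg in args:
        if arg == ":":
            # Assumption: an empty list (e.g. a leading ":") is skipped.
            if current:
                lists.append(current)
            current = []
        else:
            current.append(arg)
    if current:
        lists.append(current)
    return lists


def matches(file_tags, query):
    """True if the file carries every tag of at least one list."""
    tags = set(file_tags)
    return any(all(tag in tags for tag in tag_list) for tag_list in query)


query = parse_query(["a", "b", "c", ":", "d", "e"])
print(query)                                  # [['a', 'b', 'c'], ['d', 'e']]
print(matches({"a", "b", "c", "x"}, query))   # True
print(matches({"d", "e"}, query))             # True
print(matches({"a", "b", "d"}, query))        # False
```

Tags within a list are combined with AND, and the lists themselves with OR, which is why the last file (carrying a, b, and d) does not match.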

     Looktags can be instructed to use a different database by setting
     the DB environment variable to a list of names for the databases
     to be used (without any file name suffixes).

     To speed up searches, the trie part of the database can be kept
     in memory using tagfs. When using a database named /a/b/dbname,
     looktags first looks for a file named /srv/dbname.tagfs (to reach
     a server holding an in-memory version of the trie part of the
     database) and uses it if available. Otherwise, looktags looks for
     the host identified by $search in the ndb(6) database. Should it
     be found, looktags imports its /srv to look for /srv/dbname.tagfs
     on it. This is used to share an in-memory database among several
     machines sharing a network. Only as a last resort does looktags
     read the database itself to execute the query.

     Tagfiles tags every file given as an argument (recursing into
     directories) using the Trie stored in the file triepath. Here,
     triepath must include the .trie.db suffix, if any. Mktags relies
     on this program.

     For each file indexed, tagfiles uses every word in its path name
     as a tag to search for the file. Also, tagfiles looks at the file
     name suffix and uses file(1) to determine the type of file and
     pick a particular indexing method. For text files, tagfiles reads
     the entire file contents and associates each word contained in
     the file as a tag to search for the file. For other types of
     file, tagfiles tries to execute external programs to extract the
     list of tags for each file. Should the appropriate external
     program not exist, tagfiles still tries to index the file as text
     when appropriate.

     The following programs may be executed by tagfiles to obtain tags
     for files. They are expected to write tags for the file given as
     an argument, one per line:

     tagc      to tag C source.
     taglimbo  to tag Limbo source.
     taghtml   to tag HTML files.
     tagman    to tag manual pages.
     tagrc     to tag Rc scripts.
     tagtroff  to tag roff source.
     tagdoc    to tag Microsoft Office documents, including rich text
               format.
     tagpdf    to tag Adobe PDF files.
     tageps    to tag Adobe EPS files.
     tagps     to tag PostScript files.

     Rdtrie can be used to inspect and query the Trie in the database.
     The Trie data structure keeps all the known tags in a trie,
     maintaining a list of Qids for each tag. Without any tag argument
     on the command line, rdtrie reads and prints the entire Trie
     file, triepath. Otherwise, rdtrie reads triepath and then
     interprets the following arguments as a query. The Qids matching
     the query are printed on standard output. See above for the
     syntax of queries. Looktags relies on this program to execute its
     query.

     Qhash maintains a file name hash table in the database. This data
     structure is used to translate Qids into file names. In what
     follows, Qid actually means the Qid.path field of the file's Qid,
     in base 16. Also, the argument hashpath is mandatory and must be
     the path of the hash file in the database, including the .hash.db
     suffix.

     The first invocation syntax (without flags -a or -c) retrieves
     the path names for the qids given on the command line. This is
     used by looktags to retrieve the paths for matching files.

     Under flag -a, qhash adds the following argument pairs (each a
     qid and a path) to the hash file.

     Under flag -c, qhash retrieves Qids and (absolute) path names for
     the files given as arguments (recursing into directories) and
     adds them to the database. This is used by mktags to create or
     update the hash file in the database.

     In any of the programs above, flags -d and -v (when available)
     enable certain debug messages to help track problems while using
     the programs.

     The program tagfs can be used to update a Trie and is an
     alternative to rdtrie to perform searches by keeping the entire
     Trie in memory.
     It is a file system that by default serves the pipe at
     /srv/triename.tagfs (where triename is the base name of triepath
     without suffixes), mounting itself at /mnt/tags. Flags -s and -m
     can be used to instruct tagfs to serve srv instead or to mount
     itself at mnt instead. The single directory served contains a ctl
     file that can be read to gather statistics about the Trie and
     written to modify the Trie. A write of the string sync writes the
     in-memory database back to its file. A write of the form

          tag qidpath tag ...

     adds each tag to qidpath in the Trie (but does not update the
     on-disk database). A query can be made by creating a file,
     writing the query into it (being careful to separate different
     tags and : characters with white space), and then reading from
     the same file the list of qids that match the query. The query
     file is removed as soon as it is closed after having been read.

EXAMPLES
     Create the per-user and the system database:

          ; mktags $home/lib/$user $home /mail/box/$user/msgs
          ; mktags /lib/sys /cfg /rc /sys

     Look for files mentioning either list append or queue append,
     then repeat the query using an alternate database kept at
     /lib/other.trie.db and /lib/other.hash.db:

          ; looktags list append : queue append
          ; DB=/lib/other looktags list append : queue append

     Add (or update!) tags for files under /usr/prof to the personal
     database:

          ; tagfiles $home/lib/$user.trie.db /usr/prof
          ; qhash -c $home/lib/$user.hash.db /usr/prof

     Place the system database in memory so that looktags can be
     faster, and add the tag yoyoba to the file with qid 8345f:

          ; tagfs /lib/sys.trie.db
          ; echo tag 8345f yoyoba >/mnt/tags/ctl
          ; echo sync >/mnt/tags/ctl

     Make the system database at whale.lsub.org available to other
     hosts. First, edit /lib/ndb/local to contain
     search=whale.lsub.org for the network entry. Second, at whale:

          whale% tagfs /lib/sys.trie.db
          whale% chmod a+rw /srv/sys.tagfs

     Now from other hosts, looktags may use whale's in-memory
     database.
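
As a rough mental model of how the two database files cooperate during a search, the following Python sketch keeps a toy trie (tag to set of qids) next to a toy hash table (qid to path name). It is a sketch under stated assumptions, not the on-disk format or the algorithm used by rdtrie, qhash, or tagfs: the class ToyDB, its method names, and the sample paths and qids (other than 8345f, taken from the example above) are all hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next character -> TrieNode
        self.qids = set()    # qids of files carrying the tag ending here


class ToyDB:
    """Hypothetical in-memory stand-in for the .trie.db and .hash.db files."""

    def __init__(self):
        self.root = TrieNode()
        self.paths = {}      # qid (hex string) -> path name

    def add_file(self, qid, path, tags):
        """Record the path for qid and attach qid to every tag."""
        self.paths[qid] = path
        for tag in tags:
            node = self.root
            for ch in tag:
                node = node.children.setdefault(ch, TrieNode())
            node.qids.add(qid)

    def qids_for(self, tag):
        """Walk the trie one character at a time; unknown tags yield no qids."""
        node = self.root
        for ch in tag:
            node = node.children.get(ch)
            if node is None:
                return set()
        return set(node.qids)

    def lookup(self, query):
        """query is a list of tag lists: AND within a list, OR across lists."""
        found = set()
        for tag_list in query:
            if not tag_list:
                continue
            found |= set.intersection(*(self.qids_for(t) for t in tag_list))
        # Second step: translate the matching qids into path names.
        return sorted(self.paths[q] for q in found)


db = ToyDB()
db.add_file("8345f", "/usr/prof/list.c", ["list", "append", "yoyoba"])
db.add_file("1a2b", "/usr/prof/queue.c", ["queue", "append"])
db.add_file("77c0", "/usr/prof/notes", ["list", "queue"])

print(db.lookup([["list", "append"], ["queue", "append"]]))
# ['/usr/prof/list.c', '/usr/prof/queue.c']
print(db.lookup([["yoyoba"]]))
# ['/usr/prof/list.c']
```

The split mirrors the description: the trie answers "which qids carry these tags", and the hash table turns those qids back into file names.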

FILES
     /sys/src/cmd/tags/updatetags
          Example script to update the user database each minute.
     /lib/sys.{trie,hash}.db
          Per-system database files.
     $home/lib/$user.{trie,hash}.db
          Per-user database files.

SOURCE
     /sys/src/cmd/tags

SEE ALSO
     grep(1)

BUGS
     There is no clear way to remove tags from a file. The database is
     expected to be updated daily (at night) to reflect changes during
     the day, and tagfs has to be restarted to see the effects.