     RE(3)                                                       RE(3)

     NAME
          re_bm, re_cw, re_re - string and pattern matching

     SYNOPSIS
          #include <re.h>

          re_bm *re_bmcomp(b, e, map)
          char *b, *e;
          unsigned char map[256];

          int re_bmexec(pat, rdfn, matchfn)
          re_bm *pat;
          int (*rdfn)(), (*matchfn)();

          void re_bmfree(pat);
          re_bm *pat;

          re_cw *re_cwinit(map)
          unsigned char map[256];

          void re_cwadd(pat, b, e)
          re_cw *pat;
          char *b, *e;

          void re_cwcomp(pat)
          re_cw *pat;

          int re_cwexec(pat, rdfn, matchfn)
          re_cw *pat;
          int (*rdfn)(), (*matchfn)();

          void re_cwfree(pat);
          re_cw *pat;

          re_re *re_recomp(b, e, map)
          char *b, *e;
          unsigned char map[256];

          re_reexec(pat, b, e, match)
          re_re *pat;
          char *b, *e, *match[10][2];

          void re_refree(pat);
          re_re *pat;

          void re_error(str);
          char *str;


    DESCRIPTION
         These routines search for patterns in strings.  The re_re
         routines search for general regular expressions (defined
         below) using a lazily evaluated deterministic finite
         automaton.  The more specialized and faster re_cw routines
         search for multiple literal strings using the
         Commentz-Walter algorithm.  The still more specialized and
         efficient re_bm routines search for a single string using
         the Boyer-Moore algorithm.  Every string is designated by
         a pair of pointers: one to its first character and one to
         the character following the string.

         To use the re_bm routines, first build a recognizer by
         calling re_bmcomp, which takes the search string and a
         character map; all characters are compared after mapping.
         Typically, map is initialized by a loop similar to

              for(i = 0; i < 256; i++) map[i] = i;

         and its value is no longer required after the call to
         re_bmcomp.
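
         As an illustration of how a map affects comparison, the
         following sketch builds an identity map and an ASCII
         case-folding map, and searches with a simplified
         Boyer-Moore-Horspool loop.  It is not the re_bm
         implementation; map_identity, map_foldcase, and map_search
         are hypothetical names that do not appear in <re.h>.

```c
#include <stddef.h>

/* Identity map: every character compares as itself. */
void map_identity(unsigned char map[256])
{
    int i;

    for (i = 0; i < 256; i++)
        map[i] = (unsigned char)i;
}

/* Case-folding map: 'A'..'Z' compare equal to 'a'..'z' (ASCII only). */
void map_foldcase(unsigned char map[256])
{
    int i;

    map_identity(map);
    for (i = 'A'; i <= 'Z'; i++)
        map[i] = (unsigned char)(i - 'A' + 'a');
}

/*
 * Hypothetical helper, not part of re.h: find the first occurrence of
 * the pattern [pb, pe) in the text [tb, te), comparing characters
 * after mapping.  Returns a pointer to the match, or NULL if none.
 * Assumes tb <= te and pb <= pe.
 */
const char *map_search(const char *pb, const char *pe,
                       const char *tb, const char *te,
                       const unsigned char map[256])
{
    size_t m = (size_t)(pe - pb), i;
    size_t skip[256];
    const char *t;

    if (m == 0)
        return tb;
    /* Default shift: the whole pattern length. */
    for (i = 0; i < 256; i++)
        skip[i] = m;
    /* Mapped pattern characters (all but the last) allow shorter shifts. */
    for (i = 0; i + 1 < m; i++)
        skip[map[(unsigned char)pb[i]]] = m - 1 - i;

    for (t = tb; (size_t)(te - t) >= m;
         t += skip[map[(unsigned char)t[m - 1]]]) {
        /* Compare the window right to left, after mapping both sides. */
        for (i = m; i > 0; i--)
            if (map[(unsigned char)t[i - 1]] != map[(unsigned char)pb[i - 1]])
                break;
        if (i == 0)
            return t;
    }
    return NULL;
}
```

         With the case-folding map, searching "hay NEEDLE hay" for
         "needle" finds the match at offset 4; with the identity
         map, the same search finds nothing.
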
         The recognizer can be run (multiple times) by calling
         re_bmexec, which stops at, and returns, the first
         non-positive value returned by either rdfn or matchfn.
         The recognizer calls the supplied function rdfn to obtain
         input and matchfn to report text matching the search
         string.

         Rdfn should be declared as

              int rdfn(pb, pe)
              char **pb, **pe;

         where *pb and *pe delimit an as yet unprocessed text
         fragment (none if `*pb==*pe') to be saved across the call
         to rdfn. On return, *pb and *pe point to the new text,
         including the saved fragment.  Rdfn returns 0 for EOF,
         negative for error, and positive otherwise.  The first
         call to rdfn from each invocation of re_bmexec has *pb==0.
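
         A minimal sketch of an rdfn that follows this protocol,
         assuming the input is already in memory and is handed out
         in small pieces.  The names rd_open, rd_src, rd_end,
         rd_buf, and RD_CHUNK are hypothetical; a real reader would
         more likely fill the buffer with read(2) or fread(3) where
         this sketch uses memcpy.

```c
#include <stddef.h>
#include <string.h>

#define RD_CHUNK 8                    /* small on purpose, to force several calls */

static const char *rd_src, *rd_end;   /* hypothetical input, set by rd_open */
static char rd_buf[64];               /* holds saved fragment plus new text */

/* Select the text [b, e) as the input for subsequent rdfn calls. */
void rd_open(const char *b, const char *e)
{
    rd_src = b;
    rd_end = e;
}

int rdfn(char **pb, char **pe)
{
    size_t saved = 0, n;

    /* The first call of a run has *pb == 0: there is nothing to save. */
    if (*pb != NULL && *pe > *pb) {
        saved = (size_t)(*pe - *pb);
        if (saved + RD_CHUNK > sizeof rd_buf)
            return -1;                /* error: fragment leaves no room */
        /* The fragment usually lies inside rd_buf already, so the
           regions may overlap; memmove handles that. */
        memmove(rd_buf, *pb, saved);
    }
    n = (size_t)(rd_end - rd_src);
    if (n == 0)
        return 0;                     /* EOF */
    if (n > RD_CHUNK)
        n = RD_CHUNK;
    memcpy(rd_buf + saved, rd_src, n);
    rd_src += n;
    *pb = rd_buf;                     /* new text starts with the saved fragment */
    *pe = rd_buf + saved + n;
    return 1;
}
```

         The static state is a consequence of the declared
         interface: rdfn receives no context argument, so the
         sketch keeps its source and buffer in file-scope
         variables.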

         Matchfn should be declared as

              int matchfn(pb, pe)
              char **pb, **pe;

         where *pb and *pe delimit the matched text.  Matchfn sets
         *pb, *pe, and returns a value in the same way as rdfn.

         To use the re_cw routines, first build the recognizer by
         calling re_cwinit, then re_cwadd for each string, and
         finally re_cwcomp. The recognizer is run by re_cwexec
         analogously to re_bmexec.

         A full regular expression recognizer is compiled by
         re_recomp and executed by re_reexec, which returns 1 if
         there was a match and 0 if there wasn't.  The strings that
         match subexpressions are returned in array match using the
         above convention.  `match[0]' refers to the whole matched
         expression.  If match is zero, then no match delimiters are
         set.
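
         To show how the pointer pairs in match might be consumed,
         the following sketch copies one matched string into a
         NUL-terminated buffer.  The helper match_copy is
         hypothetical and not part of <re.h>; it also assumes that
         pairs for subexpressions that did not take part in the
         match hold null pointers, which is not stated above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of re.h: copy the text of pair n
 * from a match array into out (capacity cap), NUL-terminated.
 * match[n][0] points to the first character, match[n][1] to the
 * character following the string.  Returns the number of characters
 * copied, or -1 if n is out of range, the pair is unset (assumed to
 * mean null pointers), or the text does not fit.
 */
int match_copy(char *match[10][2], int n, char *out, size_t cap)
{
    size_t len;

    if (n < 0 || n > 9 || match[n][0] == NULL || match[n][1] == NULL)
        return -1;
    len = (size_t)(match[n][1] - match[n][0]);
    if (len + 1 > cap)
        return -1;
    memcpy(out, match[n][0], len);
    out[len] = '\0';
    return (int)len;
}
```

         The copy is needed only when a C string is wanted; the
         pointer pairs can be used directly with any routine that
         accepts a start pointer and a length.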

         The routine re_error prints its argument on standard error
         and exits.  You may supply your own version for specialized
         error handling.  If re_error returns rather than exits, the
         compiling routines (e.g., re_bmcomp) will return 0.
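
         A replacement re_error that returns instead of exiting
         might look like the sketch below.  The buffer re_lasterr
         and the "re: " prefix are hypothetical choices; only the
         name and argument of re_error come from this page.

```c
#include <stdio.h>
#include <string.h>

static char re_lasterr[128];   /* hypothetical: most recent error message */

/* Record the message, report it, and return to the caller. */
void re_error(char *str)
{
    strncpy(re_lasterr, str, sizeof re_lasterr - 1);
    re_lasterr[sizeof re_lasterr - 1] = '\0';   /* strncpy may not terminate */
    fprintf(stderr, "re: %s\n", re_lasterr);
}
```

         Because this version returns, a caller would test the
         result of a compiling routine against 0 and then consult
         re_lasterr for the reason.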

         The recognizers that these routines construct occupy storage
         obtained from malloc(3). The storage can be deallocated by
         the matching routine: re_bmfree, re_cwfree, or re_refree.

       Regular Expressions

         The syntax for a regular expression e0 is

              e3:  literal | charclass | '.' | '^' | '$' | '\'n | '(' e0 ')'

              e2:  e3
                |  e2 REP
              REP: '*' | '+' | '?' | '\{' RANGE '\}'
              RANGE: int | int ',' | int ',' int

              e1:  e2
                |  e1 e2

              e0:  e1
                |  e0 ALT e1
              ALT: '|' | newline

         A literal is any non-metacharacter or a metacharacter (one
         of .*+?[]()|\^$) preceded by `\'.

         A charclass is a nonempty string s bracketed [s] (or [^s]);
         it matches any character in (or not in) s. In s, the
         metacharacters other than `]' have no special meaning, and
         `]' may only appear as the first letter.  A substring a-b,
         with a and b in ascending ASCII order, stands for the
         inclusive range of ASCII characters between a and b.
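
         The charclass rules can be sketched as a membership test.
         The function class_match is hypothetical and is not how
         the re_re routines evaluate classes; it assumes the caller
         has already located the closing bracket and passes the
         text between the brackets, including any leading `^'.

```c
/*
 * Hypothetical helper, not part of re.h: report whether character c
 * matches the class whose text between '[' and the closing ']' is
 * [b, e).  A leading '^' negates the class; a-b denotes an inclusive
 * range; every other character, including a leading ']', is literal.
 * Returns 1 on a match and 0 otherwise.
 */
int class_match(const char *b, const char *e, int c)
{
    int negate = 0, found = 0;
    const unsigned char *p = (const unsigned char *)b;
    const unsigned char *end = (const unsigned char *)e;

    if (p < end && *p == '^') {
        negate = 1;
        p++;
    }
    while (p < end) {
        if (end - p >= 3 && p[1] == '-') {
            /* Range a-b: empty if a and b are not in ascending order. */
            if (p[0] <= c && c <= p[2])
                found = 1;
            p += 3;
        } else {
            if (*p == c)
                found = 1;
            p++;
        }
    }
    return negate ? !found : found;
}
```

         For example, the class text "a-f0-9" accepts `c' and `7'
         but not `g', and "^]x" rejects `]' and `x' while accepting
         every other character.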

         A `\' followed by a digit n matches a copy of the string
         that the parenthesized subexpression beginning with the nth
         `(', counting from 1, matched.

         A `.' matches any character.

         A `^' matches the beginning of the input string; `$' matches
         the end.

         The REP operators match zero or more (*), one or more (+),
         zero or one (?), exactly m (\{m\}), m or more (\{m,\}), and
         any number between m and n inclusive (\{m,n\}), instances
         respectively of the preceding regular expression e2.

         A concatenated regular expression, e1 e2, matches a match to
         e1 followed by a match to e2.

         An alternative regular expression, e0 ALT e1, matches either
         a match to e0 or a match to e1.

         A match to any part of a regular expression extends as far
         as possible without preventing a match to the remainder of
         the regular expression.

    SEE ALSO
         regexp(3), gre(1)

    DIAGNOSTICS
         Routines that return pointers return 0 on error.

    BUGS
         Between re(3) and regexp(3) there are too many routines.