RE(3)                                                              RE(3)

NAME
     re_bm, re_cw, re_re - string and pattern matching

SYNOPSIS
     #include <re.h>

     re_bm *re_bmcomp(b, e, map)
     char *b, *e;
     unsigned char map[256];

     int re_bmexec(pat, rdfn, matchfn)
     re_bm *pat;
     int (*rdfn)(), (*matchfn)();

     void re_bmfree(pat);
     re_bm *pat;

     re_cw *re_cwinit(map)
     unsigned char map[256];

     void re_cwadd(pat, b, e)
     re_cw *pat;
     char *b, *e;

     void re_cwcomp(pat)
     re_cw *pat;

     int re_cwexec(pat, rdfn, matchfn)
     re_cw *pat;
     int (*rdfn)(), (*matchfn)();

     void re_cwfree(pat);
     re_cw *pat;

     re_re *re_recomp(b, e, map)
     char *b, *e;
     unsigned char map[256];

     re_reexec(pat, b, e, match)
     re_re *pat;
     char *b, *e, *match[10][2];

     void re_refree(pat);
     re_re *pat;

     void re_error(str);
     char *str;

DESCRIPTION
     These routines search for patterns in strings.  The re_re routines
     search for general regular expressions (defined below) using a
     lazily evaluated deterministic finite automaton.  The more
     specialized and faster re_cw routines search for multiple literal
     strings using the Commentz-Walter algorithm.  The still more
     specialized and efficient re_bm routines search for a single
     string using the Boyer-Moore algorithm.  The routines handle
     strings designated by pointers to the first character of the
     string and to the character following the string.

     To use the re_bm routines, first build a recognizer by calling
     re_bmcomp, which takes the search string and a character map; all
     characters are compared after mapping.  Typically, map is
     initialized by a loop similar to

          for(i = 0; i < 256; i++)
               map[i] = i;

     and its value is no longer required after the call to re_bmcomp.
     The recognizer can be run (multiple times) by calling re_bmexec,
     which stops and returns the first non-positive return from either
     rdfn or matchfn.  The recognizer calls the supplied function rdfn
     to obtain input and matchfn to report text matching the search
     string.

     Rdfn should be declared as

          int rdfn(pb, pe)
          char **pb, **pe;

     where *pb and *pe delimit an as yet unprocessed text fragment
     (none if `*pb==*pe') to be saved across the call to rdfn.  On
     return, *pb and *pe point to the new text, including the saved
     fragment.  Rdfn returns 0 for EOF, negative for error, and
     positive otherwise.  The first call to rdfn from each invocation
     of re_bmexec has *pb==0.

     Matchfn should be declared as

          int matchfn(pb, pe)
          char **pb, **pe;

     where *pb and *pe delimit the matched text.  Matchfn sets *pb,
     *pe, and returns a value in the same way as rdfn.

     To use the re_cw routines, first build the recognizer by calling
     re_cwinit, then re_cwadd for each string, and finally re_cwcomp.
     The recognizer is run by re_cwexec analogously to re_bmexec.

     A full regular expression recognizer is compiled by re_recomp and
     executed by re_reexec, which returns 1 if there was a match and 0
     if there wasn't.  The strings that match subexpressions are
     returned in array match, each as a pair of pointers following the
     convention described above.  `match[0]' refers to the whole
     matched expression.  If match is zero, then no match delimiters
     are set.

     The routine re_error prints its argument on standard error and
     exits.  You may supply your own version for specialized error
     handling.  If re_error returns rather than exits, the compiling
     routines (e.g. re_bmcomp) will return 0.

     The recognizers that these routines construct occupy storage
     obtained from malloc(3).  The storage can be deallocated by the
     corresponding free routine: re_bmfree, re_cwfree, or re_refree.

   Regular Expressions
     The syntax for a regular expression e0 is

          e3:     literal | charclass | '.' | '^' | '$' | '\'n | '(' e0 ')'
          e2:     e3 | e2 REP
          REP:    '*' | '+' | '?' | '\{' RANGE '\}'
          RANGE:  int | int ',' | int ',' int
          e1:     e2 | e1 e2
          e0:     e1 | e0 ALT e1
          ALT:    '|' | newline

     A literal is any non-metacharacter or a metacharacter (one of
     .*+?[]()|\^$) preceded by `\'.

     A charclass is a nonempty string s bracketed [s] (or [^s]); it
     matches any character in (or not in) s.
     In s, the metacharacters other than `]' have no special meaning,
     and `]' may only appear as the first letter.  A substring a-b,
     with a and b in ascending ASCII order, stands for the inclusive
     range of ASCII characters between a and b.

     A `\' followed by a digit n matches a copy of the string that the
     parenthesized subexpression beginning with the nth `(', counting
     from 1, matched.

     A `.' matches any character.

     A `^' matches the beginning of the input string; `$' matches the
     end.

     The REP operators match zero or more (*), one or more (+), zero
     or one (?), exactly m (\{m\}), m or more (\{m,\}), and any number
     between m and n inclusive (\{m,n\}), instances respectively of
     the preceding regular expression e2.

     A concatenated regular expression, e1 e2, matches a match to e1
     followed by a match to e2.

     An alternative regular expression, e0 ALT e1, matches either a
     match to e0 or a match to e1.

     A match to any part of a regular expression extends as far as
     possible without preventing a match to the remainder of the
     regular expression.

SEE ALSO
     regexp(3), gre(1)

DIAGNOSTICS
     Routines that return pointers return 0 on error.

BUGS
     Between re(3) and regexp(3) there are too many routines.
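
EXAMPLES
     The calling convention for rdfn given in the DESCRIPTION may be
     easier to follow from code.  The following is a sketch only, not
     code from the library: one way an rdfn could be written so that
     it saves the unprocessed fragment, appends new input, and returns
     0 at EOF.  The names src, buf, and BUFSZ are hypothetical, input
     is assumed to come from a stdio stream, and the function uses an
     ANSI prototype rather than the old-style declarations of the
     SYNOPSIS.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BUFSZ 4096          /* hypothetical buffer size */

static FILE *src;           /* hypothetical input stream; set before use */
static char buf[BUFSZ];     /* holds the saved fragment plus new input */

/*
 * Sketch of an rdfn.  On entry *pb..*pe delimit text the caller has
 * not finished with (*pb is 0 on the first call).  That fragment is
 * moved to the front of buf and new input is read in after it.
 * Returns positive if input was read, 0 at EOF, negative on error.
 */
int
rdfn(char **pb, char **pe)
{
    size_t keep = 0, n;

    assert(pb != NULL && pe != NULL);
    if (*pb != NULL && *pe > *pb) {
        keep = (size_t)(*pe - *pb);
        if (keep >= BUFSZ)
            return -1;              /* no room left to add input */
        memmove(buf, *pb, keep);    /* fragment may overlap buf */
    }
    n = fread(buf + keep, 1, BUFSZ - keep, src);
    *pb = buf;                      /* saved fragment comes first */
    *pe = buf + keep + n;           /* then whatever was read */
    if (n == 0)
        return ferror(src) ? -1 : 0;
    return 1;
}
```

     With src set to an open stream, a function of this shape would
     be passed as the rdfn argument of re_bmexec or re_cwexec.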