regex - regular expression recognizer


include "regex.m";
regex := load Regex "/dis/regex.dis";
compile: fn(e: string): Re;
execute: fn(x: Re, s: string): (int, int);


The compile function returns a compiled form of the regular expression given in string e, or nil if e is not a valid regular expression.

The execute function matches the compiled regular expression x against string s. It returns a pair of indexes (beg, end): beg is the index of the first character of the leftmost longest match and end is the index of the character just beyond the match, so s[beg:end] is the matched text. It returns (-1,-1) if no match exists.
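The two calls above are typically used together, checking both failure values before slicing. The following is a minimal sketch rather than a definitive program: the module name Match, the pattern, and the sample string are made up for illustration, and the load path is simply the one given in the synopsis, which may differ on a particular installation.

```limbo
implement Match;

include "sys.m";
	sys: Sys;
include "draw.m";
include "regex.m";
	regex: Regex;

# Match is a hypothetical module name used only for this sketch.
Match: module
{
	init: fn(ctxt: ref Draw->Context, argv: list of string);
};

init(nil: ref Draw->Context, nil: list of string)
{
	sys = load Sys Sys->PATH;
	regex = load Regex "/dis/regex.dis";	# path as given in the synopsis (assumed)
	if(regex == nil){
		sys->print("cannot load regex module: %r\n");
		return;
	}

	# compile returns nil for an invalid expression
	re := regex->compile("[0-9]+");
	if(re == nil){
		sys->print("invalid regular expression\n");
		return;
	}

	# execute returns (-1,-1) when nothing matches
	s := "order 66 shipped";
	(beg, end) := regex->execute(re, s);
	if(beg < 0)
		sys->print("no match\n");
	else
		sys->print("matched \"%s\" at (%d,%d)\n", s[beg:end], beg, end);
}
```

Under the semantics described above, the leftmost longest run of digits in the sample string is "66", so the returned pair would be (6,8).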

The primitives in regular expressions are:

.       matches any character other than newline

\c      matches character c, except \n matches newline

c       matches character c other than one of: \ . ^ $ ( ) [ ] ? * +

(e)     matches what regular expression e matches

()      matches an empty substring

^       matches an empty substring at the beginning of a string

$       matches an empty substring at the end of a string

[...]
[^...]  matches any character in a set (or, for [^...], its complement), given as a sequence of zero or more items, each a character or a range. An item is either a literal character other than \ or ], or a character escaped with \. If the item is followed by a literal -, it is the lower limit of an inclusive range of Unicode characters; the upper limit is a similarly expressed character after the -.

Repetitions are built from primitives, p, in these ways.

p       one match to p

p?      zero or one matches to p

p*      zero or more matches to p

p+      one or more matches to p

Regular expressions are built from repetitions, r, and other regular expressions, e1, e2, in these ways.

r       a repetition

re1     concatenation: a match to r followed by a match to e1

e1|e2   alternation: a match to either e1 or e2; concatenation takes precedence over alternation


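# [ABCb-z]+ skips the leading "a" and stops before "D"
(beg, end) := regex->execute(regex->compile("[ABCb-z]+"), s := "aAbBcCdD");
s[beg:end] == "AbBcCd";

# leftmost longest: a* matches empty at index 0, then b* takes "bb"
(beg, end) := regex->execute(regex->compile("a*b*"), "bbaabb");
(beg, end) == (0, 2);

re := regex->compile("(thick)*(chocolate|raspberry)?(topp|fill)ing");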

