StringPattern

String pattern matching. The syntax follows Lua-style patterns, which are simpler than traditional regular expression syntax.

Character classes
  • %a - letters

  • %c - control characters

  • %d - digits

  • %g - graphical characters

  • %l - lower case letters

  • %p - punctuation characters

  • %s - white space characters

  • %u - upper case letters

  • %w - alphanumeric characters

  • %x - hexadecimal digits

The upper-case form of each character class above matches the complement of that class; for example, %S matches any character that is not white space. Special characters are escaped by prefixing them with %; for example, %% represents a literal percent character.

Ranges and sets
  • [aBc] - represents the letters a, B and c.

  • [%a_] - represents all letters and the underscore character.

  • [0-9] - represents all digits.

A range or set with ^ as its first character matches the complement, that is, every character not in the range or set.

Repetitions
  • * - Match the previous character (or class) zero or more times, as many times as possible.

  • + - Match the previous character (or class) one or more times, as many times as possible.

  • - - Match the previous character (or class) zero or more times, as few times as possible.

  • ? - Make the previous character (or class) optional.
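The difference between the greedy * and the lazy - shows up when the lengths of the whole matches are compared. The following is a minimal sketch rather than code from the library: it assumes the module is imported as StringPattern, that a Pattern variable needs no set-up before its first use, and that Out is an Oakwood-style output module. GreedyLazy and Run are hypothetical names.

```oberon
MODULE GreedyLazy;
(* Sketch under assumptions: import name StringPattern, no explicit
   Pattern initialisation, Oakwood-style Out module. *)
IMPORT StringPattern, Out;

PROCEDURE Run*;
VAR
    pat : StringPattern.Pattern;
    start, greedy, lazy : LENGTH;
    ok : BOOLEAN;
BEGIN
    greedy := 0; lazy := 0;
    (* "x%d*" takes as many digits as possible after the x. *)
    ok := pat.Match("x%d*", "x123") & pat.Capture(0, start, greedy);
    (* "x%d-" takes as few digits as possible after the x. *)
    ok := pat.Match("x%d-", "x123") & pat.Capture(0, start, lazy) & ok;
    IF ok & (greedy > lazy) THEN
        Out.String("greedy match is longer than lazy match")
    ELSE
        Out.String("unexpected result")
    END;
    Out.Ln
END Run;

BEGIN
    Run
END GreedyLazy.
```

Under Lua's documented semantics, the greedy pattern would match all of "x123" while the lazy pattern would match only "x", since nothing after %d- forces it to consume digits.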

Anchors
  • ^ - Match the start of the input string.

  • $ - Match the end of the input string.
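Using both anchors together turns a search into a whole-string test. As an illustrative sketch (IsIdentifier and Check are hypothetical names, and the code assumes the module is imported as StringPattern and that a Pattern variable can be used without prior set-up), an identifier check might look like this:

```oberon
MODULE IsIdentifier;
(* Sketch under assumptions: import name StringPattern, no explicit
   Pattern initialisation. *)
IMPORT StringPattern;

(* TRUE if all of s is a letter or underscore followed by any number of
   letters, digits or underscores. *)
PROCEDURE Check* (s- : ARRAY OF CHAR): BOOLEAN;
VAR pat : StringPattern.Pattern;
BEGIN
    RETURN pat.Match("^[%a_][%w_]*$", s)
END Check;

END IsIdentifier.
```

Under Lua pattern rules, Check("max_len2") would be expected to return TRUE and Check("2fast") to return FALSE, because ^ and $ prevent the pattern from matching only part of the input.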

Special
  • %n - for n between 1 and 9; such item matches a substring equal to the n-th captured string.

  • %bxy - where x and y are two distinct characters; such item matches strings that start with x, end with y, and where the x and y are balanced.

  • %f[set] - a frontier pattern; such item matches an empty string at any position such that the next character belongs to set and the previous character does not belong to set.

A pattern enclosed in parentheses marks a capture, and each capture is saved under its own index. An empty pair of parentheses, (), captures only the current position in the string.
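The balanced item %bxy and the back-reference %n are easier to follow with concrete inputs. The sketch below is illustrative only: SpecialItems and Run are hypothetical names, and it assumes the module is imported as StringPattern, that a Pattern variable needs no set-up, and that Out is an Oakwood-style output module.

```oberon
MODULE SpecialItems;
(* Sketch under assumptions: import name StringPattern, no explicit
   Pattern initialisation, Oakwood-style Out module. *)
IMPORT StringPattern, Out;

PROCEDURE Run*;
VAR pat : StringPattern.Pattern;
BEGIN
    (* %b() spans from an opening parenthesis to its balancing close. *)
    IF pat.Match("%b()", "f(a(b)c) tail") THEN
        Out.String("balanced group found"); Out.Ln
    END;
    (* (%a)%1 captures one letter, then requires the same letter again. *)
    IF pat.Match("(%a)%1", "hello") THEN
        Out.String("doubled letter found"); Out.Ln
    END
END Run;

BEGIN
    Run
END SpecialItems.
```

Under Lua's rules, the first pattern would match the span (a(b)c), since the inner pair is balanced inside the outer one, and the second would match the ll in hello.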

Further examples and more complete documentation can be found online in the official Lua documentation on this subject.

This module is ported from the patterns(7) code in OpenBSD’s httpd(8), which in turn is based on the pattern-matching code from the Lua language. The license is MIT.

Const

ERROR_NO_MATCH*                 = -1;
ERROR_TO_COMPLEX*               = -2;
ERROR_MALFORMED_PATTERN*        = -3;
ERROR_MAX_REPITITIONS*          = -4;
ERROR_INVALID_CAPTURE_IDX*      = -5;
ERROR_INVALID_PATTERN_CAPTURE*  = -6;
ERROR_TO_MANY_CAPTURE*          = -7;

Types

Pattern* = RECORD
    matchdepth : INTEGER; (* control of recursive depth to avoid stack overflow *)
    repetitioncounter : INTEGER; (* control the repetition items *)
    maxcaptures : INTEGER; (* configured capture limit *)
    sinit, slen : LENGTH; (* start & len match in src *)
    send, pend : LENGTH; (* end index of src, pat *)
    error- : INTEGER; (* Should be 0 *)
    level- : INTEGER; (* total number of captures (finished or unfinished) *)
    capture : ARRAY MAXCAPTURES OF StringMatch;
END;

Procedures

Pattern.Match

Find the first occurrence of the pattern pat in str. Returns TRUE if the pattern is found.

PROCEDURE (VAR this : Pattern) Match* (pat-, str-: ARRAY OF CHAR): BOOLEAN;

Pattern.Find

Find an occurrence of the pattern pat in str, starting the search at index start. Returns the start position of the match, or -1 if there is no match.

PROCEDURE (VAR this : Pattern) Find* (pat-, str-: ARRAY OF CHAR; start : LENGTH): LENGTH;
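The start parameter makes it possible to search past an earlier match. The following sketch is illustrative and makes several assumptions not stated above: the module is imported as StringPattern, a Pattern variable needs no set-up, indices are zero-based, and Out is an Oakwood-style output module. FindFrom and Run are hypothetical names.

```oberon
MODULE FindFrom;
(* Sketch under assumptions: import name StringPattern, no explicit
   Pattern initialisation, zero-based indices, Oakwood-style Out module. *)
IMPORT StringPattern, Out;

PROCEDURE Run*;
VAR
    pat : StringPattern.Pattern;
    first, second : LENGTH;
BEGIN
    (* Search for a digit from the beginning of the string. *)
    first := pat.Find("%d", "a1 b2", 0);
    (* Search again from index 2, which lies past the first digit
       if indices are zero-based. *)
    second := pat.Find("%d", "a1 b2", 2);
    IF (first >= 0) & (second > first) THEN
        Out.String("second search found a later digit")
    ELSE
        Out.String("no later digit found")
    END;
    Out.Ln
END Run;

BEGIN
    Run
END FindFrom.
```

Comparing the result against 0 rather than testing for a specific position keeps the sketch independent of the exact value returned for a successful match.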

Pattern.Capture

Get the start position and length, within the source string, of the capture at the given index. Index 0 gives the position and length of the whole match. Returns TRUE if a valid capture exists and no error flag is set.

PROCEDURE (VAR this : Pattern) Capture* (index : INTEGER; VAR start, len : LENGTH): BOOLEAN;
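Because Capture reports a position and a length rather than a string, the caller reads the text out of the source string itself. The sketch below assumes that parenthesised captures are numbered from 1 (index 0 being the whole match, as described above), that start is a zero-based index into the source, that the module is imported as StringPattern, that a Pattern variable needs no set-up, and that Out is an Oakwood-style output module. KeyValue and Run are hypothetical names.

```oberon
MODULE KeyValue;
(* Sketch under assumptions: import name StringPattern, no explicit
   Pattern initialisation, captures numbered from 1, zero-based start,
   Oakwood-style Out module. *)
IMPORT StringPattern, Out;

PROCEDURE Run*;
VAR
    pat : StringPattern.Pattern;
    src : ARRAY 32 OF CHAR;
    start, len, i : LENGTH;
BEGIN
    src := "key=value";
    (* Two captures: the word before "=" and the word after it. *)
    IF pat.Match("(%w+)=(%w+)", src) & pat.Capture(2, start, len) THEN
        (* Write the second captured span one character at a time. *)
        i := start;
        WHILE i < start + len DO
            Out.Char(src[i]); INC(i)
        END;
        Out.Ln
    END
END Run;

BEGIN
    Run
END KeyValue.
```

If the assumptions hold, the second capture covers the text after the equals sign. Keeping the source string in a variable matters here, since the capture is only a pair of offsets into it.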