An optimizer for the regular expression AST.

The optimizer acts upon the AST of a parsed regular expression pattern, transforming it in place without moving on to a compilation step.

Where possible, it reduces branching in the expression by reducing the use of alternation (`|`).

Here is a summary of the optimizations that it will do:

Class Simplification :
    [aab] => [ab]
    [aa]  => [a]

Class Reduction : [a] => a

Range Construction : [abc] => [a-c]

Rune Merging into Range : [aa-c] => [a-c]

Range Merging :
    [a-cc-e] => [a-e]
    [a-cd-e] => [a-e]
    [a-cb-e] => [a-e]
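The class optimizations above can be read as one normalization pass: treat every rune as a one-element range, sort, then merge ranges that overlap or touch. The following is a minimal sketch of that idea in Python, not the package's Odin implementation. The names `merge_class` and `render_class` are hypothetical, and printing a two-rune span as `[ab]` rather than `[a-b]` is an assumption made only to agree with the examples listed here.

```python
def merge_class(runes, ranges):
    """Normalize a rune class given as loose runes plus (lo, hi) ranges.

    Returns a sorted list of (lo, hi) spans that neither overlap nor touch.
    Hypothetical helper for illustration only.
    """
    # Treat every single rune as a one-element range.
    spans = [(ord(r), ord(r)) for r in runes]
    spans += [(ord(lo), ord(hi)) for lo, hi in ranges]
    spans.sort()

    merged = []
    for lo, hi in spans:
        if merged and lo <= merged[-1][1] + 1:
            # Overlapping or adjacent: extend the previous span.
            prev_lo, prev_hi = merged[-1]
            merged[-1] = (prev_lo, max(prev_hi, hi))
        else:
            merged.append((lo, hi))
    return [(chr(lo), chr(hi)) for lo, hi in merged]


def render_class(spans):
    """Render spans back to pattern text."""
    # Class Reduction: a class holding exactly one rune drops its brackets.
    if len(spans) == 1 and spans[0][0] == spans[0][1]:
        return spans[0][0]
    parts = []
    for lo, hi in spans:
        if lo == hi:
            parts.append(lo)
        elif ord(hi) - ord(lo) == 1:
            # Assumption: two neighbouring runes stay as runes, e.g. [ab].
            parts.append(lo + hi)
        else:
            parts.append(f"{lo}-{hi}")
    return "[" + "".join(parts) + "]"
```

For example, `render_class(merge_class("a", [("a", "c")]))` corresponds to `[aa-c]`, and `render_class(merge_class("", [("a", "c"), ("d", "e")]))` corresponds to `[a-cd-e]`. Because the sketch applies all class steps at once, `[aa]` comes out as `a` rather than stopping at `[a]`.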

Wildcard Reduction :
    a|.    => .
    .|a    => .
    [ab]|. => .
    .|[ab] => .

Common Suffix Elimination : blueberry|strawberry => (?:blue|straw)berry

Common Prefix Elimination : abi|abe => ab(?:i|e)
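To make the two affix eliminations concrete, here is a minimal Python sketch that factors the shared prefix and suffix out of a two-way alternation. It works on plain literal strings rather than on AST nodes, so it only illustrates the rewrite and is not the package's Odin code; `factor_alternation` is a hypothetical name.

```python
def factor_alternation(left, right):
    """Factor the shared prefix and suffix out of `left|right`.

    Both operands are assumed to be plain literals with no metacharacters.
    Hypothetical helper for illustration only.
    """
    if left == right:
        return left

    # Longest common prefix.
    p = 0
    while p < min(len(left), len(right)) and left[p] == right[p]:
        p += 1
    prefix, left_rest, right_rest = left[:p], left[p:], right[p:]

    # Longest common suffix of what remains, so prefix and suffix never overlap.
    s = 0
    while (s < min(len(left_rest), len(right_rest))
           and left_rest[-1 - s] == right_rest[-1 - s]):
        s += 1
    suffix = left_rest[len(left_rest) - s:]
    left_mid = left_rest[:len(left_rest) - s]
    right_mid = right_rest[:len(right_rest) - s]

    if not prefix and not suffix:
        # Nothing shared: leave the alternation untouched.
        return f"{left}|{right}"
    return f"{prefix}(?:{left_mid}|{right_mid}){suffix}"
```

Calling `factor_alternation("blueberry", "strawberry")` and `factor_alternation("abi", "abe")` reproduces the two rewrites listed above. The factored form leaves a single branch point where the operands actually differ, instead of one at the start of the whole alternation.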

Composition: Consume All to Anchored End

`.*$` =>     <special opcode>
`.+$` => `.` <special opcode>

Possible future improvements:

Change the AST of alternations to be a list instead of a tree, so that constructions such as (ab|bb|cb) can be considered in whole by the affix elimination optimizations.

Introduce specialized opcodes for certain classes of repetition.

Add Common Infix Elimination.

Measure the precise finite minimum and maximum match length of a pattern, if available, and check input strings against those bounds before running the virtual machine.
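The last idea can be sketched as a recursive walk that computes length bounds per node. The Python below is only an illustration under stated assumptions: the tuple-based AST, the node kind names, and the functions `length_bounds` and `could_match_whole` are all hypothetical and do not mirror the package's Node types. Unbounded repetition is reported as infinity, which is a safe over-approximation.

```python
import math


def length_bounds(node):
    """Return (min_len, max_len) in runes for a tiny tuple-based AST.

    max_len is math.inf when the pattern has no finite maximum.
    The node shapes used here are assumptions made for this sketch.
    """
    kind = node[0]
    if kind in ("rune", "wildcard", "class"):
        return 1, 1
    if kind == "concat":
        bounds = [length_bounds(child) for child in node[1]]
        return sum(lo for lo, _ in bounds), sum(hi for _, hi in bounds)
    if kind == "alt":
        left_lo, left_hi = length_bounds(node[1])
        right_lo, right_hi = length_bounds(node[2])
        return min(left_lo, right_lo), max(left_hi, right_hi)
    if kind == "optional":      # x?
        _, hi = length_bounds(node[1])
        return 0, hi
    if kind == "repeat_zero":   # x*
        return 0, math.inf
    if kind == "repeat_one":    # x+
        lo, _ = length_bounds(node[1])
        return lo, math.inf
    raise ValueError(f"unknown node kind: {kind}")


def could_match_whole(node, text):
    """Cheap pre-check: can `text` possibly match the pattern in full?"""
    lo, hi = length_bounds(node)
    return lo <= len(text) <= hi
```

With `ast = ("concat", [("rune", "a"), ("rune", "b"), ("alt", ("rune", "i"), ("rune", "e"))])`, standing in for `ab(?:i|e)`, `could_match_whole(ast, "ab")` is false without running any matcher. For an unanchored search only the minimum is useful, since a longer string may still contain a match.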

package core:text/regex/optimizer

Types

Node

Node_Alternation

Node_Anchor

Node_Concatenation

Node_Group

Node_Match_All_And_Escape

Node_Optional

Node_Optional_Non_Greedy

Node_Repeat_N

Node_Repeat_One

Node_Repeat_One_Non_Greedy

Node_Repeat_Zero

Node_Repeat_Zero_Non_Greedy

Node_Rune

Node_Rune_Class

Node_Wildcard

Node_Word_Boundary

Rune_Class_Range

Procedures

class_range_sorter

optimize

optimize_subtree
