package core:text/regex/optimizer

⌘K
Ctrl+K
or
/

    Overview

    package regex_optimizer implements an optimizer which acts upon the AST of a parsed regular expression pattern, transforming it in-place without moving to a compilation step.

    Where possible, it aims to reduce branching as much as possible in the expression by reducing usage of |.

    Here is a summary of the optimizations that it will do:

    Class Simplification : [aab] => [ab] [aa] => [a]

    Class Reduction : [a] => a Range Construction : [abc] => [a-c] Rune Merging into Range : [aa-c] => [a-c]

    Range Merging : [a-cc-e] => [a-e] [a-cd-e] => [a-e] [a-cb-e] => [a-e]

    Alternation to Optional : a| => a? Alternation to Optional Non-Greedy : |a => a?? Alternation Reduction : a|a => a Alternation to Class : a|b => [ab] Class Union : [a0]|[b1] => [a0b1] [a-b]|c => [a-bc] a|[b-c] => [b-ca]

    Wildcard Reduction : a|. => . .|a => . [ab]|. => . .|[ab] => .

    Common Suffix Elimination : blueberry|strawberry => (?:blue|straw)berry Common Prefix Elimination : abi|abe => ab(?:i|e)

    Composition: Consume All to Anchored End

    `.*$` =>     <special opcode>
    `.+$` => `.` <special opcode>
    
    
    

    Possible future improvements:

    Change the AST of alternations to be a list instead of a tree, so that constructions such as (ab|bb|cb) can be considered in whole by the affix elimination optimizations.

    Introduce specialized opcodes for certain classes of repetition.

    Add Common Infix Elimination.

    Measure the precise finite minimum and maximum of a pattern, if available, and check against that on any strings before running the virtual machine.

    Types

    Node ¶

    Node :: regex_parser.Node

    Node_Alternation ¶

    Node_Alternation :: regex_parser.Node_Alternation

    Node_Anchor ¶

    Node_Anchor :: regex_parser.Node_Anchor

    Node_Concatenation ¶

    Node_Concatenation :: regex_parser.Node_Concatenation

    Node_Group ¶

    Node_Group :: regex_parser.Node_Group

    Node_Match_All_And_Escape ¶

    Node_Match_All_And_Escape :: regex_parser.Node_Match_All_And_Escape

    Node_Optional ¶

    Node_Optional :: regex_parser.Node_Optional

    Node_Optional_Non_Greedy ¶

    Node_Optional_Non_Greedy :: regex_parser.Node_Optional_Non_Greedy

    Node_Repeat_N ¶

    Node_Repeat_N :: regex_parser.Node_Repeat_N

    Node_Repeat_One ¶

    Node_Repeat_One :: regex_parser.Node_Repeat_One

    Node_Repeat_One_Non_Greedy ¶

    Node_Repeat_One_Non_Greedy :: regex_parser.Node_Repeat_One_Non_Greedy

    Node_Repeat_Zero ¶

    Node_Repeat_Zero :: regex_parser.Node_Repeat_Zero

    Node_Repeat_Zero_Non_Greedy ¶

    Node_Repeat_Zero_Non_Greedy :: regex_parser.Node_Repeat_Zero_Non_Greedy

    Node_Rune ¶

    Node_Rune :: regex_parser.Node_Rune

    Node_Rune_Class ¶

    Node_Rune_Class :: regex_parser.Node_Rune_Class

    Node_Wildcard ¶

    Node_Wildcard :: regex_parser.Node_Wildcard

    Node_Word_Boundary ¶

    Node_Word_Boundary :: regex_parser.Node_Word_Boundary

    Rune_Class_Range ¶

    Rune_Class_Range :: regex_parser.Rune_Class_Range

    Constants

    This section is empty.

    Variables

    This section is empty.

    Procedures

    class_range_sorter ¶

    class_range_sorter :: proc(i, j: regex_parser.Rune_Class_Range) -> bool {…}

    optimize ¶

    optimize :: proc(tree: regex_parser.Node, flags: bit_set[regex_common.Flag; u8]) -> (result: regex_parser.Node, changes: int) {…}

    optimize_subtree ¶

    optimize_subtree :: proc(tree: regex_parser.Node, flags: bit_set[regex_common.Flag; u8]) -> (result: regex_parser.Node, changes: int) {…}

    Procedure Groups

    This section is empty.

    Source Files

    Generation Information

    Generated with odin version dev-2024-09 (vendor "odin") Windows_amd64 @ 2024-09-15 21:11:17.912081100 +0000 UTC