package core:text/regex/optimizer
Overview
package regex_optimizer implements an optimizer which acts upon the AST of a parsed regular expression pattern, transforming it in-place without moving to a compilation step.
Where possible, it aims to reduce branching as much as possible in the
expression by reducing usage of |
.
Here is a summary of the optimizations that it will do:
Class Simplification : [aab]
=> [ab]
[aa]
=> [a]
Class Reduction : [a]
=> a
Range Construction : [abc]
=> [a-c]
Rune Merging into Range : [aa-c]
=> [a-c]
Range Merging : [a-cc-e]
=> [a-e]
[a-cd-e]
=> [a-e]
[a-cb-e]
=> [a-e]
Alternation to Optional : a|
=> a?
Alternation to Optional Non-Greedy : |a
=> a??
Alternation Reduction : a|a
=> a
Alternation to Class : a|b
=> [ab]
Class Union : [a0]|[b1]
=> [a0b1]
[a-b]|c
=> [a-bc]
a|[b-c]
=> [b-ca]
Wildcard Reduction : a|.
=> .
.|a
=> .
[ab]|.
=> .
.|[ab]
=> .
Common Suffix Elimination : blueberry|strawberry
=> (?:blue|straw)berry
Common Prefix Elimination : abi|abe
=> ab(?:i|e)
Composition: Consume All to Anchored End
`.*$` => <special opcode> `.+$` => `.` <special opcode>
Possible future improvements:
Change the AST of alternations to be a list instead of a tree, so that
constructions such as (ab|bb|cb)
can be considered in whole by the affix
elimination optimizations.
Introduce specialized opcodes for certain classes of repetition.
Add Common Infix Elimination.
Measure the precise finite minimum and maximum of a pattern, if available, and check against that on any strings before running the virtual machine.
Index
Types (18)
- Node
- Node_Alternation
- Node_Anchor
- Node_Concatenation
- Node_Group
- Node_Match_All_And_Escape
- Node_Optional
- Node_Optional_Non_Greedy
- Node_Repeat_N
- Node_Repeat_One
- Node_Repeat_One_Non_Greedy
- Node_Repeat_Zero
- Node_Repeat_Zero_Non_Greedy
- Node_Rune
- Node_Rune_Class
- Node_Wildcard
- Node_Word_Boundary
- Rune_Class_Range
Constants (0)
This section is empty.
Variables (0)
This section is empty.
Procedures (3)
Procedure Groups (0)
This section is empty.
Types
Node_Alternation ¶
Node_Alternation :: regex_parser.Node_Alternation
Node_Anchor ¶
Node_Anchor :: regex_parser.Node_Anchor
Node_Concatenation ¶
Node_Concatenation :: regex_parser.Node_Concatenation
Node_Group ¶
Node_Group :: regex_parser.Node_Group
Node_Match_All_And_Escape ¶
Node_Match_All_And_Escape :: regex_parser.Node_Match_All_And_Escape
Node_Optional ¶
Node_Optional :: regex_parser.Node_Optional
Node_Optional_Non_Greedy ¶
Node_Optional_Non_Greedy :: regex_parser.Node_Optional_Non_Greedy
Node_Repeat_N ¶
Node_Repeat_N :: regex_parser.Node_Repeat_N
Node_Repeat_One ¶
Node_Repeat_One :: regex_parser.Node_Repeat_One
Node_Repeat_One_Non_Greedy ¶
Node_Repeat_One_Non_Greedy :: regex_parser.Node_Repeat_One_Non_Greedy
Node_Repeat_Zero ¶
Node_Repeat_Zero :: regex_parser.Node_Repeat_Zero
Node_Repeat_Zero_Non_Greedy ¶
Node_Repeat_Zero_Non_Greedy :: regex_parser.Node_Repeat_Zero_Non_Greedy
Node_Rune ¶
Node_Rune :: regex_parser.Node_Rune
Node_Rune_Class ¶
Node_Rune_Class :: regex_parser.Node_Rune_Class
Node_Wildcard ¶
Node_Wildcard :: regex_parser.Node_Wildcard
Node_Word_Boundary ¶
Node_Word_Boundary :: regex_parser.Node_Word_Boundary
Rune_Class_Range ¶
Rune_Class_Range :: regex_parser.Rune_Class_Range
Constants
This section is empty.
Variables
This section is empty.
Procedures
class_range_sorter ¶
class_range_sorter :: proc(i, j: regex_parser.Rune_Class_Range) -> bool {…}
Procedure Groups
This section is empty.
Source Files
Generation Information
Generated with odin version dev-2024-12 (vendor "odin") Windows_amd64 @ 2024-12-17 21:11:02.077165900 +0000 UTC