package core:text/regex
Overview
A complete suite for using Regular Expressions to match and capture text.
Regular expressions use a pattern language to describe which pieces of text should match.
Odin's regex library implements the following features:
Alternation: `apple|cherry`
Classes: `[0-9_]`
Classes, negated: `[^0-9_]`
Shorthands: `\d\s\w`
Shorthands, negated: `\D\S\W`
Wildcards: `.`
Repeat, zero or more times: `a*`
Repeat, at least once: `a+`
Repetition: `a{1,2}`
Optional: `a?`
Group, capture: `([0-9])`
Group, non-capture: `(?:[0-9])`
Start & End Anchors: `^hello$`
Word Boundaries: `\bhello\b`
Non-Word Boundaries: `hello\B`
These specifiers can be composed together, such as an optional group:
(?:hello)?
This package also supports the non-greedy variants of the repeating and
optional specifiers, written by appending a `?` to them, such as `a*?`, `a+?`, and `a??`.
All of the supported shorthand classes are ASCII-based, even when compiling in Unicode mode. This keeps matching fast and simple: thousands of Unicode codepoints qualify as a digit, space, or word character, and many of them could be irrelevant depending on what is being matched.
Here are the shorthand class equivalencies:
    \d: [0-9]
    \s: [\t\n\f\r ]
    \w: [0-9A-Z_a-z]
If you need your own shorthands, you can compose strings together like so:
    MY_HEX :: "[0-9A-Fa-f]"
    PATTERN :: MY_HEX + "-" + MY_HEX
The compiler will handle turning multiple identical classes into references to the same set of matching runes, so there's no penalty for doing it like this.
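As a rough sketch of how such a composed pattern might be used, the program below compiles the constant with `create` and runs it through the `match` procedure group. The input string `"a-f"` is made up for illustration; the procedures are the ones documented further down this page.

```odin
package main

import "core:fmt"
import "core:text/regex"

// Shorthand composed at compile time from constant strings.
MY_HEX  :: "[0-9A-Fa-f]"
PATTERN :: MY_HEX + "-" + MY_HEX

main :: proc() {
	rx, err := regex.create(PATTERN)
	if err != nil {
		fmt.eprintln("could not compile pattern:", err)
		return
	}
	defer regex.destroy(rx)

	// "a-f" is an illustrative input: one hex digit, a dash, one hex digit.
	capture, matched := regex.match(rx, "a-f")
	defer regex.destroy(capture)

	fmt.println("matched:", matched)
}
```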
``Some people, when confronted with a problem, think
"I know, I'll use regular expressions." Now they have two problems.''
- Jamie Zawinski
Regular expressions have gathered a reputation over the decades for often being chosen as the wrong tool for the job. Here, we will clarify a few cases in which RegEx might be good or bad.
When is it a good time to use RegEx?
- You don't know at compile-time what patterns of text the program will need to match when it's running.
  - As an example, you are making a client which can be configured by the user to trigger on certain text patterns received from a server.
  - For another example, you need a way for users of a text editor to compose matching strings that are more intricate than a simple substring lookup.
- The text you're matching against is small (< 64 KiB) and your patterns aren't overly complicated with branches (alternations, repeats, and optionals).
- None of the above general impressions apply, but your project doesn't warrant long-term maintenance.
When is it a bad time to use RegEx?
- You know at compile-time the grammar you're parsing; a hand-made parser has the potential to be more maintainable and readable.
- The grammar you're parsing has certain validation steps that lend themselves to forming complicated expressions, such as e-mail addresses, URIs, dates, postal codes, credit cards, et cetera. Using RegEx to validate these structures is almost always a bad sign.
- The text you're matching against is big (> 1 MiB); you would be better served by first dividing the text into manageable chunks and using some heuristic to locate the most likely location of a match before applying RegEx against it.
- You value high performance and low memory usage; RegEx will always have a certain overhead which increases with the complexity of the pattern.
The implementation of this package has been optimized, but it will never be as thoroughly performant as a hand-made parser. In comparison, there are just too many intermediate steps, assumptions, and generalizations in what it takes to handle a regular expression.
Types
Capture ¶
This struct corresponds to a set of string captures from a RegEx match.
pos will contain the start and end positions for each string in groups,
such that str[pos[0][0]:pos[0][1]] == groups[0].
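The relationship between `pos` and `groups` can be checked directly. The following is a minimal sketch, assuming a made-up input string and pattern; it loops over whatever groups the match produced and asserts that each one is the slice of the input described by its `pos` entry.

```odin
package main

import "core:fmt"
import "core:text/regex"

main :: proc() {
	// Illustrative input; any string and pattern would do.
	str := "2025-10"

	rx, err := regex.create("([0-9]+)-([0-9]+)")
	if err != nil {
		fmt.eprintln("could not compile pattern:", err)
		return
	}
	defer regex.destroy(rx)

	capture, matched := regex.match(rx, str)
	defer regex.destroy(capture)
	if !matched {
		fmt.println("no match")
		return
	}

	for group, i in capture.groups {
		span := capture.pos[i]
		// Each group is the part of `str` delimited by its `pos` entry.
		assert(str[span[0]:span[1]] == group)
		fmt.printf("group %d: %q at [%d:%d]\n", i, group, span[0], span[1])
	}
}
```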
Related Procedures With Parameters
- destroy_capture
- match_with_preallocated_capture
- destroy (procedure groups)
- match (procedure groups)
Related Procedures With Returns
Compiler_Error ¶
Compiler_Error :: regex_compiler.Error
Creation_Error ¶
Creation_Error :: enum int {
	None,
	// A `\` was supplied as the delimiter to `create_by_user`.
	Bad_Delimiter,
	// A pair of delimiters for `create_by_user` was not found.
	Expected_Delimiter,
	// An unknown letter was supplied to `create_by_user` after the last delimiter.
	Unknown_Flag,
}
Error ¶
Error :: union {
	regex_parser.Error,
	regex_compiler.Error,
	Creation_Error,
}
Related Procedures With Returns
Flags ¶
Related Procedures With Parameters
Match_Iterator ¶
Match_Iterator :: struct {
	regex:   Regular_Expression,
	capture: Capture,
	vm:      regex_vm.Machine,
	idx:     int,
	temp:    runtime.Allocator,
	threads: int,
	done:    bool,
}
An iterator to repeatedly match a pattern against a string, to be used with *_iterator procedures.
Related Procedures With Parameters
- destroy_iterator
- match_iterator
- reset
- destroy (procedure groups)
- match (procedure groups)
Related Procedures With Returns
Parser_Error ¶
Parser_Error :: regex_parser.Error
Regular_Expression ¶
Regular_Expression :: struct {
	flags:      bit_set[regex_common.Flag; u8] `fmt:"-"`,
	class_data: []regex_vm.Rune_Class_Data `fmt:"-"`,
	program:    []regex_vm.Opcode `fmt:"-"`,
}
A compiled Regular Expression value, to be used with the match_* procedures.
Related Procedures With Parameters
- destroy_regex
- match_and_allocate_capture
- match_with_preallocated_capture
- destroy (procedure groups)
- match (procedure groups)
Related Procedures With Returns
Constants
This section is empty.
Variables
This section is empty.
Procedures
create ¶
create :: proc(pattern: string, flags: bit_set[regex_common.Flag; u8] = {}, permanent_allocator := context.allocator, temporary_allocator := context.temp_allocator) -> (result: Regular_Expression, err: Error) {…}
Create a regular expression from a string pattern and a set of flags.
Allocates Using Provided Allocators
Inputs:
pattern: The pattern to compile.
flags: A bit_set of RegEx flags.
permanent_allocator: The allocator to use for the final regular expression. (default: context.allocator)
temporary_allocator: The allocator to use for the intermediate compilation stages. (default: context.temp_allocator)
Returns:
result: The regular expression.
err: An error, if one occurred.
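A minimal usage sketch follows, assuming the default allocators and a made-up pattern and input. The flag name `.Case_Insensitive` is taken from the flag list documented for `create_by_user`; the second call uses a pattern with an unclosed group, which is assumed (not confirmed by this page) to be rejected with an error.

```odin
package main

import "core:fmt"
import "core:text/regex"

main :: proc() {
	// Compile once with a flag set, then reuse the compiled value.
	rx, err := regex.create("hellope", {.Case_Insensitive})
	if err != nil {
		fmt.eprintln("could not compile pattern:", err)
		return
	}
	defer regex.destroy(rx)

	capture, matched := regex.match(rx, "HELLOPE")
	defer regex.destroy(capture)
	fmt.println("matched:", matched)

	// Assumption: an unclosed group is reported through `err`.
	_, bad_err := regex.create("(unclosed")
	if bad_err != nil {
		fmt.println("rejected with:", bad_err)
	}
}
```

Checking `err` before using `result` matters because the pattern often comes from outside the program, which is the main use case this package describes.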
create_by_user ¶
create_by_user :: proc(pattern: string, permanent_allocator := context.allocator, temporary_allocator := context.temp_allocator) -> (result: Regular_Expression, err: Error) {…}
Create a regular expression from a delimited string pattern, such as one provided by users of a program or those found in a configuration file.
They are in the form of:
[DELIMITER] [regular expression] [DELIMITER] [flags]
For example, the following strings are valid:
    /hellope/i
    #hellope#i
    •hellope•i
    つhellopeつi
The delimiter is determined by the very first rune in the string.
The only restriction is that the delimiter cannot be \, as that rune is used
to escape the delimiter if found in the middle of the string.
All runes after the closing delimiter will be parsed as flags:
    'm': Multiline
    'i': Case_Insensitive
    'x': Ignore_Whitespace
    'u': Unicode
    'n': No_Capture
    '-': No_Optimization
Allocates Using Provided Allocators
Inputs:
pattern: The delimited pattern with optional flags to compile.
permanent_allocator: The allocator to use for the final regular expression. (default: context.allocator)
temporary_allocator: The allocator to use for the intermediate compilation stages. (default: context.temp_allocator)
Returns:
result: The regular expression.
err: An error, if one occurred.
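The sketch below shows one way a user-supplied pattern might be handled. The `user_pattern` value stands in for a hypothetical configuration entry, and the second call passes a backslash delimiter to show the documented `Bad_Delimiter` case being detected with a type assertion on the `Error` union.

```odin
package main

import "core:fmt"
import "core:text/regex"

main :: proc() {
	// Hypothetical pattern, e.g. read from a configuration file.
	user_pattern := "/hellope/i"

	rx, err := regex.create_by_user(user_pattern)
	if err != nil {
		fmt.eprintln("invalid pattern:", err)
		return
	}
	defer regex.destroy(rx)

	capture, matched := regex.match(rx, "HELLOPE")
	defer regex.destroy(capture)
	fmt.println("matched:", matched)

	// `\` is documented as an invalid delimiter.
	_, bad_err := regex.create_by_user(`\hellope\`)
	if creation_err, is_creation := bad_err.(regex.Creation_Error); is_creation {
		fmt.println("creation error:", creation_err)
	}
}
```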
create_iterator ¶
create_iterator :: proc(str: string, pattern: string, flags: bit_set[regex_common.Flag; u8] = {}, permanent_allocator := context.allocator, temporary_allocator := context.temp_allocator) -> (result: Match_Iterator, err: Error) {…}
Create a Match_Iterator using a string to search, a regular expression to match against it, and a set of flags.
Allocates Using Provided Allocators
Inputs:
str: The string to iterate over.
pattern: The pattern to match.
flags: A bit_set of RegEx flags.
permanent_allocator: The allocator to use for the compiled regular expression. (default: context.allocator)
temporary_allocator: The allocator to use for the intermediate compilation and iteration stages. (default: context.temp_allocator)
Returns:
result: The Match_Iterator.
err: An error, if one occurred.
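As a sketch of the intended workflow, the program below creates an iterator and drains it with `match_iterator` in a `for ... in` loop. The input string and pattern are made up, and the sketch assumes that the captures handed out during iteration belong to the iterator and are released by `destroy`, so they are not destroyed individually.

```odin
package main

import "core:fmt"
import "core:text/regex"

main :: proc() {
	// Illustrative haystack and pattern.
	it, err := regex.create_iterator("apple cherry apple", "apple|cherry")
	if err != nil {
		fmt.eprintln("could not create iterator:", err)
		return
	}
	defer regex.destroy(it)

	// The loop ends when `match_iterator` returns `ok == false`.
	for capture, index in regex.match_iterator(&it) {
		fmt.println(index, capture.groups, capture.pos)
	}
}
```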
destroy_capture ¶
destroy_capture :: proc(capture: Capture, allocator := context.allocator) {…}
Free all data allocated by the match_and_allocate_capture procedure.
Frees Using Provided Allocator
Inputs:
capture: A Capture.
allocator: The allocator to free the capture with. (default: context.allocator)
Related Procedure Groups
destroy_iterator ¶
destroy_iterator :: proc(it: Match_Iterator, allocator := context.allocator) {…}
Free all data allocated by the create_iterator procedure.
Frees Using Provided Allocator
Inputs:
it: A Match_Iterator.
allocator: The allocator to free the iterator with. (default: context.allocator)
Related Procedure Groups
destroy_regex ¶
destroy_regex :: proc(regex: Regular_Expression, allocator := context.allocator) {…}
Free all data allocated by the create* procedures.
Frees Using Provided Allocator
Inputs:
regex: A regular expression.
allocator: The allocator to free the regular expression with. (default: context.allocator)
Related Procedure Groups
match_and_allocate_capture ¶
match_and_allocate_capture :: proc(regex: Regular_Expression, str: string, permanent_allocator := context.allocator, temporary_allocator := context.temp_allocator) -> (capture: Capture, success: bool) {…}
Match a regular expression against a string and allocate the results into the
returned capture structure.
The resulting capture strings will be slices to the string str, not wholly
copied strings, so they won't need to be individually deleted.
Allocates Using Provided Allocators
Inputs:
regex: The regular expression.
str: The string to match against.
permanent_allocator: The allocator to use for the capture results. (default: context.allocator)
temporary_allocator: The allocator to use for the virtual machine. (default: context.temp_allocator)
Returns:
capture: The capture groups found in the string.
success: True if the regex matched the string.
Related Procedure Groups
match_iterator ¶
match_iterator :: proc(it: ^Match_Iterator) -> (result: Capture, index: int, ok: bool) {…}
Iterate over a Match_Iterator and return successive captures.
Inputs:
it: Pointer to the Match_Iterator to iterate over.
Returns:
result: Capture for this iteration.
ok: A bool indicating if there was a match, stopping the iteration on false.
Related Procedure Groups
match_with_preallocated_capture ¶
match_with_preallocated_capture :: proc(regex: Regular_Expression, str: string, capture: ^Capture, temporary_allocator := context.temp_allocator) -> (num_groups: int, success: bool) {…}
Match a regular expression against a string and save the capture results into
the provided capture structure.
The resulting capture strings will be slices to the string str, not wholly
copied strings, so they won't need to be individually deleted.
Allocates Using Provided Allocator
Inputs:
regex: The regular expression.
str: The string to match against.
capture: A pointer to a Capture structure with groups and pos already allocated.
temporary_allocator: The allocator to use for the virtual machine. (default: context.temp_allocator)
Returns:
num_groups: The number of capture groups set into capture.
success: True if the regex matched the string.
Related Procedure Groups
preallocate_capture ¶
preallocate_capture :: proc(allocator := context.allocator) -> (result: Capture) {…}
Allocate a Capture in advance for use with match. This can save some time
if you plan on performing several matches at once and only need the results
between matches.
Inputs:
allocator: The allocator to use for the capture. (default: context.allocator)
Returns:
result: The Capture with the maximum number of groups allocated.
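A possible use of a preallocated capture is sketched below: one `Capture` is allocated up front and reused for every input through `match_with_preallocated_capture`. The pattern and the `inputs` list are made up for illustration, and releasing the capture with `destroy` is an assumption based on the `destroy_capture` signature.

```odin
package main

import "core:fmt"
import "core:text/regex"

main :: proc() {
	rx, err := regex.create("([a-z]+)=([0-9]+)")
	if err != nil {
		fmt.eprintln("could not compile pattern:", err)
		return
	}
	defer regex.destroy(rx)

	// Allocate the capture storage once and reuse it for every match.
	capture := regex.preallocate_capture()
	defer regex.destroy(capture)

	// Illustrative inputs; the last one is not expected to match.
	inputs := []string{"width=80", "height=24", "no pair here"}
	for input in inputs {
		num_groups, matched := regex.match_with_preallocated_capture(rx, input, &capture)
		if !matched {
			fmt.println(input, "-> no match")
			continue
		}
		// Only the first `num_groups` entries were set by this match.
		fmt.println(input, "->", capture.groups[:num_groups])
	}

	// The virtual machine allocated from the temporary allocator.
	free_all(context.temp_allocator)
}
```

Because the capture is overwritten by each call, results that must outlive the next match need to be copied out first.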
reset ¶
reset :: proc(it: ^Match_Iterator) {…}
Reset an iterator, allowing it to be run again as if new.
Inputs:
it: The iterator to reset.
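A short sketch of running the same iterator twice is shown here. `count_matches` is a hypothetical helper, not part of the package, and the input and pattern are made up; after `reset`, the second pass is expected to see the same matches as the first.

```odin
package main

import "core:fmt"
import "core:text/regex"

// Hypothetical helper: drains the iterator and counts the matches.
count_matches :: proc(it: ^regex.Match_Iterator) -> (count: int) {
	for _ in regex.match_iterator(it) {
		count += 1
	}
	return
}

main :: proc() {
	it, err := regex.create_iterator("a1 b2 c3", "[a-z][0-9]")
	if err != nil {
		fmt.eprintln("could not create iterator:", err)
		return
	}
	defer regex.destroy(it)

	first := count_matches(&it)

	// Rewind the iterator so it can be run again as if new.
	regex.reset(&it)
	second := count_matches(&it)

	fmt.println("first pass:", first, "second pass:", second)
}
```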
Procedure Groups
destroy ¶
destroy :: proc{ destroy_regex, destroy_capture, destroy_iterator, }
match ¶
match :: proc{ match_and_allocate_capture, match_with_preallocated_capture, match_iterator, }
Source Files
Generation Information
Generated with odin version dev-2025-10 (vendor "odin") Windows_amd64 @ 2025-10-28 21:13:07.671793300 +0000 UTC