package core:text/regex

    Overview

    package regex implements a complete suite for using Regular Expressions to match and capture text.

    A regular expression is a pattern, written in a small pattern language, that describes which pieces of text it matches.

    Odin's regex library implements the following features:

    Alternation:           `apple|cherry`
    Classes:               `[0-9_]`
    Classes, negated:      `[^0-9_]`
    Shorthands:            `\d\s\w`
    Shorthands, negated:   `\D\S\W`
    Wildcards:             `.`
    Repeat, optional:      `a*`
    Repeat, at least once: `a+`
    Repetition:            `a{1,2}`
    Optional:              `a?`
    Group, capture:        `([0-9])`
    Group, non-capture:    `(?:[0-9])`
    Start & End Anchors:   `^hello$`
    Word Boundaries:       `\bhello\b`
    Non-Word Boundaries:   `hello\B`
    
    

    These specifiers can be composed together, such as an optional group: `(?:hello)?`

    This package also supports the non-greedy variants of the repeating and optional specifiers by appending a `?` to them, as in `a*?`, `a+?`, and `a??`.
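
    As a rough illustration of greedy versus non-greedy repeats, the sketch below runs two patterns against the same text. `whole_match` is a hypothetical helper written for this example, not part of the package, and the sample text begins with the match so the sketch does not depend on where the search starts.

    ```odin
    package main

    import "core:fmt"
    import "core:text/regex"

    // whole_match is a hypothetical helper, not part of the package: it compiles
    // `pattern`, matches it against `text`, and returns the full match (group 0).
    whole_match :: proc(pattern, text: string) -> (result: string, ok: bool) {
    	rx, err := regex.create(pattern)
    	if err != nil {
    		return "", false
    	}
    	defer regex.destroy_regex(rx)

    	capture, matched := regex.match_and_allocate_capture(rx, text)
    	defer regex.destroy_capture(capture)
    	if !matched || len(capture.groups) == 0 {
    		return "", false
    	}

    	// groups[0] is a slice of `text`, so it outlives the capture itself.
    	return capture.groups[0], true
    }

    main :: proc() {
    	text := "<a><b>"
    	greedy, _ := whole_match("<.+>", text)  // `.+` consumes as much as it can
    	lazy, _   := whole_match("<.+?>", text) // `.+?` stops at the first `>`
    	fmt.println("greedy:", greedy)
    	fmt.println("lazy:  ", lazy)
    }
    ```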

    All of the supported shorthand classes are ASCII-based, even when compiling in Unicode mode. This keeps matching fast and simple: thousands of Unicode codepoints qualify as a digit, space, or word character, and many of them could be irrelevant depending on what is being matched.

    Here are the shorthand class equivalencies:

    \d: [0-9]
    \s: [\t\n\f\r ]
    \w: [0-9A-Z_a-z]
    
    

    If you need your own shorthands, you can compose strings together like so:

    MY_HEX :: "[0-9A-Fa-f]"
    PATTERN :: MY_HEX + "-" + MY_HEX
    
    

    The compiler will handle turning multiple identical classes into references to the same set of matching runes, so there's no penalty for doing it like this.
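
    A minimal sketch of using such a composed constant follows. `matches_pattern` is a hypothetical helper for this example; only `create`, `match_and_allocate_capture`, and the destroy procedures come from this package.

    ```odin
    package main

    import "core:fmt"
    import "core:text/regex"

    MY_HEX  :: "[0-9A-Fa-f]"
    PATTERN :: MY_HEX + "-" + MY_HEX

    // matches_pattern is a hypothetical helper that reports whether `text`
    // matches PATTERN.
    matches_pattern :: proc(text: string) -> bool {
    	rx, err := regex.create(PATTERN)
    	if err != nil {
    		return false
    	}
    	defer regex.destroy_regex(rx)

    	capture, ok := regex.match_and_allocate_capture(rx, text)
    	defer regex.destroy_capture(capture)
    	return ok
    }

    main :: proc() {
    	fmt.println(matches_pattern("a-F")) // both sides are hex digits
    	fmt.println(matches_pattern("g-1")) // `g` is not a hex digit
    }
    ```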

    ``Some people, when confronted with a problem, think
      "I know, I'll use regular expressions." Now they have two problems.''
    
         - Jamie Zawinski
    
    
    

    Regular expressions have gathered a reputation over the decades for often being chosen as the wrong tool for the job. Here, we will clarify a few cases in which RegEx might be good or bad.

    When is it a good time to use RegEx?

    - You don't know at compile-time what patterns of text the program will need to match when it's running.
      - As an example, you are making a client which can be configured by the user to trigger on certain text patterns received from a server.
      - For another example, you need a way for users of a text editor to compose matching strings that are more intricate than a simple substring lookup.
    - The text you're matching against is small (< 64 KiB) and your patterns aren't overly complicated with branches (alternations, repeats, and optionals).
    - None of the above general impressions apply, but your project doesn't warrant long-term maintenance.

    When is it a bad time to use RegEx?

    - You know at compile-time the grammar you're parsing; a hand-made parser has the potential to be more maintainable and readable.
    - The grammar you're parsing has certain validation steps that lend themselves to forming complicated expressions, such as e-mail addresses, URIs, dates, postal codes, credit cards, et cetera. Using RegEx to validate these structures is almost always a bad sign.
    - The text you're matching against is big (> 1 MiB); you would be better served by first dividing the text into manageable chunks and using some heuristic to locate the most likely location of a match before applying RegEx against it.
    - You value high performance and low memory usage; RegEx will always have a certain overhead which increases with the complexity of the pattern.

    The implementation of this package has been optimized, but it will never be as thoroughly performant as a hand-made parser. In comparison, there are just too many intermediate steps, assumptions, and generalizations in what it takes to handle a regular expression.

    Types

    Capture

    Capture :: struct {
    	pos:    [][2]int,
    	groups: []string,
    }
     

    This struct corresponds to a set of string captures from a RegEx match.

    pos will contain the start and end positions for each string in groups, such that str[pos[0][0]:pos[0][1]] == groups[0].
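
    The relationship between `pos` and `groups` can be checked with a short sketch like the one below. `spans_agree` is a hypothetical helper for this example, and the pattern and sample text are arbitrary.

    ```odin
    package main

    import "core:fmt"
    import "core:text/regex"

    // spans_agree is a hypothetical helper: it returns true when every captured
    // group equals the slice of `text` described by the matching entry in `pos`.
    spans_agree :: proc(pattern, text: string) -> bool {
    	rx, err := regex.create(pattern)
    	if err != nil {
    		return false
    	}
    	defer regex.destroy_regex(rx)

    	capture, ok := regex.match_and_allocate_capture(rx, text)
    	defer regex.destroy_capture(capture)
    	if !ok {
    		return false
    	}

    	for group, i in capture.groups {
    		start, end := capture.pos[i][0], capture.pos[i][1]
    		fmt.printf("group %d: [%d:%d] %q\n", i, start, end, group)
    		if text[start:end] != group {
    			return false
    		}
    	}
    	return true
    }

    main :: proc() {
    	fmt.println(spans_agree(`([0-9]+)-([0-9]+)`, "2025-04"))
    }
    ```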

    Compiler_Error

    Compiler_Error :: regex_compiler.Error

    Creation_Error

    Creation_Error :: enum int {
    	None, 
    	// A `\` was supplied as the delimiter to `create_by_user`.
    	Bad_Delimiter, 
    	// A pair of delimiters for `create_by_user` was not found.
    	Expected_Delimiter, 
    	// An unknown letter was supplied to `create_by_user` after the last delimiter.
    	Unknown_Flag, 
    	// An unsupported flag was supplied.
    	Unsupported_Flag, 
    }

    Error

    Error :: union {
    	regex_parser.Error, 
    	regex_compiler.Error, 
    	Creation_Error, 
    }
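
    One way to tell the three error sources apart is a type switch over the union, sketched below. `describe_error` is a hypothetical helper for this example; the two failing inputs are chosen from the `Creation_Error` cases documented above.

    ```odin
    package main

    import "core:fmt"
    import "core:text/regex"

    // describe_error is a hypothetical helper that names which stage rejected
    // a pattern. The returned string is allocated with the temporary allocator.
    describe_error :: proc(err: regex.Error) -> string {
    	switch e in err {
    	case regex.Parser_Error:
    		return fmt.tprintf("parser error: %v", e)
    	case regex.Compiler_Error:
    		return fmt.tprintf("compiler error: %v", e)
    	case regex.Creation_Error:
    		return fmt.tprintf("creation error: %v", e)
    	}
    	return "no error"
    }

    main :: proc() {
    	// A `\` delimiter and an unknown flag letter, per the Creation_Error comments.
    	_, bad_delimiter := regex.create_by_user("\\hellope\\")
    	_, unknown_flag  := regex.create_by_user("/hellope/z")
    	fmt.println(describe_error(bad_delimiter))
    	fmt.println(describe_error(unknown_flag))
    }
    ```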

    Flag

    Flag :: regex_common.Flag

    Flags

    Flags :: bit_set[regex_common.Flag; u8]

    Match_Iterator

    Match_Iterator :: struct {
    	regex:   Regular_Expression,
    	capture: Capture,
    	vm:      regex_vm.Machine,
    	idx:     int,
    	temp:    runtime.Allocator,
    }
     

    An iterator to repeatedly match a pattern against a string, to be used with *_iterator procedures. Note: Does not handle .Multiline properly.

    Parser_Error

    Parser_Error :: regex_parser.Error

    Regular_Expression

    Regular_Expression :: struct {
    	flags:      bit_set[regex_common.Flag; u8] `fmt:"-"`,
    	class_data: []regex_vm.Rune_Class_Data `fmt:"-"`,
    	program:    []regex_vm.Opcode `fmt:"-"`,
    }
     

    A compiled Regular Expression value, to be used with the match_* procedures.

    Constants

    This section is empty.

    Variables

    This section is empty.

    Procedures

    create

    create :: proc(pattern: string, flags: bit_set[regex_common.Flag; u8] = {}, permanent_allocator := context.allocator, temporary_allocator := context.temp_allocator) -> (result: Regular_Expression, err: Error) {…}
     

    Create a regular expression from a string pattern and a set of flags.

    Allocates Using Provided Allocators

    Inputs:
    pattern: The pattern to compile.
    flags: A bit_set of RegEx flags.
    permanent_allocator: The allocator to use for the final regular expression. (default: context.allocator)
    temporary_allocator: The allocator to use for the intermediate compilation stages. (default: context.temp_allocator)

    Returns:
    result: The regular expression.
    err: An error, if one occurred.
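
    A minimal sketch of calling `create` with and without a flag follows. `matches` is a hypothetical helper for this example, and the `.Case_Insensitive` flag name is taken from the flag list in the `create_by_user` documentation.

    ```odin
    package main

    import "core:fmt"
    import "core:text/regex"

    // matches is a hypothetical helper: it compiles `pattern` with `flags` and
    // reports whether it matches `text`.
    matches :: proc(pattern, text: string, flags: regex.Flags = {}) -> bool {
    	rx, err := regex.create(pattern, flags)
    	if err != nil {
    		fmt.eprintln("failed to compile pattern:", err)
    		return false
    	}
    	defer regex.destroy_regex(rx)

    	capture, ok := regex.match_and_allocate_capture(rx, text)
    	defer regex.destroy_capture(capture)
    	return ok
    }

    main :: proc() {
    	fmt.println(matches("hellope", "HELLOPE"))                      // case-sensitive by default
    	fmt.println(matches("hellope", "HELLOPE", {.Case_Insensitive})) // ignores letter case
    }
    ```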

    create_by_user

    create_by_user :: proc(pattern: string, permanent_allocator := context.allocator, temporary_allocator := context.temp_allocator) -> (result: Regular_Expression, err: Error) {…}
     

    Create a regular expression from a delimited string pattern, such as one provided by users of a program or those found in a configuration file.

    They are in the form of:

    [DELIMITER] [regular expression] [DELIMITER] [flags]
    
    

    For example, the following strings are valid:

    /hellope/i
    #hellope#i
    •hellope•i
    つhellopeつi
    
    

    The delimiter is determined by the very first rune in the string. The only restriction is that the delimiter cannot be \, as that rune is used to escape the delimiter if found in the middle of the string.

    All runes after the closing delimiter will be parsed as flags:

    'g': Global
    'm': Multiline
    'i': Case_Insensitive
    'x': Ignore_Whitespace
    'u': Unicode
    'n': No_Capture
    '-': No_Optimization

    Allocates Using Provided Allocators

    Inputs:
    pattern: The delimited pattern with optional flags to compile.
    permanent_allocator: The allocator to use for the final regular expression. (default: context.allocator)
    temporary_allocator: The allocator to use for the intermediate compilation stages. (default: context.temp_allocator)

    Returns:
    result: The regular expression.
    err: An error, if one occurred.
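
    The sketch below shows one way a user-supplied delimited pattern might be compiled and applied. `user_pattern_matches` is a hypothetical helper for this example; the pattern string is one of the valid examples listed above.

    ```odin
    package main

    import "core:fmt"
    import "core:text/regex"

    // user_pattern_matches is a hypothetical helper: it compiles a delimited
    // pattern such as "/hellope/i" and reports whether it matches `text`.
    user_pattern_matches :: proc(delimited, text: string) -> (matched: bool, err: regex.Error) {
    	rx := regex.create_by_user(delimited) or_return
    	defer regex.destroy_regex(rx)

    	capture, ok := regex.match_and_allocate_capture(rx, text)
    	defer regex.destroy_capture(capture)
    	return ok, nil
    }

    main :: proc() {
    	matched, err := user_pattern_matches("/hellope/i", "HELLOPE")
    	if err != nil {
    		fmt.eprintln("invalid pattern:", err)
    		return
    	}
    	fmt.println(matched)
    }
    ```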

    create_iterator

    create_iterator :: proc(str: string, pattern: string, flags: bit_set[regex_common.Flag; u8] = {}, permanent_allocator := context.allocator, temporary_allocator := context.temp_allocator) -> (result: Match_Iterator, err: Error) {…}
     

    Create a Match_Iterator using a string to search, a regular expression to match against it, and a set of flags.

    Allocates Using Provided Allocators

    Inputs:
    str: The string to iterate over.
    pattern: The pattern to match.
    flags: A bit_set of RegEx flags.
    permanent_allocator: The allocator to use for the compiled regular expression. (default: context.allocator)
    temporary_allocator: The allocator to use for the intermediate compilation and iteration stages. (default: context.temp_allocator)

    Returns:
    result: The Match_Iterator.
    err: An error, if one occurred.

    destroy_capture

    destroy_capture :: proc(capture: Capture, allocator := context.allocator) {…}
     

    Free all data allocated by the match_and_allocate_capture procedure.

    Frees Using Provided Allocator

    Inputs:
    capture: A Capture.
    allocator: (default: context.allocator)

    destroy_iterator

    destroy_iterator :: proc(it: Match_Iterator, allocator := context.allocator) {…}
     

    Free all data allocated by the create_iterator procedure.

    Frees Using Provided Allocator

    Inputs:
    it: A Match_Iterator.
    allocator: (default: context.allocator)

    destroy_regex

    destroy_regex :: proc(regex: Regular_Expression, allocator := context.allocator) {…}
     

    Free all data allocated by the create* procedures.

    Frees Using Provided Allocator

    Inputs:
    regex: A regular expression.
    allocator: (default: context.allocator)

    match_and_allocate_capture

    match_and_allocate_capture :: proc(regex: Regular_Expression, str: string, permanent_allocator := context.allocator, temporary_allocator := context.temp_allocator) -> (capture: Capture, success: bool) {…}
     

    Match a regular expression against a string and allocate the results into the returned capture structure.

    The resulting capture strings will be slices to the string str, not wholly copied strings, so they won't need to be individually deleted.

    Allocates Using Provided Allocators

    Inputs:
    regex: The regular expression.
    str: The string to match against.
    permanent_allocator: The allocator to use for the capture results. (default: context.allocator)
    temporary_allocator: The allocator to use for the virtual machine. (default: context.temp_allocator)

    Returns:
    capture: The capture groups found in the string.
    success: True if the regex matched the string.

    match_iterator

    match_iterator :: proc(it: ^Match_Iterator) -> (result: Capture, index: int, ok: bool) {…}
     

    Iterate over a Match_Iterator and return successive captures. Note: Does not handle .Multiline properly.

    Inputs:
    it: Pointer to the Match_Iterator to iterate over.

    Returns:
    result: Capture for this iteration.
    index: The index of this match in the iteration.
    ok: A bool indicating if there was a match, stopping the iteration on false.
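
    A minimal sketch of driving the iterator with a `for ... in` loop follows. `count_matches` is a hypothetical helper for this example, and the pattern and sample text are arbitrary.

    ```odin
    package main

    import "core:fmt"
    import "core:text/regex"

    // count_matches is a hypothetical helper: it prints every match of
    // `pattern` in `text` and returns how many were found.
    count_matches :: proc(text, pattern: string) -> (count: int, err: regex.Error) {
    	it := regex.create_iterator(text, pattern) or_return
    	defer regex.destroy_iterator(it)

    	for capture, index in regex.match_iterator(&it) {
    		// groups[0] is the full match for this iteration.
    		fmt.printf("match %d: %s\n", index, capture.groups[0])
    		count += 1
    	}
    	return
    }

    main :: proc() {
    	count, err := count_matches("ab32ab52", "ab[0-9]")
    	if err != nil {
    		fmt.eprintln("failed to create iterator:", err)
    		return
    	}
    	fmt.println("total matches:", count)
    }
    ```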

    match_with_preallocated_capture

    match_with_preallocated_capture :: proc(regex: Regular_Expression, str: string, capture: ^Capture, temporary_allocator := context.temp_allocator) -> (num_groups: int, success: bool) {…}
     

    Match a regular expression against a string and save the capture results into the provided capture structure.

    The resulting capture strings will be slices to the string str, not wholly copied strings, so they won't need to be individually deleted.

    Allocates Using Provided Allocator

    Inputs:
    regex: The regular expression.
    str: The string to match against.
    capture: A pointer to a Capture structure with groups and pos already allocated.
    temporary_allocator: The allocator to use for the virtual machine. (default: context.temp_allocator)

    Returns:
    num_groups: The number of capture groups set into capture.
    success: True if the regex matched the string.

    preallocate_capture

    preallocate_capture :: proc(allocator := context.allocator) -> (result: Capture) {…}
     

    Allocate a Capture in advance for use with match. This can save some time if you plan on performing several matches at once and only need the results between matches.

    Inputs:
    allocator: (default: context.allocator)

    Returns:
    result: The Capture with the maximum number of groups allocated.
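
    The sketch below reuses one preallocated `Capture` across several matches. `count_matching` is a hypothetical helper for this example, and freeing the preallocated capture with `destroy_capture` is an assumption based on that procedure's signature.

    ```odin
    package main

    import "core:fmt"
    import "core:text/regex"

    // count_matching is a hypothetical helper: it counts how many of `lines`
    // match `rx`, reusing a single preallocated Capture for every attempt.
    count_matching :: proc(rx: regex.Regular_Expression, lines: []string) -> (count: int) {
    	capture := regex.preallocate_capture()
    	// Assumption: destroy_capture also frees a preallocated Capture.
    	defer regex.destroy_capture(capture)

    	for line in lines {
    		_, ok := regex.match_with_preallocated_capture(rx, line, &capture)
    		if ok {
    			count += 1
    		}
    	}
    	return
    }

    main :: proc() {
    	rx, err := regex.create("[a-z][0-9]")
    	if err != nil {
    		fmt.eprintln("failed to compile pattern:", err)
    		return
    	}
    	defer regex.destroy_regex(rx)

    	lines := []string{"a1", "bc", "d4"}
    	fmt.println(count_matching(rx, lines))
    }
    ```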

    reset

    reset :: proc(it: ^Match_Iterator) {…}

    Procedure Groups

    Generation Information

    Generated with odin version dev-2025-04 (vendor "odin") Windows_amd64 @ 2025-04-23 21:12:32.128052700 +0000 UTC