package core:text/regex

    Overview

    package regex implements a complete suite for using Regular Expressions to match and capture text.

    A regular expression describes, in a compact pattern language, which pieces of text it matches.

    Odin's regex library implements the following features:

    Alternation:           `apple|cherry`
    Classes:               `[0-9_]`
    Classes, negated:      `[^0-9_]`
    Shorthands:            `\d\s\w`
    Shorthands, negated:   `\D\S\W`
    Wildcards:             `.`
    Repeat, optional:      `a*`
    Repeat, at least once: `a+`
    Repetition:            `a{1,2}`
    Optional:              `a?`
    Group, capture:        `([0-9])`
    Group, non-capture:    `(?:[0-9])`
    Start & End Anchors:   `^hello$`
    Word Boundaries:       `\bhello\b`
    Non-Word Boundaries:   `hello\B`
    
    

    These specifiers can be composed together, such as an optional group: (?:hello)?

    This package also supports the non-greedy variants of the repeating and optional specifiers by appending a ? to them.

    All of the supported shorthand classes are ASCII-based, even when compiling in Unicode mode. This is for the sake of general performance and simplicity: thousands of Unicode codepoints qualify as a digit, space, or word character, and many of them could be irrelevant depending on what is being matched.

    Here are the shorthand class equivalencies:

    \d: [0-9]
    \s: [\t\n\f\r ]
    \w: [0-9A-Z_a-z]
    
    

    If you need your own shorthands, you can compose strings together like so:

    MY_HEX :: "[0-9A-Fa-f]"
    PATTERN :: MY_HEX + "-" + MY_HEX
    
    

    The compiler will handle turning multiple identical classes into references to the same set of matching runes, so there's no penalty for doing it like this.
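As a minimal sketch of this composition approach, the program below builds an anchored pattern from a reusable constant and checks two inputs. The helper name `hex_pair_matches` and the added `^` and `$` anchors are illustrative choices, not part of the package; the procedures called are the ones documented further down this page.

```odin
package main

import "core:fmt"
import "core:text/regex"

// A reusable, hand-written shorthand for one hexadecimal digit.
MY_HEX :: "[0-9A-Fa-f]"

// Constant strings are concatenated at compile time. The anchors are added
// so the whole input must be: hex digit, dash, hex digit.
PATTERN :: "^" + MY_HEX + "-" + MY_HEX + "$"

// hex_pair_matches is a hypothetical helper: it compiles PATTERN, runs one
// match, and frees everything it allocated along the way.
hex_pair_matches :: proc(text: string) -> bool {
	rex, err := regex.create(PATTERN)
	if err != nil {
		return false
	}
	defer regex.destroy_regex(rex)

	capture, matched := regex.match_and_allocate_capture(rex, text)
	defer regex.destroy_capture(capture)
	return matched
}

main :: proc() {
	fmt.println(hex_pair_matches("a-F"))
	fmt.println(hex_pair_matches("g-1"))
}
```

Compiling the pattern on every call keeps the sketch short; a real program would more likely compile once and reuse the resulting value.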

    ``Some people, when confronted with a problem, think
      "I know, I'll use regular expressions." Now they have two problems.''
    
         - Jamie Zawinski
    
    
    

    Regular expressions have gathered a reputation over the decades for often being chosen as the wrong tool for the job. Here, we will clarify a few cases in which RegEx might be good or bad.

    When is it a good time to use RegEx?

    - You don't know at compile-time what patterns of text the program will need to match when it's running. As an example, you are making a client which can be configured by the user to trigger on certain text patterns received from a server. For another example, you need a way for users of a text editor to compose matching strings that are more intricate than a simple substring lookup.
    - The text you're matching against is small (< 64 KiB) and your patterns aren't overly complicated with branches (alternations, repeats, and optionals).
    - None of the above general impressions apply, but your project doesn't warrant long-term maintenance.

    When is it a bad time to use RegEx?

    - You know at compile-time the grammar you're parsing; a hand-made parser has the potential to be more maintainable and readable.
    - The grammar you're parsing has certain validation steps that lend themselves to forming complicated expressions, such as e-mail addresses, URIs, dates, postal codes, credit cards, et cetera. Using RegEx to validate these structures is almost always a bad sign.
    - The text you're matching against is big (> 1 MiB); you would be better served by first dividing the text into manageable chunks and using some heuristic to locate the most likely location of a match before applying RegEx against it.
    - You value high performance and low memory usage; RegEx will always have a certain overhead which increases with the complexity of the pattern.

    The implementation of this package has been optimized, but it will never be as thoroughly performant as a hand-made parser. In comparison, there are just too many intermediate steps, assumptions, and generalizations in what it takes to handle a regular expression.

    Index

    Constants (0)

    This section is empty.

    Variables (0)

    This section is empty.

    Procedure Groups (2)

    Types

    Capture ¶

    Capture :: struct {
    	pos:    [][2]int,
    	groups: []string,
    }
     

    This struct corresponds to a set of string captures from a RegEx match.

    pos will contain the start and end positions for each string in groups, such that str[pos[0][0]:pos[0][1]] == groups[0].
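The relationship between pos and groups can be checked directly. The sketch below is illustrative only: `capture_is_consistent` and `match_and_check` are hypothetical helper names, and the pattern is an arbitrary example that matches its input from the first rune to the last.

```odin
package main

import "core:fmt"
import "core:text/regex"

// capture_is_consistent checks the documented relationship for every group:
// slicing `str` with pos[i] must give back groups[i].
capture_is_consistent :: proc(str: string, capture: regex.Capture) -> bool {
	if len(capture.pos) != len(capture.groups) {
		return false
	}
	for group, i in capture.groups {
		start, end := capture.pos[i][0], capture.pos[i][1]
		if str[start:end] != group {
			return false
		}
	}
	return true
}

// match_and_check is a hypothetical helper that compiles `pattern`, matches
// it against `str`, and verifies the resulting Capture.
match_and_check :: proc(pattern, str: string) -> (consistent: bool, matched: bool) {
	rex, err := regex.create(pattern)
	if err != nil {
		return
	}
	defer regex.destroy_regex(rex)

	capture, ok := regex.match_and_allocate_capture(rex, str)
	defer regex.destroy_capture(capture)

	consistent = capture_is_consistent(str, capture)
	matched = ok
	return
}

main :: proc() {
	consistent, matched := match_and_check(`^(\w+)=(\w+)$`, "key=value")
	fmt.println(matched, consistent)
}
```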


    Compiler_Error ¶

    Compiler_Error :: regex_compiler.Error

    Creation_Error ¶

    Creation_Error :: enum int {
    	None, 
    	// A `\` was supplied as the delimiter to `create_by_user`.
    	Bad_Delimiter, 
    	// A pair of delimiters for `create_by_user` was not found.
    	Expected_Delimiter, 
    	// An unknown letter was supplied to `create_by_user` after the last delimiter.
    	Unknown_Flag, 
    }

    Error ¶

    Error :: union {
    	regex_parser.Error, 
    	regex_compiler.Error, 
    	Creation_Error, 
    }

    Flag ¶

    Flag :: regex_common.Flag

    Flags ¶

    Flags :: bit_set[regex_common.Flag; u8]

    Parser_Error ¶

    Parser_Error :: regex_parser.Error

    Regular_Expression ¶

    Regular_Expression :: struct {
    	flags:      bit_set[regex_common.Flag; u8] `fmt:"-"`,
    	class_data: []regex_vm.Rune_Class_Data `fmt:"-"`,
    	program:    []regex_vm.Opcode `fmt:"-"`,
    }
     

    A compiled Regular Expression value, to be used with the match_* procedures.


    Constants

    This section is empty.

    Variables

    This section is empty.

    Procedures

    create ¶

    create :: proc(pattern: string, flags: bit_set[regex_common.Flag; u8] = {}, permanent_allocator := context.allocator, temporary_allocator := context.temp_allocator) -> (result: Regular_Expression, err: Error) {…}
     

    Create a regular expression from a string pattern and a set of flags.

    Allocates Using Provided Allocators

    Inputs:
    - pattern: The pattern to compile.
    - flags: A bit_set of RegEx flags.
    - permanent_allocator: The allocator to use for the final regular expression. (default: context.allocator)
    - temporary_allocator: The allocator to use for the intermediate compilation stages. (default: context.temp_allocator)

    Returns:
    - result: The regular expression.
    - err: An error, if one occurred.
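A possible way to call create with a flag and report a failure is sketched below. The `describe` helper is hypothetical. The sketch also assumes that `Case_Insensitive`, the name this page lists for the `i` flag of create_by_user, is a member of `Flag`.

```odin
package main

import "core:fmt"
import "core:text/regex"

// describe is a hypothetical helper that names which stage reported an error.
// The three cases mirror the variants of the Error union shown above.
describe :: proc(err: regex.Error) -> string {
	switch e in err {
	case regex.Parser_Error:   return fmt.tprintf("parser error: %v", e)
	case regex.Compiler_Error: return fmt.tprintf("compiler error: %v", e)
	case regex.Creation_Error: return fmt.tprintf("creation error: %v", e)
	}
	return "no error"
}

main :: proc() {
	// Flags are passed as a bit_set; the default is the empty set `{}`.
	// Assumption: `.Case_Insensitive` is a valid member of regex.Flag.
	rex, err := regex.create("^hellope$", {.Case_Insensitive})
	if err != nil {
		fmt.eprintln("could not compile:", describe(err))
		return
	}
	// The compiled value owns memory from the permanent allocator.
	defer regex.destroy_regex(rex)

	fmt.println("compiled successfully")
}
```

The strings built by `describe` come from the temporary allocator, so they are only meant for immediate logging.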

    create_by_user ¶

    create_by_user :: proc(pattern: string, permanent_allocator := context.allocator, temporary_allocator := context.temp_allocator) -> (result: Regular_Expression, err: Error) {…}
     

    Create a regular expression from a delimited string pattern, such as one provided by users of a program or those found in a configuration file.

    They are in the form of:

    [DELIMITER] [regular expression] [DELIMITER] [flags]
    
    

    For example, the following strings are valid:

    /hellope/i
    #hellope#i
    •hellope•i
    つhellopeつi
    
    

    The delimiter is determined by the very first rune in the string. The only restriction is that the delimiter cannot be \, as that rune is used to escape the delimiter if found in the middle of the string.

    All runes after the closing delimiter will be parsed as flags:

    - 'g': Global
    - 'm': Multiline
    - 'i': Case_Insensitive
    - 'x': Ignore_Whitespace
    - 'u': Unicode
    - 'n': No_Capture
    - '-': No_Optimization

    Allocates Using Provided Allocators

    Inputs:
    - pattern: The delimited pattern with optional flags to compile.
    - permanent_allocator: The allocator to use for the final regular expression. (default: context.allocator)
    - temporary_allocator: The allocator to use for the intermediate compilation stages. (default: context.temp_allocator)

    Returns:
    - result: The regular expression.
    - err: An error, if one occurred.
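The delimiter rules above map onto the Creation_Error values. The following sketch shows one way to surface them; `creation_error` is a hypothetical helper, and the inputs are variations on the examples given in this section.

```odin
package main

import "core:fmt"
import "core:text/regex"

// creation_error is a hypothetical helper: it compiles a delimited pattern
// and returns the Creation_Error it produced. It returns .None when the
// pattern compiled, or when it failed with a parser or compiler error.
creation_error :: proc(user_pattern: string) -> regex.Creation_Error {
	rex, err := regex.create_by_user(user_pattern)
	if err == nil {
		regex.destroy_regex(rex)
		return .None
	}
	// Error is a union; pull out the Creation_Error variant if that is
	// what was returned.
	if creation_err, is_creation := err.(regex.Creation_Error); is_creation {
		return creation_err
	}
	return .None
}

main :: proc() {
	fmt.println(creation_error("/hellope/i"))  // well-formed
	fmt.println(creation_error(`\hellope\i`))  // `\` used as the delimiter
	fmt.println(creation_error("/hellope/z"))  // 'z' is not a listed flag
}
```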

    destroy_capture ¶

    destroy_capture :: proc(capture: Capture, allocator := context.allocator) {…}
     

    Free all data allocated by the match_and_allocate_capture procedure.

    Frees Using Provided Allocator

    Inputs:
    - capture: A Capture.
    - allocator: The allocator to free the capture data with. (default: context.allocator)

    destroy_regex ¶

    destroy_regex :: proc(regex: Regular_Expression, allocator := context.allocator) {…}
     

    Free all data allocated by the create* procedures.

    Frees Using Provided Allocator

    Inputs:
    - regex: A regular expression.
    - allocator: The allocator to free the regular expression data with. (default: context.allocator)

    match_and_allocate_capture ¶

    match_and_allocate_capture :: proc(regex: Regular_Expression, str: string, permanent_allocator := context.allocator, temporary_allocator := context.temp_allocator) -> (capture: Capture, success: bool) {…}
     

    Match a regular expression against a string and allocate the results into the returned capture structure.

    The resulting capture strings will be slices to the string str, not wholly copied strings, so they won't need to be individually deleted.

    Allocates Using Provided Allocators

    Inputs:
    - regex: The regular expression.
    - str: The string to match against.
    - permanent_allocator: The allocator to use for the capture results. (default: context.allocator)
    - temporary_allocator: The allocator to use for the virtual machine. (default: context.temp_allocator)

    Returns:
    - capture: The capture groups found in the string.
    - success: True if the regex matched the string.
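A short sketch of pulling fields out of a match follows. It assumes that `groups[0]` holds the whole match and that the parenthesized groups follow in order, which this page does not state explicitly. `parse_year_month` is a hypothetical helper.

```odin
package main

import "core:fmt"
import "core:text/regex"

// parse_year_month is a hypothetical helper that splits text such as
// "2024-09" into its two numeric parts.
parse_year_month :: proc(text: string) -> (year, month: string, ok: bool) {
	rex, err := regex.create(`^(\d+)-(\d+)$`)
	if err != nil {
		return
	}
	defer regex.destroy_regex(rex)

	capture, matched := regex.match_and_allocate_capture(rex, text)
	defer regex.destroy_capture(capture)

	// Assumption: one entry for the whole match plus one per capture group.
	if !matched || len(capture.groups) < 3 {
		return
	}

	// The group strings are slices of `text`, so they stay usable after the
	// Capture itself has been destroyed, for as long as `text` is alive.
	year  = capture.groups[1]
	month = capture.groups[2]
	ok    = true
	return
}

main :: proc() {
	year, month, ok := parse_year_month("2024-09")
	fmt.println(year, month, ok)
}
```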

    match_with_preallocated_capture ¶

    match_with_preallocated_capture :: proc(regex: Regular_Expression, str: string, capture: ^Capture, temporary_allocator := context.temp_allocator) -> (num_groups: int, success: bool) {…}
     

    Match a regular expression against a string and save the capture results into the provided capture structure.

    The resulting capture strings will be slices to the string str, not wholly copied strings, so they won't need to be individually deleted.

    Allocates Using Provided Allocator

    Inputs:
    - regex: The regular expression.
    - str: The string to match against.
    - capture: A pointer to a Capture structure with groups and pos already allocated.
    - temporary_allocator: The allocator to use for the virtual machine. (default: context.temp_allocator)

    Returns:
    - num_groups: The number of capture groups set into capture.
    - success: True if the regex matched the string.

    preallocate_capture ¶

    preallocate_capture :: proc(allocator := context.allocator) -> (result: Capture) {…}
     

    Allocate a Capture in advance for use with match. This can save some time if you plan on performing several matches at once and only need the results between matches.

    Inputs:
    - allocator: The allocator to use for the capture data. (default: context.allocator)

    Returns:
    - result: The Capture with the maximum number of groups allocated.
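The reuse pattern described here can be sketched as follows: a single Capture is allocated up front and handed to match_with_preallocated_capture for each input. `count_word_lines` and its pattern are hypothetical and chosen only for illustration.

```odin
package main

import "core:fmt"
import "core:text/regex"

// count_word_lines is a hypothetical helper: it counts the lines made up of
// a single run of word characters, reusing one Capture for every match.
count_word_lines :: proc(lines: []string) -> int {
	rex, err := regex.create(`^\w+$`)
	if err != nil {
		return 0
	}
	defer regex.destroy_regex(rex)

	// Allocate the capture storage once, outside the loop.
	capture := regex.preallocate_capture()
	defer regex.destroy_capture(capture)

	count := 0
	for line in lines {
		num_groups, matched := regex.match_with_preallocated_capture(rex, line, &capture)
		if matched {
			count += 1
			// Only the first num_groups entries were set by this match;
			// later entries may hold stale data from earlier matches.
			fmt.println(capture.groups[:num_groups])
		}
	}
	return count
}

main :: proc() {
	lines := []string{"hellope", "two words", "odin_2024"}
	fmt.println(count_word_lines(lines))
}
```

Because the Capture is overwritten by each call, any group that must outlive the next match should be copied or consumed before the loop continues.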

    Procedure Groups

    Source Files

    Generation Information

    Generated with odin version dev-2024-09 (vendor "odin") Windows_amd64 @ 2024-09-15 21:11:17.909656200 +0000 UTC