package core:text/scanner

    Overview

    package text/scanner provides a scanner and tokenizer for UTF-8-encoded text. It takes a string providing the source, which can then be tokenized through repeated calls to the scan procedure. For compatibility with existing tooling and languages, the NUL character is not allowed. If a UTF-8-encoded byte order mark (BOM) is the first character in the source, it is discarded.

    By default, a Scanner skips white space and Odin comments and recognizes all literals defined by the Odin programming language specification. A Scanner may be customized to recognize only a subset of those literals and to recognize different identifiers and white space characters.
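    The basic loop described above can be sketched as follows. This is a minimal illustration rather than an official example: collect_tokens is a hypothetical helper name, and it relies only on init, scan, token_text and EOF as documented on this page.

```odin
package main

import "core:fmt"
import "core:text/scanner"

// collect_tokens is a hypothetical helper: it scans src with the default
// settings and returns the text of every token it finds.
collect_tokens :: proc(src: string) -> [dynamic]string {
	s: scanner.Scanner
	scanner.init(&s, src)

	tokens: [dynamic]string
	// scan returns EOF once the end of the source is reached
	for tok := scanner.scan(&s); tok != scanner.EOF; tok = scanner.scan(&s) {
		append(&tokens, scanner.token_text(&s))
	}
	return tokens
}

main :: proc() {
	tokens := collect_tokens("x := 3.14 // comments are skipped by default")
	defer delete(tokens)

	for text in tokens {
		fmt.println(text)
	}
}
```

    The returned strings are only meant to be used while the source string is alive; copy them if they need to outlive it.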

    Types

    Position ¶

    Position :: struct {
    	filename: string,
    	// filename, if present
    	offset:   int,
    	// byte offset, starting @ 0
    	line:     int,
    	// line number, starting @ 1
    	column:   int,
    }
     

    Position represents a source position. A position is valid if line > 0.

    Scan_Flag ¶

    Scan_Flag :: enum u32 {
    	Scan_Idents, 
    	Scan_Ints, 
    	Scan_C_Int_Prefixes, 
    	Scan_Floats,         // Includes integers and hexadecimal floats
    	Scan_Chars, 
    	Scan_Strings, 
    	Scan_Raw_Strings, 
    	Scan_Comments, 
    	Skip_Comments,       // if set with .Scan_Comments, comments become white space
    }

    Scan_Flags ¶

    Scan_Flags :: distinct bit_set[Scan_Flag; u32]
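    As a sketch of how the flags can be adjusted, the following removes .Skip_Comments so that comments are returned as Comment tokens instead of being treated as white space. count_comments is a hypothetical helper name. The flags are changed after init because init sets them to Odin_Like_Tokens.

```odin
package main

import "core:fmt"
import "core:text/scanner"

// count_comments is a hypothetical helper that counts Comment tokens in src.
count_comments :: proc(src: string) -> int {
	s: scanner.Scanner
	scanner.init(&s, src)

	// Keep .Scan_Comments but drop .Skip_Comments, so comments are
	// returned as tokens rather than skipped.
	s.flags -= {.Skip_Comments}

	n := 0
	for tok := scanner.scan(&s); tok != scanner.EOF; tok = scanner.scan(&s) {
		if tok == scanner.Comment {
			n += 1
		}
	}
	return n
}

main :: proc() {
	fmt.println(count_comments("a // one\nb /* two */ c"))
}
```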

    Scanner ¶

    Scanner :: struct {
    	src:           string,
    	src_pos:       int,
    	src_end:       int,
    	tok_pos:       int,
    	tok_end:       int,
    	ch:            rune,
    	line:          int,
    	column:        int,
    	prev_line_len: int,
    	prev_char_len: int,
    	// error is called for each error encountered
    	// If no error procedure is set, the error is reported to os.stderr
    	error:         proc(s: ^Scanner, msg: string),
    	// error_count is incremented by one for each error encountered
    	error_count:   int,
    	// flags controls which tokens are recognized
    	// e.g. to recognize integers, set the .Scan_Ints flag
    	// This field may be changed by the user at any time during scanning
    	flags:         Scan_Flags,
    	// The whitespace field controls which characters are recognized as white space
    	// This field may be changed by the user at any time during scanning
    	whitespace:    Whitespace,
    	// is_ident_rune is a predicate controlling the characters accepted as the ith rune in an identifier
    	// The valid characters must not conflict with the set of white space characters
    	// If is_ident_rune is not set, regular Odin-like identifiers are accepted
    	// This field may be changed by the user at any time during scanning
    	is_ident_rune: proc(ch: rune, i: int) -> bool,
    	// Start position of most recently scanned token (set by scan(s))
    	// Calling init or next invalidates the position
    	pos:           Position,
    }
     

    Scanner allows for the reading of Unicode characters and tokens from a string
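    The is_ident_rune field can be illustrated with a small sketch. The predicate below is hypothetical: it accepts '-' after the first rune, so that names such as font-size scan as a single identifier. It is assigned after init, on the assumption that init may reset the Scanner's fields.

```odin
package main

import "core:fmt"
import "core:text/scanner"
import "core:unicode"

// kebab_ident_rune is a hypothetical predicate: letters and '_' may appear
// anywhere, while digits and '-' are only accepted after the first rune.
kebab_ident_rune :: proc(ch: rune, i: int) -> bool {
	if ch == '_' || unicode.is_letter(ch) {
		return true
	}
	return i > 0 && (ch == '-' || unicode.is_digit(ch))
}

// first_ident returns the text of the first token if it is an identifier.
first_ident :: proc(src: string) -> (text: string, ok: bool) {
	s: scanner.Scanner
	scanner.init(&s, src)
	s.is_ident_rune = kebab_ident_rune

	if scanner.scan(&s) != scanner.Ident {
		return "", false
	}
	return scanner.token_text(&s), true
}

main :: proc() {
	text, ok := first_ident("font-size: 12")
	fmt.println(text, ok)
}
```

    None of the accepted characters is a white space character, which respects the constraint stated in the field's comment.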

    Whitespace ¶

    Whitespace :: distinct bit_set[rune; u128]
     

    Only allows for ASCII whitespace
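    A common reason to change the whitespace field is to make line breaks visible to the caller. The sketch below leaves '\n' out of the set, so scan returns it as an ordinary character token. count_newlines is a hypothetical helper name, and the field is assigned after init because init sets it to Odin_Whitespace.

```odin
package main

import "core:fmt"
import "core:text/scanner"

// count_newlines is a hypothetical helper that counts '\n' tokens in src.
count_newlines :: proc(src: string) -> int {
	s: scanner.Scanner
	scanner.init(&s, src)

	// Same as Odin_Whitespace, but without '\n'.
	s.whitespace = scanner.Whitespace{'\t', '\r', ' '}

	n := 0
	for tok := scanner.scan(&s); tok != scanner.EOF; tok = scanner.scan(&s) {
		if tok == '\n' {
			n += 1
		}
	}
	return n
}

main :: proc() {
	fmt.println(count_newlines("a\nb\nc"))
}
```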

    Constants

    C_Like_Tokens ¶

    C_Like_Tokens :: Scan_Flags{.Scan_Idents, .Scan_Ints, .Scan_C_Int_Prefixes, .Scan_Floats, .Scan_Chars, .Scan_Strings, .Scan_Raw_Strings, .Scan_Comments, .Skip_Comments}

    C_Whitespace ¶

    C_Whitespace :: Whitespace{'\t', '\n', '\r', '\v', '\f', ' '}

    Char ¶

    Char :: -5

    Comment ¶

    Comment :: -8

    EOF ¶

    EOF :: -1

    Float ¶

    Float :: -4

    Ident ¶

    Ident :: -2

    Int ¶

    Int :: -3

    Odin_Like_Tokens ¶

    Odin_Like_Tokens :: Scan_Flags{.Scan_Idents, .Scan_Ints, .Scan_Floats, .Scan_Chars, .Scan_Strings, .Scan_Raw_Strings, .Scan_Comments, .Skip_Comments}

    Odin_Whitespace ¶

    Odin_Whitespace :: Whitespace{'\t', '\n', '\r', ' '}
     

    Odin_Whitespace is the default value for the Scanner's whitespace field

    Raw_String ¶

    Raw_String :: -7

    String ¶

    String :: -6

    Variables

    This section is empty.

    Procedures

    error ¶

    error :: proc(s: ^Scanner, msg: string) {…}

    errorf ¶

    errorf :: proc(s: ^Scanner, format: string, .. args: ..any) {…}

    init ¶

    init :: proc(s: ^Scanner, src: string, filename: string = "") -> ^Scanner {…}
     

    init initializes a scanner with a new source and returns the scanner. error_count is set to 0, flags is set to Odin_Like_Tokens, and whitespace is set to Odin_Whitespace.

    next ¶

    next :: proc(s: ^Scanner) -> rune {…}
     

    next reads and returns the next Unicode character. It returns EOF at the end of the source. next does not update the Scanner's pos field. Use position(s) to get the current position.

    peek ¶

    peek :: proc(s: ^Scanner, n: int = 0) -> (ch: rune) {…}
     

    peek returns the next Unicode character in the source without advancing the scanner. It returns EOF if the scanner's position is at or past the last character of the source. If n > 0, it calls next n times, returns the nth Unicode character, and then restores the Scanner's state.

    peek_token ¶

    peek_token :: proc(s: ^Scanner, n: int = 0) -> (tok: rune) {…}
     

    peek_token returns the next token in the source without advancing the scanner. It returns EOF if the scanner's position is at or past the last character of the source. If n > 0, it advances n times, returns the nth token, and then restores the Scanner's state.
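    The lookahead behaviour of peek and peek_token can be sketched as follows. The helper names are hypothetical. The example only relies on the documented guarantee that both procedures leave the Scanner's state unchanged.

```odin
package main

import "core:fmt"
import "core:text/scanner"

// lookahead_chars is a hypothetical helper: it peeks at the next character
// and at the character one step further on, without consuming either.
lookahead_chars :: proc(src: string) -> (first, second: rune) {
	s: scanner.Scanner
	scanner.init(&s, src)
	first  = scanner.peek(&s)
	second = scanner.peek(&s, 1)
	return
}

// peek_then_scan is a hypothetical helper: because peek_token restores the
// scanner's state, the following scan sees the same token again.
peek_then_scan :: proc(src: string) -> (peeked, scanned: rune) {
	s: scanner.Scanner
	scanner.init(&s, src)
	peeked  = scanner.peek_token(&s)
	scanned = scanner.scan(&s)
	return
}

main :: proc() {
	first, second := lookahead_chars("ab")
	fmt.println(first, second)

	peeked, scanned := peek_then_scan("foo bar")
	fmt.println(peeked == scanned)
}
```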

    position ¶

    position :: proc(s: ^Scanner) -> Position {…}
     

    position returns the position of the character immediately after the character or token returned by the previous call to next or scan. Use the Scanner's pos field for the position of the most recently scanned token.

    position_is_valid ¶

    position_is_valid :: proc(pos: Position) -> bool {…}
     

    position_is_valid reports whether the position is valid

    position_to_string ¶

    position_to_string :: proc(pos: Position, allocator := context.temp_allocator) -> string {…}
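    The position procedures can be combined as in the following sketch, which records where each token starts. last_token_position is a hypothetical helper name, and "example.txt" is a placeholder filename. The pos field is copied inside the loop, on the assumption that a later call to scan (including the one that returns EOF) may overwrite it.

```odin
package main

import "core:fmt"
import "core:text/scanner"

// last_token_position is a hypothetical helper that returns the start
// position of the last token in src.
last_token_position :: proc(src: string) -> scanner.Position {
	s: scanner.Scanner
	scanner.init(&s, src, "example.txt")

	last: scanner.Position
	for tok := scanner.scan(&s); tok != scanner.EOF; tok = scanner.scan(&s) {
		last = s.pos // start position of the token just scanned
	}
	return last
}

main :: proc() {
	pos := last_token_position("a\n  b")
	if scanner.position_is_valid(pos) {
		// position_to_string allocates with context.temp_allocator by default
		fmt.println(scanner.position_to_string(pos))
	}
}
```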

    scan ¶

    scan :: proc(s: ^Scanner) -> (tok: rune) {…}
     

    scan reads the next token or Unicode character from the source and returns it. It only recognizes tokens for which the respective flag is set. It returns EOF at the end of the source. It reports Scanner errors by calling s.error, if not nil; otherwise it prints the error message to os.stderr.
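    Error reporting can be redirected by setting the Scanner's error field. The sketch below uses a hypothetical handler, report_error, that prints to standard output instead of os.stderr, and reads error_count once scanning is done. The handler is assigned after init, on the assumption that init may reset the field.

```odin
package main

import "core:fmt"
import "core:text/scanner"

// report_error is a hypothetical handler matching the type of the
// Scanner's error field.
report_error :: proc(s: ^scanner.Scanner, msg: string) {
	fmt.println("scan error:", msg)
}

// count_errors is a hypothetical helper: it scans all of src and returns
// the number of errors the scanner reported.
count_errors :: proc(src: string) -> int {
	s: scanner.Scanner
	scanner.init(&s, src)
	s.error = report_error

	for scanner.scan(&s) != scanner.EOF {
		// tokens are ignored here; only the error count matters
	}
	return s.error_count
}

main :: proc() {
	// The string literal in the source is never closed.
	fmt.println(count_errors("name := \"unterminated"))
}
```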

    token_string ¶

    token_string :: proc(tok: rune, allocator: runtime.Allocator) -> string {…}
     

    token_string returns a printable string for a token or Unicode character. By default, it uses the context.temp_allocator to produce the string.
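    token_string is mainly useful for diagnostics, as in this sketch that prints each token's kind next to its text. first_token_name is a hypothetical helper name. The allocator is passed explicitly because the signature shown above lists no default value.

```odin
package main

import "core:fmt"
import "core:text/scanner"

// first_token_name is a hypothetical helper that returns the printable
// name of the first token in src.
first_token_name :: proc(src: string) -> string {
	s: scanner.Scanner
	scanner.init(&s, src)
	tok := scanner.scan(&s)
	return scanner.token_string(tok, context.temp_allocator)
}

main :: proc() {
	s: scanner.Scanner
	scanner.init(&s, "total += 42")

	for tok := scanner.scan(&s); tok != scanner.EOF; tok = scanner.scan(&s) {
		kind := scanner.token_string(tok, context.temp_allocator)
		fmt.println(kind, scanner.token_text(&s))
	}

	fmt.println(first_token_name("foo"))
}
```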

    token_text ¶

    token_text :: proc(s: ^Scanner) -> string {…}
     

    token_text returns the string of the most recently scanned token

    Procedure Groups

    This section is empty.

    Source Files

    Generation Information

    Generated with odin version dev-2025-01 (vendor "odin") Windows_amd64 @ 2025-01-20 21:11:04.548876800 +0000 UTC