package core:text/scanner
Overview
package text/scanner provides a scanner and tokenizer for UTF-8-encoded text. It takes a string as the source, which can then be tokenized through repeated calls to the scan procedure. For compatibility with existing tooling and languages, the NUL character is not allowed. If a UTF-8-encoded byte order mark (BOM) is the first character in the source, it is discarded.
By default, a Scanner skips white space and Odin comments and recognizes all literals defined by the Odin programming language specification. A Scanner may be customized to recognize only a subset of those literals and to recognize different identifiers and white space characters.
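As a minimal sketch of the default behaviour described above, the loop below initializes a Scanner and calls scan until it returns EOF. The helper name collect_tokens is hypothetical, and the example assumes that init accepts the scanner pointer followed by the source string and that the EOF constant is exported by the package; neither is shown in this excerpt.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

// collect_tokens (hypothetical helper) scans src with the default settings
// and returns the text of every token. The strings are slices of src.
collect_tokens :: proc(src: string, allocator := context.allocator) -> [dynamic]string {
	s: scanner.Scanner
	scanner.init(&s, src) // assumed signature: init(s, src)

	tokens := make([dynamic]string, allocator)
	for tok := scanner.scan(&s); tok != scanner.EOF; tok = scanner.scan(&s) {
		append(&tokens, scanner.token_text(&s))
	}
	return tokens
}

main :: proc() {
	// White space and the trailing comment are skipped by default.
	tokens := collect_tokens("x := 42 + y // trailing comment")
	defer delete(tokens)

	for text in tokens {
		fmt.println(text)
	}
}
```

Characters that do not start a recognized literal or identifier, such as `:` and `+`, come back as single-character tokens.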
Types
Position
Position :: struct {
	filename: string, // filename, if present
	offset:   int,    // byte offset, starting @ 0
	line:     int,    // line number, starting @ 1
	column:   int,
}
Position represents a source position. A position is valid if line > 0.
Scan_Flag
Scan_Flag :: enum u32 {
	Scan_Idents,
	Scan_Ints,
	Scan_C_Int_Prefixes,
	Scan_Floats, // Includes integers and hexadecimal floats
	Scan_Chars,
	Scan_Strings,
	Scan_Raw_Strings,
	Scan_Comments,
	Skip_Comments, // if set with .Scan_Comments, comments become white space
}
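The interaction between .Scan_Comments and .Skip_Comments can be illustrated with a short sketch. It assumes that the flags field behaves as an ordinary bit_set, so that `-=` removes a flag, and that a Comment token constant is exported alongside EOF; count_comments is a hypothetical helper.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

// count_comments (hypothetical helper) scans src with comment skipping
// turned off and returns how many comment tokens were produced.
count_comments :: proc(src: string) -> int {
	s: scanner.Scanner
	scanner.init(&s, src)

	// Keep .Scan_Comments but drop .Skip_Comments, so comments are
	// returned as tokens instead of being treated as white space.
	s.flags -= {.Skip_Comments}

	count := 0
	for tok := scanner.scan(&s); tok != scanner.EOF; tok = scanner.scan(&s) {
		if tok == scanner.Comment { // assumed constant, not listed in this excerpt
			count += 1
		}
	}
	return count
}

main :: proc() {
	fmt.println(count_comments("x // note\ny"))
}
```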
Scan_Flags
Scanner
Scanner :: struct {
	src:           string,
	src_pos:       int,
	src_end:       int,
	tok_pos:       int,
	tok_end:       int,
	ch:            rune,
	line:          int,
	column:        int,
	prev_line_len: int,
	prev_char_len: int,

	// error is called for each error encountered
	// If no error procedure is set, the error is reported to os.stderr
	error: proc(s: ^Scanner, msg: string),

	// error_count is incremented by one for each error encountered
	error_count: int,

	// flags controls which tokens are recognized
	// e.g. to recognize integers, set the .Scan_Ints flag
	// This field may be changed by the user at any time during scanning
	flags: Scan_Flags,

	// The whitespace field controls which characters are recognized as white space
	// This field may be changed by the user at any time during scanning
	whitespace: Whitespace,

	// is_ident_rune is a predicate controlling the characters accepted as the ith rune in an identifier
	// The valid characters must not conflict with the set of white space characters
	// If is_ident_rune is not set, regular Odin-like identifiers are accepted
	// This field may be changed by the user at any time during scanning
	is_ident_rune: proc(ch: rune, i: int) -> bool,

	// Start position of most recently scanned token (set by scan(s))
	// Calling init or next invalidates the position
	pos: Position,
}
Scanner allows for the reading of Unicode characters and tokens from a string.
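The is_ident_rune field can be sketched with a predicate that also accepts `-` inside identifiers. The names kebab_ident_rune and first_token_text are hypothetical, and the rule itself is only an illustration; the predicate uses unicode.is_letter and unicode.is_digit from core:unicode.

```odin
package main

import "core:fmt"
import "core:unicode"
import scanner "core:text/scanner"

// kebab_ident_rune (hypothetical) accepts letters and '_' at any index,
// and digits or '-' only after the first rune (i > 0).
kebab_ident_rune :: proc(ch: rune, i: int) -> bool {
	if ch == '_' || unicode.is_letter(ch) {
		return true
	}
	return i > 0 && (ch == '-' || unicode.is_digit(ch))
}

// first_token_text scans one token from src using the custom predicate.
first_token_text :: proc(src: string) -> string {
	s: scanner.Scanner
	scanner.init(&s, src)
	s.is_ident_rune = kebab_ident_rune // set after init, which resets the Scanner

	scanner.scan(&s)
	return scanner.token_text(&s)
}

main :: proc() {
	fmt.println(first_token_text("font-size: 12"))
}
```

Because `-` is not a white space character here, the predicate does not conflict with the whitespace field, as the struct comments require.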
Constants
C_Like_Tokens
C_Like_Tokens :: Scan_Flags{.Scan_Idents, .Scan_Ints, .Scan_C_Int_Prefixes, .Scan_Floats, .Scan_Chars, .Scan_Strings, .Scan_Raw_Strings, .Scan_Comments, .Skip_Comments}
C_Whitespace
C_Whitespace :: Whitespace{'\t', '\n', '\r', '\v', '\f', ' '}
Odin_Like_Tokens
Odin_Like_Tokens :: Scan_Flags{.Scan_Idents, .Scan_Ints, .Scan_Floats, .Scan_Chars, .Scan_Strings, .Scan_Raw_Strings, .Scan_Comments, .Skip_Comments}
Odin_Whitespace
Odin_Whitespace :: Whitespace{'\t', '\n', '\r', ' '}
Odin_Whitespace is the default value for the Scanner's whitespace field.
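To show what changing the whitespace field does, the sketch below uses a set without '\n', so each line break is returned as a token of its own. count_newline_tokens is a hypothetical helper, and the Whitespace literal follows the syntax of the constants above.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

// count_newline_tokens (hypothetical helper) scans src with '\n' removed
// from the white space set and counts the newline tokens returned.
count_newline_tokens :: proc(src: string) -> int {
	s: scanner.Scanner
	scanner.init(&s, src)

	// Same members as Odin_Whitespace, minus '\n'.
	s.whitespace = scanner.Whitespace{'\t', '\r', ' '}

	count := 0
	for tok := scanner.scan(&s); tok != scanner.EOF; tok = scanner.scan(&s) {
		if tok == '\n' {
			count += 1
		}
	}
	return count
}

main :: proc() {
	fmt.println(count_newline_tokens("a\nb\nc"))
}
```

This can be useful for line-oriented formats where the end of a line is significant.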
Raw_String
Raw_String :: -7
Variables
This section is empty.
Procedures
init
init initializes a scanner with a new source and returns the scanner. error_count is set to 0, flags is set to Odin_Like_Tokens, and whitespace is set to Odin_Whitespace.
next
next reads and returns the next Unicode character. It returns EOF at the end of the source. next does not update the Scanner's pos field. Use position(s) to get the current position.
peek
peek returns the next Unicode character in the source without advancing the scanner. It returns EOF if the scanner's position is at or past the last character of the source. If n > 0, it calls next n times, returns the nth Unicode character, and then restores the Scanner's state.
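Character-level reading with peek and next might look like the following sketch. It calls peek with only the scanner argument, assuming that the lookahead count n is optional; count_runes is a hypothetical helper.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

// count_runes (hypothetical helper) walks src one character at a time and
// returns the number of Unicode characters seen before EOF.
count_runes :: proc(src: string) -> int {
	s: scanner.Scanner
	scanner.init(&s, src)

	n := 0
	for scanner.peek(&s) != scanner.EOF {
		scanner.next(&s) // consume the character that peek just reported
		n += 1
	}
	return n
}

main :: proc() {
	s: scanner.Scanner
	scanner.init(&s, "héllo")

	ahead := scanner.peek(&s) // look at the next character, consuming nothing
	got   := scanner.next(&s) // now consume it
	fmt.println(ahead == got)

	fmt.println(count_runes("héllo"))
}
```

The count is in characters, not bytes, since the scanner decodes UTF-8.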
peek_token
peek_token returns the next token in the source without advancing the scanner. It returns EOF if the scanner's position is at or past the last character of the source. If n > 0, it scans ahead n times, returns the nth token, and then restores the Scanner's state.
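One use for token lookahead is deciding how to interpret a token based on what follows it. The sketch below checks whether a source starts with an identifier followed by `:` and `=`, which the default settings return as two single-character tokens. starts_with_declaration is a hypothetical helper, and the Ident constant is assumed to be exported by the package.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

// starts_with_declaration (hypothetical helper) reports whether src begins
// with an identifier followed by ':' and '='.
starts_with_declaration :: proc(src: string) -> bool {
	s: scanner.Scanner
	scanner.init(&s, src)

	if scanner.scan(&s) != scanner.Ident { // assumed constant
		return false
	}
	if scanner.scan(&s) != ':' {
		return false
	}
	// Look at the following token without consuming it.
	return scanner.peek_token(&s) == '='
}

main :: proc() {
	fmt.println(starts_with_declaration("x := 1"))
	fmt.println(starts_with_declaration("x: int"))
}
```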
position
position returns the position of the character immediately after the character or token returned by the previous call to next or scan. Use the Scanner's pos field for the position of the most recently scanned token.
position_is_valid
position_is_valid reports whether the position is valid.
position_to_string
position_to_string :: proc(pos: Position, allocator := context.temp_allocator) -> string {…}
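The position procedures can be combined as in the sketch below, which reads the pos field after each scan for the start of the token and calls position for the location just past it. token_lines is a hypothetical helper and "example.txt" is a made-up filename used only for display; the exact text produced by position_to_string is not specified in this excerpt, so no output is shown.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

// token_lines (hypothetical helper) returns the starting line number of
// every token in src.
token_lines :: proc(src: string, allocator := context.allocator) -> [dynamic]int {
	s: scanner.Scanner
	scanner.init(&s, src)

	lines := make([dynamic]int, allocator)
	for tok := scanner.scan(&s); tok != scanner.EOF; tok = scanner.scan(&s) {
		append(&lines, s.pos.line) // pos is the start of the token just scanned
	}
	return lines
}

main :: proc() {
	s: scanner.Scanner
	scanner.init(&s, "foo bar\nbaz")
	s.pos.filename = "example.txt" // hypothetical name, set after init

	for tok := scanner.scan(&s); tok != scanner.EOF; tok = scanner.scan(&s) {
		start := s.pos                // where the token begins
		after := scanner.position(&s) // just past what has been read
		if scanner.position_is_valid(start) {
			// position_to_string allocates from context.temp_allocator by default.
			fmt.println(scanner.position_to_string(start), scanner.token_text(&s), after.offset)
		}
	}
	free_all(context.temp_allocator)
}
```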
scan
scan reads the next token or Unicode character from the source and returns it. It only recognizes tokens for which the respective flag is set. It returns EOF at the end of the source. It reports scanner errors by calling s.error if it is not nil; otherwise it prints the error message to os.stderr.
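Error reporting can be redirected by setting the error field before scanning, as in this sketch. count_errors is a hypothetical helper; the unterminated string literal is used on the assumption that it is one of the inputs the scanner reports as an error.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

// count_errors (hypothetical helper) scans all of src with a custom error
// procedure installed and returns the scanner's error_count.
count_errors :: proc(src: string) -> int {
	s: scanner.Scanner
	scanner.init(&s, src)

	// Called for each error instead of the default report to os.stderr.
	s.error = proc(sc: ^scanner.Scanner, msg: string) {
		fmt.eprintln("scan error:", msg)
	}

	tokens := 0
	for scanner.scan(&s) != scanner.EOF {
		tokens += 1
	}
	return s.error_count
}

main :: proc() {
	fmt.println(count_errors("x := 1"))
	fmt.println(count_errors("\"unterminated"))
}
```

Since error_count is incremented for every error, checking it after the loop is a simple way to decide whether the input was scanned cleanly.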
token_string
token_string returns a printable string for a token or Unicode character. By default, it uses context.temp_allocator to produce the string.
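A small sketch of token_string used for debugging output follows. first_token_kind is a hypothetical helper, and the exact strings returned for each token kind are not given in this excerpt, so the example prints them rather than relying on a particular spelling.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

// first_token_kind (hypothetical helper) scans one token from src and
// returns its printable name. The result lives in context.temp_allocator.
first_token_kind :: proc(src: string) -> string {
	s: scanner.Scanner
	scanner.init(&s, src)
	tok := scanner.scan(&s)
	return scanner.token_string(tok)
}

main :: proc() {
	s: scanner.Scanner
	scanner.init(&s, `count = 3.14 "text"`)

	// Print the kind and the text of every token, one per line.
	for tok := scanner.scan(&s); tok != scanner.EOF; tok = scanner.scan(&s) {
		fmt.printf("%s\t%s\n", scanner.token_string(tok), scanner.token_text(&s))
	}

	// Release the strings that token_string placed in the temp allocator.
	free_all(context.temp_allocator)
}
```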
token_text
token_text returns the string of the most recently scanned token.
Procedure Groups
This section is empty.
Generation Information
Generated with odin version dev-2024-11 (vendor "odin") Windows_amd64 @ 2024-11-16 21:10:10.884639400 +0000 UTC