package core:text/scanner

Overview

Package text/scanner provides a scanner and tokenizer for UTF-8-encoded text. It takes a string as the source, which can then be tokenized through repeated calls to the scan procedure. For compatibility with existing tooling and languages, the NUL character is not allowed. If a UTF-8 encoded byte order mark (BOM) is the first character in the source, it is discarded.

By default, a Scanner skips white space and Odin comments and recognizes all literals defined by the Odin programming language specification. A Scanner may be customized to recognize only a subset of those literals and to recognize different identifiers and white space characters.
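The scan loop described above can be sketched as follows. This is a minimal illustration rather than canonical usage: dump_tokens, the sample source text, and the file name "example.odin" are hypothetical, and the import alias `scanner` is chosen here for readability.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

// dump_tokens (hypothetical helper) prints every token found in src
// and returns how many tokens were scanned before EOF.
dump_tokens :: proc(src: string, filename := "") -> (count: int) {
	s: scanner.Scanner
	scanner.init(&s, src, filename)

	for tok := scanner.scan(&s); tok != scanner.EOF; tok = scanner.scan(&s) {
		// s.pos holds the start position of the token just returned by scan.
		fmt.println(
			scanner.position_to_string(s.pos),
			scanner.token_string(tok),
			scanner.token_text(&s),
		)
		count += 1
	}
	return
}

main :: proc() {
	// position_to_string and token_string allocate with the temp allocator by default.
	defer free_all(context.temp_allocator)
	dump_tokens("total := price * 2 // in cents\nprint(total)", "example.odin")
}
```

With the default flags the comment is skipped like white space, so only identifiers, literals, and individual characters such as ':' are returned.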

Types

Position ¶

Position :: struct {
	filename: string, // filename, if present
	offset:   int,    // byte offset, starting @ 0
	line:     int,    // line number, starting @ 1
	column:   int,    // column number, starting @ 1 (character count per line)
}
 

Position represents a source position. A position is valid if line > 0.

Scan_Flag ¶

Scan_Flag :: enum u32 {
	Scan_Idents, 
	Scan_Ints, 
	Scan_C_Int_Prefixes, 
	Scan_Floats,         // Includes integers and hexadecimal floats
	Scan_Chars, 
	Scan_Strings, 
	Scan_Raw_Strings, 
	Scan_Comments, 
	Skip_Comments,       // if set with .Scan_Comments, comments become white space
}

Scan_Flags ¶

Scan_Flags :: distinct bit_set[Scan_Flag; u32]

Scanner ¶

Scanner :: struct {
	src:           string,
	src_pos:       int,
	src_end:       int,
	tok_pos:       int,
	tok_end:       int,
	ch:            rune,
	line:          int,
	column:        int,
	prev_line_len: int,
	prev_char_len: int,
	// error is called for each error encountered
	// If no error procedure is set, the error is reported to os.stderr
	error:         proc "odin" (s: ^Scanner, msg: string),
	// error_count is incremented by one for each error encountered
	error_count:   int,
	// flags controls which tokens are recognized
	// e.g. to recognize integers, set the .Scan_Ints flag
	// This field may be changed by the user at any time during scanning
	flags:         Scan_Flags,
	// The whitespace field controls which characters are recognized as white space
	// This field may be changed by the user at any time during scanning
	whitespace:    Whitespace,
	// is_ident_rune is a predicate controlling the characters accepted as the ith rune in an identifier
	// The valid characters must not conflict with the set of white space characters
	// If is_ident_rune is not set, regular Odin-like identifiers are accepted
	// This field may be changed by the user at any time during scanning
	is_ident_rune: proc "odin" (ch: rune, i: int) -> bool,
	// Start position of most recently scanned token (set by scan(s))
	// Calling init or next invalidates the position
	pos:           Position,
}
 

Scanner allows for the reading of Unicode characters and tokens from a string.
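As a sketch of how the customizable fields fit together, the example below sets error, is_ident_rune, and flags after calling init (init resets flags and whitespace, so assigning them earlier would have no effect). The procedures kebab_ident_rune, report, and first_ident are hypothetical names introduced for illustration only.

```odin
package main

import "core:fmt"
import "core:unicode"
import scanner "core:text/scanner"

// kebab_ident_rune (hypothetical) accepts letters and '_' anywhere,
// and additionally '-' and digits after the first rune.
kebab_ident_rune :: proc(ch: rune, i: int) -> bool {
	if ch == '_' || unicode.is_letter(ch) {
		return true
	}
	return i > 0 && (ch == '-' || unicode.is_digit(ch))
}

// report (hypothetical) prefixes each error message with the current position.
report :: proc(s: ^scanner.Scanner, msg: string) {
	fmt.eprintln(scanner.position_to_string(scanner.position(s)), msg)
}

// first_ident (hypothetical) returns the text of the first token of src
// if that token is an identifier under the custom rules.
first_ident :: proc(src: string) -> (text: string, ok: bool) {
	s: scanner.Scanner
	scanner.init(&s, src)

	// Customize only after init.
	s.error = report
	s.is_ident_rune = kebab_ident_rune
	s.flags = scanner.Scan_Flags{.Scan_Idents, .Scan_Ints, .Scan_Floats}

	if scanner.scan(&s) == scanner.Ident {
		return scanner.token_text(&s), true
	}
	return "", false
}

main :: proc() {
	defer free_all(context.temp_allocator)
	text, ok := first_ident("font-size 12")
	fmt.println(text, ok)
}
```

Because the predicate receives the rune index i, it can apply different rules to the first rune and to the rest of the identifier.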

Whitespace ¶

Whitespace :: distinct bit_set[rune; u128]
 

Whitespace is a set of white space characters. It only allows for ASCII whitespace.

Constants

C_Like_Tokens ¶

C_Like_Tokens :: Scan_Flags{.Scan_Idents, .Scan_Ints, .Scan_C_Int_Prefixes, .Scan_Floats, .Scan_Chars, .Scan_Strings, .Scan_Raw_Strings, .Scan_Comments, .Skip_Comments}

C_Whitespace ¶

C_Whitespace :: Whitespace{'\t', '\n', '\r', '\v', '\f', ' '}

Char ¶

Char :: -5

Comment ¶

Comment :: -8

EOF ¶

EOF :: -1

Float ¶

Float :: -4

Ident ¶

Ident :: -2

Int ¶

Int :: -3

Odin_Like_Tokens ¶

Odin_Like_Tokens :: Scan_Flags{.Scan_Idents, .Scan_Ints, .Scan_Floats, .Scan_Chars, .Scan_Strings, .Scan_Raw_Strings, .Scan_Comments, .Skip_Comments}

Odin_Whitespace ¶

Odin_Whitespace :: Whitespace{'\t', '\n', '\r', ' '}
 

Odin_Whitespace is the default value for the Scanner's whitespace field.
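One use of the whitespace field, sketched here under the assumption that a caller wants to see line breaks: removing '\n' from the set makes scan return it as an ordinary character instead of skipping it. The helper count_newline_tokens is a hypothetical name.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

// count_newline_tokens (hypothetical) scans src with '\n' left out of the
// whitespace set and counts how many times scan returns '\n'.
count_newline_tokens :: proc(src: string) -> (count: int) {
	s: scanner.Scanner
	scanner.init(&s, src)

	// Set after init, which resets whitespace to Odin_Whitespace.
	s.whitespace = scanner.Whitespace{'\t', '\r', ' '}

	for tok := scanner.scan(&s); tok != scanner.EOF; tok = scanner.scan(&s) {
		if tok == '\n' {
			count += 1
		}
	}
	return
}

main :: proc() {
	fmt.println(count_newline_tokens("a\nb\nc"))
}
```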

Raw_String ¶

Raw_String :: -7

String ¶

String :: -6

Variables

This section is empty.

Procedures

error ¶

error :: proc "odin" (s: ^Scanner, msg: string) {…}

errorf ¶

errorf :: proc "odin" (s: ^Scanner, format: string, args: ..any) {…}

init ¶

init :: proc "odin" (s: ^Scanner, src: string, filename: string = "") -> ^Scanner {…}
 

init initializes a scanner with a new source and returns the scanner. error_count is set to 0, flags is set to Odin_Like_Tokens, and whitespace is set to Odin_Whitespace.

next ¶

next :: proc "odin" (s: ^Scanner) -> rune {…}
 

next reads and returns the next Unicode character. It returns EOF at the end of the source. next does not update the Scanner's pos field. Use position(s) to get the current position.

peek ¶

peek :: proc "odin" (s: ^Scanner, n: int = 0) -> (ch: rune) {…}
 

peek returns the next Unicode character in the source without advancing the scanner. It returns EOF if the scanner's position is at or past the last character of the source. If n > 0, it calls next n times, returns the nth Unicode character, and then restores the Scanner's state.
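A small sketch of character-level lookahead with peek and next. The helper lookahead and the input "ab" are hypothetical; the point is only that peeking leaves the scanner where it was, so the following next still returns the first character.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

// lookahead (hypothetical) returns the upcoming character and the one
// reached by peeking one step further, without consuming either.
lookahead :: proc(s: ^scanner.Scanner) -> (first, second: rune) {
	first = scanner.peek(s)
	second = scanner.peek(s, 1)
	return
}

main :: proc() {
	s: scanner.Scanner
	scanner.init(&s, "ab")

	first, second := lookahead(&s)
	consumed := scanner.next(&s) // the same character that peek reported first

	fmt.println(first, second, consumed)
}
```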

peek_token ¶

peek_token :: proc "odin" (s: ^Scanner, n: int = 0) -> (tok: rune) {…}
 

peek_token returns the next token in the source. It returns EOF if the scanner's position is at or past the last character of the source. If n > 0, it scans n tokens ahead, returns the nth token, and then restores the Scanner's state.
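Token lookahead is typically used to decide how to treat the token just scanned. The sketch below, with the hypothetical helper starts_with_call, checks whether an identifier is directly followed by '(' without consuming the parenthesis.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

// starts_with_call (hypothetical) reports whether src begins with an
// identifier followed by '('. name is the identifier text when one was scanned.
starts_with_call :: proc(src: string) -> (name: string, ok: bool) {
	s: scanner.Scanner
	scanner.init(&s, src)

	if scanner.scan(&s) != scanner.Ident {
		return "", false
	}
	// Take the text before peeking; it is a slice of the source string.
	name = scanner.token_text(&s)
	ok = scanner.peek_token(&s) == '('
	return
}

main :: proc() {
	fmt.println(starts_with_call("foo(bar)"))
	fmt.println(starts_with_call("foo bar"))
}
```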

position ¶

position :: proc "odin" (s: ^Scanner) -> Position {…}
 

position returns the position of the character immediately after the character or token returned by the previous call to next or scan. Use the Scanner's pos field for the position of the most recently scanned token.

position_is_valid ¶

position_is_valid :: proc "odin" (pos: Position) -> bool {…}
 

position_is_valid reports whether the position is valid.

position_to_string ¶

position_to_string :: proc "odin" (pos: Position, allocator := context.temp_allocator) -> string {…}
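A brief sketch of how position_is_valid and position_to_string might be combined when reporting positions. The helper describe and the placeholder text "<no position>" are hypothetical; the validity rule (line > 0) is the one stated for Position.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

// describe (hypothetical) formats a position, or returns a placeholder
// when the position is not valid (line <= 0).
describe :: proc(pos: scanner.Position) -> string {
	if !scanner.position_is_valid(pos) {
		return "<no position>"
	}
	// Allocates with context.temp_allocator by default.
	return scanner.position_to_string(pos)
}

main :: proc() {
	defer free_all(context.temp_allocator)
	fmt.println(describe(scanner.Position{}))
	fmt.println(describe(scanner.Position{filename = "example.odin", line = 3, column = 7}))
}
```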

scan ¶

scan :: proc "odin" (s: ^Scanner) -> (tok: rune) {…}
 

scan reads the next token or Unicode character from the source and returns it. It only recognizes tokens for which the respective flag is set. It returns EOF at the end of the source. It reports Scanner errors by calling s.error if it is not nil; otherwise it prints the error message to os.stderr.

token_string ¶

token_string :: proc "odin" (tok: rune, allocator := context.temp_allocator) -> string {…}
 

token_string returns a printable string for a token or Unicode character. By default, it uses context.temp_allocator to produce the string.

token_text ¶

token_text :: proc "odin" (s: ^Scanner) -> string {…}
 

token_text returns the string of the most recently scanned token.

Procedure Groups

This section is empty.

Source Files

Generation Information

Generated with odin version dev-2023-03 (vendor "odin") Windows_amd64 @ 2023-03-29 21:09:05.400881700 +0000 UTC