package core:unicode/utf8
Overview
Procedures and constants for encoding and decoding text in the UTF-8
character encoding.
Index
Variables (2)
Procedures (23)
- decode_grapheme_clusters
- decode_grapheme_iterate
- decode_grapheme_iterator_make
- decode_last_rune_in_bytes
- decode_last_rune_in_string
- decode_rune_in_bytes
- decode_rune_in_string
- encode_rune
- full_rune_in_bytes
- full_rune_in_string
- grapheme_count
- rune_at
- rune_at_pos
- rune_count_in_bytes
- rune_count_in_string
- rune_offset
- rune_size
- rune_start
- rune_string_at_pos
- runes_to_string
- string_to_runes
- valid_rune
- valid_string
Procedure Groups (4)
Types
Grapheme ¶
Related Procedures With Returns
- decode_grapheme_clusters
- decode_grapheme_iterate
Grapheme_Cluster_Sequence ¶
Grapheme_Cluster_Sequence :: enum int { None, Indic, Emoji, Regional, }
Grapheme_Iterator ¶
Grapheme_Iterator :: struct {
	str: string,
	curr_offset: int,
	grapheme_count: int, // The number of graphemes in the string
	rune_count: int, // The number of runes in the string
	width: int, // The width of the string in number of monospace cells
	last_rune: rune,
	last_rune_breaks_forward: bool,
	last_width: int,
	last_grapheme_count: int,
	bypass_next_rune: bool,
	regional_indicator_counter: int,
	current_sequence: Grapheme_Cluster_Sequence,
	continue_sequence: bool,
}
Related Procedures With Parameters
- decode_grapheme_iterate
Related Procedures With Returns
- decode_grapheme_iterator_make
Constants
MAX_RUNE ¶
MAX_RUNE :: '\U0010ffff'
RUNE1_MAX ¶
RUNE1_MAX: int : 1 << 7 - 1
RUNE2_MAX ¶
RUNE2_MAX: int : 1 << 11 - 1
RUNE3_MAX ¶
RUNE3_MAX: int : 1 << 16 - 1
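The three RUNE*_MAX constants mark the largest code point that fits in a 1-, 2-, and 3-byte encoding respectively. A minimal sketch, assuming only the rune_size signature listed on this page, that prints the encoded length on each side of those boundaries:

```odin
package main

import "core:fmt"
import "core:unicode/utf8"

main :: proc() {
	// The largest code point of each encoded length, the first code point
	// that needs 4 bytes, and the overall maximum.
	samples := [?]rune{
		rune(utf8.RUNE1_MAX),     // 0x7F
		rune(utf8.RUNE2_MAX),     // 0x7FF
		rune(utf8.RUNE3_MAX),     // 0xFFFF
		rune(utf8.RUNE3_MAX + 1), // 0x10000
		utf8.MAX_RUNE,            // 0x10FFFF
	}
	for r in samples {
		fmt.printf("0x%X -> %d byte(s)\n", int(r), utf8.rune_size(r))
	}
}
```

The constants are typed as int on this page, so the sketch converts them to rune before passing them to rune_size.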
RUNE_BOM ¶
RUNE_BOM: int : 0xfeff
RUNE_EOF ¶
RUNE_EOF :: ~rune(0)
RUNE_ERROR ¶
RUNE_ERROR :: '\ufffd'
RUNE_SELF ¶
RUNE_SELF: int : 0x80
SURROGATE_HIGH_MAX ¶
SURROGATE_HIGH_MAX: int : 0xdbff
A high/leading surrogate is in the range SURROGATE_MIN..SURROGATE_HIGH_MAX. A low/trailing surrogate is in the range SURROGATE_LOW_MIN..SURROGATE_MAX.
SURROGATE_LOW_MIN ¶
SURROGATE_LOW_MIN: int : 0xdc00
SURROGATE_MAX ¶
SURROGATE_MAX: int : 0xdfff
SURROGATE_MIN ¶
SURROGATE_MIN: int : 0xd800
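Surrogate code points are reserved for UTF-16 and are not valid Unicode scalar values, so they cannot be encoded as well-formed UTF-8. A small sketch of a range check built from the surrogate constants; is_surrogate is a hypothetical helper written for this example, not part of the package:

```odin
package main

import "core:fmt"
import "core:unicode/utf8"

// is_surrogate is a hypothetical helper (not provided by core:unicode/utf8).
// It reports whether r lies anywhere in the surrogate block.
is_surrogate :: proc(r: rune) -> bool {
	return rune(utf8.SURROGATE_MIN) <= r && r <= rune(utf8.SURROGATE_MAX)
}

main :: proc() {
	high := rune(utf8.SURROGATE_MIN)    // first high/leading surrogate
	low  := rune(utf8.SURROGATE_LOW_MIN) // first low/trailing surrogate

	fmt.println(is_surrogate(high), utf8.valid_rune(high))
	fmt.println(is_surrogate(low), utf8.valid_rune(low))
	fmt.println(is_surrogate('A'), utf8.valid_rune('A'))
}
```

Comparing the result of the helper with valid_rune shows how the two relate for a given code point.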
ZERO_WIDTH_JOINER ¶
ZERO_WIDTH_JOINER :: unicode.ZERO_WIDTH_JOINER
Variables
accept_ranges ¶
accept_ranges: [5]Accept_Range = …
accept_sizes ¶
accept_sizes: [256]u8 = …
Procedures
decode_grapheme_clusters ¶
decode_grapheme_clusters :: proc(str: string, track_graphemes: bool = true, allocator := context.allocator) -> (graphemes: [dynamic]Grapheme, grapheme_count: int, rune_count: int, width: int) {…}
Decode the individual graphemes in a UTF-8 string.
Allocates Using Provided Allocator
Inputs:
str: The input string.
track_graphemes: Whether or not to allocate and return graphemes
with extra data about each grapheme.
allocator: (default: context.allocator)
Returns:
graphemes: Extra data about each grapheme.
grapheme_count: The number of graphemes in the string.
rune_count: The number of runes in the string.
width: The width of the string in number of monospace cells.
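A hedged usage sketch based only on the signature above. The sample string (an "e" followed by a combining acute accent, then "!") is an arbitrary choice, and the fields of Grapheme are not used because they are not shown on this page:

```odin
package main

import "core:fmt"
import "core:unicode/utf8"

main :: proc() {
	s := "e\u0301!" // 'e' + U+0301 COMBINING ACUTE ACCENT + '!'

	// With track_graphemes left at its default (true), the returned dynamic
	// array is allocated with the provided allocator and should be freed.
	graphemes, grapheme_count, rune_count, width := utf8.decode_grapheme_clusters(s)
	defer delete(graphemes)

	fmt.println("tracked entries:", len(graphemes))
	fmt.println("graphemes:", grapheme_count)
	fmt.println("runes:    ", rune_count)
	fmt.println("width:    ", width)

	// Passing false for track_graphemes asks only for the counts.
	_, n, _, _ := utf8.decode_grapheme_clusters(s, false)
	fmt.println("graphemes (counts only):", n)
}
```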
decode_grapheme_iterate ¶
decode_grapheme_iterate :: proc(it: ^Grapheme_Iterator) -> (text: string, grapheme: Grapheme, ok: bool) {…}
decode_grapheme_iterator_make ¶
decode_grapheme_iterator_make :: proc(str: string) -> (it: Grapheme_Iterator) {…}
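The two iterator procedures can be combined to walk a string one grapheme cluster at a time without allocating. A minimal sketch under two assumptions that this page does not state: that text is the substring of str covering the current cluster, and that ok becomes false once the string is exhausted.

```odin
package main

import "core:fmt"
import "core:unicode/utf8"

main :: proc() {
	it := utf8.decode_grapheme_iterator_make("e\u0301bc")

	for {
		// The Grapheme value is ignored here because its fields are not
		// documented on this page.
		text, _, ok := utf8.decode_grapheme_iterate(&it)
		if !ok {
			break
		}
		fmt.printf("%q (%d bytes)\n", text, len(text))
	}
}
```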
decode_last_rune_in_bytes ¶
Related Procedure Groups
- decode_last_rune
decode_last_rune_in_string ¶
Related Procedure Groups
- decode_last_rune
decode_rune_in_bytes ¶
Related Procedure Groups
- decode_rune
decode_rune_in_string ¶
Related Procedure Groups
- decode_rune
full_rune_in_bytes ¶
full_rune_in_bytes reports whether the bytes in b begin with a full UTF-8 encoding of a rune. An invalid encoding is considered a full rune, since it will convert as an error rune of width 1 (RUNE_ERROR).
Related Procedure Groups
- full_rune
full_rune_in_string ¶
full_rune_in_string reports whether the bytes in s begin with a full UTF-8 encoding of a rune. An invalid encoding is considered a full rune, since it will convert as an error rune of width 1 (RUNE_ERROR).
Related Procedure Groups
- full_rune
grapheme_count ¶
Count the individual graphemes in a UTF-8 string.
Inputs:
str: The input string.
Returns:
graphemes: The number of graphemes in the string.
runes: The number of runes in the string.
width: The width of the string in number of monospace cells.
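A short usage sketch. The signature of grapheme_count is not shown on this page, so the call below assumes a single string parameter and three int results in the order listed under Returns:

```odin
package main

import "core:fmt"
import "core:unicode/utf8"

main :: proc() {
	// Assumed result order, following the documented Returns list:
	// graphemes, runes, width.
	graphemes, runes, width := utf8.grapheme_count("e\u0301!")

	fmt.println("graphemes:", graphemes)
	fmt.println("runes:    ", runes)
	fmt.println("width:    ", width)
}
```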
rune_count_in_bytes ¶
Related Procedure Groups
- rune_count
rune_count_in_string ¶
Related Procedure Groups
- rune_count
rune_offset ¶
Returns the byte position of the rune at rune index pos in s, with an optional starting byte position. Returns -1 if the string ends before that rune is reached.
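A brief sketch of the difference between a rune index and a byte position. The signature is not shown on this page; the calls assume the form rune_offset(s, pos), with the optional start argument left at its default:

```odin
package main

import "core:fmt"
import "core:unicode/utf8"

main :: proc() {
	s := "héllo" // 'é' occupies two bytes, so byte and rune indices diverge after it

	// Rune index 2 is the first 'l'. It is preceded by 'h' (1 byte) and 'é' (2 bytes).
	fmt.println(utf8.rune_offset(s, 2))

	// The string has only 5 runes, so rune index 10 is out of range.
	fmt.println(utf8.rune_offset(s, 10))
}
```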
rune_size ¶
rune_size :: proc "contextless" (r: rune) -> int {…}
runes_to_string ¶
runes_to_string :: proc(runes: []rune, allocator := context.allocator) -> string {…}
string_to_runes ¶
string_to_runes :: proc(s: string, allocator := context.allocator) -> (runes: []rune) {…}
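string_to_runes and runes_to_string both take an allocator, so each result is owned by the caller. A round-trip sketch, assuming the default context.allocator and freeing both results with delete:

```odin
package main

import "core:fmt"
import "core:unicode/utf8"

main :: proc() {
	s := "héllo"

	// Decode into one rune per code point; the slice is heap-allocated.
	runes := utf8.string_to_runes(s)
	defer delete(runes)

	// len(s) counts bytes, len(runes) counts code points.
	fmt.println(len(s), len(runes))

	// Encode the runes back into a newly allocated UTF-8 string.
	back := utf8.runes_to_string(runes)
	defer delete(back)

	fmt.println(back == s)
}
```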
valid_rune ¶
valid_rune :: proc "contextless" (r: rune) -> bool {…}
Procedure Groups
decode_last_rune ¶
decode_last_rune :: proc{ decode_last_rune_in_string, decode_last_rune_in_bytes, }
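A sketch of walking a string backwards with decode_last_rune. The member signatures are not shown on this page; the code assumes the string variant returns the decoded rune and its size in bytes, with a size of at least 1 for any non-empty input:

```odin
package main

import "core:fmt"
import "core:unicode/utf8"

main :: proc() {
	s := "aé世"

	// Peel one rune off the end on each iteration until nothing is left.
	for len(s) > 0 {
		r, size := utf8.decode_last_rune(s)
		fmt.printf("%c (%d byte(s))\n", r, size)
		s = s[:len(s)-size]
	}
}
```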
decode_rune ¶
decode_rune :: proc{ decode_rune_in_string, decode_rune_in_bytes, }
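A sketch of the usual forward decoding loop built on decode_rune. As with the previous group, the member signatures are not shown here; the code assumes the string variant returns the decoded rune and the number of bytes consumed:

```odin
package main

import "core:fmt"
import "core:unicode/utf8"

main :: proc() {
	s := "aé世"

	// Decode the first rune, then advance past the bytes it occupied.
	for len(s) > 0 {
		r, size := utf8.decode_rune(s)
		fmt.printf("%c (%d byte(s))\n", r, size)
		s = s[size:]
	}
}
```

For plain iteration, Odin's built-in `for r in s` loop decodes runes as well; calling decode_rune directly is mainly useful when the byte size of each rune is needed or when the input is a byte slice.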
full_rune ¶
full_rune :: proc{ full_rune_in_bytes, full_rune_in_string, }
full_rune reports whether the bytes in b begin with a full UTF-8 encoding of a rune. An invalid encoding is considered a full rune, since it will convert as an error rune of width 1 (RUNE_ERROR).
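full_rune is useful when bytes arrive in chunks and a multi-byte sequence may be split across reads. A small sketch with hand-picked byte values (0xC3 0xA9 is the encoding of "é"; 0xFF never appears in valid UTF-8), assuming the members return a single bool:

```odin
package main

import "core:fmt"
import "core:unicode/utf8"

main :: proc() {
	partial := []u8{0xC3}       // lead byte of a 2-byte sequence, continuation missing
	whole   := []u8{0xC3, 0xA9} // complete encoding of 'é'
	invalid := []u8{0xFF}       // invalid lead byte, decodes as RUNE_ERROR with width 1

	fmt.println(utf8.full_rune(partial)) // false: more bytes are needed
	fmt.println(utf8.full_rune(whole))   // true
	fmt.println(utf8.full_rune(invalid)) // true: an invalid encoding counts as a full rune
}
```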
rune_count ¶
rune_count :: proc{ rune_count_in_string, rune_count_in_bytes, }
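A short sketch contrasting len, which counts bytes, with rune_count, which counts code points. It assumes the string member takes a string and returns an int:

```odin
package main

import "core:fmt"
import "core:unicode/utf8"

main :: proc() {
	s := "héllo"

	fmt.println(len(s))             // 6: 'é' is encoded as two bytes
	fmt.println(utf8.rune_count(s)) // 5: one count per code point
}
```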
Source Files
Generation Information
Generated with odin version dev-2025-10 (vendor "odin") Windows_amd64 @ 2025-10-18 21:11:23.018431400 +0000 UTC