package core:unicode

Overview

Data and procedures to test properties of Unicode code points.

Index

Types (0)

This section is empty.

Constants (8)

MAX_ASCII
MAX_LATIN1
MAX_RUNE
REPLACEMENT_CHAR
WORD_JOINER
ZERO_WIDTH_JOINER
ZERO_WIDTH_NON_JOINER
ZERO_WIDTH_SPACE

Variables (19)

alpha_ranges
alpha_singlets
char_properties
emoji_extended_pictographic_ranges
grapheme_extend_ranges
hangul_syllable_lv_singlets
hangul_syllable_lvt_ranges
indic_conjunct_break_consonant_ranges
indic_conjunct_break_extend_ranges
nonspacing_mark_ranges
normalized_east_asian_width_ranges
space_ranges
spacing_mark_ranges
to_lower_ranges
to_lower_singlets
to_title_singlets
to_upper_ranges
to_upper_singlets
unicode_spaces

Procedures (41)

binary_search
is_alpha
is_combining
is_control
is_digit
is_emoji_extended_pictographic
is_emoji_modifier
is_enclosing_mark
is_gcb_extend_class
is_gcb_prepend_class
is_grapheme_extend
is_graphic
is_hangul_syllable_leading
is_hangul_syllable_lv
is_hangul_syllable_lvt
is_hangul_syllable_trailing
is_hangul_syllable_vowel
is_indic_conjunct_break_consonant
is_indic_conjunct_break_extend
is_indic_conjunct_break_linker
is_indic_consonant_preceding_repha
is_indic_consonant_prefixed
is_letter
is_lower
is_nonspacing_mark
is_number
is_prepended_concatenation_mark
is_print
is_punct
is_regional_indicator
is_space
is_spacing_mark
is_symbol
is_title
is_upper
is_white_space
normalized_east_asian_width
simple_fold
to_lower
to_title
to_upper

Procedure Groups (0)

This section is empty.

Types

This section is empty.

Constants

WORD_JOINER

WORD_JOINER :: '\u2060'

ZERO_WIDTH_JOINER

ZERO_WIDTH_JOINER :: '\u200D'

ZERO_WIDTH_NON_JOINER

ZERO_WIDTH_NON_JOINER :: '\u200C'

ZERO_WIDTH_SPACE

ZERO_WIDTH_SPACE :: '\u200B'

Variables

alpha_ranges

@(rodata)
alpha_ranges: [304]i32 = …

alpha_singlets

@(rodata)
alpha_singlets: [32]i32 = …

char_properties

@(rodata)
char_properties: [256]u8 = …

emoji_extended_pictographic_ranges

@(rodata)
emoji_extended_pictographic_ranges: [1022]i32 = …

grapheme_extend_ranges

@(rodata)
grapheme_extend_ranges: [752]i32 = …

hangul_syllable_lv_singlets

@(rodata)
hangul_syllable_lv_singlets: [399]i32 = …

hangul_syllable_lvt_ranges

@(rodata)
hangul_syllable_lvt_ranges: [798]i32 = …

indic_conjunct_break_consonant_ranges

@(rodata)
indic_conjunct_break_consonant_ranges: [52]i32 = …

indic_conjunct_break_extend_ranges

@(rodata)
indic_conjunct_break_extend_ranges: [340]i32 = …

nonspacing_mark_ranges

@(rodata)
nonspacing_mark_ranges: [692]i32 = …

normalized_east_asian_width_ranges

@(rodata)
normalized_east_asian_width_ranges: [489]i32 = …

Fullwidth (F) and Wide (W) are counted as 2. Everything else is 1.

Derived from: https://unicode.org/Public/15.1.0/ucd/EastAsianWidth.txt

space_ranges

@(rodata)
space_ranges: [26]i32 = …

spacing_mark_ranges

@(rodata)
spacing_mark_ranges: [364]i32 = …

to_lower_ranges

@(rodata)
to_lower_ranges: [108]i32 = …

to_lower_singlets

@(rodata)
to_lower_singlets: [666]i32 = …

to_title_singlets

@(rodata)
to_title_singlets: [16]i32 = …

to_upper_ranges

@(rodata)
to_upper_ranges: [105]i32 = …

to_upper_singlets

@(rodata)
to_upper_singlets: [680]i32 = …

unicode_spaces

@(rodata)
unicode_spaces: [18]i32 = …

Procedures

binary_search

binary_search :: proc(c: i32, table: []i32, length, stride: int, loc := #caller_location) -> int {…}

is_alpha

is_alpha :: is_letter

is_combining

is_combining :: proc(r: untyped rune) -> bool {…}

is_control

is_control :: proc(r: untyped rune) -> bool {…}

is_digit

is_digit :: proc(r: untyped rune) -> bool {…}

is_emoji_extended_pictographic

is_emoji_extended_pictographic :: proc(r: untyped rune) -> bool {…}

Extended_Pictographic

is_gcb_extend_class

is_gcb_extend_class :: proc(r: untyped rune) -> bool {…}

For grapheme text segmentation, from Unicode TR 29 Rev 43:

Grapheme_Extend = Yes, or
Emoji_Modifier = Yes

This includes:
General_Category = Nonspacing_Mark
General_Category = Enclosing_Mark
U+200C ZERO WIDTH NON-JOINER

plus a few General_Category = Spacing_Mark needed for canonical equivalence.
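As a rough illustration of the rule quoted above, the class can be approximated by combining two predicates listed in this package's index. This is a sketch under assumptions, not the package's actual implementation: `gcb_extend_sketch` is a hypothetical name, and the predicates are assumed to accept an ordinary `rune`.

```odin
package main

import "core:fmt"
import "core:unicode"

// gcb_extend_sketch is a hypothetical helper, not part of core:unicode.
// It mirrors the quoted TR 29 rule: Grapheme_Extend = Yes, or Emoji_Modifier = Yes.
gcb_extend_sketch :: proc(r: rune) -> bool {
	return unicode.is_grapheme_extend(r) || unicode.is_emoji_modifier(r)
}

main :: proc() {
	// U+0301 COMBINING ACUTE ACCENT is a nonspacing mark.
	fmt.println(gcb_extend_sketch('\u0301'))
	// A plain ASCII letter is not expected to be in the class.
	fmt.println(gcb_extend_sketch('a'))
}
```

In real code, calling `unicode.is_gcb_extend_class` directly is preferable; the sketch only makes the composition of the rule visible.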

is_gcb_prepend_class

is_gcb_prepend_class :: proc(r: untyped rune) -> bool {…}

For grapheme text segmentation, from Unicode TR 29 Rev 43:

Indic_Syllabic_Category = Consonant_Preceding_Repha, or
Indic_Syllabic_Category = Consonant_Prefixed, or
Prepended_Concatenation_Mark = Yes
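The three conditions in this rule each correspond to a predicate listed in the package index, so the class can be sketched as their disjunction. The name `gcb_prepend_sketch` is hypothetical, the parameter type is assumed to be a plain `rune`, and the package's own implementation may differ.

```odin
package main

import "core:fmt"
import "core:unicode"

// gcb_prepend_sketch is a hypothetical helper, not part of core:unicode.
// It checks the three properties named in the quoted TR 29 rule.
gcb_prepend_sketch :: proc(r: rune) -> bool {
	return unicode.is_indic_consonant_preceding_repha(r) ||
	       unicode.is_indic_consonant_prefixed(r) ||
	       unicode.is_prepended_concatenation_mark(r)
}

main :: proc() {
	// U+0600 ARABIC NUMBER SIGN has Prepended_Concatenation_Mark = Yes.
	fmt.println(gcb_prepend_sketch('\u0600'))
	fmt.println(gcb_prepend_sketch('a'))
}
```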

is_graphic

is_graphic :: proc(r: untyped rune) -> bool {…}

is_hangul_syllable_leading

is_hangul_syllable_leading :: proc(r: untyped rune) -> bool {…}

Hangul_Syllable_Type=Leading_Jamo

is_hangul_syllable_lv

is_hangul_syllable_lv :: proc(r: untyped rune) -> bool {…}

Hangul_Syllable_Type=LV_Syllable

is_hangul_syllable_lvt

is_hangul_syllable_lvt :: proc(r: untyped rune) -> bool {…}

Hangul_Syllable_Type=LVT_Syllable

is_hangul_syllable_trailing

is_hangul_syllable_trailing :: proc(r: untyped rune) -> bool {…}

Hangul_Syllable_Type=Trailing_Jamo

is_hangul_syllable_vowel

is_hangul_syllable_vowel :: proc(r: untyped rune) -> bool {…}

Hangul_Syllable_Type=Vowel_Jamo

is_indic_conjunct_break_consonant

is_indic_conjunct_break_consonant :: proc(r: untyped rune) -> bool {…}

Indic_Conjunct_Break=Consonant

is_indic_conjunct_break_extend

is_indic_conjunct_break_extend :: proc(r: untyped rune) -> bool {…}

Indic_Conjunct_Break=Extend

is_indic_conjunct_break_linker

is_indic_conjunct_break_linker :: proc(r: untyped rune) -> bool {…}

Indic_Conjunct_Break=Linker

is_indic_consonant_preceding_repha

is_indic_consonant_preceding_repha :: proc(r: untyped rune) -> bool {…}

Indic_Syllabic_Category=Consonant_Preceding_Repha

is_indic_consonant_prefixed

is_indic_consonant_prefixed :: proc(r: untyped rune) -> bool {…}

Indic_Syllabic_Category=Consonant_Prefixed

is_letter

is_letter :: proc(r: untyped rune) -> bool {…}

is_lower

is_lower :: proc(r: untyped rune) -> bool {…}

is_number

is_number :: proc(r: untyped rune) -> bool {…}

is_prepended_concatenation_mark

is_prepended_concatenation_mark :: proc(r: untyped rune) -> bool {…}

Prepended_Concatenation_Mark

is_print

is_print :: proc(r: untyped rune) -> bool {…}

is_punct

is_punct :: proc(r: untyped rune) -> bool {…}

is_regional_indicator

is_regional_indicator :: proc(r: untyped rune) -> bool {…}

Regional_Indicator

is_space

is_space :: proc(r: untyped rune) -> bool {…}

is_symbol

is_symbol :: proc(r: untyped rune) -> bool {…}

is_title

is_title :: proc(r: untyped rune) -> bool {…}

is_upper

is_upper :: proc(r: untyped rune) -> bool {…}

is_white_space

is_white_space :: is_space

normalized_east_asian_width

normalized_east_asian_width :: proc(r: untyped rune) -> int {…}

Return values:

2 if East_Asian_Width=F or W,
0 if non-printable / zero-width,
1 in all other cases.
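One common use of these return values is estimating how many terminal columns a string occupies. The following is a minimal sketch, assuming the procedure accepts an ordinary `rune`; `display_width` is a hypothetical helper and not part of `core:unicode`.

```odin
package main

import "core:fmt"
import "core:unicode"

// display_width is a hypothetical helper that sums the per-rune widths
// reported by normalized_east_asian_width.
display_width :: proc(s: string) -> (width: int) {
	// Iterating a string with `for ... in` yields its runes one at a time.
	for r in s {
		width += unicode.normalized_east_asian_width(r)
	}
	return
}

main :: proc() {
	fmt.println(display_width("abc"))    // three narrow runes
	fmt.println(display_width("日本語")) // three East Asian Wide runes
}
```

Summing per rune is only an approximation: sequences that render as a single glyph, such as emoji joined by `ZERO_WIDTH_JOINER`, would need grapheme segmentation first.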

simple_fold

simple_fold :: proc(r: untyped rune) -> untyped rune {…}

simple_fold iterates over the Unicode code points equivalent under the Unicode-defined simple case folding. Among the code points equivalent to r (including r itself), simple_fold returns the smallest rune > r if one exists, or else the smallest rune >= 0. If r is not a valid Unicode code point, r is returned.

Example:

simple_fold('A')      == 'a'
simple_fold('a')      == 'A'
simple_fold('Z')      == 'z'
simple_fold('z')      == 'Z'
simple_fold('7')      == '7'
simple_fold('k')      == '\u212a' (Kelvin symbol, K)
simple_fold('\u212a') == 'k'
simple_fold(-3)       == -3
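Because repeated calls step through the set of equivalent code points, simple_fold can serve as the basis of a case-insensitive rune comparison. The sketch below assumes the procedure accepts an ordinary `rune`; `equal_fold_rune` and its iteration cap are hypothetical and not part of `core:unicode`.

```odin
package main

import "core:fmt"
import "core:unicode"

// equal_fold_rune is a hypothetical helper that reports whether a and b
// are equivalent under simple case folding.
equal_fold_rune :: proc(a, b: rune) -> bool {
	if a == b {
		return true
	}
	// Step through the code points equivalent to a until b is found or the
	// walk returns to a. The cap of 8 steps is a defensive assumption so
	// the loop always terminates.
	r := unicode.simple_fold(a)
	for _ in 0..<8 {
		if r == b {
			return true
		}
		if r == a {
			break
		}
		r = unicode.simple_fold(r)
	}
	return false
}

main :: proc() {
	fmt.println(equal_fold_rune('A', 'a'))
	fmt.println(equal_fold_rune('A', 'b'))
	fmt.println(equal_fold_rune('7', '7'))
}
```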

to_lower

to_lower :: proc(r: untyped rune) -> untyped rune {…}

to_title

to_title :: proc(r: untyped rune) -> untyped rune {…}

to_upper

to_upper :: proc(r: untyped rune) -> untyped rune {…}

Procedure Groups

This section is empty.

Source Files

Generation Information

Generated with odin version dev-2025-12 (vendor "odin") Windows_amd64 @ 2025-12-30 21:14:50.152017500 +0000 UTC
