package core:unicode
Overview
Data and procedures to test properties of Unicode code points.
Index
Types (1)
- Range
Constants (4)
- WORD_JOINER
- ZERO_WIDTH_JOINER
- ZERO_WIDTH_NON_JOINER
- ZERO_WIDTH_SPACE
Variables (123)
- alpha_ranges
- alpha_singlets
- char_properties
- emoji_extended_pictographic_ranges
- extra_digits_ranges
- extra_digits_ranges16
- extra_digits_ranges32
- extra_digits_singles16
- grapheme_extend_ranges
- hangul_syllable_lv_singlets
- hangul_syllable_lvt_ranges
- indic_conjunct_break_consonant_ranges
- indic_conjunct_break_extend_ranges
- ll_ranges
- ll_ranges16
- ll_ranges32
- ll_singles16
- ll_singles32
- lm_ranges
- lm_ranges16
- lm_ranges32
- lm_singles16
- lm_singles32
- lo_ranges
- lo_ranges16
- lo_ranges32
- lo_singles16
- lo_singles32
- lt_ranges
- lt_ranges16
- lt_singles16
- lu_ranges
- lu_ranges16
- lu_ranges32
- lu_singles16
- lu_singles32
- mc_ranges
- mc_ranges16
- mc_ranges32
- mc_singles16
- mc_singles32
- me_ranges
- me_ranges16
- me_singles16
- mn_ranges
- mn_ranges16
- mn_ranges32
- mn_singles16
- mn_singles32
- nd_ranges
- nd_ranges16
- nd_ranges32
- nl_ranges
- nl_ranges16
- nl_ranges32
- nl_singles16
- nl_singles32
- no_ranges
- no_ranges16
- no_ranges32
- no_singles16
- nonspacing_mark_ranges
- normalized_east_asian_width_ranges
- other_lowercase_ranges
- other_lowercase_ranges16
- other_lowercase_ranges32
- other_lowercase_singles16
- other_lowercase_singles32
- other_uppercase_ranges
- other_uppercase_ranges16
- other_uppercase_ranges32
- pc_ranges
- pc_ranges16
- pc_singles16
- pd_ranges
- pd_ranges16
- pd_singles16
- pd_singles32
- pe_ranges
- pe_ranges16
- pe_singles16
- pf_ranges
- pf_singles16
- pi_ranges
- pi_ranges16
- pi_singles16
- po_ranges
- po_ranges16
- po_ranges32
- po_singles16
- po_singles32
- ps_ranges
- ps_singles16
- sc_ranges
- sc_ranges16
- sc_ranges32
- sc_singles16
- sc_singles32
- sk_ranges
- sk_ranges16
- sk_ranges32
- sk_singles16
- sm_ranges
- sm_ranges16
- sm_ranges32
- sm_singles16
- sm_singles32
- so_ranges
- so_ranges16
- so_ranges32
- so_singles16
- so_singles32
- space_ranges
- spacing_mark_ranges
- to_lower_ranges
- to_lower_singlets
- to_title_singlets
- to_upper_ranges
- to_upper_singlets
- unicode_spaces
- zs_ranges
- zs_ranges16
- zs_singles16
Procedures (43)
- binary_search
- in_range
- is_alpha
- is_combining
- is_control
- is_decimal
- is_digit
- is_emoji_extended_pictographic
- is_emoji_modifier
- is_enclosing_mark
- is_gcb_extend_class
- is_gcb_prepend_class
- is_grapheme_extend
- is_graphic
- is_hangul_syllable_leading
- is_hangul_syllable_lv
- is_hangul_syllable_lvt
- is_hangul_syllable_trailing
- is_hangul_syllable_vowel
- is_indic_conjunct_break_consonant
- is_indic_conjunct_break_extend
- is_indic_conjunct_break_linker
- is_indic_consonant_preceding_repha
- is_indic_consonant_prefixed
- is_letter
- is_lower
- is_nonspacing_mark
- is_number
- is_prepended_concatenation_mark
- is_print
- is_punct
- is_regional_indicator
- is_space
- is_spacing_mark
- is_symbol
- is_title
- is_upper
- is_white_space
- normalized_east_asian_width
- simple_fold
- to_lower
- to_title
- to_upper
Procedure Groups (0)
This section is empty.
Types
Range
Related Procedures With Parameters
- in_range
Constants
WORD_JOINER
WORD_JOINER :: '\u2060'
ZERO_WIDTH_JOINER
ZERO_WIDTH_JOINER :: '\u200D'
ZERO_WIDTH_NON_JOINER
ZERO_WIDTH_NON_JOINER :: '\u200C'
ZERO_WIDTH_SPACE
ZERO_WIDTH_SPACE :: '\u200B'
Variables
alpha_ranges
@(rodata) alpha_ranges: [304]i32 = …
alpha_singlets
@(rodata) alpha_singlets: [32]i32 = …
char_properties
@(rodata) char_properties: [256]u8 = …
emoji_extended_pictographic_ranges
@(rodata) emoji_extended_pictographic_ranges: [1022]i32 = …
extra_digits_ranges
extra_digits_ranges: Range = …
extra_digits_ranges16
@(rodata) extra_digits_ranges16: [22]u16 = …
extra_digits_ranges32
@(rodata) extra_digits_ranges32: [8]i32 = …
extra_digits_singles16
@(rodata) extra_digits_singles16: [5]u16 = …
grapheme_extend_ranges
@(rodata) grapheme_extend_ranges: [752]i32 = …
hangul_syllable_lv_singlets
@(rodata) hangul_syllable_lv_singlets: [399]i32 = …
hangul_syllable_lvt_ranges
@(rodata) hangul_syllable_lvt_ranges: [798]i32 = …
indic_conjunct_break_consonant_ranges
@(rodata) indic_conjunct_break_consonant_ranges: [52]i32 = …
indic_conjunct_break_extend_ranges
@(rodata) indic_conjunct_break_extend_ranges: [340]i32 = …
ll_ranges
ll_ranges: Range = …
ll_ranges16
@(rodata) ll_ranges16: [144]u16 = …
ll_ranges32
@(rodata) ll_ranges32: [82]i32 = …
ll_singles16
@(rodata) ll_singles16: [549]u16 = …
ll_singles32
@(rodata) ll_singles32: [2]i32 = …
lm_ranges
lm_ranges: Range = …
lm_ranges16
@(rodata) lm_ranges16: [42]u16 = …
lm_ranges32
@(rodata) lm_ranges32: [28]i32 = …
lm_singles16
@(rodata) lm_singles16: [36]u16 = …
lm_singles32
@(rodata) lm_singles32: [8]i32 = …
lo_ranges
lo_ranges: Range = …
lo_ranges16
@(rodata) lo_ranges16: [472]u16 = …
lo_ranges32
@(rodata) lo_ranges32: [382]i32 = …
lo_singles16
@(rodata) lo_singles16: [55]u16 = …
lo_singles32
@(rodata) lo_singles32: [60]i32 = …
lt_ranges
lt_ranges: Range = …
lt_ranges16
@(rodata) lt_ranges16: [6]u16 = …
lt_singles16
@(rodata) lt_singles16: [7]u16 = …
lu_ranges
lu_ranges: Range = …
lu_ranges16
@(rodata) lu_ranges16: [120]u16 = …
lu_ranges32
@(rodata) lu_ranges32: [78]i32 = …
lu_singles16
@(rodata) lu_singles16: [552]u16 = …
lu_singles32
@(rodata) lu_singles32: [4]i32 = …
mc_ranges
mc_ranges: Range = …
mc_ranges16
@(rodata) mc_ranges16: [142]u16 = …
mc_ranges32
@(rodata) mc_ranges32: [86]i32 = …
mc_singles16
@(rodata) mc_singles16: [41]u16 = …
mc_singles32
@(rodata) mc_singles32: [38]i32 = …
me_ranges
me_ranges: Range = …
me_ranges16
@(rodata) me_ranges16: [8]u16 = …
me_singles16
@(rodata) me_singles16: [1]u16 = …
mn_ranges
mn_ranges: Range = …
mn_ranges16
@(rodata) mn_ranges16: [264]u16 = …
mn_ranges32
@(rodata) mn_ranges32: [206]i32 = …
mn_singles16
@(rodata) mn_singles16: [81]u16 = …
mn_singles32
@(rodata) mn_singles32: [49]i32 = …
nd_ranges
nd_ranges: Range = …
nd_ranges16
@(rodata) nd_ranges16: [74]u16 = …
nd_ranges32
@(rodata) nd_ranges32: [70]i32 = …
nl_ranges
nl_ranges: Range = …
nl_ranges16
@(rodata) nl_ranges16: [12]u16 = …
nl_ranges32
@(rodata) nl_ranges32: [8]i32 = …
nl_singles16
@(rodata) nl_singles16: [1]u16 = …
nl_singles32
@(rodata) nl_singles32: [2]i32 = …
no_ranges
no_ranges: Range = …
no_ranges16
@(rodata) no_ranges16: [48]u16 = …
no_ranges32
@(rodata) no_ranges32: [86]i32 = …
no_singles16
@(rodata) no_singles16: [5]u16 = …
nonspacing_mark_ranges
@(rodata) nonspacing_mark_ranges: [692]i32 = …
normalized_east_asian_width_ranges
@(rodata) normalized_east_asian_width_ranges: [489]i32 = …
Fullwidth (F) and Wide (W) are counted as 2. Everything else is 1.
Derived from: https://unicode.org/Public/15.1.0/ucd/EastAsianWidth.txt
other_lowercase_ranges
other_lowercase_ranges: Range = …
other_lowercase_ranges16
@(rodata) other_lowercase_ranges16: [26]u16 = …
other_lowercase_ranges32
@(rodata) other_lowercase_ranges32: [8]i32 = …
other_lowercase_singles16
@(rodata) other_lowercase_singles16: [10]u16 = …
other_lowercase_singles32
@(rodata) other_lowercase_singles32: [1]i32 = …
other_uppercase_ranges
other_uppercase_ranges: Range = …
other_uppercase_ranges16
@(rodata) other_uppercase_ranges16: [4]u16 = …
other_uppercase_ranges32
@(rodata) other_uppercase_ranges32: [6]i32 = …
pc_ranges
pc_ranges: Range = …
pc_ranges16
@(rodata) pc_ranges16: [6]u16 = …
pc_singles16
@(rodata) pc_singles16: [3]u16 = …
pd_ranges
pd_ranges: Range = …
pd_ranges16
@(rodata) pd_ranges16: [6]u16 = …
pd_singles16
@(rodata) pd_singles16: [15]u16 = …
pd_singles32
@(rodata) pd_singles32: [2]i32 = …
pe_ranges
pe_ranges: Range = …
pe_ranges16
@(rodata) pe_ranges16: [2]u16 = …
pe_singles16
@(rodata) pe_singles16: [75]u16 = …
pf_ranges
pf_ranges: Range = …
pf_singles16
@(rodata) pf_singles16: [10]u16 = …
pi_ranges
pi_ranges: Range = …
pi_ranges16
@(rodata) pi_ranges16: [2]u16 = …
pi_singles16
@(rodata) pi_singles16: [10]u16 = …
po_ranges
po_ranges: Range = …
po_ranges16
@(rodata) po_ranges16: [168]u16 = …
po_ranges32
@(rodata) po_ranges32: [80]i32 = …
po_singles16
@(rodata) po_singles16: [47]u16 = …
po_singles32
@(rodata) po_singles32: [23]i32 = …
ps_ranges
ps_ranges: Range = …
ps_singles16
@(rodata) ps_singles16: [79]u16 = …
sc_ranges
sc_ranges: Range = …
sc_ranges16
@(rodata) sc_ranges16: [12]u16 = …
sc_ranges32
@(rodata) sc_ranges32: [2]i32 = …
sc_singles16
@(rodata) sc_singles16: [12]u16 = …
sc_singles32
@(rodata) sc_singles32: [2]i32 = …
sk_ranges
sk_ranges: Range = …
sk_ranges16
@(rodata) sk_ranges16: [32]u16 = …
sk_ranges32
@(rodata) sk_ranges32: [2]i32 = …
sk_singles16
@(rodata) sk_singles16: [14]u16 = …
sm_ranges
sm_ranges: Range = …
sm_ranges16
@(rodata) sm_ranges16: [50]u16 = …
sm_ranges32
@(rodata) sm_ranges32: [6]i32 = …
sm_singles16
@(rodata) sm_singles16: [28]u16 = …
sm_singles32
@(rodata) sm_singles32: [11]i32 = …
so_ranges
so_ranges: Range = …
so_ranges16
@(rodata) so_ranges16: [162]u16 = …
so_ranges32
@(rodata) so_ranges32: [130]i32 = …
so_singles16
@(rodata) so_singles16: [35]u16 = …
so_singles32
@(rodata) so_singles32: [12]i32 = …
space_ranges
@(rodata) space_ranges: [26]i32 = …
spacing_mark_ranges
@(rodata) spacing_mark_ranges: [364]i32 = …
to_lower_ranges
@(rodata) to_lower_ranges: [108]i32 = …
to_lower_singlets
@(rodata) to_lower_singlets: [666]i32 = …
to_title_singlets
@(rodata) to_title_singlets: [16]i32 = …
to_upper_ranges
@(rodata) to_upper_ranges: [105]i32 = …
to_upper_singlets
@(rodata) to_upper_singlets: [680]i32 = …
unicode_spaces
@(rodata) unicode_spaces: [18]i32 = …
zs_ranges
zs_ranges: Range = …
zs_ranges16
@(rodata) zs_ranges16: [2]u16 = …
zs_singles16
@(rodata) zs_singles16: [6]u16 = …
Procedures
binary_search
binary_search :: proc(c: $T, table: []$T, length, stride: int, loc := #caller_location) -> int {…}
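binary_search has no description on this page. Its signature takes a key c, a flat table, a row count length and a row width stride. The table sizes above also point to fixed-width rows flattened into a single array: for example, to_lower_ranges has 108 entries and to_upper_ranges has 105, both multiples of 3. The following is a hypothetical sketch of such a strided search under that assumption, not the library's code. strided_search is an illustrative name, and the sketch drops the generic $T parameter and the loc argument. It returns the index of the last row whose first value is <= c, or -1 if there is no such row, and leaves the caller to check the rest of the row.

```odin
package main

import "core:fmt"

// Hypothetical sketch (not core:unicode's code): search a flat table made of
// `length` rows of `stride` values each, keyed on the first value of each row.
// Returns the index of the last row whose key is <= c, or -1 if none is.
strided_search :: proc(c: i32, table: []i32, length, stride: int) -> int {
	n := length // number of rows still in the search window
	t := 0      // index of the first value of the window's first row
	for n > 1 {
		m := n / 2
		p := t + m*stride
		if c >= table[p] {
			t = p
			n = n - m
		} else {
			n = m
		}
	}
	if n != 0 && c >= table[t] {
		return t
	}
	return -1
}

main :: proc() {
	// Three made-up [lo, hi] rows flattened into one array (stride 2).
	table := []i32{10, 20, 30, 40, 50, 60}
	rows := len(table) / 2

	i := strided_search(35, table, rows, 2)
	fmt.println(i, table[i] <= 35 && 35 <= table[i+1]) // 2 true
	j := strided_search(25, table, rows, 2)
	fmt.println(j, table[j] <= 25 && 25 <= table[j+1]) // 0 false
	fmt.println(strided_search(5, table, rows, 2))      // -1
}
```

The search only locates a candidate row. As main shows, the caller still compares c with the row's remaining values (here the hi bound) to decide whether c is actually covered.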
in_range
Reports whether the rune r lies within the given Range.
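The variable list suggests that each Range value (ll_ranges, lm_ranges, and so on) groups up to four sorted tables. The 16-bit ranges and singles cover code points that fit in a u16, and 32-bit tables hold the rest. Range's actual field names are not shown on this page, so Sketch_Range and sketch_in_range below are hypothetical. The sketch makes two assumptions: ranges are stored as flattened [lo, hi] pairs, and the 32-bit tables only hold code points above U+FFFF. It uses linear scans for brevity, where a real implementation would more likely binary-search.

```odin
package main

import "core:fmt"

// Hypothetical stand-in for unicode.Range; the real field names are not
// documented here. Ranges are assumed to be flattened [lo, hi] pairs.
Sketch_Range :: struct {
	singles_16: []u16,
	ranges_16:  []u16,
	singles_32: []i32,
	ranges_32:  []i32,
}

sketch_in_range :: proc(r: rune, rng: Sketch_Range) -> bool {
	c := i32(r)
	if c <= 0xFFFF {
		// Assumption: only the 16-bit tables hold code points up to U+FFFF.
		for s in rng.singles_16 {
			if i32(s) == c { return true }
		}
		for i := 0; i+1 < len(rng.ranges_16); i += 2 {
			if i32(rng.ranges_16[i]) <= c && c <= i32(rng.ranges_16[i+1]) { return true }
		}
		return false
	}
	for s in rng.singles_32 {
		if s == c { return true }
	}
	for i := 0; i+1 < len(rng.ranges_32); i += 2 {
		if rng.ranges_32[i] <= c && c <= rng.ranges_32[i+1] { return true }
	}
	return false
}

main :: proc() {
	// A toy table, not real Unicode data: ASCII digits, U+00B2..U+00B3,
	// U+00B9, and the block U+1D7CE..U+1D7FF.
	toy := Sketch_Range{
		singles_16 = []u16{0x00B9},
		ranges_16  = []u16{0x0030, 0x0039, 0x00B2, 0x00B3},
		ranges_32  = []i32{0x1D7CE, 0x1D7FF},
	}
	fmt.println(sketch_in_range('7', toy))          // true
	fmt.println(sketch_in_range('\u00B9', toy))     // true
	fmt.println(sketch_in_range('\U0001D7D8', toy)) // true
	fmt.println(sketch_in_range('x', toy))          // false
}
```

The table sizes above suggest why the split pays off. For example, ll_singles16 has 549 entries against 2 in ll_singles32, so most entries can be stored in two bytes rather than four.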
is_alpha
is_alpha :: is_letter
An alias for is_letter. Returns true if the rune r has the Unicode general category L (Ll, Lm, Lo, Lt, or Lu), and false in all other cases.
is_decimal
Returns true if the rune r is in the general category Nd (Decimal_Number).
Inputs:
r: The rune to check for membership in the general category Nd.
Returns: true if the rune is in the general category Nd, false otherwise.
is_digit
Determines whether the rune r is a digit, that is, whether its Numeric_Type is Digit or Decimal.
Inputs:
r: The rune to check.
Returns: true if the rune r is a digit, false in all other cases.
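The difference between is_decimal and is_digit shows up on characters such as CIRCLED DIGIT ONE (U+2460). Its Numeric_Type is Digit, but its general category is No rather than Nd. Below is a minimal sketch, assuming both procedures take a rune and return a bool as their descriptions suggest.

```odin
package main

import "core:fmt"
import "core:unicode"

main :: proc() {
	samples := []rune{
		'7',      // DIGIT SEVEN: Nd
		'\u0663', // ARABIC-INDIC DIGIT THREE: Nd
		'\u2460', // CIRCLED DIGIT ONE: No, Numeric_Type=Digit
		'\u2153', // VULGAR FRACTION ONE THIRD: No, Numeric_Type=Numeric
	}
	for r in samples {
		fmt.printf("U+%04X decimal=%v digit=%v\n", r, unicode.is_decimal(r), unicode.is_digit(r))
	}
}
```

Going by the descriptions above, the first two samples should satisfy both predicates. The circled digit should satisfy only is_digit, and the fraction neither.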
is_emoji_extended_pictographic
Returns true if the rune r has the Extended_Pictographic property.
is_enclosing_mark
Returns true if the rune r has General_Category=Enclosing_Mark (Me).
is_gcb_extend_class
For grapheme text segmentation, from Unicode TR 29 Rev 43:
```
Grapheme_Extend = Yes, or
Emoji_Modifier = Yes

This includes:
General_Category = Nonspacing_Mark
General_Category = Enclosing_Mark
U+200C ZERO WIDTH NON-JOINER

plus a few General_Category = Spacing_Mark needed for canonical equivalence.
```
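The following is a quick check of the classes listed above. It uses three runes: COMBINING ACUTE ACCENT (U+0301, a nonspacing mark), the package's ZERO_WIDTH_NON_JOINER constant, and EMOJI MODIFIER FITZPATRICK TYPE-1-2 (U+1F3FB). It compares them with a plain base letter. This sketch assumes is_gcb_extend_class takes a rune and returns a bool, like the other predicates in this package.

```odin
package main

import "core:fmt"
import "core:unicode"

main :: proc() {
	// A base letter, then three runes that should extend a grapheme cluster.
	samples := []rune{'e', '\u0301', unicode.ZERO_WIDTH_NON_JOINER, '\U0001F3FB'}
	for r in samples {
		fmt.printf("U+%04X extend=%v\n", r, unicode.is_gcb_extend_class(r))
	}
}
```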
is_gcb_prepend_class
For grapheme text segmentation, from Unicode TR 29 Rev 43:
```
Indic_Syllabic_Category = Consonant_Preceding_Repha, or
Indic_Syllabic_Category = Consonant_Prefixed, or
Prepended_Concatenation_Mark = Yes
```
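ARABIC NUMBER SIGN (U+0600) has Prepended_Concatenation_Mark=Yes, so by the definition above it falls in the Prepend class, while an ordinary letter does not. Below is a minimal check, assuming both predicates take a rune and return a bool.

```odin
package main

import "core:fmt"
import "core:unicode"

main :: proc() {
	// U+0600 ARABIC NUMBER SIGN, then an ordinary Latin letter.
	samples := []rune{'\u0600', 'a'}
	for r in samples {
		fmt.printf("U+%04X prepend=%v concat_mark=%v\n", r, unicode.is_gcb_prepend_class(r), unicode.is_prepended_concatenation_mark(r))
	}
}
```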
is_hangul_syllable_leading
Returns true if the rune r has Hangul_Syllable_Type=Leading_Jamo.
is_hangul_syllable_lv
Returns true if the rune r has Hangul_Syllable_Type=LV_Syllable.
is_hangul_syllable_lvt
Returns true if the rune r has Hangul_Syllable_Type=LVT_Syllable.
is_hangul_syllable_trailing
Returns true if the rune r has Hangul_Syllable_Type=Trailing_Jamo.
is_hangul_syllable_vowel
Returns true if the rune r has Hangul_Syllable_Type=Vowel_Jamo.
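Precomposed Hangul syllables occupy U+AC00..U+D7A3 (11,172 code points) and are laid out arithmetically. There are 19 leading consonants × 21 vowels, each followed by 28 trailing slots, and slot 0 of every group means "no trailing consonant" (Unicode Standard, section 3.12). That layout is consistent with hangul_syllable_lv_singlets holding 399 entries (11,172 / 28). The sketch below derives the LV/LVT split from that arithmetic and prints it next to the package's predicates. Hangul_Kind and hangul_syllable_kind are hypothetical helpers, not part of core:unicode.

```odin
package main

import "core:fmt"
import "core:unicode"

Hangul_Kind :: enum { Not_Syllable, LV, LVT }

S_BASE  :: 0xAC00
S_COUNT :: 11172 // 19 leading * 21 vowels * 28 trailing slots
T_COUNT :: 28    // trailing slots per LV group; slot 0 = no trailing consonant

// Hypothetical helper: classify a precomposed syllable arithmetically.
hangul_syllable_kind :: proc(r: rune) -> Hangul_Kind {
	s := int(r) - S_BASE
	if s < 0 || s >= S_COUNT {
		return .Not_Syllable
	}
	if s % T_COUNT == 0 {
		return .LV
	}
	return .LVT
}

main :: proc() {
	// U+AC00 (LV), U+AC01 (LVT), U+D7A3 (last syllable, LVT), and 'a'.
	samples := []rune{'\uAC00', '\uAC01', '\uD7A3', 'a'}
	for r in samples {
		kind := hangul_syllable_kind(r)
		lv := unicode.is_hangul_syllable_lv(r)
		lvt := unicode.is_hangul_syllable_lvt(r)
		fmt.printf("U+%04X %v lv=%v lvt=%v\n", r, kind, lv, lvt)
	}
}
```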
is_indic_conjunct_break_consonant
Returns true if the rune r has Indic_Conjunct_Break=Consonant.
is_indic_conjunct_break_extend
Returns true if the rune r has Indic_Conjunct_Break=Extend.
is_indic_conjunct_break_linker
Returns true if the rune r has Indic_Conjunct_Break=Linker.
is_indic_consonant_preceding_repha
Returns true if the rune r has Indic_Syllabic_Category=Consonant_Preceding_Repha.
is_indic_consonant_prefixed
Returns true if the rune r has Indic_Syllabic_Category=Consonant_Prefixed.
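Rule GB9c, added to TR 29 for Unicode 15.1, uses Indic_Conjunct_Break to keep a consonant, a linker such as a virama, and a following consonant in one grapheme cluster. The sketch below classifies the three code points of the Devanagari conjunct KA + VIRAMA + SSA. incb_class is a hypothetical helper built on the predicates above.

```odin
package main

import "core:fmt"
import "core:unicode"

// Hypothetical helper: name the Indic_Conjunct_Break value of r.
incb_class :: proc(r: rune) -> string {
	switch {
	case unicode.is_indic_conjunct_break_consonant(r): return "Consonant"
	case unicode.is_indic_conjunct_break_linker(r):    return "Linker"
	case unicode.is_indic_conjunct_break_extend(r):    return "Extend"
	}
	return "None"
}

main :: proc() {
	// U+0915 DEVANAGARI LETTER KA, U+094D DEVANAGARI SIGN VIRAMA,
	// U+0937 DEVANAGARI LETTER SSA
	for r in "\u0915\u094D\u0937" {
		fmt.printf("U+%04X %s\n", r, incb_class(r))
	}
}
```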
is_letter
Returns true if the rune r is a letter, that is, if it has the Unicode general category L. In practice, this means its general category is one of Ll, Lm, Lo, Lt, or Lu.
Inputs:
r: The rune to check for being a letter.
Returns: true when the rune r is a letter, false in all other cases.
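Because is_letter accepts all five letter categories, it should return true for letters from any script, including modifier letters (Lm) and letters without case (Lo). It should return false for digits and punctuation. Below is a short check, assuming is_letter takes a rune and returns a bool.

```odin
package main

import "core:fmt"
import "core:unicode"

main :: proc() {
	samples := []rune{
		'A',      // Lu
		'\u00E9', // LATIN SMALL LETTER E WITH ACUTE: Ll
		'\u4E2D', // CJK ideograph: Lo
		'\u02B0', // MODIFIER LETTER SMALL H: Lm
		'5',      // Nd, not a letter
		'!',      // Po, not a letter
	}
	for r in samples {
		fmt.printf("U+%04X letter=%v\n", r, unicode.is_letter(r))
	}
}
```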
is_nonspacing_mark
Returns true if the rune r has General_Category=Nonspacing_Mark (Mn).
is_number
Checks whether the rune r is a number, that is, whether it belongs to the general category Nd, Nl, or No.
Inputs:
r: The rune to check.
Returns: true if the rune belongs to the general category Nd, Nl, or No; false in all other cases.
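is_number is the broadest of the numeric predicates described here. Besides decimal digits, it also accepts letter-like numbers (Nl, such as ROMAN NUMERAL TWELVE, U+216B) and other numbers (No, such as VULGAR FRACTION ONE THIRD, U+2153), both of which is_decimal rejects. Below is a minimal sketch, assuming both predicates take a rune and return a bool.

```odin
package main

import "core:fmt"
import "core:unicode"

main :: proc() {
	samples := []rune{
		'7',      // Nd
		'\u216B', // ROMAN NUMERAL TWELVE: Nl
		'\u2153', // VULGAR FRACTION ONE THIRD: No
		'x',      // a letter, not a number
	}
	for r in samples {
		fmt.printf("U+%04X number=%v decimal=%v\n", r, unicode.is_number(r), unicode.is_decimal(r))
	}
}
```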
is_prepended_concatenation_mark
Returns true if the rune r has the Prepended_Concatenation_Mark property.
is_white_space
is_white_space :: is_space
normalized_east_asian_width
Returns:
- 2 if East_Asian_Width is F (Fullwidth) or W (Wide),
- 0 if the rune is non-printable or zero-width,
- 1 in all other cases.
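A common use of these widths is estimating how many terminal columns a string occupies. display_width below is a hypothetical helper that sums normalized_east_asian_width over the runes of a string. It ignores grapheme clustering, so the result is only an approximation for text with combining sequences or emoji. The int conversion is there because this page does not show the procedure's return type.

```odin
package main

import "core:fmt"
import "core:unicode"

// Hypothetical helper: sum the per-rune column widths of s.
display_width :: proc(s: string) -> int {
	total := 0
	for r in s {
		total += int(unicode.normalized_east_asian_width(r))
	}
	return total
}

main :: proc() {
	fmt.println(display_width("ab"))           // 2: narrow letters count 1 each
	fmt.println(display_width("\u4E2D\u6587")) // 4: two wide CJK ideographs
	fmt.println(display_width("\uFF21"))       // 2: FULLWIDTH LATIN CAPITAL LETTER A
}
```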
simple_fold
simple_fold iterates over the Unicode code points that are equivalent under Unicode's simple case folding. Among the code points equivalent to r (including r itself), simple_fold returns the smallest rune > r if one exists, and otherwise the smallest rune >= 0. If r is not a valid Unicode code point, r is returned unchanged.
Example:
simple_fold('A') == 'a'
simple_fold('a') == 'A'
simple_fold('Z') == 'z'
simple_fold('z') == 'Z'
simple_fold('7') == '7'
simple_fold('k') == '\u212a' (Kelvin symbol, K)
simple_fold('\u212a') == 'K'
simple_fold(-3) == -3
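Because simple_fold walks a cycle of case-equivalent runes, calling it repeatedly eventually returns to the starting rune. The sketch below uses that property to test two runes for equality under simple case folding. equal_fold_rune is a hypothetical helper modelled on the loop Go's strings.EqualFold uses; it is not a procedure in core:unicode.

```odin
package main

import "core:fmt"
import "core:unicode"

// Hypothetical helper: true if a and b are equal under simple case folding.
equal_fold_rune :: proc(a, b: rune) -> bool {
	if a == b {
		return true
	}
	// Walk the fold cycle of a until it wraps back around to a.
	for r := unicode.simple_fold(a); r != a; r = unicode.simple_fold(r) {
		if r == b {
			return true
		}
	}
	return false
}

main :: proc() {
	fmt.println(equal_fold_rune('k', 'K'))      // true
	fmt.println(equal_fold_rune('k', '\u212A')) // true: KELVIN SIGN
	fmt.println(equal_fold_rune('k', 'x'))      // false
}
```

For 'k' the cycle is 'k' → U+212A → 'K' → 'k', so all three runes compare equal. A rune with no case variants, such as '7', folds to itself and the loop body never runs.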
Procedure Groups
This section is empty.
Generation Information
Generated with odin version dev-2026-03 (vendor "odin") Windows_amd64 @ 2026-03-22 21:18:17.000998000 +0000 UTC