package core:encoding/entity

    Overview

    Encode and decode runes to/from a Unicode &entity;.

    This package provides procedures that map Unicode runes to and from the following textual encodings:

    SGML/XML/HTML entities, written as &#<decimal>;, &#x<hexadecimal>;, or &<entity name>; (named entities are available only if the lookup tables are compiled in). Reference: https://www.w3.org/2003/entities/2007xml/unicode.xml

    URL encoding and decoding of %hex entities. Reference: https://datatracker.ietf.org/doc/html/rfc3986/#section-2.1

    Types

    Error ¶

    Error :: enum u8 {
    	None                    = 0, 
    	Tokenizer_Is_Nil, 
    	Illegal_NUL_Character, 
    	Illegal_UTF_Encoding, 
    	Illegal_BOM, 
    	CDATA_Not_Terminated, 
    	Comment_Not_Terminated, 
    	Invalid_Entity_Encoding, 
    }
    Related Procedures With Returns

    advance
    decode_xml

    Tokenizer ¶

    Tokenizer :: struct {
    	r:           rune,
    	w:           int,
    	src:         string,
    	offset:      int,
    	read_offset: int,
    }
    Related Procedures With Parameters

    advance

    XML_Decode_Option ¶

    XML_Decode_Option :: enum u8 {
    	// Do not decode & entities. It decodes by default. If given, overrides `Decode_CDATA`.
    	No_Entity_Decode, 
    	// CDATA is unboxed.
    	Unbox_CDATA, 
    	// Unboxed CDATA is decoded as well. Ignored if `.Unbox_CDATA` is not given.
    	Decode_CDATA, 
    	// Comments are stripped.
    	Comment_Strip, 
    	// Normalize whitespace
    	Normalize_Whitespace, 
    }
     

    Default: CDATA and comments are passed through unchanged.

    XML_Decode_Options ¶

    XML_Decode_Options :: bit_set[XML_Decode_Option; u8]
    Related Procedures With Parameters

    decode_xml
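
    The options are combined in a bit_set and passed as the second argument to decode_xml. The following is a minimal sketch rather than an official example: the input string is invented for illustration, and no output is shown because the exact result depends on the decoder's handling of each construct.

```odin
package main

import "core:fmt"
import "core:encoding/entity"

main :: proc() {
	// Illustrative input: a comment, a CDATA section, and a numeric entity.
	input := "<!-- draft --><![CDATA[fish &amp; chips]]> &#38; more"

	// Default: no options set, so CDATA and comments pass through unchanged.
	plain, plain_err := entity.decode_xml(input)
	defer delete(plain)

	// Strip comments, unbox CDATA, and decode entities inside the unboxed CDATA.
	// .Decode_CDATA is ignored unless .Unbox_CDATA is also given.
	opts := entity.XML_Decode_Options{.Comment_Strip, .Unbox_CDATA, .Decode_CDATA}
	full, full_err := entity.decode_xml(input, opts)
	defer delete(full)

	fmt.println(plain_err, plain)
	fmt.println(full_err, full)
}
```

    Because XML_Decode_Options is a bit_set, an empty set (the default) is a valid value, and individual options can be added or removed with the usual set operators.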

    Constants

    CDATA_END ¶

    CDATA_END: string : "]]>"

    CDATA_START ¶

    CDATA_START: string : "<![CDATA["

    COMMENT_END ¶

    COMMENT_END: string : "-->"

    COMMENT_START ¶

    COMMENT_START: string : "<!--"

    MAX_RUNE_CODEPOINT ¶

    MAX_RUNE_CODEPOINT :: int(unicode.MAX_RUNE)

    XML_NAME_TO_RUNE_MAX_LENGTH ¶

    XML_NAME_TO_RUNE_MAX_LENGTH: int : 31

    Length of the longest entity name, for example CounterClockwiseContourIntegral.

    XML_NAME_TO_RUNE_MIN_LENGTH ¶

    XML_NAME_TO_RUNE_MIN_LENGTH: int : 2

    Length of the shortest entity name, for example lt (as in &lt;).

    Variables

    This section is empty.

    Procedures

    advance ¶

    advance :: proc(t: ^Tokenizer) -> (err: Error) {…}

    decode_xml ¶

    decode_xml :: proc(input: string, options: bit_set[XML_Decode_Option; u8] = XML_Decode_Options{}, allocator := context.allocator) -> (decoded: string, err: Error) {…}
     

    Decode a string that may include SGML/XML/HTML entities. The caller has to free the result.
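
    As a rough usage sketch (the input string is made up for illustration), the same character can be written in decimal, hexadecimal, or named form. The named form assumes the lookup tables are compiled in.

```odin
package main

import "core:fmt"
import "core:encoding/entity"

main :: proc() {
	// Decimal, hexadecimal, and named forms of the copyright sign.
	input := "&#169; &#xA9; &copy;"

	decoded, err := entity.decode_xml(input)
	if err != .None {
		fmt.eprintln("decode_xml failed:", err)
		return
	}
	// The result is allocated with context.allocator here; the caller frees it.
	defer delete(decoded)

	fmt.println(decoded)
}
```

    To use a different allocator, pass it as the third argument and free the result with that same allocator.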

    escape_html ¶

    escape_html :: proc(s: string, allocator := context.allocator, loc := #caller_location) -> (output: string, was_allocation: bool) {…}
     

    escape_html escapes special characters, for example '&' becomes '&amp;'. It escapes only five characters: & ' < > and ".
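
    A minimal calling sketch follows. It assumes, from the was_allocation return value, that the output should be freed only when a new string was actually allocated; the sample text is invented.

```odin
package main

import "core:fmt"
import "core:encoding/entity"

main :: proc() {
	text := `Tom & "Jerry" <cartoon>`

	escaped, was_allocation := entity.escape_html(text)
	// Assumption: free the output only if escape_html reports an allocation.
	defer if was_allocation {
		delete(escaped)
	}

	fmt.println(escaped)
}
```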

    named_xml_entity_to_rune ¶

    named_xml_entity_to_rune :: proc(name: string) -> (decoded: [2]rune, rune_count: int, ok: bool) {…}
     

    Input:

    name - a string, like "copy", that describes a user-encoded Unicode entity as used in XML.
    
    

    Returns:

    "decoded"    - The decoded runes if found by name, or all zero otherwise.
    "rune_count" - The number of decoded runes
    "ok"         - true if found, false if not.
    
    

    IMPORTANT: XML processors (including browsers) treat these names as case-sensitive. So do we.
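
    The lookup can be exercised with a short sketch like the one below. The list of names is illustrative only; "lt" and "Lt" are included to probe the case sensitivity noted above, and "no_such_entity" is a deliberately unknown name.

```odin
package main

import "core:fmt"
import "core:encoding/entity"

main :: proc() {
	// Pass the bare name, without the leading '&' or trailing ';'.
	names := []string{"copy", "lt", "Lt", "no_such_entity"}

	for name in names {
		decoded, rune_count, ok := entity.named_xml_entity_to_rune(name)
		if !ok {
			fmt.printf("%s: not found\n", name)
			continue
		}
		// Only the first rune_count elements of decoded are meaningful.
		for r in decoded[:rune_count] {
			fmt.printf("%s: %v (code point %d)\n", name, r, int(r))
		}
	}
}
```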

    unescape_entity ¶

    unescape_entity :: proc(s: string) -> (b: [8]u8, w: int, j: int) {…}
     

    Returns an unescaped string of an encoded XML/HTML entity.

    unescape_html ¶

    unescape_html :: proc(s: string, allocator := context.allocator, loc := #caller_location) -> (output: string, was_allocation: bool, err: runtime.Allocator_Error) {…}

    xml_decode_entity ¶

    xml_decode_entity :: proc(entity: string) -> (decoded: [2]rune, rune_count: int, ok: bool) {…}

    Procedure Groups

    This section is empty.

    Source Files

    Generation Information

    Generated with odin version dev-2026-01 (vendor "odin") Windows_amd64 @ 2026-01-20 21:22:28.751695400 +0000 UTC