package core:encoding/entity


    Overview

    A Unicode entity encoder/decoder.

    This package provides procedures to map Unicode runes to and from different textual encodings:

    SGML/XML/HTML entities: &#<decimal>;, &#x<hexadecimal>; and &<entity name>; (named entities only if the lookup tables are compiled in). Reference: https://www.w3.org/2003/entities/2007xml/unicode.xml

    URL encoding/decoding of %hex entities. Reference: https://datatracker.ietf.org/doc/html/rfc3986/#section-2.1

    Types

    Error ¶

    Error :: enum u8 {
    	None                    = 0, 
    	Tokenizer_Is_Nil, 
    	Illegal_NUL_Character, 
    	Illegal_UTF_Encoding, 
    	Illegal_BOM, 
    	CDATA_Not_Terminated, 
    	Comment_Not_Terminated, 
    	Invalid_Entity_Encoding, 
    }
    Related Procedures With Returns

    advance
    decode_xml

    Tokenizer ¶

    Tokenizer :: struct {
    	r:           rune,
    	w:           int,
    	src:         string,
    	offset:      int,
    	read_offset: int,
    }
    Related Procedures With Parameters

    advance

    XML_Decode_Option ¶

    XML_Decode_Option :: enum u8 {
    	// Do not decode & entities. It decodes by default. If given, overrides `Decode_CDATA`.
    	No_Entity_Decode, 
    	// CDATA is unboxed.
    	Unbox_CDATA, 
    	// Unboxed CDATA is decoded as well. Ignored if `.Unbox_CDATA` is not given.
    	Decode_CDATA, 
    	// Comments are stripped.
    	Comment_Strip, 
    	// Normalize whitespace
    	Normalize_Whitespace, 
    }
     

    Default: CDATA and comments are passed through unchanged.
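
    The options above are combined in an XML_Decode_Options bit set and passed to decode_xml. The following is a minimal sketch, assuming the package is imported as `entity`; the helper name `decode_stripped` and the input string are made up for illustration and are not part of the package.

```odin
package main

import "core:fmt"
import "core:encoding/entity"

// decode_stripped is a hypothetical helper (not part of core:encoding/entity).
// It asks decode_xml to unbox CDATA sections and strip comments, and leaves
// entity decoding at its default (enabled).
decode_stripped :: proc(input: string, allocator := context.allocator) -> (decoded: string, err: entity.Error) {
	options := entity.XML_Decode_Options{.Unbox_CDATA, .Comment_Strip}
	return entity.decode_xml(input, options, allocator)
}

main :: proc() {
	// Illustrative input only.
	input := "a &#38; b <!-- note --> <![CDATA[c]]>"

	decoded, err := decode_stripped(input)
	if err != .None {
		fmt.eprintln("decode_xml failed:", err)
		return
	}
	defer delete(decoded) // The caller owns the returned string.

	fmt.println(decoded)
}
```

    Passing an empty set keeps the documented default, in which CDATA and comments pass through unchanged.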

    XML_Decode_Options ¶

    XML_Decode_Options :: bit_set[XML_Decode_Option; u8]

    Constants

    CDATA_END ¶

    CDATA_END :: "]]>"

    CDATA_START ¶

    CDATA_START :: "<![CDATA["

    COMMENT_END ¶

    COMMENT_END :: "-->"

    COMMENT_START ¶

    COMMENT_START :: "<!--"

    MAX_RUNE_CODEPOINT ¶

    MAX_RUNE_CODEPOINT :: int(unicode.MAX_RUNE)

    XML_NAME_TO_RUNE_MAX_LENGTH ¶

    XML_NAME_TO_RUNE_MAX_LENGTH :: 31
     

    XML_NAME_TO_RUNE_MIN_LENGTH ¶

    XML_NAME_TO_RUNE_MIN_LENGTH :: 2
     


    Variables

    This section is empty.

    Procedures

    advance ¶

    advance :: proc(t: ^Tokenizer) -> (err: Error) {…}

    decode_xml ¶

    decode_xml :: proc(input: string, options: bit_set[XML_Decode_Option; u8] = XML_Decode_Options{}, allocator := context.allocator) -> (decoded: string, err: Error) {…}
     

    Decodes a string that may include SGML/XML/HTML entities. The result is allocated with `allocator`, and the caller has to free it.
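
    As a rough illustration of the ownership rule, the sketch below decodes the same input twice: once with the default allocator, freeing the result with delete, and once with the temporary allocator, releasing everything in bulk. The input string is an assumption chosen for the example, and only numeric entities are used so that it does not depend on the named-entity lookup tables.

```odin
package main

import "core:fmt"
import "core:encoding/entity"

main :: proc() {
	// Illustrative input using decimal and hexadecimal entities.
	input := "1 &#60; 2 and 3 &#x3E; 2"

	// Default options, default allocator: the result must be freed by the caller.
	decoded, err := entity.decode_xml(input)
	if err != .None {
		fmt.eprintln("decode_xml failed:", err)
		return
	}
	defer delete(decoded)
	fmt.println(decoded)

	// Same call, but allocating from the temporary allocator instead.
	// The result is released together with everything else in that allocator.
	defer free_all(context.temp_allocator)
	scratch, scratch_err := entity.decode_xml(input, {}, context.temp_allocator)
	if scratch_err == .None {
		fmt.println(scratch)
	}
}
```

    Checking err against .None before using the result keeps the failure cases listed in Error from being silently ignored.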

    named_xml_entity_to_rune ¶

    named_xml_entity_to_rune :: proc(name: string) -> (decoded: rune, ok: bool) {…}
     

    Input:

    "name" - a string, like "copy", that describes a user-encoded Unicode entity as used in XML.
    
    

    Returns:

    "decoded" - The decoded rune if found by name, or -1 otherwise.
    "ok"      - true if found, false if not.
    
    

    IMPORTANT: XML processors (including browsers) treat these names as case-sensitive. So do we.
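
    To make the case-sensitivity note concrete, here is a small sketch that looks up two names differing only in case, plus one name that should not exist. It assumes the named-entity lookup tables are compiled in and that "Aacute" and "aacute" are both present in them (both are standard entity names); "no_such_entity" is a made-up name.

```odin
package main

import "core:fmt"
import "core:encoding/entity"

main :: proc() {
	// Names are given without the leading '&' and trailing ';'.
	names := []string{"Aacute", "aacute", "no_such_entity"}

	for name in names {
		if r, ok := entity.named_xml_entity_to_rune(name); ok {
			fmt.printf("%s -> %v (U+%04X)\n", name, r, int(r))
		} else {
			// Per the documentation above, the returned rune is -1 here.
			fmt.printf("%s -> not found\n", name)
		}
	}
}
```

    Because lookups are case-sensitive, callers should not lowercase or otherwise normalize entity names before passing them in.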

    xml_decode_entity ¶

    xml_decode_entity :: proc(entity: string) -> (decoded: rune, ok: bool) {…}
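
    This page gives no description for xml_decode_entity. The sketch below assumes, based on the entity forms listed in the overview rather than on anything stated here, that it takes the entity body without the surrounding & and ;, such as "#169", "#xA9" or a name like "copy". Treat that input convention as an assumption to verify against the source.

```odin
package main

import "core:fmt"
import "core:encoding/entity"

main :: proc() {
	// Assumed input convention: entity bodies without '&' and ';'.
	// "#xZZ" is deliberately malformed to exercise the failure path.
	bodies := []string{"#169", "#xA9", "copy", "#xZZ"}

	for body in bodies {
		r, ok := entity.xml_decode_entity(body)
		if ok {
			fmt.printf("&%s; -> %v (U+%04X)\n", body, r, int(r))
		} else {
			fmt.printf("&%s; -> could not be decoded\n", body)
		}
	}
}
```

    The named form only resolves if the lookup tables are compiled in; the numeric forms do not depend on them.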

    Procedure Groups

    This section is empty.

    Generation Information

    Generated with odin version dev-2025-01 (vendor "odin") Windows_amd64 @ 2025-01-20 21:11:03.409785700 +0000 UTC