package core:encoding/entity
Overview
A Unicode entity encoder/decoder.
This package provides procedures to map Unicode runes to and from different textual encodings:
- SGML/XML/HTML entities: &#<decimal>;, &#x<hexadecimal>;, and &<entity name>; (the named form only if the lookup tables are compiled in). Reference: https://www.w3.org/2003/entities/2007xml/unicode.xml
- URL encoding/decoding of %hex entities. Reference: https://datatracker.ietf.org/doc/html/rfc3986/#section-2.1
Index
Types (4)
Variables (0)
This section is empty.
Procedures (4)
Procedure Groups (0)
This section is empty.
Types
Error
Error :: enum u8 {
	None = 0,
	Tokenizer_Is_Nil,
	Illegal_NUL_Character,
	Illegal_UTF_Encoding,
	Illegal_BOM,
	CDATA_Not_Terminated,
	Comment_Not_Terminated,
	Invalid_Entity_Encoding,
}
Related Procedures With Returns
Tokenizer
Related Procedures With Parameters
XML_Decode_Option
XML_Decode_Option :: enum u8 {
	// Do not decode & entities. It decodes by default.
	// If given, overrides `Decode_CDATA`.
	No_Entity_Decode,
	// CDATA is unboxed.
	Unbox_CDATA,
	// Unboxed CDATA is decoded as well.
	// Ignored if `.Unbox_CDATA` is not given.
	Decode_CDATA,
	// Comments are stripped.
	Comment_Strip,
	// Normalize whitespace.
	Normalize_Whitespace,
}
Default: CDATA and comments are passed through unchanged.
XML_Decode_Options
XML_Decode_Options :: bit_set[XML_Decode_Option; u8]
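As a minimal sketch of how these options combine, the snippet below builds an XML_Decode_Options bit set and passes it to decode_xml. The input string is a made-up illustration, and the exact output is not asserted here; the option semantics are taken from the comments on XML_Decode_Option.

```odin
package main

import "core:fmt"
import "core:encoding/entity"

main :: proc() {
	// Hypothetical input mixing a comment, a CDATA section, and an entity.
	input := "<!-- draft --><![CDATA[x &lt; y]]> &amp; more"

	// Unbox CDATA, decode entities inside the unboxed CDATA, and strip comments.
	// .Decode_CDATA is documented as ignored unless .Unbox_CDATA is also given.
	options := entity.XML_Decode_Options{.Unbox_CDATA, .Decode_CDATA, .Comment_Strip}

	decoded, err := entity.decode_xml(input, options)
	if err != .None {
		fmt.eprintln("decode_xml failed:", err)
		return
	}
	// The caller owns the returned string.
	defer delete(decoded)

	fmt.println(decoded)
}
```

Passing an empty set (the default) leaves CDATA and comments untouched, so options only need to be named when that behavior should change.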
Constants
CDATA_END
CDATA_END :: "]]>"
CDATA_START
CDATA_START :: "<![CDATA["
COMMENT_END
COMMENT_END :: "-->"
COMMENT_START
COMMENT_START :: "<!--"
MAX_RUNE_CODEPOINT
MAX_RUNE_CODEPOINT :: int(unicode.MAX_RUNE)
Variables
This section is empty.
Procedures
decode_xml
decode_xml :: proc(input: string, options: bit_set[XML_Decode_Option; u8] = XML_Decode_Options{}, allocator := context.allocator) -> (decoded: string, err: Error) {…}
Decodes a string that may include SGML/XML/HTML entities. The result is allocated with `allocator`, and the caller is responsible for freeing it.
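A short usage sketch, assuming only the signature shown above: it decodes a string containing the decimal, hexadecimal, and named entity forms with the default options, checks the returned Error, and frees the result. The input string is illustrative, and whether the named form resolves depends on the lookup tables being compiled in.

```odin
package main

import "core:fmt"
import "core:encoding/entity"

main :: proc() {
	// Decimal, hexadecimal, and named forms of the same character.
	input := "&#169; &#xA9; &copy; 2024"

	// Default options and context.allocator are used here.
	decoded, err := entity.decode_xml(input)
	if err != .None {
		fmt.eprintln("decode_xml failed:", err)
		return
	}
	// The result was allocated for us, so free it when done.
	defer delete(decoded)

	fmt.println(decoded)
}
```

For short-lived results, a different allocator can be supplied by name, for example `entity.decode_xml(input, allocator = context.temp_allocator)`, in which case the explicit `delete` would be dropped in favor of that allocator's own cleanup.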
named_xml_entity_to_rune
Input:
entity_name - a string, like "copy", that describes a user-encoded Unicode entity as used in XML.
Returns:
decoded - the decoded rune if found by name, or -1 otherwise.
ok - true if found, false if not.
IMPORTANT: XML processors (including browsers) treat these names as case-sensitive. So does this package.
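The lookup can be sketched as follows. This assumes the shape implied by the description above: one string in, and a (rune, bool) pair out. The procedure's actual signature is not shown on this page, so treat that as an assumption. The names "Aacute" and "aacute" are chosen because they differ only in case, and "not_an_entity" is a made-up name that is expected to miss.

```odin
package main

import "core:fmt"
import "core:encoding/entity"

main :: proc() {
	// "Aacute" and "aacute" differ only in case and are looked up as distinct names.
	// "not_an_entity" is a hypothetical name used to exercise the not-found path.
	names := []string{"copy", "Aacute", "aacute", "not_an_entity"}

	for name in names {
		// Assumed return shape: (decoded: rune, ok: bool).
		r, ok := entity.named_xml_entity_to_rune(name)
		if ok {
			fmt.printf("%s -> %c (U+%04X)\n", name, r, int(r))
		} else {
			fmt.printf("%s -> not found\n", name)
		}
	}
}
```

Checking `ok` rather than comparing the rune against -1 keeps the call site independent of the sentinel value.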
Procedure Groups
This section is empty.
Source Files
Generation Information
Generated with odin version dev-2024-12 (vendor "odin") Windows_amd64 @ 2024-12-06 21:12:12.423136600 +0000 UTC