Glib.Unicode

Description

This package provides functions for handling Unicode characters and UTF-8 strings. See also Glib.Convert.

<c_version>2.2.1</c_version> <group>Glib, the general-purpose library</group>

G_Unicode_Type

type G_Unicode_Type is
  (Unicode_Control,
   Unicode_Format,
   Unicode_Unassigned,
   Unicode_Private_Use,
   Unicode_Surrogate,
   Unicode_Lowercase_Letter,
   Unicode_Modifier_Letter,
   Unicode_Other_Letter,
   …,
   Unicode_Space_Separator);

The possible character classifications. See http://www.unicode.org/Public/UNIDATA/UCD.html

Enumeration Literal
Unicode_Control
Unicode_Format
Unicode_Unassigned
Unicode_Private_Use
Unicode_Surrogate
Unicode_Lowercase_Letter
Unicode_Modifier_Letter
Unicode_Other_Letter
Unicode_Titlecase_Letter
Unicode_Uppercase_Letter
Unicode_Combining_Mark
Unicode_Enclosing_Mark
Unicode_Non_Spacing_Mark
Unicode_Decimal_Number
Unicode_Letter_Number
Unicode_Other_Number
Unicode_Connect_Punctuation
Unicode_Dash_Punctuation
Unicode_Close_Punctuation
Unicode_Final_Punctuation
Unicode_Initial_Punctuation
Unicode_Other_Punctuation
Unicode_Open_Punctuation
Unicode_Currency_Symbol
Unicode_Modifier_Symbol
Unicode_Math_Symbol
Unicode_Other_Symbol
Unicode_Line_Separator
Unicode_Paragraph_Separator
Unicode_Space_Separator

Is_Alnum

function Is_Alnum (Char : Gunichar) return Boolean

Return True if Char is an alphabetic or numeric character.

Parameters
Char
Return Value

Is_Alpha

function Is_Alpha (Char : Gunichar) return Boolean

Return True if Char is an alphabetic character.

Parameters
Char
Return Value

Is_Digit

function Is_Digit (Char : Gunichar) return Boolean

Return True if Char is a digit.

Parameters
Char
Return Value

Is_Lower

function Is_Lower (Char : Gunichar) return Boolean

Return True if Char is a lower-case character.

Parameters
Char
Return Value

Is_Punct

function Is_Punct (Char : Gunichar) return Boolean

Return True if Char is a punctuation character.

Parameters
Char
Return Value

Is_Space

function Is_Space (Char : Gunichar) return Boolean

Return True if Char is a space character.

Parameters
Char
Return Value

Is_Upper

function Is_Upper (Char : Gunichar) return Boolean

Return True if Char is an upper-case character.

Parameters
Char
Return Value

To_Lower

function To_Lower (Char : Gunichar) return Gunichar

Convert Char to lower case.

Parameters
Char
Return Value

To_Upper

function To_Upper (Char : Gunichar) return Gunichar

Convert Char to upper case.

Parameters
Char
Return Value
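
The classification and case-conversion functions above can be combined as in the following minimal sketch. It assumes that Gunichar is an integer type declared in package Glib and holding a code point, so that a Latin-1 Character can be converted through Character'Pos; the procedure name Classify_Demo is made up for illustration.

```ada
with Ada.Text_IO;  use Ada.Text_IO;
with Glib;         use Glib;
with Glib.Unicode; use Glib.Unicode;

procedure Classify_Demo is
   --  Assumption: Gunichar is an integer type holding a code point.
   C : constant Gunichar := Gunichar (Character'Pos ('a'));
begin
   if Is_Alpha (C) and then Is_Lower (C) then
      --  Print the code point of the upper-case equivalent.
      Put_Line ("lower-case letter, upper case is code point"
                & Gunichar'Image (To_Upper (C)));
   end if;
end Classify_Demo;
```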

Unichar_To_UTF8

function Unichar_To_UTF8
  (C : Gunichar;
   Buffer : Gtkada.Types.Chars_Ptr := Gtkada.Types.Null_Ptr) return Natural

Encode C into Buffer, which must have at least 6 bytes free. Return the number of bytes written to Buffer. If Buffer is Null_Ptr, the only effect is to compute the number of bytes needed to encode C.

Parameters
C
Buffer
Return Value

Unichar_To_UTF8

procedure Unichar_To_UTF8
  (C      : Gunichar;
   Buffer : out UTF8_String;
   Last   : out Natural)

Encode C into Buffer. Buffer must have at least 6 bytes free. Last is set to the index of the last byte written in Buffer.

Parameters
C
Buffer
Last
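
A minimal sketch of the procedure form, assuming Gunichar and UTF8_String are both visible from package Glib. The buffer is given the 6 bytes the description asks for, and only the slice up to Last is meaningful afterwards. Encode_Demo is a hypothetical name.

```ada
with Ada.Text_IO;  use Ada.Text_IO;
with Glib;         use Glib;
with Glib.Unicode; use Glib.Unicode;

procedure Encode_Demo is
   Euro   : constant Gunichar := 16#20AC#;  --  U+20AC EURO SIGN
   Buffer : UTF8_String (1 .. 6);           --  6 bytes free, as required
   Last   : Natural;
begin
   Unichar_To_UTF8 (Euro, Buffer, Last);

   --  Only Buffer (Buffer'First .. Last) holds the encoded character.
   Put_Line ("encoded length in bytes:"
             & Natural'Image (Last - Buffer'First + 1));
end Encode_Demo;
```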

Unichar_Type

function Unichar_Type (Char : Gunichar) return G_Unicode_Type

Return the Unicode character type of Char.

Parameters
Char
Return Value

UTF8_Find_Next_Char

function UTF8_Find_Next_Char
  (Str : UTF8_String; Index : Natural) return Natural

Find the start of the next UTF-8 character after the Index-th byte. Index doesn't need to be on the start of a character. The result is greater than Str'Last if there are no more characters.

Parameters
Str
Index
Return Value

UTF8_Find_Next_Char

function UTF8_Find_Next_Char
  (Str     : Gtkada.Types.Chars_Ptr;
   Str_End : Gtkada.Types.Chars_Ptr := Gtkada.Types.Null_Ptr)
   return Gtkada.Types.Chars_Ptr

Find the start of the next UTF-8 character after Str. Str_End points to the end of the string. If Str_End is Null_Ptr, the string must be nul-terminated.

Parameters
Str
Str_End
Return Value

UTF8_Find_Prev_Char

function UTF8_Find_Prev_Char
  (Str : UTF8_String; Index : Natural) return Natural

Find the start of the previous UTF-8 character before the Index-th byte. Index doesn't need to be on the start of a character. The result is smaller than Str'First if there is no previous character.

Parameters
Str
Index
Return Value

UTF8_Find_Prev_Char

function UTF8_Find_Prev_Char
  (Str_Start : Gtkada.Types.Chars_Ptr;
   Str       : Gtkada.Types.Chars_Ptr) return Gtkada.Types.Chars_Ptr

Find the start of the previous UTF-8 character before Str. Str_Start is a pointer to the beginning of the string. Null_Ptr is returned if there is no previous character.

Parameters
Str_Start
Str
Return Value

UTF8_Get_Char

function UTF8_Get_Char (Str : UTF8_String) return Gunichar

Convert a sequence of bytes encoded as UTF-8 to a Unicode character. If Str doesn't start with a valid UTF-8 encoded character, the result is undefined.

Parameters
Str
Return Value

UTF8_Get_Char_Validated

function UTF8_Get_Char_Validated (Str : UTF8_String) return Gunichar

Same as UTF8_Get_Char. However, if the sequence is an incomplete start of a possibly valid character, it returns -2. If the sequence is invalid, it returns -1.

Parameters
Str
Return Value

UTF8_Next_Char

function UTF8_Next_Char
  (Str : UTF8_String; Index : Natural) return Natural

Find the start of the next UTF-8 character after the Index-th byte. Index has to be on the start of a character. The result is greater than Str'Last if there are no more characters.

Parameters
Str
Index
Return Value
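
UTF8_Next_Char and UTF8_Get_Char can be combined to walk a string one character at a time. The following is a sketch only: it assumes that UTF8_String is a String subtype that can be sliced, and it builds the non-ASCII character from its two bytes so the example does not depend on the source file encoding. Iterate_Demo is a hypothetical name.

```ada
with Ada.Text_IO;  use Ada.Text_IO;
with Glib;         use Glib;
with Glib.Unicode; use Glib.Unicode;

procedure Iterate_Demo is
   --  "caf" followed by U+00E9, whose UTF-8 encoding is C3 A9.
   Str   : constant UTF8_String :=
     "caf" & Character'Val (16#C3#) & Character'Val (16#A9#);
   Index : Natural := Str'First;
begin
   --  Index always sits on the start of a character, as
   --  UTF8_Next_Char requires.
   while Index <= Str'Last loop
      Put_Line ("code point:"
                & Gunichar'Image
                    (UTF8_Get_Char (Str (Index .. Str'Last))));
      Index := UTF8_Next_Char (Str, Index);
   end loop;
end Iterate_Demo;
```

The loop relies on the documented behaviour that the result exceeds Str'Last once the last character has been passed.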

UTF8_Strdown

function UTF8_Strdown (Str : UTF8_String) return UTF8_String

Convert Str to lower case.

Parameters
Str
Return Value

UTF8_Strdown

function UTF8_Strdown
  (Str : Gtkada.Types.Chars_Ptr; Len : Integer)
   return Gtkada.Types.Chars_Ptr

Convert all characters in Str to lowercase. The resulting string must be freed by the user. It can have a different length than Str.

Parameters
Str
Len
Return Value

UTF8_Strlen

function UTF8_Strlen (Str : UTF8_String) return Glong

Return the number of characters in Str.

Parameters
Str
Return Value
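
The character count returned by UTF8_Strlen can differ from the byte length given by the 'Length attribute. A minimal sketch, assuming UTF8_String and Glong are visible from package Glib; Length_Demo is a hypothetical name.

```ada
with Ada.Text_IO;  use Ada.Text_IO;
with Glib;         use Glib;
with Glib.Unicode; use Glib.Unicode;

procedure Length_Demo is
   --  "caf" followed by U+00E9, whose UTF-8 encoding is C3 A9.
   Str : constant UTF8_String :=
     "caf" & Character'Val (16#C3#) & Character'Val (16#A9#);
begin
   --  'Length counts bytes, UTF8_Strlen counts characters.
   Put_Line ("bytes:" & Natural'Image (Str'Length));
   Put_Line ("characters:" & Glong'Image (UTF8_Strlen (Str)));
end Length_Demo;
```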

UTF8_Strlen

function UTF8_Strlen
  (Str : Gtkada.Types.Chars_Ptr;
   Max : Integer := -1) return Glong

Return the length, in characters, of a UTF-8 encoded string. Max is the maximal number of bytes to examine. If it is negative, the string is assumed to be nul-terminated.

Parameters
Str
Max
Return Value

UTF8_Strup

function UTF8_Strup (Str : UTF8_String) return UTF8_String

Convert Str to upper case.

Parameters
Str
Return Value

UTF8_Strup

function UTF8_Strup
  (Str : Gtkada.Types.Chars_Ptr; Len : Integer)
   return Gtkada.Types.Chars_Ptr

Convert all characters in Str to uppercase. The resulting string is newly allocated, and can have a different length than Str (for instance, the German eszett, ß, is converted to SS). The returned string must be freed by the caller.

Parameters
Str
Len
Return Value

UTF8_Validate

procedure UTF8_Validate
  (Str         : UTF8_String;
   Valid       : out Boolean;
   Invalid_Pos : out Natural)

Validate a UTF-8 string. Valid is set to True if Str is valid. If it is not, Invalid_Pos is set to the position of the first invalid byte.

Parameters
Str
Valid
Invalid_Pos
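
A minimal sketch of validating input before further processing. It assumes UTF8_String is visible from package Glib, and it reads Invalid_Pos only when Valid is False, since the description does not say what it holds otherwise. Validate_Demo is a hypothetical name.

```ada
with Ada.Text_IO;  use Ada.Text_IO;
with Glib;         use Glib;
with Glib.Unicode; use Glib.Unicode;

procedure Validate_Demo is
   --  The byte 16#FF# can never appear in well-formed UTF-8.
   Bad   : constant UTF8_String := "ok" & Character'Val (16#FF#);
   Valid : Boolean;
   Pos   : Natural;
begin
   UTF8_Validate (Bad, Valid, Pos);

   if Valid then
      Put_Line ("string is valid UTF-8");
   else
      Put_Line ("first invalid byte at" & Natural'Image (Pos));
   end if;
end Validate_Demo;
```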