Unicode UTF[8,16,32] support
bool IsUtf8Lead(int c)
Tests whether c is a UTF-8 lead byte.
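As an illustration of what a lead-byte test is useful for, here is a minimal standalone sketch. It does not reproduce the library's code: IsLeadByteSketch and CharStartSketch are hypothetical names, and treating every non-continuation byte (including plain ASCII) as a lead byte is an assumption of this sketch.

```cpp
#include <cstdint>

// Hypothetical helper, not the library's code. A UTF-8 continuation byte
// has its top two bits set to 10; this sketch treats every other byte
// (ASCII included) as a byte that can start a character.
inline bool IsLeadByteSketch(int c)
{
    return (c & 0xC0) != 0x80;
}

// Steps back from p to the first byte of the character that contains p.
// begin is the start of the text, so the scan never runs past it.
inline const char *CharStartSketch(const char *begin, const char *p)
{
    while(p > begin && !IsLeadByteSketch(static_cast<unsigned char>(*p)))
        p--;
    return p;
}
```

A test of this kind is what lets code resynchronize on a character boundary after landing in the middle of a multi-byte sequence.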
dword FetchUtf8(const char *&s, const char *lim, bool& ok)
dword FetchUtf8(const char *&s, const char *lim)
Reads a single UTF-32 codepoint from the UTF-8 string s, which ends at lim. s must be less than lim and is advanced past the bytes that were read. If the UTF-8 is invalid, ok is set to false and the error-escape of a single byte is returned. Note that ok is NOT set to true when a valid UTF-8 character is read, so it should be initialized to true before the first call.
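The calling pattern described above (advance the pointer, clear a flag on error, never set it) can be sketched in standalone C++. Everything here is hypothetical and only mirrors the documented contract: FetchSketch and DecodeAllSketch are made-up names, and kEscapeBase is an illustrative stand-in for the error-escape, not a statement of the value the library uses.

```cpp
#include <cstdint>
#include <vector>

// Assumption for illustration only: invalid bytes are mapped to
// kEscapeBase + byte. The library's real escape value is not shown here.
const uint32_t kEscapeBase = 0xEE00;

// Decodes one codepoint starting at s (s < lim must hold) and advances s.
// On invalid input, clears ok, consumes exactly one byte and returns its escape.
uint32_t FetchSketch(const char *&s, const char *lim, bool& ok)
{
    unsigned char b0 = static_cast<unsigned char>(*s++);
    if(b0 < 0x80)
        return b0;
    int extra;
    uint32_t code, lowest; // lowest = smallest value allowed for this length
    if((b0 & 0xE0) == 0xC0)      { extra = 1; code = b0 & 0x1F; lowest = 0x80; }
    else if((b0 & 0xF0) == 0xE0) { extra = 2; code = b0 & 0x0F; lowest = 0x800; }
    else if((b0 & 0xF8) == 0xF0) { extra = 3; code = b0 & 0x07; lowest = 0x10000; }
    else { ok = false; return kEscapeBase + b0; }
    if(lim - s < extra) { ok = false; return kEscapeBase + b0; }
    for(int i = 0; i < extra; i++) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        if((b & 0xC0) != 0x80) { ok = false; return kEscapeBase + b0; }
        code = (code << 6) | (b & 0x3F);
    }
    // Reject overlong forms, surrogates and values beyond U+10FFFF.
    if(code < lowest || code > 0x10FFFF || (code >= 0xD800 && code <= 0xDFFF)) {
        ok = false;
        return kEscapeBase + b0;
    }
    s += extra;
    return code;
}

// Decodes a whole buffer. The flag is only ever cleared by FetchSketch,
// so it is set to true once, before the loop.
std::vector<uint32_t> DecodeAllSketch(const char *s, const char *lim, bool& ok)
{
    ok = true;
    std::vector<uint32_t> out;
    while(s < lim)
        out.push_back(FetchSketch(s, lim, ok));
    return out;
}
```

Because the flag is sticky, a single check after the loop is enough to know whether any part of the input was invalid.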
bool CheckUtf8(const char *s, int len)
bool CheckUtf8(const char *s)
bool CheckUtf8(const String& src)
Checks whether the string contains a valid UTF-8 sequence. If the source is given as a pointer s without len, it must be zero-terminated.
int Utf8Len(const dword *s, int len)
int Utf8Len(const dword *s)
int Utf8Len(const Vector<dword>& s)
Returns the number of bytes needed to encode UTF-32 Unicode text as UTF-8. If the source is given as a pointer s without len, it must be zero-terminated.
int Utf8Len(const wchar *s, int len)
int Utf8Len(const wchar *s)
int Utf8Len(const WString& s)
Returns the number of bytes needed to encode UTF-16 Unicode text as UTF-8. If the source is given as a pointer s without len, it must be zero-terminated.
int Utf8Len(dword code)
Returns the size in bytes of a single codepoint in UTF-8.
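The size of a codepoint in UTF-8 follows from fixed value ranges. The sketch below is a standalone illustration with the hypothetical name Utf8SizeSketch; it assumes valid Unicode scalar values and says nothing about how the library sizes invalid or escaped values.

```cpp
#include <cstdint>

// Hypothetical helper: bytes needed for one codepoint in UTF-8,
// assuming code is a valid Unicode scalar value.
int Utf8SizeSketch(uint32_t code)
{
    if(code < 0x80)    return 1; // U+0000..U+007F
    if(code < 0x800)   return 2; // U+0080..U+07FF
    if(code < 0x10000) return 3; // U+0800..U+FFFF
    return 4;                    // U+10000..U+10FFFF
}

// Total bytes needed for a UTF-32 buffer of len codepoints.
int Utf8SizeSketch(const uint32_t *s, int len)
{
    int n = 0;
    for(int i = 0; i < len; i++)
        n += Utf8SizeSketch(s[i]);
    return n;
}
```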
void ToUtf8(char *t, const wchar *s, int len)
String ToUtf8(const wchar *s, int len)
String ToUtf8(const wchar *s)
String ToUtf8(const WString& s)
UTF-16 to UTF-8 conversion. If the target is given as a pointer to buffer t, the buffer must be large enough to hold the output. If the source is given as a pointer s without len, it must be zero-terminated.
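The buffer requirement suggests a measure-then-write pattern: compute the output size first, allocate, then convert. The following standalone sketch illustrates that pattern only. All names are hypothetical, char16_t stands in for the library's wchar, and passing unpaired surrogates through unchanged is an assumption of this sketch rather than documented behavior.

```cpp
#include <cstdint>
#include <vector>

// Reads one codepoint from UTF-16, combining a valid surrogate pair.
// Assumption of this sketch: an unpaired surrogate is returned as-is.
static uint32_t NextFromUtf16Sketch(const char16_t *&s, const char16_t *lim)
{
    uint32_t c = *s++;
    if(c >= 0xD800 && c < 0xDC00 && s < lim && *s >= 0xDC00 && *s < 0xE000)
        c = 0x10000 + ((c - 0xD800) << 10) + (uint32_t(*s++) - 0xDC00);
    return c;
}

// Pass 1: number of UTF-8 bytes the text will occupy.
int MeasureUtf8Sketch(const char16_t *s, int len)
{
    const char16_t *lim = s + len;
    int n = 0;
    while(s < lim) {
        uint32_t c = NextFromUtf16Sketch(s, lim);
        n += c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
    }
    return n;
}

// Pass 2: writes UTF-8 into t, which must hold MeasureUtf8Sketch(s, len) bytes.
void WriteUtf8Sketch(char *t, const char16_t *s, int len)
{
    const char16_t *lim = s + len;
    while(s < lim) {
        uint32_t c = NextFromUtf16Sketch(s, lim);
        if(c < 0x80)
            *t++ = static_cast<char>(c);
        else if(c < 0x800) {
            *t++ = static_cast<char>(0xC0 | (c >> 6));
            *t++ = static_cast<char>(0x80 | (c & 0x3F));
        }
        else if(c < 0x10000) {
            *t++ = static_cast<char>(0xE0 | (c >> 12));
            *t++ = static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            *t++ = static_cast<char>(0x80 | (c & 0x3F));
        }
        else {
            *t++ = static_cast<char>(0xF0 | (c >> 18));
            *t++ = static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            *t++ = static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            *t++ = static_cast<char>(0x80 | (c & 0x3F));
        }
    }
}

// Ties both passes together: size the buffer exactly, then fill it.
std::vector<char> ConvertSketch(const char16_t *s, int len)
{
    std::vector<char> buf(MeasureUtf8Sketch(s, len));
    WriteUtf8Sketch(buf.data(), s, len);
    return buf;
}
```

The overloads that return a string object avoid this bookkeeping, so the raw-buffer form is mainly of interest when the caller already owns the storage.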
void ToUtf8(char *t, const dword *s, int len)
String ToUtf8(const dword *s, int len)
String ToUtf8(const dword *s)
String ToUtf8(const Vector<dword>& s)
UTF-32 to UTF-8 conversion. If the target is given as a pointer to buffer t, the buffer must be large enough to hold the output. If the source is given as a pointer s without len, it must be zero-terminated.
String ToUtf8(dword code)
Converts a single codepoint to UTF-8.
int Utf16Len(const dword *s, int len)
int Utf16Len(const dword *s)
int Utf16Len(const Vector<dword>& s)
Returns the number of wchars needed to encode UTF-32 Unicode text as UTF-16. If the source is given as a pointer s without len, it must be zero-terminated.
int Utf16Len(dword code)
Returns the size in wchars of a single codepoint in UTF-16.
int Utf16Len(const char *s, int len)
int Utf16Len(const char *s)
int Utf16Len(const String& s)
Returns the number of wchars needed to encode UTF-8 Unicode text as UTF-16. If the source is given as a pointer s without len, it must be zero-terminated.
void ToUtf16(wchar *t, const dword *s, int len)
WString ToUtf16(const dword *s, int len)
WString ToUtf16(const dword *s)
WString ToUtf16(const Vector<dword>& s)
UTF-32 to UTF-16 conversion. If the target is given as a pointer to buffer t, the buffer must be large enough to hold the output. If the source is given as a pointer s without len, it must be zero-terminated.
WString ToUtf16(dword code)
Converts a single codepoint to UTF-16.
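For reference, the arithmetic behind a single-codepoint UTF-16 conversion can be sketched as follows. This is a standalone illustration with hypothetical names (Utf16UnitsSketch, EncodeUtf16Sketch); std::u16string stands in for the library's WString, and the input is assumed to be a valid scalar value.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: code units needed for one codepoint in UTF-16.
int Utf16UnitsSketch(uint32_t code)
{
    return code < 0x10000 ? 1 : 2;
}

// Hypothetical helper: encodes one codepoint as UTF-16.
// Codepoints above U+FFFF become a high/low surrogate pair.
std::u16string EncodeUtf16Sketch(uint32_t code)
{
    std::u16string out;
    if(code < 0x10000)
        out += static_cast<char16_t>(code);
    else {
        uint32_t v = code - 0x10000;                        // 20 significant bits
        out += static_cast<char16_t>(0xD800 + (v >> 10));   // upper 10 bits
        out += static_cast<char16_t>(0xDC00 + (v & 0x3FF)); // lower 10 bits
    }
    return out;
}
```

For example, U+1F600 lies above U+FFFF and is therefore split into the pair 0xD83D, 0xDE00.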
void ToUtf16(wchar *t, const char *s, int len)
WString ToUtf16(const char *s, int len)
WString ToUtf16(const char *s)
WString ToUtf16(const String& s)
UTF-8 to UTF-16 conversion. If the target is given as a pointer to buffer t, the buffer must be large enough to hold the output. If the source is given as a pointer s without len, it must be zero-terminated.
int Utf32Len(const wchar *s, int len)
int Utf32Len(const wchar *s)
int Utf32Len(const WString& s)
Returns the number of dwords needed to represent UTF-16 Unicode text in UTF-32. This is the same as the number of Unicode codepoints in the text. If the source is given as a pointer s without len, it must be zero-terminated.
int Utf32Len(const char *s, int len)
int Utf32Len(const char *s)
int Utf32Len(const String& s)
Returns the number of dwords needed to represent UTF-8 Unicode text in UTF-32. This is the same as the number of Unicode codepoints in the text. If the source is given as a pointer s without len, it must be zero-terminated.
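Since the UTF-32 length equals the codepoint count, for well-formed UTF-8 it can be obtained by counting the bytes that are not continuation bytes. The sketch below shows that idea in standalone form. CountCodepointsSketch is a hypothetical name, and the input is assumed to be valid; how the library counts invalid bytes is not covered here.

```cpp
// Hypothetical helper: counts codepoints in well-formed UTF-8 by counting
// every byte whose top two bits are not 10 (i.e. every non-continuation byte).
int CountCodepointsSketch(const char *s, int len)
{
    int n = 0;
    for(int i = 0; i < len; i++)
        if((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80)
            n++;
    return n;
}
```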
dword ReadSurrogatePair(const wchar *s, const wchar *lim)
Reads a single UTF-32 codepoint from the surrogate pair at s, where lim marks the end of the text. Returns 0 if there is no surrogate pair at s.
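A minimal standalone sketch of the surrogate-pair logic, following the contract described above (return the combined codepoint, or 0 when no pair is present). ReadPairSketch is a hypothetical name and char16_t stands in for the library's wchar.

```cpp
#include <cstdint>

// Hypothetical helper: combines a high surrogate (0xD800..0xDBFF) followed
// by a low surrogate (0xDC00..0xDFFF) into one codepoint. Returns 0 when
// fewer than two units remain or the units do not form a pair.
uint32_t ReadPairSketch(const char16_t *s, const char16_t *lim)
{
    if(lim - s >= 2 &&
       s[0] >= 0xD800 && s[0] < 0xDC00 &&
       s[1] >= 0xDC00 && s[1] < 0xE000)
        return 0x10000 + ((uint32_t(s[0]) - 0xD800) << 10)
                       + (uint32_t(s[1]) - 0xDC00);
    return 0;
}
```

Returning 0 lets the caller fall back to treating the unit at s as an ordinary 16-bit character.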
void ToUtf32(dword *t, const wchar *s, int len)
Vector<dword> ToUtf32(const wchar *s, int len)
Vector<dword> ToUtf32(const wchar *s)
Vector<dword> ToUtf32(const WString& s)
UTF-16 to UTF-32 conversion. If the target is given as a pointer to buffer t, the buffer must be large enough to hold the output. If the source is given as a pointer s without len, it must be zero-terminated.
void ToUtf32(dword *t, const char *s, int len)
Vector<dword> ToUtf32(const char *s, int len)
Vector<dword> ToUtf32(const char *s)
Vector<dword> ToUtf32(const String& s)
UTF-8 to UTF-32 conversion. If the target is given as a pointer to buffer t, the buffer must be large enough to hold the output. If the source is given as a pointer s without len, it must be zero-terminated.
int UnicodeDecompose(dword codepoint, dword t[MAX_DECOMPOSED], bool only_canonical)
Vector<dword> UnicodeDecompose(dword codepoint, bool only_canonical)
Returns the Unicode decomposition of the given codepoint into base and combining characters. If only_canonical is true, only canonical decompositions are used.
dword UnicodeCompose(const dword *t, int count)
dword UnicodeCompose(const Vector<dword>& t)
Tries to compose a multi-codepoint grapheme into a single codepoint, if one exists. If no such codepoint exists, returns 0.
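To make the relationship between decomposition and composition concrete, here is a toy standalone sketch built on a three-entry sample table. It is not the library's implementation: all names are hypothetical, real Unicode data covers thousands of codepoints, and returning the codepoint itself when no decomposition is known is an assumption of this sketch.

```cpp
#include <cstdint>
#include <vector>

struct PairSketch { uint32_t composed, base, mark; };

// A tiny hand-picked sample of canonical decompositions.
static const PairSketch kSample[] = {
    { 0x00E9, 0x0065, 0x0301 }, // e with acute = e + combining acute accent
    { 0x00F1, 0x006E, 0x0303 }, // n with tilde = n + combining tilde
    { 0x00FC, 0x0075, 0x0308 }, // u with diaeresis = u + combining diaeresis
};

// Splits a codepoint into base + combining mark when the sample knows it.
// Assumption: an unknown codepoint decomposes to itself.
std::vector<uint32_t> DecomposeSketch(uint32_t code)
{
    for(const PairSketch& p : kSample)
        if(p.composed == code)
            return { p.base, p.mark };
    return { code };
}

// Looks for a single codepoint equal to the given sequence.
// Returns 0 when the sample has no such codepoint.
uint32_t ComposeSketch(const std::vector<uint32_t>& t)
{
    if(t.size() == 2)
        for(const PairSketch& p : kSample)
            if(p.base == t[0] && p.mark == t[1])
                return p.composed;
    return 0;
}
```

With this table, decomposing U+00E9 yields U+0065 U+0301, and composing that sequence returns U+00E9 again.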