VOGONS


Reply 40 of 43, by Scali

User metadata
Rank l33t
Errius wrote:

The multibyte functions like mbslen are for Extended ASCII codepages that use 2 bytes for some characters. (Like Japanese Shift-JIS). When using regular single-byte code pages (like 437) they function identically to the equivalent 1970s ASCII functions. This has nothing to do with Unicode.

Yes, that's not UTF-8.
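
To make the distinction concrete, here is a minimal sketch (in Python rather than C, purely for brevity) comparing byte counts and character counts under a single-byte code page, a double-byte one, and UTF-8. The helper name `byte_and_char_counts` is made up for this illustration, and the codec names are Python's, not anything from the C runtime.

```python
def byte_and_char_counts(text, codepage):
    """Return (number of bytes, number of characters) for text in a code page."""
    encoded = text.encode(codepage)
    return len(encoded), len(text)


samples = [
    ("abc", "cp437"),          # single-byte code page: bytes == characters
    ("A\u3042", "shift_jis"),  # 'A' plus hiragana 'a': 1 byte + 2 bytes
    ("A\u3042", "utf-8"),      # same text in UTF-8: 1 byte + 3 bytes
]

for text, codepage in samples:
    print(codepage, byte_and_char_counts(text, codepage))
```

With a single-byte code page the two counts agree, which is why the multibyte functions behave like the plain ASCII ones there. Shift-JIS and UTF-8 both need more than one byte for the kana, but they are different encodings with different byte sequences.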

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 41 of 43, by Errius

User metadata
Rank l33t

Do coders in UNIXland actually use UTF-8 strings in memory? I never handled those, only UTF-16/32. A UTF-8 file will be read into memory as UTF-16 or (preferably) UTF-32. The process is reversed when saving the file.

Is this too much voodoo?
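
A rough sketch of that load/save round trip, written in Python for brevity: the UTF-8 bytes are decoded into a list of code points (which is what a UTF-32 buffer effectively holds) and encoded back to UTF-8 on save. The function names are hypothetical, chosen only for this example.

```python
def utf8_to_code_points(data):
    """Decode UTF-8 bytes into a list of Unicode code points (UTF-32 style)."""
    return [ord(ch) for ch in data.decode("utf-8")]


def code_points_to_utf8(code_points):
    """Encode a list of Unicode code points back into UTF-8 bytes."""
    return "".join(chr(cp) for cp in code_points).encode("utf-8")


on_disk = b"h\xc3\xa9llo"              # "hello" with e-acute, as UTF-8 bytes
in_memory = utf8_to_code_points(on_disk)  # one integer per character
saved = code_points_to_utf8(in_memory)    # back to the original bytes
```

The in-memory form has one fixed-size element per character, so indexing is trivial, at the cost of four bytes per character if it is stored as real UTF-32.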

Reply 42 of 43, by Scali

User metadata
Rank l33t
Errius wrote:

Do coders in UNIXland actually use UTF-8 strings in memory? I never handled those, only UTF-16/32. A UTF-8 file will be read into memory as UTF-16 or (preferably) UTF-32. The process is reversed when saving the file.

I've never seen it anyway.
Even languages with native Unicode support like Java, JavaScript and C# don't actually process UTF-8 in memory. Their string types are defined in terms of UTF-16 code units.
And I think both Java and JavaScript came from the UNIX world.
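
As a small illustration of what UTF-16 strings look like in memory, this Python sketch splits text into 16-bit code units. Characters outside the Basic Multilingual Plane come out as two units (a surrogate pair), which is why a UTF-16 length can differ from the character count. `utf16_code_units` is a made-up helper for this example, not an API from any of those languages.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of 16-bit integers."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


bmp = utf16_code_units("A")               # one code unit
astral = utf16_code_units("\U0001F600")   # outside the BMP: surrogate pair
```

Here `bmp` holds a single unit, while `astral` holds a high surrogate followed by a low surrogate.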


Reply 43 of 43, by jmarsh

User metadata
Rank Oldbie
Errius wrote:

Do coders in UNIXland actually use UTF-8 strings in memory? I never handled those, only UTF-16/32. A UTF-8 file will be read into memory as UTF-16 or (preferably) UTF-32. The process is reversed when saving the file.

Up until about 6 years ago the Android NDK didn't support wide chars: wchar_t existed but it was 8 bits, and half of the wcs* functions did nothing. So the choice was either to use UTF-8 or to call out to the Java VM every single time you wanted to do anything with chars/strings.
They eventually fixed it, but of course anyone whose app stored persistent text data had to write a one-time conversion procedure to handle the encoding translation.
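
A one-time conversion of that kind might look roughly like the sketch below (Python for brevity). The specifics are all assumptions for illustration: the `TXT2` marker, the function name `migrate_blob`, and the choice of UTF-8 as the old format and UTF-16-LE as the new one are hypothetical, not details of any particular app.

```python
MAGIC_V2 = b"TXT2"  # hypothetical marker identifying already-converted data


def migrate_blob(blob, old_encoding="utf-8", new_encoding="utf-16-le"):
    """Convert stored text to the new encoding once; leave converted data alone.

    Assumes old data never starts with the marker bytes, which a real
    format would have to guarantee some other way (e.g. a version field).
    """
    if blob.startswith(MAGIC_V2):
        return blob  # already migrated, nothing to do
    text = blob.decode(old_encoding)
    return MAGIC_V2 + text.encode(new_encoding)


old_data = b"h\xc3\xa9llo"          # legacy UTF-8 record
new_data = migrate_blob(old_data)    # converted and tagged
again = migrate_blob(new_data)       # second run is a no-op
```

Tagging the converted data is what makes it safe to run the procedure on every startup without re-encoding text that has already been translated.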