Tim Sweeney claims that Microsoft will remove Win32

Reply 40 of 43, by Scali

Posted on 2019-09-12, 23:07

Rank: l33t
Posts: 4873
Joined: 2014-12-13, 14:24

Errius wrote:
The multibyte functions like mbslen are for Extended ASCII codepages that use 2 bytes for some characters. (Like Japanese Shift-JIS). When using regular single-byte code pages (like 437) they function identically to the equivalent 1970s ASCII functions. This has nothing to do with Unicode.

Yes, that's not UTF-8.
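
A quick sketch of the difference, using Python's built-in codecs as a stand-in for the C runtime's code-page handling. The sample string and the helper name `sjis_char_count` are made up for illustration; the helper only mimics what a multibyte length function does and is not the CRT implementation.

```python
# Illustration only: Python codecs stand in for C code-page handling.
text = "A\u3042"  # ASCII 'A' followed by HIRAGANA LETTER A

sjis = text.encode("shift_jis")  # double-byte code page: the kana takes 2 bytes
utf8 = text.encode("utf-8")      # UTF-8: the same kana takes 3 bytes


def sjis_char_count(data: bytes) -> int:
    """Count characters in Shift-JIS bytes by stepping over lead/trail byte pairs."""
    count = 0
    i = 0
    while i < len(data):
        b = data[i]
        # Shift-JIS lead bytes fall in 0x81-0x9F and 0xE0-0xFC;
        # anything else is a complete single-byte character.
        is_lead = 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC
        i += 2 if is_lead else 1
        count += 1
    return count


single_byte = "AB".encode("cp437")  # plain single-byte code page

print(len(sjis), sjis_char_count(sjis))                # bytes vs characters in Shift-JIS
print(len(single_byte), sjis_char_count(single_byte))  # identical for single-byte text
print(len(utf8))                                       # different byte layout in UTF-8
```

For pure single-byte data the character count equals the byte count, which is why the multibyte functions behave like the plain ASCII ones there. The UTF-8 bytes follow a different scheme entirely.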

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 41 of 43, by Errius

Posted on 2019-09-12, 23:16

Rank: l33t
Posts: 2758
Joined: 2015-12-16, 19:16
Location: Lave Station

Do coders in UNIXland actually use UTF-8 strings in memory? I never handled those, only UTF-16/32. A UTF-8 file will be read into memory as UTF-16 or (preferably) UTF-32. The process is reversed when saving the file.
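
A minimal sketch of that round trip in Python, assuming the in-memory form is a plain list of 32-bit code points (the UTF-32 case). The function names and the sample text are hypothetical, chosen only for this example.

```python
import os
import tempfile


def load_utf8_as_code_points(path):
    """Read a UTF-8 file and return its text as a list of code points (UTF-32 values)."""
    with open(path, "rb") as f:
        raw = f.read()
    return [ord(ch) for ch in raw.decode("utf-8")]


def save_code_points_as_utf8(path, code_points):
    """Reverse step: encode the code points back to UTF-8 and write them out."""
    data = "".join(chr(cp) for cp in code_points).encode("utf-8")
    with open(path, "wb") as f:
        f.write(data)


# Round trip through a temporary file.
original = "na\u00efve \u20ac5 \U0001F600".encode("utf-8")
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "wb") as f:
        f.write(original)
    points = load_utf8_as_code_points(path)
    save_code_points_as_utf8(path, points)
    with open(path, "rb") as f:
        round_tripped = f.read()
finally:
    os.remove(path)

print(len(original), "bytes on disk,", len(points), "code points in memory")
```

In memory every character is one fixed-size element, so indexing is trivial; the variable-length encoding only exists at the file boundary.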

Is this too much voodoo?

Reply 42 of 43, by Scali

Posted on 2019-09-12, 23:21

Rank: l33t
Posts: 4873
Joined: 2014-12-13, 14:24

Errius wrote:
Do coders in UNIXland actually use UTF-8 strings in memory? I never handled those, only UTF-16/32. A UTF-8 file will be read into memory as UTF-16 or (preferably) UTF-32. The process is reversed when saving the file.

I've never seen it anyway.
Even languages with native Unicode support like Java, JavaScript and C# don't actually process UTF-8 in memory. They use UTF-16 internally for their string types.
And I think both Java and JavaScript came from the UNIX world.
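
To show what UTF-16 string storage means in practice, here is a small Python sketch that counts UTF-16 code units, which is the quantity Java's `String.length()`, JavaScript's `.length` and C#'s `String.Length` report. The helper name `utf16_length` and the sample string are assumptions made for this example.

```python
def utf16_length(s: str) -> int:
    """Number of UTF-16 code units needed to store s (2 bytes per unit)."""
    return len(s.encode("utf-16-le")) // 2


sample = "a\u20ac\U0001F600"  # 'a', the euro sign, and an emoji outside the BMP

code_points = len(sample)                 # Python counts code points
utf16_units = utf16_length(sample)        # the emoji needs a surrogate pair
utf8_bytes = len(sample.encode("utf-8"))  # 1 + 3 + 4 bytes

print(code_points, utf16_units, utf8_bytes)
```

The emoji occupies two code units in UTF-16, so even "wide" strings are variable-length once you leave the Basic Multilingual Plane.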

Reply 43 of 43, by jmarsh

Posted on 2019-09-12, 23:36

Rank: Oldbie
Posts: 1370
Joined: 2014-01-04, 09:17

Errius wrote:
Do coders in UNIXland actually use UTF-8 strings in memory? I never handled those, only UTF-16/32. A UTF-8 file will be read into memory as UTF-16 or (preferably) UTF-32. The process is reversed when saving the file.

Up until about 6 years ago the Android NDK didn't support wide chars: wchar_t existed but it was only 8 bits wide, and half of the wcs* functions did nothing. So the choice was either to use UTF-8 or to call out to the Java VM every single time you wanted to do anything with chars/strings.
They eventually fixed it, but of course anyone whose app stored persistent text data had to write a one-time conversion procedure to handle the encoding translation.
