20080110

The Challenges of Japanese on Computers

After studying the Japanese character challenge in Terminal and Konsole, I've noticed that Kterm displays Japanese well. The annoying part is that the desktop entries for these applications may have descriptions that are displayed perfectly in UTF-8 without any problems.

As a consequence of my research, here are several things I've discovered about Japanese and computers:

Most Japanese computer users rely on several programs working together to help them write in Japanese. This consists of typing in Romaji, which an input program recognizes and converts into Kanji and Hiragana, but very little Katakana, which is reserved for transliterating foreign words. To save typing, most words that would be rendered as Katakana keep their foreign spelling (as best as the particular Japanese writer can muster, depending on whether they graduated from an ESL cram school). So, technically, Japanese computer enthusiasts actually type in the ASCII character set (charset), which then gets converted into one of four charsets: Shift-JIS, EUC-JP, JIS-7, or ISO-2022-JP. And then there is Unicode as well.
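To make the charset soup concrete, here is a minimal sketch (in modern Python 3, which a Dapper-era box would not have shipped, so treat it purely as illustration) that encodes one Kanji word into each of the legacy charsets and into UTF-8, just to show how differently the bytes come out:

    # Illustrative only: the word and variable names are not from any particular program.
    word = "日本語"  # "Japanese language", written in Kanji

    for encoding in ("shift_jis", "euc_jp", "iso2022_jp", "utf-8"):
        encoded = word.encode(encoding)
        print(f"{encoding:12} -> {encoded!r} ({len(encoded)} bytes)")

Shift-JIS and EUC-JP use two bytes per Kanji here, ISO-2022-JP adds escape sequences to switch between charsets, and UTF-8 uses three bytes per Kanji.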

When W3M or Links2 displays text, it isn't in UTF-8. It's ISO 8859-1.
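To see what that mismatch does to Japanese, here is another minimal, purely illustrative Python sketch (not anything the browsers themselves do) that takes the UTF-8 bytes of a Kanji word and reinterprets them as ISO 8859-1:

    text = "日本語"
    utf8_bytes = text.encode("utf-8")

    # Decoding these bytes as ISO 8859-1 never fails, since every byte value
    # is valid Latin-1, but the result is mojibake rather than the original Kanji.
    garbled = utf8_bytes.decode("iso-8859-1")
    print(repr(garbled))  # accented Latin letters and control codes, not Kanji

That is roughly what happens when a browser built around 8859-1 is handed a Japanese page.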

Most applications that run in the Bash shell will display correctly if they have an option to use an alternate character set such as ISO 8859-1, which may be programmed into the particular application. With l10n, though, by installing more than just English language support, the font manager will render what is displayed in UTF-8.
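Along those lines, here is a minimal sketch of asking the environment what it is actually set up to use; it assumes nothing more than that a ja_JP.UTF-8 locale may or may not have been generated on the system:

    import locale

    # The encoding Python will use by default for text in this environment
    print(locale.getpreferredencoding())

    # Switching the whole process to a Japanese UTF-8 locale only works if
    # that locale has been generated; otherwise locale.Error is raised.
    try:
        locale.setlocale(locale.LC_ALL, "ja_JP.UTF-8")
        print("ja_JP.UTF-8 is available; Japanese text should come out as UTF-8")
    except locale.Error:
        print("ja_JP.UTF-8 has not been generated on this system")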

Links2 does not correctly use UTF-8; it is based on ISO 8859-1. It needs a UTF-8 upgrade, or at least the current version I got for Dapper is not working.

I have also discovered that several apps (pyDict, for one) aren't set to display UTF-8.
