
While 420km below the ISS a Dani is sharpening his stone axe

26/05/2014 5 comments

Sometimes the world of software feels a bit like that, a confusing array of ancient and cutting edge stuff. I see you nodding sagely, thinking of the people still using Windows 98 or, even more extreme, Windows 3.11, or people who just don’t want to upgrade to Firefox 3 (we’re on 29 just now, for those of you on Shrome). I actually understand that. On the one hand you have very low-key users who just write the odd email, and on the other you have specialists (this is most likely something happening at your local hospital, incidentally) who rely on a custom-rigged system using custom-designed software, all done in the days of yore, to run some critical piece of technology and who are loath to change it since… well… it works. I don’t blame them: who wants to mess around with bleeding tiles when they’re trying to zap your tumour?

But that wasn’t actually what I was thinking about. I was thinking about the spectrum of localizer friendly and unfriendly software. At the one extreme you have cutting edge Open Source developers working on the next generation of localization (also known as l20n, one up from l10n) and on the other you have… well, troglodytes. Since I don’t want to turn this into a really complicated lecture about linguistic features, I’ll pick a fairly straightforward example, the one that actually made me pick up my e-pen in anger. Plurals.

What’s the big deal, slap an -s on? Ummm. No. Ever since someone decided that counting one-two-lots (ah, I wish I had grown up a !San) was no longer sufficient, languages have been busy coming up with astonishingly complex (or simple) ways of counting stuff. On the one extreme you have languages like Cantonese which don’t inflict any changes on the things they’re counting. So the writing system aside, you just go 0 apple, 1 apple, 2 apple… 100 apple, 1,000 apple and so on.

English is a tiny step away from that, counting 0 apples, 1 apple, 2 apples… 100 apples, 1,000 apples and so on. Spot something already? Indeed. Logic doesn’t really come into it, not in a mathematical sense. By that I mean there is no reason why in Cantonese 0 should pattern with 1, 2 etc but that in English 0 should go with 2, 3, etc. It just does. Sure, historical linguists can sometimes shed light on how these have developed but not very often. On the whole, they just are.

This is where it gets entertaining (for linguists). First insight, there aren’t as many systems as there are languages. So far fewer than 6,000. In fact, looking at the places where such rules are collected, there are probably fewer than 100 different ways (on the planet) of counting stuff. Still fun times though (for linguists). Let me give you a couple of examples. A lot of Slavonic languages (Ukrainian, Russian etc) require up to 3 different forms of a noun:

  • FORM 1: any number ending in 1 – but not 11 (1, 21, 31, 101…)
  • FORM 2: ends in 2, 3 or 4 – but not 12, 13 or 14 (2, 3, 4, 22, 23, 24, 32, 33, 34…)
  • FORM 3: anything else (0, 5, 6… 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 26, 27…)
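
For the programmers in the audience, here is a rough sketch of that rule in Python. The function name and the Russian forms of the word for "file" are illustrative assumptions, not taken from any particular library. Note that 11 goes with the other teens in FORM 3 rather than with 1 and 21.

```python
def slavonic_form(n: int) -> int:
    """Return 0, 1 or 2 for FORM 1, FORM 2 or FORM 3 of the rule above."""
    last_digit = n % 10
    last_two = n % 100
    # FORM 1: ends in 1, except for 11 (and 111, 211...)
    if last_digit == 1 and last_two != 11:
        return 0
    # FORM 2: ends in 2, 3 or 4, but not 12, 13 or 14
    if last_digit in (2, 3, 4) and last_two not in (12, 13, 14):
        return 1
    # FORM 3: everything else
    return 2


# Illustrative Russian forms of "file" (an assumption for this example)
FORMS = ["файл", "файла", "файлов"]

for n in (1, 3, 5, 11, 21, 22, 112):
    print(n, FORMS[slavonic_form(n)])
```

The whole decision hangs on the last digit and the last two digits, which is why 21 and 101 behave like 1 while 11 and 111 do not.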

That almost makes sense in a way. But we can add a few more twists. Take the resurrected decimal system in Scottish Gaelic. It requires up to 4 forms of a noun:

  • FORM 1: 1 and 11 (1 chat, 11 chat)
  • FORM 2: 2 and 12 (2 chat, 12 chat)
  • FORM 3: 3-10, 13-19 (3 cait, 4 cait, 13 cait, 14 cait…)
  • FORM 4: anything else (21 cat, 22 cat, 100 cat…)

Hang on, you’re saying, surely FORM 1 and FORM 2 could be merged. ’fraid not, because while the word cat makes it look as if they’re the same, if you start counting something beginning with the letter d, n, t or s, the following happens:

  • FORM 1: 1 taigh, 11 taigh
  • FORM 2: 2 thaigh, 12 thaigh
  • FORM 3: 3 taighean, 4 taighean, 13 taighean, 14 taighean…
  • FORM 4: 21 taigh, 22 taigh, 100 taigh…

Told you, fun! Now here’s where it gets annoying. Initially, in the very early days of software, localization mostly meant taking software written in English and translating it into German, French, Spanish, Italian & Co and then a bit later on adding Chinese, Japanese and Korean to the list.

Through a sheer fluke, that worked almost perfectly. English has a very common pattern, as it turns out (one form for 1 and another for anything else) so going from English to German posed no problems in translation. You simply took a pair of English strings like:

  • Open one file
  • Open %d files

and translated them into German:

  • Eine Datei öffnen
  • %d Dateien öffnen

Similarly, going to Chinese also posed no problem, you just ended up with a superfluous string because (I’ll use English words rather than Chinese characters):

  • Open one file
  • Open %d file

also created no linguistic or computational problems. Well, there was the fact that in French 0 patterns with 1, not with the plural as it does in English but I bet at that point English developers thought they were home and dry and ready to tick off the whole issue of numbers and number placeholders in software.
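
To make the fluke concrete, here is a minimal Python sketch of the English-style test that such software effectively hard-codes. The function name and the French strings are assumptions made up for this illustration; the German strings are the ones from above.

```python
def english_style(n: int, singular: str, plural: str) -> str:
    """Pick a string the English way: one form for 1, another for the rest."""
    template = singular if n == 1 else plural
    return template.replace("%d", str(n))


# German follows the same pattern, so this works as intended
print(english_style(1, "Eine Datei öffnen", "%d Dateien öffnen"))
print(english_style(5, "Eine Datei öffnen", "%d Dateien öffnen"))

# French wants the singular for 0, but the hard-coded test picks the plural
print(english_style(0, "Ouvrir %d fichier", "Ouvrir %d fichiers"))
```

The last call produces the plural string for 0, which is exactly the French mismatch mentioned above: the test itself is baked into the code, so no amount of translating the two strings can fix it.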

Now I have no evidence but I suspect a Slavonic language like Russian was one of the first to kick up a stink. Because as we saw, it has a much more elaborate pattern than English. Now there was one bit of good news for the developers: although these linguistic setups were elaborate in some cases, they also followed predictable patterns and you only need about 6 categories (which ended up being called ZERO, ONE, TWO, FEW, MANY, OTHER for the sake of readability – so Gaelic ended up with ONE, TWO, FEW and OTHER for example). Which meant you could write a rule for the language in question and then prep your software to present the translator – and ultimately the user – with the right number of strings for translation. Sure, they look a bit crazy, like this one for Gaelic:

Plural-Forms: nplurals=4; plural=(n==1 || n==11) ? 0 : (n==2 || n==12) ? 1 : (n > 2 && n < 20) ? 2 : 3;\n

but you only had to do it once and that was that. Simples… you’d think. Oh no. I mean, yes, certainly doable and indeed a lot of software correctly applies plural formatting these days. Most Open Source projects certainly do, programs like Linux or Firefox for example have it, which is the reason why you probably never noticed anything odd about it.
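
The "one rule per language" idea can be sketched roughly as follows. This is an illustration under stated assumptions, not how any real localization library is organised: the table layout, the function names and the sampling trick for finding out which categories a language needs are all hypothetical. The Gaelic rule mirrors the Plural-Forms line above, so 0 and 20 land in OTHER.

```python
from typing import Callable, Dict, List


def english(n: int) -> str:
    return "ONE" if n == 1 else "OTHER"


def gaelic(n: int) -> str:
    if n in (1, 11):
        return "ONE"
    if n in (2, 12):
        return "TWO"
    if 3 <= n <= 19:  # 11 and 12 were already handled above
        return "FEW"
    return "OTHER"


# Hypothetical registry: one rule per language code
RULES: Dict[str, Callable[[int], str]] = {"en": english, "gd": gaelic}

# The six category names, in their conventional order
ORDER = ["ZERO", "ONE", "TWO", "FEW", "MANY", "OTHER"]


def categories_needed(lang: str, sample: range = range(0, 201)) -> List[str]:
    """List the categories a translator has to supply strings for."""
    used = {RULES[lang](n) for n in sample}
    return [c for c in ORDER if c in used]


def format_count(lang: str, n: int, strings: Dict[str, str]) -> str:
    """Pick the string for n's category and fill in the number."""
    return strings[RULES[lang](n)].replace("%d", str(n))


HOUSES = {"ONE": "%d taigh", "TWO": "%d thaigh",
          "FEW": "%d taighean", "OTHER": "%d taigh"}
print(categories_needed("gd"))
print(format_count("gd", 12, HOUSES))
```

The point of the design is that the rule lives in one place per language, while translators only ever see a set of named slots to fill in.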

One step down from this nice implementation of plurals are projects like Joomla! which will allow you to use plurals but won’t help you. Let me explain (briefly). Joomla! has one of the more atavistic approaches to localization – they expect translators to work directly in the .ini files Joomla! uses. Oh wow. So that DOES enable you to do plurals, but to begin with you have to figure out how to express the plural rule of your language in Joomla! and put that into one of the files. In our case, that turned out to be

   public static function getPluralSuffixes($count) {
       // 0 and everything from 20 upwards
       if ($count == 0 || $count > 19) {
           $return = array('0');
       }
       elseif ($count == 1 || $count == 11) {
           $return = array('1');
       }
       elseif ($count == 2 || $count == 12) {
           $return = array('2');
       }
       // 3-10 and 13-19
       elseif (($count > 2 && $count < 11) || ($count > 12 && $count < 20)) {
           $return = array('FEW');
       }
       // Hand the chosen suffix back to the caller
       return $return;
   }

Easy peasy. One then has to take the English, for example:

COM_CONTENT_N_ITEMS_CHECKED_IN_0="No cat"
COM_CONTENT_N_ITEMS_CHECKED_IN_1="%d cat"
COM_CONTENT_N_ITEMS_CHECKED_IN_MORE="%d cats"

and change it to this for Gaelic:

COM_CONTENT_N_ITEMS_CHECKED_IN_1="%d chat"
COM_CONTENT_N_ITEMS_CHECKED_IN_2="%d chat"
COM_CONTENT_N_ITEMS_CHECKED_IN_FEW="%d cait"
COM_CONTENT_N_ITEMS_CHECKED_IN_OTHER="%d cat"

Unsurprisingly, most localizers just can’t be bothered doing the plurals properly in Joomla!.
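
To show what those suffixed keys are for, here is a simplified Python sketch of a suffix-based lookup using the English strings from above. It is an illustration of the general idea only, not Joomla!'s actual lookup code: the function name and the order in which keys are tried are assumptions.

```python
from typing import Dict

# English strings from the example above
STRINGS: Dict[str, str] = {
    "COM_CONTENT_N_ITEMS_CHECKED_IN_0": "No cat",
    "COM_CONTENT_N_ITEMS_CHECKED_IN_1": "%d cat",
    "COM_CONTENT_N_ITEMS_CHECKED_IN_MORE": "%d cats",
}


def lookup(base_key: str, count: int, strings: Dict[str, str]) -> str:
    """Try KEY_<count> first, then fall back to KEY_MORE (assumed order)."""
    for suffix in (str(count), "MORE"):
        key = f"{base_key}_{suffix}"
        if key in strings:
            return strings[key].replace("%d", str(count))
    # Nothing found: return the raw key so the gap is visible
    return base_key


for n in (0, 1, 5):
    print(lookup("COM_CONTENT_N_ITEMS_CHECKED_IN", n, STRINGS))
```

A scheme like this copes with English because only 0 and 1 need their own key. A language with more forms needs extra suffixes plus a rule for choosing between them, and writing that rule is the part left to the localizer.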

Ning is another project in this category – they also required almost as many contortions as Joomla! but their mud star is for having had plural formatting. And then having ditched it because allegedly the translators put in too many errors. Well duh… give a man a rusty saw and then complain he’s not sawing fast enough or what?

And then there are those projects which stubbornly plod on without any form of plural formatting (except English style plurals of course). The selection of programs which are still without proper plurals IS surprising I must say. You might think you’d find a lot of very old Open Source projects here which go back so far that no-one wants to bother with fixing the code. Wrong. There are some fairly new programs and apps in this category where the developers chose to ignore plurals either through linguistic ignorance or arrogance. Skype (started in 2003) and Netvibes (2005) for example. Just for contrast, Firefox was born in 2002 and to my knowledge always accounted for plurals.

Similarly, some of them belong to big software houses which technically have the money and manpower to fix this – such as Microsoft. Yep, Microsoft. To date, no Microsoft product I’m aware of can handle non-English type plurals properly in ANY other language. Russians must be oddly patient when it comes to languages, because I get really annoyed when my screen tells me I have closed 5 window.

A lot of software falls somewhere between the two extremes – I guess it’s just the way humans are, looking at the way we build our cities into and onto and over older bits of city except when it all falls down and we have to (or can?) start from scratch. But that makes it no less annoying when you’re trying to make software sound less like a robot in translation than it has to…

PS: I’d be curious to know which program first implemented plurals. I’m sort of guessing it’s Linux but I’m not old enough to remember. Let me know if you have some insights.

PPS: If you’re a developer and want to know more about plurals, I recommend the Unicode Consortium’s page on plurals as a starting point, you can take it from there.
