While 420km below the ISS a Dani is sharpening his stone axe

Sometimes the world of software feels a bit like that, a confusing array of ancient and cutting edge stuff. I see you nodding sagely, thinking of the people still using Windows 98 or, even more extreme, Windows 3.11, or people who just don’t want to upgrade to Firefox 3 (we’re on 29 just now, for those of you on Shrome). I actually understand that: on the one hand you have very low-key users who just write the odd email and on the other you have specialists (this is most likely something happening at your local hospital, incidentally) who rely on a custom-rigged system using custom-designed software, all done in the days of yore, to run some critical piece of technology and who are loath to change it since… well… it works. I don’t blame them, who wants to mess around with bleeding tiles when they’re trying to zap your tumour?

But that wasn’t actually what I was thinking about. I was thinking about the spectrum of localizer friendly and unfriendly software. At the one extreme you have cutting edge Open Source developers working on the next generation of localization (also known as l20n, one up from l10n) and on the other you have… well, troglodytes. Since I don’t want to turn this into a really complicated lecture about linguistic features, I’ll pick a fairly straightforward example, the one that actually made me pick up my e-pen in anger. Plurals.

What’s the big deal, slap an -s on? Ummm. No. Ever since someone decided that counting one-two-lots (ah, I wish I had grown up a !San) was no longer sufficient, languages have been busy coming up with astonishingly complex (or simple) ways of counting stuff. On the one extreme you have languages like Cantonese which don’t inflict any changes on the things they’re counting. So the writing system aside, you just go 0 apple, 1 apple, 2 apple… 100 apple, 1,000 apple and so on.

English is a tiny step away from that, counting 0 apples, 1 apple, 2 apples… 100 apples, 1,000 apples and so on. Spot something already? Indeed. Logic doesn’t really come into it, not in a mathematical sense. By that I mean there is no reason why in Cantonese 0 should pattern with 1, 2 etc but that in English 0 should go with 2, 3, etc. It just does. Sure, historical linguists can sometimes shed light on how these have developed but not very often. On the whole, they just are.

This is where it gets entertaining (for linguists). First insight, there aren’t as many systems as there are languages. So far fewer than 6,000. In fact, looking at the places where such rules are collected, there are probably fewer than 100 different ways (on the planet) for counting stuff. Still fun times though (for linguists). Let me give you a couple of examples. A lot of Slavonic languages (Ukrainian, Russian etc) require up to 3 different forms of a noun:

  • FORM 1: ends in 1 – but not 11 (1, 21, 31, 41…)
  • FORM 2: ends in 2, 3 or 4 – but not 12, 13 or 14 (2, 3, 4, 22, 23, 24, 32, 33, 34…)
  • FORM 3: anything else (0, 5, 6, 7… 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 26, 27…)
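For the developers in the audience, here is a rough sketch of how that three-way split can be written down in Python. It follows the rule as it is usually stated for Russian, where 11 to 14 land in the third form. The function name and the word list are my own choices for illustration and don't come from any particular library.

```python
def slavic_plural_form(n: int) -> int:
    """Return 0, 1 or 2 for FORM 1, FORM 2 or FORM 3."""
    last_digit = n % 10
    last_two = n % 100
    if last_digit == 1 and last_two != 11:
        return 0  # 1, 21, 31, 101 ...
    if 2 <= last_digit <= 4 and not 12 <= last_two <= 14:
        return 1  # 2, 3, 4, 22, 23, 24 ...
    return 2      # 0, 5 to 20, 25 to 30 ...


# Illustrative Russian forms of "file", one per form.
FORMS = ["файл", "файла", "файлов"]

for n in (1, 2, 5, 11, 21, 22, 112):
    print(n, FORMS[slavic_plural_form(n)])
```

The only fiddly bit is looking at the last two digits as well as the last one, which is what keeps the teens out of the first two forms.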

That almost makes sense in a way. But we can add a few more twists. Take the resurrected decimal system in Scottish Gaelic. It requires up to 4 forms of a noun:

  • FORM 1: 1 and 11 (1 chat, 11 chat)
  • FORM 2: 2 and 12 (2 chat, 12 chat)
  • FORM 3: 3-10, 13-19 (3 cait, 4 cait, 13 cait, 14 cait…)
  • FORM 4: anything else (21 cat, 22 cat, 100 cat…)

Hang on, you’re saying, surely FORM 1 and FORM 2 could be merged. ’fraid not, because while the word cat makes it look as if they’re the same, if you start counting something beginning with the letters d, n, t or s, the following happens:

  • FORM 1: 1 taigh, 11 taigh
  • FORM 2: 2 thaigh, 12 thaigh
  • FORM 3: 3 taighean, 4 taighean, 13 taighean, 14 taighean…
  • FORM 4: 21 taigh, 22 taigh, 100 taigh…

Told you, fun! Now here’s where it gets annoying. Initially, in the very early days of software, localization mostly meant taking software written in English and translating it into German, French, Spanish, Italian & Co and then a bit later on adding Chinese, Japanese and Korean to the list.

Through a sheer fluke, that worked almost perfectly. English has a very common pattern, as it turns out (one form for 1 and another for anything else) so going from English to German posed no problems in translation. You simply took a pair of English strings like:

  • Open one file
  • Open %d files

and translated them into German:

  • Eine Datei öffnen
  • %d Dateien öffnen

Similarly, going to Chinese also posed no problem, you just ended up with a superfluous string because (I’ll use English words rather than Chinese characters):

  • Open one file
  • Open %d file

also created no linguistic or computational problems. Well, there was the fact that in French 0 patterns with 1, not with the plural as it does in English but I bet at that point English developers thought they were home and dry and ready to tick off the whole issue of numbers and number placeholders in software.
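To make the difference between these early cases concrete, here is a small sketch in Python. Each language gets a rule that turns a count into an index into its list of strings. The French strings and all the names (`RULES`, `STRINGS`, `open_files_message`) are made up for this example, and I have put the %d placeholder in every string so that the same formatting call works throughout.

```python
# Each rule maps a count to an index into that language's list of strings.
RULES = {
    "en": lambda n: 0 if n == 1 else 1,  # singular for exactly 1 only
    "de": lambda n: 0 if n == 1 else 1,  # same pattern as English
    "fr": lambda n: 0 if n <= 1 else 1,  # 0 patterns with 1
    "zh": lambda n: 0,                   # one form for every count
}

STRINGS = {
    "en": ["Open %d file", "Open %d files"],
    "de": ["%d Datei öffnen", "%d Dateien öffnen"],
    "fr": ["Ouvrir %d fichier", "Ouvrir %d fichiers"],  # illustrative
    "zh": ["Open %d file"],  # English stand-in, as in the text above
}


def open_files_message(lang: str, n: int) -> str:
    """Pick the right string for this language and count, then fill in n."""
    return STRINGS[lang][RULES[lang](n)] % n


for lang in ("en", "de", "fr", "zh"):
    print(lang, [open_files_message(lang, n) for n in (0, 1, 2)])
```

A program that hard-codes the English rule gets German right, gets Chinese right by accident, and trips over French as soon as the count is 0.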

Now I have no evidence but I suspect a Slavonic language like Russian was one of the first to kick up a stink. Because as we saw, it has a much more elaborate pattern than English. Now there was one bit of good news for the developers: although these linguistic setups were elaborate in some cases, they also followed predictable patterns and you only need about 6 categories (which ended up being called ZERO, ONE, TWO, FEW, MANY, OTHER for the sake of readability – so Gaelic ended up with ONE, TWO, FEW and OTHER for example). Which meant you could write a rule for the language in question and then prep your software to present the translator – and ultimately the user – with the right number of strings for translation. Sure, they look a bit crazy, like this one for Gaelic:

Plural-Forms: nplurals=4; plural=(n==1 || n==11) ? 0 : (n==2 || n==12) ? 1 : (n > 2 && n < 20) ? 2 : 3;\n

but you only had to do it once and that was that. Simples… you’d think. Oh no. I mean, yes, certainly doable and indeed a lot of software correctly applies plural formatting these days. Most Open Source projects certainly do, programs like Linux or Firefox for example have it, which is the reason why you probably never noticed anything odd about it.
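If the Gaelic Plural-Forms line quoted above makes your eyes water, it may help to see it unrolled. The following is a plain Python rendering of that same expression, paired with the cat and taigh forms from the lists earlier on. The function and variable names are mine, not part of gettext.

```python
def gaelic_plural_form(n: int) -> int:
    """Python rendering of the Gaelic Plural-Forms expression."""
    if n in (1, 11):
        return 0  # ONE
    if n in (2, 12):
        return 1  # TWO
    if 2 < n < 20:
        return 2  # FEW: 3 to 10 and 13 to 19 (11 and 12 are caught above)
    return 3      # OTHER: 0, 20, 21, 100 ...


CATEGORY = ["ONE", "TWO", "FEW", "OTHER"]
CAT = ["chat", "chat", "cait", "cat"]
TAIGH = ["taigh", "thaigh", "taighean", "taigh"]


def count(n: int, forms: list) -> str:
    """Return the number followed by the matching form of the noun."""
    return f"{n} {forms[gaelic_plural_form(n)]}"


for n in (1, 2, 3, 11, 12, 19, 20, 21):
    print(CATEGORY[gaelic_plural_form(n)], count(n, CAT), count(n, TAIGH))
```

Four branches, written once per language, and every string in the program that contains a number can reuse them.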

One step down from this nice implementation of plurals are projects like Joomla! who will allow you to use plurals but they won’t help you. Let me explain (briefly). Joomla! has one of the more atavistic approaches to localization – they expect translators to work directly in the .ini files Joomla! uses. Oh wow. So that DOES enable you to do plurals, but to begin with you have to figure out how to say the plural rule of your language in Joomla! and put that into one of the files. In our case, that turned out to be

   public static function getPluralSuffixes($count) {
       // Each suffix is appended to the string key to pick the right form.
       if ($count == 0 || $count > 19) {
           $return = array('OTHER');
       }
       elseif ($count == 1 || $count == 11) {
           $return = array('1');
       }
       elseif ($count == 2 || $count == 12) {
           $return = array('2');
       }
       elseif (($count > 2 && $count < 12) || ($count > 12 && $count < 20)) {
           $return = array('FEW');
       }
       return $return;
   }

Easy peasy. One then has to take the English, for example:

COM_CONTENT_N_ITEMS_CHECKED_IN_0="No cat"
COM_CONTENT_N_ITEMS_CHECKED_IN_1="%d cat"
COM_CONTENT_N_ITEMS_CHECKED_IN_MORE="%d cats"

and change it to this for Gaelic:

COM_CONTENT_N_ITEMS_CHECKED_IN_1="%d chat"
COM_CONTENT_N_ITEMS_CHECKED_IN_2="%d chat"
COM_CONTENT_N_ITEMS_CHECKED_IN_FEW="%d cait"
COM_CONTENT_N_ITEMS_CHECKED_IN_OTHER="%d cat"

Unsurprisingly, most localizers just can’t be bothered doing the plurals properly in Joomla!.
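To show why this is so fiddly, here is a sketch of the suffix approach in Python. This is not Joomla!'s actual code or API, just my own imitation of the idea: a rule returns a suffix, the suffix gets glued onto the base key, and the result is looked up among the translated strings. The strings use the cat forms from the list further up (1 chat, 2 chat, 3 cait, 21 cat), and `gaelic_suffix` and `plural_text` are hypothetical names.

```python
def gaelic_suffix(count: int) -> str:
    """Return the key suffix for a count, following the Gaelic rule."""
    if count in (1, 11):
        return "1"
    if count in (2, 12):
        return "2"
    if 2 < count < 20:
        return "FEW"
    return "OTHER"


# Stand-in for the contents of a translated .ini file.
STRINGS = {
    "COM_CONTENT_N_ITEMS_CHECKED_IN_1": "%d chat",
    "COM_CONTENT_N_ITEMS_CHECKED_IN_2": "%d chat",
    "COM_CONTENT_N_ITEMS_CHECKED_IN_FEW": "%d cait",
    "COM_CONTENT_N_ITEMS_CHECKED_IN_OTHER": "%d cat",
}


def plural_text(base_key: str, count: int) -> str:
    """Look up base_key plus suffix; show the bare key if nothing matches."""
    template = STRINGS.get(f"{base_key}_{gaelic_suffix(count)}")
    if template is None:
        return base_key  # a typo in a suffix means the user sees the raw key
    return template % count


for n in (1, 2, 7, 12, 19, 20):
    print(plural_text("COM_CONTENT_N_ITEMS_CHECKED_IN", n))
```

Nothing checks that the suffixes the rule returns match the suffixes in the file, so a single mismatch silently breaks every plural string. That is a lot to ask of someone who signed up to translate.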

Ning is another project in this category – they also required almost as many contortions as Joomla! but their mud star is for having had plural formatting. And then having ditched it because allegedly the translators put in too many errors. Well duh… give a man a rusty saw and then complain he’s not sawing fast enough or what?

And then there are those projects which stubbornly plod on without any form of plural formatting (except English style plurals of course). The selection of programs which are still without proper plurals IS surprising I must say. You might think you’d find a lot of very old Open Source projects here which go back so far that no-one wants to bother with fixing the code. Wrong. There are some fairly new programs and apps in this category where the developers chose to ignore plurals either through linguistic ignorance or arrogance. Skype (started in 2003) and Netvibes (2005) for example. Just for contrast, Firefox was born in 2002 and to my knowledge always accounted for plurals.

Similarly, some of them belong to big software houses which technically have the money and manpower to fix this – such as Microsoft. Yep, Microsoft. To date, no Microsoft product I’m aware of can handle non-English type plurals properly in ANY other language. Russians must be oddly patient when it comes to languages because I get really annoyed when my screen tells me I have closed 5 window.

A lot of software falls somewhere between the two extremes – I guess it’s just the way humans are, looking at the way we build our cities into and onto and over older bits of city except when it all falls down and we have to (or can?) start from scratch. But that makes it no less annoying when you’re trying to make software sound less like a robot in translation than it has to…

PS: I’d be curious to know which program first implemented plurals. I’m sort of guessing it’s Linux but I’m not old enough to remember. Let me know if you have any insights.

PPS: If you’re a developer and want to know more about plurals, I recommend the Unicode Consortium’s page on plurals as a starting point, you can take it from there.

  1. John Ferrier
    26/05/2014 at 6:18 pm

    Fascinating, that – as a layperson, it hadn’t occurred to me to connect the complexity of duals, plurals etc with problems in software/content localization.
    I assume that deliberate project decisions involving asset allocation are being made in these cases of deficient localization. I’m pretty sure academic interest in these problems isn’t lacking, especially given the confluence of computer science and linguistics that’s been strongly in evidence for a quarter of a century. Right enough, on a student exercise in 1980 that included a little name input, my offering was well received partly for bringing up as an issue the multiplicity of Gàidhlig name-forms, among others. Yeah, I’d probably been watching Can Seo the Sunday before.

  2. 26/05/2014 at 6:36 pm

    You’re welcome. As to the causes, in my experience it mostly does boil down to either ignorance or arrogance but I’m sure there are many grey cases in between. As a matter of fact, I’m involved in such a grey case, PDFForge, who are ultimately willing to implement plurals but currently don’t have the resources. With small projects, I tend to be more forgiving but for the likes of Microsoft to not fix this is, in my view, shameful. Especially since they are always going on at their translators about making the translations ‘sound natural as if you were talking to another human’. I’ve given up counting how many times I’ve tried telling them that we’d love to, if they stopped hamstringing us!
    Gaelic names are fun… though not insurmountable. I only know of one such system but it asks the user during sign-up what their address form is, i.e. they sign up as Dòmhnall but then have the option of adding A Dhòmhnaill too. Wouldn’t be impossible to automate either I guess but who’d implement it >.<

  3. 27/05/2014 at 1:23 pm

    On top of this let’s come up with something slightly more interesting: Austronesian reduplication which *sometimes* has a function kinda similar to pluralization.

    Now in a naive sense, you don’t reduplicate plurals. In Bahasa Indonesia, fish is ikan, so:

    1 ikan, 2 ikan, 3 ikan, 4 ikan, 5 ikan, etc. Here ikan is *singular.* (example, “Saya minta ambil lima ikan,” I want to get five fish). But ikan-ikan is a generic plural: saya minta ambil ikan-ikan (“I want to get some fish”). But if you did “Saya minta ambil lima ikan-ikan” that would mean “I want to get five kinds of fish” and so when paired with an explicit number you get a pluralization of categories or groupings, or worse, things that are similar. (“Saya minta ambil lima kuda-kuda” could mean “I want to get five groups of horses” or “I want to get five sawhorses”).

    However, there are times when you do, for example, you can use ikan-ikan to indicate kinds of fish (in plural):

    1 ikan, 2 ikan-ikan, 3 ikan-ikan, etc. (Here reduplication is effectively pluralizing collective groups). However if you want to get one group of fish, you’d leave off the number and just use the generic plural.

    Similarly if 2 is dua, then we can do the same with pairs:

    1 dua, 2 dua-dua, 3 dua-dua, 4 dua-dua

    What this means is that you can’t even assume that our notions of plurality roughly translate. Plurality may in fact mean something vastly different in sufficiently removed languages.

    With kuda (horse) this becomes particularly problematic, since kuda-kuda means “something like a horse” (for example a stance or a sawhorse). So in addition to having “kuda” mean “horse” and “kuda-kuda” mean “horses” with the above duplication rules, you *also* have kuda-kuda meaning sawhorse (singular or plural). Similarly you have the same problem with laki vs laki-laki, and a host of other nouns.

    What this means is that for any noun, you have *multiple possible sets of pluralization rules* depending on what you mean to say.

  4. 27/05/2014 at 1:44 pm

    Nice, reminds me of my mother tongue, Cantonese, where reduplication oddly enough rarely implies plural but can be used in seemingly confusing ways: as a diminutive, an intensifier and a nominaliser, to name the most confusing. So in gwái gwái syú syú (ghost ghost mouse mouse) – which translates as ‘sneaking around’ broadly speaking – it works as a sort of intensifier. In heu heu (‘shoe shoe’) on the other hand it works as a diminutive (i.e. shoeikins).

  5. GunChleoc
    07/07/2014 at 1:25 pm

    From the projects I’ve worked on so far, Wikimedia gets the double gold star for supporting the user’s natural gender as well as proper plural forms. The triple gold star actually goes to a computer game – OpenTTD. They support giving both gender and case to their short terms, so you can then change the long strings they get inserted into according to gender. And of course proper plural forms. Actually, they get another star for implementing a web translator that shows similar strings, to help translators handle their custom format, and they do fuzzy matches as well.
