
While 420km below the ISS a Dani is sharpening his stone axe

26/05/2014 5 comments

Sometimes the world of software feels a bit like that, a confusing array of ancient and cutting-edge stuff. I see you nodding sagely, thinking of the people still using Windows 98 or, even more extreme, Windows 3.11, or people who just don’t want to upgrade to Firefox 3 (we’re on 29 just now, for those of you on Shrome). I actually understand that: on the one hand you have very low-key users who just write the odd email and on the other you have specialists (this is most likely something happening at your local hospital, incidentally) who rely on a custom-rigged system using custom-designed software, all done in the days of yore, to run some critical piece of technology and who are loath to change it since… well… it works. I don’t blame them, who wants to mess around with bleeding tiles when they’re trying to zap your tumour?

But that wasn’t actually what I was thinking about. I was thinking about the spectrum of localizer-friendly and localizer-unfriendly software. At the one extreme you have cutting-edge Open Source developers working on the next generation of localization (also known as l20n, one up from l10n) and at the other you have… well, troglodytes. Since I don’t want to turn this into a really complicated lecture about linguistic features, I’ll pick a fairly straightforward example, the one that actually made me pick up my e-pen in anger. Plurals.

What’s the big deal, slap an -s on? Ummm. No. Ever since someone decided that counting one-two-lots (ah, I wish I had grown up a !San) was no longer sufficient, languages have been busy coming up with astonishingly complex (or simple) ways of counting stuff. At the one extreme you have languages like Cantonese which don’t inflict any changes on the things they’re counting. So the writing system aside, you just go 0 apple, 1 apple, 2 apple… 100 apple, 1,000 apple and so on.

English is a tiny step away from that, counting 0 apples, 1 apple, 2 apples… 100 apples, 1,000 apples and so on. Spot something already? Indeed. Logic doesn’t really come into it, not in a mathematical sense. By that I mean there is no reason why in Cantonese 0 should pattern with 1, 2 etc but that in English 0 should go with 2, 3, etc. It just does. Sure, historical linguists can sometimes shed light on how these have developed but not very often. On the whole, they just are.

This is where it gets entertaining (for linguists). First insight, there aren’t as many systems as there are languages. So far fewer than 6,000. In fact, looking at the places where such rules are collected, there are probably fewer than 100 different ways (on the planet) for counting stuff. Still fun times though (for linguists). Let me give you a couple of examples. A lot of Slavonic (Ukrainian, Russian etc) languages require up to 3 different forms of a noun:

  • FORM 1: any number ending in 1 – but not 11 (1, 21, 31, 41…)
  • FORM 2: ends in 2, 3 or 4 – but not 12, 13 or 14 (2, 3, 4, 22, 23, 24, 32, 33, 34…)
  • FORM 3: anything else (0, 5, 6… 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 26, 27…)
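
For the programmers in the audience, that three-way split can be sketched in a few lines of Python. The function name and the 0/1/2 indices are my own choices for illustration, and the rule is the commonly cited one for Russian-type languages, where 11 behaves like 12, 13 and 14 and takes FORM 3:

```python
def slavonic_form(n: int) -> int:
    """Return 0, 1 or 2 for FORM 1, FORM 2 or FORM 3 (Russian-style rule)."""
    last_digit = n % 10
    last_two = n % 100
    # FORM 1: ends in 1, except when the last two digits are 11
    if last_digit == 1 and last_two != 11:
        return 0
    # FORM 2: ends in 2, 3 or 4, except 12, 13 and 14
    if last_digit in (2, 3, 4) and last_two not in (12, 13, 14):
        return 1
    # FORM 3: everything else
    return 2


# Russian forms of "file", used purely as an illustration
FORMS = ("файл", "файла", "файлов")

for n in (1, 2, 5, 11, 21, 22, 25, 112):
    print(n, FORMS[slavonic_form(n)])
```

The point is less the code than the shape of it: the form depends on the last digit or two, not on "is it 1 or not".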

That almost makes sense in a way. But we can add a few more twists. Take the resurrected decimal system in Scottish Gaelic. It requires up to 4 forms of a noun:

  • FORM 1: 1 and 11 (1 chat, 11 chat)
  • FORM 2: 2 and 12 (2 chat, 12 chat)
  • FORM 3: 3-10, 13-19 (3 cait, 4 cait, 13 cait, 14 cait…)
  • FORM 4: anything else (21 cat, 22 cat, 100 cat…)

Hang on, you’re saying, surely FORM 1 and FORM 2 could be merged. ’fraid not, because while the word cat makes it look as if they’re the same, if you start counting something beginning with the letter d, n, t, s, the following happens:

  • FORM 1: 1 taigh, 11 taigh
  • FORM 2: 2 thaigh, 12 thaigh
  • FORM 3: 3 taighean, 4 taighean, 13 taighean, 14 taighean…
  • FORM 4: 21 taigh, 22 taigh, 100 taigh…

Told you, fun! Now here’s where it gets annoying. Initially, in the very early days of software, localization mostly meant taking software written in English and translating it into German, French, Spanish, Italian & Co and then a bit later on adding Chinese, Japanese and Korean to the list.

Through a sheer fluke, that worked almost perfectly. English has a very common pattern, as it turns out (one form for 1 and another for anything else) so going from English to German posed no problems in translation. You simply took a pair of English strings like:

  • Open one file
  • Open %d files

and translated them into German:

  • Eine Datei öffnen
  • %d Dateien öffnen

Similarly, going to Chinese also posed no problem, you just ended up with a superfluous string because (I’ll use English words rather than Chinese characters):

  • Open one file
  • Open %d file

also created no linguistic or computational problems. Well, there was the fact that in French 0 patterns with 1, not with the plural as it does in English but I bet at that point English developers thought they were home and dry and ready to tick off the whole issue of numbers and number placeholders in software.
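
To make that concrete, here is roughly what the two-string approach amounts to, sketched with Python’s standard gettext module. With no translation catalogue loaded, ngettext falls back to the English rule (first string for exactly 1, second string for everything else), which is precisely the assumption that got baked into a lot of software. The helper name is made up for the example:

```python
import gettext

# NullTranslations is gettext's "no catalogue loaded" object, so the
# result does not depend on whatever locale this machine is set to.
untranslated = gettext.NullTranslations()


def open_files_message(n: int) -> str:
    # Built-in fallback rule: first string if n == 1, second otherwise.
    template = untranslated.ngettext("Open one file", "Open %d files", n)
    # "Open one file" has no placeholder, so only format when one is present.
    return template % n if "%d" in template else template


for n in (0, 1, 2, 21):
    print(open_files_message(n))
```

Software that hard-codes this one-versus-the-rest split has nowhere to put the French 0 or the Russian 21.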

Now I have no evidence but I suspect a Slavonic language like Russian was one of the first to kick up a stink. Because as we saw, it has a much more elaborate pattern than English. Now there was one bit of good news for the developers: although these linguistic setups were elaborate in some cases, they also followed predictable patterns and you only need about 6 categories (which ended up being called ZERO, ONE, TWO, FEW, MANY and OTHER for the sake of readability – so Gaelic ended up with ONE, TWO, FEW and OTHER for example). Which meant you could write a rule for the language in question and then prep your software to present the translator – and ultimately the user – with the right number of strings for translation. Sure, they look a bit crazy, like this one for Gaelic:

Plural-Forms: nplurals=4; plural=(n==1 || n==11) ? 0 : (n==2 || n==12) ? 1 : (n > 2 && n < 20) ? 2 : 3;\n

but you only had to do it once and that was that. Simples… you’d think. Oh no. I mean, yes, certainly doable and indeed a lot of software correctly applies plural formatting these days. Most Open Source projects certainly do, programs like Linux or Firefox for example have it, which is the reason why you probably never noticed anything odd about it.
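
If the C-style chain of question marks is hard to read, the same Gaelic rule can be spelled out as a plain function. This is only a sketch in Python: the function name is mine, and the strings are the cat forms from the lists further up.

```python
def gaelic_plural_index(n: int) -> int:
    """Mirror of the Gaelic Plural-Forms expression (nplurals=4)."""
    if n in (1, 11):
        return 0  # ONE
    if n in (2, 12):
        return 1  # TWO
    if 2 < n < 20:
        return 2  # FEW: 3-10 and 13-19 (11 and 12 were caught above)
    return 3      # OTHER: 0 and everything from 20 upwards


# One translated string per form, in index order
CAT_FORMS = ("%d chat", "%d chat", "%d cait", "%d cat")

for n in (1, 2, 3, 11, 12, 19, 20, 21, 100):
    print(CAT_FORMS[gaelic_plural_index(n)] % n)
```

The translator supplies the four strings once and the software picks the index at run time.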

One step down from this nice implementation of plurals are projects like Joomla! which will allow you to use plurals but won’t help you. Let me explain (briefly). Joomla! has one of the more atavistic approaches to localization – they expect translators to work directly in the .ini files Joomla! uses. Oh wow. So that DOES enable you to do plurals, but to begin with you have to figure out how to express the plural rule of your language in PHP for Joomla! and put that into one of the files. In our case, that turned out to be

   public static function getPluralSuffixes($count) {
       // Pick the suffix of the language-string key that matches $count.
       if ($count == 0 || $count > 19) {
           $return = array('0');
       }
       elseif ($count == 1 || $count == 11) {
           $return = array('1');
       }
       elseif ($count == 2 || $count == 12) {
           $return = array('2');
       }
       // 3-10 and 13-19
       elseif (($count > 2 && $count < 12) || ($count > 12 && $count < 20)) {
           $return = array('FEW');
       }
       return $return;
   }

Easy peasy. One then has to take the English, for example:

COM_CONTENT_N_ITEMS_CHECKED_IN_0="No cat"
COM_CONTENT_N_ITEMS_CHECKED_IN_1="%d cat"
COM_CONTENT_N_ITEMS_CHECKED_IN_MORE="%d cats"

and change it to this for Gaelic:

COM_CONTENT_N_ITEMS_CHECKED_IN_1="%d chat"
COM_CONTENT_N_ITEMS_CHECKED_IN_2="%d chat"
COM_CONTENT_N_ITEMS_CHECKED_IN_FEW="%d cait"
COM_CONTENT_N_ITEMS_CHECKED_IN_OTHER="%d cat"
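
To show the moving parts, here is a toy model of that suffix lookup in Python. It is not Joomla!’s actual code: the function names, the N_HOUSES key, the dictionary standing in for the .ini file and the fall-back to the bare key are all assumptions made for the illustration. The strings are the taigh forms from earlier.

```python
def gaelic_suffix(count: int) -> str:
    """Map a count to a key suffix (hypothetical suffix names)."""
    if count in (1, 11):
        return "1"
    if count in (2, 12):
        return "2"
    if 2 < count < 20:
        return "FEW"
    return "OTHER"


# Stand-in for a translated .ini file
STRINGS = {
    "N_HOUSES_1": "%d taigh",
    "N_HOUSES_2": "%d thaigh",
    "N_HOUSES_FEW": "%d taighean",
    "N_HOUSES_OTHER": "%d taigh",
}


def plural_text(base_key: str, count: int) -> str:
    key = f"{base_key}_{gaelic_suffix(count)}"
    # If the translator left a gap, show the raw key rather than crash
    template = STRINGS.get(key, key)
    return template % count if "%d" in template else template


for n in (1, 2, 13, 21):
    print(plural_text("N_HOUSES", n))
```

Every string with a number in it needs its own set of suffixed keys, typed by hand, which goes some way towards explaining what follows.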

Unsurprisingly, most localizers just can’t be bothered doing the plurals properly in Joomla!.

Ning is another project in this category – they also required almost as many contortions as Joomla! but their mud star is for having had plural formatting. And then having ditched it because allegedly the translators put in too many errors. Well duh… give a man a rusty saw and then complain he’s not sawing fast enough or what?

And then there are those projects which stubbornly plod on without any form of plural formatting (except English style plurals of course). The selection of programs which are still without proper plurals IS surprising I must say. You might think you’d find a lot of very old Open Source projects here which go back so far that no-one wants to bother with fixing the code. Wrong. There are some fairly new programs and apps in this category where the developers chose to ignore plurals either through linguistic ignorance or arrogance. Skype (started in 2003) and Netvibes (2005) for example. Just for contrast, Firefox was born in 2002 and to my knowledge always accounted for plurals.

Similarly, some of them belong to big software houses which technically have the money and manpower to fix this – such as Microsoft. Yep, Microsoft. To this date, no Microsoft product I’m aware of can handle non-English type plurals properly in ANY other language. Russians must be oddly patient when it comes to languages ’cause I get really annoyed when my screen tells me I have closed 5 window.

A lot of software falls somewhere between the two extremes – I guess it’s just the way humans are, looking at the way we build our cities into and onto and over older bits of city except when it all falls down and we have to (or can?) start from scratch. But that makes it no less annoying when you’re trying to make software sound less like a robot in translation than it has to…

PS: I’d be curious to know which program first implemented plurals. I’m sort of guessing it’s Linux but I’m not old enough to remember. Let me know if you have any insights.

PPS: If you’re a developer and want to know more about plurals, I recommend the Unicode Consortium’s page on plurals as a starting point, you can take it from there.

All look same, eh?

15/09/2012 5 comments

I must have been an elephant in another life, given how much time I seem to spend these days shaking my head over “avoidable stupidity”. Or maybe I’m just becoming a grumpy old man. That might be it – I’m losing the ability of youth to look at a slice of cold pizza and go “yummmm”. These days, I look at it and think “The cheese is hard, the cat sniffed it, I can’t even remember when I ordered it” and chuck it out. Ah but I digress.

This week’s headshaker is the way we seem to be losing control to the developers, control over things that should not be in the remit of developers. Things like letting some algorithm “identify” the language of web content and adjusting my search results based on that. Who dreamt that up? No idea but I bet he was white, monolingual and only had the faintest notion that apart from English, there’s that thing the people making tacos speak and then maybe the thing the Chinese takeaway people use. Choice of three – easy, if L does not equal English, check for non-Latin. If it’s non-Latin it must be Chinese, if it’s Latin, it’s Spanish. At least that’s the way it comes across.

The problem is, dear developer, that there are a great many languages out there and there are quite a few which are fairly close to each other. Like Irish and Scottish Gaelic for example. So if you decide to automatically identify content by language and modify my search results based on that, then bloody well make sure you get it right! Anything else is just seriously annoying unless you give me the option of manually tweaking it.

Given that it’s not like it’s impossible to teach a computer to figure out the difference (for one, Irish uses acutes, Gaelic graves… the one goes up, the other one down, see?) it also raises the question of exactly who they’re getting to program this stuff. High school students?
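
Just to show how little it takes, here is the accent trick as a few lines of Python. It is a deliberately naive sketch (the function name and the 'ga'/'gd' return values are my own choices) and a real language detector would want far more evidence than this, but even a crude signal like this separates the two:

```python
import unicodedata

ACUTES = set("áéíóúÁÉÍÓÚ")
GRAVES = set("àèìòùÀÈÌÒÙ")


def guess_goidelic(text: str) -> str:
    """Very rough guess: 'ga' (Irish), 'gd' (Scottish Gaelic) or 'unknown'."""
    # Normalise so that letter + combining accent becomes one character
    text = unicodedata.normalize("NFC", text)
    acute = sum(ch in ACUTES for ch in text)
    grave = sum(ch in GRAVES for ch in text)
    if acute > grave:
        return "ga"
    if grave > acute:
        return "gd"
    return "unknown"


print(guess_goidelic("Tá fáilte romhat"))  # Irish sample
print(guess_goidelic("Fàilte gu Alba"))    # Scottish Gaelic sample
```

Text with no accents at all comes back as "unknown", which is exactly the point where the user should be allowed to choose for themselves.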

Probably not actually, I suspect they’re all really good at code. But listening to my other half, a business consultant with his very own set of why-oh-why’s, I suspect the problem actually is NOT the ability to do code. It’s lack of guidance at all levels. The way big companies hire folk these days goes something like this:

  1. Company A identifies an apparent problem. Without making sure they identify the root cause, they call for a Fixer-of-Problem-A. First mistake. Your granny breaking her ankle may be the apparent problem but without checking, you don’t know if the problem is actually osteoporosis.
  2. So, having rightly or wrongly identified the problem, these days, a job spec gets sent to an agency. Second mistake. They usually get the wrong person to write the job spec, which means the agency is already at the receiving end of a potential mis-diagnosis and a badly written job spec. I’ve seen some of these… the really bad ones are the equivalent of needing a plumber and calling for someone with a proven track record in “the physical aspects of interior decoration as relates to waste disposal”. Yes. THAT bad.
  3. So we move on to mistake three. The agency usually adds its own flavour of inane, if not misleading, waffle. Using the plumber again, they add something about needing an end-to-end CV showing more than 20 years of experience in toilet seat lifting in blue-chip companies.
  4. Because it’s an IT related job everyone on this daisy chain assumes that the fixer and/or overseer of the fixing have to be IT people. Wrong. Fourth mistake. Of course you need IT folk to do the black magic but the overseer of the circus does not have to be one. In fact, I’d go as far as saying that they shouldn’t be one. Developers, when left to their own devices, tend to lose themselves in coding “fun” stuff. A failing I guess we all suffer from in our respective domains but for some reason, we let developers get away with policing themselves. In other words, a herd of sheep needs a sheepdog and a shepherd for guidance and direction, not another sheep. The sheepdog and shepherd should have a track record of having dealt with sheep but they don’t have to be sheep themselves.

I reckon it’s this nauseating daisy-chain of mistakes which blesses us with nonsense like the above. We need the coder to do the fancy stuff which, for example, helps identify the content of web pages. Jolly good, I can see the use of that if well done. But it should not be left to the developers to decide that is what’s needed right now, what it will do, how it gets tested, how it gets implemented and how to make sure the user has the necessary control over it if they need it. For that, we need a shepherd who’s not a sheep. If we manage that, I suspect we’d see fewer Siris, fewer counter-intuitive user interfaces, better language in the interfaces and a way of stopping Google from asking me every two seconds if I want to translate this damn page. No, I’m multilingual, and besides, running Irish machine translation over Gaelic won’t work anyway, dammit!

Akerbeltz blogs??

And in Beurla? That’s English, for the goidelically challenged. Thing is, I already connect with my Goidelic-speaking friends via many a channel but what I may have to say that’s fit for a blog is actually much less aimed at them.

Thing is, I spotted the great opportunities that Open Software had to offer to small languages a long time ago but when I had a look in, I got nowhere. More about that later. It wasn’t until a chance meeting between an American Irish speaker and myself in a pub in Dublin that I finally managed to get something off the ground with the brilliant help of said Gaeilgeoir, Kevin Scannell, who encouraged me to go back and localize Mozilla Firefox.

That was back in 2009. I’ve since morphed into the Scottish Gaelic localization team for anything from Mozilla to LibreOffice. Surprisingly common scenario, but again, more on that later. 2011 in particular has been a busy year and I now feel that I’ve moved beyond the noob stage and am allowed to have a view or two on some things.

So, Dear Developer, thanks for tuning in and I hope this will provide an insight as to what localization looks like from the other end of the fibreoptic cable!