How to stonewall Open Source

07/03/2015 3 comments

I seem to be posting a lot about Google these days but then they ARE turning into the digital equivalent of Nestlé.

I’ve been pondering this post for a while, and how to approach it without making it sound like I believe in Area 51. So I’ll just say what happened and mostly let you come to your own conclusions.

Back when Google still ran the Google in Your Language project, I tried hard to get into Gmail and what was rumoured to be a browser, but failed, though they were keen to push the now canned Picasa. <eyeroll> Then of course they canned the whole Google in Your Language thing. When I eventually found out that Google Chrome is technically little more than a rebranded version of an Open Source browser called Chromium, I thought ‘great, should be able to get a foot in the door that way’. Think again. I looked around and was immediately confused because there did not appear to be a clear distinction between Chromium and Chrome. The two main candidate venues for localizing Chromium seemed to be Launchpad and Google Code. So in January 2011 I decided to file an issue on Google Code, thinking that even if it was the wrong place, they should be able to point me in the right direction. The answer came pretty quickly. Even though the project is called Chromium, they (quote) don’t accept third party translations for chrome. And nobody seemed to know where the translations come from or how you become an official translator. There was a vague suggestion that I should maybe try Ubuntu.

I gave it some time. Lots of time in fact. I picked up the thread again early in 2013. Now the semi-serious suggestion was to fork Chromium and do my translation on the fork. Very funny. Needless to say, I was getting rather disgusted at the whole affair and decided to give up on Chrome/Chromium.

When I noticed that an Irish translator on Launchpad had asked a similar question about Chromium, and that the answer was that, as far as they knew, translations are pushed upstream to Chromium from Launchpad, I decided I might as well have a go. As someone had suggested, at least I’d get Chromium on Linux.

Fast forward to October 2014 and I’m almost done with the translation on Launchpad, so I figure I’d better file a bug early because it will likely take forever. Bug filed, enthusiastic response from some admin on Launchpad. Great, I think to myself, should be plain sailing from here on. Spoke too soon. End of January 2015, the translation long completed, my queries are met with silence, and then more silence. More worryingly, someone points me at a post on Ubuntu about Chromium on Launchpad being, well, dead.

Having asked the question in a Chromium IRC chat room, I decided to have another go on Google Code, new bug, new luck maybe? Someone in the room did sound supportive. That was January 28, 2015. To date, nothing has happened apart from someone ‘assigning the bug to l10n PM for triage’.

I’m coming to the conclusion that Chromium has only the thinnest veneer of being open. Perhaps in the sense that I can get hold of the source code and play around with it. But there is a distinct lack of openness and approachability about the whole thing. Perhaps that was the intention all along: to use the Open Source community to improve the source code but to give back as little as possible, building up layers of secrecy and putting as many obstacles in people’s path as possible. At least when it comes to localization.

At least Ubuntu is no longer pushing Chromium as the default browser. But that still leaves me with a whole pile of translation work which is not being used. Maybe I should check out some other Chromium-based browsers like Comodo Dragon or Yandex. Perhaps I’m being paranoid, but I’m not keen on having software from Russia on my systems or recommending it to other people. Either way, I’m left with the same problem we have with Firefox in a sense – it would mean having to wean people off pre-installed versions of Google Chrome or Internet Explorer.

Anyone got any good ideas? Cause I’m fresh out of…

The spectre of Google Translate for Gaelic

15/01/2015 1 comment

Not the kind of pre-Christmas cheer I was hoping for, seriously. Slap bang on the 23rd, someone draws my attention to an article called Google urged to go Gaelic. In a nutshell, a left-field (most likely well-intentioned) appeal by an MSP from Central Scotland to add Scottish Gaelic to the list of languages on Google Translate. As the mere thought was nauseating, I made some time and wrote a very long letter to Murdo Fraser, the man in question, with copies going to David Boag at Bòrd na Gàidhlig and Alasdair Allan, the minister for languages. As it sums up my arguments quite succinctly (I hoped), I’ll just copy it here:


Just before Christmas, a friend drew my attention to an article in the Courier regarding Google Translate in which Mr Murdo Fraser argues for a campaign to get Scottish Gaelic onto Google Translate.

I’m sure that this is a well-intentioned idea but in my professional opinion, it would have terrible consequences. As one of the few people who work entirely in the field of Gaelic IT, I have a keen interest in technology and the potential benefit – and damage – this offers to languages like Gaelic. As it happens, I also was the Gaelic localizer (i.e. translator) for Google when it was still running the Google In Your Language programme and I have watched (often with dismay) what Google has done in this area since. One of the projects that certainly caught my eye was Google Translate, especially when Irish was added as a language in 2009. But having spoken to Irish people working in this field and having watched the effects of it on the Irish language, I rapidly came to the conclusion that while it looks ‘cool’, being on a machine translation system for a small(er) language was not necessarily a benefit and in some cases, a tragedy.

Without going into too much technical detail, machine translation of the kind that Google does works best with the following ingredients:
– a massive (billions of words) aligned bilingual corpus
– translation between structurally similar languages or
– translation from a grammatically complex language into a less grammatically complex language but not the other way round
– translation of short, non-colloquial phrases and sentences but not complex, colloquial or literary structures

In essence, machine translation trains an algorithm to spot ‘patterns’, which is why massive amounts of data are needed and why it works better from a complex language into a less complex language. For example, it is relatively easy to teach the system that German der/die/das all require ‘the’ in English, but it requires a massive amount of data for the system to become clever enough to understand when ‘the’ becomes ‘der’ but not ‘die’.

Unfortunately for Irish, none of these conditions were met – and would also not be met for Scottish Gaelic. To begin with, even if we digitized all the works ever produced which exist in English and Gaelic, the corpus would still be tiny by comparison to the German/English corpus for example.

Then there is the issue of linguistic distance: Irish/Gaelic and English are structurally very different, with Gaelic/Irish having a lot more in the way of complex grammatical structures than English. To compensate for this, the corpus would have to be truly massive. Which is why the existing Irish/English system is extremely poor by anyone’s standards.

One might argue that the aim is not a perfect translation system but a means of accessing information only available in other languages – which is the case for many of the languages which are on Google Translate. But I’m doubtful the same would hold here. To begin with, no fluent Gaelic speaker requires a Gaelic > English translation system, and there is precious little published in Gaelic in digital form which does not also exist in English. All this would do is remove yet another reason for learning Gaelic.

That would leave English > Gaelic and herein lies the tragedy of the English/Irish pairing on Google Translate. Whatever the intentions of the developers, people will mis-use such a system. I have put together a few annotated photos which illustrate the scale of the disaster in Ireland here. From school reports to official government websites, there are few places where students, individuals or officials trying to cut corners have not used Irish translations from Google Translate in ways they were never intended to be used.

If there HAD been a Gaelic/English pair, Police Scotland would have been an even bigger target of ridicule because such an automated translation would have produced gibberish at worst and absurd semi-Gaelic at best.

I think we can all agree that the last thing Gaelic needs is masses of poor quality translations floating around the internet. Funding is extremely short these days and this would, in my view, be a poor use of these scarce funds. There are more pressing battles to be fought in the field of Gaelic and IT, such as the refusal by third-party suppliers of IT services to Gaelic schools and units to provide (existing) Gaelic software, or even a keyboard setting in any school that would allow students to easily input accented characters, be that for Gaelic, Spanish or French.

is mise le meas mòr,


Turns out I wasn’t the only one horrified by the mere thought – John Storey also wrote a very long and polite letter.

Early in January and within days of each other, both John and I received almost identical responses which, in a nutshell, said ‘Thanks but I’ll keep trying anyway’. Even less encouragingly, they made a really irrelevant reference to the lack of teachers in Gaelic Medium Education. Which is true of course but, well, not relevant?


Thank you for contacting me in relation to Scots Gaelic and Google Translate and for your detailed correspondence.

I appreciate the depth of your letter and note your concerns in relation to issues of accuracy and the potential impact to speakers of Gaelic of Google translate. I will be sure to consider these when next speaking on the subject.

I also agree that there are other battles to be fought in the field of Gaelic and IT and appreciate the current issues surrounding the number of teachers in Gaelic Medium Education.  However, I do believe it is worth promoting the case for a more accessible Gaelic presence online and without this I believe that Gaelic could miss out on the massive opportunities afforded by the digital age.


I’m still waiting for a response from Bòrd na Gàidhlig or Alasdair Allan. But I’m not encouraged. Really frustrated actually, because (at least as the Press & Journal and the Perthshire Conservatives would have it) it seems like Bòrd na Gàidhlig and Alasdair Allan are throwing their weight behind this ill-fated caper.

I really hope Google turns them down because I really don’t want to end up where the Irish IT specialists ended up – the merry world of “Told you so”…

But sadly “Got Gaelic onto Google” probably just sounds sexier on your CV than “Banged some desks and made sure all kids in Gaelic Medium Education can now easily type àèìòù”…

How to make headlines for the wrong reasons

Good afternoon, boys and girls, very bad language, for example, what we see in the side and at the airport these days? No, I haven’t gone insane, I’m just illustrating a point by resorting to reductio ad absurdum. In other words, I punched the sentence Hey folks, anyone up for some really truly bad language like the stuff we’re seeing at BÁC airport these days? into Bad Translator and let it go through 10 machine translations.

More unheavy fuel?

Why? Glad you asked… these days, Google is making headlines both in the Irish traditional press and in social media. But for all the wrong reasons. The reason? Google Translate. Or rather, a language pair someone should have thought about a little more. Or at least done some user testing on it. Something…

So what I imagine happened is this… some bright spark, either on the Google side or among well-meaning Irish government officials, thought it would be great if we could have Irish on Google Translate. First mistake. Give humans a tool, and they will mis-use it. Like our ex-joiner hammering in screws. So before you give people a tool, think about likely scenarios of mis-use. It clearly does not require a team of MENSA members to imagine that in a minoritised language like Irish, people might start using it for things like their homework or cheap translations rather than a quick way of getting the gist behind web content.

But having blissfully ignored this step, someone must have forged ahead and contributed a bilingual corpus to the Google developers with a note along the lines of here’s a corpus for Irish, please add it to Google Translate. Most likely, second mistake. Right, so there are many ways of building machine translation systems but most rely on a mix of rules and a bilingual corpus. The idea being that as long as you feed a computer enough aligned data in two languages, it can use statistics to figure out how to translate between the two. This idea in itself is sound. Sort of. It depends on the languages in question, the amount of data involved and, oddly enough, the direction of the translation. Here’s an ideal scenario: build a system using a VAST amount of data (we’re talking billions of words) to translate between closely related languages and into the language which has the less fancy grammatical system. Like German to English. That works quite well as a pair on Google Translate because a) there are indeed vast amounts of text which exist in both languages and b) German has the fancier grammar (3 genders, case marking, inflection of verbs…) whereas English does buggerall (some past tense markers on verbs and a plural -s aside, which is peanuts in linguistic terms).

A bit like saying ‘Going all passengers from The gates please their sick people as if the doors to be opened before your Boarding Times’
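To make the ‘patterns’ point a little more concrete, here’s a toy sketch – nothing remotely like Google’s actual pipeline, and the three-sentence ‘corpus’ is invented for the demo – of what happens when you ask statistics to find the German word for the from too little data:

<?php
// Count which German words co-occur with English 'the' in a tiny
// aligned corpus. The sentence pairs are invented for this demo.
$corpus = [
    ['the man sleeps',   'der Mann schläft'],
    ['the woman sleeps', 'die Frau schläft'],
    ['the child sleeps', 'das Kind schläft'],
];

$counts = [];
foreach ($corpus as [$en, $de]) {
    if (in_array('the', explode(' ', $en))) {
        foreach (explode(' ', $de) as $word) {
            $counts[$word] = ($counts[$word] ?? 0) + 1;
        }
    }
}
arsort($counts);
print_r($counts);
// 'schläft' scores 3 while 'der', 'die' and 'das' score 1 each - with this
// little data the statistics can't even tell articles from verbs, never
// mind pick the right gender. That, in a nutshell, is why you need
// billions of words.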

But once you move away from the ideal model, things start creaking. The more complex the structures of the target language, the more data you’d need for the computer to make any sense of it. So going English to Icelandic creaks much more because even though they’re related languages (ultimately), Icelandic is even more complex than German. Oh and there’s less bilingual data of course.

You get the idea. Now Irish is eye-candy to a linguist. It has grammatical structures to die for: a case system, two genders, two types of mutation (that’s when the first sound in a word changes… you might know people called Hamish? Well, that’s what Irish does to a man called Séamus when you address him), a headache-inducing system for inflecting verbs, a different word order (English is subject-verb-object, Irish is verb-subject-object) and so on. A thousand things English doesn’t do. So what would we need to make this work? Yup, take a gold star, a corpus billions of words big.

Unfortunately there’s no bilingual corpus that even comes close to that. Or at the very least, Google did not feed in anywhere near enough data. I’ve lost track but I think it’s mistake 3?

Cue mistake 4… let it loose on people without a big warning strapped to it or any form of user testing. The result? Eye-wateringly bad translations which start cropping up in the weirdest places. Facebook … ok, we could probably live with that… homework… a lot worse, don’t teachers have enough to contend with? And of course the jewel in the crown – official signage. Yep, that’s right. Google Translate has been making its way onto signage from Dublin Airport to government websites. And the result is almost always nauseating. Breaking through barriers? Only the blood vessels in Irish speakers’ brains perhaps…

It’s not that one shouldn’t attempt to bring technology to smaller languages, I’m all for that. But quality is key. It’s a hard enough sell at the best of times and something like a poor machine translation system can seriously damage the confidence people have in technology in or for their language. A little careful thinking goes a long way…


Once bitten by Open Source, hooked forever?

So some would claim. But having just read the news from Munich, I would re-iterate the need for some soul-searching as to the truth of that claim. The news being that the City of Munich, having decided to switch from Microsoft to Linux in 2004, is considering going back to Microsoft. Sure, there may be some shady business involved but reading the article, there are valid problems that the users are raising.

There are undeniable benefits of Open Source stuff and I won’t bore everyone with going into them again. And undoubtedly some issues stem from users just being so used to Microsoft. But what stood out for me was the comment Munich’s mayor Dieter Reiter made about the complications with managing email, calendars and contacts and that in his view, Linux is sometimes behind Microsoft.

Now before y’all start listing the amazing tools I can sudo onto my Ubuntu machine, that’s not the point. The point is that what Microsoft does offer and which still eludes the Open Source scene is integration and end-user friendliness. Ubuntu sort of makes a stab at that but in my view still falls short.

I will forgo my usual verbosity and simply pose some questions:

  1. Was it really smart of Mozilla to ditch the official development of Thunderbird (their email client) and Lightning (the calendar that goes with it)? Rather than integrating it further with Firefox and coming up with a webmail service based on it?
  2. Why is there still so little cross-project coordination and cooperation in the Open Source scene?
  3. Could this be a painful lesson that OS is not an addictive drug to most users and that they will come off it if they’re having a bad trip? Does this mean that the cavalier way in which most OS projects approach issues of usability and the user interface is coming round big time to bite us?

Don’t get me wrong. I still think it’s the only sustainable way forward, especially for SMLs (small to medium locales). But pride in amazing code will not cut the mustard with Mrs McGinty down the road who just wants something she can use out of the box and link to her phone and with a calendar for her webmail so she won’t forget her next appointment with the orthodontist. Without resorting to command lines that would make Linus weep.

While 420km below the ISS a Dani is sharpening his stone axe

26/05/2014 5 comments

Sometimes the world of software feels a bit like that, a confusing array of ancient and cutting edge stuff. I see you nodding sagely, thinking of the people still using Windows 98 or, even more extreme, Windows 3.11, or people who just don’t want to upgrade to Firefox 3 (we’re on 29 just now, for those of you on Chrome). I actually understand that; on the one hand you have very low-key users who just write the odd email, and on the other you have specialists (this is most likely something happening at your local hospital, incidentally) who rely on a custom-rigged system using custom-designed software, all done in the days of yore, to run some critical piece of technology and who are loath to change it since… well… it works. I don’t blame them, who wants to mess around with bleeding tiles when they’re trying to zap your tumour.

But that wasn’t actually what I was thinking about. I was thinking about the spectrum of localizer-friendly and -unfriendly software. At one extreme you have cutting edge Open Source developers working on the next generation of localization (also known as l20n, one up from l10n) and at the other you have… well, troglodytes. Since I don’t want to turn this into a really complicated lecture about linguistic features, I’ll pick a fairly straightforward example, the one that actually made me pick up my e-pen in anger. Plurals.

What’s the big deal, slap an -s on? Ummm. No. Ever since someone decided that counting one-two-lots (ah, I wish I had grown up a !San) was no longer sufficient, languages have been busy coming up with astonishingly complex (or simple) ways of counting stuff. At one extreme you have languages like Cantonese which don’t inflict any changes on the things they’re counting. So, the writing system aside, you just go 0 apple, 1 apple, 2 apple… 100 apple, 1,000 apple and so on.

English is a tiny step away from that, counting 0 apples, 1 apple, 2 apples… 100 apples, 1,000 apples and so on. Spot something already? Indeed. Logic doesn’t really come into it, not in a mathematical sense. By that I mean there is no reason why in Cantonese 0 should pattern with 1, 2 etc but that in English 0 should go with 2, 3, etc. It just does. Sure, historical linguists can sometimes shed light on how these have developed but not very often. On the whole, they just are.

This is where it gets entertaining (for linguists). First insight: there aren’t as many systems as there are languages, so far fewer than 6,000. In fact, looking at the places where such rules are collected, there are probably fewer than a hundred different ways (on the planet) of counting stuff. Still fun times though (for linguists). Let me give you a couple of examples. A lot of Slavonic languages (Ukrainian, Russian etc) require up to 3 different forms of a noun (I’ve put a small code sketch of the selection logic right after the list):

  • FORM 1: ends in 1 – but not 11 (1, 21, 31, 41…)
  • FORM 2: ends in 2, 3 or 4 – but not 12, 13 or 14 (2, 3, 4, 22, 23, 24, 32, 33, 34…)
  • FORM 3: anything else (11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 26, 27…)
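To make that three-way split concrete, here is the selection logic as a small PHP sketch – my own illustration of the rule described above, not code taken from any actual library – using Russian as the example:

<?php
// Pick the noun form (1, 2 or 3) for a given count, Russian-style.
function russianPluralForm(int $n): int {
    $mod10  = $n % 10;
    $mod100 = $n % 100;
    if ($mod10 == 1 && $mod100 != 11) {
        return 1; // FORM 1: ends in 1 but not 11 (1, 21, 31...)
    }
    if ($mod10 >= 2 && $mod10 <= 4 && ($mod100 < 12 || $mod100 > 14)) {
        return 2; // FORM 2: ends in 2-4 but not 12-14 (2, 3, 4, 22...)
    }
    return 3;     // FORM 3: everything else (0, 5-20, 25...)
}

foreach ([1, 3, 11, 13, 21] as $n) {
    echo $n . ' => FORM ' . russianPluralForm($n) . PHP_EOL;
}
// Prints: 1 => FORM 1, 3 => FORM 2, 11 => FORM 3, 13 => FORM 3, 21 => FORM 1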

That almost makes sense in a way. But we can add a few more twists. Take the resurrected decimal system in Scottish Gaelic. It requires up to 4 forms of a noun:

  • FORM 1: 1 and 11 (1 chat, 11 chat)
  • FORM 2: 2 and 12 (2 chat, 12 chat)
  • FORM 3: 3-10, 13-19 (3 cait, 4 cait, 13 cait, 14 cait…)
  • FORM 4: anything else (21 cat, 22 cat, 100 cat…)

Hang on, you’re saying, surely FORM 1 and FORM 2 could be merged. ’fraid not, because while the word cat makes it look as if they’re the same, if you start counting something beginning with the letter d, n, t or s, the following happens:

  • FORM 1: 1 taigh, 11 taigh
  • FORM 2: 2 thaigh, 12 thaigh
  • FORM 3: 3 taighean, 4 taighean, 13 taighean, 14 taighean…
  • FORM 4: 21 taigh, 22 taigh, 100 taigh…

Told you, fun! Now here’s where it gets annoying. Initially, in the very early days of software, localization mostly meant taking software written in English and translating it into German, French, Spanish, Italian & Co and then a bit later on adding Chinese, Japanese and Korean to the list.

Through a sheer fluke, that worked almost perfectly. English has a very common pattern, as it turns out (one form for 1 and another for anything else), so going from English to German posed no problems in translation. You simply took a pair of English strings like:

  • Open one file
  • Open %d files

and translated them into German:

  • Eine Datei öffnen
  • %d Dateien öffnen

Similarly, going to Chinese also posed no problem, you just ended up with a superfluous string because (I’ll use English words rather than Chinese characters):

  • Open one file
  • Open %d file

also created no linguistic or computational problems. Well, there was the fact that in French 0 patterns with 1, not with the plural as it does in English but I bet at that point English developers thought they were home and dry and ready to tick off the whole issue of numbers and number placeholders in software.

Now I have no evidence but I suspect a Slavonic language like Russian was one of the first to kick up a stink. Because as we saw, it has a much more elaborate pattern than English. Now there was one bit of good news for the developers: although these linguistic setups were elaborate in some cases, they also followed predictable patterns and you only need about 6 categories (which ended up being called ZERO, ONE, TWO, FEW, MANY and OTHER for the sake of readability – so Gaelic ended up with ONE, TWO, FEW and OTHER for example). Which meant you could write a rule for the language in question and then prep your software to present the translator – and ultimately the user – with the right number of strings for translation. Sure, they look a bit crazy, like this one for Gaelic:

Plural-Forms: nplurals=4; plural=(n==1 || n==11) ? 0 : (n==2 || n==12) ? 1 : (n > 2 && n < 20) ? 2 : 3;\n

but you only had to do it once and that was that. Simples… you’d think. Oh no. I mean, yes, certainly doable and indeed a lot of software correctly applies plural formatting these days. Most Open Source projects certainly do, programs like Linux or Firefox for example have it, which is the reason why you probably never noticed anything odd about it.
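In case that Plural-Forms one-liner looks like line noise, here it is unpacked into PHP – my own sketch of what gettext effectively computes when it picks one of the four translated strings, not gettext’s actual code:

<?php
// Map a count to the index (0-3) of the Gaelic plural string.
function gaelicPluralIndex(int $n): int {
    if ($n == 1 || $n == 11) return 0; // ONE:   1 chat, 11 chat
    if ($n == 2 || $n == 12) return 1; // TWO:   2 chat, 12 chat
    if ($n > 2 && $n < 20)   return 2; // FEW:   3 cait ... 19 cait
    return 3;                          // OTHER: 20 cat, 21 cat, 100 cat
}

foreach ([1, 12, 14, 20, 21] as $n) {
    echo $n . ' => string #' . gaelicPluralIndex($n) . PHP_EOL;
}
// Prints: 1 => #0, 12 => #1, 14 => #2, 20 => #3, 21 => #3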

One step down from this nice implementation of plurals are projects like Joomla! which will allow you to use plurals but won’t help you. Let me explain (briefly). Joomla! has one of the more atavistic approaches to localization – they expect translators to work directly in the .ini files Joomla! uses. Oh wow. So, that DOES enable you to do plurals, but first you have to figure out how to express the plural rule of your language in Joomla!’s terms and put it into one of the files. In our case, that turned out to be:

public static function getPluralSuffixes($count) {
    if ($count == 0 || $count > 19) {
        // OTHER: 0, 20, 21, 100... (matches the _OTHER keys below)
        $return = array('OTHER');
    }
    elseif ($count == 1 || $count == 11) {
        $return = array('1');
    }
    elseif ($count == 2 || $count == 12) {
        $return = array('2');
    }
    elseif (($count > 2 && $count < 12) || ($count > 12 && $count < 20)) {
        // FEW: 3-10 and 13-19 (11 and 12 are caught above)
        $return = array('FEW');
    }
    return $return;
}

Easy peasy. One then has to take the English, for example:

COM_CONTENT_N_ITEMS_CHECKED_IN_0="No cat"
COM_CONTENT_N_ITEMS_CHECKED_IN_1="%d cat"
COM_CONTENT_N_ITEMS_CHECKED_IN_MORE="%d cats"

and change it to this for Gaelic:

COM_CONTENT_N_ITEMS_CHECKED_IN_1="%d cat"
COM_CONTENT_N_ITEMS_CHECKED_IN_2="%d chat"
COM_CONTENT_N_ITEMS_CHECKED_IN_FEW="%d cait"
COM_CONTENT_N_ITEMS_CHECKED_IN_OTHER="%d cat"
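For the sake of completeness, this is roughly how the two halves meet at runtime (it only runs inside a Joomla! installation, of course) – a hypothetical component call for illustration, not a quote from the Joomla! core:

// Joomla! asks getPluralSuffixes() for the suffix matching the count and
// then looks up the corresponding key from the .ini file above.
echo JText::plural('COM_CONTENT_N_ITEMS_CHECKED_IN', 3);
// With the Gaelic rule: 3 => suffix 'FEW' => "3 cait"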

Unsurprisingly, most localizers just can’t be bothered doing the plurals properly in Joomla!.

Ning is another project in this category – they also required almost as many contortions as Joomla! but their mud star is for having had plural formatting. And then having ditched it because allegedly the translators put in too many errors. Well duh… give a man a rusty saw and then complain he’s not sawing fast enough or what?

And then there are those projects which stubbornly plod on without any form of plural formatting (except English style plurals of course). The selection of programs which are still without proper plurals IS surprising I must say. You might think you’d find a lot of very old Open Source projects here which go back so far that no-one wants to bother with fixing the code. Wrong. There are some fairly new programs and apps in this category where the developers chose to ignore plurals either through linguistic ignorance or arrogance. Skype (started in 2003) and Netvibes (2005) for example. Just for contrast, Firefox was born in 2002 and to my knowledge always accounted for plurals.

Similarly, some of them belong to big software houses which technically have the money and manpower to fix this – such as Microsoft. Yep, Microsoft. To this day, no Microsoft product I’m aware of can handle non-English type plurals properly in ANY other language. Russians must be oddly patient when it comes to languages ’cause I get really annoyed when my screen tells me I have closed 5 window…

A lot of software falls somewhere between the two extremes – I guess it’s just the way humans are; look at the way we build our cities into and onto and over older bits of city, except when it all falls down and we have to (or can?) start from scratch. But that makes it no less annoying when you’re trying to make software in translation sound less like a robot than it has to…

PS: I’d be curious to know which program first implemented plurals. I’m sort of guessing it’s Linux but I’m not old enough to remember. Let me know if you have some insights?

PPS: If you’re a developer and want to know more about plurals, I recommend the Unicode Consortium’s page on plurals as a starting point, you can take it from there.

Sometimes being anal-retentive works

But mostly, it doesn’t. That, at least, is my conclusion regarding the “security” settings in Windows 8, where they’ve frankly tied themselves into a knot that would do the Midgard Serpent proud.

I only became aware of this knot when trying to install a program recently; in my case this was the highly innocuous LibreOffice update (which is basically a re-install that keeps your personal files and addons rather than an upgrade). So for the purposes of what I was doing, let’s treat this as a new installation. You get halfway through and what happens? Error 1303 is what happens, the one about “installer has insufficient privileges to access blablabla”.

So basically it’s telling me that I, the one and only user of this machine, who also happens to be logged in as an admin, don’t have the necessary rights to install a program. Rrrright…

There are two ways I can look at this. The cynic in me says they’re trying to force the bulk of users (who are out-of-the-box users who don’t “mess” with their systems) into using the pre-installed, approved and expensive junk their computers come with. Because the solutions to this problem start at the Gordian level and spiral upwards, some involving command prompts or a staggering array of permission setting windows that looks more like a digital card-house than system administration.

The other of course is sheer idiocy, where some developer figured that the best way of stopping users from cough using their systems would be to implement a fiendish array of permissions and user levels that would prevent unauthorised programs from installing themselves or users from accidentally messing things up. The only Ymir-sized snag is that you end up with users, desperate to install the things they actually want, fiddling around with the permission settings for users and admins. Usually in the form of trying to create at least one super-user to get around all these issues. Which brings us round in a neat circle, where anyone gaining illegal access to the system has all the privileges they could ever want. I believe sporting natures describe that as an “own goal”. Nice one chaps.

Oh, but I did find a fairly simple workaround in the end. Amusingly, this anal-retentive approach seems to apply mainly to system folders and folders the system created, such as Program Files or Program Files (x86). If you tell the installer to create a new directory, such as C:\Programan\LibreOffice4\, then it doesn’t bat an eyelid. “Oh my” as George Takei would say…

When peer review goes pear shaped

29/01/2014 2 comments

Well I’m glad I asked. What happened was this…

I had a request from someone asking if I could localize TinyMCE (a WYSIWYG editor – think of it as a miniature form of Word sitting within a website) so they could use it on their website for their Gaelic-speaking editors. There aren’t that many strings and the project is handled on Transifex using po files, so the process seemed straightforward too. (If you don’t know what a po file is, the main thing about them is that many translation memory packages can handle them and, if you have already done LibreOffice or something like that and stored those strings in the memory, there will be few strings in a project like TinyMCE for which there are no translation memory suggestions. In a nutshell, it allows an experienced software translator to work much faster.)

So off I go. Pretty much a cake-walk, half a Bond film and 2 episodes of Big Bang later, the job was done. Now in many cases once a language has been accepted for translation and when you have translated all or at least most of the project, these translations will show up in the released program eventually. But just because I’m a suspicious old fart (by now), I messaged the admins and asked about the process of getting them released. Good thing too. Turns out they use an API to pull the translations from Transifex and onto their system (they’ve basically automated that step, which I can understand). The catch however is that it only grabs translations set to Reviewed.
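For illustration, a pull like that boils down to something like the sketch below – my own guess at the shape of it, not TinyMCE’s actual code, and the project/resource slugs and the mode=reviewed switch are assumptions based on the Transifex API of the day, so check their docs before reusing any of it:

<?php
// Fetch ONLY the strings flagged as Reviewed for Gaelic (gd); anything
// unreviewed comes back untranslated. Slugs and parameters are assumptions.
$url = 'https://www.transifex.com/api/2/project/tinymce/'
     . 'resource/core/translation/gd/?mode=reviewed&file';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERPWD, 'user:password');
$po = curl_exec($ch);
curl_close($ch);

file_put_contents('langs/gd.po', $po);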

Cue a groan from me. To cut the TinyMCE story short at this point, it seems this is down to Transifex (at least according to the TinyMCE admin), so they were quite happy for me to just breeze through the strings and set them to Reviewed myself. Fortunately it wasn’t a large job, so 15 minutes later (admittedly, I have about 14 other jobs on my desk just now which I would rather have done…) they were all set, thank goodness for keyboard shortcuts.

But back to the groan. I have come across this approach before and on the face of it, it makes sense. If you do community translation (i.e. you let a bunch of volunteers from the web translate into languages you as admins don’t understand and don’t have time to QA) but you’d like at least some measure of QA over the translations, then by adding this step of peer review you can be more or less sure that you’re not getting ‘Jamie is a dork’ and ‘Muahahaha’ type translations.

The only problem is, peer review in online localization relies on large numbers of volunteers. Only a small percentage of speakers have any inclination towards translating pro bono publico and even fewer feel like reviewing other people’s translations (there is something slightly obscene about proofreading, it’s like having someone else put words in your mouth, they almost always taste funny…). I once did some rough and ready stats on the percentage of speakers of a given language who will be engaged in not-for-profit localization (of mainstream projects like Firefox or LibreOffice). It’s about ONE active localizer for every 500,000 speakers. So German can call upon something like 200 really active localizers. Scottish Gaelic on the other hand statistically has … well, it has fewer than 60,000 speakers. You work it out. So it’s seriously blessed by having TWO of them.

In any case, even if you disbelieve my figures (I’d be the first to admit to being no great shakes at numbers), the percentages are really small. So if you set up a translation process that necessitates not only translation but also peer review, you’re essentially screwing small languages, because the chances are there will never be a reviewer with enough time or energy (never mind ability) to review stuff. It’s one of the reasons why we haven’t touched WhatsApp yet; they simply won’t let a translation go live without review.

So if you design a process like that and want to make sure you’re not creating big problems for smaller languages (and we’re not just talking Gaelic-style tiny languages; even languages like Kazakh or Estonian have such problems), make sure you

  • allow enough wriggle-room to over-ride such requirements, for example by allowing a localizer to demonstrate their credentials (for example through long-term participation in other projects) and
  • design a system where, if it’s absolutely necessary to set specific tags, admins can bulk-tag translations for a certain language.

Over and out.
