Archive

Posts Tagged ‘localization’

How to stonewall Open Source

07/03/2015 3 comments

I seem to be posting a lot about Google these days but then they ARE turning into the digital equivalent of Nestlé.

I’ve been pondering this post for a while and how to approach it without making it sound like I believe in Area 51. So I’ll just say what happened and mostly let you come to your own conclusions.

Back when Google still ran the Google in Your Language project, I tried hard to get into Gmail and what was rumoured to be a browser but failed, though they were keen to push the now canned Picasa. <eyeroll> Then of course they canned the whole Google in Your Language thing. When I eventually found out that Google Chrome is technically nothing other than a rebranded version of an Open Source browser called Chromium, I thought ‘great, that should get me a foot in the door’. Think again. So I looked around and was immediately confused because there did not appear to be a clear distinction between Chromium and Chrome. The two main candidates were Launchpad and Google Code. So in January 2011 I filed an issue on Google Code, thinking that even if it was the wrong place, they should be able to point me in the right direction. The answer came pretty quickly. Even though the project is called Chromium, they (quote) don’t accept third party translations for chrome. And nobody seems to know where the translations come from or how you become an official translator. There was a vague suggestion that I should maybe try Ubuntu.

I gave it some time. Lots of time in fact. I picked up the thread again early in 2013. Now the semi-serious suggestion was to fork Chromium and do my translation on the fork. Very funny. Needless to say, I was getting rather disgusted at the whole affair and decided to give up on Chrome/Chromium.

When I noticed that an Irish translator on Launchpad had asked a similar question about Chromium and that the answer had been that, as far as they knew, translations are pushed upstream to Chromium from Launchpad, I decided I might as well have a go. As someone had suggested, at least I’d get Chromium on Linux.

Fast forward to October 2014 and I’m almost done with the translation on Launchpad, so I figure I’d better file a bug early because it will likely take forever. Bug filed, enthusiastic response from some admin on Launchpad. Great, I think to myself, should be plain sailing from here on. Spoke too soon. End of January 2015, the translation long completed, I query and am met with silence, then more silence. More worryingly, someone points me at a post on Ubuntu about Chromium on Launchpad being, well, dead.

Having asked the question in a Chromium IRC chat room, I decided to have another go on Google Code, new bug, new luck maybe? Someone in the room did sound supportive. That was January 28, 2015. To date, nothing has happened apart from someone ‘assigning the bug to l10n PM for triage’.

I’m coming to the conclusion that Chromium has only the thinnest veneer of being open. Perhaps in the sense that I can get a hold of the source code and play around with it. But there is a distinct lack of openness and approachability about the whole thing. Perhaps that was the intention all along: to use the Open Source community to improve the source code but to give back as little as possible, building as many layers of secrecy and putting as many obstacles in people’s path as possible. At least when it comes to localization.

At least Ubuntu is no longer pushing Chromium as the default browser. But that still leaves me with a whole pile of translation work which is not being used. Maybe I should check out some other Chromium-based browsers like Comodo Dragon or Yandex. Perhaps I’m being paranoid but I’m not keen on software coming from Russia being on my systems or recommending it to other people. Either way, I’m left with the same problem that we have with Firefox in a sense – it would mean having to wean people off pre-installed versions of Google Chrome or Internet Explorer.

Anyone got any good ideas? Cause I’m fresh out of…

While 420km below the ISS a Dani is sharpening his stone axe

26/05/2014 5 comments

Sometimes the world of software feels a bit like that, a confusing array of ancient and cutting edge stuff. I see you nodding sagely, thinking of the people still using Windows 98 or, even more extreme, Windows 3.11, or people who just don’t want to upgrade to Firefox 3 (we’re on 29 just now, for those of you on Shrome). I actually understand that; on the one hand you have very low-key users who just write the odd email and on the other you have specialists (this is most likely something happening at your local hospital, incidentally) who rely on a custom-rigged system using custom-designed software, all done in the days of yore, to run some critical piece of technology and who are loath to change it since… well… it works. I don’t blame them, who wants to mess around with bleeding tiles when they’re trying to zap your tumour.

But that wasn’t actually what I was thinking about. I was thinking about the spectrum of localizer-friendly and -unfriendly software. At one extreme you have cutting edge Open Source developers working on the next generation of localization (also known as l20n, one up from l10n) and at the other you have… well, troglodytes. Since I don’t want to turn this into a really complicated lecture about linguistic features, I’ll pick a fairly straightforward example, the one that actually made me pick up my e-pen in anger. Plurals.

What’s the big deal, slap an -s on? Ummm. No. Ever since someone decided that counting one-two-lots (ah, I wish I had grown up a !San) was no longer sufficient, languages have been busy coming up with astonishingly complex (or simple) ways of counting stuff. At one extreme you have languages like Cantonese which don’t inflict any changes on the things they’re counting. So, the writing system aside, you just go 0 apple, 1 apple, 2 apple… 100 apple, 1,000 apple and so on.

English is a tiny step away from that, counting 0 apples, 1 apple, 2 apples… 100 apples, 1,000 apples and so on. Spot something already? Indeed. Logic doesn’t really come into it, not in a mathematical sense. By that I mean there is no reason why in Cantonese 0 should pattern with 1, 2 etc but that in English 0 should go with 2, 3, etc. It just does. Sure, historical linguists can sometimes shed light on how these have developed but not very often. On the whole, they just are.

This is where it gets entertaining (for linguists). First insight: there aren’t as many systems as there are languages, so far fewer than 6,000. In fact, looking at the places where such rules are collected, there are probably fewer than 100 different ways (on the planet) of counting stuff. Still a fun time though (for linguists). Let me give you a couple of examples. A lot of Slavonic languages (Ukrainian, Russian etc) require up to 3 different forms of a noun – spelled out as a code sketch after the list:

  • FORM 1: any number ending in 1 – but not 11 (1, 21, 31, 41…)
  • FORM 2: ends in 2, 3 or 4 – but not 12, 13 or 14 (2, 3, 4, 22, 23, 24, 32, 33, 34…)
  • FORM 3: anything else (0, 5, 6… 11, 12, 13, 14… 25, 26, 27…)
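
If you like that spelled out as code, here’s a minimal sketch (plain PHP, my own illustration rather than anyone’s production code, function name made up) of the standard rule as used for Russian:

function slavonicPluralForm($count) {
	// FORM 1: ends in 1, but not 11 (1, 21, 31…)
	if ($count % 10 == 1 && $count % 100 != 11) {
		return 1;
	}
	// FORM 2: ends in 2-4, but not 12-14 (2, 3, 4, 22, 23, 24…)
	if ($count % 10 >= 2 && $count % 10 <= 4 && ($count % 100 < 12 || $count % 100 > 14)) {
		return 2;
	}
	// FORM 3: anything else (0, 5-20, 25-30…)
	return 3;
}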

That almost makes sense in a way. But we can add a few more twists. Take the resurrected decimal system in Scottish Gaelic. It requires up to 4 forms of a noun:

  • FORM 1: 1 and 11 (1 chat, 11 chat)
  • FORM 2: 2 and 12 (2 chat, 12 chat)
  • FORM 3: 3-10, 13-19 (3 cait, 4 cait, 13 cait, 14 cait…)
  • FORM 4: anything else (20 cat, 21 cat, 100 cat…)

Hang on, you’re saying, surely FORM 1 and FORM 2 could be merged. ’fraid not, because while the word cat makes it look as if they’re the same, if you start counting something beginning with the letter d, n, t or s, the following happens:

  • FORM 1: 1 taigh, 11 taigh
  • FORM 2: 2 thaigh, 12 thaigh
  • FORM 3: 3 taighean, 4 taighean, 13 taighean, 14 taighean…
  • FORM 4: 21 taigh, 22 taigh, 100 taigh…

Told you, fun! Now here’s where it gets annoying. Initially, in the very early days of software, localization mostly meant taking software written in English and translating it into German, French, Spanish, Italian & Co and then a bit later on adding Chinese, Japanese and Korean to the list.

Through sheer fluke, that worked almost perfectly. English has a very common pattern, as it turns out (one form for 1 and another for anything else), so going from English to German posed no problems in translation. You simply took a pair of English strings like:

  • Open one file
  • Open %d files

and translated them into German:

  • Eine Datei öffnen
  • %d Dateien öffnen

Similarly, going to Chinese also posed no problem, you just ended up with a superfluous string because (I’ll use English words rather than Chinese characters):

  • Open one file
  • Open %d file

also created no linguistic or computational problems. Well, there was the fact that in French 0 patterns with 1, not with the plural as it does in English but I bet at that point English developers thought they were home and dry and ready to tick off the whole issue of numbers and number placeholders in software.

Now I have no evidence but I suspect a Slavonic language like Russian was one of the first to kick up a stink. Because as we saw, it has a much more elaborate pattern than English. Now there was one bit of good news for the developers: although these linguistic setups were elaborate in some cases, they also followed predictable patterns and you only need about 6 categories (which ended up being called ZERO, ONE, TWO, FEW, MANY and OTHER for the sake of readability – so Gaelic ended up with ONE, TWO, FEW and OTHER for example). Which meant you could write a rule for the language in question and then prep your software to present the translator – and ultimately the user – with the right number of strings for translation. Sure, they look a bit crazy, like this one for Gaelic:

Plural-Forms: nplurals=4; plural=(n==1 || n==11) ? 0 : (n==2 || n==12) ? 1 : (n > 2 && n < 20) ? 2 : 3;\n

but you only had to do it once and that was that. Simples… you’d think. Oh no. I mean, yes, certainly doable and indeed a lot of software correctly applies plural formatting these days. Most Open Source projects certainly do, programs like Linux or Firefox for example have it, which is the reason why you probably never noticed anything odd about it.
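
To give you an idea of what that looks like from the program’s side, here’s a minimal sketch using PHP’s gettext binding (ngettext() is the real gettext call; the surrounding bits are my own illustration and assume a catalogue with the Gaelic rule above is installed):

$n = 13;
// ngettext() takes the English singular/plural pair plus the count;
// the Plural-Forms rule in the catalogue decides which translated
// form comes back – for Gaelic and n=13, that’s the FEW form.
printf(ngettext('Open one file', 'Open %d files', $n), $n);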

One step down from this nice implementation of plurals are projects like Joomla! which will allow you to use plurals but won’t help you. Let me explain (briefly). Joomla! has one of the more atavistic approaches to localization – they expect translators to work directly in the .ini files Joomla! uses. Oh wow. So yes, that DOES enable you to do plurals, but first you have to figure out how to express the plural rule of your language in Joomla!’s terms and put that into one of the files. In our case, that turned out to be:

public static function getPluralSuffixes($count) {
	// Map a count to the suffix of the .ini key that should be used.
	if ($count == 0 || $count > 19) {
		$return = array('OTHER');
	}
	elseif ($count == 1 || $count == 11) {
		$return = array('1');
	}
	elseif ($count == 2 || $count == 12) {
		$return = array('2');
	}
	elseif (($count > 2 && $count < 11) || ($count > 12 && $count < 20)) {
		$return = array('FEW');
	}
	return $return;
}

Easy peasy. One then has to take the English, for example:

COM_CONTENT_N_ITEMS_CHECKED_IN_0="No cat"
COM_CONTENT_N_ITEMS_CHECKED_IN_1="%d cat"
COM_CONTENT_N_ITEMS_CHECKED_IN_MORE="%d cats"

and change it to this for Gaelic:

COM_CONTENT_N_ITEMS_CHECKED_IN_1="%d cat"
COM_CONTENT_N_ITEMS_CHECKED_IN_2="%d chat"
COM_CONTENT_N_ITEMS_CHECKED_IN_FEW="%d cait"
COM_CONTENT_N_ITEMS_CHECKED_IN_OTHER="%d cat"
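
On the code side, if I remember the API correctly, a component then asks for the string via Joomla!’s JText::plural(), which calls getPluralSuffixes() behind the scenes and tries the key with each returned suffix appended:

// Sketch from memory: with the Gaelic files above, this looks up
// COM_CONTENT_N_ITEMS_CHECKED_IN_FEW and prints "3 cait".
echo JText::plural('COM_CONTENT_N_ITEMS_CHECKED_IN', 3);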

Unsurprisingly, most localizers just can’t be bothered doing the plurals properly in Joomla!.

Ning is another project in this category – they also required almost as many contortions as Joomla! but their mud star is for having had plural formatting – and then having ditched it because allegedly the translators put in too many errors. Well duh… give a man a rusty saw and then complain he’s not sawing fast enough, or what?

And then there are those projects which stubbornly plod on without any form of plural formatting (except English style plurals of course). The selection of programs which are still without proper plurals IS surprising I must say. You might think you’d find a lot of very old Open Source projects here which go back so far that no-one wants to bother with fixing the code. Wrong. There are some fairly new programs and apps in this category where the developers chose to ignore plurals either through linguistic ignorance or arrogance. Skype (started in 2003) and Netvibes (2005) for example. Just for contrast, Firefox was born in 2002 and to my knowledge always accounted for plurals.

Similarly, some of them belong to big software houses which technically have the money and manpower to fix this – such as Microsoft. Yep, Microsoft. To this day, no Microsoft product I’m aware of can handle non-English type plurals properly in ANY other language. Russians must be oddly patient when it comes to languages cause I get really annoyed when my screen tells me I have closed 5 window…

A lot of software falls somewhere between the two extremes – I guess it’s just the way humans are, looking at the way we build our cities into and onto and over older bits of city except when it all falls down and we have to (or can?) start from scratch. But that makes it no less annoying when you’re trying to make software sound less like a robot in translation than it has to…

PS: I’d be curious to know which program first implemented plurals. I’m sort of guessing it’s Linux but I’m not old enough to remember. Let me know if you have any insights!

PPS: If you’re a developer and want to know more about plurals, I recommend the Unicode Consortium’s page on plurals as a starting point, you can take it from there.

When peer review goes pear shaped

29/01/2014 2 comments

Well I’m glad I asked. What happened was this…

I had a request from someone asking if I could localize TinyMCE (a WYSIWYG editor – think of it as a miniature form of Word sitting within a website) so they could use it on their website for their Gaelic-speaking editors. There aren’t that many strings and the project is handled on Transifex using po files, so the process seemed straightforward too. (If you don’t know what a po file is – the main thing about them is that many translation memory packages handle them, and if you have already done LibreOffice or something like that and stored those strings in the memory, there will be few strings in a project like TinyMCE for which there are no translation memory suggestions. In a nutshell, it allows an experienced software translator to work much faster.)
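
In case you’ve never looked inside one, a po file entry is as plain as this (a made-up example, but the format is real – comments give context, msgid is the English, msgstr takes your translation):

#: plugins/table/plugin.js
msgid "Paste row before"
msgstr "…your translation goes here…"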

So off I go. Pretty much a cakewalk; half a Bond film and 2 episodes of Big Bang later, the job was done. Now in many cases, once a language has been accepted for translation and you have translated all or at least most of the project, the translations will show up in the released program eventually. But just because I’m a suspicious old fart (by now), I messaged the admins and asked about the process of getting them released. Good thing too. Turns out they use an API to pull the translations from Transifex onto their system (they’ve basically automated that step, which I can understand). The catch however is that it only grabs translations set to Reviewed.
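
(For those who use the Transifex command line client, it’s essentially the difference between a plain tx pull and pulling in a reviewed-only mode – something like tx pull --mode reviewed, if memory serves – where anything not marked Reviewed simply never comes down.)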

Cue a groan from me. To cut the TinyMCE story short at this point, it seems this is down to Transifex (at least according to the TinyMCE admin) so they were quite happy for me to just breeze through the strings and set them to Reviewed myself. Fortunately it wasn’t a large job, so 15 minutes later (admittedly, I have about 14 other jobs on my desk just now which I would rather have done…), they were all set – thank goodness for keyboard shortcuts.

But back to the groan. I have come across this approach before and on the face of it, it makes sense. If you do community translation (i.e. you let a bunch of volunteers from the web translate into languages which you, as admins, don’t understand and don’t have time to QA) but you’d like to have at least some measure of QA over the translations, then by adding this step of peer review you can be more or less sure that you’re not getting ‘Jamie is a dork’ and ‘Muahahaha’ type translations.

The only problem is, peer review in online localization relies on a large number of volunteers. Only a small percentage of speakers have any inclination towards translating pro bono publico and even fewer feel like reviewing other people’s translations (there is something slightly obscene about proofreading; it’s like having someone else put words in your mouth, they almost always taste funny…). I once did some rough and ready stats on the percentage of people of a given language who will be engaged in not-for-profit localization (of mainstream projects like Firefox or LibreOffice). It’s about ONE active localizer for every 500,000 speakers. So German can call upon something like 200 really active localizers. Scottish Gaelic on the other hand statistically has … well, it has less than 60,000 speakers. You work it out. So it’s seriously blessed by having TWO of them.

In any case, even if you disbelieve my figures (I’d be the first to admit to not being great shakes at numbers), the percentages are really small. So if you set up a translation process that necessitates not only translation but also peer review, you’re essentially screwing small languages because the chances are there will never be a reviewer with enough time or energy (never mind ability) to review stuff. It’s one of the reasons why we haven’t touched WhatsApp yet, they simply won’t let a translation into live without review.

So if you design a process like that and want to make sure you’re not creating big problems for smaller languages (and we’re not just talking Gaelic-style tiny languages, even languages like Kazakh or Estonian have such problems) make sure you

  • allow enough wriggle-room to override such requirements, for example by allowing a localizer to demonstrate their credentials (for example through long-term participation in other projects) and
  • design a system where, if it’s absolutely necessary to set specific tags, admins can bulk-tag translations for a certain language.

Over and out.

Needle in a haystack

09/02/2013 5 comments

It’s been a strange sort of end to the week. I e-met a new language and came face to face with a linguistic, digital needle in a cyberhaystack. Ok, I’m not making much sense so far, I know… just setting the scene!

We all know Skype, the new version of which (quoting my hilarious brother) “convinces through less functionality and more bugs”. Back when Skype still belonged to itself, I eventually discovered that, at least on Windows, it’s pretty easy to localize. You go to Tools » Change Language » Edit Skype Language file and right down there, where everyone can see it, you have the option to save the English.lang file (which contains the English strings) under a new name and add your own translation. So back in 2011 I started working on a Gaidhlig.lang and by early 2012 had finally caught up with all the updates that kept getting in the way.
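
The .lang format itself is as basic as it gets – ini-style key=value pairs, one string per line, something roughly like this (the key name is made up for illustration):

sMENU_FILE_OPEN=Fosgail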

[Image: the Li Niha (Nias) interface]

What does one do when one has completed a translation? Sure, you submit it to the project and ask them to bundle it, release it, whatever. Not so fast, buckoes… Due to “size issues” (I’d like to remind everyone at this point that currently, a full language file weighs in at a massive 400KB), Skype only bundles the usual 20 or so suspects – CJK (that’s Chinese, Japanese and Korean) and a bunch of European languages – with the install file. Since they never thought of adding an Install new language function that could pull a file from some repository, the short of it was that even having localized the lot, you were on your own. Sure, you could post the file as an attachment on the forum but then who goes trawling through a forum in search of a language file?

Using the usual “Gaelic” channels, I think we’ve reached a reasonable number of people so far, but certainly fewer than we would have reached had it been inside the program itself.

But before I knock the old forum too much, I should point out that it actually had a dedicated localization section. Why do I mention this? Because, moving to the next episode where we finally meet Mr Big, when Skype was bought by Microsoft, the forums were wiped and *cough* improved. That’s right, the localization section went. Especially the parts where people were trying very hard to figure out how to turn a .lang file into something that Linux and MacOS could digest. Am I glad I took copies of the bits that were useful…

Anyway, even in the new forum, the localization questions never went away. But the stock answer of the one admin who bothers to check that corner is always that “there’s no news”. In fairness, I don’t think he actually has the power to do anything; he’s just the unfortunate person who has to interact with, shock and horror, the users. So even though Skype was first launched in 2003, here we are, a decade on, still asking the same questions – why can’t you bundle our language, why can’t we convert/localize the files for MacOS/Linux and how about frickin plural formatting?

Yep, “there’s no news”. The chap working on Welsh then had an interesting suggestion – can’t we host them on SourceForge? You see, the problem with distributing the files via the forum is that once your post moves off the first page, who’s going to see it? So, brilliant idea, I thought, and we went about setting up a project. Nothing fancy, just the .lang files which don’t come bundled with Skype and a few Wiki pages with guidance.

Seeing as I had a quiet day and since my contributions in terms of code are… amusing, I decided to hit the web to locate all the .lang files out there, or as many of them as possible anyway – I may suck at code but I rock at websearches! Half a day later, I had the most amazing collection of languages. Some I had known about – Gaelic, Welsh, Cornish, Irish and Uyghur – as their translators had been active on the forum. Some were part of the usual suspects but some were totally unexpected and one I’d never even heard of, which is, as a matter of fact, rather unusual. So in the end, we had:

  1. Adyghe
  2. Afrikaans
  3. Albanian
  4. Armenian
  5. Basque
  6. Breton
  7. Chuvash
  8. Cornish
  9. Erzya
  10. Esperanto
  11. Faroese
  12. Gaelic
  13. Irish
  14. Ligurian
  15. Macedonian
  16. Mirandese
  17. Nias
  18. Tajik
  19. Tamil
  20. Uyghur (Persian and Latin script)
  21. Welsh

Definitely wow. Admittedly, not all are complete but it’s still one of the most diverse lists I’ve ever come across, even if there are no languages from the Americas in the list. Especially Adyghe, Chuvash and Erzya are not languages you normally see on localization projects. And Nias I had never even heard about. Turns out it’s a language of some 700,000 speakers off the coast of Sumatra. That certainly cheered me up. Yeah I know, geek 🙂

But what made me shake my head all afternoon was something else – the lengths I had to go to in manipulating my websearches and the places I found some of them in. Gaelic I had; Welsh, Albanian and Cornish came off Skype’s forum. Basque (normally a rather well organized language) I found embedded as a .obj file in some archived forum post. Adyghe, Chuvash and Erzya came off some websites that looked a bit like forums where someone had posted the translations, in the case of Erzya without linebreaks – in two cases with the Russian strings still embedded, so I had to strip those out first before creating the .lang files. Armenian came out of a public DropBox and Breton off the Ofis ar Brezhoneg website. Afrikaans was on some unlinked page on someone’s personal website. Esperanto was on the Wiki of the Universala Esperanto Asocio but it took me some time to figure out that in order to get the strings, I had to trawl through the page history, as someone had at some point – accidentally or deliberately – deleted them. Mirandese and Nias were in some silent loop on abandoned university websites – probably student projects from long ago. And one came off a file sharing site, I forget which, making me seriously wonder if I was downloading porn, a virus or actually the .lang file. I even found Kurdish but the people who did that seem to have accidentally stripped out the string names, so having explained the problem, they’re trying to match them together again as my Kurdish isn’t that baş.

I didn’t quite know whether to congratulate myself or whether to cry. All that effort, all those wonderfully selfless people putting their time and effort into translating something into their language. And then, because the people making money off it couldn’t be bothered, we ended up with these needles in the cyberhaystack. Crying is still an option I feel…

It’s nice to know they’re on SourceForge now (check out SkypeInYourLanguage) and that there are a few people willing to put some time into making the process a bit better, but by gum guys… if people are actually willing to help you make more money by making your product available in more languages, how about giving them a leg up, rather than the finger?

Wishful thinking à la Bretonne

03/02/2013 8 comments

Have you noticed that sometimes developers DO get it right but then are faced with strange user behaviours? No, I’m not talking about developers believing something to be the case which isn’t. I’m talking about a strange chain of events on Facebook which makes me doubt the motivation of some language activists (yes, we’re allowed to self-criticize, guys!).

We all know about Facebook. What we don’t all know about Facebook is that they have a pretty bizarre approach to translations (we can hardly call it localization…) and I don’t mean the fact that they, for the most part, rely on community volunteers. No, it’s the process. There’s no clear process for adding or registering a new project and heaven knows how they actually pick the languages. At one point, Rumantsch was in (it now isn’t; no idea how it got in or why it’s now out, it’s a fairly small language with between 35,000 and 60,000 speakers), as were Northern Sami, Irish, Mongolian and the usual big boys, including some questionable choices like Leet Speak and Pirate. So most languages are out. Not surprisingly, this has led to a number of Facebook groups and campaigns by people trying to get their languages into the project. There used to be a project page full of posts along the lines of “please add my language” and “how do we get Facebook to add our language?” – universally met with thundering silence. Admins were rarer than Lord Howe Island stick insects.

Back in whenever, a chap called Neskie Manuel had a crafty idea about getting his language, Secwepemctsín, onto Facebook. Why not, he figured, find a way of overlaying Facebook with a “translation skin” in order to make the process of translation (and in this case even localization) independent of Facebook & Co? It was a neat idea, which was sadly interrupted by his untimely death.

Now, round about the same time, two things happened. The Bretons set up a “Facebook in Breton” campaign. Fair enough. And a chap called Kevin Scannell took on board Neskie’s Facebook idea. Excellent. Before too long, the Facebook group had over 12,000 members and Kevin had released his script for a slew of amazing languages. It overlays not all of Facebook but just the most visible strings (the ones we see daily, not the boring EULAs and junk). Even more amazingly, it can handle stuff Facebook hasn’t even woken up to yet, such as plurals, case marking and so on. Wow indeed.

The languages hailed from the four corners of the planet, from Aragonese, Manx and Nawat through Hiligaynon, Secwepemctsín, Samoan, K’iche’ and Māori to Kunwinjku and Gundjeihmi (two Australian languages). And, of course, Breton.

Now here’s the bizarre thing though. Ok, it’s not the full thing but who’d turn down a sandwich while waiting for a roast chicken that might never appear? No one, you’d think. So based on a combined market share of some 50% between Firefox and Chrome, some 200,000 speakers and 12,000 people in the “Facebook in Breton” group, you’d expect what, anything north of 6,000 enthusiastic users of the Breton script? After all, more than 1,100 people installed it in Scottish Gaelic (fewer than 60,000 speakers) and more than 500 people in Manx (way fewer than 2,000 fluent speakers).

A case of “you’d think” indeed. To date, a mind-boggling 450 people have installed it in Breton. As far as I can tell, the translation is good and was done by a single, highly fluent speaker (Fulup Jakez who works for Ofis ar Brezhoneg). So it’s not a quality issue. The scripts work (I use the Gaelic one) so it’s not that either. The Facebook group was notified several times, so it’s not like they didn’t know. Ok, so maybe not all Likes of the group actually are from speakers, fair enough, but glancing through the active posters, a lot of them seem to be in the right “linguistic area”.

So while the groupies are still foaming at the mouth about the lack of support from Zuckerberg and Co, there’s a perfectly good interim solution that would allow you to say Kenavo to French and Degemer mat to Breton on Facebook every day. I really don’t get it. Is it really the case that some activists are more in love with the idea of the thing than would actually use it if it was around? Or am I missing something really obvious? I sure hope I am…

On a more positive note, I hope the general idea of this type of “overlay” will eventually take off big time. We will never be able to convince the big boys to support all the languages on the planet, all of which are equally worthy of services in their own languages, whether they’re trying to re-grow lost speakers or whether they’re just a small to medium sized community. So having a tool that puts control over what we see on our screens into our hands would be great. No more running from company to company trying to make the case for adding language X, a little less duplication (I don’t know how many zillion times I’ve translated “Edit picture”), better quality and more focus on the important bits of an interface to translate (not the EULA for example… a document that sadly every software company is keen to have translated as soon as possible without ever asking who’ll read it). Ach well, I can hope…

The what keys?

02/06/2012 4 comments

I’ve just been through a head-scratching exercise and before you suggest anti-dandruff shampoos, it was about access keys. Yes… or was it shortcut keys? Or a hotkey? Or a quick key? Which sums up part of the problem – there’s too damn many of them. Now the basic idea is solid – access keys are keyboard combinations which allow you to instruct your computer to carry out frequently used tasks without having to click through the menu. So far, so good. For example, on Windows CTRL c has long been the command for copy and CTRL v for paste. Then there’s CTRL z for undo and… errr… yes, to be honest, that’s all I ever use and I use PCs a lot, more than I care to think.

I don’t know who invented the first access key but our friends the consumers-of-too-many-pizzas must have thought this was brilliant. If copy and paste access keys are good, surely there must be other useful ones… like for open, save, close, tools, help, save as, pluck a chicken, pick your nose… and soon the whole program was peppered with the damn things. And not only in that one program of course… wherever it started, it soon spread to the rest, and like the thing with electric plugs, everyone used a different name and a different key combination without ever giving a thought to the end user. Was it CTRL j, ALT j, ALTGR j or ALT CTRL SHIFT j? Or ALT OPTION or, hang on, that was my DoodleBug program on Windows, I’m now on a Mac in VLC. Should I use the Apple button or Fn?

I bet if you did some research, you’d find that a lot of people only ever use a minute fraction of the available access/shortcut/whatever keys. Anecdotal evidence would suggest that the smart everyday user knows less than half a dozen and that most know none at all. And certainly no one uses them to navigate 5 levels down to the Proxy Settings of their browser. Yes, there have been attempts to streamline them but those again ignored the basic questions of “do we need them?” and “how many do we need?”. In any case, the attempts have been as successful as moves to standardise electric plugs or convince China that the rule of law is a good thing.

So what does this have to do with localization and my headscratching? Well, unfortunately no one bothered to automate the process either. Which means that when you localize software, you have to add them manually. Now if the localization process in general was smarter, then that might sort of work, but remember that when localizing software, from Microsoft to LibreOffice, what you essentially get is a table with one language in one column and your translation in another. Certainly no visual context. And usually no info which tells you anything about the scope (as in, which of these appear next to each other). So you’re faced with something like this:

&Fry
Deep fr&y
Stir-&fry
Stea&m
&Boil
Sa&utee
&Oven-roast

And it’s left to you to figure it out. In the above, your guess is as good as mine whether those all appear in the same menu or in two different ones (perhaps the first 3 in a Fry menu and the rest in an Other menu). Oh, and did I mention that they don’t even agree on the symbol? In some programs it’s &Fry (which gives the end user the line under the F in Fry), in others you have to use ~Fry and… oh, you get the idea.

So to a half-baked idea, we’ve added a haphazard localization process. Great. Oh, did I mention the guidelines? The ones which say you shouldn’t put lines under letters with descenders (something dropping down, like gjpqy)? Which is usually fine in English with its 26 letters. But Gaelic only has 18 guys, 16 if I have to cut out letters with descenders and even fewer if I’m instructed not to use thin letters (like ilf). Do I look like the Brahan Seer? I won’t even start on the difficulties that arise in locating said string when you see a wrong access key on screen in testing.

I did take the time to make sure that the most visible ones don’t overlap in programs like LibreOffice and Firefox. But several layers down, to be honest, I can’t be bothered. So I had to remind myself that the nice person who filed a bug on my behalf, with a list of them for some several-layers-down-menu-about-frigging-proxies, is not responsible for the general mess that is access keys, and that I should not bite their head off. In the end, I did post a condensed version of my reasoning – the fact that they’re mostly pointless and, as a result, that I don’t have the time and manpower to fix something which the translator shouldn’t have to fix in the first place.

Honestly, don’t people ever take a step back and think?

Dear grumpy Native Speaker

31/05/2012 4 comments

Localization is obviously just a means to an end – the end being the end-user. You know, normal people. So since they’re also part of this process and so that you know I dish out fairly in both directions, not just at developers, here’s an instalment which looks at the native-speaking end-user. Because I had a fairly nasty gripe in my inbox. No names, but I think we all recognize the type.

First off, I have the utmost respect for native speakers of small languages who have managed to keep their language alive in the face of adversity. Secondly, I do not for one moment believe that any amount of learning can fully replace native speaker intuition, though I will uphold the argument that in terms of formal grammar and spelling, learners often have a better take on things. Simply due to the differences in process – one learnt at the knee (no flashcards involved), the other using an intimidating array of books (often with too little “knee” involved). Thus both groups have strengths and weaknesses which can and ought to complement each other. It certainly should not be a dogfight.

A peculiar paradox arises out of this situation though, one which many of you will recognize. When it comes to breaking into new territory for language X, it’s usually learners who do that. I’m sure you could write entire PhDs on the topic but on the whole, I think it’s fair to say that learners simply don’t put up with the argument that “language X has never been used for technology Y before”. They’ve always used, say, a browser and therefore they want it in their chosen language X. Again the two groups behave differently. On the whole, the native speaker assumes it doesn’t exist and that it can’t be done. The learner will go and look and, if there isn’t one, will do something about it. As in, they sign up to a project like Mozilla Firefox and put in hours and hours of their own time to translate it.

Here’s the paradox. In the translation industry you’re usually only hired to translate into your native language because only native speakers are attuned to the nuances of their language. You usually also have to demonstrate competence in grammar and spelling. But in the world of small languages, such people are rare. Very rare. Literacy is usually lower amongst native speakers than learners because the mainstream education system doesn’t cater for the language. But very rarely do you find a learner who can’t read and write the language. So we get a situation where the people with the best linguistic skills are the least likely people to be found on a project like Firefox or LibreOffice.

Before you get visions of linguistic horror – the outcome is usually not that bad. Once in a while you come across real junk but on the whole, translations of software into small languages usually range from ok to good. Some are very good. While learners can go a bit neologism-happy now and then, what native speakers tend to forget is that when any language breaks into a new domain, it will sound a bit weird. Think about a really technical manual in your native language – does that roll off your tongue, does it ensure immediate comprehension by a non-specialist? But we’ll leave that debate for another day.

And before we get too carried away blaming the education system, there obviously are native speakers of small languages with high levels of literacy, especially in Europe. But for some reason, they often don’t get involved. I have my views on why that is but I don’t want this to become a rant. Let’s just say that they don’t, for the most part.

Now, my time is as limited as that of a native speaker. I enjoy the sunshine and going for walks too. My point is, before you send off a rather nasty message to someone next time, complaining that “no native speaker would have ever translated X like that”, albeit in rather lovely, native-sounding, well-spelled and grammar-checked language, ask yourself this question: have you volunteered your time to the project in question to ensure the outcome is as good as can be? Cause if you haven’t, then I really don’t want to hear from you.