
Archive for the ‘Crowdsourcing’ Category

When peer review goes pear shaped

29/01/2014

Well I’m glad I asked. What happened was this…

I had a request from someone asking if I could localize TinyMCE (a WYSIWYG editor – think of it as a miniature form of Word sitting within a website) so they could use it on their website for their Gaelic-speaking editors. There aren’t that many strings, and the project is handled on Transifex using PO files, so the process seemed straightforward too. (If you don’t know what a PO file is, the main thing about them is that many translation memory packages handle them. If you have already done LibreOffice or something like that and stored those strings in your memory, there will be few strings in a project like TinyMCE without a translation memory suggestion. In a nutshell, it allows an experienced software translator to work much faster.)
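For readers who have never used one, here is a toy Python illustration of what a translation memory lookup does. It is not how any particular TM package works. Real tools have their own matching and scoring, while this stand-in uses the standard library's `difflib`. The memory entries (with placeholder translations) and the 0.75 similarity cutoff are made-up assumptions.

```python
import difflib

# Hypothetical translation memory: English source strings from earlier
# projects mapped to their stored translations (placeholders here).
MEMORY = {
    "Edit picture": "<Gaelic for 'Edit picture'>",
    "Insert picture": "<Gaelic for 'Insert picture'>",
    "Save": "<Gaelic for 'Save'>",
}


def suggest(source: str, memory: dict[str, str], cutoff: float = 0.75):
    """Return up to three (stored_source, stored_translation, similarity)
    tuples for memory entries similar to `source`, best match first."""
    matches = difflib.get_close_matches(source, list(memory), n=3, cutoff=cutoff)
    return [
        (match, memory[match], difflib.SequenceMatcher(None, source, match).ratio())
        for match in matches
    ]


# A new string that differs only slightly from one already translated
for stored, translation, score in suggest("Edit pictures", MEMORY):
    print(f"{score:.2f}  {stored!r} -> {translation}")
# prints: 0.96  'Edit picture' -> <Gaelic for 'Edit picture'>
```

Only the near match above the cutoff comes back as a suggestion. The translator still decides whether to accept it, which is where the speed-up comes from.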

So off I go. Pretty much a cake-walk: half a Bond film and 2 episodes of Big Bang later, the job was done. Now in many cases, once a language has been accepted for translation and you have translated all or at least most of the project, these translations will eventually show up in the released program. But just because I’m a suspicious old fart (by now), I messaged the admins and asked about the process of getting them released. Good thing too. Turns out they use an API to pull the translations from Transifex onto their system (they’ve basically automated that step, which I can understand). The catch, however, is that it only grabs translations set to Reviewed.

Cue a groan from me. To cut the TinyMCE story short at this point, it seems this is down to Transifex (at least according to the TinyMCE admin), so they were quite happy for me to just breeze through them and set them to Reviewed myself. Fortunately it wasn’t a large job, so 15 minutes later (admittedly, I have about 14 other jobs on my desk just now which I would rather have done…), they were all set, thank goodness for keyboard shortcuts.

But back to the groan. I have come across this approach before and, on the face of it, it makes sense. If you do community translation (i.e. you let a bunch of volunteers from the web translate into languages you as admins don’t understand and don’t have time to QA) but you’d like to have at least some measure of QA over the translations, then by adding this step of peer review you can be more or less sure that you’re not getting ‘Jamie is a dork’ and ‘Muahahaha’ type translations.

The only problem is, peer review in online localization relies on a large number of volunteers. Only a small percentage of speakers have any inclination towards translating pro bono publico, and even fewer feel like reviewing other people’s translations (there is something slightly obscene about proofreading, it’s like having someone else put words in your mouth, they almost always taste funny…). I once did some rough and ready stats on the percentage of speakers of a given language who will be engaged in not-for-profit localization (of mainstream projects like Firefox or LibreOffice). It’s about ONE active localizer for every 500,000 speakers. So German can call upon something like 20 really active localizers. Scottish Gaelic, on the other hand, statistically has … well, it has fewer than 60,000 speakers. You work it out. So it’s seriously blessed by having TWO of them.

In any case, even if you disbelieve my figures (I’d be the first to admit to not being great shakes at numbers), the percentages are really small. So if you set up a translation process that necessitates not only translation but also peer review, you’re essentially screwing small languages, because chances are there will never be a reviewer with enough time or energy (never mind ability) to review stuff. It’s one of the reasons why we haven’t touched WhatsApp yet: they simply won’t let a translation go live without review.
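Here is the "you work it out" arithmetic spelled out as a small Python sketch. It uses the post's own rule of thumb of one active localizer per 500,000 speakers, which is a rough estimate rather than a measured constant. The helper names are illustrative. `min_speakers_for_review` assumes the reviewer must be a different person from the translator.

```python
# The post's rough rule of thumb: about one really active volunteer
# localizer for every 500,000 speakers (an estimate, not a measurement).
SPEAKERS_PER_LOCALIZER = 500_000


def expected_localizers(speakers: int) -> float:
    """Expected number of active localizers for a language of this size."""
    return speakers / SPEAKERS_PER_LOCALIZER


def min_speakers_for_review(people_needed: int = 2) -> int:
    """Speakers a language needs before it can statistically expect enough
    volunteers for a workflow with a translator plus a separate reviewer."""
    return people_needed * SPEAKERS_PER_LOCALIZER


gaelic = expected_localizers(60_000)  # upper bound on Gaelic speakers
print(f"Scottish Gaelic: {gaelic:.2f} expected localizers")  # 0.12
print(f"Two actual localizers is about {2 / gaelic:.0f}x that")  # 17x
print(f"Speakers needed for translate + review: {min_speakers_for_review():,}")  # 1,000,000
```

On those assumptions, a translate-plus-review process only becomes statistically viable at around a million speakers, and even then only just.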

So if you design a process like that and want to make sure you’re not creating big problems for smaller languages (and we’re not just talking Gaelic-style tiny languages, even languages like Kazakh or Estonian have such problems) make sure you

  • allow enough wriggle room to override such requirements, for example by allowing a localizer to demonstrate their credentials (say, through long-term participation in other projects), and
  • design a system where, if it’s absolutely necessary to set specific tags, admins can bulk-tag translations for a certain language.
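Here is a minimal Python sketch of those two recommendations, assuming a simple in-memory model of translations. `Translation`, `releasable` and `bulk_mark_reviewed` are hypothetical names and are not part of Transifex or any other platform's API. The "trusted translators" set stands in for whatever credential check a project might actually use.

```python
from dataclasses import dataclass


@dataclass
class Translation:
    """One translated string (hypothetical model, not any platform's schema)."""
    string_id: str
    language: str
    text: str
    translator: str
    reviewed: bool = False


def releasable(translations, language, trusted_translators):
    """Translations for `language` that may ship: peer-reviewed ones, plus
    any by a translator whose credentials an admin has accepted."""
    return [
        t for t in translations
        if t.language == language
        and (t.reviewed or t.translator in trusted_translators)
    ]


def bulk_mark_reviewed(translations, language):
    """Admin action: mark every translation for one language as reviewed.
    Returns the number of entries that changed."""
    changed = 0
    for t in translations:
        if t.language == language and not t.reviewed:
            t.reviewed = True
            changed += 1
    return changed


strings = [
    Translation("save", "gd", "<Gaelic for 'Save'>", "volunteer_a"),
    Translation("cancel", "gd", "<Gaelic for 'Cancel'>", "volunteer_a"),
]
print(len(releasable(strings, "gd", set())))            # 0: nothing reviewed yet
print(len(releasable(strings, "gd", {"volunteer_a"})))  # 2: trusted translator
print(bulk_mark_reviewed(strings, "gd"))                # 2: entries tagged by admin
print(len(releasable(strings, "gd", set())))            # 2: all now reviewed
```

Either route gets a small language's strings out of review limbo without a second volunteer having to exist.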

Over and out.

Needle in a haystack

09/02/2013

It’s been a strange sort of end to the week. I e-met a new language and came face to face with a linguistic, digital needle in a cyberhaystack. Ok, I’m not making much sense so far, I know… just setting the scene!

We all know Skype, the new version of which (quoting my hilarious brother) “convinces through less functionality and more bugs”. Back when Skype still belonged to itself, I discovered that, at least on Windows, it’s pretty easy to localize. You go to Tools » Change Language » Edit Skype Language file and right down there where everyone can see it, you have the option to save the English.lang file (which contains the English strings) under a new name and add your own translation. So back in 2011 I started working on a Gaidhlig.lang and by early 2012 had finally caught up with all the updates that kept getting in the way.

[Image: The Li Niha (Nias) interface]

What does one do when one has completed a translation? Sure, you submit it to the project and ask them to bundle it, release it, whatever. Not so fast, buckoes… Due to “size issues” (I’d like to remind everyone at this point that a full language file currently weighs in at a massive 400KB), Skype only bundles the usual 20 or so suspects, CJK (that’s Chinese, Japanese and Korean) and a bunch of European languages with the install file. Since they never thought of adding an “Install new language” function that could pull a file from some repository, the short of it was that even having localized the lot, you were on your own. Sure, you could post the file as an attachment on the forum, but then who goes trawling through a forum in search of a language file?

Using the usual “Gaelic” channels, I think we’ve reached a reasonable number of people so far, but certainly fewer than we would have reached had it been “inside the program itself”.

But before I knock the old forum too much, I should point out that it actually had a dedicated localization section. Why do I mention this? Because, moving to the next episode where we finally meet Mr Big, when Skype was bought by Microsoft, the forums were wiped and *cough* improved. That’s right, the localization section went. Especially the parts where people were trying very hard to figure out how to turn a .lang file into something that Linux and MacOS could digest. Am I glad I took copies of the bits that were useful…

Anyway, even in the new forum, the localization questions never went away. But the stock answer of the one admin who bothers to check that corner is always that “there’s no news”. In fairness, I don’t think he actually has the power to do anything, he’s just the unfortunate person who has to interact with, shock and horror, the users. So even though Skype was first launched in 2003, here we are in 2012 still asking the same questions – why can’t you bundle our language, why can’t we convert/localize the files for MacOS/Linux and how about frickin plural formatting?

Yep, “there’s no news”. The chap working on Welsh then had an interesting suggestion – can’t we host them on SourceForge? You see, the problem with distributing the files via the forum is that once your post moves off the first page, who’s going to see it? So, brilliant idea I thought and we went about setting up a project. Nothing fancy, just the .lang files which don’t come bundled with Skype and a few Wiki pages with guidance.

Seeing as I had a quiet day, and since my contributions in terms of code are… amusing, I decided to hit the web to locate all the .lang files out there, or as many of them as possible anyway – I may suck at code but I rock at websearches! Half a day later, I had the most amazing collection of languages. Some I had known about – Gaelic, Welsh, Cornish, Irish and Uyghur – as their translators had been active on the forum. Some were part of the usual suspects, but some were totally unexpected, and one I’d never even heard of, which is, as a matter of fact, rather unusual. So in the end, we had:

  1. Adyghe
  2. Afrikaans
  3. Albanian
  4. Armenian
  5. Basque
  6. Breton
  7. Chuvash
  8. Cornish
  9. Erzya
  10. Esperanto
  11. Faroese
  12. Gaelic
  13. Irish
  14. Ligurian
  15. Macedonian
  16. Mirandese
  17. Nias
  18. Tajik
  19. Tamil
  20. Uyghur (Persian and Latin scripts)
  21. Welsh

Definitely wow. Admittedly, not all are complete but it’s still one of the most diverse lists I’ve ever come across, even if there are no languages from the Americas in the list. Especially Adyghe, Chuvash and Erzya are not languages you normally see on localization projects. And Nias I had never even heard about. Turns out it’s a language of some 700,000 speakers off the coast of Sumatra. That certainly cheered me up. Yeah I know, geek 🙂

But what made me shake my head all afternoon was something else – the lengths I had to go to in manipulating my websearches and the places I found some of them. Gaelic I had; Welsh, Albanian and Cornish came off Skype’s forum. Basque (normally a rather well organized language) I found embedded as a .obj file on some archived forum post. Adyghe, Chuvash and Erzya came off some websites that looked a bit like a forum where someone had posted the translations (in the case of Erzya without line breaks), in two cases with the Russian strings still embedded, so I had to strip those out first before creating the .lang files. Armenian came out of a public Dropbox and Breton off the Ofis ar Brezhoneg website. Afrikaans was on some unlinked page on someone’s personal website. Esperanto was on the Wiki of the Universala Esperanto Asocio, but it took me some time to figure out that in order to get the strings, I had to trawl through the page history, as someone had at some point – accidentally or deliberately – deleted them. Mirandese and Nias were in some silent loop on abandoned university websites – probably student projects from long ago. And one came off a file sharing site, I forget which, making me seriously wonder if I was downloading porn, a virus or actually the .lang file. I even found Kurdish, but the people who did that seem to have accidentally stripped out the string names, so I explained the problem and they’re trying to match them up again, as my Kurdish isn’t that baş (good).

I didn’t quite know whether to congratulate myself or whether to cry. All that effort, all those wonderfully selfless people putting their time and effort into translating something into their language. And then, because the people making money off it couldn’t be bothered, we ended up with these needles in the cyberhaystack. Crying is still an option I feel…

It’s nice to know they’re on SourceForge now (check out SkypeInYourLanguage) and that there are a few people willing to put some time into making the process a bit better, but by gum guys… if people are actually willing to help you make more money by making your product available in more languages, how about giving them a leg up, rather than the finger?

Wishful thinking à la Bretonne

03/02/2013

Have you noticed that sometimes developers DO get it right but are then faced with strange user behaviour? No, I’m not talking about developers assuming something is the case when it isn’t. I’m talking about a strange chain of events on Facebook which makes me doubt the motivation of some language activists (yes, we’re allowed to self-criticize, guys!).

We all know about Facebook. What we don’t all know about Facebook is that they have a pretty bizarre approach to translations (we can hardly call it localization…), and I don’t mean the fact that they, for the most part, rely on community volunteers. No, it’s the process. There’s no clear process for adding or registering a new project, and heaven knows how they actually pick the languages. At one point, Rumantsch was in (it now isn’t; no idea how it got in or why it’s now out, it’s a fairly small language with between 35,000 and 60,000 speakers), as are Northern Sami, Irish, Mongol and the usual big boys, including some questionable choices like Leet Speak and Pirate. So most languages are out. Not surprisingly, this has led to a number of Facebook groups and campaigns by people trying to get their languages into the project. There used to be a project page full of posts along the lines of “please add my language” and “how do we get Facebook to add our language?”, universally met with thundering silence. Admins were rarer than Lord Howe Island stick insects.

Back in whenever, a chap called Neskie Manuel had a crafty idea about getting his language, Secwepemctsín, onto Facebook. Why not, he figured, find a way of overlaying Facebook with a “translation skin” to make the process of translation (and in this case even localization) independent of Facebook & Co? It was a neat idea, which was interrupted by his sad and untimely death.

Now, round about the same time, two things happened. The Bretons set up a “Facebook in Breton” campaign. Fair enough. And a chap called Kevin Scannell took on board Neskie’s Facebook idea. Excellent. Before too long, the Facebook group had over 12,000 members and Kevin had released his script for a slew of amazing languages. It overlays not all of Facebook but just the most visible strings (the ones we see daily, not the boring EULAs and junk). Even more amazingly, it can handle stuff Facebook hasn’t even woken up to yet, such as plurals, case marking and so on. Wow indeed.

The languages hailed from the four corners of the planet, from Aragonese, Manx and Nawat through Hiligaynon, Secwepemctsín, Samoan, K’iche’ and Māori to Kunwinjku and Gundjeihmi (two Australian languages). Wow indeed. And, of course, Breton.

Now here’s the bizarre thing though. Ok, it’s not the full thing, but who’d turn down a sandwich while waiting for a roast chicken that might never appear? No one, you’d think. So based on a combined market share of some 50% between Firefox and Chrome, some 200,000 speakers and 12,000 people in the “Facebook in Breton” group, you’d expect what, anything north of 6,000 enthusiastic users of the Breton script? After all, more than 1,100 people installed it in Scottish Gaelic (fewer than 60,000 speakers) and more than 500 people in Manx (far fewer than 2,000 fluent speakers).
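Here is the expectation in that paragraph, worked through in Python with the post's own figures. The numbers are rough. The browser share is the post's estimate, and the Breton speaker count is approximate. The Gaelic and Manx install counts are lower bounds while their speaker counts are upper bounds, so their uptake rates come out conservative.

```python
# Figures as quoted in the post. Install counts for Gaelic and Manx are
# lower bounds ("more than") and their speaker counts upper bounds.
BROWSER_SHARE = 0.50    # the post's estimate for Firefox + Chrome combined
GROUP_MEMBERS = 12_000  # members of the "Facebook in Breton" group

# language: (installs of the overlay script, speakers)
SCRIPT_STATS = {
    "Scottish Gaelic": (1_100, 60_000),
    "Manx": (500, 2_000),
    "Breton": (450, 200_000),
}


def uptake(installs: int, speakers: int) -> float:
    """Fraction of speakers who installed the script."""
    return installs / speakers


expected_breton = GROUP_MEMBERS * BROWSER_SHARE
actual_breton = SCRIPT_STATS["Breton"][0]
print(f"Expected at least {expected_breton:,.0f}, got {actual_breton}")
# prints: Expected at least 6,000, got 450
for language, (installs, speakers) in SCRIPT_STATS.items():
    print(f"{language}: {uptake(installs, speakers):.2%} of speakers")
```

Even on these conservative numbers, the share of speakers using the script is far lower for Breton than for Gaelic or Manx, which is exactly the puzzle.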

A case of “you’d think” indeed. To date, a mind-boggling 450 people have installed it in Breton. As far as I can tell, the translation is good and was done by a single, highly fluent speaker (Fulup Jakez who works for Ofis ar Brezhoneg). So it’s not a quality issue. The scripts work (I use the Gaelic one) so it’s not that either. The Facebook group was notified several times, so it’s not like they didn’t know. Ok, so maybe not all Likes of the group actually are from speakers, fair enough, but glancing through the active posters, a lot of them seem to be in the right “linguistic area”.

So while the groupies are still foaming at the mouth about the lack of support from Zuckerberg and Co, there’s a perfectly good interim solution that would allow you to say Kenavo to French and Degemer mat to Breton on Facebook every day. I really don’t get it. Is it really the case that some activists are more in love with the idea of the thing than would actually use it if it was around? Or am I missing something really obvious? I sure hope I am…

On a more positive note, I hope the general idea of this type of “overlay” will eventually take off big time. We will never be able to convince the big boys to support all the languages on the planet, all of which are equally worthy of services in their own languages, whether they’re trying to re-grow lost speakers or whether they’re just a small to medium sized community. So having a tool that puts control over what we see on our screens into our hands would be great. No more running from company to company trying to make the case for adding language X, a little less duplication (I don’t know how many zillion times I’ve translated “Edit picture”), better quality and more focus on the important bits of an interface to translate (not the EULA for example… a document that sadly every software company is keen to have translated as soon as possible without ever asking who’ll read it). Ach well, I can hope…

Dear grumpy Native Speaker

31/05/2012

Localization is obviously just a means to an end – the end being the end-user. You know, normal people. So, since they’re also part of this process, and so that you know I dish it out fairly in both directions and not just to developers, here’s an instalment which looks at the native-speaking end-user. Because I had a fairly nasty gripe in my inbox. No names, but I think we all recognize the type.

First off, I have the utmost respect for native speakers of small languages who have managed to keep their language alive in the face of adversity. Secondly, I do not for one moment believe that any amount of learning can fully replace native speaker intuition, though I will uphold the argument that in terms of formal grammar and spelling, learners often have a better take on things. This is simply due to the differences in process – one learnt at the knee (no flashcards involved), the other using an intimidating array of books (often with too little “knee” involved). Thus both groups have strengths and weaknesses which can and ought to complement each other. It certainly should not be a dogfight.

A peculiar paradox arises out of this situation though, which many of you will recognize. When it comes to breaking into new territory for language X, it’s usually learners who do that. I’m sure you could write entire PhDs on the topic but on the whole, I think it’s fair to say that learners simply don’t put up with the argument that “language X has never been used for technology Y before”. They’ve always used, say, a browser and therefore they want it in their chosen language X. Again the two groups behave differently. On the whole, the native speaker assumes it doesn’t exist and that it can’t be done. The learner will go and look and, if there isn’t one, will do something about it. As in, they sign up to a project like Mozilla Firefox and put in hours and hours of their own time to translate it.

Here’s the paradox. In the translation industry you’re usually only hired to translate into your native language because only native speakers are attuned to the nuances of their language. You usually also have to demonstrate competence in grammar and spelling. But in the world of small languages, such people are rare. Very rare. Literacy is usually lower amongst native speakers than learners because the mainstream education system doesn’t cater for the language. But very rarely do you find a learner who can’t read and write the language. So we get a situation where the people with the best linguistic skills are the least likely people to be found on a project like Firefox or LibreOffice.

Before you get visions of linguistic horror – the outcome is usually not that bad. Once in a while you come across real junk but on the whole, translations of software into small languages usually range from ok to good. Some are very good. While learners can go a bit neologism-happy now and then, what native speakers tend to forget is that when any language breaks into a new domain, it will sound a bit weird. Think about a really technical manual in your native language – does that roll off your tongue, does it ensure immediate comprehension by a non-specialist? But we’ll leave that debate for another day.

And before we get too carried away blaming the education system, there obviously are native speakers of small languages with high levels of literacy, especially in Europe. But for some reason, they often don’t get involved. I have my views on why that is but I don’t want this to become a rant. Let’s just say that they don’t, for the most part.

Now, my time is as limited as that of a native speaker. I enjoy the sunshine and going for walks too. My point is, before you next send a rather nasty message to someone complaining that “no native speaker would have ever translated X like that”, albeit in rather lovely, native-sounding, well-spelled and grammar-checked language, ask yourself this question: Have you volunteered your time to the project in question to ensure the outcome is as good as can be? ’Cause if you haven’t, then I really don’t want to hear from you.