How to stonewall Open Source

07/03/2015 3 comments

I seem to be posting a lot about Google these days but then they ARE turning into the digital equivalent of Nestlé.

I’ve been pondering this post for a while and how to approach it without making it sound like I believe in area 52. So I’ll just say what happened and let you come to your own conclusions mostly.

Back when Google still ran the Google in Your Language project, I tried hard to get into Gmail and what was rumoured to be a browser but failed, though they were keen to push the now canned Picasa. <eyeroll> Then of course they canned the whole Google in Your Language thing. When I eventually found out that Google Chrome is technically nothing other than a rebranded version of an Open Source browser called Chromium, I thought ‘great, should be able to get a foot in the door that way’. Think again. So I looked around and was already confused because there did not appear to be a clear distinction between Chromium and Chrome. The two main candidates for where to ask were Launchpad and Google Code. So in January 2011 I decided to file an issue on Google Code, thinking that even if it was the wrong place, they should be able to point me in the right direction. The answer came pretty quickly. Even though the project is called Chromium, they (quote) don’t accept third party translations for chrome. And nobody seemed to know where the translations come from or how you become an official translator. There was a vague suggestion that I should maybe try Ubuntu.

I gave it some time. Lots of time in fact. I picked up the thread again early in 2013. Now the semi-serious suggestion was to fork Chromium and do my translation on the fork. Very funny. Needless to say, I was getting rather disgusted at the whole affair and decided to give up on Chrome/Chromium.

When I noticed that an Irish translator on Launchpad had asked a similar question about Chromium, and that the answer was that, as far as they knew, translations get pushed upstream to Chromium from Launchpad, I decided I might as well have a go. As someone had suggested, at least I’ll get Chromium on Linux.

Fast forward to October 2014 and I’m almost done with the translation on Launchpad so I figure I better file a bug early because it will likely take forever. Bug filed, enthusiastic response from some admin on Launchpad. Great, I think to myself, should be plain sailing from here on. Spoke too soon. End of January 2015, the translation long completed, I query the silence and only get more silence. More worryingly, someone points me at a post on Ubuntu about Chromium on Launchpad being, well, dead.

Having asked the question in a Chromium IRC chat room, I decided to have another go on Google Code, new bug, new luck maybe? Someone in the room did sound supportive. That was January 28, 2015. To date, nothing has happened apart from someone ‘assigning the bug to l10n PM for triage’.

I’m coming to the conclusion that Chromium has only the thinnest veneer of being open. Perhaps in the sense that I can get a hold of the source code and play around with it. But there is a distinct lack of openness and approachability about the whole thing. Perhaps that was the intention all along, to use the Open Source community to improve the source code but to give back as little as possible and to build as many layers of secrecy and to put as many obstacles in people’s path as possible. At least when it comes to localization.

At least Ubuntu is no longer pushing Chromium as the default browser. But that still leaves me with a whole pile of translation work which is not being used. Maybe I should check out some other Chromium-based browsers like Comodo Dragon or Yandex. Perhaps I’m being paranoid but I’m not keen on software coming from Russia being on my systems or recommending it to other people. Either way, I’m left with the same problem that we have with Firefox in a sense – it would mean having to wean people off pre-installed versions of Google Chrome or Internet Explorer.

Anyone got any good ideas? Cause I’m fresh out of…

The spectre of Google Translate for Gaelic

15/01/2015 3 comments

Not the kind of pre-Christmas cheer I was hoping for, seriously. Slap bang on the 23rd, someone draws my attention to an article called Google urged to go Gaelic. In a nutshell, a left-field (most likely well-intentioned) appeal by an MSP from Central Scotland to add Scottish Gaelic to the list of languages on Google Translate. As the mere thought was nauseating, I made some time and wrote a very long letter to Murdo Fraser, the man in question, with copies going to David Boag at Bòrd na Gàidhlig and Alasdair Allan, minister for languages. As it sums up my arguments quite succinctly (I hoped), I’ll just copy it here:

Just before Christmas, a friend drew my attention to an article in the Courier regarding Google Translate in which Mr Murdo Fraser argues for a campaign to get Scottish Gaelic onto Google Translate.

I’m sure that this is a well-intentioned idea but in my professional opinion, it would have terrible consequences. As one of the few people who work entirely in the field of Gaelic IT, I have a keen interest in technology and the potential benefit – and damage – this offers to languages like Gaelic. As it happens, I also was the Gaelic localizer (i.e. translator) for Google when it was still running the Google In Your Language programme and I have watched (often with dismay) what Google has done in this area since. One of the projects that certainly caught my eye was Google Translate, especially when Irish was added as a language in 2009. But having spoken to Irish people working in this field and having watched the effects of it on the Irish language, I rapidly came to the conclusion that while it looks ‘cool’, being on a machine translation system for a small(er) language was not necessarily a benefit and in some cases, a tragedy.

Without going into too much technical detail, machine translation of the kind that Google does works best with the following ingredients:
– a massive (billions of words) aligned bilingual corpus
– translation between structurally similar languages or
– translation from a grammatically complex language into a less grammatically complex language but not the other way round
– translation of short, non-colloquial phrases and sentences but not complex, colloquial or literary structures

In essence, machine translation trains an algorithm in ‘patterns’, which is why massive amounts of data are needed and why it works better from a complex language into a less complex language. For example, it is relatively easy to teach the system that German der/die/das require ‘the’ in English, but it requires a massive amount of data for the system to become clever enough to understand when ‘the’ becomes ‘der’ but not ‘die’.

Unfortunately for Irish, none of these conditions were met – and would also not be met for Scottish Gaelic. To begin with, even if we digitized all the works ever produced which exist in English and Gaelic, the corpus would still be tiny by comparison to the German/English corpus for example.

Then there is the issue of linguistic distance, Irish/Gaelic and English are structurally very different, with Gaelic/Irish having a lot more in the way of complex grammatical structures than English. To compensate for this, the corpus would have to be truly massive. Which is why the existing Irish/English system is extremely poor by anyone’s standards.

One might argue that the aim is not a perfect translation system but a means of accessing information only available in other languages – which is the case for many of the languages which are on Google Translate. But I’m doubtful if the reverse is true. To begin with, no fluent Gaelic speaker requires a Gaelic > English translation system and there is precious little which is published in Gaelic in digital form which does not also exist in English. All this would do is remove yet another reason for learning Gaelic.

That would leave English > Gaelic and herein lies the tragedy of the English/Irish pairing on Google Translate. Whatever the intentions of the developers, people will mis-use such a system. I have put together a few annotated photos which illustrate the scale of the disaster in Ireland here. From school reports to official government websites, there are few places where students, individuals or officials trying to cut corners have not used Irish translations of Google Translate in ways they were not intended to be used.

If there HAD been a Gaelic/English pair, Police Scotland would have been an even bigger target of ridicule because such an automated translation would have produced gibberish at worst and absurd semi-Gaelic at best.

I think we can all agree that the last thing Gaelic needs is masses of poor quality translations floating around the internet. Funding is extremely short these days and this would, in my view, be a poor use of these scarce funds. There are more pressing battles to be fought in the field of Gaelic and IT, such as the refusal by the 3rd party suppliers of IT services to Gaelic schools and units to provide (existing) Gaelic software or even a keyboard setting in any school that allows students to easily input accented characters, be that for Gaelic, Spanish or French.

is mise le meas mòr,

Turns out I wasn’t the only one horrified by the mere thought – John Storey also wrote a very long and polite letter.

Early in January and within days of each other, both John and I received almost identical responses which, in a nutshell, said ‘Thanks but I’ll keep trying anyway’. Even less encouragingly, it made some really irrelevant reference to the lack of teachers in Gaelic Medium Education. Which is true of course but well, not relevant?

Thank you for contacting me in relation to Scots Gaelic and Google Translate and for your detailed correspondence.

I appreciate the depth of your letter and note your concerns in relation to issues of accuracy and the potential impact to speakers of Gaelic of Google translate. I will be sure to consider these when next speaking on the subject.

I also agree that there are other battles to be fought in the field of Gaelic and IT and appreciate the current issues surrounding the number of teachers in Gaelic Medium Education. However, I do believe it is worth promoting the case for a more accessible Gaelic presence online and without this I believe that Gaelic could miss out on the massive opportunities afforded by the digital age.

I’m still waiting for a response from Bòrd na Gàidhlig or Alasdair Allan. But I’m not encouraged. Really frustrated actually because (at least as the Press & Journal and the Perthshire Conservatives would have it), it seems like Bòrd na Gàidhlig and Alasdair Allan are throwing their weight behind this ill-fated caper.

I really hope Google turns them down because I really don’t want to end up where the Irish IT specialists ended up – the merry world of “Told you so”…

But sadly “Got Gaelic onto Google” probably just sounds sexier on your CV than “Banged some desks and made sure all kids in Gaelic Medium Education can now easily type àèìòù”…

Is Google getting a bit muahahaha?

08/04/2012 6 comments

Aye, Google… I’m rather disappointed at them these days I must say. It was a really exciting project in the beginning when I joined their Google in Your Language project. Gosh, I thought, they actually promise “Google believes that fast and accurate searching has universal value. That’s why we are eager to offer our service in all the languages scattered upon the face of the earth.” – how unusually enlightened for a software company, sign me up. Which I did, along with hundreds of other volunteers, putting in hundreds of hours of our time … well, you all know how it works. They did give us a t-shirt at one point, mind. In hindsight, the fact they did that rather than giving each one of us, say, a dozen shiny Google shares should have set off some warning bells but hindsight is a great thing.

Initially, all languages were pretty much on a par but soon, inequality started creeping in. While other languages (the big ones) were getting jazzed up search interfaces, smaller languages like Gaelic weren’t. And I also began to realise that Google did not enable all translation projects (they come as separate sub-projects) for all languages, by default or on request. Things like Gmail or GoogleDocs. Ah, requests… that kinda implies communication, doesn’t it? We did have the google.public.translators group but as you might guess, admins were thin on the ground. Many questions and issues were left unanswered so while other projects got fancy with plural formatting and translation memories and suchlike, Google stuck to the if-it’s-not-in-English-we-don’t-want-to-know approach. Initially, I decided that, the company being a startup, this was down to limited resources and that change would come. Change did come to the coffers of Google but not to the localization teams.

More and more English kept creeping in, to the extent that I began to wonder how many people were still using the localized interfaces when they offer perhaps 10% of the overall functionality of Google. Yet, I kept telling myself it would get better. Hm.

I got briefly excited over Google Chrome… very briefly mind. I foolishly assumed that something as important as this would automatically be made available to all teams. Nope. I emailed those precious few people at Google whose emails I had. No answer. Not to that particular question, but perhaps asking two questions in the same email is too demanding. So I started hitting the web in search of answers. I did get some, but everyone gave me a different one… some said that localizing Chromium would result in a localized Google Chrome, others contradicted that. No one over on the Linux side really seemed to know, answers again ranging from yes through maybe to no way. I’m still waiting for a definitive answer. A project to “move the web forward” indeed.

I’ve even written a very nice if somewhat disappointed letter to Google. That was back in January. Meanwhile, google.public.translators keeps coming and going on and offline and the newest post is from 2008. I posted earlier this year, asking where everyone was. Mysteriously, the post has disappeared. I deduce that admins are watching, but not communicating.

All in all, I’m feeling very bitter I must say. More so than over the OpenOffice thing. I still keep the User Interface for Gaelic up to 100% but in all honesty, if someone comes up with a good open source search engine, I’ll decamp. Google has been successful not only due to its fancy algorithms but also due to the many volunteers who made the interface available in their languages. If Google had only ever catered for the English-speaking world, then I doubt they’d be as successful today. It feels like ingratitude of the worst kind.

Was I foolish to put faith in something that was so clearly aiming for a commercial stranglehold on the web? Perhaps. Perhaps they’ll come good still, though I’m not holding my breath. In the meantime, I shall steer people towards OperaMail if they want online mail in Gaelic and put my hopes in the LibreOffice Cloud project.