Though I was contemplating “How to waste lifetime” as a title to be honest. If you don’t want to read through the Odyssey part, fair enough, the quick guide on what to do is at the very bottom.
No, I’m not about to repeat my rant on Detect locale, tempting as that may be. This is about the trek to actually get a locale onto the list of locales on offer on mobile operating systems like iOS or Android. In the case of Scottish Gaelic, we need to go back all the way to July 2010. I had just been roped into localizing Firefox and we had noticed that the plural rules for Gaelic were either missing or wrong. So in the process of fixing those, it was recommended to me that I also submit them to the Common Locale Data Repository (CLDR) – basically a big holding tank run by the Unicode Consortium for things like plural rules, names of the days of the week, month names, whether the month goes before the day and so on, for different locales. Seemed reasonable, so off I went and filed a ticket. It took a while but by September 2011, that was in. Yay.
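For the curious, CLDR holds plural rules in a compact XML format. A simplified sketch of what the Gaelic entry boils down to (I’ve used CLDR’s rule syntax but left out the sample annotations):

```xml
<pluralRules locales="gd">
  <pluralRule count="one">n = 1,11</pluralRule>
  <pluralRule count="two">n = 2,12</pluralRule>
  <pluralRule count="few">n = 3..10,13..19</pluralRule>
  <pluralRule count="other"></pluralRule>
</pluralRules>
```

Everything else – 0 and anything from 20 upwards – falls into “other”. Four categories where English makes do with two, which is exactly the sort of thing that goes wrong when nobody submits the data.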
In the meantime, because I had gotten involved with LibreOffice (well, technically speaking OpenOffice first), I also ended up submitting a minimal dataset for Gaelic to CLDR, because creating one was a prerequisite for getting a release of LibreOffice and it was again recommended that I submit to CLDR so the data would be generally available. Fair enough. It took a while to figure out because back then, the handy Survey Tool (basically a graphical interface) didn’t exist – you had to edit an XML file. Yuck. Started in May 2011 and by October, that was done and dusted.
Here’s where I got naive. I thought the “filtering through” of locale data was automatic. I did actually ask a few people and they all thought it was automatic too – though nobody was entirely sure. For most of even the smaller locales such as Welsh and Irish, somebody must have done “it” far enough back for nobody to know where it came from. So from October 2011 on, I start watching the list of locales on my Android. Periodically, I’d pop into a mobile phone shop to check the latest models, in case my phone and OS were just too old.
In the meantime, I kept chipping away at the XML file, adding things like language and country names until I had the file relatively complete. Hoping that perhaps there was a completion threshold – even though nobody seemed to know. I started pinging questions at Android, a.k.a. Google, figuring they were easier to communicate with than Apple. Hah! It’s like standing at one end of the Munich Beer Festival and playing a game of Chinese whispers with someone at the far end. No answer, lots of silence or vague suggestions of “try there”. Spent hours trying to google the answer. Frustratingly, even though I can almost always tease the web into giving me the info I want, not this time. It was as if nobody had ever actually done whatever it was that needed doing to get a new locale to pop up on a mobile OS.
I was getting increasingly frustrated/annoyed/angry because increasingly, apps were using Detect locale to determine the language of one’s UI. Up until Android 4.2, you could use an app to “fake” a locale, i.e. I could set it to gd-GB and apps such as Opera Mini would come up in Gaelic. But towards the end of 2012, Google blocked that option. Don’t ask me why… The upshot was that even those apps which had been localized were now hidden away because Gaelic did not exist as an official locale. Which set me off on the quest to get manual locale selection into FOSS apps but that’s a different story.
To add insult to injury, while I could understand to some extent why Irish would just be “there” as a locale, I couldn’t for the life of me understand why Manx of all languages was there, but not Gaelic. I mean, bully for Manx but what gives?
Fast forward to May 2014. CLDR is implementing its shiny new Survey Tool and a colleague and I set about filling in the last remaining gaps in the locale data file. Still no Gaelic on Android or iOS, even though the data set was now complete. It wasn’t until August 2014, out of a discussion surrounding the Survey Tool, that someone finally pinned down the problem. Even though we’d had a good enough data set since 2011, this was held “just” in CLDR. It turns out that Android, a.k.a. Google, actually pulls its locales and locale data from something called ICU, the International Components for Unicode. So I filed a bug on CLDR which someone kindly moved over to ICU. While not great communicators, at least someone imported the data set from CLDR and it was finally included in the ICU 54 release in October 2014. It had taken more than 4 years to discover what was needed. And then it took less than 4 months to get it into the necessary data bucket. 😒
And even crazier, within weeks of the ticket being closed on ICU, a Gaelic-speaking Apple tester excitedly mailed me to tell me that on his test version of iOS 8, Scottish Gaelic was there as a locale. There were a few other minor bumps in the road but from iOS 8 on, Gaelic was there as a locale and apparently, it made its debut on Android Marshmallow in October 2015. All that remains now is for people to upgrade to iOS 8 (fairly straightforward) and Android Marshmallow (not so straightforward, we’ll probably have to wait for people to physically upgrade their devices).
So here it is for all those who want their locale on Android, iOS & Co:
- Bring some spare time. Assuming a single contributor, it will probably take up to a year to get it to appear on the latest devices if you have perfect timing. More likely, 2 years.
- Submit a locale data set to CLDR. You will need a Survey Tool account – and bear in mind there is, on the whole, only ONE submission cycle a year. If you missed the current one, use the time to check out existing data sets, because you will have to answer fairly techy questions around date and time formatting, plurals, sort orders and goodness knows what else. Pick a locale similar to your own, or at least one for a language you speak, and see what that looks like.
Also check what “coverage level target” your locale has (ask someone at CLDR via a ticket). Some locales have a low target; Gaelic happened to be in “comprehensive” for some reason. It’s probably not worth arguing about which one you’re in – just knuckle down.
- File a ticket on ICU to get the data ported over.
- Wait and finally, enjoy.
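To give you an idea of what such a data set looks like under the hood, here is a heavily trimmed sketch of a minimal LDML file of the kind the Survey Tool generates for you these days (element names as per CLDR’s LDML format; the handful of values shown are just examples):

```xml
<ldml>
  <identity>
    <language type="gd"/>
    <territory type="GB"/>
  </identity>
  <localeDisplayNames>
    <languages>
      <language type="gd">Gàidhlig</language>
      <language type="en">Beurla</language>
    </languages>
  </localeDisplayNames>
  <dates>
    <calendars>
      <calendar type="gregorian">
        <days>
          <dayContext type="format">
            <dayWidth type="wide">
              <day type="mon">DiLuain</day>
            </dayWidth>
          </dayContext>
        </days>
      </calendar>
    </calendars>
  </dates>
</ldml>
```

The real file runs to hundreds of entries – every month name, day name, date pattern, currency and so on – which is why “relatively complete” took me years of chipping away.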
To begin with, I do not hold all the facts and I do like (or do I have to use the present-past-potential-future tense already?) the product. But there have been so many what-the-fuck moments that it sadly is time for another Dear Developer epistle.
The topic? Mozilla OS. Which judging by today’s post to the localization list by George Roter is now officially floating belly up and face down in digital muck. Oh sure, there are exciting opportunities with the Internet of Things (which has a lengthy Wikipedia article that truly fails to inspire) and Connected Devices (I have yet to meet a Mozillian who can actually tell me what that practically means for end-users).
I guess it at least has an element of closure because back in December, well, we were all completely in the dark, apart from a steady stream of well-meant fluffwords.
So what happened? Well, looking at it from the bottom-up view of a localizer, Mozilla has proven once again that it has a genuinely amazing and skilful pool of workers but management that makes a revolutionary student committee look efficient. So at some point the idea of Mozilla OS was born – all the good things about Mozilla but as an OS. Ok, sounds fair, and I was right in there from the start with localizations. Two reasons, no, make that three, one of which was selfish, the other practical and the third altruistic:
- We wanted mobile devices in our language (that was the selfish bit)
- Participating early means you reach the maintenance level of translation early, which is a lot easier when there are fewer words to begin with (the practical reason). Plus less of a chance localization turns into an afterthought. Or so I thought…
- We wanted to help create a better product that would reach more people (the altruistic reason but more on that later)
Regarding 1 and 2, I kind of started worrying early because it became clear that Mozilla was partly selling its soul to manufacturers. We could localize but there was to be no guarantee, as it turned out eventually, that commercial manufacturers would ship all locales with a high completion. Why? Apparently Mozilla had either forgotten to negotiate harder on that point or forgotten to design an easy way of pulling in an unshipped locale once your device had been set up. Ho-hum, but given our experience with the better-late-than-never solution to manual locale selection on Mozilla Mobile, I had reasonable confidence there would be a solution. Eventually. So I stuck with the project. Paid for a testing device. Managed to get a tablet for testing too. Helped with sometimes left-field solutions, like when I helped someone crack the problem of how to sort contact lists with mixed scripts without resorting to automatic Unicode conversion (like how to handle a contact like রবীন্দ্রনাথ ঠাকুর on a phone next to Jack Sparrow – easy, ask the user to provide a manual phonetic spelling during contact creation), filed bugs, was a bit of a squeaky wheel… yeah ok, I submitted no patches but I can’t code for toffee, believe it or not.
I guess alarm bells should have started going off when Flatfish (the tablet branch) went quiet. As in, suddenly there were no more nightly builds and bugs were beginning to pile up, some pretty central (like the fact that no build ever shipped all locales – no, it was crazier than that, the locales were there but the translations weren’t getting pulled from Pootle). Eventually the word was passed round in a very unofficial way that Flatfish was no longer a project Mozilla was pursuing. Like that wasn’t worth an announcement? Even a short blog post by someone high up? Gee, thanks…
At the very least it was highly odd that a mobile OS aiming to compete with existing mobile OS would ignore the tablet side but maybe, I said to myself, we’re prioritising resources until it works well on phone and then we’ll get onto tablets again.
Then in early December we had the news fiasco. Short version is, somehow word got out that Mozilla was canning Mozilla OS but nobody had prepared anything official, not even a blog post, never mind press releases. Just some fluff about the Internet of Things. There’s a pretty good write-up here if you want the whole nine yards. Then all through December and most of January everyone, including Mozillians (at least the workers at the “bottom”) had no idea about what was going on. Great.
In a sense, we still don’t (unless someone can finally explain Connected Devices and the IoT to me in simple, short sentences explaining how that relates to end-users…). Except that we are to cease all work on localizing Mozilla OS for now. Who knows if this will still be the position in a month but for now, there are not going to be any phones shipping the OS. Why? Reading between the lines, the uptake wasn’t great. Really? Like it was ever going to be easy to get a share of the iOS/Android/Windows Phone market? If the decision makers expected an easy ride, they were naive. If they expected a tough ride, why are we bottling out now?
Which, incidentally, they could have made easier by considering one thing they mostly seem to have ignored – while the existing 3 hog most of the market, they are very restricted in their approach to localized interfaces. There are up to 40 million speakers of lesser-used languages in the EU alone and while certainly not all will shift by any stretch of the imagination, for a considerable number of those Mozilla OS would have been one of the few realistic means of getting a device in THEIR language. Neither Android nor iOS caters for Breton or Occitan. Small fry, you might think. Not so. It’s a bit hard to count but there are at least some 350 million people on the planet speaking languages which are not amongst the big boys Android & Co cater for. If that isn’t a market then I don’t know what is.
Will it come back? I don’t know. Would be good… even better if they teamed up with Ubuntu on this one. For now, I’m focussing on Ubuntu Mobile, which is also localizable AND ships all locales with a high completion percentage, and on the AOSP-based CyanogenMod, which the Asturians have recently proven to be a way onto at least some devices running a version of Android. Gaelic SHALL go to the ball… it would have been nice if it had been with Mozilla OS too.
But seriously, Mozilla is not too big to fail and if it continues to behave like an ocean liner which is steered in a fashion reminiscent of a revolutionary student committee, there will be a hard rock somewhere along the line for it. Which would be a great disservice to all the inspiring and hard working folks at Mozilla, not to mention the volunteers and the world at large. So please, revolutionary leaders up there, put down the hooch, put the origami helmets in the memento drawer and sharpen up your leadership, planning and above all, communication.
The warm memories of childhood brought to you by the Terminator T-888 Cyberdyne Systems Class TOK715
Yes, I like technology. But increasingly these days, I wish we could have a global debate on where we’re going with this and how much of it we want. Not as Gaels or Basques or Chinese or Brazilians but as a species.
Let me backtrack a little. We’ve just released the first ever Gaelic text-to-speech voice, having worked together with the great people at Cereproc in Edinburgh over the last year. This is a good thing. It may seem to contradict my intro but the way I see it, it is an enabling tool. If nothing else, it is an assistive tool for people who are blind or dyslexic – and who speak Gaelic. We often forget that being a speaker of a minority language does not prevent you from being struck by the same issues as everyone else. Or rather, speakers of majority languages tend to forget this. It is not meant to replace real humans and it won’t, as it cannot think for itself. It won’t run off to the kitchen and make dinner or suddenly turn round and say to the user “What’s with all this Somhairle stuff, I want to read some sci-fi, ok?”
Sure, learners will use the voice too and since it is a pretty good voice, it should enhance their learning experience, especially for those with little or no access to native speakers. So I don’t see an issue there (though we did all bust several collective guts making sure the quality is as good as possible).
But a couple of days later, a colleague drew my attention to a line in a summary of a talk to be given at the Centre for Speech and Technology Research. It talks of speech production and refers to “…multimodal interactive games, involving many characters, dialogue partners…”. Cue a slightly dystopian moment. I possibly misread the line slightly; perhaps what it means is AI dialogue partners in games, like “talking” to Deckard Cain in Diablo, which is really just a fixed script reeled off following certain actions in the game.
But whatever the intended meaning, it did make me think about the wider implications of talking technology, and speech technology in particular, pushing further and further ahead with little debate about where we’re going with all this. I did have this quick mental flash of a Gaelic-speaking Terminator baking cookies with a human child. Don’t be absurd, you might say, but I don’t think it’s entirely far-fetched. There will come a point when our use of technology in language learning will turn into something more distasteful than a toy bleating out words. There will come a point when interaction with speech produced by an intelligent machine will start to infringe on the way our children learn language and probably even on adult interactions.
It may be that we decide, as a species, that a Gaelic/Basque/Aymara/Rapanui… speaking robot is just the thing to re-invigorate our languages. But to my mind, it poses a bigger question: wouldn’t this make the whole thing pointless – not just the issue of language but increasingly us as a species? Our affection for things, pretty sunsets, memories of baking biscuits with our grandmother and the particular sound waves our mothers made at us, is a very human thing. I cannot see a machine developing an appreciation for a field of dandelions other than in a utilitarian sense. Or perhaps we might decide that an intelligent interaction with a machine is preferable to the passive consumption we have at the moment, like those families I observe on the train using tablets as pacifiers. Playing I Spy with the tablet? Would I be looking at the outside of the train or a picture of the outside projected onto the screen?
We still seem to be operating, as a species, on the basis that it’s ok to see what happens when I bang these two rocks together cause hey, it’s Zoug’s own time and effort and what harm can it do. But we’re reaching a point in our technological development where the harm we can do by just seeing what happens when you bang something together is becoming considerable.
It makes me wish we talked more about what we actually want before we go out and do it. But sadly, I cannot really see it happening much, not with people going “well, if I don’t, someone else will”. Maybe I’m just having a gloomy day (no, the sun IS shining in Glasgow today) but I get the feeling we might finally be getting close to an answer to the Fermi paradox, albeit a somewhat unpalatable one. Fingers crossed, eyes closed and hope for the best?
I’m glad to see others in the field have similar apprehensions about MT in small languages
This is an abbreviated transcript of a talk I gave at a British-Irish Council conference on language technology in indigenous, minority and lesser-used languages in Dublin earlier this month (November 2015) under the title ‘Do minority languages need the same language technology as majority languages?’ I wanted to bust the myth that machine translation is necessary for the revival of minority languages. What I had to say didn’t go down well with some in the audience, especially people who work in machine translation (unsurprisingly). So beware, there is controversy ahead!
I seem to be posting a lot about Google these days but then they ARE turning into the digital equivalent of Nestlé.
I’ve been pondering this post for a while and how to approach it without making it sound like I believe in area 52. So I’ll just say what happened and let you come to your own conclusions mostly.
Back when Google still ran the Google in Your Language project, I tried hard to get into Gmail and what was rumoured to be a browser but failed, though they were keen to push the now canned Picasa. <eyeroll> Then of course they canned the whole Google in Your Language thing. When I eventually found out that Google Chrome is technically nothing else than a rebranded version of an Open Source browser called Chromium, I thought ‘great, should be able to get a leg in the door that way’. Think again. So I looked around and was promptly confused because there did not appear to be a clear distinction between Chromium and Chrome. The two main candidates for filing a request were Launchpad and Google Code. So in January 2011 I decide to file an issue on Google Code, thinking that even if it’s the wrong place, they should be able to point me in the right direction. The answer came pretty quickly. Even though the project is called Chromium, they (quote) don’t accept third party translations for chrome. And nobody seems to know where the translations come from or how you become an official translator. A vague reference that I maybe should try Ubuntu.
I gave it some time. Lots of time in fact. I picked up the thread again early in 2013. Now the semi-serious suggestion was to fork Chromium and do my translation on the fork. Very funny. Needless to say, I was getting rather disgusted at the whole affair and decided to give up on Chrome/Chromium.
When I noticed that an Irish translator on Launchpad had asked a similar question about Chromium and saw the answer was that they, as far as they know, push the translations upstream to Chromium from Launchpad, I decided I might as well have a go. As someone had suggested, at least I’d get Chromium on Linux.
Fast forward to October 2014 and I’m almost done with the translation on Launchpad so I figure I’d better file a bug early because it will likely take forever. Bug filed, enthusiastic response from some admin on Launchpad. Great, I think to myself, should be plain sailing from here on. Spoke too soon. End of January 2015, the translation long completed, I query into the silence and get only more silence. More worryingly, someone points me at a post on Ubuntu about Chromium on Launchpad being, well, dead.
Having asked the question in a Chromium IRC chat room, I decided to have another go on Google Code, new bug, new luck maybe? Someone in the room did sound supportive. That was January 28, 2015. To date, nothing has happened apart from someone ‘assigning the bug to l10n PM for triage’.
I’m coming to the conclusion that Chromium has only the thinnest veneer of being open. Perhaps in the sense that I can get a hold of the source code and play around with it. But there is a distinct lack of openness and approachability about the whole thing. Perhaps that was the intention all along, to use the Open Source community to improve the source code but to give back as little as possible and to build as many layers of secrecy and to put as many obstacles in people’s path as possible. At least when it comes to localization.
At least Ubuntu is no longer pushing Chromium as the default browser. But that still leaves me with a whole pile of translation work which is not being used. Maybe I should check out some other Chromium-based browsers like Comodo Dragon or Yandex. Perhaps I’m being paranoid but I’m not keen on software coming from Russia being on my systems or recommending it to other people. Either way, I’m left with the same problem that we have with Firefox in a sense – it would mean having to wean people off pre-installed versions of Google Chrome or Internet Explorer.
Anyone got any good ideas? Cause I’m fresh out of…
Not the kind of pre-Christmas cheer I was hoping for, seriously. Slap bang on the 23rd, someone draws my attention to an article called Google urged to go Gaelic. In a nutshell, a left-field (most likely well-intentioned) appeal by an MSP from Central Scotland to add Scottish Gaelic to Google Translate’s list of languages. As the mere thought was nauseating, I made some time and wrote a very long letter to Murdo Fraser, the man in question, with copies going to David Boag at Bòrd na Gàidhlig and Alasdair Allan, minister for languages. As it sums up my arguments quite succinctly (I hoped), I’ll just copy it here:
Just before Christmas, a friend drew my attention to an article in the Courier regarding Google Translate in which Mr Murdo Fraser argues for a campaign to get Scottish Gaelic onto Google Translate.
I’m sure that this is a well-intentioned idea but in my professional opinion, it would have terrible consequences. As one of the few people who work entirely in the field of Gaelic IT, I have a keen interest in technology and the potential benefit – and damage – this offers to languages like Gaelic. As it happens, I also was the Gaelic localizer (i.e. translator) for Google when it was still running the Google In Your Language programme and I have watched (often with dismay) what Google has done in this area since. One of the projects that certainly caught my eye was Google Translate, especially when Irish was added as a language in 2009. But having spoken to Irish people working in this field and having watched the effects of it on the Irish language, I rapidly came to the conclusion that while it looks ‘cool’, being on a machine translation system for a small(er) language was not necessarily a benefit and in some cases, a tragedy.
Without going into too much technical detail, machine translation of the kind that Google does works best with the following ingredients:
– a massive (billions of words) aligned bilingual corpus
– translation between structurally similar languages, or
– translation from a grammatically complex language into a less grammatically complex language but not the other way round
– translation of short, non-colloquial phrases and sentences but not complex, colloquial or literary structures
In essence, machine translation trains an algorithm on ‘patterns’, which is why massive amounts of data are needed and why it works better from a complex language into a less complex language. For example, it is relatively easy to teach the system that German der/die/das all require ‘the’ in English, but it requires a massive amount of data for the system to become clever enough to understand when ‘the’ becomes ‘der’ but not ‘die’.
Unfortunately for Irish, none of these conditions were met – and would also not be met for Scottish Gaelic. To begin with, even if we digitized all the works ever produced which exist in English and Gaelic, the corpus would still be tiny by comparison to the German/English corpus for example.
Then there is the issue of linguistic distance, Irish/Gaelic and English are structurally very different, with Gaelic/Irish having a lot more in the way of complex grammatical structures than English. To compensate for this, the corpus would have to be truly massive. Which is why the existing Irish/English system is extremely poor by anyone’s standards.
One might argue that the aim is not a perfect translation system but a means of accessing information only available in other languages – which is the case for many of the languages which are on Google Translate. But I’m doubtful the reverse is true. To begin with, no fluent Gaelic speaker requires a Gaelic > English translation system and there is precious little published in Gaelic in digital form which does not also exist in English. All this would do is remove yet another reason for learning Gaelic.
That would leave English > Gaelic and herein lies the tragedy of the English/Irish pairing on Google Translate. Whatever the intentions of the developers, people will mis-use such a system. I have put together a few annotated photos which illustrate the scale of the disaster in Ireland here. From school reports to official government websites, there are few places where students, individuals or officials trying to cut corners have not used Irish translations from Google Translate in ways they were not intended to be used.
If there HAD been a Gaelic/English pair, Police Scotland would have been an even bigger target of ridicule because such an automated translation would have produced gibberish at worst and absurd semi-Gaelic at best.
I think we can all agree that the last thing Gaelic needs is masses of poor quality translations floating around the internet. Funding is extremely short these days and this would, in my view, be a poor use of these scarce funds. There are more pressing battles to be fought in the field of Gaelic and IT, such as the refusal by the 3rd party suppliers of IT services to Gaelic schools and units to provide (existing) Gaelic software or even a keyboard setting in any school that allows students to easily input accented characters, be that for Gaelic, Spanish or French.
is mise le meas mòr,
Turns out I wasn’t the only one horrified by the mere thought – John Storey also wrote a very long and polite letter.
Early in January and within days of each other, both John and I received almost identical responses which, in a nutshell, said ‘Thanks but I’ll keep trying anyway’. Even less encouragingly, it made a really irrelevant reference to the lack of teachers in Gaelic Medium Education. Which is true of course but, well, not relevant?
Thank you for contacting me in relation to Scots Gaelic and Google Translate and for your detailed correspondence.
I appreciate the depth of your letter and note your concerns in relation to issues of accuracy and the potential impact to speakers of Gaelic of Google translate. I will be sure to consider these when next speaking on the subject.
I also agree that there are other battles to be fought in the field of Gaelic and IT and appreciate the current issues surrounding the number of teachers in Gaelic Medium Education. However, I do believe it is worth promoting the case for a more accessible Gaelic presence online and without this I believe that Gaelic could miss out on the massive opportunities afforded by the digital age.
I’m still waiting for a response from Bòrd na Gàidhlig or Alasdair Allan. But I’m not encouraged. Really frustrated actually because (at least as the Press & Journal and the Perthshire Conservatives would have it), it seems like Bòrd na Gàidhlig and Alasdair Allan are throwing their weight behind this ill-fated caper.
I really hope Google turns them down because I really don’t want to end up where the Irish IT specialists ended up – the merry world of “Told you so”…
But sadly “Got Gaelic onto Google” probably just sounds sexier on your CV than “Banged some desks and made sure all kids in Gaelic Medium Education can now easily type àèìòù”…
Good afternoon, boys and girls, very bad language, for example, what we see in the side and at the airport these days? No, I haven’t gone insane, I’m just illustrating a point by resorting to reductio ad absurdum. In other words, I punched the sentence Hey folks, anyone up for some really truly bad language like the stuff we’re seeing at BÁC airport these days? into Bad Translator and let it go through 10 machine translations.
Why? Glad you asked… these days, Google is making headlines both in the Irish traditional press and in social media. But for all the wrong reasons. The reason? Google Translate. Or rather, a language pair someone should have thought about a little more. Or at least done some user testing on it. Something…
So what I imagine happened is this… some bright spark, either on the Google side or some well-meaning Irish government official thought it would be great if we could have Irish on Google translate. First mistake. Give humans a tool, and they will mis-use it. Like our ex-joiner hammering in screws. So before you give people a tool, think about likely scenarios of mis-use. It clearly does not require a team of MENSA members to imagine that in a minoritised language like Irish, people might start using it for things like their homework or cheap translations rather than a quick way of getting the gist behind web content.
But having blissfully ignored this step, someone must have forged ahead and contributed a bilingual corpus to Google developers with a note along the lines of here’s a corpus for Irish, please add it to Google Translate. Most likely, second mistake. Right, so there are many ways of building machine translation systems but most rely on a mix of rules and a bilingual corpus. The idea being that as long as you feed a computer enough aligned data in two languages, it can use statistics to figure out how to translate between the two. This idea in itself is sound. Sort of. It depends on the languages in question, the amount of data involved and, oddly enough, the direction of the translation. Here’s an ideal scenario: build a system using a VAST amount of data (we’re talking billions of words) to translate between closely related languages and into the language which has the less fancy grammatical system. Like German to English. That works quite well as a pair on Google Translate because a) there are indeed vast amounts of text which exist in both languages and b) German has the fancier grammar (3 genders, case marking, inflection of verbs…) whereas English does buggerall (some past tense markers on verbs and a plural -s aside, which is peanuts in linguistic terms).
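The counting idea behind statistical MT can be sketched in a few lines of Python. This is a toy illustration with a made-up four-fragment “corpus” – nothing like Google’s actual system – but it shows exactly why the data requirements explode:

```python
from collections import Counter, defaultdict

# A tiny, entirely made-up word-aligned "corpus" of English/German pairs.
# Real systems train on billions of words; the point here is only to show
# why: with this little data, "the" is hopelessly ambiguous.
corpus = [
    ("the man", "der Mann"),
    ("the woman", "die Frau"),
    ("the child", "das Kind"),
    ("the man", "der Mann"),
]

# Count which German word aligns with each English word.
counts = defaultdict(Counter)
for en, de in corpus:
    for e, d in zip(en.split(), de.split()):
        counts[e][d] += 1

# Without context, the system can only guess: der, die and das all occur.
print(counts["the"].most_common())
# → [('der', 2), ('die', 1), ('das', 1)]

# Conditioning on the following noun resolves the choice – but now you
# need enough data to see every noun with its article, for every noun
# in the language. That is where the billions of words go.
context = defaultdict(Counter)
for en, de in corpus:
    ew, dw = en.split(), de.split()
    context[(ew[0], ew[1])][dw[0]] += 1

print(context[("the", "woman")].most_common(1))
# → [('die', 1)]
```

Going the other way – English into the language with the richer grammar – means the system has to make these context-dependent choices constantly, which is precisely the English > Irish direction.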
But once you move away from the ideal model, things start creaking. The more complex the structures of the target language, the more data you’d need for the computer to make any sense of it. So going English to Icelandic creaks much more because even though they’re related languages (ultimately), Icelandic is even more complex than German. Oh and there’s less bilingual data of course.
You get the idea. Now Irish is eye-candy to a linguist. It has grammatical structures to die for: a case system, two genders, two types of mutation (that’s when the first sound in a word changes… you might know people called Hamish? Well, that’s what Irish does to a man called Séamus when you address him), a headache-inducing system for inflecting verbs, a different word order (English is subject-verb-object, Irish is verb-subject-object) and so on. A thousand things English doesn’t do. So what would we need to make this work? Yup, take a gold star: a corpus billions of words big.
Unfortunately there’s no bilingual corpus that even comes close to that. Or at the very least, Google did not feed in anywhere near enough data. I’ve lost track but I think it’s mistake 3?
Cue mistake 4… let it loose on people without a big warning strapped to it or any form of user testing. The result? Eye-wateringly bad translations which start cropping up in the weirdest places. Facebook … ok, we could probably live with that… homework… a lot worse, don’t teachers have enough to contend with? And of course the jewel in the crown – official signage. Yep, that’s right. Google Translate has been making its way onto signage from Dublin Airport to government websites. And the result is almost always nauseating. Breaking through barriers? Only the blood vessels in Irish speakers’ brains perhaps…
It’s not that one shouldn’t attempt to bring technology to smaller languages, I’m all for that. But quality is key. It’s a hard enough sell at the best of times and something like a poor machine translation system can seriously damage the confidence people have in technology in or for their language. A little careful thinking goes a long way…