When things are way, way, WAY worse than you thought they might get

01/09/2016

You may recall that back in November last year I (and some other people) seriously questioned whether Google Translate for Gaelic was really such a great idea. Most people who came down on the side of it being a good thing cited things such as “attracting young people” (that must be the minority language equivalent of “exposure” in the arts world…), “enhancing the status”, “used judiciously, it will do this and that good thing” and “wait and see, it won’t be that bad”.

Well, I have some news for you and if you’ve never seen me furious, and I mean steaming-out-of-the-ears-furious, here’s your chance.

I wasn’t actually planning to blog about this again, not for a while. But then I made the mistake of doing something unrelated – a bit of data entry in the Faclair Beag. After another fruitless attempt at finding the English for a coileach-gòthan, I picked up one of my many note-sheets and decided I might as well enter one. After a few useful phrases, I came across an odd-looking word so I decided to ask the poor man’s corpus (which is useful in giving you a very quick impression of how common a word is). 346 results – that seemed fairly conclusive (for a language like Gaelic) but being the OCD QA freak I am, as always I did a gross error check to see which sites these hits were coming from. Topslotsite? Strange but maybe a coincidence (sometimes English typos or bad line breaks result in seemingly Gaelic words)… just keep going. Coinfalls? What the… Slotjar??? No, I’m not lying…

cac-GT-05

And it’s not just something Google ran over the site descriptions, we’re talking entire sites which people have just punched through Google Translate and put online:

cac-GT-02

It’s not just casino stuff… you can also get gibberish about business…

cac-GT-06

So here’s Reason 1 for me being furious: The more this happens, the less useful it will make the web for doing various projects related to Gaelic (and any other such unfortunate small language). I’m not just talking about messing up my searches. For instance, there are various spellcheckers for smaller languages which are based on web corpora, i.e. bodies of text which have been collected from the web to form the basis of a spellchecker. This also often results in helpful word statistics – which words are more common than others. That may just sound like geekery but that’s the kind of geekery that helps make a better predictive text tool, for example. So while still geeky, we’re talking geeky-that-is-useful-to-Joe-Bloggs.

The more of this GT shite we get on the web, the further the quality of anything you might otherwise cull from the web will go down, down and down for each of those languages. Because unproofed machine translation will just always re-hash whatever is in the machine’s brain, i.e. you will only ever get more of the same.

Then I came across this:

cac-GT-01

Some site in Russia about maths that has been Google-translated. Look at the lovely yellow box in particular.

So here is Reason 2 for the ceò coming out of my cluasan: Some of us spend a lot of time working on educational (and usually free) tools such as Scratch. It’s hard enough to convince people to try things like that in Gaelic without having to hand them a huge note saying “Beware of the following 3,000 sites which are not fit for purpose”. You know what that kind of warning does to people’s confidence in Gaelic software? Well, I’ll give you a small hint, it doesn’t improve it, that’s for sure.

At this point I decided to put e-pen to e-paper because there’s something else that has been making me furious about all this, something I was saving for later, at least until I had seen what effect reporting the next problem would have.

Ladies and Gentlemen, I give you Publishing Hell. Let’s start with a light entrée:

cac-GT-03

Nice, eh? For those of you not Gaelic speakers, that ought to say “Cidsin an t-Samhraidh”. But wait, I hear you say, surely that’s just an isolated thing… Nope, they are both beyond all linguistic pales and beyond counting:

cac-GT-04

“Yes but isn’t that obvious? Surely people won’t buy these…” Well, I have news for that camp too. Apparently they do. And why not? Especially a learner who is not fluent might very well find something like that an attractive proposition for helping them learn more Gaelic… or Tswana… or Samoan… or Chichewa…

Thanks guys. Nicely done.

ADDENDUM: Someone on Facebook commented that these are “problems to be solved; problems to be welcomed” and that “Bitching about this problem is like bitching about bad teachers. You don’t send all the teachers back to teacher school en masse because one makes some hideous errors. A generation would be lost. Or a language.” The reason I’m commenting on this here as an addendum is not to slag them off but because I realise that this is indeed a way some people will look at this. So I’m dissecting it as a potential view on the matter, not as a personal response (which I did on Facebook).

So is it a problem to be solved? I don’t think it can actually be solved. Who has the time or the money to pay someone to have the time? I already invest much too much of my non-working time into building resources. I don’t have the time to chase charlatans on the web. It’s like fighting midges.

And this is not like having a bad teacher either. You have one bad teacher, fine, shunt them into admin or find another way of improving their skills. Or a bad night class teacher, where word will eventually get round. The difference is the sheer scale of the issue. This is no longer a contest of human vs human; if you’ll pardon the crude simplification, this has become human vs machine. And that’s a contest where a small group of humans is not going to come off well, because a machine translation system can dish out junk much, much faster than a team of humans can locate it and shut it down. Even if there were a simple way of shutting such things down.

No, this is a problem that Gaelic speakers cannot fix and that has grown out of the short-sightedness of a few people chasing a sexy headline, unwilling to engage in meaningful debate. All we can do now is watch the terror unfold and hope that it will one day step on the toes of a language much bigger than Gaelic.

ADDENDUM 2: I have removed all references to the children’s books on Amazon I had previously discussed on here. The discussion over whether some of them are genuine translations and which of them and to what extent others might be MT was really beginning to detract from the issue at hand. I have asked the 3 people who re-blogged this post to take it down.
