Thursday, March 25, 2010

Translating Thai with help from electronic tools

With the advent of various electronic tools translations from one language to another should be greatly facilitated, improved, and made faster. However, I’ve found the initial preparation is no trivial matter. Furthermore, as I hope to show, when it comes to attempting a reasonably reliable translation, you need to draw on your wits and whatever knowledge you’ve tucked away in the recesses of your memory – so having a good memory is a good start!

I’ll indicate some particular issues with respect to Thai, with a few comparisons with other languages. I make no claims about my general linguistic ability and with Thai I consider myself a novice both in speaking and writing, though I’m gradually acquiring more skills – without any language aids I would not be able to get very far at all! Even so, having heard my mother speak to me as a child, I have some sense of how Thai ‘sounds’ and its structure.

Assuming that an electronic document is available, automated assistants, like humans, have to contend with the following general problems:

  • There’s virtually no punctuation in Thai, and words are written without spaces between them – so more effort is required in parsing the text and, particularly, in chunking: working out where the divisions lie between words, clauses and sentences. I’ve struggled with this and sometimes depend on the tools’ suggestions.
  • There are no tenses in Thai apart from a few designators (marker words added in) – it’s not always obvious which tense to use, and if making an arbitrary choice, then consistency is needed across the text as a whole.
  • Phonetic transcriptions help you read more quickly, but there’s no single standard – I think this is partly because Thai is tonal, and Romanised phonetics either look clumsy or just omit the tones; it’s also partly because of the sound combinations, many of which could be transcribed in more than one way.
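To see what the untoned style throws away, here is a minimal Python sketch using only the standard `unicodedata` module. It takes the tone-marked transcription sùan nèung kŏng chaao (which appears later in this post) and removes the combining accents that carry the tones, leaving the plain form suan neung also used in this post. It is only an illustration, not how any of the tools mentioned here work.

```python
import unicodedata


def strip_tone_marks(romanised: str) -> str:
    """Remove combining diacritics (used here as tone marks) from a Romanised string."""
    # NFD splits e.g. 'ù' into 'u' followed by a combining grave accent (U+0300).
    decomposed = unicodedata.normalize("NFD", romanised)
    # Keep everything except non-spacing combining marks (Unicode category 'Mn').
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


toned = "sùan nèung kŏng chaao"
print(strip_tone_marks(toned))  # suan neung kong chaao
```

Once removed, the tones can't be recovered: nothing in suan tells the reader which tone to use.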

A Suite of Translation Tools

But let’s not be too pessimistic – as Benny the Irish polyglot would say, the language cup is half full! Once you have an electronic document, perhaps produced via scanning, OCR and manual corrections, it’s time to find the tools to help you read it!

When it comes to electronic assistants, the temptation is to take the easiest route: locate one tool, preferably free and on the Web, and just use that. However, it’s essential to have at least a second opinion! The first electronic tool that I used in earnest was Lingvosoft Talking Dictionary Thai to English, though even in the 2010 version the pronunciation is still only in English. :-( This is basically a large conventional dictionary with a simple interface – you type in your word letter by letter, and if you’re not sure of the ending, it will list words that start with that combination. I originally bought the Windows CE version thinking that it would be handy to have with me on my travels in Thailand, but I’ve not really got used to inputting on a small screen.

The most useful tool I’ve found amongst all those I’ve tried is Thai2English. A version of the software is available on the Web site http://thai2english.com. I have purchased the full copy, though it should be noted that it only runs on Windows. You can see from exploring the Web site that it goes well beyond a simple dictionary and has quite an array of pedagogic building blocks that support those who are learning Thai.

However, the first thing to do is to get a quick sense of what the text is about, and here I’ve turned to the Web by uploading content into Google Translate. This free service, which has only been available since January 2009, provides a very convenient interface offering a number of ways to get content translated automatically – technically, this is called machine translation. You can enter text into the box, upload a document or enter the URL (Web address) of a page that you’d like translated. You specify the language you’d like it translated into and then just press the [Translate] button. You can also bookmark combinations, e.g. Thai to English:
http://translate.google.com/?th&tl=en#
(For newcomers, you can get a flavour from a quick overview provided by Google, which covers a lot of ground in a little over a minute, but you can pause, rewind and replay to take it all in...)
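If you would rather build such a bookmark, or a link for one specific snippet, programmatically, a sketch along these lines should do it. It assumes that translate.google.com reads the query parameters `sl` (source language), `tl` (target language) and `text`, which is how the site appears to behave at the time of writing; these names are not from this post, and Google may change them.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: the site accepts sl (source), tl (target) and text as query parameters.
BASE_URL = "https://translate.google.com/"


def translate_link(text: str, source: str = "th", target: str = "en") -> str:
    """Build a Google Translate link for a snippet (parameter names are assumptions)."""
    query = urlencode({"sl": source, "tl": target, "text": text})
    return f"{BASE_URL}?{query}"


link = translate_link("ส่วนหนึ่งของชาวพุทธในอังกฤษ")
print(link)
# Round trip: the Thai text survives percent-encoding intact.
print(parse_qs(urlsplit(link).query)["text"][0])
```

The percent-encoding is handled by `urlencode`, so Thai text can be passed in directly.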

Google Translate does set a limit of a few pages per go, so if you have more than a slender booklet, you’d need to repeat this process a number of times, but for most purposes I don’t think that’s going to be very troublesome.
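Splitting a longer document into pieces that fit under such a limit is easy to automate. The Python sketch below packs whole paragraphs (separated by blank lines) into chunks of at most `max_chars` characters, and splits a paragraph mid-text only if it is too long on its own. The default of 5,000 characters is an assumption for illustration, not a figure from Google; set it to whatever limit you actually meet.

```python
def chunk_paragraphs(text: str, max_chars: int = 5000) -> list[str]:
    """Pack blank-line-separated paragraphs into chunks of at most max_chars characters."""
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        if not para.strip():
            continue  # skip empty paragraphs
        # Last resort: hard-split a single paragraph that exceeds the limit.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        candidate = para if not current else current + "\n\n" + para
        if len(candidate) <= max_chars:
            current = candidate  # paragraph fits in the chunk being built
        else:
            chunks.append(current)  # close the full chunk and start a new one
            current = para
    if current:
        chunks.append(current)
    return chunks


sample = "a" * 10 + "\n\n" + "b" * 10 + "\n\n" + "c" * 25
print([len(chunk) for chunk in chunk_paragraphs(sample, max_chars=22)])  # [22, 22, 3]
```

Each chunk can then be pasted into the text box (or saved as a separate document) in turn.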

TIP: When running MS Windows (XP), I’ve noticed that Firefox works much better than Internet Explorer, especially when copying from the browser window into a word processor – even into MS Word, where I would intuitively expect more information to be retained from IE.

An example

I’ll consider the title and opening paragraph from my mother’s article about her experience of the Hampshire Buddhist Society. The URL is: http://www.chezpaul.org.uk/fuengsin/dhamma/hants60s.htm.

Here is what Google currently makes of it:

Google Translate's translation into English of the Thai title and opening paragraph

Room for improvement, yes? I think it’s quite instructive about the challenges facing language learners, so let’s take a closer look at this paragraph.

You can do this using the text box entry form, or alternatively you can enter the above URL into Google Translate and ask for English to be returned. Wherever it encounters what it thinks is Thai, Google has a go at translating, so it generally leaves the English untouched, though not completely(!) In this interface, moving the mouse pointer over the translated title reveals the original Thai, ส่วนหนึ่งของชาวพุทธในอังกฤษ:

Google rollover revealing source text in Thai

Here is the phonetic transcription provided by Thai2English:

Phonetic transcription of a sentence generated by Thai2English

Right at the start there’s a lot of scope for differing translations. Let’s compare what Google and I make of it. I’ll do this chunk by chunk:

Title:
ส่วนหนึ่งของชาวพุทธในอังกฤษ
Google’s English:
Part of the Buddhist in England.
Paul’s English:
Some Buddhists in England.

Comments:

  • With Thai, there is no written designation for plural – here Google has interpreted ชาวพุทธ (chaao put) as singular, but should it be in the plural?
  • It opens with a set phrase, ส่วนหนึ่ง (suan neung), which Thai2English recognises as such:
    Thai2English parsing Thai, recognising a phrase
    Lingvosoft also lists it as a phrase:

    Lingvosoft definition of ส่วนหนึ่ง

However, it’s still grammatically valid to treat the two words as distinct: ส่วน หนึ่ง. In that case a whole host of meanings is possible for ส่วน, which could be any of several parts of speech. Lingvosoft indicates:

  • Adverb: apropos
  • Conjunction: as for, as to
  • Noun: fragment, denominator, form, lineament, member, part, portion, proportion, quota, region, section, segment, while, zone, bit, body
  • Preposition: as of

Thus it could be translated: Concerning a Buddhist ... , i.e. about a [single] Buddhist’s experiences in the UK.

So I’ve had to weigh up these alternatives. How to home in on the right meaning? One approach I adopt is to shorten the phrase: a shorter phrase occurs more often, so the translation should draw on a larger statistical sample. Thus I can try ส่วนหนึ่งของชาว (sùan nèung kŏng chaao). Google renders this as 'Part of the people.' This helps persuade me to settle on 'Some people' as the main sense. Yet even with these extra pointers it’s still largely guesswork until I’ve had a native or fluent speaker check it for me.
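The reasoning behind shortening the phrase can be illustrated by simple counting. The sketch below uses the Thai title and sentences quoted in this post as a tiny stand-in corpus and counts how often the full title and shorter pieces of it occur. Real statistical systems use vastly larger corpora, and this is not a description of how Google Translate actually works; the point is only that a string occurs at least as often as any longer string that contains it, so estimates based on it rest on more examples.

```python
# Tiny stand-in corpus: the Thai title and sentences quoted in this post.
CORPUS = [
    "ส่วนหนึ่งของชาวพุทธในอังกฤษ",
    "นับตั้งแต่ข้าพเจ้าออกจากบ้านเมืองมาอยู่ในประเทศอังกฤษเป็นเวลาเกือบ ๕ ปีไม่มีโอกาสไปวัดทำบุญตักบาตรและฟังพระธรรมเทศนา",
    "ข้าพเจ้ายังมีความเลื่อมใสในพุทธธศาสนาอยู่เสมอ",
    "ในยามว่างได้พยายามอ่านหนังสือเกี่ยวกับธรรมนั่งสมาธิวิปัสสนา",
    "และปฏิบัติธรรมเท่าที่สามารถจะทำได้ในใจนั้นเฝ้าแต่คิดว่าคงจะได้พบกับชาวพุทธเข้าสักวันหนึ่ง",
]


def count_occurrences(phrase: str, corpus: list[str]) -> int:
    """Count non-overlapping occurrences of phrase across all sentences."""
    return sum(sentence.count(phrase) for sentence in corpus)


# From the full title down to shorter pieces contained in it.
for phrase in ["ส่วนหนึ่งของชาวพุทธในอังกฤษ", "ส่วนหนึ่งของชาว", "ชาวพุทธ"]:
    print(phrase, count_occurrences(phrase, CORPUS))
```

Even in five sentences, the shorter ชาวพุทธ turns up twice while the full title occurs only once.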

Having pondered enough over just the title, let’s move on to the first sentence(!)

Sentence 1

นับตั้งแต่ข้าพเจ้าออกจากบ้านเมืองมาอยู่ในประเทศอังกฤษเป็นเวลาเกือบ ๕ ปีไม่มีโอกาสไปวัดทำบุญตักบาตรและฟังพระธรรมเทศนา

Google:
Since I come from homes in the UK for nearly 5 years, no opportunity to measure merits, and put listening preaching.
PT:
Ever since I left my homeland to be in England nearly 5 years ago I have not had the opportunity to go to a temple to make merits, to put almsfood in a monk's bowl, or to listen to the Buddha's teachings.

Comments on Google’s effort:

  • The subject of the sentence almost gets lost at ไม่มี – literally ‘there wasn’t’, but in English it’s clearer to turn this into the first person
  • Google omits the translation of ไป วัด (go to the temple), yet it’s a very common activity.
  • There’s a lack of contextual awareness with “measure merits”: วัด can mean ‘to measure’ as well as ‘temple’, but ‘measure’ just doesn’t make sense here!
  • Google translates ตักบาตร as just ‘put’, but it’s a construction, which Thai2English renders as “to put almsfood in a monk's bowl” and Lingvosoft offers: “give food offerings to a Buddhist monk.” Perhaps the latter is safer, but the former really conveys the Thai tradition!
  • The resulting sentence offered by Google is grammatically very poor. If you look at it, there’s a distinct absence of Buddhist-related vocabulary, which suggests a significant gap in the corpora (assuming it is using statistical methods).

Afterwards I made a few more stylistic changes such as changing ‘home’ to ‘homeland’ to emphasize the change in culture.

Sentence 2

ข้าพเจ้ายังมีความเลื่อมใสในพุทธธศาสนาอยู่เสมอ

Google:
I also have a sequin. Enter the Buddhist religious path always.
PT:
Yet I still have faith in the Buddha's teachings.

Comments:

  • Whereas Thai2English translates ความเลื่อมใส as a phrase meaning ‘faithfulness, believability, conviction’, Google errs in its chunking and decides to apply a full stop in the middle of a word, i.e. after ความเลื่อม which literally means ‘glossy things,’ hence ‘sequin’!
  • Google doesn’t retain a consistent mood – it jumps from first-person indicative to imperative(?)
  • The phrase พุทธธศาสนา is simply the Thai rendering of the Pali Buddha Sasana, which means ‘teachings of the Buddha’. Although ‘Buddhist religious path’ sounds okay, using the word ‘religious’ arguably brings a lot of unnecessary cultural baggage with it.
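The ‘sequin’ error is the kind of split a dictionary-driven word segmenter produces when a compound is missing from its word list. Below is a minimal Python sketch of greedy longest matching (often called maximal matching), one simple textbook approach to dividing unspaced Thai into words; I don't know which method Google or Thai2English actually use. The two mini-dictionaries are hypothetical and exist only to show the effect.

```python
def longest_match_segment(text: str, dictionary: set[str]) -> list[str]:
    """Greedy longest-match word segmentation of unspaced text."""
    max_len = max(len(word) for word in dictionary)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i : i + length]
            if candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
        else:
            # No dictionary word starts here: emit one code point as an unknown token.
            # (A real segmenter would keep a base letter together with its vowel/tone marks.)
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical mini-dictionaries, with and without the compound ความเลื่อมใส ('faithfulness').
with_compound = {"ความ", "ความเลื่อม", "ความเลื่อมใส", "เลื่อม", "ใส", "ใน"}
without_compound = with_compound - {"ความเลื่อมใส"}

phrase = "ความเลื่อมใสใน"
print(longest_match_segment(phrase, with_compound))     # ['ความเลื่อมใส', 'ใน']
print(longest_match_segment(phrase, without_compound))  # ['ความเลื่อม', 'ใส', 'ใน']
```

Without the compound in its word list, the segmenter stops after ความเลื่อม (‘glossy things’), which is exactly where Google put its full stop.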

Sentence 3 (first part)

ในยามว่างได้พยายามอ่านหนังสือเกี่ยวกับธรรมนั่งสมาธิวิปัสสนา

Google:
The guard was busy trying to read books about the fair. Insight meditation.
PT:
In my free time I am always trying to read books on Dhamma, sit and practise Vipassana meditation.

Comments:

  • Google has split this into two sentences.
  • Google has not recognised that ยาม ว่าง is a phrase. Lingvosoft confirms that on its own ยาม means ‘gatekeeper, guardian, ...’, whereas Thai2English defines it as ‘time; hour; period’ and groups it with ว่าง (‘free, empty, vacant’).
  • Google renders ธรรม as ‘the fair’, but that’s completely out of context. Thai2English helpfully offers, amongst others, ‘dharma’ or ‘[to be] natural, lawful, normal’.
  • Google has taken นั่ง สมาธิ วิปัสสนา as just the practice (noun) of insight meditation, rather than as a verb. I’ve emphasized the activity with a longer rendering.

Sentence 3 (second part)

และปฏิบัติธรรมเท่าที่สามารถจะทำได้ในใจนั้นเฝ้าแต่คิดว่าคงจะได้พบกับชาวพุทธเข้าสักวันหนึ่ง

Google:
and practice as they can do but keep in mind that think that would be found to be a Buddhist one day.
PT:
and practise the Dhamma to the best of my ability. I keep these in mind, thinking that I might yet some day get to meet with other Buddhists.

Comments:

  • I found this a difficult clause and am not really sure about the translation.
  • Google’s clause is all over the place
  • Google again fails to translate the key word ธรรม

As you can see, Google’s rendering is at present very variable and incoherent, and doesn’t make much sense. It seems to chop up sentences, turning clauses into short sentences and giving a staccato effect! I’m guessing that Thai is not one of its stronger languages.

Evaluation

I have found that the most helpful translation tool is Thai2English, and I copy chunks of Thai into it. It gives meanings and phonetic transcriptions word by word, together with help concerning Thai grammar. Occasionally it also fails to chunk correctly, and sometimes it lacks some vocabulary, but most of the time it does a good job; indeed, where there are doubts or blank spaces, I have often found typographical errors in the original text (or mistakes in the OCR/copy typing).
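The way gaps in a dictionary lookup point to typos can be mimicked with a similarity search. In the sketch below, Python's standard `difflib.get_close_matches` compares a word that is missing from a word list against the known entries. For instance, พุทธธศาสนา in sentence 2 differs from the more usual spelling พุทธศาสนา only by a doubled ธ, possibly just such a slip. The word list is a hypothetical stand-in for a real dictionary, and the 0.8 similarity cutoff is an arbitrary choice.

```python
import difflib
from typing import Optional

# Hypothetical stand-in for a dictionary's word list.
KNOWN_WORDS = ["พุทธศาสนา", "ศาสนา", "ธรรม", "ความเลื่อมใส"]


def suggest_spelling(word: str, known: list[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the closest known word if `word` is not in the list, else None."""
    if word in known:
        return None  # recognised word: nothing to suggest
    matches = difflib.get_close_matches(word, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest_spelling("พุทธธศาสนา", KNOWN_WORDS))  # พุทธศาสนา
print(suggest_spelling("ธรรม", KNOWN_WORDS))        # None
```

A suggestion is only a prompt to check the source text, not a correction in itself.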

Google Translate is quick and useful for getting the gist, but it’s not fit for translating anything substantial. I’ve found that close reading is required, for which Thai2English, supplemented by another electronic dictionary – here Lingvosoft – is far more productive.

Whilst Google struggles to provide accurate translations, it does provide a very useful template structure for working on documents: it splits translations into bite-sized segments of Thai followed by English. At the moment I don’t pay too much attention to its translation, but I retain it whilst I’m working, since it sometimes offers useful clues. I’m sure that it will improve quite rapidly, as it’s an important project for Google.
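The same segment-by-segment layout can be reproduced as a plain-text worksheet for revising translations. In the sketch below, the function name and layout are illustrative assumptions, loosely modelled on the Google:/PT: pairs used earlier in this post. For each Thai segment it prints the machine draft and an empty PT: line for your own rendering.

```python
def build_worksheet(segments: list[tuple[str, str]]) -> str:
    """Lay out Thai segments with a machine draft and an empty line for your own version."""
    lines: list[str] = []
    for thai, machine_draft in segments:
        lines.append(thai)
        lines.append(f"Google: {machine_draft}")
        lines.append("PT: ")  # to be filled in by the human translator
        lines.append("")      # blank line between segments
    return "\n".join(lines)


segments = [
    ("ส่วนหนึ่งของชาวพุทธในอังกฤษ", "Part of the Buddhist in England."),
    ("ข้าพเจ้ายังมีความเลื่อมใสในพุทธธศาสนาอยู่เสมอ",
     "I also have a sequin. Enter the Buddhist religious path always."),
]
print(build_worksheet(segments))
```

Keeping the machine draft next to each blank line preserves its occasional clues without letting it dictate the final wording.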

At the end of the day the notice pinned onto the board would be: "All translations may be subject to change!"
