Wednesday, August 29, 2012

Translate Chinese Unicode code points to English

One of our legacy applications had its UI in Chinese, and we were asked to convert it to English.
Instead of hiring a translator, we decided to use Google Translate.

However, the application read its Chinese labels and messages from a properties file, and that file stored the Chinese characters as Unicode escapes (for example \u6587\u4EF6) rather than as the characters themselves. The Google Translate web page expects the actual Chinese characters to be typed or pasted into the form. We searched for a translation service that would accept Unicode code points directly, but in vain.

Finally, we decided to write a simple program that writes the characters for those Unicode code points to a file, and then to open the file in a program such as Notepad++ or MS Word. These programs can display Chinese characters, so the text can be copied and pasted into the Google Translate page.

Given below is a simple Java snippet that writes the characters to a file. It uses the Files class from Google Guava. Open the resulting file in MS Word (or any other program that can read UTF-8 text and has a font with Chinese glyphs).
-------------------------------------------------
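import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import com.google.common.io.Files; // Google Guava

public class Chinese_Chars {
    public static void main(String[] args) throws IOException {
        // The compiler converts the Unicode escapes below into two Chinese characters.
        String str = "\u6587\u4EF6";
        // Encode the text as UTF-8 so that editors can read it back correctly.
        byte[] array = str.getBytes(StandardCharsets.UTF_8);

        // Output path; change it to suit your machine.
        File file = new File("d:/temp.txt");
        Files.write(array, file);
    }
}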
---------------------------------------------------
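
The snippet above handles one hard-coded string. For a whole properties file, a rough sketch along the following lines might save some copy and paste. It relies on java.util.Properties, which decodes Unicode escapes while loading, and it uses only the JDK (Java 7 or later) instead of Guava. The class name, the decode helper and the default file names are made up for illustration; they are not from our application.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;
import java.util.TreeSet;

public class PropertiesToUtf8 {

    // Loads the properties (Properties.load decodes the Unicode escapes)
    // and returns one "key=value" line per entry, sorted by key.
    static String decode(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        StringBuilder out = new StringBuilder();
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            out.append(key).append('=').append(props.getProperty(key)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default file names; pass real paths as arguments.
        Path in = Paths.get(args.length > 0 ? args[0] : "messages_zh.properties");
        Path out = Paths.get(args.length > 1 ? args[1] : "messages_zh_utf8.txt");
        // Properties files are traditionally ISO-8859-1 with escapes for everything else.
        try (Reader reader = Files.newBufferedReader(in, StandardCharsets.ISO_8859_1)) {
            Files.write(out, decode(reader).getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

The output file keeps the keys next to the decoded values, so the translated text can be matched back to the right entry afterwards.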

Shown below are some screenshots of the Google Translate page and of MS Word opening the file.