Open Registry of Game Information 

  • Internationalization issues

  • Talk about specific features of our upcoming online game database.

Moderators: MZ per X, gene

 #37981  by MZ per X
 26 Mar 2014, 19:27
We recently discussed some internationalization (i18n) issues via IRC, and I want to summarize what we talked about.

The main point we discussed was the difference between regional titles, text translations, and text transliterations, and how to implement them in our data model. Regional titles mean that a game or platform is released under different names in different regions. A translation brings a text from one language to another, while a transliteration brings a text from one script to another without changing the language.

Let's start with an example to illustrate these issues. Take a look at the game "Secret of Mana", which is the US release of the Japanese original "Seiken Densetsu 2".

As we can see in the English entry for the game at Wikipedia, "Secret of Mana" is not the translation of "Seiken Densetsu 2". These are two different regional titles, which leads us to the following scheme for the game's titles:

Release Name (Region) ||| English language ||| Japanese language ||| Latin script ||| Japanese script

Secret of Mana (USA) ||| Secret of Mana ||| マナの秘密 ||| Secret of Mana ||| シークリット・オブ・マーナー

聖剣伝説2 (Japan) ||| Legend of the Sacred Sword 2 ||| 聖剣伝説2 ||| Seiken Densetsu 2 ||| 聖剣伝説2

As you can see here, we have two regional titles for the game, which can both be translated to every language imaginable, and both be transliterated to each of the eight scripts that are, in our humble opinion, important for a video game database.

The first thing to note is that every text - regardless of whether it is a person's name, game title, game description, screenshot caption, or whatever - is written in a certain script and in a certain language. The separation between script and language is very important, so let's take another look at the two titles of Secret of Mana to make this distinction clear:

String (Script, Language)

聖剣伝説2 (Japanese, Japanese)
Legend of the Sacred Sword 2 (Latin, English)
Seiken Densetsu 2 (Latin, Japanese)

Secret of Mana (Latin, English)
マナの秘密 (Japanese, Japanese)
シークリット・オブ・マーナー (Japanese, English)

For "normal" texts like a game description, a transliteration probably won't be needed. One could think that for personal or geographical names only the transliteration (and thus, the script attribute) is needed, but that's not true. The "translation" to another language is important here, too. Let's take a look at two more examples:

String (Script, Language)

Михаил Сергеевич Горбачёв (Cyrillic, Russian)
Mikhail Sergeyevich Gorbachev (Latin, English)
Michail Sergeevič Gorbačëv (Latin, Russian)
Michail Sergejewitsch Gorbatschow (Latin, German)

東京 (Japanese, Japanese)
Tokyo (Latin, English)
Tōkyō (Latin, Japanese)
Tokio (Latin, German)

When I talk about "translation" here, I don't mean transferring the meaning of the name to the other language (Tokyo would then be "Eastern Capital" in English), but using the official spelling of that other language.

So, technically, every text object of our database can be defined as a "meta object" consisting of n strings, each with the two attributes (script, language) assigned to it.
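
To make the "meta object" idea a bit more tangible, here is a minimal Python sketch. The class and method names (Variant, TextObject, add, find) are made up for illustration and are not part of any agreed data model; scripts and languages are plain strings here, where a real model would probably use controlled codes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Variant:
    """One concrete string of a text object (hypothetical name)."""
    string: str
    script: str    # e.g. "Latin", "Japanese", "Cyrillic"
    language: str  # e.g. "English", "Japanese", "German"


@dataclass
class TextObject:
    """The "meta object": n strings, each tagged with (script, language)."""
    variants: List[Variant] = field(default_factory=list)

    def add(self, string: str, script: str, language: str) -> None:
        self.variants.append(Variant(string, script, language))

    def find(self, script: str, language: str) -> Optional[str]:
        """Return the first string stored for this pair, or None."""
        for v in self.variants:
            if (v.script, v.language) == (script, language):
                return v.string
        return None


# The Tokyo example from above.
tokyo = TextObject()
tokyo.add("東京", "Japanese", "Japanese")
tokyo.add("Tokyo", "Latin", "English")
tokyo.add("Tōkyō", "Latin", "Japanese")
tokyo.add("Tokio", "Latin", "German")

print(tokyo.find("Latin", "German"))      # Tokio
print(tokyo.find("Cyrillic", "Russian"))  # None
```

Note that the object itself carries no notion of which string is "official"; it only stores the tagged strings.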

The next problem is that we will have to pick one (or more) of these strings for display every time our meta text object is used. But some strings are needed for our video game documentation, some are only informal, and some are necessary to make a game easier to find within the database. So which one to pick, and how? Let's revisit the Secret of Mana example with this in mind:

String (Script, Language)

聖剣伝説2 (Japanese, Japanese)
Secret of Mana (Latin, English)

These are official release titles of the game for a certain region, so they need to be assigned to all releases using them.

Legend of the Sacred Sword 2 (Latin, English)
マナの秘密 (Japanese, Japanese)

These are only informal translations, which could be shown to users when hovering over the other language's title.

Seiken Densetsu 2 (Latin, Japanese)
シークリット・オブ・マーナー (Japanese, English)

These are transliterations of the official release titles, and therefore more important than the informal translations above. For example, "Seiken Densetsu 2" is needed for Latin-script users searching for Japanese games, or for game lists in Latin script.

So, having said that, should we label some strings as "leading" or "important" in the meta text object to begin with, so those will show up when no other context is specified? Or shouldn't we do this, leaving the meta text object unaware of its content's importance, thus having to provide context every time we use the text object?

I'm not really sure, but my gut feeling is that labeling one string as "leading" is too inflexible to solve future problems. So I'd go for the solution to always use a text object in context, and thus manually pick the right string from its contents based on that context.

For example, if we connect the above text object

聖剣伝説2 (Japanese, Japanese)
Legend of the Sacred Sword 2 (Latin, English)
Seiken Densetsu 2 (Latin, Japanese)

as the Japanese release title to the respective game, we need to specify that "聖剣伝説2" is the actual string used for this release. On the other hand, if a Latin-script user requests a list of all SNES games released in Japan, we would need to pick "Seiken Densetsu 2" as the string to show for this list.
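
The two contexts just described could be sketched roughly like this in Python. Everything here (the tuple layout, pick, release_string, list_string, and the fallback to the release string) is an assumption for illustration only, not a decided design.

```python
from typing import List, Optional, Tuple

# (string, script, language) triples of the Japanese release title.
Variant = Tuple[str, str, str]

JAPANESE_TITLE: List[Variant] = [
    ("聖剣伝説2", "Japanese", "Japanese"),
    ("Legend of the Sacred Sword 2", "Latin", "English"),
    ("Seiken Densetsu 2", "Latin", "Japanese"),
]


def pick(variants: List[Variant], script: str, language: str) -> Optional[str]:
    """Return the string tagged with exactly this (script, language) pair."""
    for string, s, l in variants:
        if (s, l) == (script, language):
            return string
    return None


def release_string(variants, release_script, release_language):
    """Context "release": the string actually used on that release."""
    return pick(variants, release_script, release_language)


def list_string(variants, release_script, release_language, user_script):
    """Context "game list": keep the release language but prefer the user's
    script (the transliteration); fall back to the release string."""
    transliterated = pick(variants, user_script, release_language)
    if transliterated is not None:
        return transliterated
    return pick(variants, release_script, release_language)


print(release_string(JAPANESE_TITLE, "Japanese", "Japanese"))
print(list_string(JAPANESE_TITLE, "Japanese", "Japanese", "Latin"))
```

The informal translation "Legend of the Sacred Sword 2" is never picked by either function, which matches the idea that it is only shown on request (e.g. on hover).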

So much for some basics of this complex issue, but there's one important question about i18n we haven't touched yet: how to handle the different language versions of Oregami? While it may be rather easy to translate the (static) UI and help to another language, I am mainly talking about the textual content (descriptions, screenshot captions, etc.), i.e. the data.

I think we would be well advised to only start a new data language once this language's community has grown to a critical mass of native contributors / approvers. But which way should we go once we have started more languages besides English? I see two basic ways:

1) The Wikipedia way: every language grows alone, more or less based on common standards. The quality of the texts may differ severely from one language to another, nonetheless.

2) English is the central language, so every other language's text is translated from and to it, and common standards apply strictly. The quality level is comparable in every language.

Details need to be worked out, but which way do you prefer?
 #37983  by Ultyzarus
 27 Mar 2014, 16:47
I would not allow unofficial translations or titles at all unless they are very commonly used (like the common abbreviation FF); it would make things far too complicated. I would not, for instance, add a kana spelling of "Secret of Mana", nor a literal translation "Le secret de Mana" (I have never seen or heard anyone use this in French to refer to this game ;) ).

As for transliteration, we should choose one transliteration standard for every script and use it by default, unless an official source (cover, website, in-game title, etc.) spells the title using a different standard. One example would be some of the Dynasty Warriors games, which are spelled "Musou" on the Japanese covers rather than using a macron on the "o".

We should also be aware of grammatical and typographic differences between languages. For instance, a colon in a French title needs a space before and after, all German nouns are capitalized, etc.
 #37984  by Ultyzarus
 27 Mar 2014, 16:59
Of course, that is mostly considering that we may not have the necessary workforce to use every possible translation and transliteration. I also advise against it unless we have specialized staff to peer-review those.

Speaking of which, we should first document the different standards for each script. If we decide to choose one standard for each, we can review each of them and see which ones seem the most intuitive / easy to understand. Being from different countries and speaking different languages in addition to English, we should be able to pinpoint the most international-database-friendly methods ;)
 #37985  by Ultyzarus
 27 Mar 2014, 17:24
For Japanese:
http://en.wikipedia.org/wiki/Romanization_of_Japanese

The most used are:

Hepburn (the one that is mainly used on MG, that I am also suggesting be used here)

Nihon-shiki romanization

Kunrei-shiki romanization

example:
English: Roman characters / to shrink
Japanese: ローマ字 / 縮む
Japanese kana: ローマじ / ちぢむ
Hepburn: rōmaji / chijimu
Nihon-shiki: rômazi / tidimu
Kunrei-shiki: rômazi / tizimu
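
To show where the three standards actually diverge, here is a tiny Python sketch covering only the kana of this example. The tables are deliberately incomplete and the function name romanize is made up; it assumes that ぢ is written "ji" in Hepburn, "di" in Nihon-shiki and "zi" in Kunrei-shiki, and that the long vowel mark ー becomes a macron in Hepburn and a circumflex in the other two.

```python
import unicodedata

# Kana that romanize identically in all three systems (example subset only).
BASE = {"ロ": "ro", "マ": "ma", "む": "mu"}

# Kana where the systems differ (example subset only).
OVERRIDES = {
    "hepburn":      {"ち": "chi", "じ": "ji", "ぢ": "ji"},
    "nihon-shiki":  {"ち": "ti",  "じ": "zi", "ぢ": "di"},
    "kunrei-shiki": {"ち": "ti",  "じ": "zi", "ぢ": "zi"},
}

# Combining diacritic used to mark a long vowel.
LONG_MARK = {
    "hepburn": "\u0304",       # macron
    "nihon-shiki": "\u0302",   # circumflex
    "kunrei-shiki": "\u0302",  # circumflex
}


def romanize(kana: str, system: str) -> str:
    """Romanize a kana string using the tiny tables above."""
    table = {**BASE, **OVERRIDES[system]}
    out = []
    for ch in kana:
        if ch == "ー":
            # Lengthen the vowel of the previous syllable.
            out[-1] = unicodedata.normalize("NFC", out[-1] + LONG_MARK[system])
        else:
            out.append(table[ch])
    return "".join(out)


for system in ("hepburn", "nihon-shiki", "kunrei-shiki"):
    print(system, romanize("ローマじ", system), romanize("ちぢむ", system))
```

A real implementation would need full kana tables plus rules for digraphs, the sokuon and long vowels written with う, so this is only meant to illustrate the differences.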

So the needed strings for Japanese (titles and names) would be:
-Japanese spelling (kanji, plus kana and sometimes romaji) - shows how it is actually spelled
-Kana spelling (kanji replaced by hiragana; katakana and romaji are kept if they are in the original spelling) - shows the pronunciation
-Romanized title (no specific standard if there is an official source for it) - shows the official romanization
-Standardized romanized title (using Hepburn, or another standard if we choose one) - shows a standardized romanization that is consistent on every page
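
As a rough sketch, the four strings listed above could be kept together like this in Python. The class and field names are hypothetical, and the rule in romanized() (official spelling first, standardized one otherwise) is just my reading of the "use the standard unless an official source differs" idea. The "Musou" entry is only the title fragment mentioned earlier, not a full game title.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JapaneseTitle:
    """The four strings suggested for Japanese titles and names."""
    japanese: str                                # actual spelling
    kana: str                                    # pronunciation
    standard_romanization: str                   # e.g. Hepburn
    official_romanization: Optional[str] = None  # only if a source exists

    def romanized(self) -> str:
        """Prefer the official romanization, else the standardized one."""
        return self.official_romanization or self.standard_romanization


# Title fragment with an official spelling that ignores the macron.
musou = JapaneseTitle("無双", "むそう", "Musō", official_romanization="Musou")

# Plain word without any official romanization.
shrink = JapaneseTitle("縮む", "ちぢむ", "chijimu")

print(musou.romanized())   # Musou
print(shrink.romanized())  # chijimu
```

The standardized romanization stays available in either case, so lists and searches can still use one consistent spelling.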
 #37989  by MZ per X
 30 Mar 2014, 21:16
Ultyzarus wrote:I would not allow unofficial translations or titles at all, unless it is very commonly used (like common abbreviation FF) it would make it far too complicated. I would not, for instance, add a kana spelling of "Secret of Mana", nor a literal translation "Le secret de Mana" (I have never seen or heard anyone use this in French to refer to this game ;) )
Such titles would only be informal. The main focus will be on official release titles and their transliterations.
Ultyzarus wrote:As for transliteration, what should be done is choosing one transliteration standard for every script, and use that one in priority, unless an official source (cover, website, in-game title, etc.) spells it using a different standard (one example would be some of the Dynasty Warriors game that are spelled "Musou" on the Japanese covers rather than using a macron on the "o".
Full ACK! :)
Ultyzarus wrote:We should also be aware of grammatical differences for different languages, for instance, a colon in a French title needs a space before and after. All German nouns have a capital letter, etc.
Basically, I agree. But if we mainly deal with official release titles, we will use the spelling used there, won't we?
Ultyzarus wrote:Of course, that is mostly considering that we may not have the necessary workforce to use every possible translation and transliteration. I also advise against it unless we have specialized staff to peer-review those.
I think that we will only start a new data language once we have a strong enough community for it. The steps that I foresee for a new language are:
1) a forum board
2) UI and help translation
3) a critical mass of contributors / approvers for that language with good reputation
Ultyzarus wrote:Speaking of which, we should first document the different standards for each script. If we decide to choose one standard for each, we can review each of them and see which ones seem the most intuitive / easy to understand. Being from different countries and speaking different languages in addition to English, we should be able to pinpoint the most international-database-friendly methods ;)
A list of transliteration standards in our wiki is something I would gladly pass on to you to work on. ;)
 #37991  by Ultyzarus
 31 Mar 2014, 16:11
MZ per X wrote:
Ultyzarus wrote:As for transliteration, what should be done is choosing one transliteration standard for every script, and use that one in priority, unless an official source (cover, website, in-game title, etc.) spells it using a different standard (one example would be some of the Dynasty Warriors game that are spelled "Musou" on the Japanese covers rather than using a macron on the "o".
Full ACK! :)
Basically, we would have two different Japanese titles for this one, it is quite simple :):
-Japanese characters (if we add them, with the corresponding Japanese spelling characters (kana)) with the standard Romanization
-Official Romanization
MZ per X wrote:
Ultyzarus wrote:We should also be aware of grammatical differences for different languages, for instance, a colon in a French title needs a space before and after. All German nouns have a capital letter, etc.
Basically, I agree. But if we mainly deal with official release titles, we will use the spelling used there, don't we?
That applies mainly when titles are displayed in full capitals, or when the title is printed above the subtitle (the colon would be our addition, etc.). For special cases (AaaaAaaaaaaaAaaaaah), we would still go with how the title is officially displayed.
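
For the colon case, a small Python sketch of what I mean. The function name join_title and the separator table are made up, and only the French rule mentioned earlier (a space before and after the colon) is covered; every other language falls back to a plain ": ".

```python
# Separator we would insert between title and subtitle, per language.
# Only French differs here; the table is an illustrative assumption.
SEPARATORS = {"French": " : "}
DEFAULT_SEPARATOR = ": "


def join_title(title: str, subtitle: str, language: str) -> str:
    """Join a title and a subtitle printed on separate lines of a cover."""
    return title + SEPARATORS.get(language, DEFAULT_SEPARATOR) + subtitle


print(join_title("Titre", "Sous-titre", "French"))  # Titre : Sous-titre
print(join_title("Title", "Subtitle", "English"))   # Title: Subtitle
```

French typography would normally want a non-breaking space before the colon, so a real table might store that instead of a regular space.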
 #37992  by Ultyzarus
 31 Mar 2014, 16:17
MZ per X wrote:
Ultyzarus wrote:I would not allow unofficial translations or titles at all, unless it is very commonly used (like common abbreviation FF) it would make it far too complicated. I would not, for instance, add a kana spelling of "Secret of Mana", nor a literal translation "Le secret de Mana" (I have never seen or heard anyone use this in French to refer to this game ;) )
Such titles would only be informal. The main focus will be on official release titles, and its transliterations.
Ultyzarus wrote:Of course, that is mostly considering that we may not have the necessary workforce to use every possible translation and transliteration. I also advise against it unless we have specialized staff to peer-review those.
I think that we will only start a new data language once we have a strong enough community for it. The steps that I foresee for a new language are:
1) a forum board
2) UI and help translation
3) a critical mass of contributors / approvers for that language with good reputation
The thing with translation is that 20 translators could come up with 20 different translations for the main title, and literal translation isn't usually recommended. Example: if "Secret of Mana" had been an unofficial translation of the French game "Le secret de Mana", "Mana's secret" would have been another possible translation, and both would be correct. The more complex a title is, the more possibilities there are...
 #37993  by MZ per X
 31 Mar 2014, 20:49
Ultyzarus wrote:
MZ per X wrote:
Ultyzarus wrote:We should also be aware of grammatical differences for different languages, for instance, a colon in a French title needs a space before and after. All German nouns have a capital letter, etc.
Basically, I agree. But if we mainly deal with official release titles, we will use the spelling used there, don't we?
That is mainly when titles are displayed in full capitals, or when title is above subtitle (colons would be our addition, etc.), for special (AaaaAaaaaaaaAaaaaah) cases, we would still go with how it is officially displayed.
Ah, okay, sounds reasonable.
Ultyzarus wrote:The thing with translation, is that 20 translators could come up with 20 different translations for the main title, and literal translation isn't usually recommended. Example: if "Secret of Mana" had been an unofficial translation of French game "Le secret de Mana", there could have been "Mana's secret" as a translation possibility, and both would be correct. The more complex a title is, the more possibilities there are...
I don't see that as a problem, because these kinds of translations would only be there to give a foreign user a glimpse of the meaning. So it doesn't matter whether the translation is "Secret of Mana" or "Mana's secret", as long as the title's meaning is conveyed correctly.
 #37994  by Ultyzarus
 31 Mar 2014, 21:02
MZ per X wrote:
Ultyzarus wrote:The thing with translation, is that 20 translators could come up with 20 different translations for the main title, and literal translation isn't usually recommended. Example: if "Secret of Mana" had been an unofficial translation of French game "Le secret de Mana", there could have been "Mana's secret" as a translation possibility, and both would be correct. The more complex a title is, the more possibilities there are...
I don't see that as a problem, because these kind of translations would be only there to give a foreign user a glimpse of the meaning. So it doesn't matter whether the translation would be "Secret of Mana" or "Mana's secret", as long as the title's meaning is correctly transported.
All right, as long as we have our in-house translators, this should go smoothly ;)