xoops forums

mvandam

Quite a regular
Posted on: 2003/10/21 0:58
Posts: 253
Since: 2003/2/7 2
#1

Discussion: How best to make CONTENT multilingual

After reading joelg's recent comment in this thread:

https://xoops.org/modules/newbb/viewto ... p?topic_id=10734&forum=14

I thought it would be useful to start a discussion on how *best* to implement multilingual capability for CONTENT within Xoops.

The above-mentioned thread describes a nice 'hack' to the core that allows text in different languages to all be entered into the same field. The text for each language is surrounded by square-bracket codes, and only the text for the currently selected language is shown when the content is rendered. This hack deals with content, titles, administration, and even block titles, I believe.
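
To make the mechanism concrete, here is a minimal Python sketch of the rendering step. It is not the hack's actual PHP code. It assumes tags of the form `[nl]...[/nl]` with two-letter language codes, and it assumes that text outside any tag is shown in every language; the name `render_for_language` is made up for illustration.

```python
import re

# Assumed tag form: [xx]...[/xx], where xx is a two-letter language code.
_LANG_BLOCK = re.compile(r"\[([a-z]{2})\](.*?)\[/\1\]", re.DOTALL)


def render_for_language(text: str, lang: str) -> str:
    """Keep blocks tagged with `lang`, drop blocks tagged otherwise."""

    def keep_or_drop(match: re.Match) -> str:
        code, body = match.group(1), match.group(2)
        return body if code == lang else ""

    # Untagged text is left alone, so it appears in every language.
    return _LANG_BLOCK.sub(keep_or_drop, text)


stored = "[en]Hello[/en][nl]Hallo[/nl] XOOPS"
print(render_for_language(stored, "en"))  # Hello XOOPS
print(render_for_language(stored, "nl"))  # Hallo XOOPS
```

Note that the whole stored string, with every translation in it, has to be loaded and parsed just to show one language.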

But is this really the *best* way to do this? That is, if the XOOPS core were going to support multilingual content, how would we do it from the ground up? Perhaps we will decide that the methods used in the above hack are the best, but I would like to initiate a little discussion to see what others think.

As joelg and others have pointed out, this hack does have a few drawbacks:

- If you have a lot of languages on your site, your text becomes huge (a storage problem in size-limited database fields) and intimidating to edit.

- Your users may not be comfortable using square-bracket codes to surround the text for each language.

- If your users don't all understand all languages, why make all translations available to them while they edit? This may be confusing, and it introduces the possibility that they will mess up the translations in other languages.

I'd love to hear thoughts on what a good method would be, or simply what features/characteristics a good method would have. Also, does anyone have any favourite software that does a good job of multilingual content and might serve as an example? Comments welcome!

svaha

Just can't stay away
Posted on: 2003/10/21 1:34
Posts: 896
Since: 2003/8/2 2
#2

Re: Discussion: How best to make CONTENT multilingual

I write articles in English and Dutch. On my 'old' site http://www.amevita.com I did this in HTML, so everything I published had to be put in separate HTML files. Now on my new site http://www.rainbowshaman.com I can publish simply by inserting [ee][/ee] and [nl][/nl] into the text. But it was quite a job for me to get this far.
As a standard:
1) You would need to assign a number to each language.
2) A global array, also usable by the blocks/modules, holding the languages/numbers in use.
3) Module names, block names, menu item names, topic names and so on should be replaced by symbolic names.
4) For the content: I like the method I use now (with the square brackets). When I make a mistake, I see it when I post, and no great harm is done. Some things I write are only in Dutch (local events, for example), so if you can't read Dutch, just skip that article.
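
Points 1 to 3 could be sketched roughly as follows, in Python rather than PHP. Everything here is hypothetical: the `LANGUAGES` registry, the `LABELS` table, the symbolic names and the fallback to a default language are assumptions for illustration, not anything XOOPS provides.

```python
# Hypothetical registry: each language gets a number, as in point 1.
# This is the global array from point 2, shared by blocks and modules.
LANGUAGES = {1: "en", 2: "nl"}
DEFAULT_LANGUAGE = 1

# Hypothetical symbolic names (point 3), translated per language number.
LABELS = {
    "MENU_HOME": {1: "Home", 2: "Startpagina"},
    "BLOCK_LOGIN": {1: "Login"},  # no Dutch entry yet
}


def label(symbol: str, language: int) -> str:
    """Resolve a symbolic name, falling back to the default language."""
    translations = LABELS[symbol]
    return translations.get(language, translations[DEFAULT_LANGUAGE])


print(label("MENU_HOME", 2))    # Startpagina
print(label("BLOCK_LOGIN", 2))  # Login (falls back to language 1)
```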

Of course there are some items like for instance searching that are more difficult, but it's a matter of growing insight, and though I am a very 'young' Xoopser I noticed that this is a community with spirit.

Aloha

PureLuXus

Not too shy to talk
Posted on: 2003/10/21 2:34
Posts: 116
Since: 2002/1/3 2
#3

Re: Discussion: How best to make CONTENT multilingual

hmmm

I think that before the XOOPS team thinks about new features...
they should think about making 2.0.5 perfect, before adding more and more things and bugs.

svaha

Just can't stay away
Posted on: 2003/10/21 12:28
Posts: 896
Since: 2003/8/2 2
#4

Re: Discussion: How best to make CONTENT multilingual

In real life we can often best act before we think.
In the software world, we can better think before we act.
This will lead to a good foundation for the (eventual) language implementation, and so to fewer bugs.
Meanwhile (this is just a basic discussion) we can work on a safer, even more stable Xoops.
The power of XOOPS lies in the diversity of its users/developers. When you limit yourself to one thing, you can't see the whole.
Maybe this simple discussion of a 'future' item will lead to the solution of bugs/errors, so carry on with an open mind.

Aloha

azeini

Just popping in
Posted on: 2003/10/21 13:58
Posts: 19
Since: 2003/7/23
#5

Re: Discussion: How best to make CONTENT multilingual

A few days back, I tried to start a thread about right-to-left (RTL) content, which unfortunately went unnoticed.

We are using the little hack mentioned in that thread successfully on three XOOPS-powered sites and are happy to be able to post in Farsi and English. If you want to see some samples, feel free to browse through the news and forums of our site. I think that using square brackets is a good method to support different scripts. I would also like to see UTF-8 set as the default charset:
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />

We even have a little hack somewhere to add a button for RTL text so that the user doesn't need to type it in every time.

I am happy to see this thread now. It is time for XOOPS to offer some multilingual features, and adding the general concept is "very" simple, at least from my POV and understanding.

Thoughts and feedback are welcome.

mvandam

Quite a regular
Posted on: 2003/10/21 23:42
Posts: 253
Since: 2003/2/7 2
#6

Re: Discussion: How best to make CONTENT multilingual

Thank you everyone for your suggestions so far. It seems that a couple of people like the square bracket approach. I don't currently run a multilingual site but I just imagined that on a large site if I wanted to support many languages (e.g. 10 or 20) then this approach does not seem very practical.

What I envision is a method where the user can select a language, and items that have no translation in that language are either shown in the language they were written in or not displayed at all. As for editing and storing the multiple translations, so far we have the square bracket approach. That means all translations are stored together and edited together. I think any problems with searching can be sorted out (probably with a small performance hit).

Another possibility is to store each translation in a separate database 'row'/record. This will improve performance (don't need to load all translations from database just to show one language version, and also don't need to parse for language identifiers). However, how does this get entered? I suppose a user would by default create the translation in their selected language. But how could they create additional translations in other languages? Perhaps a simple selector on the edit form would suffice to choose the language being entered. This could work for real content I think, but additional things like titles, block titles, menu items, etc... pose a bit of a problem. Perhaps a combination of the two approaches?
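
A rough sketch of the one-row-per-translation idea, using Python's built-in sqlite3 module in place of the real XOOPS database layer. The `content` table, its columns and the `fetch_body` helper are all invented for illustration. The `fallback_to_original` switch covers both behaviours mentioned above: show the as-written version, or show nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE content (
           item_id     INTEGER NOT NULL,
           lang        TEXT    NOT NULL,
           is_original INTEGER NOT NULL DEFAULT 0,  -- 1 = as-written version
           body        TEXT    NOT NULL,
           PRIMARY KEY (item_id, lang)
       )"""
)
conn.executemany(
    "INSERT INTO content VALUES (?, ?, ?, ?)",
    [
        (1, "en", 1, "Welcome"),
        (1, "nl", 0, "Welkom"),
        (2, "nl", 1, "Lokale agenda"),  # written in Dutch only
    ],
)


def fetch_body(item_id, lang, fallback_to_original=True):
    """Return the body in `lang`, else the original version, else None."""
    row = conn.execute(
        "SELECT body FROM content WHERE item_id = ? AND lang = ?",
        (item_id, lang),
    ).fetchone()
    if row is None and fallback_to_original:
        row = conn.execute(
            "SELECT body FROM content WHERE item_id = ? AND is_original = 1",
            (item_id,),
        ).fetchone()
    return row[0] if row else None


print(fetch_body(1, "nl"))  # Welkom
print(fetch_body(2, "en"))  # Lokale agenda (no English version stored)
```

Only the requested row is read, and nothing has to be parsed for language identifiers. The edit form's language selector would simply decide which `lang` value a saved row gets.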

Just throwing some ideas out there... any thoughts?

Daigoro

Quite a regular
Posted on: 2003/10/24 23:40
Posts: 223
Since: 2003/7/3 2
#7

Re: Discussion: How best to make CONTENT multilingual

I'm sorry for being quiet for so long.

Being one of the people hacking away to implement the square brackets system to create a multilingual system, I've slowly started to form an opinion about how to implement a multilingual system.

I'll start by going through the square brackets approach, then some ideas I have for implementing the square brackets approach more easily than is currently done, and finally an idea on how to make an overall better multilingual system.

The square brackets approach does work, but it's not easy to maintain and it's a huge waster of database "bandwidth". It requires ALL text that is not part of the language files to be routed through the textsanitizer. A few exceptions exist, such as the places where you edit the lines containing the square brackets, typically in admin.

The good thing about the square bracket approach is that everything not enclosed in square brackets is treated as common to all languages.
The bad thing is that you need to remember to enclose language-specific text in square brackets containing the language name, and that you edit all languages at the same time.


The square bracket technique does indeed deal with all parts of Xoops, including content, titles, administration, and block titles. BUT many of these still need to be hacked.
I have a number of hacks not yet published on this site, but they will be as soon as time allows.


I have an idea on how to implement the hack globally. The idea is based on my low-level programming skills. I have no idea whether it's possible within PHP, XOOPS or in some other way. (The only PHP I know is from hacking Xoops.)

I'll base this description on a C-like syntax.

Any kind of text coming out of a C program comes out one character at a time, usually through a routine called putchar(). The putchar routine is the one you would patch if you needed to convert from, e.g., Mac line termination to PC/Windows line termination.
If XOOPS or PHP implements a function like that, then it would be possible to add the textsanitizer there, and thus make the system and all existing modules support the square brackets approach. The maximum line length in both the system/modules and the database would still have to be adjusted, though.
A person with intimate PHP knowledge may be able to make this hack.
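
The "single choke point" idea can be illustrated in Python by buffering everything a page prints and filtering the buffer once at the end. This is only an analogy, not XOOPS code: the `[xx]...[/xx]` tag form and the names `filter_languages` and `language_filtered_output` are assumptions. PHP's output buffering (`ob_start()` with a callback) plays a comparable role, though whether it suits XOOPS is not something this thread establishes.

```python
import contextlib
import io
import re
import sys

# Assumed tag form: [xx]...[/xx] with a two-letter language code.
_LANG_BLOCK = re.compile(r"\[([a-z]{2})\](.*?)\[/\1\]", re.DOTALL)


def filter_languages(text, lang):
    """Keep [lang]...[/lang] bodies, drop blocks for other languages."""
    return _LANG_BLOCK.sub(
        lambda m: m.group(2) if m.group(1) == lang else "", text
    )


@contextlib.contextmanager
def language_filtered_output(lang, target=None):
    """Buffer everything printed inside the block, then filter it once."""
    target = target if target is not None else sys.stdout
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        yield
    # Filtering the whole buffer avoids tags being split across writes.
    target.write(filter_languages(buffer.getvalue(), lang))


with language_filtered_output("nl"):
    print("[en]News[/en][nl]Nieuws[/nl]")  # module code stays unchanged
```

Filtering the complete buffer, instead of one character at a time as with putchar(), matters because a language tag can span several separate writes.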


Making the overall system multilingual will require a lot more than the square brackets approach.
The multilingual system I'm currently working on uses Danish, Japanese and English. The common language is English, but the Danes are learning Japanese, and the Japanese are learning Danish.
This means that if a message is written in one language only, then people reading in either of the other languages should still see that message, even if it's not in their selected language. To combat problems with corrupted text, all content uses multibyte characters.

One way I can think of to implement this is to have (in this case) three databases running, one for each language.
Whenever a message is entered, it should be saved only in the database for the language selected when the message was typed.
If the message is then displayed in another language, the system should automatically select the language it was originally written in. If I choose to edit the message while using a language other than the one it was written in, then it should ask for a translation and save the new translation in the current language's database. If two or more translations exist, and you choose to view or edit the message using yet another language, then it should use a prioritized list to select which translation to display.
Many aspects of this still need to be ironed out, but I believe the basis for implementing this approach is available.
It could be made as a transparent layer, where the database access is controlled by some core functions, which are matched to the language selector.
A special translator module may need to be made, too, where one can select one of many source languages and a destination language, possibly with a personal, selectable preferences setting. I imagine two scenarios for this translator module: one is to edit existing content, the other is to list and edit any untranslated content.
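
The selection rule described above might look like this in Python. It is a sketch under assumptions: the function name `choose_translation`, the dictionary of stored versions, and the exact order (selected language first, then the reader's priority list, then the original language) are one reading of the description, not a settled design.

```python
def choose_translation(available, selected, original, priority):
    """Pick which language version of a message to display.

    available: dict mapping language code -> stored text
    selected:  the reader's currently selected language
    original:  the language the message was first written in
    priority:  the reader's ordered list of fallback languages
    """
    for lang in [selected, *priority, original]:
        if lang in available:
            return lang, available[lang]
    raise LookupError("message has no stored versions")


# A message written in Danish and later translated into Japanese.
message = {"da": "Hej", "ja": "こんにちは"}

# An English reader who prefers Japanese over Danish as a fallback.
print(choose_translation(message, "en", "da", ["ja", "da"]))

# A message that exists only in its original language is always shown.
print(choose_translation({"da": "Hej"}, "en", "da", []))
```

The same function could drive the translator module's second scenario: any message where the returned language differs from the selected one is still untranslated for that reader.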

I'm looking forward to a discussion about this important subject.

Best regards,

ajaxbr

Quite a regular
Posted on: 2003/10/26 19:34
Posts: 276
Since: 2003/10/25
#8

Re: Discussion: How best to make CONTENT multilingual

The way I see it, separation at the database level would be the best approach.

Xoops core would have an option to turn multilanguage content on or off; if it's on, you'll have to choose which N languages are supported. The database structure remains pretty much the same, except that for each content item you'll have N separate versions, identifiable by some flag that allows both editing and displaying of a given language version. Perhaps the language selection already available could be enhanced so that when a user changes it, the content, interface and char-encoding are all affected.

So we have the same space use in the database, 1/N the bandwidth used in the square brackets way, and the ability to make language-driven searches. A default of showing all versions if no language code is set by the client makes this approach very search-engine friendly. A feature to generate Babelfish or Google-like translations for a quick and dirty version in languages the author can't handle would be nice too.
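
As a small in-memory illustration of those two behaviours (all versions when no language is set, and language-driven search), here is a hedged Python sketch. The `Version` record, the sample data and both function names are invented; a real implementation would sit on the database.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Version:
    item_id: int
    lang: str  # the per-version language flag
    body: str


# Hypothetical sample data: item 1 exists in two languages, item 2 in one.
VERSIONS = [
    Version(1, "en", "Release notes for the new theme"),
    Version(1, "pt", "Notas de lançamento do novo tema"),
    Version(2, "en", "Theme installation guide"),
]


def versions_for(item_id: int, lang: Optional[str] = None) -> List[Version]:
    """Return one language version, or all of them if no language is set."""
    return [
        v for v in VERSIONS
        if v.item_id == item_id and (lang is None or v.lang == lang)
    ]


def search(term: str, lang: str) -> List[int]:
    """Language-driven search: only bodies stored for `lang` are matched."""
    needle = term.lower()
    return [
        v.item_id for v in VERSIONS
        if v.lang == lang and needle in v.body.lower()
    ]


print(len(versions_for(1)))     # both versions, e.g. for a search engine
print(search("theme", "en"))    # items 1 and 2
print(search("theme", "pt"))    # no Portuguese body contains "theme"
```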

What do you think?

Herko

XOOPS is my life!
Posted on: 2003/10/26 20:26
Posts: 4238
Since: 2002/2/4 1
#9

Re: Discussion: How best to make CONTENT multilingual

Why whole separate DBs? Every bit of content stored in the DB has an identifier. Why not make a module or a core feature that one can enable, that lets editors add a second or third language, or as many as you want, to a site by linking the unique item ID to the translated content? Thus, you have separate tables for the different languages, and you can even specify which content is available in which languages, and for which groups. This is just a thought, not a technical scheme, but may be good for this discussion...
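
One possible reading of this, sketched with Python's sqlite3 module: the original table stays untouched, each added language gets its own table keyed by the same item ID, and a third table records which groups may see which translation. The table names (`stories`, `stories_nl`, `translation_groups`), the columns and the `story_body` helper are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stories (storyid INTEGER PRIMARY KEY, body TEXT NOT NULL);
    -- One extra table per added language, keyed by the original item ID.
    CREATE TABLE stories_nl (
        storyid INTEGER PRIMARY KEY REFERENCES stories(storyid),
        body    TEXT NOT NULL
    );
    -- Which groups may see which translation.
    CREATE TABLE translation_groups (
        storyid INTEGER NOT NULL,
        lang    TEXT    NOT NULL,
        groupid INTEGER NOT NULL
    );
    INSERT INTO stories VALUES (1, 'Welcome');
    INSERT INTO stories_nl VALUES (1, 'Welkom');
    INSERT INTO translation_groups VALUES (1, 'nl', 2);
    """
)

# Whitelist of translation tables; table names never come from user input.
TRANSLATION_TABLES = {"nl": "stories_nl"}


def story_body(storyid, lang, groupid):
    """Return the translation if it exists and the group may see it,
    otherwise the untranslated body (or None for an unknown item)."""
    table = TRANSLATION_TABLES.get(lang)
    if table is not None:
        row = conn.execute(
            f"SELECT t.body FROM {table} AS t "
            "JOIN translation_groups AS g ON g.storyid = t.storyid "
            "WHERE t.storyid = ? AND g.lang = ? AND g.groupid = ?",
            (storyid, lang, groupid),
        ).fetchone()
        if row:
            return row[0]
    row = conn.execute(
        "SELECT body FROM stories WHERE storyid = ?", (storyid,)
    ).fetchone()
    return row[0] if row else None


print(story_body(1, "nl", 2))  # Welkom
print(story_body(1, "nl", 3))  # Welcome (group 3 has no access to the Dutch text)
```

Because the original table is not altered, modules that know nothing about translations would keep working against it unchanged.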

Herko

ajaxbr

Quite a regular
Posted on: 2003/10/26 20:50
Posts: 276
Since: 2003/10/25
#10

Re: Discussion: How best to make CONTENT multilingual

Thanks Herko, that was what I meant by
Quote:
The database structure remains pretty much the same, except that for each content item you'll have N separate content, identifiable some flag that both allows editing of a given language version and displaying of a given version
but I didn't make myself clear enough.
Ideally, I believe, the possible content languages should be hardcoded and linked to equivalent charsets and interface languages. Is there any obvious flaw in that?

Would there be any kind of performance loss associated with implementing a feature like this?