1
russy
Russian encoding issues of "xhld"
  • 2005/12/8 1:38


Hello.

My site runs on the Japanese version of XOOPS, and xhld (an RSS feed module) is installed on it to get RSS feeds from news sites written in Japanese, English, and Russian.

The problem is that a feed from BBC Russia is not displayed correctly on my site: all the Cyrillic characters in the title line output by xhld are garbled. I could not see which encoding the BBC Russia feed (http://newsrss.bbc.co.uk/rss/russian/russia/rss.xml) uses, but it seems to be windows-1251.
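For reference, the encoding a feed declares can be read from its XML prolog. The sketch below lifts the auto-detection regex out of the module's convertToUtf8() (quoted further down) into a standalone function. The name `declared_encoding` is hypothetical, and the sketch assumes the declaration is on the first line with no byte-order mark in front of it.

```php
<?php
// Hypothetical helper: return the encoding named in the XML declaration,
// or 'utf-8' when none is declared (the same fallback xhld uses).
function declared_encoding(string $xml): string
{
    // Same pattern as xhld's auto detection; only the first 255 bytes are inspected.
    $top = substr($xml, 0, 255);
    if (preg_match('/^<\?xml .* encoding=[\'"]?([0-9a-z_-]+)/i', $top, $regs)) {
        return strtolower($regs[1]);
    }
    return 'utf-8';
}

// Usage: pass the raw feed body, e.g. as fetched with file_get_contents().
echo declared_encoding('<?xml version="1.0" encoding="windows-1251"?>' . "\n<rss/>"), "\n";  // windows-1251
```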

More specifically, when I entered the site's URL and its RSS URL in the corresponding boxes of the xhld admin panel on my site, I got the following error:

"XmlParse error: not well-formed (invalid token) at line 4"
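A parser error like this often means the parser met bytes that are not valid in the encoding it expected. One hedged way to check whether raw windows-1251 bytes are reaching the parser unconverted is to test them for UTF-8 validity, assuming the parser expects UTF-8. The helper name is hypothetical; the check relies on preg_match() returning false for invalid UTF-8 when the /u modifier is used.

```php
<?php
// Returns true when $bytes is well-formed UTF-8. With the /u modifier,
// preg_match() returns false (not 0) if the subject is not valid UTF-8.
function is_valid_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

$utf8   = 'Россия';                                  // this source file is saved as UTF-8
$cp1251 = iconv('UTF-8', 'WINDOWS-1251', $utf8);     // the bytes a windows-1251 feed carries

var_dump(is_valid_utf8($utf8));    // bool(true)
var_dump(is_valid_utf8($cp1251));  // bool(false)
```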

Could anyone kindly explain how to solve this issue? I have looked into the two relevant files in the module, russian/headlinerenderer.php and japanese/headlinerenderer.php, but I am not sure which lines in either or both of them should be fixed.

russian/headlinerenderer.php reads:


// This is a sample of using iconv() for converting UTF-8 <-> non-iso-encoding
// Replace "WINDOWS-1251" with your encoding
// Don't forget to add the encoding to _AM_ENCODINGS in language/(your language)/admin.php


//Russian translation and russian encoding adaptation by Vladislav "FractalizeR" Rastrusny. http://www.vrsi.ru


if( ! class_exists( 'XhldRendererLocal' ) ) {
    class XhldRendererLocal extends XhldRenderer
    {
        function XhldRendererLocal( &$headline , $mydirname='xhld0' )
        {
            parent::XhldRenderer( $headline , $mydirname ) ;
        }

        function convertFromUtf8(&$value, $key)
        {
            if( ! is_string( $value ) ) return ;
            if( stristr( _CHARSET , 'iso-8859-1' ) ) {
                $value = utf8_decode( $value ) ;
            } else if( $this->_hl->getVar('headline_encoding') == 'iso-8859-1' && ! $this->_hl->getVar('headline_allowhtml') ) {
                $value = htmlentities( utf8_decode( $value ) ) ;
            } else {
                $value = iconv( "UTF-8" , "WINDOWS-1251" , $value ) ;
            }
        }

        function &convertToUtf8(&$xmlfile)
        {
            $encoding = $this->_hl->getVar('headline_encoding') ;

            // auto detection
            if( empty( $encoding ) ) {
                $top_of_xml = substr( $xmlfile , 0 , 255 ) ;
                preg_match( "/^<\?xml .* encoding=['\"]?([0-9a-z_-]+)/i", $top_of_xml , $regs ) ;
                if( empty( $regs ) ) {
                    $encoding = 'utf-8' ;
                } else {
                    $encoding = strtolower( $regs[1] ) ;
                }
                $this->_hl->setVar( 'headline_encoding' , $encoding ) ;
                $headline_handler =& xoops_getmodulehandler('headline', $this->_mydirname);
                $headline_handler->insert($this->_hl);
            }

            switch( strtolower( $encoding ) ) {
                case 'iso-8859-1' :
                    $xmlfile = utf8_encode( $xmlfile ) ;
                    break ;
                case 'windows-1251' :
                    $xmlfile = iconv( "WINDOWS-1251" , "UTF-8" , $xmlfile ) ;
                    break ;
                case 'koi8-r' :
                    $xmlfile = iconv( "KOI8-R" , "UTF-8" , $xmlfile ) ;
                    break ;
                case 'koi8-u' :
                    $xmlfile = iconv( "KOI8-U" , "UTF-8" , $xmlfile ) ;
                    break ;
                case 'koi8-ru' :
                    $xmlfile = iconv( "KOI8-RU" , "UTF-8" , $xmlfile ) ;
                    break ;
                case 'utf-8' :
                default :
                    break ;
            }

            return $xmlfile;
        }
    }
}
?>

And japanese/headlinerenderer.php reads:

if (function_exists('mb_convert_encoding') && ! class_exists( 'XhldRendererLocal' ) ) {
    class XhldRendererLocal extends XhldRenderer
    {
        function XhldRendererLocal( &$headline , $mydirname='xhld0' )
        {
            parent::XhldRenderer( $headline , $mydirname ) ;
            if( ! preg_match( '/(EUC-JP|UTF-8|SJIS)/i' , mb_internal_encoding() ) ) {
                mb_internal_encoding( 'EUC-JP' ) ;
            }
        }

        function convertFromUtf8(&$value, $key)
        {
            if( ! is_string( $value ) ) return ;
            if( stristr( _CHARSET , 'iso-8859-1' ) ) {
                $value = utf8_decode( $value ) ;
            } else if( $this->_hl->getVar('headline_encoding') == 'iso-8859-1' && ! $this->_hl->getVar('headline_allowhtml') ) {
                $value = htmlentities( utf8_decode( $value ) ) ;
            } else {
                $value = mb_convert_encoding( $value , mb_internal_encoding() , 'UTF-8' ) ;
            }
        }

        function &convertToUtf8(&$xmlfile)
        {
            $encoding = $this->_hl->getVar('headline_encoding') ;

            // auto detection
            if( empty( $encoding ) ) {
                $top_of_xml = substr( $xmlfile , 0 , 255 ) ;
                preg_match( "/^<\?xml .* encoding=['\"]?([0-9a-z_-]+)/i", $top_of_xml , $regs ) ;
                if( empty( $regs ) ) {
                    $encoding = 'utf-8' ;
                } else if( stristr( $regs[1] , 'JIS' ) ) {
                    $encoding = 'shift_jis' ;
                } else if( stristr( $regs[1] , 'euc' ) ) {
                    $encoding = 'euc-jp' ;
                } else if( stristr( $regs[1] , 'utf-8' ) ) {
                    $encoding = 'utf-8' ;
                } else {
                    $encoding = strtolower( $regs[1] ) ;
                }
                $this->_hl->setVar( 'headline_encoding' , $encoding ) ;
                $headline_handler =& xoops_getmodulehandler('headline', $this->_mydirname);
                $headline_handler->insert($this->_hl);
            }

            switch( strtolower( $encoding ) ) {
                case 'iso-8859-1' :
                    $xmlfile = utf8_encode( $xmlfile ) ;
                    break ;
                case 'shift_jis' :
                    $xmlfile = str_replace( chr( 0 ) , '' , mb_convert_encoding( $xmlfile , "UTF-8" , "Shift_JIS" ) ) ;
                    break ;
                case 'euc-jp' :
                    $xmlfile = str_replace( chr( 0 ) , '' , mb_convert_encoding( $xmlfile , "UTF-8" , "EUC-JP" ) ) ;
                    break ;
                case 'utf-8' :
                default :
                    break ;
            }

            return $xmlfile;
        }
    }
}
?>
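In the Japanese renderer, the last branch of convertFromUtf8() converts each UTF-8 headline into mb_internal_encoding(), which the constructor sets to EUC-JP unless it already matches EUC-JP, UTF-8 or SJIS. The basic Cyrillic alphabet is part of JIS X 0208, so plain Russian letters can survive this output step. A minimal round-trip sketch, assuming mbstring is loaded; the helper name `to_site_encoding` is hypothetical:

```php
<?php
// Mirrors the last branch of convertFromUtf8() in japanese/headlinerenderer.php:
// a UTF-8 headline is converted to the site's internal encoding for display.
// Assumes EUC-JP, the constructor's fallback internal encoding.
function to_site_encoding(string $utf8, string $siteEncoding = 'EUC-JP'): string
{
    return mb_convert_encoding($utf8, $siteEncoding, 'UTF-8');
}

if (function_exists('mb_convert_encoding')) {
    $euc = to_site_encoding('Россия');
    // Converting back shows the Cyrillic letters were preserved in EUC-JP.
    echo mb_convert_encoding($euc, 'UTF-8', 'EUC-JP'), "\n";   // Россия
}
```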

I would very much appreciate it if any of you could shed light on this issue.

Thank you.

russy

2
LazyBadger
Re: Russian encoding issues of "xhld"

1. In any case, you will not be able to see Russian and Japanese text on the same page (on a non-UTF-8 site).
2. I tried the feed: I added and viewed it on my test box.
3. Check that iconv is available on your host.
4. If your site's main locale is Japanese, copy part of russian/headlinerenderer.php into japanese/headlinerenderer.php (at least one more case):

case 'windows-1251' :
$xmlfile = iconv( "WINDOWS-1251" , "UTF-8" , $xmlfile ) ;
break ;
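Put together, a convertToUtf8() switch for a Japanese-locale site that also accepts Windows-1251 feeds could look like the standalone sketch below. The function name `feed_to_utf8` is hypothetical, it covers only some of the module's cases, and it uses iconv() for the Latin-1 case instead of utf8_encode() (same result, but utf8_encode() is deprecated as of PHP 8.2).

```php
<?php
// Hypothetical standalone version of the renderer's switch: convert a raw
// feed body from its declared encoding to UTF-8.
function feed_to_utf8(string $xml, string $encoding): string
{
    switch (strtolower($encoding)) {
        case 'shift_jis':
            $out = str_replace(chr(0), '', mb_convert_encoding($xml, 'UTF-8', 'Shift_JIS'));
            break;
        case 'euc-jp':
            $out = str_replace(chr(0), '', mb_convert_encoding($xml, 'UTF-8', 'EUC-JP'));
            break;
        case 'windows-1251':    // the extra case suggested above
            $out = iconv('WINDOWS-1251', 'UTF-8', $xml);
            break;
        case 'iso-8859-1':
            $out = iconv('ISO-8859-1', 'UTF-8', $xml);   // same result as utf8_encode()
            break;
        default:                 // 'utf-8' and unknown encodings pass through
            $out = $xml;
    }
    // iconv() returns false on invalid input; fall back to the raw bytes.
    return $out === false ? $xml : $out;
}

// Usage: a windows-1251 byte string comes back as UTF-8.
$raw = iconv('UTF-8', 'WINDOWS-1251', 'Новости');
echo feed_to_utf8($raw, 'windows-1251'), "\n";   // Новости
```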
Quis custodiet ipsos custodes?

Webmaster of
XOOPS2.RU
XOOPS Modules Proving Ground
XOOPS Themes Exhibition

3
russy
Re: Russian encoding issues of "xhld"
  • 2005/12/8 5:45


Quote:

case 'windows-1251' :
$xmlfile = iconv( "WINDOWS-1251" , "UTF-8" , $xmlfile ) ;
break ;


Hi, LazyBadger,

Thanks for your input. My site is hosted by a US company, but its target audience is those who can understand not only Japanese but also English and Russian.

As you suggested, I added the following lines you wrote

case 'windows-1251' :
$xmlfile = iconv( "WINDOWS-1251" , "UTF-8" , $xmlfile ) ;
break ;

after the lines below (in the "japanese/headlinerenderer.php" file):

case 'euc-jp' :
$xmlfile = str_replace( chr( 0 ) , '' , mb_convert_encoding( $xmlfile , "UTF-8" , "EUC-JP" ) ) ;
break ;

So it now looks like this:

case 'euc-jp' :
$xmlfile = str_replace( chr( 0 ) , '' , mb_convert_encoding( $xmlfile , "UTF-8" , "EUC-JP" ) ) ;
break ;
case 'windows-1251' :
$xmlfile = iconv( "WINDOWS-1251" , "UTF-8" , $xmlfile ) ;
break ;

But this didn't do anything. In case I misunderstood what you suggested, could you elaborate on that, please?

Thanks.

Russy

4
russy
Re: Russian encoding issues of "xhld"
  • 2005/12/9 8:41


Is there anyone here who can solve this issue?

5
LazyBadger
Re: Russian encoding issues of "xhld"

Probably I can... Some members are busy chatting in yet another useless thread ("Can XOOPS compete with Joomla?") and have no time to help, only to show off their egos.

You didn't answer my question: is Japanese the main locale of your site?

Question 2: did you set the feed's charset to Windows-1251 in its properties?

Question 3: can you see readable text if you switch the browser to the Windows-1251 charset by hand?

Question 4: is your site reachable and viewable from the Internet?

6
russy
Re: Russian encoding issues of "xhld"
  • 2005/12/10 2:31


LazyBadger,

>Is Japanese the main locale of your site?

You mean, the site audience or the server location?

Main site audience: those who understand Japanese
Server location: USA

>Question 2: did you set the feed's charset to Windows-1251 in its properties?

Yes, I added WINDOWS-1251 to the admin of xhld via _AM_ENCODINGS, and selected the encoding for the feed property.

> Question 3: can you see readable text if you switch the browser to the Windows-1251 charset by hand?

The text of the headline is garbled like this: Áîëüøîé âåðíóë "Âîé&iacut... (Би-би-си - Россия). The text in parentheses is displayed correctly.

> Question 4: is your site reachable and viewable from the Internet?

My site is viewable on the net. It displays Cyrillic characters without any problems except for the output in this headline module.

7
LazyBadger
Re: Russian encoding issues of "xhld"

Quote:

russy wrote:
You mean, the site audience or the server location?

Neither... I meant only the main language of your site. Sorry for the poor wording.
Quote:

Yes, I added WINDOWS-1251 to the admin of xhld via _AM_ENCODINGS, and selected the encoding for the feed property.

Well, at least the module and this feed are configured properly.

Quote:

The text of the headline is garbled like this: Áîëüøîé âåðíóë "Âîé&iacut...

That is Russian text displayed in the ISO-8859-1 charset...
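This diagnosis can be checked by hand: the garbled headline is exactly what Windows-1251 bytes look like when decoded as ISO-8859-1 (for example, Б is byte 0xC1 in Windows-1251, which ISO-8859-1 reads as Á). A minimal reproduction, assuming iconv is available; the sample words are taken from the garbled headline quoted above.

```php
<?php
// Reproduces the garbling: windows-1251 bytes decoded as if they were ISO-8859-1.
$headline = 'Большой вернул';                           // UTF-8 source text
$cp1251   = iconv('UTF-8', 'WINDOWS-1251', $headline);  // bytes as a windows-1251 feed carries them
$garbled  = iconv('ISO-8859-1', 'UTF-8', $cp1251);      // the same bytes misread as Latin-1

echo $garbled, "\n";   // Áîëüøîé âåðíóë
```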

Quote:
My site is viewable on the net. It displays Cyrillic characters without any problems except for the output in this headline module.

I suppose (this may be a wrong guess) that your host doesn't have iconv. Can you ask the server's admin about this? Or enable PHP debugging and check the output for "Warning [PHP]: input conversion failed due to input error in file modules/xhld0/class/saxparser.php..."?
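Instead of waiting for the host, the availability of iconv can also be checked with a small standalone script uploaded to the server. A minimal sketch (the function name is hypothetical); it also tests whether iconv accepts the WINDOWS-1251 name used by the renderer, since having the function is not the same as supporting that charset.

```php
<?php
// Hypothetical diagnostic: which conversion functions does this PHP build
// provide, and does iconv accept the WINDOWS-1251 name the renderer uses?
function conversion_support(): array
{
    $report = [
        'iconv'              => function_exists('iconv'),
        'mbstring'           => function_exists('mb_convert_encoding'),
        'iconv_windows_1251' => false,
    ];
    if ($report['iconv']) {
        // 'Ж' is byte 0xC6 in windows-1251; @ hides the warning if the charset is unknown.
        $report['iconv_windows_1251'] = (@iconv('UTF-8', 'WINDOWS-1251', 'Ж') === "\xC6");
    }
    return $report;
}

print_r(conversion_support());
```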

8
russy
Re: Russian encoding issues of "xhld"
  • 2005/12/11 3:07


Hi LazyBadger,

Thanks again for your input.

I've asked the administrators of the server hosting my site whether it supports iconv(), but haven't heard back from them. I also tried the debug function but didn't get the warning you mentioned.

All I got was:

"XmlParse error: not well-formed (invalid token) at line 4"

in the admin panel of this module when the encoding was set to "automatic selection" or to any other encoding, including windows-1251.

XOOPS itself has no problem displaying Cyrillic characters, but this module does have an encoding issue when it comes to displaying some foreign character sets, such as Russian.

I'm pretty sure that I should edit some lines in the module/xhld/language/japanese/headlinerenderer.php file, but I don't know which ones or how.

I really need to get this working. So far, it seems that you are the only one who understands what I'm trying to do. Would it be possible for you to edit the relevant files for me?

The encoding issue of this module has been a pain in my ass for the whole week, and I really need someone's help with it.

9
russy
Re: Russian encoding issues of "xhld"
  • 2005/12/11 10:25


OK, here is the update:

The server hosting my website does support iconv().

Now what should I do?

russy

10
LazyBadger
Re: Russian encoding issues of "xhld"

Just a quick-and-dirty idea: try adding my feed (also Windows-1251) as a test... If you get an error on line 20, it can mean only one thing: for some reason unknown to me, you can't use Russian in your imported feeds. I'm not sure why, but Russian plus any other non-7-bit language is always a big headache if UTF-8 is not used on the site.
Maybe GIJOE can say something more useful in his site's forum.

I have already translated your problem and posted it in the xhld forum.
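The point that Russian plus another 8-bit language is a headache without UTF-8 can be seen directly: Windows-1251 has no codes for Japanese characters, so they are lost on the way through, while UTF-8 keeps both scripts. A minimal sketch assuming mbstring is loaded; the helper name is hypothetical, and the '?' comes from mbstring's default substitute character.

```php
<?php
// Hypothetical demo: encode UTF-8 text into a page charset and decode it back.
// Characters with no code in the target charset are replaced by mbstring's
// substitute character ('?' unless configured otherwise).
function round_trip_through(string $utf8, string $charset): string
{
    $encoded = mb_convert_encoding($utf8, $charset, 'UTF-8');
    return mb_convert_encoding($encoded, 'UTF-8', $charset);
}

if (function_exists('mb_convert_encoding')) {
    echo round_trip_through('Россия 日本', 'Windows-1251'), "\n";   // Россия ??
    echo round_trip_through('Россия 日本', 'UTF-8'), "\n";          // Россия 日本
}
```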