1
catalin
what u said
  • 2003/10/17 18:01

  • catalin

  • Just popping in

  • Posts: 30

  • Since: 2003/9/4 7


The thing you mentioned, where forum=5372g82vfsvf8u turns into t949f-html (or however you put it), is the search engine indexer from techwizards.com ... and that only changes the site's URLs from forum=35b1297b5129 to a simple t84187.html (or anything else ending in html).

What I mean is a script that can take the whole content of the site and generate an identical copy (in look and content) of the PHP site. Once that is indexed, all the major engines will index it like hell ... and give you a top 3 or top 2 spot in almost all queries. Believe me, I worked at a search engine optimization company ... :) now defunct ...

XOOPS core team, I have a question ... can you make it???

If you aren't inspired, do some b2evolution testing (that's where I got the idea from) ... BTW my blog has won some first places in Google :P

2
mvandam
Re: what u said
  • 2003/10/17 18:19

  • mvandam

  • Quite a regular

  • Posts: 253

  • Since: 2003/2/7 2


I believe you can do these kinds of tricks with 'mod_rewrite' (if you are using the Apache web server). Similar mechanisms probably exist for other web servers. However, such a feature is a *server-level* trick and cannot be done from within the XOOPS PHP code, I think.

The problem is that each module will probably need separate rules because there is no consistent use of arguments to specify which content page to look at. e.g. news uses 'article_id', forum uses 'forum', 'topic_id' and 'post_id', etc... But still, it could be done. Perhaps someone has already done this??
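To make the "separate rules per module" point concrete, here is a minimal Python sketch of what such a rule table might look like. It is not mod_rewrite syntax and not anything XOOPS ships; the friendly URL shapes and the script paths are assumptions for illustration, and only the parameter names (article_id, forum, topic_id) come from this post.

```python
import re

# Hypothetical per-module rules: (friendly URL pattern, internal URL template).
# The friendly URL shapes and script paths are assumptions for illustration;
# only the parameter names (article_id, forum, topic_id) come from the post.
REWRITE_RULES = [
    (re.compile(r"^/news/(\d+)\.html$"), "/modules/news/article.php?article_id={0}"),
    (re.compile(r"^/forum/(\d+)\.html$"), "/modules/newbb/viewforum.php?forum={0}"),
    (re.compile(r"^/topic/(\d+)\.html$"), "/modules/newbb/viewtopic.php?topic_id={0}"),
]


def rewrite(path):
    """Map a friendly path to the internal script URL, or return it unchanged."""
    for pattern, template in REWRITE_RULES:
        match = pattern.match(path)
        if match:
            return template.format(*match.groups())
    return path


print(rewrite("/news/42.html"))
print(rewrite("/topic/1234.html"))
```

Each module needs its own entry because each one names its ID parameter differently, which is exactly the inconsistency described above.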

But for search engines, does it really matter? I've heard lots of discussion, and from what I gather it doesn't matter whether your URL is viewtopic.php?topic_id=1234 or view.topic.1234.html, i.e. you would have equal rankings with Google assuming all other content, links (to and from), and meta info are the same. Doing some random 'googling', I frequently see PHP scripts listed...

3
skalpa
Re: what u said
  • 2003/10/17 19:04

  • skalpa

  • Quite a regular

  • Posts: 300

  • Since: 2003/4/16


Quote:
But still, it could be done. Perhaps someone has already done this??


Don't think so, because it wasn't possible until recently due to a bug in the kernel.
However, it is now, and I think it could be useful (not especially for search engines, as I think you're right, but because some people would prefer advertising URLs like /news/mytopic/ instead of /modules/news/index.php?storytopic=N).
So I'm seriously investigating everything that would be needed to implement it (because even if mod_rewrite rules are written, they're useless if the links throughout the site all stay /modules/stuff/?topic=nnn).

And also, that's the 2nd time today I've seen someone wondering about the presence of words in a URL and engine rankings, so I'll try double-checking that (I'm not sure at all, actually, and even if you're right in your example, maybe changing a number to a word could be relevant).

Skalpa.

4
Anonymous
Re: what u said
  • 2003/10/17 19:21

  • Anonymous

  • Posts: 0

  • Since:


You can see an example of using mod_rewrite with XOOPS here:

http://www.myxoopsforge.de/modules/phpwiki/index.php/XoopsModRewrite

Greetz Predator

5
Draven
Re: what u said
  • 2003/10/17 19:40

  • Draven

  • Module Developer

  • Posts: 337

  • Since: 2003/5/28


Quote:

Predator wrote:
You can see an example of using mod_rewrite with XOOPS here:

http://www.myxoopsforge.de/modules/phpwiki/index.php/XoopsModRewrite


Any chance someone could translate that for us non-German folk?


On the topic of search engines: while Google can crawl "index.php?var=value", many other major search engines still do not, and their crawlers fail to index the pages. If you look at any major website like MSN, Yahoo, CNN, Netscape, etc., they all use some sort of URL rewrite method to display their content. Also, most crawlers won't go below a certain sublevel of directories, usually 2, unless a direct link is found in a higher level (like the homepage). So if you have http://mysite.com/xoops/modules/articles/index.php?etc, chances are the crawlers won't go deeper than the module's main directory. This is also an instance where having http://mysite.com/articles/4566.html would be useful for indexing.

There are actually some very good articles over at http://www.sitepoint.com concerning engines and indexing. Give them a read if you have a chance.

http://www.sitepoint.com/article/910
http://www.sitepoint.com/article/1060
http://www.sitepoint.com/article/485
http://www.sitepoint.com/article/945


Those are just a few to get you started.

6
Draven
Re: what u said
  • 2003/10/17 20:15

  • Draven

  • Module Developer

  • Posts: 337

  • Since: 2003/5/28


Quote:
Myth #8: Google Will Not Index Dynamic Pages

Some search engines have, in the past, had problems with dynamic pages, that is, pages that use a query string. This was not due to any technical limitation, but rather, because search engines knew that it was possible to create a set of an infinite amount of dynamic pages, or they could create an endless loop. In either case, the search engines did not want their crawlers to be caught spidering endless numbers of dynamically generated pages.

Google is a newer search engine, and has never had a problem with query strings. However, some dynamic pages can still throw Google for a loop.

Some shopping carts or forums store session information in the URL when cookies are unable to be written. This effectively kills search engines like Google because search engines key their indexes with URLs, and when you put session information in the URL, that URL will change constantly. This is especially true as Google uses multiple IP addresses to crawl the Web, so each crawler will see a different URL on your site, which basically results in those pages not being listed. It is important that if you use such software, you amend it so that if cookies are unable to be written, the software simply does not track session information.

So, you don't need to use search engine-friendly URLs to be listed in Google. However, these URLs do have other benefits, such as hiding what server side technology you use (so that you may change it seamlessly later), and they are more people-friendly. Additionally, while Google can spider dynamic pages, it may limit the amount of dynamic pages it spiders from one particular site. Your best bet for a good ranking is to use search-engine friendly URLs.
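As a rough illustration of the session problem the quote describes, here is a small Python sketch showing how two crawler visits with different session IDs yield different URLs, and how dropping the session parameter gives one stable URL. PHPSESSID is PHP's default session name; the "sid" entry and the example URLs are assumptions made up for this sketch.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Query parameters treated as session tokens. PHPSESSID is PHP's default
# session name; "sid" is an assumed extra for illustration.
SESSION_PARAMS = {"PHPSESSID", "sid"}


def strip_session(url):
    """Return the URL with any session parameters removed from its query."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment)
    )


# Two visits to the same page, seen with different session IDs.
first = "http://mysite.com/viewtopic.php?topic_id=1234&PHPSESSID=abc123"
second = "http://mysite.com/viewtopic.php?topic_id=1234&PHPSESSID=def456"
print(first == second)
print(strip_session(first) == strip_session(second))
```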


7
skalpa
Re: what u said
  • 2003/10/17 21:00

  • skalpa

  • Quite a regular

  • Posts: 300

  • Since: 2003/4/16


Thanks a lot, folks!

The German article describes (I think) what I was planning to do (well, it's even better), but I believed the needed fix was already done, so I'll make sure it's really fixed in the next release.

In short:
- There's a small fix for common.php.
- Then he registers a Smarty "output filter".
These functions are called after the template has been processed but before it's sent to the browser, so he can replace all the XOOPS-style URLs with the ones that are in .htaccess (the opposite of what Apache does).
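For anyone who can't read the German page, the output-filter idea can be sketched roughly like this in Python. This is not the article's code and not the Smarty API, only the same principle: a function that runs over the finished HTML and swaps internal URLs for friendly ones. The URL patterns and module paths are assumptions for illustration.

```python
import re

# Hypothetical reverse rules: internal XOOPS-style URL -> friendly URL.
# Both sides are assumptions; the real pairs would mirror the .htaccess rules.
REVERSE_RULES = [
    (re.compile(r"/modules/news/article\.php\?article_id=(\d+)"), r"/news/\1.html"),
    (re.compile(r"/modules/newbb/viewtopic\.php\?topic_id=(\d+)"), r"/topic/\1.html"),
]


def output_filter(html):
    """Rewrite internal links in rendered HTML just before it is sent out."""
    for pattern, replacement in REVERSE_RULES:
        html = pattern.sub(replacement, html)
    return html


page = '<a href="/modules/news/article.php?article_id=42">Story</a>'
print(output_filter(page))
```

The point of doing it at output time is that no module template has to change: the links are rewritten once, in one place, after rendering.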

Quote:
Also, most crawlers won't go below a certain sublevel of directories, usually 2, unless a direct link is found in a higher level (like the homepage).


That's what the trick at the bottom is for (a nice one, too): it checks the User-agent string to see if it's a search engine bot, and if it is, it sends a custom page (so you can insert all the links you want there) instead of the normal homepage.

Thanks again for all that info.

Skalpa.
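A minimal Python sketch of that User-agent check, under stated assumptions: the marker substrings and the two page names are hypothetical placeholders, and real crawler User-agent strings vary, so treat this as the shape of the idea rather than the article's implementation.

```python
# Hypothetical substrings to look for in the User-agent header.
BOT_MARKERS = ("googlebot", "slurp", "msnbot")


def is_search_bot(user_agent):
    """Return True if the User-agent string contains a known bot marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def choose_homepage(user_agent):
    """Pick the link-rich page for bots, the normal homepage otherwise."""
    # Both file names are placeholders for this sketch.
    return "bot_index.html" if is_search_bot(user_agent) else "index.php"


print(choose_homepage("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
print(choose_homepage("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
```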

8
Anonymous
Re: what u said
  • 2003/10/17 21:35

  • Anonymous

  • Posts: 0

  • Since:


@Draven, if you're interested in this example of a core hack for mod_rewrite, let me know and I'll translate it.

Well, with the upcoming English part of the site, all of this stuff will be translated and expanded in any case.

Greetz Predator
