One feature that search engines have recently added support for is the sitemap. A sitemap is an XML document that tells a search engine when and where to scan/crawl your site.
So far, for every site I run, I have had to build a stand-alone PHP sitemap that covers the site's content. It would be good if there were a controlling class for this in the module system.
There are a couple of things you have to do to get a sitemap working properly.
First, it must contain only existing URLs. You can see the full sitemap this example is taken from at
http://www.bee-unlimited.co.uk/sitemap.php; a section of it looks like this:
<urlset>
  <url>
    <loc>http://www.bee-unlimited.co.uk/modules/content/?id=1</loc>
    <lastmod>2008-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>1</priority>
  </url>
  <url>
    <loc>http://www.bee-unlimited.co.uk/modules/content/?id=2</loc>
    <lastmod>2008-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>1</priority>
  </url>
</urlset>
loc: the URL to crawl
lastmod: the last modification date
changefreq: how often the page is expected to change (e.g. daily, weekly, monthly, yearly, never)
priority: 0.1 to 1 (1 being the highest).
For the sitemap conventions see
http://www.sitemaps.org. You will also have to change your robots.txt to add the sitemap location, for search engines such as MSN & Live which only check for sitemaps automatically via robots.txt:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /cache/
Disallow: /class/
Disallow: /images/
Disallow: /include/
Disallow: /install/
Disallow: /kernel/
Disallow: /language/
Disallow: /templates_c/
Disallow: /themes/
Disallow: /uploads/
Sitemap: http://www.bee-unlimited.co.uk/sitemap.php
Admittedly, most modules do not have all of this data; the content module, for example, has no date of creation.
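Where a module cannot supply a real modification date, you can fall back on the current date (as the sitemap.php further down does) or leave lastmod out altogether, since it is optional in the sitemap protocol. A minimal sketch, using a hypothetical helper that is not part of the original code:

// Hypothetical helper: build one sitemap entry, omitting lastmod when no date is known.
function sitemap_entry($loc, $lastmod = null, $changefreq = 'monthly', $priority = '1')
{
    $entry = array('loc' => $loc, 'changefreq' => $changefreq, 'priority' => $priority);
    if ($lastmod !== null) {
        $entry['lastmod'] = date('Y-m-d', $lastmod); // only when the module stores a timestamp
    }
    return $entry;
}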
The other change you may have to make is to common.php. What I have done is create a database table of crawler hostnames, which lets crawlers access sections anonymously while regular users still cannot.
This is only a simple few lines of code, around 20; an example bot-checking function would be:
function friendlybot(){
    global $xoopsDB;
    // Resolve the visitor's IP address to a hostname
    $hostname = gethostbyaddr($_SERVER['REMOTE_ADDR']);
    $sql = "SELECT bottype FROM ".$xoopsDB->prefix("bot");
    $ret = $xoopsDB->query($sql);
    $state = false;
    // Treat the visitor as a friendly bot if the hostname contains any known crawler domain
    while ($row = $xoopsDB->fetchArray($ret)) {
        // The leading space makes a match at the start of the hostname return a position > 0
        if (strpos(" ".$hostname, $row['bottype']) > 0) {
            $state = true;
        }
    }
    return $state;
}
The change in common.php starts at around line 336 and would be altered to something like this, allowing crawlers to crawl a page while the security check still applies to everyone else:
if ($xoopsUser) {
    if (!$moduleperm_handler->checkRight('module_read', $xoopsModule->getVar('mid'), $xoopsUser->getGroups())) {
        // Let known crawlers through even when module_read is denied
        if (!friendlybot()) {
            redirect_header(XOOPS_URL."/user.php", 1, _NOPERM);
            exit();
        }
    }
    $xoopsUserIsAdmin = $xoopsUser->isAdmin($xoopsModule->getVar('mid'));
} else {
    if (!$moduleperm_handler->checkRight('module_read', $xoopsModule->getVar('mid'), XOOPS_GROUP_ANONYMOUS)) {
        // Same exception for anonymous visitors that turn out to be crawlers
        if (!friendlybot()) {
            redirect_header(XOOPS_URL."/user.php", 1, _NOPERM);
            exit();
        }
    }
}
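Note that matching on gethostbyaddr() alone can be fooled, since anyone can set an arbitrary reverse DNS record for their IP. If you want a stricter check, a common technique (not part of the code above) is forward-confirmed reverse DNS: resolve the hostname back to an IP and only trust it if you get the original address. A rough sketch:

// Hypothetical stricter check: forward-confirmed reverse DNS.
// Only trust the reverse-DNS hostname if resolving it forward returns the same IP.
function verified_hostname($ip)
{
    $hostname = gethostbyaddr($ip);
    if ($hostname === false || $hostname == $ip) {
        return false; // no usable reverse DNS record
    }
    return (gethostbyname($hostname) == $ip) ? $hostname : false;
}

friendlybot() could then match against verified_hostname($_SERVER['REMOTE_ADDR']) instead of the raw gethostbyaddr() result.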
Your bottype varchar(128) field would contain entries like the following (a sketch of a matching table definition follows the list):
archive.org
ask.com
crawl.yahoo.net
cs.tamu.edu
cuill.com
entireweb.com
googlebot.com
inktomisearch.com
looksmart.com
msnbot.msn.com
picsearch.com
search.live.com
snarked.org
yahoo.com
blinklist.com
icio.us
digg.com
furl.net
simpy.com
spurl.net
img.com
facebook.com
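The original post does not show the bot table itself, so here is a minimal sketch of what it might look like, assuming nothing more than an id and the bottype column used by friendlybot():

// Hypothetical definition of the bot table used by friendlybot();
// the original table layout is not shown here, only the bottype column is known.
$sql = "CREATE TABLE ".$xoopsDB->prefix("bot")." (
            botid INT UNSIGNED NOT NULL AUTO_INCREMENT,
            bottype VARCHAR(128) NOT NULL DEFAULT '',
            PRIMARY KEY (botid)
        )";
$xoopsDB->queryF($sql);

// Seed it with a few crawler hostnames from the list above
foreach (array('googlebot.com', 'crawl.yahoo.net', 'msnbot.msn.com') as $host) {
    $xoopsDB->queryF("INSERT INTO ".$xoopsDB->prefix("bot")." (bottype) VALUES ('".addslashes($host)."')");
}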
We need something like a global solution for sitemaps, as it will ensure our websites are indexed correctly and quickly; since implementing this I have had pages appear in Google within a couple of hours rather than several days.
sitemap.php looks like this at
http://www.bee-unlimited.co.uk, but you would have to customise it for most environments until some form of standardisation occurs in the XOOPS library.
require("mainfile.php");
header('Content-type: text/xml');
$query[0] = "SELECT storyid from _beeu_content";
$query[1] = "SELECT storyid from _beeu_singleclip";
$query[2] = "SELECT storyid from _beeu_vjmixes";
$query[3] = "SELECT lid,cid,date from _beeu_myalbum_photos where status>0 order by date desc limit 5000";
$query[4] = "SELECT storyid,created from _beeu_stories order by created desc limit 5000";
global $xoopsDB;
// Content pages
$ret = $xoopsDB->query($query[0]);
while ($row = $xoopsDB->fetchArray($ret)) {
    $url[] = array("loc" => XOOPS_URL.'/modules/content/?id='.$row['storyid'],
                   "lastmod" => date('Y-m-d', time()),
                   "changefreq" => "monthly",
                   "priority" => "1");
}
// Single clips
$ret = $xoopsDB->query($query[1]);
while ($row = $xoopsDB->fetchArray($ret)) {
    $url[] = array("loc" => XOOPS_URL.'/modules/singleclip/?id='.$row['storyid'],
                   "lastmod" => date('Y-m-d', time()),
                   "changefreq" => "monthly",
                   "priority" => "1");
}
// VJ mixes
$ret = $xoopsDB->query($query[2]);
while ($row = $xoopsDB->fetchArray($ret)) {
    $url[] = array("loc" => XOOPS_URL.'/modules/vjmixes/?id='.$row['storyid'],
                   "lastmod" => date('Y-m-d', time()),
                   "changefreq" => "monthly",
                   "priority" => "1");
}
// Album photos - the & in the query string must be escaped as &amp; to keep the XML valid
$ret = $xoopsDB->query($query[3]);
while ($row = $xoopsDB->fetchArray($ret)) {
    $url[] = array("loc" => XOOPS_URL.'/modules/myalbum/photo.php?lid='.$row['lid'].'&amp;cid='.$row['cid'],
                   "lastmod" => date('Y-m-d', $row['date']),
                   "changefreq" => "monthly",
                   "priority" => "1");
}
// News stories - the query selects "created", so use that column for lastmod
$ret = $xoopsDB->query($query[4]);
while ($row = $xoopsDB->fetchArray($ret)) {
    $url[] = array("loc" => XOOPS_URL.'/modules/news/article.php?storyid='.$row['storyid'],
                   "lastmod" => date('Y-m-d', $row['created']),
                   "changefreq" => "monthly",
                   "priority" => "1");
}
?>
<?php echo '<?xml version="1.0" encoding="UTF-8"?>'."\n"; ?>
<urlset>
<?php for ($f = 0; $f < count($url); $f++) { ?>
<url>
<loc><?php echo $url[$f]['loc']; ?></loc>
<lastmod><?php echo $url[$f]['lastmod']; ?></lastmod>
<changefreq><?php echo $url[$f]['changefreq']; ?></changefreq>
<priority><?php echo $url[$f]['priority']; ?></priority>
</url>
<?php } ?>
</urlset>