Hi,
there has been a lot of talk about SEO, and I have often read that it's good to have a page listing all links of a site in one place. We have the sitemap module, but it only lists the categories, not the links to every article and item.
So I searched for a script that spiders the whole site and lists all its links on one page.
I found this script (not mine),
and it works very well, except it only spiders the links on the index page. I've seen other users of this script listing all pages of their sites, so it must be possible with this script.
Problem:
What do I have to change so the script lists all items and links of the whole site?
Maybe the PHP pros here can help me with this script.
To use it, first upload it to your site and replace localhost with your site name.
You can see the sitemapper script running on my site. Thanks!
Here is the code of sitemapper.php:
<?php
/*************************************************
SITEMAPPER.PHP Version: 1.1
Copyright (c): 2001 Earl C. Terwilliger all rights reserved
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
If you would like a copy of the GNU General Public License
write to the:
Free Software Foundation, Inc.
59 Temple Place - Suite 330
Boston, MA 02111-1307
USA
To get the latest version of this program, point your
browser to:
http://www.agent-source.com/sitemapper
or contact the author at:
earlt@agent-source.com
README for SiteMapper 1.1 2/22/2001 Copyright (C) 2001 Earl C. Terwilliger earlt@agent-source.com
SiteMapper.php was created to build a "site map" of a web site. It takes a given URL
and spiders/crawls the local links found from there to build a single HTML page listing
all links found. The resultant page is useful in the following ways:
1) to easily navigate to any page on the site (large sites)
2) to place as a hidden (or visible) link on the main page so search engines find
all pages on the site (Note: some SEs only search 1 link deep)
3) using the additional parms you will have all the HTML code for the whole web site
in an easy to search single file
4) links are checked to see if they are there (if not, a page not found is given)
5) other things I haven't thought of yet
Requirements: The latest version of PHP (why settle for less?)
Tested under: Linux (RedHat 7.0 of course!) but should run anywhere you have
PHP installed.
To run sitemapper.php from a shell prompt (edit the $url="http://localhost" in sitemapper.php
and change it to the URL you want mapped):
php sitemapper.php >sitemap.html
or put sitemapper.php in a directory accessible from your web server and run it:
http://localhost/sitemapper.php?url=http://www.yourplace.com
sitemapper.php takes additional parameters (when run from a web/browser):
&headers=0 (to not list the headers returned from the site being mapped)
&list=1 (to list the HTML source of all found linked to pages)
(edit these values to what you need them to be when you run SiteMapper from a shell prompt).
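For example, invoking SiteMapper from a browser with both extra parameters
(a made-up URL, just combining the options documented above):
http://localhost/sitemapper.php?url=http://www.yourplace.com&headers=0&list=1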
Once the new page (output from SiteMapper) is created, it can be saved. (If you are invoking it
from a browser, use File -> Save As)
Known Bugs: none
You can contact the author at: earlt@agent-source.com
SiteMapper.php is released under the GNU GPL license. ENJOY!
*************************************************/
function show_headers($host, $path = "/")   // renamed: PHP 5+ ships a built-in get_headers()
{
    $fp = fsockopen($host, 80, $errno, $errstr) or die("Socket Open Error: $errno Reason: $errstr");
    fputs($fp, "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n");
    $end = false;
    while (!$end) {                          // print the response headers up to the blank line
        $line = fgets($fp, 2048);
        if (trim($line) == "") $end = true;
        printf("%s<br>\n", $line);
    }
    fclose($fp);
    printf("<br>\n");
}
function get_url_page($url)
{
    static $contents = "";
    static $prev_url = "";
    if ($url == $prev_url) return $contents;        // cache: same URL requested twice in a row
    $contents = @file($url);                        // fetch the page as an array of lines
    if (empty($contents)) $contents = @file($url);  // one retry on failure
    $prev_url = $url;
    return $contents;
}
function list_url_page($fcontents) {
    foreach ($fcontents as $line_num => $line) {    // dump the page's HTML source, line by line
        printf("Line %s: %s<br>\n", $line_num, htmlspecialchars($line));
    }
}
function url_page_links($fcontents) {
    $contents = implode("", $fcontents);
    // collect every href target (double-quoted, single-quoted, or unquoted)
    preg_match_all('|href=["\']?([^"\' >]+)|i', $contents, $arrayoflinks);
    return $arrayoflinks;
}
function make_url($url, $link) {
    global $host;
    $purl = parse_url($url);
    $path = isset($purl["path"]) ? $purl["path"] : null;
    $u = "http://".$host;
    if (!isset($path)) {                 // page sits at the site root
        if (substr($link,0,1) == "/") $u .= $link;
        else $u .= "/".$link;
        return $u;
    }
    $p = explode("/",$path);             // directory parts of the current page
    array_pop($p);                       // drop the file name
    array_shift($p);                     // drop the empty part before the leading "/"
    $c = count($p);
    $l = explode("/",$link);
    $d = count($l);
    for($e=0;$e<$d;++$e) {               // every ".." in the link pops one directory
        if ($l[$e] == "..") --$c;
    }
    if ($c > 0) for($e=0;$e<$c;++$e) $u .= "/".$p[$e];
    if ($d == 0) {
        if (substr($link,0,1) == "/") $u .= $link;
        else $u .= "/".$link;
        return $u;
    }
    for($e=0;$e<$d;++$e) {
        if (substr($l[$e],0,1) == ".") continue;   // skip "." and ".." parts
        $u .= "/".$l[$e];
    }
    return $u;
}
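// Illustration (my own example values, not from the original script):
// with $host = "www.yourplace.com",
//   make_url("http://www.yourplace.com/docs/a.html", "../contact.html")
// pops "docs" for the ".." and returns "http://www.yourplace.com/contact.html".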
function parse_links($url, $links) {
    global $host;
    global $mp;
    global $pages;
    global $mpages;
    foreach ($links[1] as $link) {
        if (substr($link,0,1) == "\\") continue;        // skip odd backslash links
        if (substr($link,0,1) == "#") continue;         // skip in-page anchors
        if (substr($link,0,4) == "http") {              // absolute link: list it, don't crawl it
            printf("&nbsp;<a href=\"%s\">%s</a><br>\n", $link, $link);
            continue;
        }
        if (substr($link,0,6) == "mailto") continue;
        if (substr($link,0,7) == "sitemap") continue;   // don't map the sitemap itself
        $nurl = make_url($url, $link);
        if ((substr($nurl,-3) == "htm") ||
            (substr($nurl,-4) == "html") ||
            (substr($nurl,-3) == "php")) {
            if (!check_link($nurl)) {                   // not seen before
                $p = get_url_page($nurl);
                if (empty($p)) {                        // could not fetch: record as missing
                    $mpages[] = $nurl;
                    $mp += 1;
                    printf("&nbsp;<a name=\"MPage%d\">%s</a><br>\n", $mp, $nurl);
                    printf("&nbsp;|__Unable to retrieve this page ...<br>\n");
                    continue;
                }
                else $pages[] = $nurl;                  // queue it for crawling
            }
        }
        printf("&nbsp;<a href=\"%s\">%s</a><br>\n", $nurl, $nurl);
    }
}
function index_page($url) {
    global $host;
    global $pc;
    global $list;
    printf("<br>\n<a name=\"Page%d\">%s</a><br>\n", $pc+1, $url);
    printf("<a href=\"%s\">%s</a><br>\n", $url, $url);
    $p = get_url_page($url);
    if (empty($p)) {
        printf("|__Unable to retrieve this page ...<br>\n");
        return 0;
    }
    if ($list) list_url_page($p);   // optionally dump the page's HTML source
    $links = url_page_links($p);    // pull out all href targets
    parse_links($url, $links);      // list them and queue new local pages
}
function check_link($entry) {       // returns 1 if $entry is already in $pages
    global $pages;
    foreach ($pages as $link) {
        if ($entry == $link) return 1;
    }
    return 0;
}
// the script originally relied on register_globals; read the parameters explicitly
if (isset($_GET["url"]))     $url = $_GET["url"];
if (isset($_GET["list"]))    $list = $_GET["list"];
if (isset($_GET["headers"])) $headers = $_GET["headers"];
if (!isset($list)) $list = 0;
if (!isset($url)) $url = "http://localhost";
if (!isset($headers)) $headers = 1;
if (substr($url,0,7) != "http://") {
    printf("<br>Invalid URL. It must begin with http://<br>\n");
    exit();
}
$purl = parse_url($url);
$host = $purl["host"];
printf("Site Map: <a href=\"%s\">%s</a> IP: %s<br>\n", $url, $url, gethostbyname($host));
$pages[] = $url;          // the crawl queue, seeded with the start URL
$mpages = array();        // pages that could not be retrieved
$mp = 0;
if ($headers) show_headers($host);
$pc = 0;
while (1) {               // parse_links() appends to $pages while we walk it
    $url = $pages[$pc];
    if (empty($url)) break;
    index_page($url);
    flush();
    $pc += 1;
}
$pc = count($pages);
printf("<br>.... %03d Pages Indexed ....<br>\n", $pc);
$pc = 0;
while (isset($pages[$pc])) {
    $link = $pages[$pc];
    printf("<a href=\"#Page%d\">%03d</a> <a href=\"%s\">%s</a><br>\n", $pc+1, $pc+1, $link, $link);
    ++$pc;
}
$mp = count($mpages);
printf("<br>.... %03d Pages Missing [Broken Links] ....<br>\n", $mp);
$mp = 0;
while (isset($mpages[$mp])) {
    $link = $mpages[$mp];
    printf("<a href=\"#MPage%d\">%03d</a> <a href=\"%s\">%s</a><br>\n", $mp+1, $mp+1, $link, $link);
    ++$mp;
}
?>
Created by PHP SiteMapper v1.1 (C) 2001 Earl C. Terwilliger
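P.S. A guess at why only the index page gets mapped on some sites: parse_links() only queues a link for crawling when the URL ends in htm, html, or php, so dynamic links with a query string (index.php?id=5 and the like, which most CMS article and item links use) are printed but never followed. A minimal sketch of a relaxed check, assuming your article links carry query strings; it would replace the three substr() lines inside parse_links() (the query-string stripping is my own addition, not part of the original script):

$check = $nurl;
if (($q = strpos($check, "?")) !== false)
    $check = substr($check, 0, $q);      // ignore everything after "?"
if ((substr($check,-3) == "htm")  ||
    (substr($check,-4) == "html") ||
    (substr($check,-3) == "php")) {
    // keep the existing body: check_link(), get_url_page(), queue into $pages
}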