Yes, search engines will spider your site whether or not you have a robots.txt file. The robots.txt file is mainly used to ask robots to stay out of specific directories. You can also control how robots index your site through the XOOPS admin interface, but the default settings are usually sufficient.
Quote:
User-agent: *            # applies to any robot
Disallow: /cgi-bin/      # directories to stay out of
Disallow: /tmp/
Disallow: /cache/
Disallow: /class/
Disallow: /images/
Disallow: /include/
Disallow: /install/
Disallow: /kernel/
Disallow: /language/
Disallow: /templates_c/
Disallow: /themes/
Disallow: /uploads/
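If you want to double-check how a well-behaved crawler will interpret those rules before you upload the file, Python's standard-library `urllib.robotparser` applies the same matching logic. This is just a quick sketch for testing; the sample paths are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# A couple of the rules from the robots.txt above
rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /templates_c/
Disallow: /uploads/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Paths under a disallowed directory are blocked for any robot ("*")
print(rp.can_fetch("*", "/templates_c/some_compiled_template.php"))  # False
# Everything else is still crawlable
print(rp.can_fetch("*", "/modules/news/"))                           # True
```

Remember that robots.txt is only a request: polite crawlers like Googlebot honor it, but nothing technically stops a rogue bot from ignoring it.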
A good example of why this is useful: when I was running piCal 0.5 on my site, the Google bot ate up 5GB of bandwidth in two weeks indexing every single page of piCal. Aside from the chewed-up bandwidth, there is also the issue of exposure. You wouldn't want John Q. Public to be able to pull up a cached copy of the admin menu, would you?
Hope this sheds some light on things.