1
awarrior
module spidering?
  • 2006/12/12 17:51

  • awarrior

  • Just popping in

  • Posts: 48

  • Since: 2006/10/9


If I set up an articles module and, via the admin panel, make it available only to 'webmasters' and one other group (called 'elite', for instance), would the content in this module still be accessible to and spidered by the search engines as normal?

I'm not looking for a way to stop this happening, in fact the reverse.

Does anyone have an answer, opinion or idea to this?
I'm smarter than the average bear boo boo......

2
irmtfan
Re: module spidering?
  • 2006/12/12 18:24

  • irmtfan

  • Module Developer

  • Posts: 3419

  • Since: 2003/12/7


As long as the content is stored in the database and the "anonymous" group doesn't have access to it, spiders can't index it.

3
carnuke
Re: module spidering?
  • 2006/12/12 18:33

  • carnuke

  • Home away from home

  • Posts: 1955

  • Since: 2003/11/5


I think the simple answer is no, search engines cannot crawl content that is behind any login requirement.

Let's look into this more closely. The anonymous users group is generally understood to be the users who do not need to log in to view content on your website. That also includes any search engine crawler. So anything that you want to be publicly available and crawled by search engines should be viewable by the anonymous users group. Please NOTE that most content defaults to being viewable by the registered users group and has to be opened to the anonymous users group before it can be viewed publicly. This is a security feature and prevents you from publishing content publicly by accident.

Search engines can't 'login', so they can't view anything that is published ONLY to registered groups or to any user group other than anonymous.

There is just one other caveat that is slightly worrying: Google has recently announced that it is releasing Code Search, which basically means that scripts and code in unprotected directories will show up in searchable results!

So be aware that, if this intrusion goes ahead with Google, you should always put 'private' content into a database (via XOOPS permissions) or password protect the HTML directories that contain private content.
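For the "password protect your HTML directories" part, a minimal sketch of HTTP Basic authentication in an .htaccess file could look like the following. It assumes an Apache server that allows authentication directives in .htaccess (AllowOverride AuthConfig); the realm name and the password file path are made-up examples, not anything specific to XOOPS.

```apache
# .htaccess placed inside the directory that holds the private HTML files
AuthType Basic
# Hypothetical realm name shown in the browser's login prompt
AuthName "Private area"
# Hypothetical path: keep the password file outside the web root
AuthUserFile /home/example/.htpasswd
# Any user listed in the password file may enter after logging in
Require valid-user
```

The password file itself can be created with Apache's htpasswd tool, for example `htpasswd -c /home/example/.htpasswd someuser`. A crawler cannot supply the password, so it cannot fetch anything in that directory.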

The Google thing is going to cause a big storm over the internet generally, but provided you use the XOOPS content permissions system, you're pretty safe.

Thank you for your important question; it will be FAQ'd.

Hope that helps
http://houseofstrauss.co.uk Resource for alternative health and holistic lifestyle
search xoops

4
Dave_L
Re: module spidering?
  • 2006/12/12 18:59

  • Dave_L

  • XOOPS is my life!

  • Posts: 2277

  • Since: 2003/11/7


carnuke, are you referring to this? http://www.google.com/codesearch/

5
tom
Re: module spidering?
  • 2006/12/12 19:16

  • tom

  • Friend of XOOPS

  • Posts: 1359

  • Since: 2002/9/21


Mmm, interesting, Carnuke. I didn't realise Google had these plans; I suppose I should really spend some time keeping up to date.

One question: is there any way to prevent Google from doing this? Maybe something like a disallow in robots.txt, or perhaps a code encoder? I seem to remember seeing something about being able to prevent code viewing. I ask because some modules allow HTML/txt uploads. I'm presuming that even though these could be hidden member content, because they are not in the database they could still be indexed by Google?

Ok I suppose more than one question there, lol.

Take care
Tim

6
irmtfan
Re: module spidering?
  • 2006/12/12 20:12

  • irmtfan

  • Module Developer

  • Posts: 3419

  • Since: 2003/12/7


Thank you, carnuke, but I have always had Google code in mind.
Personally, I think any HTML file (or other file?) in wwwroot is not safe, and you can't treat it as "private" or "more secure".

So you have two ways:

1- As I wrote in my last reply, keep all private data in the database under XOOPS permissions.

2- Thanks to Gijoe's "wraps" module, you can very easily store ANY FILE (pics, HTML, docs, ...) outside the web root AND PROTECT IT WITH XOOPS PERMISSIONS.
Try downloading this image (it is a recommendation):
http://d.jadoogaran.org/modules/customer/Recommendations/IUT.jpg
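If a directory has to stay inside wwwroot for some reason, a rough fallback is to refuse all direct requests to it with .htaccess and serve the files only through a script that checks permissions. This is only a sketch of the Apache side, assuming .htaccess overrides are enabled; it is not how the "wraps" module itself works.

```apache
# .htaccess inside a directory that should never be served directly
# Apache 2.4 and later:
Require all denied

# Apache 2.2 and earlier use the older syntax instead:
# Order allow,deny
# Deny from all
```

With this in place a direct URL to any file in that directory returns 403 Forbidden, for browsers and spiders alike.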

7
tom
Re: module spidering?
  • 2006/12/12 21:58

  • tom

  • Friend of XOOPS

  • Posts: 1359

  • Since: 2002/9/21


Hi irmtfan,

The module from Peak/Gijoe: do you know if there are any options in the latest version of myalbum to protect the images?

I ask because some people are hotlinking my images and stealing bandwidth. I was just about to use .htaccess to stop this, but I didn't want to run the risk of blocking proxy users.

If the myalbum module doesn't do this, can the module you mention protect the whole uploads directory?

I don't want to restrict access to people hotlinking my site logo, as it helps promote the site.

Thanks for any advice.

Tom

8
carnuke
Re: module spidering?
  • 2006/12/13 1:42

  • carnuke

  • Home away from home

  • Posts: 1955

  • Since: 2003/11/5


=> Tom, almost... in fact it's http://www.google.com/codesearch without the trailing slash, LOL; otherwise it gives a 404.

And yes, there are ways to prevent indexing of files, and Google will respect this (so I believe). The pain in the BTM is simply that the onus is now on webmasters to lock down their files and permissions, whereas before we didn't need to do so except for security-sensitive files.
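One way to ask crawlers not to index particular files is the X-Robots-Tag response header, which Google documents support for. A minimal sketch for an .htaccess file, assuming mod_headers is enabled and that the files in question are uploaded HTML/txt files (the extension list is just an example):

```apache
# .htaccess sketch; does nothing unless mod_headers is enabled
<IfModule mod_headers.c>
  # Ask compliant crawlers not to index these files or follow their links
  <FilesMatch "\.(html?|txt)$">
    Header set X-Robots-Tag "noindex, nofollow"
  </FilesMatch>
</IfModule>
```

Bear in mind this is only a request to well-behaved crawlers. It does not stop anyone from fetching the files, so truly private content still belongs behind permissions or a password.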

Here's a little background from internetnews.com. I'm sure there's a lot more out there to fuel the fire.

There's an argument that there will be advantages to this new feature, as developers will be able to search for their own code to see who's using it illicitly. Mmm, a two-edged sword I think, as source code will be easier to steal in the first place, thanks to the Google concept.

9
irmtfan
Re: module spidering?
  • 2006/12/13 6:16

  • irmtfan

  • Module Developer

  • Posts: 3419

  • Since: 2003/12/7


Quote:
do you know if there are any options in the latest version of myalbum to protect the images

No, and I don't know of any other gallery module that can do that, because people can always use "direct linking" when your images are inside the web root.

Quote:
can the module you mention protect the whole uploads directory?

Yes! In theory you can store all your images outside the web root using "wraps" and link them one by one in a gallery module.

I'm nearly sure it's possible, so give it a try! But it's tough work, and users can't upload images.

10
chippyash
Re: module spidering?
  • 2006/12/13 7:57

  • chippyash

  • Friend of XOOPS

  • Posts: 501

  • Since: 2004/1/29


Tim said
Quote:
I ask because some people are hotlinking my images and stealing bandwidth. I was just about to use .htaccess to stop this, but I didn't want to run the risk of blocking proxy users.

If the myalbum module doesn't do this, can the module you mention protect the whole uploads directory?

I don't want to restrict access to people hotlinking my site logo, as it helps promote the site.


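If you have your logo as, say, a .png file and all other images as .gif, .jpg etc., then you can hotlink protect all files except your logo.

The rewrite rule for .htaccess is:
RewriteEngine on
# Forbid (403) any request for a file with one of these extensions
RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ - [F,NC]

Put in the list the extensions that you want to protect and leave out the ones you don't (so drop png if your logo is a .png). On its own this rule blocks every request for those files, including requests from your own pages, so in practice you combine it with the conditions below.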

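You can refine the protection by
a/ allowing people to access image files directly by typing the URL in the browser (a request that carries no referer), by adding a condition:

RewriteEngine on
# Only apply the rule when the request carries a referer
RewriteCond %{HTTP_REFERER} !^$
RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ - [F,NC]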

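b/ adding specific sites that may access your images (your own site at the very least), while keeping the empty-referer condition from a/ so that direct requests and proxies that strip the referer are not blocked:

RewriteEngine on
# Let requests with no referer through
RewriteCond %{HTTP_REFERER} !^$
# Let requests referred by mysite.com itself through
RewriteCond %{HTTP_REFERER} !^http://mysite\.com(/.*)?$ [NC]
RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ - [F,NC]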

You may want to add the major search engines to the allowed list if you want your images indexed by them.

This all supposes, of course, that your server supports rewrite rules. (And if it doesn't, please don't ask me how to enable them. I don't know; speak to your system admin.)
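If the logo shares an extension with other images, another option is to exempt it by name instead. The following is only a sketch that pulls the pieces above together; the file name logo.png, the optional www. prefix and the https variant are my assumptions, so adjust them to your own site.

```apache
RewriteEngine on
# Never block the logo (hypothetical file name), so other sites can hotlink it
RewriteCond %{REQUEST_URI} !/logo\.png$ [NC]
# Let requests with no referer through (typed URLs, referer-stripping proxies)
RewriteCond %{HTTP_REFERER} !^$
# Let requests referred by your own site through, with or without www.
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?mysite\.com(/|$) [NC]
# Everything else asking for an image gets 403 Forbidden
RewriteRule \.(jpg|jpeg|gif|png|bmp)$ - [F,NC]
```

All three conditions must hold before the rule fires, so a request is refused only when it is not for the logo, carries a referer, and that referer is not your own site.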
