1
jamgill
Static site copy?
  • 2005/3/1 14:42

  • jamgill

  • Just popping in

  • Posts: 5

  • Since: 2005/3/1 1


Hi, first post.

I employed XOOPS as a point-solution for a quick site I needed to do and it worked well, but the project is over and I want to archive the site as-is so it can be served as static pages and/or distributed on CD (and browsed on CD).

Is there a preferred way to make such an archive?

As I look into using XOOPS for other projects, this seems like important functionality for populating mirror sites and for archival purposes. I have looked through the guides and have not seen it mentioned.

My next step is to see which mirroring tools might produce the desired result. Does anyone have any tips for doing this?
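For what it's worth, for a plain, publicly readable site the kind of thing I have in mind is a recursive wget along these lines (just a sketch; example.com stands in for the real site and "logfile" is an arbitrary name):

Quote:
# mirror a publicly readable site, fetching page requisites and fixing up
# links and file extensions so the copy browses cleanly offline
wget --recursive --page-requisites --convert-links --html-extension \
     -o logfile http://example.com/

That would cover whatever an anonymous visitor can see; getting at pages that sit behind a login is, I suspect, a separate problem.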

TIA,
--jamgill

keywords: static site offline mirror copy archive

2
jamgill
Re: Static site copy?
  • 2005/3/2 5:33

  • jamgill

  • Just popping in

  • Posts: 5

  • Since: 2005/3/1 1


Now I've tried it with wget. I used:

Quote:
jamgill$: wget --html-extension --convert-links -o logfile --cookies=off --header "Cookie: PHPSESSID=eb0478f2635490a280d1932abb3ea109" --page-requisites -r http://example.com


...and it pulls the "main" page (that one would see after logging in), but any of the links to stories or photos (stuff in modules) shows the error: "Sorry, you don't have the permission to access this area."

Any thoughts? Pointers? Does nobody ever mirror their XOOPS site?
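One thing I can try next is asking wget to print the server response headers for a single protected page, to see whether the session cookie is actually being honoured or whether I'm just being handed the "no permission" page. Only a sketch: the modules/news/ URL is made up, substitute one of the links that actually fails:

Quote:
# fetch one protected page, print the HTTP response headers, save the body
wget --cookies=off \
     --header "Cookie: PHPSESSID=eb0478f2635490a280d1932abb3ea109" \
     --server-response -O page.html http://example.com/modules/news/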

3
jamgill
Re: Static site copy?
  • 2005/3/3 7:48

  • jamgill

  • Just popping in

  • Posts: 5

  • Since: 2005/3/1 1


I have combed through the documentation and help forums of several other, similar CMSes today and have found this question asked in several different ways in several different places. At least I'm not completely off base, right?

Ideas? Anyone?

4
Mithrandir
Re: Static site copy?

I am not familiar with this process and certainly not with wget, but... this caught my attention:
--cookies=off --header "Cookie: PHPSESSID=eb0478f2635490a280d1932abb3ea109"

Aren't you turning off sending cookies before setting it? Could that be the reason why you don't get permission to view the pages?

5
DonXoop
Re: Static site copy?

I think you should be able to snapshot a site to a static copy much like this. Aren't the denied errors coming from hitting links that anonymous isn't allowed to see? Many sites have this problem: links that, when followed, are denied unless you are logged in.

If you want a complete copy you might need to configure the rights for anonymous to follow the links that you want to archive. You can turn them off again when done.

I don't think you want to have the session ID in the URLs either. But I guess you are using that to simulate a logged-in user?

6
jamgill
Re: Static site copy?
  • 2005/3/3 15:53

  • jamgill

  • Just popping in

  • Posts: 5

  • Since: 2005/3/1 1


Other than a login block and a static block on the front page, nothing on my site is accessible without logging in. Using the above method, I manage to get a snapshot of the "main" page (what one sees first after logging in), but that's all that works. Wget follows links to, say, stories ... but the site returns the error message described above.

Quote:
Aren't you turning off sending cookies before setting it? Could that be the reason why you don't get permission to view the pages?


From my reading of the wget manpage, this is the correct way to handle this situation. Here's the section of the manual I drew this conclusion from:

http://www.gnu.org/software/wget/manual/wget.html.gz#HTTP%20Options


Because I don't find the cookie from visiting my XOOPS site in my ~/.mozilla/firefox/blah/cookies.txt file, I "view cookies" in my browser preferences to find (what I believe is) the correct thing to send and include that in the header string. The idea would be to make XOOPS think that wget is my already-authenticated Firefox session. I flip off cookies with the other switch so wget and the site don't try to exchange cookies otherwise. This seems to be explicitly covered in the wget manpage, but I could be misunderstanding the situation.
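If faking the browser's cookie keeps failing, another approach I may try is letting wget log in for itself and keep its own cookie jar, then mirroring with that jar loaded. This is only a sketch: user.php and the uname/pass/op fields are what my XOOPS login form appears to post, and they could differ between versions; --keep-session-cookies also needs a reasonably new wget, and without it a session-only cookie like PHPSESSID may never be written to the file:

Quote:
# step 1: post the login form and save the session cookie to cookies.txt
wget --save-cookies cookies.txt --keep-session-cookies \
     --post-data 'uname=myuser&pass=mypassword&op=login' \
     -O /dev/null http://example.com/user.php

# step 2: mirror the site, re-using that saved cookie
wget --load-cookies cookies.txt \
     --recursive --page-requisites --convert-links --html-extension \
     -o logfile http://example.com/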

Anyway, thanks ... I'll keep working on it.

7
jamgill
Re: Static site copy?
  • 2005/3/3 16:07

  • jamgill

  • Just popping in

  • Posts: 5

  • Since: 2005/3/1 1


Quote:

DonXoop wrote:
I think you should be able to snapshot a site to a static copy much like this.

Me too, which makes this all the more frustrating.

Quote:

Aren't the denied errors coming from hitting links that anonymous isn't allowed to see? ...


Good question. No, the only things anonymous users are allowed to see on this site are the login block and a static block (which says you have to register and log in to see anything else). The wget process pulls the main "inside" page beautifully, but when it tries to hit links off of that it gets the errors.

Quote:

If you want a complete copy you might need to configure the rights for anonymous to follow the links that you want to archive. You can turn them off again when done.


Hrmm ... I had thought of that, but I am definitely going to want to do this with other projects in the future, so I should find a solution now while the pressure is low. I've always done this with hand-rolled HTML sites: just copy the directory structure to a CD! :)

Quote:

I don't think you want to have the session ID in the URLs either. But I guess you are using that to simulate a logged-in user?


I am trying to simulate a logged-in user; I don't particularly care whether the URLs are pretty or not ... though I guess it would be nicer if they were. That's probably more of an XOOPS site admin setting than an HTTP-mirroring problem, eh?

Thanks for the reply...
