Wednesday, July 30, 2008

Saving Blogs from "Be Good" Google?


Recently, South Africa Sucks was taken down. Audacious Epigone asked how to archive 'em.
I *think* I found the answer. Just go to Google and type in "Website Downloader" or go to download.com and do the same.

The most popular one, and the one I tested and used, is HTTrack (http://www.httrack.com/). It's absurdly simple to use, assuming you aren't interested in any advanced options (which, for blogs, you probably will need).

1) Download the proggie.
2) Install said proggie.
3) Read the manual, or just type:
httrack "http://www.all.net/" -O "/tmp/www.all.net" "+*.all.net/*" -v

where http://www.all.net is the website you want to mirror,

-O means "output to", with /tmp/www.all.net being the place on your hard disk where you want it saved, and

+*.all.net/* is a filter telling the program not to download from anywhere but *.all.net.
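
Filters like this can be stacked, and a leading minus excludes instead of includes. A minimal sketch (same example site; the .zip exclusion is just an illustration, not something you'd necessarily want):

httrack "http://www.all.net/" -O "/tmp/www.all.net" "+*.all.net/*" "-*.zip" -v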

I tried this on some websites I read often and it saved them without much trouble. But with blogs the problem is:
1) The comments section is served over https://
and...
2) The comments section lives on Blogger.com, NOT under the blog's own blogname.blogspot.com address

However, this shouldn't be a problem, because every blog has its own unique identifier. Anyone out there willing to read through the manual and see if we can save blogs in their totality (i.e. including the comments pages) with this program? A first guess at the filters is sketched just below.
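
My untested guess: whitelist Blogger's own domain alongside the blog's, so the mirror is allowed to follow the comment links (myblog is just a placeholder, and I haven't verified that this actually catches the https:// comment pages):

httrack "http://myblog.blogspot.com/" -O "/tmp/myblog" "+*.blogspot.com/*" "+*.blogger.com/*" -v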

P.S. You can get the blog's posts by typing http://blogname.blogspot.com/search?max-results=1000 in the URL bar, and the comments by typing http://blogname.blogspot.com/feeds/comments/default?max-results=1000 there. (But it's not a proper solution, because the comments feed is ordered by timestamp, so the comments aren't necessarily grouped under their posts.)
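
If you'd rather pull that feed from the command line than save it from the browser, something like wget should do it (a sketch; blogname is a placeholder as above):

wget -O comments.xml "http://blogname.blogspot.com/feeds/comments/default?max-results=1000"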

1 comment:

Audacious Epigone said...

Fabulous. You're the man! I'm in the process of pulling all the post content now. Graphics don't work since I need the unique embed, but that's a minor issue. Thanks.