A friend of mine is looking to archive a (vile) website that is about to be taken down. She needs it for advocacy for the people the site is against. What program can download as much of the site as possible?
@hhardy01
What parameters do I need for it? I don't need just one page, but the entire directory tree or whatever it's called...
@eladhen Archive.org will archive it if you submit it to them, but if it's a shitty website, maybe you don't want that.
http://www.dheinemann.com/2011/archiving-with-wget/
That's how to do it with wget. That'll get pretty much all of it.
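For reference, the command ends up roughly like this (a sketch built from common wget mirroring flags; the article's exact options may differ, and example.com is a placeholder for the real URL):

    # --mirror: recurse through the site and keep timestamps
    # --convert-links: rewrite links so the copy works offline
    # --adjust-extension: save pages with proper .html extensions
    # --page-requisites: also grab images, CSS, and scripts
    # --no-parent: don't climb above the starting directory
    # --wait/--random-wait: be gentle so the crawl doesn't get blocked
    wget --mirror --convert-links --adjust-extension --page-requisites \
         --no-parent --wait=1 --random-wait https://example.com/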
@ajroach42
It is vile beyond words. I'd rather not.
@ajroach42
Thanks for the link. I'll try that.
@ajroach42
The parameters there for wget seem to do the trick!
@eladhen Excellent.
@eladhen Use wget. Here's a page with Windows binaries: https://eternallybored.org/misc/wget/
And here's an old Linux Journal article on how to download entire websites with wget.
https://www.linuxjournal.com/content/downloading-entire-web-site-wget
@starbreaker
I'm on Linux. No windows machine for windows binaries.
@eladhen Then, depending on your distro, you might already have wget installed. If not, you probably already know how to get it. :)
@starbreaker
I've got wget and have used it in the past for simpler things.
@eladhen No reason it shouldn't work for this, right?
@eladhen my advice would be to point the Wayback Machine at it
@tindall
What's that?
@eladhen https://archive.org/web/
Basically, Archive.org keeps a permanent record of many sites on the Internet, and you can send them a URL and they'll add the site to their list.
However, if what you need is a total scrape of the site (all pages), you will be better off using http://www.httrack.com/ or a cURL script.
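If you go the HTTrack route, a minimal invocation looks roughly like this (example.com, the ./mirror output directory, and the domain filter are placeholders to adapt):

    # mirror the site into ./mirror, staying within the same domain, verbose output
    httrack "https://example.com/" -O ./mirror "+*.example.com/*" -v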
I suggest Archive.org mainly because 1) it's a well-known and highly trusted organization, reducing possible doubt about the accuracy of the backup and 2) it's easy.
@tindall
The fact that this site is being taken down is a blessing. I don't want it to be publicly available.
@eladhen @starbreaker The Archive Team might be able to help. They have scripts and stuff. As well as wget, there's wpull, which is designed for downloading whole sites. grab-site is wpull with a nicer interface. There's lots of material on archiving websites if you start there.
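For what it's worth, once grab-site is installed (see its README for setup; the details vary by distro and Python environment), a crawl is roughly this simple (example.com is a placeholder):

    gs-server &                       # optional web dashboard for watching crawls
    grab-site 'https://example.com/'  # start crawling; output lands in a per-crawl directory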
@ebel
Yeah, the problem is the takedown is tomorrow; I'm trying to find something that's ready to go. No time to waste on research...
@starbreaker
@eladhen @ebel @starbreaker The Archive Team have an IRC channel. Maybe pop in and ask?
@eladhen HTTrack can download whole websites.
@eladhen
wget