Thread ID: 116758 2011-03-18 23:25:00 Freeware mass file downloader Agent_24 (57) Press F1
Post ID Timestamp Content User
1187349 2011-03-18 23:25:00 Not sure exactly what you call these things. Lightning Download has a feature called "Site Storm" and GetRight has something similar... (obviously, since they are basically the same program)

Basically, the need is to download an entire directory from a website, including all subdirectories and the files in them.


DownThemAll is halfway there: it can download all the files from one page/directory, but it won't work through subdirectories etc.


Does anyone know of a free program that can do this?
Agent_24 (57)
1187350 2011-03-18 23:53:00 I think you may be looking for WebReaper, which does work under Win7 64-bit as it happens.

http://www.webreaper.net/

HTH.
Snorkbox (15764)
1187351 2011-03-18 23:53:00 wget?

from 'man wget':

Wget can follow links in HTML, XHTML, and CSS pages, to create local versions of remote web sites, fully recreating the directory structure of the original site. This is sometimes referred to as "recursive downloading." While doing that, Wget respects the Robot Exclusion Standard (`/robots.txt'). Wget can be instructed to convert the links in downloaded files to point at the local files, for offline viewing.
fred_fish (15241)
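As a minimal sketch of the recursive usage described above (flags as documented in the wget manual; the URL is just a placeholder), the command would look something like:

wget -r -np -k http://example.com/some/directory/

-r turns on recursive retrieval, -np stops wget from climbing above the starting directory, and -k converts the links in the downloaded files for offline viewing. The default recursion depth is 5; add -l inf to remove the depth limit.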
1187352 2011-03-19 04:15:00 HTTrack Website Copier (http://www.httrack.com/) is what you want. jwil1 (65)
1187353 2011-03-19 04:46:00 wget?

from 'man wget':

Yep, wget is perfect if the OP is happy to use a CLI tool.
Erayd (23)
1187354 2011-03-19 05:15:00 I didn't pick Agent_24 as someone who would wet his pants at the sight of a command line :cool: fred_fish (15241)
1187355 2011-03-19 07:10:00 :lol: no I don't mind CLI, although some programs can be annoying to use.

Didn't know wget could do this actually, but I was really looking for something that can do this on Windows.

That WebReaper program looked promising, but it had a lot of errors trying to download some of the files, which I can download just fine manually. Not sure if it's a bug or what, but it doesn't seem useful in this particular instance. I think I will keep it for future use though...

Have yet to try HTTrack...
Agent_24 (57)
1187356 2011-03-20 04:34:00 ...but I was really looking for something that can do this on Windows.

Wget for Windows (gnuwin32.sourceforge.net). Erayd (23)
1187357 2011-03-20 05:26:00 Oooooo :D

Hopefully that will work, then!
Agent_24 (57)
1187358 2011-03-25 20:40:00 I think it can be done manually with Google Docs (Spreadsheet). Use the formula =importxml("url", "query"), sometimes together with Google Apps Scripts (code.google.com) and Google Code (code.google.com/). You can host your own scripts, if you want, on Google App Engine (http:). I have extracted URLs/sub-URLs from websites for competitive analysis against my sites, or to search for my content that's been scraped, then quickly looked into the content for those URLs.

The tricky and tedious part is working out which elements to target (e.g. div, class, etc.) and how to express them in the "query" part of the formula; you can get the elements from the source code of the website (or the search results) in question. It helps, of course, if you know HTML and scripting. It works quickly once the script (or series of scripts) is set up, but that setup takes time. Then you may need to filter/merge using spreadsheet commands. You should be able to build your own scripts with Excel/Docs - basic tutorial and examples at Distilled UK (www.distilled.co.uk).
kahawai chaser (3545)
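A concrete, hypothetical version of that formula, pulling every link off a page into the spreadsheet, would be something like:

=IMPORTXML("http://example.com/somepage.html", "//a/@href")

where the second argument is an XPath query; //a/@href selects the href attribute of every anchor tag on the page, and each match is returned in its own cell.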