Thread ID: 88586 | 2008-04-01 17:49:00 | Best software to scrape online content from a website | prayami (13094) | Press F1
Post ID | Timestamp | Content | User
655023 2008-04-01 17:49:00 Hi,

I want to scrape or scan or grab (not sure of the proper word) one website, put the contents in an Excel file, and save it on my system.
I want to set different criteria for the scraping, like: scrape the HTML table with this heading, etc., if possible.

Is there any free software available for that? Please also let me know which
software is best to buy in case the free one doesn't have good features.

Thanks in advance,
prayami (13094)
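As a rough illustration of what that kind of selective grab involves, here is a minimal Python sketch (not one of the packaged tools discussed in this thread): it fetches a page, looks for the table that follows a given heading, and writes it to an Excel file. The URL, heading text and output file name are placeholders, and the requests, beautifulsoup4, pandas and openpyxl libraries are assumed to be installed.

```python
# Minimal sketch: pull one page, pick the HTML table under a given heading,
# and save it as an Excel file. URL and heading text are placeholders.
import io

import requests
from bs4 import BeautifulSoup
import pandas as pd

URL = "http://example.com/listings.html"   # placeholder page
HEADING_TEXT = "Products Detail List"      # the "table with this heading" criterion

html = requests.get(URL, timeout=30).text
soup = BeautifulSoup(html, "html.parser")

# Find a heading whose text contains the criterion, then the first table after it.
# (Simplistic: assumes the heading tag holds plain text with no nested markup.)
heading = soup.find(["h1", "h2", "h3"], string=lambda s: s and HEADING_TEXT in s)
table = heading.find_next("table") if heading else None

if table is not None:
    # pandas parses the table fragment into a DataFrame and writes it to Excel.
    df = pd.read_html(io.StringIO(str(table)))[0]
    df.to_excel("scraped_table.xlsx", index=False)
    print(f"Saved {len(df)} rows to scraped_table.xlsx")
else:
    print("No table found under that heading.")
```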
HTTrack - my first choice
or
BlackWidow or Teleport Pro.

None will grab "everything"; it depends on the site and whether it uses certain scripts or database-type things.
Bantu (52)
Thanks for the reply,

I read the features of HTTrack and BlackWidow, but I couldn't find a feature to save to an Excel file. All they do is copy the whole website and put it offline on my system.

But I want to save some of the data in a format like Excel.

Thanks,
prayami (13094)
I doubt you'll find any that export to Excel; possibly a few do Word. Why do you need it in Excel?

Most likely you'll need to save the page offline with HTTrack etc. and then manually copy the bits you want into Excel.
autechre (266)
I know there is some software available, but before I buy I want some expert advice.
I searched the net and found:
Web Scraper Plus+ 5
www.download.com

Thanks,
prayami (13094)
What exactly do you want to insert into Excel from a web page - script code, comments, links, tags, photos, etc.? As autechre mentioned above, Excel would be an uncommon place to store web components (if that's what you are considering) - even for webmasters...

Importantly, what do you plan to do with this content? I certainly would not "grab" topical/photo content without contacting the webmaster/owner of the site if you intend to post it on the internet or publish it publicly by other means. I contacted a site owner to use his NZ photos for one of my sites, on condition that I left a link to his site...

You could consider a compiler that converts HTML etc. to PDF - more commonly known as an e-book... often an important tutorial reference created by site owners...
kahawai chaser (3545)
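If the HTML-to-PDF / e-book route is of interest, one hedged sketch of that idea in Python uses the WeasyPrint library (my choice, not something mentioned in the thread); it needs WeasyPrint's system dependencies installed, and the URL and output name are placeholders.

```python
# Rough sketch: render a web page to PDF, roughly the "compile HTML to an
# e-book" idea. The URL and output file name are placeholders.
from weasyprint import HTML

HTML("http://example.com/tutorial.html").write_pdf("tutorial.pdf")
print("Wrote tutorial.pdf")
```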
655029 2008-04-01 21:46:00 I am not going to get anything without owners' consent.
I am not going to get photos.
Mostly I may need to get HTML tables' data into Excel.
e.g.
Address Detail List
Movie Songs Detail List
Products Detail List and specification

I may need to grab anything that is in table form on a website.

Thanks,
prayami (13094)
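For table-shaped lists like these, a hedged Python sketch along the following lines would grab every table on a page and drop each one into its own sheet of an Excel workbook; the URL is a placeholder, and pandas plus an HTML parser (lxml) and openpyxl are assumed to be installed.

```python
# Sketch: read every HTML table on a page and write each to its own Excel sheet.
# The URL is a placeholder; requires pandas, lxml and openpyxl.
import pandas as pd

URL = "http://example.com/movie-songs.html"   # placeholder

tables = pd.read_html(URL)   # one DataFrame per <table> element found

with pd.ExcelWriter("tables.xlsx") as writer:
    for i, df in enumerate(tables, start=1):
        df.to_excel(writer, sheet_name=f"table_{i}", index=False)

print(f"Wrote {len(tables)} table(s) to tables.xlsx")
```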
I see, prayami - tables. I guess many site visitors would copy them manually; right click, etc. There are probably scripts on the web somewhere, though I don't believe there is any popular software to do it. You could visit SourceForge (sourceforge.net/), the open source site, and maybe try a search on the Digital Point forums (where hundreds of programmers/web designers hang out).

The term scraping generally refers to programs/people that automatically collect RSS feeds from sites that provide RSS subscriptions. They then automatically drop those feeds (e.g. latest news/content/updates, etc.) into their own sites...
kahawai chaser (3545)
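For the RSS side of that definition, a minimal Python sketch with the feedparser library (my choice, not mentioned in the thread) shows what automatically collecting a feed looks like; the feed URL is a placeholder.

```python
# Sketch: fetch an RSS/Atom feed and list its latest entries.
# The feed URL is a placeholder.
import feedparser

feed = feedparser.parse("http://example.com/feed.rss")

for entry in feed.entries[:10]:
    print(entry.title, "-", entry.link)
```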
655031 2008-04-01 23:45:00 You can do it from within Excel.

Data > Import External Data > New Web Query

Browse to the page you want, then in the query browser window you can tick the tables or portions of the page that you want saved in Excel and hit OK (or Done, or whatever the button is). There are other options you can set within that window too.

Once you've got it in Excel you can run formulas etc., and you can set it to refresh on a regular basis if desired (to keep the data up to date).

Mike.
Mike (15)
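As an aside, the same New Web Query feature Mike describes can also be scripted; a hedged sketch for Windows with the pywin32 package follows. The URL, save path and table-selection constant are placeholders/assumptions, and Excel itself must be installed.

```python
# Sketch (Windows only): drive Excel's web-query feature via COM automation,
# mirroring Data > Import External Data > New Web Query from Mike's post.
import win32com.client

URL = "http://example.com/listings.html"   # placeholder page

excel = win32com.client.Dispatch("Excel.Application")
wb = excel.Workbooks.Add()
ws = wb.Worksheets(1)

# The "URL;" prefix tells Excel this query table pulls from a web page.
qt = ws.QueryTables.Add(Connection="URL;" + URL, Destination=ws.Range("A1"))
qt.WebSelectionType = 2            # 2 = xlAllTables (3 = xlSpecifiedTables)
qt.RefreshPeriod = 60              # auto-refresh every 60 minutes
qt.Refresh(BackgroundQuery=False)  # pull the data now

wb.SaveAs(r"C:\temp\web_query.xls")   # placeholder save location
excel.Quit()
```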