Thread ID: 113606 2010-10-28 00:59:00 Help extracting a tarball ubergeek85 (131) Press F1
Post ID Timestamp Content User
1148264 2010-10-28 05:51:00 let me guess. you're gonna test it on the n810 :) jareemon (5207)
1148265 2010-10-28 08:30:00 Nah, I'm thinking cellphones...

Anyway, Filzip is getting the job done (still).

Thanks everyone.
ubergeek85 (131)
1148266 2010-10-28 08:41:00 Wikipedia has a mobile site, would it be worth having on a phone? jareemon (5207)
1148267 2010-10-28 08:45:00 Maybe, but I don't think they have a dump of that. They ask people not to crawl the site to dump it, but rather to use the dumps they provide.

I could always get the SQL dump, install MediaWiki, then crawl the mobile version on my local machine... hmmm... ideas...
ubergeek85 (131)
1148268 2010-10-28 09:04:00 You realise there are much smaller dumps than 218GB, containing just the article text (no media, no history etc)? For a mobile application, this makes far more sense. A mobile-optimised version of this should end up at about 3GB for the English version. Erayd (23)
1148269 2010-10-28 09:24:00 Quote (Erayd): "You realise there are much smaller dumps than 218GB, containing just the article text (no media, no history etc)? For a mobile application, this makes far more sense. A mobile-optimised version of this should end up at about 3GB for the English version."

Oh yes.

Only reason I downloaded this one was because it was in HTML as opposed to XML, or an SQL dump.

Means I don't have to crawl/render it.

And the download itself was only 14GB.
ubergeek85 (131)
1148270 2010-10-28 09:26:00 Quote (ubergeek85): "Only reason I downloaded this one was because it was in HTML as opposed to XML, or an SQL dump."

Trust me, you will find an SQL dump *much* easier to work with for this kind of thing, especially when the time comes to implement search indexes.

Quote (ubergeek85): "And the download itself was only 14GB."

The download size isn't the issue; the issue is how you fit a usable database onto a mobile device. 3GB is a hell of a lot more practical than 218GB.
Erayd (23)
1148271 2010-10-28 09:31:00 That's the thing; HTML is quite easy to read on a mobile device, just use the integrated web browser. No code needed. If the mobile browser supports compressed HTML, then it gets even smaller still.

Either way, I've downloaded it, might as well do something with it; I've already got XML dumps too, but they just aren't useful to me.
ubergeek85 (131)
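A rough way to check how much smaller compressed HTML gets is to gzip a page and compare sizes. The following is only a minimal sketch: the `gzip_ratio` helper and the sample page are made up for illustration, and real articles will compress by different amounts.

```python
import gzip


def gzip_ratio(html: str) -> float:
    """Return compressed size divided by original size for one page."""
    raw = html.encode("utf-8")
    packed = gzip.compress(raw)
    return len(packed) / len(raw)


# Hypothetical page body, not taken from the actual dump.
sample = (
    "<html><body>"
    + "<p>Some repeated article text.</p>" * 200
    + "</body></html>"
)
print(f"compressed to {gzip_ratio(sample):.1%} of original size")
```

Whether a given mobile browser will open a locally stored `.gz` page directly is a separate question that this sketch does not answer.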
1148272 2010-10-28 09:37:00 Yup - so then you write something that generates HTML from the database and feeds it to the browser, or an embedded browser view.

Storing the whole thing as HTML in individual files, even compressed, is really a non-starter; it's just too big.
Erayd (23)
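The "generate HTML from the database" idea could look something like the sketch below. It is only an illustration under loud assumptions: the `articles` table, its columns, and the helper names are hypothetical, SQLite stands in for whatever database the device would use, and the body is treated as plain text (a real MediaWiki dump stores wikitext in a different schema and would need a proper renderer).

```python
import html
import sqlite3
from typing import Optional


def open_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database with one hypothetical article."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (title TEXT PRIMARY KEY, body TEXT)")
    conn.execute(
        "INSERT INTO articles VALUES (?, ?)",
        ("137 (number)", "137 is the natural number following 136."),
    )
    conn.commit()
    return conn


def render_article(conn: sqlite3.Connection, title: str) -> Optional[str]:
    """Look up one article and wrap it in a minimal HTML page."""
    row = conn.execute(
        "SELECT body FROM articles WHERE title = ?", (title,)
    ).fetchone()
    if row is None:
        return None  # no such article
    safe_title = html.escape(title)
    safe_body = html.escape(row[0])
    return (
        f"<html><head><title>{safe_title}</title></head>"
        f"<body><h1>{safe_title}</h1><p>{safe_body}</p></body></html>"
    )


conn = open_demo_db()
page = render_article(conn, "137 (number)")
```

The resulting string is what would be handed to the browser or an embedded browser view, so only the compact database has to live on the device.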
1148273 2010-10-28 09:49:00 I guess I'm not quite expressing myself correctly; what I'm trying to do is make a simple HTML dump of Wikipedia, with no executables required. Sure, it might be a few GB, but TBH, IDK. It's a project, not saying I'm going to succeed, not saying it's useful, or even the best way to go about it. Just a bit of fun.

Also, Filzip has finished, but it's only extracted as far as 1/3/6... not even into the a's.

(the pages are sorted by their first three letters and put into the matching subdirectory; for example, the page '1360 in History' has been extracted, but '137 (number)' hasn't)
ubergeek85 (131)
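The directory layout described above (first three letters of the title, one subdirectory per letter) can be sketched as follows. This is a guess at the scheme based only on the `1/3/6` example in the post: the `shard_dir` name, the lowercasing, and the `_` placeholder for short titles or non-alphanumeric characters are all assumptions, not the dump's documented rules.

```python
from pathlib import PurePosixPath


def shard_dir(title: str) -> PurePosixPath:
    """Map a page title to a three-level directory from its first three characters."""
    # Assumption: lowercase letters/digits, "_" for anything else.
    chars = [c.lower() if c.isalnum() else "_" for c in title[:3]]
    # Assumption: pad titles shorter than three characters with "_".
    while len(chars) < 3:
        chars.append("_")
    return PurePosixPath(*chars)


print(shard_dir("1360 in History"))  # 1/3/6
print(shard_dir("137 (number)"))     # 1/3/7
```

If the archive is written in sorted order, `1/3/6` comes just before `1/3/7` and well before any directory starting with `a`, which matches an extraction that stopped partway through the digits.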