| Thread ID: 113606 | 2010-10-28 00:59:00 | Help extracting a tarball | ubergeek85 (131) | Press F1 |
| Post ID | Timestamp | Content | User | ||
| 1148264 | 2010-10-28 05:51:00 | Let me guess: you're gonna test it on the N810 :) | jareemon (5207) | ||
| 1148265 | 2010-10-28 08:30:00 | Nah, I'm thinking cellphones... Anyway, Filzip is getting the job done (still). Thanks everyone. | ubergeek85 (131) | ||
| 1148266 | 2010-10-28 08:41:00 | Wikipedia has a mobile site, would it be worth having on a phone? | jareemon (5207) | ||
| 1148267 | 2010-10-28 08:45:00 | Maybe, but I don't think they have a dump of that. They ask people not to crawl the site to dump it, but rather to use the dumps they provide. I could always get the SQL dump, install MediaWiki, then crawl the mobile version on my local machine... hmmm... ideas... | ubergeek85 (131) | ||
| 1148268 | 2010-10-28 09:04:00 | You realise there are much smaller dumps than 218GB, containing just the article text (no media, no history etc)? For a mobile application, this makes far more sense. A mobile-optimised version of this should end up at about 3GB for the English version. | Erayd (23) | ||
| 1148269 | 2010-10-28 09:24:00 | Oh yes. Only reason I downloaded this one was because it was in HTML as opposed to XML, or an SQL dump. Means I don't have to crawl/render it. And the download itself was only 14GB. | ubergeek85 (131) | ||
| 1148270 | 2010-10-28 09:26:00 | "Only reason I downloaded this one was because it was in HTML as opposed to XML, or an SQL dump." Trust me, you will find an SQL dump *much* easier to work with for this kind of thing, especially when the time comes to implement search indexes. "And the download itself was only 14GB." The download size isn't the issue; the issue is how you fit a usable database onto a mobile device. 3GB is a hell of a lot more practical than 218GB. | Erayd (23) | ||
| 1148271 | 2010-10-28 09:31:00 | That's the thing; HTML is quite easy to read on a mobile device, just use the integrated web browser. No code needed. If the mobile browser supports compressed HTML, then it gets even smaller still. Either way, I've downloaded it, might as well do something with it; I've already got XML dumps too, but they just aren't useful to me. | ubergeek85 (131) | ||
| 1148272 | 2010-10-28 09:37:00 | Yup - so then you write something that generates HTML from the database and feeds it to the browser, or an embedded browser view. Storing the whole thing as HTML in individual files, even compressed, is really a non-starter; it's just too big. | Erayd (23) | ||
| 1148273 | 2010-10-28 09:49:00 | I guess I'm not quite expressing myself correctly; what I'm trying to do is make a simple HTML dump of Wikipedia, with no executables required. Sure, it might be a few GB, but TBH, IDK. It's a project; I'm not saying I'm going to succeed, not saying it's useful, or even the best way to go about it. Just a bit of fun. Also, Filzip has finished, but it's only extracted as far as 1/3/6... not even into the a's. (The pages are sorted by the first three letters and put into the appropriate subdirectory; for example, the page '1360 in History' has been extracted, but '137 (number)' hasn't.) | ubergeek85 (131) | ||
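
Erayd's suggestion of generating HTML from a database on demand, rather than storing one HTML file per article, could look roughly like the sketch below. It is only an illustration under loud assumptions: the `articles` table, its columns, and the `open_demo_db` and `render_article` helpers are all hypothetical, and the body is treated as plain text. A real Wikipedia SQL dump uses the MediaWiki schema and stores wikitext, which would need a proper parser before it could be shown in a browser.

```python
import html
import sqlite3


def open_demo_db():
    """Build a tiny in-memory database (hypothetical schema, not MediaWiki's)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles (title TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT INTO articles VALUES (?, ?)",
        ("137 (number)", "137 is the number after 136 & before 138."),
    )
    conn.commit()
    return conn


def render_article(conn, title):
    """Return a minimal HTML page for one article, or None if it is missing."""
    row = conn.execute(
        "SELECT body FROM articles WHERE title = ?", (title,)
    ).fetchone()
    if row is None:
        return None
    # Assumption: the body is plain text with blank lines between paragraphs.
    paragraphs = "\n".join(
        f"<p>{html.escape(p)}</p>" for p in row[0].split("\n\n") if p.strip()
    )
    safe_title = html.escape(title)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{safe_title}</title></head>\n'
        f"<body><h1>{safe_title}</h1>\n{paragraphs}\n</body></html>"
    )


if __name__ == "__main__":
    print(render_article(open_demo_db(), "137 (number)"))
```

The returned string would then be handed to the phone's browser or an embedded browser view; only the compact database has to live on the device.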
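
The directory layout ubergeek85 describes (pages filed under their first three characters, so '1360 in History' lands in 1/3/6) and the partial extraction can be sketched as below. This is a hedged illustration, not what Filzip or the dump tooling actually does: `shard_dir` and `extract_streaming` are hypothetical helpers, the lowercasing and `_` padding for short titles are assumptions, and nothing here explains why the original extraction stopped. The extractor reads the tarball as a stream with Python's standard `tarfile` module and reports the last member written, which shows how far a run got.

```python
import os
import shutil
import tarfile


def shard_dir(title):
    """Map a page title to its three-level subdirectory, e.g. '1/3/6'.

    Assumption: first three characters, lowercased, short titles padded with '_'.
    """
    prefix = title[:3].lower().ljust(3, "_")
    return "/".join(prefix)


def extract_streaming(archive_path, dest):
    """Extract regular files one at a time; return (count, last_member_name)."""
    dest_root = os.path.realpath(dest)
    count, last = 0, None
    # "r|*" reads the archive sequentially and auto-detects the compression.
    with tarfile.open(archive_path, "r|*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and device entries
            target = os.path.realpath(os.path.join(dest_root, member.name))
            # Refuse members that would be written outside the destination.
            if os.path.commonpath([dest_root, target]) != dest_root:
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            src = tar.extractfile(member)
            with src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            count += 1
            last = member.name
    return count, last


if __name__ == "__main__":
    print(shard_dir("1360 in History"))  # 1/3/6
    print(shard_dir("137 (number)"))     # 1/3/7
```

Calling `extract_streaming("dump.tar.gz", "out")` (both paths hypothetical) and comparing the returned member name against the archive listing would show whether a run covered the whole tarball or stopped early.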