Thread ID: 53705 2005-01-25 02:28:00 Scanning JJJJJ (528) Press F1
Post ID Timestamp Content User
317772 2005-01-25 03:21:00 Sorry Jack - I'm confused. What you've got there is obviously a page that has already been poorly OCRed. Are you proposing to OCR it again?

As for digicam versus a scanner - a scanner should do a better job than a digicam simply because the page is flat and the image surface is "flat" in relation to it. I've copied pages with a camera when I haven't been able to take them to a scanner, and it can be damn hard to get a good clean shot. The biggest problem is that you get variation in exposure across the page. That is, some of the page shows a white background, but at the edges of the image it often goes dark grey, and that really messes up OCR, since the "whiteness" of the page varies significantly across the image.
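The uneven exposure described above (white in the middle, grey towards the edges) is often corrected in software before OCR by dividing each pixel by an estimate of the local paper brightness. Below is a minimal pure-Python sketch of that idea, assuming a greyscale image stored as a list of rows of 0-255 values; `normalize_background` and its `radius` parameter are hypothetical names for illustration, not part of any OCR package.

```python
def normalize_background(img, radius=2):
    """Flatten uneven lighting in a greyscale page image (illustrative sketch).

    img is a list of rows of ints in 0..255 (0 = black ink, 255 = white paper).
    Assumption: the paper is the brightest thing near any pixel, so the
    maximum over a (2*radius+1)-wide square window is used as the local
    background. The window must be wider than the ink strokes for this to hold.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Local background estimate: brightest pixel in the window.
            bg = max(
                img[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            )
            if bg == 0:
                row.append(0)  # all-black window: nothing to rescale
            else:
                # Scale so the local background becomes pure white.
                row.append(min(255, round(255 * img[y][x] / bg)))
        out.append(row)
    return out


# One scan line: a well-lit left half (paper 240) and a dim right half
# (paper 120), each containing one dark ink pixel.
line = [[240, 240, 60, 240, 120, 120, 30, 120]]
print(normalize_background(line, radius=1))
```

After correction both halves have white paper and similarly dark ink. The pixel right at the boundary between the halves stays mid-grey because its window still sees the brighter paper; real tools typically use larger, smoothed background estimates to avoid that.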
Biggles (121)
317773 2005-01-25 03:37:00 That's right. It was scanned or copied or whatever at National Archives.
I want to get it into my computer so I can include it or part of it in a book I'm writing.
A copy of an original document looks more realistic than an obviously new typewritten copy.

Yes, I should have mentioned that everything I am trying to scan is a copy or a scan of an original document. All the originals are in the Archives or the Turnbull Library and I don't think they would give them to me.

Jack
JJJJJ (528)
317774 2005-01-25 03:44:00 OK - so you'd rather have an image of the original, than an OCR copy of the text on it - right?

My mum uses her digicam for this purpose, and with practice you can get a reasonable result, which can be much improved by adjusting the image in an image editor afterwards (assuming, that is, that you want to clean the image up to look good rather than to OCR well). The problem is you
1] may not be allowed to use a flash
2] may not want to anyhow, since it will "blow out" the resulting image

so a tripod may be necessary to get a good image without camera shake blurring it.

A camera that does good work at high ISO settings - ISO400 and up - would be a useful camera for this kind of work.
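Why high-ISO performance matters without a flash comes down to simple exposure arithmetic: at a fixed aperture and light level, the required shutter time is inversely proportional to the ISO setting, so each doubling of ISO halves the exposure time and reduces the risk of camera-shake blur. A small Python sketch of that relationship follows; the 1/15 s at ISO 100 starting point is a made-up example, not a reading from any particular camera.

```python
from fractions import Fraction


def shutter_time(base_time, base_iso, new_iso):
    """Shutter time giving the same exposure at new_iso.

    Assumes aperture and scene lighting stay the same, so exposure time
    scales as base_iso / new_iso.
    """
    return Fraction(base_time) * Fraction(base_iso, new_iso)


# Hypothetical indoor reading: 1/15 s at ISO 100, slow enough that
# camera shake is a real risk without a tripod.
for iso in (100, 200, 400, 800):
    print(f"ISO {iso}: {shutter_time(Fraction(1, 15), 100, iso)} s")
```

Going from ISO 100 to ISO 400 takes the hypothetical 1/15 s exposure down to 1/60 s, at the cost of more image noise, which is why a camera that stays clean at high ISO helps here.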
Biggles (121)
317775 2005-01-25 03:49:00 uh....why not just scan it as an image? Metla (12)
317776 2005-01-25 04:50:00 Hi JJJJ. Agree with Metla. If it is a copy of the original you want for authenticity, etc., give the OCR a miss and just scan the thing. If you want to play with OCR, there are several freebies around, all of which create different results, each requiring a different amount of correction and editing. ;) Scouse (83)
317777 2005-01-25 05:17:00 ~ ~:;
~ (\i ~"~
.
~
~ '\ ~ .,~
...,;~
1
r;
5>
~ -,

Looks like pretty good OCR results to me. Can't you read that? ;)

To get scanned pictures of (photocopied?) originals, it will pay to experiment with the scanner settings. You might find that black and white gives better contrast than greyscale or colour.
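Choosing black and white means a single grey level is picked as the cut-off: everything darker becomes black and everything lighter becomes white. One common way to pick that cut-off automatically is Otsu's method, which chooses the threshold that best separates the image's histogram into a dark group and a light group. The sketch below is a pure-Python illustration assuming 8-bit greyscale pixel values in a flat list; the function names are hypothetical, and it is not necessarily what any given scanner driver does internally.

```python
def otsu_threshold(pixels):
    """Return the grey level t that best splits pixels into <= t and > t.

    Otsu's method: maximise the between-class variance
    w_b * w_f * (m_b - m_f)**2. pixels: iterable of ints in 0..255.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    w_b = 0    # number of pixels at or below the candidate threshold
    sum_b = 0  # sum of their grey levels
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break  # every pixel is in the dark class; no split left
        sum_b += t * hist[t]
        m_b = sum_b / w_b              # mean of the dark class
        m_f = (sum_all - sum_b) / w_f  # mean of the light class
        var = w_b * w_f * (m_b - m_f) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def to_black_and_white(pixels, threshold):
    """Map pixels at or below threshold to black (0), the rest to white (255)."""
    return [0 if p <= threshold else 255 for p in pixels]


# A greyish photocopy: dark-grey ink around 60-70, light-grey paper around 180-190.
scan = [60, 70, 65, 180, 190, 185]
t = otsu_threshold(scan)
print(t, to_black_and_white(scan, t))
```

On a washed-out photocopy the ink and paper are both grey, so forcing them apart to pure black and pure white is what gives the extra contrast the post mentions.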
Graham L (2)
317778 2005-01-25 06:49:00 If you want to use a camera, here is an article.
Link. (www.nikon.co.jp)
Nomad (952)
I agree with those who say: why not scan in an image of the document? From there you have an image that you can enhance to your heart's content in an image editor.

I do this sort of thing all the time with some really messed-up documents, including photocopies of photocopies of photocopies, poorly done diazo plans or photocopies of same, or photocopies of folded, used and abused diazo plans (the worst).

To get the best results, scan at around 200-300 dpi in greyscale and save a copy in your image editor's native format. Then use the image editor to adjust contrast/brightness/fill flash and to rub out spots, creases and copier speckling. Save back to a lighter-weight format like JPEG (reducing size/resolution to suit) for inserting into your document or printing off. Or, if you want to insert it as text (a quote) rather than as a doc within a doc, try the OCR software on it now.
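Two pieces of arithmetic sit behind this recipe. Pixel dimensions are just page size in inches times the scan resolution, which is why a 300 dpi greyscale scan is big enough to be worth shrinking before it goes into a document. A basic contrast/brightness fix can be done as a linear stretch that maps the darkest useful grey to black and the lightest to white. The Python sketch below shows both, assuming 8-bit greyscale (one byte per pixel before any compression) and an A4 page (210 x 297 mm) as the example; `scan_pixels` and `stretch_contrast` are hypothetical names, and real image editors offer more sophisticated level and curve tools.

```python
MM_PER_INCH = 25.4


def scan_pixels(width_mm, height_mm, dpi):
    """Pixel dimensions of a page scanned at the given resolution."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


def stretch_contrast(pixels, low=None, high=None):
    """Linearly map grey level `low` to 0 and `high` to 255, clipping outside.

    If low/high are not given, the darkest and lightest pixels are used,
    a crude 'auto levels'. pixels: list of ints in 0..255.
    """
    low = min(pixels) if low is None else low
    high = max(pixels) if high is None else high
    if high <= low:
        return list(pixels)  # flat image: nothing to stretch
    return [max(0, min(255, round((p - low) * 255 / (high - low))))
            for p in pixels]


for dpi in (200, 300):
    w, h = scan_pixels(210, 297, dpi)
    # 8-bit greyscale = 1 byte per pixel, before JPEG compression.
    print(f"A4 at {dpi} dpi: {w} x {h} px, {w * h / 1e6:.1f} MB uncompressed")

# A washed-out photocopy whose greys only span 100..200.
print(stretch_contrast([100, 140, 200]))
```

An A4 page at 300 dpi comes out at roughly 2480 x 3508 pixels, around 8.7 MB as raw 8-bit greyscale, so reducing resolution and saving as JPEG before embedding keeps the document manageable.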
Murray P (44)
317780 2005-01-25 09:11:00 I too use an image (not OCR) of documents embedded into Word for reports. Just scan as a picture, not as a document? godfather (25)