Thread ID: 53705 2005-01-25 02:28:00 Scanning JJJJJ (528) Press F1
Post ID Timestamp Content User
317762 2005-01-25 02:28:00 How do commercial printers do such a good job with scans? Do they just have a better scanner than I do?
I have pages and pages of typewritten script and pages from old books. Probably printed in Post script. How do I get them into my computer in readable form?
They scan OK, but the results look nothing like the originals. Words are missed out and misspelled. Impossible to read. I could edit out the errors, but that would take twice as long as typing the whole page.

What's got me beaten is why can I scan a page from a modern book, but not from an old one?
:badpc: :badpc: :badpc: :badpc: That's fixed the scanner,
Jack
JJJJJ (528)
317763 2005-01-25 02:31:00 Jack - are you talking about optical character recognition? It's not clear from your opening comment about commercial printers. Biggles (121)
317764 2005-01-25 02:38:00 Hi Bruce. Welcome back.
Just a plain everyday A4 scanner. An HP 3670.
If OCR is the problem why does it do such a good job with a modern book?
Jack
JJJJJ (528)
317765 2005-01-25 02:51:00 OCR is dependent on being able to tell the letter forms from the background paper, blemishes etc. A modern book, with clean white pages and a crisp typeface, is likely to OCR better than an old one where the pages have yellowed. Also, if the paper is light, so that text on the other side of the page shows through when the scanner light passes across the page, that can mess up OCR too. And if the paper is textured as opposed to smooth (as some poor-quality old paperback paper is), that can also mess up the result, as the small indentations in the paper create shadows which get misinterpreted by the OCR software.

In other words, anything that reduces the distinction between the letters and the page makes OCR harder. We used to do a fair bit of OCR here back in the dark ages, and sometimes I found it easier to photocopy the page first, increasing the contrast and "whitening" up the paper, and then OCR the photocopy, not the original.
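
The photocopy trick can also be approximated in software once the page has been scanned. What follows is only a minimal sketch, assuming the scan is already available as rows of greyscale values (0 = black, 255 = white). The function names `otsu_threshold` and `whiten_page` are made up for illustration, and Otsu's method is just one common way of picking the cut-off between ink and paper, not something any particular OCR package is known to use.

```python
def otsu_threshold(pixels):
    """Pick the grey level that best separates dark ink from light paper.

    Uses Otsu's method: try every level and keep the one that maximises
    the between-class variance of the "dark" and "light" groups.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_level, best_var = 0, -1.0
    w_dark = 0      # number of pixels at or below the current level
    sum_dark = 0    # sum of grey values of those pixels
    for level in range(256):
        w_dark += hist[level]
        sum_dark += level * hist[level]
        if w_dark == 0:
            continue            # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break               # every pixel is already in the dark group
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var > best_var:
            best_var, best_level = var, level
    return best_level


def whiten_page(rows):
    """Turn a yellowed greyscale page into pure black ink on pure white paper."""
    flat = [p for row in rows for p in row]
    t = otsu_threshold(flat)
    return [[0 if p <= t else 255 for p in row] for row in rows]


# Tiny made-up "page": yellowed paper around 200-215, ink around 50-60.
page = [[200, 210, 60],
        [205, 50, 215]]
cleaned = whiten_page(page)
```

On the made-up page above, the paper pixels come out pure white and the ink pixels pure black, which is the same "more contrast, whiter paper" effect as the photocopier. Show-through and paper texture that happen to be darker than the cut-off would still survive, so this is no cure-all.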

Some letter forms are easier to recognise than others too, and naturally the OCR package you use also has an effect - not all are as good as others.

And yes - commercial printers have much better scanners. Scanning for commercial printing is done on large drum scanners, rather than small flatbed scanners. You get what you pay for, but then who wants to spend $50,000 on a scanner at home!
Biggles (121)
317766 2005-01-25 03:03:00 Quite a few years ago the new technology craze struck Parliament. Why didn't they get an efficient way of producing Hansard?

So, they got the Hansard reporters typewriters which would produce OCR type when they transcribed the shorthand records. And of course, this nice clean copy could be scanned into the typesetting system.

But. :(

The politicians get to "correct" the Hansard reports of what they said. So they were given the nice clean typed sheets to proofread. They scribbled over them as they tried to convert the actual speeches into English, added dark rings from the tea cups (and whisky glasses) ...

The system was abandoned.
Graham L (2)
317767 2005-01-25 03:08:00 Just a sample of what I'm trying to copy .
. J
JJ' .
'" ~ t .
.
t" . . . g
~)~
. .
t
~ .
~ ~:;
~ (\i ~"~
.
~
~ '\ ~ . ,~
. . . ,;~
1
r;
5>
~ -,
~~
t --'l~~
, ~'~
~,'
Ct- . ,
f\
, ' I
10 Beach Road, ! '- /
Wanganui, " //
19th December'1939 .
Mr . G . H . Scholefield, Box 1369 J WET . . . LINGTON .
,Dear Sir,
. Replying to your letter of the 7th inst . , and trust the
foregoingparticul4TS will be of assistance .
. ~
ijeI1'ryJi\'a~arrj! Born in London, August 16th . 1816 .
Arrived in Wanganui, Aug . 19th . 1841, died in Wanganui, Nov . 3rd . 1898 . His wife who arrived with him, was born
in London, 28th . June 1816 . Her maiden name was J~e ~e~m,e;aI'e]: . She died in'Wanganui . His eldest SOD,
Thomas Wellington Nathan, Was born in WellingtQD,
8th April ~841, this is the correct birth date,
( having received advice from my Sister, Mrs . J . H . Gratebatch)
he died in Palmerston North on the 2nd March 1909 .
Anthony Nathan died in Taihape, a few ~ears ago . He
married Sarah Anne Harris, daughter of Samuel Gregory
Harris .
Joseph Nathan died in Palmerston North a few years ago;
he marr~ed Annie Penfold .
"""', , William Nathan died a few years ago, somewhere near Napier .
. William was married, but I never heard the name of his
wife .
George Nathan, I think is still living, but I do not know; he married Mary Connery .
Jane Nathan died in Wanganui, Sept!lst . 1912; she married
, . James Rapley 2 . , Cie . . . ~~ C>it'",,'J:6~f .
Mary Nathan died in Wellington a few years ago j she married Thomas Bush .
Winifred I have not heard~of for years; she married
James Morey . .
Susan died in Wellington; she married Mr . Coker . .
Norah Margaret Carol died in Wanganui, April 1s t . 1878 j she martied William Gardner .
. I have 'heard of my . grandparents speak of another ~on . ~tV-fA
. .
.
named Charlie, who died when \!l1ite young . V,,(n"" - . tJyQ;
(''''Jane, the little girl who came out with her parentis, was
~'
L
taken by the Maoris, and was kept for some time before
(
being released . She could speak the Maori languag~ just as efficiently as a Maori . She was the aunt who took
r
my siater and myself, when our mother died in 1871 . y/
i I understand that there are many relatives living
I in New Zealand at the present time, but I have not been
in contact with any for years .
. f'
I am,
JJJJJ (528)
317768 2005-01-25 03:13:00 Would this work?
If I buy a digital camera and photograph each page. I can get that into my computer. But will I be able to edit it in Word? Will it convert to *.doc?

And would a digital camera take a clear enough photo?
Jack
JJJJJ (528)
317769 2005-01-25 03:17:00 Jack, to reinforce what BB is saying, a couple of years back I was working overseas and a company I was investigating was spending in excess of $100,000 NZ to get accurate scanning of text, in a commercial environment.

Much of that was for the software.

Unless you are prepared to spend serious money on hardware and software, it's not going to be an exact science. All you can do is try to get the best-quality hard copy input possible, with the largest font size. Even the font type significantly affects the quality of OCR.
godfather (25)
317770 2005-01-25 03:20:00 The key, other than the quality of the original as mentioned above, is the ability of the OCR engine. I have been using Presto for some time and it does a pretty good job. The hardest thing to copy from is newsprint - it is thin and the reverse shows through.

Presto is by an outfit called Newsoft - google will pull it up. I think they still have a trial period. Worth a look.

Leon
leonidas5 (2306)
317771 2005-01-25 03:20:00 How's about a hand text scanner?

I priced one of these babies up for a guy a couple weeks back and they seemed pretty swish.

Although it would require more elbow grease, and I have no idea of the quality of the finished product.
Metla (12)