OCR(1)                 (cetus,hydra,coma)                 OCR(1)

NAME
     ocr - optical character recognition

SYNOPSIS
     ocr [ option ... ] [ file ]

DESCRIPTION
     Ocr reads a black-and-white image of a page from file, and
     writes ASCII to the standard output.  If no file is
     specified, it reads from the standard input.

     The input is a picfile(5) image of one column of
     machine-printed text, normally scanned in by cscan(1).
     Fonts, sizes, and line-spacings may vary within the column,
     but each line should have a constant text size and
     baseline.  Lines should be parallel and roughly horizontal.

     In the output, white space approximates the original page
     layout.  Words that spell(1) are preferred, and
     hyphenations across lines are recombined.

     The options are:

     -as       The alphabet is the union of symbol sets selected
               by characters in string s, from among:

               A   ABCDEFGHIJKLMNOPQRSTUVWXYZ
               a   abcdefghijklmnopqrstuvwxyz
               0   0123456789
               .   .,-:;*'"?!/&$()[]#@%  (basic punctuation)
               ^   ^~`\|{}_              (extended punct'n)
               +   +-*/<>=.Ee[]          (numerical punct'n)
               s   §†‡¢•© ...            (selected non-ASCII)
               l   fi fl ff ffi ffl ae oe ...
                                         (ligatures, digraphs)
               g   αβγδεζ ...            (Greek lower case)
               G   ABΓΔEZ ...            (Greek upper case)

               The default is -aAa0.+^, the full printable-ASCII
               set, which may be abbreviated as -ap.  Thus,
               -apslgG selects all of the above.

     -c        Find columns in complex nested layouts using a
               greedy white covers algorithm.

     -ml[,r]   Trim the left and right margins of the image by l
               and r inches, respectively, before looking for
               columns.  If r is omitted, it is assumed to equal
               l.

     -nn       Find the n largest columns by analysis of a single
               vertical projection.  Each column should be
               compactly-printed and separated from the others by
               at least 2 ems of horizontal white space.

     -pn,m     Point sizes lie in the range [ n, m ]; other sizes
               are discarded.  The default is -p6,24.

     -s        Defeat spelling check (but continue to favor
               numeric strings and good punctuation).

     -t        Write troff(1) format.  Each column is shown on a
               separate page, lines at their original height,
               words at their original horizontal location, and
               characters roughly original size in Times roman.
               Hyphenated words are not recombined.

     -u        Unspellable words are prefixed with `?' or, if -t
               is specified, printed boldface.

     -ww       Find the largest column of width w inches, within
               a single vertical projection.

   Fonts
     Trained on over 100 Latin-alphabet book fonts in various
     italic, bold, etc. styles.  Only one font of Greek, without
     diacriticals.  Also Swedish and Tibetan, on request.

SEE ALSO
     bcp(1), cscan(1), font(6), picfile(5), spell(1), troff(1)

BUGS
     For best results, use images of high-contrast,
     cleanly-printed original documents digitized at a
     resolution of 400 pixels/inch or higher.  It may help to
     restrict the alphabet and sizes to what's there.
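
EXAMPLES
     The -a option builds the recognition alphabet as a union of
     named symbol sets.  The Python sketch below illustrates that
     union for the six ASCII classes this page lists in full.  It
     is not taken from the ocr source: the names SYMBOL_SETS,
     DEFAULT_SELECTORS and expand_alphabet are hypothetical, and
     the s, l, g and G classes are left out because the page
     gives only partial lists for them.

```python
# Illustrative sketch only; names here are hypothetical, not part of ocr.
# Symbol sets are copied from the -a table for the fully listed ASCII classes.
SYMBOL_SETS = {
    "A": "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    "a": "abcdefghijklmnopqrstuvwxyz",
    "0": "0123456789",
    ".": ".,-:;*'\"?!/&$()[]#@%",   # basic punctuation
    "^": "^~`\\|{}_",               # extended punctuation
    "+": "+-*/<>=.Ee[]",            # numerical punctuation
}

# The documented default, -aAa0.+^, which -ap abbreviates.
DEFAULT_SELECTORS = "Aa0.+^"


def expand_alphabet(selectors=DEFAULT_SELECTORS):
    """Return the set of characters selected by a -a style string."""
    # 'p' stands for the full printable-ASCII default.
    expanded = selectors.replace("p", DEFAULT_SELECTORS)
    alphabet = set()
    for key in expanded:
        if key not in SYMBOL_SETS:
            # s, l, g, G are only partially listed, so they are not modeled.
            raise ValueError(f"symbol set {key!r} is not modeled in this sketch")
        alphabet |= set(SYMBOL_SETS[key])
    return alphabet


if __name__ == "__main__":
    print(len(expand_alphabet("p")))            # size of the -ap alphabet
    print("".join(sorted(expand_alphabet("0+"))))  # digits plus numeric punct'n
```

     Under these assumptions, expand_alphabet("p") yields the 94
     printable non-space ASCII characters, and expand_alphabet("0+")
     corresponds to restricting recognition with -a0+, as BUGS
     suggests for pages containing only numeric material.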