use UTF-8 encoding
parent 7fe8e1ffe0
commit 1c3623d719
54
README
@@ -25,7 +25,7 @@ cryptographic software is subject to U.S. export control laws and
regulations. The new 1997 Commerce Department Export Administration
Regulations (EAR) explicitly provide that "A printed book or other printed
material setting forth encryption source code is not itself subject to the
EAR." (see 15 C.F.R. §734.3(b)(2)). PGP, in an overabundance of caution,
has only made available its source code in a form that is not subject to
those regulations. So, books containing cryptographic source code may be
published, and after they are published they may be exported, but only
@@ -167,24 +167,24 @@ The first step to getting OmniPage 7 to work well is to set it up with
options to disable all of its more advanced features for preserving font
changes and formatting. Look in the Settings menu.

· Create a Zone Contents File with all of ASCII in it, plus the extra
  bullet, currency, yen and pilcrow symbols. Name it "Source Code".
· Create a Source Code style set. Within it, create a Source Code zone style
  and make it the default.
· Set the font to something fixed-width, like Courier.
· Set a fixed font size (10 point) and plain text, left-aligned.
· Set the tab character to a space.
· Set the text flow to hard line returns.
· Set the margins to their widest.
· The font mapping options are irrelevant.

Go to the settings panel and:

· Under Scanner, set the brightness to manual. With careful setting of the
  threshold, this generates much better results than either the automatic
  threshold or the 3D OCR. Around 144 has been a good setting for us; you
  may want to start there.
· Under OCR, you'll build a training file to use later, but turn off
  automatic page orientation and select your Source Code style set in the
  Output Options. Also set a reasonable reject character. (For testing, we
  used the pi symbol, which came across from the Macintosh as a weird
@@ -228,26 +228,26 @@ specific Latin-1 characters to be processed.

The characters most in need of training are as follows:

· Zero is printed 'slashed.'
· Lowercase L has a curled tail to distinguish it clearly from other
  vertical characters like 1 and I.
· The or-bar or pipe symbol '|' is printed "broken" with a gap in the
  middle to distinguish it similarly.
· The underscore character has little "serifs" on the end to distinguish
  it from a minus sign. We also raised it just a tad higher than the
  normal underscore character, which was too low in the character cell to
  be reliably seen by OmniPage.
· Tabs are printed as a hollow right-pointing triangle, followed by blanks
  to the correct alignment position. If not trained enough, OmniPage
  guesses this is a capital D. You should train OmniPage to recognize this
  symbol as a currency symbol (Latin-1 octal 244).
· Any spaces in the original that follow a space, or a blank on the printed
  page, are printed as a tiny black triangle. You should train OmniPage to
  recognize this as a center dot or bullet (Latin-1 octal 267). We didn't
  use a standard center dot because OmniPage confused it with a period.
· Any form feeds in the original are printed as a yen currency symbol
  (Latin-1 octal 245).
· Lines over 80 columns long are broken after 79 columns by appending a big
  ugly black block. You should train OmniPage to recognize this as a
  pilcrow (paragraph symbol, Latin-1 octal 266). We did this because after
  deciding something black and visible was suitable, we found out the font
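
The whitespace conventions in the list above can be sketched in Python. This is only an illustration, not the tool that produced the printed pages: the function names, the 8-column tab stop, and the choice to also mark a space that directly follows a tab symbol are assumptions made so that the encoding can be reversed unambiguously. Input is assumed to be plain ASCII.

```python
# Latin-1 stand-ins named in the README (octal 244, 267, 245, 266).
TAB_MARK = "\u00a4"   # currency symbol: printed in place of a tab
SPACE_MARK = "\u00b7" # center dot: printed for a space that follows a blank
FF_MARK = "\u00a5"    # yen symbol: printed in place of a form feed
CONT_MARK = "\u00b6"  # pilcrow: appended where a long line is broken

TAB_WIDTH = 8         # assumption: the README does not state the tab stop
MAX_COLS = 80
BREAK_AT = 79


def encode_line(line: str) -> list[str]:
    """Turn one source line into one or more printable rows."""
    out: list[str] = []
    prev_blank = False  # True if the previous printed cell is a blank or a tab mark
    for ch in line:
        if ch == "\t":
            out.append(TAB_MARK)
            while len(out) % TAB_WIDTH:   # pad with blanks to the next tab stop
                out.append(" ")
            prev_blank = True
        elif ch == " ":
            out.append(SPACE_MARK if prev_blank else " ")
            prev_blank = True
        elif ch == "\f":
            out.append(FF_MARK)
            prev_blank = False
        else:
            out.append(ch)
            prev_blank = False
    printed = "".join(out)
    rows = []
    while len(printed) > MAX_COLS:        # break after 79 columns, flag with a pilcrow
        rows.append(printed[:BREAK_AT] + CONT_MARK)
        printed = printed[BREAK_AT:]
    rows.append(printed)
    return rows


def decode_rows(rows: list[str]) -> str:
    """Rebuild the source line from the rows produced by encode_line."""
    printed = "".join(r[:-1] if r.endswith(CONT_MARK) else r for r in rows)
    out: list[str] = []
    i = 0
    while i < len(printed):
        ch = printed[i]
        i += 1
        if ch == TAB_MARK:
            while i < len(printed) and printed[i] == " ":
                i += 1                    # skip the alignment padding
            out.append("\t")
        elif ch == SPACE_MARK:
            out.append(" ")
        elif ch == FF_MARK:
            out.append("\f")
        else:
            out.append(ch)
    return "".join(out)
```

Because every doubled space and every tab is printed with a visible mark, `decode_rows(encode_line(s))` returns `s` for ASCII input under these assumptions, which is the property the OCR workflow depends on.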
@@ -264,16 +264,16 @@ to train on, use that.

Other things that need training:

· ~ (tilde), ^ (caret), ` (backquote) and ' (quote). These get dropped
  frequently unless you train them.
· i, j and ; (semicolon). These get mixed up.
· 3 and S. These also get mixed up.
· Q can fail to be recognized.
· C and [ can be confused.
· c/C, o/O, p/P, s/S, u/U, v/V, w/W, y/Y and z/Z are often confused. This
  can be helped by some training.
· r gets confused with c and n. I don't understand the confusion with c,
  but it happens.
· f gets confused with i.

The OCR training pages have lots of useful examples of troublesome
characters. Scan a few pages of material, training each page, then scan a
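
One way to decide which of the characters listed above still need training is to compare a line whose correct text is known against what the OCR returned and count the mismatched pairs. The sketch below is a hypothetical helper, not part of the original tools; it assumes the two strings have the same length, so it cannot account for dropped or inserted characters.

```python
from collections import Counter


def tally_confusions(expected: str, recognized: str) -> Counter:
    """Count (expected, recognized) character pairs that differ."""
    if len(expected) != len(recognized):
        raise ValueError("strings must be the same length to compare cell by cell")
    return Counter((e, r) for e, r in zip(expected, recognized) if e != r)


# Example: 'r' read as 'c' and '3' read as 'S', two of the mix-ups noted above.
confusions = tally_confusions("for (i = 3;", "foc (i = S;")
for (want, got), count in confusions.most_common():
    print(f"{want!r} read as {got!r}: {count}")
```

Pairs that keep showing up across several scanned pages are the ones worth adding to the training file first.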