From 1c3623d7195d339e00ede92b112ad45a1f4b36e1 Mon Sep 17 00:00:00 2001
From: rnhmjoj
Date: Wed, 15 May 2019 18:20:12 +0200
Subject: [PATCH] use UTF-8 encoding

---
 README | 54 +++++++++++++++++++++++++++---------------------------
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/README b/README
index 8abf4a8..239659c 100644
--- a/README
+++ b/README
@@ -25,7 +25,7 @@ cryptographic software is subject to U.S. export control laws and
 regulations. The new 1997 Commerce Department Export Administration
 Regulations (EAR) explicitly provide that "A printed book or other printed
 material setting forth encryption source code is not itself subject to the
-EAR." (see 15 C.F.R. §734.3(b)(2)). PGP, in an overabundance of caution,
+EAR." (see 15 C.F.R. §734.3(b)(2)). PGP, in an overabundance of caution,
 has only made available its source code in a form that is not subject to
 those regulations. So, books containing cryptographic source code may be
 published, and after they are published they may be exported, but only
@@ -167,24 +167,24 @@ The first step to getting OrnniPage 7 to work well is to set it up with
 options to disable all of its more advanced features for preserving font
 changes and formatting. Look in the Seffings menu.
 
-· Create a Zone Contents File with all of ASCII in it, plus the extra
+· Create a Zone Contents File with all of ASCII in it, plus the extra
   bullet, currency, yen and pilcrow symbols. Name it "Source Code".
-· Create a Source Code style set. Within it, create a Source Code zone style
+· Create a Source Code style set. Within it, create a Source Code zone style
   and make it the default.
-· Set the font to something fixed-width, like Courier.
-· Set a fixed font size (10 point) and plain text, left-aligned.
-· Set the tab character to a space.
-· Set the text flow to hard line returns.
-· Set the margins to their widest.
-· The font mapping options are irrelevant.
+· Set the font to something fixed-width, like Courier.
+· Set a fixed font size (10 point) and plain text, left-aligned.
+· Set the tab character to a space.
+· Set the text flow to hard line returns.
+· Set the margins to their widest.
+· The font mapping options are irrelevant.
 
 Go to the settings panel and:
 
-· Under Scanner, set the brightness to manual. With careful setting of the
+· Under Scanner, set the brightness to manual. With careful setting of the
   threshold, this generates much better results than either the automatic
   threshold or the 3D OCR. Around 144 has been a good setting for us; you
   may want to start there.
-· Under OCR, you'll build a training file to use later, but turn off
+· Under OCR, you'll build a training file to use later, but turn off
   automatic page orientation and select your Source Code style set in the
   Output Options. Also set a reasonable reject character. (For test, we
   used the pi symbol, which came across from the Macintosh as a weird
@@ -228,26 +228,26 @@ specific Latin-1 characters to be processed.
 
 They characters most in need of training are as follows:
 
-· Zero is printed 'slashed.'
-· Lowercase L has a curled tail to distinguish it clearly from other
+· Zero is printed 'slashed.'
+· Lowercase L has a curled tail to distinguish it clearly from other
   vertical characters like 1 and I.
-· The or-bar or pipe symbol '|' is printed "broken" with a gap in the
+· The or-bar or pipe symbol '|' is printed "broken" with a gap in the
   middle to distinguish it similarly.
-· The underscore character has little "serifs" on the end to distinguish
+· The underscore character has little "serifs" on the end to distinguish
   it from a minus sign. We also raised it a just a tad higher than the
   normal underscore character, which was too low in the character cell
   to be reliably seen by OmniPage.
-· Tabs are printed as a hollow right-pointing triangle, followed by blanks
+· Tabs are printed as a hollow right-pointing triangle, followed by blanks
   to the correct alignment position. If not trained enough, OmniPage
   guesses this is a capital D. You should train OmniPage to recognize
   this symbol as a currency symbol (Latin-1 244).
-· Any spaces in the original that follow a space, or a blank on the printed
+· Any spaces in the original that follow a space, or a blank on the printed
   page, are printed as a tiny black triangle. You should train OmniPage
   to recognize this as a center dot or bullet (Latin-1 267). We didn't
   use a standard center dot because OmniPage confused it with a period.
-· Any form feeds in the original are printed as a yen currency symbol
+· Any form feeds in the original are printed as a yen currency symbol
   (Latin-1 245).
-· Lines over 80 columns long are broken after 79 columns by appending a big
+· Lines over 80 columns long are broken after 79 columns by appending a big
   ugly black block. You should train OmniPage to recognize this as a
   pilcrow (paragraph symbol, Latin-1 266). We did this because after
   deciding something black and visible was suitable, we found out the font
@@ -264,16 +264,16 @@ to train on, use that.
 
 Other things that need training:
 
-· ~ (tilde), ^ (caret), ` (backquote) and ' (quote). These get dropped
+· ~ (tilde), ^ (caret), ` (backquote) and ' (quote). These get dropped
   frequently unless you train them.
-· i, j and; (semicolon). These get mixed up.
-· 3 and S. These also get mixed up.
-· Q can fail to be recognized.
-· C and [ can be confused.
-· c/C, o/O, p/P, s/S, u/U, v/V, w/W, y/Y and z/Z are often confused. This
+· i, j and; (semicolon). These get mixed up.
+· 3 and S. These also get mixed up.
+· Q can fail to be recognized.
+· C and [ can be confused.
+· c/C, o/O, p/P, s/S, u/U, v/V, w/W, y/Y and z/Z are often confused. This
   can be helped by some training.
-· r gets confused with c and n. I don't understand c, but it happens.
-· f gets confused with i.
+· r gets confused with c and n. I don't understand c, but it happens.
+· f gets confused with i.
 
 The OCR training pages have lots of useful examples of troublesome
 characters. Scan a few pages of material, training each page, then scan a