initial commit
commit 60052b2f16
32
MANIFEST
Normal file
@@ -0,0 +1,32 @@
1 test-file
2 MANIFEST
D books/
D books/tools/
3 bootstrap
4 bootstrap2
5 sortpages
6 Makefile
7 heap.c
8 heap.h
9 mempool.c
10 mempool.h
11 util.c
12 util.h
13 repair.c
14 subst.c
15 subst.h
16 unmunge.c
17 munge.c
18 yapp.doc
19 yapp
20 psgen
21 makemanifest
D books/ps/
22 prolog.ps
23 charmap.ps
D books/example/
24 Makefile
25 .cvsignore
26 filelist
27 footer.ps
28 us-constitution.gz
477
README
Normal file
@@ -0,0 +1,477 @@
PREFACE
-------

This book grew out of a project to publish source code for cryptographic
software, namely PGP (Pretty Good Privacy), a software package for the
encryption of electronic mail and computer files. PGP is the most widely
used software in the world for email encryption. Pretty Good Privacy, Inc.
(or "PGP") has published the source code of PGP for peer review, a
long-standing tradition in the history of PGP. The first time a fully
implemented cryptographic software package was published in its entirety in
book form was "PGP Source Code and Internals," by Philip Zimmermann,
published by The MIT Press, 1995, ISBN 0-262-24039-4.

Peer review of the source code is important to get users to trust the
software, since any weaknesses can be detected by knowledgeable experts who
make the effort to review the code. But peer review cannot be completely
effective unless the experts conducting the review can compile and test the
software, and verify that it is the same as the software products that are
published electronically. To facilitate that, PGP publishes its source code
in printed form that can be scanned into a computer via OCR (optical
character recognition) technology.

Why not publish the source code in electronic form? As you may know,
cryptographic software is subject to U.S. export control laws and
regulations. The new 1997 Commerce Department Export Administration
Regulations (EAR) explicitly provide that "A printed book or other printed
material setting forth encryption source code is not itself subject to the
EAR." (see 15 C.F.R. §734.3(b)(2)). PGP, in an overabundance of caution,
has only made available its source code in a form that is not subject to
those regulations. So, books containing cryptographic source code may be
published, and after they are published they may be exported, but only
while they are still in printed form.

Electronic commerce on the Internet cannot fully be successful without
strong cryptography. Cryptography is important for protecting our privacy,
civil liberties, and the security of our personal and business transactions
in the information age. The widespread deployment of strong cryptography
can help us regain some of the privacy and security that we have lost due
to information technology. Further, strong cryptography (in the form of
PGP) has already proven itself to be a valuable tool for the protection of
human rights in oppressive countries around the world, by keeping those
governments from reading the communications of human rights workers.

This book of tools contains no cryptographic software of any kind, nor does
it call, connect, nor integrate in any way with cryptographic software. But
it does contain tools that make it easy to publish source code in book form.
And it makes it easy to scan such source code in with OCR software rapidly
and accurately.

Philip Zimmermann
prz@acm.org

November 1997



INTRODUCTION
------------

This book contains tools for printing computer source code on paper in
human-readable form and reconstructing it exactly using automated tools.
While standard OCR software can recover most of the graphic characters,
non-printing characters like tabs, spaces, newlines and form feeds cause
problems.

In fact, these tools can print any ASCII text file; it's just that the
attention these tools pay to spacing is particularly valuable for computer
source code. The two-dimensional indentation structure of source code is
very important to its comprehensibility. In some cases, distinctions
between non-printing characters are critical: the standard make utility
will not accept spaces where it expects to see a tab character.

Producing a byte-for-byte identical copy of the original is also valuable
for authentication, as you can verify a checksum.

There are five problems we have addressed:

1. Getting good OCR accuracy.
2. Preserving whitespace.
3. Preserving lines longer than can be printed on the page.
4. Dealing with data that isn't human-readable.
5. Detecting and correcting any residual errors.

The first problem is partly addressed by using a font designed for OCR
purposes, OCR-B. OCR-A is a very ugly font that contains only the digits 0
through 9 and a few special punctuation symbols. OCR-B is a very readable
monospaced font that contains a full ASCII set, and has been popular as a
font on line printers for years because it distinguishes ambiguous
characters and is clear even if fuzzy or distorted.

The most unusual thing about the OCR-B font is the way that it prints a
lower-case letter l, with a small hook on the bottom, something like an
upper-case L. This is to distinguish it from the numeral 1. We also made
some modifications to the font, to print the numeral 0 with a slash, and
to print the vertical bar in a broken form. Both of these are such common
variants that they should not present any intelligibility barrier. Finally,
we print the underscore character in a distinct manner that is hopefully
not visually distracting, but is clearly distinguishable from the minus
sign even in the absence of a baseline reference.

The most significant part of getting good OCR accuracy is, however, using
the OCR tools well. We've done a lot of testing and experimentation and
present here a lot of information on what works and what doesn't.

To preserve whitespace, we added some special symbols to display spaces,
tabs, and form feeds. A space is printed as a small triangular dot
character, while a hollow rightward-pointing triangle (followed by blank
spaces to the right tab stop) signifies a tab. A form feed is printed as
a yen symbol, and the printed line is broken after the form feed.

Making the dot triangular instead of square helps distinguish it from a
period. To reduce the clutter on the page and make the text more readable,
the space character is only printed as a small dot if it follows a blank
on the page (a tab or another space), or comes immediately before the end
of the line. Thus, the reader (human or software) must be able to
distinguish one space from no spaces, but can find multiple spaces by
counting the dots (and adding one).
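
The dot rule fits in a few lines of code. The Python sketch below is only an
illustration of the rule as stated here, not the munge program itself; the
name show_spaces and the SPACE_DOT stand-in glyph are assumptions made for
the example.

```python
SPACE_DOT = "\u00b7"  # stand-in for the small triangular dot (assumption)


def show_spaces(line):
    """Mark only the spaces a reader could not otherwise count."""
    out = []
    for i, ch in enumerate(line):
        if ch == " ":
            after_blank = i > 0 and line[i - 1] in " \t"
            at_end = i == len(line) - 1
            # A lone space between two printing characters stays blank.
            out.append(SPACE_DOT if after_blank or at_end else " ")
        else:
            out.append(ch)
    return "".join(out)
```

With this sketch, three spaces between "a" and "b" come out as one blank
followed by two dots: count the dots and add one.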

The format is designed so that 80 characters, plus checksums, can be
printed on one line of an 8.5x11" (or A4) page, the still-common punched
card line length. Longer lines are managed with the simple technique of
appending a big ugly black blob to the first part of the line indicating
that the next printed line should be concatenated with the current one
with no intervening newline. Hopefully, its use is infrequent.
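
A rough Python sketch of the split-and-rejoin idea follows. It assumes the
break happens after 79 columns (as the training list in the SCANNING chapter
says), uses a pilcrow as a stand-in for the blob, and assumes the input is
plain ASCII so the marker never occurs in real text. The function names are
hypothetical.

```python
CONT = "\u00b6"  # pilcrow standing in for the black blob (assumption)
WIDTH = 80       # longest line printed without a break


def split_long(line):
    """Break one source line into printed pieces of at most WIDTH columns."""
    pieces = []
    while len(line) > WIDTH:
        pieces.append(line[:WIDTH - 1] + CONT)  # 79 columns plus the marker
        line = line[WIDTH - 1:]
    pieces.append(line)
    return pieces


def join_printed(printed):
    """Undo split_long: glue marked pieces to the piece that follows."""
    out, pending = [], ""
    for piece in printed:
        if piece.endswith(CONT):
            pending += piece[:-1]
        else:
            out.append(pending + piece)
            pending = ""
    if pending:  # dangling continuation at end of input
        out.append(pending)
    return out
```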

While ASCII text is by far the most popular form, some source code is not
readable in the usual way. It may be an audio clip, a graphic image bitmap,
or something else that is manipulated with a specialized editing tool. For
printing purposes, these tools just print any such files as a long string
of gibberish in a 64-character set designed to be easy to OCR unambiguously.
Although the tools recognize such binary data and apply extra consistency
checks, that can be considered a separate step.
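
To give a feel for what a 64-character encoding does, here is a small Python
sketch using the standard base64 alphabet. That alphabet is a swapped-in
stand-in: the character set the tools actually print is not given in this
chapter, and the line width of 64 is likewise an assumption.

```python
import base64


def to_radix64_lines(data, width=64):
    """Encode bytes as printable text, 6 bits per character, in short lines."""
    text = base64.b64encode(data).decode("ascii")
    return [text[i:i + width] for i in range(0, len(text), width)]


def from_radix64_lines(lines):
    """Rebuild the original bytes from the printed lines."""
    return base64.b64decode("".join(lines))
```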

Finally, the problem of residual errors arises. OCR software is not perfect,
and uses a variety of heuristics and spelling-check dictionaries to clean up
any residual errors in human-language text. This isn't reliable enough for
source code, so we have added per-page and per-line checksums to the printed
material, and a series of tools to use those checksums to correct any
remaining errors and convert the scanned text into a series of files again.

This "munged" form is what you see in most of the body of this book. We
think it does a good job of presenting source code in a way that can be read
easily by both humans and computers.

The tools are command-line oriented and a bit clunky. This has a purpose
beyond laziness on the authors' parts: it keeps them small. Keeping them
small makes the "bootstrapping" part of scanning this book easier, since you
don't have the tools to help you with that.



SCANNING
--------

Our tests were done with OmniPage 7.0 on a Power Macintosh 8500/120 and an
HP ScanJet 4c scanner with an automatic document feeder. The first part of
this is heavily OmniPage-specific, as that appears to be the most widely
available OCR software.

The tools here were developed under Linux, and should be generally portable
to any Unix platform. Since this book is about printing and scanning source
code, we assume the readers have enough programming background to know how
to build a program from a Makefile, understand the hazards of CR, LF or CRLF
line endings, and such minor details without explicit mention.

The first step to getting OmniPage 7 to work well is to set it up with
options to disable all of its more advanced features for preserving font
changes and formatting. Look in the Settings menu.

· Create a Zone Contents File with all of ASCII in it, plus the extra
  bullet, currency, yen and pilcrow symbols. Name it "Source Code".
· Create a Source Code style set. Within it, create a Source Code zone style
  and make it the default.
· Set the font to something fixed-width, like Courier.
· Set a fixed font size (10 point) and plain text, left-aligned.
· Set the tab character to a space.
· Set the text flow to hard line returns.
· Set the margins to their widest.
· The font mapping options are irrelevant.

Go to the settings panel and:

· Under Scanner, set the brightness to manual. With careful setting of the
  threshold, this generates much better results than either the automatic
  threshold or the 3D OCR. Around 144 has been a good setting for us; you
  may want to start there.
· Under OCR, you'll build a training file to use later, but turn off
  automatic page orientation and select your Source Code style set in the
  Output Options. Also set a reasonable reject character. (For testing, we
  used the pi symbol, which came across from the Macintosh as a weird
  sequence, but you can use anything as long as you make the appropriate
  definition in subst.c.)

Do an initial scan of a few pages and create a manual zone encompassing
all of the text. Leave some margin for page misalignment, and leave space
on the sides for the left-right shift caused by the book binding being in
different places on odd and even pages.

Set the Zone Contents and the Style set to the Source Code settings. After
setting the Style Set, the Zone Style should be automatically set correctly
(since you set Source Code as the default).

Then save the Zone Template, and in the pop-up menu under the Zone step on
the main toolbar you can now select it.

Now we're ready to get characters recognized. The first results will be
terrible, with lots of red (unrecognizable) and green (suspicious) text in
the recognized window. Some tweaking will improve this enormously.

The first step is setting a good black threshold. Auto brightness sets the
threshold too low, making the character outlines bleed and picking up a lot
of glitches on mostly-blank pages. Try training OCR on the few pages you've
scanned and look at the representative characters. Adjust the threshold so
the strokes are clear and distinct, neither so thin they are broken nor so
thick they smear into each other. The character that bleeds worst is
lowercase w, while the underscore and tab symbols have the thinnest lines
that need worry.

You'll have to re-scan (you can just click the AUTO button) until you get
satisfactory results.

The next step is training. You should scan a significant number of pages
and teach OmniPage about any characters it has difficulty with. There are
several characters which have been printed in unusual ways which you must
teach OmniPage about before it can recognize them reliably. We also have
some characters that are unique, which the tools expect to be mapped to
specific Latin-1 characters to be processed.

The characters most in need of training are as follows:

· Zero is printed 'slashed.'
· Lowercase L has a curled tail to distinguish it clearly from other
  vertical characters like 1 and I.
· The or-bar or pipe symbol '|' is printed "broken" with a gap in the
  middle to distinguish it similarly.
· The underscore character has little "serifs" on the end to distinguish
  it from a minus sign. We also raised it just a tad higher than the
  normal underscore character, which was too low in the character cell to
  be reliably seen by OmniPage.
· Tabs are printed as a hollow right-pointing triangle, followed by blanks
  to the correct alignment position. If not trained enough, OmniPage
  guesses this is a capital D. You should train OmniPage to recognize this
  symbol as a currency symbol (Latin-1 octal 244).
· Any spaces in the original that follow a space, or a blank on the printed
  page, are printed as a tiny black triangle. You should train OmniPage to
  recognize this as a center dot or bullet (Latin-1 octal 267). We didn't
  use a standard center dot because OmniPage confused it with a period.
· Any form feeds in the original are printed as a yen currency symbol
  (Latin-1 octal 245).
· Lines over 80 columns long are broken after 79 columns by appending a big
  ugly black block. You should train OmniPage to recognize this as a
  pilcrow (paragraph symbol, Latin-1 octal 266). We did this because after
  deciding something black and visible was suitable, we found out the font
  we used doesn't have a pilcrow in it.
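
The four marker characters can be turned back into whitespace mechanically.
The Python sketch below shows one way to do it for a single scanned line
whose checksums have already been stripped. It is an illustration only:
decode_line is a hypothetical name, the codes are the octal values 244, 245,
266 and 267 listed above, and joining lines after a form feed is not handled.

```python
import re

TAB_MARK = "\xa4"    # currency sign, octal 244
FF_MARK = "\xa5"     # yen sign, octal 245
CONT_MARK = "\xb6"   # pilcrow, octal 266
SPACE_MARK = "\xb7"  # center dot, octal 267


def decode_line(printed):
    """Return (text, continued) for one scanned line."""
    continued = printed.endswith(CONT_MARK)
    if continued:
        printed = printed[:-1]
    # A tab marker plus the blanks padding it to the tab stop is one tab.
    # Real spaces after a tab are always printed as dots, so every plain
    # blank following the marker is padding.
    text = re.sub(TAB_MARK + " *", "\t", printed)
    text = text.replace(SPACE_MARK, " ").replace(FF_MARK, "\f")
    return text, continued
```

When continued is true, the caller should append the next decoded line
without a newline in between.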

The zero and the tab character, because of their frequency, deserve special
attention.

In addition, look for any unrecognized characters (in red) and retrain those
pages. If you get an unrecognized character, that character needs training,
but Caere says that "good examples" are best to train on, so if the training
doesn't recognize a slightly fuzzy K, and there's a nice crisp K available
to train on, use that.

Other things that need training:

· ~ (tilde), ^ (caret), ` (backquote) and ' (quote). These get dropped
  frequently unless you train them.
· i, j and ; (semicolon). These get mixed up.
· 3 and S. These also get mixed up.
· Q can fail to be recognized.
· C and [ can be confused.
· c/C, o/O, p/P, s/S, u/U, v/V, w/W, y/Y and z/Z are often confused. This
  can be helped by some training.
· r gets confused with c and n. I don't understand c, but it happens.
· f gets confused with i.

The OCR training pages have lots of useful examples of troublesome
characters. Scan a few pages of material, training each page, then scan a
few dozen pages and look for recognition problems. Look for what OmniPage
reports as troublesome, and when you have the repair program working, use
it to find and report further errors. Train a few pages particularly dense
in problems and append the troublesome characters to the training file, then
re-recognize the lot.

Double-check your training file for case errors. It's easy to miss the shift
key in the middle of a lot of training, and that produces terrible results
even though OmniPage won't report anything amiss. We have spent a while
wondering why OmniPage wasn't recognizing capital S or capital W, only to
find that OmniPage was just doing what it was trained to do.

We have heard some reports that OmniPage has problems with large training
files. We have observed OmniPage suffering repeatable internal errors
sometimes after massive training additions, but they were cured by deleting
a few training images. Appending more training images to the training file
did not cause the problem to re-appear.

Repairing the OCR results

If the only copy of the tools you have is printed in this book, see the next
chapter on bootstrapping at this point. Here, we assume that you have the
tools and they work.

When you have some reasonable OCR results, delete any directory pages. With
no checksum information, they just confuse the postprocessing tools. (The
tools will just stop with an error when they get to the "uncorrectable"
directory name and you'll have to delete it then, so it's not fatal if you
forget.) Copy the data to a machine that you have the repair and unmunge
utilities on.

The repair utility attempts automatic table-driven correction of common
scanning errors. You have to recompile it to change the tables, but are
encouraged to if you find a common problem that it does not correct reliably.
If it gets stuck, it will deposit you into your favorite editor on or
slightly after the offending line. (The file you will be editing is the
unprocessed portion of the input.) After you correct the problem and quit
the editor, repair will resume.
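
The idea of table-driven correction can be sketched in Python as follows.
This is not the code in repair.c: the confusion table is a small sample
taken from the training list above, the checksum is a stand-in (CRC-CCITT
from Python's binascii module), and try_repair is a hypothetical name. It
only tries single-character substitutions.

```python
import binascii

# Sample of commonly confused characters (from the training notes).
CONFUSABLE = {
    "3": "S", "S": "3", "C": "[", "[": "C",
    "r": "cn", "f": "i", "i": "j;", "j": "i;", ";": "ij",
}


def checksum(text):
    """Stand-in line checksum; the real tools may use something else."""
    return binascii.crc_hqx(text.encode("latin-1"), 0)


def try_repair(scanned, expected):
    """Return a line matching the printed checksum, or None if stuck."""
    if checksum(scanned) == expected:
        return scanned
    for i, ch in enumerate(scanned):
        for alt in CONFUSABLE.get(ch, ""):
            candidate = scanned[:i] + alt + scanned[i + 1:]
            if checksum(candidate) == expected:
                return candidate
    return None  # at this point the real tool would open the editor
```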

"Your favorite editor" is taken from the $VISUAL and $EDITOR environment
variables, or the -e option to repair.
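
In Python terms, the lookup might resemble the sketch below. The order of
precedence (the -e option first, then $VISUAL, then $EDITOR) and the "vi"
fallback are assumptions for illustration; the text above does not say which
source wins.

```python
import os


def pick_editor(option_e=None, environ=os.environ, fallback="vi"):
    """Choose an editor command from -e, $VISUAL, $EDITOR, or a fallback."""
    return (option_e
            or environ.get("VISUAL")
            or environ.get("EDITOR")
            or fallback)
```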

The repair utility never alters the original input file. It will produce
corrected output for file in file.out, and when it has to stop, it writes
any remaining uncorrected input back out to file.in (via a temporary
file.dump) and lets you edit this file. If you re-run repair on file and
file.in exists, repair will restart from there, so you may safely quit and
re-run repair as often as you like. (But if you change the input file, you
need to delete the .in file for repair to notice the change.)

Statistics on repair's work are printed to file.log. This is an excellent
place to look to see if any characters require more training.

As it works, repair prints the line it is working on. If you see it make a
mistake or get stuck, you can interrupt it (control-C or whatever is
appropriate), and it will immediately drop into the editor. If you interrupt
it a second time, it will exit rather than invoking the editor. If the
editor returns a non-zero result code (fails), repair will also stop. (E.g.
:cq in vim.)

One thing that repair fixes without the least trouble is the number of
spaces expected after a printing tab character. It's such an omnipresent OCR
software error that repair doesn't even log it as a correction.
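
Because the padding after a tab symbol is fully determined by the column it
sits in, it can simply be recomputed. A minimal Python sketch, assuming the
line checksums have been removed, that the tab symbol arrives as Latin-1
octal 244, and that tab stops fall every tab_width columns (8 by default
here, which is an assumption):

```python
TAB_MARK = "\xa4"  # the printed tab symbol as scanned (octal 244)


def fix_tab_padding(printed, tab_width=8):
    """Replace whatever blanks follow each tab symbol with the right count."""
    out = []
    col = 0
    i = 0
    while i < len(printed):
        ch = printed[i]
        out.append(ch)
        col += 1
        i += 1
        if ch == TAB_MARK:
            while i < len(printed) and printed[i] == " ":
                i += 1  # discard the blanks the OCR software produced
            pad = -col % tab_width  # blanks needed to reach the next stop
            out.append(" " * pad)
            col += pad
    return "".join(out)
```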

In some cases, repair can miscorrect a line and go on to the next line,
possibly even more than once, finally giving up a few lines below the actual
error. If you are having trouble spotting the error, one helpful trick is to
exit the editor and let repair try to fix the page again, but interrupt it
while it is still working on the first line, before it has found the
miscorrection.

The Nasty Lines

Some lines of code, particularly those containing long runs of underscore or
minus characters, are particularly difficult to scan reliably. The repair
program has a special "nasty lines" feature to deal with this. If a file
named "nastylines" (or as specified by the -l option) exists, its lines are
checksummed and are considered as total replacements for any input line with
the same checksum. So, for example, if you place a blank line in the
nastylines file, any scanner noise on blank lines will be ignored.
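
The lookup amounts to a table keyed by checksum. Here is a small Python
sketch of the idea; the checksum function is a stand-in (CRC-CCITT from
binascii, not necessarily what repair uses) and the function names are
hypothetical.

```python
import binascii


def checksum(text):
    """Stand-in line checksum; the real tools may use something else."""
    return binascii.crc_hqx(text.encode("latin-1"), 0)


def load_nasty(lines):
    """Index the known-good replacement lines by their checksum."""
    return {checksum(line): line for line in lines}


def apply_nasty(scanned, expected, nasty):
    """Swap in a nasty line when the printed checksum matches one."""
    return nasty.get(expected, scanned)
```

Note that the scanned text itself is never inspected: only the checksum
printed beside the line decides whether a replacement is used.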

The "nastylines" file is re-read every time repair restarts after an edit,
so you can add more lines as the program runs. (The error-correction patterns
should be done this way, too, but that'll have to wait for the next release.)

Sortpages

If, in the course of scanning, the pages have been split up or have gotten
out of order, a perl script called sortpages can restore them to the proper
order. It can merge multiple input files, discard duplicates, and warn about
any missing pages it encounters. This script requires that the pages have
been repaired, so that the page headers can be read reliably. The repair
program does not care about the order it works on pages in; it examines each
page independently. Unmunge, however, does need the pages in order.
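
The merge itself is simple once page numbers are known. The Python sketch
below assumes each page has already been reduced to a (page_number, text)
pair and that pages are numbered from 1; how sortpages actually reads the
page headers is not shown here, and sort_pages is a hypothetical name.

```python
def sort_pages(*batches):
    """Merge batches of (page_number, text); return (ordered, missing)."""
    pages = {}
    for batch in batches:
        for number, text in batch:
            pages.setdefault(number, text)  # keep first copy, drop duplicates
    ordered = [pages[n] for n in sorted(pages)]
    last = max(pages, default=0)
    missing = [n for n in range(1, last + 1) if n not in pages]
    return ordered, missing
```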

Unmunging

After repair has finished its work, the unmunge program strips out the
checksums and, based on the page headers, divides the data up among various
files. Its first argument is the file to unpack. The optional second argument
is a manifest file that lists all of the files and the directories they go
in. Supplying this (an excellent idea) lets unmunge recreate a directory
hierarchy and warn about missing files.
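
Judging from the MANIFEST file in this commit, a manifest line is either
"D path/" or "number name". The Python sketch below reads such a file under
the assumption (inferred from the sample, not documented here) that each D
line sets the directory for the numbered entries that follow it.
parse_manifest is a hypothetical name.

```python
def parse_manifest(lines):
    """Map file numbers to full paths, e.g. 3 -> 'books/tools/bootstrap'."""
    files = {}
    directory = ""
    for line in lines:
        tag, _, name = line.strip().partition(" ")
        if not tag:
            continue  # skip blank lines
        if tag == "D":
            directory = name  # applies to the entries that follow
        else:
            files[int(tag)] = directory + name
    return files
```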

When you have unmunged everything and reconstructed the original source code,
you are done. Unmunge verifies all of the checksums independently of repair,
as a sanity check, and you can have high confidence that the files are
exactly the same as the originals that were printed.



BOOTSTRAPPING
-------------

There's a problem using the postprocessing tools to correct OCR errors, when
the code being OCRed is the tools themselves. We've tried to provide a
reasonably easy way to get the system up and running starting from nothing
but a copy of OmniPage.

You could just scan all of the tools in, correct any errors by hand, delete
the error-checking information in a text editor, and compile them. But
finding all the errors by hand is painful in a body of code that large.
With the aid of perl (version 5), which provides a lot of power in very
little code, we have provided some utilities to make this process easier.

The first-stage bootstrap is a one-page perl script designed to be as small
and simple as possible, because you'll have to hand-correct it. It can verify
the checksums on each line, and drop you into the editor on any lines where
an error has occurred. It also knows how to strip out the visible spaces and
tabs, how to correct spacing errors after visible tab characters, and how to
invoke an editor on the erroneous line.

Scan in the first-stage bootstrap as carefully as possible, using OmniPage's
warnings to guide you to any errors, and either use a text editor or the
one-line perl command at the top of the file to remove the checksums and
convert any funny printed characters to whitespace form.

The first thing to do is try running it on itself, and correct any errors you
find this way. Note that the script writes its output to the file named in
the page header, so you should name your hand-corrected version differently
(or put it in a different directory) to avoid having it overwritten.

The second-stage bootstrap is a much denser one-pager, with better error
detection; it can detect missing lines and missing pages, and takes an
optional second argument of a manifest file which it can use to put files
in their proper directories. It's not strictly necessary, but it's only one
more (dense) page and you can check it against itself and the original
bootstrap.

Both of the bootstrap utilities can correct tab spacing errors in the OCR
output. Although this doesn't matter in most source code, it is included
in the checksums.

Once you have reached this point, you can scan in the C code for repair and
unmunge. The C unmunge is actually less friendly than the bootstrap
utilities, because it is only intended to work with the output of repair.
It is, however, much faster, since computing CRCs a bit at a time in an
interpreted language is painfully slow for large amounts of data. It can
also deal with binary files printed in radix-64.
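
To show why bit-at-a-time CRCs are slow, here is a Python sketch of a 16-bit
CRC done both ways: one shift per bit, and one table lookup per byte. The
polynomial (CRC-CCITT, 0x1021, starting value 0) is chosen only because
Python's binascii module can cross-check it; the CRC the tools actually use
is not specified in this chapter.

```python
import binascii

POLY = 0x1021  # CRC-CCITT polynomial (an assumption, see above)


def crc16_bitwise(data, crc=0):
    """Eight shift-and-test steps per byte: small code, slow to run."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


# 256-entry table: the CRC of every possible single byte.
TABLE = [crc16_bitwise(bytes([b])) for b in range(256)]


def crc16_table(data, crc=0):
    """One table lookup per byte: the usual fast approach."""
    for byte in data:
        crc = ((crc << 8) & 0xFFFF) ^ TABLE[(crc >> 8) ^ byte]
    return crc
```

Both functions give the same answer as binascii.crc_hqx(data, 0); the table
version simply does an eighth as many loop iterations.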



PRINTING
--------

Despite the title of this book, this process of producing a book is not well
documented, since it's been evolving up to the moment of publication. There
is, however, a very useful working example of how to produce a book
(strikingly similar to this book) in the example directory, all controlled
by a Makefile.

Briefly, a master perl script called psgen takes three parameters: a file
list, a page numbers file to write to, and a volume number (which should
always be 1 for a one-volume book). It runs the listed files through the
munge utility, wraps them in some simple PostScript, and prepends a prolog
that defines the special characters and PostScript functions needed by the
text.

The file list also includes per-file flags. The most important is the
text/binary marker. Text files can also have a tab width specified, although
munge knows how to read Emacs-style tab width settings from the end of a
source file.
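
The example filelist in this commit uses lines such as "T path", "T4 path"
and "B path". The Python sketch below reads them on the assumption that T
means text, B means binary, and a number after T is the tab width. The
default width of 8 is a guess, and D and V lines are skipped because their
exact meaning is not described here. parse_filelist is a hypothetical name.

```python
import re


def parse_filelist(lines, default_tab=8):
    """Return one dict per file entry with its path and flags."""
    entries = []
    for line in lines:
        m = re.match(r"([TB])(\d*)\s+(\S+)$", line.strip())
        if not m:
            continue  # D, V and anything else is not interpreted here
        kind, width, path = m.groups()
        binary = kind == "B"
        entries.append({
            "path": path,
            "binary": binary,
            "tab_width": None if binary else int(width or default_tab),
        })
    return entries
```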

The prolog is assembled from various other files and defines by psgen using
a simple preprocessor called yapp (Yet Another Preprocessor). This process
includes some book-specific information like the page footer.

Producing the final PostScript requires the necessary non-standard fonts
(Futura for the footers and OCRB for the code) and the psutils package,
which provides the includeres utility used to embed the fonts in the
PostScript file. The fonts should go in the books/ps directory, as
"Futura.pfa" and the like.

The pagenums file can be used to produce a table of contents. For this book,
we generated the front matter (such as this chapter) separately, told psgen
to start on the next page after this, and concatenated the resultant
PostScript files for printing. The only trick was making the page footers
look identical.
3
example/.cvsignore
Normal file
@@ -0,0 +1,3 @@
pagenums
MANIFEST
code.ps
23
example/Makefile
Normal file
@@ -0,0 +1,23 @@
BOOKROOT=..
TOOLSDIR=$(BOOKROOT)/tools
PSDIR=$(BOOKROOT)/ps
YAPP=$(TOOLSDIR)/yapp
MAKEMANIFEST=$(TOOLSDIR)/makemanifest
PSGEN=BOOKROOT=$(BOOKROOT) $(TOOLSDIR)/psgen
INCLUDERES=(cd $(PSDIR); includeres)

code.ps pagenums: filelist footer.ps MANIFEST books
	$(PSGEN) -P2 -l3 -DfooterFile=footer.ps filelist pagenums 1 \
		| $(INCLUDERES) > code.ps

books:
	ln -s $(BOOKROOT) books

MANIFEST: filelist
	$(MAKEMANIFEST) $< > $@

clean:
	rm -f `cat .cvsignore`

gv%: %.ps
	gv $<
32
example/filelist
Normal file
@@ -0,0 +1,32 @@
V 1 8
T MANIFEST
D books/
D books/tools/
T books/tools/bootstrap
T books/tools/bootstrap2
T4 books/tools/sortpages
T books/tools/Makefile
T books/tools/heap.c
T books/tools/heap.h
T books/tools/mempool.c
T books/tools/mempool.h
T books/tools/util.c
T books/tools/util.h
T books/tools/repair.c
T books/tools/subst.c
T books/tools/subst.h
T books/tools/unmunge.c
T books/tools/munge.c
T books/tools/yapp.doc
T4 books/tools/yapp
T4 books/tools/psgen
T4 books/tools/makemanifest
D books/ps/
T books/ps/prolog.ps
T books/ps/charmap.ps
D books/example/
T books/example/Makefile
T books/example/.cvsignore
T books/example/filelist
T books/example/footer.ps
B books/example/us-constitution.gz
5
example/footer.ps
Normal file
@ -0,0 +1,5 @@
% A program to print the page footer, using the magic P function,
% which takes a string and a font.
(Tools for Publishing Source Code via OCR ) /Futura P
(\343) /Symbol P % Copyright symbol
( 1997 Pretty Good Privacy, Inc.) /Futura P
BIN
example/us-constitution.gz
Normal file
Binary file not shown.
68
ps/charmap.ps
Normal file
@ -0,0 +1,68 @@
%%BeginResource: procset Latin1-vec 0 0
/Latin1-vec [
	/.notdef /.notdef /.notdef /.notdef
	/.notdef /.notdef /.notdef /.notdef
	/.notdef /.notdef /.notdef /.notdef
	/.notdef /.notdef /.notdef /.notdef
	/.notdef /.notdef /.notdef /.notdef
	/.notdef /.notdef /.notdef /.notdef
	/.notdef /.notdef /.notdef /.notdef
	/.notdef /.notdef /.notdef /.notdef
	/space /exclam /quotedbl /numbersign
	/dollar /percent /ampersand /${rightQuoteGlyph}
	/parenleft /parenright /asterisk /plus
	/comma /hyphen /period /slash
	/${zeroGlyph} /one /two /three
	/four /five /six /seven
	/eight /nine /colon /semicolon
	/less /equal /greater /question
	/at /A /B /C
	/D /E /F /G
	/H /I /J /K
	/L /M /N /O
	/P /Q /R /S
	/T /U /V /W
	/X /Y /Z /bracketleft
	/backslash /bracketright /asciicircum /${underscoreGlyph}
	/${leftQuoteGlyph} /a /b /c
	/d /e /f /g
	/h /i /j /k
	/l /m /n /o
	/p /q /r /s
	/t /u /v /w
	/x /y /z /braceleft
	/${barGlyph} /braceright /tilde /.notdef
	/.notdef /.notdef /.notdef /.notdef
	/.notdef /.notdef /.notdef /.notdef
	/.notdef /.notdef /.notdef /.notdef
	/.notdef /.notdef /.notdef /.notdef
	/.notdef /.notdef /.notdef /.notdef
	/.notdef /.notdef /.notdef /.notdef
	/.notdef /.notdef /.notdef /.notdef
	/.notdef /.notdef /.notdef /.notdef
	/space /exclamdown /cent /sterling
	/${tabGlyph} /yen /brokenbar /section
	/dieresis /copyright /ordfeminine /guillemotleft
	/logicalnot /hyphen /registered /macron
	/degree /plusminus /twosuperior /threesuperior
	/acute /mu /${pilcrowGlyph} /${bulletGlyph}
	/cedilla /dotlessi /ordmasculine /guillemotright
	/onequarter /onehalf /threequarters /questiondown
	/Agrave /Aacute /Acircumflex /Atilde
	/Adieresis /Aring /AE /Ccedilla
	/Egrave /Eacute /Ecircumflex /Edieresis
	/Igrave /Iacute /Icircumflex /Idieresis
	/Eth /Ntilde /Ograve /Oacute
	/Ocircumflex /Otilde /Odieresis /multiply
	/Oslash /Ugrave /Uacute /Ucircumflex
	/Udieresis /Yacute /Thorn /germandbls
	/agrave /aacute /acircumflex /atilde
	/adieresis /aring /ae /ccedilla
	/egrave /eacute /ecircumflex /edieresis
	/igrave /iacute /icircumflex /idieresis
	/eth /ntilde /ograve /oacute
	/ocircumflex /otilde /odieresis /divide
	/oslash /ugrave /uacute /ucircumflex
	/udieresis /yacute /thorn /ydieresis
]def
%%EndResource
306
ps/prolog.ps
Normal file
@ -0,0 +1,306 @@
##set pageNumFont="Futura"
##set dirNameFont="Futura-Heavy"
##set fontsNeeded="${font} Symbol Futura Futura-Heavy"
##set includeFontComments=<<"END"
%%IncludeResource: font ${font}
%%IncludeResource: font Symbol
%%IncludeResource: font Futura
%%IncludeResource: font Futura-Heavy
END
##if ${font} eq Courier
##set charShrinkFactor=0.93
##set zeroGlyph=Oslash
##set underscoreGlyph=underscore
##set bulletGlyph=bullet
##set tabGlyph=currency
##set leftQuoteGlyph=quoteleft
##set rightQuoteGlyph=quoteright
##set pilcrowGlyph=paragraph
##set barGlyph=bar
##else
##set charShrinkFactor=1
##set zeroGlyph=Oslash
##set underscoreGlyph=underscore2
##set bulletGlyph=bullet2
##set tabGlyph=tabsym
##set leftQuoteGlyph=grave
##set rightQuoteGlyph=quoteright ### was "acute"
##set pilcrowGlyph=erase
##set barGlyph=orsym
##set do_custom_chars=1
##endif
%!PS-Adobe-3.0
%%Orientation: Portrait
%%Pages: (atend)
%%DocumentNeededResources: font ${fontsNeeded}
%%DocumentMedia: Letter 612 792 74 white ()
%%EndComments
%%BeginDefaults
%%PageMedia: Letter
%%PageResources: font ${fontsNeeded}
%%EndDefaults
%%BeginProlog
%%BeginResource: procset Custom-Preamble 0 0
%
% Document definitions
% (Upper case to avoid collisions)
%

% 8.5x11 paper is 612x792 points, but 24 points near the edge or so
% shouldn't be used.
/Topmargin 770 def
/Leftmargin 30 def
/Rightmargin 612 Leftmargin sub def
/Botmargin 22 def
/Bindoffset 40 def

/Lineskip -10 def
% How much to shrink characters by?
/Factor ${charShrinkFactor} def
/Fontsize 9.5 Factor mul def
% (1000 units is std height, so Courier at 6/10 aspect ratio is 600.
% Widen to make up for scaling loss.
/Charwidth
  Rightmargin Leftmargin sub Bindoffset sub 87 div Fontsize div 1000 mul
def

% Print a header (expects page number on stack)
/OddPageStart
{ save exch /MyFont findfont Fontsize scalefont setfont
  /CurrentLeft Leftmargin Bindoffset add def
  /CurrentRight Rightmargin def
  CurrentLeft Topmargin moveto } def

/EvenPageStart
{ save exch /MyFont findfont Fontsize scalefont setfont
  /CurrentLeft Leftmargin def
  /CurrentRight Rightmargin Bindoffset sub def
  CurrentLeft Topmargin moveto } def

% /MyFont findfont [Fontsize 0 0 Fontsize 0 0] makefont setfont

% Print the name of the directory in a large font
/DirPage
{
  /${dirNameFont} findfont 14 scalefont setfont
  0 -10 rmoveto (Directory) show
  CurrentLeft 30 add currentpoint exch pop 20 sub moveto show
} def

% Advance a line
/L {show CurrentLeft currentpoint exch pop Lineskip add moveto} bind def

% Print the "inside" footer line using P (string font => )
% We do some magic involving redefining P to first measure the
% width of this string and then print it, so you must use it
% to do all printing.
/Foot {
##ifdef footerFile
##include "${footerFile}"
##endif
} def

% /P is defined in the Setup section

% Print an odd footer
/OddPageEnd
{ CurrentLeft Botmargin moveto CurrentRight Botmargin lineto
  1 setlinewidth stroke
  CurrentLeft Botmargin 10 sub moveto
  Foot
  10 string cvs dup stringwidth
  pop CurrentRight exch sub currentpoint exch pop moveto
  /${pageNumFont} P
  showpage
  restore
} def

% Print an even footer
/EvenPageEnd
{ CurrentLeft Botmargin moveto CurrentRight Botmargin lineto
  1 setlinewidth stroke
  Leftmargin Botmargin 10 sub moveto
  /${pageNumFont} P
  CurrentRight FootWidth sub currentpoint exch pop moveto
  Foot
  showpage
  restore
} def

##ifdef do_custom_chars
% A 1000-point OCRB discunderline consists of:
% 111.45 -173.688 moveto
% 609.356 -173.688 lineto
% 609.356 -70.9227 lineto
% 111.45 -70.9227 lineto
% closepath
% 720.0 -0.0 moveto
% Line thickness is
% 102.7653 pts.

% This would suggest the following values:
/underleft 111.45 def
/underright 609.356 def
/underthick 102.7653 def
/underup underthick def
/underdown 0 def
/underserif 25 def

% These look better in GhostScript, but not on a real Adobe rasterizer
%/underright 600 def
%/underleft 100 def
%/underthick 75 def

% The default bullet below is 211 units wide and 171 high (area 36081).
% The default bullet character is
% 254.0 341.0 moveto
% 254.0 170.0 lineto
% 465.0 170.0 lineto
% 465.0 341.0 lineto
% closepath
% Our modified version is based on:
/bullwid 204 def
/bullht 176.75 def
/bullleft 254 341 add bullwid sub 2 div def
/bullright 254 341 add bullwid add 2 div def
/bullbot 254 def
/bulltop bullbot bullht add def

% And a custom-created tab symbol
/tableft 250 def
/tabright 550 def
/tabtop 550 def
/tabbot 50 def
/tablinewidth 35 def

% Let's try a vertical bar
% OCRB defines (|)
% 411.062 -173.688 moveto
% 411.062 741.043 lineto
% 308.297 741.043 lineto
% 308.297 -173.688 lineto
% closepath
% 720.0 -0.0 moveto
/orleft 308.297 def
/orright 411.062 def
/orbot -173.688 def
/ortop 741.043 def
/orbreak 150 def % Width of break
/orbbot ortop orbot add orbreak sub 2 div def % Bottom of break
/orbtop ortop orbot add orbreak add 2 div def % Top of break
##endif

% newfontname encoding-vec fontname -> - make a new encoded font
/MF2 {
  % Make a dict for the new font, with room for the /Metrics
  findfont dup length 1 add dict begin
  % Copy everything except the FID entry
  {1 index /FID eq {pop pop} {def} ifelse} forall
  % Set the encoding vector
  /Encoding exch def

##ifdef do_custom_chars
  % Create a new expanded CharStrings dictionary
  CharStrings dup length 5 add dict
  begin { def } forall
  % Create a custom underscore character
  /underscore2 {
    pop
    //Charwidth 0 % width, bounding box follows
    //underleft //underdown neg //underright //underthick //underup add
    setcachedevice
    //underleft //underthick //underup add moveto
    //underleft //underserif add //underthick //underup add lineto
    //underleft //underserif add //underthick lineto
    //underright //underserif sub //underthick lineto
    //underright //underserif sub //underthick //underup add lineto
    //underright //underthick //underup add lineto
    //underright //underdown neg lineto
    //underright //underserif sub //underdown neg lineto
    //underright //underserif sub 0 lineto
    //underleft //underserif add 0 lineto
    //underleft //underserif add //underdown neg lineto
    //underleft //underdown neg lineto
    closepath fill
  } bind def
  % Create a custom bullet character.
  /bullet2 {
    pop
    //Charwidth 0 % width, bounding box follows
    //bullleft //bullbot //bullright //bulltop
    setcachedevice
    //bullleft //bullbot moveto
    //bullleft bullright add 2 div bulltop lineto
    //bullright //bullbot lineto
    closepath fill
  } bind def
  % Create a custom tab character.
  /tabsym {
    pop
    //Charwidth 0 % width, bounding box follows
    //tableft //tablinewidth sub //tabbot //tablinewidth sub
    //tabright //tablinewidth add //tabtop //tablinewidth add
    setcachedevice
    //tablinewidth setlinewidth
    true setstrokeadjust
    0 setlinejoin
    //tableft //tabbot moveto
    //tabright //tabtop //tabbot add 2 div lineto
    //tableft //tabtop lineto
    closepath stroke
  } bind def
  /orsym {
    pop
    //Charwidth 0 % width, bounding box follows
    //orleft //orbot //orright //ortop
    setcachedevice
    //orleft //orbot moveto
    //orleft //orbbot lineto
    //orright //orbbot lineto
    //orright //orbot lineto
    closepath
    //orleft //ortop moveto
    //orleft //orbtop lineto
    //orright //orbtop lineto
    //orright //ortop lineto
    closepath fill
  } bind def
  /CharStrings currentdict end def
##endif

  % Create a new dict to be the /Metrics values
  CharStrings dup length dict
  % Now fill in the metrics dict with the desired width
  begin { pop Charwidth def } forall /Metrics currentdict end def
  % End of definitions
  currentdict end
  % Define the font
  definefont pop
} bind def

% Check PostScript language level.
/gs_languagelevel /languagelevel where { pop languagelevel } { 1 } ifelse def

%%EndResource
##include "charmap.ps"
${includeFontComments}
%%EndProlog


%%BeginSetup

/MyFont Latin1-vec /${font} MF2
/#copies 1 def

% Compute the width of the /Foot string, by defining P to
% add up the x-width of the characters.
/P { findfont 9 scalefont setfont stringwidth pop add } def
/FootWidth 0 Foot def
% Redefine P to print, as usual
/P { findfont 9 scalefont setfont show } def
%%BeginResource: procset foo 0 0
% This is an example
%%EndResource
%%EndSetup
30
tools/Makefile
Normal file
@ -0,0 +1,30 @@
all: unmunge repair munge

OPT = -g -O -W -Wall
COMMON_OBJS = util.o

UNMUNGE_OBJS = $(COMMON_OBJS) unmunge.o
MUNGE_OBJS = $(COMMON_OBJS) munge.o
REPAIR_OBJS = $(COMMON_OBJS) heap.o mempool.o subst.o repair.o

unmunge: $(UNMUNGE_OBJS)
	$(CC) $(OPT) -o $@ $(UNMUNGE_OBJS)

munge: $(MUNGE_OBJS)
	$(CC) $(OPT) -o $@ $(MUNGE_OBJS)

repair: $(REPAIR_OBJS)
	$(CC) $(OPT) -o $@ $(REPAIR_OBJS)

.c.o:
	$(CC) $(OPT) -o $@ -c $<

clean:
	-rm -f *.o munge unmunge repair core *.core

unmunge.o: util.h
munge.o: util.h
repair.o: heap.h mempool.h util.h subst.h
heap.o: heap.h
mempool.o: mempool.h
subst.o: subst.h
68
tools/bootstrap
Normal file
@ -0,0 +1,68 @@
#!/usr/bin/perl -s
#
# bootstrap -- Simpler version of unmunge for bootstrapping
#
# Unmunge this file using:
# perl -ne 'if (s/^ *[^-\s]\S{4,6} ?//) { s/[\244\245\267]/ /g; print; }'
#
# $Id: bootstrap,v 1.15 1997/11/14 03:52:53 mhw Exp $

sub Fatal { print STDERR @_; exit(1); }
sub Max { my ($a, $b) = @_; ($a > $b) ? $a : $b; }
sub TabSkip { $tabWidth - 1 - (length($_[0]) % $tabWidth); }

($tab,$yen,$pilc,$cdot,$tmp1,$tmp2)=("\244","\245","\266","\267","\377","\376");
$editor = $ENV{'VISUAL'} || $ENV{'EDITOR'} || 'vi';
$inFile = $ARGV[0];
doFile: {
  open(IN, "<$inFile") || die;
  for ($lineNum = 1; ($_ = <IN>); $lineNum++) {
    s/^\s+//; s/\s+$//; # Strip leading and trailing spaces
    next if (/^$/); # Ignore blank lines
    ($prefix, $seenCRCStr, $dummy, $_) = /^(\S{2})(\S{4})( (.*))?/;

    # Correct the number of spaces after each tab
    while (s/$tab( *)/$tmp1 . ($tmp2 x &Max(length($1), &TabSkip($`)))/e) {}
    s/ ( +)/" " . ($cdot x length($1))/eg; # Correct center dots
    s/$tmp1/$tab/g; s/$tmp2/ /g; # Restore tabs and spaces from correction
    s/\s*$/\n/; # Strip trailing spaces, and add a newline

    $crc = $seenCRC = 0; # Calculate CRC
    for ($data = $_; $data ne ""; $data = substr($data, 1)) {
      $crc ^= ord($data);
      for (1..8) {
        $crc = ($crc >> 1) ^ (($crc & 1) ? 0x8408 : 0);
      }
    }
    if ($crc != hex($seenCRCStr)) { # CRC mismatch
      close(IN); close(OUT);
      unlink(@filesCreated);
      @filesCreated = ();
      @oldStat = stat($inFile);
      system($editor, "+$lineNum", $inFile);
      @newStat = stat($inFile);
      redo doFile if ($oldStat[9] != $newStat[9]); # Check mod date
      &Fatal("Line $lineNum invalid: $_");
    }

    if ($prefix eq '--') { # Process header line
      ($code, $pageNum, $file) = /^(\S{19}) Page (\d+) of (.*)/;
      $tabWidth = hex(substr($code, 11, 1));
      if ($file ne $lastFile) {
        print "$file\n";
        &Fatal("$file: already exists\n") if (!$f && (-e $file));
        close(OUT);
        open(OUT, ">$file") || &Fatal("$file: $!\n");
        push(@filesCreated, ($lastFile = $file));
      }
    } else { # Unmunge normal line
      s/$tab( *)/"\t".(" " x (length($1) - &TabSkip($`)))/eg;
      s/$yen\n/\f/; # Handle form feeds
      s/$pilc\n//; # Handle continuation lines
      s/$cdot/ /g; # Center dots -> spaces

      print OUT;
    }
  }
  close(IN); close(OUT);
}
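The per-line check in bootstrap is a bit-at-a-time CRC: it starts from zero, shifts right, and XORs in 0x8408 whenever the low bit was set, with no final inversion. The result is compared with the four hex digits that follow the two-character line prefix. A minimal C sketch of the same loop follows; the name `line_crc16` is hypothetical and is not taken from the tools themselves.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the tools: the same bit-at-a-time
 * loop as the Perl in bootstrap (start value 0, polynomial 0x8408
 * applied on right shifts, no final inversion).
 */
static unsigned
line_crc16(const char *data, size_t len)
{
	unsigned crc = 0;
	size_t i;
	int bit;

	assert(data != NULL);
	for (i = 0; i < len; i++) {
		crc ^= (unsigned char)data[i];	/* fold in the next byte */
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1) ? 0x8408u : 0u);
	}
	return crc;	/* always fits in 16 bits */
}
```

Note that the Perl feeds the whole normalized line to the CRC, including the newline it appends, so a caller of this sketch would have to pass that newline too.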
72
tools/bootstrap2
Normal file
@ -0,0 +1,72 @@
#!/usr/bin/perl -s
#
# bootstrap2 -- Second stage bootstrapper, a version of unmunge
#
# $Id: bootstrap2,v 1.4 1997/11/14 03:52:54 mhw Exp $

sub Cleanup { close(IN); close(OUT); unlink(@files); @files = (); }
sub Fatal { &Cleanup(); print STDERR @_; exit(1); }
sub TabSkip { $tabWidth - 1 - (length($_[0]) % $tabWidth); }
sub TabFix { my ($needed, $actual) = (&TabSkip($_[0]), length($_[1]));
  $tmp1 . ($tmp2 x $needed) . (" " x ($actual - $needed)); }
sub HumanEdit { my ($file, $line, @message) = ($inFile, @_); &Cleanup();
  @old = stat($file); system($editor, "+$line", $file); @new = stat($file);
  redo doFile if ($old[9] != $new[9]); # Check mod date
  &Fatal("Line $line, ", @message); }

($tab,$yen,$pilc,$cdot,$tmp1,$tmp2)=("\244","\245","\266","\267","\377","\376");
$editor = $ENV{'VISUAL'} || $ENV{'EDITOR'} || 'vi';
($inFile, $manifest, @rest) = @ARGV;
if ($manifest ne "") { # Read manifest file
  open(MANIFEST, "<$manifest") || &Fatal("$manifest: $!\n");
  while (<MANIFEST>) { $dir = $1 if /^D\s+(.*)$/;
    $index[$1] = $dir . $2 if /^(\d+)\s+(.*)$/; }
}
doFile: {
  $seenPCRC = $pcrc1 = 0; $lastFlags = 1; $lastFileNum = 0;
  open(IN, "<$inFile") || &Fatal("$inFile: $!\n");
  for ($line = 1; ($_ = <IN>); $line++) {
    s/^\s+//; s/\s+$//; # Strip leading and trailing spaces
    next if (/^$/); # Ignore blank lines
    ($prefix, $seenCRCStr, $dummy, $_) = /^(\S{2})(\S{4})( (.*))?/;
    while (s/$tab( *)/&TabFix($`, $1)/eo) {} # Correct spaces after tabs
    s/($tmp2| )( +)/$1 . ($cdot x length($2))/ego; # Correct center dots
    s/$tmp1/$tab/go; s/$tmp2/ /go; # Restore tabs/spaces from correction
    s/\s*$/\n/; # Strip trailing spaces, and add a newline

    $crc = 0; $pcrc = $pcrc1; # Calculate CRCs
    for ($data = $_; $data ne ""; $data = substr($data, 1)) {
      $crc ^= ord($data); $pcrc1 ^= ord($data);
      for (1..8) { $crc = ($crc >> 1) ^ (($crc & 1) ? 0x8408 : 0);
        $pcrc1 = ($pcrc1 >> 1) ^ (($pcrc1 & 1) ? 0xedb88320 : 0); }
    }
    ($seenPLCRC, $seenCRC) = map { hex($_) } ($prefix, $seenCRCStr);
    &HumanEdit($line, "CRC failed: $_") if $crc != $seenCRC;
    if ($prefix eq '--') { # Process header line
      &HumanEdit($line - 1, "Page CRC failed") if $pcrc != $seenPCRC;
      ($humanHdr, $pageNum, $file) = /^\S{19} (Page (\d+) of (.*))/;
      ($vers, $flags, $seenPCRC, $tabWidth, $prodNum, $fileNum) =
        map { hex($_) } /^(\S)(\S\S)(\S{8})(\S)(\S{3})(\S{4})/;
      if ($fileNum != $lastFileNum) {
        print STDERR "MISSING files\n" if $fileNum != $lastFileNum + 1;
        &Fatal("Missing pages\n") if $pageNum != 1 || !($lastFlags & 1);
        if ($manifest ne "") {
          ($_ = $index[$fileNum]) =~ m%([^/]*)$%;
          &Fatal("Manifest mismatch\n") if ($file ne $1);
          ($file = $_) =~ s|/+|mkdir($`, 0777), "/"|eg; # mkdir -p
        }
        &Fatal("$file: already exists\n") if (!$f && (-e $file));
        close(OUT); open(OUT, ">$file") || &Fatal("$file: $!\n");
        push(@files, $file); print "$fileNum $file\n";
      } else {
        &Fatal("MISSING pages\n") if ($pageNum != $lastPageNum + 1);
      }
      ($lastFlags,$lastFileNum,$lastPageNum) = ($flags,$fileNum,$pageNum);
      $pcrc1 = 0;
    } else { # Unmunge normal line
      &HumanEdit($line, "CRC failed: $_") if ($pcrc1 >> 24) != $seenPLCRC;
      s/$tab( *)/"\t".(" " x (length($1) - &TabSkip($`)))/ego;
      s/$yen\n/\f/o; s/$pilc\n//o; s/$cdot/ /go; print OUT;
    }
  }
}
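Besides the per-line CRC, bootstrap2 keeps a second running CRC across all the lines of a page (`$pcrc1`, polynomial 0xedb88320, reset to zero after each header line). The two hex digits that prefix a normal line are compared with the top byte of that running value, and the eight hex digits in a page header are compared with the full value once the next header arrives. The C sketch below shows only the arithmetic; `page_crc_update` and `line_prefix` are hypothetical names, not functions from the tools.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: fold one line into the running page CRC, using
 * the same shift-and-XOR step as bootstrap2 (no initial or final
 * inversion, so this is not the usual "CRC-32" checksum value).
 */
static unsigned long
page_crc_update(unsigned long crc, const char *data, size_t len)
{
	size_t i;
	int bit;

	assert(data != NULL);
	for (i = 0; i < len; i++) {
		crc ^= (unsigned char)data[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1) ? 0xedb88320UL : 0UL);
	}
	return crc & 0xffffffffUL;
}

/* Hypothetical helper: the value expected in a normal line's prefix. */
static unsigned
line_prefix(unsigned long crc)
{
	return (unsigned)((crc >> 24) & 0xff);
}
```

Because the state is carried from line to line, feeding the lines one at a time gives the same result as feeding the whole page at once. That is what lets the script check each line's prefix as it goes.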
144
tools/heap.c
Normal file
@ -0,0 +1,144 @@
/*
 * heap.c -- Simple priority queue. Takes pointers to cost values
 * (presumably the first field in a larger structure) and returns
 * them in increasing order of cost.
 *
 * Copyright (C) 1997 Pretty Good Privacy, Inc.
 *
 * Written by Colin Plumb and Mark H. Weaver
 *
 * $Id: heap.c,v 1.2 1997/07/05 02:55:23 colin Exp $
 */

#include <stdio.h> /* For fprintf(stderr, "Out of memory") */
#include <stdlib.h> /* For malloc() & co. */

#include "heap.h"

#define HeapParent(i) ((i) / 2)
#define HeapLeftChild(i) ((i) * 2)
#define HeapRightChild(i) ((i) * 2 + 1)
#define HeapElem(h, i) (h)->elems[i]
#define HeapMinElem(h) HeapElem(h, 1)
#define HeapElemCost(e) (*(e))
#define HeapCost(h, i) HeapElemCost(HeapElem(h, i))
#define HeapSize(h) ((h)->numElems)

static void
SiftDown(Heap const *heap, HeapCost *e)
{
	HeapIndex size = HeapSize(heap), parent = 1, child;
	HeapCost cparent = HeapElemCost(e), cchild;

	for (;;) {
		child = 2*parent;
		if (child > size)
			break;
		cchild = HeapCost(heap, child);
		if (child < size && cchild > HeapCost(heap, child+1)) {
			cchild = HeapCost(heap, child+1);
			child++;
		}
		if (cparent <= cchild)
			break; /* Stop sifting down */
		HeapElem(heap, parent) = HeapElem(heap, child);
		parent = child;
	}
	HeapElem(heap, parent) = e;
}

/* Debug tool: verify heap property */
void
HeapVerify(Heap *heap)
{
	HeapIndex i;

	for (i = 2; i <= HeapSize(heap); i++)
		if (HeapCost(heap, i) < HeapCost(heap, HeapParent(i)))
			fprintf(stderr, "DEBUG: VerifyHeap failed at elem %u\n", i);
}

/* Remove and return the minimum cost from the heap. */
HeapCost *
HeapGetMin(Heap *heap)
{
	HeapIndex lastElem = HeapSize(heap);
	HeapCost *retval;

	if (!lastElem)
		return NULL;
	retval = HeapMinElem(heap);
	HeapSize(heap) = lastElem-1;
	SiftDown(heap, HeapElem(heap, lastElem));
	return retval;
}

/* Helper - set heap size, reallocating if needed */
static void
HeapResize(Heap *heap, HeapIndex newNumElems)
{
	if (newNumElems >= heap->elemsAllocated) {
		HeapIndex newAllocSize = heap->elemsAllocated * 2;

		if (newAllocSize <= newNumElems)
			newAllocSize = newNumElems + 1;
		heap->elems = (HeapCost **)realloc((void *)heap->elems,
			sizeof(*heap->elems) * newAllocSize);
		if (heap->elems == NULL) {
			fprintf(stderr, "Fatal error: Out of memory growing heap\n");
			exit(1);
		}
		heap->elemsAllocated = newAllocSize;
	}
	heap->numElems = newNumElems;
}

/* Add an element to the heap */
void
HeapInsert(Heap *heap, HeapCost *newElem)
{
	HeapIndex parent, i = ++HeapSize(heap);
	HeapCost cost = HeapElemCost(newElem);

	HeapResize(heap, i);
	/* Sift up until parent = 0 */
	while ((parent = HeapParent(i)) && HeapCost(heap, parent) > cost) {
		HeapElem(heap, i) = HeapElem(heap, parent);
		i = parent;
	}
	heap->elems[i] = newElem;
}

/* Initialize a new heap */
void
HeapInit(Heap *heap, HeapIndex initSize)
{
	initSize++; /* Add one for temporary element */
	if (initSize < 1)
		initSize = 1;
	heap->elems = (HeapCost **)malloc(initSize * sizeof(*heap->elems));
	if (heap->elems == NULL) {
		fprintf(stderr, "Fatal error: Out of memory creating heap\n");
		exit(1);
	}
	heap->elemsAllocated = initSize;
	heap->numElems = 0;
}

/* Free up a heap's resources. */
void
HeapDestroy(Heap *heap)
{
	free((void *)heap->elems);
	heap->elemsAllocated = 0;
	heap->numElems = 0;
	heap->elems = NULL;
}

/*
 * Local Variables:
 * tab-width: 4
 * End:
 * vi: ts=4 sw=4
 * vim: si
 */
43
tools/heap.h
Normal file
@ -0,0 +1,43 @@
/*
 * heap.h -- Simple priority queue. Takes pointers to cost values
 * (presumably the first field in a larger structure) and returns
 * them in increasing order of cost.
 *
 * Copyright (C) 1997 Pretty Good Privacy, Inc.
 *
 * Written by Colin Plumb and Mark H. Weaver
 *
 * $Id: heap.h,v 1.6 1997/10/31 04:22:46 mhw Exp $
 */

#ifndef HEAP_H
#define HEAP_H 1

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

typedef int HeapCost;
#define COST_INFINITY INT_MAX
typedef unsigned HeapIndex;

typedef struct Heap {
	HeapCost **elems;
	HeapIndex numElems, elemsAllocated;
} Heap;

void HeapInit(Heap *heap, HeapIndex initSize);
void HeapDestroy(Heap *heap);
void HeapInsert(Heap *heap, HeapCost *newElem);
HeapCost *HeapGetMin(Heap *heap);
void HeapVerify(Heap *heap);

#endif

/*
 * Local Variables:
 * tab-width: 4
 * End:
 * vi: ts=4 sw=4
 * vim: si
 */
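The heap stores pointers to cost values, and its header comment suggests each cost is the first field of a larger structure, so a pointer returned by the heap can be cast back to that structure. The sketch below illustrates the convention on its own, under stated assumptions: `struct Job` is a made-up payload, and `toy_insert` / `toy_get_min` are a toy fixed-size 1-based heap standing in for heap.c so that the example compiles without it.

```c
#include <assert.h>
#include <stddef.h>

typedef int HeapCost;	/* same typedef as heap.h */

/* Hypothetical payload: the cost must be the first member. */
struct Job {
	HeapCost cost;
	const char *name;
};

/* Toy fixed-size, 1-based min-heap of cost pointers (not heap.c). */
#define TOY_MAX 16
static HeapCost *toy[TOY_MAX + 1];
static unsigned toyCount;

static void
toy_insert(HeapCost *e)
{
	unsigned i, parent;

	assert(toyCount < TOY_MAX);
	i = ++toyCount;
	/* Sift up: move costlier parents down until e fits. */
	while ((parent = i / 2) != 0 && *toy[parent] > *e) {
		toy[i] = toy[parent];
		i = parent;
	}
	toy[i] = e;
}

static HeapCost *
toy_get_min(void)
{
	HeapCost *min, *last;
	unsigned parent = 1, child;

	if (toyCount == 0)
		return NULL;
	min = toy[1];
	last = toy[toyCount--];
	/* Sift down: re-place the last element starting from the root. */
	for (;;) {
		child = 2 * parent;
		if (child > toyCount)
			break;
		if (child < toyCount && *toy[child] > *toy[child + 1])
			child++;
		if (*last <= *toy[child])
			break;
		toy[parent] = toy[child];
		parent = child;
	}
	toy[parent] = last;
	return min;
}
```

A caller inserts `&job.cost` and casts the result of `toy_get_min()` to `struct Job *`. C permits this cast because a pointer to a structure and a pointer to its first member can be converted to one another.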
31
tools/makemanifest
Normal file
@ -0,0 +1,31 @@
#!/usr/bin/perl

$fileNum = 0;
while(<>)
{
	/^([VDTB])(\S*)\s+(.*)/ || die("Bad filelist, line $.");
	($type, $options, $name) = ($1, $2, $3);

	if ($type eq "D")
	{
		$dir = $name;
		print "D $dir\n";
	}
	elsif ($type eq "V")
	{
		# Do nothing
	}
	else
	{
		$fileNum++;
		$tail = $name;
		$tail =~ s|^.*/||;
		die("Bad filelist, line $.") if $name ne $dir . $tail;
		print "$fileNum $tail\n";
	}
}

#
# vi: ai ts=4
# vim: si
#
137
tools/mempool.c
Normal file
@ -0,0 +1,137 @@
/*
 * mempool.c - Pooled memory allocation, similar to GNU obstacks.
 *
 * $Id: mempool.c,v 1.5 1997/11/13 23:53:08 colin Exp $
 */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h> /* For malloc() & free() */

#include "mempool.h"

/*
 * The memory pool allocation functions
 *
 * These are based on a linked list of memory blocks, usually of uniform
 * size. New memory is allocated from the tail of the current block,
 * until that is inadequate, then a new block is allocated.
 * The entire pool can be freed at once by calling memPoolEmpty().
 */
struct PoolBuf {
	struct PoolBuf *next;
	unsigned size;
	/* Data follows */
};

/* The prototype empty pool, including the default allocation size. */
static struct MemPool EmptyPool = { 0, 0, 0, 4096, 0 , 0, 0};

/* Initialize the pool for first use */
void
memPoolInit(struct MemPool *pool)
{
	*pool = EmptyPool;
}

/* Set the pool's purge function */
void
memPoolSetPurge(struct MemPool *pool, int (*purge)(void *), void *arg)
{
	pool->purge = purge;
	pool->purgearg = arg;
}

/* Free all the memory in the pool */
void
memPoolEmpty(struct MemPool *pool)
{
	struct PoolBuf *buf;

	while ((buf = pool->head) != 0) {
		pool->head = buf->next;
		free(buf);
	}
	pool->freespace = 0;
	pool->totalsize = 0;
}


/*
 * Restore a pool to a marked position, freeing subsequently allocated
 * memory.
 */
void
memPoolCutBack(struct MemPool *pool, struct MemPool const *cutback)
{
	struct PoolBuf *buf;

	assert(pool);
	assert(cutback);
	assert(pool->totalsize >= cutback->totalsize);

	while((buf = pool->head) != cutback->head) {
		pool->head = buf->next;
		free(buf);
	}
	*pool = *cutback;
}

/*
 * Allocate a chunk of memory for a structure. Alignment is assumed to be
 * a power of 2. It could be generalized, if that ever becomes relevant.
 * Note that alignment is from the beginning of an allocated chunk, which
 * is guaranteed by ANSI to be as aligned as can possibly matter.
 */
void *
memPoolAlloc(struct MemPool *pool, unsigned len, unsigned alignment)
{
	char *p;
	unsigned t;

	/* Where to allocate next object */
	p = pool->freeptr;
	/* How far it is from the beginning of the chunk. */
	t = p - (char *)pool->head;
	/* How much to round up freeptr to make alignment */
||||||
|
t = -t & --alignment;
|
||||||
|
|
||||||
|
/* Okay, does it fit? */
|
||||||
|
if (pool->freespace >= len+t) {
|
||||||
|
pool->freespace -= len+t;
|
||||||
|
p += t;
|
||||||
|
pool->freeptr = p + len;
|
||||||
|
return p;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* It does not fit in the current chunk. Go for a bigger chunk. */
|
||||||
|
|
||||||
|
/* First, figure out how much to skip at the beginning of the chunk */
|
||||||
|
alignment &= -(unsigned)sizeof(struct PoolBuf);
|
||||||
|
alignment += sizeof(struct PoolBuf);
|
||||||
|
/* Then, figure out a chunk size that will fit */
|
||||||
|
t = pool->chunksize;
|
||||||
|
assert(t);
|
||||||
|
while (len + alignment > t)
|
||||||
|
t *= 2;
|
||||||
|
while ((p = malloc(t)) == 0) {
|
||||||
|
/* If that didn't work, try purging or smaller allocations */
|
||||||
|
if (!pool->purge || !pool->purge(pool->purgearg)) {
|
||||||
|
t /= 2;
|
||||||
|
if (len + alignment > t) {
|
||||||
|
fputs("Out of memory!\n", stderr);
|
||||||
|
exit (1); /* Failed */
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Update the various pointers. */
|
||||||
|
pool->totalsize += t;
|
||||||
|
((struct PoolBuf *)p)->next = pool->head;
|
||||||
|
((struct PoolBuf *)p)->size = t;
|
||||||
|
pool->head = (struct PoolBuf *)p;
|
||||||
|
pool->freespace = t - len - alignment;
|
||||||
|
p += alignment;
|
||||||
|
pool->freeptr = p + len;
|
||||||
|
|
||||||
|
return p;
|
||||||
|
}
|
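The alignment and chunk-sizing arithmetic in memPoolAlloc() above can be exercised on its own. What follows is a minimal sketch, not part of mempool.c: the helper names `pad_to_align` and `fit_chunk` are hypothetical, and the code assumes, as the source comment does, that the alignment is a power of 2.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of mempool.c: the number of bytes to
 * skip so that `offset` becomes a multiple of `alignment`.
 * Unsigned negation wraps modulo a power of 2, so for a power-of-2
 * alignment, -offset & (alignment - 1) is the distance to the next
 * aligned offset (0 if the offset is already aligned).
 */
static unsigned
pad_to_align(unsigned offset, unsigned alignment)
{
	/* Precondition assumed by memPoolAlloc(): a nonzero power of 2 */
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return -offset & (alignment - 1);
}

/*
 * Hypothetical helper: the smallest size reached by repeatedly doubling
 * `chunksize` that can hold `needed` bytes, mirroring the "t *= 2" loop.
 */
static unsigned
fit_chunk(unsigned chunksize, unsigned needed)
{
	assert(chunksize != 0);
	while (needed > chunksize)
		chunksize *= 2;
	return chunksize;
}
```

For instance, an object placed at offset 13 with 8-byte alignment needs 3 bytes of padding, and a 5000-byte request against the default 4096-byte chunk size ends up in an 8192-byte chunk.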
36
tools/mempool.h
Normal file
@ -0,0 +1,36 @@
|
|||||||
|
/* $Id: mempool.h,v 1.2 1997/11/13 23:53:09 colin Exp $ */
|
||||||
|
|
||||||
|
#ifndef MEMPOOL_H
|
||||||
|
#define MEMPOOL_H
|
||||||
|
|
||||||
|
typedef struct MemPool {
|
||||||
|
struct PoolBuf *head;
|
||||||
|
char *freeptr;
|
||||||
|
unsigned freespace;
|
||||||
|
unsigned chunksize; /* Default starting point */
|
||||||
|
unsigned long totalsize;
|
||||||
|
int (*purge)(void *); /* Return non-zero to retry alloc */
|
||||||
|
void *purgearg;
|
||||||
|
} MemPool;
|
||||||
|
|
||||||
|
/* A global pool for miscellaneous stuff. */
|
||||||
|
extern struct MemPool MiscPool;
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Nice clean interfaces
|
||||||
|
*/
|
||||||
|
void memPoolInit(struct MemPool *pool);
|
||||||
|
void memPoolSetPurge(struct MemPool *pool, int (*purge)(void *), void *arg);
|
||||||
|
void memPoolEmpty(struct MemPool *pool);
|
||||||
|
void memPoolCutBack(struct MemPool *dest, struct MemPool const *cutback);
|
||||||
|
void *memPoolAlloc(struct MemPool *pool, unsigned len, unsigned alignment);
|
||||||
|
#ifdef DEADCODE
|
||||||
|
char const *memPoolStore(struct MemPool *pool, char const *str);
|
||||||
|
#endif
|
||||||
|
|
||||||
|
/* Lookie here! An ANSI-compliant alignment finder! */
|
||||||
|
#define alignof(type) (sizeof(struct{type _x; char _y;}) - sizeof(type))
|
||||||
|
|
||||||
|
#define memPoolNew(pool, type) memPoolAlloc(pool, sizeof(type), alignof(type))
|
||||||
|
|
||||||
|
#endif /* MEMPOOL_H */
|
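The alignof() macro above relies on struct padding: a trailing char after a member of the given type forces the struct size up to the next multiple of that type's alignment. The sketch below repeats the trick under a hypothetical name, `ALIGN_OF`, since C23 makes `alignof` a keyword. It assumes the compiler inserts only the minimum padding, which is usual but not something the C standard promises. `is_power_of_two` is also a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Same construction as the alignof() macro in mempool.h, renamed to
 * avoid clashing with the C23 keyword. With minimal padding, the struct
 * is sizeof(type) plus one alignment unit, so the difference is the
 * alignment of `type`.
 */
#define ALIGN_OF(type) (sizeof(struct { type _x; char _y; }) - sizeof(type))

/* Hypothetical helper: nonzero if n is a power of 2 */
static int
is_power_of_two(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}
```

On common ABIs the result is a power of 2 that divides sizeof(type), which is the form memPoolAlloc() expects for its alignment argument.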
543
tools/munge.c
Normal file
@ -0,0 +1,543 @@
|
|||||||
|
/*
|
||||||
|
* munge.c -- Program to convert a text file into "munged" form,
|
||||||
|
* suitable for reconstruction from printed form. Tabs are
|
||||||
|
* made visible and checksums are added to each line and each
|
||||||
|
* page to protect against transcription errors.
|
||||||
|
*
|
||||||
|
* Copyright (C) 1997 Pretty Good Privacy, Inc.
|
||||||
|
*
|
||||||
|
* Designed by Colin Plumb, Mark H. Weaver, and Philip R. Zimmermann
|
||||||
|
* Written by Mark H. Weaver
|
||||||
|
*
|
||||||
|
* $Id: munge.c,v 1.32 1997/11/12 23:28:53 mhw Exp $
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include <stdio.h>
|
||||||
|
#include <errno.h>
|
||||||
|
#include <string.h>
|
||||||
|
#include <ctype.h>
|
||||||
|
#include <stdlib.h>
|
||||||
|
|
||||||
|
#include "util.h"
|
||||||
|
|
||||||
|
/*
|
||||||
|
* The file is divided into pages, and the format of each page is
|
||||||
|
*
|
||||||
|
--f414 000b2dc79af40010002 Page 1 of munge.c
|
||||||
|
|
||||||
|
bc38e5 /*
|
||||||
|
40a838 * munge.c -- Program to convert a text file into munged form
|
||||||
|
647222 *
|
||||||
|
193f28 * Copyright (C) 1997 Pretty Good Privacy, Inc.
|
||||||
|
827222 *
|
||||||
|
699025 * Designed by Colin Plumb, Mark H. Weaver, and Philip R. Zimmermann
|
||||||
|
0d050c * Written by Mark H. Weaver
|
||||||
|
*
|
||||||
|
* Where the first 2 columns are the high 8 bits (in hex) of a running
|
||||||
|
* CRC-32 of the page (the string "--", unlikely to be confused with
|
||||||
|
* any digits, indicates a page header line) and the next 4 columns
|
||||||
|
* are a CRC-16 of the rest of the line. Then a space (not counted in
|
||||||
|
* the CRC), and the line of text. Tabs are printed as the currency
|
||||||
|
* symbol (ISO Latin 1 character 164) followed by the appropriate number
|
||||||
|
* of spaces, and any form feeds are printed as a yen symbol (Latin 1 165).
|
||||||
|
* The CRC is computed on the transformed line, including the trailing
|
||||||
|
* newline. No trailing whitespace is permitted.
|
||||||
|
*
|
||||||
|
* The header line contains a (hex) number of the form 0ffcccccccctpppnnnn,
|
||||||
|
* where the digit 0 is a version number, ff are flags, cccccccc is the CRC-32
|
||||||
|
* of the page, t is the tab size (usually 4 or 8; 0 for binary files that
|
||||||
|
* are sent in radix-64), ppp is the product number (usually 1, different
|
||||||
|
* for different books), and nnnn is the file number (sequential from 1).
|
||||||
|
*
|
||||||
|
* This is followed by " Page %u of " and the file name.
|
||||||
|
*/
|
||||||
|
|
||||||
|
typedef struct MungeState
|
||||||
|
{
|
||||||
|
EncodeFormat const * fmt;
|
||||||
|
EncodeFormat const * hFmt;
|
||||||
|
int binaryMode, tabWidth;
|
||||||
|
long origLineNumber;
|
||||||
|
long productNumber, fileNumber, pageNumber, lineNumber;
|
||||||
|
unsigned long fileOffset;
|
||||||
|
CRC pageCRC;
|
||||||
|
char const * fileName;
|
||||||
|
char const * fileNameTail;
|
||||||
|
char * pageBuffer; /* Buffer large enough to hold one page */
|
||||||
|
char * pagePos; /* Current position in pageBuffer */
|
||||||
|
word16 hdrFlags;
|
||||||
|
FILE * file;
|
||||||
|
FILE * out;
|
||||||
|
} MungeState;
|
||||||
|
|
||||||
|
|
||||||
|
void ChecksumLine(EncodeFormat const *fmt, char const *line, size_t length,
|
||||||
|
char *prefix, CRC *pageCRC)
|
||||||
|
{
|
||||||
|
CRC lineCRC;
|
||||||
|
CRC runCRCPart = 0;
|
||||||
|
|
||||||
|
lineCRC = CalculateCRC(fmt->lineCRC, 0, (byte const *)line, length);
|
||||||
|
if (pageCRC != NULL)
|
||||||
|
{
|
||||||
|
*pageCRC = CalculateCRC(fmt->pageCRC, *pageCRC,
|
||||||
|
(byte const *)line, length);
|
||||||
|
runCRCPart = RunningCRCFromPageCRC(fmt, *pageCRC);
|
||||||
|
}
|
||||||
|
|
||||||
|
prefix += EncodeCheckDigits(fmt, runCRCPart, fmt->runningCRCBits, prefix);
|
||||||
|
prefix += EncodeCheckDigits(fmt, lineCRC, fmt->lineCRC->bits, prefix);
|
||||||
|
|
||||||
|
*prefix++ = ' '; /* Write a space over the null byte */
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Returns 1 for convenience */
|
||||||
|
int PrintFileError(MungeState *state, char const *message)
|
||||||
|
{
|
||||||
|
fprintf(stderr, "%s in %s %s %lu\n", message, state->fileName,
|
||||||
|
state->binaryMode ? "offset" : "line",
|
||||||
|
state->binaryMode ? state->fileOffset : state->origLineNumber);
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
int MungeLine(MungeState *state, char *buffer, int length,
|
||||||
|
char *line, int *bufferUsed)
|
||||||
|
{
|
||||||
|
int i = 0, j = 0, jOld = 0;
|
||||||
|
char ch;
|
||||||
|
|
||||||
|
for (i = 0; i < length && j < LINE_LENGTH; i++)
|
||||||
|
{
|
||||||
|
jOld = j;
|
||||||
|
ch = buffer[i];
|
||||||
|
if (ch == '\t')
|
||||||
|
{
|
||||||
|
line[j++] = TAB_CHAR;
|
||||||
|
if (state->tabWidth < 1)
|
||||||
|
return PrintFileError(state,
|
||||||
|
"ERROR: Tab found in radix64 stream");
|
||||||
|
else
|
||||||
|
while (j % state->tabWidth && j < LINE_LENGTH)
|
||||||
|
line[j++] = TAB_PAD_CHAR;
|
||||||
|
}
|
||||||
|
else if (ch == '\n')
|
||||||
|
{
|
||||||
|
if (i + 1 < length)
|
||||||
|
return PrintFileError(state,
|
||||||
|
"UNEXPECTED ERROR: fgets read past newline!?");
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
else if (ch == '\f')
|
||||||
|
{
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
else if (ch == ' ' && (j <= 0 || line[j-1] == ' ' ||
|
||||||
|
line[j-1] == SPACE_CHAR ||
|
||||||
|
i+1 >= length || buffer[i+1] == '\n'))
|
||||||
|
{
|
||||||
|
line[j++] = SPACE_CHAR;
|
||||||
|
}
|
||||||
|
else if (ch >= ' ' && ch <= '~')
|
||||||
|
line[j++] = ch;
|
||||||
|
else
|
||||||
|
return PrintFileError(state, "ERROR: Non-ASCII char");
|
||||||
|
}
|
||||||
|
|
||||||
|
if (i < length && buffer[i] == '\n')
|
||||||
|
{
|
||||||
|
i++;
|
||||||
|
state->origLineNumber++;
|
||||||
|
}
|
||||||
|
else if (i < length && buffer[i] == '\f' && j < LINE_LENGTH)
|
||||||
|
{
|
||||||
|
i++;
|
||||||
|
line[j++] = FORMFEED_CHAR;
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
/* If there's no newline, we need to add the continuation marker */
|
||||||
|
if (i > 0 && j >= LINE_LENGTH)
|
||||||
|
{
|
||||||
|
/* Remove the last character if we're out of room */
|
||||||
|
i--;
|
||||||
|
j = jOld;
|
||||||
|
}
|
||||||
|
line[j++] = CONTIN_CHAR;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Strip trailing spaces */
|
||||||
|
while (j > 0 && isspace((unsigned char)line[j - 1]))
|
||||||
|
j--;
|
||||||
|
|
||||||
|
if (j > LINE_LENGTH) /* This should never happen */
|
||||||
|
return PrintFileError(state, "ERROR: Internal error, line too long");
|
||||||
|
|
||||||
|
/* Add trailing newline and NUL */
|
||||||
|
line[j++] = '\n';
|
||||||
|
line[j++] = '\0';
|
||||||
|
|
||||||
|
/* Return number of chars used from buffer */
|
||||||
|
*bufferUsed = i;
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
static void
|
||||||
|
Encode3(byte const src[3], char dest[4])
|
||||||
|
{
|
||||||
|
dest[0] = radix64Digits[ (src[0]>>2 & 0x3f)];
|
||||||
|
dest[1] = radix64Digits[(src[0]<<4 & 0x30) | (src[1]>>4 & 0x0f)];
|
||||||
|
dest[2] = radix64Digits[(src[1]<<2 & 0x3c) | (src[2]>>6 & 0x03)];
|
||||||
|
dest[3] = radix64Digits[(src[2] & 0x3f)];
|
||||||
|
}
|
||||||
|
|
||||||
|
static int
|
||||||
|
EncodeLine(byte const *src, int srcLen, char *dest)
|
||||||
|
{
|
||||||
|
char * destp = dest;
|
||||||
|
byte tempSrc[3];
|
||||||
|
|
||||||
|
for (; srcLen >= 3; srcLen -= 3)
|
||||||
|
{
|
||||||
|
Encode3(src, destp);
|
||||||
|
src += 3; destp += 4;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (srcLen > 0)
|
||||||
|
{
|
||||||
|
memset(tempSrc, 0, sizeof(tempSrc));
|
||||||
|
memcpy(tempSrc, src, srcLen);
|
||||||
|
Encode3(tempSrc, destp); /* Encode the zero-padded copy, not past the end of src */
|
||||||
|
src += 3; destp += 4; srcLen -= 3;
|
||||||
|
while (srcLen < 0)
|
||||||
|
destp[srcLen++] = RADIX64_END_CHAR;
|
||||||
|
}
|
||||||
|
|
||||||
|
return destp - dest;
|
||||||
|
}
|
||||||
|
|
||||||
|
static int
|
||||||
|
MungeBinaryLine(MungeState *state, byte const *buffer, int length, char *line)
|
||||||
|
{
|
||||||
|
char binLine[128];
|
||||||
|
int binLength; /* Destination length */
|
||||||
|
int used;
|
||||||
|
|
||||||
|
binLength = EncodeLine(buffer, length, binLine);
|
||||||
|
|
||||||
|
/* Append newline */
|
||||||
|
binLine[binLength++] = '\n';
|
||||||
|
binLine[binLength] = '\0';
|
||||||
|
|
||||||
|
return MungeLine(state, binLine, binLength, line, &used);
|
||||||
|
}
|
||||||
|
|
||||||
|
int MaybePageBreak(MungeState *state)
|
||||||
|
{
|
||||||
|
EncodeFormat const * fmt = state->fmt;
|
||||||
|
EncodeFormat const * hFmt = state->hFmt;
|
||||||
|
|
||||||
|
if (state->lineNumber >= LINES_PER_PAGE)
|
||||||
|
{
|
||||||
|
char line[512];
|
||||||
|
char * lineData = line + PREFIX_LENGTH;
|
||||||
|
char * p = lineData;
|
||||||
|
|
||||||
|
p += EncodeCheckDigits(hFmt, 0, HDR_VERSION_BITS, p);
|
||||||
|
p += EncodeCheckDigits(hFmt, state->hdrFlags, HDR_FLAG_BITS, p);
|
||||||
|
p += EncodeCheckDigits(hFmt, state->pageCRC, fmt->pageCRC->bits, p);
|
||||||
|
p += EncodeCheckDigits(hFmt, state->tabWidth, HDR_TABWIDTH_BITS, p);
|
||||||
|
p += EncodeCheckDigits(hFmt, state->productNumber, HDR_PRODNUM_BITS, p);
|
||||||
|
p += EncodeCheckDigits(hFmt, state->fileNumber, HDR_FILENUM_BITS, p);
|
||||||
|
|
||||||
|
sprintf(p, " Page %ld of %s\n", state->pageNumber + 1,
|
||||||
|
state->fileNameTail);
|
||||||
|
|
||||||
|
if (strlen(lineData) > LINE_LENGTH + 1)
|
||||||
|
{
|
||||||
|
PrintFileError(state, "ERROR: Header line too long");
|
||||||
|
fprintf(stderr, "> %s", lineData);
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Compute checksums and prefix them to line */
|
||||||
|
ChecksumLine(fmt, lineData, strlen(lineData), line, NULL);
|
||||||
|
|
||||||
|
fprintf(state->out, "%c%c%s\n%s\f", HDR_PREFIX_CHAR,
|
||||||
|
fmt->headerTypeChar, line + 2, state->pageBuffer);
|
||||||
|
|
||||||
|
state->pageNumber++;
|
||||||
|
state->lineNumber = 0;
|
||||||
|
state->pageCRC = 0;
|
||||||
|
state->pagePos = state->pageBuffer; /* Clear page buffer */
|
||||||
|
}
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Search for Emacs "tab-width: " marker in file.
|
||||||
|
* Emacs is stricter about the format, but this will do.
|
||||||
|
*/
|
||||||
|
int FindTabWidth(MungeState *state)
|
||||||
|
{
|
||||||
|
char const * const tabWidthMarker = " tab-width: ";
|
||||||
|
char buffer[512];
|
||||||
|
char * p;
|
||||||
|
int length;
|
||||||
|
int tabWidth = 0;
|
||||||
|
|
||||||
|
fseek(state->file, -(long)(sizeof(buffer) - 1), SEEK_END);
|
||||||
|
length = fread(buffer, 1, sizeof(buffer) - 1, state->file);
|
||||||
|
buffer[length] = '\0';
|
||||||
|
p = strstr(buffer, tabWidthMarker);
|
||||||
|
if (p != NULL)
|
||||||
|
{
|
||||||
|
p += strlen(tabWidthMarker);
|
||||||
|
while (*p != '\0' && *p != '\n' && isspace((unsigned char)*p))
|
||||||
|
p++;
|
||||||
|
tabWidth = strtol(p, &p, 10);
|
||||||
|
while (*p != '\0' && *p != '\n' && isspace((unsigned char)*p))
|
||||||
|
p++;
|
||||||
|
if (*p != '\n' || tabWidth < 2)
|
||||||
|
tabWidth = 0;
|
||||||
|
else if (tabWidth > 16)
|
||||||
|
fprintf(stderr, "WARNING: Weird tab-width (%d), %s\n",
|
||||||
|
tabWidth, state->fileName);
|
||||||
|
}
|
||||||
|
return tabWidth;
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Open the given source file and send the munged output to the
|
||||||
|
* FILE *, with the given options.
|
||||||
|
*/
|
||||||
|
int MungeFile(char const *fileName, FILE *out, EncodeFormat const *fmt,
|
||||||
|
int binaryMode, int defaultTabWidth,
|
||||||
|
long productNumber, long fileNumber)
|
||||||
|
{
|
||||||
|
MungeState * state;
|
||||||
|
int length, used;
|
||||||
|
char line[PREFIX_LENGTH + LINE_LENGTH + 10];
|
||||||
|
char * lineData = line + PREFIX_LENGTH;
|
||||||
|
char buffer[128];
|
||||||
|
int result = 0;
|
||||||
|
|
||||||
|
state = (MungeState *)calloc(1, sizeof(*state));
|
||||||
|
state->fmt = fmt;
|
||||||
|
state->hFmt = &hexFormat;
|
||||||
|
state->origLineNumber = 1;
|
||||||
|
state->fileName = fileName;
|
||||||
|
state->pageCRC = 0;
|
||||||
|
state->productNumber = productNumber;
|
||||||
|
state->fileNumber = fileNumber;
|
||||||
|
state->pageNumber = 0;
|
||||||
|
state->lineNumber = 0;
|
||||||
|
state->fileOffset = 0;
|
||||||
|
state->binaryMode = binaryMode;
|
||||||
|
state->pageBuffer = malloc(PAGE_BUFFER_SIZE);
|
||||||
|
state->pageBuffer[0] = '\0';
|
||||||
|
state->pagePos = state->pageBuffer;
|
||||||
|
state->hdrFlags = 0;
|
||||||
|
state->out = out;
|
||||||
|
|
||||||
|
state->fileNameTail = strrchr(state->fileName, '/');
|
||||||
|
if (state->fileNameTail == NULL)
|
||||||
|
state->fileNameTail = state->fileName;
|
||||||
|
else
|
||||||
|
state->fileNameTail++;
|
||||||
|
|
||||||
|
state->file = fopen(state->fileName, binaryMode ? "rb" : "r");
|
||||||
|
if (state->file == NULL)
|
||||||
|
{
|
||||||
|
result = errno;
|
||||||
|
fprintf(stderr, "ERROR opening %s: %s\n",
|
||||||
|
state->fileName, strerror(result));
|
||||||
|
goto error;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (state->binaryMode)
|
||||||
|
{
|
||||||
|
state->tabWidth = 0;
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
state->tabWidth = FindTabWidth(state);
|
||||||
|
if (state->tabWidth == 0)
|
||||||
|
state->tabWidth = defaultTabWidth;
|
||||||
|
rewind(state->file);
|
||||||
|
}
|
||||||
|
|
||||||
|
while (!feof(state->file))
|
||||||
|
{
|
||||||
|
if (state->binaryMode)
|
||||||
|
{
|
||||||
|
length = fread(buffer, 1, BYTES_PER_LINE, state->file);
|
||||||
|
if (length < 1)
|
||||||
|
{
|
||||||
|
if (feof(state->file))
|
||||||
|
break;
|
||||||
|
goto fileError;
|
||||||
|
}
|
||||||
|
if ((result = MaybePageBreak(state)))
|
||||||
|
goto error;
|
||||||
|
if ((result = MungeBinaryLine(state, buffer, length, lineData)))
|
||||||
|
goto error;
|
||||||
|
state->fileOffset += length;
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
if (fgets(buffer, sizeof(buffer), state->file) == NULL)
|
||||||
|
{
|
||||||
|
if (feof(state->file))
|
||||||
|
break;
|
||||||
|
goto fileError;
|
||||||
|
}
|
||||||
|
length = strlen(buffer);
|
||||||
|
if ((result = MaybePageBreak(state)))
|
||||||
|
goto error;
|
||||||
|
if ((result = MungeLine(state, buffer, length, lineData, &used)))
|
||||||
|
goto error;
|
||||||
|
|
||||||
|
if (used < length)
|
||||||
|
if (fseek(state->file, used - length, SEEK_CUR))
|
||||||
|
goto fileError;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Compute checksums and prefix them to the line */
|
||||||
|
ChecksumLine(fmt, lineData, strlen(lineData), line, &state->pageCRC);
|
||||||
|
|
||||||
|
strcpy(state->pagePos, line);
|
||||||
|
length = strlen(state->pagePos);
|
||||||
|
/* Suppress trailing whitespace on blank lines */
|
||||||
|
if (length == PREFIX_LENGTH+1 && state->pagePos[length-1] == '\n') {
|
||||||
|
state->pagePos[--length-1] = '\n';
|
||||||
|
state->pagePos[length] = '\0';
|
||||||
|
}
|
||||||
|
state->pagePos += length;
|
||||||
|
|
||||||
|
state->lineNumber++;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (state->lineNumber > 0)
|
||||||
|
{
|
||||||
|
/* Force a final page break */
|
||||||
|
state->lineNumber = LINES_PER_PAGE;
|
||||||
|
state->hdrFlags |= HDR_FLAG_LASTPAGE;
|
||||||
|
if ((result = MaybePageBreak(state)))
|
||||||
|
goto error;
|
||||||
|
}
|
||||||
|
|
||||||
|
result = 0;
|
||||||
|
goto done;
|
||||||
|
|
||||||
|
fileError:
|
||||||
|
result = ferror(state->file);
|
||||||
|
|
||||||
|
error:
|
||||||
|
done:
|
||||||
|
if (state != NULL)
|
||||||
|
{
|
||||||
|
if (state->file != NULL)
|
||||||
|
fclose(state->file);
|
||||||
|
free(state->pageBuffer);
|
||||||
|
free(state);
|
||||||
|
}
|
||||||
|
return result;
|
||||||
|
}
|
||||||
|
|
||||||
|
int main(int argc, char *argv[])
|
||||||
|
{
|
||||||
|
int result = 0;
|
||||||
|
int i, j;
|
||||||
|
int defaultTabWidth = 4;
|
||||||
|
int binaryMode = 0;
|
||||||
|
long productNumber = 1;
|
||||||
|
long fileNumber = 1;
|
||||||
|
char * endOfNumber;
|
||||||
|
EncodeFormat const * fmt = NULL;
|
||||||
|
|
||||||
|
InitUtil();
|
||||||
|
|
||||||
|
for (i = 1; i < argc && argv[i][0] == '-'; i++)
|
||||||
|
{
|
||||||
|
if (0 == strcmp(argv[i], "--"))
|
||||||
|
{
|
||||||
|
i++;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
for (j = 1; argv[i][j] != '\0'; j++)
|
||||||
|
{
|
||||||
|
if (isdigit((unsigned char)argv[i][j]))
|
||||||
|
{
|
||||||
|
defaultTabWidth = argv[i][j] - '0';
|
||||||
|
if (defaultTabWidth < 2 || defaultTabWidth > 9)
|
||||||
|
fprintf(stderr, "WARNING: Weird default tab-width (%d)\n",
|
||||||
|
defaultTabWidth);
|
||||||
|
}
|
||||||
|
else if (argv[i][j] == 'b')
|
||||||
|
{
|
||||||
|
binaryMode = 1;
|
||||||
|
}
|
||||||
|
else if (argv[i][j] == 'F')
|
||||||
|
{
|
||||||
|
fmt = FindFormat(argv[i][j+1]);
|
||||||
|
if (!fmt || argv[i][j+2] != '\0')
|
||||||
|
{
|
||||||
|
fprintf(stderr, "ERROR: Invalid format char\n");
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
else if (argv[i][j] == 'p')
|
||||||
|
{
|
||||||
|
productNumber = strtol(&argv[i][j+1], &endOfNumber, 10);
|
||||||
|
if (*endOfNumber != '\0')
|
||||||
|
{
|
||||||
|
fprintf(stderr, "ERROR: Invalid product number\n");
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
else if (argv[i][j] == 'f')
|
||||||
|
{
|
||||||
|
fileNumber = strtol(&argv[i][j+1], &endOfNumber, 10);
|
||||||
|
if (*endOfNumber != '\0')
|
||||||
|
{
|
||||||
|
fprintf(stderr, "ERROR: Invalid file number\n");
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
fprintf(stderr, "ERROR: Unrecognized option -%c\n", argv[i][j]);
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if (!fmt)
|
||||||
|
fmt = binaryMode ? &radix64Format : &hexFormat;
|
||||||
|
|
||||||
|
for (; i < argc; i++)
|
||||||
|
{
|
||||||
|
if ((result = MungeFile(argv[i], stdout, fmt, binaryMode,
|
||||||
|
defaultTabWidth, productNumber,
|
||||||
|
fileNumber)) != 0)
|
||||||
|
{
|
||||||
|
/* If result > 0, message should have already been printed */
|
||||||
|
if (result < 0)
|
||||||
|
fprintf(stderr, "ERROR: %s\n", strerror(result));
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
fileNumber++;
|
||||||
|
}
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Local Variables:
|
||||||
|
* tab-width: 4
|
||||||
|
* End:
|
||||||
|
* vi: ts=4 sw=4
|
||||||
|
* vim: si
|
||||||
|
*/
|
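The page-header layout described at the top of munge.c (0ffcccccccctpppnnnn) can be illustrated by decoding the sample header shown there. This is a sketch only: `struct PageHeader`, `hex_field` and `parse_header` are hypothetical names that do not appear in util.h, and the code assumes the header digits are ordinary hexadecimal, as the example line suggests.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct; the field names are illustrative only */
struct PageHeader {
	unsigned version;      /* 1 hex digit  */
	unsigned flags;        /* 2 hex digits */
	unsigned long pageCRC; /* 8 hex digits (CRC-32 of the page) */
	unsigned tabWidth;     /* 1 hex digit  */
	unsigned product;      /* 3 hex digits */
	unsigned fileNumber;   /* 4 hex digits */
};

/* Parse `count` hex digits at *pp, advance *pp, and return the value */
static unsigned long
hex_field(char const **pp, int count)
{
	char tmp[9];

	assert(count > 0 && count < (int)sizeof(tmp));
	memcpy(tmp, *pp, (size_t)count);
	tmp[count] = '\0';
	*pp += count;
	return strtoul(tmp, NULL, 16);
}

/* Returns 0 on success, -1 if `s` does not start with 19 hex digits */
static int
parse_header(char const *s, struct PageHeader *h)
{
	if (strspn(s, "0123456789abcdefABCDEF") < 19)
		return -1;
	h->version = (unsigned)hex_field(&s, 1);
	h->flags = (unsigned)hex_field(&s, 2);
	h->pageCRC = hex_field(&s, 8);
	h->tabWidth = (unsigned)hex_field(&s, 1);
	h->product = (unsigned)hex_field(&s, 3);
	h->fileNumber = (unsigned)hex_field(&s, 4);
	return 0;
}
```

Applied to the sample header "000b2dc79af40010002", this yields version 0, flags 0, page CRC b2dc79af, tab width 4, product number 1 and file number 2.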
324
tools/psgen
Normal file
@ -0,0 +1,324 @@
|
|||||||
|
#!/usr/bin/perl
|
||||||
|
#
|
||||||
|
# psgen -- PostScript generator for code portion of source books
|
||||||
|
#
|
||||||
|
# Reads in a list of files/dirs from <filelist>, runs munge on each of
|
||||||
|
# them, and generates a single PostScript file to stdout. The page numbers
|
||||||
|
# for each file/dir are put into the file <pagenums>.
|
||||||
|
#
|
||||||
|
# usage: psgen [ options... ] <filelist> <pagenums> <volume #> > foo.ps
|
||||||
|
# -l<firstLogicalPage>
|
||||||
|
# -p<firstPhysicalPage>
|
||||||
|
# -f<font>
|
||||||
|
# -D<defs> (passed to yapp)
|
||||||
|
# -P<productNumber>
|
||||||
|
# -o<mungedOutFile>
|
||||||
|
# -e (auto edit errors)
|
||||||
|
#
|
||||||
|
# $Id: psgen,v 1.18 1997/11/13 21:44:16 colin Exp $
|
||||||
|
#
|
||||||
|
|
||||||
|
$bookRoot = $ENV{"BOOKROOT"} || ".";
|
||||||
|
$toolsDir = "$bookRoot/tools";
|
||||||
|
$psDir = "$bookRoot/ps";
|
||||||
|
$editor = $ENV{"EDITOR"} || "vi";
|
||||||
|
|
||||||
|
# Configuration settings - external file names
|
||||||
|
$mungeProg = "$toolsDir/munge";
|
||||||
|
$yappProg = "$toolsDir/yapp";
|
||||||
|
$preambleFile = "$psDir/prolog.ps";
|
||||||
|
$tempFile = "/tmp/psgen-$$";
|
||||||
|
|
||||||
|
# Parse arguments
|
||||||
|
$firstLogPage = $firstPhysPage = 0;
|
||||||
|
$productNumber = 1;
|
||||||
|
$font = "OCRB";
|
||||||
|
$autoEdit = 0;
|
||||||
|
while ($#ARGV >= 0 && $ARGV[0] =~ /^-/)
|
||||||
|
{
|
||||||
|
$_ = shift @ARGV;
|
||||||
|
if (/^--$/)
|
||||||
|
{
|
||||||
|
last;
|
||||||
|
}
|
||||||
|
elsif (/^-l(\d+)$/)
|
||||||
|
{
|
||||||
|
$firstLogPage = $1;
|
||||||
|
}
|
||||||
|
elsif (/^-p(\d+)$/)
|
||||||
|
{
|
||||||
|
$firstPhysPage = $1;
|
||||||
|
}
|
||||||
|
elsif (/^-f(.+)$/)
|
||||||
|
{
|
||||||
|
$font = $1;
|
||||||
|
}
|
||||||
|
elsif (/^-D(.+)$/)
|
||||||
|
{
|
||||||
|
$yappDefs .= " " . $_;
|
||||||
|
}
|
||||||
|
elsif (/^-P(\d+)$/)
|
||||||
|
{
|
||||||
|
$productNumber = $1;
|
||||||
|
}
|
||||||
|
elsif (/^-o(.+)$/)
|
||||||
|
{
|
||||||
|
$mungedOutFile = $1;
|
||||||
|
}
|
||||||
|
elsif (/^-e$/)
|
||||||
|
{
|
||||||
|
$autoEdit = 1;
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
&Error("Unrecognized option: '$_'");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
$fileListFile = shift @ARGV || die "Missing file list argument (arg 1)";
|
||||||
|
$pageNumFile = shift @ARGV || die "Missing page number file argument (arg 2)";
|
||||||
|
$volume = shift @ARGV || die "Missing volume number argument (arg 3)";
|
||||||
|
|
||||||
|
# Determine initial page numbers
|
||||||
|
{
|
||||||
|
my $nextLogPage = 1;
|
||||||
|
my $nextPhysPage = 3;
|
||||||
|
my $volNum = 0; # Which volume's page numbers we're reading
|
||||||
|
|
||||||
|
if ($volume > 1)
|
||||||
|
{
|
||||||
|
open(OLDPAGENUMS, "<$pageNumFile") || die;
|
||||||
|
while (<OLDPAGENUMS>)
|
||||||
|
{
|
||||||
|
if (/^Volume\s+(\d+)$/)
|
||||||
|
{
|
||||||
|
$volNum = $1;
|
||||||
|
}
|
||||||
|
elsif (/^Next:\s+(\d+)\s*$/ && $volNum == $volume - 1)
|
||||||
|
{
|
||||||
|
$nextLogPage = $1;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
close(OLDPAGENUMS);
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
unlink($pageNumFile);
|
||||||
|
}
|
||||||
|
$firstLogPage = $nextLogPage if ($firstLogPage == 0);
|
||||||
|
$firstPhysPage = $nextPhysPage if ($firstPhysPage == 0);
|
||||||
|
}
|
||||||
|
|
||||||
|
# Names of PostScript operators invoked. These are the interface
|
||||||
|
# between this file and the $preambleFile.
|
||||||
|
$oddPageStartPS = "OddPageStart";
|
||||||
|
$evenPageStartPS = "EvenPageStart";
|
||||||
|
$oddPageEndPS = "OddPageEnd";
|
||||||
|
$evenPageEndPS = "EvenPageEnd";
|
||||||
|
$dirPagePS = "DirPage";
|
||||||
|
# This is short because it's emitted every line
|
||||||
|
$linePS = "L";
|
||||||
|
|
||||||
|
# Handle an error from munge.
|
||||||
|
# A result of 0 means to retry, 1 means to exit
|
||||||
|
sub MungeError
|
||||||
|
{
|
||||||
|
my $result = 1;
|
||||||
|
|
||||||
|
open(FILEH, "<$tempFile") || die;
|
||||||
|
while (<FILEH>)
|
||||||
|
{
|
||||||
|
print STDERR;
|
||||||
|
if (/ in (.*) line (\d+)$/)
|
||||||
|
{
|
||||||
|
my ($fileName, $lineNumber) = ($1, $2);
|
||||||
|
|
||||||
|
if ($autoEdit)
|
||||||
|
{
|
||||||
|
my @statResult = stat($fileName);
|
||||||
|
my $oldMTime = $statResult[9];
|
||||||
|
|
||||||
|
system("'$editor' '+$lineNumber' '$fileName' 1>&2");
|
||||||
|
@statResult = stat($fileName);
|
||||||
|
$result = ($statResult[9] == $oldMTime);
|
||||||
|
last;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
close(FILEH);
|
||||||
|
unlink($tempFile) || die "Couldn't unlink $tempFile";
|
||||||
|
return $result;
|
||||||
|
}
|
||||||
|
|
||||||
|
sub CopyFileToPS
|
||||||
|
{
|
||||||
|
local $fileName = $_[0];
|
||||||
|
local $args = "'-I$psDir' '-Dfont=$font'";
|
||||||
|
local $_;
|
||||||
|
|
||||||
|
$args .= $yappDefs;
|
||||||
|
open(FILEH, "$yappProg $args '$fileName' |") || die;
|
||||||
|
while (<FILEH>)
|
||||||
|
{
|
||||||
|
print PSOUT $_;
|
||||||
|
}
|
||||||
|
close(FILEH) || exit(1);
|
||||||
|
1;
|
||||||
|
}
|
||||||
|
|
||||||
|
# Wrap a string in parens as required by PostScript, with proper quoting.
|
||||||
|
sub StringPS
|
||||||
|
{
|
||||||
|
local $str = $_[0];
|
||||||
|
|
||||||
|
$str =~ s/([\\()])/\\$1/g;
|
||||||
|
"(" . $str . ")";
|
||||||
|
}
|
||||||
|
|
||||||
|
# Emit a start of page. The PostScript DSC %%Page: header
|
||||||
|
# (followed by logical page number, then physical) and
|
||||||
|
# the top-of-page function (which is passed the page number as a string)
|
||||||
|
sub PageStartPS
|
||||||
|
{
|
||||||
|
local $pageNum = $_[0];
|
||||||
|
|
||||||
|
"%%Page: " . ($pageNum + $firstLogPage) . " " .
|
||||||
|
($pageNum + $firstPhysPage) . "\n" .
|
||||||
|
&StringPS($pageNum + $firstLogPage) .
|
||||||
|
((($pageNum + $firstLogPage) % 2) ? $oddPageStartPS
|
||||||
|
: $evenPageStartPS) . "\n";
|
||||||
|
}
|
||||||
|
|
||||||
|
sub PageEndPS
|
||||||
|
{
|
||||||
|
local $pageNum = $_[0];
|
||||||
|
|
||||||
|
((($pageNum + $firstLogPage) % 2) ? $oddPageEndPS : $evenPageEndPS) . "\n";
|
||||||
|
}
|
||||||
|
|
||||||
|
# Save the page number to a table-of-contents file
|
||||||
|
sub SavePageNum
|
||||||
|
{
|
||||||
|
local ($fileName, $pageNum) = @_;
|
||||||
|
|
||||||
|
print PAGENUMS ($pageNum + $firstLogPage), ": $fileName\n";
|
||||||
|
}
|
||||||
|
|
||||||
|
# The main code.
|
||||||
|
|
||||||
|
open(PSOUT, ">-") || die;
|
||||||
|
open(FILELIST, "<$fileListFile") || die;
|
||||||
|
open(PAGENUMS, ">>$pageNumFile") || die;
|
||||||
|
if ($mungedOutFile ne "")
|
||||||
|
{
|
||||||
|
open(MUNGEDOUT, ">$mungedOutFile") || die;
|
||||||
|
}
|
||||||
|
|
||||||
|
print PAGENUMS "Volume $volume\n";
|
||||||
|
|
||||||
|
&CopyFileToPS($preambleFile);
|
||||||
|
|
||||||
|
$fileNumber = 0;
|
||||||
|
$pageNum = 0; # This is 0-based, since it is added to $first{Log,Phys}Page
|
||||||
|
$enable = 0;
|
||||||
|
|
||||||
|
while (<FILELIST>)
|
||||||
|
{
|
||||||
|
/^([VDTB])(\S*)\s+(.*)/ || die "Illegal file list line $.";
|
||||||
|
|
||||||
|
local ($fileType, $options, $arg) = ($1, $2, $3);
|
||||||
|
|
||||||
|
if ($fileType eq "V")
|
||||||
|
{
|
||||||
|
@args = split(/\s+/, $arg);
|
||||||
|
if ($enable = ($args[0] == $volume))
|
||||||
|
{
|
||||||
|
$defaultTabWidth = int($args[1]);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
elsif ($fileType eq "D")
|
||||||
|
{
|
||||||
|
next unless $enable; # Do nothing if we're in the wrong volume
|
||||||
|
$dirName = $arg;
|
||||||
|
&SavePageNum($dirName, $pageNum);
|
||||||
|
print PSOUT &PageStartPS($pageNum);
|
||||||
|
print PSOUT &StringPS($dirName), $dirPagePS, "\n";
|
||||||
|
print PSOUT &PageEndPS($pageNum);
|
||||||
|
$pageNum++;
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
my $done = 0;
|
||||||
|
|
||||||
|
$fileNumber++;
|
||||||
|
$fileName = $arg;
|
||||||
|
next unless $enable; # Do nothing if we're in the wrong volume
|
||||||
|
&SavePageNum($fileName, $pageNum);
|
||||||
|
$quotedFileName = $fileName;
|
||||||
|
$quotedFileName =~ s/'/'\\''/g;
|
||||||
|
$tabWidth = ($options =~ /(\d)/) ? $1 : $defaultTabWidth;
|
||||||
|
$args = ($fileType eq "B") ? "-b" : "";
|
||||||
|
$args .= " -$tabWidth -p$productNumber -f$fileNumber";
|
||||||
|
while (!$done)
|
||||||
|
{
|
||||||
|
if (open(FILE, "$mungeProg $args '$quotedFileName' 2>$tempFile |"))
|
||||||
|
{
|
||||||
|
$line = <FILE>;
|
||||||
|
print MUNGEDOUT $line;
|
||||||
|
|
||||||
|
while ($line ne "")
|
||||||
|
{
|
||||||
|
print PSOUT &PageStartPS($pageNum);
|
||||||
|
|
||||||
|
while ($line ne "" and $line !~ /^\f/)
|
||||||
|
{
|
||||||
|
chop $line;
|
||||||
|
print PSOUT &StringPS($line), $linePS, "\n";
|
||||||
|
$line = <FILE>;
|
||||||
|
print MUNGEDOUT $line;
|
||||||
|
}
|
||||||
|
$line =~ s/^\f//;
|
||||||
|
|
||||||
|
print PSOUT &PageEndPS($pageNum);
|
||||||
|
$pageNum++;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (close(FILE))
|
||||||
|
{
|
||||||
|
$done = 2;
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
$done = &MungeError();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
$done = &MungeError();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if ($done == 1)
|
||||||
|
{
|
||||||
|
die;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
# Print PostScript DSC trailer with the correct number of pages
|
||||||
|
print PSOUT "%%Trailer\n%%Pages: ", $pageNum, "\n%%EOF\n";
|
||||||
|
|
||||||
|
print PAGENUMS "Pages: ", $pageNum, "\n";
|
||||||
|
print PAGENUMS "Next: ", ((($pageNum+1) & ~1) + $firstLogPage), "\n";
|
||||||
|
|
||||||
|
close(PAGENUMS) || die;
|
||||||
|
close(FILELIST) || die;
|
||||||
|
close(PSOUT) || die;
|
||||||
|
|
||||||
|
if ($mungedOutFile ne "")
|
||||||
|
{
|
||||||
|
close(MUNGEDOUT) || die;
|
||||||
|
}
|
||||||
|
|
||||||
|
#
|
||||||
|
# vi: ai ts=4
|
||||||
|
# vim: si
|
||||||
|
#
|
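The `Next:` value that psgen appends to the page-number file rounds the page count up to an even number before adding `$firstLogPage`. A minimal C sketch of that arithmetic follows; the function name `NextLogicalPage` is hypothetical and not part of the tools, and the reason for the rounding (presumably keeping each volume on a whole number of sheets) is an assumption.

```c
#include <assert.h>

/*
 * Mirror of the Perl expression (($pageNum + 1) & ~1) + $firstLogPage.
 * pageCount is the number of pages emitted so far; it is rounded up
 * to the next even number before the first logical page is added.
 */
static long
NextLogicalPage(long pageCount, long firstLogPage)
{
    assert(pageCount >= 0);
    return ((pageCount + 1) & ~1L) + firstLogPage;
}
```

With a first logical page of 1, page counts of 5 and 6 both give a next page of 7, so an odd count is padded by one page.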
1851
tools/repair.c
Normal file
File diff suppressed because it is too large
185
tools/sortpages
Normal file
@ -0,0 +1,185 @@
|
|||||||
|
#!/usr/bin/perl
|
||||||
|
#
|
||||||
|
# $Id: sortpages,v 1.8 1997/12/11 19:20:58 mhw Exp $
|
||||||
|
#
|
||||||
|
|
||||||
|
@fileNameFromNumber = ();
|
||||||
|
@pagesFound = ();
|
||||||
|
$theProductNumber = 0;
|
||||||
|
|
||||||
|
for $fileIndex (0..$#ARGV)
|
||||||
|
{
|
||||||
|
$fileName = $ARGV[$fileIndex];
|
||||||
|
open(FILE, "<$fileName") || die;
|
||||||
|
while (!eof(FILE))
|
||||||
|
{
|
||||||
|
$filePos = tell(FILE);
|
||||||
|
$_ = <FILE>;
|
||||||
|
if (/^\f?-\S/)
|
||||||
|
{
|
||||||
|
my ($versionHex, $flagsHex, $pageCRCHex, $tabWidthHex,
|
||||||
|
$productNumberHex, $fileNumberHex, $pageNumber, $name)
|
||||||
|
= (/^\f?-\S\S{4}\ # CRC followed by a space
|
||||||
|
([0-9a-f]) # Format version
|
||||||
|
([0-9a-f]{2}) # Flags
|
||||||
|
([0-9a-f]{8}) # Running CRC32
|
||||||
|
([0-9a-f]) # Tab width (0 means radix64)
|
||||||
|
([0-9a-f]{3}) # Product number
|
||||||
|
([0-9a-f]{4}) # File number
|
||||||
|
\ Page\ (\d+)\ of\ (.*)/x);
|
||||||
|
my $version = hex($versionHex);
|
||||||
|
my $flags = hex($flagsHex);
|
||||||
|
my $productNumber = hex($productNumberHex);
|
||||||
|
my $fileNumber = hex($fileNumberHex);
|
||||||
|
|
||||||
|
unless ($version == 0 && $productNumber > 0
|
||||||
|
&& $fileNumber > 0 && $pageNumber > 0
|
||||||
|
&& $name ne "")
|
||||||
|
{
|
||||||
|
print STDERR "ERROR: Invalid header info ",
|
||||||
|
"at $fileName line $.\n";
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!defined($fileNameFromNumber[$fileNumber]))
|
||||||
|
{
|
||||||
|
$fileNameFromNumber[$fileNumber] = $name;
|
||||||
|
}
|
||||||
|
elsif ($fileNameFromNumber[$fileNumber] ne $name)
|
||||||
|
{
|
||||||
|
print STDERR "ERROR: Mismatched filename ",
|
||||||
|
"at $fileName line $.\n";
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!$theProductNumber)
|
||||||
|
{
|
||||||
|
$theProductNumber = $productNumber;
|
||||||
|
}
|
||||||
|
elsif ($theProductNumber != $productNumber)
|
||||||
|
{
|
||||||
|
print STDERR "ERROR: Different product number ",
|
||||||
|
"at $fileName line $.\n";
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
|
||||||
|
push @pagesFound, (sprintf "%5d:%4d:%d:%d:%d",
|
||||||
|
$fileNumber, $pageNumber, $flags, $fileIndex, $filePos);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
close(FILE) || die;
|
||||||
|
}
|
||||||
|
|
||||||
|
@pagesFound = sort @pagesFound;
|
||||||
|
|
||||||
|
$result = 0;
|
||||||
|
$lastFileNumber = 0;
|
||||||
|
$lastPageNumber = 0;
|
||||||
|
$nextFileNumber = 1;
|
||||||
|
$nextPageNumber = 1;
|
||||||
|
$fileIndexOpen = -1;
|
||||||
|
foreach (@pagesFound)
|
||||||
|
{
|
||||||
|
my ($fileNumber, $pageNumber, $flags, $fileIndex, $filePos) = split /:/;
|
||||||
|
|
||||||
|
$fileNumber = int($fileNumber);
|
||||||
|
$pageNumber = int($pageNumber);
|
||||||
|
|
||||||
|
if ($fileNumber == $lastFileNumber && $pageNumber == $lastPageNumber)
|
||||||
|
{
|
||||||
|
print STDERR "DUPLICATE: File $fileNumber, page $pageNumber, skipped\n";
|
||||||
|
next;
|
||||||
|
}
|
||||||
|
|
||||||
|
if ($nextFileNumber < $fileNumber && $nextPageNumber != 1)
|
||||||
|
{
|
||||||
|
print STDERR "MISSING: File $nextFileNumber, ",
|
||||||
|
"pages $nextPageNumber - END\n";
|
||||||
|
$nextPageNumber = 1;
|
||||||
|
$nextFileNumber++;
|
||||||
|
$result = 1;
|
||||||
|
}
|
||||||
|
if ($nextFileNumber < $fileNumber)
|
||||||
|
{
|
||||||
|
print STDERR "MISSING: Files $nextFileNumber - ",
|
||||||
|
$fileNumber-1, "\n";
|
||||||
|
$nextFileNumber = $fileNumber;
|
||||||
|
$nextPageNumber = 1;
|
||||||
|
$result = 1;
|
||||||
|
}
|
||||||
|
if ($nextFileNumber != $fileNumber)
|
||||||
|
{
|
||||||
|
print STDERR "ERROR: Internal error, unexpected fileNumber\n";
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
|
||||||
|
if ($nextPageNumber < $pageNumber)
|
||||||
|
{
|
||||||
|
print STDERR "MISSING: File $fileNumber, pages $nextPageNumber - ",
|
||||||
|
$pageNumber-1, "\n";
|
||||||
|
$nextPageNumber = $pageNumber;
|
||||||
|
$result = 1;
|
||||||
|
}
|
||||||
|
if ($nextPageNumber != $pageNumber)
|
||||||
|
{
|
||||||
|
print STDERR "ERROR: Internal error, unexpected pageNumber\n";
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
|
||||||
|
if ($fileIndexOpen != $fileIndex)
|
||||||
|
{
|
||||||
|
if ($fileIndexOpen >= 0)
|
||||||
|
{
|
||||||
|
close(FILE) || die;
|
||||||
|
$fileIndexOpen = -1;
|
||||||
|
}
|
||||||
|
$fileName = $ARGV[$fileIndex];
|
||||||
|
open(FILE, "<$fileName") || die;
|
||||||
|
$fileIndexOpen = $fileIndex;
|
||||||
|
}
|
||||||
|
seek(FILE, $filePos, 0) || die($!);
|
||||||
|
|
||||||
|
$_ = <FILE>;
|
||||||
|
print;
|
||||||
|
while (<FILE>)
|
||||||
|
{
|
||||||
|
last if /^\f?-\S/;
|
||||||
|
print;
|
||||||
|
}
|
||||||
|
$lastFileNumber = $fileNumber;
|
||||||
|
$lastPageNumber = $pageNumber;
|
||||||
|
|
||||||
|
if ($flags & 1) # Bit 0 of flags indicates last page of file
|
||||||
|
{
|
||||||
|
$nextFileNumber++;
|
||||||
|
$nextPageNumber = 1;
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
$nextPageNumber++;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if ($nextPageNumber != 1)
|
||||||
|
{
|
||||||
|
print STDERR "MISSING: File $nextFileNumber, ",
|
||||||
|
"pages $nextPageNumber - END\n";
|
||||||
|
$nextPageNumber = 1;
|
||||||
|
$nextFileNumber++;
|
||||||
|
$result = 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
print STDERR "Highest file number encountered: ", $nextFileNumber - 1, "\n";
|
||||||
|
|
||||||
|
if ($fileIndexOpen >= 0)
|
||||||
|
{
|
||||||
|
close(FILE) || die;
|
||||||
|
$fileIndexOpen = -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
exit($result);
|
||||||
|
|
||||||
|
#
|
||||||
|
# vi: ai ts=4
|
||||||
|
# vim: si
|
||||||
|
#
|
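The page header layout that sortpages matches with its regular expression can also be read with fixed-width `sscanf` conversions. The sketch below assumes the field order documented in the regex comments (version, flags, running CRC32, tab width, product number, file number, then " Page N of NAME"). The struct `PageHeader` and function `ParseHeader` are hypothetical names, and `sscanf` is more forgiving than the regex (it accepts uppercase hex digits, for example).

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical container for the fields sortpages extracts. */
struct PageHeader {
    unsigned int  version, flags, tabWidth, productNumber, fileNumber;
    unsigned long runningCRC;
    long          pageNumber;
    char          name[128];
};

/* Returns 0 on success, -1 if the line is not a well-formed header. */
static int
ParseHeader(char const *line, struct PageHeader *h)
{
    int i;

    assert(line != NULL && h != NULL);
    if (*line == '\f')          /* optional leading form feed */
        line++;
    if (line[0] != '-')
        return -1;
    /* Format character plus four line-CRC characters, none blank */
    for (i = 1; i <= 5; i++)
        if (line[i] == '\0' || isspace((unsigned char)line[i]))
            return -1;
    if (line[6] != ' ')
        return -1;
    /* Fixed-width hex fields, then " Page N of NAME" */
    if (sscanf(line + 7, "%1x%2x%8lx%1x%3x%4x Page %ld of %127[^\n]",
               &h->version, &h->flags, &h->runningCRC, &h->tabWidth,
               &h->productNumber, &h->fileNumber,
               &h->pageNumber, h->name) != 8)
        return -1;
    return 0;
}
```

A caller would still need the same sanity checks sortpages applies afterwards: version 0, and product, file and page numbers all greater than zero.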
222
tools/subst.c
Normal file
@ -0,0 +1,222 @@
|
|||||||
|
/*
|
||||||
|
* subst.c -- Repair substitution tables
|
||||||
|
*
|
||||||
|
* Copyright (C) 1997 Pretty Good Privacy, Inc.
|
||||||
|
*
|
||||||
|
* Written by Colin Plumb
|
||||||
|
*
|
||||||
|
* $Id: subst.c,v 1.14 1997/11/03 22:12:00 colin Exp $
|
||||||
|
*
|
||||||
|
* IT IS EXPECTED that users of this program will play with these tables
|
||||||
|
* and the cost values in the subst.h header. (Some day, they'll all
|
||||||
|
* get moved to an external config file.)
|
||||||
|
*
|
||||||
|
* NOTE: Other costs are hiding in the Filter functions in repair.c.
|
||||||
|
* Remember to keep them all on the same scale.
|
||||||
|
*/
|
||||||
|
|
||||||
|
/*
|
||||||
|
* The repair program copies its input to its output, making various
|
||||||
|
* substitutions, until it manages to produce a version that satisfies
|
||||||
|
* the parser. This includes having a correct CRC for each line.
|
||||||
|
* Each substitution has a cost, and the combinations are tried in order
|
||||||
|
* of increasing cost. NOTE that even translating "A"->"A" counts as
|
||||||
|
* a substitution, although it may have zero cost.
|
||||||
|
*
|
||||||
|
* The intention is to correct transcription errors, where the
|
||||||
|
* errors have a distinctly non-uniform distribution. Slight
|
||||||
|
* differences in cost produce a preference in trying some errors
|
||||||
|
* first. If an error costs half as much as another, combinations
|
||||||
|
* of two of that error will be compared to one of the more expensive.
|
||||||
|
* Too many cheap substitutions will result in repair spending
|
||||||
|
* a very long time searching before considering the more expensive
|
||||||
|
* substitutions.
|
||||||
|
*
|
||||||
|
* The following parameters and the raw substitution tables are expected
|
||||||
|
* to be edited by the user based on experience. Eventually, this
|
||||||
|
* will be moved into an external config file, but for now it's a matter
|
||||||
|
* of recompiling.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include "subst.h"
|
||||||
|
#include "util.h"
|
||||||
|
|
||||||
|
/* What the OCR software reports for "unrecognizable" */
|
||||||
|
#define UNRECOG_STRING "~\274"
|
||||||
|
|
||||||
|
/*
|
||||||
|
* The input substitutions to make (one-to-one). These are listed in
|
||||||
|
* the order of correction. i.e. uncorrected input first, then corrected
|
||||||
|
* output. Substitutions are one-way; to get two-way, list it twice.
|
||||||
|
*/
|
||||||
|
|
||||||
|
struct RawSubst const substSingles[] = {
|
||||||
|
/* Identity substitutions - note that period (.) is excluded */
|
||||||
|
{ "!\"#$%&'()*+,-./0123456789:;<=>?" SPACE_STRING,
|
||||||
|
"!\"#$%&'()*+,-./0123456789:;<=>?" SPACE_STRING, 0, 0, NULL },
|
||||||
|
{ "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_\t" TAB_STRING,
|
||||||
|
"@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_\t" TAB_STRING, 0, 0, NULL },
|
||||||
|
{ "`abcdefghijklmnopqrstuvwxyz{|}~\f" FORMFEED_STRING,
|
||||||
|
"`abcdefghijklmnopqrstuvwxyz{|}~\f" FORMFEED_STRING, 0, 0, NULL },
|
||||||
|
#if (TAB_PAD_CHAR & 128) /* Not already included? */
|
||||||
|
{ TAB_PAD_STRING, TAB_PAD_STRING, 0, 0, NULL },
|
||||||
|
#endif
|
||||||
|
{ "\r\n" CONTIN_STRING, "\n\n" CONTIN_STRING, 0, 0, NULL },
|
||||||
|
|
||||||
|
/* Occasionally these just get inserted as glitches */
|
||||||
|
{ ".,'`", NULL, 5, 10, FilterNearBlanks },
|
||||||
|
/* This is now pretty infrequent */
|
||||||
|
{ "-_", "_-", 0, 10, FilterAfterRepeat },
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Capitalization errors are common in some cases
|
||||||
|
* c/C, s/S, u/U are fucked up all the time.
|
||||||
|
* Also o/O, v/V and w/W. x, y and z also give some problems.
|
||||||
|
*/
|
||||||
|
{ "cilmopsuvwxyz", "CILMOPSUVWXYZ", 7, 13, FilterNearLower },
|
||||||
|
{ "CILMOPSUVWXYZ", "cilmopsuvwxyz", 7, 13, FilterNearUpper },
|
||||||
|
/* Other errors */
|
||||||
|
{ "g9aaiji;xX00Si", "9gg2ji;i%%oO3f", 10, 0, NULL },
|
||||||
|
/* This seems to happen a lot */
|
||||||
|
{ "c", "r", 9, 0, NULL },
|
||||||
|
|
||||||
|
{ "j", ";", 9, 0, NULL },
|
||||||
|
{ "' ", "``", 10, 0, NULL },
|
||||||
|
|
||||||
|
/* Uncommon errors */
|
||||||
|
|
||||||
|
/* Weird stuff that's happened in the checksum part */
|
||||||
|
/* A highish weight is okay here */
|
||||||
|
{ "sSEdJl", "554437", 15, 0, NULL },
|
||||||
|
{ "LESsPZ", "bb8a22", 15, 0, NULL },
|
||||||
|
|
||||||
|
/* Weird stuff that has happened */
|
||||||
|
{ "BasAeaeRoooo", "3334a@QQpqbd", 5, 15, FilterIsBinary },
|
||||||
|
{ "oooo", "pqbd", 0, 15, FilterIsBinary },
|
||||||
|
{ "ttTCCflO", "iff{[lfG", 12, 0, NULL },
|
||||||
|
#if 0
|
||||||
|
/* If the line-breaks get screwed up, use these */
|
||||||
|
{ " ", "\n", 10, COST_INFINITY, FilterChecksumFollows },
|
||||||
|
{ "\n", " ", COST_INFINITY, 10, FilterChecksumFollows },
|
||||||
|
{ "\n", NULL, COST_INFINITY , 11, FilterChecksumFollows },
|
||||||
|
#endif
|
||||||
|
|
||||||
|
{ NULL, NULL, 0, 0, NULL }
|
||||||
|
};
|
||||||
|
|
||||||
|
/* The many-to-many substitutions */
|
||||||
|
struct RawSubst const substMultiples[] = {
|
||||||
|
{ "''", "\"", 2, 0, NULL },
|
||||||
|
{ "``", "\"", 2, 0, NULL },
|
||||||
|
{ ",'", "\"", 2, 0, NULL },
|
||||||
|
{ "',", "\"", 2, 0, NULL },
|
||||||
|
{ ",,", "\"", 2, 0, NULL },
|
||||||
|
/* Extra inserted spaces are common */
|
||||||
|
{ " ", " ", COST_INFINITY, 0, FilterFollowsSpace },
|
||||||
|
{ " ", "", 0, 15, FilterFollowsSpace },
|
||||||
|
{ "\t", " ", COST_INFINITY, 0, FilterFollowsSpace },
|
||||||
|
{ "\t", "", 0, 10, FilterFollowsSpace },
|
||||||
|
/* Convert between SPACE_CHAR dots and periods */
|
||||||
|
{ ".", SPACE_STRING, 1, COST_INFINITY, FilterFollowsSpace },
|
||||||
|
{ ".", " "SPACE_STRING, COST_INFINITY, 10, FilterFollowsSpace },
|
||||||
|
{ SPACE_STRING, ".", 15, 5, FilterFollowsSpace },
|
||||||
|
{ SPACE_STRING, " "SPACE_STRING, COST_INFINITY, 5, FilterFollowsSpace },
|
||||||
|
|
||||||
|
/* Replace "unknown" by zero - it often is */
|
||||||
|
{ UNRECOG_STRING, "0", 1, 0, NULL },
|
||||||
|
{ UNRECOG_STRING, "_", 2, 0, NULL },
|
||||||
|
{ UNRECOG_STRING, ")", 3, 0, NULL },
|
||||||
|
{ UNRECOG_STRING, "^", 4, 0, NULL },
|
||||||
|
/* Except that these glitches are common */
|
||||||
|
{ UNRECOG_STRING"'", "\\\"", 0, 0, NULL },
|
||||||
|
{ UNRECOG_STRING"'", "\"", 1, 0, NULL },
|
||||||
|
{ "'"UNRECOG_STRING, "\"", 0, 0, NULL },
|
||||||
|
{ UNRECOG_STRING UNRECOG_STRING , "\"", 0, 0, NULL },
|
||||||
|
/* Something else that has been seen */
|
||||||
|
{ "V'", "\\\"", 5, 0, NULL },
|
||||||
|
|
||||||
|
/* A common transposition */
|
||||||
|
{ "\"'", "'\"", 5, 0, NULL },
|
||||||
|
{ "'\"", "\"'", 5, 0, NULL },
|
||||||
|
/* These also happen fairly often */
|
||||||
|
{ " \"", "''", 5, 0, NULL },
|
||||||
|
{ "\" ", "''", 5, 0, NULL },
|
||||||
|
|
||||||
|
/* Common glitches */
|
||||||
|
{ "\t.\n", "\n", 5, 0, NULL },
|
||||||
|
{ "\t,\n", "\n", 5, 0, NULL },
|
||||||
|
{ "\t-\n", "\n", 5, 0, NULL },
|
||||||
|
{ "\t_\n", "\n", 5, 0, NULL },
|
||||||
|
{ "\t'\n", "\n", 5, 0, NULL },
|
||||||
|
{ "\t`\n", "\n", 5, 0, NULL },
|
||||||
|
{ "\t~\n", "\n", 5, 0, NULL },
|
||||||
|
{ "\t:\n", "\n", 5, 0, NULL },
|
||||||
|
{ "\t"SPACE_STRING"\n", "\n", 5, 0, NULL },
|
||||||
|
|
||||||
|
/* Less common */
|
||||||
|
{ " .\n", "\n", 10, 0, NULL },
|
||||||
|
{ " ,\n", "\n", 10, 0, NULL },
|
||||||
|
{ " -\n", "\n", 10, 0, NULL },
|
||||||
|
{ " _\n", "\n", 10, 0, NULL },
|
||||||
|
{ " '\n", "\n", 10, 0, NULL },
|
||||||
|
{ " `\n", "\n", 10, 0, NULL },
|
||||||
|
{ " ~\n", "\n", 10, 0, NULL },
|
||||||
|
{ " :\n", "\n", 10, 0, NULL },
|
||||||
|
{ " "SPACE_STRING"\n", "\n", 10, 0, NULL },
|
||||||
|
|
||||||
|
/* Even less common */
|
||||||
|
{ ".\n", "\n", 15, 0, NULL },
|
||||||
|
{ ",\n", "\n", 15, 0, NULL },
|
||||||
|
{ "-\n", "\n", 15, 0, NULL },
|
||||||
|
{ "_\n", "\n", 15, 0, NULL },
|
||||||
|
{ "'\n", "\n", 15, 0, NULL },
|
||||||
|
{ "`\n", "\n", 15, 0, NULL },
|
||||||
|
{ "~\n", "\n", 15, 0, NULL },
|
||||||
|
{ ":\n", "\n", 15, 0, NULL },
|
||||||
|
{ SPACE_STRING"\n", "\n", 15, 0, NULL },
|
||||||
|
|
||||||
|
/* Weird stuff that has happened */
|
||||||
|
{ "lJ", "U", 10, 0, NULL },
|
||||||
|
{ "ll", "U", 10, 0, NULL },
|
||||||
|
{ "l1", "U", 10, 0, NULL },
|
||||||
|
{ "il", "U", 10, 0, NULL }, /* Fairly common, actually */
|
||||||
|
{ "li", "U", 10, 0, NULL },
|
||||||
|
{ "l)", "U", 10, 0, NULL },
|
||||||
|
{ "Ll", "U", 10, 0, NULL },
|
||||||
|
{ "LI", "U", 10, 0, NULL },
|
||||||
|
{ "L1", "U", 10, 0, NULL },
|
||||||
|
|
||||||
|
{ "lo", "b", 10, 0, NULL },
|
||||||
|
{ "cl", "d", 10, 0, NULL },
|
||||||
|
{ "cliff", "diff", 2, 0, NULL },
|
||||||
|
{ "*\n", "*/\n", 10, 0, NULL },
|
||||||
|
|
||||||
|
/* That big black block has odd things happen to it */
|
||||||
|
{ "d", CONTIN_STRING, 10, 0, NULL },
|
||||||
|
{ "d\n", CONTIN_STRING"\n", 3, 0, NULL },
|
||||||
|
{ "S", CONTIN_STRING, 10, 0, NULL },
|
||||||
|
{ "S\n", CONTIN_STRING"\n", 3, 0, NULL },
|
||||||
|
|
||||||
|
/* Tab-stop wonders */
|
||||||
|
{ TAB_STRING, TAB_STRING"", 0, 0, TabFilter },
|
||||||
|
{ TAB_STRING, TAB_STRING" ", 0, 0, TabFilter },
|
||||||
|
{ TAB_STRING, TAB_STRING" ", 0, 0, TabFilter },
|
||||||
|
{ TAB_STRING, TAB_STRING" ", 0, 0, TabFilter },
|
||||||
|
{ TAB_STRING, TAB_STRING" ", 0, 0, TabFilter },
|
||||||
|
{ TAB_STRING, TAB_STRING" ", 0, 0, TabFilter },
|
||||||
|
{ TAB_STRING, TAB_STRING" ", 0, 0, TabFilter },
|
||||||
|
{ TAB_STRING, TAB_STRING" ", 0, 0, TabFilter },
|
||||||
|
/* Some scan errors */
|
||||||
|
{ "D ", TAB_STRING"", 1, 5, TabFilter },
|
||||||
|
{ "D ", TAB_STRING" ", 1, 5, TabFilter },
|
||||||
|
{ "D ", TAB_STRING" ", 1, 5, TabFilter },
|
||||||
|
{ "D ", TAB_STRING" ", 1, 5, TabFilter },
|
||||||
|
{ "D ", TAB_STRING" ", 1, 5, TabFilter },
|
||||||
|
{ "D ", TAB_STRING" ", 1, 5, TabFilter },
|
||||||
|
{ "D ", TAB_STRING" ", 1, 5, TabFilter },
|
||||||
|
{ "D ", TAB_STRING" ", 1, 5, TabFilter },
|
||||||
|
#if TAB_PAD_CHAR != ' '
|
||||||
|
#error Fix those tab patterns!
|
||||||
|
#endif
|
||||||
|
{ NULL, NULL, 0, 0, NULL }
|
||||||
|
};
|
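The one-to-one tables in substSingles appear to pair the input and output strings position by position, which is how the capitalization entries ("cilmopsuvwxyz" against "CILMOPSUVWXYZ") read. Under that assumption, a character that occurs more than once in the input string has more than one candidate correction. The sketch below collects those candidates; `RawPair` and `SingleSubstCandidates` are hypothetical names, costs and filters are ignored, and entries with a NULL output are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two string fields of struct RawSubst. */
struct RawPair {
    char const *input;   /* uncorrected characters */
    char const *output;  /* corrected characters, same length */
};

/*
 * Collect every output character paired with input character c.
 * Stores at most outSize candidates in out and returns how many
 * were stored.
 */
static size_t
SingleSubstCandidates(struct RawPair const *raw, char c,
                      char *out, size_t outSize)
{
    size_t i, n = 0;

    assert(raw->input != NULL && raw->output != NULL);
    for (i = 0; raw->input[i] != '\0'; i++)
        if (raw->input[i] == c && n < outSize)
            out[n++] = raw->output[i];
    return n;
}
```

For the "Other errors" entry, for instance, an input 'a' yields the two candidates 'g' and '2'.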
66
tools/subst.h
Normal file
@ -0,0 +1,66 @@
|
|||||||
|
/*
|
||||||
|
* subst.h -- Header for repair substitutions
|
||||||
|
*
|
||||||
|
* Copyright (C) 1997 Pretty Good Privacy, Inc.
|
||||||
|
*
|
||||||
|
* Written by Colin Plumb
|
||||||
|
*
|
||||||
|
* $Id: subst.h,v 1.9 1997/11/03 22:12:00 colin Exp $
|
||||||
|
*/
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Give up if the list of pending changes to attempt grows to this many
|
||||||
|
* elements. Each element is 32 bytes, so 128K elements is 4 MB of memory.
|
||||||
|
* (Other than this, repair's memory usage is fairly modest.)
|
||||||
|
*/
|
||||||
|
#define MAX_HEAP (1<<17)
|
||||||
|
|
||||||
|
/*
|
||||||
|
* There is a hack in the code to find a single substitution that will fix a
|
||||||
|
* line, even if it's not in the tables. It gets added to the tables "on
|
||||||
|
* probation", with an infinite cost, and if it leads to a successful
|
||||||
|
* correction of the entire page, is "learned" for future use and its
|
||||||
|
* cost reduced to something finite.
|
||||||
|
* (This is not remembered across runs of the program, though.
|
||||||
|
* Edit the tables in the source to fix it.)
|
||||||
|
*/
|
||||||
|
#define DYNAMIC_COST_LEARNED 15
|
||||||
|
|
||||||
|
/*
|
||||||
|
* This negative-cost bonus for passing the end of a line with the right
|
||||||
|
* CRC makes the search engine reluctant to backtrack past a correct CRC,
|
||||||
|
* greatly improving efficiency. It's rather a hack, though. Think of
|
||||||
|
* this in terms of "how many errors should be considered in the current
|
||||||
|
* line before considering the possibility of errors in the previous line?"
|
||||||
|
*
|
||||||
|
* This bonus is halved for lines that are the result of a correction
|
||||||
|
* that was computed from the checksum, since a correct checksum is
|
||||||
|
* much less significant in such a case.
|
||||||
|
*/
|
||||||
|
#define COST_LINE -30
|
||||||
|
|
||||||
|
/* The cost of a full-line nastyline substitution. */
|
||||||
|
#define NASTY_COST 5
|
||||||
|
|
||||||
|
/* Type describing filter functions used in substitutions */
|
||||||
|
struct ParseNode;
|
||||||
|
struct Substitution;
|
||||||
|
#include "heap.h"
|
||||||
|
typedef HeapCost FilterFunc(struct ParseNode *parent, char const *limit,
|
||||||
|
struct Substitution const *subst);
|
||||||
|
FilterFunc TabFilter, FilterFollowsSpace, FilterNearBlanks;
|
||||||
|
FilterFunc FilterNearUpper, FilterNearLower, FilterNearXDigit;
|
||||||
|
FilterFunc FilterAfterRepeat, FilterCharConst, FilterChecksumFollows;
|
||||||
|
FilterFunc FilterLikelyUnderscore, FilterIsDynamic, FilterIsBinary;
|
||||||
|
|
||||||
|
/* The external substitution format */
|
||||||
|
typedef struct RawSubst {
|
||||||
|
char const *input;
|
||||||
|
char const *output;
|
||||||
|
HeapCost cost, cost2;
|
||||||
|
FilterFunc *filter;
|
||||||
|
} RawSubst;
|
||||||
|
|
||||||
|
/* The substitutions to make */
|
||||||
|
extern struct RawSubst const substSingles[];
|
||||||
|
extern struct RawSubst const substMultiples[];
|
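The memory bound implied by MAX_HEAP can be checked with a line of arithmetic. The sketch below takes the 32-byte element size from the header comment as an assumption (it is not measured from the real heap structure); `HEAP_ELEMENT_BYTES` and `HeapBytes` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HEAP (1 << 17)       /* as defined in subst.h */
#define HEAP_ELEMENT_BYTES 32    /* assumed, from the header comment */

/* Worst-case size in bytes of a heap holding `elements` entries. */
static size_t
HeapBytes(size_t elements, size_t elementBytes)
{
    /* Guard against overflow in the multiplication */
    assert(elementBytes != 0 && elements <= SIZE_MAX / elementBytes);
    return elements * elementBytes;
}
```

With these values, `HeapBytes(MAX_HEAP, HEAP_ELEMENT_BYTES)` is 131072 * 32 = 4194304 bytes, that is 4 MB.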
666
tools/unmunge.c
Normal file
@ -0,0 +1,666 @@
|
|||||||
|
/*
|
||||||
|
* unmunge.c -- Program to convert a munged file to original form
|
||||||
|
*
|
||||||
|
* Copyright (C) 1997 Pretty Good Privacy, Inc.
|
||||||
|
*
|
||||||
|
* Designed by Colin Plumb, Mark H. Weaver, and Philip R. Zimmermann
|
||||||
|
* Written by Mark H. Weaver
|
||||||
|
*
|
||||||
|
* $Id: unmunge.c,v 1.13 1997/11/13 23:27:08 mhw Exp $
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include <sys/stat.h>
|
||||||
|
#include <sys/types.h>
|
||||||
|
#include <fcntl.h>
|
||||||
|
#include <unistd.h>
|
||||||
|
|
||||||
|
/*#include <direct.h> teun: MS VC wants direct.h for mkdir */
|
||||||
|
|
||||||
|
#include <stdio.h>
|
||||||
|
#include <errno.h>
|
||||||
|
#include <string.h>
|
||||||
|
#include <ctype.h>
|
||||||
|
#include <stdlib.h>
|
||||||
|
#include <assert.h>
|
||||||
|
|
||||||
|
#include "util.h"
|
||||||
|
|
||||||
|
typedef struct UnMungeState
|
||||||
|
{
|
||||||
|
char const * mungedFileName;
|
||||||
|
char dirName[128];
|
||||||
|
char fileName[128];
|
||||||
|
char * fileNameTail;
|
||||||
|
int binaryMode, tabWidth;
|
||||||
|
long productNumber, fileNumber, pageNumber, lineNumber;
|
||||||
|
long manifestLineNumber;
|
||||||
|
word16 hdrFlags;
|
||||||
|
CRC pageCRC, seenPageCRC;
|
||||||
|
FILE * manifest;
|
||||||
|
FILE * file;
|
||||||
|
FILE * out;
|
||||||
|
} UnMungeState;
|
||||||
|
|
||||||
|
|
||||||
|
/* Returns number of characters decoded, or -1 on error */
|
||||||
|
static int
|
||||||
|
Decode4(char const src[4], byte dest[3])
|
||||||
|
{
|
||||||
|
int i, length;
|
||||||
|
byte srcVal[4];
|
||||||
|
|
||||||
|
for (i = 0; i < 4 && src[i] != RADIX64_END_CHAR; i++)
|
||||||
|
if ((srcVal[i] = Radix64DigitValue(src[i])) == (byte) -1)
|
||||||
|
return -1; /* invalid radix-64 digit */
|
||||||
|
|
||||||
|
length = i - 1;
|
||||||
|
if (length < 1)
|
||||||
|
return -1;
|
||||||
|
|
||||||
|
for (; i < 4; i++)
|
||||||
|
srcVal[i] = 0; /* zero-fill the unused trailing digits */
|
||||||
|
|
||||||
|
dest[0] = (srcVal[0] << 2) | (srcVal[1] >> 4);
|
||||||
|
dest[1] = (srcVal[1] << 4) | (srcVal[2] >> 2);
|
||||||
|
dest[2] = (srcVal[2] << 6) | (srcVal[3]);
|
||||||
|
|
||||||
|
return length;
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Return number of characters decoded, or -1 on error
|
||||||
|
*/
|
||||||
|
static int
|
||||||
|
DecodeLine(char const *src, char *dest, int srclength)
|
||||||
|
{
|
||||||
|
int destlength = 0;
|
||||||
|
int result;
|
||||||
|
|
||||||
|
if (srclength % 4 || !srclength)
|
||||||
|
return -1; /* Must be a multiple of 4 */
|
||||||
|
|
||||||
|
while (srclength -= 4) {
|
||||||
|
if (Decode4(src, dest + destlength) != 3)
|
||||||
|
return -1;
|
||||||
|
src += 4;
|
||||||
|
destlength += 3;
|
||||||
|
}
|
||||||
|
result = Decode4(src, dest + destlength);
|
||||||
|
if (result < 1)
|
||||||
|
return -1;
|
||||||
|
return destlength + result;
|
||||||
|
}
|
||||||
|
|
||||||
|
int PrintFileError(UnMungeState *state, char const *message)
|
||||||
|
{
|
||||||
|
fprintf(stderr, "%s, %s line %ld\n", message,
|
||||||
|
state->mungedFileName, state->lineNumber);
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
int ReadManifest(UnMungeState *state, long fileNumberWanted,
|
||||||
|
char const *fileTailPrefix, long prefixLen)
|
||||||
|
{
|
||||||
|
long fileNumber = 0;
|
||||||
|
long firstMissingFileNum = 0, lastMissingFileNum = 0;
|
||||||
|
char buffer[512];
|
||||||
|
char * p;
|
||||||
|
|
||||||
|
if (state->manifest == NULL)
|
||||||
|
{
|
||||||
|
if (fileNumberWanted != 0)
|
||||||
|
{
|
||||||
|
assert(fileTailPrefix != NULL);
|
||||||
|
strncpy(state->fileName, fileTailPrefix, sizeof(state->fileName));
|
||||||
|
state->fileName[sizeof(state->fileName) - 1] = '\0';
|
||||||
|
state->fileNameTail = state->fileName;
|
||||||
|
}
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
while (fgets(buffer, sizeof(buffer), state->manifest))
|
||||||
|
{
|
||||||
|
if ((p = strchr(buffer, '\n')) != NULL)
|
||||||
|
*p = '\0';
|
||||||
|
state->manifestLineNumber++;
|
||||||
|
if (buffer[0] == 'D')
|
||||||
|
{
|
||||||
|
if (buffer[1] != ' ')
|
||||||
|
goto invalidManifest;
|
||||||
|
strncpy(state->dirName, buffer + 2, sizeof(state->dirName));
|
||||||
|
if (state->dirName[sizeof(state->dirName) - 1] != '\0')
|
||||||
|
goto invalidManifest;
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
fileNumber = strtol(buffer, &p, 10);
|
||||||
|
if (p == buffer || *p != ' ')
|
||||||
|
goto invalidManifest;
|
||||||
|
p++;
|
||||||
|
|
||||||
|
if (fileNumberWanted == 0 || fileNumber < fileNumberWanted)
|
||||||
|
{
|
||||||
|
if (firstMissingFileNum == 0)
|
||||||
|
firstMissingFileNum = fileNumber;
|
||||||
|
lastMissingFileNum = fileNumber;
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
else if (fileNumber > fileNumberWanted)
|
||||||
|
break;
|
||||||
|
else
|
||||||
|
{
|
||||||
|
size_t len;
|
||||||
|
|
||||||
|
len = strlen(state->dirName);
|
||||||
|
assert(sizeof(state->fileName) >= sizeof(state->dirName));
|
||||||
|
memcpy(state->fileName, state->dirName, len);
|
||||||
|
strncpy(state->fileName + len, p,
|
||||||
|
sizeof(state->fileName) - len);
|
||||||
|
if (strncmp(p, fileTailPrefix, prefixLen) != 0)
|
||||||
|
{
|
||||||
|
fprintf(stderr, "Mismatched filename, headers say '%s',\n"
|
||||||
|
" manifest says '%s'\n",
|
||||||
|
fileTailPrefix, p);
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
p = state->dirName;
|
||||||
|
while ((p = strchr(p, '/')) != NULL)
|
||||||
|
{
|
||||||
|
*p = '\0';
|
||||||
|
mkdir(state->dirName, 0777);
|
||||||
|
*p++ = '/';
|
||||||
|
}
|
||||||
|
state->fileNameTail = state->fileName + len;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if (firstMissingFileNum != 0)
|
||||||
|
{
|
||||||
|
fprintf(stderr, "Missing files %ld-%ld\n",
|
||||||
|
firstMissingFileNum, lastMissingFileNum);
|
||||||
|
}
|
||||||
|
if (fileNumberWanted != 0 && fileNumber != fileNumberWanted)
|
||||||
|
{
|
||||||
|
fprintf(stderr, "Can't find file %ld in manifest file\n",
|
||||||
|
fileNumberWanted);
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
return 0;
|
||||||
|
|
||||||
|
invalidManifest:
|
||||||
|
fprintf(stderr, "Error parsing manifest file, line %ld\n",
|
||||||
|
state->manifestLineNumber);
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
int UnMungeFile(char const *mungedFileName, char const *manifestFileName,
|
||||||
|
int forceOverwrite, int forcePartialFiles)
|
||||||
|
{
|
||||||
|
UnMungeState * state;
|
||||||
|
EncodeFormat const * fmt = NULL;
|
||||||
|
char buffer[512];
|
||||||
|
char outbuf[BYTES_PER_LINE+1];
|
||||||
|
char * line;
|
||||||
|
char * lineData;
|
||||||
|
char * p;
|
||||||
|
int length;
|
||||||
|
int result = 0;
|
||||||
|
int skipPage = 0;
|
||||||
|
CRC lineCRC;
|
||||||
|
word32 num;
|
||||||
|
|
||||||
|
state = (UnMungeState *)calloc(1, sizeof(*state));
|
||||||
|
state->mungedFileName = mungedFileName;
|
||||||
|
|
||||||
|
if (manifestFileName != NULL)
|
||||||
|
{
|
||||||
|
if ((state->manifest = fopen(manifestFileName, "r")) == NULL)
|
||||||
|
goto errnoError;
|
||||||
|
}
|
||||||
|
|
||||||
|
if ((state->file = fopen(state->mungedFileName, "r")) == NULL)
|
||||||
|
goto errnoError;
|
||||||
|
|
||||||
|
while (!feof(state->file))
|
||||||
|
{
|
||||||
|
if (fgets(buffer, sizeof(buffer), state->file) == NULL)
|
||||||
|
{
|
||||||
|
if (feof(state->file))
|
||||||
|
break;
|
||||||
|
goto fileError;
|
||||||
|
}
|
||||||
|
|
||||||
|
state->lineNumber++;
|
||||||
|
|
||||||
|
line = buffer;
|
||||||
|
/* Strip leading whitespace */
|
||||||
|
while (isspace(*line))
|
||||||
|
line++;
|
||||||
|
if (*line == '\0')
|
||||||
|
continue;
|
||||||
|
|
||||||
|
/* Strip trailing whitespace */
|
||||||
|
p = line + strlen(line);
|
||||||
|
while (p > line && (byte)p[-1] < 128 && isspace(p[-1]))
|
||||||
|
p--;
|
||||||
|
|
||||||
|
lineData = line + PREFIX_LENGTH;
|
||||||
|
|
||||||
|
/* Pad up to at least PREFIX_LENGTH */
|
||||||
|
while (p < lineData)
|
||||||
|
*p++ = ' ';
|
||||||
|
*p++ = '\n';
|
||||||
|
*p = '\0';
|
||||||
|
length = p - lineData;
|
||||||
|
|
||||||
|
if (line[0] == HDR_PREFIX_CHAR)
|
||||||
|
{
|
||||||
|
fmt = FindFormat(line[1]);
|
||||||
|
if (!fmt)
|
||||||
|
{
|
||||||
|
result = PrintFileError(state, "ERROR: Invalid header type");
|
||||||
|
goto error;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
lineCRC = CalculateCRC(fmt->lineCRC, 0, (byte const *)lineData, length);
|
||||||
|
|
||||||
|
p = line + EncodedLength(fmt, fmt->runningCRCBits);
|
||||||
|
if (DecodeCheckDigits(fmt, p, NULL, fmt->lineCRC->bits, &num)
|
||||||
|
|| lineCRC != num)
|
||||||
|
{
|
||||||
|
result = PrintFileError(state, "ERROR: Line CRC failed");
|
||||||
|
goto error;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (line[0] == HDR_PREFIX_CHAR)
|
||||||
|
{
|
||||||
|
int formatVersion;
|
||||||
|
int flags;
|
||||||
|
CRC seenPageCRC;
|
||||||
|
int tabWidth;
|
||||||
|
long productNumber;
|
||||||
|
long fileNumber;
|
||||||
|
long pageNumber;
|
||||||
|
char * fileNameTail;
|
||||||
|
int skipNextPage = 0;
|
||||||
|
char * p;
|
||||||
|
EncodeFormat const * hFmt = &hexFormat;
|
||||||
|
|
||||||
|
/* Parse header line */
|
||||||
|
p = lineData;
|
||||||
|
|
||||||
|
if (DecodeCheckDigits(hFmt, p, &p, HDR_VERSION_BITS, &num))
|
||||||
|
{
|
||||||
|
invalidHeader:
|
||||||
|
result = PrintFileError(state, "ERROR: Invalid header");
|
||||||
|
goto error;
|
||||||
|
}
|
||||||
|
formatVersion = num;
|
||||||
|
|
||||||
|
if (DecodeCheckDigits(hFmt, p, &p, HDR_FLAG_BITS, &num))
|
||||||
|
goto invalidHeader;
|
||||||
|
flags = num;
|
||||||
|
|
||||||
|
if (DecodeCheckDigits(hFmt, p, &p, fmt->pageCRC->bits, &num))
|
||||||
|
goto invalidHeader;
|
||||||
|
seenPageCRC = num;
|
||||||
|
|
||||||
|
if (DecodeCheckDigits(hFmt, p, &p, HDR_TABWIDTH_BITS, &num))
|
||||||
|
goto invalidHeader;
|
||||||
|
tabWidth = num;
|
||||||
|
|
||||||
|
if (DecodeCheckDigits(hFmt, p, &p, HDR_PRODNUM_BITS, &num))
|
||||||
|
goto invalidHeader;
|
||||||
|
productNumber = num;
|
||||||
|
|
||||||
|
if (DecodeCheckDigits(hFmt, p, &p, HDR_FILENUM_BITS, &num))
|
||||||
|
goto invalidHeader;
|
||||||
|
fileNumber = num;
|
||||||
|
|
||||||
|
if (sscanf(p, " Page %ld of ", &pageNumber) < 1)
|
||||||
|
goto invalidHeader;
|
||||||
|
|
||||||
|
if (formatVersion > 0)
|
||||||
|
{
|
||||||
|
result = PrintFileError(state,
|
||||||
|
"ERROR: Format too new for "
|
||||||
|
"this version of unmunge");
|
||||||
|
goto error;
|
||||||
|
}
|
||||||
|
|
||||||
|
p = strstr(p, " of ");
|
||||||
|
if (p == NULL)
|
||||||
|
goto invalidHeader;
|
||||||
|
|
||||||
|
fileNameTail = p + 4;
|
||||||
|
p = fileNameTail + strlen(fileNameTail);
|
||||||
|
if (p < fileNameTail + 3 || p[-1] != '\n')
|
||||||
|
goto invalidHeader;
|
||||||
|
else
|
||||||
|
p[-1] = '\0';
|
||||||
|
|
||||||
|
if (state->out != NULL && state->pageCRC != state->seenPageCRC)
|
||||||
|
{
|
||||||
|
result = PrintFileError(state,
|
||||||
|
"ERROR: Page CRC mismatch on page before");
|
||||||
|
goto error;
|
||||||
|
}
|
||||||
|
|
||||||
|
if ((state->hdrFlags & HDR_FLAG_LASTPAGE) && state->out != NULL)
|
||||||
|
{
|
||||||
|
fclose(state->out);
|
||||||
|
state->out = NULL;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (state->out != NULL)
|
||||||
|
{
|
||||||
|
if (pageNumber != state->pageNumber + 1 ||
|
||||||
|
fileNumber != state->fileNumber ||
|
||||||
|
productNumber != state->productNumber ||
|
||||||
|
tabWidth != state->tabWidth ||
|
||||||
|
strcmp(fileNameTail, state->fileNameTail) != 0)
|
||||||
|
{
|
||||||
|
if (fileNumber == state->fileNumber &&
|
||||||
|
pageNumber > state->pageNumber + 1)
|
||||||
|
{
|
||||||
|
(void)PrintFileError(state,
|
||||||
|
"ERROR: Missing pages of this file");
|
||||||
|
if (forcePartialFiles && !state->binaryMode)
|
||||||
|
{
|
||||||
|
fputs("\n\n@@@@@@ Missing pages here! @@@@@@\n\n",
|
||||||
|
state->out);
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
skipNextPage = 1;
|
||||||
|
fclose(state->out);
|
||||||
|
state->out = NULL;
|
||||||
|
remove(state->fileName);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
(void)PrintFileError(state,
|
||||||
|
"ERROR: Missing pages of previous file");
|
||||||
|
if (forcePartialFiles && !state->binaryMode)
|
||||||
|
{
|
||||||
|
fputs("\n\n@@@@@@ Missing pages here! @@@@@@\n\n",
|
||||||
|
state->out);
|
||||||
|
/* Make it non-fatal, though... */
|
||||||
|
fclose(state->out);
|
||||||
|
state->out = NULL;
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
fclose(state->out);
|
||||||
|
state->out = NULL;
|
||||||
|
remove(state->fileName);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if (state->out == NULL)
|
||||||
|
{
|
||||||
|
if (pageNumber != 1 && !skipPage)
|
||||||
|
(void)PrintFileError(state,
|
||||||
|
"ERROR: File doesn't begin with page 1");
|
||||||
|
|
||||||
|
state->binaryMode = (tabWidth == 0);
|
||||||
|
|
||||||
|
if (pageNumber != 1 && (state->binaryMode
|
||||||
|
|| !forcePartialFiles))
|
||||||
|
{
|
||||||
|
skipNextPage = 1;
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
/* TODO: Use global filelist to get pathname */
|
||||||
|
result = ReadManifest(state, fileNumber, fileNameTail,
|
||||||
|
strlen(fileNameTail));
|
||||||
|
if (result != 0)
|
||||||
|
goto error;
|
||||||
|
|
||||||
|
if (!forceOverwrite)
|
||||||
|
{
|
||||||
|
FILE * file;
|
||||||
|
|
||||||
|
/* Make sure file doesn't already exist */
|
||||||
|
file = fopen(state->fileName, "r");
|
||||||
|
if (file != NULL)
|
||||||
|
{
|
||||||
|
fclose(file);
|
||||||
|
fprintf(stderr, "ERROR: %s already exists\n",
|
||||||
|
state->fileName);
|
||||||
|
result = 1;
|
||||||
|
goto error;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
state->out = fopen(state->fileName,
|
||||||
|
state->binaryMode ? "wb" : "w");
|
||||||
|
if (state->out == NULL)
|
||||||
|
goto errnoError;
|
||||||
|
|
||||||
|
if (pageNumber != 1)
|
||||||
|
fputs("\n\n@@@@@@ Missing pages here! @@@@@@\n\n",
|
||||||
|
state->out);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
state->pageCRC = 0;
|
||||||
|
state->seenPageCRC = seenPageCRC;
|
||||||
|
state->hdrFlags = (word16)flags;
|
||||||
|
state->pageNumber = pageNumber;
|
||||||
|
state->fileNumber = fileNumber;
|
||||||
|
state->productNumber = productNumber;
|
||||||
|
state->tabWidth = tabWidth;
|
||||||
|
skipPage = skipNextPage;
|
||||||
|
}
|
||||||
|
else if (!skipPage)
|
||||||
|
{
|
||||||
|
if (state->out == NULL)
|
||||||
|
{
|
||||||
|
result = PrintFileError(state, "ERROR: Missing header line");
|
||||||
|
goto error;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Normal data line */
|
||||||
|
state->pageCRC = CalculateCRC(fmt->pageCRC, state->pageCRC,
|
||||||
|
(byte const *)lineData,
|
||||||
|
length);
|
||||||
|
line[2] = '\0';
|
||||||
|
if (DecodeCheckDigits(fmt, line, NULL, fmt->runningCRCBits, &num)
|
||||||
|
|| RunningCRCFromPageCRC(fmt, state->pageCRC) != num)
|
||||||
|
{
|
||||||
|
result = PrintFileError(state, "ERROR: Running CRC failed");
|
||||||
|
goto error;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (state->binaryMode)
|
||||||
|
{
|
||||||
|
length = DecodeLine(lineData, outbuf, length-1);
|
||||||
|
if (length < 0 || length > BYTES_PER_LINE) {
|
||||||
|
result = PrintFileError(state,
|
||||||
|
"ERROR: Corrupt radix-64 data");
|
||||||
|
goto error;
|
||||||
|
}
|
||||||
|
fwrite(outbuf, 1, length, state->out);
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
p = lineData;
|
||||||
|
while (*p != '\0')
|
||||||
|
{
|
||||||
|
if (*p == TAB_CHAR)
|
||||||
|
{
|
||||||
|
p++;
|
||||||
|
putc('\t', state->out);
|
||||||
|
while ((p - lineData) % state->tabWidth)
|
||||||
|
{
|
||||||
|
if (*p == '\n')
|
||||||
|
break;
|
||||||
|
else if (*p == ' ')
|
||||||
|
p++;
|
||||||
|
else
|
||||||
|
{
|
||||||
|
result = PrintFileError(state,
|
||||||
|
"ERROR: Not enough spaces "
|
||||||
|
"after a tab character");
|
||||||
|
goto error;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
else if (*p == FORMFEED_CHAR)
|
||||||
|
{
|
||||||
|
p++;
|
||||||
|
if (*p != '\n')
|
||||||
|
{
|
||||||
|
result = PrintFileError(state,
|
||||||
|
"ERROR: Formfeed character "
|
||||||
|
"not at end of line");
|
||||||
|
goto error;
|
||||||
|
}
|
||||||
|
p++; /* Skip newline */
|
||||||
|
putc('\f', state->out);
|
||||||
|
}
|
||||||
|
else if (*p == CONTIN_CHAR)
|
||||||
|
{
|
||||||
|
p++;
|
||||||
|
if (*p != '\n')
|
||||||
|
{
|
||||||
|
result = PrintFileError(state,
|
||||||
|
"ERROR: Continuation character "
|
||||||
|
"not at end of line");
|
||||||
|
goto error;
|
||||||
|
}
|
||||||
|
p++; /* Skip newline */
|
||||||
|
}
|
||||||
|
else if (*p == SPACE_CHAR)
|
||||||
|
{
|
||||||
|
putc(' ', state->out);
|
||||||
|
p++;
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
putc(*p, state->out);
|
||||||
|
p++;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if (state->out != NULL)
|
||||||
|
{
|
||||||
|
if (!(state->hdrFlags & HDR_FLAG_LASTPAGE))
|
||||||
|
{
|
||||||
|
result = PrintFileError(state, "ERROR: Missing pages");
|
||||||
|
goto error;
|
||||||
|
}
|
||||||
|
if (state->pageCRC != state->seenPageCRC)
|
||||||
|
{
|
||||||
|
result = PrintFileError(state,
|
||||||
|
"ERROR: Page CRC failed on previous page");
|
||||||
|
goto error;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Check for missing files at the end */
|
||||||
|
result = ReadManifest(state, 0, NULL, 0);
|
||||||
|
goto done;
|
||||||
|
|
||||||
|
errnoError:
|
||||||
|
result = errno;
|
||||||
|
goto printError;
|
||||||
|
|
||||||
|
fileError:
|
||||||
|
result = ferror(state->file);
|
||||||
|
|
||||||
|
printError:
|
||||||
|
fprintf(stderr, "ERROR: %s\n", strerror(result));
|
||||||
|
|
||||||
|
error:
|
||||||
|
done:
|
||||||
|
if (state != NULL)
|
||||||
|
{
|
||||||
|
if (state->out != NULL)
|
||||||
|
fclose(state->out);
|
||||||
|
if (state->file != NULL)
|
||||||
|
fclose(state->file);
|
||||||
|
if (state->manifest != NULL)
|
||||||
|
fclose(state->manifest);
|
||||||
|
free(state);
|
||||||
|
}
|
||||||
|
return result;
|
||||||
|
}
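The text-mode branch of UnMungeFile above undoes the visible-whitespace encoding one character at a time. A minimal sketch of just that step follows. The helper name `DecodeTextLine` and its output-buffer interface are assumptions made for illustration; the real code writes straight to `state->out` and reports problems through `PrintFileError`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Marker characters, copied from util.h */
#define TAB_CHAR      '\244'
#define FORMFEED_CHAR '\245'
#define SPACE_CHAR    '\267'
#define CONTIN_CHAR   '\266'

/*
 * Hypothetical helper: decode one munged text line (check-digit prefix
 * already stripped) from src into dst.  Returns the number of bytes
 * written, or -1 on malformed input or if dst is too small.
 */
static int DecodeTextLine(char const *src, int tabWidth,
                          char *dst, size_t dstSize)
{
    char const *p = src;
    size_t n = 0;

    assert(tabWidth > 0 && dstSize > 0);
    while (*p != '\0') {
        char out;
        int emit = 1;

        if (*p == TAB_CHAR) {
            p++;
            out = '\t';
            /* Skip the padding spaces up to the next tab stop */
            while ((p - src) % tabWidth) {
                if (*p == '\n')
                    break;
                if (*p != ' ')
                    return -1;      /* not enough spaces after a tab */
                p++;
            }
        } else if (*p == FORMFEED_CHAR || *p == CONTIN_CHAR) {
            emit = (*p == FORMFEED_CHAR);   /* a continuation emits nothing */
            if (p[1] != '\n')
                return -1;          /* marker must be at end of line */
            p += 2;                 /* skip marker and newline */
            out = '\f';
        } else if (*p == SPACE_CHAR) {
            out = ' ';
            p++;
        } else {
            out = *p++;
        }
        if (emit) {
            if (n + 1 >= dstSize)
                return -1;
            dst[n++] = out;
        }
    }
    dst[n] = '\0';
    return (int)n;
}
```

With a tab width of 4, the input `"\244   x\267y\n"` decodes to a tab, `x`, a space, `y` and a newline; a continuation marker followed by a newline simply joins the line with the next one.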
|
||||||
|
|
||||||
|
void UsageAndExit(int result)
|
||||||
|
{
|
||||||
|
fprintf(stderr,
|
||||||
|
"Usage: unmunge [-fp] <file> [<manifest>]\n"
|
||||||
|
" -f Force overwrites of existing files\n"
|
||||||
|
" -p Force unmunge of partial files\n");
|
||||||
|
exit(result);
|
||||||
|
}
|
||||||
|
|
||||||
|
int main(int argc, char *argv[])
|
||||||
|
{
|
||||||
|
int result = 0;
|
||||||
|
int forceOverwrite = 0;
|
||||||
|
int forcePartialFiles = 0;
|
||||||
|
char * fileName = NULL;
|
||||||
|
char * manifestFileName = NULL;
|
||||||
|
int i, j;
|
||||||
|
|
||||||
|
InitUtil();
|
||||||
|
|
||||||
|
for (i = 1; i < argc && argv[i][0] == '-'; i++)
|
||||||
|
{
|
||||||
|
if (0 == strcmp(argv[i], "--"))
|
||||||
|
{
|
||||||
|
i++;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
for (j = 1; argv[i][j] != '\0'; j++)
|
||||||
|
{
|
||||||
|
if (argv[i][j] == 'h')
|
||||||
|
UsageAndExit(0);
|
||||||
|
else if (argv[i][j] == 'f')
|
||||||
|
forceOverwrite = 1;
|
||||||
|
else if (argv[i][j] == 'p')
|
||||||
|
forcePartialFiles = 1;
|
||||||
|
else
|
||||||
|
{
|
||||||
|
fprintf(stderr, "ERROR: Unrecognized option -%c\n", argv[i][j]);
|
||||||
|
UsageAndExit(1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if (i < argc)
|
||||||
|
fileName = argv[i++];
|
||||||
|
if (i < argc)
|
||||||
|
manifestFileName = argv[i++];
|
||||||
|
if (fileName == NULL || i < argc)
|
||||||
|
UsageAndExit(1);
|
||||||
|
|
||||||
|
if ((result = UnMungeFile(fileName, manifestFileName,
|
||||||
|
forceOverwrite, forcePartialFiles)) != 0)
|
||||||
|
{
|
||||||
|
/* If result > 0, message should have already been printed */
|
||||||
|
if (result < 0)
|
||||||
|
fprintf(stderr, "ERROR: %s\n", strerror(result));
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Local Variables:
|
||||||
|
* tab-width: 4
|
||||||
|
* End:
|
||||||
|
* vi: ts=4 sw=4
|
||||||
|
* vim: si
|
||||||
|
*/
|
||||||
|
|
198
tools/util.c
Normal file
@ -0,0 +1,198 @@
|
|||||||
|
/*
|
||||||
|
* util.c -- Miscellaneous shared code/data
|
||||||
|
*
|
||||||
|
* Copyright (C) 1997 Pretty Good Privacy, Inc.
|
||||||
|
*
|
||||||
|
* Written by Mark H. Weaver
|
||||||
|
*
|
||||||
|
* $Id: util.c,v 1.11 1997/11/07 00:44:10 mhw Exp $
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include <stdlib.h>
|
||||||
|
#include "util.h"
|
||||||
|
|
||||||
|
char const hexDigits[] = "0123456789abcdef";
|
||||||
|
char const radix64Digits[] =
|
||||||
|
#if 0 /* Standard */
|
||||||
|
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
|
||||||
|
#else /* Modified form that avoids hard-to-OCR characters */
|
||||||
|
"ABCDEFGHIJKLMNPQRSTVWXYZabcdehijklmnpqtuwy145689\\^!#$%&*+=/:<>?@";
|
||||||
|
#endif
|
||||||
|
|
||||||
|
signed char hexDigitsInv[256];
|
||||||
|
signed char radix64DigitsInv[256];
|
||||||
|
|
||||||
|
/* teun: moved initialisation of all three CRCPoly's to InitUtil() */
|
||||||
|
|
||||||
|
/* CRC-CCITT: x^16 + x^12 + x^5 + 1 */
|
||||||
|
CRCPoly crcCCITTPoly;
|
||||||
|
/*
|
||||||
|
* PRZ's magic 24-bit polynomial - (x+1) * (irreducible of degree 23)
|
||||||
|
* x^24 +x^23 +x^18 +x^17 +x^14 +x^11 +x^10 +x^7 +x^6 +x^5 +x^4 +x^3 +x +1
|
||||||
|
* (Developed by Neal Glover). Note: this is bit-reversed from the form
|
||||||
|
* used in PGP, 0x1864cfb.
|
||||||
|
*/
|
||||||
|
CRCPoly crc24Poly;
|
||||||
|
/* CRC-32: x^32+x^26+x^23+x^22+x^16+x^12+x^11+x^10+x^8+x^7+x^5+x^4+x^2+x+1 */
|
||||||
|
CRCPoly crc32Poly;
|
||||||
|
|
||||||
|
EncodeFormat const hexFormat =
|
||||||
|
{
|
||||||
|
NULL, /* nextFormat */
|
||||||
|
'-', /* headerTypeChar */
|
||||||
|
hexDigits, /* digits */
|
||||||
|
hexDigitsInv, /* digitsInv */
|
||||||
|
4, /* bitsPerDigit */
|
||||||
|
16, /* radix */
|
||||||
|
&crcCCITTPoly, /* lineCRC */
|
||||||
|
&crc32Poly, /* pageCRC */
|
||||||
|
8, /* runningCRCBits */
|
||||||
|
24, /* runningCRCShift */
|
||||||
|
0xFF /* runningCRCMask */
|
||||||
|
};
|
||||||
|
|
||||||
|
EncodeFormat const radix64Format =
|
||||||
|
{
|
||||||
|
&hexFormat, /* nextFormat */
|
||||||
|
'A', /* headerTypeChar */
|
||||||
|
radix64Digits, /* digits */
|
||||||
|
radix64DigitsInv, /* digitsInv */
|
||||||
|
6, /* bitsPerDigit */
|
||||||
|
64, /* radix */
|
||||||
|
&crc24Poly, /* lineCRC */
|
||||||
|
&crc32Poly, /* pageCRC */
|
||||||
|
12, /* runningCRCBits */
|
||||||
|
20, /* runningCRCShift */
|
||||||
|
0xFFF /* runningCRCMask */
|
||||||
|
};
|
||||||
|
|
||||||
|
EncodeFormat const * firstFormat = &radix64Format;
|
||||||
|
|
||||||
|
|
||||||
|
static void InitCRCPoly(CRCPoly *poly)
{
    int i, oneBit;
    CRC crc = 1;

    poly->table[0] = 0;
    for (oneBit = 0x80; oneBit > 0; oneBit >>= 1) {
        crc = (crc >> 1) ^ ((crc & 1) ? poly->poly : 0);
        for (i = 0; i < 0x100; i += 2 * oneBit)
            poly->table[i + oneBit] = poly->table[i] ^ crc;
    }
}

CRC CalculateCRC(CRCPoly const *poly, CRC crc,
                 byte const *buffer, size_t length)
{
    while (length--)
        crc = (crc >> 8) ^ poly->table[(crc & 0xFF) ^ (*buffer++)];
    return crc;
}
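InitCRCPoly fills only the power-of-two table entries directly and derives the rest by XOR, relying on the CRC being linear. The sketch below repeats that construction for the CRC-32 polynomial and cross-checks it against a plain bit-at-a-time loop. The names `BuildTable`, `BitwiseByte`, `TableCRC` and `TableMatchesBitwise` are hypothetical, and `uint32_t` replaces the `CRC` typedef to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t crcTable[256];

/* Build the byte-wise table for a bit-reversed (LSB-first) polynomial */
static void BuildTable(uint32_t poly)
{
    uint32_t crc = 1;
    int i, oneBit;

    crcTable[0] = 0;
    for (oneBit = 0x80; oneBit > 0; oneBit >>= 1) {
        /* crc is now the table entry for the single-bit byte oneBit */
        crc = (crc >> 1) ^ ((crc & 1) ? poly : 0);
        for (i = 0; i < 0x100; i += 2 * oneBit)
            crcTable[i + oneBit] = crcTable[i] ^ crc;
    }
}

/* Reference: feed one byte through the CRC a bit at a time, no table */
static uint32_t BitwiseByte(uint32_t poly, uint32_t crc, unsigned char b)
{
    int k;

    crc ^= b;
    for (k = 0; k < 8; k++)
        crc = (crc >> 1) ^ ((crc & 1) ? poly : 0);
    return crc;
}

/* Same update rule as CalculateCRC */
static uint32_t TableCRC(uint32_t crc, unsigned char const *buf, size_t len)
{
    while (len--)
        crc = (crc >> 8) ^ crcTable[(crc ^ *buf++) & 0xFF];
    return crc;
}

/* Returns 1 if every table entry equals the bitwise result */
static int TableMatchesBitwise(uint32_t poly)
{
    int b;

    for (b = 0; b < 256; b++)
        if (crcTable[b] != BitwiseByte(poly, 0, (unsigned char)b))
            return 0;
    return 1;
}
```

After `BuildTable(0xEDB88320)`, running `TableCRC` over "123456789" with an initial value of 0xFFFFFFFF and inverting the result gives the conventional CRC-32 check value 0xCBF43926. That initial value and inversion are only the usual CRC-32 convention used for comparison here; unmunge itself starts each page CRC at 0.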
|
||||||
|
|
||||||
|
CRC ReverseCRC(CRCPoly const *poly, CRC crc, byte b)
|
||||||
|
{
|
||||||
|
int i, highBit = poly->highBit;
|
||||||
|
|
||||||
|
for (i = 0; i < 8; i++) {
|
||||||
|
if (crc & highBit) /* highBit is 2^(poly->bits-1) */
|
||||||
|
crc = ((crc ^ poly->poly) << 1) ^ 1;
|
||||||
|
else
|
||||||
|
crc <<= 1;
|
||||||
|
}
|
||||||
|
return crc ^ b;
|
||||||
|
}
|
||||||
|
|
||||||
|
static void InitDigitsInv(char const *digits, signed char *digitsInv)
|
||||||
|
{
|
||||||
|
int i;
|
||||||
|
|
||||||
|
for (i = 0; i < 256; i++)
|
||||||
|
digitsInv[i] = -1;
|
||||||
|
for (i = 0; digits[i]; i++)
|
||||||
|
digitsInv[(byte)digits[i]] = i;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Returns the number of chars encoded */
int EncodeCheckDigits(EncodeFormat const *fmt, word32 num,
                      int numBits, char *dest)
{
    int destLen = EncodedLength(fmt, numBits);
    word32 digitMask = fmt->radix - 1;
    int i;

    for (i = destLen - 1; i >= 0; i--)
    {
        dest[i] = EncodeDigit(fmt, num & digitMask);
        num >>= fmt->bitsPerDigit;
    }
    return destLen;
}

/* Returns 1 if there's an error */
int DecodeCheckDigits(EncodeFormat const *fmt, char const *src, char **endPtr,
                      int numBits, word32 *valuePtr)
{
    word32 value = 0;
    int digitValue;
    int i = EncodedLength(fmt, numBits);

    while (i--)
    {
        digitValue = DecodeDigit(fmt, *src++);
        if (digitValue < 0)
        {
            /* Invalid digit found */
            *valuePtr = 0;
            if (endPtr)
                *endPtr = NULL;
            return 1;
        }
        value = (value << fmt->bitsPerDigit) | digitValue;
    }
    *valuePtr = value;
    if (endPtr)
        *endPtr = (char *)src;
    return 0;
}
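The two functions above are inverses of each other: a value is written most-significant digit first, using as many digits as its bit width needs. The following is a rough round-trip sketch specialised to the OCR-friendly radix-64 alphabet from this file. `InitInv`, `EncodedLen`, `Encode` and `Decode` are stand-in names for illustration, not the tool's API.

```c
#include <assert.h>
#include <string.h>

/* The modified radix-64 alphabet from util.c */
static char const digits[] =
    "ABCDEFGHIJKLMNPQRSTVWXYZabcdehijklmnpqtuwy145689\\^!#$%&*+=/:<>?@";
static signed char digitsInv[256];

enum { BITS_PER_DIGIT = 6 };

/* Build the inverse map; characters outside the alphabet map to -1 */
static void InitInv(void)
{
    int i;

    for (i = 0; i < 256; i++)
        digitsInv[i] = -1;
    for (i = 0; digits[i]; i++)
        digitsInv[(unsigned char)digits[i]] = (signed char)i;
}

static int EncodedLen(int numBits)
{
    return (numBits + BITS_PER_DIGIT - 1) / BITS_PER_DIGIT;
}

/* Writes EncodedLen(numBits) digits (no terminator); returns that count */
static int Encode(unsigned long num, int numBits, char *dest)
{
    int len = EncodedLen(numBits), i;

    for (i = len - 1; i >= 0; i--) {
        dest[i] = digits[num & 63];
        num >>= BITS_PER_DIGIT;
    }
    return len;
}

/* Returns 1 if an invalid digit is found, 0 on success */
static int Decode(char const *src, int numBits, unsigned long *value)
{
    unsigned long v = 0;
    int i = EncodedLen(numBits);

    while (i--) {
        int d = digitsInv[(unsigned char)*src++];
        if (d < 0)
            return 1;
        v = (v << BITS_PER_DIGIT) | (unsigned long)d;
    }
    *value = v;
    return 0;
}
```

A 32-bit value occupies six digits here, and a letter such as `O`, which the alphabet leaves out because it is easily confused with zero when scanned, is rejected by the decoder.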
|
||||||
|
|
||||||
|
EncodeFormat const *FindFormat(char headerTypeChar)
|
||||||
|
{
|
||||||
|
EncodeFormat const * fmt = firstFormat;
|
||||||
|
|
||||||
|
while (fmt && fmt->headerTypeChar != headerTypeChar)
|
||||||
|
fmt = fmt->nextFormat;
|
||||||
|
return fmt;
|
||||||
|
}
|
||||||
|
|
||||||
|
void InitUtil()
{
    /* teun: removed "{ }" for MS VC compile */

    crcCCITTPoly.bits = 16;
    crcCCITTPoly.poly = 0x8408;
    crcCCITTPoly.highBit = 0x8000;

    crc24Poly.bits = 24;
    crc24Poly.poly = 0xdf3261;
    crc24Poly.highBit = 0x800000;

    crc32Poly.bits = 32;
    crc32Poly.poly = 0xedb88320;
    crc32Poly.highBit = 0x80000000;

    InitCRCPoly(&crcCCITTPoly);
    InitCRCPoly(&crc24Poly);
    InitCRCPoly(&crc32Poly);
    InitDigitsInv(hexDigits, hexDigitsInv);
    InitDigitsInv(radix64Digits, radix64DigitsInv);
}
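The polynomial constants assigned in InitUtil are in bit-reversed (LSB-first) form, as the comment on the 24-bit polynomial notes. A small sketch makes the relationship checkable. `ReverseBits` is a hypothetical helper, and the MSB-first values 0x1021 and 0x04C11DB7 are the commonly quoted forms of CRC-CCITT and CRC-32 rather than anything stated in this file; 0x864cfb is PGP's 0x1864cfb with the implicit x^24 term dropped.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the low `bits` bits of v (bit 0 swaps with bit bits-1) */
static uint32_t ReverseBits(uint32_t v, int bits)
{
    uint32_t r = 0;
    int i;

    assert(bits > 0 && bits <= 32);
    for (i = 0; i < bits; i++) {
        r = (r << 1) | (v & 1);
        v >>= 1;
    }
    return r;
}
```

Under those assumptions, `ReverseBits(0x1021, 16)` yields 0x8408, `ReverseBits(0x864cfb, 24)` yields 0xdf3261, and `ReverseBits(0x04C11DB7, 32)` yields 0xedb88320, matching the three `poly` fields above.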
|
||||||
|
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Local Variables:
|
||||||
|
* tab-width: 4
|
||||||
|
* End:
|
||||||
|
* vi: ts=4 sw=4
|
||||||
|
* vim: si
|
||||||
|
*/
|
149
tools/util.h
Normal file
@ -0,0 +1,149 @@
|
|||||||
|
/*
|
||||||
|
* util.h -- Miscellaneous defines
|
||||||
|
*
|
||||||
|
* Copyright (C) 1997 Pretty Good Privacy, Inc.
|
||||||
|
*
|
||||||
|
* Written by Mark H. Weaver
|
||||||
|
*
|
||||||
|
* $Id: util.h,v 1.23 1997/11/12 23:28:56 mhw Exp $
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef UTIL_H
|
||||||
|
#define UTIL_H 1
|
||||||
|
|
||||||
|
typedef unsigned long word32;
|
||||||
|
typedef unsigned short word16;
|
||||||
|
typedef unsigned char byte;
|
||||||
|
|
||||||
|
#define FMT32 "%08lx"
|
||||||
|
#define FMT16 "%04x"
|
||||||
|
#define FMT8 "%02x"
|
||||||
|
|
||||||
|
#define TAB_CHAR '\244' /* Currency symbol, like o in top of x */
|
||||||
|
#define TAB_STRING "\244"
|
||||||
|
#define TAB_PAD_CHAR ' ' /* The fact that this is space has leaked. */
|
||||||
|
#define TAB_PAD_STRING " " /* It may not be freely changed. */
|
||||||
|
#define FORMFEED_CHAR '\245' /* Yen symbol, like = on top of Y */
|
||||||
|
#define FORMFEED_STRING "\245"
|
||||||
|
#define SPACE_CHAR '\267' /* Middle dot, or bullet */
|
||||||
|
#define SPACE_STRING "\267"
|
||||||
|
#define CONTIN_CHAR '\266' /* Pilcrow (paragraph symbol) */
|
||||||
|
#define CONTIN_STRING "\266"
|
||||||
|
|
||||||
|
#define BYTES_PER_LINE 60 /* When using radix 64 */
|
||||||
|
|
||||||
|
#define LINES_PER_PAGE 72 /* Exclusive of 2 header lines */
|
||||||
|
#define LINE_LENGTH 80
|
||||||
|
#define PREFIX_LENGTH 7 /* Length of prefix, including the space */
|
||||||
|
|
||||||
|
#define HDR_PREFIX_CHAR '-'
|
||||||
|
#define RADIX64_END_CHAR '-'
|
||||||
|
|
||||||
|
typedef struct EncodeFormat EncodeFormat;
|
||||||
|
typedef word32 CRC;
|
||||||
|
typedef word16 CRCFragment;
|
||||||
|
|
||||||
|
typedef struct
|
||||||
|
{
|
||||||
|
CRC table[256];
|
||||||
|
int bits;
|
||||||
|
CRC poly;
|
||||||
|
CRC highBit;
|
||||||
|
} CRCPoly;
|
||||||
|
|
||||||
|
struct EncodeFormat
|
||||||
|
{
|
||||||
|
EncodeFormat const *nextFormat;
|
||||||
|
char headerTypeChar;
|
||||||
|
char const * digits;
|
||||||
|
signed char const * digitsInv;
|
||||||
|
int bitsPerDigit;
|
||||||
|
int radix;
|
||||||
|
CRCPoly const * lineCRC;
|
||||||
|
CRCPoly const * pageCRC;
|
||||||
|
int runningCRCBits;
|
||||||
|
int runningCRCShift;
|
||||||
|
int runningCRCMask;
|
||||||
|
};
|
||||||
|
|
||||||
|
|
||||||
|
#define HDR_ENC_LENGTH      19  /* Length of encoded prefix on header */

#define HDR_VERSION_BITS    4
#define HDR_FLAG_BITS       8
/* Page CRC bits omitted, since it's not constant */
#define HDR_TABWIDTH_BITS   4
#define HDR_PRODNUM_BITS    12
#define HDR_FILENUM_BITS    16
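As a worked check of how these field widths relate to HDR_ENC_LENGTH, the sketch below adds up the digits each field needs. It assumes the header is encoded in the hex format (4 bits per digit) with a 32-bit page CRC; `DigitsFor` and `HeaderEncodedLength` are illustrative names, not part of the tool.

```c
#include <assert.h>

#define HDR_VERSION_BITS    4
#define HDR_FLAG_BITS       8
#define HDR_TABWIDTH_BITS   4
#define HDR_PRODNUM_BITS    12
#define HDR_FILENUM_BITS    16

/* Digits needed for numBits, same rounding as EncodedLength() */
static int DigitsFor(int numBits, int bitsPerDigit)
{
    assert(bitsPerDigit > 0);
    return (numBits + bitsPerDigit - 1) / bitsPerDigit;
}

/* Sum over the header fields in the order unmunge decodes them */
static int HeaderEncodedLength(int bitsPerDigit, int pageCRCBits)
{
    return DigitsFor(HDR_VERSION_BITS, bitsPerDigit)
         + DigitsFor(HDR_FLAG_BITS, bitsPerDigit)
         + DigitsFor(pageCRCBits, bitsPerDigit)
         + DigitsFor(HDR_TABWIDTH_BITS, bitsPerDigit)
         + DigitsFor(HDR_PRODNUM_BITS, bitsPerDigit)
         + DigitsFor(HDR_FILENUM_BITS, bitsPerDigit);
}
```

With 4 bits per digit and a 32-bit page CRC the fields take 1 + 2 + 8 + 1 + 3 + 4 = 19 digits, which agrees with HDR_ENC_LENGTH.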
|
||||||
|
|
||||||
|
|
||||||
|
/* Enough to hold one whole page of munged data */
|
||||||
|
/* There is no point in making this excessively large */
|
||||||
|
#define PAGE_BUFFER_SIZE 8192
|
||||||
|
|
||||||
|
#if PAGE_BUFFER_SIZE < (LINES_PER_PAGE + 2) * (LINE_LENGTH + PREFIX_LENGTH + 2)
|
||||||
|
#error PAGE_BUFFER_SIZE is too small
|
||||||
|
#endif
|
||||||
|
|
||||||
|
|
||||||
|
/* Header flags */
|
||||||
|
#define HDR_FLAG_LASTPAGE 0x01 /* Indicates last page of file */
|
||||||
|
|
||||||
|
|
||||||
|
#define elemsof(array) (sizeof(array)/sizeof(*(array)))
|
||||||
|
|
||||||
|
|
||||||
|
extern char const hexDigits[];
|
||||||
|
extern char const radix64Digits[];
|
||||||
|
|
||||||
|
extern signed char hexDigitsInv[256];
|
||||||
|
extern signed char radix64DigitsInv[256];
|
||||||
|
|
||||||
|
extern CRCPoly crcCCITTPoly, crc24Poly, crc32Poly;
|
||||||
|
|
||||||
|
extern EncodeFormat const hexFormat, radix64Format;
|
||||||
|
extern EncodeFormat const * firstFormat;
|
||||||
|
|
||||||
|
|
||||||
|
#define HexDigitValue(ch) hexDigitsInv[(byte)(ch)]
|
||||||
|
#define Radix64DigitValue(ch) radix64DigitsInv[(byte)(ch)]
|
||||||
|
|
||||||
|
/* Returns the number of chars needed to encode the given number of bits */
#define EncodedLength(fmt, numBits) \
    (((numBits) + (fmt)->bitsPerDigit - 1) / (fmt)->bitsPerDigit)
#define EncodeDigit(fmt, value) ((fmt)->digits[value])
#define DecodeDigit(fmt, digit) ((fmt)->digitsInv[(byte)(digit)])

/* Fully parenthesized so the macro is safe inside larger expressions */
#define AdvanceCRC(poly, crc, b) \
    (((crc) >> 8) ^ (poly)->table[((crc) ^ (b)) & 0xFF])

#define RunningCRCFromPageCRC(fmt, pageCRC) \
    (((pageCRC) >> (fmt)->runningCRCShift) & (fmt)->runningCRCMask)
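RunningCRCFromPageCRC keeps only the top bits of the page CRC accumulated so far, so each data line can carry a short check on everything before it. A tiny sketch of the arithmetic follows, using the shift and mask values from the two EncodeFormat tables in util.c; `RunningCRC` is a hypothetical function standing in for the macro.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the running-CRC field from a 32-bit page CRC */
static uint32_t RunningCRC(uint32_t pageCRC, int shift, uint32_t mask)
{
    assert(shift >= 0 && shift < 32);
    return (pageCRC >> shift) & mask;
}
```

For a page CRC of 0xABCDEF12, the radix-64 settings (shift 20, mask 0xFFF) give the 12-bit value 0xABC, and the hex settings (shift 24, mask 0xFF) give the 8-bit value 0xAB.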
|
||||||
|
|
||||||
|
|
||||||
|
CRC CalculateCRC(CRCPoly const *poly, CRC crc,
|
||||||
|
byte const *buffer, size_t length);
|
||||||
|
CRC ReverseCRC(CRCPoly const *poly, CRC crc, byte b);
|
||||||
|
|
||||||
|
/* Returns the number of chars encoded */
|
||||||
|
int EncodeCheckDigits(EncodeFormat const *fmt, word32 num,
|
||||||
|
int numBits, char *dest);
|
||||||
|
|
||||||
|
/* Returns 1 if there's an error */
|
||||||
|
int DecodeCheckDigits(EncodeFormat const *fmt, char const *src, char **endPtr,
|
||||||
|
int numBits, word32 *valuePtr);
|
||||||
|
|
||||||
|
EncodeFormat const *FindFormat(char headerTypeChar);
|
||||||
|
|
||||||
|
void InitUtil();
|
||||||
|
|
||||||
|
|
||||||
|
#endif /* !UTIL_H */
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Local Variables:
|
||||||
|
* tab-width: 4
|
||||||
|
* End:
|
||||||
|
* vi: ts=4 sw=4
|
||||||
|
* vim: si
|
||||||
|
*/
|
286
tools/yapp
Normal file
@ -0,0 +1,286 @@
|
|||||||
|
#!/usr/bin/perl
|
||||||
|
#
|
||||||
|
# Yet another preprocessor
|
||||||
|
#
|
||||||
|
# $Id: yapp,v 1.5 1997/10/24 07:51:05 mhw Exp $
|
||||||
|
#
|
||||||
|
|
||||||
|
%vars = ('' => '$');
|
||||||
|
@incPath = (".");
|
||||||
|
|
||||||
|
sub Error
|
||||||
|
{
|
||||||
|
print STDERR $_[0], "\n";
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
|
||||||
|
sub VarSubst
|
||||||
|
{
|
||||||
|
my ($varName, $undefOkay) = @_;
|
||||||
|
|
||||||
|
if (defined($vars{$varName}))
|
||||||
|
{
|
||||||
|
return $vars{$varName};
|
||||||
|
}
|
||||||
|
elsif (!$undefOkay)
|
||||||
|
{
|
||||||
|
&Error("Undefined variable '$varName' in $fileName line $.");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
sub NullFilter
|
||||||
|
{
|
||||||
|
0;
|
||||||
|
}
|
||||||
|
|
||||||
|
sub IfFilter
|
||||||
|
{
|
||||||
|
local $_ = $_[0];
|
||||||
|
|
||||||
|
if (/^##else(\s+.*)?/)
|
||||||
|
{
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
elsif (/^##endif(\s+.*)?/)
|
||||||
|
{
|
||||||
|
return 2;
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
sub DoFile
|
||||||
|
{
|
||||||
|
local $fileName = $_[0];
|
||||||
|
my $path;
|
||||||
|
local *FILE;
|
||||||
|
|
||||||
|
if ($fileName =~ m|^/|)
|
||||||
|
{
|
||||||
|
$path = $fileName;
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
for $dir (@incPath)
|
||||||
|
{
|
||||||
|
if (-e "$dir/$fileName")
|
||||||
|
{
|
||||||
|
$path = "$dir/$fileName";
|
||||||
|
last;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if ($path eq "")
|
||||||
|
{
|
||||||
|
&Error("Can't find '$fileName', from $fileName line $.");
|
||||||
|
}
|
||||||
|
|
||||||
|
open(FILE, "<$path") || &Error("Can't open $path: $!");
|
||||||
|
&DoOpenFile(*FILE, *NullFilter, 0);
|
||||||
|
close(FILE) || die;
|
||||||
|
0;
|
||||||
|
}
|
||||||
|
|
||||||
|
sub DoPrepass
|
||||||
|
{
|
||||||
|
local ($_, $skipFlag) = @_;
|
||||||
|
|
||||||
|
return "" if /^###/;
|
||||||
|
s/\s*###.*//; # Strip comments
|
||||||
|
s/\$\{(\w*)\}/&VarSubst($1, $skipFlag)/eg; # Do variable substitutions (\w* so that ${} yields "$")
|
||||||
|
$_;
|
||||||
|
}
|
||||||
|
|
||||||
|
sub DoOpenFile
|
||||||
|
{
|
||||||
|
local *FILE = $_[0];
|
||||||
|
local *filter = $_[1];
|
||||||
|
my $skipFlag = $_[2];
|
||||||
|
my $result;
|
||||||
|
local $_;
|
||||||
|
|
||||||
|
while (<FILE>)
|
||||||
|
{
|
||||||
|
$_ = &DoPrepass($_, $skipFlag);
|
||||||
|
if ($result = &filter($_))
|
||||||
|
{
|
||||||
|
return $result;
|
||||||
|
}
|
||||||
|
elsif (/^##(\w*)(\s+(.*))?/)
|
||||||
|
{
|
||||||
|
my ($cmd, $params) = ($1, $3);
|
||||||
|
|
||||||
|
if ($cmd =~ /^if/)
|
||||||
|
{
|
||||||
|
my $condition;
|
||||||
|
my $ifStartLine = $.;
|
||||||
|
|
||||||
|
if ($cmd eq "if")
|
||||||
|
{
|
||||||
|
if ($params =~ /^(\d+)\s*$/)
|
||||||
|
{
|
||||||
|
$condition = int($1);
|
||||||
|
}
|
||||||
|
elsif ($params =~ /^(\d+)\s*([=!]=|[<>]=?)\s*(\d+)\s*$/)
|
||||||
|
{
|
||||||
|
my ($left, $op, $right) = ($1, $2, $3);
|
||||||
|
|
||||||
|
$condition = eval($left . $op . $right);
|
||||||
|
}
|
||||||
|
elsif ($params =~ /^(\S+)\s*(eq|ne)\s*(\S+)\s*$/)
|
||||||
|
{
|
||||||
|
my ($left, $op, $right) = ($1, $2, $3);
|
||||||
|
|
||||||
|
$left =~ s/([\\'])/\\$1/g;
|
||||||
|
$right =~ s/([\\'])/\\$1/g;
|
||||||
|
$condition = eval("'$left' $op '$right'");
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
&Error("Invalid ##if params: '$params' " .
|
||||||
|
"in $fileName line $.");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
elsif ($cmd =~ /^ifn?def$/)
|
||||||
|
{
|
||||||
|
if ($params =~ /^(\w+)\s*$/)
|
||||||
|
{
|
||||||
|
$condition = defined($vars{$1});
|
||||||
|
$condition = !$condition if ($cmd eq "ifndef");
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
&Error("Invalid ##$cmd param: '$params' " .
|
||||||
|
"in $fileName line $.");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
# Do main body of if
|
||||||
|
$result = &DoOpenFile(*FILE, *IfFilter,
|
||||||
|
$skipFlag || !$condition);
|
||||||
|
|
||||||
|
if ($result == 1) # an '##else' was found
|
||||||
|
{
|
||||||
|
# Handle else
|
||||||
|
$result = &DoOpenFile(*FILE, *IfFilter,
|
||||||
|
$skipFlag || $condition);
|
||||||
|
}
|
||||||
|
|
||||||
|
if ($result == 1) # a second '##else' was found
|
||||||
|
{
|
||||||
|
&Error("Two ##else's in a row in $fileName line $.");
|
||||||
|
}
|
||||||
|
elsif ($result == 0) # EOF was encountered
|
||||||
|
{
|
||||||
|
&Error("Unterminated ##if " .
|
||||||
|
"in $fileName line $ifStartLine");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
elsif ($cmd eq "include")
|
||||||
|
{
|
||||||
|
if ($skipFlag)
|
||||||
|
{
|
||||||
|
}
|
||||||
|
elsif ($params =~ /^"(.*)"\s*$/)
|
||||||
|
{
|
||||||
|
my $incFile = $1;
|
||||||
|
|
||||||
|
&DoFile($incFile);
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
&Error("Invalid ##include params: '$params'");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
elsif ($cmd eq "set")
|
||||||
|
{
|
||||||
|
if ($params =~ /^(\w+)=<<(")(.*)"\s*$/ or
|
||||||
|
$params =~ /^(\w+)=<<(')(.*)'\s*$/)
|
||||||
|
{
|
||||||
|
my $varName = $1;
|
||||||
|
my $quoteChar = $2;
|
||||||
|
my $endTag = $3 . "\n";
|
||||||
|
my $value;
|
||||||
|
|
||||||
|
while (<FILE>)
|
||||||
|
{
|
||||||
|
if ($_ eq $endTag)
|
||||||
|
{
|
||||||
|
chop $value;
|
||||||
|
last;
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
if ($quoteChar eq '"')
|
||||||
|
{
|
||||||
|
$_ = &DoPrepass($_, $skipFlag);
|
||||||
|
}
|
||||||
|
$value .= $_;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if (!$skipFlag)
|
||||||
|
{
|
||||||
|
$vars{$varName} = $value;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
elsif ($params =~ /^(\w+)="(.*)"\s*$/ or
|
||||||
|
$params =~ /^(\w+)=(\S*)\s*$/)
|
||||||
|
{
|
||||||
|
if (!$skipFlag)
|
||||||
|
{
|
||||||
|
$vars{$1} = $2;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
&Error("Invalid ##set command: '$params'");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
&Error("Unrecognized command: '$_'");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
elsif (!$skipFlag)
|
||||||
|
{
|
||||||
|
print;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
$optEnable = 1;
|
||||||
|
|
||||||
|
foreach (@ARGV)
|
||||||
|
{
|
||||||
|
if ($optEnable and /^-/)
|
||||||
|
{
|
||||||
|
if (/^--$/)
|
||||||
|
{
|
||||||
|
$optEnable = 0;
|
||||||
|
}
|
||||||
|
elsif (/^-D(\w+)=(.*)$/)
|
||||||
|
{
|
||||||
|
$vars{$1} = $2;
|
||||||
|
}
|
||||||
|
elsif (/^-I(.*)$/)
|
||||||
|
{
|
||||||
|
unshift @incPath, $1;
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
&Error("Unrecognized option: '$_'");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
&DoFile($_);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#
|
||||||
|
# vi: ai ts=4
|
||||||
|
# vim: si
|
||||||
|
#
|
48
tools/yapp.doc
Normal file
@ -0,0 +1,48 @@
|
|||||||
|
YAPP is a simple macro preprocessor designed to do minor tweaking to
another program's inputs.

In its input, anything of the form ${foo} is expanded with the variable
named foo.  It is an error if ${foo} is not defined.
If you need to escape a dollar sign for some reason, the variable
with the empty-string name, ${}, has the value "$".

The result of macro expansion is *not* re-expanded.  Expansion is done only
when definitions are made.

After variable expansion, lines are checked to see if they are control lines.
Control lines begin with ## (after optional leading whitespace).  All such
lines are deleted and do not appear in the output.  ### is a comment.  Other
options are:

##set variable=value

    value may have one of the following forms:
    token: Trailing whitespace is stripped.  The token may not contain
        any whitespace.  Use quotes if it's complicated.
    "string": The string may have embedded quotes, and whitespace after
        the closing quote.
    <<"DELIM": This is a here-document, and the value is all of the following
        lines up until, but not including, the newline that precedes a line
        that consists solely of DELIM, for any DELIM string.
        The DELIM must be in quotes.  You have two options:
        "DELIM": Expand macros in the body of the here-document.
        'DELIM': Do not expand macros in the here-document.

##include "filename": Insert the named file in place of the current line.

##if num == num
##if num != num
##if num < num
##if num > num
##if num <= num
##if num >= num
##if token eq token
##if token ne token
##ifdef symbol
##ifndef symbol
##else
##endif
    You can figure this one out.  Macros in between are expanded as usual
    (so the ##else or ##endif may be in a macro expansion), but the result
    is ignored.  String comparison is allowed only between simple words.
    ##ifdef symbol is true if ${symbol} is defined.
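The expansion rule described at the top of this file (a single pass, substituted text never rescanned, ${} standing for a literal dollar sign) can be sketched in C as follows. This is an illustration of the rule only, not how yapp, which is a Perl script, is implemented. `Var`, `Lookup` and `Expand` are made-up names, and unlike yapp the sketch accepts any characters up to the closing brace as a variable name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { char const *name; char const *value; } Var;

/* Returns the value for name[0..nameLen), or NULL if undefined */
static char const *Lookup(Var const *vars, size_t nVars,
                          char const *name, size_t nameLen)
{
    size_t i;

    if (nameLen == 0)
        return "$";             /* ${} is the escaped dollar sign */
    for (i = 0; i < nVars; i++)
        if (strlen(vars[i].name) == nameLen &&
            memcmp(vars[i].name, name, nameLen) == 0)
            return vars[i].value;
    return NULL;
}

/* Returns 0 on success, 1 on an undefined variable or if dst is too small */
static int Expand(char const *src, Var const *vars, size_t nVars,
                  char *dst, size_t dstSize)
{
    size_t n = 0;

    assert(dstSize > 0);
    while (*src != '\0') {
        char const *end;

        if (src[0] == '$' && src[1] == '{' &&
            (end = strchr(src + 2, '}')) != NULL) {
            char const *val = Lookup(vars, nVars, src + 2,
                                     (size_t)(end - (src + 2)));
            size_t len;

            if (val == NULL)
                return 1;       /* undefined variable is an error */
            len = strlen(val);
            if (n + len >= dstSize)
                return 1;
            memcpy(dst + n, val, len);  /* copied verbatim, not rescanned */
            n += len;
            src = end + 1;
        } else {
            if (n + 1 >= dstSize)
                return 1;
            dst[n++] = *src++;
        }
    }
    dst[n] = '\0';
    return 0;
}
```

With `foo` defined as `bar`, the input `a${}b ${foo}` expands to `a$b bar`. A variable whose value is itself the text `${foo}` expands to exactly that text, since the output is not scanned again.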
|