initial commit
commit 60052b2f16
32
MANIFEST
Normal file
@ -0,0 +1,32 @@
1 test-file
2 MANIFEST
D books/
D books/tools/
3 bootstrap
4 bootstrap2
5 sortpages
6 Makefile
7 heap.c
8 heap.h
9 mempool.c
10 mempool.h
11 util.c
12 util.h
13 repair.c
14 subst.c
15 subst.h
16 unmunge.c
17 munge.c
18 yapp.doc
19 yapp
20 psgen
21 makemanifest
D books/ps/
22 prolog.ps
23 charmap.ps
D books/example/
24 Makefile
25 .cvsignore
26 filelist
27 footer.ps
28 us-constitution.gz
477
README
Normal file
@ -0,0 +1,477 @@
PREFACE
-------

This book grew out of a project to publish source code for cryptographic
software, namely PGP (Pretty Good Privacy), a software package for the
encryption of electronic mail and computer files. PGP is the most widely
used software in the world for email encryption. Pretty Good Privacy, Inc.
(or "PGP") has published the source code of PGP for peer review, a
long-standing tradition in the history of PGP. The first time a fully
implemented cryptographic software package was published in its entirety in
book form was "PGP Source Code and Internals," by Philip Zimmermann,
published by The MIT Press, 1995, ISBN 0-262-24039-4.

Peer review of the source code is important to get users to trust the
software, since any weaknesses can be detected by knowledgeable experts who
make the effort to review the code. But peer review cannot be completely
effective unless the experts conducting the review can compile and test the
software, and verify that it is the same as the software products that are
published electronically. To facilitate that, PGP publishes its source code
in printed form that can be scanned into a computer via OCR (optical
character recognition) technology.

Why not publish the source code in electronic form? As you may know,
cryptographic software is subject to U.S. export control laws and
regulations. The new 1997 Commerce Department Export Administration
Regulations (EAR) explicitly provide that "A printed book or other printed
material setting forth encryption source code is not itself subject to the
EAR." (see 15 C.F.R. §734.3(b)(2)). PGP, in an overabundance of caution,
has made its source code available only in a form that is not subject to
those regulations. So, books containing cryptographic source code may be
published, and after they are published they may be exported, but only
while they are still in printed form.

Electronic commerce on the Internet cannot be fully successful without
strong cryptography. Cryptography is important for protecting our privacy,
civil liberties, and the security of our personal and business transactions
in the information age. The widespread deployment of strong cryptography
can help us regain some of the privacy and security that we have lost due
to information technology. Further, strong cryptography (in the form of
PGP) has already proven itself to be a valuable tool for the protection of
human rights in oppressive countries around the world, by keeping those
governments from reading the communications of human rights workers.

This book of tools contains no cryptographic software of any kind, nor does
it call, connect to, or integrate in any way with cryptographic software.
But it does contain tools that make it easy to publish source code in book
form, and to scan such source code back in with OCR software rapidly and
accurately.

Philip Zimmermann
prz@acm.org

November 1997



INTRODUCTION
------------

This book contains tools for printing computer source code on paper in
human-readable form and reconstructing it exactly using automated tools.
While standard OCR software can recover most of the graphic characters,
non-printing characters like tabs, spaces, newlines and form feeds cause
problems.

In fact, these tools can print any ASCII text file; it's just that the
attention these tools pay to spacing is particularly valuable for computer
source code. The two-dimensional indentation structure of source code is
very important to its comprehensibility. In some cases, distinctions
between non-printing characters are critical: the standard make utility
will not accept spaces where it expects to see a tab character.

Producing a byte-for-byte identical copy of the original is also valuable
for authentication, since you can then verify the copy against a checksum
of the original.

There are five problems we have addressed:

1. Getting good OCR accuracy.
2. Preserving whitespace.
3. Preserving lines longer than can be printed on the page.
4. Dealing with data that isn't human-readable.
5. Detecting and correcting any residual errors.

The first problem is partly addressed by using a font designed for OCR
purposes, OCR-B. OCR-A is a very ugly font that contains only the digits 0
through 9 and a few special punctuation symbols. OCR-B is a very readable
monospaced font that contains a full ASCII set, and has been popular as a
font on line printers for years because it distinguishes ambiguous
characters and is clear even if fuzzy or distorted.

The most unusual thing about the OCR-B font is the way that it prints a
lower-case letter l, with a small hook on the bottom, something like an
upper-case L. This is to distinguish it from the numeral 1. We also made
some modifications to the font, to print the numeral 0 with a slash, and
to print the vertical bar in a broken form. Both of these are such common
variants that they should not present any intelligibility barrier. Finally,
we print the underscore character in a distinct manner that is hopefully
not visually distracting, but is clearly distinguishable from the minus
sign even in the absence of a baseline reference.

The most significant part of getting good OCR accuracy is, however, using
the OCR tools well. We've done a lot of testing and experimentation and
present here a lot of information on what works and what doesn't.

To preserve whitespace, we added some special symbols to display spaces,
tabs, and form feeds. A space is printed as a small triangular dot
character, while a hollow rightward-pointing triangle (followed by blank
spaces up to the next tab stop) signifies a tab. A form feed is printed as
a yen symbol, and the printed line is broken after the form feed.

Making the dot triangular instead of square helps distinguish it from a
period. To reduce the clutter on the page and make the text more readable,
the space character is only printed as a small dot if it follows a blank
on the page (a tab or another space), or comes immediately before the end
of the line. Thus, the reader (human or software) must be able to
distinguish one space from no spaces, but can find multiple spaces by
counting the dots (and adding one).

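The whitespace rules above can be sketched in C. This is a minimal illustration, not the actual munge.c. It makes several assumptions:

- The printed glyphs are written as the Latin-1 byte values they are trained to OCR as: octal 244 for the tab triangle, 267 for the space dot and 245 for the yen.
- The tab width is passed in as a parameter.
- Checksums and long-line breaking are ignored.

The name visible_ws is hypothetical.

```c
#include <stddef.h>

/* Latin-1 codes (octal) that the printed glyphs are trained to OCR as. */
#define TAB_MARK  0244  /* hollow right-pointing triangle */
#define FF_MARK   0245  /* yen sign */
#define SPACE_DOT 0267  /* small triangular dot */

/* Hypothetical sketch of the printing rules, not the real munge.c.
   Encodes one input line (without its newline) into out and returns the
   number of bytes written. out must hold len * tabwidth bytes (tabwidth >= 2). */
size_t visible_ws(const char *in, size_t len, int tabwidth, unsigned char *out)
{
    size_t o = 0;
    int col = 0;

    for (size_t i = 0; i < len; i++) {
        char c = in[i];
        if (c == '\t') {
            /* Tab: the mark, then blanks up to the next tab stop. */
            int stop = (col / tabwidth + 1) * tabwidth;
            out[o++] = TAB_MARK;
            for (col++; col < stop; col++)
                out[o++] = ' ';
        } else if (c == ' ') {
            /* Dot only after a blank (space or tab) or just before end of line. */
            int after_blank = i > 0 && (in[i - 1] == ' ' || in[i - 1] == '\t');
            int at_eol = (i + 1 == len);
            out[o++] = (after_blank || at_eol) ? SPACE_DOT : ' ';
            col++;
        } else if (c == '\f') {
            /* Form feed: the yen mark, and the printed line is broken here. */
            out[o++] = FF_MARK;
            out[o++] = '\n';
            col = 0;
        } else {
            out[o++] = (unsigned char)c;
            col++;
        }
    }
    return o;
}
```

For example, "a\tb" with 8-column tabs comes out as a, the tab mark, six spaces and b. Only the mark carries information; the padding can always be recomputed.
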
The format is designed so that 80 characters (the still-common punched-card
line length), plus checksums, fit on one line of an 8.5x11" (or A4) page.
Longer lines are handled with the simple technique of appending a big ugly
black blob to the first part of the line, indicating that the next printed
line should be concatenated with the current one with no intervening
newline. Hopefully, it is needed only rarely.

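Here is a sketch of the long-line rule. It assumes the line has already been converted to its printed form, one byte per column. As the training notes describe, a line longer than 80 columns is broken after 79 columns and marked with the black block, which is trained to OCR as a pilcrow (Latin-1 octal 266). The name emit_wrapped is hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define PILCROW   0266  /* the "big ugly black block", OCRed as a pilcrow */
#define MAX_COLS  80    /* longest line printed without a break */
#define BREAK_COL 79    /* columns kept before the continuation mark */

/* Hypothetical sketch: write one printed line to fp, breaking it after 79
   columns (and appending a pilcrow) whenever more than 80 columns remain.
   Returns the number of physical lines written. */
int emit_wrapped(FILE *fp, const char *line)
{
    size_t len = strlen(line);
    int nlines = 1;

    while (len > MAX_COLS) {
        fwrite(line, 1, BREAK_COL, fp);
        fputc(PILCROW, fp);
        fputc('\n', fp);
        line += BREAK_COL;
        len -= BREAK_COL;
        nlines++;
    }
    fwrite(line, 1, len, fp);
    fputc('\n', fp);
    return nlines;
}
```

A 100-column line becomes two printed lines: 79 columns plus the block, then the remaining 21 columns.
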
While ASCII text is by far the most common form, some source code is not
human-readable in the usual way. It may be an audio clip, a graphic image
bitmap, or something else that is manipulated with a specialized editing
tool. For printing purposes, these tools just print any such files as a
long string of gibberish in a 64-character set designed to be easy to OCR
unambiguously. The tools recognize such binary data and apply extra
consistency checks to it, but that can be considered a separate step.

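The radix-64 idea can be sketched as a plain encoder that turns each 3 bytes into 4 characters. This text does not give the OCR-friendly 64-character alphabet that munge uses, so the sketch takes the alphabet as a parameter. The test uses the standard base64 alphabet purely as a stand-in. Writing no padding for a trailing partial group is also an assumption. The name radix64_encode is hypothetical.

```c
#include <stddef.h>

/* Hypothetical sketch: encode len bytes using a caller-supplied 64-character
   alphabet (the book's real alphabet is not assumed here). Writes a NUL-
   terminated string to out, which needs 4 * ((len + 2) / 3) + 1 bytes, and
   returns its length. A trailing 1- or 2-byte group yields 2 or 3 characters
   with no padding. */
size_t radix64_encode(const unsigned char *in, size_t len,
                      const char alphabet[64], char *out)
{
    size_t o = 0, i = 0;

    for (; i + 3 <= len; i += 3) {
        unsigned long v = ((unsigned long)in[i] << 16)
                        | ((unsigned long)in[i + 1] << 8)
                        | (unsigned long)in[i + 2];
        out[o++] = alphabet[(v >> 18) & 63];
        out[o++] = alphabet[(v >> 12) & 63];
        out[o++] = alphabet[(v >> 6) & 63];
        out[o++] = alphabet[v & 63];
    }
    if (len - i > 0) {                       /* 1 or 2 leftover bytes */
        unsigned long v = (unsigned long)in[i] << 16;
        if (len - i == 2)
            v |= (unsigned long)in[i + 1] << 8;
        out[o++] = alphabet[(v >> 18) & 63];
        out[o++] = alphabet[(v >> 12) & 63];
        if (len - i == 2)
            out[o++] = alphabet[(v >> 6) & 63];
    }
    out[o] = '\0';
    return o;
}
```
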
Finally, there is the problem of residual errors. OCR software is not
perfect; it uses a variety of heuristics and spelling-check dictionaries to
clean up any residual errors in human-language text. This isn't reliable
enough for source code, so we have added per-page and per-line checksums to
the printed material, and a series of tools that use those checksums to
correct any remaining errors and convert the scanned text back into a
series of files.

This "munged" form is what you see in most of the body of this book. We
think it does a good job of presenting source code in a way that can be read
easily by both humans and computers.

The tools are command-line oriented and a bit clunky. This has a purpose
beyond laziness on the authors' parts: it keeps them small. Keeping them
small makes the "bootstrapping" part of scanning this book easier, since at
that stage you don't yet have the tools to help you.



SCANNING
--------

Our tests were done with OmniPage 7.0 on a Power Macintosh 8500/120 and an
HP ScanJet 4c scanner with an automatic document feeder. The first part of
this chapter is heavily OmniPage-specific, as that appears to be the most
widely available OCR software.

The tools here were developed under Linux, and should be generally portable
to any Unix platform. Since this book is about printing and scanning source
code, we assume the readers have enough programming background to know how
to build a program from a Makefile, understand the hazards of CR, LF or CRLF
line endings, and such minor details without explicit mention.

The first step to getting OmniPage 7 to work well is to set it up with
options to disable all of its more advanced features for preserving font
changes and formatting. Look in the Settings menu.

· Create a Zone Contents File with all of ASCII in it, plus the extra
bullet, currency, yen and pilcrow symbols. Name it "Source Code".
· Create a Source Code style set. Within it, create a Source Code zone style
and make it the default.
· Set the font to something fixed-width, like Courier.
· Set a fixed font size (10 point) and plain text, left-aligned.
· Set the tab character to a space.
· Set the text flow to hard line returns.
· Set the margins to their widest.
· The font mapping options are irrelevant.

Go to the Settings panel and:

· Under Scanner, set the brightness to manual. With careful setting of the
threshold, this generates much better results than either the automatic
threshold or the 3D OCR. Around 144 has been a good setting for us; you
may want to start there.
· Under OCR, turn off automatic page orientation and select your Source
Code style set in the Output Options (you'll build a training file here
later). Also set a reasonable reject character. (For testing, we used the
pi symbol, which came across from the Macintosh as a weird sequence, but
you can use anything as long as you make the appropriate definition in
subst.c.)

Do an initial scan of a few pages and create a manual zone encompassing
all of the text. Leave some margin for page misalignment, and leave space
on the sides for the left-right shift caused by the book binding being in
different places on odd and even pages.

Set the Zone Contents and the Style set to the Source Code settings. After
setting the Style Set, the Zone Style should be automatically set correctly
(since you set Source Code as the default).

Then save the Zone Template. You can now select it from the pop-up menu
under the Zone step on the main toolbar.

Now we're ready to get characters recognized. The first results will be
terrible, with lots of red (unrecognizable) and green (suspicious) text in
the recognized window. Some tweaking will improve this enormously.

The first step is setting a good black threshold. Auto brightness sets the
threshold too low, making the character outlines bleed and picking up a lot
of glitches on mostly-blank pages. Try training OCR on the few pages you've
scanned and look at the representative characters. Adjust the threshold so
the strokes are clear and distinct, neither so thin they are broken nor so
thick they smear into each other. The character that bleeds worst is
lowercase w, while the underscore and tab symbols have the thinnest lines
to worry about.

You'll have to re-scan (you can just click the AUTO button) until you get
satisfactory results.

The next step is training. You should scan a significant number of pages
and teach OmniPage about any characters it has difficulty with. Several
characters have been printed in unusual ways, and you must teach OmniPage
about them before it can recognize them reliably. We also use some unique
characters, which the tools expect to be mapped to specific Latin-1
characters before they can be processed.

The characters most in need of training are as follows:

· Zero is printed 'slashed.'
· Lowercase L has a curled tail to distinguish it clearly from other
vertical characters like 1 and I.
· The or-bar or pipe symbol '|' is printed "broken" with a gap in the
middle to distinguish it similarly.
· The underscore character has little "serifs" on the ends to distinguish
it from a minus sign. We also raised it just a tad higher than the
normal underscore character, which was too low in the character cell to
be reliably seen by OmniPage.
· Tabs are printed as a hollow right-pointing triangle, followed by blanks
to the correct alignment position. If not trained enough, OmniPage
guesses this is a capital D. You should train OmniPage to recognize this
symbol as a currency symbol (Latin-1 octal 244).
· Any space in the original that follows a blank on the printed page (a
space or a tab), or that ends the line, is printed as a tiny black
triangle. You should train OmniPage to recognize this as a center dot or
bullet (Latin-1 octal 267). We didn't use a standard center dot because
OmniPage confused it with a period.
· Any form feeds in the original are printed as a yen currency symbol
(Latin-1 octal 245).
· Lines over 80 columns long are broken after 79 columns by appending a big
ugly black block. You should train OmniPage to recognize this as a
pilcrow (paragraph symbol, Latin-1 octal 266). We print a block rather than
a real pilcrow because, after deciding that something black and visible
was suitable, we found out that the font we used doesn't have a pilcrow.

The zero and the tab character, because of their frequency, deserve special
attention.

In addition, look for any unrecognized characters (in red) and retrain those
pages. If you get an unrecognized character, that character needs training,
but Caere says that "good examples" are best to train on, so if the training
doesn't recognize a slightly fuzzy K, and there's a nice crisp K available
to train on, use that.

Other things that need training:

· ~ (tilde), ^ (caret), ` (backquote) and ' (quote). These get dropped
frequently unless you train them.
· i, j and ; (semicolon). These get mixed up.
· 3 and S. These also get mixed up.
· Q can fail to be recognized.
· C and [ can be confused.
· c/C, o/O, p/P, s/S, u/U, v/V, w/W, y/Y and z/Z are often confused. This
can be helped by some training.
· r gets confused with c and n. I don't understand why c, but it happens.
· f gets confused with i.

The OCR training pages have lots of useful examples of troublesome
characters. Scan a few pages of material, training each page, then scan a
few dozen pages and look for recognition problems. Look for what OmniPage
reports as troublesome, and when you have the repair program working, use
it to find and report further errors. Train a few pages particularly dense
in problems and append the troublesome characters to the training file, then
re-recognize the lot.

Double-check your training file for case errors. It's easy to miss the shift
key in the middle of a lot of training, and doing so will produce terrible
results even though OmniPage won't report anything amiss. We spent a while
wondering why OmniPage wasn't recognizing capital S or capital W, only to
find that OmniPage was just doing what it had been trained to do.

We have heard some reports that OmniPage has problems with large training
files. We have sometimes observed OmniPage suffering repeatable internal
errors after massive training additions, but they were cured by deleting
a few training images. Appending more training images to the training file
did not cause the problem to reappear.

Repairing the OCR results

If the only copy of the tools you have is printed in this book, see the next
chapter on bootstrapping at this point. Here, we assume that you have the
tools and they work.

When you have some reasonable OCR results, delete any directory pages. With
no checksum information, they just confuse the postprocessing tools. (The
tools will just stop with an error when they get to the "uncorrectable"
directory name and you'll have to delete it then, so it's not fatal if you
forget.) Copy the data to a machine that has the repair and unmunge
utilities.

The repair utility attempts automatic table-driven correction of common
scanning errors. You have to recompile it to change the tables, but you are
encouraged to do so if you find a common problem that it does not correct
reliably. If it gets stuck, it will deposit you into your favorite editor on
or slightly after the offending line. (The file you will be editing is the
unprocessed portion of the input.) After you correct the problem and quit
the editor, repair will resume.

"Your favorite editor" is taken from the $VISUAL and $EDITOR environment
variables, or the -e option to repair.

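Here is a sketch of how an editor might be chosen and invoked. Several parts are assumptions based on common Unix conventions, not repair's documented behavior:

- The precedence: -e first, then $VISUAL, then $EDITOR.
- The fallback to vi.
- The +LINE argument used to position the cursor.

choose_editor and edit_at_line are hypothetical names. The nonzero-status check matches the rule, given below, that repair stops when the editor fails.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: -e wins, then $VISUAL, then $EDITOR, then "vi".
   This precedence and fallback are assumptions, not repair's documented order. */
const char *choose_editor(const char *opt_e)
{
    const char *ed;

    if (opt_e != NULL && *opt_e != '\0')
        return opt_e;
    if ((ed = getenv("VISUAL")) != NULL && *ed != '\0')
        return ed;
    if ((ed = getenv("EDITOR")) != NULL && *ed != '\0')
        return ed;
    return "vi";
}

/* Opens file at line using the common "+LINE" convention (an assumption;
   vi, vim, nano and Emacs accept it). Returns nonzero if the command could
   not be built or the editor exited with a failure status (e.g. :cq in vim).
   The single quotes are naive: a file name containing ' would break them. */
int edit_at_line(const char *editor, const char *file, long line)
{
    char cmd[1024];
    int n = snprintf(cmd, sizeof cmd, "%s +%ld '%s'", editor, line, file);

    if (n < 0 || (size_t)n >= sizeof cmd)
        return -1;
    return system(cmd) != 0;
}
```

A caller would run edit_at_line(choose_editor(opt_e), "file.in", lineno) and stop if it returns nonzero.
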
The repair utility never alters the original input file. For an input file
named file, it writes corrected output to file.out, and when it has to stop,
it writes any remaining uncorrected input back out to file.in (via a
temporary file.dump) and lets you edit that. If you re-run repair on file
and file.in exists, repair will restart from there, so you may safely quit
and re-run repair as often as you like. (But if you change the input file,
you need to delete the .in file for repair to notice the change.)

Statistics on repair's work are printed to file.log. This is an excellent
place to look to see if any characters require more training.

As it works, repair prints the line it is working on. If you see it make a
mistake or get stuck, you can interrupt it (control-C or whatever is
appropriate), and it will immediately drop into the editor. If you interrupt
it a second time, it will exit rather than invoking the editor. If the
editor returns a non-zero result code (fails), repair will also stop. (For
example, :cq in vim.)

One thing that repair fixes without the least trouble is the number of
spaces expected after a printing tab character. It's such an omnipresent OCR
software error that repair doesn't even log it as a correction.

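The tab-spacing fix is easy because, under the printing rules, every plain space after a tab mark is padding. A real space in that position would have been printed as a dot. This minimal sketch assumes the scanned line has already had its checksums stripped, uses Latin-1 octal 244 for the tab mark, and takes one byte per printed column. fix_tab_spacing is a hypothetical name, not a function from repair.c.

```c
#include <stddef.h>

#define TAB_MARK 0244  /* Latin-1 code the tab triangle is trained as */

/* Hypothetical sketch: rebuild the padding after every tab mark in a scanned,
   checksum-stripped line, whatever number of spaces the OCR produced.
   Assumes one byte per printed column. out needs len * tabwidth bytes. */
size_t fix_tab_spacing(const unsigned char *in, size_t len, int tabwidth,
                       unsigned char *out)
{
    size_t o = 0;
    int col = 0;

    for (size_t i = 0; i < len; i++) {
        if (in[i] == TAB_MARK) {
            int stop = (col / tabwidth + 1) * tabwidth;
            out[o++] = TAB_MARK;
            for (col++; col < stop; col++)
                out[o++] = ' ';
            /* Plain spaces after a tab mark are always padding: a real space
               there would have been printed as a dot. Drop the OCR's count. */
            while (i + 1 < len && in[i + 1] == ' ')
                i++;
        } else {
            out[o++] = in[i];
            col++;
        }
    }
    return o;
}
```

Whether the OCR produced one space or ten after the mark, the output has exactly the padding needed to reach the next tab stop.
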
In some cases, repair can miscorrect a line and go on to the next line,
possibly even more than once, finally giving up a few lines below the actual
error. If you are having trouble spotting the error, one helpful trick is to
exit the editor and let repair try to fix the page again, but interrupt it
while it is still working on the first line, before it has made the
miscorrection.

The Nasty Lines

Some lines of code, particularly those containing long runs of underscore or
minus characters, are particularly difficult to scan reliably. The repair
program has a special "nasty lines" feature to deal with this. If a file
named "nastylines" (or the file specified by the -l option) exists, its
lines are checksummed and treated as total replacements for any input line
with the same checksum. So, for example, if you place a blank line in the
nastylines file, any scanner noise on blank lines will be ignored.

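Here is a sketch of the nasty-lines lookup. It assumes each scanned line carries a printed checksum that can be compared directly with a checksum computed over each nastylines entry. The bitwise CRC-32 used here is only a stand-in, since this chapter does not specify the tools' line checksum. nasty_load and nasty_replace are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in line checksum (bitwise CRC-32); the tools' real one may differ. */
static uint32_t line_sum(const char *s)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (; *s; s++) {
        crc ^= (unsigned char)*s;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
    }
    return ~crc;
}

struct nasty { uint32_t sum; const char *text; };

/* Hypothetical: checksum each line read from the nastylines file. */
void nasty_load(struct nasty *tab, const char *const *lines, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        tab[i].sum = line_sum(lines[i]);
        tab[i].text = lines[i];
    }
}

/* If the checksum printed beside a scanned line matches a nasty line, that
   line totally replaces the (possibly noisy) scanned text. */
const char *nasty_replace(const struct nasty *tab, size_t n,
                          uint32_t printed_sum, const char *scanned)
{
    for (size_t i = 0; i < n; i++)
        if (tab[i].sum == printed_sum)
            return tab[i].text;
    return scanned;
}
```

A blank line always has the same checksum, so a single empty entry is enough to cancel scanner noise on every blank line, as the text describes.
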
The "nastylines" file is re-read every time repair restarts after an edit,
so you can add more lines as the program runs. (The error-correction patterns
should be done this way, too, but that'll have to wait for the next release.)

Sortpages

If, in the course of scanning, the pages have been split up or have gotten
out of order, a perl script called sortpages can restore them to the proper
order. It can merge multiple input files, discard duplicates, and warn about
any missing pages it encounters. This script requires that the pages have
been repaired, so that the page headers can be read reliably. The repair
program does not care what order the pages are in; it examines each
page independently. Unmunge, however, does need the pages in order.

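The sortpages logic can be sketched in C with qsort. The real tool is a perl script that reads its sort key from the repaired page headers, whose format is not described here. This sketch simply assumes each page has already been reduced to one book-wide page number, and that pages from several input files have been concatenated into one array. sort_pages is a hypothetical name.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: number is assumed to be parsed from the page header. */
struct page { int number; const char *text; };

static int by_number(const void *a, const void *b)
{
    int x = ((const struct page *)a)->number;
    int y = ((const struct page *)b)->number;
    return (x > y) - (x < y);
}

/* Sorts pages in place, drops duplicate scans of the same page, and warns
   about gaps in the numbering. Returns the number of pages kept. */
size_t sort_pages(struct page *pg, size_t n)
{
    size_t kept = 0;

    qsort(pg, n, sizeof *pg, by_number);
    for (size_t i = 0; i < n; i++) {
        if (kept > 0 && pg[i].number == pg[kept - 1].number)
            continue;                          /* duplicate */
        if (kept > 0 && pg[i].number > pg[kept - 1].number + 1)
            fprintf(stderr, "warning: pages %d-%d missing\n",
                    pg[kept - 1].number + 1, pg[i].number - 1);
        pg[kept++] = pg[i];
    }
    return kept;
}
```
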
Unmunging

After repair has finished its work, the unmunge program strips out the
checksums and, based on the page headers, divides the data up among various
files. Its first argument is the file to unpack. The optional second argument
is a manifest file that lists all of the files and the directories they go
in. Supplying this (an excellent idea) lets unmunge recreate a directory
hierarchy and warn about missing files.

When you have unmunged everything and reconstructed the original source code,
you are done. Unmunge verifies all of the checksums independently of repair,
as a sanity check, and you can have high confidence that the files are
exactly the same as the originals that were printed.



BOOTSTRAPPING
-------------

There's a problem using the postprocessing tools to correct OCR errors when
the code being OCRed is the tools themselves. We've tried to provide a
reasonably easy way to get the system up and running starting from nothing
but a copy of OmniPage.

You could just scan all of the tools in, correct any errors by hand, delete
the error-checking information in a text editor, and compile them. But
finding all the errors by hand is painful in a body of code that large.
With the aid of perl (version 5), which provides a lot of power in very
little code, we have provided some utilities to make this process easier.

The first-stage bootstrap is a one-page perl script designed to be as small
and simple as possible, because you'll have to hand-correct it. It can verify
the checksums on each line, and drop you into the editor on any lines where
an error has occurred. It also knows how to strip out the visible spaces and
tabs, how to correct spacing errors after visible tab characters, and how to
invoke an editor on the erroneous line.

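Stripping the visible whitespace, as the bootstrap and unmunge do, reverses the printing rules. This minimal sketch assumes:

- Checksums have already been removed.
- The special glyphs arrive as the Latin-1 codes they were trained as: octal 244 tab, 245 yen, 266 pilcrow, 267 space dot.

decode_line is a hypothetical name. It reports when a printed line ends in a pilcrow or a form feed, so that the caller can join it to the next printed line without a newline.

```c
#include <stddef.h>

#define TAB_MARK  0244  /* hollow triangle -> tab */
#define FF_MARK   0245  /* yen -> form feed */
#define PILCROW   0266  /* black block -> line continues */
#define SPACE_DOT 0267  /* triangular dot -> space */

/* Hypothetical sketch: decode one printed, checksum-stripped line back to the
   original bytes. Returns the byte count written to out (at most len bytes).
   *join is set when the line ends in a pilcrow or form feed, meaning the
   next printed line continues this one with no newline in between. */
size_t decode_line(const unsigned char *in, size_t len, unsigned char *out,
                   int *join)
{
    size_t o = 0;

    *join = 0;
    for (size_t i = 0; i < len; i++) {
        switch (in[i]) {
        case TAB_MARK:
            out[o++] = '\t';
            while (i + 1 < len && in[i + 1] == ' ')
                i++;                  /* padding up to the tab stop */
            break;
        case SPACE_DOT:
            out[o++] = ' ';
            break;
        case FF_MARK:
            out[o++] = '\f';
            *join = (i + 1 == len);
            break;
        case PILCROW:
            *join = (i + 1 == len);   /* continuation mark, not data */
            break;
        default:
            out[o++] = in[i];         /* includes the undotted single space */
        }
    }
    return o;
}
```
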
Scan in the first-stage bootstrap as carefully as possible, using OmniPage's
warnings to guide you to any errors, and either use a text editor or the
one-line perl command at the top of the file to remove the checksums and
convert any funny printed characters to whitespace form.

The first thing to do is try running it on itself, and correct any errors you
find this way. Note that the script writes its output to the file named in
the page header, so you should name your hand-corrected version differently
(or put it in a different directory) to avoid having it overwritten.

The second-stage bootstrap is a much denser one-pager, with better error
detection; it can detect missing lines and missing pages, and takes an
optional second argument of a manifest file which it can use to put files
in their proper directories. It's not strictly necessary, but it's only one
more (dense) page and you can check it against itself and the original
bootstrap.

Both of the bootstrap utilities can correct tab spacing errors in the OCR
output. Although tab spacing doesn't matter in most source code, it is
included in the checksums.

Once you have reached this point, you can scan in the C code for repair and
unmunge. The C unmunge is actually less friendly than the bootstrap
utilities, because it is only intended to work with the output of repair.
It is, however, much faster, since computing CRCs a bit at a time in an
interpreted language is painfully slow for large amounts of data. It can
also deal with binary files printed in radix-64.



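To make the "a bit at a time" remark concrete, here is a bitwise CRC in C. This chapter does not say which CRC width or polynomial the tools use. For illustration only, this sketch assumes the common reflected CRC-32 (polynomial 0xEDB88320). Each input byte costs eight shift-and-xor steps. That is cheap in compiled C but slow when every step is an interpreted perl operation.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time reflected CRC-32 (polynomial 0xEDB88320), a stand-in for
   whatever CRC the book's tools actually compute. */
uint32_t crc32_bitwise(const unsigned char *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)          /* one step per bit */
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
    }
    return ~crc;
}
```

Table-driven versions process a byte per lookup instead of a bit per step, which is the usual way to speed this up.
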
PRINTING
--------

Despite the title of this book, the process of producing a book is not well
documented, since it has been evolving up to the moment of publication.
There is, however, a very useful working example of how to produce a book
(strikingly similar to this book) in the example directory, all controlled
by a Makefile.

Briefly, a master perl script called psgen takes three parameters: a file
list, a page numbers file to write to, and a volume number (which should
always be 1 for a one-volume book). It runs the listed files through the
munge utility, wraps them in some simple PostScript, and prepends a prolog
that defines the special characters and PostScript functions needed by the
text.

The file list also includes per-file flags. The most important is the
text/binary marker. Text files can also have a tab width specified, although
munge knows how to read Emacs-style tab width settings from the end of a
source file.

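Here is a sketch of parsing one filelist line such as "T4 books/tools/psgen". The example filelist suggests T for text, optionally followed by a tab width, and B for binary. D appears to mark directories, as in the MANIFEST. The leading "V 1 8" line is not explained here, so the sketch does not interpret it. The struct, enum and function names are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>

enum kind { K_TEXT, K_BINARY, K_DIR, K_OTHER };

struct entry {
    enum kind kind;
    int tabwidth;       /* 0 = not given in the filelist */
    const char *path;   /* points into the caller's line */
};

/* Hypothetical sketch: parse "<flag>[tabwidth] <path>".
   Returns 0 on success, -1 on a malformed line. */
int parse_filelist_line(const char *line, struct entry *e)
{
    const char *p = line;

    if (*p == '\0')
        return -1;
    switch (*p++) {
    case 'T': e->kind = K_TEXT;   break;
    case 'B': e->kind = K_BINARY; break;
    case 'D': e->kind = K_DIR;    break;
    default:  e->kind = K_OTHER;  break;  /* e.g. "V 1 8": not interpreted */
    }
    e->tabwidth = 0;
    if (e->kind == K_TEXT && isdigit((unsigned char)*p)) {
        char *end;
        e->tabwidth = (int)strtol(p, &end, 10);
        p = end;
    }
    if (*p != ' ' && *p != '\t')
        return -1;
    while (*p == ' ' || *p == '\t')
        p++;
    e->path = p;
    return *p != '\0' ? 0 : -1;
}
```

A tab width of 0 leaves the decision to munge, which, per the text, can also pick up an Emacs-style setting from the file itself.
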
psgen assembles the prolog from various other files and definitions using
a simple preprocessor called yapp (Yet Another Preprocessor). This process
includes some book-specific information like the page footer.

Producing the final PostScript requires the necessary non-standard fonts
(Futura for the footers and OCRB for the code) and the psutils package,
which provides the includeres utility used to embed the fonts in the
PostScript file. The fonts should go in the books/ps directory, as
"Futura.pfa" and the like.

The pagenums file can be used to produce a table of contents. For this book,
we generated the front matter (such as this chapter) separately, told psgen
to start on the page after it, and concatenated the resultant PostScript
files for printing. The only trick was making the page footers look
identical.
3
example/.cvsignore
Normal file
@ -0,0 +1,3 @@
pagenums
MANIFEST
code.ps
23
example/Makefile
Normal file
@ -0,0 +1,23 @@
BOOKROOT=..
TOOLSDIR=$(BOOKROOT)/tools
PSDIR=$(BOOKROOT)/ps
YAPP=$(TOOLSDIR)/yapp
MAKEMANIFEST=$(TOOLSDIR)/makemanifest
PSGEN=BOOKROOT=$(BOOKROOT) $(TOOLSDIR)/psgen
INCLUDERES=(cd $(PSDIR); includeres)

code.ps pagenums: filelist footer.ps MANIFEST books
	$(PSGEN) -P2 -l3 -DfooterFile=footer.ps filelist pagenums 1 \
	| $(INCLUDERES) > code.ps

books:
	ln -s $(BOOKROOT) books

MANIFEST: filelist
	$(MAKEMANIFEST) $< > $@

clean:
	rm -f `cat .cvsignore`

gv%: %.ps
	gv $<
32
example/filelist
Normal file
@ -0,0 +1,32 @@
V 1 8
T MANIFEST
D books/
D books/tools/
T books/tools/bootstrap
T books/tools/bootstrap2
T4 books/tools/sortpages
T books/tools/Makefile
T books/tools/heap.c
T books/tools/heap.h
T books/tools/mempool.c
T books/tools/mempool.h
T books/tools/util.c
T books/tools/util.h
T books/tools/repair.c
T books/tools/subst.c
T books/tools/subst.h
T books/tools/unmunge.c
T books/tools/munge.c
T books/tools/yapp.doc
T4 books/tools/yapp
T4 books/tools/psgen
T4 books/tools/makemanifest
D books/ps/
T books/ps/prolog.ps
T books/ps/charmap.ps
D books/example/
T books/example/Makefile
T books/example/.cvsignore
T books/example/filelist
T books/example/footer.ps
B books/example/us-constitution.gz
5
example/footer.ps
Normal file
@ -0,0 +1,5 @@
% A program to print the page footer, using the magic P function,
% which takes a string and a font.
(Tools for Publishing Source Code via OCR ) /Futura P
(\343) /Symbol P % Copyright symbol
( 1997 Pretty Good Privacy, Inc.) /Futura P
BIN
example/us-constitution.gz
Normal file
Binary file not shown.
68
ps/charmap.ps
Normal file
@ -0,0 +1,68 @@
%%BeginResource: procset Latin1-vec 0 0
/Latin1-vec [
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/space /exclam /quotedbl /numbersign
/dollar /percent /ampersand /${rightQuoteGlyph}
/parenleft /parenright /asterisk /plus
/comma /hyphen /period /slash
/${zeroGlyph} /one /two /three
/four /five /six /seven
/eight /nine /colon /semicolon
/less /equal /greater /question
/at /A /B /C
/D /E /F /G
/H /I /J /K
/L /M /N /O
/P /Q /R /S
/T /U /V /W
/X /Y /Z /bracketleft
/backslash /bracketright /asciicircum /${underscoreGlyph}
/${leftQuoteGlyph} /a /b /c
/d /e /f /g
/h /i /j /k
/l /m /n /o
/p /q /r /s
/t /u /v /w
/x /y /z /braceleft
/${barGlyph} /braceright /tilde /.notdef
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/space /exclamdown /cent /sterling
/${tabGlyph} /yen /brokenbar /section
/dieresis /copyright /ordfeminine /guillemotleft
/logicalnot /hyphen /registered /macron
/degree /plusminus /twosuperior /threesuperior
/acute /mu /${pilcrowGlyph} /${bulletGlyph}
/cedilla /dotlessi /ordmasculine /guillemotright
/onequarter /onehalf /threequarters /questiondown
/Agrave /Aacute /Acircumflex /Atilde
/Adieresis /Aring /AE /Ccedilla
/Egrave /Eacute /Ecircumflex /Edieresis
/Igrave /Iacute /Icircumflex /Idieresis
/Eth /Ntilde /Ograve /Oacute
/Ocircumflex /Otilde /Odieresis /multiply
/Oslash /Ugrave /Uacute /Ucircumflex
/Udieresis /Yacute /Thorn /germandbls
/agrave /aacute /acircumflex /atilde
/adieresis /aring /ae /ccedilla
/egrave /eacute /ecircumflex /edieresis
/igrave /iacute /icircumflex /idieresis
/eth /ntilde /ograve /oacute
/ocircumflex /otilde /odieresis /divide
/oslash /ugrave /uacute /ucircumflex
/udieresis /yacute /thorn /ydieresis
]def
%%EndResource
306
ps/prolog.ps
Normal file
@ -0,0 +1,306 @@
##set pageNumFont="Futura"
##set dirNameFont="Futura-Heavy"
##set fontsNeeded="${font} Symbol Futura Futura-Heavy"
##set includeFontComments=<<"END"
%%IncludeResource: font ${font}
%%IncludeResource: font Symbol
%%IncludeResource: font Futura
%%IncludeResource: font Futura-Heavy
END
##if ${font} eq Courier
##set charShrinkFactor=0.93
##set zeroGlyph=Oslash
##set underscoreGlyph=underscore
##set bulletGlyph=bullet
##set tabGlyph=currency
##set leftQuoteGlyph=quoteleft
##set rightQuoteGlyph=quoteright
##set pilcrowGlyph=paragraph
##set barGlyph=bar
##else
##set charShrinkFactor=1
##set zeroGlyph=Oslash
##set underscoreGlyph=underscore2
##set bulletGlyph=bullet2
##set tabGlyph=tabsym
##set leftQuoteGlyph=grave
##set rightQuoteGlyph=quoteright ### was "acute"
##set pilcrowGlyph=erase
##set barGlyph=orsym
##set do_custom_chars=1
##endif
%!PS-Adobe-3.0
%%Orientation: Portrait
%%Pages: (atend)
%%DocumentNeededResources: font ${fontsNeeded}
%%DocumentMedia: Letter 612 792 74 white ()
%%EndComments
%%BeginDefaults
%%PageMedia: Letter
%%PageResources: font ${fontsNeeded}
%%EndDefaults
%%BeginProlog
%%BeginResource: procset Custom-Preamble 0 0
%
% Document definitions
% (Upper case to avoid collisions)
%

% 8.5x11 paper is 612x792 points, but 24 points near the edge or so
% shouldn't be used.
/Topmargin 770 def
/Leftmargin 30 def
/Rightmargin 612 Leftmargin sub def
/Botmargin 22 def
/Bindoffset 40 def

/Lineskip -10 def
% How much to shrink characters by?
/Factor ${charShrinkFactor} def
/Fontsize 9.5 Factor mul def
% (1000 units is std height, so Courier at 6/10 aspect ratio is 600.)
% Widen to make up for scaling loss.
/Charwidth
	Rightmargin Leftmargin sub Bindoffset sub 87 div Fontsize div 1000 mul
def

% Print a header (expects page number on stack)
/OddPageStart
{ save exch /MyFont findfont Fontsize scalefont setfont
	/CurrentLeft Leftmargin Bindoffset add def
	/CurrentRight Rightmargin def
	CurrentLeft Topmargin moveto } def

/EvenPageStart
{ save exch /MyFont findfont Fontsize scalefont setfont
	/CurrentLeft Leftmargin def
	/CurrentRight Rightmargin Bindoffset sub def
	CurrentLeft Topmargin moveto } def

% /MyFont findfont [Fontsize 0 0 Fontsize 0 0] makefont setfont

% Print the name of the directory in a large font
/DirPage
{
	/${dirNameFont} findfont 14 scalefont setfont
	0 -10 rmoveto (Directory) show
	CurrentLeft 30 add currentpoint exch pop 20 sub moveto show
} def

% Advance a line
/L {show CurrentLeft currentpoint exch pop Lineskip add moveto} bind def

% Print the "inside" footer line using P (string font => )
% We do some magic involving redefining P to first measure the
% width of this string and then print it, so you must use it
% to do all printing.
/Foot {
##ifdef footerFile
##include "${footerFile}"
##endif
} def

% /P is defined in the Setup section

% Print an odd footer
/OddPageEnd
{ CurrentLeft Botmargin moveto CurrentRight Botmargin lineto
	1 setlinewidth stroke
	CurrentLeft Botmargin 10 sub moveto
	Foot
	10 string cvs dup stringwidth
	pop CurrentRight exch sub currentpoint exch pop moveto
	/${pageNumFont} P
	showpage
	restore
} def

% Print an even footer
/EvenPageEnd
{ CurrentLeft Botmargin moveto CurrentRight Botmargin lineto
	1 setlinewidth stroke
	Leftmargin Botmargin 10 sub moveto
	/${pageNumFont} P
	CurrentRight FootWidth sub currentpoint exch pop moveto
	Foot
	showpage
	restore
} def

##ifdef do_custom_chars
% A 1000-point OCRB discunderline consists of:
% 111.45 -173.688 moveto
% 609.356 -173.688 lineto
% 609.356 -70.9227 lineto
% 111.45 -70.9227 lineto
% closepath
% 720.0 -0.0 moveto
% Line thickness is
% 102.7653 pts.

% This would suggest the following values:
/underleft 111.45 def
/underright 609.356 def
/underthick 102.7653 def
/underup underthick def
/underdown 0 def
/underserif 25 def

% These look better in GhostScript, but not on a real Adobe rasterizer
%/underright 600 def
%/underleft 100 def
%/underthick 75 def

171
211
36081
% The default bullet character is
% 254.0 341.0 moveto
% 254.0 170.0 lineto
% 465.0 170.0 lineto
% 465.0 341.0 lineto
% closepath
% Our modified version is based on:
/bullwid 204 def
/bullht 176.75 def
/bullleft 254 341 add bullwid sub 2 div def
/bullright 254 341 add bullwid add 2 div def
/bullbot 254 def
/bulltop bullbot bullht add def

% And a custom-created tab symbol
/tableft 250 def
/tabright 550 def
/tabtop 550 def
/tabbot 50 def
/tablinewidth 35 def

% Let's try a vertical bar
% OCRB defines (|)
% 411.062 -173.688 moveto
% 411.062 741.043 lineto
% 308.297 741.043 lineto
% 308.297 -173.688 lineto
% closepath
% 720.0 -0.0 moveto
/orleft 308.297 def
/orright 411.062 def
/orbot -173.688 def
/ortop 741.043 def
/orbreak 150 def % Width of break
/orbbot ortop orbot add orbreak sub 2 div def % Bottom of break
/orbtop ortop orbot add orbreak add 2 div def % Top of break
##endif

% newfontname encoding-vec fontname -> - make a new encoded font
/MF2 {
	% Make a dict for the new font, with room for the /Metrics
	findfont dup length 1 add dict begin
	% Copy everything except the FID entry
	{1 index /FID eq {pop pop} {def} ifelse} forall
	% Set the encoding vector
	/Encoding exch def

##ifdef do_custom_chars
	% Create a new expanded CharStrings dictionary
	CharStrings dup length 5 add dict
	begin { def } forall
	% Create a custom underscore character
	/underscore2 {
		pop
		//Charwidth 0 % width, bounding box follows
		//underleft //underdown neg //underright //underthick //underup add
		setcachedevice
		//underleft //underthick //underup add moveto
		//underleft //underserif add //underthick //underup add lineto
		//underleft //underserif add //underthick lineto
		//underright //underserif sub //underthick lineto
		//underright //underserif sub //underthick //underup add lineto
		//underright //underthick //underup add lineto
		//underright //underdown neg lineto
		//underright //underserif sub //underdown neg lineto
		//underright //underserif sub 0 lineto
		//underleft //underserif add 0 lineto
		//underleft //underserif add //underdown neg lineto
		//underleft //underdown neg lineto
		closepath fill
	} bind def
	% Create a custom bullet character.
	/bullet2 {
		pop
		//Charwidth 0 % width, bounding box follows
		//bullleft //bullbot //bullright //bulltop
		setcachedevice
		//bullleft //bullbot moveto
		//bullleft //bullright add 2 div //bulltop lineto
		//bullright //bullbot lineto
		closepath fill
	} bind def
	% Create a custom tab character.
	/tabsym {
		pop
		//Charwidth 0 % width, bounding box follows
		//tableft //tablinewidth sub //tabbot //tablinewidth sub
		//tabright //tablinewidth add //tabtop //tablinewidth add
		setcachedevice
		//tablinewidth setlinewidth
		true setstrokeadjust
		0 setlinejoin
		//tableft //tabbot moveto
		//tabright //tabtop //tabbot add 2 div lineto
		//tableft //tabtop lineto
		closepath stroke
	} bind def
	/orsym {
		pop
		//Charwidth 0 % width, bounding box follows
		//orleft //orbot //orright //ortop
		setcachedevice
		//orleft //orbot moveto
		//orleft //orbbot lineto
		//orright //orbbot lineto
		//orright //orbot lineto
		closepath
		//orleft //ortop moveto
		//orleft //orbtop lineto
		//orright //orbtop lineto
		//orright //ortop lineto
		closepath fill
	} bind def
	/CharStrings currentdict end def
##endif

	% Create a new dict to be the /Metrics values
	CharStrings dup length dict
	% Now fill in the metrics dict with the desired width
	begin { pop Charwidth def } forall /Metrics currentdict end def
	% End of definitions
	currentdict end
	% Define the font
	definefont pop
} bind def

% Check PostScript language level.
/gs_languagelevel /languagelevel where { pop languagelevel } { 1 } ifelse def

%%EndResource
##include "charmap.ps"
${includeFontComments}
%%EndProlog


%%BeginSetup

/MyFont Latin1-vec /${font} MF2
/#copies 1 def

% Compute the width of the /Foot string, by defining P to
% add up the x-width of the characters.
/P { findfont 9 scalefont setfont stringwidth pop add } def
/FootWidth 0 Foot def
% Redefine P to print, as usual
/P { findfont 9 scalefont setfont show } def
%%BeginResource: procset foo 0 0
% This is an example
%%EndResource
%%EndSetup
30
tools/Makefile
Normal file
@ -0,0 +1,30 @@
all: unmunge repair munge

OPT = -g -O -W -Wall
COMMON_OBJS = util.o

UNMUNGE_OBJS = $(COMMON_OBJS) unmunge.o
MUNGE_OBJS = $(COMMON_OBJS) munge.o
REPAIR_OBJS = $(COMMON_OBJS) heap.o mempool.o subst.o repair.o

unmunge: $(UNMUNGE_OBJS)
	$(CC) $(OPT) -o $@ $(UNMUNGE_OBJS)

munge: $(MUNGE_OBJS)
	$(CC) $(OPT) -o $@ $(MUNGE_OBJS)

repair: $(REPAIR_OBJS)
	$(CC) $(OPT) -o $@ $(REPAIR_OBJS)

.c.o:
	$(CC) $(OPT) -o $@ -c $<

clean:
	-rm -f *.o munge unmunge repair core *.core

unmunge.o: util.h
munge.o: util.h
repair.o: heap.h mempool.h util.h subst.h
heap.o: heap.h
mempool.o: mempool.h
subst.o: subst.h
68
tools/bootstrap
Normal file
@ -0,0 +1,68 @@
#!/usr/bin/perl -s
#
# bootstrap -- Simpler version of unmunge for bootstrapping
#
# Unmunge this file using:
# perl -ne 'if (s/^ *[^-\s]\S{4,6} ?//) { s/[\244\245\267]/ /g; print; }'
#
# $Id: bootstrap,v 1.15 1997/11/14 03:52:53 mhw Exp $

sub Fatal { print STDERR @_; exit(1); }
sub Max { my ($a, $b) = @_; ($a > $b) ? $a : $b; }
sub TabSkip { $tabWidth - 1 - (length($_[0]) % $tabWidth); }

($tab,$yen,$pilc,$cdot,$tmp1,$tmp2)=("\244","\245","\266","\267","\377","\376");
$editor = $ENV{'VISUAL'} || $ENV{'EDITOR'} || 'vi';
$inFile = $ARGV[0];
doFile: {
	open(IN, "<$inFile") || die;
	for ($lineNum = 1; ($_ = <IN>); $lineNum++) {
		s/^\s+//; s/\s+$//; # Strip leading and trailing spaces
		next if (/^$/); # Ignore blank lines
		($prefix, $seenCRCStr, $dummy, $_) = /^(\S{2})(\S{4})( (.*))?/;

		# Correct the number of spaces after each tab
		while (s/$tab( *)/$tmp1 . ($tmp2 x &Max(length($1), &TabSkip($`)))/e) {}
		s/ ( +)/" " . ($cdot x length($1))/eg; # Correct center dots
		s/$tmp1/$tab/g; s/$tmp2/ /g; # Restore tabs and spaces from correction
		s/\s*$/\n/; # Strip trailing spaces, and add a newline

		$crc = $seenCRC = 0; # Calculate CRC
		for ($data = $_; $data ne ""; $data = substr($data, 1)) {
			$crc ^= ord($data);
			for (1..8) {
				$crc = ($crc >> 1) ^ (($crc & 1) ? 0x8408 : 0);
			}
		}
		if ($crc != hex($seenCRCStr)) { # CRC mismatch
			close(IN); close(OUT);
			unlink(@filesCreated);
			@filesCreated = ();
			@oldStat = stat($inFile);
			system($editor, "+$lineNum", $inFile);
			@newStat = stat($inFile);
			redo doFile if ($oldStat[9] != $newStat[9]); # Check mod date
			&Fatal("Line $lineNum invalid: $_");
		}

		if ($prefix eq '--') { # Process header line
			($code, $pageNum, $file) = /^(\S{19}) Page (\d+) of (.*)/;
			$tabWidth = hex(substr($code, 11, 1));
			if ($file ne $lastFile) {
				print "$file\n";
				&Fatal("$file: already exists\n") if (!$f && (-e $file));
				close(OUT);
				open(OUT, ">$file") || &Fatal("$file: $!\n");
				push(@filesCreated, ($lastFile = $file));
			}
		} else { # Unmunge normal line
			s/$tab( *)/"\t".(" " x (length($1) - &TabSkip($`)))/eg;
			s/$yen\n/\f/; # Handle form feeds
			s/$pilc\n//; # Handle continuation lines
			s/$cdot/ /g; # Center dots -> spaces

			print OUT;
		}
	}
	close(IN); close(OUT);
}
72
tools/bootstrap2
Normal file
@ -0,0 +1,72 @@
#!/usr/bin/perl -s
#
# bootstrap2 -- Second stage bootstrapper, a version of unmunge
#
# $Id: bootstrap2,v 1.4 1997/11/14 03:52:54 mhw Exp $

sub Cleanup { close(IN); close(OUT); unlink(@files); @files = (); }
sub Fatal { &Cleanup(); print STDERR @_; exit(1); }
sub TabSkip { $tabWidth - 1 - (length($_[0]) % $tabWidth); }
sub TabFix { my ($needed, $actual) = (&TabSkip($_[0]), length($_[1]));
	$tmp1 . ($tmp2 x $needed) . (" " x ($actual - $needed)); }
sub HumanEdit { my ($file, $line, @message) = ($inFile, @_); &Cleanup();
	@old = stat($file); system($editor, "+$line", $file); @new = stat($file);
	redo doFile if ($old[9] != $new[9]); # Check mod date
	&Fatal("Line $line, ", @message); }

($tab,$yen,$pilc,$cdot,$tmp1,$tmp2)=("\244","\245","\266","\267","\377","\376");
$editor = $ENV{'VISUAL'} || $ENV{'EDITOR'} || 'vi';
($inFile, $manifest, @rest) = @ARGV;
if ($manifest ne "") { # Read manifest file
	open(MANIFEST, "<$manifest") || &Fatal("$manifest: $!\n");
	while (<MANIFEST>) { $dir = $1 if /^D\s+(.*)$/;
		$index[$1] = $dir . $2 if /^(\d+)\s+(.*)$/; }
}
doFile: {
	$seenPCRC = $pcrc1 = 0; $lastFlags = 1; $lastFileNum = 0;
	open(IN, "<$inFile") || &Fatal("$inFile: $!\n");
	for ($line = 1; ($_ = <IN>); $line++) {
		s/^\s+//; s/\s+$//; # Strip leading and trailing spaces
		next if (/^$/); # Ignore blank lines
		($prefix, $seenCRCStr, $dummy, $_) = /^(\S{2})(\S{4})( (.*))?/;
		while (s/$tab( *)/&TabFix($`, $1)/eo) {} # Correct spaces after tabs
		s/($tmp2| )( +)/$1 . ($cdot x length($2))/ego; # Correct center dots
		s/$tmp1/$tab/go; s/$tmp2/ /go; # Restore tabs/spaces from correction
		s/\s*$/\n/; # Strip trailing spaces, and add a newline

		$crc = 0; $pcrc = $pcrc1; # Calculate CRCs
		for ($data = $_; $data ne ""; $data = substr($data, 1)) {
			$crc ^= ord($data); $pcrc1 ^= ord($data);
			for (1..8) { $crc = ($crc >> 1) ^ (($crc & 1) ? 0x8408 : 0);
				$pcrc1 = ($pcrc1 >> 1) ^ (($pcrc1 & 1) ? 0xedb88320 : 0); }
		}
		($seenPLCRC, $seenCRC) = map { hex($_) } ($prefix, $seenCRCStr);
		&HumanEdit($line, "CRC failed: $_") if $crc != $seenCRC;
		if ($prefix eq '--') { # Process header line
			&HumanEdit($line - 1, "Page CRC failed") if $pcrc != $seenPCRC;
			($humanHdr, $pageNum, $file) = /^\S{19} (Page (\d+) of (.*))/;
			($vers, $flags, $seenPCRC, $tabWidth, $prodNum, $fileNum) =
				map { hex($_) } /^(\S)(\S\S)(\S{8})(\S)(\S{3})(\S{4})/;
			if ($fileNum != $lastFileNum) {
				print STDERR "MISSING files\n" if $fileNum != $lastFileNum + 1;
				&Fatal("Missing pages\n") if $pageNum != 1 || !($lastFlags & 1);
				if ($manifest ne "") {
					($_ = $index[$fileNum]) =~ m%([^/]*)$%;
					&Fatal("Manifest mismatch\n") if ($file ne $1);
					($file = $_) =~ s|/+|mkdir($`, 0777), "/"|eg; # mkdir -p
				}
				&Fatal("$file: already exists\n") if (!$f && (-e $file));
				close(OUT); open(OUT, ">$file") || &Fatal("$file: $!\n");
				push(@files, $file); print "$fileNum $file\n";
			} else {
				&Fatal("MISSING pages\n") if ($pageNum != $lastPageNum + 1);
			}
			($lastFlags,$lastFileNum,$lastPageNum) = ($flags,$fileNum,$pageNum);
			$pcrc1 = 0;
		} else { # Unmunge normal line
			&HumanEdit($line, "CRC failed: $_") if ($pcrc1 >> 24) != $seenPLCRC;
			s/$tab( *)/"\t".(" " x (length($1) - &TabSkip($`)))/ego;
			s/$yen\n/\f/o; s/$pilc\n//o; s/$cdot/ /go; print OUT;
		}
	}
}
144
tools/heap.c
Normal file
@ -0,0 +1,144 @@
/*
 * heap.c -- Simple priority queue. Takes pointers to cost values
 * (presumably the first field in a larger structure) and returns
 * them in increasing order of cost.
 *
 * Copyright (C) 1997 Pretty Good Privacy, Inc.
 *
 * Written by Colin Plumb and Mark H. Weaver
 *
 * $Id: heap.c,v 1.2 1997/07/05 02:55:23 colin Exp $
 */

#include <stdio.h> /* For fprintf(stderr, "Out of memory") */
#include <stdlib.h> /* For malloc() & co. */

#include "heap.h"

#define HeapParent(i) ((i) / 2)
#define HeapLeftChild(i) ((i) * 2)
#define HeapRightChild(i) ((i) * 2 + 1)
#define HeapElem(h, i) (h)->elems[i]
#define HeapMinElem(h) HeapElem(h, 1)
#define HeapElemCost(e) (*(e))
#define HeapCost(h, i) HeapElemCost(HeapElem(h, i))
#define HeapSize(h) ((h)->numElems)

static void
SiftDown(Heap const *heap, HeapCost *e)
{
	HeapIndex size = HeapSize(heap), parent = 1, child;
	HeapCost cparent = HeapElemCost(e), cchild;

	for (;;) {
		child = 2*parent;
		if (child > size)
			break;
		cchild = HeapCost(heap, child);
		if (child < size && cchild > HeapCost(heap, child+1)) {
			cchild = HeapCost(heap, child+1);
			child++;
		}
		if (cparent <= cchild)
			break; /* Stop sifting down */
		HeapElem(heap, parent) = HeapElem(heap, child);
		parent = child;
	}
	HeapElem(heap, parent) = e;
}

/* Debug tool: verify heap property */
void
HeapVerify(Heap *heap)
{
	HeapIndex i;

	for (i = 2; i <= HeapSize(heap); i++)
		if (HeapCost(heap, i) < HeapCost(heap, HeapParent(i)))
			fprintf(stderr, "DEBUG: VerifyHeap failed at elem %u\n", i);
}

/* Remove and return the minimum cost from the heap. */
HeapCost *
HeapGetMin(Heap *heap)
{
	HeapIndex lastElem = HeapSize(heap);
	HeapCost *retval;

	if (!lastElem)
		return NULL;
	retval = HeapMinElem(heap);
	HeapSize(heap) = lastElem-1;
	SiftDown(heap, HeapElem(heap, lastElem));
	return retval;
}

/* Helper - set heap size, reallocating if needed */
static void
HeapResize(Heap *heap, HeapIndex newNumElems)
{
	if (newNumElems >= heap->elemsAllocated) {
		HeapIndex newAllocSize = heap->elemsAllocated * 2;

		if (newAllocSize <= newNumElems)
			newAllocSize = newNumElems + 1;
		heap->elems = (HeapCost **)realloc((void *)heap->elems,
				sizeof(*heap->elems) * newAllocSize);
		if (heap->elems == NULL) {
			fprintf(stderr, "Fatal error: Out of memory growing heap\n");
			exit(1);
		}
		heap->elemsAllocated = newAllocSize;
	}
	heap->numElems = newNumElems;
}

/* Add an element to the heap */
void
HeapInsert(Heap *heap, HeapCost *newElem)
{
	HeapIndex parent, i = ++HeapSize(heap);
	HeapCost cost = HeapElemCost(newElem);

	HeapResize(heap, i);
	/* Sift up until parent = 0 */
	while ((parent = HeapParent(i)) && HeapCost(heap, parent) > cost) {
		HeapElem(heap, i) = HeapElem(heap, parent);
		i = parent;
	}
	heap->elems[i] = newElem;
}

/* Initialize a new heap */
void
HeapInit(Heap *heap, HeapIndex initSize)
{
	initSize++; /* Add one for temporary element */
	if (initSize < 1)
		initSize = 1;
	heap->elems = (HeapCost **)malloc(initSize * sizeof(*heap->elems));
	if (heap->elems == NULL) {
		fprintf(stderr, "Fatal error: Out of memory creating heap\n");
		exit(1);
	}
	heap->elemsAllocated = initSize;
	heap->numElems = 0;
}

/* Free up a heap's resources. */
void
HeapDestroy(Heap *heap)
{
	free((void *)heap->elems);
	heap->elemsAllocated = 0;
	heap->numElems = 0;
	heap->elems = NULL;
}

/*
 * Local Variables:
 * tab-width: 4
 * End:
 * vi: ts=4 sw=4
 * vim: si
 */
43
tools/heap.h
Normal file
@ -0,0 +1,43 @@
/*
 * heap.h -- Simple priority queue. Takes pointers to cost values
 * (presumably the first field in a larger structure) and returns
 * them in increasing order of cost.
 *
 * Copyright (C) 1997 Pretty Good Privacy, Inc.
 *
 * Written by Colin Plumb and Mark H. Weaver
 *
 * $Id: heap.h,v 1.6 1997/10/31 04:22:46 mhw Exp $
 */

#ifndef HEAP_H
#define HEAP_H 1

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

typedef int HeapCost;
#define COST_INFINITY INT_MAX
typedef unsigned HeapIndex;

typedef struct Heap {
	HeapCost **elems;
	HeapIndex numElems, elemsAllocated;
} Heap;

void HeapInit(Heap *heap, HeapIndex initSize);
void HeapDestroy(Heap *heap);
void HeapInsert(Heap *heap, HeapCost *newElem);
HeapCost *HeapGetMin(Heap *heap);
void HeapVerify(Heap *heap);

#endif

/*
 * Local Variables:
 * tab-width: 4
 * End:
 * vi: ts=4 sw=4
 * vim: si
 */
31
tools/makemanifest
Normal file
@ -0,0 +1,31 @@
#!/usr/bin/perl

$fileNum = 0;
while(<>)
{
	/^([VDTB])(\S*)\s+(.*)/ || die("Bad filelist, line $.");
	($type, $options, $name) = ($1, $2, $3);

	if ($type eq "D")
	{
		$dir = $name;
		print "D $dir\n";
	}
	elsif ($type eq "V")
	{
		# Do nothing
	}
	else
	{
		$fileNum++;
		$tail = $name;
		$tail =~ s|^.*/||;
		die("Bad filelist, line $.") if $name ne $dir . $tail;
		print "$fileNum $tail\n";
	}
}

#
# vi: ai ts=4
# vim: si
#
137
tools/mempool.c
Normal file
@ -0,0 +1,137 @@
/*
 * mempool.c - Pooled memory allocation, similar to GNU obstacks.
 *
 * $Id: mempool.c,v 1.5 1997/11/13 23:53:08 colin Exp $
 */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h> /* For malloc() & free() */

#include "mempool.h"

/*
 * The memory pool allocation functions
 *
 * These are based on a linked list of memory blocks, usually of uniform
 * size. New memory is allocated from the tail of the current block,
 * until that is inadequate, then a new block is allocated.
 * The entire pool can be freed at once by calling memPoolEmpty().
 */
struct PoolBuf {
	struct PoolBuf *next;
	unsigned size;
	/* Data follows */
};

/* The prototype empty pool, including the default allocation size. */
static struct MemPool EmptyPool = { 0, 0, 0, 4096, 0, 0, 0 };

/* Initialize the pool for first use */
void
memPoolInit(struct MemPool *pool)
{
	*pool = EmptyPool;
}

/* Set the pool's purge function */
void
memPoolSetPurge(struct MemPool *pool, int (*purge)(void *), void *arg)
{
	pool->purge = purge;
	pool->purgearg = arg;
}

/* Free all the memory in the pool */
void
memPoolEmpty(struct MemPool *pool)
{
	struct PoolBuf *buf;

	while ((buf = pool->head) != 0) {
		pool->head = buf->next;
		free(buf);
	}
	pool->freespace = 0;
	pool->totalsize = 0;
}


/*
 * Restore a pool to a marked position, freeing subsequently allocated
 * memory.
 */
void
memPoolCutBack(struct MemPool *pool, struct MemPool const *cutback)
{
	struct PoolBuf *buf;

	assert(pool);
	assert(cutback);
	assert(pool->totalsize >= cutback->totalsize);

	while((buf = pool->head) != cutback->head) {
		pool->head = buf->next;
		free(buf);
	}
	*pool = *cutback;
}

/*
 * Allocate a chunk of memory for a structure. Alignment is assumed to be
 * a power of 2. It could be generalized, if that ever becomes relevant.
 * Note that alignment is from the beginning of an allocated chunk, which
 * is guaranteed by ANSI to be as aligned as can possibly matter.
 */
void *
memPoolAlloc(struct MemPool *pool, unsigned len, unsigned alignment)
{
	char *p;
	unsigned t;

	/* Where to allocate next object */
	p = pool->freeptr;
	/* How far it is from the beginning of the chunk. */
	t = p - (char *)pool->head;
	/* How much to round up freeptr to make alignment */
	t = -t & --alignment;

	/* Okay, does it fit? */
	if (pool->freespace >= len+t) {
		pool->freespace -= len+t;
		p += t;
		pool->freeptr = p + len;
		return p;
	}

	/* It does not fit in the current chunk. Go for a bigger chunk. */

	/* First, figure out how much to skip at the beginning of the chunk */
	alignment &= -(unsigned)sizeof(struct PoolBuf);
	alignment += sizeof(struct PoolBuf);
	/* Then, figure out a chunk size that will fit */
	t = pool->chunksize;
	assert(t);
	while (len + alignment > t)
		t *= 2;
	while ((p = malloc(t)) == 0) {
		/* If that didn't work, try purging or smaller allocations */
		if (!pool->purge || !pool->purge(pool->purgearg)) {
			t /= 2;
			/* Give up only once a halved chunk can no longer hold the request */
			if (len + alignment > t) {
				fputs("Out of memory!\n", stderr);
				exit (1); /* Failed */
			}
		}
	}

	/* Update the various pointers. */
	pool->totalsize += t;
	((struct PoolBuf *)p)->next = pool->head;
	((struct PoolBuf *)p)->size = t;
	pool->head = (struct PoolBuf *)p;
	pool->freespace = t - len - alignment;
	p += alignment;
	pool->freeptr = p + len;

	return p;
}
36
tools/mempool.h
Normal file
@ -0,0 +1,36 @@
/* $Id: mempool.h,v 1.2 1997/11/13 23:53:09 colin Exp $ */

#ifndef MEMPOOL_H
#define MEMPOOL_H

typedef struct MemPool {
	struct PoolBuf *head;
	char *freeptr;
	unsigned freespace;
	unsigned chunksize; /* Default starting point */
	unsigned long totalsize;
	int (*purge)(void *); /* Return non-zero to retry alloc */
	void *purgearg;
} MemPool;

/* A global pool for miscellaneous stuff. */
extern struct MemPool MiscPool;

/*
 * Nice clean interfaces
 */
void memPoolInit(struct MemPool *pool);
void memPoolSetPurge(struct MemPool *pool, int (*purge)(void *), void *arg);
void memPoolEmpty(struct MemPool *pool);
void memPoolCutBack(struct MemPool *dest, struct MemPool const *cutback);
void *memPoolAlloc(struct MemPool *pool, unsigned len, unsigned alignment);
#ifdef DEADCODE
char const *memPoolStore(struct MemPool *pool, char const *str);
#endif

/* Lookie here! An ANSI-compliant alignment finder! */
#define alignof(type) (sizeof(struct{type _x; char _y;}) - sizeof(type))

#define memPoolNew(pool, type) memPoolAlloc(pool, sizeof(type), alignof(type))

#endif /* MEMPOOL_H */
543
tools/munge.c
Normal file
@ -0,0 +1,543 @@
/*
 * munge.c -- Program to convert a text file into "munged" form,
 * suitable for reconstruction from printed form. Tabs are
 * made visible and checksums are added to each line and each
 * page to protect against transcription errors.
 *
 * Copyright (C) 1997 Pretty Good Privacy, Inc.
 *
 * Designed by Colin Plumb, Mark H. Weaver, and Philip R. Zimmermann
 * Written by Mark H. Weaver
 *
 * $Id: munge.c,v 1.32 1997/11/12 23:28:53 mhw Exp $
 */

#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <ctype.h>
#include <stdlib.h>

#include "util.h"

/*
 * The file is divided into pages, and the format of each page is
 *
--f414 000b2dc79af40010002 Page 1 of munge.c

bc38e5 /*
40a838 * munge.c -- Program to convert a text file into munged form
647222 *
193f28 * Copyright (C) 1997 Pretty Good Privacy, Inc.
827222 *
699025 * Designed by Colin Plumb, Mark H. Weaver, and Philip R. Zimmermann
0d050c * Written by Mark H. Weaver
*
|
||||
* Where the first 2 columns are the high 8 bits (in hex) of a running
|
||||
* CRC-32 of the page (the string "--", unlikely to be confused with
|
||||
* any digits, indicates a page header line) and the next 4 columns
|
||||
* are a CRC-16 of the rest of the line. Then a space (not counted in
|
||||
* the CRC), and the line of text. Tabs are printed as the currency
|
||||
* symbol (ISO Latin 1 character 164) followed by the appropriate number
|
||||
* of spaces, and any form feeds are printed as a yen symbol (Latin 1 165).
|
||||
* The CRC is computed on the transformed line, including the trailing
|
||||
* newline. No trailing whitespace is permitted.
|
||||
*
|
||||
* The header line contains a (hex) number of the form 0ffcccccccctpppnnnn,
|
||||
* where the digit 0 is a version number, ff are flags, cccccccc is the CRC-32
|
||||
* of the page, t is the tab size (usually 4 or 8; 0 for binary files that
|
||||
* are sent in radix-64), ppp is the product number (usually 1, different
|
||||
* for different books), and nnnn is the file number (sequential from 1).
|
||||
*
|
||||
* This is followed by " Page %u of " and the file name.
|
||||
*/
|
||||
|
||||
typedef struct MungeState
|
||||
{
|
||||
EncodeFormat const * fmt;
|
||||
EncodeFormat const * hFmt;
|
||||
int binaryMode, tabWidth;
|
||||
long origLineNumber;
|
||||
long productNumber, fileNumber, pageNumber, lineNumber;
|
||||
unsigned long fileOffset;
|
||||
CRC pageCRC;
|
||||
char const * fileName;
|
||||
char const * fileNameTail;
|
||||
char * pageBuffer; /* Buffer large enough to hold one page */
|
||||
char * pagePos; /* Current position in pageBuffer */
|
||||
word16 hdrFlags;
|
||||
FILE * file;
|
||||
FILE * out;
|
||||
} MungeState;
|
||||
|
||||
|
||||
void ChecksumLine(EncodeFormat const *fmt, char const *line, size_t length,
|
||||
char *prefix, CRC *pageCRC)
|
||||
{
|
||||
CRC lineCRC;
|
||||
CRC runCRCPart = 0;
|
||||
|
||||
lineCRC = CalculateCRC(fmt->lineCRC, 0, (byte const *)line, length);
|
||||
if (pageCRC != NULL)
|
||||
{
|
||||
*pageCRC = CalculateCRC(fmt->pageCRC, *pageCRC,
|
||||
(byte const *)line, length);
|
||||
runCRCPart = RunningCRCFromPageCRC(fmt, *pageCRC);
|
||||
}
|
||||
|
||||
prefix += EncodeCheckDigits(fmt, runCRCPart, fmt->runningCRCBits, prefix);
|
||||
prefix += EncodeCheckDigits(fmt, lineCRC, fmt->lineCRC->bits, prefix);
|
||||
|
||||
*prefix++ = ' '; /* Write a space over the null byte */
|
||||
}
|
||||
|
||||
/* Returns 1 for convenience */
|
||||
int PrintFileError(MungeState *state, char const *message)
|
||||
{
|
||||
fprintf(stderr, "%s in %s %s %lu\n", message, state->fileName,
|
||||
state->binaryMode ? "offset" : "line",
|
||||
state->binaryMode ? state->fileOffset : state->origLineNumber);
|
||||
return 1;
|
||||
}
|
||||
|
||||
int MungeLine(MungeState *state, char *buffer, int length,
|
||||
char *line, int *bufferUsed)
|
||||
{
|
||||
int i = 0, j = 0, jOld = 0;
|
||||
char ch;
|
||||
|
||||
for (i = 0; i < length && j < LINE_LENGTH; i++)
|
||||
{
|
||||
jOld = j;
|
||||
ch = buffer[i];
|
||||
if (ch == '\t')
|
||||
{
|
||||
line[j++] = TAB_CHAR;
|
||||
if (state->tabWidth < 1)
|
||||
return PrintFileError(state,
|
||||
"ERROR: Tab found in radix64 stream");
|
||||
else
|
||||
while (j % state->tabWidth && j < LINE_LENGTH)
|
||||
line[j++] = TAB_PAD_CHAR;
|
||||
}
|
||||
else if (ch == '\n')
|
||||
{
|
||||
if (i + 1 < length)
|
||||
return PrintFileError(state,
|
||||
"UNEXPECTED ERROR: fgets read past newline!?");
|
||||
break;
|
||||
}
|
||||
else if (ch == '\f')
|
||||
{
|
||||
break;
|
||||
}
|
||||
else if (ch == ' ' && (j <= 0 || line[j-1] == ' ' ||
|
||||
line[j-1] == SPACE_CHAR ||
|
||||
i+1 >= length || buffer[i+1] == '\n'))
|
||||
{
|
||||
line[j++] = SPACE_CHAR;
|
||||
}
|
||||
else if (ch >= ' ' && ch <= '~')
|
||||
line[j++] = ch;
|
||||
else
|
||||
return PrintFileError(state, "ERROR: Non-ASCII char");
|
||||
}
|
||||
|
||||
if (i < length && buffer[i] == '\n')
|
||||
{
|
||||
i++;
|
||||
state->origLineNumber++;
|
||||
}
|
||||
else if (i < length && buffer[i] == '\f' && j < LINE_LENGTH)
|
||||
{
|
||||
i++;
|
||||
line[j++] = FORMFEED_CHAR;
|
||||
}
|
||||
else
|
||||
{
|
||||
/* If there's no newline, we need to add the continuation marker */
|
||||
if (i > 0 && j >= LINE_LENGTH)
|
||||
{
|
||||
/* Remove the last character if we're out of room */
|
||||
i--;
|
||||
j = jOld;
|
||||
}
|
||||
line[j++] = CONTIN_CHAR;
|
||||
}
|
||||
|
||||
/* Strip trailing spaces */
|
||||
while (j > 0 && isspace((unsigned char)line[j - 1]))
|
||||
j--;
|
||||
|
||||
if (j > LINE_LENGTH) /* This should never happen */
|
||||
return PrintFileError(state, "ERROR: Internal error, line too long");
|
||||
|
||||
/* Add trailing newline and NUL */
|
||||
line[j++] = '\n';
|
||||
line[j++] = '\0';
|
||||
|
||||
/* Return number of chars used from buffer */
|
||||
*bufferUsed = i;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void
|
||||
Encode3(byte const src[3], char dest[4])
|
||||
{
|
||||
dest[0] = radix64Digits[ (src[0]>>2 & 0x3f)];
|
||||
dest[1] = radix64Digits[(src[0]<<4 & 0x30) | (src[1]>>4 & 0x0f)];
|
||||
dest[2] = radix64Digits[(src[1]<<2 & 0x3c) | (src[2]>>6 & 0x03)];
|
||||
dest[3] = radix64Digits[(src[2] & 0x3f)];
|
||||
}
|
||||
|
||||
static int
|
||||
EncodeLine(byte const *src, int srcLen, char *dest)
|
||||
{
|
||||
char * destp = dest;
|
||||
byte tempSrc[3];
|
||||
|
||||
for (; srcLen >= 3; srcLen -= 3)
|
||||
{
|
||||
Encode3(src, destp);
|
||||
src += 3; destp += 4;
|
||||
}
|
||||
|
||||
if (srcLen > 0)
|
||||
{
|
||||
memset(tempSrc, 0, sizeof(tempSrc));
|
||||
memcpy(tempSrc, src, srcLen);
|
||||
Encode3(tempSrc, destp); /* Encode the zero-padded copy, not src */
|
||||
src += 3; destp += 4; srcLen -= 3;
|
||||
while (srcLen < 0)
|
||||
destp[srcLen++] = RADIX64_END_CHAR;
|
||||
}
|
||||
|
||||
return destp - dest;
|
||||
}
|
||||
|
||||
static int
|
||||
MungeBinaryLine(MungeState *state, byte const *buffer, int length, char *line)
|
||||
{
|
||||
char binLine[128];
|
||||
int binLength; /* Destination length */
|
||||
int used;
|
||||
|
||||
binLength = EncodeLine(buffer, length, binLine);
|
||||
|
||||
/* Append newline */
|
||||
binLine[binLength++] = '\n';
|
||||
binLine[binLength] = '\0';
|
||||
|
||||
return MungeLine(state, binLine, binLength, line, &used);
|
||||
}
|
||||
|
||||
int MaybePageBreak(MungeState *state)
|
||||
{
|
||||
EncodeFormat const * fmt = state->fmt;
|
||||
EncodeFormat const * hFmt = state->hFmt;
|
||||
|
||||
if (state->lineNumber >= LINES_PER_PAGE)
|
||||
{
|
||||
char line[512];
|
||||
char * lineData = line + PREFIX_LENGTH;
|
||||
char * p = lineData;
|
||||
|
||||
p += EncodeCheckDigits(hFmt, 0, HDR_VERSION_BITS, p);
|
||||
p += EncodeCheckDigits(hFmt, state->hdrFlags, HDR_FLAG_BITS, p);
|
||||
p += EncodeCheckDigits(hFmt, state->pageCRC, fmt->pageCRC->bits, p);
|
||||
p += EncodeCheckDigits(hFmt, state->tabWidth, HDR_TABWIDTH_BITS, p);
|
||||
p += EncodeCheckDigits(hFmt, state->productNumber, HDR_PRODNUM_BITS, p);
|
||||
p += EncodeCheckDigits(hFmt, state->fileNumber, HDR_FILENUM_BITS, p);
|
||||
|
||||
sprintf(p, " Page %ld of %s\n", state->pageNumber + 1,
|
||||
state->fileNameTail);
|
||||
|
||||
if (strlen(lineData) > LINE_LENGTH + 1)
|
||||
{
|
||||
PrintFileError(state, "ERROR: Header line too long");
|
||||
fprintf(stderr, "> %s", lineData);
|
||||
return -1;
|
||||
}
|
||||
|
||||
/* Compute checksums and prefix them to line */
|
||||
ChecksumLine(fmt, lineData, strlen(lineData), line, NULL);
|
||||
|
||||
fprintf(state->out, "%c%c%s\n%s\f", HDR_PREFIX_CHAR,
|
||||
fmt->headerTypeChar, line + 2, state->pageBuffer);
|
||||
|
||||
state->pageNumber++;
|
||||
state->lineNumber = 0;
|
||||
state->pageCRC = 0;
|
||||
state->pagePos = state->pageBuffer; /* Clear page buffer */
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* Search for Emacs "tab-width: " marker in file.
|
||||
* Emacs is stricter about the format, but this will do.
|
||||
*/
|
||||
int FindTabWidth(MungeState *state)
|
||||
{
|
||||
char const * const tabWidthMarker = " tab-width: ";
|
||||
char buffer[512];
|
||||
char * p;
|
||||
int length;
|
||||
int tabWidth = 0;
|
||||
|
||||
if (fseek(state->file, -(long)(sizeof(buffer) - 1), SEEK_END) != 0)
rewind(state->file); /* File shorter than the buffer: scan from the start */
|
||||
length = fread(buffer, 1, sizeof(buffer) - 1, state->file);
|
||||
buffer[length] = '\0';
|
||||
p = strstr(buffer, tabWidthMarker);
|
||||
if (p != NULL)
|
||||
{
|
||||
p += strlen(tabWidthMarker);
|
||||
while (*p != '\0' && *p != '\n' && isspace((unsigned char)*p))
|
||||
p++;
|
||||
tabWidth = strtol(p, &p, 10);
|
||||
while (*p != '\0' && *p != '\n' && isspace((unsigned char)*p))
|
||||
p++;
|
||||
if (*p != '\n' || tabWidth < 2)
|
||||
tabWidth = 0;
|
||||
else if (tabWidth > 16)
|
||||
fprintf(stderr, "WARNING: Weird tab-width (%d), %s\n",
|
||||
tabWidth, state->fileName);
|
||||
}
|
||||
return tabWidth;
|
||||
}
|
||||
|
||||
/*
|
||||
* Open the given source file and send the munged output to the
|
||||
* FILE *, with the given options.
|
||||
*/
|
||||
int MungeFile(char const *fileName, FILE *out, EncodeFormat const *fmt,
|
||||
int binaryMode, int defaultTabWidth,
|
||||
long productNumber, long fileNumber)
|
||||
{
|
||||
MungeState * state;
|
||||
int length, used;
|
||||
char line[PREFIX_LENGTH + LINE_LENGTH + 10];
|
||||
char * lineData = line + PREFIX_LENGTH;
|
||||
char buffer[128];
|
||||
int result = 0;
|
||||
|
||||
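state = (MungeState *)calloc(1, sizeof(*state));
if (state == NULL)
{
fprintf(stderr, "ERROR: Out of memory munging %s\n", fileName);
return 1; /* Message already printed */
}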
|
||||
state->fmt = fmt;
|
||||
state->hFmt = &hexFormat;
|
||||
state->origLineNumber = 1;
|
||||
state->fileName = fileName;
|
||||
state->pageCRC = 0;
|
||||
state->productNumber = productNumber;
|
||||
state->fileNumber = fileNumber;
|
||||
state->pageNumber = 0;
|
||||
state->lineNumber = 0;
|
||||
state->fileOffset = 0;
|
||||
state->binaryMode = binaryMode;
|
||||
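state->pageBuffer = malloc(PAGE_BUFFER_SIZE);
if (state->pageBuffer == NULL)
{
fprintf(stderr, "ERROR: Out of memory munging %s\n", fileName);
free(state);
return 1; /* Message already printed */
}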
|
||||
state->pageBuffer[0] = '\0';
|
||||
state->pagePos = state->pageBuffer;
|
||||
state->hdrFlags = 0;
|
||||
state->out = out;
|
||||
|
||||
state->fileNameTail = strrchr(state->fileName, '/');
|
||||
if (state->fileNameTail == NULL)
|
||||
state->fileNameTail = state->fileName;
|
||||
else
|
||||
state->fileNameTail++;
|
||||
|
||||
state->file = fopen(state->fileName, binaryMode ? "rb" : "r");
|
||||
if (state->file == NULL)
|
||||
{
|
||||
result = errno;
|
||||
fprintf(stderr, "ERROR opening %s: %s\n",
|
||||
state->fileName, strerror(result));
|
||||
goto error;
|
||||
}
|
||||
|
||||
if (state->binaryMode)
|
||||
{
|
||||
state->tabWidth = 0;
|
||||
}
|
||||
else
|
||||
{
|
||||
state->tabWidth = FindTabWidth(state);
|
||||
if (state->tabWidth == 0)
|
||||
state->tabWidth = defaultTabWidth;
|
||||
rewind(state->file);
|
||||
}
|
||||
|
||||
while (!feof(state->file))
|
||||
{
|
||||
if (state->binaryMode)
|
||||
{
|
||||
length = fread(buffer, 1, BYTES_PER_LINE, state->file);
|
||||
if (length < 1)
|
||||
{
|
||||
if (feof(state->file))
|
||||
break;
|
||||
goto fileError;
|
||||
}
|
||||
if ((result = MaybePageBreak(state)))
|
||||
goto error;
|
||||
if ((result = MungeBinaryLine(state, (byte const *)buffer, length, lineData)))
|
||||
goto error;
|
||||
state->fileOffset += length;
|
||||
}
|
||||
else
|
||||
{
|
||||
if (fgets(buffer, sizeof(buffer), state->file) == NULL)
|
||||
{
|
||||
if (feof(state->file))
|
||||
break;
|
||||
goto fileError;
|
||||
}
|
||||
length = strlen(buffer);
|
||||
if ((result = MaybePageBreak(state)))
|
||||
goto error;
|
||||
if ((result = MungeLine(state, buffer, length, lineData, &used)))
|
||||
goto error;
|
||||
|
||||
if (used < length)
|
||||
if (fseek(state->file, used - length, SEEK_CUR))
|
||||
goto fileError;
|
||||
}
|
||||
|
||||
/* Compute checksums and prefix them to the line */
|
||||
ChecksumLine(fmt, lineData, strlen(lineData), line, &state->pageCRC);
|
||||
|
||||
strcpy(state->pagePos, line);
|
||||
length = strlen(state->pagePos);
|
||||
/* Suppress trailing whitespace on blank lines */
|
||||
if (length == PREFIX_LENGTH+1 && state->pagePos[length-1] == '\n') {
|
||||
state->pagePos[--length-1] = '\n';
|
||||
state->pagePos[length] = '\0';
|
||||
}
|
||||
state->pagePos += length;
|
||||
|
||||
state->lineNumber++;
|
||||
}
|
||||
|
||||
if (state->lineNumber > 0)
|
||||
{
|
||||
/* Force a final page break */
|
||||
state->lineNumber = LINES_PER_PAGE;
|
||||
state->hdrFlags |= HDR_FLAG_LASTPAGE;
|
||||
if ((result = MaybePageBreak(state)))
|
||||
goto error;
|
||||
}
|
||||
|
||||
result = 0;
|
||||
goto done;
|
||||
|
||||
fileError:
|
||||
result = PrintFileError(state, "ERROR: Read or seek failed");
|
||||
|
||||
error:
|
||||
done:
|
||||
if (state != NULL)
|
||||
{
|
||||
if (state->file != NULL)
|
||||
fclose(state->file);
|
||||
free(state->pageBuffer);
free(state);
|
||||
}
|
||||
return result;
|
||||
}
|
||||
|
||||
int main(int argc, char *argv[])
|
||||
{
|
||||
int result = 0;
|
||||
int i, j;
|
||||
int defaultTabWidth = 4;
|
||||
int binaryMode = 0;
|
||||
long productNumber = 1;
|
||||
long fileNumber = 1;
|
||||
char * endOfNumber;
|
||||
EncodeFormat const * fmt = NULL;
|
||||
|
||||
InitUtil();
|
||||
|
||||
for (i = 1; i < argc && argv[i][0] == '-'; i++)
|
||||
{
|
||||
if (0 == strcmp(argv[i], "--"))
|
||||
{
|
||||
i++;
|
||||
break;
|
||||
}
|
||||
for (j = 1; argv[i][j] != '\0'; j++)
|
||||
{
|
||||
if (isdigit(argv[i][j]))
|
||||
{
|
||||
defaultTabWidth = argv[i][j] - '0';
|
||||
if (defaultTabWidth < 2 || defaultTabWidth > 9)
|
||||
fprintf(stderr, "WARNING: Weird default tab-width (%d)\n",
|
||||
defaultTabWidth);
|
||||
}
|
||||
else if (argv[i][j] == 'b')
|
||||
{
|
||||
binaryMode = 1;
|
||||
}
|
||||
else if (argv[i][j] == 'F')
|
||||
{
|
||||
fmt = FindFormat(argv[i][j+1]);
|
||||
if (!fmt || argv[i][j+2] != '\0')
|
||||
{
|
||||
fprintf(stderr, "ERROR: Invalid format char\n");
|
||||
exit(1);
|
||||
}
|
||||
break;
|
||||
}
|
||||
else if (argv[i][j] == 'p')
|
||||
{
|
||||
productNumber = strtol(&argv[i][j+1], &endOfNumber, 10);
|
||||
if (*endOfNumber != '\0')
|
||||
{
|
||||
fprintf(stderr, "ERROR: Invalid product number\n");
|
||||
exit(1);
|
||||
}
|
||||
break;
|
||||
}
|
||||
else if (argv[i][j] == 'f')
|
||||
{
|
||||
fileNumber = strtol(&argv[i][j+1], &endOfNumber, 10);
|
||||
if (*endOfNumber != '\0')
|
||||
{
|
||||
fprintf(stderr, "ERROR: Invalid file number\n");
|
||||
exit(1);
|
||||
}
|
||||
break;
|
||||
}
|
||||
else
|
||||
{
|
||||
fprintf(stderr, "ERROR: Unrecognized option -%c\n", argv[i][j]);
|
||||
exit(1);
|
||||
}
|
||||
}
|
||||
}
|
||||
if (!fmt)
|
||||
fmt = binaryMode ? &radix64Format : &hexFormat;
|
||||
|
||||
for (; i < argc; i++)
|
||||
{
|
||||
if ((result = MungeFile(argv[i], stdout, fmt, binaryMode,
|
||||
defaultTabWidth, productNumber,
|
||||
fileNumber)) != 0)
|
||||
{
|
||||
/* If result > 0, message should have already been printed */
|
||||
if (result < 0)
|
||||
fprintf(stderr, "ERROR: munging %s failed\n", argv[i]);
|
||||
exit(1);
|
||||
}
|
||||
fileNumber++;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* Local Variables:
|
||||
* tab-width: 4
|
||||
* End:
|
||||
* vi: ts=4 sw=4
|
||||
* vim: si
|
||||
*/
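The page header described at the top of munge.c packs six hex fields into the 19 digits `0ffcccccccctpppnnnn`. The sketch below formats and re-parses that field with plain `snprintf`/`sscanf`, assuming the hex digit set. munge.c itself builds the field through `EncodeCheckDigits` with `hexFormat` from util.c, which is not shown here. `PageHeader`, `format_header` and `parse_header` are illustrative names, and the test values reproduce the sample header `000b2dc79af40010002` from the munge.c comment.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative container for the header fields described in munge.c. */
typedef struct
{
    unsigned version;       /* 1 hex digit, currently 0 */
    unsigned flags;         /* 2 hex digits; bit 0 = last page (per sortpages) */
    unsigned long pageCRC;  /* 8 hex digits: CRC-32 of the page */
    unsigned tabWidth;      /* 1 hex digit; 0 for radix-64 binary files */
    unsigned product;       /* 3 hex digits */
    unsigned file;          /* 4 hex digits, sequential from 1 */
} PageHeader;

/* Write the 19-digit field; returns the snprintf result (19 on success). */
static int format_header(PageHeader const *h, char *buf, size_t size)
{
    assert(h->version < 0x10 && h->flags < 0x100 && h->tabWidth < 0x10);
    assert(h->product < 0x1000 && h->file < 0x10000);
    return snprintf(buf, size, "%01x%02x%08lx%01x%03x%04x",
                    h->version, h->flags, h->pageCRC & 0xffffffffUL,
                    h->tabWidth, h->product, h->file);
}

/* Parse the field back using fixed widths; returns 1 on success. */
static int parse_header(char const *s, PageHeader *h)
{
    return sscanf(s, "%1x%2x%8lx%1x%3x%4x", &h->version, &h->flags,
                  &h->pageCRC, &h->tabWidth, &h->product, &h->file) == 6;
}
```

Parsed this way, the sample decodes to version 0, flags 00, page CRC `b2dc79af`, tab width 4, product 1, and file 2.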
|
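In binary mode, `EncodeLine` turns each 3-byte group into four radix-64 digits and marks a short final group with end characters. The standalone sketch below uses the same bit layout as `Encode3`, but it assumes the standard base64 alphabet and `=` as the end character. munge's actual `radix64Digits` and `RADIX64_END_CHAR` live in util.c/util.h, which are not shown. The partial final group is encoded from a zero-padded copy, so no byte past the end of the input is read.

```c
#include <assert.h>
#include <string.h>

/* Assumption: standard base64 digits and '=' end marker (see lead-in). */
static char const digits64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
#define END_CHAR '='

/* Split 24 bits into four 6-bit digits, high bits first. */
static void encode3(unsigned char const src[3], char dest[4])
{
    dest[0] = digits64[src[0] >> 2 & 0x3f];
    dest[1] = digits64[(src[0] << 4 & 0x30) | (src[1] >> 4 & 0x0f)];
    dest[2] = digits64[(src[1] << 2 & 0x3c) | (src[2] >> 6 & 0x03)];
    dest[3] = digits64[src[2] & 0x3f];
}

/* Returns the number of chars written; dest is not NUL-terminated. */
static int encode_line(unsigned char const *src, int srcLen, char *dest)
{
    char *destp = dest;
    unsigned char tail[3];

    assert(srcLen >= 0);
    for (; srcLen >= 3; srcLen -= 3)
    {
        encode3(src, destp);
        src += 3;
        destp += 4;
    }
    if (srcLen > 0)
    {
        memset(tail, 0, sizeof tail);       /* zero-pad the partial group */
        memcpy(tail, src, (size_t)srcLen);
        encode3(tail, destp);               /* encode the padded copy */
        destp += 4;
        for (srcLen -= 3; srcLen < 0; srcLen++)
            destp[srcLen] = END_CHAR;       /* replace 1 or 2 unused digits */
    }
    return (int)(destp - dest);
}
```

Under these assumptions the output matches ordinary base64; for example, "Ma" encodes to `TWE=`.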
324
tools/psgen
Normal file
@ -0,0 +1,324 @@
|
||||
#!/usr/bin/perl
|
||||
#
|
||||
# psgen -- Postscript generator for code portion of source books
|
||||
#
|
||||
# Reads in a list of files/dirs from <filelist>, runs munge on each of
|
||||
# them, and generates a single postscript file to stdout. The page numbers
|
||||
# for each file/dir are put into the file <pagenums>.
|
||||
#
|
||||
# usage: psgen [ options... ] <filelist> <pagenums> <volume #> > foo.ps
|
||||
# -l<firstLogicalPage>
|
||||
# -p<firstPhysicalPage>
|
||||
# -f<font>
|
||||
# -D<defs> (passed to yapp)
|
||||
# -P<productNumber>
|
||||
# -o<mungedOutFile>
|
||||
# -e (auto edit errors)
|
||||
#
|
||||
# $Id: psgen,v 1.18 1997/11/13 21:44:16 colin Exp $
|
||||
#
|
||||
|
||||
$bookRoot = $ENV{"BOOKROOT"} || ".";
|
||||
$toolsDir = "$bookRoot/tools";
|
||||
$psDir = "$bookRoot/ps";
|
||||
$editor = $ENV{"EDITOR"} || "vi";
|
||||
|
||||
# Configuration settings - external file names
|
||||
$mungeProg = "$toolsDir/munge";
|
||||
$yappProg = "$toolsDir/yapp";
|
||||
$preambleFile = "$psDir/prolog.ps";
|
||||
$tempFile = "/tmp/psgen-$$";
|
||||
|
||||
# Parse arguments
|
||||
$firstLogPage = $firstPhysPage = 0;
|
||||
$productNumber = 1;
|
||||
$font = "OCRB";
|
||||
$autoEdit = 0;
|
||||
while ($#ARGV >= 0 && $ARGV[0] =~ /^-/)
|
||||
{
|
||||
$_ = shift @ARGV;
|
||||
if (/^--$/)
|
||||
{
|
||||
last;
|
||||
}
|
||||
elsif (/^-l(\d+)$/)
|
||||
{
|
||||
$firstLogPage = $1;
|
||||
}
|
||||
elsif (/^-p(\d+)$/)
|
||||
{
|
||||
$firstPhysPage = $1;
|
||||
}
|
||||
elsif (/^-f(.+)$/)
|
||||
{
|
||||
$font = $1;
|
||||
}
|
||||
elsif (/^-D(.+)$/)
|
||||
{
|
||||
$yappDefs .= " " . $_;
|
||||
}
|
||||
elsif (/^-P(\d+)$/)
|
||||
{
|
||||
$productNumber = $1;
|
||||
}
|
||||
elsif (/^-o(.+)$/)
|
||||
{
|
||||
$mungedOutFile = $1;
|
||||
}
|
||||
elsif (/^-e$/)
|
||||
{
|
||||
$autoEdit = 1;
|
||||
}
|
||||
else
|
||||
{
|
||||
&Error("Unrecognized option: '$_'");
|
||||
}
|
||||
}
|
||||
$fileListFile = shift @ARGV || die "Missing file list argument (arg 1)";
|
||||
$pageNumFile = shift @ARGV || die "Missing page number file argument (arg 2)";
|
||||
$volume = shift @ARGV || die "Missing volume number argument (arg 3)";
|
||||
|
||||
# Determine initial page numbers
|
||||
{
|
||||
my $nextLogPage = 1;
|
||||
my $nextPhysPage = 3;
|
||||
my $volNum = 0; # Which volume's page numbers we're reading
|
||||
|
||||
if ($volume > 1)
|
||||
{
|
||||
open(OLDPAGENUMS, "<$pageNumFile") || die;
|
||||
while (<OLDPAGENUMS>)
|
||||
{
|
||||
if (/^Volume\s+(\d+)$/)
|
||||
{
|
||||
$volNum = $1;
|
||||
}
|
||||
elsif (/^Next:\s+(\d+)\s*$/ && $volNum == $volume - 1)
|
||||
{
|
||||
$nextLogPage = $1;
|
||||
}
|
||||
}
|
||||
close(OLDPAGENUMS);
|
||||
}
|
||||
else
|
||||
{
|
||||
unlink($pageNumFile);
|
||||
}
|
||||
$firstLogPage = $nextLogPage if ($firstLogPage == 0);
|
||||
$firstPhysPage = $nextPhysPage if ($firstPhysPage == 0);
|
||||
}
|
||||
|
||||
# Names of PostScript operators invoked. These are the interface
|
||||
# between this file and the $preambleFile.
|
||||
$oddPageStartPS = "OddPageStart";
|
||||
$evenPageStartPS = "EvenPageStart";
|
||||
$oddPageEndPS = "OddPageEnd";
|
||||
$evenPageEndPS = "EvenPageEnd";
|
||||
$dirPagePS = "DirPage";
|
||||
# This is short because it's emitted every line
|
||||
$linePS = "L";
|
||||
|
||||
# Handle an error from munge.
|
||||
# A result of 0 means to retry, 1 means to exit
|
||||
sub MungeError
|
||||
{
|
||||
my $result = 1;
|
||||
|
||||
open(FILEH, "<$tempFile") || die;
|
||||
while (<FILEH>)
|
||||
{
|
||||
print STDERR;
|
||||
if (/ in (.*) line (\d+)$/)
|
||||
{
|
||||
my ($fileName, $lineNumber) = ($1, $2);
|
||||
|
||||
if ($autoEdit)
|
||||
{
|
||||
my @statResult = stat($fileName);
|
||||
my $oldMTime = $statResult[9];
|
||||
|
||||
system("'$editor' '+$lineNumber' '$fileName' 1>&2");
|
||||
@statResult = stat($fileName);
|
||||
$result = ($statResult[9] == $oldMTime);
|
||||
last;
|
||||
}
|
||||
}
|
||||
}
|
||||
close(FILEH);
|
||||
unlink($tempFile) || die "Couldn't unlink $tempFile";
|
||||
return $result;
|
||||
}
|
||||
|
||||
sub CopyFileToPS
|
||||
{
|
||||
local $fileName = $_[0];
|
||||
local $args = "'-I$psDir' '-Dfont=$font'";
|
||||
local $_;
|
||||
|
||||
$args .= $yappDefs;
|
||||
open(FILEH, "$yappProg $args '$fileName' |") || die;
|
||||
while (<FILEH>)
|
||||
{
|
||||
print PSOUT $_;
|
||||
}
|
||||
close(FILEH) || exit(1);
|
||||
1;
|
||||
}
|
||||
|
||||
# Wrap a string in parens as required by PostScript, with proper quoting.
|
||||
sub StringPS
|
||||
{
|
||||
local $str = $_[0];
|
||||
|
||||
$str =~ s/([\\()])/\\$1/g;
|
||||
"(" . $str . ")";
|
||||
}
|
||||
|
||||
# Emit a start of page. The Postscript DSC %%Page: header
|
||||
# (followed by logical page number, then physical) and
|
||||
# the top-of-page function (which is passed the page number as a string)
|
||||
sub PageStartPS
|
||||
{
|
||||
local $pageNum = $_[0];
|
||||
|
||||
"%%Page: " . ($pageNum + $firstLogPage) . " " .
|
||||
($pageNum + $firstPhysPage) . "\n" .
|
||||
&StringPS($pageNum + $firstLogPage) .
|
||||
((($pageNum + $firstLogPage) % 2) ? $oddPageStartPS
|
||||
: $evenPageStartPS) . "\n";
|
||||
}
|
||||
|
||||
sub PageEndPS
|
||||
{
|
||||
local $pageNum = $_[0];
|
||||
|
||||
((($pageNum + $firstLogPage) % 2) ? $oddPageEndPS : $evenPageEndPS) . "\n";
|
||||
}
|
||||
|
||||
# Save the page number to a table-of-contents file
|
||||
sub SavePageNum
|
||||
{
|
||||
local ($fileName, $pageNum) = @_;
|
||||
|
||||
print PAGENUMS ($pageNum + $firstLogPage), ": $fileName\n";
|
||||
}
|
||||
|
||||
# The main code.
|
||||
|
||||
open(PSOUT, ">-") || die;
|
||||
open(FILELIST, "<$fileListFile") || die;
|
||||
open(PAGENUMS, ">>$pageNumFile") || die;
|
||||
if ($mungedOutFile ne "")
|
||||
{
|
||||
open(MUNGEDOUT, ">$mungedOutFile") || die;
|
||||
}
|
||||
|
||||
print PAGENUMS "Volume $volume\n";
|
||||
|
||||
&CopyFileToPS($preambleFile);
|
||||
|
||||
$fileNumber = 0;
|
||||
$pageNum = 0; # This is 0-based, since it is added to $first{Log,Phys}Page
|
||||
$enable = 0;
|
||||
|
||||
while (<FILELIST>)
|
||||
{
|
||||
/^([VDTB])(\S*)\s+(.*)/ || die "Illegal file list line $.";
|
||||
|
||||
local ($fileType, $options, $arg) = ($1, $2, $3);
|
||||
|
||||
if ($fileType eq "V")
|
||||
{
|
||||
@args = split(/\s+/, $arg);
|
||||
if ($enable = ($args[0] == $volume))
|
||||
{
|
||||
$defaultTabWidth = int($args[1]);
|
||||
}
|
||||
}
|
||||
elsif ($fileType eq "D")
|
||||
{
|
||||
next unless $enable; # Do nothing if we're in the wrong volume
|
||||
$dirName = $arg;
|
||||
&SavePageNum($dirName, $pageNum);
|
||||
print PSOUT &PageStartPS($pageNum);
|
||||
print PSOUT &StringPS($dirName), $dirPagePS, "\n";
|
||||
print PSOUT &PageEndPS($pageNum);
|
||||
$pageNum++;
|
||||
}
|
||||
else
|
||||
{
|
||||
my $done = 0;
|
||||
|
||||
$fileNumber++;
|
||||
$fileName = $arg;
|
||||
next unless $enable; # Do nothing if we're in the wrong volume
|
||||
&SavePageNum($fileName, $pageNum);
|
||||
$quotedFileName = $fileName;
|
||||
$quotedFileName =~ s/'/'\\''/g;
|
||||
$tabWidth = ($options =~ /(\d)/) ? $1 : $defaultTabWidth;
|
||||
$args = ($fileType eq "B") ? "-b" : "";
|
||||
$args .= " -$tabWidth -p$productNumber -f$fileNumber";
|
||||
while (!$done)
|
||||
{
|
||||
if (open(FILE, "$mungeProg $args '$quotedFileName' 2>$tempFile |"))
|
||||
{
|
||||
$line = <FILE>;
|
||||
print MUNGEDOUT $line;
|
||||
|
||||
while ($line ne "")
|
||||
{
|
||||
print PSOUT &PageStartPS($pageNum);
|
||||
|
||||
while ($line ne "" and $line !~ /^\f/)
|
||||
{
|
||||
chop $line;
|
||||
print PSOUT &StringPS($line), $linePS, "\n";
|
||||
$line = <FILE>;
|
||||
print MUNGEDOUT $line;
|
||||
}
|
||||
$line =~ s/^\f//;
|
||||
|
||||
print PSOUT &PageEndPS($pageNum);
|
||||
$pageNum++;
|
||||
}
|
||||
|
||||
if (close(FILE))
|
||||
{
|
||||
$done = 2;
|
||||
}
|
||||
else
|
||||
{
|
||||
$done = &MungeError();
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
$done = &MungeError();
|
||||
}
|
||||
}
|
||||
if ($done == 1)
|
||||
{
|
||||
die;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# Print PostScript DSC trailer with the correct number of pages
|
||||
print PSOUT "%%Trailer\n%%Pages: ", $pageNum, "\n%%EOF\n";
|
||||
|
||||
print PAGENUMS "Pages: ", $pageNum, "\n";
|
||||
print PAGENUMS "Next: ", ((($pageNum+1) & ~1) + $firstLogPage), "\n";
|
||||
|
||||
close(PAGENUMS) || die;
|
||||
close(FILELIST) || die;
|
||||
close(PSOUT) || die;
|
||||
|
||||
if ($mungedOutFile ne "")
|
||||
{
|
||||
close(MUNGEDOUT) || die;
|
||||
}
|
||||
|
||||
#
|
||||
# vi: ai ts=4
|
||||
# vim: si
|
||||
#
|
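Two pieces of psgen's bookkeeping are easy to get subtly wrong. The first is quoting text for a PostScript string: `StringPS` escapes the backslash and both parentheses. The second is the `Next:` page written to the page-number file, which rounds the page count up to an even number so that, when the first logical page is odd, the next volume also starts on an odd page. The C sketch below mirrors both. `ps_string` and `next_first_page` are illustrative names, and unlike the Perl version, `ps_string` reports an output buffer that is too small.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Wrap `in` in parentheses, backslash-escaping '\\', '(' and ')' as
   psgen's StringPS does. Returns 0 on success, -1 if `out` is too small. */
static int ps_string(char const *in, char *out, size_t size)
{
    size_t n = 0;

    if (size < 3)                       /* room for "()" plus the NUL */
        return -1;
    out[n++] = '(';
    for (; *in != '\0'; in++)
    {
        int needsEscape = (*in == '\\' || *in == '(' || *in == ')');

        /* Keep room for this char (and its escape), ')' and the NUL */
        if (n + (needsEscape ? 2 : 1) + 2 > size)
            return -1;
        if (needsEscape)
            out[n++] = '\\';
        out[n++] = *in;
    }
    out[n++] = ')';
    out[n] = '\0';
    return 0;
}

/* psgen's "Next:" value: page count rounded up to even, plus first page. */
static long next_first_page(long pagesUsed, long firstLogPage)
{
    assert(pagesUsed >= 0);
    return ((pagesUsed + 1) & ~1L) + firstLogPage;
}
```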
1851
tools/repair.c
Normal file
File diff suppressed because it is too large
185
tools/sortpages
Normal file
@ -0,0 +1,185 @@
|
||||
#!/usr/bin/perl
|
||||
#
|
||||
# $Id: sortpages,v 1.8 1997/12/11 19:20:58 mhw Exp $
|
||||
#
|
||||
|
||||
@fileNameFromNumber = ();
|
||||
@pagesFound = ();
|
||||
$theProductNumber = 0;
|
||||
|
||||
for $fileIndex (0..$#ARGV)
|
||||
{
|
||||
$fileName = $ARGV[$fileIndex];
|
||||
open(FILE, "<$fileName") || die;
|
||||
while (!eof(FILE))
|
||||
{
|
||||
$filePos = tell(FILE);
|
||||
$_ = <FILE>;
|
||||
if (/^\f?-\S/)
|
||||
{
|
||||
my ($versionHex, $flagsHex, $pageCRCHex, $tabWidthHex,
|
||||
$productNumberHex, $fileNumberHex, $pageNumber, $name)
|
||||
= (/^\f?-\S\S{4}\ # CRC followed by a space
|
||||
([0-9a-f]) # Format version
|
||||
([0-9a-f]{2}) # Flags
|
||||
([0-9a-f]{8}) # Page CRC-32
|
||||
([0-9a-f]) # Tab width (0 means radix64)
|
||||
([0-9a-f]{3}) # Product number
|
||||
([0-9a-f]{4}) # File number
|
||||
\ Page\ (\d+)\ of\ (.*)/x);
|
||||
my $version = hex($versionHex);
|
||||
my $flags = hex($flagsHex);
|
||||
my $productNumber = hex($productNumberHex);
|
||||
my $fileNumber = hex($fileNumberHex);
|
||||
|
||||
unless ($version == 0 && $productNumber > 0
|
||||
&& $fileNumber > 0 && $pageNumber > 0
|
||||
&& $name ne "")
|
||||
{
|
||||
print STDERR "ERROR: Invalid header info ",
|
||||
"at $fileName line $.\n";
|
||||
exit(1);
|
||||
}
|
||||
|
||||
if (!defined($fileNameFromNumber[$fileNumber]))
|
||||
{
|
||||
$fileNameFromNumber[$fileNumber] = $name;
|
||||
}
|
||||
elsif ($fileNameFromNumber[$fileNumber] ne $name)
|
||||
{
|
||||
print STDERR "ERROR: Mismatched filename ",
|
||||
"at $fileName line $.\n";
|
||||
exit(1);
|
||||
}
|
||||
|
||||
if (!$theProductNumber)
|
||||
{
|
||||
$theProductNumber = $productNumber;
|
||||
}
|
||||
elsif ($theProductNumber != $productNumber)
|
||||
{
|
||||
print STDERR "ERROR: Different product number ",
|
||||
"at $fileName line $.\n";
|
||||
exit(1);
|
||||
}
|
||||
|
||||
push @pagesFound, (sprintf "%5d:%4d:%d:%d:%d",
|
||||
$fileNumber, $pageNumber, $flags, $fileIndex, $filePos);
|
||||
}
|
||||
}
|
||||
close(FILE) || die;
|
||||
}
|
||||
|
||||
@pagesFound = sort @pagesFound;
|
||||
|
||||
$result = 0;
|
||||
$lastFileNumber = 0;
|
||||
$lastPageNumber = 0;
|
||||
$nextFileNumber = 1;
|
||||
$nextPageNumber = 1;
|
||||
$fileIndexOpen = -1;
|
||||
foreach (@pagesFound)
|
||||
{
|
||||
my ($fileNumber, $pageNumber, $flags, $fileIndex, $filePos) = split /:/;
|
||||
|
||||
$fileNumber = int($fileNumber);
|
||||
$pageNumber = int($pageNumber);
|
||||
|
||||
if ($fileNumber == $lastFileNumber && $pageNumber == $lastPageNumber)
|
||||
{
|
||||
print STDERR "DUPLICATE: File $fileNumber, page $pageNumber, skipped\n";
|
||||
next;
|
||||
}
|
||||
|
||||
if ($nextFileNumber < $fileNumber && $nextPageNumber != 1)
|
||||
{
|
||||
print STDERR "MISSING: File $nextFileNumber, ",
|
||||
"pages $nextPageNumber - END\n";
|
||||
$nextPageNumber = 1;
|
||||
$nextFileNumber++;
|
||||
$result = 1;
|
||||
}
|
||||
if ($nextFileNumber < $fileNumber)
|
||||
{
|
||||
print STDERR "MISSING: Files $nextFileNumber - ",
|
||||
$fileNumber-1, "\n";
|
||||
$nextFileNumber = $fileNumber;
|
||||
$nextPageNumber = 1;
|
||||
$result = 1;
|
||||
}
|
||||
if ($nextFileNumber != $fileNumber)
|
||||
{
|
||||
print STDERR "ERROR: Internal error, unexpected fileNumber\n";
|
||||
exit(1);
|
||||
}
|
||||
|
||||
if ($nextPageNumber < $pageNumber)
|
||||
{
|
||||
print STDERR "MISSING: File $fileNumber, pages $nextPageNumber - ",
|
||||
$pageNumber-1, "\n";
|
||||
$nextPageNumber = $pageNumber;
|
||||
$result = 1;
|
||||
}
|
||||
if ($nextPageNumber != $pageNumber)
|
||||
{
|
||||
print STDERR "ERROR: Internal error, unexpected pageNumber\n";
|
||||
exit(1);
|
||||
}
|
||||
|
||||
if ($fileIndexOpen != $fileIndex)
|
||||
{
|
||||
if ($fileIndexOpen >= 0)
|
||||
{
|
||||
close(FILE) || die;
|
||||
$fileIndexOpen = -1;
|
||||
}
|
||||
$fileName = $ARGV[$fileIndex];
|
||||
open(FILE, "<$fileName") || die;
|
||||
$fileIndexOpen = $fileIndex;
|
||||
}
|
||||
seek(FILE, $filePos, 0) || die($!);
|
||||
|
||||
$_ = <FILE>;
|
||||
print;
|
||||
while (<FILE>)
|
||||
{
|
||||
last if /^\f?-\S/;
|
||||
print;
|
||||
}
|
||||
$lastFileNumber = $fileNumber;
|
||||
$lastPageNumber = $pageNumber;
|
||||
|
||||
if ($flags & 1) # Bit 0 of flags indicates last page of file
|
||||
{
|
||||
$nextFileNumber++;
|
||||
$nextPageNumber = 1;
|
||||
}
|
||||
else
|
||||
{
|
||||
$nextPageNumber++;
|
||||
}
|
||||
}
|
||||
|
||||
if ($nextPageNumber != 1)
|
||||
{
|
||||
print STDERR "MISSING: File $nextFileNumber, ",
|
||||
"pages $nextPageNumber - END\n";
|
||||
$nextPageNumber = 1;
|
||||
$nextFileNumber++;
|
||||
$result = 1;
|
||||
}
|
||||
|
||||
print STDERR "Highest file number encountered: ", $nextFileNumber - 1, "\n";
|
||||
|
||||
if ($fileIndexOpen >= 0)
|
||||
{
|
||||
close(FILE) || die;
|
||||
$fileIndexOpen = -1;
|
||||
}
|
||||
|
||||
exit($result);
|
||||
|
||||
#
|
||||
# vi: ai ts=4
|
||||
# vim: si
|
||||
#
|
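sortpages orders pages with a plain string sort on keys built with `sprintf "%5d:%4d:..."`. This works because the numbers are right-aligned in fixed-width, space-padded fields, and a space (0x20) sorts below every ASCII digit. String order therefore equals numeric order, provided each number is non-negative and fits its field width. The C sketch below demonstrates the property with `qsort` and `strcmp`. `make_key` and `sort_keys` are hypothetical helpers and are not part of the tools.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build a "%5d:%4d" key like sortpages; the widths must not overflow,
   or string order would no longer match numeric order. */
static void make_key(char *out, size_t size, int fileNumber, int pageNumber)
{
    assert(fileNumber >= 0 && fileNumber <= 99999);
    assert(pageNumber >= 0 && pageNumber <= 9999);
    snprintf(out, size, "%5d:%4d", fileNumber, pageNumber);
}

/* qsort comparator over an array of char pointers. */
static int compare_keys(void const *a, void const *b)
{
    return strcmp(*(char * const *)a, *(char * const *)b);
}

static void sort_keys(char **keys, size_t count)
{
    qsort(keys, count, sizeof keys[0], compare_keys);
}
```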
222
tools/subst.c
Normal file
@ -0,0 +1,222 @@
|
||||
/*
|
||||
* subst.c -- Repair substitution tables
|
||||
*
|
||||
* Copyright (C) 1997 Pretty Good Privacy, Inc.
|
||||
*
|
||||
* Written by Colin Plumb
|
||||
*
|
||||
* $Id: subst.c,v 1.14 1997/11/03 22:12:00 colin Exp $
|
||||
*
|
||||
* IT IS EXPECTED that users of this program will play with these tables
|
||||
* and the cost values in the subst.h header. (Some day, they'll all
|
||||
* get moved to an external config file.)
|
||||
*
|
||||
* NOTE: Other costs are hidden in the Filter functions in repair.c.
|
||||
* Remember to keep them all on the same scale.
|
||||
*/
|
||||
|
||||
/*
|
||||
* The repair program copies its input to its output, making various
|
||||
* substitutions, until it manages to produce a version that satisfies
|
||||
* the parser. This includes having a correct CRC for each line.
|
||||
* Each substitution has a cost, and the combinations are tried in order
|
||||
* of increasing cost. NOTE that even translating "A"->"A" counts as
|
||||
* a substitution, although it may have zero cost.
|
||||
*
|
||||
* The intention is to correct transcription errors, where the
|
||||
* errors have a distinctly non-uniform distribution. Slight
|
||||
* differences in cost produce a preference in trying some errors
|
||||
* first. If an error costs half as much as another, combinations
|
||||
* of two of that error are tried at the same cost as one of the more expensive.
|
||||
* Too many cheap substitutions will result in repair spending
|
||||
* a very long time searching before considering the more expensive
|
||||
* substitutions.
|
||||
*
|
||||
* The following parameters and the raw substitution tables are expected
|
||||
* to be edited by the user based on experience. Eventually, this
|
||||
* will be moved into an external config file, but for now it's a matter
|
||||
* of recompiling.
|
||||
*/
|
||||
|
||||
#include "subst.h"
|
||||
#include "util.h"
|
||||
|
||||
/* What the OCR software reports for an "unrecognizable" character */
|
||||
#define UNRECOG_STRING "~\274"
|
||||
|
||||
/*
|
||||
* The input substitutions to make (one-to-one). These are listed in
|
||||
* the order of correction. i.e. uncorrected input first, then corrected
|
||||
* output. Substitutions are one-way; to get two-way, list it twice.
|
||||
*/
|
||||
|
||||
struct RawSubst const substSingles[] = {
|
||||
/* Identity substitutions - note that period (.) is excluded */
|
||||
{ "!\"#$%&'()*+,-./0123456789:;<=>?" SPACE_STRING,
|
||||
"!\"#$%&'()*+,-./0123456789:;<=>?" SPACE_STRING, 0, 0, NULL },
|
||||
{ "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_\t" TAB_STRING,
|
||||
"@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_\t" TAB_STRING, 0, 0, NULL },
|
||||
{ "`abcdefghijklmnopqrstuvwxyz{|}~\f" FORMFEED_STRING,
|
||||
"`abcdefghijklmnopqrstuvwxyz{|}~\f" FORMFEED_STRING, 0, 0, NULL },
|
||||
#if (TAB_PAD_CHAR & 128) /* Not already included? */
|
||||
{ TAB_PAD_STRING, TAB_PAD_STRING, 0, 0, NULL },
|
||||
#endif
|
||||
{ "\r\n" CONTIN_STRING, "\n\n" CONTIN_STRING, 0, 0, NULL },
|
||||
|
||||
/* Occasionally these just get inserted as glitches */
|
||||
{ ".,'`", NULL, 5, 10, FilterNearBlanks },
|
||||
/* This is now pretty infrequent */
|
||||
{ "-_", "_-", 0, 10, FilterAfterRepeat },
|
||||
|
||||
/*
|
||||
* Capitalization errors are common in some cases
|
||||
* c/C, s/S, u/U are fucked up all the time.
|
||||
* Also o/O, v/V and w/W. x, y and z also give some problems.
|
||||
*/
|
||||
{ "cilmopsuvwxyz", "CILMOPSUVWXYZ", 7, 13, FilterNearLower },
|
||||
{ "CILMOPSUVWXYZ", "cilmopsuvwxyz", 7, 13, FilterNearUpper },
|
||||
/* Other errors */
|
||||
{ "g9aaiji;xX00Si", "9gg2ji;i%%oO3f", 10, 0, NULL },
|
||||
/* This seems to happen a lot */
|
||||
{ "c", "r", 9, 0, NULL },
|
||||
|
||||
{ "j", ";", 9, 0, NULL },
|
||||
{ "' ", "``", 10, 0, NULL },
|
||||
|
||||
/* Uncommon errors */
|
||||
|
||||
/* Weird stuff that's happened in the checksum part */
|
||||
/* A highish weight is okay here */
|
||||
{ "sSEdJl", "554437", 15, 0, NULL },
|
||||
{ "LESsPZ", "bb8a22", 15, 0, NULL },
|
||||
|
||||
/* Weird stuff that has happened */
|
||||
{ "BasAeaeRoooo", "3334a@QQpqbd", 5, 15, FilterIsBinary },
|
||||
{ "oooo", "pqbd", 0, 15, FilterIsBinary },
|
||||
{ "ttTCCflO", "iff{[lfG", 12, 0, NULL },
|
||||
#if 0
|
||||
/* If the line-breaks get screwed up, use these */
|
||||
{ " ", "\n", 10, COST_INFINITY, FilterChecksumFollows },
|
||||
{ "\n", " ", COST_INFINITY, 10, FilterChecksumFollows },
|
||||
{ "\n", NULL, COST_INFINITY , 11, FilterChecksumFollows },
|
||||
#endif
|
||||
|
||||
{ NULL, NULL, 0, 0, NULL }
|
||||
};
|
||||
|
||||
/* The many-to-many substitutions */
|
||||
struct RawSubst const substMultiples[] = {
|
||||
{ "''", "\"", 2, 0, NULL },
|
||||
{ "``", "\"", 2, 0, NULL },
|
||||
{ ",'", "\"", 2, 0, NULL },
|
||||
{ "',", "\"", 2, 0, NULL },
|
||||
{ ",,", "\"", 2, 0, NULL },
|
||||
/* Extra inserted spaces are common */
|
||||
{ " ", " ", COST_INFINITY, 0, FilterFollowsSpace },
|
||||
{ " ", "", 0, 15, FilterFollowsSpace },
|
||||
{ "\t", " ", COST_INFINITY, 0, FilterFollowsSpace },
|
||||
{ "\t", "", 0, 10, FilterFollowsSpace },
|
||||
/* Convert between SPACE_CHAR dots and periods */
|
||||
{ ".", SPACE_STRING, 1, COST_INFINITY, FilterFollowsSpace },
|
||||
{ ".", " "SPACE_STRING, COST_INFINITY, 10, FilterFollowsSpace },
|
||||
{ SPACE_STRING, ".", 15, 5, FilterFollowsSpace },
|
||||
{ SPACE_STRING, " "SPACE_STRING, COST_INFINITY, 5, FilterFollowsSpace },
|
||||
|
||||
/* Replace "unknown" by zero - it often is */
|
||||
{ UNRECOG_STRING, "0", 1, 0, NULL },
|
||||
{ UNRECOG_STRING, "_", 2, 0, NULL },
|
||||
{ UNRECOG_STRING, ")", 3, 0, NULL },
|
||||
{ UNRECOG_STRING, "^", 4, 0, NULL },
|
||||
/* Except that these glitches are common */
|
||||
{ UNRECOG_STRING"'", "\\\"", 0, 0, NULL },
|
||||
{ UNRECOG_STRING"'", "\"", 1, 0, NULL },
|
||||
{ "'"UNRECOG_STRING, "\"", 0, 0, NULL },
|
||||
{ UNRECOG_STRING UNRECOG_STRING , "\"", 0, 0, NULL },
|
||||
/* Something else that has been seen */
|
||||
{ "V'", "\\\"", 5, 0, NULL },
|
||||
|
||||
/* A common transposition */
|
||||
{ "\"'", "'\"", 5, 0, NULL },
|
||||
{ "'\"", "\"'", 5, 0, NULL },
|
||||
/* These also happen fairly often */
|
||||
{ " \"", "''", 5, 0, NULL },
|
||||
{ "\" ", "''", 5, 0, NULL },
|
||||
|
||||
/* Common glitches */
|
||||
{ "\t.\n", "\n", 5, 0, NULL },
|
||||
{ "\t,\n", "\n", 5, 0, NULL },
|
||||
{ "\t-\n", "\n", 5, 0, NULL },
|
||||
{ "\t_\n", "\n", 5, 0, NULL },
|
||||
{ "\t'\n", "\n", 5, 0, NULL },
|
||||
{ "\t`\n", "\n", 5, 0, NULL },
|
||||
{ "\t~\n", "\n", 5, 0, NULL },
|
||||
{ "\t:\n", "\n", 5, 0, NULL },
|
||||
{ "\t"SPACE_STRING"\n", "\n", 5, 0, NULL },
|
||||
|
||||
/* Less common */
|
||||
{ " .\n", "\n", 10, 0, NULL },
|
||||
{ " ,\n", "\n", 10, 0, NULL },
|
||||
{ " -\n", "\n", 10, 0, NULL },
|
||||
{ " _\n", "\n", 10, 0, NULL },
|
||||
{ " '\n", "\n", 10, 0, NULL },
|
||||
{ " `\n", "\n", 10, 0, NULL },
|
||||
{ " ~\n", "\n", 10, 0, NULL },
|
||||
{ " :\n", "\n", 10, 0, NULL },
|
||||
{ " "SPACE_STRING"\n", "\n", 10, 0, NULL },
|
||||
|
||||
/* Even less common */
|
||||
{ ".\n", "\n", 15, 0, NULL },
|
||||
{ ",\n", "\n", 15, 0, NULL },
|
||||
{ "-\n", "\n", 15, 0, NULL },
|
||||
{ "_\n", "\n", 15, 0, NULL },
|
||||
{ "'\n", "\n", 15, 0, NULL },
|
||||
{ "`\n", "\n", 15, 0, NULL },
|
||||
{ "~\n", "\n", 15, 0, NULL },
|
||||
{ ":\n", "\n", 15, 0, NULL },
|
||||
{ SPACE_STRING"\n", "\n", 15, 0, NULL },
|
||||
|
||||
/* Weird stuff that has happened */
|
||||
{ "lJ", "U", 10, 0, NULL },
|
||||
{ "ll", "U", 10, 0, NULL },
|
||||
{ "l1", "U", 10, 0, NULL },
|
||||
{ "il", "U", 10, 0, NULL }, /* Fairly common, actually */
|
||||
{ "li", "U", 10, 0, NULL },
|
||||
{ "l)", "U", 10, 0, NULL },
|
||||
{ "Ll", "U", 10, 0, NULL },
|
||||
{ "LI", "U", 10, 0, NULL },
|
||||
{ "L1", "U", 10, 0, NULL },
|
||||
|
||||
{ "lo", "b", 10, 0, NULL },
|
||||
{ "cl", "d", 10, 0, NULL },
|
||||
{ "cliff", "diff", 2, 0, NULL },
|
||||
{ "*\n", "*/\n", 10, 0, NULL },
|
||||
|
||||
/* That big black block has odd things happen to it */
|
||||
{ "d", CONTIN_STRING, 10, 0, NULL },
|
||||
{ "d\n", CONTIN_STRING"\n", 3, 0, NULL },
|
||||
{ "S", CONTIN_STRING, 10, 0, NULL },
|
||||
{ "S\n", CONTIN_STRING"\n", 3, 0, NULL },
|
||||
|
||||
/* Tab-stop wonders */
|
||||
{ TAB_STRING, TAB_STRING"", 0, 0, TabFilter },
|
||||
{ TAB_STRING, TAB_STRING" ", 0, 0, TabFilter },
|
||||
{ TAB_STRING, TAB_STRING" ", 0, 0, TabFilter },
|
||||
{ TAB_STRING, TAB_STRING" ", 0, 0, TabFilter },
|
||||
{ TAB_STRING, TAB_STRING" ", 0, 0, TabFilter },
|
||||
{ TAB_STRING, TAB_STRING" ", 0, 0, TabFilter },
|
||||
{ TAB_STRING, TAB_STRING" ", 0, 0, TabFilter },
|
||||
{ TAB_STRING, TAB_STRING" ", 0, 0, TabFilter },
|
||||
/* Some scan errors */
|
||||
{ "D ", TAB_STRING"", 1, 5, TabFilter },
|
||||
{ "D ", TAB_STRING" ", 1, 5, TabFilter },
|
||||
{ "D ", TAB_STRING" ", 1, 5, TabFilter },
|
||||
{ "D ", TAB_STRING" ", 1, 5, TabFilter },
|
||||
{ "D ", TAB_STRING" ", 1, 5, TabFilter },
|
||||
{ "D ", TAB_STRING" ", 1, 5, TabFilter },
|
||||
{ "D ", TAB_STRING" ", 1, 5, TabFilter },
|
||||
{ "D ", TAB_STRING" ", 1, 5, TabFilter },
|
||||
#if TAB_PAD_CHAR != ' '
|
||||
#error Fix those tab patterns!
|
||||
#endif
|
||||
{ NULL, NULL, 0, 0, NULL }
|
||||
};
|
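The one-to-one table substSingles pairs its input and output strings position by position: in { "g9aaiji;xX00Si", "9gg2ji;i%%oO3f", 10, 0, NULL }, 'g' may be read as '9', '9' as 'g', 'a' as either 'g' or '2', and so on, each at cost 10. A NULL output, as in the glitch entry for ".,'`", appears to mean that the input character is simply deleted. The following is a minimal sketch of that expansion under those assumptions, not the repair program's own table loader: SketchSubst, CharPair and ExpandSingles are hypothetical names, the cost2 and filter fields are left out, and the output string is assumed to be at least as long as the input.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cut-down RawSubst: cost2 and the filter are omitted */
typedef struct {
    char const *input;
    char const *output;   /* NULL read here as "delete the input char" */
    int         cost;
} SketchSubst;

typedef struct {
    char from;
    int  to;              /* replacement character, or -1 for deletion */
    int  cost;
} CharPair;

/*
 * Expands one one-to-one table entry into per-character pairs, pairing
 * input[i] with output[i].  Assumes output is at least as long as input.
 * Returns the number of pairs written (never more than max).
 */
static size_t ExpandSingles(SketchSubst const *s, CharPair *pairs, size_t max)
{
    size_t i, n = strlen(s->input);

    for (i = 0; i < n && i < max; i++) {
        pairs[i].from = s->input[i];
        pairs[i].to = s->output ? (unsigned char)s->output[i] : -1;
        pairs[i].cost = s->cost;
    }
    return i;
}
```

Because the pairing is positional, one entry can list the same input character several times ('a' twice above) to offer several candidate corrections at one cost.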
66
tools/subst.h
Normal file
@ -0,0 +1,66 @@
|
||||
/*
|
||||
* subst.h -- Header for repair substitutions
|
||||
*
|
||||
* Copyright (C) 1997 Pretty Good Privacy, Inc.
|
||||
*
|
||||
* Written by Colin Plumb
|
||||
*
|
||||
* $Id: subst.h,v 1.9 1997/11/03 22:12:00 colin Exp $
|
||||
*/
|
||||
|
||||
/*
|
||||
* Give up if the list of pending changes to attempt grows to this many
|
||||
* elements. Each element is 32 bytes, so 128K elements take 4 MB of memory.
|
||||
* (Other than this, repair's memory usage is fairly modest.)
|
||||
*/
|
||||
#define MAX_HEAP (1<<17)
|
||||
|
||||
/*
|
||||
* There is a hack in the code to find a single substitution that will fix a
|
||||
* line, even if it's not in the tables. It gets added to the tables "on
|
||||
* probation", with an infinite cost, and if it leads to a successful
|
||||
* correction of the entire page, is "learned" for future use and its
|
||||
* cost reduced to something finite.
|
||||
* (This is not remembered across runs of the program, though.
|
||||
* Edit the tables in the source to fix it.)
|
||||
*/
|
||||
#define DYNAMIC_COST_LEARNED 15
|
||||
|
||||
/*
|
||||
* This negative-cost bonus for passing the end of a line with the right
|
||||
* CRC makes the search engine reluctant to backtrack past a correct CRC,
|
||||
* greatly improving efficiency. It's rather a hack, though. Think of
|
||||
* this in terms of "how many errors should be considered in the current
|
||||
* line before considering the possibility of errors in the previous line?"
|
||||
*
|
||||
* This bonus is halved for lines that are the result of a correction
|
||||
* that was computed from the checksum, since a correct checksum is
|
||||
* much less significant in such a case.
|
||||
*/
|
||||
#define COST_LINE -30
|
||||
|
||||
/* The cost of a full-line nastyline substitution. */
|
||||
#define NASTY_COST 5
|
||||
|
||||
/* Type describing filter functions used in substitutions */
|
||||
struct ParseNode;
|
||||
struct Substitution;
|
||||
#include "heap.h"
|
||||
typedef HeapCost FilterFunc(struct ParseNode *parent, char const *limit,
|
||||
struct Substitution const *subst);
|
||||
FilterFunc TabFilter, FilterFollowsSpace, FilterNearBlanks;
|
||||
FilterFunc FilterNearUpper, FilterNearLower, FilterNearXDigit;
|
||||
FilterFunc FilterAfterRepeat, FilterCharConst, FilterChecksumFollows;
|
||||
FilterFunc FilterLikelyUnderscore, FilterIsDynamic, FilterIsBinary;
|
||||
|
||||
/* The external substitution format */
|
||||
typedef struct RawSubst {
|
||||
char const *input;
|
||||
char const *output;
|
||||
HeapCost cost, cost2;
|
||||
FilterFunc *filter;
|
||||
} RawSubst;
|
||||
|
||||
/* The substitutions to make */
|
||||
extern struct RawSubst const substSingles[];
|
||||
extern struct RawSubst const substMultiples[];
|
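MAX_HEAP bounds the queue of pending changes that repair explores cheapest-first, and COST_LINE is a negative cost credited for passing a line with the right CRC, which keeps candidates that got past a good line near the front of that queue. heap.h is not part of this excerpt, so the following is only a sketch of a bounded binary min-heap with the same give-up behaviour: CostHeap, HeapPush, HeapPop and SKETCH_MAX_HEAP are hypothetical names, and each entry is reduced to a bare cost, whereas the real heap presumably stores a parse node alongside it.

```c
#include <stddef.h>

/* Hypothetical stand-in for MAX_HEAP (1<<17), kept small for testing */
#define SKETCH_MAX_HEAP 8

typedef long SketchCost;   /* may be negative, as COST_LINE is */

typedef struct {
    SketchCost cost[SKETCH_MAX_HEAP];
    size_t     n;          /* number of pending entries */
} CostHeap;

/* Adds a pending change; returns -1 ("give up") when the heap is full. */
static int HeapPush(CostHeap *h, SketchCost c)
{
    size_t i, parent;

    if (h->n == SKETCH_MAX_HEAP)
        return -1;
    /* Sift the new entry up past any costlier parents */
    for (i = h->n++; i > 0; i = parent) {
        parent = (i - 1) / 2;
        if (h->cost[parent] <= c)
            break;
        h->cost[i] = h->cost[parent];
    }
    h->cost[i] = c;
    return 0;
}

/* Removes and returns the cheapest entry; caller must ensure n > 0. */
static SketchCost HeapPop(CostHeap *h)
{
    SketchCost top = h->cost[0];
    SketchCost last = h->cost[--h->n];
    size_t i = 0, child;

    /* Sift the former last entry down from the root */
    while ((child = 2 * i + 1) < h->n) {
        if (child + 1 < h->n && h->cost[child + 1] < h->cost[child])
            child++;
        if (last <= h->cost[child])
            break;
        h->cost[i] = h->cost[child];
        i = child;
    }
    h->cost[i] = last;
    return top;
}
```

A binary heap keeps both operations at O(log n), which matters when the queue can hold 128K entries before the search gives up.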
666
tools/unmunge.c
Normal file
@ -0,0 +1,666 @@
|
||||
/*
|
||||
* unmunge.c -- Program to convert a munged file to original form
|
||||
*
|
||||
* Copyright (C) 1997 Pretty Good Privacy, Inc.
|
||||
*
|
||||
* Designed by Colin Plumb, Mark H. Weaver, and Philip R. Zimmermann
|
||||
* Written by Mark H. Weaver
|
||||
*
|
||||
* $Id: unmunge.c,v 1.13 1997/11/13 23:27:08 mhw Exp $
|
||||
*/
|
||||
|
||||
#include <sys/stat.h>
|
||||
#include <sys/types.h>
|
||||
#include <fcntl.h>
|
||||
#include <unistd.h>
|
||||
|
||||
/*#include <direct.h> teun: MS VC wants direct.h for mkdir */
|
||||
|
||||
#include <stdio.h>
|
||||
#include <errno.h>
|
||||
#include <string.h>
|
||||
#include <ctype.h>
|
||||
#include <stdlib.h>
|
||||
#include <assert.h>
|
||||
|
||||
#include "util.h"
|
||||
|
||||
typedef struct UnMungeState
|
||||
{
|
||||
char const * mungedFileName;
|
||||
char dirName[128];
|
||||
char fileName[128];
|
||||
char * fileNameTail;
|
||||
int binaryMode, tabWidth;
|
||||
long productNumber, fileNumber, pageNumber, lineNumber;
|
||||
long manifestLineNumber;
|
||||
word16 hdrFlags;
|
||||
CRC pageCRC, seenPageCRC;
|
||||
FILE * manifest;
|
||||
FILE * file;
|
||||
FILE * out;
|
||||
} UnMungeState;
|
||||
|
||||
|
||||
/* Returns number of bytes decoded (1 to 3), or -1 on error */
|
||||
static int
|
||||
Decode4(char const src[4], byte dest[3])
|
||||
{
|
||||
int i, length;
|
||||
byte srcVal[4];
|
||||
|
||||
for (i = 0; i < 4 && src[i] != RADIX64_END_CHAR; i++)
|
||||
if ((srcVal[i] = Radix64DigitValue(src[i])) == (byte) -1)
|
||||
return -1;
|
||||
|
||||
length = i - 1;
|
||||
if (length < 1)
|
||||
return -1;
|
||||
|
||||
for (; i < 4; i++)
|
||||
srcVal[i] = 0;
|
||||
|
||||
dest[0] = (srcVal[0] << 2) | (srcVal[1] >> 4);
|
||||
dest[1] = (srcVal[1] << 4) | (srcVal[2] >> 2);
|
||||
dest[2] = (srcVal[2] << 6) | (srcVal[3]);
|
||||
|
||||
return length;
|
||||
}
|
||||
|
||||
/*
|
||||
* Return number of characters decoded, or -1 on error
|
||||
*/
|
||||
static int
|
||||
DecodeLine(char const *src, char *dest, int srclength)
|
||||
{
|
||||
int destlength = 0;
|
||||
int result;
|
||||
|
||||
if (srclength % 4 || !srclength)
|
||||
return -1; /* Must be a multiple of 4 */
|
||||
|
||||
while (srclength -= 4) {
|
||||
if (Decode4(src, dest + destlength) != 3)
|
||||
return -1;
|
||||
src += 4;
|
||||
destlength += 3;
|
||||
}
|
||||
result = Decode4(src, dest + destlength);
|
||||
if (result < 1)
|
||||
return -1;
|
||||
return destlength + result;
|
||||
}
|
||||
|
||||
int PrintFileError(UnMungeState *state, char const *message)
|
||||
{
|
||||
fprintf(stderr, "%s, %s line %ld\n", message,
|
||||
state->mungedFileName, state->lineNumber);
|
||||
return 1;
|
||||
}
|
||||
|
||||
int ReadManifest(UnMungeState *state, long fileNumberWanted,
|
||||
char const *fileTailPrefix, long prefixLen)
|
||||
{
|
||||
long fileNumber = 0;
|
||||
long firstMissingFileNum = 0, lastMissingFileNum = 0;
|
||||
char buffer[512];
|
||||
char * p;
|
||||
|
||||
if (state->manifest == NULL)
|
||||
{
|
||||
if (fileNumberWanted != 0)
|
||||
{
|
||||
assert(fileTailPrefix != NULL);
|
||||
strncpy(state->fileName, fileTailPrefix, sizeof(state->fileName));
|
||||
state->fileName[sizeof(state->fileName) - 1] = '\0';
|
||||
state->fileNameTail = state->fileName;
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
while (fgets(buffer, sizeof(buffer), state->manifest))
|
||||
{
|
||||
if ((p = strchr(buffer, '\n')) != NULL)
|
||||
*p = '\0';
|
||||
state->manifestLineNumber++;
|
||||
if (buffer[0] == 'D')
|
||||
{
|
||||
if (buffer[1] != ' ')
|
||||
goto invalidManifest;
|
||||
strncpy(state->dirName, buffer + 2, sizeof(state->dirName));
|
||||
if (state->dirName[sizeof(state->dirName) - 1] != '\0')
|
||||
goto invalidManifest;
|
||||
}
|
||||
else
|
||||
{
|
||||
fileNumber = strtol(buffer, &p, 10);
|
||||
if (p == buffer || *p != ' ')
|
||||
goto invalidManifest;
|
||||
p++;
|
||||
|
||||
if (fileNumberWanted == 0 || fileNumber < fileNumberWanted)
|
||||
{
|
||||
if (firstMissingFileNum == 0)
|
||||
firstMissingFileNum = fileNumber;
|
||||
lastMissingFileNum = fileNumber;
|
||||
continue;
|
||||
}
|
||||
else if (fileNumber > fileNumberWanted)
|
||||
break;
|
||||
else
|
||||
{
|
||||
size_t len;
|
||||
|
||||
len = strlen(state->dirName);
|
||||
assert(sizeof(state->fileName) >= sizeof(state->dirName));
|
||||
memcpy(state->fileName, state->dirName, len);
|
||||
strncpy(state->fileName + len, p,
|
||||
sizeof(state->fileName) - len);
|
||||
state->fileName[sizeof(state->fileName) - 1] = '\0';
|
||||
if (strncmp(p, fileTailPrefix, prefixLen) != 0)
|
||||
{
|
||||
fprintf(stderr, "Mismatched filename, headers say '%s',\n"
|
||||
" manifest says '%s'\n",
|
||||
fileTailPrefix, p);
|
||||
return 1;
|
||||
}
|
||||
p = state->dirName;
|
||||
while ((p = strchr(p, '/')) != NULL)
|
||||
{
|
||||
*p = '\0';
|
||||
mkdir(state->dirName, 0777);
|
||||
*p++ = '/';
|
||||
}
|
||||
state->fileNameTail = state->fileName + len;
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
if (firstMissingFileNum != 0)
|
||||
{
|
||||
fprintf(stderr, "Missing files %ld-%ld\n",
|
||||
firstMissingFileNum, lastMissingFileNum);
|
||||
}
|
||||
if (fileNumberWanted != 0 && fileNumber != fileNumberWanted)
|
||||
{
|
||||
fprintf(stderr, "Can't find file %ld in manifest file\n",
|
||||
fileNumberWanted);
|
||||
return 1;
|
||||
}
|
||||
return 0;
|
||||
|
||||
invalidManifest:
|
||||
fprintf(stderr, "Error parsing manifest file, line %ld\n",
|
||||
state->manifestLineNumber);
|
||||
return 1;
|
||||
}
|
||||
|
||||
int UnMungeFile(char const *mungedFileName, char const *manifestFileName,
|
||||
int forceOverwrite, int forcePartialFiles)
|
||||
{
|
||||
UnMungeState * state;
|
||||
EncodeFormat const * fmt = NULL;
|
||||
char buffer[512];
|
||||
char outbuf[BYTES_PER_LINE+1];
|
||||
char * line;
|
||||
char * lineData;
|
||||
char * p;
|
||||
int length;
|
||||
int result = 0;
|
||||
int skipPage = 0;
|
||||
CRC lineCRC;
|
||||
word32 num;
|
||||
|
||||
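state = (UnMungeState *)calloc(1, sizeof(*state));
|
||||
if (state == NULL)
|
||||
{
|
||||
fprintf(stderr, "ERROR: Out of memory\n");
|
||||
return 1;
|
||||
}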
|
||||
state->mungedFileName = mungedFileName;
|
||||
|
||||
if (manifestFileName != NULL)
|
||||
{
|
||||
if ((state->manifest = fopen(manifestFileName, "r")) == NULL)
|
||||
goto errnoError;
|
||||
}
|
||||
|
||||
if ((state->file = fopen(state->mungedFileName, "r")) == NULL)
|
||||
goto errnoError;
|
||||
|
||||
while (!feof(state->file))
|
||||
{
|
||||
if (fgets(buffer, sizeof(buffer), state->file) == NULL)
|
||||
{
|
||||
if (feof(state->file))
|
||||
break;
|
||||
goto fileError;
|
||||
}
|
||||
|
||||
state->lineNumber++;
|
||||
|
||||
line = buffer;
|
||||
/* Strip leading whitespace */
|
||||
while (isspace((byte)*line))
|
||||
line++;
|
||||
if (*line == '\0')
|
||||
continue;
|
||||
|
||||
/* Strip trailing whitespace */
|
||||
p = line + strlen(line);
|
||||
while (p > line && (byte)p[-1] < 128 && isspace(p[-1]))
|
||||
p--;
|
||||
|
||||
lineData = line + PREFIX_LENGTH;
|
||||
|
||||
/* Pad up to at least PREFIX_LENGTH */
|
||||
while (p < lineData)
|
||||
*p++ = ' ';
|
||||
*p++ = '\n';
|
||||
*p = '\0';
|
||||
length = p - lineData;
|
||||
|
||||
if (line[0] == HDR_PREFIX_CHAR)
|
||||
{
|
||||
fmt = FindFormat(line[1]);
|
||||
if (!fmt)
|
||||
{
|
||||
result = PrintFileError(state, "ERROR: Invalid header type");
|
||||
goto error;
|
||||
}
|
||||
}
|
||||
|
||||
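if (fmt == NULL)
|
||||
{
|
||||
result = PrintFileError(state, "ERROR: Missing header line");
|
||||
goto error;
|
||||
}
|
||||
lineCRC = CalculateCRC(fmt->lineCRC, 0, (byte const *)lineData, length);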
|
||||
|
||||
p = line + EncodedLength(fmt, fmt->runningCRCBits);
|
||||
if (DecodeCheckDigits(fmt, p, NULL, fmt->lineCRC->bits, &num)
|
||||
|| lineCRC != num)
|
||||
{
|
||||
result = PrintFileError(state, "ERROR: Line CRC failed");
|
||||
goto error;
|
||||
}
|
||||
|
||||
if (line[0] == HDR_PREFIX_CHAR)
|
||||
{
|
||||
int formatVersion;
|
||||
int flags;
|
||||
CRC seenPageCRC;
|
||||
int tabWidth;
|
||||
long productNumber;
|
||||
long fileNumber;
|
||||
long pageNumber;
|
||||
char * fileNameTail;
|
||||
int skipNextPage = 0;
|
||||
char * p;
|
||||
EncodeFormat const * hFmt = &hexFormat;
|
||||
|
||||
/* Parse header line */
|
||||
p = lineData;
|
||||
|
||||
if (DecodeCheckDigits(hFmt, p, &p, HDR_VERSION_BITS, &num))
|
||||
{
|
||||
invalidHeader:
|
||||
result = PrintFileError(state, "ERROR: Invalid header");
|
||||
goto error;
|
||||
}
|
||||
formatVersion = num;
|
||||
|
||||
if (DecodeCheckDigits(hFmt, p, &p, HDR_FLAG_BITS, &num))
|
||||
goto invalidHeader;
|
||||
flags = num;
|
||||
|
||||
if (DecodeCheckDigits(hFmt, p, &p, fmt->pageCRC->bits, &num))
|
||||
goto invalidHeader;
|
||||
seenPageCRC = num;
|
||||
|
||||
if (DecodeCheckDigits(hFmt, p, &p, HDR_TABWIDTH_BITS, &num))
|
||||
goto invalidHeader;
|
||||
tabWidth = num;
|
||||
|
||||
if (DecodeCheckDigits(hFmt, p, &p, HDR_PRODNUM_BITS, &num))
|
||||
goto invalidHeader;
|
||||
productNumber = num;
|
||||
|
||||
if (DecodeCheckDigits(hFmt, p, &p, HDR_FILENUM_BITS, &num))
|
||||
goto invalidHeader;
|
||||
fileNumber = num;
|
||||
|
||||
if (sscanf(p, " Page %ld of ", &pageNumber) < 1)
|
||||
goto invalidHeader;
|
||||
|
||||
if (formatVersion > 0)
|
||||
{
|
||||
result = PrintFileError(state,
|
||||
"ERROR: Format too new for "
|
||||
"this version of unmunge");
|
||||
goto error;
|
||||
}
|
||||
|
||||
p = strstr(p, " of ");
|
||||
if (p == NULL)
|
||||
goto invalidHeader;
|
||||
|
||||
fileNameTail = p + 4;
|
||||
p = fileNameTail + strlen(fileNameTail);
|
||||
if (p < fileNameTail + 3 || p[-1] != '\n')
|
||||
goto invalidHeader;
|
||||
else
|
||||
p[-1] = '\0';
|
||||
|
||||
if (state->out != NULL && state->pageCRC != state->seenPageCRC)
|
||||
{
|
||||
result = PrintFileError(state,
|
||||
"ERROR: Page CRC mismatch on page before");
|
||||
goto error;
|
||||
}
|
||||
|
||||
if ((state->hdrFlags & HDR_FLAG_LASTPAGE) && state->out != NULL)
|
||||
{
|
||||
fclose(state->out);
|
||||
state->out = NULL;
|
||||
}
|
||||
|
||||
if (state->out != NULL)
|
||||
{
|
||||
if (pageNumber != state->pageNumber + 1 ||
|
||||
fileNumber != state->fileNumber ||
|
||||
productNumber != state->productNumber ||
|
||||
tabWidth != state->tabWidth ||
|
||||
strcmp(fileNameTail, state->fileNameTail) != 0)
|
||||
{
|
||||
if (fileNumber == state->fileNumber &&
|
||||
pageNumber > state->pageNumber + 1)
|
||||
{
|
||||
(void)PrintFileError(state,
|
||||
"ERROR: Missing pages of this file");
|
||||
if (forcePartialFiles && !state->binaryMode)
|
||||
{
|
||||
fputs("\n\n@@@@@@ Missing pages here! @@@@@@\n\n",
|
||||
state->out);
|
||||
}
|
||||
else
|
||||
{
|
||||
skipNextPage = 1;
|
||||
fclose(state->out);
|
||||
state->out = NULL;
|
||||
remove(state->fileName);
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
(void)PrintFileError(state,
|
||||
"ERROR: Missing pages of previous file");
|
||||
if (forcePartialFiles && !state->binaryMode)
|
||||
{
|
||||
fputs("\n\n@@@@@@ Missing pages here! @@@@@@\n\n",
|
||||
state->out);
|
||||
/* Make it non-fatal, though... */
|
||||
fclose(state->out);
|
||||
state->out = NULL;
|
||||
}
|
||||
else
|
||||
{
|
||||
fclose(state->out);
|
||||
state->out = NULL;
|
||||
remove(state->fileName);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
if (state->out == NULL)
|
||||
{
|
||||
if (pageNumber != 1 && !skipPage)
|
||||
(void)PrintFileError(state,
|
||||
"ERROR: File doesn't begin with page 1");
|
||||
|
||||
state->binaryMode = (tabWidth == 0);
|
||||
|
||||
if (pageNumber != 1 && (state->binaryMode
|
||||
|| !forcePartialFiles))
|
||||
{
|
||||
skipNextPage = 1;
|
||||
}
|
||||
else
|
||||
{
|
||||
/* TODO: Use global filelist to get pathname */
|
||||
result = ReadManifest(state, fileNumber, fileNameTail,
|
||||
strlen(fileNameTail));
|
||||
if (result != 0)
|
||||
goto error;
|
||||
|
||||
if (!forceOverwrite)
|
||||
{
|
||||
FILE * file;
|
||||
|
||||
/* Make sure file doesn't already exist */
|
||||
file = fopen(state->fileName, "r");
|
||||
if (file != NULL)
|
||||
{
|
||||
fclose(file);
|
||||
fprintf(stderr, "ERROR: %s already exists\n",
|
||||
state->fileName);
|
||||
result = 1;
|
||||
goto error;
|
||||
}
|
||||
}
|
||||
|
||||
state->out = fopen(state->fileName,
|
||||
state->binaryMode ? "wb" : "w");
|
||||
if (state->out == NULL)
|
||||
goto errnoError;
|
||||
|
||||
if (pageNumber != 1)
|
||||
fputs("\n\n@@@@@@ Missing pages here! @@@@@@\n\n",
|
||||
state->out);
|
||||
}
|
||||
}
|
||||
|
||||
state->pageCRC = 0;
|
||||
state->seenPageCRC = seenPageCRC;
|
||||
state->hdrFlags = (word16)flags;
|
||||
state->pageNumber = pageNumber;
|
||||
state->fileNumber = fileNumber;
|
||||
state->productNumber = productNumber;
|
||||
state->tabWidth = tabWidth;
|
||||
skipPage = skipNextPage;
|
||||
}
|
||||
else if (!skipPage)
|
||||
{
|
||||
if (state->out == NULL)
|
||||
{
|
||||
result = PrintFileError(state, "ERROR: Missing header line");
|
||||
goto error;
|
||||
}
|
||||
|
||||
/* Normal data line */
|
||||
state->pageCRC = CalculateCRC(fmt->pageCRC, state->pageCRC,
|
||||
(byte const *)lineData,
|
||||
length);
|
||||
line[2] = '\0';
|
||||
if (DecodeCheckDigits(fmt, line, NULL, fmt->runningCRCBits, &num)
|
||||
|| RunningCRCFromPageCRC(fmt, state->pageCRC) != num)
|
||||
{
|
||||
result = PrintFileError(state, "ERROR: Running CRC failed");
|
||||
goto error;
|
||||
}
|
||||
|
||||
if (state->binaryMode)
|
||||
{
|
||||
length = DecodeLine(lineData, outbuf, length-1);
|
||||
if (length < 0 || length > BYTES_PER_LINE) {
|
||||
result = PrintFileError(state,
|
||||
"ERROR: Corrupt radix-64 data");
|
||||
goto error;
|
||||
}
|
||||
fwrite(outbuf, 1, length, state->out);
|
||||
}
|
||||
else
|
||||
{
|
||||
p = lineData;
|
||||
while (*p != '\0')
|
||||
{
|
||||
if (*p == TAB_CHAR)
|
||||
{
|
||||
p++;
|
||||
putc('\t', state->out);
|
||||
while ((p - lineData) % state->tabWidth)
|
||||
{
|
||||
if (*p == '\n')
|
||||
break;
|
||||
else if (*p == ' ')
|
||||
p++;
|
||||
else
|
||||
{
|
||||
result = PrintFileError(state,
|
||||
"ERROR: Not enough spaces "
|
||||
"after a tab character");
|
||||
goto error;
|
||||
}
|
||||
}
|
||||
}
|
||||
else if (*p == FORMFEED_CHAR)
|
||||
{
|
||||
p++;
|
||||
if (*p != '\n')
|
||||
{
|
||||
result = PrintFileError(state,
|
||||
"ERROR: Formfeed character "
|
||||
"not at end of line");
|
||||
goto error;
|
||||
}
|
||||
p++; /* Skip newline */
|
||||
putc('\f', state->out);
|
||||
}
|
||||
else if (*p == CONTIN_CHAR)
|
||||
{
|
||||
p++;
|
||||
if (*p != '\n')
|
||||
{
|
||||
result = PrintFileError(state,
|
||||
"ERROR: Continuation character "
|
||||
"not at end of line");
|
||||
goto error;
|
||||
}
|
||||
p++; /* Skip newline */
|
||||
}
|
||||
else if (*p == SPACE_CHAR)
|
||||
{
|
||||
putc(' ', state->out);
|
||||
p++;
|
||||
}
|
||||
else
|
||||
{
|
||||
putc(*p, state->out);
|
||||
p++;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
if (state->out != NULL)
|
||||
{
|
||||
if (!(state->hdrFlags & HDR_FLAG_LASTPAGE))
|
||||
{
|
||||
result = PrintFileError(state, "ERROR: Missing pages");
|
||||
goto error;
|
||||
}
|
||||
if (state->pageCRC != state->seenPageCRC)
|
||||
{
|
||||
result = PrintFileError(state,
|
||||
"ERROR: Page CRC failed on previous page");
|
||||
goto error;
|
||||
}
|
||||
}
|
||||
|
||||
/* Check for missing files at the end */
|
||||
result = ReadManifest(state, 0, NULL, 0);
|
||||
goto done;
|
||||
|
||||
errnoError:
|
||||
result = errno;
|
||||
goto printError;
|
||||
|
||||
fileError:
|
||||
result = ferror(state->file);
|
||||
|
||||
printError:
|
||||
fprintf(stderr, "ERROR: %s\n", strerror(result));
|
||||
|
||||
error:
|
||||
done:
|
||||
if (state != NULL)
|
||||
{
|
||||
if (state->out != NULL)
|
||||
fclose(state->out);
|
||||
if (state->file != NULL)
|
||||
fclose(state->file);
|
||||
if (state->manifest != NULL)
|
||||
fclose(state->manifest);
|
||||
free(state);
|
||||
}
|
||||
return result;
|
||||
}
|
||||
|
||||
void UsageAndExit(int result)
|
||||
{
|
||||
fprintf(stderr,
|
||||
"Usage: unmunge [-fp] <file> [<manifest>]\n"
|
||||
" -f Force overwrites of existing files\n"
|
||||
" -p Force unmunge of partial files\n");
|
||||
exit(result);
|
||||
}
|
||||
|
||||
int main(int argc, char *argv[])
|
||||
{
|
||||
int result = 0;
|
||||
int forceOverwrite = 0;
|
||||
int forcePartialFiles = 0;
|
||||
char * fileName = NULL;
|
||||
char * manifestFileName = NULL;
|
||||
int i, j;
|
||||
|
||||
InitUtil();
|
||||
|
||||
for (i = 1; i < argc && argv[i][0] == '-'; i++)
|
||||
{
|
||||
if (0 == strcmp(argv[i], "--"))
|
||||
{
|
||||
i++;
|
||||
break;
|
||||
}
|
||||
for (j = 1; argv[i][j] != '\0'; j++)
|
||||
{
|
||||
if (argv[i][j] == 'h')
|
||||
UsageAndExit(0);
|
||||
else if (argv[i][j] == 'f')
|
||||
forceOverwrite = 1;
|
||||
else if (argv[i][j] == 'p')
|
||||
forcePartialFiles = 1;
|
||||
else
|
||||
{
|
||||
fprintf(stderr, "ERROR: Unrecognized option -%c\n", argv[i][j]);
|
||||
UsageAndExit(1);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if (i < argc)
|
||||
fileName = argv[i++];
|
||||
if (i < argc)
|
||||
manifestFileName = argv[i++];
|
||||
if (fileName == NULL || i < argc)
|
||||
UsageAndExit(1);
|
||||
|
||||
if ((result = UnMungeFile(fileName, manifestFileName,
|
||||
forceOverwrite, forcePartialFiles)) != 0)
|
||||
{
|
||||
/* If result > 0, message should have already been printed */
|
||||
if (result < 0)
|
||||
fprintf(stderr, "ERROR: %s\n", strerror(result));
|
||||
exit(1);
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* Local Variables:
|
||||
* tab-width: 4
|
||||
* End:
|
||||
* vi: ts=4 sw=4
|
||||
* vim: si
|
||||
*/
|
||||
|
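Decode4 turns each group of four radix-64 digits into three bytes, six bits per digit; a short final group is terminated with RADIX64_END_CHAR ('-'), and k digits yield k-1 bytes. The sketch below reproduces that step with the modified, OCR-friendly alphabet from util.c, zero-filling missing digits and returning -1 for an invalid digit. It is an illustration rather than a drop-in replacement: SketchDigitValue and SketchDecode4 are hypothetical names, and the digit lookup uses strchr instead of the radix64DigitsInv table.

```c
#include <string.h>

typedef unsigned char byte;

/* Modified radix-64 alphabet from util.c (no O, U, 0, 2, 3, 7, ...) */
static char const sketchDigits[] =
    "ABCDEFGHIJKLMNPQRSTVWXYZabcdehijklmnpqtuwy145689\\^!#$%&*+=/:<>?@";

/* Returns the 6-bit value of a digit, or -1 if it is not in the alphabet */
static int SketchDigitValue(char c)
{
    char const *p = (c != '\0') ? strchr(sketchDigits, c) : NULL;

    return p ? (int)(p - sketchDigits) : -1;
}

/*
 * Decodes up to four digits into dest; '-' ends a short final group.
 * Returns the number of bytes produced (1 to 3), or -1 on error.
 */
static int SketchDecode4(char const src[4], byte dest[3])
{
    int i, v, vals[4] = { 0, 0, 0, 0 };   /* missing digits read as 0 */

    for (i = 0; i < 4 && src[i] != '-'; i++) {
        if ((v = SketchDigitValue(src[i])) < 0)
            return -1;
        vals[i] = v;
    }
    if (i < 2)
        return -1;   /* a single digit cannot carry a whole byte */

    dest[0] = (byte)((vals[0] << 2) | (vals[1] >> 4));
    dest[1] = (byte)((vals[1] << 4) | (vals[2] >> 2));
    dest[2] = (byte)((vals[2] << 6) | vals[3]);
    return i - 1;
}
```

With this alphabet, "VYF8" decodes to the three bytes "Man", and "VY--" decodes to just "M".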
198
tools/util.c
Normal file
@ -0,0 +1,198 @@
|
||||
/*
|
||||
* util.c -- Miscellaneous shared code/data
|
||||
*
|
||||
* Copyright (C) 1997 Pretty Good Privacy, Inc.
|
||||
*
|
||||
* Written by Mark H. Weaver
|
||||
*
|
||||
* $Id: util.c,v 1.11 1997/11/07 00:44:10 mhw Exp $
|
||||
*/
|
||||
|
||||
#include <stdlib.h>
|
||||
#include "util.h"
|
||||
|
||||
char const hexDigits[] = "0123456789abcdef";
|
||||
char const radix64Digits[] =
|
||||
#if 0 /* Standard */
|
||||
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
|
||||
#else /* Modified form that avoids hard-to-OCR characters */
|
||||
"ABCDEFGHIJKLMNPQRSTVWXYZabcdehijklmnpqtuwy145689\\^!#$%&*+=/:<>?@";
|
||||
#endif
|
||||
|
||||
signed char hexDigitsInv[256];
|
||||
signed char radix64DigitsInv[256];
|
||||
|
||||
/* teun: moved initialisation of all three CRCPolys to InitUtil() */
|
||||
|
||||
/* CRC-CCITT: x^16 + x^12 + x^5 + 1 */
|
||||
CRCPoly crcCCITTPoly;
|
||||
/*
|
||||
* PRZ's magic 24-bit polynomial - (x+1) * (irreducible of degree 23)
|
||||
* x^24 +x^23 +x^18 +x^17 +x^14 +x^11 +x^10 +x^7 +x^6 +x^5 +x^4 +x^3 +x +1
|
||||
* (Developed by Neal Glover). Note: this is bit-reversed from the form
|
||||
* used in PGP, 0x1864cfb.
|
||||
*/
|
||||
CRCPoly crc24Poly;
|
||||
/* CRC-32: x^32+x^26+x^23+x^22+x^16+x^12+x^11+x^10+x^8+x^7+x^5+x^4+x^2+x+1 */
|
||||
CRCPoly crc32Poly;
|
||||
|
||||
EncodeFormat const hexFormat =
|
||||
{
|
||||
NULL, /* nextFormat */
|
||||
'-', /* headerTypeChar */
|
||||
hexDigits, /* digits */
|
||||
hexDigitsInv, /* digitsInv */
|
||||
4, /* bitsPerDigit */
|
||||
16, /* radix */
|
||||
&crcCCITTPoly, /* lineCRC */
|
||||
&crc32Poly, /* pageCRC */
|
||||
8, /* runningCRCBits */
|
||||
24, /* runningCRCShift */
|
||||
0xFF /* runningCRCMask */
|
||||
};
|
||||
|
||||
EncodeFormat const radix64Format =
|
||||
{
|
||||
&hexFormat, /* nextFormat */
|
||||
'A', /* headerTypeChar */
|
||||
radix64Digits, /* digits */
|
||||
radix64DigitsInv, /* digitsInv */
|
||||
6, /* bitsPerDigit */
|
||||
64, /* radix */
|
||||
&crc24Poly, /* lineCRC */
|
||||
&crc32Poly, /* pageCRC */
|
||||
12, /* runningCRCBits */
|
||||
20, /* runningCRCShift */
|
||||
0xFFF /* runningCRCMask */
|
||||
};
|
||||
|
||||
EncodeFormat const * firstFormat = &radix64Format;
|
||||
|
||||
|
||||
static void InitCRCPoly(CRCPoly *poly)
|
||||
{
|
||||
int i, oneBit;
|
||||
CRC crc = 1;
|
||||
|
||||
poly->table[0] = 0;
|
||||
for (oneBit = 0x80; oneBit > 0; oneBit >>= 1) {
|
||||
crc = (crc >> 1) ^ ((crc & 1) ? poly->poly : 0);
|
||||
for (i = 0; i < 0x100; i += 2 * oneBit)
|
||||
poly->table[i + oneBit] = poly->table[i] ^ crc;
|
||||
}
|
||||
}
|
||||
|
||||
CRC CalculateCRC(CRCPoly const *poly, CRC crc,
|
||||
byte const *buffer, size_t length)
|
||||
{
|
||||
while (length--)
|
||||
crc = (crc >> 8) ^ poly->table[(crc & 0xFF) ^ (*buffer++)];
|
||||
return crc;
|
||||
}
|
||||
|
||||
CRC ReverseCRC(CRCPoly const *poly, CRC crc, byte b)
|
||||
{
|
||||
int i, highBit = poly->highBit;
|
||||
|
||||
for (i = 0; i < 8; i++) {
|
||||
if (crc & highBit) /* highBit is 2^(poly->bits-1) */
|
||||
crc = ((crc ^ poly->poly) << 1) ^ 1;
|
||||
else
|
||||
crc <<= 1;
|
||||
}
|
||||
return crc ^ b;
|
||||
}
|
||||
|
||||
static void InitDigitsInv(char const *digits, signed char *digitsInv)
|
||||
{
|
||||
int i;
|
||||
|
||||
for (i = 0; i < 256; i++)
|
||||
digitsInv[i] = -1;
|
||||
for (i = 0; digits[i]; i++)
|
||||
digitsInv[(byte)digits[i]] = i;
|
||||
}
|
||||
|
||||
/* Returns the number of chars encoded */
|
||||
int EncodeCheckDigits(EncodeFormat const *fmt, word32 num,
|
||||
int numBits, char *dest)
|
||||
{
|
||||
int destLen = EncodedLength(fmt, numBits);
|
||||
word32 digitMask = fmt->radix - 1;
|
||||
int i;
|
||||
|
||||
for (i = destLen - 1; i >= 0; i--)
|
||||
{
|
||||
dest[i] = EncodeDigit(fmt, num & digitMask);
|
||||
num >>= fmt->bitsPerDigit;
|
||||
}
|
||||
return destLen;
|
||||
}
|
||||
|
||||
/* Returns 1 if there's an error */
|
||||
int DecodeCheckDigits(EncodeFormat const *fmt, char const *src, char **endPtr,
|
||||
int numBits, word32 *valuePtr)
|
||||
{
|
||||
word32 value = 0;
|
||||
int digitValue;
|
||||
int i = EncodedLength(fmt, numBits);
|
||||
|
||||
while (i--)
|
||||
{
|
||||
digitValue = DecodeDigit(fmt, *src++);
|
||||
if (digitValue < 0)
|
||||
{
|
||||
/* Invalid digit found */
|
||||
*valuePtr = 0;
|
||||
if (endPtr)
|
||||
*endPtr = NULL;
|
||||
return 1;
|
||||
}
|
||||
value = (value << fmt->bitsPerDigit) | digitValue;
|
||||
}
|
||||
*valuePtr = value;
|
||||
if (endPtr)
|
||||
*endPtr = (char *)src;
|
||||
return 0;
|
||||
}
|
||||
|
||||
EncodeFormat const *FindFormat(char headerTypeChar)
|
||||
{
|
||||
EncodeFormat const * fmt = firstFormat;
|
||||
|
||||
while (fmt && fmt->headerTypeChar != headerTypeChar)
|
||||
fmt = fmt->nextFormat;
|
||||
return fmt;
|
||||
}
|
||||
void InitUtil(void)
{
    /* teun: removed "{ }" for MS VC compile */

    crcCCITTPoly.bits = 16;
    crcCCITTPoly.poly = 0x8408;
    crcCCITTPoly.highBit = 0x8000;

    crc24Poly.bits = 24;
    crc24Poly.poly = 0xdf3261;
    crc24Poly.highBit = 0x800000;

    crc32Poly.bits = 32;
    crc32Poly.poly = 0xedb88320;
    crc32Poly.highBit = 0x80000000;

    InitCRCPoly(&crcCCITTPoly);
    InitCRCPoly(&crc24Poly);
    InitCRCPoly(&crc32Poly);
    InitDigitsInv(hexDigits, hexDigitsInv);
    InitDigitsInv(radix64Digits, radix64DigitsInv);
}


/*
 * Local Variables:
 * tab-width: 4
 * End:
 * vi: ts=4 sw=4
 * vim: si
 */
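A standalone sketch of the check-digit round trip that EncodeCheckDigits() and DecodeCheckDigits() perform. The Format struct, InitFormat(), Length(), Encode(), Decode() and the lowercase hex alphabet are hypothetical stand-ins, not the real EncodeFormat tables or the hexDigits/radix64Digits alphabets declared in util.h; only the arithmetic mirrors the listing: digits are written most significant first, the digit count is rounded up as in EncodedLength, and decoding stops with an error at the first character outside the alphabet.

```c
#include <assert.h>
#include <string.h>

typedef unsigned long word32;

/* Hypothetical stand-in for EncodeFormat: only the fields the digit
 * routines need. */
typedef struct
{
    const char *digits;          /* value -> character */
    signed char digitsInv[256];  /* character -> value, or -1 if invalid */
    int bitsPerDigit;
    int radix;                   /* 1 << bitsPerDigit */
} Format;

static void InitFormat(Format *fmt, const char *digits, int bitsPerDigit)
{
    int i;

    assert(bitsPerDigit > 0 && bitsPerDigit < 8);
    fmt->digits = digits;
    fmt->bitsPerDigit = bitsPerDigit;
    fmt->radix = 1 << bitsPerDigit;
    memset(fmt->digitsInv, -1, sizeof fmt->digitsInv);
    for (i = 0; i < fmt->radix; i++)
        fmt->digitsInv[(unsigned char)digits[i]] = (signed char)i;
}

/* Same rounding-up rule as the EncodedLength macro. */
static int Length(const Format *fmt, int numBits)
{
    return (numBits + fmt->bitsPerDigit - 1) / fmt->bitsPerDigit;
}

/* Writes the low numBits of num, most significant digit first. */
static int Encode(const Format *fmt, word32 num, int numBits, char *dest)
{
    int len = Length(fmt, numBits);
    int i;

    for (i = len - 1; i >= 0; i--)
    {
        dest[i] = fmt->digits[num & (word32)(fmt->radix - 1)];
        num >>= fmt->bitsPerDigit;
    }
    return len;
}

/* Returns 1 on an invalid digit (and stores 0), 0 on success. */
static int Decode(const Format *fmt, const char *src, int numBits, word32 *value)
{
    int i = Length(fmt, numBits);
    word32 v = 0;

    while (i--)
    {
        int d = fmt->digitsInv[(unsigned char)*src++];

        if (d < 0)
        {
            *value = 0;
            return 1;
        }
        v = (v << fmt->bitsPerDigit) | (word32)d;
    }
    *value = v;
    return 0;
}
```

With 4 bits per digit, encoding 0xABC in 12 bits yields the three characters "abc", and a 10-bit field still needs three digits because the length is rounded up.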
149
tools/util.h
Normal file
@ -0,0 +1,149 @@
/*
 * util.h -- Miscellaneous defines
 *
 * Copyright (C) 1997 Pretty Good Privacy, Inc.
 *
 * Written by Mark H. Weaver
 *
 * $Id: util.h,v 1.23 1997/11/12 23:28:56 mhw Exp $
 */

#ifndef UTIL_H
#define UTIL_H 1

#include <stddef.h> /* size_t, used by CalculateCRC() */

typedef unsigned long word32;
typedef unsigned short word16;
typedef unsigned char byte;

#define FMT32 "%08lx"
#define FMT16 "%04x"
#define FMT8 "%02x"

#define TAB_CHAR '\244' /* Currency symbol, like o on top of x */
#define TAB_STRING "\244"
#define TAB_PAD_CHAR ' ' /* The fact that this is space has leaked. */
#define TAB_PAD_STRING " " /* It may not be freely changed. */
#define FORMFEED_CHAR '\245' /* Yen symbol, like = on top of Y */
#define FORMFEED_STRING "\245"
#define SPACE_CHAR '\267' /* Middle dot, or bullet */
#define SPACE_STRING "\267"
#define CONTIN_CHAR '\266' /* Pilcrow (paragraph symbol) */
#define CONTIN_STRING "\266"

#define BYTES_PER_LINE 60 /* When using radix 64 */

#define LINES_PER_PAGE 72 /* Exclusive of 2 header lines */
#define LINE_LENGTH 80
#define PREFIX_LENGTH 7 /* Length of prefix, including the space */

#define HDR_PREFIX_CHAR '-'
#define RADIX64_END_CHAR '-'

typedef struct EncodeFormat EncodeFormat;
typedef word32 CRC;
typedef word16 CRCFragment;

typedef struct
{
    CRC table[256];
    int bits;
    CRC poly;
    CRC highBit;
} CRCPoly;

struct EncodeFormat
{
    EncodeFormat const *nextFormat;
    char headerTypeChar;
    char const * digits;
    signed char const * digitsInv;
    int bitsPerDigit;
    int radix;
    CRCPoly const * lineCRC;
    CRCPoly const * pageCRC;
    int runningCRCBits;
    int runningCRCShift;
    int runningCRCMask;
};


#define HDR_ENC_LENGTH 19 /* Length of encoded prefix on header */

#define HDR_VERSION_BITS 4
#define HDR_FLAG_BITS 8
/* Page CRC bits omitted, since it's not constant */
#define HDR_TABWIDTH_BITS 4
#define HDR_PRODNUM_BITS 12
#define HDR_FILENUM_BITS 16


/* Enough to hold one whole page of munged data */
/* There is no point making this excessively large */
#define PAGE_BUFFER_SIZE 8192

#if PAGE_BUFFER_SIZE < (LINES_PER_PAGE + 2) * (LINE_LENGTH + PREFIX_LENGTH + 2)
#error PAGE_BUFFER_SIZE is too small
#endif


/* Header flags */
#define HDR_FLAG_LASTPAGE 0x01 /* Indicates last page of file */


#define elemsof(array) (sizeof(array)/sizeof(*(array)))


extern char const hexDigits[];
extern char const radix64Digits[];

extern signed char hexDigitsInv[256];
extern signed char radix64DigitsInv[256];

extern CRCPoly crcCCITTPoly, crc24Poly, crc32Poly;

extern EncodeFormat const hexFormat, radix64Format;
extern EncodeFormat const * firstFormat;


#define HexDigitValue(ch) hexDigitsInv[(byte)(ch)]
#define Radix64DigitValue(ch) radix64DigitsInv[(byte)(ch)]

/* Returns the number of chars needed to encode the given number of bits */
#define EncodedLength(fmt, numBits) \
    (((numBits) + (fmt)->bitsPerDigit - 1) / (fmt)->bitsPerDigit)
#define EncodeDigit(fmt, value) ((fmt)->digits[value])
#define DecodeDigit(fmt, digit) ((fmt)->digitsInv[(byte)(digit)])

#define AdvanceCRC(poly, crc, b) \
    (((crc) >> 8) ^ (poly)->table[((crc) ^ (b)) & 0xFF])

#define RunningCRCFromPageCRC(fmt, pageCRC) \
    (((pageCRC) >> (fmt)->runningCRCShift) & (fmt)->runningCRCMask)


CRC CalculateCRC(CRCPoly const *poly, CRC crc,
                 byte const *buffer, size_t length);
CRC ReverseCRC(CRCPoly const *poly, CRC crc, byte b);

/* Returns the number of chars encoded */
int EncodeCheckDigits(EncodeFormat const *fmt, word32 num,
                      int numBits, char *dest);

/* Returns 1 if there's an error */
int DecodeCheckDigits(EncodeFormat const *fmt, char const *src, char **endPtr,
                      int numBits, word32 *valuePtr);

EncodeFormat const *FindFormat(char headerTypeChar);

void InitUtil(void);


#endif /* !UTIL_H */

/*
 * Local Variables:
 * tab-width: 4
 * End:
 * vi: ts=4 sw=4
 * vim: si
 */
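The AdvanceCRC macro is the usual byte-at-a-time update for a bit-reflected (LSB-first) CRC, which matches the reflected polynomials stored by InitUtil(): 0x8408 is 0x1021 (CRC-CCITT) with its bits reversed, and 0xedb88320 is the reversed CRC-32 polynomial 0x04c11db7. InitCRCPoly() is not shown in this listing, so the table construction below is an assumption (the standard reflected table), and CRCTable, BuildTable() and Update() are hypothetical names. The check uses the common CRC-32 convention of a 0xFFFFFFFF initial value and final XOR, which these tools may or may not use.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long CRC;

/* Mirrors the CRCPoly fields this sketch needs. */
typedef struct
{
    CRC table[256];
    CRC poly;       /* bit-reflected polynomial, e.g. 0xedb88320 */
} CRCTable;

/* Assumed construction: run each byte value through eight LSB-first
 * shift/XOR steps of the reflected polynomial. */
static void BuildTable(CRCTable *t, CRC poly)
{
    int i, k;

    t->poly = poly;
    for (i = 0; i < 256; i++)
    {
        CRC c = (CRC)i;

        for (k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ poly : c >> 1;
        t->table[i] = c;
    }
}

/* Applies the same recurrence as AdvanceCRC to every byte of buf. */
static CRC Update(const CRCTable *t, CRC crc, const unsigned char *buf,
                  size_t len)
{
    assert(buf != NULL || len == 0);
    while (len--)
        crc = (crc >> 8) ^ t->table[(crc ^ *buf++) & 0xFF];
    return crc;
}
```

With the CRC-32 polynomial, table entry 1 is 0x77073096 and the CRC of the ASCII string "123456789" is the well-known check value 0xCBF43926.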
286
tools/yapp
Normal file
@ -0,0 +1,286 @@
#!/usr/bin/perl
#
# Yet another preprocessor
#
# $Id: yapp,v 1.5 1997/10/24 07:51:05 mhw Exp $
#

%vars = ('' => '$');
@incPath = (".");

sub Error
{
    print STDERR $_[0], "\n";
    exit(1);
}

sub VarSubst
{
    my ($varName, $undefOkay) = @_;

    if (defined($vars{$varName}))
    {
        return $vars{$varName};
    }
    elsif (!$undefOkay)
    {
        &Error("Undefined variable '$varName' in $fileName line $.");
    }
}

sub NullFilter
{
    0;
}

sub IfFilter
{
    local $_ = $_[0];

    if (/^##else(\s+.*)?/)
    {
        return 1;
    }
    elsif (/^##endif(\s+.*)?/)
    {
        return 2;
    }
    else
    {
        return 0;
    }
}

sub DoFile
{
    my $includer = $fileName;   # file containing the ##include, if any
    local $fileName = $_[0];
    my $path;
    local *FILE;

    if ($fileName =~ m|^/|)
    {
        $path = $fileName;
    }
    else
    {
        for $dir (@incPath)
        {
            if (-e "$dir/$fileName")
            {
                $path = "$dir/$fileName";
                last;
            }
        }
    }
    if (!defined($path))
    {
        &Error("Can't find '$fileName', included from $includer line $.")
            if defined($includer);
        &Error("Can't find '$fileName'");
    }

    open(FILE, "<$path") || &Error("Can't open $path: $!");
    &DoOpenFile(*FILE, *NullFilter, 0);
    close(FILE) || die;
    0;
}

sub DoPrepass
{
    local ($_, $skipFlag) = @_;

    return "" if /^###/;
    s/\s*###.*//;                                   # Strip comments
    s/\$\{(\w*)\}/&VarSubst($1, $skipFlag)/eg;      # Do variable substitutions; ${} is allowed
    $_;
}

sub DoOpenFile
{
    local *FILE = $_[0];
    local *filter = $_[1];
    my $skipFlag = $_[2];
    my $result;
    local $_;

    while (<FILE>)
    {
        $_ = &DoPrepass($_, $skipFlag);
        if ($result = &filter($_))
        {
            return $result;
        }
        elsif (/^##(\w*)(\s+(.*))?/)
        {
            my ($cmd, $params) = ($1, $3);

            if ($cmd =~ /^if/)
            {
                my $condition;
                my $ifStartLine = $.;

                if ($cmd eq "if")
                {
                    if ($params =~ /^(\d+)\s*$/)
                    {
                        $condition = int($1);
                    }
                    elsif ($params =~ /^(\d+)\s*([=!]=|[<>]=?)\s*(\d+)\s*$/)
                    {
                        my ($left, $op, $right) = ($1, $2, $3);

                        $condition = eval($left . $op . $right);
                    }
                    elsif ($params =~ /^(\S+)\s*(eq|ne)\s*(\S+)\s*$/)
                    {
                        my ($left, $op, $right) = ($1, $2, $3);

                        $left =~ s/([\\'])/\\$1/g;
                        $right =~ s/([\\'])/\\$1/g;
                        $condition = eval("'$left' $op '$right'");
                    }
                    else
                    {
                        &Error("Invalid ##if params: '$params' " .
                               "in $fileName line $.");
                    }
                }
                elsif ($cmd =~ /^ifn?def$/)
                {
                    if ($params =~ /^(\w+)\s*$/)
                    {
                        $condition = defined($vars{$1});
                        $condition = !$condition if ($cmd eq "ifndef");
                    }
                    else
                    {
                        &Error("Invalid ##$cmd param: '$params' " .
                               "in $fileName line $.");
                    }
                }

                # Do main body of if
                $result = &DoOpenFile(*FILE, *IfFilter,
                                      $skipFlag || !$condition);

                if ($result == 1)       # an '##else' was found
                {
                    # Handle else
                    $result = &DoOpenFile(*FILE, *IfFilter,
                                          $skipFlag || $condition);
                }

                if ($result == 1)       # a second '##else' was found
                {
                    &Error("Two ##else's in a row in $fileName line $.");
                }
                elsif ($result == 0)    # EOF was encountered
                {
                    &Error("Unterminated ##if " .
                           "in $fileName line $ifStartLine");
                }
            }
            elsif ($cmd eq "include")
            {
                if ($skipFlag)
                {
                }
                elsif ($params =~ /^"(.*)"\s*$/)
                {
                    my $incFile = $1;

                    &DoFile($incFile);
                }
                else
                {
                    &Error("Invalid ##include params: '$params'");
                }
            }
            elsif ($cmd eq "set")
            {
                if ($params =~ /^(\w+)=<<(")(.*)"\s*$/ or
                    $params =~ /^(\w+)=<<(')(.*)'\s*$/)
                {
                    my $varName = $1;
                    my $quoteChar = $2;
                    my $endTag = $3 . "\n";
                    my $value;

                    while (<FILE>)
                    {
                        if ($_ eq $endTag)
                        {
                            chop $value;
                            last;
                        }
                        else
                        {
                            if ($quoteChar eq '"')
                            {
                                $_ = &DoPrepass($_, $skipFlag);
                            }
                            $value .= $_;
                        }
                    }
                    if (!$skipFlag)
                    {
                        $vars{$varName} = $value;
                    }
                }
                elsif ($params =~ /^(\w+)="(.*)"\s*$/ or
                       $params =~ /^(\w+)=(\S*)\s*$/)
                {
                    if (!$skipFlag)
                    {
                        $vars{$1} = $2;
                    }
                }
                else
                {
                    &Error("Invalid ##set command: '$params'");
                }
            }
            else
            {
                &Error("Unrecognized command: '$_'");
            }
        }
        elsif (!$skipFlag)
        {
            print;
        }
    }
    return 0;
}

$optEnable = 1;

foreach (@ARGV)
{
    if ($optEnable and /^-/)
    {
        if (/^--$/)
        {
            $optEnable = 0;
        }
        elsif (/^-D(\w+)=(.*)$/)
        {
            $vars{$1} = $2;
        }
        elsif (/^-I(.*)$/)
        {
            unshift @incPath, $1;
        }
        else
        {
            &Error("Unrecognized option: '$_'");
        }
    }
    else
    {
        &DoFile($_);
    }
}

#
# vi: ai ts=4
# vim: si
#
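For readers more at home in C than Perl, here is the substitution rule that yapp.doc describes, restated as a sketch. Everything in it is hypothetical (the Var table, Lookup(), Expand() and its return codes), and names are limited to ASCII letters, digits and underscore as an approximation of Perl's \w. It shows the two properties the documentation calls out: the empty name ${} yields "$", and substituted text is copied verbatim rather than rescanned.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variable table; yapp itself keeps variables in a Perl hash. */
typedef struct
{
    const char *name;
    const char *value;
} Var;

/* Returns the value of the len-character name, or NULL if undefined. */
static const char *Lookup(const Var *vars, size_t n, const char *name, size_t len)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strlen(vars[i].name) == len && strncmp(vars[i].name, name, len) == 0)
            return vars[i].value;
    return NULL;
}

/*
 * Copies in to out, replacing each ${name} (zero or more ASCII letters,
 * digits or underscores) with its value. Values are never rescanned.
 * Returns 0 on success, -1 for an undefined name when undefOkay is 0,
 * and -2 if out is too small.
 */
static int Expand(const Var *vars, size_t n, const char *in, int undefOkay,
                  char *out, size_t outSize)
{
    size_t used = 0;

    assert(outSize > 0);
    while (*in)
    {
        if (in[0] == '$' && in[1] == '{')
        {
            const char *name = in + 2;
            size_t len = 0;

            while (isalnum((unsigned char)name[len]) || name[len] == '_')
                len++;
            if (name[len] == '}')
            {
                const char *value = Lookup(vars, n, name, len);
                size_t vlen;

                if (value == NULL)
                {
                    if (!undefOkay)
                        return -1;
                    value = "";     /* an undefined value expands to nothing */
                }
                vlen = strlen(value);
                if (used + vlen >= outSize)
                    return -2;
                memcpy(out + used, value, vlen);
                used += vlen;
                in = name + len + 1;    /* skip past the closing brace */
                continue;
            }
        }
        if (used + 1 >= outSize)
            return -2;
        out[used++] = *in++;
    }
    out[used] = '\0';
    return 0;
}
```

Passing a nonzero undefOkay plays the role of yapp's skip flag, under which undefined names are tolerated inside a skipped ##if branch.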
48
tools/yapp.doc
Normal file
@ -0,0 +1,48 @@
YAPP is a simple macro preprocessor designed to do minor tweaking to
another program's inputs.

In its input, anything of the form ${foo} is expanded with the value of the
variable named foo. It is an error if foo is not defined, except inside an
##if branch that is being skipped. If you need to escape a dollar sign for
some reason, the variable whose name is the empty string, ${}, has the
value "$".

The result of macro expansion is *not* re-expanded. Expansion is done once,
when a line is read, so the value of a definition is expanded when the
definition is made, not each time it is used.

After variable expansion, lines are checked to see if they are control lines.
Control lines begin with ## in the first column. All such lines are deleted
and do not appear in the output. ### begins a comment that runs to the end
of the line. The control lines are:

##set variable=value

    value may have one of the following forms:
    token: Trailing whitespace is stripped. The token may not contain
        any whitespace. Use quotes if it's complicated.
    "string": The string may have embedded quotes, and whitespace after
        the closing quote.
    <<"DELIM": This is a here-document, and the value is all of the following
        lines up until, but not including, the newline that precedes a line
        that consists solely of DELIM, for any DELIM string.
        DELIM must be in quotes. You have two options:
            "DELIM": Expand macros in the body of the here-document.
            'DELIM': Do not expand macros in the here-document.

##include "filename": Insert the named file in place of the current line.

##if num
##if num == num
##if num != num
##if num < num
##if num > num
##if num <= num
##if num >= num
##if token eq token
##if token ne token
##ifdef symbol
##ifndef symbol
##else
##endif
You can figure this one out. Macros in between are expanded as usual
(so the ##else or ##endif may be in a macro expansion), but the result
is ignored. String comparison is allowed only between simple words that
contain no whitespace. ##ifdef symbol is true if ${symbol} is defined, and
##ifndef symbol is true if it is not.
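The numeric ##if forms listed above can be evaluated without Perl's eval. The following C sketch is hypothetical (EvalNumericIf() is not part of yapp) and accepts only what yapp's patterns accept: an unsigned decimal integer, optionally followed by one of ==, !=, <, >, <=, >= and a second unsigned integer, with optional whitespace around the operator. It returns 1 or 0 for the condition and -1 for malformed input.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

static int EvalNumericIf(const char *params)
{
    /* Two-character operators come first so "<=" is not read as "<". */
    static const char *const ops[] = { "==", "!=", "<=", ">=", "<", ">" };
    const char *p = params;
    char *end;
    long left, right;
    size_t i, oplen = 0;
    int op = -1;

    assert(params != NULL);
    if (!isdigit((unsigned char)*p))
        return -1;
    left = strtol(p, &end, 10);
    for (p = end; isspace((unsigned char)*p); p++)
        ;
    if (*p == '\0')
        return left != 0;           /* the "##if num" form */

    for (i = 0; i < sizeof ops / sizeof ops[0]; i++)
    {
        oplen = strlen(ops[i]);
        if (strncmp(p, ops[i], oplen) == 0)
        {
            op = (int)i;
            break;
        }
    }
    if (op < 0)
        return -1;
    for (p += oplen; isspace((unsigned char)*p); p++)
        ;
    if (!isdigit((unsigned char)*p))
        return -1;
    right = strtol(p, &end, 10);
    for (p = end; isspace((unsigned char)*p); p++)
        ;
    if (*p != '\0')
        return -1;                  /* trailing junk after the right operand */

    switch (op)
    {
    case 0: return left == right;
    case 1: return left != right;
    case 2: return left <= right;
    case 3: return left >= right;
    case 4: return left < right;
    default: return left > right;
    }
}
```

A single "=" is rejected, just as the Perl pattern rejects it, rather than being treated as a comparison.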