initial commit

Michele Guerini Rocco 2019-05-15 16:55:03 +02:00
commit 60052b2f16
Signed by: rnhmjoj
GPG Key ID: 91BE884FBA4B591A
28 changed files with 6045 additions and 0 deletions

32
MANIFEST Normal file

@ -0,0 +1,32 @@
1 test-file
2 MANIFEST
D books/
D books/tools/
3 bootstrap
4 bootstrap2
5 sortpages
6 Makefile
7 heap.c
8 heap.h
9 mempool.c
10 mempool.h
11 util.c
12 util.h
13 repair.c
14 subst.c
15 subst.h
16 unmunge.c
17 munge.c
18 yapp.doc
19 yapp
20 psgen
21 makemanifest
D books/ps/
22 prolog.ps
23 charmap.ps
D books/example/
24 Makefile
25 .cvsignore
26 filelist
27 footer.ps
28 us-constitution.gz

477
README Normal file

@ -0,0 +1,477 @@
PREFACE
-------
This book grew out of a project to publish source code for cryptographic
software, namely PGP (Pretty Good Privacy), a software package for the
encryption of electronic mail and computer files. PGP is the most widely
used software in the world for email encryption. Pretty Good Privacy, Inc
(or "PGP") has published the source code of PGP for peer review, a long-
standing tradition in the history of PGP. The first time a fully implemented
cryptographic software package was published in its entirety in book form
was "PGP Source Code and Internals," by Philip Zimmermann, published by The
MIT Press, 1995, ISBN 0-262-24039-4.
Peer review of the source code is important to get users to trust the
software, since any weaknesses can be detected by knowledgeable experts who
make the effort to review the code. But peer review cannot be completely
effective unless the experts conducting the review can compile and test the
software, and verify that it is the same as the software products that are
published electronically. To facilitate that, PGP publishes its source code
in printed form that can be scanned into a computer via OCR (optical
character recognition) technology.
Why not publish the source code in electronic form? As you may know,
cryptographic software is subject to U.S. export control laws and
regulations. The new 1997 Commerce Department Export Administration
Regulations (EAR) explicitly provide that "A printed book or other printed
material setting forth encryption source code is not itself subject to the
EAR." (see 15 C.F.R. §734.3(b)(2)). PGP, in an overabundance of caution,
has only made available its source code in a form that is not subject to
those regulations. So, books containing cryptographic source code may be
published, and after they are published they may be exported, but only
while they are still in printed form.
Electronic commerce on the Internet cannot fully be successful without
strong cryptography. Cryptography is important for protecting our privacy,
civil liberties, and the security of our personal and business transactions
in the information age. The widespread deployment of strong cryptography
can help us regain some of the privacy and security that we have lost due
to information technology. Further, strong cryptography (in the form of
PGP) has already proven itself to be a valuable tool for the protection of
human rights in oppressive countries around the world, by keeping those
governments from reading the communications of human rights workers.
This book of tools contains no cryptographic software of any kind, nor does
it call, connect, nor integrate in any way with cryptographic software. But
it does contain tools that make it easy to publish source code in book form.
And it makes it easy to scan such source code in with OCR software rapidly
and accurately.
Philip Zimmermann
prz@acm.org
November 1997
INTRODUCTION
------------
This book contains tools for printing computer source code on paper in
human-readable form and reconstructing it exactly using automated tools.
While standard OCR software can recover most of the graphic characters,
non-printing characters like tabs, spaces, newlines and form feeds cause
problems.
In fact, these tools can print any ASCII text file; it's just that the
attention these tools pay to spacing is particularly valuable for computer
source code. The two-dimensional indentation structure of source code is
very important to its comprehensibility. In some cases, distinctions
between non-printing characters are critical: the standard make utility
will not accept spaces where it expects to see a tab character.
Producing a byte-for-byte identical copy of the original is also valuable
for authentication, as you can verify a checksum.
There are five problems we have addressed:
1. Getting good OCR accuracy.
2. Preserving whitespace.
3. Preserving lines longer than can be printed on the page.
4. Dealing with data that isn't human-readable.
5. Detecting and correcting any residual errors.
The first problem is partly addressed by using a font designed for OCR
purposes, OCR-B. OCR-A is a very ugly font that contains only the digits 0
through 9 and a few special punctuation symbols. OCR-B is a very readable
monospaced font that contains a full ASCII set, and has been popular as a
font on line printers for years because it distinguishes ambiguous
characters and is clear even if fuzzy or distorted.
The most unusual thing about the OCR-B font is the way that it prints a
lower-case letter l, with a small hook on the bottom, something like an
upper-case L. This is to distinguish it from the numeral 1. We also made
some modifications to the font, to print the numeral 0 with a slash, and
to print the vertical bar in a broken form. Both of these are such common
variants that they should not present any intelligibility barrier. Finally,
we print the underscore character in a distinct manner that is hopefully
not visually distracting, but is clearly distinguishable from the minus
sign even in the absence of a baseline reference.
The most significant part of getting good OCR accuracy is, however, using
the OCR tools well. We've done a lot of testing and experimentation and
present here a lot of information on what works and what doesn't.
To preserve whitespace, we added some special symbols to display spaces,
tabs, and form feeds. A space is printed as a small triangular dot
character, while a hollow rightward-pointing triangle (followed by blank
spaces to the right tab stop) signifies a tab. A form feed is printed as
a yen symbol, and the printed line is broken after the form feed.
Making the dot triangular instead of square helps distinguish it from a
period. To reduce the clutter on the page and make the text more readable,
the space character is only printed as a small dot if it follows a blank
on the page (a tab or another space), or comes immediately before the end
of the line. Thus, the reader (human or software) must be able to
distinguish one space from no spaces, but can find multiple spaces by
counting the dots (and adding one).
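The display rule for spaces can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not the munge source: the name show_spaces is hypothetical, the Latin-1 center dot stands in for the printed triangular dot, and tabs are passed through untouched.

```python
DOT = "\u00b7"  # assumption: center dot as a stand-in for the printed triangle


def show_spaces(line):
    """Return the line with the spaces that would be printed as dots marked."""
    out = []
    for i, ch in enumerate(line):
        if ch != " ":
            out.append(ch)
            continue
        # A space is made visible if it follows a blank (space or tab)...
        follows_blank = i > 0 and line[i - 1] in " \t"
        # ...or if it is the last character on the line.
        at_line_end = i == len(line) - 1
        out.append(DOT if follows_blank or at_line_end else " ")
    return "".join(out)
```

With this rule a run of three spaces shows up as one blank followed by two dots, which matches the "count the dots and add one" reading.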
The format is designed so that 80 characters (the still-common punched-card
line length), plus checksums, can be printed on one line of an 8.5x11" (or
A4) page. Longer lines are managed with the simple technique of
appending a big ugly black blob to the first part of the line indicating
that the next printed line should be concatenated with the current one
with no intervening newline. Hopefully, its use is infrequent.
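As a rough sketch of the continuation technique, assuming the break happens after 79 characters (as the scanning chapter states) and using a pilcrow as a stand-in for the black blob: the function names are hypothetical, and the sketch counts characters rather than printed columns, so it ignores tab expansion.

```python
CONT = "\u00b6"  # assumption: pilcrow as a stand-in for the printed blob


def split_long_line(line, width=80):
    """Break a line longer than `width` into pieces ending in the marker."""
    pieces = []
    while len(line) > width:
        # Keep width - 1 characters and use the last column for the marker.
        pieces.append(line[: width - 1] + CONT)
        line = line[width - 1:]
    pieces.append(line)
    return pieces


def join_printed_lines(printed):
    """Undo split_long_line: glue each marked piece to the piece after it."""
    lines, pending = [], ""
    for piece in printed:
        if piece.endswith(CONT):
            pending += piece[:-1]
        else:
            lines.append(pending + piece)
            pending = ""
    if pending:
        lines.append(pending)
    return lines
```

A line of exactly 80 characters is left alone; only longer lines pick up the marker.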
While ASCII text is by far the most popular form, some source code is not
readable in the usual way. It may be an audio clip, a graphic image bitmap,
or something else that is manipulated with a specialized editing tool. For
printing purposes, these tools just print any such files as a long string
of gibberish in a 64-character set designed to be easy to OCR unambiguously.
Although the tools recognize such binary data and apply extra consistency
checks, that can be considered a separate step.
Finally, the problem of residual errors arises. OCR software is not perfect,
and uses a variety of heuristics and spelling-check dictionaries to clean up
any residual errors in human-language text. This isn't reliable enough for
source code, so we have added per-page and per-line checksums to the printed
material, and a series of tools to use those checksums to correct any
remaining errors and convert the scanned text into a series of files again.
This "munged" form is what you see in most of the body of this book. We
think it does a good job of presenting source code in a way that can be read
easily by both humans and computers.
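To make the idea of a per-line checksum concrete, here is a minimal sketch. The real layout and polynomial used by munge are not described in this chapter, so everything specific here is an assumption: a 16-bit CRC with polynomial 0x1021, printed as four hex digits in front of the text, and the hypothetical names crc16, tag_line and check_line.

```python
def crc16(data, crc=0):
    """Bit-at-a-time CRC-16 (polynomial 0x1021, initial value 0)."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def tag_line(text):
    """Prefix a line with its checksum (hypothetical 'hhhh text' layout)."""
    return "%04x %s" % (crc16(text.encode("latin-1", "replace")), text)


def check_line(printed):
    """Return True if the checksum prefix matches the rest of the line."""
    tag, _, text = printed.partition(" ")
    if len(tag) != 4:
        return False
    try:
        expected = int(tag, 16)
    except ValueError:
        return False
    return crc16(text.encode("latin-1", "replace")) == expected
```

A single misread character, such as l scanned as 1, changes the CRC, so the damaged line can be singled out for correction.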
The tools are command-line oriented and a bit clunky. This has a purpose
beyond laziness on the authors' parts: it keeps them small. Keeping them
small makes the "bootstrapping" part of scanning this book easier, since you
don't have the tools to help you with that.
SCANNING
--------
Our tests were done with OmniPage 7.0 on a Power Macintosh 8500/120 and an
HP ScanJet 4c scanner with an automatic document feeder. The first part of
this is heavily OmniPage-specific, as that appears to be the most widely
available OCR software.
The tools here were developed under Linux, and should be generally portable
to any Unix platform. Since this book is about printing and scanning source
code, we assume the readers have enough programming background to know how
to build a program from a Makefile, understand the hazards of CR, LF or CRLF
line endings, and such minor details without explicit mention.
The first step to getting OmniPage 7 to work well is to set it up with
options to disable all of its more advanced features for preserving font
changes and formatting. Look in the Settings menu.
· Create a Zone Contents File with all of ASCII in it, plus the extra
bullet, currency, yen and pilcrow symbols. Name it "Source Code".
· Create a Source Code style set. Within it, create a Source Code zone style
and make it the default.
· Set the font to something fixed-width, like Courier.
· Set a fixed font size (10 point) and plain text, left-aligned.
· Set the tab character to a space.
· Set the text flow to hard line returns.
· Set the margins to their widest.
· The font mapping options are irrelevant.
Go to the settings panel and:
· Under Scanner, set the brightness to manual. With careful setting of the
threshold, this generates much better results than either the automatic
threshold or the 3D OCR. Around 144 has been a good setting for us; you
may want to start there.
· Under OCR, you'll build a training file to use later, but turn off
automatic page orientation and select your Source Code style set in the
Output Options. Also set a reasonable reject character. (For testing, we
used the pi symbol, which came across from the Macintosh as a weird
sequence, but you can use anything as long as you make the appropriate
definition in subst.c.)
Do an initial scan of a few pages and create a manual zone encompassing
all of the text. Leave some margin for page misalignment, and leave space
on the sides for the left-right shift caused by the book binding being in
different places on odd and even pages.
Set the Zone Contents and the Style set to the Source Code settings. After
setting the Style Set, the Zone Style should be automatically set correctly
(since you set Source Code as the default).
Then save the Zone Template, and in the pop-up menu under the Zone step on
the main toolbar you can now select it.
Now we're ready to get characters recognized. The first results will be
terrible, with lots of red (unrecognizable) and green (suspicious) text in
the recognized window. Some tweaking will improve this enormously.
The first step is setting a good black threshold. Auto brightness sets the
threshold too low, making the character outlines bleed and picking up a lot
of glitches on mostly-blank pages. Try training OCR on the few pages you've
scanned and look at the representative characters. Adjust the threshold so
the strokes are clear and distinct, neither so thin they are broken nor so
thick they smear into each other. The character that bleeds worst is
lowercase w, while the underscore and tab symbols have the thinnest lines
that need worry.
You'll have to re-scan (you can just click the AUTO button) until you get
satisfactory results.
The next step is training. You should scan a significant number of pages
and teach OmniPage about any characters it has difficulty with. There are
several characters which have been printed in unusual ways which you must
teach OmniPage about before it can recognize them reliably. We also have
some characters that are unique, which the tools expect to be mapped to
specific Latin-1 characters to be processed.
The characters most in need of training are as follows:
· Zero is printed 'slashed.'
· Lowercase L has a curled tail to distinguish it clearly from other
vertical characters like 1 and I.
· The or-bar or pipe symbol '|' is printed "broken" with a gap in the
middle to distinguish it similarly.
· The underscore character has little "serifs" on the end to distinguish
it from a minus sign. We also raised it just a tad higher than the
normal underscore character, which was too low in the character cell to
be reliably seen by OmniPage.
· Tabs are printed as a hollow right-pointing triangle, followed by blanks
to the correct alignment position. If not trained enough, OmniPage
guesses this is a capital D. You should train OmniPage to recognize this
symbol as a currency symbol (Latin-1 244).
· Any spaces in the original that follow a space, or a blank on the printed
page, are printed as a tiny black triangle. You should train OmniPage to
recognize this as a center dot or bullet (Latin-1 267). We didn't use a
standard center dot because OmniPage confused it with a period.
· Any form feeds in the original are printed as a yen currency symbol
(Latin-1 245).
· Lines over 80 columns long are broken after 79 columns by appending a big
ugly black block. You should train OmniPage to recognize this as a
pilcrow (paragraph symbol, Latin-1 266). We did this because after
deciding something black and visible was suitable, we found out the font
we used doesn't have a pilcrow in it.
The zero and the tab character, because of their frequency, deserve special
attention.
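The Latin-1 codes quoted in the list above only match the named symbols when read as octal (octal 244 is the currency sign, for instance). The following sketch spells out that reading; the dictionary name and its keys are hypothetical labels chosen for the illustration.

```python
# Octal Latin-1 codes from the list above, keyed by what they stand for.
SPECIAL_CODES = {
    "tab": 0o244,                # currency sign
    "form feed": 0o245,          # yen sign
    "line continuation": 0o266,  # pilcrow
    "visible space": 0o267,      # center dot
}


def latin1_char(code):
    """Decode a single Latin-1 code point into a Python string."""
    return bytes([code]).decode("latin-1")


def special_characters():
    """Map each role to the character OmniPage should be trained to emit."""
    return {role: latin1_char(code) for role, code in SPECIAL_CODES.items()}
```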
In addition, look for any unrecognized characters (in red) and retrain those
pages. If you get an unrecognized character, that character needs training,
but Caere says that "good examples" are best to train on, so if the training
doesn't recognize a slightly fuzzy K, and there's a nice crisp K available
to train on, use that.
Other things that need training:
· ~ (tilde), ^ (caret), ` (backquote) and ' (quote). These get dropped
frequently unless you train them.
· i, j and ; (semicolon). These get mixed up.
· 3 and S. These also get mixed up.
· Q can fail to be recognized.
· C and [ can be confused.
· c/C, o/O, p/P, s/S, u/U, v/V, w/W, y/Y and z/Z are often confused. This
can be helped by some training.
· r gets confused with c and n. I don't understand c, but it happens.
· f gets confused with i.
The OCR training pages have lots of useful examples of troublesome
characters. Scan a few pages of material, training each page, then scan a
few dozen pages and look for recognition problems. Look for what OmniPage
reports as troublesome, and when you have the repair program working, use
it to find and report further errors. Train a few pages particularly dense
in problems and append the troublesome characters to the training file, then
re-recognize the lot.
Double-check your training file for case errors. It's easy to miss the shift
key in the middle of a lot of training, and the mistake will produce terrible
results even though OmniPage won't report anything amiss. We have spent a while
wondering why OmniPage wasn't recognizing capital S or capital W, only to
find that OmniPage was just doing what it was trained to do.
We have heard some reports that OmniPage has problems with large training
files. We have observed OmniPage suffering repeatable internal errors
sometimes after massive training additions, but they were cured by deleting
a few training images. Appending more training images to the training file
did not cause the problem to re-appear.
Repairing the OCR results
If the only copy of the tools you have is printed in this book, see the next
chapter on bootstrapping at this point. Here, we assume that you have the
tools and they work.
When you have some reasonable OCR results, delete any directory pages. With
no checksum information, they just confuse the postprocessing tools. (The
tools will just stop with an error when they get to the "uncorrectable"
directory name and you'll have to delete it then, so it's not fatal if you
forget.) Copy the data to a machine that you have the repair and unmunge
utilities on.
The repair utility attempts automatic table-driven correction of common
scanning errors. You have to recompile it to change the tables, but are
encouraged to if you find a common problem that it does not correct reliably.
If it gets stuck, it will deposit you into your favorite editor on or
slightly after the offending line. (The file you will be editing is the
unprocessed portion of the input.) After you correct the problem and quit
the editor, repair will resume.
"Your favorite editor" is taken from the $VISUAL and $EDITOR environment
variables, or the -e option to repair.
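A possible way to express that lookup is shown below. The text does not say which source wins when several are set, so the order used here (the -e option, then $VISUAL, then $EDITOR) and the fallback to vi are assumptions, as is the name pick_editor.

```python
def pick_editor(cli_editor, env):
    """Choose an editor command from the -e value and the environment.

    cli_editor: value given with -e, or None if the option was not used.
    env: mapping of environment variables, e.g. os.environ.
    """
    for candidate in (cli_editor, env.get("VISUAL"), env.get("EDITOR")):
        if candidate:
            return candidate
    return "vi"  # assumed fallback; the real default is not documented here
```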
The repair utility never alters the original input file. It will produce
corrected output for file in file.out, and when it has to stop, it writes
any remaining uncorrected input back out to file.in (via a temporary
file.dump) and lets you edit this file. If you re-run repair on file and
file.in exists, repair will restart from there, so you may safely quit and
re-run repair as often as you like. (But if you change the input file, you
need to delete the .in file for repair to notice the change.)
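The restart rule amounts to a small decision about which file to read. In this sketch the function name and the injectable exists parameter are hypothetical; only the .in and .out suffixes come from the description above.

```python
import os


def repair_files(path, exists=os.path.exists):
    """Return (file to read, file to write corrected output to).

    If a leftover path.in exists, work resumes from it; otherwise the
    untouched original input is read.
    """
    resume = path + ".in"
    source = resume if exists(resume) else path
    return source, path + ".out"
```

This also shows why a changed input goes unnoticed until the .in file is deleted: as long as it exists, the original is never reopened.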
Statistics on repair's work are printed to file.log. This is an excellent
place to look to see if any characters require more training.
As it works, repair prints the line it is working on. If you see it make a
mistake or get stuck, you can interrupt it (control-C or whatever is
appropriate), and it will immediately drop into the editor. If you interrupt
it a second time, it will exit rather than invoking the editor. If the
editor returns a non-zero result code (fails), repair will also stop. (E.g.
:cq in vim.)
One thing that repair fixes without the least trouble is the number of
spaces expected after a printing tab character. It's such an omnipresent OCR
software error that repair doesn't even log it as a correction.
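The tab correction can be pictured as follows. Because a real space that follows a tab is printed as a dot, any plain blanks after a tab symbol are padding, and their correct number depends only on the column. The sketch assumes tab stops every 8 columns and uses the currency sign for the tab symbol; fix_tab_padding is a hypothetical name, not the routine in repair.

```python
TAB = "\u00a4"  # currency sign, the trained stand-in for the tab symbol


def fix_tab_padding(line, tab_width=8):
    """Replace the blanks after each tab symbol with the right number."""
    out = ""
    i = 0
    while i < len(line):
        ch = line[i]
        out += ch
        i += 1
        if ch == TAB:
            # Drop however many blanks the OCR software produced...
            while i < len(line) and line[i] == " ":
                i += 1
            # ...and pad to the next multiple of tab_width instead.
            out += " " * (-len(out) % tab_width)
    return out
```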
In some cases, repair can miscorrect a line and go on to the next line,
possibly even more than once, finally giving up a few lines below the actual
error. If you are having trouble spotting the error, one helpful trick is to
exit the editor and let repair try to fix the page again, but interrupt it
while it is still working on the first line, before it has found the
miscorrection.
The Nasty Lines
Some lines of code, particularly those containing long runs of underscore or
minus characters, are particularly difficult to scan reliably. The repair
program has a special "nasty lines" feature to deal with this. If a file
named "nastylines" (or as specified by the -l option) exists, its lines are
checksummed and are considered as total replacements for any input line with
the same checksum. So, for example, if you place a blank line in the
nastylines file, any scanner noise on blank lines will be ignored.
The "nastylines" file is re-read every time repair restarts after an edit,
so you can add more lines as the program runs. (The error-correction patterns
should be done this way, too, but that'll have to wait for the next release.)
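In outline, the nasty-lines lookup is a table from checksum to known-good text. The sketch below uses zlib.crc32 purely as a stand-in for whatever checksum the printed lines carry, and the function names are hypothetical.

```python
import zlib


def line_checksum(text):
    """Stand-in checksum; the printed pages use their own CRC."""
    return zlib.crc32(text.encode("latin-1", "replace"))


def build_nasty_table(nasty_lines):
    """Index each known-good line by its checksum."""
    return {line_checksum(line): line for line in nasty_lines}


def repair_line(expected, scanned, table):
    """Return the corrected line, or None if it still needs attention."""
    if line_checksum(scanned) == expected:
        return scanned            # the scan was already right
    return table.get(expected)    # total replacement, if one is known
```

Rebuilding the table after every edit is cheap, which fits the way the file is re-read each time repair restarts.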
Sortpages
If, in the course of scanning, the pages have been split up or have gotten
out of order, a perl script called sortpages can restore them to the proper
order. It can merge multiple input files, discard duplicates, and warn about
any missing pages it encounters. This script requires that the pages have
been repaired, so that the page headers can be read reliably. The repair
program does not care about the order it works on pages in; it examines each
page independently. Unmunge, however, does need the pages in order.
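The job sortpages does can be sketched like this. The page header format is not described here, so the sketch assumes the page number has already been extracted and that each page arrives as a (number, text) pair; the name sort_pages is hypothetical and this is not the perl script itself.

```python
def sort_pages(pages):
    """Order pages by number, drop duplicates, and list the gaps.

    pages: iterable of (page_number, text) pairs from any number of files.
    Returns (texts in page order, sorted list of missing page numbers).
    """
    by_number = {}
    for number, text in pages:
        by_number.setdefault(number, text)  # keep the first copy seen
    if not by_number:
        return [], []
    ordered = [by_number[n] for n in sorted(by_number)]
    first, last = min(by_number), max(by_number)
    missing = [n for n in range(first, last + 1) if n not in by_number]
    return ordered, missing
```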
Unmunging
After repair has finished its work, the unmunge program strips out the
checksums and, based on the page headers, divides the data up among various
files. Its first argument is the file to unpack. The optional second argument
is a manifest file that lists all of the files and the directories they go
in. Supplying this (an excellent idea) lets unmunge recreate a directory
hierarchy and warn about missing files.
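Judging from the MANIFEST shipped with these tools, a manifest line is either "D directory/" or a file number followed by a file name. The sketch below reads it on the assumption that each D line sets the directory for the numbered files after it; that interpretation, and the name manifest_paths, are guesses made for the illustration rather than a description of unmunge.

```python
def manifest_paths(lines):
    """Map file numbers to full paths using the most recent D line."""
    current_dir = ""
    paths = {}
    for line in lines:
        key, _, name = line.strip().partition(" ")
        if not name:
            continue  # blank or malformed line
        if key == "D":
            current_dir = name
        elif key.isdigit():
            paths[int(key)] = current_dir + name
    return paths
```

For example, the lines "D books/tools/" and "3 bootstrap" together yield books/tools/bootstrap for file 3.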
When you have unmunged everything and reconstructed the original source code,
you are done. Unmunge verifies all of the checksums independently of repair,
as a sanity check, and you can have high confidence that the files are
exactly the same as the originals that were printed.
BOOTSTRAPPING
-------------
There's a problem using the postprocessing tools to correct OCR errors, when
the code being OCRed is the tools themselves. We've tried to provide a
reasonably easy way to get the system up and running starting from nothing
but a copy of OmniPage.
You could just scan all of the tools in, correct any errors by hand, delete
the error-checking information in a text editor, and compile them. But
finding all the errors by hand is painful in a body of code that large.
With the aid of perl (version 5), which provides a lot of power in very
little code, we have provided some utilities to make this process easier.
The first-stage bootstrap is a one-page perl script designed to be as small
and simple as possible, because you'll have to hand-correct it. It can verify
the checksums on each line, and drop you into the editor on any lines where
an error has occurred. It also knows how to strip out the visible spaces and
tabs, how to correct spacing errors after visible tab characters, and how to
invoke an editor on the erroneous line.
Scan in the first-stage bootstrap as carefully as possible, using OmniPage's
warnings to guide you to any errors, and either use a text editor or the
one-line perl command at the top of the file to remove the checksums and
convert any funny printed characters to whitespace form.
The first thing to do is try running it on itself, and correct any errors you
find this way. Note that the script writes its output to the file named in
the page header, so you should name your hand-corrected version differently
(or put it in a different directory) to avoid having it overwritten.
The second-stage bootstrap is a much denser one-pager, with better error
detection; it can detect missing lines and missing pages, and takes an
optional second argument of a manifest file which it can use to put files
in their proper directories. It's not strictly necessary, but it's only one
more (dense) page and you can check it against itself and the original
bootstrap.
Both of the bootstrap utilities can correct tab spacing errors in the OCR
output. Although this doesn't matter in most source code, it is included
in the checksums.
Once you have reached this point, you can scan in the C code for repair and
unmunge. The C unmunge is actually less friendly than the bootstrap
utilities, because it is only intended to work with the output of repair.
It is, however, much faster, since computing CRCs a bit at a time in an
interpreted language is painfully slow for large amounts of data. It can
also deal with binary files printed in radix-64.
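The cost of computing a CRC one bit at a time is easy to see by putting it next to a table-driven version, which does one lookup per byte instead of eight shift-and-test steps. The polynomial below (0x1021, initial value 0) is chosen only for illustration and is not claimed to be the one these tools use; the function names are hypothetical.

```python
POLY = 0x1021  # illustrative 16-bit polynomial, not taken from the tools


def _step_byte(crc):
    """Run eight shift-and-xor steps on a 16-bit register."""
    for _ in range(8):
        crc = ((crc << 1) ^ POLY) if crc & 0x8000 else (crc << 1)
        crc &= 0xFFFF
    return crc


def crc16_bitwise(data):
    """Bit-at-a-time CRC: eight inner steps for every input byte."""
    crc = 0
    for byte in data:
        crc = _step_byte(crc ^ (byte << 8))
    return crc


# Precompute the effect of each possible top byte once.
TABLE = [_step_byte(i << 8) for i in range(256)]


def crc16_table(data):
    """Table-driven CRC: one lookup and one shift per input byte."""
    crc = 0
    for byte in data:
        crc = ((crc << 8) & 0xFFFF) ^ TABLE[(crc >> 8) ^ byte]
    return crc
```

Both functions return the same value for any input; only the amount of work per byte differs.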
PRINTING
--------
Despite the title of this book, this process of producing a book is not well
documented, since it's been evolving up to the moment of publication. There
is, however, a very useful working example of how to produce a book
(strikingly similar to this book) in the example directory, all controlled
by a Makefile.
Briefly, a master perl script called psgen takes three parameters: a file
list, a page numbers file to write to, and a volume number (which should
always be 1 for a one-volume book). It runs the listed files through the
munge utility, wraps them in some simple PostScript, and prepends a prolog
that defines the special characters and PostScript functions needed by the
text.
The file list also includes per-file flags. The most important is the
text/binary marker. Text files can also have a tab width specified, although
munge knows how to read Emacs-style tab width settings from the end of a
source file.
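Going by the example filelist, each line starts with a flag word followed by a path: T for text, B for binary, D for a directory, with an optional number after T that appears to be the tab width (T4). The parser below is a sketch built on that reading; the meaning of other lines such as "V 1 8" is not stated here, so they are returned as "other" without interpretation. The function name is hypothetical.

```python
KINDS = {"T": "text", "B": "binary", "D": "directory"}


def parse_filelist_line(line):
    """Split a filelist line into (kind, tab_width, rest of line)."""
    flag, _, rest = line.strip().partition(" ")
    kind = KINDS.get(flag[:1], "other")
    width = flag[1:]
    # Only text entries carry a tab width, and only when digits follow the T.
    tab_width = int(width) if kind == "text" and width.isdigit() else None
    return kind, tab_width, rest
```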
The prolog is assembled by psgen from various other files and definitions, using
a simple preprocessor called yapp (Yet Another Preprocessor). This process
includes some book-specific information like the page footer.
Producing the final PostScript requires the necessary non-standard fonts
(Futura for the footers and OCRB for the code) and the psutils package,
which provides the includeres utility used to embed the fonts in the
PostScript file. The fonts should go in the books/ps directory, as
"Futura.pfa" and the like.
The pagenums file can be used to produce a table of contents. For this book,
we generated the front matter (such as this chapter) separately, told psgen
to start on the next page after this, and concatenated the resultant
PostScript files for printing. The only trick was making the page footers
look identical.

3
example/.cvsignore Normal file

@ -0,0 +1,3 @@
pagenums
MANIFEST
code.ps

23
example/Makefile Normal file

@ -0,0 +1,23 @@
BOOKROOT=..
TOOLSDIR=$(BOOKROOT)/tools
PSDIR=$(BOOKROOT)/ps
YAPP=$(TOOLSDIR)/yapp
MAKEMANIFEST=$(TOOLSDIR)/makemanifest
PSGEN=BOOKROOT=$(BOOKROOT) $(TOOLSDIR)/psgen
INCLUDERES=(cd $(PSDIR); includeres)
code.ps pagenums: filelist footer.ps MANIFEST books
$(PSGEN) -P2 -l3 -DfooterFile=footer.ps filelist pagenums 1 \
| $(INCLUDERES) > code.ps
books:
ln -s $(BOOKROOT) books
MANIFEST: filelist
$(MAKEMANIFEST) $< > $@
clean:
rm -f `cat .cvsignore`
gv%: %.ps
gv $<

32
example/filelist Normal file

@ -0,0 +1,32 @@
V 1 8
T MANIFEST
D books/
D books/tools/
T books/tools/bootstrap
T books/tools/bootstrap2
T4 books/tools/sortpages
T books/tools/Makefile
T books/tools/heap.c
T books/tools/heap.h
T books/tools/mempool.c
T books/tools/mempool.h
T books/tools/util.c
T books/tools/util.h
T books/tools/repair.c
T books/tools/subst.c
T books/tools/subst.h
T books/tools/unmunge.c
T books/tools/munge.c
T books/tools/yapp.doc
T4 books/tools/yapp
T4 books/tools/psgen
T4 books/tools/makemanifest
D books/ps/
T books/ps/prolog.ps
T books/ps/charmap.ps
D books/example/
T books/example/Makefile
T books/example/.cvsignore
T books/example/filelist
T books/example/footer.ps
B books/example/us-constitution.gz

5
example/footer.ps Normal file

@ -0,0 +1,5 @@
% A program to print the page footer, using the magic P function,
% which takes a string and a font.
(Tools for Publishing Source Code via OCR ) /Futura P
(\343) /Symbol P % Copyright symbol
( 1997 Pretty Good Privacy, Inc.) /Futura P

BIN
example/us-constitution.gz Normal file

Binary file not shown.

68
ps/charmap.ps Normal file

@ -0,0 +1,68 @@
%%BeginResource: procset Latin1-vec 0 0
/Latin1-vec [
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/space /exclam /quotedbl /numbersign
/dollar /percent /ampersand /${rightQuoteGlyph}
/parenleft /parenright /asterisk /plus
/comma /hyphen /period /slash
/${zeroGlyph} /one /two /three
/four /five /six /seven
/eight /nine /colon /semicolon
/less /equal /greater /question
/at /A /B /C
/D /E /F /G
/H /I /J /K
/L /M /N /O
/P /Q /R /S
/T /U /V /W
/X /Y /Z /bracketleft
/backslash /bracketright /asciicircum /${underscoreGlyph}
/${leftQuoteGlyph} /a /b /c
/d /e /f /g
/h /i /j /k
/l /m /n /o
/p /q /r /s
/t /u /v /w
/x /y /z /braceleft
/${barGlyph} /braceright /tilde /.notdef
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef
/space /exclamdown /cent /sterling
/${tabGlyph} /yen /brokenbar /section
/dieresis /copyright /ordfeminine /guillemotleft
/logicalnot /hyphen /registered /macron
/degree /plusminus /twosuperior /threesuperior
/acute /mu /${pilcrowGlyph} /${bulletGlyph}
/cedilla /dotlessi /ordmasculine /guillemotright
/onequarter /onehalf /threequarters /questiondown
/Agrave /Aacute /Acircumflex /Atilde
/Adieresis /Aring /AE /Ccedilla
/Egrave /Eacute /Ecircumflex /Edieresis
/Igrave /Iacute /Icircumflex /Idieresis
/Eth /Ntilde /Ograve /Oacute
/Ocircumflex /Otilde /Odieresis /multiply
/Oslash /Ugrave /Uacute /Ucircumflex
/Udieresis /Yacute /Thorn /germandbls
/agrave /aacute /acircumflex /atilde
/adieresis /aring /ae /ccedilla
/egrave /eacute /ecircumflex /edieresis
/igrave /iacute /icircumflex /idieresis
/eth /ntilde /ograve /oacute
/ocircumflex /otilde /odieresis /divide
/oslash /ugrave /uacute /ucircumflex
/udieresis /yacute /thorn /ydieresis
]def
%%EndResource

306
ps/prolog.ps Normal file

@ -0,0 +1,306 @@
##set pageNumFont="Futura"
##set dirNameFont="Futura-Heavy"
##set fontsNeeded="${font} Symbol Futura Futura-Heavy"
##set includeFontComments=<<"END"
%%IncludeResource: font ${font}
%%IncludeResource: font Symbol
%%IncludeResource: font Futura
%%IncludeResource: font Futura-Heavy
END
##if ${font} eq Courier
##set charShrinkFactor=0.93
##set zeroGlyph=Oslash
##set underscoreGlyph=underscore
##set bulletGlyph=bullet
##set tabGlyph=currency
##set leftQuoteGlyph=quoteleft
##set rightQuoteGlyph=quoteright
##set pilcrowGlyph=paragraph
##set barGlyph=bar
##else
##set charShrinkFactor=1
##set zeroGlyph=Oslash
##set underscoreGlyph=underscore2
##set bulletGlyph=bullet2
##set tabGlyph=tabsym
##set leftQuoteGlyph=grave
##set rightQuoteGlyph=quoteright ### was "acute"
##set pilcrowGlyph=erase
##set barGlyph=orsym
##set do_custom_chars=1
##endif
%!PS-Adobe-3.0
%%Orientation: Portrait
%%Pages: (atend)
%%DocumentNeededResources: font ${fontsNeeded}
%%DocumentMedia: Letter 612 792 74 white ()
%%EndComments
%%BeginDefaults
%%PageMedia: Letter
%%PageResources: font ${fontsNeeded}
%%EndDefaults
%%BeginProlog
%%BeginResource: procset Custom-Preamble 0 0
%
% Document definitions
% (Upper case to avoid collisions)
%
% 8.5x11 paper is 612x792 points, but 24 points near the edge or so
% shouldn't be used.
/Topmargin 770 def
/Leftmargin 30 def
/Rightmargin 612 Leftmargin sub def
/Botmargin 22 def
/Bindoffset 40 def
/Lineskip -10 def
% How much to shrink characters by?
/Factor ${charShrinkFactor} def
/Fontsize 9.5 Factor mul def
% (1000 units is std height, so Courier at 6/10 aspect ratio is 600.
% Widen to make up for scaling loss.
/Charwidth
Rightmargin Leftmargin sub Bindoffset sub 87 div Fontsize div 1000 mul
def
% Print a header (expects page number on stack)
/OddPageStart
{ save exch /MyFont findfont Fontsize scalefont setfont
/CurrentLeft Leftmargin Bindoffset add def
/CurrentRight Rightmargin def
CurrentLeft Topmargin moveto } def
/EvenPageStart
{ save exch /MyFont findfont Fontsize scalefont setfont
/CurrentLeft Leftmargin def
/CurrentRight Rightmargin Bindoffset sub def
CurrentLeft Topmargin moveto } def
% /MyFont findfont [Fontsize 0 0 Fontsize 0 0] makefont setfont
% Print the name of the directory in a large font
/DirPage
{
/${dirNameFont} findfont 14 scalefont setfont
0 -10 rmoveto (Directory) show
CurrentLeft 30 add currentpoint exch pop 20 sub moveto show
} def
% Advance a line
/L {show CurrentLeft currentpoint exch pop Lineskip add moveto} bind def
% Print the "inside" footer line using P (string font => )
% We do some magic involving redefining P to first measure the
% width of this string and then print it, so you must use it
% to do all printing.
/Foot {
##ifdef footerFile
##include "${footerFile}"
##endif
} def
% /P is defined in the Setup section
% Print an odd footer
/OddPageEnd
{ CurrentLeft Botmargin moveto CurrentRight Botmargin lineto
1 setlinewidth stroke
CurrentLeft Botmargin 10 sub moveto
Foot
10 string cvs dup stringwidth
pop CurrentRight exch sub currentpoint exch pop moveto
/${pageNumFont} P
showpage
restore
} def
% Print an even footer
/EvenPageEnd
{ CurrentLeft Botmargin moveto CurrentRight Botmargin lineto
1 setlinewidth stroke
Leftmargin Botmargin 10 sub moveto
/${pageNumFont} P
CurrentRight FootWidth sub currentpoint exch pop moveto
Foot
showpage
restore
} def
##ifdef do_custom_chars
% A 1000-point OCRB discunderline consists of:
% 111.45 -173.688 moveto
% 609.356 -173.688 lineto
% 609.356 -70.9227 lineto
% 111.45 -70.9227 lineto
% closepath
% 720.0 -0.0 moveto
% Line thickness is
% 102.7653 pts.
% This would suggest the following values:
/underleft 111.45 def
/underright 609.356 def
/underthick 102.7653 def
/underup underthick def
/underdown 0 def
/underserif 25 def
% These look better in GhostScript, but not on a real Adobe rasterizer
%/underright 600 def
%/underleft 100 def
%/underthick 75 def
% The default bullet character is
% 254.0 341.0 moveto
% 254.0 170.0 lineto
% 465.0 170.0 lineto
% 465.0 341.0 lineto
% closepath
% Our modified version is based on:
/bullwid 204 def
/bullht 176.75 def
/bullleft 254 341 add bullwid sub 2 div def
/bullright 254 341 add bullwid add 2 div def
/bullbot 254 def
/bulltop bullbot bullht add def
% And a custom-created tab symbol
/tableft 250 def
/tabright 550 def
/tabtop 550 def
/tabbot 50 def
/tablinewidth 35 def
% Let's try a vertical bar
% OCRB defines (|)
% 411.062 -173.688 moveto
% 411.062 741.043 lineto
% 308.297 741.043 lineto
% 308.297 -173.688 lineto
% closepath
% 720.0 -0.0 moveto
/orleft 308.297 def
/orright 411.062 def
/orbot -173.688 def
/ortop 741.043 def
/orbreak 150 def % Width of break
/orbbot ortop orbot add orbreak sub 2 div def % Bottom of break
/orbtop ortop orbot add orbreak add 2 div def % Top of break
##endif
% newfontname encoding-vec fontname -> - make a new encoded font
/MF2 {
% Make a dict for the new font, with room for the /Metrics
findfont dup length 1 add dict begin
% Copy everything except the FID entry
{1 index /FID eq {pop pop} {def} ifelse} forall
% Set the encoding vector
/Encoding exch def
##ifdef do_custom_chars
% Create a new expanded CharStrings dictionary
CharStrings dup length 5 add dict
begin { def } forall
% Create a custom underscore character
/underscore2 {
pop
//Charwidth 0 % width, bounding box follows
//underleft //underdown neg //underright //underthick //underup add
setcachedevice
//underleft //underthick //underup add moveto
//underleft //underserif add //underthick //underup add lineto
//underleft //underserif add //underthick lineto
//underright //underserif sub //underthick lineto
//underright //underserif sub //underthick //underup add lineto
//underright //underthick //underup add lineto
//underright //underdown neg lineto
//underright //underserif sub //underdown neg lineto
//underright //underserif sub 0 lineto
//underleft //underserif add 0 lineto
//underleft //underserif add //underdown neg lineto
//underleft //underdown neg lineto
closepath fill
} bind def
% Create a custom bullet character.
/bullet2 {
pop
//Charwidth 0 % width, bounding box follows
//bullleft //bullbot //bullright //bulltop
setcachedevice
//bullleft //bullbot moveto
//bullleft //bullright add 2 div //bulltop lineto
//bullright //bullbot lineto
closepath fill
} bind def
% Create a custom tab character.
/tabsym {
pop
//Charwidth 0 % width, bounding box follows
//tableft //tablinewidth sub //tabbot //tablinewidth sub
//tabright //tablinewidth add //tabtop //tablinewidth add
setcachedevice
//tablinewidth setlinewidth
true setstrokeadjust
0 setlinejoin
//tableft //tabbot moveto
//tabright //tabtop //tabbot add 2 div lineto
//tableft //tabtop lineto
closepath stroke
} bind def
/orsym {
pop
//Charwidth 0 % width, bounding box follows
//orleft //orbot //orright //ortop
setcachedevice
//orleft //orbot moveto
//orleft //orbbot lineto
//orright //orbbot lineto
//orright //orbot lineto
closepath
//orleft //ortop moveto
//orleft //orbtop lineto
//orright //orbtop lineto
//orright //ortop lineto
closepath fill
} bind def
/CharStrings currentdict end def
##endif
% Create a new dict to be the /Metrics values
CharStrings dup length dict
% Now fill in the metrics dict with the desired width
begin { pop Charwidth def } forall /Metrics currentdict end def
% End of definitions
currentdict end
% Define the font
definefont pop
} bind def
% Check PostScript language level.
/gs_languagelevel /languagelevel where { pop languagelevel } { 1 } ifelse def
%%EndResource
##include "charmap.ps"
${includeFontComments}
%%EndProlog
%%BeginSetup
/MyFont Latin1-vec /${font} MF2
/#copies 1 def
% Compute the width of the /Foot string, by defining P to
% add up the x-width of the characters.
/P { findfont 9 scalefont setfont stringwidth pop add } def
/FootWidth 0 Foot def
% Redefine P to print, as usual
/P { findfont 9 scalefont setfont show } def
%%BeginResource: procset foo 0 0
% This is an example
%%EndResource
%%EndSetup
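
The /Charwidth definition in the prolog above spreads 87 character cells across the printable width and expresses the result in 1000-unit glyph space. A minimal C sketch of the same arithmetic follows. It assumes the value substituted for ${charShrinkFactor} is passed in as `factor`; the function name `char_width_units` and the macro names are illustrative and are not part of the tools.

```c
/* Page geometry copied from the prolog, in PostScript points. */
#define LEFT_MARGIN   30.0
#define RIGHT_MARGIN  (612.0 - LEFT_MARGIN)
#define BIND_OFFSET   40.0
#define COLUMNS       87.0   /* character cells per printed line */

/* Hypothetical helper: glyph advance width in 1000-unit font space,
 * following the /Fontsize and /Charwidth definitions. */
double char_width_units(double factor)
{
    double fontsize = 9.5 * factor;
    double usable = RIGHT_MARGIN - LEFT_MARGIN - BIND_OFFSET;

    return usable / COLUMNS / fontsize * 1000.0;
}
```

With a factor of 1 this comes out near 619 units, a little wider than the nominal 600 mentioned in the comment, which is consistent with the "widen to make up for scaling loss" remark.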

30
tools/Makefile Normal file

@@ -0,0 +1,30 @@
all: unmunge repair munge

OPT = -g -O -W -Wall

COMMON_OBJS = util.o
UNMUNGE_OBJS = $(COMMON_OBJS) unmunge.o
MUNGE_OBJS = $(COMMON_OBJS) munge.o
REPAIR_OBJS = $(COMMON_OBJS) heap.o mempool.o subst.o repair.o

unmunge: $(UNMUNGE_OBJS)
	$(CC) $(OPT) -o $@ $(UNMUNGE_OBJS)

munge: $(MUNGE_OBJS)
	$(CC) $(OPT) -o $@ $(MUNGE_OBJS)

repair: $(REPAIR_OBJS)
	$(CC) $(OPT) -o $@ $(REPAIR_OBJS)

.c.o:
	$(CC) $(OPT) -o $@ -c $<

clean:
	-rm -f *.o munge unmunge repair core *.core

unmunge.o: util.h
munge.o: util.h
repair.o: heap.h mempool.h util.h subst.h
heap.o: heap.h
mempool.o: mempool.h
subst.o: subst.h

68
tools/bootstrap Normal file

@@ -0,0 +1,68 @@
#!/usr/bin/perl -s
#
# bootstrap -- Simpler version of unmunge for bootstrapping
#
# Unmunge this file using:
# perl -ne 'if (s/^ *[^-\s]\S{4,6} ?//) { s/[\244\245\267]/ /g; print; }'
#
# $Id: bootstrap,v 1.15 1997/11/14 03:52:53 mhw Exp $
sub Fatal { print STDERR @_; exit(1); }
sub Max { my ($a, $b) = @_; ($a > $b) ? $a : $b; }
sub TabSkip { $tabWidth - 1 - (length($_[0]) % $tabWidth); }
($tab,$yen,$pilc,$cdot,$tmp1,$tmp2)=("\244","\245","\266","\267","\377","\376");
$editor = $ENV{'VISUAL'} || $ENV{'EDITOR'} || 'vi';
$inFile = $ARGV[0];
doFile: {
open(IN, "<$inFile") || die;
for ($lineNum = 1; ($_ = <IN>); $lineNum++) {
s/^\s+//; s/\s+$//; # Strip leading and trailing spaces
next if (/^$/); # Ignore blank lines
($prefix, $seenCRCStr, $dummy, $_) = /^(\S{2})(\S{4})( (.*))?/;
# Correct the number of spaces after each tab
while (s/$tab( *)/$tmp1 . ($tmp2 x &Max(length($1), &TabSkip($`)))/e) {}
s/ ( +)/" " . ($cdot x length($1))/eg; # Correct center dots
s/$tmp1/$tab/g; s/$tmp2/ /g; # Restore tabs and spaces from correction
s/\s*$/\n/; # Strip trailing spaces, and add a newline
$crc = $seenCRC = 0; # Calculate CRC
for ($data = $_; $data ne ""; $data = substr($data, 1)) {
$crc ^= ord($data);
for (1..8) {
$crc = ($crc >> 1) ^ (($crc & 1) ? 0x8408 : 0);
}
}
if ($crc != hex($seenCRCStr)) { # CRC mismatch
close(IN); close(OUT);
unlink(@filesCreated);
@filesCreated = ();
@oldStat = stat($inFile);
system($editor, "+$lineNum", $inFile);
@newStat = stat($inFile);
redo doFile if ($oldStat[9] != $newStat[9]); # Check mod date
&Fatal("Line $lineNum invalid: $_");
}
if ($prefix eq '--') { # Process header line
($code, $pageNum, $file) = /^(\S{19}) Page (\d+) of (.*)/;
$tabWidth = hex(substr($code, 11, 1));
if ($file ne $lastFile) {
print "$file\n";
&Fatal("$file: already exists\n") if (!$f && (-e $file));
close(OUT);
open(OUT, ">$file") || &Fatal("$file: $!\n");
push(@filesCreated, ($lastFile = $file));
}
} else { # Unmunge normal line
s/$tab( *)/"\t".(" " x (length($1) - &TabSkip($`)))/eg;
s/$yen\n/\f/; # Handle form feeds
s/$pilc\n//; # Handle continuation lines
s/$cdot/ /g; # Center dots -> spaces
print OUT;
}
}
close(IN); close(OUT);
}
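
The TabSkip helper in bootstrap gives the number of padding spaces expected after a tab marker, based on how many characters precede it. The C sketch below shows the forward direction (turning tabs into a visible marker plus padding), in the spirit of MungeLine. It assumes the marker is Latin-1 character 244 octal, as in the Perl script, and that padding is plain spaces; `tab_skip` and `visible_tabs` are hypothetical names, not functions from the tools.

```c
#include <stddef.h>

#define TAB_MARK '\244'  /* Latin-1 currency sign, as used in bootstrap */

/* Padding that follows a tab marker placed after `col` characters.
 * tab_width must be at least 1. */
unsigned tab_skip(unsigned col, unsigned tab_width)
{
    return tab_width - 1 - (col % tab_width);
}

/* Copy `in` to `out`, replacing each tab with a marker and enough spaces
 * to reach the next tab stop. Returns the output length, or -1 if the
 * result (including the terminating NUL) does not fit in `outsize`. */
int visible_tabs(const char *in, unsigned tab_width, char *out, size_t outsize)
{
    size_t j = 0;
    unsigned pad;

    for (; *in != '\0'; in++) {
        if (*in == '\t') {
            pad = tab_skip((unsigned)j, tab_width);
            if (j + 1 + pad >= outsize)
                return -1;
            out[j++] = TAB_MARK;
            while (pad-- > 0)
                out[j++] = ' ';
        } else {
            if (j + 1 >= outsize)
                return -1;
            out[j++] = *in;
        }
    }
    out[j] = '\0';
    return (int)j;
}
```

A tab that already sits one column before a tab stop gets a marker and no padding, which is why the unmunging side can recompute and correct the number of spaces after OCR.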

72
tools/bootstrap2 Normal file

@@ -0,0 +1,72 @@
#!/usr/bin/perl -s
#
# bootstrap2 -- Second stage bootstrapper, a version of unmunge
#
# $Id: bootstrap2,v 1.4 1997/11/14 03:52:54 mhw Exp $
sub Cleanup { close(IN); close(OUT); unlink(@files); @files = (); }
sub Fatal { &Cleanup(); print STDERR @_; exit(1); }
sub TabSkip { $tabWidth - 1 - (length($_[0]) % $tabWidth); }
sub TabFix { my ($needed, $actual) = (&TabSkip($_[0]), length($_[1]));
$tmp1 . ($tmp2 x $needed) . (" " x ($actual - $needed)); }
sub HumanEdit { my ($file, $line, @message) = ($inFile, @_); &Cleanup();
@old = stat($file); system($editor, "+$line", $file); @new = stat($file);
redo doFile if ($old[9] != $new[9]); # Check mod date
&Fatal("Line $line, ", @message); }
($tab,$yen,$pilc,$cdot,$tmp1,$tmp2)=("\244","\245","\266","\267","\377","\376");
$editor = $ENV{'VISUAL'} || $ENV{'EDITOR'} || 'vi';
($inFile, $manifest, @rest) = @ARGV;
if ($manifest ne "") { # Read manifest file
open(MANIFEST, "<$manifest") || &Fatal("$manifest: $!\n");
while (<MANIFEST>) { $dir = $1 if /^D\s+(.*)$/;
$index[$1] = $dir . $2 if /^(\d+)\s+(.*)$/; }
}
doFile: {
$seenPCRC = $pcrc1 = 0; $lastFlags = 1; $lastFileNum = 0;
open(IN, "<$inFile") || &Fatal("$inFile: $!\n");
for ($line = 1; ($_ = <IN>); $line++) {
s/^\s+//; s/\s+$//; # Strip leading and trailing spaces
next if (/^$/); # Ignore blank lines
($prefix, $seenCRCStr, $dummy, $_) = /^(\S{2})(\S{4})( (.*))?/;
while (s/$tab( *)/&TabFix($`, $1)/eo) {} # Correct spaces after tabs
s/($tmp2| )( +)/$1 . ($cdot x length($2))/ego; # Correct center dots
s/$tmp1/$tab/go; s/$tmp2/ /go; # Restore tabs/spaces from correction
s/\s*$/\n/; # Strip trailing spaces, and add a newline
$crc = 0; $pcrc = $pcrc1; # Calculate CRCs
for ($data = $_; $data ne ""; $data = substr($data, 1)) {
$crc ^= ord($data); $pcrc1 ^= ord($data);
for (1..8) { $crc = ($crc >> 1) ^ (($crc & 1) ? 0x8408 : 0);
$pcrc1 = ($pcrc1 >> 1) ^ (($pcrc1 & 1) ? 0xedb88320 : 0); }
}
($seenPLCRC, $seenCRC) = map { hex($_) } ($prefix, $seenCRCStr);
&HumanEdit($line, "CRC failed: $_") if $crc != $seenCRC;
if ($prefix eq '--') { # Process header line
&HumanEdit($line - 1, "Page CRC failed") if $pcrc != $seenPCRC;
($humanHdr, $pageNum, $file) = /^\S{19} (Page (\d+) of (.*))/;
($vers, $flags, $seenPCRC, $tabWidth, $prodNum, $fileNum) =
map { hex($_) } /^(\S)(\S\S)(\S{8})(\S)(\S{3})(\S{4})/;
if ($fileNum != $lastFileNum) {
print STDERR "MISSING files\n" if $fileNum != $lastFileNum + 1;
&Fatal("Missing pages\n") if $pageNum != 1 || !($lastFlags & 1);
if ($manifest ne "") {
($_ = $index[$fileNum]) =~ m%([^/]*)$%;
&Fatal("Manifest mismatch\n") if ($file ne $1);
($file = $_) =~ s|/+|mkdir($`, 0777), "/"|eg; # mkdir -p
}
&Fatal("$file: already exists\n") if (!$f && (-e $file));
close(OUT); open(OUT, ">$file") || &Fatal("$file: $!\n");
push(@files, $file); print "$fileNum $file\n";
} else {
&Fatal("MISSING pages\n") if ($pageNum != $lastPageNum + 1);
}
($lastFlags,$lastFileNum,$lastPageNum) = ($flags,$fileNum,$pageNum);
$pcrc1 = 0;
} else { # Unmunge normal line
&HumanEdit($line, "CRC failed: $_") if ($pcrc1 >> 24) != $seenPLCRC;
s/$tab( *)/"\t".(" " x (length($1) - &TabSkip($`)))/ego;
s/$yen\n/\f/o; s/$pilc\n//o; s/$cdot/ /go; print OUT;
}
}
}
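
Both bootstrap scripts verify each line with a bitwise CRC-16 (polynomial 0x8408, initial value 0), and bootstrap2 also checks the two-character prefix against the top 8 bits of a running CRC-32 (polynomial 0xedb88320) that restarts at 0 after every page header. A C sketch of those two loops and of the resulting 6-character prefix follows. The function names are illustrative, and the lowercase hex formatting is an assumption based on the sample lines in munge.c; the real encoder lives in util.c, which is not shown here.

```c
#include <stddef.h>
#include <stdio.h>

/* Per-line check: reflected CRC-16, polynomial 0x8408, initial value 0. */
unsigned line_crc16(const unsigned char *data, size_t len)
{
    unsigned crc = 0;
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        crc ^= data[i];
        for (bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1) ? 0x8408u : 0);
    }
    return crc & 0xffffu;
}

/* Running page check: reflected CRC-32, polynomial 0xedb88320.
 * Pass 0 as `crc` for the first line after a page header. */
unsigned long page_crc32_update(unsigned long crc,
                                const unsigned char *data, size_t len)
{
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        crc ^= data[i];
        for (bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1) ? 0xedb88320UL : 0);
    }
    return crc & 0xffffffffUL;
}

/* Hypothetical helper: build the prefix for one munged line (text plus
 * trailing newline) and advance the running page CRC. */
void make_prefix(char out[7], unsigned long *pagecrc,
                 const char *line, size_t len)
{
    const unsigned char *p = (const unsigned char *)line;

    *pagecrc = page_crc32_update(*pagecrc, p, len);
    sprintf(out, "%02lx%04x", (*pagecrc >> 24) & 0xffUL, line_crc16(p, len));
}
```

For the first line of a page consisting of the three bytes "/*\n", this yields the prefix bc38e5, which agrees with the sample page quoted in the munge.c header comment.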

144
tools/heap.c Normal file

@@ -0,0 +1,144 @@
/*
* heap.c -- Simple priority queue. Takes pointers to cost values
* (presumably the first field in a larger structure) and returns
* them in increasing order of cost.
*
* Copyright (C) 1997 Pretty Good Privacy, Inc.
*
* Written by Colin Plumb and Mark H. Weaver
*
* $Id: heap.c,v 1.2 1997/07/05 02:55:23 colin Exp $
*/
#include <stdio.h> /* For fprintf(stderr, "Out of memory") */
#include <stdlib.h> /* For malloc() & co. */
#include "heap.h"
#define HeapParent(i) ((i) / 2)
#define HeapLeftChild(i) ((i) * 2)
#define HeapRightChild(i) ((i) * 2 + 1)
#define HeapElem(h, i) (h)->elems[i]
#define HeapMinElem(h) HeapElem(h, 1)
#define HeapElemCost(e) (*(e))
#define HeapCost(h, i) HeapElemCost(HeapElem(h, i))
#define HeapSize(h) ((h)->numElems)
static void
SiftDown(Heap const *heap, HeapCost *e)
{
HeapIndex size = HeapSize(heap), parent = 1, child;
HeapCost cparent = HeapElemCost(e), cchild;
for (;;) {
child = 2*parent;
if (child > size)
break;
cchild = HeapCost(heap, child);
if (child < size && cchild > HeapCost(heap, child+1)) {
cchild = HeapCost(heap, child+1);
child++;
}
if (cparent <= cchild)
break; /* Stop sifting down */
HeapElem(heap, parent) = HeapElem(heap, child);
parent = child;
}
HeapElem(heap, parent) = e;
}
/* Debug tool: verify heap property */
void
HeapVerify(Heap *heap)
{
HeapIndex i;
for (i = 2; i <= HeapSize(heap); i++)
if (HeapCost(heap, i) < HeapCost(heap, HeapParent(i)))
fprintf(stderr, "DEBUG: HeapVerify failed at elem %u\n", i);
}
/* Remove and return the minimum cost from the heap. */
HeapCost *
HeapGetMin(Heap *heap)
{
HeapIndex lastElem = HeapSize(heap);
HeapCost *retval;
if (!lastElem)
return NULL;
retval = HeapMinElem(heap);
HeapSize(heap) = lastElem-1;
SiftDown(heap, HeapElem(heap, lastElem));
return retval;
}
/* Helper - set heap size, reallocating if needed */
static void
HeapResize(Heap *heap, HeapIndex newNumElems)
{
if (newNumElems >= heap->elemsAllocated) {
HeapIndex newAllocSize = heap->elemsAllocated * 2;
if (newAllocSize <= newNumElems)
newAllocSize = newNumElems + 1;
heap->elems = (HeapCost **)realloc((void *)heap->elems,
sizeof(*heap->elems) * newAllocSize);
if (heap->elems == NULL) {
fprintf(stderr, "Fatal error: Out of memory growing heap\n");
exit(1);
}
heap->elemsAllocated = newAllocSize;
}
heap->numElems = newNumElems;
}
/* Add an element to the heap */
void
HeapInsert(Heap *heap, HeapCost *newElem)
{
HeapIndex parent, i = ++HeapSize(heap);
HeapCost cost = HeapElemCost(newElem);
HeapResize(heap, i);
/* Sift up until parent = 0 */
while ((parent = HeapParent(i)) && HeapCost(heap, parent) > cost) {
HeapElem(heap, i) = HeapElem(heap, parent);
i = parent;
}
heap->elems[i] = newElem;
}
/* Initialize a new heap */
void
HeapInit(Heap *heap, HeapIndex initSize)
{
initSize++; /* Add one for temporary element */
if (initSize < 1)
initSize = 1;
heap->elems = (HeapCost **)malloc(initSize * sizeof(*heap->elems));
if (heap->elems == NULL) {
fprintf(stderr, "Fatal error: Out of memory creating heap\n");
exit(1);
}
heap->elemsAllocated = initSize;
heap->numElems = 0;
}
/* Free up a heap's resources. */
void
HeapDestroy(Heap *heap)
{
free((void *)heap->elems);
heap->elemsAllocated = 0;
heap->numElems = 0;
heap->elems = NULL;
}
/*
* Local Variables:
* tab-width: 4
* End:
* vi: ts=4 sw=4
* vim: si
*/

43
tools/heap.h Normal file

@@ -0,0 +1,43 @@
/*
* heap.h -- Simple priority queue. Takes pointers to cost values
* (presumably the first field in a larger structure) and returns
* them in increasing order of cost.
*
* Copyright (C) 1997 Pretty Good Privacy, Inc.
*
* Written by Colin Plumb and Mark H. Weaver
*
* $Id: heap.h,v 1.6 1997/10/31 04:22:46 mhw Exp $
*/
#ifndef HEAP_H
#define HEAP_H 1
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
typedef int HeapCost;
#define COST_INFINITY INT_MAX
typedef unsigned HeapIndex;
typedef struct Heap {
HeapCost **elems;
HeapIndex numElems, elemsAllocated;
} Heap;
void HeapInit(Heap *heap, HeapIndex initSize);
void HeapDestroy(Heap *heap);
void HeapInsert(Heap *heap, HeapCost *newElem);
HeapCost *HeapGetMin(Heap *heap);
void HeapVerify(Heap *heap);
#endif
/*
* Local Variables:
* tab-width: 4
* End:
* vi: ts=4 sw=4
* vim: si
*/
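
The header comments of heap.c and heap.h describe a priority queue that stores pointers to cost values, each presumably the first field of a larger structure. The self-contained C sketch below illustrates that convention with a small fixed-capacity, 1-based binary heap. `Job`, `MiniHeap`, `mini_insert` and `mini_get_min` are hypothetical names; this is not the code from heap.c, which grows its array with realloc.

```c
#include <stddef.h>

typedef int Cost;

/* The cost is the first member, so a Cost* can be cast back to a Job*. */
typedef struct Job {
    Cost cost;
    const char *name;
} Job;

#define MINI_CAP 16

typedef struct MiniHeap {
    Cost *elems[MINI_CAP + 1];  /* 1-based: elems[0] is unused */
    unsigned n;
} MiniHeap;

/* Insert by sifting the new element up from the last slot.
 * Returns 0 on success, -1 if the heap is full. */
int mini_insert(MiniHeap *h, Cost *e)
{
    unsigned i, parent;

    if (h->n >= MINI_CAP)
        return -1;
    i = ++h->n;
    while ((parent = i / 2) != 0 && *h->elems[parent] > *e) {
        h->elems[i] = h->elems[parent];
        i = parent;
    }
    h->elems[i] = e;
    return 0;
}

/* Remove and return the cheapest element, or NULL if the heap is empty. */
Cost *mini_get_min(MiniHeap *h)
{
    Cost *min, *last;
    unsigned parent = 1, child;

    if (h->n == 0)
        return NULL;
    min = h->elems[1];
    last = h->elems[h->n--];
    /* Sift the former last element down from the root. */
    while ((child = 2 * parent) <= h->n) {
        if (child < h->n && *h->elems[child + 1] < *h->elems[child])
            child++;
        if (*last <= *h->elems[child])
            break;
        h->elems[parent] = h->elems[child];
        parent = child;
    }
    h->elems[parent] = last;
    return min;
}
```

A caller inserts `&job.cost` and casts the pointer returned by `mini_get_min` back to `Job *` to reach the rest of the record.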

31
tools/makemanifest Normal file

@@ -0,0 +1,31 @@
#!/usr/bin/perl
$fileNum = 0;
while(<>)
{
/^([VDTB])(\S*)\s+(.*)/ || die("Bad filelist, line $.");
($type, $options, $name) = ($1, $2, $3);
if ($type eq "D")
{
$dir = $name;
print "D $dir\n";
}
elsif ($type eq "V")
{
# Do nothing
}
else
{
$fileNum++;
$tail = $name;
$tail =~ s|^.*/||;
die("Bad filelist, line $.") if $name ne $dir . $tail;
print "$fileNum $tail\n";
}
}
#
# vi: ai ts=4
# vim: si
#

137
tools/mempool.c Normal file

@@ -0,0 +1,137 @@
/*
* mempool.c - Pooled memory allocation, similar to GNU obstacks.
*
* $Id: mempool.c,v 1.5 1997/11/13 23:53:08 colin Exp $
*/
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h> /* For malloc() & free() */
#include "mempool.h"
/*
* The memory pool allocation functions
*
* These are based on a linked list of memory blocks, usually of uniform
* size. New memory is allocated from the tail of the current block,
* until that is inadequate, then a new block is allocated.
* The entire pool can be freed at once by calling memPoolEmpty().
*/
struct PoolBuf {
struct PoolBuf *next;
unsigned size;
/* Data follows */
};
/* The prototype empty pool, including the default allocation size. */
static struct MemPool EmptyPool = { 0, 0, 0, 4096, 0 , 0, 0};
/* Initialize the pool for first use */
void
memPoolInit(struct MemPool *pool)
{
*pool = EmptyPool;
}
/* Set the pool's purge function */
void
memPoolSetPurge(struct MemPool *pool, int (*purge)(void *), void *arg)
{
pool->purge = purge;
pool->purgearg = arg;
}
/* Free all the memory in the pool */
void
memPoolEmpty(struct MemPool *pool)
{
struct PoolBuf *buf;
while ((buf = pool->head) != 0) {
pool->head = buf->next;
free(buf);
}
pool->freespace = 0;
pool->totalsize = 0;
}
/*
* Restore a pool to a marked position, freeing subsequently allocated
* memory.
*/
void
memPoolCutBack(struct MemPool *pool, struct MemPool const *cutback)
{
struct PoolBuf *buf;
assert(pool);
assert(cutback);
assert(pool->totalsize >= cutback->totalsize);
while((buf = pool->head) != cutback->head) {
pool->head = buf->next;
free(buf);
}
*pool = *cutback;
}
/*
* Allocate a chunk of memory for a structure. Alignment is assumed to be
* a power of 2. It could be generalized, if that ever becomes relevant.
* Note that alignment is from the beginning of an allocated chunk, which
* is guaranteed by ANSI to be as aligned as can possibly matter.
*/
void *
memPoolAlloc(struct MemPool *pool, unsigned len, unsigned alignment)
{
char *p;
unsigned t;
/* Where to allocate next object */
p = pool->freeptr;
/* How far it is from the beginning of the chunk. */
t = p - (char *)pool->head;
/* How much to round up freeptr to make alignment */
t = -t & --alignment;
/* Okay, does it fit? */
if (pool->freespace >= len+t) {
pool->freespace -= len+t;
p += t;
pool->freeptr = p + len;
return p;
}
/* It does not fit in the current chunk. Go for a bigger chunk. */
/* First, figure out how much to skip at the beginning of the chunk */
alignment &= -(unsigned)sizeof(struct PoolBuf);
alignment += sizeof(struct PoolBuf);
/* Then, figure out a chunk size that will fit */
t = pool->chunksize;
assert(t);
while (len + alignment > t)
t *= 2;
while ((p = malloc(t)) == 0) {
/* If that didn't work, try purging or smaller allocations */
if (!pool->purge || !pool->purge(pool->purgearg)) {
t /= 2;
/* Give up only once the halved chunk can no longer hold the request */
if (len + alignment > t) {
fputs("Out of memory!\n", stderr);
exit (1); /* Failed */
}
}
}
/* Update the various pointers. */
pool->totalsize += t;
((struct PoolBuf *)p)->next = pool->head;
((struct PoolBuf *)p)->size = t;
pool->head = (struct PoolBuf *)p;
pool->freespace = t - len - alignment;
p += alignment;
pool->freeptr = p + len;
return p;
}

36
tools/mempool.h Normal file

@@ -0,0 +1,36 @@
/* $Id: mempool.h,v 1.2 1997/11/13 23:53:09 colin Exp $ */
#ifndef MEMPOOL_H
#define MEMPOOL_H
typedef struct MemPool {
struct PoolBuf *head;
char *freeptr;
unsigned freespace;
unsigned chunksize; /* Default starting point */
unsigned long totalsize;
int (*purge)(void *); /* Return non-zero to retry alloc */
void *purgearg;
} MemPool;
/* A global pool for miscellaneous stuff. */
extern struct MemPool MiscPool;
/*
* Nice clean interfaces
*/
void memPoolInit(struct MemPool *pool);
void memPoolSetPurge(struct MemPool *pool, int (*purge)(void *), void *arg);
void memPoolEmpty(struct MemPool *pool);
void memPoolCutBack(struct MemPool *dest, struct MemPool const *cutback);
void *memPoolAlloc(struct MemPool *pool, unsigned len, unsigned alignment);
#ifdef DEADCODE
char const *memPoolStore(struct MemPool *pool, char const *str);
#endif
/* Lookie here! An ANSI-compliant alignment finder! */
#define alignof(type) (sizeof(struct{type _x; char _y;}) - sizeof(type))
#define memPoolNew(pool, type) memPoolAlloc(pool, sizeof(type), alignof(type))
#endif /* MEMPOOL_H */
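
Two small pieces of arithmetic carry most of the weight in the pool allocator: the struct-based alignment finder in mempool.h and the `-t & --alignment` step in memPoolAlloc, which computes how far the free pointer must be advanced to reach the next aligned offset. A minimal C sketch of both follows. `ALIGN_OF` is the same trick under a different name (to stay clear of the `alignof` keyword in newer C standards), and `align_padding` is a hypothetical helper, not a function from mempool.c.

```c
/* Alignment of `type`: a struct holding the type followed by one char is
 * padded up to the next multiple of the type's alignment, so the size
 * difference is that alignment. */
#define ALIGN_OF(type) (sizeof(struct { type _x; char _y; }) - sizeof(type))

/* Bytes of padding needed to advance `offset` to the next multiple of
 * `alignment`, which must be a power of two. Unsigned negation wraps
 * modulo 2^N, so masking with (alignment - 1) leaves the distance to
 * the next boundary. */
unsigned align_padding(unsigned offset, unsigned alignment)
{
    return -offset & (alignment - 1);
}
```

For example, an object with 4-byte alignment placed at offset 5 needs 3 bytes of padding, while an offset that is already aligned needs none.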

543
tools/munge.c Normal file

@@ -0,0 +1,543 @@
/*
* munge.c -- Program to convert a text file into "munged" form,
* suitable for reconstruction from printed form. Tabs are
* made visible and checksums are added to each line and each
* page to protect against transcription errors.
*
* Copyright (C) 1997 Pretty Good Privacy, Inc.
*
* Designed by Colin Plumb, Mark H. Weaver, and Philip R. Zimmermann
* Written by Mark H. Weaver
*
* $Id: munge.c,v 1.32 1997/11/12 23:28:53 mhw Exp $
*/
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <ctype.h>
#include <stdlib.h>
#include "util.h"
/*
* The file is divided into pages, and the format of each page is
*
--f414 000b2dc79af40010002 Page 1 of munge.c
bc38e5 /*
40a838 * munge.c -- Program to convert a text file into munged form
647222 *
193f28 * Copyright (C) 1997 Pretty Good Privacy, Inc.
827222 *
699025 * Designed by Colin Plumb, Mark H. Weaver, and Philip R. Zimmermann
0d050c * Written by Mark H. Weaver
*
* Where the first 2 columns are the high 8 bits (in hex) of a running
* CRC-32 of the page (the string "--", unlikely to be confused with
* any digits, indicates a page header line) and the next 4 columns
* are a CRC-16 of the rest of the line. Then a space (not counted in
* the CRC), and the line of text. Tabs are printed as the currency
* symbol (ISO Latin 1 character 164) followed by the appropriate number
* of spaces, and any form feeds are printed as a yen symbol (Latin 1 165).
* The CRC is computed on the transformed line, including the trailing
* newline. No trailing whitespace is permitted.
*
* The header line contains a (hex) number of the form 0ffcccccccctpppnnnn,
* where the digit 0 is a version number, ff are flags, cccccccc is the CRC-32
* of the page, t is the tab size (usually 4 or 8; 0 for binary files that
* are sent in radix-64), ppp is the product number (usually 1, different
* for different books), and nnnn is the file number (sequential from 1).
*
* This is followed by " Page %u of " and the file name.
*/
typedef struct MungeState
{
EncodeFormat const * fmt;
EncodeFormat const * hFmt;
int binaryMode, tabWidth;
long origLineNumber;
long productNumber, fileNumber, pageNumber, lineNumber;
unsigned long fileOffset;
CRC pageCRC;
char const * fileName;
char const * fileNameTail;
char * pageBuffer; /* Buffer large enough to hold one page */
char * pagePos; /* Current position in pageBuffer */
word16 hdrFlags;
FILE * file;
FILE * out;
} MungeState;
void ChecksumLine(EncodeFormat const *fmt, char const *line, size_t length,
char *prefix, CRC *pageCRC)
{
CRC lineCRC;
CRC runCRCPart = 0;
lineCRC = CalculateCRC(fmt->lineCRC, 0, (byte const *)line, length);
if (pageCRC != NULL)
{
*pageCRC = CalculateCRC(fmt->pageCRC, *pageCRC,
(byte const *)line, length);
runCRCPart = RunningCRCFromPageCRC(fmt, *pageCRC);
}
prefix += EncodeCheckDigits(fmt, runCRCPart, fmt->runningCRCBits, prefix);
prefix += EncodeCheckDigits(fmt, lineCRC, fmt->lineCRC->bits, prefix);
*prefix++ = ' '; /* Write a space over the null byte */
}
/* Returns 1 for convenience */
int PrintFileError(MungeState *state, char const *message)
{
fprintf(stderr, "%s in %s %s %lu\n", message, state->fileName,
state->binaryMode ? "offset" : "line",
state->binaryMode ? state->fileOffset : state->origLineNumber);
return 1;
}
int MungeLine(MungeState *state, char *buffer, int length,
char *line, int *bufferUsed)
{
int i = 0, j = 0, jOld = 0;
char ch;
for (i = 0; i < length && j < LINE_LENGTH; i++)
{
jOld = j;
ch = buffer[i];
if (ch == '\t')
{
line[j++] = TAB_CHAR;
if (state->tabWidth < 1)
return PrintFileError(state,
"ERROR: Tab found in radix64 stream");
else
while (j % state->tabWidth && j < LINE_LENGTH)
line[j++] = TAB_PAD_CHAR;
}
else if (ch == '\n')
{
if (i + 1 < length)
return PrintFileError(state,
"UNEXPECTED ERROR: fgets read past newline!?");
break;
}
else if (ch == '\f')
{
break;
}
else if (ch == ' ' && (j <= 0 || line[j-1] == ' ' ||
line[j-1] == SPACE_CHAR ||
i+1 >= length || buffer[i+1] == '\n'))
{
line[j++] = SPACE_CHAR;
}
else if (ch >= ' ' && ch <= '~')
line[j++] = ch;
else
return PrintFileError(state, "ERROR: Non-ASCII char");
}
if (i < length && buffer[i] == '\n')
{
i++;
state->origLineNumber++;
}
else if (i < length && buffer[i] == '\f' && j < LINE_LENGTH)
{
i++;
line[j++] = FORMFEED_CHAR;
}
else
{
/* If there's no newline, we need to add the continuation marker */
if (i > 0 && j >= LINE_LENGTH)
{
/* Remove the last character if we're out of room */
i--;
j = jOld;
}
line[j++] = CONTIN_CHAR;
}
/* Strip trailing spaces */
while (j > 0 && isspace((unsigned char)line[j - 1]))
j--;
if (j > LINE_LENGTH) /* This should never happen */
return PrintFileError(state, "ERROR: Internal error, line too long");
/* Add trailing newline and NUL */
line[j++] = '\n';
line[j++] = '\0';
/* Return number of chars used from buffer */
*bufferUsed = i;
return 0;
}
static void
Encode3(byte const src[3], char dest[4])
{
dest[0] = radix64Digits[ (src[0]>>2 & 0x3f)];
dest[1] = radix64Digits[(src[0]<<4 & 0x30) | (src[1]>>4 & 0x0f)];
dest[2] = radix64Digits[(src[1]<<2 & 0x3c) | (src[2]>>6 & 0x03)];
dest[3] = radix64Digits[(src[2] & 0x3f)];
}
static int
EncodeLine(byte const *src, int srcLen, char *dest)
{
char * destp = dest;
byte tempSrc[3];
for (; srcLen >= 3; srcLen -= 3)
{
Encode3(src, destp);
src += 3; destp += 4;
}
if (srcLen > 0)
{
memset(tempSrc, 0, sizeof(tempSrc));
memcpy(tempSrc, src, srcLen);
Encode3(tempSrc, destp); /* Encode the zero-padded copy, not the short source */
src += 3; destp += 4; srcLen -= 3;
while (srcLen < 0)
destp[srcLen++] = RADIX64_END_CHAR;
}
return destp - dest;
}
static int
MungeBinaryLine(MungeState *state, byte const *buffer, int length, char *line)
{
char binLine[128];
int binLength; /* Destination length */
int used;
binLength = EncodeLine(buffer, length, binLine);
/* Append newline */
binLine[binLength++] = '\n';
binLine[binLength] = '\0';
return MungeLine(state, binLine, binLength, line, &used);
}
int MaybePageBreak(MungeState *state)
{
EncodeFormat const * fmt = state->fmt;
EncodeFormat const * hFmt = state->hFmt;
if (state->lineNumber >= LINES_PER_PAGE)
{
char line[512];
char * lineData = line + PREFIX_LENGTH;
char * p = lineData;
p += EncodeCheckDigits(hFmt, 0, HDR_VERSION_BITS, p);
p += EncodeCheckDigits(hFmt, state->hdrFlags, HDR_FLAG_BITS, p);
p += EncodeCheckDigits(hFmt, state->pageCRC, fmt->pageCRC->bits, p);
p += EncodeCheckDigits(hFmt, state->tabWidth, HDR_TABWIDTH_BITS, p);
p += EncodeCheckDigits(hFmt, state->productNumber, HDR_PRODNUM_BITS, p);
p += EncodeCheckDigits(hFmt, state->fileNumber, HDR_FILENUM_BITS, p);
sprintf(p, " Page %ld of %s\n", state->pageNumber + 1,
state->fileNameTail);
if (strlen(lineData) > LINE_LENGTH + 1)
{
PrintFileError(state, "ERROR: Header line too long");
fprintf(stderr, "> %s", lineData);
return -1;
}
/* Compute checksums and prefix them to line */
ChecksumLine(fmt, lineData, strlen(lineData), line, NULL);
fprintf(state->out, "%c%c%s\n%s\f", HDR_PREFIX_CHAR,
fmt->headerTypeChar, line + 2, state->pageBuffer);
state->pageNumber++;
state->lineNumber = 0;
state->pageCRC = 0;
state->pagePos = state->pageBuffer; /* Clear page buffer */
}
return 0;
}
/*
* Search for Emacs "tab-width: " marker in file.
* Emacs is stricter about the format, but this will do.
*/
int FindTabWidth(MungeState *state)
{
char const * const tabWidthMarker = " tab-width: ";
char buffer[512];
char * p;
int length;
int tabWidth = 0;
fseek(state->file, -(long)(sizeof(buffer) - 1), SEEK_END);
length = fread(buffer, 1, sizeof(buffer) - 1, state->file);
buffer[length] = '\0';
p = strstr(buffer, tabWidthMarker);
if (p != NULL)
{
p += strlen(tabWidthMarker);
while (*p != '\0' && *p != '\n' && isspace((unsigned char)*p))
p++;
tabWidth = strtol(p, &p, 10);
while (*p != '\0' && *p != '\n' && isspace((unsigned char)*p))
p++;
if (*p != '\n' || tabWidth < 2)
tabWidth = 0;
else if (tabWidth > 16)
fprintf(stderr, "WARNING: Weird tab-width (%d), %s\n",
tabWidth, state->fileName);
}
return tabWidth;
}
/*
* Open the given source file and send the munged output to the
* FILE *, with the given options.
*/
int MungeFile(char const *fileName, FILE *out, EncodeFormat const *fmt,
int binaryMode, int defaultTabWidth,
long productNumber, long fileNumber)
{
MungeState * state;
int length, used;
char line[PREFIX_LENGTH + LINE_LENGTH + 10];
char * lineData = line + PREFIX_LENGTH;
char buffer[128];
int result = 0;
state = (MungeState *)calloc(1, sizeof(*state));
state->fmt = fmt;
state->hFmt = &hexFormat;
state->origLineNumber = 1;
state->fileName = fileName;
state->pageCRC = 0;
state->productNumber = productNumber;
state->fileNumber = fileNumber;
state->pageNumber = 0;
state->lineNumber = 0;
state->fileOffset = 0;
state->binaryMode = binaryMode;
state->pageBuffer = malloc(PAGE_BUFFER_SIZE);
state->pageBuffer[0] = '\0';
state->pagePos = state->pageBuffer;
state->hdrFlags = 0;
state->out = out;
state->fileNameTail = strrchr(state->fileName, '/');
if (state->fileNameTail == NULL)
state->fileNameTail = state->fileName;
else
state->fileNameTail++;
state->file = fopen(state->fileName, binaryMode ? "rb" : "r");
if (state->file == NULL)
{
result = errno;
fprintf(stderr, "ERROR opening %s: %s\n",
state->fileName, strerror(result));
goto error;
}
if (state->binaryMode)
{
state->tabWidth = 0;
}
else
{
state->tabWidth = FindTabWidth(state);
if (state->tabWidth == 0)
state->tabWidth = defaultTabWidth;
rewind(state->file);
}
while (!feof(state->file))
{
if (state->binaryMode)
{
length = fread(buffer, 1, BYTES_PER_LINE, state->file);
if (length < 1)
{
if (feof(state->file))
break;
goto fileError;
}
if ((result = MaybePageBreak(state)))
goto error;
if ((result = MungeBinaryLine(state, buffer, length, lineData)))
goto error;
state->fileOffset += length;
}
else
{
if (fgets(buffer, sizeof(buffer), state->file) == NULL)
{
if (feof(state->file))
break;
goto fileError;
}
length = strlen(buffer);
if ((result = MaybePageBreak(state)))
goto error;
if ((result = MungeLine(state, buffer, length, lineData, &used)))
goto error;
if (used < length)
if (fseek(state->file, used - length, SEEK_CUR))
goto fileError;
}
/* Compute checksums and prefix them to the line */
ChecksumLine(fmt, lineData, strlen(lineData), line, &state->pageCRC);
strcpy(state->pagePos, line);
length = strlen(state->pagePos);
/* Suppress trailing whitespace on blank lines */
if (length == PREFIX_LENGTH+1 && state->pagePos[length-1] == '\n') {
state->pagePos[--length-1] = '\n';
state->pagePos[length] = '\0';
}
state->pagePos += length;
state->lineNumber++;
}
if (state->lineNumber > 0)
{
/* Force a final page break */
state->lineNumber = LINES_PER_PAGE;
state->hdrFlags |= HDR_FLAG_LASTPAGE;
if ((result = MaybePageBreak(state)))
goto error;
}
result = 0;
goto done;
fileError:
result = ferror(state->file);
error:
done:
if (state != NULL)
{
if (state->file != NULL)
fclose(state->file);
free(state->pageBuffer);
free(state);
}
return result;
}
int main(int argc, char *argv[])
{
int result = 0;
int i, j;
int defaultTabWidth = 4;
int binaryMode = 0;
long productNumber = 1;
long fileNumber = 1;
char * endOfNumber;
EncodeFormat const * fmt = NULL;
InitUtil();
for (i = 1; i < argc && argv[i][0] == '-'; i++)
{
if (0 == strcmp(argv[i], "--"))
{
i++;
break;
}
for (j = 1; argv[i][j] != '\0'; j++)
{
if (isdigit(argv[i][j]))
{
defaultTabWidth = argv[i][j] - '0';
if (defaultTabWidth < 2 || defaultTabWidth > 9)
fprintf(stderr, "WARNING: Weird default tab-width (%d)\n",
defaultTabWidth);
}
else if (argv[i][j] == 'b')
{
binaryMode = 1;
}
else if (argv[i][j] == 'F')
{
fmt = FindFormat(argv[i][j+1]);
if (!fmt || argv[i][j+2] != '\0')
{
fprintf(stderr, "ERROR: Invalid format char\n");
exit(1);
}
break;
}
else if (argv[i][j] == 'p')
{
productNumber = strtol(&argv[i][j+1], &endOfNumber, 10);
if (*endOfNumber != '\0')
{
fprintf(stderr, "ERROR: Invalid product number\n");
exit(1);
}
break;
}
else if (argv[i][j] == 'f')
{
fileNumber = strtol(&argv[i][j+1], &endOfNumber, 10);
if (*endOfNumber != '\0')
{
fprintf(stderr, "ERROR: Invalid file number\n");
exit(1);
}
break;
}
else
{
fprintf(stderr, "ERROR: Unrecognized option -%c\n", argv[i][j]);
exit(1);
}
}
}
if (!fmt)
fmt = binaryMode ? &radix64Format : &hexFormat;
for (; i < argc; i++)
{
if ((result = MungeFile(argv[i], stdout, fmt, binaryMode,
defaultTabWidth, productNumber,
fileNumber)) != 0)
{
/* If result > 0, message should have already been printed */
if (result < 0)
fprintf(stderr, "ERROR: %s\n", strerror(result));
exit(1);
}
fileNumber++;
}
return 0;
}
/*
* Local Variables:
* tab-width: 4
* End:
* vi: ts=4 sw=4
* vim: si
*/

324
tools/psgen Normal file

@ -0,0 +1,324 @@
#!/usr/bin/perl
#
# psgen -- Postscript generator for code portion of source books
#
# Reads in a list of files/dirs from <filelist>, runs munge on each of
# them, and generates a single postscript file to stdout. The page numbers
# for each file/dir are put into the file <pagenums>.
#
# usage: psgen [ options... ] <filelist> <pagenums> <volume #> > foo.ps
# -l<firstLogicalPage>
# -p<firstPhysicalPage>
# -f<font>
# -D<defs> (passed to yapp)
# -P<productNumber>
# -o<mungedOutFile>
# -e (auto edit errors)
#
# $Id: psgen,v 1.18 1997/11/13 21:44:16 colin Exp $
#
$bookRoot = $ENV{"BOOKROOT"} || ".";
$toolsDir = "$bookRoot/tools";
$psDir = "$bookRoot/ps";
$editor = $ENV{"EDITOR"} || "vi";
# Configuration settings - external file names
$mungeProg = "$toolsDir/munge";
$yappProg = "$toolsDir/yapp";
$preambleFile = "$psDir/prolog.ps";
$tempFile = "/tmp/psgen-$$";
# Parse arguments
$firstLogPage = $firstPhysPage = 0;
$productNumber = 1;
$font = "OCRB";
$autoEdit = 0;
while ($#ARGV >= 0 && $ARGV[0] =~ /^-/)
{
$_ = shift @ARGV;
if (/^--$/)
{
last;
}
elsif (/^-l(\d+)$/)
{
$firstLogPage = $1;
}
elsif (/^-p(\d+)$/)
{
$firstPhysPage = $1;
}
elsif (/^-f(.+)$/)
{
$font = $1;
}
elsif (/^-D(.+)$/)
{
$yappDefs .= " " . $_;
}
elsif (/^-P(\d+)$/)
{
$productNumber = $1;
}
elsif (/^-o(.+)$/)
{
$mungedOutFile = $1;
}
elsif (/^-e$/)
{
$autoEdit = 1;
}
else
{
&Error("Unrecognized option: '$_'");
}
}
$fileListFile = shift @ARGV || die "Missing file list argument (arg 1)";
$pageNumFile = shift @ARGV || die "Missing page number file argument (arg 2)";
$volume = shift @ARGV || die "Missing volume number argument (arg 3)";
# Determine initial page numbers
{
my $nextLogPage = 1;
my $nextPhysPage = 3;
my $volNum = 0; # Which volume's page numbers we're reading
if ($volume > 1)
{
open(OLDPAGENUMS, "<$pageNumFile") || die;
while (<OLDPAGENUMS>)
{
if (/^Volume\s+(\d+)$/)
{
$volNum = $1;
}
elsif (/^Next:\s+(\d+)\s*$/ && $volNum == $volume - 1)
{
$nextLogPage = $1;
}
}
close(OLDPAGENUMS);
}
else
{
unlink($pageNumFile);
}
$firstLogPage = $nextLogPage if ($firstLogPage == 0);
$firstPhysPage = $nextPhysPage if ($firstPhysPage == 0);
}
# Names of PostScript operators invoked. These are the interface
# between this file and the $preambleFile.
$oddPageStartPS = "OddPageStart";
$evenPageStartPS = "EvenPageStart";
$oddPageEndPS = "OddPageEnd";
$evenPageEndPS = "EvenPageEnd";
$dirPagePS = "DirPage";
# This is short because it's emitted every line
$linePS = "L";
# Handle an error from munge.
# A result of 0 means to retry, 1 means to exit
sub MungeError
{
my $result = 1;
open(FILEH, "<$tempFile") || die;
while (<FILEH>)
{
print STDERR;
if (/ in (.*) line (\d+)$/)
{
my ($fileName, $lineNumber) = ($1, $2);
if ($autoEdit)
{
my @statResult = stat($fileName);
my $oldMTime = $statResult[9];
system("'$editor' '+$lineNumber' '$fileName' 1>&2");
@statResult = stat($fileName);
$result = ($statResult[9] == $oldMTime);
last;
}
}
}
close(FILEH);
unlink($tempFile) || die "Couldn't unlink $tempFile";
return $result;
}
sub CopyFileToPS
{
local $fileName = $_[0];
local $args = "'-I$psDir' '-Dfont=$font'";
local $_;
$args .= $yappDefs;
open(FILEH, "$yappProg $args '$fileName' |") || die;
while (<FILEH>)
{
print PSOUT $_;
}
close(FILEH) || exit(1);
1;
}
# Wrap a string in parens as required by PostScript, with proper quoting.
sub StringPS
{
local $str = $_[0];
$str =~ s/([\\()])/\\$1/g;
"(" . $str . ")";
}
# Emit a start of page: the PostScript DSC %%Page: header (followed by
# the logical page number, then the physical one) and the top-of-page
# function, which is passed the logical page number as a string.
sub PageStartPS
{
local $pageNum = $_[0];
"%%Page: " . ($pageNum + $firstLogPage) . " " .
($pageNum + $firstPhysPage) . "\n" .
&StringPS($pageNum + $firstLogPage) .
((($pageNum + $firstLogPage) % 2) ? $oddPageStartPS
: $evenPageStartPS) . "\n";
}
sub PageEndPS
{
local $pageNum = $_[0];
((($pageNum + $firstLogPage) % 2) ? $oddPageEndPS : $evenPageEndPS) . "\n";
}
# Save the page number to a table-of-contents file
sub SavePageNum
{
local ($fileName, $pageNum) = @_;
print PAGENUMS ($pageNum + $firstLogPage), ": $fileName\n";
}
# The main code.
open(PSOUT, ">-") || die;
open(FILELIST, "<$fileListFile") || die;
open(PAGENUMS, ">>$pageNumFile") || die;
if ($mungedOutFile ne "")
{
open(MUNGEDOUT, ">$mungedOutFile") || die;
}
print PAGENUMS "Volume $volume\n";
&CopyFileToPS($preambleFile);
$fileNumber = 0;
$pageNum = 0; # This is 0-based, since it is added to $first{Log,Phys}Page
$enable = 0;
while (<FILELIST>)
{
/^([VDTB])(\S*)\s+(.*)/ || die "Illegal file list line $.";
local ($fileType, $options, $arg) = ($1, $2, $3);
if ($fileType eq "V")
{
@args = split(/\s+/, $arg);
if ($enable = ($args[0] == $volume))
{
$defaultTabWidth = int($args[1]);
}
}
elsif ($fileType eq "D")
{
next unless $enable; # Do nothing if we're in the wrong volume
$dirName = $arg;
&SavePageNum($dirName, $pageNum);
print PSOUT &PageStartPS($pageNum);
print PSOUT &StringPS($dirName), $dirPagePS, "\n";
print PSOUT &PageEndPS($pageNum);
$pageNum++;
}
else
{
my $done = 0;
$fileNumber++;
$fileName = $arg;
next unless $enable; # Do nothing if we're in the wrong volume
&SavePageNum($fileName, $pageNum);
$quotedFileName = $fileName;
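# Inside shell single quotes a backslash is literal, so close the quote,
# emit an escaped quote, and reopen: ' becomes '\''
$quotedFileName =~ s/'/'\\''/g;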
$tabWidth = ($options =~ /(\d)/) ? $1 : $defaultTabWidth;
$args = ($fileType eq "B") ? "-b" : "";
$args .= " -$tabWidth -p$productNumber -f$fileNumber";
while (!$done)
{
if (open(FILE, "$mungeProg $args '$quotedFileName' 2>$tempFile |"))
{
$line = <FILE>;
print MUNGEDOUT $line;
while ($line ne "")
{
print PSOUT &PageStartPS($pageNum);
while ($line ne "" and $line !~ /^\f/)
{
chop $line;
print PSOUT &StringPS($line), $linePS, "\n";
$line = <FILE>;
print MUNGEDOUT $line;
}
$line =~ s/^\f//;
print PSOUT &PageEndPS($pageNum);
$pageNum++;
}
if (close(FILE))
{
$done = 2;
}
else
{
$done = &MungeError();
}
}
else
{
$done = &MungeError();
}
}
if ($done == 1)
{
die;
}
}
}
# Print PostScript DSC trailer with the correct number of pages
print PSOUT "%%Trailer\n%%Pages: ", $pageNum, "\n%%EOF\n";
print PAGENUMS "Pages: ", $pageNum, "\n";
print PAGENUMS "Next: ", ((($pageNum+1) & ~1) + $firstLogPage), "\n";
close(PAGENUMS) || die;
close(FILELIST) || die;
close(PSOUT) || die;
if ($mungedOutFile ne "")
{
close(MUNGEDOUT) || die;
}
#
# vi: ai ts=4
# vim: si
#
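
A note on StringPS above: PostScript string literals are delimited by
parentheses, so a backslash or parenthesis inside the text must be preceded
by a backslash. The following is a minimal C sketch of the same quoting
rule, not part of the tools; the function name ps_string and its
buffer-size convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not in the original tools): write "(" + src + ")"
 * into dst, putting a backslash before each of \ ( and ).
 * Returns the length of the result, or -1 if dst is too small.
 */
int ps_string(const char *src, char *dst, size_t dstSize)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    if (dstSize < 3)            /* need room for "()" and the NUL */
        return -1;
    dst[n++] = '(';
    for (; *src != '\0'; src++) {
        int special = (*src == '\\' || *src == '(' || *src == ')');

        /* Keep two bytes in reserve for the closing ")" and the NUL */
        if (n + (special ? 2 : 1) + 2 > dstSize)
            return -1;
        if (special)
            dst[n++] = '\\';
        dst[n++] = *src;
    }
    dst[n++] = ')';
    dst[n] = '\0';
    return (int)n;
}
```

Called as ps_string("a(b)", buf, sizeof buf), this fills buf with the
eight characters (a\(b\)) and returns 8.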

1851
tools/repair.c Normal file

File diff suppressed because it is too large

185
tools/sortpages Normal file

@ -0,0 +1,185 @@
#!/usr/bin/perl
#
# $Id: sortpages,v 1.8 1997/12/11 19:20:58 mhw Exp $
#
@fileNameFromNumber = ();
@pagesFound = ();
$theProductNumber = 0;
for $fileIndex (0..$#ARGV)
{
$fileName = $ARGV[$fileIndex];
open(FILE, "<$fileName") || die;
while (!eof(FILE))
{
$filePos = tell(FILE);
$_ = <FILE>;
if (/^\f?-\S/)
{
my ($versionHex, $flagsHex, $pageCRCHex, $tabWidthHex,
$productNumberHex, $fileNumberHex, $pageNumber, $name)
= (/^\f?-\S\S{4}\ # CRC followed by a space
([0-9a-f]) # Format version
([0-9a-f]{2}) # Flags
([0-9a-f]{8}) # Running CRC32
([0-9a-f]) # Tab width (0 means radix64)
([0-9a-f]{3}) # Product number
([0-9a-f]{4}) # File number
\ Page\ (\d+)\ of\ (.*)/x);
my $version = hex($versionHex);
my $flags = hex($flagsHex);
my $productNumber = hex($productNumberHex);
my $fileNumber = hex($fileNumberHex);
unless ($version == 0 && $productNumber > 0
&& $fileNumber > 0 && $pageNumber > 0
&& $name ne "")
{
print STDERR "ERROR: Invalid header info ",
"at $fileName line $.\n";
exit(1);
}
if (!defined($fileNameFromNumber[$fileNumber]))
{
$fileNameFromNumber[$fileNumber] = $name;
}
elsif ($fileNameFromNumber[$fileNumber] ne $name)
{
print STDERR "ERROR: Mismatched filename ",
"at $fileName line $.\n";
exit(1);
}
if (!$theProductNumber)
{
$theProductNumber = $productNumber;
}
elsif ($theProductNumber != $productNumber)
{
print STDERR "ERROR: Different product number ",
"at $fileName line $.\n";
exit(1);
}
push @pagesFound, (sprintf "%5d:%4d:%d:%d:%d",
$fileNumber, $pageNumber, $flags, $fileIndex, $filePos);
}
}
close(FILE) || die;
}
@pagesFound = sort @pagesFound;
$result = 0;
$lastFileNumber = 0;
$lastPageNumber = 0;
$nextFileNumber = 1;
$nextPageNumber = 1;
$fileIndexOpen = -1;
foreach (@pagesFound)
{
my ($fileNumber, $pageNumber, $flags, $fileIndex, $filePos) = split /:/;
$fileNumber = int($fileNumber);
$pageNumber = int($pageNumber);
if ($fileNumber == $lastFileNumber && $pageNumber == $lastPageNumber)
{
print STDERR "DUPLICATE: File $fileNumber, page $pageNumber, skipped\n";
next;
}
if ($nextFileNumber < $fileNumber && $nextPageNumber != 1)
{
print STDERR "MISSING: File $nextFileNumber, ",
"pages $nextPageNumber - END\n";
$nextPageNumber = 1;
$nextFileNumber++;
$result = 1;
}
if ($nextFileNumber < $fileNumber)
{
print STDERR "MISSING: Files $nextFileNumber - ",
$fileNumber-1, "\n";
$nextFileNumber = $fileNumber;
$nextPageNumber = 1;
$result = 1;
}
if ($nextFileNumber != $fileNumber)
{
print STDERR "ERROR: Internal error, unexpected fileNumber\n";
exit(1);
}
if ($nextPageNumber < $pageNumber)
{
print STDERR "MISSING: File $fileNumber, pages $nextPageNumber - ",
$pageNumber-1, "\n";
$nextPageNumber = $pageNumber;
$result = 1;
}
if ($nextPageNumber != $pageNumber)
{
print STDERR "ERROR: Internal error, unexpected pageNumber\n";
exit(1);
}
if ($fileIndexOpen != $fileIndex)
{
if ($fileIndexOpen >= 0)
{
close(FILE) || die;
$fileIndexOpen = -1;
}
$fileName = $ARGV[$fileIndex];
open(FILE, "<$fileName") || die;
$fileIndexOpen = $fileIndex;
}
seek(FILE, $filePos, 0) || die($!);
$_ = <FILE>;
print;
while (<FILE>)
{
last if /^\f?-\S/;
print;
}
$lastFileNumber = $fileNumber;
$lastPageNumber = $pageNumber;
if ($flags & 1) # Bit 0 of flags indicates last page of file
{
$nextFileNumber++;
$nextPageNumber = 1;
}
else
{
$nextPageNumber++;
}
}
if ($nextPageNumber != 1)
{
print STDERR "MISSING: File $nextFileNumber, ",
"pages $nextPageNumber - END\n";
$nextPageNumber = 1;
$nextFileNumber++;
$result = 1;
}
print STDERR "Highest file number encountered: ", $nextFileNumber - 1, "\n";
if ($fileIndexOpen >= 0)
{
close(FILE) || die;
$fileIndexOpen = -1;
}
exit($result);
#
# vi: ai ts=4
# vim: si
#
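
A note on the header layout that sortpages matches: after the "-" prefix,
a format character, four check characters and a space, the header carries
fixed-width lowercase hex fields (version, flags, running CRC32, tab
width, product number, file number) followed by " Page N of NAME". The C
sketch below parses that layout as described by the regular expression
above. The PageHeader type and both function names are hypothetical, and
the trailing text is parsed more leniently than the Perl pattern does.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for the fields named in the sortpages regex */
typedef struct PageHeader {
    unsigned long version, flags, pageCRC, tabWidth;
    unsigned long productNumber, fileNumber;
    long pageNumber;
    char name[128];
} PageHeader;

/* Read exactly `digits` lowercase hex digits at *p and advance *p. */
static int hex_field(const char **p, int digits, unsigned long *out)
{
    static const char hex[] = "0123456789abcdef";
    unsigned long v = 0;
    int i;

    for (i = 0; i < digits; i++) {
        char c = (*p)[i];
        const char *d = (c != '\0') ? strchr(hex, c) : NULL;

        if (d == NULL)
            return -1;
        v = (v << 4) | (unsigned long)(d - hex);
    }
    *p += digits;
    *out = v;
    return 0;
}

/* Parse one header line; returns 0 on success, -1 if malformed. */
int parse_page_header(const char *line, PageHeader *h)
{
    const char *p = line;
    int i;

    assert(line != NULL && h != NULL);
    if (*p == '\f')             /* optional leading form feed */
        p++;
    if (p[0] != '-')
        return -1;
    /* Format character plus four check characters, all non-blank */
    for (i = 1; i <= 5; i++)
        if (p[i] == '\0' || isspace((unsigned char)p[i]))
            return -1;
    if (p[6] != ' ')
        return -1;
    p += 7;
    if (hex_field(&p, 1, &h->version) || hex_field(&p, 2, &h->flags)
        || hex_field(&p, 8, &h->pageCRC) || hex_field(&p, 1, &h->tabWidth)
        || hex_field(&p, 3, &h->productNumber)
        || hex_field(&p, 4, &h->fileNumber))
        return -1;
    if (sscanf(p, " Page %ld of %127[^\n]", &h->pageNumber, h->name) != 2)
        return -1;
    return 0;
}
```

Bit 0 of the parsed flags field marks the last page of a file, which is
what sortpages uses to decide when to move on to the next file number.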

222
tools/subst.c Normal file

@ -0,0 +1,222 @@
/*
* subst.c -- Repair substitution tables
*
* Copyright (C) 1997 Pretty Good Privacy, Inc.
*
* Written by Colin Plumb
*
* $Id: subst.c,v 1.14 1997/11/03 22:12:00 colin Exp $
*
* IT IS EXPECTED that users of this program will play with these tables
* and the cost values in the subst.h header. (Some day, they'll all
* get moved to an external config file.)
*
* NOTE: Other costs are hiding in the Filter functions in repair.c.
* Remember to keep them all on the same scale.
*/
/*
* The repair program copies its input to its output, making various
* substitutions, until it manages to produce a version that satisfies
* the parser. This includes having a correct CRC for each line.
* Each substitution has a cost, and the combinations are tried in order
* of increasing cost. NOTE that even translating "A"->"A" counts as
* a substitution, although it may have zero cost.
*
* The intention is to correct transcription errors, where the
* errors have a distinctly non-uniform distribution. Slight
* differences in cost produce a preference in trying some errors
* first. If one error costs half as much as another, two occurrences
* of the cheaper error are tried alongside one of the more expensive.
* Too many cheap substitutions will result in repair spending
* a very long time searching before considering the more expensive
* substitutions.
*
* The following parameters and the raw substitution tables are expected
* to be edited by the user based on experience. Eventually, this
* will be moved into an external config file, but for now it's a matter
* of recompiling.
*/
#include "subst.h"
#include "util.h"
/* What the OCR software reports for "unrecognizable" */
#define UNRECOG_STRING "~\274"
/*
* The input substitutions to make (one-to-one). These are listed in
* the order of correction. i.e. uncorrected input first, then corrected
* output. Substitutions are one-way; to get two-way, list it twice.
*/
struct RawSubst const substSingles[] = {
/* Identity substitutions - note that period (.) is excluded */
{ "!\"#$%&'()*+,-./0123456789:;<=>?" SPACE_STRING,
"!\"#$%&'()*+,-./0123456789:;<=>?" SPACE_STRING, 0, 0, NULL },
{ "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_\t" TAB_STRING,
"@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_\t" TAB_STRING, 0, 0, NULL },
{ "`abcdefghijklmnopqrstuvwxyz{|}~\f" FORMFEED_STRING,
"`abcdefghijklmnopqrstuvwxyz{|}~\f" FORMFEED_STRING, 0, 0, NULL },
#if (TAB_PAD_CHAR & 128) /* Not already included? */
{ TAB_PAD_STRING, TAB_PAD_STRING, 0, 0, NULL },
#endif
{ "\r\n" CONTIN_STRING, "\n\n" CONTIN_STRING, 0, 0, NULL },
/* Occasionally these just get inserted as glitches */
{ ".,'`", NULL, 5, 10, FilterNearBlanks },
/* This is now pretty infrequent */
{ "-_", "_-", 0, 10, FilterAfterRepeat },
/*
* Capitalization errors are common in some cases
* c/C, s/S, u/U are fucked up all the time.
* Also o/O, v/V and w/W. x, y and z also give some problems.
*/
{ "cilmopsuvwxyz", "CILMOPSUVWXYZ", 7, 13, FilterNearLower },
{ "CILMOPSUVWXYZ", "cilmopsuvwxyz", 7, 13, FilterNearUpper },
/* Other errors */
{ "g9aaiji;xX00Si", "9gg2ji;i%%oO3f", 10, 0, NULL },
/* This seems to happen a lot */
{ "c", "r", 9, 0, NULL },
{ "j", ";", 9, 0, NULL },
{ "' ", "``", 10, 0, NULL },
/* Uncommon errors */
/* Weird stuff that's happened in the checksum part */
/* A highish weight is okay here */
{ "sSEdJl", "554437", 15, 0, NULL },
{ "LESsPZ", "bb8a22", 15, 0, NULL },
/* Weird stuff that has happened */
{ "BasAeaeRoooo", "3334a@QQpqbd", 5, 15, FilterIsBinary },
{ "oooo", "pqbd", 0, 15, FilterIsBinary },
{ "ttTCCflO", "iff{[lfG", 12, 0, NULL },
#if 0
/* If the line-breaks get screwed up, use these */
{ " ", "\n", 10, COST_INFINITY, FilterChecksumFollows },
{ "\n", " ", COST_INFINITY, 10, FilterChecksumFollows },
{ "\n", NULL, COST_INFINITY , 11, FilterChecksumFollows },
#endif
{ NULL, NULL, 0, 0, NULL }
};
/* The many-to-many substitutions */
struct RawSubst const substMultiples[] = {
{ "''", "\"", 2, 0, NULL },
{ "``", "\"", 2, 0, NULL },
{ ",'", "\"", 2, 0, NULL },
{ "',", "\"", 2, 0, NULL },
{ ",,", "\"", 2, 0, NULL },
/* Extra inserted spaces are common */
{ " ", " ", COST_INFINITY, 0, FilterFollowsSpace },
{ " ", "", 0, 15, FilterFollowsSpace },
{ "\t", " ", COST_INFINITY, 0, FilterFollowsSpace },
{ "\t", "", 0, 10, FilterFollowsSpace },
/* Convert between SPACE_CHAR dots and periods */
{ ".", SPACE_STRING, 1, COST_INFINITY, FilterFollowsSpace },
{ ".", " "SPACE_STRING, COST_INFINITY, 10, FilterFollowsSpace },
{ SPACE_STRING, ".", 15, 5, FilterFollowsSpace },
{ SPACE_STRING, " "SPACE_STRING, COST_INFINITY, 5, FilterFollowsSpace },
/* Replace "unknown" by zero - it often is */
{ UNRECOG_STRING, "0", 1, 0, NULL },
{ UNRECOG_STRING, "_", 2, 0, NULL },
{ UNRECOG_STRING, ")", 3, 0, NULL },
{ UNRECOG_STRING, "^", 4, 0, NULL },
/* Except that these glitches are common */
{ UNRECOG_STRING"'", "\\\"", 0, 0, NULL },
{ UNRECOG_STRING"'", "\"", 1, 0, NULL },
{ "'"UNRECOG_STRING, "\"", 0, 0, NULL },
{ UNRECOG_STRING UNRECOG_STRING , "\"", 0, 0, NULL },
/* Something else that has been seen */
{ "V'", "\\\"", 5, 0, NULL },
/* A common transposition */
{ "\"'", "'\"", 5, 0, NULL },
{ "'\"", "\"'", 5, 0, NULL },
/* These also happen fairly often */
{ " \"", "''", 5, 0, NULL },
{ "\" ", "''", 5, 0, NULL },
/* Common glitches */
{ "\t.\n", "\n", 5, 0, NULL },
{ "\t,\n", "\n", 5, 0, NULL },
{ "\t-\n", "\n", 5, 0, NULL },
{ "\t_\n", "\n", 5, 0, NULL },
{ "\t'\n", "\n", 5, 0, NULL },
{ "\t`\n", "\n", 5, 0, NULL },
{ "\t~\n", "\n", 5, 0, NULL },
{ "\t:\n", "\n", 5, 0, NULL },
{ "\t"SPACE_STRING"\n", "\n", 5, 0, NULL },
/* Less common */
{ " .\n", "\n", 10, 0, NULL },
{ " ,\n", "\n", 10, 0, NULL },
{ " -\n", "\n", 10, 0, NULL },
{ " _\n", "\n", 10, 0, NULL },
{ " '\n", "\n", 10, 0, NULL },
{ " `\n", "\n", 10, 0, NULL },
{ " ~\n", "\n", 10, 0, NULL },
{ " :\n", "\n", 10, 0, NULL },
{ " "SPACE_STRING"\n", "\n", 10, 0, NULL },
/* Even less common */
{ ".\n", "\n", 15, 0, NULL },
{ ",\n", "\n", 15, 0, NULL },
{ "-\n", "\n", 15, 0, NULL },
{ "_\n", "\n", 15, 0, NULL },
{ "'\n", "\n", 15, 0, NULL },
{ "`\n", "\n", 15, 0, NULL },
{ "~\n", "\n", 15, 0, NULL },
{ ":\n", "\n", 15, 0, NULL },
{ SPACE_STRING"\n", "\n", 15, 0, NULL },
/* Weird stuff that has happened */
{ "lJ", "U", 10, 0, NULL },
{ "ll", "U", 10, 0, NULL },
{ "l1", "U", 10, 0, NULL },
{ "il", "U", 10, 0, NULL }, /* Fairly common, actually */
{ "li", "U", 10, 0, NULL },
{ "l)", "U", 10, 0, NULL },
{ "Ll", "U", 10, 0, NULL },
{ "LI", "U", 10, 0, NULL },
{ "L1", "U", 10, 0, NULL },
{ "lo", "b", 10, 0, NULL },
{ "cl", "d", 10, 0, NULL },
{ "cliff", "diff", 2, 0, NULL },
{ "*\n", "*/\n", 10, 0, NULL },
/* That big black block has odd things happen to it */
{ "d", CONTIN_STRING, 10, 0, NULL },
{ "d\n", CONTIN_STRING"\n", 3, 0, NULL },
{ "S", CONTIN_STRING, 10, 0, NULL },
{ "S\n", CONTIN_STRING"\n", 3, 0, NULL },
/* Tab-stop wonders */
{ TAB_STRING, TAB_STRING"", 0, 0, TabFilter },
{ TAB_STRING, TAB_STRING" ", 0, 0, TabFilter },
{ TAB_STRING, TAB_STRING"  ", 0, 0, TabFilter },
{ TAB_STRING, TAB_STRING"   ", 0, 0, TabFilter },
{ TAB_STRING, TAB_STRING"    ", 0, 0, TabFilter },
{ TAB_STRING, TAB_STRING"     ", 0, 0, TabFilter },
{ TAB_STRING, TAB_STRING"      ", 0, 0, TabFilter },
{ TAB_STRING, TAB_STRING"       ", 0, 0, TabFilter },
/* Some scan errors */
{ "D ", TAB_STRING"", 1, 5, TabFilter },
{ "D ", TAB_STRING" ", 1, 5, TabFilter },
{ "D ", TAB_STRING"  ", 1, 5, TabFilter },
{ "D ", TAB_STRING"   ", 1, 5, TabFilter },
{ "D ", TAB_STRING"    ", 1, 5, TabFilter },
{ "D ", TAB_STRING"     ", 1, 5, TabFilter },
{ "D ", TAB_STRING"      ", 1, 5, TabFilter },
{ "D ", TAB_STRING"       ", 1, 5, TabFilter },
#if TAB_PAD_CHAR != ' '
#error Fix those tab patterns!
#endif
{ NULL, NULL, 0, 0, NULL }
};
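
A note on the search order described at the top of subst.c: trying
combinations "in order of increasing cost" is a best-first search, which
is typically driven by a priority queue. The sketch below is a minimal
binary min-heap over integer costs. It only illustrates the ordering; it
is not the interface of heap.h (which is not shown here), and DemoHeap,
demo_heap_push, demo_heap_pop and DEMO_MAX_HEAP are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_HEAP 64    /* hypothetical small bound for the demo */

typedef struct DemoHeap {
    int cost[DEMO_MAX_HEAP];
    size_t count;
} DemoHeap;

/* Insert a cost; returns 0 on success, -1 if the heap is full. */
int demo_heap_push(DemoHeap *h, int cost)
{
    size_t i;

    assert(h != NULL);
    if (h->count >= DEMO_MAX_HEAP)
        return -1;              /* give up, as repair does at MAX_HEAP */
    i = h->count++;
    /* Sift up: move larger parents down until cost fits */
    while (i > 0 && h->cost[(i - 1) / 2] > cost) {
        h->cost[i] = h->cost[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    h->cost[i] = cost;
    return 0;
}

/* Remove and return the smallest cost; the heap must be non-empty. */
int demo_heap_pop(DemoHeap *h)
{
    int top, last;
    size_t i = 0, child;

    assert(h != NULL && h->count > 0);
    top = h->cost[0];
    last = h->cost[--h->count];
    /* Sift down: pull the smaller child up until last fits */
    while ((child = 2 * i + 1) < h->count) {
        if (child + 1 < h->count && h->cost[child + 1] < h->cost[child])
            child++;
        if (h->cost[child] >= last)
            break;
        h->cost[i] = h->cost[child];
        i = child;
    }
    h->cost[i] = last;
    return top;
}
```

Pushing the costs 10, 5, 15, -30 and 7 and then popping repeatedly yields
-30, 5, 7, 10, 15. A negative entry such as the COST_LINE bonus in
subst.h therefore comes out ahead of any positive-cost alternative.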

66
tools/subst.h Normal file

@ -0,0 +1,66 @@
/*
* subst.h -- Header for repair substitutions
*
* Copyright (C) 1997 Pretty Good Privacy, Inc.
*
* Written by Colin Plumb
*
* $Id: subst.h,v 1.9 1997/11/03 22:12:00 colin Exp $
*/
/*
* Give up if the list of pending changes to attempt grows to this many
* elements. Each element is 32 bytes, so 128K is 4 MB of memory.
* (Other than this, repair's memory usage is fairly modest.)
*/
#define MAX_HEAP (1<<17)
/*
* There is a hack in the code to find a single substitution that will fix a
* line, even if it's not in the tables. It gets added to the tables "on
* probation", with an infinite cost, and if it leads to a successful
* correction of the entire page, is "learned" for future use and its
* cost reduced to something finite.
* (This is not remembered across runs of the program, though.
* Edit the tables in the source to fix it.)
*/
#define DYNAMIC_COST_LEARNED 15
/*
* This negative-cost bonus for passing the end of a line with the right
* CRC makes the search engine reluctant to backtrack past a correct CRC,
* greatly improving efficiency. It's rather a hack, though. Think of
* this in terms of "how many errors should be considered in the current
* line before considering the possibility of errors in the previous line?"
*
* This bonus is halved for lines that are the result of a correction
* that was computed from the checksum, since a correct checksum is
* much less significant in such a case.
*/
#define COST_LINE (-30)
/* The cost of a full-line nastyline substitution. */
#define NASTY_COST 5
/* Type describing filter functions used in substitutions */
struct ParseNode;
struct Substitution;
#include "heap.h"
typedef HeapCost FilterFunc(struct ParseNode *parent, char const *limit,
struct Substitution const *subst);
FilterFunc TabFilter, FilterFollowsSpace, FilterNearBlanks;
FilterFunc FilterNearUpper, FilterNearLower, FilterNearXDigit;
FilterFunc FilterAfterRepeat, FilterCharConst, FilterChecksumFollows;
FilterFunc FilterLikelyUnderscore, FilterIsDynamic, FilterIsBinary;
/* The external substitution format */
typedef struct RawSubst {
char const *input;
char const *output;
HeapCost cost, cost2;
FilterFunc *filter;
} RawSubst;
/* The substitutions to make */
extern struct RawSubst const substSingles[];
extern struct RawSubst const substMultiples[];

666
tools/unmunge.c Normal file

@ -0,0 +1,666 @@
/*
* unmunge.c -- Program to convert a munged file to original form
*
* Copyright (C) 1997 Pretty Good Privacy, Inc.
*
* Designed by Colin Plumb, Mark H. Weaver, and Philip R. Zimmermann
* Written by Mark H. Weaver
*
* $Id: unmunge.c,v 1.13 1997/11/13 23:27:08 mhw Exp $
*/
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
/*#include <direct.h> teun: MS VC wants direct.h for mkdir */
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <ctype.h>
#include <stdlib.h>
#include <assert.h>
#include "util.h"
typedef struct UnMungeState
{
char const * mungedFileName;
char dirName[128];
char fileName[128];
char * fileNameTail;
int binaryMode, tabWidth;
long productNumber, fileNumber, pageNumber, lineNumber;
long manifestLineNumber;
word16 hdrFlags;
CRC pageCRC, seenPageCRC;
FILE * manifest;
FILE * file;
FILE * out;
} UnMungeState;
/* Returns number of characters decoded, or -1 on error */
static int
Decode4(char const src[4], byte dest[3])
{
int i, length;
byte srcVal[4];
for (i = 0; i < 4 && src[i] != RADIX64_END_CHAR; i++)
if ((srcVal[i] = Radix64DigitValue(src[i])) == (byte) -1)
return -1;
length = i - 1;
if (length < 1)
return -1;
/* Zero the unused digits so the padding decodes as zero bits */
for (; i < 4; i++)
srcVal[i] = 0;
dest[0] = (srcVal[0] << 2) | (srcVal[1] >> 4);
dest[1] = (srcVal[1] << 4) | (srcVal[2] >> 2);
dest[2] = (srcVal[2] << 6) | (srcVal[3]);
return length;
}
/*
* Return number of characters decoded, or -1 on error
*/
static int
DecodeLine(char const *src, char *dest, int srclength)
{
int destlength = 0;
int result;
if (srclength % 4 || !srclength)
return -1; /* Must be a multiple of 4 */
while (srclength -= 4) {
if (Decode4(src, dest + destlength) != 3)
return -1;
src += 4;
destlength += 3;
}
result = Decode4(src, dest + destlength);
if (result < 1)
return -1;
return destlength + result;
}
int PrintFileError(UnMungeState *state, char const *message)
{
fprintf(stderr, "%s, %s line %ld\n", message,
state->mungedFileName, state->lineNumber);
return 1;
}
int ReadManifest(UnMungeState *state, long fileNumberWanted,
char const *fileTailPrefix, long prefixLen)
{
long fileNumber = 0;
long firstMissingFileNum = 0, lastMissingFileNum = 0;
char buffer[512];
char * p;
if (state->manifest == NULL)
{
if (fileNumberWanted != 0)
{
assert(fileTailPrefix != NULL);
strncpy(state->fileName, fileTailPrefix, sizeof(state->fileName));
state->fileName[sizeof(state->fileName) - 1] = '\0';
state->fileNameTail = state->fileName;
}
return 0;
}
while (fgets(buffer, sizeof(buffer), state->manifest))
{
if ((p = strchr(buffer, '\n')) != NULL)
*p = '\0';
state->manifestLineNumber++;
if (buffer[0] == 'D')
{
if (buffer[1] != ' ')
goto invalidManifest;
strncpy(state->dirName, buffer + 2, sizeof(state->dirName));
if (state->dirName[sizeof(state->dirName) - 1] != '\0')
goto invalidManifest;
}
else
{
fileNumber = strtol(buffer, &p, 10);
if (p == buffer || *p != ' ')
goto invalidManifest;
p++;
if (fileNumberWanted == 0 || fileNumber < fileNumberWanted)
{
if (firstMissingFileNum == 0)
firstMissingFileNum = fileNumber;
lastMissingFileNum = fileNumber;
continue;
}
else if (fileNumber > fileNumberWanted)
break;
else
{
size_t len;
len = strlen(state->dirName);
assert(sizeof(state->fileName) >= sizeof(state->dirName));
memcpy(state->fileName, state->dirName, len);
strncpy(state->fileName + len, p,
sizeof(state->fileName) - len);
if (strncmp(p, fileTailPrefix, prefixLen) != 0)
{
fprintf(stderr, "Mismatched filename, headers say '%s',\n"
" manifest says '%s'\n",
fileTailPrefix, p);
return 1;
}
p = state->dirName;
while ((p = strchr(p, '/')) != NULL)
{
*p = '\0';
mkdir(state->dirName, 0777);
*p++ = '/';
}
state->fileNameTail = state->fileName + len;
break;
}
}
}
if (firstMissingFileNum != 0)
{
fprintf(stderr, "Missing files %ld-%ld\n",
firstMissingFileNum, lastMissingFileNum);
}
if (fileNumberWanted != 0 && fileNumber != fileNumberWanted)
{
fprintf(stderr, "Can't find file %ld in manifest file\n",
fileNumberWanted);
return 1;
}
return 0;
invalidManifest:
fprintf(stderr, "Error parsing manifest file, line %ld\n",
state->manifestLineNumber);
return 1;
}
int UnMungeFile(char const *mungedFileName, char const *manifestFileName,
int forceOverwrite, int forcePartialFiles)
{
UnMungeState * state;
EncodeFormat const * fmt = NULL;
char buffer[512];
char outbuf[BYTES_PER_LINE+1];
char * line;
char * lineData;
char * p;
int length;
int result = 0;
int skipPage = 0;
CRC lineCRC;
word32 num;
state = (UnMungeState *)calloc(1, sizeof(*state));
if (state == NULL)
goto errnoError;
state->mungedFileName = mungedFileName;
if (manifestFileName != NULL)
{
if ((state->manifest = fopen(manifestFileName, "r")) == NULL)
goto errnoError;
}
if ((state->file = fopen(state->mungedFileName, "r")) == NULL)
goto errnoError;
while (!feof(state->file))
{
if (fgets(buffer, sizeof(buffer), state->file) == NULL)
{
if (feof(state->file))
break;
goto fileError;
}
state->lineNumber++;
line = buffer;
/* Strip leading whitespace */
while (isspace(*line))
line++;
if (*line == '\0')
continue;
/* Strip trailing whitespace */
p = line + strlen(line);
while (p > line && (byte)p[-1] < 128 && isspace(p[-1]))
p--;
lineData = line + PREFIX_LENGTH;
/* Pad up to at least PREFIX_LENGTH */
while (p < lineData)
*p++ = ' ';
*p++ = '\n';
*p = '\0';
length = p - lineData;
if (line[0] == HDR_PREFIX_CHAR)
{
fmt = FindFormat(line[1]);
if (!fmt)
{
result = PrintFileError(state, "ERROR: Invalid header type");
goto error;
}
}
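/* A data line before any header leaves fmt unset; don't dereference it */
if (fmt == NULL)
{
result = PrintFileError(state, "ERROR: Missing header line");
goto error;
}
lineCRC = CalculateCRC(fmt->lineCRC, 0, (byte const *)lineData, length);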
p = line + EncodedLength(fmt, fmt->runningCRCBits);
if (DecodeCheckDigits(fmt, p, NULL, fmt->lineCRC->bits, &num)
|| lineCRC != num)
{
result = PrintFileError(state, "ERROR: Line CRC failed");
goto error;
}
if (line[0] == HDR_PREFIX_CHAR)
{
int formatVersion;
int flags;
CRC seenPageCRC;
int tabWidth;
long productNumber;
long fileNumber;
long pageNumber;
char * fileNameTail;
int skipNextPage = 0;
char * p;
EncodeFormat const * hFmt = &hexFormat;
/* Parse header line */
p = lineData;
if (DecodeCheckDigits(hFmt, p, &p, HDR_VERSION_BITS, &num))
{
invalidHeader:
result = PrintFileError(state, "ERROR: Invalid header");
goto error;
}
formatVersion = num;
if (DecodeCheckDigits(hFmt, p, &p, HDR_FLAG_BITS, &num))
goto invalidHeader;
flags = num;
if (DecodeCheckDigits(hFmt, p, &p, fmt->pageCRC->bits, &num))
goto invalidHeader;
seenPageCRC = num;
if (DecodeCheckDigits(hFmt, p, &p, HDR_TABWIDTH_BITS, &num))
goto invalidHeader;
tabWidth = num;
if (DecodeCheckDigits(hFmt, p, &p, HDR_PRODNUM_BITS, &num))
goto invalidHeader;
productNumber = num;
if (DecodeCheckDigits(hFmt, p, &p, HDR_FILENUM_BITS, &num))
goto invalidHeader;
fileNumber = num;
if (sscanf(p, " Page %ld of ", &pageNumber) < 1)
goto invalidHeader;
if (formatVersion > 0)
{
result = PrintFileError(state,
"ERROR: Format too new for "
"this version of unmunge");
goto error;
}
p = strstr(p, " of ");
if (p == NULL)
goto invalidHeader;
fileNameTail = p + 4;
p = fileNameTail + strlen(fileNameTail);
if (p < fileNameTail + 3 || p[-1] != '\n')
goto invalidHeader;
else
p[-1] = '\0';
if (state->out != NULL && state->pageCRC != state->seenPageCRC)
{
result = PrintFileError(state,
"ERROR: Page CRC mismatch on page before");
goto error;
}
if ((state->hdrFlags & HDR_FLAG_LASTPAGE) && state->out != NULL)
{
fclose(state->out);
state->out = NULL;
}
if (state->out != NULL)
{
if (pageNumber != state->pageNumber + 1 ||
fileNumber != state->fileNumber ||
productNumber != state->productNumber ||
tabWidth != state->tabWidth ||
strcmp(fileNameTail, state->fileNameTail) != 0)
{
if (fileNumber == state->fileNumber &&
pageNumber > state->pageNumber + 1)
{
(void)PrintFileError(state,
"ERROR: Missing pages of this file");
if (forcePartialFiles && !state->binaryMode)
{
fputs("\n\n@@@@@@ Missing pages here! @@@@@@\n\n",
state->out);
}
else
{
skipNextPage = 1;
fclose(state->out);
state->out = NULL;
remove(state->fileName);
}
}
else
{
(void)PrintFileError(state,
"ERROR: Missing pages of previous file");
if (forcePartialFiles && !state->binaryMode)
{
fputs("\n\n@@@@@@ Missing pages here! @@@@@@\n\n",
state->out);
/* Make it non-fatal, though... */
fclose(state->out);
state->out = NULL;
}
else
{
fclose(state->out);
state->out = NULL;
remove(state->fileName);
}
}
}
}
if (state->out == NULL)
{
if (pageNumber != 1 && !skipPage)
(void)PrintFileError(state,
"ERROR: File doesn't begin with page 1");
state->binaryMode = (tabWidth == 0);
if (pageNumber != 1 && (state->binaryMode
|| !forcePartialFiles))
{
skipNextPage = 1;
}
else
{
/* TODO: Use global filelist to get pathname */
result = ReadManifest(state, fileNumber, fileNameTail,
strlen(fileNameTail));
if (result != 0)
goto error;
if (!forceOverwrite)
{
FILE * file;
/* Make sure file doesn't already exist */
file = fopen(state->fileName, "r");
if (file != NULL)
{
fclose(file);
fprintf(stderr, "ERROR: %s already exists\n",
state->fileName);
result = 1;
goto error;
}
}
state->out = fopen(state->fileName,
state->binaryMode ? "wb" : "w");
if (state->out == NULL)
goto errnoError;
if (pageNumber != 1)
fputs("\n\n@@@@@@ Missing pages here! @@@@@@\n\n",
state->out);
}
}
state->pageCRC = 0;
state->seenPageCRC = seenPageCRC;
state->hdrFlags = (word16)flags;
state->pageNumber = pageNumber;
state->fileNumber = fileNumber;
state->productNumber = productNumber;
state->tabWidth = tabWidth;
skipPage = skipNextPage;
}
else if (!skipPage)
{
if (state->out == NULL)
{
result = PrintFileError(state, "ERROR: Missing header line");
goto error;
}
/* Normal data line */
state->pageCRC = CalculateCRC(fmt->pageCRC, state->pageCRC,
(byte const *)lineData,
length);
line[2] = '\0';
if (DecodeCheckDigits(fmt, line, NULL, fmt->runningCRCBits, &num)
|| RunningCRCFromPageCRC(fmt, state->pageCRC) != num)
{
result = PrintFileError(state, "ERROR: Running CRC failed");
goto error;
}
if (state->binaryMode)
{
length = DecodeLine(lineData, outbuf, length-1);
if (length < 0 || length > BYTES_PER_LINE) {
result = PrintFileError(state,
"ERROR: Corrupt radix-64 data");
goto error;
}
fwrite(outbuf, 1, length, state->out);
}
else
{
p = lineData;
while (*p != '\0')
{
if (*p == TAB_CHAR)
{
p++;
putc('\t', state->out);
while ((p - lineData) % state->tabWidth)
{
if (*p == '\n')
break;
else if (*p == ' ')
p++;
else
{
result = PrintFileError(state,
"ERROR: Not enough spaces "
"after a tab character");
goto error;
}
}
}
else if (*p == FORMFEED_CHAR)
{
p++;
if (*p != '\n')
{
result = PrintFileError(state,
"ERROR: Formfeed character "
"not at end of line");
goto error;
}
p++; /* Skip newline */
putc('\f', state->out);
}
else if (*p == CONTIN_CHAR)
{
p++;
if (*p != '\n')
{
result = PrintFileError(state,
"ERROR: Continuation character "
"not at end of line");
goto error;
}
p++; /* Skip newline */
}
else if (*p == SPACE_CHAR)
{
putc(' ', state->out);
p++;
}
else
{
putc(*p, state->out);
p++;
}
}
}
}
}
if (state->out != NULL)
{
if (!(state->hdrFlags & HDR_FLAG_LASTPAGE))
{
result = PrintFileError(state, "ERROR: Missing pages");
goto error;
}
if (state->pageCRC != state->seenPageCRC)
{
result = PrintFileError(state,
"ERROR: Page CRC failed on previous page");
goto error;
}
}
/* Check for missing files at the end */
result = ReadManifest(state, 0, NULL, 0);
goto done;
errnoError:
result = errno;
goto printError;
fileError:
result = ferror(state->file);
printError:
fprintf(stderr, "ERROR: %s\n", strerror(result));
error:
done:
if (state != NULL)
{
if (state->out != NULL)
fclose(state->out);
if (state->file != NULL)
fclose(state->file);
if (state->manifest != NULL)
fclose(state->manifest);
free(state);
}
return result;
}
void UsageAndExit(int result)
{
fprintf(stderr,
"Usage: unmunge [-fhp] <file> [<manifest>]\n"
" -f Force overwrites of existing files\n"
" -h Print this usage message and exit\n"
" -p Force unmunge of partial files\n");
exit(result);
}
int main(int argc, char *argv[])
{
int result = 0;
int forceOverwrite = 0;
int forcePartialFiles = 0;
char * fileName = NULL;
char * manifestFileName = NULL;
int i, j;
InitUtil();
for (i = 1; i < argc && argv[i][0] == '-'; i++)
{
if (0 == strcmp(argv[i], "--"))
{
i++;
break;
}
for (j = 1; argv[i][j] != '\0'; j++)
{
if (argv[i][j] == 'h')
UsageAndExit(0);
else if (argv[i][j] == 'f')
forceOverwrite = 1;
else if (argv[i][j] == 'p')
forcePartialFiles = 1;
else
{
fprintf(stderr, "ERROR: Unrecognized option -%c\n", argv[i][j]);
UsageAndExit(1);
}
}
}
if (i < argc)
fileName = argv[i++];
if (i < argc)
manifestFileName = argv[i++];
if (fileName == NULL || i < argc)
UsageAndExit(1);
if ((result = UnMungeFile(fileName, manifestFileName,
forceOverwrite, forcePartialFiles)) != 0)
{
/* If result > 0, message should have already been printed */
if (result < 0)
fprintf(stderr, "ERROR: %s\n", strerror(result));
exit(1);
}
return 0;
}
/*
* Local Variables:
* tab-width: 4
* End:
* vi: ts=4 sw=4
* vim: si
*/

198
tools/util.c Normal file

@ -0,0 +1,198 @@
/*
* util.c -- Miscellaneous shared code/data
*
* Copyright (C) 1997 Pretty Good Privacy, Inc.
*
* Written by Mark H. Weaver
*
* $Id: util.c,v 1.11 1997/11/07 00:44:10 mhw Exp $
*/
#include <stdlib.h>
#include "util.h"
char const hexDigits[] = "0123456789abcdef";
char const radix64Digits[] =
#if 0 /* Standard */
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
#else /* Modified form that avoids hard-to-OCR characters */
"ABCDEFGHIJKLMNPQRSTVWXYZabcdehijklmnpqtuwy145689\\^!#$%&*+=/:<>?@";
#endif
signed char hexDigitsInv[256];
signed char radix64DigitsInv[256];
/* teun: moved initialisation of all three CRCPolys to InitUtil() */
/* CRC-CCITT: x^16 + x^12 + x^5 + 1 */
CRCPoly crcCCITTPoly;
/*
* PRZ's magic 24-bit polynomial - (x+1) * (irreducible of degree 23)
* x^24 +x^23 +x^18 +x^17 +x^14 +x^11 +x^10 +x^7 +x^6 +x^5 +x^4 +x^3 +x +1
* (Developed by Neal Glover). Note: this is bit-reversed from the form
* used in PGP, 0x1864cfb.
*/
CRCPoly crc24Poly;
/* CRC-32: x^32+x^26+x^23+x^22+x^16+x^12+x^11+x^10+x^8+x^7+x^5+x^4+x^2+x+1 */
CRCPoly crc32Poly;
EncodeFormat const hexFormat =
{
NULL, /* nextFormat */
'-', /* headerTypeChar */
hexDigits, /* digits */
hexDigitsInv, /* digitsInv */
4, /* bitsPerDigit */
16, /* radix */
&crcCCITTPoly, /* lineCRC */
&crc32Poly, /* pageCRC */
8, /* runningCRCBits */
24, /* runningCRCShift */
0xFF /* runningCRCMask */
};
EncodeFormat const radix64Format =
{
&hexFormat, /* nextFormat */
'A', /* headerTypeChar */
radix64Digits, /* digits */
radix64DigitsInv, /* digitsInv */
6, /* bitsPerDigit */
64, /* radix */
&crc24Poly, /* lineCRC */
&crc32Poly, /* pageCRC */
12, /* runningCRCBits */
20, /* runningCRCShift */
0xFFF /* runningCRCMask */
};
EncodeFormat const * firstFormat = &radix64Format;
static void InitCRCPoly(CRCPoly *poly)
{
int i, oneBit;
CRC crc = 1;
poly->table[0] = 0;
for (oneBit = 0x80; oneBit > 0; oneBit >>= 1) {
crc = (crc >> 1) ^ ((crc & 1) ? poly->poly : 0);
for (i = 0; i < 0x100; i += 2 * oneBit)
poly->table[i + oneBit] = poly->table[i] ^ crc;
}
}
CRC CalculateCRC(CRCPoly const *poly, CRC crc,
byte const *buffer, size_t length)
{
while (length--)
crc = (crc >> 8) ^ poly->table[(crc & 0xFF) ^ (*buffer++)];
return crc;
}
CRC ReverseCRC(CRCPoly const *poly, CRC crc, byte b)
{
int i, highBit = poly->highBit;
for (i = 0; i < 8; i++) {
if (crc & highBit) /* highBit is 2^(poly->bits-1) */
crc = ((crc ^ poly->poly) << 1) ^ 1;
else
crc <<= 1;
}
return crc ^ b;
}
static void InitDigitsInv(char const *digits, signed char *digitsInv)
{
int i;
for (i = 0; i < 256; i++)
digitsInv[i] = -1;
for (i = 0; digits[i]; i++)
digitsInv[(byte)digits[i]] = i;
}
/* Returns the number of chars encoded */
int EncodeCheckDigits(EncodeFormat const *fmt, word32 num,
int numBits, char *dest)
{
int destLen = EncodedLength(fmt, numBits);
word32 digitMask = fmt->radix - 1;
int i;
for (i = destLen - 1; i >= 0; i--)
{
dest[i] = EncodeDigit(fmt, num & digitMask);
num >>= fmt->bitsPerDigit;
}
return destLen;
}
/* Returns 1 if there's an error */
int DecodeCheckDigits(EncodeFormat const *fmt, char const *src, char **endPtr,
int numBits, word32 *valuePtr)
{
word32 value = 0;
int digitValue;
int i = EncodedLength(fmt, numBits);
while (i--)
{
digitValue = DecodeDigit(fmt, *src++);
if (digitValue < 0)
{
/* Invalid digit found */
*valuePtr = 0;
if (endPtr)
*endPtr = NULL;
return 1;
}
value = (value << fmt->bitsPerDigit) | digitValue;
}
*valuePtr = value;
if (endPtr)
*endPtr = (char *)src;
return 0;
}
EncodeFormat const *FindFormat(char headerTypeChar)
{
EncodeFormat const * fmt = firstFormat;
while (fmt && fmt->headerTypeChar != headerTypeChar)
fmt = fmt->nextFormat;
return fmt;
}
void InitUtil(void)
{
/* teun: removed "{ }" for MS VC compile */
crcCCITTPoly.bits = 16;
crcCCITTPoly.poly = 0x8408;
crcCCITTPoly.highBit = 0x8000;
crc24Poly.bits = 24;
crc24Poly.poly = 0xdf3261;
crc24Poly.highBit = 0x800000;
crc32Poly.bits = 32;
crc32Poly.poly = 0xedb88320;
crc32Poly.highBit = 0x80000000;
InitCRCPoly(&crcCCITTPoly);
InitCRCPoly(&crc24Poly);
InitCRCPoly(&crc32Poly);
InitDigitsInv(hexDigits, hexDigitsInv);
InitDigitsInv(radix64Digits, radix64DigitsInv);
}
/*
* Local Variables:
* tab-width: 4
* End:
* vi: ts=4 sw=4
* vim: si
*/

149
tools/util.h Normal file

@ -0,0 +1,149 @@
/*
* util.h -- Miscellaneous defines
*
* Copyright (C) 1997 Pretty Good Privacy, Inc.
*
* Written by Mark H. Weaver
*
* $Id: util.h,v 1.23 1997/11/12 23:28:56 mhw Exp $
*/
#ifndef UTIL_H
#define UTIL_H 1
#include <stddef.h> /* size_t, used in the CalculateCRC() prototype */
typedef unsigned long word32;
typedef unsigned short word16;
typedef unsigned char byte;
#define FMT32 "%08lx"
#define FMT16 "%04x"
#define FMT8 "%02x"
#define TAB_CHAR '\244' /* Currency symbol, like o on top of x */
#define TAB_STRING "\244"
#define TAB_PAD_CHAR ' ' /* The fact that this is space has leaked. */
#define TAB_PAD_STRING " " /* It may not be freely changed. */
#define FORMFEED_CHAR '\245' /* Yen symbol, like = on top of Y */
#define FORMFEED_STRING "\245"
#define SPACE_CHAR '\267' /* Middle dot, or bullet */
#define SPACE_STRING "\267"
#define CONTIN_CHAR '\266' /* Pilcrow (paragraph symbol) */
#define CONTIN_STRING "\266"
#define BYTES_PER_LINE 60 /* When using radix 64 */
#define LINES_PER_PAGE 72 /* Exclusive of 2 header lines */
#define LINE_LENGTH 80
#define PREFIX_LENGTH 7 /* Length of prefix, including the space */
#define HDR_PREFIX_CHAR '-'
#define RADIX64_END_CHAR '-'
typedef struct EncodeFormat EncodeFormat;
typedef word32 CRC;
typedef word16 CRCFragment;
typedef struct
{
CRC table[256];
int bits;
CRC poly;
CRC highBit;
} CRCPoly;
struct EncodeFormat
{
EncodeFormat const *nextFormat;
char headerTypeChar;
char const * digits;
signed char const * digitsInv;
int bitsPerDigit;
int radix;
CRCPoly const * lineCRC;
CRCPoly const * pageCRC;
int runningCRCBits;
int runningCRCShift;
int runningCRCMask;
};
#define HDR_ENC_LENGTH 19 /* Length of encoded prefix on header */
#define HDR_VERSION_BITS 4
#define HDR_FLAG_BITS 8
/* Page CRC bits omitted, since it's not constant */
#define HDR_TABWIDTH_BITS 4
#define HDR_PRODNUM_BITS 12
#define HDR_FILENUM_BITS 16
/* Enough to hold one whole page of munged data */
/* There is no point in making this excessively large */
#define PAGE_BUFFER_SIZE 8192
#if PAGE_BUFFER_SIZE < (LINES_PER_PAGE + 2) * (LINE_LENGTH + PREFIX_LENGTH + 2)
#error PAGE_BUFFER_SIZE is too small
#endif
/* Header flags */
#define HDR_FLAG_LASTPAGE 0x01 /* Indicates last page of file */
#define elemsof(array) (sizeof(array)/sizeof(*(array)))
extern char const hexDigits[];
extern char const radix64Digits[];
extern signed char hexDigitsInv[256];
extern signed char radix64DigitsInv[256];
extern CRCPoly crcCCITTPoly, crc24Poly, crc32Poly;
extern EncodeFormat const hexFormat, radix64Format;
extern EncodeFormat const * firstFormat;
#define HexDigitValue(ch) hexDigitsInv[(byte)(ch)]
#define Radix64DigitValue(ch) radix64DigitsInv[(byte)(ch)]
/* Returns the number of chars needed to encode the given number of bits */
#define EncodedLength(fmt, numBits) \
(((numBits) + (fmt)->bitsPerDigit - 1) / (fmt)->bitsPerDigit)
#define EncodeDigit(fmt, value) ((fmt)->digits[value])
#define DecodeDigit(fmt, digit) ((fmt)->digitsInv[(byte)(digit)])
#define AdvanceCRC(poly, crc, b) \
(((crc) >> 8) ^ (poly)->table[((crc) ^ (b)) & 0xFF])
#define RunningCRCFromPageCRC(fmt, pageCRC) \
(((pageCRC) >> (fmt)->runningCRCShift) & (fmt)->runningCRCMask)
CRC CalculateCRC(CRCPoly const *poly, CRC crc,
byte const *buffer, size_t length);
CRC ReverseCRC(CRCPoly const *poly, CRC crc, byte b);
/* Returns the number of chars encoded */
int EncodeCheckDigits(EncodeFormat const *fmt, word32 num,
int numBits, char *dest);
/* Returns 1 if there's an error */
int DecodeCheckDigits(EncodeFormat const *fmt, char const *src, char **endPtr,
int numBits, word32 *valuePtr);
EncodeFormat const *FindFormat(char headerTypeChar);
void InitUtil(void);
#endif /* !UTIL_H */
/*
* Local Variables:
* tab-width: 4
* End:
* vi: ts=4 sw=4
* vim: si
*/

286
tools/yapp Normal file

@ -0,0 +1,286 @@
#!/usr/bin/perl
#
# Yet another preprocessor
#
# $Id: yapp,v 1.5 1997/10/24 07:51:05 mhw Exp $
#
%vars = ('' => '$');
@incPath = (".");
sub Error
{
print STDERR $_[0], "\n";
exit(1);
}
sub VarSubst
{
my ($varName, $undefOkay) = @_;
if (defined($vars{$varName}))
{
return $vars{$varName};
}
elsif (!$undefOkay)
{
&Error("Undefined variable '$varName' in $fileName line $.");
}
}
sub NullFilter
{
0;
}
sub IfFilter
{
local $_ = $_[0];
if (/^##else(\s+.*)?/)
{
return 1;
}
elsif (/^##endif(\s+.*)?/)
{
return 2;
}
else
{
return 0;
}
}
sub DoFile
{
local $fileName = $_[0];
my $path;
local *FILE;
if ($fileName =~ m|^/|)
{
$path = $fileName;
}
else
{
for $dir (@incPath)
{
if (-e "$dir/$fileName")
{
$path = "$dir/$fileName";
last;
}
}
}
if ($path eq "")
{
&Error("Can't find '$fileName', from $fileName line $.");
}
open(FILE, "<$path") || &Error("Can't open $path: $!");
&DoOpenFile(*FILE, *NullFilter, 0);
close(FILE) || die;
0;
}
sub DoPrepass
{
local ($_, $skipFlag) = @_;
return "" if /^###/;
s/\s*###.*//; # Strip comments
s/\$\{(\w*)\}/&VarSubst($1, $skipFlag)/eg; # Do variable substitutions (\w* so ${} works)
$_;
}
sub DoOpenFile
{
local *FILE = $_[0];
local *filter = $_[1];
my $skipFlag = $_[2];
my $result;
local $_;
while (<FILE>)
{
$_ = &DoPrepass($_, $skipFlag);
if ($result = &filter($_))
{
return $result;
}
elsif (/^##(\w*)(\s+(.*))?/)
{
my ($cmd, $params) = ($1, $3);
if ($cmd =~ /^if/)
{
my $condition;
my $ifStartLine = $.;
if ($cmd eq "if")
{
if ($params =~ /^(\d+)\s*$/)
{
$condition = int($1);
}
elsif ($params =~ /^(\d+)\s*([=!]=|[<>]=?)\s*(\d+)\s*$/)
{
my ($left, $op, $right) = ($1, $2, $3);
$condition = eval($left . $op . $right);
}
elsif ($params =~ /^(\S+)\s*(eq|ne)\s*(\S+)\s*$/)
{
my ($left, $op, $right) = ($1, $2, $3);
$left =~ s/([\\'])/\\$1/g;
$right =~ s/([\\'])/\\$1/g;
$condition = eval("'$left' $op '$right'");
}
else
{
&Error("Invalid ##if params: '$params' " .
"in $fileName line $.");
}
}
elsif ($cmd =~ /^ifn?def$/)
{
if ($params =~ /^(\w+)\s*$/)
{
$condition = defined($vars{$1});
$condition = !$condition if ($cmd eq "ifndef");
}
else
{
&Error("Invalid ##$cmd param: '$params' " .
"in $fileName line $.");
}
}
# Do main body of if
$result = &DoOpenFile(*FILE, *IfFilter,
$skipFlag || !$condition);
if ($result == 1) # an '##else' was found
{
# Handle else
$result = &DoOpenFile(*FILE, *IfFilter,
$skipFlag || $condition);
}
if ($result == 1) # a second '##else' was found
{
&Error("Two ##else's in a row in $fileName line $.");
}
elsif ($result == 0) # EOF was encountered
{
&Error("Unterminated ##if " .
"in $fileName line $ifStartLine");
}
}
elsif ($cmd eq "include")
{
if ($skipFlag)
{
}
elsif ($params =~ /^"(.*)"\s*$/)
{
my $incFile = $1;
&DoFile($incFile);
}
else
{
&Error("Invalid ##include params: '$params'");
}
}
elsif ($cmd eq "set")
{
if ($params =~ /^(\w+)=<<(")(.*)"\s*$/ or
$params =~ /^(\w+)=<<(')(.*)'\s*$/)
{
my $varName = $1;
my $quoteChar = $2;
my $endTag = $3 . "\n";
my $value;
while (<FILE>)
{
if ($_ eq $endTag)
{
chop $value;
last;
}
else
{
if ($quoteChar eq '"')
{
$_ = &DoPrepass($_, $skipFlag);
}
$value .= $_;
}
}
if (!$skipFlag)
{
$vars{$varName} = $value;
}
}
elsif ($params =~ /^(\w+)="(.*)"\s*$/ or
$params =~ /^(\w+)=(\S*)\s*$/)
{
if (!$skipFlag)
{
$vars{$1} = $2;
}
}
else
{
&Error("Invalid ##set command: '$params'");
}
}
else
{
&Error("Unrecognized command: '$_'");
}
}
elsif (!$skipFlag)
{
print;
}
}
return 0;
}
$optEnable = 1;
foreach (@ARGV)
{
if ($optEnable and /^-/)
{
if (/^--$/)
{
$optEnable = 0;
}
elsif (/^-D(\w+)=(.*)$/)
{
$vars{$1} = $2;
}
elsif (/^-I(.*)$/)
{
unshift @incPath, $1;
}
else
{
&Error("Unrecognized option: '$_'");
}
}
else
{
&DoFile($_);
}
}
#
# vi: ai ts=4
# vim: si
#

48
tools/yapp.doc Normal file

@ -0,0 +1,48 @@
YAPP is a simple macro preprocessor designed to do minor tweaking to
another program's inputs.
In its input, anything of the form ${foo} is expanded with the variable
named foo. It is an error if ${foo} is not defined.
If you need to escape a dollar sign for some reason, the variable
with the empty-string name, ${}, has the value "$".
The result of macro expansion is *not* re-expanded. Expansion is done only
when definitions are made.
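As an illustration of that last point, here is a small hypothetical input
(the variable names are invented for this example; ##set is described
below). Control lines are expanded before they are interpreted, so ${name}
is replaced while greeting is being defined, and the later change to name
should not affect it:

    ##set name=world
    ##set greeting="Hello, ${name}"
    ##set name=moon
    ${greeting}!

Traced by hand through the yapp script, this should produce a single
output line:

    Hello, world!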
After variable expansion, lines are checked to see if they are control lines.
Control lines begin with ## (after optional leading whitespace). All such
lines are deleted and do not appear in the output. ### is a comment. Other
options are:
##set variable=value
value may have one of the following forms:
token: Trailing whitespace is stripped. The token may not contain
any whitespace. Use quotes if it's complicated.
"string": The string may have embedded quotes, and whitespace after
the closing quote.
<<"DELIM": This is a here-document, and the value is all of the following
lines up until, but not including, the newline that precedes a line
that consists solely of DELIM, for any DELIM string.
The DELIM must be in quotes. You have two options:
"DELIM": Expand macros in the body of the here-document.
'DELIM': Do not expand macros in the here-document.
##include "filename": Insert the named file in place of the current line.
##if num == num
##if num != num
##if num < num
##if num > num
##if num <= num
##if num >= num
##if token eq token
##if token ne token
##ifdef symbol
##ifndef symbol
##else
##endif
You can figure this one out. Macros in between are expanded as usual
(so the ##else or ##endif may be in a macro expansion), but the result
is ignored. String comparison is allowed only between simple words.
##ifdef symbol is true if ${symbol} is defined; ##ifndef is true if it is not.
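The directives above can be combined as in the following sketch. The
variable names, values and output lines are invented for illustration;
only the directives themselves come from the description above.

    ##set debug=1
    ##set title="PGP Tools"
    ##set banner=<<"END"
    ${title}
    debug level ${debug}
    END
    ${banner}
    ##if ${debug} >= 1
    CFLAGS = -g
    ##else
    CFLAGS = -O2
    ##endif
    ##ifndef release
    MODE = development
    ##endif

Read against the rules above (and traced by hand through the yapp
script), this should print:

    PGP Tools
    debug level 1
    CFLAGS = -g
    MODE = development

If the here-document were introduced with <<'END' instead, its body would
be stored without expansion. Since the result of an expansion is not
re-expanded, ${banner} should then print the ${title} and ${debug}
references literally.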