commit 60052b2f16264130264720e9cc3c576b51431e7e
Author: rnhmjoj <rnhmjoj@inventati.org>
Date:   Wed May 15 16:55:03 2019 +0200

    initial commit

diff --git a/MANIFEST b/MANIFEST
new file mode 100644
index 0000000..231015b
--- /dev/null
+++ b/MANIFEST
@@ -0,0 +1,32 @@
+1 test-file
+2 MANIFEST
+D books/
+D books/tools/
+3 bootstrap
+4 bootstrap2
+5 sortpages
+6 Makefile
+7 heap.c
+8 heap.h
+9 mempool.c
+10 mempool.h
+11 util.c
+12 util.h
+13 repair.c
+14 subst.c
+15 subst.h
+16 unmunge.c
+17 munge.c
+18 yapp.doc
+19 yapp
+20 psgen
+21 makemanifest
+D books/ps/
+22 prolog.ps
+23 charmap.ps
+D books/example/
+24 Makefile
+25 .cvsignore
+26 filelist
+27 footer.ps
+28 us-constitution.gz
diff --git a/README b/README
new file mode 100644
index 0000000..8abf4a8
--- /dev/null
+++ b/README
@@ -0,0 +1,477 @@
+PREFACE
+-------
+
+This book grew out of a project to publish source code for cryptographic
+software, namely PGP (Pretty Good Privacy), a software package for the
+encryption of electronic mail and computer files. PGP is the most widely
+used software in the world for email encryption. Pretty Good Privacy, Inc
+(or "PGP") has published the source code of PGP for peer review, a long-
+standing tradition in the history of PGP. The first time a fully implemented
+cryptographic software package was published in its entirety in book form
+was "PGP Source Code and Internals," by Philip Zimmermann, published by The
+MIT Press, 1995, ISBN 0-262-24039-4.
+
+Peer review of the source code is important to get users to trust the
+software, since any weaknesses can be detected by knowledgeable experts who
+make the effort to review the code. But peer review cannot be completely
+effective unless the experts conducting the review can compile and test the
+software, and verify that it is the same as the software products that are
+published electronically. To facilitate that, PGP publishes its source code
+in printed form that can be scanned into a computer via OCR (optical
+character recognition) technology.
+
+Why not publish the source code in electronic form? As you may know,
+cryptographic software is subject to U.S. export control laws and
+regulations. The new 1997 Commerce Department Export Administration
+Regulations (EAR) explicitly provide that "A printed book or other printed
+material setting forth encryption source code is not itself subject to the
+EAR." (see 15 C.F.R. §734.3(b)(2)). PGP, in an overabundance of caution,
+has only made available its source code in a form that is not subject to
+those regulations. So, books containing cryptographic source code may be
+published, and after they are published they may be exported, but only
+while they are still in printed form.
+
+Electronic commerce on the Internet cannot fully be successful without
+strong cryptography. Cryptography is important for protecting our privacy,
+civil liberties, and the security of our personal and business transactions
+in the information age. The widespread deployment of strong cryptography
+can help us regain some of the privacy and security that we have lost due
+to information technology. Further, strong cryptography (in the form of
+PGP) has already proven itself to be a valuable tool for the protection of
+human rights in oppressive countries around the world, by keeping those
+governments from reading the communications of human rights workers.
+
+This book of tools contains no cryptographic software of any kind, nor does
+it call, connect, nor integrate in any way with cryptographic software. But
+it does contain tools that make it easy to publish source code in book form.
+And it makes it easy to scan such source code in with OCR software rapidly
+and accurately.
+
+Philip Zimmermann
+prz@acm.org
+
+November 1997
+
+
+
+INTRODUCTION
+------------
+
+This book contains tools for printing computer source code on paper in
+human-readable form and reconstructing it exactly using automated tools.
+While standard OCR software can recover most of the graphic characters,
+non-printing characters like tabs, spaces, newlines and form feeds cause
+problems.
+
+In fact, these tools can print any ASCII text file; it's just that the
+attention these tools pay to spacing is particularly valuable for computer
+source code. The two-dimensional indentation structure of source code is
+very important to its comprehensibility. In some cases, distinctions
+between non-printing characters are critical: the standard make utility
+will not accept spaces where it expects to see a tab character.
+
+Producing a byte-for-byte identical copy of the original is also valuable
+for authentication, as you can verify a checksum.
+
+There are five problems we have addressed:
+
+1. Getting good OCR accuracy.
+2. Preserving whitespace.
+3. Preserving lines longer than can be printed on the page.
+4. Dealing with data that isn't human-readable.
+5. Detecting and correcting any residual errors.
+
+The first problem is partly addressed by using a font designed for OCR
+purposes, OCR-B. OCR-A is a very ugly font that contains only the digits 0
+through 9 and a few special punctuation symbols. OCR-B is a very readable
+monospaced font that contains a full ASCII set, and has been popular as a
+font on line printers for years because it distinguishes ambiguous
+characters and is clear even if fuzzy or distorted.
+
+The most unusual thing about the OCR-B font is the way that it prints a
+lower-case letter l, with a small hook on the bottom, something like an
+upper-case L. This is to distinguish it from the numeral 1. We also made
+some modifications to the font, to print the numeral 0 with a slash, and
+to print the vertical bar in a broken form. Both of these are such common
+variants that they should not present any intelligibility barrier. Finally,
+we print the underscore character in a distinct manner that is hopefully
+not visually distracting, but is clearly distinguishable from the minus
+sign even in the absence of a baseline reference.
+
+The most significant part of getting good OCR accuracy is, however, using
+the OCR tools well. We've done a lot of testing and experimentation and
+present here a lot of information on what works and what doesn't.
+
+To preserve whitespace, we added some special symbols to display spaces,
+tabs, and form feeds. A space is printed as a small triangular dot
+character, while a hollow rightward-pointing triangle (followed by blank
+spaces to the right tab stop) signifies a tab. A form feed is printed as
+a yen symbol, and the printed line is broken after the form feed.
+
+Making the dot triangular instead of square helps distinguish it from a
+period. To reduce the clutter on the page and make the text more readable,
+the space character is only printed as a small dot if it follows a blank
+on the page (a tab or another space), or comes immediately before the end
+of the line. Thus, the reader (human or software) must be able to
+distinguish one space from no spaces, but can find multiple spaces by
+counting the dots (and adding one).
+
+The format is designed so that 80 characters (the still-common punched card
+line length), plus checksums, can be printed on one line of an 8.5x11" (or
+A4) page. Longer lines are managed with the simple technique of
+appending a big ugly black blob to the first part of the line indicating
+that the next printed line should be concatenated with the current one
+with no intervening newline. Hopefully, its use is infrequent.
+
+While ASCII text is by far the most popular form, some source code is not
+readable in the usual way. It may be an audio clip, a graphic image bitmap,
+or something else that is manipulated with a specialized editing tool. For
+printing purposes, these tools just print any such files as a long string
+of gibberish in a 64-character set designed to be easy to OCR unambiguously.
+Although the tools recognize such binary data and apply extra consistency
+checks, that can be considered a separate step.
+
+Finally, the problem of residual errors arises. OCR software is not perfect,
+and uses a variety of heuristics and spelling-check dictionaries to clean up
+any residual errors in human-language text. This isn't reliable enough for
+source code, so we have added per-page and per-line checksums to the printed
+material, and a series of tools to use those checksums to correct any
+remaining errors and convert the scanned text into a series of files again.
+
+This "munged" form is what you see in most of the body of this book. We
+think it does a good job of presenting source code in a way that can be read
+easily by both humans and computers.
+
+The tools are command-line oriented and a bit clunky. This has a purpose
+beyond laziness on the authors' parts: it keeps them small. Keeping them
+small makes the "bootstrapping" part of scanning this book easier, since you
+don't have the tools to help you with that.
+
+
+
+SCANNING
+--------
+
+Our tests were done with OmniPage 7.0 on a Power Macintosh 8500/120 and an
+HP ScanJet 4c scanner with an automatic document feeder. The first part of
+this is heavily OmniPage-specific, as that appears to be the most widely
+available OCR software.
+
+The tools here were developed under Linux, and should be generally portable
+to any Unix platform. Since this book is about printing and scanning source
+code, we assume the readers have enough programming background to know how
+to build a program from a Makefile, understand the hazards of CR, LF or CRLF
+line endings, and such minor details without explicit mention.
+
+The first step to getting OmniPage 7 to work well is to set it up with
+options to disable all of its more advanced features for preserving font
+changes and formatting. Look in the Settings menu.
+
+· Create a Zone Contents File with all of ASCII in it, plus the extra
+  bullet, currency, yen and pilcrow symbols. Name it "Source Code".
+· Create a Source Code style set. Within it, create a Source Code zone style
+  and make it the default.
+· Set the font to something fixed-width, like Courier.
+· Set a fixed font size (10 point) and plain text, left-aligned.
+· Set the tab character to a space.
+· Set the text flow to hard line returns.
+· Set the margins to their widest.
+· The font mapping options are irrelevant.
+
+Go to the settings panel and:
+
+· Under Scanner, set the brightness to manual. With careful setting of the
+  threshold, this generates much better results than either the automatic
+  threshold or the 3D OCR. Around 144 has been a good setting for us; you
+  may want to start there.
+· Under OCR, you'll build a training file to use later, but turn off
+  automatic page orientation and select your Source Code style set in the
+  Output Options. Also set a reasonable reject character. (For testing, we
+  used the pi symbol, which came across from the Macintosh as a weird
+  sequence, but you can use anything as long as you make the appropriate
+  definition in subst.c.)
+
+Do an initial scan of a few pages and create a manual zone encompassing
+all of the text. Leave some margin for page misalignment, and leave space
+on the sides for the left-right shift caused by the book binding being in
+different places on odd and even pages.
+
+Set the Zone Contents and the Style set to the Source Code settings. After
+setting the Style Set, the Zone Style should be automatically set correctly
+(since you set Source Code as the default).
+
+Then save the Zone Template, and in the pop-up menu under the Zone step on
+the main toolbar you can now select it.
+
+Now we're ready to get characters recognized. The first results will be
+terrible, with lots of red (unrecognizable) and green (suspicious) text in
+the recognized window. Some tweaking will improve this enormously.
+
+The first step is setting a good black threshold. Auto brightness sets the
+threshold too low, making the character outlines bleed and picking up a lot
+of glitches on mostly-blank pages. Try training OCR on the few pages you've
+scanned and look at the representative characters. Adjust the threshold so
+the strokes are clear and distinct, neither so thin they are broken nor so
+thick they smear into each other. The character that bleeds worst is
+lowercase w, while the underscore and tab symbols have the thinnest lines
+that need worry.
+
+You'll have to re-scan (you can just click the AUTO button) until you get
+satisfactory results.
+
+The next step is training. You should scan a significant number of pages
+and teach OmniPage about any characters it has difficulty with. There are
+several characters which have been printed in unusual ways which you must
+teach OmniPage about before it can recognize them reliably. We also have
+some characters that are unique, which the tools expect to be mapped to
+specific Latin-1 characters to be processed.
+
+The characters most in need of training are as follows:
+
+· Zero is printed 'slashed.'
+· Lowercase L has a curled tail to distinguish it clearly from other
+  vertical characters like 1 and I.
+· The or-bar or pipe symbol '|' is printed "broken" with a gap in the
+  middle to distinguish it similarly.
+· The underscore character has little "serifs" on the end to distinguish
+  it from a minus sign. We also raised it just a tad higher than the
+  normal underscore character, which was too low in the character cell to
+  be reliably seen by OmniPage.
+· Tabs are printed as a hollow right-pointing triangle, followed by blanks
+  to the correct alignment position. If not trained enough, OmniPage
+  guesses this is a capital D. You should train OmniPage to recognize this
+  symbol as a currency symbol (Latin-1 octal 244).
+· Any spaces in the original that follow a space, or a blank on the printed
+  page, are printed as a tiny black triangle. You should train OmniPage to
+  recognize this as a center dot or bullet (Latin-1 octal 267). We didn't use a
+  standard center dot because OmniPage confused it with a period.
+· Any form feeds in the original are printed as a yen currency symbol
+  (Latin-1 octal 245).
+· Lines over 80 columns long are broken after 79 columns by appending a big
+  ugly black block. You should train OmniPage to recognize this as a
+  pilcrow (paragraph symbol, Latin-1 octal 266). We did this because after
+  deciding something black and visible was suitable, we found out the font
+  we used doesn't have a pilcrow in it.
+
+The zero and the tab character, because of their frequency, deserve special
+attention.
+
+In addition, look for any unrecognized characters (in red) and retrain those
+pages. If you get an unrecognized character, that character needs training,
+but Caere says that "good examples" are best to train on, so if the training
+doesn't recognize a slightly fuzzy K, and there's a nice crisp K available
+to train on, use that.
+
+Other things that need training:
+
+· ~ (tilde), ^ (caret), ` (backquote) and ' (quote). These get dropped
+  frequently unless you train them.
+· i, j and ; (semicolon). These get mixed up.
+· 3 and S. These also get mixed up.
+· Q can fail to be recognized.
+· C and [ can be confused.
+· c/C, o/O, p/P, s/S, u/U, v/V, w/W, y/Y and z/Z are often confused. This
+  can be helped by some training.
+· r gets confused with c and n. I don't understand c, but it happens.
+· f gets confused with i.
+
+The OCR training pages have lots of useful examples of troublesome
+characters. Scan a few pages of material, training each page, then scan a
+few dozen pages and look for recognition problems. Look for what OmniPage
+reports as troublesome, and when you have the repair program working, use
+it to find and report further errors. Train a few pages particularly dense
+in problems and append the troublesome characters to the training file, then
+re-recognize the lot.
+
+Double-check your training file for case errors. It's easy to miss the shift
+key in the middle of a lot of training, and doing so gives terrible results
+even though OmniPage won't report anything amiss. We have spent a while
+wondering why OmniPage wasn't recognizing capital S or capital W, only to
+find that OmniPage was just doing what it was trained to do.
+
+We have heard some reports that OmniPage has problems with large training
+files. We have observed OmniPage suffering repeatable internal errors
+sometimes after massive training additions, but they were cured by deleting
+a few training images. Appending more training images to the training file
+did not cause the problem to re-appear.
+
+Repairing the OCR results
+
+If the only copy of the tools you have is printed in this book, see the next
+chapter on bootstrapping at this point. Here, we assume that you have the
+tools and they work.
+
+When you have some reasonable OCR results, delete any directory pages. With
+no checksum information, they just confuse the postprocessing tools. (The
+tools will just stop with an error when they get to the "uncorrectable"
+directory name and you'll have to delete it then, so it's not fatal if you
+forget.) Copy the data to a machine that you have the repair and unmunge
+utilities on.
+
+The repair utility attempts automatic table-driven correction of common
+scanning errors. You have to recompile it to change the tables, but are
+encouraged to if you find a common problem that it does not correct reliably.
+If it gets stuck, it will deposit you into your favorite editor on or
+slightly after the offending line. (The file you will be editing is the
+unprocessed portion of the input.) After you correct the problem and quit
+the editor, repair will resume.
+
+"Your favorite editor" is taken from the $VISUAL and $EDITOR environment
+variables, or the -e option to repair.
+
+The repair utility never alters the original input file. It will produce
+corrected output for file in file.out, and when it has to stop, it writes
+any remaining uncorrected input back out to file.in (via a temporary
+file.dump) and lets you edit this file. If you re-run repair on file and
+file.in exists, repair will restart from there, so you may safely quit and
+re-run repair as often as you like. (But if you change the input file, you
+need to delete the .in file for repair to notice the change.)
+
+Statistics on repair's work are printed to file.log. This is an excellent
+place to look to see if any characters require more training.
+
+As it works, repair prints the line it is working on. If you see it make a
+mistake or get stuck, you can interrupt it (control-C or whatever is
+appropriate), and it will immediately drop into the editor. If you interrupt
+it a second time, it will exit rather than invoking the editor. If the
+editor returns a non-zero result code (fails), repair will also stop. (E.g.
+:cq in vim.)
+
+One thing that repair fixes without the least trouble is the number of
+spaces expected after a printing tab character. It's such an omnipresent OCR
+software error that repair doesn't even log it as a correction.
+
+In some cases, repair can miscorrect a line and go on to the next line,
+possibly even more than once, finally giving up a few lines below the actual
+error. If you are having trouble spotting the error, one helpful trick is to
+exit the editor and let repair try to fix the page again, but interrupt it
+while it is still working on the first line, before it has found the
+miscorrection.
+
+The Nasty Lines
+
+Some lines of code, particularly those containing long runs of underscore or
+minus characters, are particularly difficult to scan reliably. The repair
+program has a special "nasty lines" feature to deal with this. If a file
+named "nastylines" (or as specified by the -l option) exists, its lines are
+checksummed and are considered as total replacements for any input line with
+the same checksum. So, for example, if you place a blank line in the
+nastylines file, any scanner noise on blank lines will be ignored.
+
+The "nastylines" file is re-read every time repair restarts after an edit,
+so you can add more lines as the program runs. (The error-correction patterns
+should be done this way, too, but that'll have to wait for the next release.)
+
+Sortpages
+
+If, in the course of scanning, the pages have been split up or have gotten
+out of order, a perl script called sortpages can restore them to the proper
+order. It can merge multiple input files, discard duplicates, and warn about
+any missing pages it encounters. This script requires that the pages have
+been repaired, so that the page headers can be read reliably. The repair
+program does not care about the order it works on pages in; it examines each
+page independently. Unmunge, however, does need the pages in order.
+
+Unmunging
+
+After repair has finished its work, the unmunge program strips out the
+checksums and, based on the page headers, divides the data up among various
+files. Its first argument is the file to unpack. The optional second argument
+is a manifest file that lists all of the files and the directories they go
+in. Supplying this (an excellent idea) lets unmunge recreate a directory
+hierarchy and warn about missing files.
+
+When you have unmunged everything and reconstructed the original source code,
+you are done. Unmunge verifies all of the checksums independently of repair,
+as a sanity check, and you can have high confidence that the files are
+exactly the same as the originals that were printed.
+
+
+
+BOOTSTRAPPING
+-------------
+
+There's a problem using the postprocessing tools to correct OCR errors, when
+the code being OCRed is the tools themselves. We've tried to provide a
+reasonably easy way to get the system up and running starting from nothing
+but a copy of OmniPage.
+
+You could just scan all of the tools in, correct any errors by hand, delete
+the error-checking information in a text editor, and compile them. But
+finding all the errors by hand is painful in a body of code that large.
+With the aid of perl (version 5), which provides a lot of power in very
+little code, we have provided some utilities to make this process easier.
+
+The first-stage bootstrap is a one-page perl script designed to be as small
+and simple as possible, because you'll have to hand-correct it. It can verify
+the checksums on each line, and drop you into the editor on any lines where
+an error has occurred. It also knows how to strip out the visible spaces and
+tabs, how to correct spacing errors after visible tab characters, and how to
+invoke an editor on the erroneous line.
+
+Scan in the first-stage bootstrap as carefully as possible, using OmniPage's
+warnings to guide you to any errors, and either use a text editor or the
+one-line perl command at the top of the file to remove the checksums and
+convert any funny printed characters to whitespace form.
+
+The first thing to do is try running it on itself, and correct any errors you
+find this way. Note that the script writes its output to the file named in
+the page header, so you should name your hand-corrected version differently
+(or put it in a different directory) to avoid having it overwritten.
+
+The second-stage bootstrap is a much denser one-pager, with better error
+detection; it can detect missing lines and missing pages, and takes an
+optional second argument of a manifest file which it can use to put files
+in their proper directories. It's not strictly necessary, but it's only one
+more (dense) page and you can check it against itself and the original
+bootstrap.
+
+Both of the bootstrap utilities can correct tab spacing errors in the OCR
+output. Although this doesn't matter in most source code, it is included
+in the checksums.
+
+Once you have reached this point, you can scan in the C code for repair and
+unmunge. The C unmunge is actually less friendly than the bootstrap
+utilities, because it is only intended to work with the output of repair.
+It is, however, much faster, since computing CRCs a bit at a time in an
+interpreted language is painfully slow for large amounts of data. It can
+also deal with binary files printed in radix-64.
+
+
+
+PRINTING
+--------
+
+Despite the title of this book, this process of producing a book is not well
+documented, since it's been evolving up to the moment of publication. There
+is, however, a very useful working example of how to produce a book
+(strikingly similar to this book) in the example directory, all controlled
+by a Makefile.
+
+Briefly, a master perl script called psgen takes three parameters: a file
+list, a page numbers file to write to, and a volume number (which should
+always be 1 for a one-volume book). It runs the listed files through the
+munge utility, wraps them in some simple PostScript, and prepends a prolog
+that defines the special characters and PostScript functions needed by the
+text.
+
+The file list also includes per-file flags. The most important is the
+text/binary marker. Text files can also have a tab width specified, although
+munge knows how to read Emacs-style tab width settings from the end of a
+source file.
+
+The prolog is assembled from various other files and definitions by psgen using
+a simple preprocessor called yapp (Yet Another Preprocessor). This process
+includes some book-specific information like the page footer.
+
+Producing the final PostScript requires the necessary non-standard fonts
+(Futura for the footers and OCRB for the code) and the psutils package,
+which provides the includeres utility used to embed the fonts in the
+PostScript file. The fonts should go in the books/ps directory, as
+"Futura.pfa" and the like.
+
+The pagenums file can be used to produce a table of contents. For this book,
+we generated the front matter (such as this chapter) separately, told psgen
+to start on the next page after this, and concatenated the resultant
+PostScript files for printing. The only trick was making the page footers
+look identical.
diff --git a/example/.cvsignore b/example/.cvsignore
new file mode 100644
index 0000000..076540c
--- /dev/null
+++ b/example/.cvsignore
@@ -0,0 +1,3 @@
+pagenums
+MANIFEST
+code.ps
diff --git a/example/Makefile b/example/Makefile
new file mode 100644
index 0000000..a3e3a82
--- /dev/null
+++ b/example/Makefile
@@ -0,0 +1,23 @@
+BOOKROOT=..
+TOOLSDIR=$(BOOKROOT)/tools
+PSDIR=$(BOOKROOT)/ps
+YAPP=$(TOOLSDIR)/yapp
+MAKEMANIFEST=$(TOOLSDIR)/makemanifest
+PSGEN=BOOKROOT=$(BOOKROOT) $(TOOLSDIR)/psgen
+INCLUDERES=(cd $(PSDIR); includeres)
+
+code.ps pagenums: filelist footer.ps MANIFEST books
+	$(PSGEN) -P2 -l3 -DfooterFile=footer.ps filelist pagenums 1 \
+		| $(INCLUDERES) > code.ps
+
+books:
+	ln -s $(BOOKROOT) books
+
+MANIFEST: filelist
+	$(MAKEMANIFEST) $< > $@
+
+clean:
+	rm -f `cat .cvsignore`
+
+gv%: %.ps
+	gv $<
diff --git a/example/filelist b/example/filelist
new file mode 100644
index 0000000..887c718
--- /dev/null
+++ b/example/filelist
@@ -0,0 +1,32 @@
+V 1 8
+T MANIFEST
+D books/
+D books/tools/
+T books/tools/bootstrap
+T books/tools/bootstrap2
+T4 books/tools/sortpages
+T books/tools/Makefile
+T books/tools/heap.c
+T books/tools/heap.h
+T books/tools/mempool.c
+T books/tools/mempool.h
+T books/tools/util.c
+T books/tools/util.h
+T books/tools/repair.c
+T books/tools/subst.c
+T books/tools/subst.h
+T books/tools/unmunge.c
+T books/tools/munge.c
+T books/tools/yapp.doc
+T4 books/tools/yapp
+T4 books/tools/psgen
+T4 books/tools/makemanifest
+D books/ps/
+T books/ps/prolog.ps
+T books/ps/charmap.ps
+D books/example/
+T books/example/Makefile
+T books/example/.cvsignore
+T books/example/filelist
+T books/example/footer.ps
+B books/example/us-constitution.gz
diff --git a/example/footer.ps b/example/footer.ps
new file mode 100644
index 0000000..52f6b7b
--- /dev/null
+++ b/example/footer.ps
@@ -0,0 +1,5 @@
+% A program to print the page footer, using the magic P function,
+% which takes a string and a font.
+(Tools for Publishing Source Code via OCR ) /Futura P
+(\343) /Symbol P	% Copyright symbol
+( 1997 Pretty Good Privacy, Inc.) /Futura P
diff --git a/example/us-constitution.gz b/example/us-constitution.gz
new file mode 100644
index 0000000..1a058ca
Binary files /dev/null and b/example/us-constitution.gz differ
diff --git a/ps/charmap.ps b/ps/charmap.ps
new file mode 100644
index 0000000..1602072
--- /dev/null
+++ b/ps/charmap.ps
@@ -0,0 +1,68 @@
+%%BeginResource: procset Latin1-vec 0 0
+/Latin1-vec [
+/.notdef	/.notdef	/.notdef	/.notdef
+/.notdef	/.notdef	/.notdef	/.notdef	
+/.notdef	/.notdef	/.notdef	/.notdef	
+/.notdef	/.notdef	/.notdef	/.notdef	
+/.notdef	/.notdef	/.notdef	/.notdef	
+/.notdef	/.notdef	/.notdef	/.notdef	
+/.notdef	/.notdef	/.notdef	/.notdef	
+/.notdef	/.notdef	/.notdef	/.notdef	
+/space		/exclam		/quotedbl	/numbersign	
+/dollar		/percent	/ampersand	/${rightQuoteGlyph}
+/parenleft	/parenright	/asterisk	/plus	
+/comma		/hyphen		/period		/slash	
+/${zeroGlyph}	/one		/two		/three	
+/four		/five		/six		/seven	
+/eight		/nine		/colon		/semicolon	
+/less		/equal		/greater	/question	
+/at		/A		/B		/C		
+/D		/E		/F		/G		
+/H		/I		/J		/K		
+/L		/M		/N		/O		
+/P		/Q		/R		/S		
+/T		/U		/V		/W		
+/X		/Y		/Z		/bracketleft		
+/backslash	/bracketright	/asciicircum	/${underscoreGlyph}
+/${leftQuoteGlyph} /a		/b		/c		
+/d		/e		/f		/g		
+/h		/i		/j		/k		
+/l		/m		/n		/o		
+/p		/q		/r		/s		
+/t		/u		/v		/w		
+/x		/y		/z		/braceleft		
+/${barGlyph}	/braceright	/tilde		/.notdef
+/.notdef	/.notdef	/.notdef	/.notdef	
+/.notdef	/.notdef	/.notdef	/.notdef	
+/.notdef	/.notdef	/.notdef	/.notdef	
+/.notdef	/.notdef	/.notdef	/.notdef	
+/.notdef	/.notdef	/.notdef	/.notdef	
+/.notdef	/.notdef	/.notdef	/.notdef	
+/.notdef	/.notdef	/.notdef	/.notdef	
+/.notdef	/.notdef	/.notdef	/.notdef	
+/space		/exclamdown	/cent		/sterling	
+/${tabGlyph}	/yen		/brokenbar	/section	
+/dieresis	/copyright	/ordfeminine	/guillemotleft	
+/logicalnot	/hyphen		/registered	/macron	
+/degree		/plusminus	/twosuperior	/threesuperior
+/acute		/mu		/${pilcrowGlyph} /${bulletGlyph}
+/cedilla	/dotlessi	/ordmasculine	/guillemotright	
+/onequarter	/onehalf	/threequarters	/questiondown	
+/Agrave		/Aacute		/Acircumflex	/Atilde	
+/Adieresis	/Aring		/AE		/Ccedilla	
+/Egrave		/Eacute		/Ecircumflex	/Edieresis	
+/Igrave		/Iacute		/Icircumflex	/Idieresis	
+/Eth		/Ntilde		/Ograve		/Oacute	
+/Ocircumflex	/Otilde		/Odieresis	/multiply	
+/Oslash		/Ugrave		/Uacute		/Ucircumflex	
+/Udieresis	/Yacute		/Thorn		/germandbls	
+/agrave		/aacute		/acircumflex	/atilde	
+/adieresis	/aring		/ae		/ccedilla	
+/egrave		/eacute		/ecircumflex	/edieresis	
+/igrave		/iacute		/icircumflex	/idieresis	
+/eth		/ntilde		/ograve		/oacute	
+/ocircumflex	/otilde		/odieresis	/divide	
+/oslash		/ugrave		/uacute		/ucircumflex	
+/udieresis	/yacute		/thorn		/ydieresis	
+]def
+%%EndResource
diff --git a/ps/prolog.ps b/ps/prolog.ps
new file mode 100644
index 0000000..d3bf8c9
--- /dev/null
+++ b/ps/prolog.ps
@@ -0,0 +1,306 @@
+##set pageNumFont="Futura"
+##set dirNameFont="Futura-Heavy"
+##set fontsNeeded="${font} Symbol Futura Futura-Heavy"
+##set includeFontComments=<<"END"
+%%IncludeResource: font ${font}
+%%IncludeResource: font Symbol
+%%IncludeResource: font Futura
+%%IncludeResource: font Futura-Heavy
+END
+##if ${font} eq Courier
+##set charShrinkFactor=0.93
+##set zeroGlyph=Oslash
+##set underscoreGlyph=underscore
+##set bulletGlyph=bullet
+##set tabGlyph=currency
+##set leftQuoteGlyph=quoteleft
+##set rightQuoteGlyph=quoteright
+##set pilcrowGlyph=paragraph
+##set barGlyph=bar
+##else
+##set charShrinkFactor=1
+##set zeroGlyph=Oslash
+##set underscoreGlyph=underscore2
+##set bulletGlyph=bullet2
+##set tabGlyph=tabsym
+##set leftQuoteGlyph=grave
+##set rightQuoteGlyph=quoteright	### was "acute"
+##set pilcrowGlyph=erase
+##set barGlyph=orsym
+##set do_custom_chars=1
+##endif
+%!PS-Adobe-3.0
+%%Orientation: Portrait
+%%Pages: (atend)
+%%DocumentNeededResources: font ${fontsNeeded}
+%%DocumentMedia: Letter 612 792 74 white ()
+%%EndComments
+%%BeginDefaults
+%%PageMedia: Letter
+%%PageResources: font ${fontsNeeded}
+%%EndDefaults
+%%BeginProlog
+%%BeginResource: procset Custom-Preamble 0 0
+%
+% Document definitions
+% (Upper case to avoid collisions)
+%
+
+% 8.5x11 paper is 612x792 points, but 24 points near the edge or so
+% shouldn't be used.
+/Topmargin 770 def
+/Leftmargin 30 def
+/Rightmargin 612 Leftmargin sub def
+/Botmargin 22 def
+/Bindoffset 40 def
+
+/Lineskip -10 def
+% How much to shrink characters by?
+/Factor ${charShrinkFactor} def
+/Fontsize 9.5 Factor mul def
+% 1000 units is the standard height, so Courier at a 6/10 aspect ratio is 600.
+% Widen to make up for scaling loss.
+/Charwidth
+  Rightmargin Leftmargin sub Bindoffset sub 87 div Fontsize div 1000 mul
+def
+
+% Print a header (expects page number on stack)
+/OddPageStart
+{ save exch /MyFont findfont Fontsize scalefont setfont 
+  /CurrentLeft Leftmargin Bindoffset add def
+  /CurrentRight Rightmargin def
+  CurrentLeft Topmargin moveto } def
+
+/EvenPageStart
+{ save exch /MyFont findfont Fontsize scalefont setfont 
+  /CurrentLeft Leftmargin def
+  /CurrentRight Rightmargin Bindoffset sub def
+  CurrentLeft Topmargin moveto } def
+
+% /MyFont findfont [Fontsize 0 0 Fontsize 0 0] makefont setfont
+
+% Print the name of the directory in a large font
+/DirPage
+{
+  /${dirNameFont} findfont 14 scalefont setfont
+  0 -10 rmoveto (Directory) show
+  CurrentLeft 30 add currentpoint exch pop 20 sub moveto show
+} def
+
+% Advance a line
+/L {show CurrentLeft currentpoint exch pop Lineskip add moveto} bind def 
+
+% Print the "inside" footer line using P (string font => )
+% We do some magic involving redefining P to first measure the
+% width of this string and then print it, so you must use it
+% to do all printing.
+/Foot {
+##ifdef footerFile
+##include "${footerFile}"
+##endif
+} def
+
+% /P is defined in the Setup section
+
+% Print an odd footer
+/OddPageEnd
+ { CurrentLeft Botmargin moveto CurrentRight Botmargin lineto
+   1 setlinewidth stroke
+   CurrentLeft Botmargin 10 sub moveto
+   Foot
+   10 string cvs dup stringwidth
+   pop CurrentRight exch sub currentpoint exch pop moveto
+   /${pageNumFont} P
+   showpage
+   restore
+} def
+
+% Print an even footer
+/EvenPageEnd
+ { CurrentLeft Botmargin moveto CurrentRight Botmargin lineto
+   1 setlinewidth stroke
+   Leftmargin Botmargin 10 sub moveto
+   /${pageNumFont} P 
+   CurrentRight FootWidth sub currentpoint exch pop moveto
+   Foot
+   showpage
+   restore
+} def
+
+##ifdef do_custom_chars
+% A 1000-point OCRB discunderline consists of:
+% 111.45  -173.688 moveto
+% 609.356 -173.688 lineto
+% 609.356  -70.9227 lineto
+% 111.45   -70.9227 lineto
+% closepath
+% 720.0    -0.0 moveto
+% Line thickness is
+% 102.7653 pts.
+
+% This would suggest the following values:
+/underleft 111.45 def
+/underright 609.356 def
+/underthick 102.7653 def
+/underup underthick def
+/underdown 0 def
+/underserif 25 def
+
+% These look better in GhostScript, but not on a real Adobe rasterizer
+%/underright 600 def
+%/underleft 100 def
+%/underthick 75 def
+
+171
+211
+36081
+% The default bullet character is
+% 254.0 341.0 moveto
+% 254.0 170.0 lineto
+% 465.0 170.0 lineto
+% 465.0 341.0 lineto
+% closepath
+% Our modified version is based on:
+/bullwid 204 def
+/bullht 176.75 def
+/bullleft 254 341 add bullwid sub 2 div def
+/bullright 254 341 add bullwid add 2 div def
+/bullbot 254 def
+/bulltop bullbot bullht add def
+
+% And a custom-created tab symbol
+/tableft 250 def
+/tabright 550 def
+/tabtop 550 def
+/tabbot 50 def
+/tablinewidth 35 def
+
+% Let's try a vertical bar
+% OCRB defines (|)
+% 411.062 -173.688 moveto
+% 411.062 741.043 lineto
+% 308.297 741.043 lineto
+% 308.297 -173.688 lineto
+% closepath
+% 720.0 -0.0 moveto
+/orleft 308.297 def
+/orright 411.062 def
+/orbot -173.688 def
+/ortop 741.043 def
+/orbreak 150 def	% Width of break
+/orbbot ortop orbot add orbreak sub 2 div def	% Bottom of break
+/orbtop ortop orbot add orbreak add 2 div def	% Top of break
+##endif
+
+% newfontname encoding-vec fontname -> -	make a new encoded font
+/MF2 {
+  % Make a dict for the new font, with room for the /Metrics
+  findfont dup length 1 add dict begin
+  % Copy everything except the FID entry
+  {1 index /FID eq {pop pop} {def} ifelse} forall
+  % Set the encoding vector
+  /Encoding exch def
+
+##ifdef do_custom_chars
+  % Create a new expanded CharStrings dictionary
+  CharStrings dup length 5 add dict
+  begin { def } forall
+  % Create a custom underscore character
+  /underscore2 {
+	pop
+	//Charwidth 0 % width, bounding box follows
+	//underleft //underdown neg //underright //underthick //underup add
+	setcachedevice
+	//underleft //underthick //underup add moveto
+	//underleft //underserif add //underthick //underup add lineto
+	//underleft //underserif add //underthick lineto
+	//underright //underserif sub //underthick lineto
+	//underright //underserif sub //underthick //underup add lineto
+	//underright //underthick //underup add lineto
+	//underright //underdown neg lineto
+	//underright //underserif sub //underdown neg lineto
+	//underright //underserif sub 0 lineto
+	//underleft //underserif add 0 lineto
+	//underleft //underserif add //underdown neg lineto
+	//underleft //underdown neg lineto
+	closepath fill
+  } bind def
+  % Create a custom bullet character.
+  /bullet2 {
+	pop
+	//Charwidth 0 % width, bounding box follows
+	//bullleft //bullbot //bullright //bulltop
+	setcachedevice
+	//bullleft //bullbot moveto
+	//bullleft bullright add 2 div bulltop lineto
+	//bullright //bullbot lineto
+	closepath fill
+  } bind def
+  % Create a custom tab character.
+  /tabsym {
+	pop
+	//Charwidth 0 % width, bounding box follows
+	//tableft //tablinewidth sub //tabbot //tablinewidth sub
+	//tabright //tablinewidth add //tabtop //tablinewidth add
+	setcachedevice
+	//tablinewidth setlinewidth
+	true setstrokeadjust
+	0 setlinejoin
+	//tableft //tabbot moveto
+	//tabright //tabtop //tabbot add 2 div lineto
+	//tableft //tabtop lineto
+	closepath stroke
+  } bind def
+  /orsym {
+	pop
+	//Charwidth 0 % width, bounding box follows
+	//orleft //orbot //orright //ortop
+	setcachedevice
+	//orleft //orbot moveto
+	//orleft //orbbot lineto
+	//orright //orbbot lineto
+	//orright //orbot lineto
+	closepath
+	//orleft //ortop moveto
+	//orleft //orbtop lineto
+	//orright //orbtop lineto
+	//orright //ortop lineto
+	closepath fill
+  } bind def
+  /CharStrings currentdict end def
+##endif
+
+  % Create a new dict to be the /Metrics values
+  CharStrings dup length dict
+  % Now fill in the metrics dict with the desired width
+  begin { pop Charwidth def } forall /Metrics currentdict end def
+  % End of definitions
+  currentdict end 
+  % Define the font
+  definefont pop
+} bind def
+
+% Check PostScript language level.
+/gs_languagelevel /languagelevel where { pop languagelevel } { 1 } ifelse def
+
+%%EndResource
+##include "charmap.ps"
+${includeFontComments}
+%%EndProlog
+
+
+%%BeginSetup
+
+/MyFont Latin1-vec /${font} MF2
+/#copies 1 def
+
+% Compute the width of the /Foot string, by defining P to
+% add up the x-width of the characters.
+/P { findfont 9 scalefont setfont stringwidth pop add } def
+/FootWidth 0 Foot def
+% Redefine P to print, as usual
+/P { findfont 9 scalefont setfont show } def
+%%BeginResource: procset foo 0 0
+% This is an example
+%%EndResource
+%%EndSetup
diff --git a/tools/Makefile b/tools/Makefile
new file mode 100644
index 0000000..138d5b7
--- /dev/null
+++ b/tools/Makefile
@@ -0,0 +1,30 @@
+all: unmunge repair munge
+
+OPT = -g -O -W -Wall
+COMMON_OBJS = util.o
+
+UNMUNGE_OBJS = $(COMMON_OBJS) unmunge.o
+MUNGE_OBJS = $(COMMON_OBJS) munge.o
+REPAIR_OBJS = $(COMMON_OBJS) heap.o mempool.o subst.o repair.o
+
+unmunge: $(UNMUNGE_OBJS)
+	$(CC) $(OPT) -o $@ $(UNMUNGE_OBJS)
+
+munge: $(MUNGE_OBJS)
+	$(CC) $(OPT) -o $@ $(MUNGE_OBJS)
+
+repair: $(REPAIR_OBJS)
+	$(CC) $(OPT) -o $@ $(REPAIR_OBJS)
+
+.c.o:
+	$(CC) $(OPT) -o $@ -c $<
+
+clean:
+	-rm -f *.o munge unmunge repair core *.core
+
+unmunge.o: util.h
+munge.o: util.h
+repair.o: heap.h mempool.h util.h subst.h
+heap.o: heap.h
+mempool.o: mempool.h
+subst.o: subst.h
diff --git a/tools/bootstrap b/tools/bootstrap
new file mode 100644
index 0000000..768aae5
--- /dev/null
+++ b/tools/bootstrap
@@ -0,0 +1,68 @@
+#!/usr/bin/perl -s
+#
+# bootstrap -- Simpler version of unmunge for bootstrapping
+#
+# Unmunge this file using:
+#   perl -ne 'if (s/^ *[^-\s]\S{4,6} ?//) { s/[\244\245\267]/ /g; print; }'
+#
+# $Id: bootstrap,v 1.15 1997/11/14 03:52:53 mhw Exp $
+
+sub Fatal	{ print STDERR @_;  exit(1); }
+sub Max		{ my ($a, $b) = @_;  ($a > $b) ? $a : $b; }
+sub TabSkip	{ $tabWidth - 1 - (length($_[0]) % $tabWidth); }
+
+($tab,$yen,$pilc,$cdot,$tmp1,$tmp2)=("\244","\245","\266","\267","\377","\376");
+$editor = $ENV{'VISUAL'} || $ENV{'EDITOR'} || 'vi';
+$inFile = $ARGV[0];
+doFile: {
+    open(IN, "<$inFile") || die;
+    for ($lineNum = 1; ($_ = <IN>); $lineNum++) {
+	s/^\s+//;  s/\s+$//;	# Strip leading and trailing spaces
+	next if (/^$/);		# Ignore blank lines
+	($prefix, $seenCRCStr, $dummy, $_) = /^(\S{2})(\S{4})( (.*))?/;
+
+	# Correct the number of spaces after each tab
+	while (s/$tab( *)/$tmp1 . ($tmp2 x &Max(length($1), &TabSkip($`)))/e) {}
+	s/ ( +)/" " . ($cdot x length($1))/eg;	# Correct center dots
+	s/$tmp1/$tab/g;  s/$tmp2/ /g;  # Restore tabs and spaces from correction
+	s/\s*$/\n/;		# Strip trailing spaces, and add a newline
+
+	$crc = $seenCRC = 0;			# Calculate CRC
+	for ($data = $_; $data ne ""; $data = substr($data, 1)) {
+	    $crc ^= ord($data);
+	    for (1..8) {
+		$crc = ($crc >> 1) ^ (($crc & 1) ? 0x8408 : 0);
+	    }
+	}
+	if ($crc != hex($seenCRCStr)) {		# CRC mismatch
+	    close(IN);  close(OUT);
+	    unlink(@filesCreated);
+	    @filesCreated = ();
+	    @oldStat = stat($inFile);
+	    system($editor, "+$lineNum", $inFile);
+	    @newStat = stat($inFile);
+	    redo doFile if ($oldStat[9] != $newStat[9]);  # Check mod date
+	    &Fatal("Line $lineNum invalid: $_");
+	}
+
+	if ($prefix eq '--') {			# Process header line
+	    ($code, $pageNum, $file) = /^(\S{19}) Page (\d+) of (.*)/;
+	    $tabWidth = hex(substr($code, 11, 1));
+	    if ($file ne $lastFile) {
+		print "$file\n";
+		&Fatal("$file: already exists\n") if (!$f && (-e $file));
+		close(OUT);
+		open(OUT, ">$file") || &Fatal("$file: $!\n");
+		push(@filesCreated, ($lastFile = $file));
+	    }
+	} else {				# Unmunge normal line
+	    s/$tab( *)/"\t".(" " x (length($1) - &TabSkip($`)))/eg;
+	    s/$yen\n/\f/;	# Handle form feeds
+	    s/$pilc\n//;	# Handle continuation lines
+	    s/$cdot/ /g;	# Center dots -> spaces
+
+	    print OUT;
+	}
+    }
+    close(IN);  close(OUT);
+}
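
A note on the tab arithmetic in bootstrap: TabSkip gives the number of padding characters that must follow a visible tab marker so that the text resumes at the next tab stop. The following is only a minimal sketch of that arithmetic in C; the name tab_skip and its parameters are hypothetical and do not appear in the tools.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of padding characters after a tab marker that starts at
 * zero-based column `col`, for tab stops every `tab_width` columns.
 * The marker itself occupies one column, hence the "- 1".
 * (Hypothetical helper; mirrors the Perl TabSkip arithmetic.)
 */
static size_t tab_skip(size_t col, size_t tab_width)
{
    assert(tab_width > 0);
    return tab_width - 1 - (col % tab_width);
}
```

With a tab width of 8, a marker at column 5 is followed by two padding characters, so the next character lands on column 8.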
diff --git a/tools/bootstrap2 b/tools/bootstrap2
new file mode 100644
index 0000000..4bba127
--- /dev/null
+++ b/tools/bootstrap2
@@ -0,0 +1,72 @@
+#!/usr/bin/perl -s
+#
+# bootstrap2 -- Second stage bootstrapper, a version of unmunge
+#
+# $Id: bootstrap2,v 1.4 1997/11/14 03:52:54 mhw Exp $
+
+sub Cleanup	{ close(IN);  close(OUT);  unlink(@files);  @files = (); }
+sub Fatal	{ &Cleanup();  print STDERR @_;  exit(1); }
+sub TabSkip	{ $tabWidth - 1 - (length($_[0]) % $tabWidth); }
+sub TabFix	{ my ($needed, $actual) = (&TabSkip($_[0]), length($_[1]));
+    $tmp1 . ($tmp2 x $needed) . (" " x ($actual - $needed)); }
+sub HumanEdit	{ my ($file, $line, @message) = ($inFile, @_);  &Cleanup();
+    @old = stat($file);  system($editor, "+$line", $file);  @new = stat($file);
+    redo doFile if ($old[9] != $new[9]);	# Check mod date
+    &Fatal("Line $line, ", @message); }
+
+($tab,$yen,$pilc,$cdot,$tmp1,$tmp2)=("\244","\245","\266","\267","\377","\376");
+$editor = $ENV{'VISUAL'} || $ENV{'EDITOR'} || 'vi';
+($inFile, $manifest, @rest) = @ARGV;
+if ($manifest ne "") {		# Read manifest file
+    open(MANIFEST, "<$manifest") || &Fatal("$manifest: $!\n");
+    while (<MANIFEST>) { $dir = $1 if /^D\s+(.*)$/;
+	$index[$1] = $dir . $2 if /^(\d+)\s+(.*)$/; }
+}
+doFile: {
+    $seenPCRC = $pcrc1 = 0;  $lastFlags = 1;  $lastFileNum = 0;
+    open(IN, "<$inFile") || &Fatal("$inFile: $!\n");
+    for ($line = 1; ($_ = <IN>); $line++) {
+	s/^\s+//;  s/\s+$//;	# Strip leading and trailing spaces
+	next if (/^$/);		# Ignore blank lines
+	($prefix, $seenCRCStr, $dummy, $_) = /^(\S{2})(\S{4})( (.*))?/;
+	while (s/$tab( *)/&TabFix($`, $1)/eo) {}  # Correct spaces after tabs
+	s/($tmp2| )( +)/$1 . ($cdot x length($2))/ego;	# Correct center dots
+	s/$tmp1/$tab/go;  s/$tmp2/ /go;  # Restore tabs/spaces from correction
+	s/\s*$/\n/;		# Strip trailing spaces, and add a newline
+
+	$crc = 0;  $pcrc = $pcrc1;		# Calculate CRCs
+	for ($data = $_; $data ne ""; $data = substr($data, 1)) {
+	    $crc ^= ord($data);  $pcrc1 ^= ord($data);
+	    for (1..8) { $crc = ($crc >> 1) ^ (($crc & 1) ? 0x8408 : 0);
+		$pcrc1 = ($pcrc1 >> 1) ^ (($pcrc1 & 1) ? 0xedb88320 : 0); }
+	}
+	($seenPLCRC, $seenCRC) = map { hex($_) } ($prefix, $seenCRCStr);
+	&HumanEdit($line, "CRC failed: $_") if $crc != $seenCRC;
+	if ($prefix eq '--') {			# Process header line
+	    &HumanEdit($line - 1, "Page CRC failed") if $pcrc != $seenPCRC;
+	    ($humanHdr, $pageNum, $file) = /^\S{19} (Page (\d+) of (.*))/;
+	    ($vers, $flags, $seenPCRC, $tabWidth, $prodNum, $fileNum) =
+		map { hex($_) } /^(\S)(\S\S)(\S{8})(\S)(\S{3})(\S{4})/;
+	    if ($fileNum != $lastFileNum) {
+		print STDERR "MISSING files\n" if $fileNum != $lastFileNum + 1;
+		&Fatal("Missing pages\n") if $pageNum != 1 || !($lastFlags & 1);
+		if ($manifest ne "") {
+		    ($_ = $index[$fileNum]) =~ m%([^/]*)$%;
+		    &Fatal("Manifest mismatch\n") if ($file ne $1);
+		    ($file = $_) =~ s|/+|mkdir($`, 0777), "/"|eg;  # mkdir -p
+		}
+		&Fatal("$file: already exists\n") if (!$f && (-e $file));
+		close(OUT);  open(OUT, ">$file") || &Fatal("$file: $!\n");
+		push(@files, $file);  print "$fileNum $file\n";
+	    } else {
+		&Fatal("MISSING pages\n") if ($pageNum != $lastPageNum + 1);
+	    }
+	    ($lastFlags,$lastFileNum,$lastPageNum) = ($flags,$fileNum,$pageNum);
+	    $pcrc1 = 0;
+	} else {				# Unmunge normal line
+	    &HumanEdit($line, "CRC failed: $_") if ($pcrc1 >> 24) != $seenPLCRC;
+	    s/$tab( *)/"\t".(" " x (length($1) - &TabSkip($`)))/ego;
+	    s/$yen\n/\f/o;  s/$pilc\n//o;  s/$cdot/ /go;  print OUT;
+	}
+    }
+}
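
A note on the checksums used by bootstrap and bootstrap2: both Perl scripts compute their CRCs one bit at a time, starting from zero, with the reflected polynomials 0x8408 (per-line CRC-16) and 0xedb88320 (running page CRC-32) and no final inversion. The sketch below restates those loops in C. The function names are illustrative assumptions, not the interface of util.c, which is not shown in this part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bit-at-a-time reflected CRC, as in the Perl loops: XOR each byte
 * into the low bits, then shift right eight times, applying `poly`
 * whenever the bit shifted out is set.
 */
static uint32_t crc_update(uint32_t crc, uint32_t poly,
                           const unsigned char *data, size_t len)
{
    size_t i;
    int bit;

    assert(data != NULL || len == 0);
    for (i = 0; i < len; i++) {
        crc ^= data[i];
        for (bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1) ? poly : 0);
    }
    return crc;
}

/* Per-line CRC-16: polynomial 0x8408, initial value 0. */
static unsigned line_crc16(const char *line, size_t len)
{
    return (unsigned)crc_update(0, 0x8408u,
                                (const unsigned char *)line, len);
}

/*
 * Running page CRC-32: polynomial 0xedb88320, initial value 0, no
 * final inversion.  Pass the previous value in `crc` to continue
 * across lines; bootstrap2 compares the top byte (crc >> 24) with
 * the two-digit prefix of each line.
 */
static uint32_t page_crc32(uint32_t crc, const char *line, size_t len)
{
    return crc_update(crc, 0xedb88320u,
                      (const unsigned char *)line, len);
}
```

In the scripts the CRC covers the corrected line including its trailing newline, so a caller would pass the newline as part of the buffer.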
diff --git a/tools/heap.c b/tools/heap.c
new file mode 100644
index 0000000..6d0474c
--- /dev/null
+++ b/tools/heap.c
@@ -0,0 +1,144 @@
+/*
+ * heap.c -- Simple priority queue.  Takes pointers to cost values
+ * (presumably the first field in a larger structure) and returns
+ * them in increasing order of cost.
+ *
+ * Copyright (C) 1997 Pretty Good Privacy, Inc.
+ *
+ * Written by Colin Plumb and Mark H. Weaver
+ *
+ * $Id: heap.c,v 1.2 1997/07/05 02:55:23 colin Exp $
+ */
+
+#include <stdio.h>	/* For fprintf(stderr, "Out of memory") */
+#include <stdlib.h>	/* For malloc() & co. */
+
+#include "heap.h"
+
+#define HeapParent(i)			((i) / 2)
+#define HeapLeftChild(i)		((i) * 2)
+#define HeapRightChild(i)		((i) * 2 + 1)
+#define HeapElem(h, i)			(h)->elems[i]
+#define HeapMinElem(h)			HeapElem(h, 1)
+#define HeapElemCost(e)			(*(e))
+#define HeapCost(h, i)			HeapElemCost(HeapElem(h, i))
+#define HeapSize(h)				((h)->numElems)
+
+static void
+SiftDown(Heap const *heap, HeapCost *e)
+{
+	HeapIndex size = HeapSize(heap), parent = 1, child;
+	HeapCost cparent = HeapElemCost(e), cchild;
+
+	for (;;) {
+		child = 2*parent;
+		if (child > size)
+			break;
+		cchild = HeapCost(heap, child);
+		if (child < size && cchild > HeapCost(heap, child+1)) {
+			cchild = HeapCost(heap, child+1);
+			child++;
+		}
+		if (cparent <= cchild)
+			break;	/* Stop sifting down */
+		HeapElem(heap, parent) = HeapElem(heap, child);
+		parent = child;
+	}
+	HeapElem(heap, parent) = e;
+}
+
+/* Debug tool: verify heap property */
+void
+HeapVerify(Heap *heap)
+{
+	HeapIndex i;
+
+	for (i = 2; i <= HeapSize(heap); i++)
+		if (HeapCost(heap, i) < HeapCost(heap, HeapParent(i)))
+			fprintf(stderr, "DEBUG: HeapVerify failed at elem %u\n", i);
+}
+
+/* Remove and return the minimum cost from the heap. */
+HeapCost *
+HeapGetMin(Heap *heap)
+{
+	HeapIndex lastElem = HeapSize(heap);
+	HeapCost *retval;
+
+	if (!lastElem)
+		return NULL;
+	retval = HeapMinElem(heap);
+	HeapSize(heap) = lastElem-1;
+	SiftDown(heap, HeapElem(heap, lastElem));
+	return retval;
+}
+
+/* Helper - set heap size, reallocating if needed */
+static void
+HeapResize(Heap *heap, HeapIndex newNumElems)
+{
+	if (newNumElems >= heap->elemsAllocated) {
+		HeapIndex newAllocSize = heap->elemsAllocated * 2;
+
+		if (newAllocSize <= newNumElems)
+			newAllocSize = newNumElems + 1;
+		heap->elems = (HeapCost **)realloc((void *)heap->elems,
+									  sizeof(*heap->elems) * newAllocSize);
+		if (heap->elems == NULL) {
+			fprintf(stderr, "Fatal error: Out of memory growing heap\n");
+			exit(1);
+		}
+		heap->elemsAllocated = newAllocSize;
+	}
+	heap->numElems = newNumElems;
+}
+
+/* Add an element to the heap */
+void
+HeapInsert(Heap *heap, HeapCost *newElem)
+{
+	HeapIndex parent, i = ++HeapSize(heap);
+	HeapCost cost = HeapElemCost(newElem);
+
+	HeapResize(heap, i);
+	/* Sift up until parent = 0 */
+	while ((parent = HeapParent(i)) && HeapCost(heap, parent) > cost) {
+		HeapElem(heap, i) = HeapElem(heap, parent);
+		i = parent;
+	}
+	heap->elems[i] = newElem;
+}
+
+/* Initialize a new heap */
+void
+HeapInit(Heap *heap, HeapIndex initSize)
+{
+	initSize++;	/* Add one for temporary element */
+	if (initSize < 1)
+		initSize = 1;
+	heap->elems = (HeapCost **)malloc(initSize * sizeof(*heap->elems));
+	if (heap->elems == NULL) {
+		fprintf(stderr, "Fatal error: Out of memory creating heap\n");
+		exit(1);
+	}
+	heap->elemsAllocated = initSize;
+	heap->numElems = 0;
+}
+
+/* Free up a heap's resources. */
+void
+HeapDestroy(Heap *heap)
+{
+	free((void *)heap->elems);
+	heap->elemsAllocated = 0;
+	heap->numElems = 0;
+	heap->elems = NULL;
+}
+
+/*
+ * Local Variables:
+ * tab-width: 4
+ * End:
+ * vi: ts=4 sw=4
+ * vim: si
+ */
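
A note on how heap.c is meant to be used: the heap stores pointers to cost values, and the header comment suggests that the cost is the first field of a larger structure, so a pointer handed back by the heap can be cast to the enclosing record. The sketch below illustrates that idiom with a tiny fixed-capacity heap. TinyHeap, Candidate, tiny_insert and tiny_get_min are hypothetical names made up for this example and are not part of heap.h.

```c
#include <assert.h>
#include <stddef.h>

typedef int Cost;

/* Hypothetical payload: the cost comes first, so a pointer to the
 * struct and a pointer to its cost refer to the same address. */
struct Candidate {
    Cost cost;
    const char *text;
};

#define MAX_ELEMS 16

/* Tiny fixed-capacity min-heap of cost pointers, 1-based like heap.c. */
struct TinyHeap {
    Cost *elems[MAX_ELEMS + 1];
    unsigned n;
};

/* Returns 0 on success, -1 if the heap is full. */
static int tiny_insert(struct TinyHeap *h, Cost *e)
{
    unsigned i, parent;

    assert(h != NULL && e != NULL);
    if (h->n >= MAX_ELEMS)
        return -1;
    i = ++h->n;
    /* Sift up: move costlier parents down until e fits. */
    while ((parent = i / 2) != 0 && *h->elems[parent] > *e) {
        h->elems[i] = h->elems[parent];
        i = parent;
    }
    h->elems[i] = e;
    return 0;
}

/* Removes and returns the cheapest element, or NULL if empty. */
static Cost *tiny_get_min(struct TinyHeap *h)
{
    Cost *min, *last;
    unsigned parent = 1, child;

    assert(h != NULL);
    if (h->n == 0)
        return NULL;
    min = h->elems[1];
    last = h->elems[h->n--];
    /* Sift down: pull the cheaper child up until last fits. */
    while ((child = 2 * parent) <= h->n) {
        if (child < h->n && *h->elems[child + 1] < *h->elems[child])
            child++;
        if (*last <= *h->elems[child])
            break;
        h->elems[parent] = h->elems[child];
        parent = child;
    }
    h->elems[parent] = last;
    return min;
}
```

A caller inserts &record.cost and casts the result of tiny_get_min back to struct Candidate * to reach the rest of the record.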
diff --git a/tools/heap.h b/tools/heap.h
new file mode 100644
index 0000000..36e8782
--- /dev/null
+++ b/tools/heap.h
@@ -0,0 +1,43 @@
+/*
+ * heap.h -- Simple priority queue.  Takes pointers to cost values
+ * (presumably the first field in a larger structure) and returns
+ * them in increasing order of cost.
+ *
+ * Copyright (C) 1997 Pretty Good Privacy, Inc.
+ *
+ * Written by Colin Plumb and Mark H. Weaver
+ *
+ * $Id: heap.h,v 1.6 1997/10/31 04:22:46 mhw Exp $
+ */
+
+#ifndef HEAP_H
+#define HEAP_H 1
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <limits.h>
+
+typedef int HeapCost;
+#define COST_INFINITY INT_MAX
+typedef unsigned HeapIndex;
+
+typedef struct Heap {
+	HeapCost	**elems;
+	HeapIndex	numElems, elemsAllocated;
+} Heap;
+
+void HeapInit(Heap *heap, HeapIndex initSize);
+void HeapDestroy(Heap *heap);
+void HeapInsert(Heap *heap, HeapCost *newElem);
+HeapCost *HeapGetMin(Heap *heap);
+void HeapVerify(Heap *heap);
+
+#endif
+
+/*
+ * Local Variables:
+ * tab-width: 4
+ * End:
+ * vi: ts=4 sw=4
+ * vim: si
+ */
diff --git a/tools/makemanifest b/tools/makemanifest
new file mode 100644
index 0000000..4e8dcc8
--- /dev/null
+++ b/tools/makemanifest
@@ -0,0 +1,31 @@
+#!/usr/bin/perl
+
+$fileNum = 0;
+while(<>)
+{
+	/^([VDTB])(\S*)\s+(.*)/ || die("Bad filelist, line $.");
+	($type, $options, $name) = ($1, $2, $3);
+
+	if ($type eq "D")
+	{
+		$dir = $name;
+		print "D $dir\n";
+	}
+	elsif ($type eq "V")
+	{
+		# Do nothing
+	}
+	else
+	{
+		$fileNum++;
+		$tail = $name;
+		$tail =~ s|^.*/||;
+		die("Bad filelist, line $.") if $name ne $dir . $tail;
+		print "$fileNum $tail\n";
+	}
+}
+
+#
+# vi: ai ts=4
+# vim: si
+#
diff --git a/tools/mempool.c b/tools/mempool.c
new file mode 100644
index 0000000..40e3104
--- /dev/null
+++ b/tools/mempool.c
@@ -0,0 +1,137 @@
+/*
+ * mempool.c - Pooled memory allocation, similar to GNU obstacks.
+ *
+ * $Id: mempool.c,v 1.5 1997/11/13 23:53:08 colin Exp $
+ */
+#include <assert.h>
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>	/* For malloc() & free() */
+
+#include "mempool.h"
+
+/*
+ * The memory pool allocation functions
+ *
+ * These are based on a linked list of memory blocks, usually of uniform
+ * size.  New memory is allocated from the tail of the current block,
+ * until that is inadequate, then a new block is allocated.
+ * The entire pool can be freed at once by calling memPoolEmpty().
+ */
+struct PoolBuf {
+	struct PoolBuf *next;
+	unsigned size;
+	/* Data follows */
+};
+
+/* The prototype empty pool, including the default allocation size. */
+static struct MemPool EmptyPool = { 0, 0, 0, 4096, 0 , 0, 0};
+
+/* Initialize the pool for first use */
+void
+memPoolInit(struct MemPool *pool)
+{
+	*pool = EmptyPool;
+}
+
+/* Set the pool's purge function */
+void
+memPoolSetPurge(struct MemPool *pool, int (*purge)(void *), void *arg)
+{
+	pool->purge = purge;
+	pool->purgearg = arg;
+}
+
+/* Free all the memory in the pool */
+void
+memPoolEmpty(struct MemPool *pool)
+{
+	struct PoolBuf *buf;
+
+	while ((buf = pool->head) != 0) {
+		pool->head = buf->next;
+		free(buf);
+	}
+	pool->freespace = 0;
+	pool->totalsize = 0;
+}
+
+
+/*
+ * Restore a pool to a marked position, freeing subsequently allocated
+ * memory.
+ */
+void
+memPoolCutBack(struct MemPool *pool, struct MemPool const *cutback)
+{
+	struct PoolBuf *buf;
+
+	assert(pool);
+	assert(cutback);
+	assert(pool->totalsize >= cutback->totalsize);
+
+	while((buf = pool->head) != cutback->head) {
+		pool->head = buf->next;
+		free(buf);
+	}
+	*pool = *cutback;
+}
+
+/*
+ * Allocate a chunk of memory for a structure.  Alignment is assumed to be
+ * a power of 2.  It could be generalized, if that ever becomes relevant.
+ * Note that alignment is from the beginning of an allocated chunk, which
+ * is guaranteed by ANSI to be as aligned as can possibly matter.
+ */
+void *
+memPoolAlloc(struct MemPool *pool, unsigned len, unsigned alignment)
+{
+	char *p;
+	unsigned t;
+
+	/* Where to allocate next object */
+	p = pool->freeptr;
+	/* How far it is from the beginning of the chunk. */
+	t = p - (char *)pool->head;
+	/* How much to round up freeptr to make alignment */
+	t = -t & --alignment;
+
+	/* Okay, does it fit? */
+	if (pool->freespace >= len+t) {
+		pool->freespace -= len+t;
+		p += t;
+		pool->freeptr = p + len;
+		return p;
+	}
+
+	/* It does not fit in the current chunk.  Go for a bigger chunk. */
+
+	/* First, figure out how much to skip at the beginning of the chunk */
+	alignment &= -(unsigned)sizeof(struct PoolBuf);
+	alignment += sizeof(struct PoolBuf);
+	/* Then, figure out a chunk size that will fit */
+	t = pool->chunksize;
+	assert(t);
+	while (len + alignment > t)
+		t *= 2;
+	while ((p = malloc(t)) == 0) {
+		/* If that didn't work, try purging or smaller allocations */
+		if (!pool->purge || !pool->purge(pool->purgearg)) {
+			t /= 2;
+			if (len + alignment > t) {
+				fputs("Out of memory!\n", stderr); exit (1);	/* Failed */
+			}
+		}
+	}
+
+	/* Update the various pointers. */
+	pool->totalsize += t;
+	((struct PoolBuf *)p)->next = pool->head;
+	((struct PoolBuf *)p)->size = t;
+	pool->head = (struct PoolBuf *)p;
+	pool->freespace = t - len - alignment;
+	p += alignment;
+	pool->freeptr = p + len;
+
+	return p;
+}
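
A note on the rounding step in memPoolAlloc: the expression "-t & --alignment" computes how many bytes must be skipped so that an offset becomes a multiple of a power-of-two alignment. The sketch below isolates that computation; pad_to_alignment is a hypothetical name used only for illustration.

```c
#include <assert.h>

/*
 * Bytes of padding needed to round `offset` up to the next multiple
 * of `alignment`.  Unsigned negation wraps modulo 2^N, so the low
 * bits of -offset are exactly the distance to the next multiple.
 */
static unsigned pad_to_alignment(unsigned offset, unsigned alignment)
{
    /* The trick only works for powers of two: exactly one bit set. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return -offset & (alignment - 1);
}
```

For example, an offset of 13 with 4-byte alignment needs 3 bytes of padding, while an offset that is already aligned needs none.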
diff --git a/tools/mempool.h b/tools/mempool.h
new file mode 100644
index 0000000..1732a77
--- /dev/null
+++ b/tools/mempool.h
@@ -0,0 +1,36 @@
+/* $Id: mempool.h,v 1.2 1997/11/13 23:53:09 colin Exp $ */
+
+#ifndef MEMPOOL_H
+#define MEMPOOL_H
+
+typedef struct MemPool {
+	struct PoolBuf *head;
+	char *freeptr;
+	unsigned freespace;
+	unsigned chunksize;	/* Default starting point */
+	unsigned long totalsize;
+	int (*purge)(void *);	/* Return non-zero to retry alloc */
+	void *purgearg;
+} MemPool;
+
+/* A global pool for miscellaneous stuff. */
+extern struct MemPool MiscPool;
+
+/*
+ * Nice clean interfaces
+ */
+void memPoolInit(struct MemPool *pool);
+void memPoolSetPurge(struct MemPool *pool, int (*purge)(void *), void *arg);
+void memPoolEmpty(struct MemPool *pool);
+void memPoolCutBack(struct MemPool *dest, struct MemPool const *cutback);
+void *memPoolAlloc(struct MemPool *pool, unsigned len, unsigned alignment);
+#ifdef DEADCODE
+char const *memPoolStore(struct MemPool *pool, char const *str);
+#endif
+
+/* Lookie here!  An ANSI-compliant alignment finder! */
+#define alignof(type) (sizeof(struct{type _x; char _y;}) - sizeof(type))
+
+#define memPoolNew(pool, type) memPoolAlloc(pool, sizeof(type), alignof(type))
+
+#endif /* MEMPOOL_H */
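
A note on the alignof macro in mempool.h: a struct holding a value of the given type followed by one char is padded up to the next multiple of the type's alignment, so subtracting sizeof(type) leaves the alignment itself. The sketch below checks that reasoning on a few types. The macro is renamed ALIGN_OF here, which is an assumption of this example, to avoid clashing with the alignof keyword or macro of newer C standards; Probe and align_of_demo are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Same trick as the alignof macro in mempool.h, under another name. */
#define ALIGN_OF(type) (sizeof(struct { type x_; char y_; }) - sizeof(type))

/* Reference layout: the offset of `d` is the padding the compiler
 * inserts after a single char in order to align a double. */
struct Probe {
    char c;
    double d;
};

static void align_of_demo(void)
{
    assert(ALIGN_OF(char) == 1);
    assert(ALIGN_OF(double) == offsetof(struct Probe, d));
    /* Alignments are powers of two on the ABIs this sketch assumes. */
    assert((ALIGN_OF(long) & (ALIGN_OF(long) - 1)) == 0);
}
```

The memPoolNew macro relies on the same value to pass a suitable alignment to memPoolAlloc.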
diff --git a/tools/munge.c b/tools/munge.c
new file mode 100644
index 0000000..965e25a
--- /dev/null
+++ b/tools/munge.c
@@ -0,0 +1,543 @@
+/*
+ * munge.c -- Program to convert a text file into "munged" form,
+ *            suitable for reconstruction from printed form.  Tabs are
+ *            made visible and checksums are added to each line and each
+ *            page to protect against transcription errors.
+ *
+ * Copyright (C) 1997 Pretty Good Privacy, Inc.
+ *
+ * Designed by Colin Plumb, Mark H. Weaver, and Philip R. Zimmermann
+ * Written by Mark H. Weaver
+ *
+ * $Id: munge.c,v 1.32 1997/11/12 23:28:53 mhw Exp $
+ */
+
+#include <stdio.h>
+#include <errno.h>
+#include <string.h>
+#include <ctype.h>
+#include <stdlib.h>
+
+#include "util.h"
+
+/*
+ * The file is divided into pages, and the format of each page is
+ *
+--f414 000b2dc79af40010002 Page 1 of munge.c
+
+bc38e5 /*
+40a838  * munge.c -- Program to convert a text file into munged form
+647222  *
+193f28  * Copyright (C) 1997 Pretty Good Privacy, Inc.
+827222  *
+699025  * Designed by Colin Plumb, Mark H. Weaver, and Philip R. Zimmermann
+0d050c  * Written by Mark H. Weaver
+ *
+ * Where the first 2 columns are the high 8 bits (in hex) of a running
+ * CRC-32 of the page (the string "--", unlikely to be confused with
+ * any digits, indicates a page header line) and the next 4 columns
+ * are a CRC-16 of the rest of the line.  Then a space (not counted in
+ * the CRC), and the line of text.  Tabs are printed as the currency
+ * symbol (ISO Latin 1 character 164) followed by the appropriate number
+ * of spaces, and any form feeds are printed as a yen symbol (Latin 1 165).
+ * The CRC is computed on the transformed line, including the trailing
+ * newline.  No trailing whitespace is permitted.
+ *
+ * The header line contains a (hex) number of the form 0ffcccccccctpppnnnn,
+ * where the digit 0 is a version number, ff are flags, cccccccc is the CRC-32
+ * of the page, t is the tab size (usually 4 or 8; 0 for binary files that
+ * are sent in radix-64), ppp is the product number (usually 1, different
+ * for different books), and nnnn is the file number (sequential from 1).
+ *
+ * This is followed by " Page %u of " and the file name.
+ */
+
+typedef struct MungeState
+{
+	EncodeFormat const *	fmt;
+	EncodeFormat const *	hFmt;
+	int				binaryMode, tabWidth;
+	long			origLineNumber;
+	long			productNumber, fileNumber, pageNumber, lineNumber;
+	unsigned long	fileOffset;
+	CRC				pageCRC;
+	char const *	fileName;
+	char const *	fileNameTail;
+	char *			pageBuffer;	/* Buffer large enough to hold one page */
+	char *			pagePos;	/* Current position in pageBuffer */
+	word16			hdrFlags;
+	FILE *			file;
+	FILE *			out;
+} MungeState;
+
+
+void ChecksumLine(EncodeFormat const *fmt, char const *line, size_t length,
+				  char *prefix, CRC *pageCRC)
+{
+	CRC			lineCRC;
+	CRC			runCRCPart = 0;
+
+	lineCRC = CalculateCRC(fmt->lineCRC, 0, (byte const *)line, length);
+	if (pageCRC != NULL)
+	{
+		*pageCRC = CalculateCRC(fmt->pageCRC, *pageCRC,
+								(byte const *)line, length);
+		runCRCPart = RunningCRCFromPageCRC(fmt, *pageCRC);
+	}
+
+	prefix += EncodeCheckDigits(fmt, runCRCPart, fmt->runningCRCBits, prefix);
+	prefix += EncodeCheckDigits(fmt, lineCRC, fmt->lineCRC->bits, prefix);
+
+	*prefix++ = ' ';	/* Write a space over the null byte */
+}
+
+/* Returns 1 for convenience */
+int PrintFileError(MungeState *state, char const *message)
+{
+	fprintf(stderr, "%s in %s %s %lu\n", message, state->fileName,
+			state->binaryMode ? "offset" : "line",
+			state->binaryMode ? state->fileOffset : state->origLineNumber);
+	return 1;
+}
+
+int MungeLine(MungeState *state, char *buffer, int length,
+			  char *line, int *bufferUsed)
+{
+	int		i = 0, j = 0, jOld = 0;
+	char	ch;
+
+	for (i = 0; i < length && j < LINE_LENGTH; i++)
+	{
+		jOld = j;
+		ch = buffer[i];
+		if (ch == '\t')
+		{
+			line[j++] = TAB_CHAR;
+			if (state->tabWidth < 1)
+				return PrintFileError(state,
+									  "ERROR: Tab found in radix64 stream");
+			else
+				while (j % state->tabWidth && j < LINE_LENGTH)
+					line[j++] = TAB_PAD_CHAR;
+		}
+		else if (ch == '\n')
+		{
+			if (i + 1 < length)
+				return PrintFileError(state,
+								"UNEXPECTED ERROR: fgets read past newline!?");
+			break;
+		}
+		else if (ch == '\f')
+		{
+			break;
+		}
+		else if (ch == ' ' && (j <= 0 || line[j-1] == ' ' ||
+							   line[j-1] == SPACE_CHAR ||
+							   i+1 >= length || buffer[i+1] == '\n'))
+		{
+			line[j++] = SPACE_CHAR;
+		}	
+		else if (ch >= ' ' && ch <= '~')
+			line[j++] = ch;
+		else
+			return PrintFileError(state, "ERROR: Non-ASCII char");
+	}
+
+	if (i < length && buffer[i] == '\n')
+	{
+		i++;
+		state->origLineNumber++;
+	}
+	else if (i < length && buffer[i] == '\f' && j < LINE_LENGTH)
+	{
+		i++;
+		line[j++] = FORMFEED_CHAR;
+	}
+	else
+	{
+		/* If there's no newline, we need to add the continuation marker */
+		if (i > 0 && j >= LINE_LENGTH)
+		{
+			/* Remove the last character if we're out of room */
+			i--;
+			j = jOld;
+		}
+		line[j++] = CONTIN_CHAR;
+	}
+
+	/* Strip trailing spaces */
+	while (j > 0 && isspace((unsigned char)line[j - 1]))
+		j--;
+
+	if (j > LINE_LENGTH)	/* This should never happen */
+		return PrintFileError(state, "ERROR: Internal error, line too long");
+
+	/* Add trailing newline and NUL terminator */
+	line[j++] = '\n';
+	line[j++] = '\0';
+
+	/* Return number of chars used from buffer */
+	*bufferUsed = i;
+
+	return 0;
+}
+
+static void
+Encode3(byte const src[3], char dest[4])
+{
+	dest[0] = radix64Digits[                     (src[0]>>2 & 0x3f)];
+	dest[1] = radix64Digits[(src[0]<<4 & 0x30) | (src[1]>>4 & 0x0f)];
+	dest[2] = radix64Digits[(src[1]<<2 & 0x3c) | (src[2]>>6 & 0x03)];
+	dest[3] = radix64Digits[(src[2]    & 0x3f)];
+}
+
+static int
+EncodeLine(byte const *src, int srcLen, char *dest)
+{
+	char *	destp = dest;
+	byte	tempSrc[3];
+
+	for (; srcLen >= 3; srcLen -= 3)
+	{
+		Encode3(src, destp);
+		src += 3; destp += 4;
+	}
+
+	if (srcLen > 0)
+	{
+		memset(tempSrc, 0, sizeof(tempSrc));
+		memcpy(tempSrc, src, srcLen);
+		Encode3(tempSrc, destp);	/* Encode the zero-padded copy */
+		src += 3; destp += 4; srcLen -= 3;
+		while (srcLen < 0)
+			destp[srcLen++] = RADIX64_END_CHAR;
+	}
+
+	return destp - dest;
+}
+
+static int
+MungeBinaryLine(MungeState *state, byte const *buffer, int length, char *line)
+{
+	char	binLine[128];
+	int		binLength;			/* Destination length */
+	int		used;
+
+	binLength = EncodeLine(buffer, length, binLine);
+
+	/* Append newline */
+	binLine[binLength++] = '\n';
+	binLine[binLength] = '\0';
+
+	return MungeLine(state, binLine, binLength, line, &used);
+}
+
+int MaybePageBreak(MungeState *state)
+{
+	EncodeFormat const *	fmt = state->fmt;
+	EncodeFormat const *	hFmt = state->hFmt;
+
+	if (state->lineNumber >= LINES_PER_PAGE)
+	{
+		char	line[512];
+		char *	lineData	= line + PREFIX_LENGTH;
+		char *	p			= lineData;
+		
+		p += EncodeCheckDigits(hFmt, 0, HDR_VERSION_BITS, p);
+		p += EncodeCheckDigits(hFmt, state->hdrFlags, HDR_FLAG_BITS, p);
+		p += EncodeCheckDigits(hFmt, state->pageCRC, fmt->pageCRC->bits, p);
+		p += EncodeCheckDigits(hFmt, state->tabWidth, HDR_TABWIDTH_BITS, p);
+		p += EncodeCheckDigits(hFmt, state->productNumber, HDR_PRODNUM_BITS, p);
+		p += EncodeCheckDigits(hFmt, state->fileNumber, HDR_FILENUM_BITS, p);
+
+		sprintf(p, " Page %ld of %s\n", state->pageNumber + 1,
+				state->fileNameTail);
+
+		if (strlen(lineData) > LINE_LENGTH + 1)
+		{
+			PrintFileError(state, "ERROR: Header line too long");
+			fprintf(stderr, "> %s", lineData);
+			return -1;
+		}
+
+		/* Compute checksums and prefix them to line */
+		ChecksumLine(fmt, lineData, strlen(lineData), line, NULL);
+
+		fprintf(state->out, "%c%c%s\n%s\f", HDR_PREFIX_CHAR,
+				fmt->headerTypeChar, line + 2, state->pageBuffer);
+
+		state->pageNumber++;
+		state->lineNumber = 0;
+		state->pageCRC = 0;
+		state->pagePos = state->pageBuffer;		/* Clear page buffer */
+	}
+	return 0;
+}
+
+/*
+ * Search for Emacs "tab-width: " marker in file.
+ * Emacs is stricter about the format, but this will do.
+ */
+int FindTabWidth(MungeState *state)
+{
+	char const * const	tabWidthMarker = " tab-width: ";
+	char				buffer[512];
+	char *				p;
+	int					length;
+	int					tabWidth = 0;
+
+	fseek(state->file, -(long)(sizeof(buffer) - 1), SEEK_END);
+	length = fread(buffer, 1, sizeof(buffer) - 1, state->file);
+	buffer[length] = '\0';
+	p = strstr(buffer, tabWidthMarker);
+	if (p != NULL)
+	{
+		p += strlen(tabWidthMarker);
+		while (*p != '\0' && *p != '\n' && isspace((unsigned char)*p))
+			p++;
+		tabWidth = strtol(p, &p, 10);
+		while (*p != '\0' && *p != '\n' && isspace((unsigned char)*p))
+			p++;
+		if (*p != '\n' || tabWidth < 2)
+			tabWidth = 0;
+		else if (tabWidth > 16)
+			fprintf(stderr, "WARNING: Weird tab-width (%d), %s\n",
+							tabWidth, state->fileName);
+	}
+	return tabWidth;
+}
+
+/*
+ * Open the given source file and send the munged output to the
+ * FILE *, with the given options.
+ */
+int MungeFile(char const *fileName, FILE *out, EncodeFormat const *fmt,
+			  int binaryMode, int defaultTabWidth,
+			  long productNumber, long fileNumber)
+{
+	MungeState *	state;
+	int				length, used;
+	char			line[PREFIX_LENGTH + LINE_LENGTH + 10];
+	char *			lineData = line + PREFIX_LENGTH;
+	char			buffer[128];
+	int				result = 0;
+
+	state = (MungeState *)calloc(1, sizeof(*state));
+	state->fmt = fmt;
+	state->hFmt = &hexFormat;
+	state->origLineNumber = 1;
+	state->fileName = fileName;
+	state->pageCRC = 0;
+	state->productNumber = productNumber;
+	state->fileNumber = fileNumber;
+	state->pageNumber = 0;
+	state->lineNumber = 0;
+	state->fileOffset = 0;
+	state->binaryMode = binaryMode;
+	state->pageBuffer = malloc(PAGE_BUFFER_SIZE);
+	state->pageBuffer[0] = '\0';
+	state->pagePos = state->pageBuffer;
+	state->hdrFlags = 0;
+	state->out = out;
+
+	state->fileNameTail = strrchr(state->fileName, '/');
+	if (state->fileNameTail == NULL)
+		state->fileNameTail = state->fileName;
+	else
+		state->fileNameTail++;
+
+	state->file = fopen(state->fileName, binaryMode ? "rb" : "r");
+	if (state->file == NULL)
+	{
+		result = errno;
+		fprintf(stderr, "ERROR opening %s: %s\n",
+				state->fileName, strerror(result));
+		goto error;
+	}
+	
+	if (state->binaryMode)
+	{
+		state->tabWidth = 0;
+	}
+	else
+	{
+		state->tabWidth = FindTabWidth(state);
+		if (state->tabWidth == 0)
+			state->tabWidth = defaultTabWidth;
+		rewind(state->file);
+	}
+
+	while (!feof(state->file))
+	{
+		if (state->binaryMode)
+		{
+			length = fread(buffer, 1, BYTES_PER_LINE, state->file);
+			if (length < 1)
+			{
+				if (feof(state->file))
+					break;
+				goto fileError;
+			}
+			if ((result = MaybePageBreak(state)))
+				goto error;
+			if ((result = MungeBinaryLine(state, buffer, length, lineData)))
+				goto error;
+			state->fileOffset += length;
+		}
+		else
+		{
+			if (fgets(buffer, sizeof(buffer), state->file) == NULL)
+			{
+				if (feof(state->file))
+					break;
+				goto fileError;
+			}
+			length = strlen(buffer);
+			if ((result = MaybePageBreak(state)))
+				goto error;
+			if ((result = MungeLine(state, buffer, length, lineData, &used)))
+				goto error;
+
+			if (used < length)
+				if (fseek(state->file, used - length, SEEK_CUR))
+					goto fileError;
+		}
+
+		/* Compute checksums and prefix them to the line */
+		ChecksumLine(fmt, lineData, strlen(lineData), line, &state->pageCRC);
+
+		strcpy(state->pagePos, line);
+		length = strlen(state->pagePos);
+		/* Suppress trailing whitespace on blank lines */
+		if (length == PREFIX_LENGTH+1 && state->pagePos[length-1] == '\n') {
+			state->pagePos[--length-1] = '\n';
+			state->pagePos[length] = '\0';
+		}
+		state->pagePos += length;
+
+		state->lineNumber++;
+	}
+
+	if (state->lineNumber > 0)
+	{
+		/* Force a final page break */
+		state->lineNumber = LINES_PER_PAGE;
+		state->hdrFlags |= HDR_FLAG_LASTPAGE;
+		if ((result = MaybePageBreak(state)))
+			goto error;
+	}
+
+	result = 0;
+	goto done;
+
+fileError:
+	result = ferror(state->file);
+
+error:
+done:
+	if (state != NULL)
+	{
+		if (state->file != NULL)
+			fclose(state->file);
+		free(state->pageBuffer); free(state);
+	}
+	return result;
+}
+
+int main(int argc, char *argv[])
+{
+	int		result = 0;
+	int		i, j;
+	int		defaultTabWidth = 4;
+	int		binaryMode = 0;
+	long	productNumber = 1;
+	long	fileNumber = 1;
+	char *	endOfNumber;
+	EncodeFormat const *	fmt = NULL;
+
+	InitUtil();
+
+	for (i = 1; i < argc && argv[i][0] == '-'; i++)
+	{
+		if (0 == strcmp(argv[i], "--"))
+		{
+			i++;
+			break;
+		}
+		for (j = 1; argv[i][j] != '\0'; j++)
+		{
+			if (isdigit(argv[i][j]))
+			{
+				defaultTabWidth = argv[i][j] - '0';
+				if (defaultTabWidth < 2 || defaultTabWidth > 9)
+					fprintf(stderr, "WARNING: Weird default tab-width (%d)\n",
+									defaultTabWidth);
+			}
+			else if (argv[i][j] == 'b')
+			{
+				binaryMode = 1;
+			}
+			else if (argv[i][j] == 'F')
+			{
+				fmt = FindFormat(argv[i][j+1]);
+				if (!fmt || argv[i][j+2] != '\0')
+				{
+					fprintf(stderr, "ERROR: Invalid format char\n");
+					exit(1);
+				}
+				break;
+			}
+			else if (argv[i][j] == 'p')
+			{
+				productNumber = strtol(&argv[i][j+1], &endOfNumber, 10);
+				if (*endOfNumber != '\0')
+				{
+					fprintf(stderr, "ERROR: Invalid product number\n");
+					exit(1);
+				}
+				break;
+			}
+			else if (argv[i][j] == 'f')
+			{
+				fileNumber = strtol(&argv[i][j+1], &endOfNumber, 10);
+				if (*endOfNumber != '\0')
+				{
+					fprintf(stderr, "ERROR: Invalid file number\n");
+					exit(1);
+				}
+				break;
+			}
+			else
+			{
+				fprintf(stderr, "ERROR: Unrecognized option -%c\n", argv[i][j]);
+				exit(1);
+			}
+		}
+	}
+	if (!fmt)
+		fmt = binaryMode ? &radix64Format : &hexFormat;
+
+	for (; i < argc; i++)
+	{
+		if ((result = MungeFile(argv[i], stdout, fmt, binaryMode,
+								defaultTabWidth, productNumber,
+								fileNumber)) != 0)
+		{
+			/* If result > 0, message should have already been printed */
+			if (result < 0)
+				fprintf(stderr, "ERROR: %s\n", strerror(result));
+			exit(1);
+		}
+		fileNumber++;
+	}
+	
+	return 0;
+}
+
+/*
+ * Local Variables:
+ * tab-width: 4
+ * End:
+ * vi: ts=4 sw=4
+ * vim: si
+ */
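
Note on the radix-64 line encoding in munge.c above: the tail of a line (one or two leftover bytes) is copied into a zero-filled three-byte scratch buffer, encoded from that buffer, and the unused output digits are then overwritten with the end character. The following is a minimal standalone sketch of that idea, not the project's code. It assumes the standard base64 alphabet and '=' as the end character; the real radix64Digits and RADIX64_END_CHAR are defined elsewhere in the tree and may differ. The names digits, END_CHAR, encode3 and encode_line are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Assumption: standard base64 alphabet and '=' padding.  The real
 * radix64Digits and RADIX64_END_CHAR may be different. */
static const char digits[] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
#define END_CHAR '='

/* Encode exactly 3 input bytes as 4 output digits. */
static void encode3(const unsigned char *src, char *dest)
{
	dest[0] = digits[src[0] >> 2 & 0x3f];
	dest[1] = digits[(src[0] << 4 & 0x30) | (src[1] >> 4 & 0x0f)];
	dest[2] = digits[(src[1] << 2 & 0x3c) | (src[2] >> 6 & 0x03)];
	dest[3] = digits[src[2] & 0x3f];
}

/* Encode len bytes; dest needs 4 * ((len + 2) / 3) + 1 bytes.
 * Returns the number of digits written (excluding the NUL). */
static int encode_line(const unsigned char *src, int len, char *dest)
{
	char *p = dest;
	unsigned char tail[3];

	assert(src != NULL && dest != NULL && len >= 0);

	for (; len >= 3; len -= 3, src += 3, p += 4)
		encode3(src, p);

	if (len > 0) {
		/* Zero-pad the tail so encode3 never reads past the input. */
		memset(tail, 0, sizeof tail);
		memcpy(tail, src, (size_t)len);
		encode3(tail, p);
		p += 4;
		/* len - 3 is -1 or -2: overwrite that many trailing digits. */
		for (len -= 3; len < 0; len++)
			p[len] = END_CHAR;
	}
	*p = '\0';
	return (int)(p - dest);
}
```

Encoding from the scratch buffer rather than from the original pointer is what keeps the final group from reading bytes beyond the end of the input.
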
diff --git a/tools/psgen b/tools/psgen
new file mode 100644
index 0000000..2848390
--- /dev/null
+++ b/tools/psgen
@@ -0,0 +1,324 @@
+#!/usr/bin/perl
+#
+# psgen -- PostScript generator for code portion of source books
+#
+# Reads in a list of files/dirs from <filelist>, runs munge on each of
+# them, and generates a single PostScript file to stdout.  The page numbers
+# for each file/dir are put into the file <pagenums>.
+#
+# usage: psgen [ options... ] <filelist> <pagenums> <volume #>  > foo.ps
+#			-l<firstLogicalPage>
+#			-p<firstPhysicalPage>
+#			-f<font>
+#			-D<defs> (passed to yapp)
+#			-P<productNumber>
+#			-o<mungedOutFile>
+#			-e				(auto edit errors)
+#
+# $Id: psgen,v 1.18 1997/11/13 21:44:16 colin Exp $
+#
+
+$bookRoot = $ENV{"BOOKROOT"} || ".";
+$toolsDir = "$bookRoot/tools";
+$psDir = "$bookRoot/ps";
+$editor = $ENV{"EDITOR"} || "vi";
+
+# Configuration settings - external file names
+$mungeProg = "$toolsDir/munge";
+$yappProg = "$toolsDir/yapp";
+$preambleFile = "$psDir/prolog.ps";
+$tempFile = "/tmp/psgen-$$";
+
+# Parse arguments
+$firstLogPage = $firstPhysPage = 0;
+$productNumber = 1;
+$font = "OCRB";
+$autoEdit = 0;
+while ($#ARGV >= 0 && $ARGV[0] =~ /^-/)
+{
+	$_ = shift @ARGV;
+	if (/^--$/)
+	{
+		last;
+	}
+	elsif (/^-l(\d+)$/)
+	{
+		$firstLogPage = $1;
+	}
+	elsif (/^-p(\d+)$/)
+	{
+		$firstPhysPage = $1;
+	}
+	elsif (/^-f(.+)$/)
+	{
+		$font = $1;
+	}
+	elsif (/^-D(.+)$/)
+	{
+		$yappDefs .= " " . $_;
+	}
+	elsif (/^-P(\d+)$/)
+	{
+		$productNumber = $1;
+	}
+	elsif (/^-o(.+)$/)
+	{
+		$mungedOutFile = $1;
+	}
+	elsif (/^-e$/)
+	{
+		$autoEdit = 1;
+	}
+	else
+	{
+		&Error("Unrecognized option: '$_'");
+	}
+}
+$fileListFile = shift @ARGV || die "Missing file list argument (arg 1)";
+$pageNumFile = shift @ARGV || die "Missing page number file argument (arg 2)";
+$volume = shift @ARGV || die "Missing volume number argument (arg 3)";
+
+# Determine initial page numbers
+{
+	my $nextLogPage = 1;
+	my $nextPhysPage = 3;
+	my $volNum = 0;		# Which volume's page numbers we're reading
+
+	if ($volume > 1)
+	{
+		open(OLDPAGENUMS, "<$pageNumFile") || die;
+		while (<OLDPAGENUMS>)
+		{
+			if (/^Volume\s+(\d+)$/)
+			{
+				$volNum = $1;
+			}
+			elsif (/^Next:\s+(\d+)\s*$/ && $volNum == $volume - 1)
+			{
+				$nextLogPage = $1;
+			}
+		}
+		close(OLDPAGENUMS);
+	}
+	else
+	{
+		unlink($pageNumFile);
+	}
+	$firstLogPage = $nextLogPage if ($firstLogPage == 0);
+	$firstPhysPage = $nextPhysPage if ($firstPhysPage == 0);
+}
+
+# Names of PostScript operators invoked.  These are the interface
+# between this file and the $preambleFile.
+$oddPageStartPS = "OddPageStart";
+$evenPageStartPS = "EvenPageStart";
+$oddPageEndPS = "OddPageEnd";
+$evenPageEndPS = "EvenPageEnd";
+$dirPagePS = "DirPage";
+# This is short because it's emitted every line
+$linePS = "L";
+
+# Handle an error from munge.
+# A result of 0 means to retry, 1 means to exit
+sub MungeError
+{
+	my $result = 1;
+
+	open(FILEH, "<$tempFile") || die;
+	while (<FILEH>)
+	{
+		print STDERR;
+		if (/ in (.*) line (\d+)$/)
+		{
+			my ($fileName, $lineNumber) = ($1, $2);
+
+			if ($autoEdit)
+			{
+				my @statResult = stat($fileName);
+				my $oldMTime = $statResult[9];
+
+				system("'$editor' '+$lineNumber' '$fileName' 1>&2");
+				@statResult = stat($fileName);
+				$result = ($statResult[9] == $oldMTime);
+				last;
+			}
+		}
+	}
+	close(FILEH);
+	unlink($tempFile) || die "Couldn't unlink $tempFile";
+	return $result;
+}
+
+sub CopyFileToPS
+{
+	local $fileName = $_[0];
+	local $args = "'-I$psDir' '-Dfont=$font'";
+	local $_;
+
+	$args .= $yappDefs;
+	open(FILEH, "$yappProg $args '$fileName' |") || die;
+	while (<FILEH>)
+	{
+		print PSOUT $_;
+	}
+	close(FILEH) || exit(1);
+	1;
+}
+
+# Wrap a string in parens as required by PostScript, with proper quoting.
+sub StringPS
+{
+	local $str = $_[0];
+
+	$str =~ s/([\\()])/\\$1/g;
+	"(" . $str . ")";
+}
+
+# Emit a start of page.  The PostScript DSC %%Page: header
+# (followed by logical page number, then physical) and
+# the top-of-page function (which is passed the page number as a string)
+sub PageStartPS
+{
+	local $pageNum = $_[0];
+
+	"%%Page: " . ($pageNum + $firstLogPage) . " " .
+				 ($pageNum + $firstPhysPage) . "\n" .
+		&StringPS($pageNum + $firstLogPage) .
+		((($pageNum + $firstLogPage) % 2) ? $oddPageStartPS
+										  : $evenPageStartPS) . "\n";
+}
+
+sub PageEndPS
+{
+	local $pageNum = $_[0];
+
+	((($pageNum + $firstLogPage) % 2) ? $oddPageEndPS : $evenPageEndPS) . "\n";
+}
+
+# Save the page number to a table-of-contents file
+sub SavePageNum
+{
+	local ($fileName, $pageNum) = @_;
+
+	print PAGENUMS ($pageNum + $firstLogPage), ": $fileName\n";
+}
+
+# The main code.
+
+open(PSOUT, ">-") || die;
+open(FILELIST, "<$fileListFile") || die;
+open(PAGENUMS, ">>$pageNumFile") || die;
+if ($mungedOutFile ne "")
+{
+	open(MUNGEDOUT, ">$mungedOutFile") || die;
+}
+
+print PAGENUMS "Volume $volume\n";
+
+&CopyFileToPS($preambleFile);
+
+$fileNumber = 0;
+$pageNum = 0;	# This is 0-based, since it is added to $first{Log,Phys}Page
+$enable = 0;
+
+while (<FILELIST>)
+{
+	/^([VDTB])(\S*)\s+(.*)/ || die "Illegal file list line $.";
+
+	local ($fileType, $options, $arg) = ($1, $2, $3);
+
+	if ($fileType eq "V")
+	{
+		@args = split(/\s+/, $arg);
+		if ($enable = ($args[0] == $volume))
+		{
+			$defaultTabWidth = int($args[1]);
+		}
+	}
+	elsif ($fileType eq "D")
+	{
+		next unless $enable;	# Do nothing if we're in the wrong volume
+		$dirName = $arg;
+		&SavePageNum($dirName, $pageNum);
+		print PSOUT &PageStartPS($pageNum);
+		print PSOUT &StringPS($dirName), $dirPagePS, "\n";
+		print PSOUT &PageEndPS($pageNum);
+		$pageNum++;
+	}
+	else
+	{
+		my $done = 0;
+
+		$fileNumber++;
+		$fileName = $arg;
+		next unless $enable;	# Do nothing if we're in the wrong volume
+		&SavePageNum($fileName, $pageNum);
+		$quotedFileName = $fileName;
+		$quotedFileName =~ s/'/'\\''/g;
+		$tabWidth = ($options =~ /(\d)/) ? $1 : $defaultTabWidth;
+		$args = ($fileType eq "B") ? "-b" : "";
+		$args .= " -$tabWidth -p$productNumber -f$fileNumber";
+		while (!$done)
+		{
+			if (open(FILE, "$mungeProg $args '$quotedFileName' 2>$tempFile |"))
+			{
+				$line = <FILE>;
+				print MUNGEDOUT $line;
+
+				while ($line ne "")
+				{
+					print PSOUT &PageStartPS($pageNum);
+
+					while ($line ne "" and $line !~ /^\f/)
+					{
+						chop $line;
+						print PSOUT &StringPS($line), $linePS, "\n";
+						$line = <FILE>;
+						print MUNGEDOUT $line;
+					}
+					$line =~ s/^\f//;
+
+					print PSOUT &PageEndPS($pageNum);
+					$pageNum++;
+				}
+
+				if (close(FILE))
+				{
+					$done = 2;
+				}
+				else
+				{
+					$done = &MungeError();
+				}
+			}
+			else
+			{
+				$done = &MungeError();
+			}
+		}
+		if ($done == 1)
+		{
+			die;
+		}
+	}
+}
+
+# Print PostScript DSC trailer with the correct number of pages
+print PSOUT "%%Trailer\n%%Pages: ", $pageNum, "\n%%EOF\n";
+
+print PAGENUMS "Pages: ", $pageNum, "\n";
+print PAGENUMS "Next: ", ((($pageNum+1) & ~1) + $firstLogPage), "\n";
+
+close(PAGENUMS) || die;
+close(FILELIST) || die;
+close(PSOUT) || die;
+
+if ($mungedOutFile ne "")
+{
+	close(MUNGEDOUT) || die;
+}
+
+#
+# vi: ai ts=4
+# vim: si
+#
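
Note on two small pieces of psgen above, sketched in C for readers who do not read Perl. StringPS wraps a line in parentheses and puts a backslash before every backslash and parenthesis, which is how PostScript string literals are quoted. The "Next:" entry rounds the page count up to an even number before adding the first logical page, so the following volume starts on an odd (right-hand) page when the first logical page is odd. This is an illustrative sketch only; string_ps and next_volume_start are hypothetical names, and the buffer-capacity handling is an addition that the Perl version does not need.

```c
#include <assert.h>
#include <stddef.h>

/* Quote src as a PostScript string literal in dst (capacity cap bytes).
 * Returns the length written, or -1 if dst is too small. */
static int string_ps(const char *src, char *dst, size_t cap)
{
	size_t n = 0;

	assert(src != NULL && dst != NULL);
	if (cap < 3)	/* Need room for "()" plus the NUL */
		return -1;
	dst[n++] = '(';
	for (; *src != '\0'; src++) {
		int special = (*src == '\\' || *src == '(' || *src == ')');

		/* Room for an optional backslash, the char, ")" and NUL. */
		if (n + (size_t)special + 3 > cap)
			return -1;
		if (special)
			dst[n++] = '\\';
		dst[n++] = *src;
	}
	dst[n++] = ')';
	dst[n] = '\0';
	return (int)n;
}

/* First logical page of the next volume: round the page count up to
 * an even number, then offset by this volume's first logical page. */
static long next_volume_start(long pages, long first_log_page)
{
	return ((pages + 1) & ~1L) + first_log_page;
}
```

For example, a volume of 5 pages starting at logical page 1 gives a next start of 7, leaving one blank page after page 5.
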
diff --git a/tools/repair.c b/tools/repair.c
new file mode 100644
index 0000000..2cced13
--- /dev/null
+++ b/tools/repair.c
@@ -0,0 +1,1851 @@
+/*
+ * repair.c -- Program which reconstructs scanned source, locates errors,
+ *			   and tries to fix most of them automatically.  If it
+ *             can't, it drops you into an editor on the appropriate
+ *             line for manual correction.
+ *
+ * Given a file "foo", this appends corrected output to "foo.out"
+ * and copies remaining uncorrected input to "foo.in".  If "foo.in"
+ * exists initially, "foo" is ignored and only "foo.in" is processed.
+ * Thus, re-running it repeatedly, possibly with other correction
+ * techniques in between, will result in correct output in "foo.out"
+ * and an empty "foo.in" file.
+ *
+ * This can automatically invoke an editor for you on the .in file
+ * and re-run itself.  The editor is chosen in the first available way:
+ * - The -e command-line argument takes a printf() format string to
+ *   format the editor invocation command line with the line number and
+ *   filename.  E.g. "emacs +%u %s".  %u and %s must appear, in that order.
+ * - Failing that, the default is "$VISUAL +%u %s"
+ * - Failing that, the default is "$EDITOR +%u %s"
+ * - Failing that, the program prints the error location and exits.
+ *   Specifying -e- forces this behaviour.
+ *
+ * Copyright (C) 1997 Pretty Good Privacy, Inc.
+ *
+ * Designed by Colin Plumb, Mark H. Weaver, and Philip R. Zimmermann
+ * Written by Colin Plumb
+ *
+ * $Id: repair.c,v 1.37 1997/11/14 08:39:40 mhw Exp $
+ */
+
+#include <assert.h>
+#include <stdio.h>
+#include <string.h>
+#include <ctype.h>
+#include <errno.h>
+#include <signal.h>
+
+#include "util.h"
+#include "heap.h"
+#include "mempool.h"
+#include "subst.h"
+
+/*
+ * The internal form of a substitution.  These are stored on
+ * lists indexed by the first character of the input substitution.
+ */
+typedef struct Substitution {
+	struct Substitution *next;
+	char const *input, *output;
+	size_t inlen, outlen;
+	HeapCost cost, cost2;
+	FilterFunc *filter;
+	unsigned int index;	/* Consecutive serial numbers */
+} Substitution;
+
+struct Substitution const substNull = { NULL, "", "", 0, 0, 0, 0, 0 };
+
+/*
+ * This might get increased later to support multiple classes of
+ * substitutions, for different contexts.  Currently, only one
+ * is used.
+ */
+#define SUBST_CLASSES 1
+
+/* List of substitutions, indexed by first character, plus a catch-all */
+Substitution *substitutions[SUBST_CLASSES][0x101];
+
+/*
+ * The pool of Substitution structures.  Remains alive for the entire
+ * execution of the program.
+ */
+static MemPool substPool;
+static Substitution *substFree;
+static unsigned int substCount = 1;	/* Preallocate 0 to substNull */
+static unsigned int substFirstDynamic;
+#define SubstIsDynamic(s) ((s)->index >= substFirstDynamic)
+/* Adjust the substitution based on n occurrences this page */
+#define SubstAdjust(s,n) ((s)->cost = (s)->cost2)
+/* Is this a nasty-line substitution? */
+#define SubstIsNasty(s) ((s)->cost2 == COST_INFINITY)
+
+/* Every possible single-character string */
+static char substChars[512];
+#define SubstString(c) (substChars+2*((c)&255))
+
+/* Set the list of substitutions to empty */
+static void
+SubstInit(void)
+{
+	unsigned int i, j;
+
+	memPoolInit(&substPool);
+	substFree = 0;
+	substCount = 1;	/* Number zero is reserved for uncounted substitutions */
+	for (i = 0; i < elemsof(substitutions); i++)
+		for (j = 0; j < elemsof(*substitutions); j++)
+			substitutions[i][j] = NULL;
+
+	for (i = 0; i < 256; i++) {
+		substChars[2*i] = (char)i;
+		substChars[2*i+1] = 0;
+	}
+}
+
+/*
+ * For dynamically allocated substitutions, we maintain a free list.
+ * Each substitution has a unique serial number.  These are retained
+ * if a substitution goes on the free list, to keep substCount from
+ * ratcheting upwards indefinitely while still guaranteeing uniqueness.
+ */
+static Substitution *
+SubstAlloc(void)
+{
+	struct Substitution *subst = substFree;
+
+	if (subst) {
+		substFree = subst->next;
+	} else {
+		subst = memPoolNew(&substPool, Substitution);
+		subst->index = substCount++;
+	}
+	return subst;
+}
+
+static void
+SubstFree(Substitution *subst)
+{
+	subst->next = substFree;
+	substFree = subst;
+}
+
+static Substitution *
+MakeSubst(char const *input, char const *output, HeapCost cost, HeapCost cost2,
+	FilterFunc *filter, int class)
+{
+	struct Substitution *subst, **head;
+
+	subst = SubstAlloc();
+	subst->input = input;
+	subst->output = output;
+	subst->inlen = strlen(input);
+	subst->outlen = strlen(output);
+	subst->cost = cost;
+	subst->cost2 = cost2;
+	subst->filter = filter;
+
+	/*
+	 * Ignore certain substitutions when printing stats.
+	 * Identity substitutions, and the tab/space tweaking.
+	 */
+	if (strcmp(input, output) == 0 || strcmp(input, TAB_STRING) == 0 ||
+		(input[0] == ' ' && input[1] == 0 && output[0] == 0)) {
+			if (subst->index == substCount-1)
+				substCount--;
+			subst->index = 0;	/* Evil hack */
+	}
+
+	head = &substitutions[class][input[0] & 255];
+	subst->next = *head;
+	*head = subst;
+	return subst;
+}
+
+/*
+ * For each entry in the raw array, turn { "abc", "def", 5 }
+ * into cost-5 mappings of "a"->"d", "b"->"e" and "c"->"f".
+ * If the output string is NULL, the characters are deleted.
+ * An input string of NULL is the end of table delimiter.
+ */
+static void
+SubstSingle(struct RawSubst const *raw, int class)
+{
+	char const *input, *output;
+	int i, o;
+
+	while (raw->input) {
+		input = raw->input;
+		output = raw->output;
+		assert(!output || strlen(input) == strlen(output));
+
+		while (*input) {
+			i = *input++;
+			o = output ? *output++ : 0;
+			(void)MakeSubst(SubstString(i), SubstString(o),
+							raw->cost, raw->cost2, raw->filter, class);
+		}
+		raw++;
+	}
+}
+
+/*
+ * For each entry in the raw array, turn { "abc", "def", 5 }
+ * into a cost-5 mapping of "abc"->"def".
+ * An input string of NULL is the end of table delimiter.
+ */
+static void
+SubstMultiple(struct RawSubst const *raw, int class)
+{
+	while (raw->input) {
+		(void)MakeSubst(raw->input, raw->output, raw->cost, raw->cost2,
+					    raw->filter, class);
+		raw++;
+	}
+}
+
+/* Build the substitutions table */
+static void
+SubstBuild(void)
+{
+	SubstInit();
+	SubstSingle(substSingles, 0);
+	SubstMultiple(substMultiples, 0);
+	substFirstDynamic = substCount;
+}
+
+/*
+ * See if the desired substitution already exists
+ */
+static Substitution const *
+SubstSearch(char const *in, size_t inlen, char const *out, size_t outlen,
+	int class)
+{
+	Substitution *subst = substitutions[class][in[0] & 255];
+
+	for (; subst; subst = subst->next) {
+		if (subst->inlen == inlen && subst->outlen == outlen &&
+			memcmp(subst->input, in, inlen) == 0 &&
+			memcmp(subst->output, out, outlen) == 0)
+				return subst;	/* Already exists */
+	}
+	return NULL;
+}
+
+
+/*
+ * Create a new dynamic substitution.  First search to make
+ * sure it doesn't already exist.
+ */
+static Substitution const *
+SubstDynamic(char const *in, char const *out, int class)
+{
+	Substitution const *subst;
+
+	subst = SubstSearch(in, strlen(in), out, strlen(out), class);
+	return subst ? subst : MakeSubst(in, out, COST_INFINITY,
+									 DYNAMIC_COST_LEARNED, NULL, class);
+}
+
+/*
+ * Search for the substitution, allocating one if not found.
+ * The input string is not null-terminated and needs to be copied to
+ * an allocated buffer.  The output string can just be pointer-copied.
+ */
+static Substitution const *
+SubstNasty(char const *in, size_t inlen, char const *out, int class)
+{
+	Substitution const *subst;
+	char *string;
+
+	if ((subst = SubstSearch(in, inlen, out, strlen(out), class)) != NULL)
+		return subst;
+
+	if (!(string = malloc(inlen+1))) {
+		fputs("Out of memory!\n", stderr);
+		exit(1);
+	}
+	memcpy(string, in, inlen);
+	string[inlen] = 0;
+	return MakeSubst(string, out, COST_INFINITY, COST_INFINITY, NULL, class);
+}
+
+/*
+ * The state of the parser.
+ * Note that this is updated when a ParseNode is *removed* from the heap;
+ * ParseNodes that are in the heap have ParseStates that reflect the
+ * state before the substitution has been parsed; this is a copy of the
+ * parents' state, which is after the parsing.
+ */
+typedef struct ParseState {
+	CRC page_crc;			/* Computed per-page CRC */
+	word16 flags;			/* Flags; see below */
+	unsigned char pos;		/* Position on the line */
+} ParseState;	/* 7 bytes, rounded to 8 */
+
+/* Flags values */
+#define PS_MASK_PAGENUM	 0xC000 /* Digits in header page number (1..3) */
+#define PS_SHIFT_PAGENUM	 14	/* Shift for the above */
+#define PS_FLAG_EOL			512	/* Expect \n next */
+#define PS_FLAG_SPACE		256	/* Was last char a space? */
+#define PS_FLAG_TAB			128	/* Tabbing over a column */
+#define PS_FLAG_INHEADER	 64	/* Current line is a header */
+#define PS_FLAG_PASTHEADER	 32	/* A previous line was a header */
+#define PS_FLAG_BINWS		 16	/* In whitespace after binary data */
+#define PS_FLAG_BINEND		  8	/* End of binary data */
+#define PS_FLAG_DYNAMIC		  4	/* Have used ECC this line */
+#define PS_MASK_FORMAT	 	  3	/* The encoding format (max of 3, for now) */
+#define PS_SHIFT_FORMAT		  0	/* Shift for the above */
+
+/* Have we started on a second page?  Used to force flushing of the first. */
+#define InSecondHeader(ps) \
+	((~(ps)->flags & (PS_FLAG_INHEADER | PS_FLAG_PASTHEADER)) == 0)
+
+#define PageNumDigits(pn) (((pn)->ps.flags & PS_MASK_PAGENUM) >> PS_SHIFT_PAGENUM)
+#define PageNumDigitsIncrement(pn) ((pn)->ps.flags += 1<<PS_SHIFT_PAGENUM)
+
+EncodeFormat const *registeredFormats[4];
+
+/* Returns a small integer index */
+static int
+registerFormat(EncodeFormat const *format)
+{
+	int i;
+	for (i = 0; i < (int)elemsof(registeredFormats); i++) {
+		if (registeredFormats[i] == format)
+			return i;
+		if (!registeredFormats[i]) {
+			registeredFormats[i] = format;
+			return i;
+		}
+	}
+	fputs("Registered formats table overflow!\n", stderr);
+	exit(1);
+}
+
+#define psFormat(ps) registeredFormats[((ps)->flags & PS_MASK_FORMAT)>>PS_SHIFT_FORMAT]
+#define pnFormat(pn) psFormat(&(pn)->ps)
+#define psSetFormat(ps, i) \
+	((ps)->flags = ((ps)->flags & ~PS_MASK_FORMAT) | i << PS_SHIFT_FORMAT)
+
+typedef struct ParseNode {
+	HeapCost cost;
+	unsigned int refcnt;
+	struct ParseNode *parent;
+	char const *input;
+	struct Substitution const *subst;
+	struct ParseState ps;
+} ParseNode;	/* 32 bytes */
+
+/* A handle for walking backwards through the output stream */
+typedef struct OutputHandle {
+	ParseNode const *node;
+	char const *output;
+	unsigned int pos;
+} OutputHandle;
+
+/* Initialize the handle to point to a node (optionally, a position therein) */
+static void
+OutputInit(OutputHandle *oh, ParseNode const *node, char const *p)
+{
+		oh->node = node;
+		oh->output = p ? p : node->subst->output + node->subst->outlen;
+		oh->pos = 0;
+}
+
+/* Get the *previous* byte */
+static int
+OutputGetPrev(OutputHandle *oh)
+{
+	if (!oh->node)
+		return -1;
+	for (;;) {
+		if (oh->output != oh->node->subst->output) {
+			oh->pos++;
+			return *--oh->output & 255;
+		}
+		oh->node = oh->node->parent;
+		if (!oh->node)
+			break;
+		oh->output = oh->node->subst->output + oh->node->subst->outlen;
+	}
+	return -1;
+}
+
+/* Return the character just before the node - trivial handy wrapper */
+static int
+OutputPrevChar(ParseNode const *node)
+{
+	OutputHandle oh;
+
+	OutputInit(&oh, node, NULL);
+	return OutputGetPrev(&oh);
+}
+
+/*
+ * Unget the last retrieved character (and return it), or
+ * -1 if that is impossible.  At least one character is
+ * always ungettable, but after that you're on your own.
+ */
+static int
+OutputUnget(OutputHandle *oh)
+{
+	if (oh->node && *oh->output) {
+		oh->pos--;
+		return *oh->output++ & 255;
+	}
+	return -1;
+}
+
+/* The position is useful for comparing two OutputHandles. */
+#define OutputPos(oh) ((oh)->pos)
+
+/*
+ * Fill backwards from bufend until you hit the given char.
+ * Use -1 to get the whole buffer.
+ */
+static char *
+OutputGetUntil(OutputHandle oh, char *bufend, int end)
+{
+	int c;
+
+	while ((c = OutputGetPrev(&oh)) != -1 && c != end)
+		*--bufend = (char)c;
+	return bufend;
+}
+
+/*
+ * The per-page structure.  This is actually global, but describes
+ * the values kept for each page processed.
+ */
+typedef struct PerPage {
+	CRC page_check;
+	char const *maxpos, *minpos;
+	unsigned int tabsize;	/* Zero means this is a binary page */
+	unsigned int lines;
+	unsigned int retries;	/* How many retries since last progress? */
+	unsigned int max_retries;	/* Maximum number of retries needed. */
+} PerPage;
+
+PerPage perpage;	/* The global */
+
+static void
+PerPageInit(char const *buf)
+{
+	perpage.maxpos = perpage.minpos = buf;
+	perpage.page_check = 0;
+	perpage.tabsize = 4;	/* The default */
+	perpage.lines = perpage.retries = perpage.max_retries = 0;
+}
+
+/*
+ * Is the tab substitution being looked at acceptable?
+ * It is if its length is exactly what is needed to reach the
+ * next tab stop.  Otherwise, it's junk.
+ */
+HeapCost
+TabFilter(struct ParseNode *parent, char const *limit,
+	struct Substitution const *subst)
+{
+	int c, tabpos;
+	OutputHandle oh;
+
+	(void)limit;
+	if (!perpage.tabsize)
+		return COST_INFINITY;	/* No interest */
+
+	/* How wide should the tab be? */
+	tabpos = (int)((parent->ps.pos-PREFIX_LENGTH) % perpage.tabsize);
+	if ((int)subst->outlen != (int)perpage.tabsize - tabpos)
+		return COST_INFINITY;
+	/* The right number - cost if likely, cost2 if unlikely */
+	if (subst->cost == subst->cost2)
+		return subst->cost;
+	OutputInit(&oh, parent, NULL);
+	do {
+		c = OutputGetPrev(&oh);
+	} while (c == ' ');
+	return (c == TAB_CHAR) ? subst->cost : subst->cost2;
+}
+
+/*
+ * Return cost if near blanks (including end-of-line), cost2 if not, and
+ * the average if there is a blank on one side.  There are additional
+ * versions for upper- and lower-case.  _ is considered upper-case,
+ * as it's often used in macro identifiers.
+ */
+HeapCost
+FilterNearBlanks(struct ParseNode *parent, char const *limit,
+	struct Substitution const *subst)
+{
+	int c = OutputPrevChar(parent), score = (isspace(c) != 0);
+	char const *p = parent->input + subst->inlen;
+
+	score += p == limit || isspace((unsigned char)*p) != 0;
+	return (subst->cost*score + subst->cost2*(2-score))/2;
+}
+
+HeapCost
+FilterNearUpper(struct ParseNode *parent, char const *limit,
+	struct Substitution const *subst)
+{
+	int c = OutputPrevChar(parent), score = (isupper(c) != 0 || c == '_');
+	char const *p = parent->input + subst->inlen;
+
+	score += p != limit && (isupper((unsigned char)*p) != 0 || *p == '_');
+	return (subst->cost*score + subst->cost2*(2-score))/2;
+}
+
+HeapCost
+FilterNearXDigit(struct ParseNode *parent, char const *limit,
+	struct Substitution const *subst)
+{
+	int c = OutputPrevChar(parent), score = (isxdigit(c) != 0);
+	char const *p = parent->input + subst->inlen;
+
+	score += p != limit && (isxdigit((unsigned char)*p) != 0);
+	return (subst->cost*score + subst->cost2*(2-score))/2;
+}
+
+HeapCost
+FilterNearLower(struct ParseNode *parent, char const *limit,
+	struct Substitution const *subst)
+{
+	int c = OutputPrevChar(parent), score = (islower(c) != 0);
+	char const *p = parent->input + subst->inlen;
+
+	score += p != limit && (islower((unsigned char)*p) != 0);
+	return (subst->cost*score + subst->cost2*(2-score))/2;
+}
+
+/*
+ * cost2 unless previous character was a space (' ' or SPACE_CHAR).
+ * Note the & 255, necessary since chars might be signed and SPACE_CHAR
+ * is in the high (negative) half, but c is an int in the range -1..255.
+ */
+HeapCost
+FilterFollowsSpace(struct ParseNode *parent, char const *limit,
+	struct Substitution const *subst)
+{
+	int c = OutputPrevChar(parent);
+	(void)limit;
+	return (c == ' ' || c == (SPACE_CHAR & 255)) ? subst->cost : subst->cost2;
+}
+
+/* cost2 unless previous character was duplicate of this one */
+HeapCost
+FilterAfterRepeat(struct ParseNode *parent, char const *limit,
+	struct Substitution const *subst)
+{
+	int c = OutputPrevChar(parent);
+	(void)limit;
+	return (c == subst->output[0]) ? subst->cost : subst->cost2;
+}
+
+/* cost2 unless probably the closing quote in a char constant */
+HeapCost
+FilterCharConst(struct ParseNode *parent, char const *limit,
+	struct Substitution const *subst)
+{
+	OutputHandle oh;
+	int c;
+
+	(void)limit;
+	OutputInit(&oh, parent, NULL);
+	c = OutputGetPrev(&oh);
+	c = OutputGetPrev(&oh);
+	if (c == '\\')
+		c = OutputGetPrev(&oh);
+	return (c == '\'') ? subst->cost : subst->cost2;
+}
+
+/*
+ * If the identifier leading up to the current position contains
+ * an underscore, then it's likely the current position is an underscore
+ * as well; return cost.  If it does not, it's less likely; return cost2.
+ */
+HeapCost
+FilterLikelyUnderscore(struct ParseNode *parent, char const *limit,
+	struct Substitution const *subst)
+{
+	OutputHandle oh;
+	int c;
+
+	(void)limit;
+	OutputInit(&oh, parent, NULL);
+	for (;;) {
+		c = OutputGetPrev(&oh);
+		if (c == '_')
+			return subst->cost;
+		if (!isalnum(c))
+			return subst->cost2;
+	}
+}
+
+/* cost2 unless the following chars seem to be a checksum */
+HeapCost
+FilterChecksumFollows(struct ParseNode *parent, char const *limit,
+	struct Substitution const *subst)
+{
+	int i, score = 0;
+	char const *p = parent->input + subst->inlen;
+
+	if (limit - p < PREFIX_LENGTH)
+		return subst->cost2;
+	if (!isspace((unsigned char)p[PREFIX_LENGTH-1]))
+		return subst->cost2;
+	for (i = 0; i < PREFIX_LENGTH-1; i++)
+		score += (p[i] >= '0' && p[i] <= '9') + (p[i] >= 'a' && p[i] <= 'f');
+	i = (score >= PREFIX_LENGTH-2 ? subst->cost : subst->cost2);
+	/* Magic, since this function is perfect on binary files */
+	if (i < COST_INFINITY && perpage.tabsize == 0)
+		i = 0;
+	return i;
+}
+
+/* Manage a *big* pool of ParseNodes */
+
+struct MemPool nodePool;
+struct ParseNode *nodeFreeList = 0;
+
+/* Prepare for node allocations */
+static void
+NodePoolInit(void)
+{
+	memPoolInit(&nodePool);
+	nodeFreeList = NULL;
+}
+
+/* Free all nodes in one swell foop */
+static void
+NodePoolCleanup(void)
+{
+	nodeFreeList = NULL;
+	memPoolEmpty(&nodePool);
+}
+
+/* Allocate a new (uninitialized) node */
+static struct ParseNode *
+NodeAlloc(void)
+{
+	struct ParseNode *node;
+
+	node = nodeFreeList;
+	if (node) {
+		nodeFreeList = node->parent;
+		return node;
+	}
+	return memPoolNew(&nodePool, ParseNode);
+}
+
+/* Free a node for reallocation */
+static void
+NodeFree(struct ParseNode *node)
+{
+	node->parent = nodeFreeList;
+	nodeFreeList = node;
+}
+
+/*
+ * Decrement a node's reference count, freeing it and
+ * recursively decrementing its parent's if the count
+ * goes to zero.
+ */
+static void
+NodeRelease(struct ParseNode *node)
+{
+	struct ParseNode *parent;
+	assert(node->refcnt);
+
+	while (!--node->refcnt) {
+		parent = node->parent;
+		NodeFree(node);
+		if (!parent)
+			break;
+		node = parent;
+	}
+}
+
+/* Add nodes to the substitution tree */
+
+/* Create a child of the given node, with the given properties. */
+static ParseNode *
+AddChild(ParseNode *parent, Substitution const *subst, HeapCost cost)
+{
+	ParseNode *child;
+
+	if (cost == COST_INFINITY)
+		return 0;
+
+	cost += parent->cost;
+	child = NodeAlloc();
+	*child = *parent;
+	/* Child is just like parent, except... */
+	child->cost = cost;
+	child->refcnt = 1;	/* The heap */
+	child->input += subst->inlen;
+	child->subst = subst;
+	child->parent = parent;
+	parent->refcnt++;
+	return child;
+}
+
+/* Hash table of nasty lines, indexed by per-line CRC */
+struct NastyLine {
+	struct NastyLine *next;
+	char const *line;
+	CRC crc;
+};
+
+#define NASTY_HASH_SIZE 256
+static struct NastyLine *nastyHash[NASTY_HASH_SIZE];	/* All zero */
+
+struct MemPool nastyStrings, nastyStructs;
+static CRCPoly const *nastyPoly = &crcCCITTPoly;
+/*
+ * Create a new NastyLine entry if it doesn't already exist.
+ * Note that this expects the string passed to end in a newline, which
+ * IS hashed but NOT stored.
+ */
+static struct NastyLine *
+AddNasty(char const *string)
+{
+	size_t len = strlen(string);	/* Including newline */
+	CRC crc = CalculateCRC(nastyPoly, 0, (byte const *)string, len);
+	struct NastyLine *nasty, **nastyp = nastyHash + (crc % NASTY_HASH_SIZE);
+	char *line;
+
+	/* Search for an existing copy */
+	while ((nasty = *nastyp) != NULL) {
+		if (nasty->crc == crc &&
+			memcmp(nasty->line, string, len-1) == 0 &&
+			nasty->line[len-1] == 0)
+				return nasty;
+		nastyp = &nasty->next;
+	}
+	/* Create a new structure */
+	*nastyp = nasty = memPoolNew(&nastyStructs, struct NastyLine);
+	nasty->next = NULL;
+	nasty->line = line = memPoolAlloc(&nastyStrings, len, 1);
+	nasty->crc = crc;
+	memcpy(line, string, len-1);
+	line[len-1] = 0;
+	return nasty;
+}
+
+static void
+RehashNasties(CRCPoly const *poly)
+{
+	struct NastyLine *cur, *head;
+	CRC crc;
+	int i;
+	size_t len;
+
+	/* Put everything into one list and clear the hash table */
+	head = NULL;
+	for (i = 0; i < (int)elemsof(nastyHash); i++) {
+		while ((cur = nastyHash[i]) != NULL) {
+			nastyHash[i] = cur->next;
+			cur->next = head;
+			head = cur;
+		}
+	}
+	/* Recompute CRCs for the list and redistribute them among the buckets */
+	while (head) {
+		cur = head;
+		head = head->next;
+		len = strlen(cur->line);
+		crc = CalculateCRC(poly, 0, (byte const *)cur->line, len);
+		crc = AdvanceCRC(poly, crc, '\n');
+		cur->crc = crc;
+		cur->next = nastyHash[crc % NASTY_HASH_SIZE];
+		nastyHash[crc % NASTY_HASH_SIZE] = cur;
+	}
+	nastyPoly = poly;
+}
+
+/* Read in the nastylines file */
+static void
+ReadNasties(FILE *f)
+{
+	char buf[128];
+
+	while (fgets(buf, sizeof(buf)-1, f))
+		AddNasty(buf);
+}
+
+/*
+ * Convert an encoded string to binary.
+ * No error checking is performed.
+ */
+static word32
+GetWord32(EncodeFormat const *format, char const *buf, int len)
+{
+	word32 w = 0;
+
+	while (len--)
+		w = (w<<format->bitsPerDigit) + DecodeDigit(format, *buf++);
+	return w;
+}
+
+/* Attempt nasty line substitutions */
+static void
+TryNasty(struct ParseNode *parent, Heap *heap, char const *limit)
+{
+	struct NastyLine const *nasty;
+	struct Substitution const *subst;
+	struct ParseNode *child;
+	char const *end;
+	EncodeFormat const *format = pnFormat(parent);
+	OutputHandle oh;
+	char buf[4];
+	CRC check;
+	int i;
+
+	/* Make sure the lines are hashed properly */
+	if (nastyPoly != format->lineCRC)
+		RehashNasties(format->lineCRC);
+
+	/* Get the line to be replaced */
+	assert(parent->ps.pos == PREFIX_LENGTH);
+	end = memchr(parent->input, '\n', limit - parent->input);
+	if (!end)
+		end = limit;
+	/* Get the line's check value */
+	OutputInit(&oh, parent, NULL);
+	(void)OutputGetPrev(&oh);
+	i = 4;
+	while (i--)
+		buf[i] = OutputGetPrev(&oh);
+	check = GetWord32(format, buf, 4);
+	/* Find the matches */
+	nasty = nastyHash[check % NASTY_HASH_SIZE];
+	for (; nasty; nasty = nasty->next) {
+		if (nasty->crc == check) {
+			subst = SubstNasty(parent->input, end-parent->input,
+							   nasty->line, 0);
+			if (subst) {
+				child = AddChild(parent, subst, NASTY_COST);
+				if (child) {
+					child->ps.flags |= PS_FLAG_DYNAMIC;
+					HeapInsert(heap, &child->cost);
+				}
+			}
+		}
+	}
+}
+
+/*
+ * Form all of a ParseNode's children and add them to the heap.
+ * Limit is the limit of allowable lookahead.
+ */
+static void
+AddChildren(ParseNode *parent, Heap *heap, char const *limit)
+{
+	char c = parent->input[0];
+	Substitution *subst = substitutions[0][c & 255];
+	ParseNode *child;
+	HeapCost cost;
+
+/* If you want to make pure insertion substitutions, do that here */
+
+	assert(parent->input < limit);	/* We always have at least one char */
+
+	while (subst) {
+		if (subst->inlen == 1 ||	/* Easy case */
+			((size_t)(limit-parent->input) >= subst->inlen &&
+			 memcmp(subst->input, parent->input, subst->inlen) == 0))
+		{
+			cost = subst->cost;
+			if (subst->filter)
+				cost = subst->filter(parent, limit, subst);
+			child = AddChild(parent, subst, cost);
+			if (child)
+				HeapInsert(heap, &child->cost);
+		}
+		subst = subst->next;
+	}
+
+	/* Whole-line substitutions */
+	if (parent->ps.pos == PREFIX_LENGTH)
+		TryNasty(parent, heap, limit);
+}
+
+
+/* cost if this line has a dynamic substitution, otherwise cost2 */
+HeapCost
+FilterIsDynamic(struct ParseNode *parent, char const *limit,
+	struct Substitution const *subst)
+{
+	(void)limit;
+	return (parent->ps.flags & PS_FLAG_DYNAMIC) ? subst->cost : subst->cost2;
+}
+
+/* cost if the current page is binary mode, else cost2 */
+HeapCost
+FilterIsBinary(struct ParseNode *parent, char const *limit,
+	struct Substitution const *subst)
+{
+	(void)parent; (void)limit;
+	return perpage.tabsize ? subst->cost2 : subst->cost;
+}
+
+/* Debugging utility */
+#define DEBUG 1	/* Set to 1 to print every line considered */
+
+static size_t lastlen = 0;
+
+static void
+OverstrikeLine(char const *line, size_t len)
+{
+	int blanklen;
+
+	if (!line) {
+		if (lastlen)
+			putchar('\n');
+		lastlen = 0;
+	} else if (len || lastlen) {
+		if (len > 79)
+			len = 79;
+		blanklen = (lastlen > len) ? (int)lastlen - len : 0;
+		printf("%.*s%*s\r", (int)len, line, blanklen, "");
+		fflush(stdout);
+		lastlen = len;
+	}
+}
+
+/* Print everything, for debugging */
+static void
+PrintLine(char const *line, size_t len)
+{
+	if (line) {
+		printf("%.*s\n", (int)len, line);
+		lastlen = 0;
+	}
+}
+
+static HeapCost ParseAdvanceString(Heap *heap, ParseNode *pn);
+
+/*
+ * Copy the parsechain from tail up to root, and hang it off of
+ * newroot, adjusting the costs and parse state accordingly.  Returns
+ * NULL if it is unable to (invalid parse, too expensive, etc.)
+ * Note that as per the convention, ParseAdvanceString is *not* called
+ * on the new tail node (but is called on all its parents).
+ */
+static ParseNode *
+CopyParse(ParseNode const *tail, ParseNode const *root, ParseNode *newroot)
+{
+	ParseNode *newtail, *parent;
+
+	if (tail == root)
+		return newroot;
+	parent = CopyParse(tail->parent, root, newroot);
+	if (!parent)
+		return NULL;
+	newtail = AddChild(parent, tail->subst, ParseAdvanceString(NULL, parent));
+	NodeRelease(parent);
+	return newtail;
+}
+
+/*
+ * Replace oldnode with a dynamic substitution for newchar, if possible,
+ * and fill in the chain down to "tail" just like before, but with no branches.
+ * Add the resultant ParseNode to the heap.
+ */
+static void
+AddDynamic(Heap *heap, ParseNode const *oldnode, ParseNode const *tail,
+	int newchar)
+{
+	Substitution const *subst = oldnode->subst;
+	ParseNode *newnode;
+
+	/* Only replace one-character substitutions */
+	if (subst->outlen != 1)
+		return;
+
+	subst = SubstDynamic(oldnode->subst->input, SubstString(newchar), 0);
+	newnode = AddChild(oldnode->parent, subst, -1); /* Try it immediately */
+	if (newnode) {
+		newnode->ps.flags |= PS_FLAG_DYNAMIC;
+		newnode = CopyParse(tail, oldnode, newnode);
+		if (newnode)
+			HeapInsert(heap, &newnode->cost);
+	}
+}
+
+/*
+ * Do the same, at a given (1-based) position on the line.  Owing to
+ * a minor glitch, we must never count the tail node, as this has not
+ * been parsed yet, so its oldnode->ps.pos field is inaccurate.
+ */
+static void
+AddDynamicAt(Heap *heap, int position, ParseNode const *tail, int newchar)
+{
+	ParseNode const *oldnode = tail;
+
+	do {
+		oldnode = oldnode->parent;
+	} while (oldnode->ps.pos > position);
+
+	if (oldnode->ps.pos == position)
+		AddDynamic(heap, oldnode, tail, newchar);
+}
+
+/*
+ * Given the computed and input check fields, correct the header field
+ * that *ends* at the given pos.  This can be used for both the line and
+ * page CRC errors by just changing the pos.  (It relies on the fact
+ * that the page CRC fragment fits into the LineCRC type.)
+ * It also relies on the fact that the CRC is at most 4 digits.
+ */
+static void
+ErrorCorrectHeader(Heap *heap, ParseNode const *tail, int pos,
+	EncodeFormat const *format, CRC crc, CRC check)
+{
+	CRC syndrome = crc ^ check;
+
+	/* Find the position and the crc digit at that position */
+	while (syndrome >= (CRC)format->radix) {
+		if (syndrome & (CRC)(format->radix - 1))
+			return;	/* uncorrectable */
+		pos--;
+		crc >>= format->bitsPerDigit;
+		syndrome >>= format->bitsPerDigit;
+	}
+	/* Paste in the correct digit */
+	AddDynamicAt(heap, pos, tail, EncodeDigit(format, crc & (format->radix-1)));
+}
+
+/*
+ * This function walks back through the line, and if the line CRC could be
+ * made correct by changing a character to another legal character,
+ * the change is added (on probation) to the substitution table.
+ */
+static void
+ErrorCorrect(Heap *heap, OutputHandle oh, EncodeFormat const *format,
+	CRC syndrome)
+{
+	ParseNode const *tail = oh.node;
+	int c;
+
+	syndrome = ReverseCRC(format->lineCRC, syndrome, 0);
+	while (oh.node->ps.pos > PREFIX_LENGTH) {
+		c = OutputGetPrev(&oh);
+		if (c == '\n' || c == -1) {	/* Can't happen */
+			printf("Line ended at pos %d\n", oh.node->ps.pos);
+			return;
+		}
+		syndrome = ReverseCRC(format->lineCRC, syndrome, 0);
+		if (syndrome >= 0x100 || !substitutions[0][c^syndrome] ||
+			oh.node->subst->outlen != 1)
+				continue;
+		AddDynamic(heap, oh.node, tail, c^syndrome);
+	}
+}
+
+/*
+ * Parsing operations.  This is a rather ugly and ad-hoc parser that
+ * knows a lot about the fixed-field format produced by the munge
+ * utility.  The main state variable is the position in
+ * the line, which controls the expected header, the position of
+ * tab stops, and the maximum permissible line length.
+ */
+#define OCCASIONALLY 100
+
+/* Set up a ParseState to top-of-page */
+static void
+ParseStateInit(ParseState *ps)
+{
+	static struct ParseState const parseNull = { 0, 0, 0 };
+	*ps = parseNull;
+}
+
+/*
+ * Try to accept a newline, checking CRCs and even doing error-correction
+ * as appropriate.
+ */
+static int
+ParseNewline(Heap *heap, ParseNode *pn, char const *string)
+{
+	OutputHandle oh;
+	int c;
+	char debugbuf[PREFIX_LENGTH+LINE_LENGTH+10];
+	char *header, *body, *end;
+	int pos, width;
+	CRC crc, check;
+	ParseNode *temp;
+	static int occasionally = OCCASIONALLY;
+	EncodeFormat const *format = pnFormat(pn);
+	EncodeFormat const *headerFormat = &hexFormat;
+
+	/* Get the line into a buffer for analysis */
+	OutputInit(&oh, pn, string);
+	end = debugbuf + sizeof(debugbuf)-1;
+	header = OutputGetUntil(oh, end, '\n');
+	/* Strip leading and trailing whitespace */
+	while (header < end && isspace((unsigned char)header[0]))
+		header++;
+	while (header < end && isspace((unsigned char)end[-1]))
+		end--;
+	*end++ = '\n';
+
+	/* Start of checksummed area */
+	body = header + PREFIX_LENGTH;
+	/* Blank lines are missing the trailing space from the prefix */
+	if (body >= end)
+		body = end-1;
+
+	crc = CalculateCRC(format->lineCRC, 0, body, end-body);
+	check = GetWord32(format, header+2, 4);
+	if (crc != check) {
+		if (!--occasionally) {
+			OverstrikeLine(header, end-header-1);
+			occasionally = OCCASIONALLY;
+		}
+		/* Try ECC on the line */
+		/* If we haven't already tried ECC on the line... */
+		if (!(pn->ps.flags & PS_FLAG_DYNAMIC)) {
+			ErrorCorrectHeader(heap, pn, PREFIX_LENGTH-1, format, crc, check);
+			ErrorCorrect(heap, oh, format, crc ^ check);
+		}
+		return COST_INFINITY;
+	}
+	/* Good enough that we always print it */
+	OverstrikeLine(header, end-header-1);
+
+	/* Okay, now there are two cases - header line or running CRC */
+	if (pn->ps.flags & PS_FLAG_INHEADER) {
+		/* Do things for first header */
+		if (!(pn->ps.flags & PS_FLAG_PASTHEADER)) {
+			/* Check version number */
+			width = EncodedLength(headerFormat, HDR_VERSION_BITS);
+			c = (int)GetWord32(&hexFormat, body, width);
+			if (c != 0) {
+				fputs("Fatal: you need a newer version of repair"
+				      " to process this file\n", stderr);
+				exit(1);
+			}
+			/* Suck in page CRC, after version & flags */
+			pos = width + EncodedLength(headerFormat, HDR_FLAG_BITS);
+			width = EncodedLength(headerFormat, format->pageCRC->bits);
+			perpage.page_check = GetWord32(&hexFormat, body+pos, width);
+			/* Get tab size */
+			pos += width;
+			width = EncodedLength(headerFormat, HDR_TABWIDTH_BITS);
+			perpage.tabsize = GetWord32(&hexFormat, body+pos, width);
+
+			/* Once we have the header, don't reconsider */
+			if (!(pn->ps.flags & PS_FLAG_PASTHEADER))
+				while ((temp = (ParseNode *)HeapGetMin(heap)) != NULL)
+					NodeRelease(temp);
+			pn->ps.page_crc = 0;	/* Clear for top of page */
+		}
+	} else {
+		/* Check the CRC-32 */
+		crc = CalculateCRC(format->pageCRC, pn->ps.page_crc, body, end-body);
+		pn->ps.page_crc = crc;
+		crc = RunningCRCFromPageCRC(format, crc);
+		check = GetWord32(format, header, 2);
+		if (crc != check) {
+			if (!(pn->ps.flags & PS_FLAG_DYNAMIC))
+				ErrorCorrectHeader(heap, pn, 2, format, crc, check);
+			return COST_INFINITY;
+		}
+	}
+
+	/* Hey, it's correct! */
+	PrintLine(header, end-header-1);
+
+	/* Start next line */
+	pn->ps.pos = 0;
+	/* Clear most other flags, but we *have* got a header */
+	c = pn->ps.flags & PS_FLAG_DYNAMIC;
+	pn->ps.flags &= PS_FLAG_BINEND | PS_MASK_FORMAT;
+	pn->ps.flags |= PS_FLAG_PASTHEADER;
+	/*
+	 * Give a bonus to the next line for having completed this one,
+	 * less if it was dynamically fixed.
+	 */
+	return c ? COST_LINE*2/3 : COST_LINE;
+}
+
+/*
+ * Advance the parse state with pointed-to character.  Returns
+ * COST_INFINITY if an impossible state is reached, otherwise returns a
+ * cost value.  (Normally 0, this can be increased to penalize unlikely
+ * output combinations to nudge the correction in a certain direction.)
+ */
+static HeapCost
+ParseAdvance(Heap *heap, ParseNode *pn, char const *string)
+{
+	int i, retval = 0;
+	char c = *string;
+	EncodeFormat const *format = pnFormat(pn);
+
+	/*
+	 * Insist on spaces being correctly converted to SPACE_CHAR.
+	 * There's a little irregularity just before EOL.
+	 * Line continuation and formfeed are also only legal at EOL.
+	 */
+	if (c == ' ') {
+		if (pn->ps.flags & PS_FLAG_SPACE && !(pn->ps.flags & PS_FLAG_TAB))
+			pn->ps.flags |= PS_FLAG_EOL;
+		pn->ps.flags |= PS_FLAG_SPACE;
+	} else if (pn->ps.flags & PS_FLAG_EOL) {
+		if (c != '\n')
+			return COST_INFINITY;
+	} else if (c == SPACE_CHAR) {
+		if (!(pn->ps.flags & PS_FLAG_SPACE))
+			pn->ps.flags |= PS_FLAG_EOL;
+	} else if (c == CONTIN_CHAR || c == FORMFEED_CHAR) {
+		pn->ps.flags |= PS_FLAG_EOL;
+	} else {
+		pn->ps.flags &= ~PS_FLAG_SPACE;
+	}
+
+	switch (pn->ps.pos) {
+		case 0:
+			if (c == ' ' || c == '\n') {
+				break;		/* Ignore ws and blank lines completely */
+			} else if (c == '\f' || c == HDR_PREFIX_CHAR) {
+				/* Start of a new page */
+				pn->ps.flags |= PS_FLAG_INHEADER;	/* Expect header next */
+				if (c == '\f')
+					break;
+				/* And fall through to increment pos */
+			} else if (pn->ps.flags & PS_FLAG_INHEADER ||
+					   pn->ps.flags & PS_FLAG_BINEND ||
+					   !(pn->ps.flags & PS_FLAG_PASTHEADER) ||
+					   DecodeDigit(format, c) < 0) {
+				return COST_INFINITY;	/* Various illegal cases */
+			}
+			pn->ps.pos++;
+			break;
+		case 1:
+			if ((pn->ps.flags & PS_FLAG_INHEADER)) {
+				format = FindFormat(c);	/* Second char of header */
+				if (!format)
+					return COST_INFINITY;
+				i = registerFormat(format);
+				psSetFormat(&pn->ps, i);
+				pn->ps.pos++;
+				break;
+			}
+			if (DecodeDigit(format, c) < 0)
+				return COST_INFINITY;	/* Illegal */
+			pn->ps.pos++;
+			break;
+		case 2:
+		case 3:
+		case 4:
+#if PREFIX_LENGTH != 7
+#error fix this code
+#endif
+		case PREFIX_LENGTH-2:
+			if (DecodeDigit(format, c) < 0)
+				return COST_INFINITY;	/* Illegal */
+			pn->ps.pos++;
+			break;
+		case PREFIX_LENGTH-1:
+			if (c == ' ') {
+				pn->ps.pos++;
+				break;
+			} else if (c != '\n') {
+				return COST_INFINITY;
+			}
+			/* Blank lines may be missing this space char */
+			/*FALLTHROUGH*/
+		/* The normal line starts here, at position 7 */
+		default:
+			if (pn->ps.flags & PS_FLAG_INHEADER) {	/* Header line */
+				/* Format is "--abcd 0123456789abcdef012 Page %u of %s" */
+				int off = pn->ps.pos - (PREFIX_LENGTH+HDR_ENC_LENGTH);
+				/* Offset relative to end of hex header */
+				if (off < 0) {
+					if (HexDigitValue(c & 255) < 0)
+						return COST_INFINITY;
+				} else if (off < 6) {
+					if (c != " Page "[off])	/* Yes, this is legal C */
+						return COST_INFINITY;
+				} else if (off == 6) {
+					if (c < '1' || c > '9')	/* First digit of page no. */
+						return COST_INFINITY;
+				} else {
+					/* Re-base to end of scanned part of page number */
+					off -= 7 + PageNumDigits(pn);
+					if (off == 0) {
+						if (c >= '0' && c <= '9' && PageNumDigits(pn) < 3)
+							PageNumDigitsIncrement(pn);
+						else if (c != ' ')
+							return COST_INFINITY;
+					} else if (off < 4) {
+						if (c != " of "[off])
+							return COST_INFINITY;
+					} else if (off == 4) {
+						if (!isgraph(c & 255))
+							return COST_INFINITY;
+					} else if (c < ' ' || (c & 255) > '~') {
+						if (c != '\n')
+							return COST_INFINITY;
+						return ParseNewline(heap, pn, string);
+					}
+				}
+			} else if (!perpage.tabsize) {	/* Radix-64 line */
+				/* Line is "RlNFVF9UQU==   \n" */
+				if (isspace(c & 255)) {
+					if (!(pn->ps.flags & PS_FLAG_BINWS)) {
+						if ((pn->ps.pos - PREFIX_LENGTH) % 4 != 0)
+							return COST_INFINITY;
+						pn->ps.flags |= PS_FLAG_BINWS;
+						if (pn->ps.pos - PREFIX_LENGTH < BYTES_PER_LINE*4/3)
+							pn->ps.flags |= PS_FLAG_BINEND;
+					}
+					if (c == '\n')
+						return ParseNewline(heap, pn, string);
+				} else if (pn->ps.flags & PS_FLAG_BINWS) {
+					return COST_INFINITY;
+				} else if (c == RADIX64_END_CHAR) {
+					if ((pn->ps.pos - PREFIX_LENGTH) % 4 < 2)
+						return COST_INFINITY;
+					pn->ps.flags |= PS_FLAG_BINEND;
+				} else if (pn->ps.flags & PS_FLAG_BINEND) {
+					return COST_INFINITY;
+				} else if (Radix64DigitValue(c) < 0) {
+					return COST_INFINITY;
+				}
+			} else {	/* Normal line */
+				/* Make sure tab stops come out right */
+				if (pn->ps.flags & PS_FLAG_TAB) {
+					if (((pn->ps.pos - PREFIX_LENGTH) % perpage.tabsize) == 0)
+						pn->ps.flags &= ~PS_FLAG_TAB;
+					else if (c != TAB_PAD_CHAR && c != '\n') {
+						return COST_INFINITY;	/* Illegal */
+					}
+				}
+				/*
+				 * Yes, this code has hard-coded ASCII assumptions
+				 * It knows that the acceptable range of '\n', ' '..'~',
+				 * TAB_CHAR, FORMFEED_CHAR is in that order.
+				 * Signed char machines have it backwards, to be confusing.
+				 */
+				if ((c & 255) < ' ') {
+					/* Newline! (Or something illegal) */
+					if (c != '\n')
+						return COST_INFINITY;
+					return ParseNewline(heap, pn, string);
+				}
+				/* A normal character */
+				if ((c & 255) > '~') {
+					if (pn->ps.flags & PS_FLAG_INHEADER)
+						return COST_INFINITY;	/* Illegal */
+					if (c == TAB_CHAR)
+						pn->ps.flags |= PS_FLAG_TAB;
+					else if (c != FORMFEED_CHAR && c != SPACE_CHAR &&
+												   c != CONTIN_CHAR)
+						return COST_INFINITY;	/* Illegal */
+				}
+			}
+			if (++pn->ps.pos > PREFIX_LENGTH + LINE_LENGTH)
+				return COST_INFINITY;
+			break;
+	}
+	return retval;
+}
+
+/*
+ * Run the parser over the string in a ParseNode (using repeated calls
+ * to ParseAdvance).  Return the penalty cost, or COST_INFINITY if
+ * it's impossible
+ */
+static HeapCost
+ParseAdvanceString(Heap *heap, ParseNode *pn)
+{
+	HeapCost cost, total = 0;
+	char const *string = pn->subst->output;
+
+	while (*string) {
+		cost = ParseAdvance(heap, pn, string++);
+		if (cost == COST_INFINITY)
+			return cost;
+		total += cost;
+	}
+	return total;
+}
+
+static unsigned int *globalStats = NULL;
+static unsigned globalSize = 0;
+static unsigned globalEdits = 0;
+
+/*
+ * This walks the list of substitutions, performing two tasks with
+ * the statistics gathered.
+ *
+ * First, although not essential, it prints any interesting changes
+ * (non-identity substitutions) made, and a count of the total number
+ * of substitutions (including identity) as an approximate character count.
+ *
+ * Second, it does maintenance on dynamic (learned during program
+ * execution) substitutions.  It discards any substitutions that end
+ * up unused, and computes nice costs for the others, based on the
+ * global (per-file) statistics.
+ *
+ * (This function is also called at the end to print the per-file stats,
+ * which does redundant weight adjustment, but it's harmless.)
+ */
+static void
+UseStats(unsigned int *stats, FILE *log)
+{
+	unsigned int i, j, n, changes = 0;
+	unsigned long grand = 0;
+	Substitution *s, **sp;
+
+	if (!stats)
+		return;
+
+	/* Yes, this loop is permuted on purpose */
+	for (j = 0; j < elemsof(*substitutions); j++) {
+		for (i = 0; i < elemsof(substitutions); i++) {
+			sp = &substitutions[i][j];
+			while ((s = *sp) != 0) {
+				grand += n = stats[s->index];
+				/* Retain or purge dynamic substitutions, depending. */
+				if (SubstIsDynamic(s)) {
+					if (n) {
+						SubstAdjust(s, n);
+					} else if (!globalStats[s->index]) {
+						/* Forget unused dynamic substitutions */
+						*sp = s->next;
+						if (SubstIsNasty(s))
+							free((char *)s->input);	/* Dynamically allocated */
+						SubstFree(s);
+						continue;
+					}
+				}
+				sp = &s->next;
+				/*
+				 * Print interesting substitutions.  Some boring substitutions,
+				 * flagged with an index value of zero, are not printed.
+				 */
+				if (!s->index || !n)
+					continue;
+				changes += n;
+				fprintf(log, "\t%2ux \"%.*s\"%*s-> \"%.*s\"%*s(cost ",
+					   stats[s->index], (int)s->inlen, s->input,
+					   s->inlen>3 ? 0 : 3-(int)s->inlen, "",
+					   (int)s->outlen, s->output,
+					   s->outlen>3 ? 0 : 3-(int)s->outlen, "");
+				fprintf(log, s->cost == COST_INFINITY ? "-" : "%d", s->cost);
+				if (s->filter)
+					fprintf(log, s->cost2 == COST_INFINITY ? "/-" : "/%d",
+					        s->cost2);
+				fputs(SubstIsDynamic(s) ? ") ** LEARNED **\n" : ")\n", log);
+			}
+		}
+	}
+	fprintf(log, "\tTotal: %u changes (out of %lu)\n", changes, grand);
+}
+
+static void
+DoStats(ParseNode const *node, unsigned int page, FILE *log)
+{
+	unsigned int *stats;
+	unsigned int n;
+
+	/* Enlarge global stats if needed */
+	if (globalSize < substCount) {
+		stats = realloc(globalStats, substCount * sizeof(*stats));
+		if (!stats)  {
+			fputs("Fatal error: out of memory for stats!\n", stderr);
+			exit(1);
+		}
+		for (n = globalSize; n < substCount; n++)
+			stats[n] = 0;
+		globalStats = stats;
+		globalSize = substCount;
+	}
+
+	/* Allocate per-page stats */
+	stats = calloc(substCount, sizeof(*stats));
+	if (!stats) {
+		fputs("Fatal error: out of memory for stats!\n", stderr);
+		exit(1);
+	}
+	/* Cheat and assume that calloc() initializes unsigned ints to zero */
+	while (node) {
+		stats[node->subst->index]++;
+		node = node->parent;
+	}
+
+	/* Keep the global counts accurate */
+	for (n = 0; n < substCount; n++)
+		globalStats[n] += stats[n];
+
+	fprintf(log, "Page %u substitutions:\n", page);
+	UseStats(stats, log);
+
+	free(stats);
+}
+
+/* Spit out a page of data (needs work).  Returns number of lines */
+static unsigned
+PrintPage(OutputHandle oh, FILE *out)
+{
+	char pagebuf[PAGE_BUFFER_SIZE];
+	char *p1;	/* Beginning of current line */
+	char *p2;	/* End of current line (WS stripped) */
+	char *p3;	/* End of current line (newline) */
+	char *p4;	/* End of all output */
+	unsigned lines = 0;
+
+	p4 = pagebuf + sizeof(pagebuf);
+	p1 = OutputGetUntil(oh, p4, -1);
+
+	/* Output the lines without leading & trailing whitespace */
+	while (p1 < p4) {
+		/* Identify the line */
+		p3 = memchr(p1, '\n', p4-p1);
+		if (!p3)
+			p3 = p4;
+		/* Delete leading whitespace */
+		while (p1 < p3 && isspace((unsigned char)*p1))
+			p1++;
+		/* Delete trailing whitespace */
+		p2 = p3;
+		while (p1 < p2 && isspace((unsigned char)p2[-1]))
+			p2--;
+		/* Spit out this line */
+		fwrite(p1, 1, (size_t)(p2-p1), out);
+		putc('\n', out);
+		/* Advance p1 past the newline */
+		p1 = p3 + 1;
+		lines++;
+	}
+	return lines;
+}
+
+static volatile int interrupt = 0;
+static void (* volatile oldhandler)(int) = SIG_DFL;
+
+static void inthandler(int sig)
+{
+	if (++interrupt > 2)
+		(void)signal(sig, oldhandler);
+}
+
+/*
+ * Given a buffer, process a page from it and try to write a corrected page to
+ * the out file.  Return the number of bytes accessed.  (0 if it was unable
+ * to make any corrections.)
+ */
+static size_t
+DoPage(char const *buf, size_t len, FILE *out, unsigned int page, FILE *log)
+{
+	ParseNode *node;
+	Heap heap;
+	HeapCost cost;
+	OutputHandle oh;
+	void (*sighandler)(int);
+
+	HeapInit(&heap, 1000);
+	NodePoolInit();
+	PerPageInit(buf);
+
+	/* Initialize signal handling */
+	interrupt = 0;
+	sighandler = signal(SIGINT, inthandler);
+	if (sighandler != inthandler)
+		oldhandler = sighandler;
+
+	/* Make a root node */
+	node = NodeAlloc();
+	node->cost = 0;
+	node->refcnt = 1;
+	node->input = buf;
+	node->subst = &substNull;
+	ParseStateInit(&node->ps);
+	node->parent = NULL;
+
+	HeapInsert(&heap, &node->cost);
+
+	/* The main loop: try to extend the current parse. */
+	while ((node = (ParseNode *)HeapGetMin(&heap)) != NULL) {
+		cost = ParseAdvanceString(&heap, node);
+		if (cost != COST_INFINITY) {
+			/* End of the file, or hit a second header line? */
+			if (node->input == buf+len || InSecondHeader(&node->ps)) {
+				/* Try to wrap up page, if page CRC works */
+				if (node->ps.page_crc == perpage.page_check) {
+					/* Success! */
+					HeapDestroy(&heap);
+					OutputInit(&oh, node, NULL);
+					OverstrikeLine("", 0);
+
+					if (InSecondHeader(&node->ps)) {
+						/* Back up to last newline */
+						OutputInit(&oh, node, NULL);
+						while (OutputGetPrev(&oh) != '\n')
+							;
+						OutputUnget(&oh);
+					}
+					/* oh points to node that emitted last char on page */
+					len = oh.node->input - buf; /* Chars eaten this page */
+					perpage.lines = PrintPage(oh, out);
+					DoStats(oh.node, page, log);
+					NodePoolCleanup();
+					return len;
+				}
+			} else {
+				/* Keep working on the page */
+				node->cost = cost += node->cost;
+				if (node->input > perpage.maxpos) {
+					perpage.maxpos = perpage.minpos = node->input;
+					if (perpage.max_retries < perpage.retries)
+						perpage.max_retries = perpage.retries;
+					perpage.retries = 0;	/* Made progress */
+				} else if (node->input < perpage.minpos) {
+					perpage.minpos = node->input;	/* Furthest backtrack */
+				}
+				++perpage.retries;
+				if (heap.numElems > MAX_HEAP || interrupt)
+					HeapDestroy(&heap);
+				else
+					AddChildren(node, &heap, buf+len);
+			}
+		}
+		NodeRelease(node);
+	}
+	/* Failed! */
+	OverstrikeLine(NULL, 0);
+	puts("Stopping for manual edit.");
+
+	NodePoolCleanup();
+	/* Get rid of the dynamic substitutions */
+	DoStats(NULL, page, log);
+
+	return 0;
+}
+
+/* The magic file-shuffling routine. */
+static int
+RepairFile(char const *name, char const *editor, char const *nastylines)
+{
+	char buf[PAGE_BUFFER_SIZE];
+	char *filename;
+	char const *p;
+	size_t namelen;
+	FILE *in = 0, *out = 0, *dump = 0, *log = 0;
+	size_t inbytes;		/* Bytes in input buffer */
+	size_t outbytes;	/* Bytes taken from input buffer */
+	unsigned int pages = 0;	/* # of pages processed */
+	unsigned int lines = 0;	/* # of lines processed (until trouble) */
+	unsigned int minline, maxline;	/* Where is the error? */
+	int giveup;			/* Have we had to abort corrections? */
+	int err;			/* Copy of errno for returns */
+
+	globalSize = 0;	/* Reset global stats */
+
+	namelen = strlen(name);
+	if (!(filename = malloc(namelen+10))) {
+		p = "Unable to allocate memory\n";
+		goto error;
+	}
+
+	memcpy(filename, name, namelen);
+	strcpy(filename+namelen, ".log");
+	puts(filename);
+	if (!(log = fopen(filename, "at"))) {
+		p = "Unable to open log file \"%s\"\n";
+		goto error;
+	}
+
+	strcpy(filename+namelen, ".out");
+	puts(filename);
+	if (!(out = fopen(filename, "at"))) {
+		p = "Unable to open output file \"%s\"\n";
+		goto error;
+	}
+
+retry:
+	/* Read in any new nasty lines */
+	if (!(in = fopen(nastylines, "rt"))) {
+		fprintf(stderr, "Unable to open nasty lines file \"%s\"\n", nastylines);
+	} else {
+		ReadNasties(in);
+		fclose(in);
+	}
+	/* Try to open input file - .in or original */
+	p = filename;
+	strcpy(filename+namelen, ".in");
+	if (!(in = fopen(filename, "rt"))) {
+		if (!(in = fopen(name, "rt"))) {
+			filename[namelen] = 0;
+			p = "Unable to open input file \"%s\"\n";
+			goto error;
+		}
+		p = name;
+	}
+	printf("Repairing from %s\n", p);
+	strcpy(filename+namelen, ".dmp");
+	if (!(dump = fopen(filename, "wt"))) {
+		p = "Unable to open output file \"%s\"\n";
+		goto error;
+	}
+
+	giveup = 0;
+	inbytes = 0;	/* Bytes already at the front of the buffer */
+	/* Append more data from the file */
+	while ((inbytes += fread(buf+inbytes, 1, sizeof(buf)-inbytes, in)) != 0) {
+		if (giveup) {
+			/* Giving up mode - just copy through */
+			outbytes = fwrite(buf, 1, inbytes, dump);
+			if (!outbytes) {
+				p = "Error writing dump file!\n";
+				goto error;
+			}
+		} else {
+			outbytes = DoPage(buf, inbytes, out, pages+1, log);
+			NodePoolCleanup();
+			if (outbytes) {
+				pages++;
+				lines += perpage.lines;
+			} else {	/* Failed */
+				/* Find range of backtracking for error location */
+				minline = 1;
+				for (p = buf;  p < perpage.minpos; p++)
+					minline += (*p == '\n');
+				for (maxline = minline; p < perpage.maxpos; p++)
+					maxline += (*p == '\n');
+				giveup = 1;
+			}
+		}
+		/* Fewer bytes now in the buffer */
+		inbytes -= outbytes;
+		/* Move those bytes to the front again */
+		memmove(buf, buf+outbytes, inbytes);
+	}
+
+	fclose(in);
+	in = 0;
+	fclose(dump);
+	dump = 0;
+
+	/* Okay, let's get tricky */
+	memcpy(buf, name, namelen);
+	strcpy(buf+namelen, ".dmp");
+	strcpy(filename+namelen, ".in");
+
+	/* teun: MS Visual C doesn't rename on top of existing file; remove it */
+	if (remove(filename) != 0) {
+		err = errno;
+		fprintf(stderr, "Warning deleting %s\n", filename);
+	}
+
+	if (rename(buf, filename) != 0) {
+		err = errno;
+		fclose(out);
+		fclose(log);
+		/* teun: corrected buf, filename order. This cost me an hour */
+		fprintf(stderr, "Error renaming %s -> %s\n", buf, filename);
+		return err;
+	}
+
+	/* This code is spaghetti - is there a cleaner way? */
+	if (giveup) {
+		printf("Error in %s, lines %u-%u\n", filename, minline, maxline);
+		fprintf(log, "Error in %s, lines %u-%u\n", filename, minline, maxline);
+		if (interrupt > 1)
+			goto manual;
+		if (editor) {
+			if (strcmp(editor, "-") == 0)
+				goto manual;
+			sprintf(buf, editor, maxline, filename);
+		} else {
+			p = getenv("VISUAL");
+			if (!p)
+				p = getenv("EDITOR");
+			if (!p)
+				goto manual;
+			sprintf(buf, "%s +%u %s\n", p, maxline, filename);
+		}
+		printf("Executing %s\n", buf);
+		globalEdits++;
+		if (system(buf) == 0)
+			goto retry;
+		fputs("Edit failed - aborting\n", stderr);
+manual:
+		puts("Please fix the error by hand and re-run repair.");
+	}
+
+	fclose(out);
+	free(filename);
+
+	fprintf(log, "\n%u lines successfully processed.\n", lines);
+	fprintf(log, "Overall substitutions (%u pages):\n", pages);
+	UseStats(globalStats, log);
+	printf("%u manual edits required\n", globalEdits);
+	fclose(log);
+
+	return 0;
+
+error:
+	err = errno;
+	if (log) fclose(log);
+	if (dump) fclose(dump);
+	if (out) fclose(out);
+	if (in) fclose(in);
+	fprintf(stderr, p, filename);
+	free(filename);
+	return err;
+}
+
+/* Process the command line, calling RepairFile as needed. */
+int
+main(int argc, char *argv[])
+{
+	int		result = 0;
+	int		i;
+	char const *editor = NULL;
+	char const *nastylines = "nastylines";
+
+	InitUtil();
+	SubstBuild();
+	memPoolInit(&nastyStructs);
+	memPoolInit(&nastyStrings);
+
+	/* Process leading flags */
+	for (i = 1; i < argc && argv[i][0] == '-'; i++) {
+		if (argv[i][1] == '-' && argv[i][2] == 0) {
+			i++;
+			break;
+		} else if (argv[i][1] == 'e') {
+			editor = argv[i][2] ? argv[i]+2 : argv[++i];
+		} else if (argv[i][1] == 'l') {
+			nastylines = argv[i][2] ? argv[i]+2 : argv[++i];
+		} else {
+			/* Unknown flag: report it as-is, without consuming an argument */
+			fprintf(stderr, "ERROR: Unrecognized option %s\n", argv[i]);
+			return 1;
+		}
+	}
+
+	/* Process files */
+	for (; i < argc; i++) {
+		result = RepairFile(argv[i], editor, nastylines);
+		if (result != 0) {
+			fprintf(stderr, "Fatal error: %s\n", strerror(result));
+			return 1;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * Local Variables:
+ * tab-width: 4
+ * End:
+ * vi: ts=4 sw=4
+ * vim: si
+ */
diff --git a/tools/sortpages b/tools/sortpages
new file mode 100644
index 0000000..29689fd
--- /dev/null
+++ b/tools/sortpages
@@ -0,0 +1,185 @@
+#!/usr/bin/perl
+#
+# $Id: sortpages,v 1.8 1997/12/11 19:20:58 mhw Exp $
+#
+
+@fileNameFromNumber = ();
+@pagesFound = ();
+$theProductNumber = 0;
+
+for $fileIndex (0..$#ARGV)
+{
+	$fileName = $ARGV[$fileIndex];
+	open(FILE, "<$fileName") || die;
+	while (!eof(FILE))
+  	{
+  		$filePos = tell(FILE);
+  		$_ = <FILE>;
+ 		if (/^\f?-\S/)
+  		{
+  			my ($versionHex, $flagsHex, $pageCRCHex, $tabWidthHex,
+				$productNumberHex, $fileNumberHex, $pageNumber, $name)
+					  = (/^\f?-\S\S{4}\ 		# CRC followed by a space
+						 ([0-9a-f])				# Format version
+						 ([0-9a-f]{2})			# Flags
+						 ([0-9a-f]{8})			# Page CRC32
+						 ([0-9a-f])				# Tab width (0 means radix64)
+						 ([0-9a-f]{3})			# Product number
+						 ([0-9a-f]{4})			# File number
+						 \ Page\ (\d+)\ of\ (.*)/x);
+			my $version = hex($versionHex);
+			my $flags = hex($flagsHex);
+			my $productNumber = hex($productNumberHex);
+			my $fileNumber = hex($fileNumberHex);
+
+			unless ($version == 0 && $productNumber > 0
+						&& $fileNumber > 0 && $pageNumber > 0
+						&& $name ne "")
+			{
+				print STDERR "ERROR: Invalid header info ",
+							 "at $fileName line $.\n";
+				exit(1);
+			}
+
+			if (!defined($fileNameFromNumber[$fileNumber]))
+			{
+				$fileNameFromNumber[$fileNumber] = $name;
+			}
+			elsif ($fileNameFromNumber[$fileNumber] ne $name)
+			{
+				print STDERR "ERROR: Mismatched filename ",
+							 "at $fileName line $.\n";
+				exit(1);
+			}
+
+			if (!$theProductNumber)
+			{
+				$theProductNumber = $productNumber;
+			}
+			elsif ($theProductNumber != $productNumber)
+			{
+				print STDERR "ERROR: Different product number ",
+							 "at $fileName line $.\n";
+				exit(1);
+			}
+
+			push @pagesFound, (sprintf "%5d:%4d:%d:%d:%d",
+					 $fileNumber, $pageNumber, $flags, $fileIndex, $filePos);
+		}
+	}
+	close(FILE) || die;
+}
+
+@pagesFound = sort @pagesFound;
+
+$result = 0;
+$lastFileNumber = 0;
+$lastPageNumber = 0;
+$nextFileNumber = 1;
+$nextPageNumber = 1;
+$fileIndexOpen = -1;
+foreach (@pagesFound)
+{
+	my ($fileNumber, $pageNumber, $flags, $fileIndex, $filePos) = split /:/;
+
+	$fileNumber = int($fileNumber);
+	$pageNumber = int($pageNumber);
+
+	if ($fileNumber == $lastFileNumber && $pageNumber == $lastPageNumber)
+	{
+		print STDERR "DUPLICATE: File $fileNumber, page $pageNumber, skipped\n";
+		next;
+	}
+
+	if ($nextFileNumber < $fileNumber && $nextPageNumber != 1)
+	{
+		print STDERR "MISSING: File $nextFileNumber, ",
+					 "pages $nextPageNumber - END\n";
+		$nextPageNumber = 1;
+		$nextFileNumber++;
+		$result = 1;
+	}
+	if ($nextFileNumber < $fileNumber)
+	{
+		print STDERR "MISSING: Files $nextFileNumber - ",
+					 $fileNumber-1, "\n";
+		$nextFileNumber = $fileNumber;
+		$nextPageNumber = 1;
+		$result = 1;
+	}
+	if ($nextFileNumber != $fileNumber)
+	{
+		print STDERR "ERROR: Internal error, unexpected fileNumber\n";
+		exit(1);
+	}
+
+	if ($nextPageNumber < $pageNumber)
+	{
+		print STDERR "MISSING: File $fileNumber, pages $nextPageNumber - ",
+					 $pageNumber-1, "\n";
+		$nextPageNumber = $pageNumber;
+		$result = 1;
+	}
+	if ($nextPageNumber != $pageNumber)
+	{
+		print STDERR "ERROR: Internal error, unexpected pageNumber\n";
+		exit(1);
+	}
+
+	if ($fileIndexOpen != $fileIndex)
+	{
+		if ($fileIndexOpen >= 0)
+		{
+			close(FILE) || die;
+			$fileIndexOpen = -1;
+		}
+		$fileName = $ARGV[$fileIndex];
+		open(FILE, "<$fileName") || die;
+		$fileIndexOpen = $fileIndex;
+	}
+	seek(FILE, $filePos, 0) || die($!);
+
+	$_ = <FILE>;
+	print;
+	while (<FILE>)
+	{
+		last if /^\f?-\S/;
+		print;
+	}
+	$lastFileNumber = $fileNumber;
+	$lastPageNumber = $pageNumber;
+
+	if ($flags & 1)		# Bit 0 of flags indicates last page of file
+	{
+		$nextFileNumber++;
+		$nextPageNumber = 1;
+	}
+	else
+	{
+		$nextPageNumber++;
+	}
+}
+
+if ($nextPageNumber != 1)
+{
+	print STDERR "MISSING: File $nextFileNumber, ",
+				 "pages $nextPageNumber - END\n";
+	$nextPageNumber = 1;
+	$nextFileNumber++;
+	$result = 1;
+}
+
+print STDERR "Highest file number encountered: ", $nextFileNumber - 1, "\n";
+
+if ($fileIndexOpen >= 0)
+{
+	close(FILE) || die;
+	$fileIndexOpen = -1;
+}
+
+exit($result);
+
+#
+# vi: ai ts=4
+# vim: si
+#
diff --git a/tools/subst.c b/tools/subst.c
new file mode 100644
index 0000000..76dfe13
--- /dev/null
+++ b/tools/subst.c
@@ -0,0 +1,222 @@
+/*
+ * subst.c -- Repair substitution tables
+ *
+ * Copyright (C) 1997 Pretty Good Privacy, Inc.
+ *
+ * Written by Colin Plumb
+ *
+ * $Id: subst.c,v 1.14 1997/11/03 22:12:00 colin Exp $
+ *
+ * IT IS EXPECTED that users of this program will play with these tables
+ * and the cost values in the subst.h header.  (Some day, they'll all
+ * get moved to an external config file.)
+ *
+ * NOTE: Other costs are hiding in the Filter functions in repair.c.
+ * Remember to keep them all on the same scale.
+ */
+
+/*
+ * The repair program copies its input to its output, making various
+ * substitutions, until it manages to produce a version that satisfies
+ * the parser.  This includes having a correct CRC for each line.
+ * Each substitution has a cost, and the combinations are tried in order
+ * of increasing cost.  NOTE that even translating "A"->"A" counts as
+ * a substitution, although it may have zero cost.
+ *
+ * The intention is to correct transcription errors, where the
+ * errors have a distinctly non-uniform distribution.  Slight
+ * differences in cost produce a preference in trying some errors
+ * first.  If an error costs half as much as another, combinations
+ * of two of that error will be compared to one of the more expensive.
+ * Too many cheap substitutions will result in repair spending
+ * a very long time searching before considering the more expensive
+ * substitutions.
+ *
+ * The following parameters and the raw substitution tables are expected
+ * to be edited by the user based on experience.  Eventually, this
+ * will be moved into an external config file, but for now it's a matter
+ * of recompiling.
+ */
+
+#include "subst.h"
+#include "util.h"
+
+/* what the OCR software reports for "unrecognizable" */
+#define UNRECOG_STRING "~\274"
+
+/*
+ * The input substitutions to make (one-to-one).   These are listed in
+ * the order of correction. i.e. uncorrected input first, then corrected
+ * output.  Substitutions are one-way; to get two-way, list it twice.
+ */
+
+struct RawSubst const substSingles[] = {
+	/* Identity substitutions - note that period (.) is excluded */
+	{ "!\"#$%&'()*+,-./0123456789:;<=>?" SPACE_STRING,
+	  "!\"#$%&'()*+,-./0123456789:;<=>?" SPACE_STRING, 0, 0, NULL },
+	{ "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_\t" TAB_STRING,
+	  "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_\t" TAB_STRING, 0, 0, NULL },
+	{ "`abcdefghijklmnopqrstuvwxyz{|}~\f" FORMFEED_STRING,
+	  "`abcdefghijklmnopqrstuvwxyz{|}~\f" FORMFEED_STRING, 0, 0, NULL },
+#if (TAB_PAD_CHAR & 128)	/* Not already included? */
+	{ TAB_PAD_STRING, TAB_PAD_STRING, 0, 0, NULL },
+#endif
+	{ "\r\n" CONTIN_STRING, "\n\n" CONTIN_STRING, 0, 0, NULL },
+
+	/* Occasionally these just get inserted as glitches */
+	{ ".,'`", NULL, 5, 10, FilterNearBlanks },
+	/* This is now pretty infrequent */
+	{ "-_", "_-", 0, 10, FilterAfterRepeat },
+
+	/*
+	 * Capitalization errors are common in some cases
+	 * c/C, s/S, u/U are fucked up all the time.
+	 * Also o/O, v/V and w/W.  x, y and z also give some problems.
+	 */
+	{ "cilmopsuvwxyz", "CILMOPSUVWXYZ", 7, 13, FilterNearLower },
+	{ "CILMOPSUVWXYZ", "cilmopsuvwxyz", 7, 13, FilterNearUpper },
+	/* Other errors */
+	{ "g9aaiji;xX00Si", "9gg2ji;i%%oO3f", 10, 0, NULL },
+	/* This seems to happen a lot */
+	{ "c", "r", 9, 0, NULL },
+
+	{ "j", ";", 9, 0, NULL },
+	{ "' ", "``", 10, 0, NULL },
+
+	/* Uncommon errors */
+
+	/* Weird stuff that's happened in the checksum part */
+	/* A highish weight is okay here */
+	{ "sSEdJl", "554437", 15, 0, NULL },
+	{ "LESsPZ", "bb8a22", 15, 0, NULL },
+
+	/* Weird stuff that has happened */
+	{ "BasAeaeRoooo", "3334a@QQpqbd", 5, 15, FilterIsBinary },
+	{ "oooo", "pqbd", 0, 15, FilterIsBinary },
+	{ "ttTCCflO", "iff{[lfG", 12, 0, NULL },
+#if 0
+	/* If the line-breaks get screwed up, use these */
+	{ " ", "\n", 10, COST_INFINITY, FilterChecksumFollows },
+	{ "\n", " ", COST_INFINITY, 10, FilterChecksumFollows },
+	{ "\n", NULL, COST_INFINITY , 11, FilterChecksumFollows },
+#endif
+
+{ NULL, NULL, 0, 0, NULL }
+};
+
+/* The many-to-many substitutions */
+struct RawSubst const substMultiples[] = {
+	{ "''", "\"", 2, 0, NULL },
+	{ "``", "\"", 2, 0, NULL },
+	{ ",'", "\"", 2, 0, NULL },
+	{ "',", "\"", 2, 0, NULL },
+	{ ",,", "\"", 2, 0, NULL },
+	/* Extra inserted spaces are common */
+	{ " ", " ", COST_INFINITY,  0, FilterFollowsSpace },
+	{ " ", "", 0, 15, FilterFollowsSpace },
+	{ "\t", " ", COST_INFINITY,  0, FilterFollowsSpace },
+	{ "\t", "", 0, 10, FilterFollowsSpace },
+	/* Convert between SPACE_CHAR dots and periods */
+	{ ".", SPACE_STRING, 1, COST_INFINITY, FilterFollowsSpace },
+	{ ".", " "SPACE_STRING, COST_INFINITY, 10, FilterFollowsSpace },
+	{ SPACE_STRING, ".", 15, 5, FilterFollowsSpace },
+	{ SPACE_STRING, " "SPACE_STRING, COST_INFINITY, 5, FilterFollowsSpace },
+
+	/* Replace "unknown" by zero - it often is */
+	{ UNRECOG_STRING, "0", 1, 0, NULL },
+	{ UNRECOG_STRING, "_", 2, 0, NULL },
+	{ UNRECOG_STRING, ")", 3, 0, NULL },
+	{ UNRECOG_STRING, "^", 4, 0, NULL },
+	/* Except that these glitches are common */
+	{ UNRECOG_STRING"'", "\\\"", 0, 0, NULL },
+	{ UNRECOG_STRING"'", "\"", 1, 0, NULL },
+	{ "'"UNRECOG_STRING, "\"", 0, 0, NULL },
+	{ UNRECOG_STRING UNRECOG_STRING , "\"", 0, 0, NULL },
+	/* Something else that has been seen */
+	{ "V'", "\\\"", 5, 0, NULL },
+
+	/* A common transposition */
+	{ "\"'", "'\"", 5, 0, NULL },
+	{ "'\"", "\"'", 5, 0, NULL },
+	/* These also happen fairly often */
+	{ " \"", "''", 5, 0, NULL },
+	{ "\" ", "''", 5, 0, NULL },
+
+	/* Common glitches */
+	{ "\t.\n", "\n", 5, 0, NULL },
+	{ "\t,\n", "\n", 5, 0, NULL },
+	{ "\t-\n", "\n", 5, 0, NULL },
+	{ "\t_\n", "\n", 5, 0, NULL },
+	{ "\t'\n", "\n", 5, 0, NULL },
+	{ "\t`\n", "\n", 5, 0, NULL },
+	{ "\t~\n", "\n", 5, 0, NULL },
+	{ "\t:\n", "\n", 5, 0, NULL },
+	{ "\t"SPACE_STRING"\n", "\n", 5, 0, NULL },
+
+	/* Less common */
+	{ " .\n", "\n", 10, 0, NULL },
+	{ " ,\n", "\n", 10, 0, NULL },
+	{ " -\n", "\n", 10, 0, NULL },
+	{ " _\n", "\n", 10, 0, NULL },
+	{ " '\n", "\n", 10, 0, NULL },
+	{ " `\n", "\n", 10, 0, NULL },
+	{ " ~\n", "\n", 10, 0, NULL },
+	{ " :\n", "\n", 10, 0, NULL },
+	{ " "SPACE_STRING"\n", "\n", 10, 0, NULL },
+
+	/* Even less common */
+	{ ".\n", "\n", 15, 0, NULL },
+	{ ",\n", "\n", 15, 0, NULL },
+	{ "-\n", "\n", 15, 0, NULL },
+	{ "_\n", "\n", 15, 0, NULL },
+	{ "'\n", "\n", 15, 0, NULL },
+	{ "`\n", "\n", 15, 0, NULL },
+	{ "~\n", "\n", 15, 0, NULL },
+	{ ":\n", "\n", 15, 0, NULL },
+	{ SPACE_STRING"\n", "\n", 15, 0, NULL },
+
+	/* Weird stuff that has happened */
+	{ "lJ", "U", 10, 0, NULL },
+	{ "ll", "U", 10, 0, NULL },
+	{ "l1", "U", 10, 0, NULL },
+	{ "il", "U", 10, 0, NULL },	/* Fairly common, actually */
+	{ "li", "U", 10, 0, NULL },
+	{ "l)", "U", 10, 0, NULL },
+	{ "Ll", "U", 10, 0, NULL },
+	{ "LI", "U", 10, 0, NULL },
+	{ "L1", "U", 10, 0, NULL },
+
+	{ "lo", "b", 10, 0, NULL },
+	{ "cl", "d", 10, 0, NULL },
+	{ "cliff", "diff", 2, 0, NULL },
+	{ "*\n", "*/\n", 10, 0, NULL },
+
+	/* That big black block has odd things happen to it */
+	{ "d", CONTIN_STRING, 10, 0, NULL },
+	{ "d\n", CONTIN_STRING"\n", 3, 0, NULL },
+	{ "S", CONTIN_STRING, 10, 0, NULL },
+	{ "S\n", CONTIN_STRING"\n", 3, 0, NULL },
+
+	/* Tab-stop wonders */
+	{ TAB_STRING, TAB_STRING"", 0, 0, TabFilter },
+	{ TAB_STRING, TAB_STRING" ", 0, 0, TabFilter },
+	{ TAB_STRING, TAB_STRING"  ", 0, 0, TabFilter },
+	{ TAB_STRING, TAB_STRING"   ", 0, 0, TabFilter },
+	{ TAB_STRING, TAB_STRING"    ", 0, 0, TabFilter },
+	{ TAB_STRING, TAB_STRING"     ", 0, 0, TabFilter },
+	{ TAB_STRING, TAB_STRING"      ", 0, 0, TabFilter },
+	{ TAB_STRING, TAB_STRING"       ", 0, 0, TabFilter },
+	/* Some scan errors */
+	{ "D ", TAB_STRING"", 1, 5, TabFilter },
+	{ "D ", TAB_STRING" ", 1, 5, TabFilter },
+	{ "D ", TAB_STRING"  ", 1, 5, TabFilter },
+	{ "D ", TAB_STRING"   ", 1, 5, TabFilter },
+	{ "D ", TAB_STRING"    ", 1, 5, TabFilter },
+	{ "D ", TAB_STRING"     ", 1, 5, TabFilter },
+	{ "D ", TAB_STRING"      ", 1, 5, TabFilter },
+	{ "D ", TAB_STRING"       ", 1, 5, TabFilter },
+#if TAB_PAD_CHAR != ' '
+#error Fix those tab patterns!
+#endif
+{ NULL, NULL, 0, 0, NULL }
+};
diff --git a/tools/subst.h b/tools/subst.h
new file mode 100644
index 0000000..79005c3
--- /dev/null
+++ b/tools/subst.h
@@ -0,0 +1,66 @@
+/*
+ * subst.h -- Header for repair substitutions
+ *
+ * Copyright (C) 1997 Pretty Good Privacy, Inc.
+ *
+ * Written by Colin Plumb
+ *
+ * $Id: subst.h,v 1.9 1997/11/03 22:12:00 colin Exp $
+ */
+
+/*
+ * Give up if the list of pending changes to attempt grows to this many
+ * elements.  Each element is 32 bytes, so 128K is 4 MB of memory.
+ * (Other than this, repair's memory usage is fairly modest.)
+ */
+#define MAX_HEAP (1<<17)
+
+/*
+ * There is a hack in the code to find a single substitution that will fix a
+ * line, even if it's not in the tables.  It gets added to the tables "on
+ * probation", with an infinite cost, and if it leads to a successful
+ * correction of the entire page, is "learned" for future use and its
+ * cost reduced to something finite.
+ * (This is not remembered across runs of the program, though.
+ * Edit the tables in the source to fix it.)
+ */
+#define DYNAMIC_COST_LEARNED 15
+
+/*
+ * This negative-cost bonus for passing the end of a line with the right
+ * CRC makes the search engine reluctant to backtrack past a correct CRC,
+ * greatly improving efficiency.  It's rather a hack, though.  Think of
+ * this in terms of "how many errors should be considered in the current
+ * line before considering the possibility of errors in the previous line?"
+ *
+ * This bonus is halved for lines that are the result of a correction
+ * that was computed from the checksum, since a correct checksum is
+ * much less significant in such a case.
+ */
+#define COST_LINE -30
+
+/* The cost of a full-line nastyline substitution. */
+#define NASTY_COST 5
+
+/* Type describing filter functions used in substitutions */
+struct ParseNode;
+struct Substitution;
+#include "heap.h"
+typedef HeapCost FilterFunc(struct ParseNode *parent, char const *limit,
+	struct Substitution const *subst);
+FilterFunc TabFilter,              FilterFollowsSpace, FilterNearBlanks;
+FilterFunc FilterNearUpper,        FilterNearLower,    FilterNearXDigit;
+FilterFunc FilterAfterRepeat,      FilterCharConst,    FilterChecksumFollows;
+FilterFunc FilterLikelyUnderscore, FilterIsDynamic,    FilterIsBinary;
+
+/* The external substitution format */
+typedef struct RawSubst {
+	char const *input;
+	char const *output;
+	HeapCost cost, cost2;
+	FilterFunc *filter;
+} RawSubst;
+
+/* The substitutions to make */
+extern struct RawSubst const substSingles[];
+extern struct RawSubst const substMultiples[];
diff --git a/tools/unmunge.c b/tools/unmunge.c
new file mode 100644
index 0000000..831297e
--- /dev/null
+++ b/tools/unmunge.c
@@ -0,0 +1,666 @@
+/*
+ * unmunge.c -- Program to convert a munged file to original form
+ *
+ * Copyright (C) 1997 Pretty Good Privacy, Inc.
+ *
+ * Designed by Colin Plumb, Mark H. Weaver, and Philip R. Zimmermann
+ * Written by Mark H. Weaver
+ *
+ * $Id: unmunge.c,v 1.13 1997/11/13 23:27:08 mhw Exp $
+ */
+
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+/*#include <direct.h>   teun: MS VC wants direct.h for mkdir */
+
+#include <stdio.h>
+#include <errno.h>
+#include <string.h>
+#include <ctype.h>
+#include <stdlib.h>
+#include <assert.h>
+
+#include "util.h"
+
+typedef struct UnMungeState
+{
+	char const *	mungedFileName;
+	char			dirName[128];
+	char			fileName[128];
+	char *			fileNameTail;
+	int				binaryMode, tabWidth;
+	long			productNumber, fileNumber, pageNumber, lineNumber;
+	long			manifestLineNumber;
+	word16			hdrFlags;
+	CRC				pageCRC, seenPageCRC;
+	FILE *			manifest;
+	FILE *			file;
+	FILE *			out;
+} UnMungeState;
+
+
+/* Returns number of characters decoded, or -1 on error */
+static int
+Decode4(char const src[4], byte dest[3])
+{
+	int		i, length;
+	byte	srcVal[4];
+
+	for (i = 0; i < 4 && src[i] != RADIX64_END_CHAR; i++)
+		if ((srcVal[i] = Radix64DigitValue(src[i])) == (byte) -1)
+			return -1;
+
+	length = i - 1;
+	if (length < 1)
+		return -1;
+
+	for (; i < 4; i++)
+		srcVal[i] = 0;
+
+	dest[0] = (srcVal[0] << 2) | (srcVal[1] >> 4);
+	dest[1] = (srcVal[1] << 4) | (srcVal[2] >> 2);
+	dest[2] = (srcVal[2] << 6) | (srcVal[3]);
+
+	return length;
+}
+
+/*
+ * Return number of characters decoded, or -1 on error
+ */
+static int
+DecodeLine(char const *src, char *dest, int srclength)
+{
+	int destlength = 0;
+	int result;
+
+	if (srclength % 4 || !srclength)
+		return -1;	/* Must be a multiple of 4 */
+
+	while (srclength -= 4) {
+		if (Decode4(src, dest + destlength) != 3)
+			return -1;
+		src += 4;
+		destlength += 3;
+	}
+	result = Decode4(src, dest + destlength);
+	if (result < 1)
+		return -1;
+	return destlength + result;
+}
+
+int PrintFileError(UnMungeState *state, char const *message)
+{
+	fprintf(stderr, "%s, %s line %ld\n", message,
+			state->mungedFileName, state->lineNumber);
+	return 1;
+}
+
+int ReadManifest(UnMungeState *state, long fileNumberWanted,
+				 char const *fileTailPrefix, long prefixLen)
+{
+	long		fileNumber = 0;
+	long		firstMissingFileNum = 0, lastMissingFileNum = 0;
+	char		buffer[512];
+	char *		p;
+
+	if (state->manifest == NULL)
+	{
+		if (fileNumberWanted != 0)
+		{
+			assert(fileTailPrefix != NULL);
+			strncpy(state->fileName, fileTailPrefix, sizeof(state->fileName));
+			state->fileName[sizeof(state->fileName) - 1] = '\0';
+			state->fileNameTail = state->fileName;
+		}
+		return 0;
+	}
+	while (fgets(buffer, sizeof(buffer), state->manifest))
+	{
+		if ((p = strchr(buffer, '\n')) != NULL)
+			*p = '\0';
+		state->manifestLineNumber++;
+		if (buffer[0] == 'D')
+		{
+			if (buffer[1] != ' ')
+				goto invalidManifest;
+			strncpy(state->dirName, buffer + 2, sizeof(state->dirName));
+			if (state->dirName[sizeof(state->dirName) - 1] != '\0')
+				goto invalidManifest;
+		}
+		else
+		{
+			fileNumber = strtol(buffer, &p, 10);
+			if (p == buffer || *p != ' ')
+				goto invalidManifest;
+			p++;
+
+			if (fileNumberWanted == 0 || fileNumber < fileNumberWanted)
+			{
+				if (firstMissingFileNum == 0)
+					firstMissingFileNum = fileNumber;
+				lastMissingFileNum = fileNumber;
+				continue;
+			}
+			else if (fileNumber > fileNumberWanted)
+				break;
+			else
+			{
+				size_t		len;
+
+				len = strlen(state->dirName);
+				assert(sizeof(state->fileName) >= sizeof(state->dirName));
+				memcpy(state->fileName, state->dirName, len);
+				strncpy(state->fileName + len, p,
+						sizeof(state->fileName) - len);
+				if (strncmp(p, fileTailPrefix, prefixLen) != 0)
+				{
+					fprintf(stderr, "Mismatched filename, headers say '%s',\n"
+							"  manifest says '%s'\n",
+							fileTailPrefix, p);
+					return 1;
+				}
+				p = state->dirName;
+				while ((p = strchr(p, '/')) != NULL)
+				{
+					*p = '\0';
+					mkdir(state->dirName, 0777);
+					*p++ = '/';
+				}
+				state->fileNameTail = state->fileName + len;
+				break;
+			}
+		}
+	}
+	if (firstMissingFileNum != 0)
+	{
+		fprintf(stderr, "Missing files %ld-%ld\n",
+				firstMissingFileNum, lastMissingFileNum);
+	}
+	if (fileNumberWanted != 0 && fileNumber != fileNumberWanted)
+	{
+		fprintf(stderr, "Can't find file %ld in manifest file\n",
+				fileNumberWanted);
+		return 1;
+	}
+	return 0;
+
+invalidManifest:
+	fprintf(stderr, "Error parsing manifest file, line %ld\n",
+			state->manifestLineNumber);
+	return 1;
+}
+
+int UnMungeFile(char const *mungedFileName, char const *manifestFileName,
+				int forceOverwrite, int forcePartialFiles)
+{
+	UnMungeState *	state;
+	EncodeFormat const *	fmt = NULL;
+	char			buffer[512];
+	char			outbuf[BYTES_PER_LINE+1];
+	char *			line;
+	char *			lineData;
+	char *			p;
+	int				length;
+	int				result = 0;
+	int				skipPage = 0;
+	CRC				lineCRC;
+	word32			num;
+
+	state = (UnMungeState *)calloc(1, sizeof(*state));
+	state->mungedFileName = mungedFileName;
+
+	if (manifestFileName != NULL)
+	{
+		if ((state->manifest = fopen(manifestFileName, "r")) == NULL)
+			goto errnoError;
+	}
+
+	if ((state->file = fopen(state->mungedFileName, "r")) == NULL)
+		goto errnoError;
+
+	while (!feof(state->file))
+	{
+		if (fgets(buffer, sizeof(buffer), state->file) == NULL)
+		{
+			if (feof(state->file))
+				break;
+			goto fileError;
+		}
+
+		state->lineNumber++;
+
+		line = buffer;
+		/* Strip leading whitespace */
+		while (isspace((unsigned char)*line))
+			line++;
+		if (*line == '\0')
+			continue;
+
+		/* Strip trailing whitespace */
+		p = line + strlen(line);
+		while (p > line && (byte)p[-1] < 128 && isspace(p[-1]))
+			p--;
+
+		lineData = line + PREFIX_LENGTH;
+
+		/* Pad up to at least PREFIX_LENGTH */
+		while (p < lineData)
+			*p++ = ' ';
+		*p++ = '\n';
+		*p = '\0';
+		length = p - lineData;
+
+		if (line[0] == HDR_PREFIX_CHAR)
+		{
+			fmt = FindFormat(line[1]);
+			if (!fmt)
+			{
+				result = PrintFileError(state, "ERROR: Invalid header type");
+				goto error;
+			}
+		}
+
+		lineCRC = CalculateCRC(fmt->lineCRC, 0, (byte const *)lineData, length);
+
+		p = line + EncodedLength(fmt, fmt->runningCRCBits);
+		if (DecodeCheckDigits(fmt, p, NULL, fmt->lineCRC->bits, &num)
+				|| lineCRC != num)
+		{
+			result = PrintFileError(state, "ERROR: Line CRC failed");
+			goto error;
+		}
+
+		if (line[0] == HDR_PREFIX_CHAR)
+		{
+			int			formatVersion;
+			int			flags;
+			CRC			seenPageCRC;
+			int			tabWidth;
+			long		productNumber;
+			long		fileNumber;
+			long		pageNumber;
+			char *		fileNameTail;
+			int			skipNextPage = 0;
+			char *		p;
+			EncodeFormat const *	hFmt = &hexFormat;
+
+			/* Parse header line */
+			p = lineData;
+
+			if (DecodeCheckDigits(hFmt, p, &p, HDR_VERSION_BITS, &num))
+			{
+			invalidHeader:
+				result = PrintFileError(state, "ERROR: Invalid header");
+				goto error;
+			}
+			formatVersion = num;
+
+			if (DecodeCheckDigits(hFmt, p, &p, HDR_FLAG_BITS, &num))
+				goto invalidHeader;
+			flags = num;
+
+			if (DecodeCheckDigits(hFmt, p, &p, fmt->pageCRC->bits, &num))
+				goto invalidHeader;
+			seenPageCRC = num;
+
+			if (DecodeCheckDigits(hFmt, p, &p, HDR_TABWIDTH_BITS, &num))
+				goto invalidHeader;
+			tabWidth = num;
+
+			if (DecodeCheckDigits(hFmt, p, &p, HDR_PRODNUM_BITS, &num))
+				goto invalidHeader;
+			productNumber = num;
+
+			if (DecodeCheckDigits(hFmt, p, &p, HDR_FILENUM_BITS, &num))
+				goto invalidHeader;
+			fileNumber = num;
+
+			if (sscanf(p, " Page %ld of ", &pageNumber) < 1)
+				goto invalidHeader;
+
+			if (formatVersion > 0)
+			{
+				result = PrintFileError(state,
+										"ERROR: Format too new for "
+											"this version of unmunge");
+				goto error;
+			}
+
+			p = strstr(p, " of ");
+			if (p == NULL)
+				goto invalidHeader;
+
+			fileNameTail = p + 4;
+			p = fileNameTail + strlen(fileNameTail);
+			if (p < fileNameTail + 3 || p[-1] != '\n')
+				goto invalidHeader;
+			else
+				p[-1] = '\0';
+
+			if (state->out != NULL && state->pageCRC != state->seenPageCRC)
+			{
+				result = PrintFileError(state,
+								"ERROR: Page CRC mismatch on page before");
+				goto error;
+			}
+
+			if ((state->hdrFlags & HDR_FLAG_LASTPAGE) && state->out != NULL)
+			{
+				fclose(state->out);
+				state->out = NULL;
+			}
+
+			if (state->out != NULL)
+			{
+				if (pageNumber != state->pageNumber + 1 ||
+						fileNumber != state->fileNumber ||
+						productNumber != state->productNumber ||
+						tabWidth != state->tabWidth ||
+						strcmp(fileNameTail, state->fileNameTail) != 0)
+				{
+					if (fileNumber == state->fileNumber &&
+							pageNumber > state->pageNumber + 1)
+					{
+						(void)PrintFileError(state,
+									"ERROR: Missing pages of this file");
+						if (forcePartialFiles && !state->binaryMode)
+						{
+							fputs("\n\n@@@@@@ Missing pages here! @@@@@@\n\n",
+								  state->out);
+						}
+						else
+						{
+							skipNextPage = 1;
+							fclose(state->out);
+							state->out = NULL;
+							remove(state->fileName);
+						}
+					}
+					else
+					{
+						(void)PrintFileError(state,
+									"ERROR: Missing pages of previous file");
+						if (forcePartialFiles && !state->binaryMode)
+						{
+							fputs("\n\n@@@@@@ Missing pages here! @@@@@@\n\n",
+								  state->out);
+							/* Make it non-fatal, though... */
+							fclose(state->out);
+							state->out = NULL;
+						}
+						else
+						{
+							fclose(state->out);
+							state->out = NULL;
+							remove(state->fileName);
+						}
+					}
+				}
+			}
+			if (state->out == NULL)
+			{
+				if (pageNumber != 1 && !skipPage)
+					(void)PrintFileError(state,
+							 "ERROR: File doesn't begin with page 1");
+
+				state->binaryMode = (tabWidth == 0);
+
+				if (pageNumber != 1 && (state->binaryMode
+										|| !forcePartialFiles))
+				{
+					skipNextPage = 1;
+				}
+				else
+				{
+					/* TODO: Use global filelist to get pathname */
+					result = ReadManifest(state, fileNumber, fileNameTail,
+										  strlen(fileNameTail));
+					if (result != 0)
+						goto error;
+
+					if (!forceOverwrite)
+					{
+						FILE *	file;
+
+						/* Make sure file doesn't already exist */
+						file = fopen(state->fileName, "r");
+						if (file != NULL)
+						{
+							fclose(file);
+							fprintf(stderr, "ERROR: %s already exists\n",
+									state->fileName);
+							result = 1;
+							goto error;
+						}
+					}
+
+					state->out = fopen(state->fileName,
+									   state->binaryMode ? "wb" : "w");
+					if (state->out == NULL)
+						goto errnoError;
+
+					if (pageNumber != 1)
+						fputs("\n\n@@@@@@ Missing pages here! @@@@@@\n\n",
+							  state->out);
+				}
+			}
+
+			state->pageCRC = 0;
+			state->seenPageCRC = seenPageCRC;
+			state->hdrFlags = (word16)flags;
+			state->pageNumber = pageNumber;
+			state->fileNumber = fileNumber;
+			state->productNumber = productNumber;
+			state->tabWidth = tabWidth;
+			skipPage = skipNextPage;
+		}
+		else if (!skipPage)
+		{
+			if (state->out == NULL)
+			{
+				result = PrintFileError(state, "ERROR: Missing header line");
+				goto error;
+			}
+
+			/* Normal data line */
+			state->pageCRC = CalculateCRC(fmt->pageCRC, state->pageCRC,
+											   (byte const *)lineData,
+											   length);
+			line[2] = '\0';
+			if (DecodeCheckDigits(fmt, line, NULL, fmt->runningCRCBits, &num)
+				|| RunningCRCFromPageCRC(fmt, state->pageCRC) != num)
+			{
+				result = PrintFileError(state, "ERROR: Running CRC failed");
+				goto error;
+			}
+
+			if (state->binaryMode)
+			{
+				length = DecodeLine(lineData, outbuf, length-1);
+				if (length < 0 || length > BYTES_PER_LINE) {
+					result = PrintFileError(state,
+									"ERROR: Corrupt radix-64 data");
+					goto error;
+				}
+				fwrite(outbuf, 1, length, state->out);
+			}
+			else
+			{
+				p = lineData;
+				while (*p != '\0')
+				{
+					if (*p == TAB_CHAR)
+					{
+						p++;
+						putc('\t', state->out);
+						while ((p - lineData) % state->tabWidth)
+						{
+							if (*p == '\n')
+								break;
+							else if (*p == ' ')
+								p++;
+							else
+							{
+								result = PrintFileError(state,
+												"ERROR: Not enough spaces "
+												"after a tab character");
+								goto error;
+							}
+						}
+					}
+					else if (*p == FORMFEED_CHAR)
+					{
+						p++;
+						if (*p != '\n')
+						{
+							result = PrintFileError(state,
+											"ERROR: Formfeed character "
+											"not at end of line");
+							goto error;
+						}
+						p++;	/* Skip newline */
+						putc('\f', state->out);
+					}
+					else if (*p == CONTIN_CHAR)
+					{
+						p++;
+						if (*p != '\n')
+						{
+							result = PrintFileError(state,
+											"ERROR: Continuation character "
+											"not at end of line");
+							goto error;
+						}
+						p++;	/* Skip newline */
+					}
+					else if (*p == SPACE_CHAR)
+					{
+						putc(' ', state->out);
+						p++;
+					}
+					else
+					{
+						putc(*p, state->out);
+						p++;
+					}
+				}
+			}
+		}
+	}
+	if (state->out != NULL)
+	{
+		if (!(state->hdrFlags & HDR_FLAG_LASTPAGE))
+		{
+			result = PrintFileError(state, "ERROR: Missing pages");
+			goto error;
+		}
+		if (state->pageCRC != state->seenPageCRC)
+		{
+			result = PrintFileError(state,
+							"ERROR: Page CRC failed on previous page");
+			goto error;
+		}
+	}
+
+	/* Check for missing files at the end */
+	result = ReadManifest(state, 0, NULL, 0);
+	goto done;
+
+errnoError:
+	result = errno;
+	goto printError;
+
+fileError:
+	result = ferror(state->file);
+
+printError:
+	fprintf(stderr, "ERROR: %s\n", strerror(result));
+
+error:
+done:
+	if (state != NULL)
+	{
+		if (state->out != NULL)
+			fclose(state->out);
+		if (state->file != NULL)
+			fclose(state->file);
+		if (state->manifest != NULL)
+			fclose(state->manifest);
+		free(state);
+	}
+	return result;
+}
+
+void UsageAndExit(int result)
+{
+	fprintf(stderr,
+			"Usage: unmunge [-fp] <file> [<manifest>]\n"
+			"  -f  Force overwrites of existing files\n"
+			"  -p  Force unmunge of partial files\n");
+	exit(result);
+}
+
+int main(int argc, char *argv[])
+{
+	int		result = 0;
+	int		forceOverwrite = 0;
+	int		forcePartialFiles = 0;
+	char *	fileName = NULL;
+	char *	manifestFileName = NULL;
+	int		i, j;
+
+	InitUtil();
+
+	for (i = 1; i < argc && argv[i][0] == '-'; i++)
+	{
+		if (0 == strcmp(argv[i], "--"))
+		{
+			i++;
+			break;
+		}
+		for (j = 1; argv[i][j] != '\0'; j++)
+		{
+			if (argv[i][j] == 'h')
+				UsageAndExit(0);
+			else if (argv[i][j] == 'f')
+				forceOverwrite = 1;
+			else if (argv[i][j] == 'p')
+				forcePartialFiles = 1;
+			else
+			{
+				fprintf(stderr, "ERROR: Unrecognized option -%c\n", argv[i][j]);
+				UsageAndExit(1);
+			}
+		}
+	}
+
+	if (i < argc)
+		fileName = argv[i++];
+	if (i < argc)
+		manifestFileName = argv[i++];
+	if (fileName == NULL || i < argc)
+		UsageAndExit(1);
+
+	if ((result = UnMungeFile(fileName, manifestFileName,
+							  forceOverwrite, forcePartialFiles)) != 0)
+	{
+		/* If result > 0, message should have already been printed */
+		if (result < 0)
+			fprintf(stderr, "ERROR: %s\n", strerror(result));
+		exit(1);
+	}
+
+	return 0;
+}
+
+/*
+ * Local Variables:
+ * tab-width: 4
+ * End:
+ * vi: ts=4 sw=4
+ * vim: si
+ */
+
diff --git a/tools/util.c b/tools/util.c
new file mode 100644
index 0000000..f487436
--- /dev/null
+++ b/tools/util.c
@@ -0,0 +1,198 @@
+/*
+ * util.c -- Miscellaneous shared code/data
+ *
+ * Copyright (C) 1997 Pretty Good Privacy, Inc.
+ *
+ * Written by Mark H. Weaver
+ *
+ * $Id: util.c,v 1.11 1997/11/07 00:44:10 mhw Exp $
+ */
+
+#include <stdlib.h>
+#include "util.h"
+
+char const hexDigits[] = "0123456789abcdef";
+char const radix64Digits[] =
+#if 0	/* Standard */
+	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
+#else	/* Modified form that avoids hard-to-OCR characters */
+	"ABCDEFGHIJKLMNPQRSTVWXYZabcdehijklmnpqtuwy145689\\^!#$%&*+=/:<>?@";
+#endif
+
+signed char hexDigitsInv[256];
+signed char radix64DigitsInv[256];
+
+/* teun: moved initialisation of all three CRCPoly's to InitUtil() */
+
+/* CRC-CCITT: x^16 + x^12 + x^5 + 1 */
+CRCPoly	crcCCITTPoly;
+/*
+ * PRZ's magic 24-bit polynomial - (x+1) * (irreducible of degree 23)
+ * x^24 +x^23 +x^18 +x^17 +x^14 +x^11 +x^10 +x^7 +x^6 +x^5 +x^4 +x^3 +x +1
+ * (Developed by Neal Glover).  Note: this is bit-reversed from the form
+ * used in PGP, 0x1864cfb.
+ */
+CRCPoly	crc24Poly;
+/* CRC-32: x^32+x^26+x^23+x^22+x^16+x^12+x^11+x^10+x^8+x^7+x^5+x^4+x^2+x+1 */
+CRCPoly	crc32Poly;
+
+EncodeFormat const	hexFormat =
+{
+	NULL,				/* nextFormat */
+	'-',				/* headerTypeChar */
+	hexDigits,			/* digits */
+	hexDigitsInv,		/* digitsInv */
+	4,					/* bitsPerDigit */
+	16,					/* radix */
+	&crcCCITTPoly,		/* lineCRC */
+	&crc32Poly,			/* pageCRC */
+	8,					/* runningCRCBits */
+	24,					/* runningCRCShift */
+	0xFF				/* runningCRCMask */
+};
+
+EncodeFormat const	radix64Format =
+{
+	&hexFormat,			/* nextFormat */
+	'A',				/* headerTypeChar */
+	radix64Digits,		/* digits */
+	radix64DigitsInv,	/* digitsInv */
+	6,					/* bitsPerDigit */
+	64,					/* radix */
+	&crc24Poly,			/* lineCRC */
+	&crc32Poly,			/* pageCRC */
+	12,					/* runningCRCBits */
+	20,					/* runningCRCShift */
+	0xFFF				/* runningCRCMask */
+};
+
+EncodeFormat const *	firstFormat = &radix64Format;
+
+
+static void InitCRCPoly(CRCPoly *poly)
+{
+	int		i, oneBit;
+	CRC		crc = 1;
+
+	poly->table[0] = 0;
+	for (oneBit = 0x80; oneBit > 0; oneBit >>= 1) {
+		crc = (crc >> 1) ^ ((crc & 1) ? poly->poly : 0);
+		for (i = 0; i < 0x100; i += 2 * oneBit)
+			poly->table[i + oneBit] = poly->table[i] ^ crc;
+	}
+}
+
+CRC CalculateCRC(CRCPoly const *poly, CRC crc,
+				 byte const *buffer, size_t length)
+{
+	while (length--)
+		crc = (crc >> 8) ^ poly->table[(crc & 0xFF) ^ (*buffer++)];
+	return crc;
+}
+
+CRC ReverseCRC(CRCPoly const *poly, CRC crc, byte b)
+{
+	int		i, highBit = poly->highBit;
+
+	for (i = 0; i < 8; i++) {
+		if (crc & highBit)		/* highBit is 2^(poly->bits-1) */
+			crc = ((crc ^ poly->poly) << 1) ^ 1;
+		else
+			crc <<= 1;
+	}
+	return crc ^ b;
+}
+
+static void InitDigitsInv(char const *digits, signed char *digitsInv)
+{
+	int		i;
+
+	for (i = 0; i < 256; i++)
+		digitsInv[i] = -1;
+	for (i = 0; digits[i]; i++)
+		digitsInv[(byte)digits[i]] = i;
+}
+
+/* Returns the number of chars encoded */
+int EncodeCheckDigits(EncodeFormat const *fmt, word32 num,
+					  int numBits, char *dest)
+{
+	int		destLen = EncodedLength(fmt, numBits);
+	word32	digitMask = fmt->radix - 1;
+	int		i;
+
+	for (i = destLen - 1; i >= 0; i--)
+	{
+		dest[i] = EncodeDigit(fmt, num & digitMask);
+		num >>= fmt->bitsPerDigit;
+	}
+	return destLen;
+}
+
+/* Returns 1 if there's an error */
+int DecodeCheckDigits(EncodeFormat const *fmt, char const *src, char **endPtr,
+					  int numBits, word32 *valuePtr)
+{
+	word32	value = 0;
+	int		digitValue;
+	int		i = EncodedLength(fmt, numBits);
+
+	while (i--)
+	{
+		digitValue = DecodeDigit(fmt, *src++);
+		if (digitValue < 0)
+		{
+			/* Invalid digit found */
+			*valuePtr = 0;
+			if (endPtr)
+				*endPtr = NULL;
+			return 1;
+		}
+		value = (value << fmt->bitsPerDigit) | digitValue;
+	}
+	*valuePtr = value;
+	if (endPtr)
+		*endPtr = (char *)src;
+	return 0;
+}
+
+EncodeFormat const *FindFormat(char headerTypeChar)
+{
+	EncodeFormat const *	fmt = firstFormat;
+
+	while (fmt && fmt->headerTypeChar != headerTypeChar)
+		fmt = fmt->nextFormat;
+	return fmt;
+}
+
+void InitUtil(void)
+{
+	/* teun: removed "{ }" for MS VC compile */
+
+	crcCCITTPoly.bits = 16;
+	crcCCITTPoly.poly = 0x8408;
+	crcCCITTPoly.highBit = 0x8000;
+
+	crc24Poly.bits = 24;
+	crc24Poly.poly = 0xdf3261;
+	crc24Poly.highBit = 0x800000;
+
+	crc32Poly.bits = 32;
+	crc32Poly.poly = 0xedb88320;
+	crc32Poly.highBit = 0x80000000;
+
+	InitCRCPoly(&crcCCITTPoly);
+	InitCRCPoly(&crc24Poly);
+	InitCRCPoly(&crc32Poly);
+	InitDigitsInv(hexDigits, hexDigitsInv);
+	InitDigitsInv(radix64Digits, radix64DigitsInv);
+}
+
+
+/*
+ * Local Variables:
+ * tab-width: 4
+ * End:
+ * vi: ts=4 sw=4
+ * vim: si
+ */
diff --git a/tools/util.h b/tools/util.h
new file mode 100644
index 0000000..b2e06bd
--- /dev/null
+++ b/tools/util.h
@@ -0,0 +1,149 @@
+/*
+ * util.h -- Miscellaneous defines
+ *
+ * Copyright (C) 1997 Pretty Good Privacy, Inc.
+ *
+ * Written by Mark H. Weaver
+ *
+ * $Id: util.h,v 1.23 1997/11/12 23:28:56 mhw Exp $
+ */
+
+#ifndef UTIL_H
+#define UTIL_H 1
+
+typedef unsigned long	word32;
+typedef unsigned short	word16;
+typedef unsigned char	byte;
+
+#define FMT32	"%08lx"
+#define FMT16	"%04x"
+#define FMT8	"%02x"
+
+#define TAB_CHAR		'\244'	/* Currency symbol, like o in top of x */
+#define TAB_STRING		"\244"
+#define TAB_PAD_CHAR	' '		/* The fact that this is space has leaked. */
+#define TAB_PAD_STRING	" "		/* It may not be freely changed. */
+#define FORMFEED_CHAR	'\245'	/* Yen symbol, like = on top of Y */
+#define FORMFEED_STRING	"\245"
+#define SPACE_CHAR		'\267'	/* Middle dot, or bullet */
+#define SPACE_STRING	"\267"
+#define CONTIN_CHAR		'\266'	/* Pilcrow (paragraph symbol) */
+#define CONTIN_STRING	"\266"
+
+#define BYTES_PER_LINE	60		/* When using radix 64 */
+
+#define LINES_PER_PAGE	72		/* Exclusive of 2 header lines */
+#define LINE_LENGTH		80
+#define PREFIX_LENGTH	7		/* Length of prefix, including the space */
+
+#define HDR_PREFIX_CHAR		'-'
+#define RADIX64_END_CHAR	'-'
+
+typedef struct EncodeFormat		EncodeFormat;
+typedef word32					CRC;
+typedef word16					CRCFragment;
+
+typedef struct
+{
+	CRC			table[256];
+	int			bits;
+	CRC			poly;
+	CRC			highBit;
+} CRCPoly;
+
+struct EncodeFormat
+{
+	EncodeFormat const *nextFormat;
+	char				headerTypeChar;
+	char const *		digits;
+	signed char const *	digitsInv;
+	int					bitsPerDigit;
+	int					radix;
+	CRCPoly const *		lineCRC;
+	CRCPoly	const *		pageCRC;
+	int					runningCRCBits;
+	int					runningCRCShift;
+	int					runningCRCMask;
+};
+
+
+#define HDR_ENC_LENGTH		19		/* Length of encoded prefix on header */
+
+#define HDR_VERSION_BITS	4
+#define HDR_FLAG_BITS		8
+/* Page CRC bits omitted, since it's not constant */
+#define HDR_TABWIDTH_BITS	4
+#define HDR_PRODNUM_BITS	12
+#define HDR_FILENUM_BITS	16
+
+
+/* Enough to hold one whole page of munged data */
+/* There is no point making this excessively large */
+#define PAGE_BUFFER_SIZE	8192
+
+#if PAGE_BUFFER_SIZE < (LINES_PER_PAGE + 2) * (LINE_LENGTH + PREFIX_LENGTH + 2)
+#error PAGE_BUFFER_SIZE is too small
+#endif
+
+
+/* Header flags */
+#define HDR_FLAG_LASTPAGE	0x01	/* Indicates last page of file */
+
+
+#define elemsof(array) (sizeof(array)/sizeof(*(array)))
+
+
+extern char const	hexDigits[];
+extern char const	radix64Digits[];
+
+extern signed char	hexDigitsInv[256];
+extern signed char	radix64DigitsInv[256];
+
+extern CRCPoly		crcCCITTPoly, crc24Poly, crc32Poly;
+
+extern EncodeFormat const		hexFormat, radix64Format;
+extern EncodeFormat const *		firstFormat;
+
+
+#define HexDigitValue(ch)		hexDigitsInv[(byte)(ch)]
+#define Radix64DigitValue(ch)	radix64DigitsInv[(byte)(ch)]
+
+/* Returns the number of chars needed to encode the given number of bits */
+#define EncodedLength(fmt, numBits)	\
+		(((numBits) + (fmt)->bitsPerDigit - 1) / (fmt)->bitsPerDigit)
+#define EncodeDigit(fmt, value)		((fmt)->digits[value])
+#define DecodeDigit(fmt, digit)		((fmt)->digitsInv[(byte)(digit)])
+
+#define AdvanceCRC(poly, crc, b)	\
+		(((crc) >> 8) ^ (poly)->table[((crc) ^ (b)) & 0xFF])
+
+#define RunningCRCFromPageCRC(fmt, pageCRC)	\
+		(((pageCRC) >> (fmt)->runningCRCShift) & (fmt)->runningCRCMask)
+
+
+CRC CalculateCRC(CRCPoly const *poly, CRC crc,
+				 byte const *buffer, size_t length);
+CRC ReverseCRC(CRCPoly const *poly, CRC crc, byte b);
+
+/* Returns the number of chars encoded */
+int EncodeCheckDigits(EncodeFormat const *fmt, word32 num,
+					  int numBits, char *dest);
+
+/* Returns 1 if there's an error */
+int DecodeCheckDigits(EncodeFormat const *fmt, char const *src, char **endPtr,
+					  int numBits, word32 *valuePtr);
+
+EncodeFormat const *FindFormat(char headerTypeChar);
+
+void InitUtil(void);
+
+
+#endif /* !UTIL_H */
+
+/*
+ * Local Variables:
+ * tab-width: 4
+ * End:
+ * vi: ts=4 sw=4
+ * vim: si
+ */
diff --git a/tools/yapp b/tools/yapp
new file mode 100644
index 0000000..ac78227
--- /dev/null
+++ b/tools/yapp
@@ -0,0 +1,286 @@
+#!/usr/bin/perl
+#
+# Yet another preprocessor
+#
+# $Id: yapp,v 1.5 1997/10/24 07:51:05 mhw Exp $
+#
+
+%vars = ('' => '$');
+@incPath = (".");
+
+sub Error
+{
+	print STDERR $_[0], "\n";
+	exit(1);
+}
+
+sub VarSubst
+{
+	my ($varName, $undefOkay) = @_;
+
+	if (defined($vars{$varName}))
+	{
+		return $vars{$varName};
+	}
+	elsif (!$undefOkay)
+	{
+		&Error("Undefined variable '$varName' in $fileName line $.");
+	}
+}
+
+sub NullFilter
+{
+	0;
+}
+
+sub IfFilter
+{
+	local $_ = $_[0];
+
+	if (/^##else(\s+.*)?/)
+	{
+		return 1;
+	}
+	elsif (/^##endif(\s+.*)?/)
+	{
+		return 2;
+	}
+	else
+	{
+		return 0;
+	}
+}
+
+sub DoFile
+{
+	local $fileName = $_[0];
+	my $path;
+	local *FILE;
+
+	if ($fileName =~ m|^/|)
+	{
+		$path = $fileName;
+	}
+	else
+	{
+		for $dir (@incPath)
+		{
+			if (-e "$dir/$fileName")
+			{
+				$path = "$dir/$fileName";
+				last;
+			}
+		}
+	}
+	if ($path eq "")
+	{
+		&Error("Can't find '$fileName', from $fileName line $.");
+	}
+
+	open(FILE, "<$path") || &Error("Can't open $path: $!");
+	&DoOpenFile(*FILE, *NullFilter, 0);
+	close(FILE) || die;
+	0;
+}
+
+sub DoPrepass
+{
+	local ($_, $skipFlag) = @_;
+
+	return "" if /^###/;
+	s/\s*###.*//;								# Strip comments
+	s/\$\{(\w*)\}/&VarSubst($1, $skipFlag)/eg;	# Do variable substitutions
+	$_;
+}
+
+sub DoOpenFile
+{
+	local *FILE = $_[0];
+	local *filter = $_[1];
+	my $skipFlag = $_[2];
+	my $result;
+	local $_;
+
+	while (<FILE>)
+	{
+		$_ = &DoPrepass($_, $skipFlag);
+		if ($result = &filter($_))
+		{
+			return $result;
+		}
+		elsif (/^##(\w*)(\s+(.*))?/)
+		{
+			my ($cmd, $params) = ($1, $3);
+
+			if ($cmd =~ /^if/)
+			{
+				my $condition;
+				my $ifStartLine = $.;
+
+				if ($cmd eq "if")
+				{
+					if ($params =~ /^(\d+)\s*$/)
+					{
+						$condition = int($1);
+					}
+					elsif ($params =~ /^(\d+)\s*([=!]=|[<>]=?)\s*(\d+)\s*$/)
+					{
+						my ($left, $op, $right) = ($1, $2, $3);
+
+						$condition = eval($left . $op . $right);
+					}
+					elsif ($params =~ /^(\S+)\s*(eq|ne)\s*(\S+)\s*$/)
+					{
+						my ($left, $op, $right) = ($1, $2, $3);
+
+						$left =~ s/([\\'])/\\$1/g;
+						$right =~ s/([\\'])/\\$1/g;
+						$condition = eval("'$left' $op '$right'");
+					}
+					else
+					{
+						&Error("Invalid ##if params: '$params' " .
+							   "in $fileName line $.");
+					}
+				}
+				elsif ($cmd =~ /^ifn?def$/)
+				{
+					if ($params =~ /^(\w+)\s*$/)
+					{
+						$condition = defined($vars{$1});
+						$condition = !$condition if ($cmd eq "ifndef");
+					}
+					else
+					{
+						&Error("Invalid ##$cmd param: '$params' " .
+							   "in $fileName line $.");
+					}
+				}
+
+				# Do main body of if
+				$result = &DoOpenFile(*FILE, *IfFilter,
+									  $skipFlag || !$condition);
+
+				if ($result == 1)	# an '##else' was found
+				{
+					# Handle else
+					$result = &DoOpenFile(*FILE, *IfFilter,
+										  $skipFlag || $condition);
+				}
+
+				if ($result == 1)	# a second '##else' was found
+				{
+					&Error("Two ##else's in a row in $fileName line $.");
+				}
+				elsif ($result == 0)	# EOF was encountered
+				{
+					&Error("Unterminated ##if " .
+						   "in $fileName line $ifStartLine");
+				}
+			}
+			elsif ($cmd eq "include")
+			{
+				if ($skipFlag)
+				{
+				}
+				elsif ($params =~ /^"(.*)"\s*$/)
+				{
+					my $incFile = $1;
+
+					&DoFile($incFile);
+				}
+				else
+				{
+					&Error("Invalid ##include params: '$params'");
+				}
+			}
+			elsif ($cmd eq "set")
+			{
+				if ($params =~ /^(\w+)=<<(")(.*)"\s*$/ or
+					$params =~ /^(\w+)=<<(')(.*)'\s*$/)
+				{
+					my $varName = $1;
+					my $quoteChar = $2;
+					my $endTag = $3 . "\n";
+					my $value;
+
+					while (<FILE>)
+					{
+						if ($_ eq $endTag)
+						{
+							chop $value;
+							last;
+						}
+						else
+						{
+							if ($quoteChar eq '"')
+							{
+								$_ = &DoPrepass($_, $skipFlag);
+							}
+							$value .= $_;
+						}
+					}
+					if (!$skipFlag)
+					{
+						$vars{$varName} = $value;
+					}
+				}
+				elsif ($params =~ /^(\w+)="(.*)"\s*$/ or
+					   $params =~ /^(\w+)=(\S*)\s*$/)
+				{
+					if (!$skipFlag)
+					{
+						$vars{$1} = $2;
+					}
+				}
+				else
+				{
+					&Error("Invalid ##set command: '$params'");
+				}
+			}
+			else
+			{
+				&Error("Unrecognized command: '$_'");
+			}
+		}
+		elsif (!$skipFlag)
+		{
+			print;
+		}
+	}
+	return 0;
+}
+
+$optEnable = 1;
+
+foreach (@ARGV)
+{
+	if ($optEnable and /^-/)
+	{
+		if (/^--$/)
+		{
+			$optEnable = 0;
+		}
+		elsif (/^-D(\w+)=(.*)$/)
+		{
+			$vars{$1} = $2;
+		}
+		elsif (/^-I(.*)$/)
+		{
+			unshift @incPath, $1;
+		}
+		else
+		{
+			&Error("Unrecognized option: '$_'");
+		}
+	}
+	else
+	{
+		&DoFile($_);
+	}
+}
+
+#
+# vi: ai ts=4
+# vim: si
+#
diff --git a/tools/yapp.doc b/tools/yapp.doc
new file mode 100644
index 0000000..94dfe4a
--- /dev/null
+++ b/tools/yapp.doc
@@ -0,0 +1,48 @@
+YAPP is a simple macro preprocessor designed to do minor tweaking to
+another program's inputs.
+
+In its input, anything of the form ${foo} is expanded with the variable
+named foo.  It is an error if ${foo} is not defined.
+If you need to escape a dollar sign for some reason, the variable
+with the empty-string name, ${}, has the value "$".
+
+The result of macro expansion is *not* re-expanded.  Expansion is done only
+when definitions are made.
+
+After variable expansion, lines are checked to see if they are control lines.
+Control lines begin with ## at the start of the line.  All such lines are
+deleted and do not appear in the output.  ### begins a comment.  The other
+commands are:
+
+##set variable=value
+
+value may have one of the following forms:
+token:  Trailing whitespace is stripped.  The token may not contain
+	any whitespace.  Use quotes if it's complicated.
+"string":  The string may have embedded quotes; whitespace after
+	the closing quote is ignored.
+<<"DELIM":  This is a here-document, and the value is all of the following
+	lines up until, but not including, the newline that precedes a line
+	that consists solely of DELIM, for any DELIM string.
+	The DELIM must be in quotes.  You have two options:
+	"DELIM": Expand macros in the body of the here-document.
+	'DELIM': Do not expand macros in the here-document.
+
+##include "filename": Insert the named file in place of the current line.
+
+##if num == num
+##if num != num
+##if num < num
+##if num > num
+##if num <= num
+##if num >= num
+##if token eq token
+##if token ne token
+##ifdef symbol
+##ifndef symbol
+##else
+##endif
+You can figure this one out.  Macros in between are expanded as usual
+(so the ##else or ##endif may be in a macro expansion), but the result
+is ignored.  String comparison is allowed only between simple words.
+##ifdef symbol is true if ${symbol} is defined.