safecat(1) General Commands Manual safecat(1)
NAME
safecat - safely write data to a file
SYNOPSIS
safecat tempdir destdir
INTRODUCTION
safecat is a program which implements Professor Daniel Bernstein's
maildir algorithm to copy stdin safely to a file in a specified
directory. With safecat, the user is offered two assurances. First, if
safecat returns a successful exit status, then all data is guaranteed
to be saved in the destination directory. Second, if a file exists in
the destination directory, placed there by safecat, then the file is
guaranteed to be complete.
When saving data with safecat, the user specifies a destination
directory, but not a file name. The file name is selected by safecat to
ensure that no filename collisions occur, even if many safecat
processes and other programs implementing the maildir algorithm are
writing to the directory simultaneously. If particular filenames are
desired, then the user should rename the file after safecat completes.
In general, when spooling data with safecat, a single, separate process
should handle naming, collecting, and deleting these files. Examples
of such a process are daemons, cron jobs, and mail readers.
RELIABILITY ISSUES
A machine may crash while data is being written to disk. For many
programs, including many mail delivery agents, this means that the data
will be silently truncated. Using Professor Bernstein's maildir
algorithm, every file is guaranteed complete or nonexistent.
Many people or programs may write data to a common "spool" directory.
Systems like mh-mail store files using numeric names in a directory.
Incautious writing to files can result in a collision, in which one
write succeeds and the other appears to succeed but fails. Common
strategies to resolve this problem involve creation of lock files or
other synchronizing mechanisms, but such mechanisms are subject to
failure. Anyone who has deleted $HOME/.netscape/lock in order to start
netscape can attest to this. The maildir algorithm is immune to this
problem because it uses no locks at all.
THE MAILDIR ALGORITHM
As described in maildir(5), safecat applies the maildir algorithm by
writing data in six steps. First, it stat()s the two directories
tempdir and destdir, and exits unless both directories exist and are
writable. Second, it stat()s the name tempdir/time.pid.host, where
time is the number of seconds since the beginning of 1970 GMT, pid is
the program's process ID, and host is the host name. Third, if stat()
returned anything other than ENOENT, the program sleeps for two
seconds, updates time, and tries the stat() again, a limited number of
times. Fourth, the program creates tempdir/time.pid.host. Fifth, the
program NFS-writes the message to the file. Sixth, the program link()s
the file to destdir/time.pid.host. At that instant the data has been
successfully written.
In addition, safecat starts a 24-hour timer before creating
tempdir/time.pid.host, and aborts the write if the timer expires. Upon
error, timeout, or normal completion, safecat attempts to unlink()
tempdir/time.pid.host.
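The six steps and the cleanup can be sketched roughly in POSIX shell. This is an illustration only, not the safecat source: the function name maildir_write and the retry limit of five are assumptions, and the sketch omits both the NFS-safe write checks and the 24-hour timer.

```shell
# maildir_write TEMPDIR DESTDIR : copy stdin to a new file in DESTDIR.
# Hypothetical helper; it only mirrors the steps described above.
maildir_write() {
    mw_tmp=$1
    mw_dest=$2
    # Step 1: both directories must exist and be writable.
    [ -d "$mw_tmp" ] && [ -w "$mw_tmp" ] || return 1
    [ -d "$mw_dest" ] && [ -w "$mw_dest" ] || return 1
    # Steps 2 and 3: choose time.pid.host, retrying while the name is taken.
    mw_tries=0
    while :; do
        mw_name="$(date +%s).$$.$(uname -n)"
        [ -e "$mw_tmp/$mw_name" ] || break
        mw_tries=$((mw_tries + 1))
        [ "$mw_tries" -lt 5 ] || return 1   # retry limit is an assumption
        sleep 2
    done
    # Step 4: create the temporary file; noclobber (set -C) refuses to
    # overwrite a file that appeared in the meantime.
    ( set -C; : > "$mw_tmp/$mw_name" ) || return 1
    # Step 5: write stdin to the temporary file.
    cat >> "$mw_tmp/$mw_name" || { rm -f "$mw_tmp/$mw_name"; return 1; }
    # Step 6: link into the destination; on success the data is in place.
    if ln "$mw_tmp/$mw_name" "$mw_dest/$mw_name"; then mw_rc=0; else mw_rc=1; fi
    # Remove the temporary name on success and on failure alike.
    rm -f "$mw_tmp/$mw_name"
    [ "$mw_rc" -eq 0 ] && printf '%s\n' "$mw_name"
    return "$mw_rc"
}
```

For example, "echo hello | maildir_write Maildir/tmp Maildir/new" would print the chosen file name if every step succeeds.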
EXIT STATUS
An exit status of 0 (success) implies that all data has been safely
committed to disk. A non-zero exit status should be considered to mean
failure, though there is an outside chance that safecat wrote the data
successfully, but didn't think so.
Note again that if a file appears in the destination directory, then it
is guaranteed to be complete.
If safecat completes successfully, then it will print the name of the
newly created file (without its path) to standard output.
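A caller can combine the exit status with the printed name. The following is a minimal sketch; the function name spool_stdin and the wording of its messages are illustrative assumptions, not part of safecat.

```shell
# spool_stdin TEMPDIR DESTDIR : store stdin with safecat and report
# where the data went. Hypothetical wrapper for illustration.
spool_stdin() {
    if ss_name=$(safecat "$1" "$2"); then
        # Exit status 0: the data is saved, and ss_name holds the file name.
        printf 'stored %s/%s\n' "$2" "$ss_name"
    else
        # Non-zero exit status: treat the data as not saved.
        printf 'safecat failed\n' >&2
        return 1
    fi
}
```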
SUGGESTED APPLICATIONS
Exciting uses for safecat abound, obviously, but a word may be in order
to suggest what they are.
If you run Linux and use qmail instead of sendmail, you should consider
converting your inbox to maildir for its superior reliability. If your
home directory is NFS mounted, qmail forces you to use maildir. On the
downside, the lovely tool procmail, which filters your spam, does not
know maildir. Rather than running the patched procmail, you might
consider using safecat to deliver to your inbox. That allows you to use
the latest procmail without waiting for the maildir patches to be
applied to it.
(Note: the previous paragraph was written before procmail started
handling maildir delivery. Since maildir delivery has been added, my
point is made stronger! Procmail's maildir support does not comply with
Dan's algorithm, and so does not offer the reliability promised by
maildir delivery. Procmail plus safecat has always offered reliable
maildir delivery. Another victory for modularity!)
If you write CGI applications to collect data over the World Wide Web,
you might find safecat useful. Web applications suffer from two major
problems. Their performance suffers from every stoppage or bottleneck
in the internet; they cannot afford to introduce performance problems
of their own. Additionally, web applications should NEVER leave the
server and database in an inconsistent state. This is likely, however,
if CGI scripts directly frob some database, particularly if the
database is overloaded or slow. What happens when users get bored and
click "Stop" or "Back"? Maybe the database activity completes. Maybe
the CGI script is killed, leaving the DB in an inconsistent state.
Consider the following strategy. Make your CGI script dump its request
to a spool directory using safecat. Immediately return a receipt to
the browser. Now the browser has a complete guarantee that their
submission is received, and the perceived performance of your web
application is optimal.
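The CGI side of this strategy might look like the sketch below. The function name handle_request, the response text, and the use of the file name as a receipt are assumptions made for illustration.

```shell
# handle_request SPOOLTMP SPOOLNEW : spool the request body from stdin
# and answer the browser at once. Hypothetical CGI helper.
handle_request() {
    if hr_name=$(safecat "$1" "$2"); then
        # The request is safely spooled; the file name serves as a receipt.
        printf 'Content-Type: text/plain\r\n\r\n'
        printf 'Request received; receipt %s\n' "$hr_name"
    else
        printf 'Status: 500 Internal Server Error\r\n'
        printf 'Content-Type: text/plain\r\n\r\n'
        printf 'Request could not be saved; please retry.\n'
        return 1
    fi
}
```

A CGI script would call it as "handle_request /path/to/spool/tmp /path/to/spool/new", with the request body on standard input.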
Meanwhile, a spooler daemon notices the fresh request, snatches it, and
updates the database. Browsers can be informed that their request will
be fulfilled in X minutes. The result is optimal performance despite a
capricious internet. In addition, users can be offered nearly 100%
reliability.
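One pass of such a spooler could be sketched as follows. The names drain_spool and the handler argument are hypothetical; the handler stands in for whatever updates the database. Because every file in the destination directory is complete, the loop needs no locking against writers.

```shell
# drain_spool SPOOLDIR HANDLER : run HANDLER on each spooled file and
# delete the file once HANDLER succeeds. Hypothetical helper.
drain_spool() {
    for ds_file in "$1"/*; do
        # Skip the unexpanded pattern (empty directory) and non-files.
        [ -f "$ds_file" ] || continue
        if "$2" "$ds_file"; then
            rm -f "$ds_file"
        fi
        # If HANDLER fails, the file stays in place for the next pass.
    done
}
```

A daemon or cron job would call drain_spool repeatedly on the destination directory.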
EXAMPLES
To convince sendmail to use maildir for message delivery, add the
following line to your .forward file:
|SAFECAT HOME/Maildir/tmp HOME/Maildir/new || exit 75 #USERNAME
where SAFECAT is the complete path of the safecat program, HOME is the
complete path to your home directory, and USERNAME is your login name.
Making this change is likely to pay off; many campuses and companies
mount user home directories with NFS. Using maildir to deliver to your
inbox folder helps ensure that your mail will not be lost due to some
NFS error. Of course, if you are a System Administrator, you should
consider switching to qmail.
To run a program and catch its output safely into some directory, you
can use a shell script like the following.
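#!/bin/bash
MYPROGRAM=cat            # The program you want to run
TEMPDIR=/tmp             # A temporary directory (same file system as DESTDIR)
DESTDIR=$HOME/work/data  # The directory for storing information
# Run a command; if it fails, print NO on stderr as a marker.
try() { "$@" 2>/dev/null || echo NO 1>&2; }
# Collect the file name printed by safecat, plus any NO markers.
set -- `( try $MYPROGRAM | try safecat $TEMPDIR $DESTDIR ) 2>&1`
test "$?" = "0" || exit 1
# On failure, delete the file safecat created (if any) and give up.
test "$1" = "NO" && { test -n "$2" && rm -f "$DESTDIR/$2"; exit 1; }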
This script illustrates the pitfalls of writing secure programs with
the shell. The script assumes that your program might generate some
output, but then fail to complete. There is no way for safecat to know
whether your program completed successfully or not, because of the
semantics of the shell. As a result, safecat might create a file in
the data directory which is "complete" but not useful. The shell
script deletes the file in that case.
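The same idea can be sketched without parsing a marker out of the output, by recording the producer's exit status in a scratch file. The function name spool_output is hypothetical, and mktemp, although widely available, is not guaranteed on every system.

```shell
# spool_output TEMPDIR DESTDIR COMMAND [ARGS...] : pipe COMMAND into
# safecat and keep the file only if both succeeded. Hypothetical helper.
spool_output() {
    so_tmp=$1
    so_dest=$2
    shift 2
    so_status=$(mktemp) || return 1
    # The pipeline reports only safecat's exit status, so the producer's
    # status is written to a scratch file before the pipe is closed.
    so_name=$( { "$@"; echo "$?" >"$so_status"; } | safecat "$so_tmp" "$so_dest" )
    so_rc=$?
    so_prog=$(cat "$so_status")
    rm -f "$so_status"
    if [ "$so_rc" -eq 0 ] && [ "$so_prog" = 0 ]; then
        printf '%s\n' "$so_name"
        return 0
    fi
    # Something failed: remove the file if safecat managed to create one.
    [ -n "$so_name" ] && rm -f "$so_dest/$so_name"
    return 1
}
```

For example, "spool_output /var/spool/x/tmp /var/spool/x/new myprogram" (paths and program name are placeholders) prints the new file name only when myprogram and safecat both exit with status 0.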
More generally, the safest way to use safecat is from within a C
program which invokes safecat with fork() and execve(). The parent
process can then simply kill() the safecat process if any problems
develop, and optionally can try again. Whether to go to this trouble
depends upon how serious you are about protecting your data. Either
way, safecat will not be the weak link in your data flow.
BUGS
In order to perform the last step and link() the temporary file into
the destination directory, both directories must reside in the same
file system. If they do not, safecat will quietly fail every time. In
Professor Bernstein's implementation of maildir, the temporary and
destination directories are required to belong to the same parent
directory, which essentially avoids this problem. We relax this
requirement to provide some flexibility, at the cost of some risk.
Caveat emptor.
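Before committing to a pair of directories, it may be worth checking that a hard link between them actually works. The probe below is a sketch; the function name can_link and the probe file name are assumptions, and safecat itself performs no such check.

```shell
# can_link TEMPDIR DESTDIR : succeed if a file created in TEMPDIR can
# be hard-linked into DESTDIR. Hypothetical pre-flight check.
can_link() {
    cl_probe=".linkprobe.$$"
    # Create an empty probe file in the temporary directory.
    : > "$1/$cl_probe" || return 1
    # ln without -s makes a hard link, which fails across file systems.
    if ln "$1/$cl_probe" "$2/$cl_probe" 2>/dev/null; then
        cl_rc=0
    else
        cl_rc=1
    fi
    rm -f "$1/$cl_probe" "$2/$cl_probe"
    return "$cl_rc"
}
```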
Although safecat cleans up after itself, it may sometimes fail to
delete the temporary file located in tempdir. Since safecat times out
after 24 hours, you may freely delete any temporary files older than 36
hours. Files newer than 36 hours should be left alone. A system of
data flow involving safecat should include a cron job to clean up
temporary files, or should obligate consumers of the data to do the
cleanup, or both. In the case of qmail, mail readers using maildir are
expected to scan and clean up the temporary directory.
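A cleanup job might be sketched as follows. The function name clean_tempdir is hypothetical, and judging age by modification time is an assumption. Portable find(1) counts age in whole days, so the sketch waits 48 hours, which is more conservative than the 36-hour rule above.

```shell
# clean_tempdir TEMPDIR : delete stale temporary files.
# Hypothetical helper, suitable for calling from a cron job.
clean_tempdir() {
    # -mtime +1 matches files last modified at least 48 hours ago,
    # safely past the 36-hour threshold.
    find "$1" -type f -mtime +1 -exec rm -f {} +
}
```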
The guarantee of safe delivery of data is only "as certain as UNIX will
allow." In particular, a disk hardware failure could result in safecat
concluding that the data was safe, when it was not. Similarly, a
successful exit status from safecat is of no value if the computer, its
disks and backups all explode at some subsequent time.
In other words, if your data is vital to you, then you won't just use
safecat. You'll also invest in good equipment (possibly including a
RAID disk), a UPS for the server and drives, a regular backup schedule,
and competent system administration. For many purposes, however,
safecat can be considered 100% reliable.
CREDITS
The maildir algorithm was devised by Professor Daniel Bernstein, the
author of qmail. Parts of this manpage borrow directly from maildir(5)
by Professor Bernstein. In particular, the section "THE MAILDIR
ALGORITHM" transplants his explanation of the maildir algorithm in
order to illustrate that safecat complies with it.
The code for safecat was written by the present author, and not
borrowed explicitly from qmail code. However, qmail code certainly
influenced the present author, since it was studied at great length in
order to understand the algorithm precisely.
Copyright (c) 2000, Len Budney. All rights reserved.
SEE ALSO
mbox(5), qmail-local(8), maildir(5)
safecat(1)