''' Hey EMACS, I'm an -*-Nroff-*- file.
.bp
.NH 1
Trouble-shooting Notes
.XS
\*(SN Trouble-shooting Notes
.XE
.PP
Sometimes, Notes does not work.  Here are some hints for diagnosing
problems and recovering from them.
.NH 2
Permissions
.XS
\*(SN Permissions
.XE
.PP
The most common cause of mysterious behavior from a program in the
notesfile system is incorrect permissions.  When notes is acting up,
it is always worth the effort to carefully check all the programs,
files, and directories.  There are many ways to wreck the permissions,
but creating a file as root or moving files to a new directory are
fairly common methods.
.PP
Since notes runs setgid, the files and directories need to have the
group permissions open.  User programs should be setgid only; the
utility programs also need to be setgid.  The utility programs are usually
installed both setuid and setgid, but it really doesn't matter, since
most of them refuse to proceed if they are not run by the notesfile
owner.  The network reception programs (nfrcv and newsinput) are
generally setuid and setgid.  None of the scripts need to be setgid
or setuid.
.PP
All directories in the spool area and archive area must be 0770, and
all files there must be 0660.  You could set permissions to 0775 and
0664, but there is little useful that a user can do with a raw
notesfile (besides running "du").  All files and directories should be
owned by the notesfile owner and must be owned by the notesfile group.
If in doubt, write a find command to set the correct permissions.
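.PP
One way to write such a find is sketched below.  The function name
fix_spool_perms is an illustrative assumption, not part of the Notes
distribution.  Run it as root or as the notesfile owner, and try it on
a scratch copy first.
.DS
```shell
# fix_spool_perms DIR GROUP
# Sketch only: force 0770 on directories and 0660 on files under DIR,
# then hand everything under DIR to GROUP.
fix_spool_perms() {
    dir=$1
    group=$2
    find "$dir" -type d -exec chmod 0770 {} +
    find "$dir" -type f -exec chmod 0660 {} +
    find "$dir" -exec chgrp "$group" {} +
}
```
.DE
A call might look like "fix_spool_perms /usr/spool/notes notes", where
both the spool path and the group name are placeholders for the values
used at your site.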
.PP
The home directory for the notesfile owner should be 0775.  The
subdirectories should be 0775 and all non-executable files should be
0660 or 0664.  Setting the files to 0664 is very handy (especially for
net.log), since it allows an administrator to inspect the notesfile
system without becoming "root" or "notes".
.PP
The binaries in LIBDIR should be 0754, with the exception of nfrcv
and newsinput which must be 0755.  Allowing read permissions for 
world makes it easy to check the RCS ident strings.
.NH 2
The nfmaint Notesfile
.XS
\*(SN The nfmaint Notesfile
.XE
.PP
Problems and events in the notesfile system are logged to the
notesfile "nfmaint".  Most of the messages really are self-explanatory,
but a couple require special attention here.
.PP
The marvelous message "Failure: net.jokes" means that either newsinput
or nfrcv was unable to open the net.jokes notesfile.  This is usually
a permission problem, though it can be the result of a damaged
notesfile.  To get this message, an open has to fail on one of the
files that make up the notesfile (ACCESS.LIST, NOTE.INDEX,
RESP.INDEX, and BODY.TEXT).  If any of the files are inaccessible or
missing, the open will fail.  The files are always opened for
read/write.
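.PP
A quick check of the four files can be sketched as follows.  The
function name check_notesfile is an illustrative assumption.  Run it
under the same user and group as newsinput or nfrcv; otherwise the
result says nothing about what those programs can open.
.DS
```shell
# check_notesfile DIR
# Sketch only: report any of the four component files that is missing
# or that the current user cannot both read and write.
check_notesfile() {
    nf=$1
    bad=0
    for f in ACCESS.LIST NOTE.INDEX RESP.INDEX BODY.TEXT
    do
        if [ ! -f "$nf/$f" ]; then
            echo "$nf/$f: missing"
            bad=1
        elif [ ! -r "$nf/$f" ] || [ ! -w "$nf/$f" ]; then
            echo "$nf/$f: not open for read/write"
            bad=1
        fi
    done
    return $bad
}
```
.DE
The function returns 0 when all four files are present and accessible,
and 1 otherwise.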
.PP
If the message claims that the errno is "Not a Teletype", then that
information is probably not relevant.  Errno is set to ENOTTY
elsewhere in the program.  Look for another cause.
.PP
Any message complaining about getnrec, putnrec, getrrec, or putrrec
probably indicates a corrupted notesfile.  These messages usually come
from an error in a read() or write().
.NH 2
Examining Headers
.XS
\*(SN Examining Headers
.XE
.PP
When you are looking at a note or response in notes, the 'H' command
will show you all the headers.  The "Path:" header line will tell you
where that particular note came from.  All the headers are explained
in RFC-850, "Standard for USENET Text Messages", which is included
with the Notes documentation.
'''.NH 2
'''Verbose Messages
'''.XS
'''\*(SN Verbose Messages
'''.XE
.NH 2
Duplicates
.XS
\*(SN Duplicates
.XE
.PP
When News articles are translated to old notes format and back, it is
possible for the articles returning from oldnotes to not match the
original.  Notes 2.7 avoids this problem by not using oldnotes format
at all.  It can however exchange articles in oldnotes format with
older Notes implementations.  This will almost always cause
duplicates.
.PP
Each article has a unique identifier.  In the B News format, this is
called the Message-ID, and looks like this: <1234.abdc@newsvax.UUCP>.
It may be an arbitrary string, but really should have the form:
.DS
\fB<\fIUnique-string\fB@\fIsystem-name\fB>\fR
.DE
In the oldnotes format, the unique identifier looks like this:
.DS
\fIsystem-name\fB:\fIlong-integer\fR
.DE
The system name must be less than 32 characters (less than 10
characters in Notes 1.3), and never has a domain.  Notes 2.7 will
truncate system names to 31 characters when sending in oldnotes
format.
.PP
If the unique-string part of the message-ID cannot be converted to an
integer, a -1 will be put into the integer and the entire message-ID
will be stored in the system-name part.  All message-ID's generated by
News 2.10.3 are of this type.
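.PP
The conversion described above can be sketched as follows.  This is an
illustration, not the Notes source: the function name to_oldnotes_id is
hypothetical, "can be converted to an integer" is assumed to mean "is
all digits", and the angle brackets are assumed to be kept when the
whole message-ID is stored in the system-name part.
.DS
```shell
# to_oldnotes_id MESSAGE-ID
# Sketch only: print the oldnotes identifier (system-name:long-integer)
# for a Message-ID of the form <unique-string@system-name>.
to_oldnotes_id() {
    msgid=$1
    inner=${msgid#"<"}        # drop the leading bracket
    inner=${inner%">"}        # drop the trailing bracket
    unique=${inner%%@*}       # text before the at sign
    system=${inner#*@}        # text after the at sign
    case $unique in
    *[!0-9]*|'')
        # Not an integer: keep the whole Message-ID as the name.
        system=$msgid
        number=-1
        ;;
    *)
        system=${system%%.*}  # oldnotes names never carry a domain
        number=$unique
        ;;
    esac
    # oldnotes system names hold at most 31 characters
    system=$(printf '%s' "$system" | cut -c1-31)
    echo "$system:$number"
}
```
.DE
Under these assumptions, <1234@newsvax.UUCP> becomes newsvax:1234,
while <1234.abdc@newsvax.UUCP> is stored whole with an integer of -1.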
.NH 3
Domain Duplicates
.XS
\*(SN Domain Duplicates
.XE
.PP
When the domain is stripped from the system-name part to transform it
into an oldnotes system-name, information is lost.  A Message-ID made
from the notes unique ID will not match the original.  Notes 2.7
attempts to handle this.  When it checks for a match between two
Message-ID's, one with a domain, and one without, it ignores the
domain in the match.
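.PP
The matching rule can be illustrated with a short sketch.  The names
strip_domain and same_msgid are hypothetical, and the actual comparison
inside Notes 2.7 may differ in detail.
.DS
```shell
# strip_domain MESSAGE-ID
# Sketch only: print the Message-ID with any domain removed from the
# system name, so <unique@host.dom> becomes <unique@host>.
strip_domain() {
    id=$1
    left=${id%%@*}
    right=${id#*@}
    right=${right%">"}
    echo "$left@${right%%.*}>"
}

# same_msgid ID1 ID2
# Succeed if the two Message-IDs match, ignoring the domain only when
# exactly one of them has one.
same_msgid() {
    a=$1
    b=$2
    [ "$a" = "$b" ] && return 0
    case ${a#*@} in *.*) a_dom=1 ;; *) a_dom=0 ;; esac
    case ${b#*@} in *.*) b_dom=1 ;; *) b_dom=0 ;; esac
    [ "$a_dom" != "$b_dom" ] || return 1
    [ "$(strip_domain "$a")" = "$(strip_domain "$b")" ]
}
```
.DE
With this sketch, <1234@newsvax.UUCP> matches <1234@newsvax>, but it
does not match <1234@newsvax.ARPA>, because both of those carry a
domain.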
.NH 3
Truncation Duplicates
.XS
\*(SN Truncation Duplicates
.XE
.PP
If the system name is longer than 31 characters, it will be truncated.
This is more likely if the entire message-ID has to be stored in the
system-name part.  Notes 2.7 does not eliminate truncation duplicates.
.PP
If the note passes through a Notes 1.3 system, it is likely to have
garbage tacked on after the tenth character (if the system-name was 10
chars or longer).
.NH 3
-100 Duplicates
.XS
\*(SN -100 Duplicates
.XE
.PP
Previous News/Notes gateways multiplied the integer part of the
message-ID by -100 when going from News to Notes, and undid it coming
back.  Notes 2.7 never multiplies by -100, though it will undo that
multiplication.  This can cause duplicates on the oldnotes side of
things, if there are two gateways (an old and a new) talking to the
same set of notes sites.
.PP
It is possible to get an "arithmetic truncation" duplicate if the
multiplication by -100 overflowed the 32-bit signed integer.
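.PP
As a worked example of the overflow, a 32-bit signed integer holds
values from -2147483648 to 2147483647, so the product survives only
when the original integer lies between -21474836 and 21474836.  The
function name survives_x100 below is an illustrative assumption.
.DS
```shell
# survives_x100 N
# Sketch only: succeed if N times -100 still fits in a 32-bit signed
# integer, that is, if -2147483648 <= N * -100 <= 2147483647.
survives_x100() {
    n=$1
    [ "$n" -ge -21474836 ] && [ "$n" -le 21474836 ]
}
```
.DE
Any integer part outside that range may come back from an old gateway
with a different value, and so may be treated as a new article.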
.NH 2
Corrupted Notesfiles
.XS
\*(SN Corrupted Notesfiles
.XE
.PP
In many cases, there is little that can be done about a corrupted
notesfile unless you are a real notes guru.  Generally, you should
restore it from a backup and ask a site that sends you articles to
reset your sequencer to the backup time.  It might be easier to simply
get a tape of the notesfiles from the other site rather than get them
off of your own backups.  This will work fine if you are running
compatible versions of Notes (one way to find out ...).
.PP
It is possible to repair some kinds of damage if you don't have
backups available.  See the section "Repairing Corrupted Notesfiles"
(below) for some hints.
.PP
To tell if a notesfile is corrupted, dump it with "newsoutput -A"
(older versions of Notes had the nfdump command, now obsolete).  If
the dump succeeds, then the notesfile is probably OK.  If it fails,
you may be able to tell where the problem is by looking at the dump.
If only one note is bad, then a director or the notes owner can zap
the offending notesstring ('z' command on the director page).
Sometimes the very last notesstring is corrupt, usually due to
nfarchive running out of disk space (see below).
.PP
Obviously, a corrupt notesfile is a serious bug.  Please try to track
down the problem if it happens to you.
.NH 2
Corrupted Notesfiles and Free Space
.XS
\*(SN Corrupted Notesfiles and Free Space
.XE
.PP
Notesfiles can be corrupted when the notes spool area runs out of free
space.  This can happen during notes reception (newsinput) or during
notes archiving (nfarchive).  Nfarchive makes a copy of the notesfile,
then deletes the old one.  Running nfarchive, even with the
\fB-d\fP (delete) option, requires extra space.  The free space must
be at least as large as the largest notesfile that you archive.
.PP
It is a very good idea to keep a safe amount of free space in the
Notes spool area.  Keep at least enough space to hold a weekend worth
of Notes traffic plus the space needed for nfarchive (equal to your
largest notesfile).  On a system that receives all Usenet newsgroups,
this requires about five to ten Megabytes.  If you run nfarchive every
night, this should suffice.  If you run nfarchive once per week, you'll
need more padding.
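.PP
The minimum requirement for nfarchive can be checked with a sketch like
the one below.  It assumes that each notesfile is a directory directly
under the spool directory, and the three function names are
hypothetical.  It does not account for the weekend of traffic, so treat
a success as the bare minimum.
.DS
```shell
# largest_notesfile_kb SPOOL
# Sketch only: print the size in kilobytes of the largest
# subdirectory directly under SPOOL.
largest_notesfile_kb() {
    du -sk "$1"/*/ | sort -n | tail -n 1 | awk '{print $1}'
}

# free_kb DIR
# Print the free kilobytes on the file system holding DIR.
free_kb() {
    df -kP "$1" | awk 'NR == 2 {print $4}'
}

# enough_for_nfarchive SPOOL
# Succeed if the free space is at least the largest notesfile.
enough_for_nfarchive() {
    [ "$(free_kb "$1")" -ge "$(largest_notesfile_kb "$1")" ]
}
```
.DE
A nightly cron job could run enough_for_nfarchive on the spool
directory before nfarchive and skip the archive run when it fails.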
.NH 2
Repairing Corrupted Notesfiles
.XS
\*(SN Repairing Corrupted Notesfiles
.XE
.PP
\fIMissing NOTE.INDEX or RESP.INDEX:\fP This is a serious problem.
Restore the notesfile from backups, if possible. If that is not
possible, and the information is vital, make a copy of BODY.TEXT and
start to work with an editor.  Emacs is much better than vi for this
task, because the file may contain lines longer than 512 characters,
which vi will truncate for you.  Vi also may have trouble with any
binary info in the file.
.PP
The header lines will show up without the identifying fields
("mark@cbosgd" instead of "From: mark@cbosgd"), but they should be in
the same order every time.  Fields which are not recognized by Notes
(like "Expires:") will be grouped together at the end untouched.
Each field will be null-terminated.  Dates are stored as a time_t (see
<sys/types.h>).  The text of the message follows the header.
.PP
\fITruncated NOTE.INDEX, RESP.INDEX, or BODY.TEXT:\fP Zap the last
notesstring.
.PP
\fIMissing or corrupt ACCESS.LIST:\fP Copy a good ACCESS.LIST from
another notesfile, then edit the permissions to be correct from the
director page.
.NH 2
Problems Transferring Notes
.XS
\*(SN Problems Transferring Notes
.XE
.PP
Rnews and nfrcv can exit with one of four exit values.  A value of 0
(zero) means success, of course.  A 1 (one) means that some fatal error
occurred.  A 2 (two) means that there was no notesfile by that name,
and that none could be created.  A 3 (three) means that the incoming
news batch was in the wrong format or garbled.
.PP
An exit value larger than 128 means that a signal terminated the
program and that a core file was left.  Subtract 128 from the returned
value to find the number of the signal.  Signal numbers are documented
in signal(2).  The signal+128 encoding is the return from a wait(2)
system call.  When looking up the signal number, use the documentation
from the remote system (where the error occurred).  Signal numbers are
not the same on all Unix systems.
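.PP
The exit values above can be decoded with a small sketch.  The function
name explain_exit is an illustrative assumption.  It prints the signal
number but does not name the signal, because the numbering depends on
the remote system.
.DS
```shell
# explain_exit VALUE
# Sketch only: print the meaning of an rnews or nfrcv exit value,
# following the description in the text.
explain_exit() {
    v=$1
    if [ "$v" -gt 128 ]; then
        echo "killed by signal $((v - 128)), core file left"
        return
    fi
    case $v in
    0) echo "success" ;;
    1) echo "fatal error" ;;
    2) echo "no such notesfile, and none could be created" ;;
    3) echo "news batch in wrong format or garbled" ;;
    *) echo "undocumented exit value $v" ;;
    esac
}
```
.DE
For instance, "explain_exit 139" reports signal 11, which should then
be looked up in signal(2) on the system where the error occurred.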
.PP
A message like "rnews (DENIED)" means that uuxqt is not allowed to execute
rnews.  Add it to "/usr/lib/uucp/L.cmds".
.PP
When you do run into problems receiving batches, it is very helpful to
capture batches for a while.  Later, those can be fed into rnews, perhaps
even using a debugger.  Substitute one of these one-line scripts for
/bin/rnews.  The first only saves each batch; the second saves it and
also passes it on to newsinput:
.DS
cat > /usr/spool/rnews/batch$$
tee /usr/spool/rnews/batch$$ | /usr/lib/notes/newsinput
.DE
Remember to change this back when you are done.  Captured batches can
eat a lot of disk space.
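.PP
Feeding the captured batches back in can be sketched as follows.  The
function name replay_batches is hypothetical, and the sketch assumes
the command reads the batch from its standard input, as newsinput does
in the tee script.
.DS
```shell
# replay_batches DIR COMMAND [ARGS...]
# Sketch only: feed each captured batch file in DIR to COMMAND on its
# standard input, stopping at the first batch that fails.
replay_batches() {
    dir=$1
    shift
    for batch in "$dir"/batch*
    do
        [ -f "$batch" ] || continue
        if "$@" < "$batch"; then
            echo "$batch: ok"
        else
            echo "$batch: exit value $?"
            return 1
        fi
    done
}
```
.DE
For example, "replay_batches /usr/spool/rnews /usr/lib/notes/newsinput"
replays everything captured by the scripts above and stops at the first
batch that newsinput rejects.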
