If you are processing UUCP mailbox files, messages are separated by a line starting with "From " (i.e., the word "From" followed by a space). Some mail software prefixes body lines that start with "From " with a `>' to keep MUAs from incorrectly treating the line as a message separator. However, some mail software doesn't.
To avoid incorrect separator detection, many MUAs apply a stricter test for separators than just "From ".
MHonArc, by default, will treat lines starting with "From " as a message separator, which can lead to incorrect message termination if such a line in a message body has not been escaped with a `>'.
To fix the problem, use the MSGSEP resource to instruct MHonArc to use a stricter test for detecting a message separator. The following MSGSEP resource setting is known to work well:
<MsgSep> ^From \S+\s+\S+\s+\S+\s+\d+\s+\d+:\d+:\d+\s+\d+ </MsgSep>
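To see why the stricter pattern helps, the following Python sketch applies both tests to a few sample lines. The sample lines are made up for illustration. MHonArc itself evaluates MSGSEP as a Perl regular expression; this particular pattern only uses constructs that Python's re module treats the same way, so the comparison should carry over.

```python
import re

# Default-style test: any line starting with "From " counts as a separator.
LOOSE = re.compile(r"^From ")

# The stricter MSGSEP pattern from the text above.
STRICT = re.compile(r"^From \S+\s+\S+\s+\S+\s+\d+\s+\d+:\d+:\d+\s+\d+")

# Hypothetical sample lines (not taken from a real mailbox).
SAMPLES = [
    "From ehood@example.com Sun Apr 13 15:29:31 1997",  # real separator
    "From what I can tell, the patch works.",           # unescaped body line
    ">From what I can tell, the patch works.",          # escaped body line
]

def classify(line):
    """Return (loose_match, strict_match) for one line."""
    return (bool(LOOSE.search(line)), bool(STRICT.search(line)))

for line in SAMPLES:
    loose, strict = classify(line)
    print(f"loose={loose!s:5} strict={strict!s:5} {line}")
```

Only the first sample satisfies the stricter test; the unescaped body line is accepted by the loose test alone, which is the case that causes incorrect message termination.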
If this fails, you can try the CONLEN resource, available in v2.0. The CONLEN resource, when set, tells MHonArc to use the Content-Length fields in the message headers. If your MTA sets this field accurately (sendmail on Solaris does), then you can use this feature.
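The idea behind length-based splitting can be sketched in Python as follows. This is not MHonArc's implementation, only an illustration of the principle, and it assumes that every message carries a Content-Length field that counts the body bytes exactly. The helper names and sample messages are hypothetical.

```python
def split_by_content_length(data: bytes):
    """Split mailbox bytes into messages, trusting Content-Length."""
    messages = []
    pos = 0
    while pos < len(data):
        # The header block ends at the first blank line.
        head_end = data.find(b"\n\n", pos)
        if head_end == -1:
            messages.append(data[pos:])
            break
        length = None
        for line in data[pos:head_end].split(b"\n"):
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length" and value.strip().isdigit():
                length = int(value.strip())
        if length is None:
            # No usable field: this sketch gives up and keeps the rest whole.
            messages.append(data[pos:])
            break
        end = head_end + 2 + length  # skip the blank line, then the body
        messages.append(data[pos:end])
        pos = end
        # Skip blank lines between messages.
        while data[pos:pos + 1] == b"\n":
            pos += 1
    return messages

def make_message(sender: bytes, body: bytes) -> bytes:
    """Build a hypothetical mailbox entry with an accurate Content-Length."""
    head = b"From " + sender + b" Sun Apr 13 15:29:31 1997\n"
    head += b"Content-Length: " + str(len(body)).encode() + b"\n"
    return head + b"\n" + body

MBOX = (make_message(b"a@example.com", b"Hi,\nFrom here on, things differ.\n")
        + b"\n"
        + make_message(b"b@example.com", b"Second message.\n"))

by_length = split_by_content_length(MBOX)
naive_count = sum(1 for l in MBOX.split(b"\n") if l.startswith(b"From "))
print(len(by_length), "messages by Content-Length;",
      naive_count, 'lines start with "From "')
```

Because the body is skipped by byte count, the unescaped "From here on" line in the first message is never examined as a possible separator.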
No. In order to achieve the same effect, you must add the original, unprocessed, message to the destination archive, then remove the appropriate HTML version of the message from the source archive.
Yes. The following was contributed by Stephane Bortzmeyer:
- Subject: Improvment to MHonArc FAQ
- From: bortzmeyer@internatif.org (Stephane Bortzmeyer)
- Date: Sun, 13 Apr 1997 15:29:31 +0200
... some text deleted ...
Having rmed my database :-( I had to write such a program. I include it at the end, it seems quite simple, while necessiting a few text edition after (you just have to include the output of my program in an empty database).

    #!/usr/local/bin/perl

    require 'timelocal.pl';
    require '/web/mail/MHonArc/lib/mhutil.pl';
    require '/web/mail/MHonArc/lib/mhtime.pl';

    $dir = shift (@ARGV);

    opendir (DIR, "$dir") || die "Cannot open $dir: $!";
    while ($file = readdir (DIR)) {
        if ($file =~ /^msg([0-9]+)\.html$/) {
            $no = $1;
            open (FILE, "< $dir/$file") || die "Cannot open $file: $!";
            while (<FILE>) {
                chop;
                if (/^<!--X-([^:]*): (.*)-->$/) {
                    $headers{$1} = $2;
                    $headers{$1} =~ s/ *$//;
                }
            }
            close (FILE);
            @date = &parse_date ($headers{'Date'});
            $date = &get_time_from_date ($date[1], $date[2], $date[3],
                                         $date[4], $date[5], $date[6]);
            $id = "$date $no";
            print STDERR "Message $id:\n";
            foreach $header (keys (%headers)) {
                print STDERR "$header: $headers{$header}\n";
                $name = $header;
                $name =~ s/-//;
                $$name{$id} = $headers{$header};
            }
        }
    }
    closedir (DIR);

    print "%ContentType = (\n";
    foreach $key (keys (%ContentType)) {
        print "\'$key\', \'$ContentType{$key}\',\n";
    }
    print ");\n";

    print "%Date = (\n";
    foreach $key (keys (%Date)) {
        print "\'$key\', \'$Date{$key}\',\n";
    }
    print ");\n";

    print "%From = (\n";
    foreach $key (keys (%From)) {
        print "\'$key\', \'$From{$key}\',\n";
    }
    print ");\n";

    print "%MsgId = (\n";
    foreach $key (keys (%MessageId)) {
        print "\'$key\', \'$MessageId{$key}\',\n";
    }
    print ");\n";

    print "%Subject = (\n";
    foreach $key (keys (%Subject)) {
        print "\'$key\', \'$Subject{$key}\',\n";
    }
    print ");\n";

    print "%IndexNum = (\n";
    foreach $key (keys (%MessageId)) {
        ($garbage, $num) = split (' ', $key);
        print "\'$key\', \'$num\',\n";
    }
    print ");\n";
Yes. MHonArc performs archive locking to protect against multiple MHonArc processes attempting to write to an archive at the same time. This locking allows MHonArc to be used safely to add messages as they are received.
The following example assumes you are on a Unix system using sendmail as the mail transfer agent. Please refer to documentation about sendmail if you are not familiar with it (sendmail, 2ed, from O'Reilly is an excellent source).
The approach shown here uses a .forward file in the home directory of the account whose mail you want archived. For this example, let's assume it is my account. Here is how to set up the .forward file to invoke MHonArc on incoming mail:
\ehood, "|/home/ehood/bin/webnewmail #ehood"
The "\ehood" tells sendmail to still deposit the incoming message in my mail spool file. The "#ehood" Bourne shell comment is needed to ensure the command is distinct from that of any other user. Otherwise, sendmail may not invoke the program for you or the other user.
webnewmail is a Perl program that calls MHonArc with the appropriate arguments. A wrapper program is used instead of calling MHonArc directly to keep the .forward file simple, but you can call MHonArc directly if you want. Here is the code to the webnewmail program:
    #!/usr/local/bin/perl
    # Edit above path to point to where perl is on your system.

    ## Specify a package to protect names from MHonArc.
    ## MHonArc uses package main for most stuff; a minor
    ## inconvenience.
    package webnewmail;

    ## Edit to point to installed mhonarc.
    $MHonArc = "/home/ehood/bin/mhonarc";

    ## Define ARGV (ARGV is same across all packages).
    ## Edit options as required/desired.
    @ARGV = ("-add", "-quiet", "-outdir", "/home/ehood/public_html/newmail");

    ## Just require mhonarc, this prevents the overhead of a
    ## fork/exec. We reset the namespace to main just in-case.
    package main;
    require $webnewmail'MHonArc;    # Or, $webnewmail::MHonArc (Perl 5 style)
The webnewmail program has to have the executable bit set. This is achieved by using "chmod a+x webnewmail".
No. This is outside MHonArc's scope. You can grow your own filter, using the method described in the previous question, to scan the message header and invoke MHonArc with the proper arguments. Or you can use a tool like Procmail (http://www.ii.com/internet/robots/procmail/). Here are some messages from users about using Procmail:
- From: Achim Bohnet <ach@rosat.mpe-garching.mpg.de>
- Date: Wed, 13 Nov 1996 13:56:08 +0100
... some text deleted ...
Here is what I use in .procmailrc to archive the mhonarc list:

    NEWDATE="`/usr/bin/date +%Y-%m`"
    MHONARC_MBOX="/local/mail/lists/mhonarc/$NEWDATE.mbox"

    :0: $MHONARC_MBOX$LOCKEXT
    * ^Sender:.*owner-mhonarc@
    {
        :0 c
        $MHONARC_MBOX

        :0 c
        | /local/mail/mhonarc-1.2.2/mailarchive -add mhonarc "$NEWDATE"
    }

Mailarchive is nothing more than a wrapper around mhonarc with my long. list of options.

Achim

P.S. Procmail itself comes with an example manual page. It's worth looking into it.
- From: "Eric D. Friedman" <friedman@hydra.acs.uci.edu>
- Date: Wed, 13 Nov 1996 06:38:42 -0800
You can actually dispense with the wrapper if you use environment variables to pass options to MHonArc, but I'm sure Achim has a good reason for doing it his way. Just for the purposes of comparion, here's how I do it:

    eeeweb% cat .procmailrc
    #Set on when debugging
    VERBOSE=off

    #Replace `mail' with your mail directory (Pine uses mail, Elm uses Mail)
    MAILDIR=$HOME/Mail

    #Directory for storing procmail log and rc files
    PMDIR=$HOME/.procmail

    #Path and options for mhonarc
    MHONARC='/dcs/packages/infosys/bin/mhonarc -add -quiet -umask 022 -idxfname index.html'

    :0
    * ^Originator:.*@classes.uci.edu
    {
        MHHOME=$HOME/classarc
        LOGFILE=$PMDIR/classlists.log
        INCLUDERC=$PMDIR/rc.classlists
    }

    :0 E
    {
        MHHOME=$HOME/mail-arc
        LOGFILE=$PMDIR/otherlists.log
        INCLUDERC=$PMDIR/rc.otherlists
    }

and then in the file .procmail/rc.classlists or rc.otherlists (depending on the Originator: of the message), lots of the following:

    # Procmail Entry for uci-www
    :0 E
    * ^TOuci-www
    {
        :0 c
        uci-www/.

        :0
        |$MHONARC -rcfile $MHHOME/uci-www/0-rcfile.html -outdir $MHHOME/uci-www
    }

Eric D. Friedman
friedman@uci.edu
- From: Paul McKinley <mckinley@austin.asc.slb.com>
- Date: Mon, 21 Apr 1997 15:29:08 -0500
... some text deleted ...
I use procmail to drive mhonarc archives from Majordomo. I set up a single pseudouser and drive several archives from the one pseudouser. Here's a sample .forward file:

    "|/usr/ucb/rsh cappuccino \"set IFS=' '; exec /usr/local/procmail/bin/procmail #widget\""

Another example is:

    "|/bin/csh -c \"set IFS=' '; exec /usr/local/procmail/bin/procmail #widget\""

Two reasons to use the "rsh cappuccino":
1. doesn't require the user to be able to login to server, although the username must still be valid
2. gets the processing load off the mail server

Here's an example .procmail recipe:

    LOGFILE=$HOME/procmail_errors
    LOGABSTRACT=all
    LOCKEXT=.lock
    VERBOSE=on
    UMASK=003

    # widget: list short description
    :0 H
    * ^List-Name: widget
    {
        # The rotate call (under construction) does archive rotation
        # leave commented!
        #:0c i
        #| /home/web-arch/bin/rotate /usr/local/web/webarchive/widget

        # Put the mail in the mailbox, which is used by archiver to re-generate
        # the html indexes
        :0 cA
        /usr/local/web/webarchive/widget/current/mbox

        # The mhonarc call examines mbox, turns the mail messages into .html
        # documents, and compiles the indexes.
        # -reverse -treverse\
        :0 ia
        | /usr/local/mhonarc/bin/mhonarc \
            -idxfname index.shtml \
            -tidxfname threads.shtml \
            -rcfile widget.rc\
            -outdir /usr/local/web/webarchive/widget/current \
            /usr/local/web/webarchive/widget/current/mbox
    }

I have a directory per archive, and put the current period in directory "current". Then I have an index page per archive that indexes the periods, plus gives information about the list and how to subscribe/unsubscribe. The widget.rc file resides in the pseudouser's home directory.

Note the

    * ^List-Name: widget

I put the following in the majordomo list's config file:

    message_headers << END
    List-Name: widget
    END

This adds the "List-Name" header to messages, which is what procmail filters for.

Hope this helps

Paul McKinley
Unix SysAdmin Contractor
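For readers who would rather write their own filter than use Procmail, the approach mentioned at the start of this answer (scan the message header, then invoke MHonArc with the proper arguments) might look roughly like the Python sketch below. The paths, the Sender test, and the archive directories are hypothetical placeholders. The -add, -quiet, and -outdir options are the ones used by the webnewmail wrapper in the previous answer, and the message is passed to MHonArc on standard input, as that wrapper does.

```python
import email
import subprocess
import sys

# Hypothetical locations; edit for your own system.
MHONARC = "/home/ehood/bin/mhonarc"
DEFAULT_OUTDIR = "/home/ehood/public_html/newmail"
# Hypothetical mapping: text found in the Sender field -> archive directory.
ARCHIVES = {
    "owner-mhonarc@": "/home/ehood/public_html/mhonarc",
}

def choose_outdir(raw_message: bytes) -> str:
    """Pick an archive directory based on the Sender header."""
    msg = email.message_from_bytes(raw_message)
    sender = str(msg.get("Sender", ""))
    for needle, outdir in ARCHIVES.items():
        if needle in sender:
            return outdir
    return DEFAULT_OUTDIR

def build_command(raw_message: bytes) -> list:
    """Build the MHonArc command line for one incoming message."""
    return [MHONARC, "-add", "-quiet", "-outdir", choose_outdir(raw_message)]

def main() -> None:
    """Entry point for use from a .forward file: message arrives on stdin."""
    raw = sys.stdin.buffer.read()
    subprocess.run(build_command(raw), input=raw, check=True)

# Demonstration only: show the command chosen for a made-up message.
SAMPLE = b"Sender: owner-mhonarc@example.com\nSubject: test\n\nbody\n"
print(" ".join(build_command(SAMPLE)))
```

In real use the script would call main() instead of printing the demonstration, and the .forward entry would point at the script in the same way it points at webnewmail.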
Yes. If MHonArc sees that no archive exists when performing an add, it will automatically create the archive.
Make sure the file maillist.html (or the value of the IDXFNAME resource) does not exist if no archive exists and -add has been specified. Otherwise, unpredictable output of the maillist.html file may result if maillist.html is not in the proper format.
Big gaps in the message number sequence may occur if you have defined the MAXSIZE resource and you have MHonArc rescan a mail folder to add new messages. The problem occurs when MHonArc reads in messages that will automatically get deleted due to MAXSIZE; the messages subject to automatic deletion are the oldest ones. If the input contains old messages that will get deleted at the end of processing, the old messages will still use up message numbers, since the messages to be deleted are not determined until all input is read. Since MHonArc does not keep information about deleted messages, if the messages are fed into MHonArc again, the "jumping" will occur again (and the jump will get larger with each additional update).
To avoid the problem, try to pass only new, never-processed messages to MHonArc instead of having MHonArc rescan the same mail folder for new messages. Another approach is to set either the EXPIREAGE or EXPIREDATE resource (available in v2.0 beta 2 or later). These work as an alternative to MAXSIZE and help prevent message number jumping, since expiration of a message is checked when it is initially read (bypassing the assignment of a message number).
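The numbering behavior described above can be illustrated with a toy model in Python. This is a simplified simulation, not MHonArc code: it identifies each message by its date, assumes numbering starts at 0, and assumes the oldest messages beyond the size limit are dropped (and forgotten) only after all input has been read.

```python
def update(archive, next_num, batch, maxsize):
    """One simulated run. archive maps message date -> message number."""
    for date in batch:
        if date not in archive:      # unknown to the archive, so treated as new
            archive[date] = next_num
            next_num += 1
    # Deletion is decided only after all input is read: drop the oldest.
    excess = max(0, len(archive) - maxsize)
    for date in sorted(archive)[:excess]:
        del archive[date]            # no record of the deletion is kept
    return next_num

def simulate(rescan, maxsize=3):
    """Return the surviving message numbers after three updates."""
    archive, next_num = {}, 0
    folder = [1, 2, 3, 4, 5]         # message dates; 1 is the oldest
    next_num = update(archive, next_num, folder, maxsize)
    for new_date in (6, 7):          # one new message arrives per update
        folder.append(new_date)
        batch = folder if rescan else [new_date]
        next_num = update(archive, next_num, batch, maxsize)
    return sorted(archive.values())

print("rescanning the folder:", simulate(rescan=True))   # [4, 7, 11]
print("new messages only:    ", simulate(rescan=False))  # [4, 5, 6]
```

In the rescanning case the deleted old messages consume numbers again on every run, so the gaps widen; feeding only new messages keeps the sequence contiguous in this model.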
This condition may occur when you have MHonArc examine the same folder periodically to add any new messages. If there are messages in the folder without message-ids, then those messages will be re-added each time MHonArc runs.
Why? Well, MHonArc uses message-ids for determining if a message has been archived, or not. Therefore, if a message-id is missing for a message, then MHonArc believes it is new.
In general, mail has message-ids. They get assigned by MTAs. However, if messages are generated by a CGI program, or other non-mail-specific software, then the program in question should create a message-id. Otherwise, you will need to move already-processed messages into a different area so MHonArc does not read them again.
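As an illustration of having the generating program supply its own message-id, here is a minimal Python sketch using the standard email package. The function name, the addresses, and the example.com domain are placeholders; a CGI program written in another language would need to do the equivalent with its own tools.

```python
from email.message import EmailMessage
from email.utils import formatdate, make_msgid

def build_message(sender, recipient, subject, body, domain="example.com"):
    """Build a message that always carries a Message-ID and Date."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg["Date"] = formatdate(localtime=True)
    # make_msgid() returns a new identifier of the form <...@domain>.
    msg["Message-ID"] = make_msgid(domain=domain)
    msg.set_content(body)
    return msg

# Hypothetical form submission turned into a mail message.
first = build_message("webform@example.com", "list@example.com",
                      "Feedback", "Submitted through the web form.\n")
second = build_message("webform@example.com", "list@example.com",
                       "Feedback", "Submitted through the web form.\n")
print(first["Message-ID"])
```

Each generated message gets its own identifier even when the content is identical, so a later rescan of the folder can recognize it as already archived.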