Using Perl to parse a Unix mailbox

MadJuggla9

I wanted to write a little script to read my school mailbox and display it for me. It reads in my mailbox and parses it. It worked the first time I ran it, but now it keeps reprinting the same material and I don't know why. I have included the format of the mailbox and my current code.

mailbox:
Code:
From MAILER-DAEMON Wed Apr 20 12:14:20 2005
Date: Wed, 20 Apr 2005 12:14:20 -0500 (CDT)
From: Mail System Internal Data <[email protected]>
Subject: DON'T DELETE THIS MESSAGE -- FOLDER INTERNAL DATA
X-IMAP: 1078710465 0000001543
Status: RO

This text is part of the internal format of your mail folder, and is not
a real message.  It is created automatically by the mail system software.
If deleted, important folder data will be lost, and it will be re-created
with the data reset to initial values.


program:
Code:
#!/usr/bin/perl
print "Content-type: text/html\n\n";
print "<h1>My Mail Box</h1>\n";
open(F, "<mailbox");
$count = 0;

print "<table border=1 width=600 bgcolor=\"#EEEEEE\">\n";


# loop through mailbox
while (($line = <F>)) {

   # set variables in order
   chomp($line);
   if ($line =~ /^From /){
        $count++;
   }
   elsif ($line =~ /^Date:/){
        ($date_text,$date) = split(/:/,$line);
   }
   elsif ($line =~ /^From:/){
        ($from_text,$from) = split(/:/,$line);
   }
   elsif ($line =~ /^Subject:/){
        ($subject_text,$subject) = split(/:/,$line);
   }


   # begin printing table
   print "<tr>";
   print "<td width=30>$count   </td>";
   print "<td width=100>$from   </td>";
   print "<td width=200>$subject</td>";
   print "<td width=270>$date   </td>";
   print "</tr>";

   # times loops has been performed
   $times_looped++
}

print "</table>\n";

print $times_looped;

The loop executes exactly once per line of the mailbox, just like it should. Beneath some of the messages is some Unicode and other misc garbage. I don't see how that would match any of my if statements, but that was my initial thought. Any ideas? I had it working correctly when I just printed the message number and the subject.

Here is a link to what it is currently doing:
http://mars.utm.edu/~chrrgarn/cs226/perl/mail/checkmail.pl
 
Change the while loop so that it prints the variables only once per message, after they've all been read. Here are my changes:
Code:
#!/usr/bin/perl
print "Content-type: text/html\n\n";
print "<h1>My Mail Box</h1>\n";
open(F, "<mailbox");
$count = 0;

print "<table border=1 width=600 bgcolor=\"#EEEEEE\">\n";


# loop through mailbox
while (($line = <F>)) {

   # set variables in order
   chomp($line);
   if ($line =~ /^From /){
        $count++;
   }
   elsif ($line =~ /^Date:/){
        # limit of 2: split at the first colon only, so the time keeps its colons
        ($date_text,$date) = split(/:/,$line,2);
   }
   elsif ($line =~ /^From:/){
        ($from_text,$from) = split(/:/,$line,2);
   }
   elsif ($line =~ /^Subject:/){
        ($subject_text,$subject) = split(/:/,$line,2);
   }
   # A blank line ends the header block, so print the row there, once per message.
   # Assumption: every message has a Date: header; $date doubles as the "not printed yet" flag.
   if ($line eq '' && defined $date)
   {
    printtable();
    undef $date;
   }

   # times loops has been performed
   $times_looped++;
}

   # begin printing table
sub printtable
 {
   print "<tr>";
   print "<td width=30>$count   </td>";
   print "<td width=100>$from   </td>";
   print "<td width=200>$subject</td>";
   print "<td width=270>$date   </td>";
   print "</tr>";
 }

print "</table>\n";

print $times_looped;
HTH
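
A side note on pulling the value out of a header line: splitting on every colon cuts off values that contain colons themselves, such as the time in a Date: line. A limit of 2 or a regex capture keeps the value whole. A minimal sketch, where header_value is a made-up helper name and the sample line is the Date: line from the mailbox above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the value part of a "Name: value" header
# line, or undef if the line does not look like a header.
sub header_value {
    my ($line) = @_;
    return $line =~ /^[^:\s]+:\s*(.*)$/ ? $1 : undef;
}

my $line = 'Date: Wed, 20 Apr 2005 12:14:20 -0500 (CDT)';

# Splitting on every colon: the second field stops at the first colon of the time.
my ($name_all, $value_all) = split(/:/, $line);

# A limit of 2 splits at the first colon only, so the time survives
# (the leading space after the colon is still there).
my ($name_two, $value_two) = split(/:/, $line, 2);

print "split, no limit: [$value_all]\n";
print "split, limit 2:  [$value_two]\n";
print "regex capture:   [", header_value($line), "]\n";
```

The regex version also drops the space after the colon, which saves a trim later.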
 
unhappy_mage said:
Change the while loop so that it only prints the variables once, after they've all been read.

The message separator is nothing more than spaces, sadly. I'll give that code a try though, thanks.
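
In the sample mailbox above, each message starts with a "From " line and its header block ends at the first blank line, which gives the loop something concrete to test for. A rough sketch of that idea, not a full mbox parser: read_summaries and the in-memory sample are made up for illustration, and it assumes headers are not folded across lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect one summary (hash ref) per message.
# Assumes each message starts with a "From " line and that its header
# block ends at the first blank line after it.
sub read_summaries {
    my ($fh) = @_;
    my (@messages, $current);
    my $in_headers = 0;

    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^From /) {
            # Envelope line: keep a previous message that had no blank
            # line after its headers, then start a new one.
            push @messages, $current if $in_headers;
            $current = { count => scalar(@messages) + 1,
                         from => '', subject => '', date => '' };
            $in_headers = 1;
        }
        elsif ($in_headers && $line eq '') {
            # Blank line: headers are complete, record the message once.
            push @messages, $current;
            $in_headers = 0;
        }
        elsif ($in_headers && $line =~ /^(From|Subject|Date):\s*(.*)$/) {
            $current->{ lc($1) } = $2;
        }
        # Body lines fall through untouched because $in_headers is 0.
    }
    # A message whose headers run to end of file has no closing blank line.
    push @messages, $current if $in_headers;
    return @messages;
}

my $sample = <<'END';
From MAILER-DAEMON Wed Apr 20 12:14:20 2005
Date: Wed, 20 Apr 2005 12:14:20 -0500 (CDT)
Subject: DON'T DELETE THIS MESSAGE -- FOLDER INTERNAL DATA
Status: RO

This text is part of the internal format of your mail folder.
Subject: this body line must not be picked up

END

open(my $fh, '<', \$sample) or die "cannot open sample: $!";
my @messages = read_summaries($fh);
close($fh);

for my $m (@messages) {
    print "<tr><td>$m->{count}</td><td>$m->{from}</td>",
          "<td>$m->{subject}</td><td>$m->{date}</td></tr>\n";
}
```

Because the row is recorded at the blank line, each message produces exactly one row no matter how many body lines follow.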
 
A little ghetto:

Code:
# somewhere earlier, before opening the file
$print_msg=0;

# in your while loop
if ($line =~ /^From /)
{
  if (defined $date_text)
  {
    $print_msg=1;
    $date_text=undef; # assuming you don't do anything with this later
  }
  $count++;
}

...

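# include in while loop, after the elsif chain
if($print_msg)
{
  # $count was already incremented for the new message, so use $count-1 in this row
  print "<tr>";
  ...
  print "</tr>";
  $print_msg=0; # reset, otherwise every later line prints a row too
}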

# after while loop (for the last message)
print "<tr>";
...
print "</tr>";

print "</table>\n";
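
Filled in, the same idea (flush the previous message when the next "From " line shows up, then flush once more after the loop) might look like the sketch below. The names row and mailbox_rows and the example.com sample are made up for illustration. It flushes before incrementing the counter, so no flag variable is needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: format one table row from a count and a hash of fields.
sub row {
    my ($count, $field) = @_;
    my ($from, $subject, $date) =
        map { defined $field->{$_} ? $field->{$_} : '' } qw(from subject date);
    return "<tr><td>$count</td><td>$from</td><td>$subject</td><td>$date</td></tr>\n";
}

# Hypothetical helper: return all rows for the mailbox read from $fh.
sub mailbox_rows {
    my ($fh) = @_;
    my $count = 0;
    my %field;
    my $rows = '';

    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^From /) {
            # A new message starts: flush the previous one first, while
            # $count and %field still describe it.
            $rows .= row($count, \%field) if $count > 0;
            $count++;
            %field = ();
        }
        elsif ($line =~ /^(From|Subject|Date):\s*(.*)$/) {
            my ($name, $value) = (lc($1), $2);
            # Keep only the first occurrence so a body line that looks
            # like a header does not overwrite the real one.
            $field{$name} = $value unless exists $field{$name};
        }
    }
    # Nothing follows the last message, so flush it after the loop.
    $rows .= row($count, \%field) if $count > 0;
    return $rows;
}

# Made-up two-message mailbox for the demo.
my $sample = "From a\@example.com Wed Apr 20 12:14:20 2005\n"
           . "Subject: first\n\nbody\n\n"
           . "From b\@example.com Thu Apr 21 08:00:00 2005\n"
           . "Subject: second\n\nSubject: not a header\n\n";

open(my $fh, '<', \$sample) or die "cannot open sample: $!";
print mailbox_rows($fh);
close($fh);
```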
 
Just to throw this out there, maybe you can get your sysadmin to deliver mail in a less stupid format. Maildir makes so many problems like this completely trivial. It's really amazing to me that mbox has lasted so long.
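
To show why: a Maildir keeps every message in its own file under the new/ and cur/ subdirectories, so there is no separator to detect at all. A rough sketch, assuming that layout and unfolded headers; maildir_summaries, the temporary directory, and the file names are made up for the demo.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: one hash ref of headers per message file.
sub maildir_summaries {
    my ($maildir) = @_;
    my @summaries;
    for my $sub (qw(new cur)) {
        my $dir = "$maildir/$sub";
        opendir(my $dh, $dir) or next;    # skip a missing subdirectory
        my @files = sort grep { -f "$dir/$_" } readdir($dh);
        closedir($dh);
        for my $file (@files) {
            open(my $fh, '<', "$dir/$file") or next;
            my %header;
            while (my $line = <$fh>) {
                chomp $line;
                last if $line eq '';      # blank line ends the headers
                $header{ lc($1) } = $2
                    if $line =~ /^(From|Subject|Date):\s*(.*)$/;
            }
            close($fh);
            push @summaries, \%header;
        }
    }
    return @summaries;
}

# Build a throwaway Maildir with two made-up messages.
my $maildir = tempdir(CLEANUP => 1);
for my $sub (qw(tmp new cur)) {
    mkdir "$maildir/$sub" or die "mkdir $sub: $!";
}
my %demo = (
    '0001.demo' => "Subject: first\nDate: Wed, 20 Apr 2005 12:14:20 -0500\n\nbody\n",
    '0002.demo' => "Subject: second\n\nSubject: not a header\n",
);
for my $name (sort keys %demo) {
    open(my $out, '>', "$maildir/new/$name") or die "write $name: $!";
    print {$out} $demo{$name};
    close($out);
}

my @summaries = maildir_summaries($maildir);
print scalar(@summaries), " messages\n";
print "$_->{subject}\n" for @summaries;
```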
 