Lost email - possible file system bug (ext4)

This is puzzling.

I am using “fetchmail” to receive mail from a remote POP3 server. It, in turn, passes the mail to the local SMTP server (I am using “sendmail”). From there, it goes to the local delivery agent, in this case “slocal” (part of the “nmh” package).

What happened: fetchmail reported two messages received. But only one showed up.

Here are the sendmail logs:


Oct 29 18:10:47 nwr2 sendmail[19637]: q9TNAloP019637: from=<prvs=642155cc3=rickert@cs.niu.edu>, size=3116, class=0, nrcpts=1, msgid=<525565778.3065491351534034611.JavaMail.r4cq5@inhas00326.bi-ext.com>, proto=ESMTP, daemon=MTA, relay=localhost [127.0.0.1]
Oct 29 18:10:47 nwr2 sendmail[19640]: q9TNAloP019637: to=<rickert+holl@localhost>, ctladdr=<rickert+holl@localhost> (1001/100), delay=00:00:00, xdelay=00:00:00, mailer=slocal, pri=33324, relay=holl, dsn=2.0.0, stat=Sent
Oct 29 18:10:48 nwr2 sendmail[19637]: q9TNAloQ019637: from=<marc.welters@orange.fr>, size=865098, class=0, nrcpts=1, msgid=<1173106376.57670.1351539428159.JavaMail.www@wwinf1x08>, proto=ESMTP, daemon=MTA, relay=localhost [127.0.0.1]
Oct 29 18:10:48 nwr2 sendmail[19643]: q9TNAloQ019637: to=<rickert+holl@localhost>, ctladdr=<rickert+holl@localhost> (1001/100), delay=00:00:01, xdelay=00:00:00, mailer=slocal, pri=895287, relay=holl, dsn=2.0.0, stat=Sent

Those show the two messages as handled sequentially. The delivery of the first message is reported before receipt of the second message. So this should not be due to a race condition.
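The sequencing argument can be checked mechanically. The following is a minimal sketch, not part of the original setup: it uses abbreviated copies of the four log lines above (only the fields needed here), and the helper names `events` and `handled_sequentially` are made up for illustration. It assumes that sendmail writes the `to=` line only after the mailer has returned.

```python
import re

# Abbreviated copies of the four sendmail log lines quoted above.
LOG = """\
Oct 29 18:10:47 nwr2 sendmail[19637]: q9TNAloP019637: from=<prvs=642155cc3=rickert@cs.niu.edu>, size=3116
Oct 29 18:10:47 nwr2 sendmail[19640]: q9TNAloP019637: to=<rickert+holl@localhost>, mailer=slocal, stat=Sent
Oct 29 18:10:48 nwr2 sendmail[19637]: q9TNAloQ019637: from=<marc.welters@orange.fr>, size=865098
Oct 29 18:10:48 nwr2 sendmail[19643]: q9TNAloQ019637: to=<rickert+holl@localhost>, mailer=slocal, stat=Sent
"""

# Captures: time of day, queue id, and whether the line is a "from" or "to" record.
LINE = re.compile(
    r"^\w{3} +\d+ (\d\d:\d\d:\d\d) \S+ sendmail\[\d+\]: (\w+): (from|to)="
)


def events(log_text):
    """Return (time, queue_id, kind) tuples in the order they were logged."""
    out = []
    for line in log_text.splitlines():
        m = LINE.match(line)
        if m:
            kind = "received" if m.group(3) == "from" else "delivered"
            out.append((m.group(1), m.group(2), kind))
    return out


def handled_sequentially(evts):
    """True if each message is logged as delivered before the next is received."""
    pending = set()
    for _, queue_id, kind in evts:
        if kind == "received":
            if pending:
                return False  # a new message arrived while one was undelivered
            pending.add(queue_id)
        else:
            pending.discard(queue_id)
    return True


print(handled_sequentially(events(LOG)))  # True
```

This only shows the order in which sendmail logged the events. It says nothing about what slocal did internally between being started and returning.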

In my mh mailbox, the second message shows up as message 40. What should have happened is that the first message showed up as message 40 and the second as message 41 (the last old message in that mailbox is 39).

My “.maildelivery” file directs where to deliver. I checked it. There are no rules that call for a message to be deleted. Both messages should have been delivered, but one is missing.

Note that I have not seen this happen before.

I am not particularly concerned about the lost email, which I am sure was spam. I am troubled that there is an apparent bug. And it seems most likely to have been a filesystem bug. The file system is ext4.

On 2012-10-30 01:06, nrickert wrote:
> And it seems
> most likely to have been a filesystem bug. The file system is ext4.

I doubt it. You need more clues to blame the filesystem :-)


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” (Minas Tirith))

The two most likely possibilities that I can think of are:

“slocal” failed to deliver the message, yet returned status 0.

“slocal” successfully delivered the first message (it should have been message 40). The second “slocal” would have started by reading the directory to find the highest-numbered message there, and presumably it did not find message 40.
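The second scenario can be sketched as follows. This is only an illustration of the "highest number plus one" scheme described above, not slocal's actual code: the function `next_message_number` is a hypothetical helper, and the stale directory listing is simulated by filtering the result of `os.listdir()`.

```python
import os
import re
import tempfile


def next_message_number(names):
    """Return one more than the highest all-digit name (1 if there is none)."""
    numbers = [int(n) for n in names if re.fullmatch(r"[0-9]+", n)]
    return max(numbers, default=0) + 1


with tempfile.TemporaryDirectory() as folder:
    # Old messages 1..39 already in the folder.
    for n in range(1, 40):
        with open(os.path.join(folder, str(n)), "w") as f:
            f.write("old message\n")

    # First delivery: 39 is the highest, so the message is written as 40.
    first = next_message_number(os.listdir(folder))
    with open(os.path.join(folder, str(first)), "w") as f:
        f.write("first message\n")

    # Second delivery with a correct directory listing picks 41.
    correct = next_message_number(os.listdir(folder))

    # Second delivery with a listing that (hypothetically) omits the
    # newly created file picks 40 again.
    stale_listing = [n for n in os.listdir(folder) if n != str(first)]
    stale = next_message_number(stale_listing)

print(first, correct, stale)  # 40 41 40
```

If the second delivery then created file 40 without an exclusive-create check, it would replace the first message, which would match what is in the mailbox. Whether slocal creates the file that way is an assumption here, not something checked against its source.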

I’ll grant that “slocal” is an old and buggy program. Yet the first possibility seems unlikely. And while the second possibility could be caused by a bug in readdir(), that too seems unlikely. That’s what suggests a file system bug.

Granted, zillions of other things could have gone wrong, but most are very unlikely.

(added in edit) - I am not “blaming” the file system. I listed that as only a possibility. Given the small amount of evidence, it would be difficult to track this down to find what actually happened.

On 2012-10-30 04:56, nrickert wrote:

> Granted, zillions of other things could have gone wrong, but most are
> very unlikely.

Unfortunately, I’m not familiar with your combination of software. I use fetchmail, postfix, amavisd-new, procmail, spamassassin, and dovecot.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” (Minas Tirith))