2008-12-17 00:16:40

by Tino Keitel

Subject: Very slow header cache in mutt if the maildir is on ext3

Hi,

I suffered from very long delays (about one minute) when opening
maildirs with a few thousand mails on my laptop. I did some testing on
my desktop in a very similar environment (LVM, dm-crypt, 2.5" hard
disk, identical mutt configuration), where it was as quick as expected.
However, on the desktop the maildir was on XFS. When I switched it to
ext3, things got slow again. Then I discovered that the mutt header
cache file is the culprit. I had already done a backup and restore, so
the file system is in a fresh state. I also removed the old header
cache before testing.

In strace, I see that read() and lseek() are called on the header
cache during the delay. If I flush all caches and then read only the
header cache back into the page cache using dd, it is as quick as
expected (only a few seconds).

The strange thing is that in both cases (maildir on ext3 and maildir on
XFS), the header cache file is stored on a separate XFS file system on
a different hard disk. I don't understand why reading the header cache
from this XFS file system is so slow when the maildir is on ext3, and
fast when the maildir is on XFS.

Here are strace excerpts for XFS and ext3, taken with "strace -f -s0
-tt" after doing "echo 7 > /proc/sys/vm/drop_caches":

http://tikei.de/mutt_hcache_xfs.txt
http://tikei.de/mutt_hcache_ext3.txt

Note the start and end times. The ext3 case took nearly 18 seconds, the
XFS case took 3 seconds.

Regards,
Tino


2008-12-17 00:52:24

by Tino Keitel

Subject: Re: Very slow header cache in mutt if the maildir is on ext3

On Wed, Dec 17, 2008 at 01:16:25 +0100, Tino Keitel wrote:

[...]

> The strange thing is that in both cases (maildir on ext3 and maildir on
> XFS), the header cache file is stored on a separate XFS file system on
> a different hard disk. I don't understand why reading the header cache
> from this XFS file system is so slow when the maildir is on ext3, and
> fast when the maildir is on XFS.
>
> Here are strace excerpts for XFS and ext3, taken with "strace -f -s0
> -tt" after doing "echo 7 > /proc/sys/vm/drop_caches":
>
> http://tikei.de/mutt_hcache_xfs.txt
> http://tikei.de/mutt_hcache_ext3.txt

OK, after glancing at the strace output again, I see that the seek
offsets are much more linear in the XFS case, whereas they are pretty
random in the ext3 case. I guess that this is connected to the order
of the files in the maildir, which depends on the FS type. So this is
a bug in mutt which makes reading the header cache dead slow if the
files are in an inconvenient order.

Thanks to Goswin Brederlo for hinting at the seek offsets.

Regards,
Tino

2008-12-17 01:03:47

by Tino Keitel

Subject: Re: Very slow header cache in mutt if the maildir is on ext3

On Wed, Dec 17, 2008 at 01:52:10 +0100, Tino Keitel wrote:

[...]

> OK, after glancing at the strace output again, I see that the seek
> offsets are much more linear in the XFS case, whereas they are pretty
> random in the ext3 case. I guess that this is connected to the order
> of the files in the maildir, which depends on the FS type. So this is
> a bug in mutt which makes reading the header cache dead slow if the
> files are in an inconvenient order.

Just for the record: I tested again on ext3 with dir_index disabled,
and the header cache was read as quickly as with XFS.

Regards,
Tino

2008-12-17 03:25:31

by Theodore Ts'o

Subject: Re: Very slow header cache in mutt if the maildir is on ext3

On Wed, Dec 17, 2008 at 01:52:10AM +0100, Tino Keitel wrote:
> OK, after glancing at the strace output again, I see that the seek
> offsets are much more linear in the XFS case, whereas they are pretty
> random in the ext3 case. I guess that this is connected to the order
> of the files in the maildir, which depends on the FS type. So this is
> a bug in mutt which makes reading the header cache dead slow if the
> files are in an inconvenient order.

I *thought* mutt had a patch which sorted the files returned by
readdir() by inode number, and then opened the files in inode number
order; maybe it was a distro-specific patch that was never pushed back
to mainline, though. In any case, sorting the list of directory
entries returned by readdir() by inode number does solve the problem
for ext3 with htree, and in general is a good optimization for most
filesystems. (See attached for an LD_PRELOAD hack that demonstrates
the optimization.)

There is also a fix in ext4 which partially addresses this problem,
which could be back-ported to ext3:

commit 240799cdf22bd789ea6852653c3b879d35ad0a6c
Author: Theodore Ts'o
Date:   Thu Oct 9 23:53:47 2008 -0400

    ext4: Use readahead when reading an inode from the inode table

    With modern hard drives, reading 64k takes roughly the same time as
    reading a 4k block. So request readahead for adjacent inode table
    blocks to reduce the time it takes when iterating over directories
    (especially when doing this in htree sort order) in a cold cache case.
    With this patch, the time it takes to run "git status" on a kernel
    tree after flushing the caches via "echo 3 > /proc/sys/vm/drop_caches"
    is reduced by 21%.

    Signed-off-by: "Theodore Ts'o"

- Ted


Attachments:
(No filename) (1.75 kB)
spd_readdir.tar.gz (3.47 kB)

2008-12-17 09:10:29

by Tino Keitel

Subject: Re: Very slow header cache in mutt if the maildir is on ext3

On Tue, Dec 16, 2008 at 22:25:17 -0500, Theodore Tso wrote:

[...]

> I *thought* mutt had a patch which sorted the files returned by
> readdir() by inode number, and then opened the files sorted by inode
> number order; maybe it was a distro-specific patch that was never
> pushed back to mainline, though. In any case, sorting the list of

It is in mutt upstream, and enabled by default, but only if
maildir_header_cache_verify is set. If not, the inode list is kept
unsorted. maildir_header_cache_verify is enabled by default, but I
disabled it in my muttrc.

Regards,
Tino

2008-12-17 12:38:42

by Theodore Ts'o

Subject: Re: Very slow header cache in mutt if the maildir is on ext3

On Wed, Dec 17, 2008 at 10:10:03AM +0100, Tino Keitel wrote:
> On Tue, Dec 16, 2008 at 22:25:17 -0500, Theodore Tso wrote:
>
> > I *thought* mutt had a patch which sorted the files returned by
> > readdir() by inode number, and then opened the files sorted by inode
> > number order; maybe it was a distro-specific patch that was never
> > pushed back to mainline, though. In any case, sorting the list of
>
> It is in mutt upstream, and enabled by default, but only if
> maildir_header_cache_verify is set. If not, the inode list is kept
> unsorted. maildir_header_cache_verify is enabled by default, but I
> disabled it in my muttrc.

I just checked mutt 1.5.17 in Ubuntu Hardy, and it sorts the inodes
even if maildir_header_cache_verify is unset. (It sorts them early if
that option is set; if they weren't sorted earlier, it sorts them a
little later in the function.) Check for calls to maildir_sort() that
use md_cmp_inode(); in my version of mutt, there are two such calls in
mh.c:maildir_delayed_parsing().

- Ted

2008-12-17 16:17:37

by Tino Keitel

Subject: Re: Very slow header cache in mutt if the maildir is on ext3

On Wed, Dec 17, 2008 at 07:32:04 -0500, Theodore Tso wrote:

[...]

> I just checked mutt 1.5.17 in Ubuntu Hardy, and it sorts the inodes
> even if maildir_header_cache_verify is unset. (It sorts it earlier if
> that option is set, but a little later in the function, if it wasn't
> sorted earlier, it sorts it then.) Check for calls to maildir_sort()
> that use md_cmp_inode(); in my version of mutt, there are two such
> calls in mh.c:maildir_delayed_parsing().

I checked maildir_sort(), and with maildir_header_cache_verify unset it
isn't called before the header cache is read. In the source, it looks
like this:

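#if USE_HCACHE
    if (option(OPTHCACHEVERIFY))
    {
      /* early sort: only reached with maildir_header_cache_verify set */
      DO_SORT();
      ret = stat(fn, &lastchanged);
    }

    ...

    /* header cache lookups, done in the current list order */
    if (ctx->magic == M_MH)
      data = mutt_hcache_fetch (hc, p->h->path, strlen);
    else
      data = mutt_hcache_fetch (hc, p->h->path + 3, &maildir_hcache_keylen);

    ...

#endif /* USE_HCACHE */

    /* late sort: runs only after the lookups above */
    DO_SORT();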

So if maildir_header_cache_verify is unset, DO_SORT() is called _after_
the header cache has been read. That is too late: the cache is read in
the order of the unsorted inode list, and the hard disk seeks itself
to death.

Regards,
Tino