2007-12-18 11:28:17

by Damien Wyart

Subject: Important regression with XFS update for 2.6.24-rc6

Hello,

As a follow-up to
<http://marc.info/?l=linux-kernel&m=119796120524618&w=2> (LKML seems
down right now so I am not linking to it), I have detected an important
problem with these two patches: after applying them by hand (downloaded
them raw from SGI's gitweb) on top of 2.6.24-rc5-git5 (they have not yet
been pulled into mainline by Linus as of this morning) for testing
purposes, I noticed upon reboot that "ls -l" on directories with many
files and subdirectories (around 5000 entries) takes several hundred
MB of RAM and then dies with a "memory exhausted" error.

I also noticed that ldconfig takes a lot of time to complete, and
firefox seems also to eat much more memory than usual. Reverting the two
patches (going back to vanilla rc5-git5) makes these problems go away.
I am not able to test right now if only one of the patches is bogus or
if both of them are concerned.

As the symptoms are easy to reproduce, I guess this is some kind of
brown paper bag bug and will be easy for XFS experts to spot.


Best,

--
Damien Wyart


2007-12-18 12:25:20

by David Chinner

Subject: Re: Important regression with XFS update for 2.6.24-rc6

On Tue, Dec 18, 2007 at 12:28:04PM +0100, Damien Wyart wrote:
> Hello,
>
> As a follow-up to <http://marc.info/?l=linux-kernel&m=119796120524618&w=2>
> (LKML seems down right now so I am not linking to it), I have detected an
> important problem with these two patches: after applying them by hand
> (downloaded them raw from SGI's gitweb) on top of 2.6.24-rc5-git5 (they have
> not yet been pulled into mainline by Linus as of this morning) for testing
> purposes, I noticed upon reboot that "ls -l" on directories with many files
> and subdirectories (around 5000 entries) takes several hundred MB of RAM
> and then dies with a "memory exhausted" error.

Ok. I haven't noticed anything wrong with directories up to about 250,000
files in the last few days. The ls -l I just did on a directory with
15000 entries (btree format) used about 5MB of RAM. extent format
directories appear to work fine as well (tested 500 entries).

Can you:

    a) isolate the problem to one patch or the other. My guess
       would be the directory mod, but.....
    b) show your working ;)
        - what platform (i386, x86_64, etc)
        - what debug options
        - commands and output that shows the problem
        - strace of ls -l going bad
        - xfs_info from filesystem in question

> I also noticed that ldconfig takes a lot of time to complete, and firefox
> seems also to eat much more memory than usual. Reverting the two patches
> (going back to vanilla rc5-git5) makes these problems go away. I am not
> able to test right now if only one of the patches is bogus or if both of
> them are concerned.

Well, there goes a).....

> As the symptoms are easy to reproduce, I guess this is some kind of brown
> paper bag bug and will be easy for XFS experts to spot.

Well, not reproducible on my test boxes. It may well be a brown paper
bag job, but it's not obvious.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2007-12-18 14:32:44

by Damien Wyart

Subject: Re: Important regression with XFS update for 2.6.24-rc6

* David Chinner <[email protected]> [071218 13:24]:
> Ok. I haven't noticed anything wrong with directories up to about
> 250,000 files in the last few days. The ls -l I just did on
> a directory with 15000 entries (btree format) used about 5MB of RAM.
> extent format directories appear to work fine as well (tested 500
> entries).

Ok, nice to know the problem is not so frequent.

> Can you:

> a) isolate the problem to one patch or the other. My guess
> would be the directory mod, but.....

Yes, it is indeed the directory patch. With the patched kernel I still
sometimes get huge memory usage with ls, but this is quite rare; the
main problem now is that entries are repeated in the listing and the
ls process takes longer than without the patch. This happens mainly
after booting. I guess the cache plays a role: even using drop_caches,
I can't reproduce the problem. Only on a fresh reboot do I get the
repeated entries systematically, and the memory problem much less
often. As said earlier, after a fresh boot on rc5-git5 without the
directory patch, ls -l behaves normally (no repeated entries).
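
For anyone wanting to check for the repeated-entries symptom without scanning ls output by eye, here is a minimal sketch in userspace C, assuming a POSIX system. The function name count_duplicate_entries is made up for this illustration and is not part of any XFS tool. It reads a directory with readdir() and counts names that come back more than once, which should not happen in a directory that is not being modified during the scan.

```c
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int cmp_names(const void *a, const void *b)
{
    return strcmp(*(char *const *)a, *(char *const *)b);
}

/*
 * Hypothetical helper: returns how many directory entries in 'path'
 * repeat a name already returned by readdir(), or -1 on error.
 */
long count_duplicate_entries(const char *path)
{
    DIR *dir = opendir(path);
    char **names = NULL;
    size_t used = 0, cap = 0, i;
    struct dirent *de;
    long dups = 0;
    int failed = 0;

    if (!dir)
        return -1;

    /* Collect every name the kernel hands back, in order. */
    while ((de = readdir(dir)) != NULL) {
        size_t len = strlen(de->d_name) + 1;

        if (used == cap) {
            size_t ncap = cap ? cap * 2 : 256;
            char **tmp = realloc(names, ncap * sizeof(*names));
            if (!tmp) {
                failed = 1;
                break;
            }
            names = tmp;
            cap = ncap;
        }
        names[used] = malloc(len);
        if (!names[used]) {
            failed = 1;
            break;
        }
        memcpy(names[used], de->d_name, len);
        used++;
    }
    closedir(dir);

    /* After sorting, repeated names sit next to each other. */
    if (!failed) {
        qsort(names, used, sizeof(*names), cmp_names);
        for (i = 1; i < used; i++)
            if (strcmp(names[i - 1], names[i]) == 0)
                dups++;
    }

    for (i = 0; i < used; i++)
        free(names[i]);
    free(names);
    return failed ? -1 : dups;
}
```

Calling count_duplicate_entries(".") in the crowded directory right after boot should return 0 on a good kernel; a positive value would match the repeated entries described above. On an affected kernel the listing may grow very large before it ends, so this check can use a lot of memory, just as ls does.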

> b) show your working ;)

Sorry, I forgot this part in my initial report.

> - what platform (i386, x86_64, etc)

i386.

> - what debug options

Nothing special, the kernel has 4K stacks, and xfs partitions are
mounted with noatime,nodiratime.

> - commands and output that shows the problem

It is mainly "ls -l" in a quite crowded directory.

> - strace of ls -l going bad
> - xfs_info from filesystem in question

I have put the files at http://damien.wyart.free.fr/xfs/

strace_xfs_problem.1.gz and strace_xfs_problem.2.gz have been created
with the problematic kernel, and are quite a bit bigger than
strace_xfs_problem.normal.gz, which has been created with the vanilla
rc5-git5. There is also xfs_info.


I can provide further details if needed (maybe kernel config, but
nothing special on the xfs side), but I confirm the behavior is
different with and without the directory patch
(041388b54ed95cd169546bd83bacd08ee32bd7ea on oss.sgi), and doesn't look
normal with the patch.

--
Damien Wyart

2007-12-18 15:20:26

by David Chinner

Subject: Re: Important regression with XFS update for 2.6.24-rc6

On Tue, Dec 18, 2007 at 03:30:31PM +0100, Damien Wyart wrote:
> * David Chinner <[email protected]> [071218 13:24]:
> > Ok. I haven't noticed anything wrong with directories up to about
> > 250,000 files in the last few days. The ls -l I just did on
> > a directory with 15000 entries (btree format) used about 5MB of RAM.
> > extent format directories appear to work fine as well (tested 500
> > entries).
>
> Ok, nice to know the problem is not so frequent.

.....

> I have put the files at http://damien.wyart.free.fr/xfs/
>
> strace_xfs_problem.1.gz and strace_xfs_problem.2.gz have been created
> with the problematic kernel, and are quite a bit bigger than
> strace_xfs_problem.normal.gz, which has been created with the vanilla
> rc5-git5. There is also xfs_info.

Looks like, several getdents() calls into the directory, the getdents()
call starts outputting the first files again. It gets to a certain
point and always goes back to the beginning. However, it appears to
get to the end eventually (without ever getting past the bad offset).

I'll look into this more in the morning as it's not obvious what is
wrong in my sleep-deprived state....

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2007-12-19 10:46:19

by David Chinner

Subject: Re: Important regression with XFS update for 2.6.24-rc6

On Wed, Dec 19, 2007 at 02:19:47AM +1100, David Chinner wrote:
> On Tue, Dec 18, 2007 at 03:30:31PM +0100, Damien Wyart wrote:
> > * David Chinner <[email protected]> [071218 13:24]:
> > > Ok. I haven't noticed anything wrong with directories up to about
> > > 250,000 files in the last few days. The ls -l I just did on
> > > a directory with 15000 entries (btree format) used about 5MB of RAM.
> > > extent format directories appear to work fine as well (tested 500
> > > entries).
> >
> > Ok, nice to know the problem is not so frequent.
>
> .....
>
> > I have put the files at http://damien.wyart.free.fr/xfs/
> >
> > strace_xfs_problem.1.gz and strace_xfs_problem.2.gz have been created
> > with the problematic kernel, and are quite a bit bigger than
> > strace_xfs_problem.normal.gz, which has been created with the vanilla
> > rc5-git5. There is also xfs_info.
>
> Looks like, several getdents() calls into the directory, the getdents()
> call starts outputting the first files again. It gets to a certain
> point and always goes back to the beginning. However, it appears to
> get to the end eventually (without ever getting past the bad offset).

UML and a bunch of printk's to the rescue.

So we went back to double buffering, which then screwed up the d_off
of the dirents. I changed the temporary dirents to point to the current
offset so that filldir got what it expected when filling the user buffer.

Except it appears that I didn't initialise the current
offset for the first dirent read from the temporary buffer so filldir
occasionally got an uninitialised offset. Can someone pass me a
brown paper bag, please?

In my local testing, more often than not, that uninitialised offset
reads as zero which is where the looping comes from. Sometimes it
points off into wacko-land, which is probably how we eventually get
the looping terminating before you run out of memory.

That also explains why we haven't seen it - it requires the user buffer
to fill on the first entry of a backing buffer and so it is largely
dependent on the pattern of name lengths, page size and filesystem
block size aligning just right to trigger the problem.
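
The failure mode described above can be reproduced in a toy userspace model. This is only a sketch under stated assumptions, not the actual xfs_file_readdir() code: the names (model_readdir, model_list, BACKING_CAP), the use of plain entry indexes as offsets, and the choice to represent the uninitialised value as 0 are all made up for illustration. The model copies entries through a small temporary buffer into a "user buffer" of limited capacity and, in the unfixed variant, leaves curr_offset stale when a new temporary buffer starts.

```c
#define N_ENTRIES   10  /* entries in the toy directory (assumption) */
#define BACKING_CAP 4   /* entries per temporary buffer (assumption) */

struct model_dirent {
    long offset;        /* position at which this entry can be re-read */
};

/*
 * One simulated readdir call. Delivers at most user_cap entries
 * starting at *pos, then stores the resume position back in *pos.
 * 'fixed' selects whether curr_offset is initialised from the first
 * entry of each temporary buffer. Returns the entries delivered.
 */
int model_readdir(long *pos, int user_cap, int fixed)
{
    struct model_dirent buf[BACKING_CAP + 1];
    long offset = *pos;
    long curr_offset = *pos;
    int delivered = 0;

    while (offset < N_ENTRIES) {
        int count = 0;
        int idx = 0;

        /* Refill the temporary buffer from the directory. */
        while (count < BACKING_CAP && offset < N_ENTRIES)
            buf[count++].offset = offset++;
        /* Stand-in for uninitialised memory past the used part. */
        buf[count].offset = 0;

        if (fixed)
            curr_offset = buf[0].offset;

        while (idx < count) {
            if (delivered == user_cap) {
                /* User buffer full: resume here next time. */
                *pos = curr_offset;
                return delivered;
            }
            delivered++;
            idx++;
            /* After the last entry this reads the garbage slot. */
            curr_offset = buf[idx].offset;
        }
    }

    *pos = offset;      /* reached the end of the directory */
    return delivered;
}

/*
 * Lists the whole toy directory with repeated calls. Returns the
 * total entries delivered, or -1 if the end was not reached within
 * max_calls calls (the listing keeps looping).
 */
int model_list(int user_cap, int fixed, int max_calls)
{
    long pos = 0;
    int total = 0;
    int call;

    for (call = 0; call < max_calls && pos < N_ENTRIES; call++)
        total += model_readdir(&pos, user_cap, fixed);

    return pos >= N_ENTRIES ? total : -1;
}
```

With these assumptions, model_list(4, 0, 100) never reaches the end of the directory: the user buffer fills exactly on the first entry of the second temporary buffer, so every call resumes at offset 0. model_list(3, 0, 100) completes normally with 10 entries because the sizes do not line up, and the fixed variant completes in both cases.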

Can you test this patch, Damien?

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

---
fs/xfs/linux-2.6/xfs_file.c | 1 +
1 file changed, 1 insertion(+)

Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_file.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_file.c 2007-12-19 00:26:40.000000000 +1100
+++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_file.c 2007-12-19 21:26:38.701143555 +1100
@@ -348,6 +348,7 @@ xfs_file_readdir(

 		size = buf.used;
 		de = (struct hack_dirent *)buf.dirent;
+		curr_offset = de->offset /* & 0x7fffffff */;
 		while (size > 0) {
 			if (filldir(dirent, de->name, de->namlen,
 					curr_offset & 0x7fffffff,

2007-12-19 11:17:42

by Damien Wyart

Subject: Re: Important regression with XFS update for 2.6.24-rc6

* David Chinner <[email protected]> [071219 11:45]:
> Can someone pass me a brown paper bag, please?

My first impression on this bug was not so wrong, after all ;-)

> That also explains why we haven't seen it - it requires the user
> buffer to fill on the first entry of a backing buffer and so it is
> largely dependent on the pattern of name lengths, page size and
> filesystem block size aligning just right to trigger the problem.

I guess I was lucky to trigger it quite easily...

> Can you test this patch, Damien?

Works fine, all the bad symptoms have disappeared and strace output is
normal.

So you can add:

Tested-by: Damien Wyart <[email protected]>

--
Damien

2007-12-19 11:31:41

by David Chinner

Subject: Re: Important regression with XFS update for 2.6.24-rc6

On Wed, Dec 19, 2007 at 12:17:30PM +0100, Damien Wyart wrote:
> * David Chinner <[email protected]> [071219 11:45]:
> > Can someone pass me a brown paper bag, please?
>
> My first impression on this bug was not so wrong, after all ;-)
>
> > That also explains why we haven't seen it - it requires the user buffer to
> > fill on the first entry of a backing buffer and so it is largely dependent
> > on the pattern of name lengths, page size and filesystem block size
> > aligning just right to trigger the problem.
>
> I guess I was lucky to trigger it quite easily...
>
> > Can you test this patch, Damien?
>
> Works fine, all the bad symptoms have disappeared and strace output is
> normal.
>
> So you can add:
>
> Tested-by: Damien Wyart <[email protected]>

Thanks for reporting the bug and testing the fix so quickly, Damien.
I'll give it some more QA before I push it, though.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group