LinuxLists.cc - kerneloops.org: 2.6.26-rc possible regression in ext3

2008-06-19 05:34:46

Subject: kerneloops.org: 2.6.26-rc possible regression in ext3

In the kerneloops.org stats, a new oops is rapidly climbing the charts.
The oops is a page fault in the ext3 "do_slit" function, and the first
report of it was with 2.6.26-rc6-git3.

It happens with various applications; the backtraces are at:

http://www.kerneloops.org/search.php?search=do_split

but are generally of this pattern:

*do_split
ext3_add_entry
ext3_rename
vfs_rename
... <various paths into vfs_rename> ...

or

*do_split
? add_dirent_to_buf
ext3_add_entry
ext3_new_inode
ext3_add_nondir
ext3_create
vfs_create
....

did we change anything in ext3 this cycle?

2008-06-19 06:02:20

by Linus Torvalds

[permalink] [raw]

Subject: Re: kerneloops.org: 2.6.26-rc possible regression in ext3

On Wed, 18 Jun 2008, Arjan van de Ven wrote:
>
> In the kerneloops.org stats, a new oops is rapidly climbing the charts.
> The oops is a page fault in the ext3 "do_split" function, and the first
> report of it was with 2.6.26-rc6-git3.

Interesting.

> It happens with various applications; the backtraces are at:
>
> http://www.kerneloops.org/search.php?search=do_split
>
> but are generally of this pattern:
>
> *do_split
> ext3_add_entry
> ext3_rename
> vfs_rename
> ... <various paths into vfs_rename> ...
>
> or
>
> *do_split
> ? add_dirent_to_buf
> ext3_add_entry
> ext3_new_inode
> ext3_add_nondir
> ext3_create
> vfs_create
> ....
>
> did we change anything in ext3 this cycle?

I'm not seeing anything relevant, but I'm adding Al to the cc in, since
the r/o bind mounts did change fs/namei.c and vfs_create/mkdir in
particular. Not that I see why that would trigger either, but the changes
to fs/ext3/namei.c seem to be even _less_ interesting than that.

One thing I note is that all the oopses seem to be i686 - are there that
few x86-64 fc10 users (I'd have assumed that 64-bit is starting to be the
norm for people who live on the edge, but perhaps I'm just out of touch)?

Or could this perhaps be an indication that it is specific to i686 some
way (eg a compiler issue?)

Linus

2008-06-19 06:09:29

by Arjan van de Ven

[permalink] [raw]

Subject: Re: kerneloops.org: 2.6.26-rc possible regression in ext3

Linus Torvalds wrote:
> Or could this perhaps be an indication that it is specific to i686 some
> way (eg a compiler issue?)
>

Dave Airlie just confirmed this is a compiler bug indeed in gcc 4.3.1
and pointed at https://bugzilla.redhat.com/show_bug.cgi?id=451068

2008-06-19 06:12:53

by Arjan van de Ven

[permalink] [raw]

Subject: Re: kerneloops.org: 2.6.26-rc possible regression in ext3

Linus Torvalds wrote:
> One thing I note is that all the oopses seem to be i686 - are there that
> few x86-64 fc10 users (I'd have assumed that 64-bit is starting to be the
> norm for people who live on the edge, but perhaps I'm just out of touch)?
>

for rawhide the 64/32 ratio seems to be 106/135
for fedora 9 the 64/32 ratio is 4946/13636

(nr of oopses for the specific architecture/releases)

so your assumption of the experimental rawhide users are more likely to use 64 bit seems to be quite correct.

2008-06-19 06:14:24

by Linus Torvalds

[permalink] [raw]

Subject: Re: kerneloops.org: 2.6.26-rc possible regression in ext3

On Wed, 18 Jun 2008, Linus Torvalds wrote:
>
> One thing I note is that all the oopses seem to be i686 - are there that
> few x86-64 fc10 users (I'd have assumed that 64-bit is starting to be the
> norm for people who live on the edge, but perhaps I'm just out of touch)?
>
> Or could this perhaps be an indication that it is specific to i686 some
> way (eg a compiler issue?)

The oops code is odd:

27: 8d 4c 18 fe lea 0xfffffffe(%eax,%ebx,1),%ecx
2b:* 8b 19 mov (%ecx),%ebx <-- trapping instruction
2d: 83 e9 08 sub $0x8,%ecx
30: 89 d8 mov %ebx,%eax
32: 66 d1 e8 shr %ax
35: 0f b7 c0 movzwl %ax,%eax

and that "lea" is doing an address computation of "eax+2*ebx-2". Which
does *not* look like an address to a 32-bit entity, but to a 16-bit one.
Yeah, it's not conclusive, but it is suggestive.

And the 16-bit "shr+movzwl" further strengthens the case that it is
actually working on a 16-bit entity. The trapping instruction _should_
possibly have been a "movzwl (%ecx),%ebx" to begin with.

But it did a 32-bit load, and in this case it looks as if the 16-bit load
would have been correct! The value of ECX in this example was

ECX: dc384ffe

ie it was indeed a two-byte aligned thing at the end of the page, and if
the load had been a 16-bit load (like the data seems to be), it would
never have oopsed! The page fault seems to be due to DEBUG_PAGEALLOC and
the next page being unmapped because it's not allocated.

I only looked closer at one particular oops (25906, in case anybody
cares), but at least judging from that particular one I would indeed
suspect a compiler bug.

Of course, the main reason I say that is that none of the ext3 or VFS
changes look even _remotely_ relevant to any of this. They really don't
look like they could possibly matter for "do_split()" unless there is
something really odd going on.

Linus

2008-06-19 06:41:34

by Linus Torvalds

[permalink] [raw]

Subject: Re: kerneloops.org: 2.6.26-rc possible regression in ext3

On Wed, 18 Jun 2008, Linus Torvalds wrote:
>
> and that "lea" is doing an address computation of "eax+2*ebx-2". Which
> does *not* look like an address to a 32-bit entity, but to a 16-bit one.
> Yeah, it's not conclusive, but it is suggestive.

I'm wrong, that's just "eax+ebx-2". The *2 was just a brainfart on my
part.

But I think I have pinpointed where it comes from: it's the

struct dx_map_entry *map;

which is a structure like this:

struct dx_map_entry
{
u32 hash;
u16 offs;
u16 size;
};

and it does look like it's the

if (size + map[i].size/2 > blocksize/2)

calculation, where "i" counts backwards from "count-1" to 0.

In particular, the code

27: 8d 4c 18 fe lea 0xfffffffe(%eax,%ebx,1),%ecx
2b:* 8b 19 mov (%ecx),%ebx <-- trapping instruction
2d: 83 e9 08 sub $0x8,%ecx
30: 89 d8 mov %ebx,%eax
32: 66 d1 e8 shr %ax
38: 8d 04 02 lea (%edx,%eax,1),%eax

seems to be that "size + map[i].size/2" calculation, but I have a hard
time trying to line it up with wat _my_ compiler gives me. But the nearest
match I have is:

movw 6(%ecx), %bx # <variable>.size, D.21305
subl $8, %ecx #, ivtmp.921
movl -104(%ebp), %edx # blocksize, tmp179
movl %ebx, %eax # D.21305, tmp176
shrw %ax # tmp176
movzwl %ax, %eax # tmp176, tmp177
leal (%esi,%eax), %eax #, tmp178

which seems to be largely the same thing (except I have a "movw" to load
the size, and %ecx is offset by one 'map' entry - so the offset is 6 (in
the memop) instead of that "-2" (from the lea).

I think I'll give up, but that's the closest match I can find. No
guarantees, but it seems to support the notion of "wrong 32-bit load where
it should have used a 16-bit one".

Linus

2008-06-20 15:36:41

by Bill Nottingham

[permalink] [raw]

Subject: Re: kerneloops.org: 2.6.26-rc possible regression in ext3

Linus Torvalds ([email protected]) said:
> Or could this perhaps be an indication that it is specific to i686 some
> way (eg a compiler issue?)

Yes. https://bugzilla.redhat.com/show_bug.cgi?id=451068

Bill