2010-01-06 23:41:41

by Andrew Morton

Subject: Re: vfs related crash in 2.6.33-rc2

On Thu, 31 Dec 2009 05:59:32 +0900
OGAWA Hirofumi <[email protected]> wrote:

> Marvin <[email protected]> writes:
>
> >> Marvin <[email protected]> writes:
> >> > Hi,
> >> >
> >> > I'm getting a lot of these:
> >> >
> >> > kernel: general protection fault: 0000 [#1] SMP
> >> > kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:18.3/modalias
> >> > kernel: CPU 0
> >> > kernel: Pid: 12177, comm: packagekitd Not tainted 2.6.33-rc2 #1
> >> > ...
> >> >
> >> > filesystem is ext4 (in case it matters).
> >>
> >> BTW, are you using nfs client on this machine?
> >>
> >
> > um - yes, now that I think about it... I killed a nfs umount process (because of an
> > offline server) shortly before the oopses started to fire.
>
> OK. This oops is probably the same as the one that happened on my
> machine recently. That path in the patch corrupts the dcache hash, so it
> can cause strange behavior or an oops in the dcache hash.
>
> If so, the attached patch would fix it.
>
> Thanks.
> --
> OGAWA Hirofumi <[email protected]>
>
>
> A recent change fails to update "rehash". As a result, the dentry can
> be added to the hash twice.
>
> This explains the Oops (dereferencing a freed dentry in
> __d_lookup()) on my machine.
>
> Signed-off-by: OGAWA Hirofumi <[email protected]>
> ---
>
> fs/nfs/dir.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff -puN fs/nfs/dir.c~nfs-d_rehash-fix fs/nfs/dir.c
> --- linux-2.6/fs/nfs/dir.c~nfs-d_rehash-fix 2009-12-28 06:18:09.000000000 +0900
> +++ linux-2.6-hirofumi/fs/nfs/dir.c 2009-12-28 06:18:16.000000000 +0900
> @@ -1615,6 +1615,7 @@ static int nfs_rename(struct inode *old_
>  				goto out;
>
>  			new_dentry = dentry;
> +			rehash = NULL;
>  			new_inode = NULL;
>  		}
>  	}

Guys, what's the status of this fix? Did Marvin have a chance to test
it? Are the NFS developers aware of it?

Thanks.
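
For readers following the fix quoted above, here is a minimal userspace
sketch in C of how a stale "rehash" pointer can lead to a double hash
insertion. It is not the kernel code: toy_dentry, toy_drop(), toy_rehash(),
toy_sillyrename(), toy_rename() and toy_hash_count_after_rename() are
hypothetical stand-ins, and the sketch assumes that the intermediate
silly-rename step already puts the old target back on the hash.

```c
#include <stddef.h>

/* Toy stand-in for a dentry: it only counts hash insertions. */
struct toy_dentry {
	const char *name;
	int hash_count;	/* how many times it currently sits on the toy hash */
};

/* Remove one hash insertion, if there is any. */
static void toy_drop(struct toy_dentry *d)
{
	if (d->hash_count > 0)
		d->hash_count--;
}

/* Add one hash insertion; a count above 1 models the corruption. */
static void toy_rehash(struct toy_dentry *d)
{
	d->hash_count++;
}

/* ASSUMPTION: the silly-rename step rehashes the old target itself. */
static void toy_sillyrename(struct toy_dentry *d)
{
	d->name = ".nfs-tmp";
	toy_rehash(d);
}

/* Simplified model of the rename path; apply_fix mimics the one-line patch. */
static void toy_rename(struct toy_dentry *target,
		       struct toy_dentry *replacement, int apply_fix)
{
	struct toy_dentry *rehash = NULL;
	struct toy_dentry *new_dentry = target;

	/* Unhash the target and remember to rehash it on the way out. */
	if (new_dentry->hash_count > 0) {
		toy_drop(new_dentry);
		rehash = new_dentry;
	}

	/* The old target is moved aside and a fresh dentry takes its place. */
	toy_sillyrename(new_dentry);
	new_dentry = replacement;
	if (apply_fix)
		rehash = NULL;	/* the old target must not be rehashed again */

	(void)new_dentry;	/* the rename proper would use new_dentry here */

	/* Exit path: rehash whatever "rehash" still points at. */
	if (rehash)
		toy_rehash(rehash);
}

/* Run the model once and report the old target's insertion count. */
static int toy_hash_count_after_rename(int apply_fix)
{
	struct toy_dentry target = { "target", 1 };
	struct toy_dentry replacement = { "target", 0 };

	toy_rename(&target, &replacement, apply_fix);
	return target.hash_count;
}
```

Under these assumptions, toy_hash_count_after_rename(0) reports 2
insertions for the old target, while toy_hash_count_after_rename(1)
reports 1, which mirrors why the patch clears "rehash".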



2010-01-07 09:27:58

by Marc Dietrich

Subject: Re: vfs related crash in 2.6.33-rc2


Hi,

> On Wed, 2010-01-06 at 15:41 -0800, Andrew Morton wrote:
> > On Thu, 31 Dec 2009 05:59:32 +0900
> >
> > OGAWA Hirofumi <[email protected]> wrote:
> > > Marvin <[email protected]> writes:
> > > >> Marvin <[email protected]> writes:
> > > >> > Hi,
> > > >> >
> > > >> > I'm getting a lot of these:
> > > >> >
> > > >> > kernel: general protection fault: 0000 [#1] SMP
> > > > >> > kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:18.3/modalias
> > > > >> > kernel: CPU 0
> > > >> > kernel: Pid: 12177, comm: packagekitd Not tainted 2.6.33-rc2 #1
> > > >> > ...
> > > >> >
> > > >> > filesystem is ext4 (in case it matters).
> > > >>
> > > >> BTW, are you using nfs client on this machine?
> > > >
> > > > um - yes, now that I think about it... I killed a nfs umount process
> > > > (because of an offline server) shortly before the oopses started to
> > > > fire.
> > >
> > > > OK. This oops is probably the same as the one that happened on my
> > > > machine recently. That path in the patch corrupts the dcache hash, so it
> > > > can cause strange behavior or an oops in the dcache hash.
> > >
> > > If so, the attached patch would fix it.
> > >
> > > Thanks.
> >
> > Guys, what's the status of this fix? Did Marvin have a chance to test
> > it? Are the NFS developers aware of it?
> >
> > Thanks.
>
> Sorry for the delay. The above fix looks correct to me, but I too would
> like a confirmation that it fixes the Oops before I push it to Linus.
>
> In the meantime, I've committed it to my linux-next branch.

It seems that I sent my reply to Hirofumi only, sorry for that. The patch works fine
- no oops anymore.

Thanks

Marvin

2010-01-07 13:45:51

by Trond Myklebust

Subject: Re: vfs related crash in 2.6.33-rc2

On Thu, 2010-01-07 at 10:27 +0100, Marvin wrote:
> It seems that I sent my reply to Hirofumi only, sorry for that. The patch works fine
> - no oops anymore.

OK. Thanks for testing!

Trond


2010-01-06 23:56:08

by Trond Myklebust

Subject: Re: vfs related crash in 2.6.33-rc2

On Wed, 2010-01-06 at 15:41 -0800, Andrew Morton wrote:
> On Thu, 31 Dec 2009 05:59:32 +0900
> OGAWA Hirofumi <[email protected]> wrote:
>
> > Marvin <[email protected]> writes:
> >
> > >> Marvin <[email protected]> writes:
> > >> > Hi,
> > >> >
> > >> > I'm getting a lot of these:
> > >> >
> > >> > kernel: general protection fault: 0000 [#1] SMP
> > >> > kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:18.3/modalias
> > >> > kernel: CPU 0
> > >> > kernel: Pid: 12177, comm: packagekitd Not tainted 2.6.33-rc2 #1
> > >> > ...
> > >> >
> > >> > filesystem is ext4 (in case it matters).
> > >>
> > >> BTW, are you using nfs client on this machine?
> > >>
> > >
> > > um - yes, now that I think about it... I killed a nfs umount process (because of an
> > > offline server) shortly before the oopses started to fire.
> >
> > > OK. This oops is probably the same as the one that happened on my
> > > machine recently. That path in the patch corrupts the dcache hash, so it
> > > can cause strange behavior or an oops in the dcache hash.
> >
> > If so, the attached patch would fix it.
> >
> > Thanks.
> > --
> > OGAWA Hirofumi <[email protected]>
> >
> >
> > > A recent change fails to update "rehash". As a result, the dentry can
> > > be added to the hash twice.
> > >
> > > This explains the Oops (dereferencing a freed dentry in
> > > __d_lookup()) on my machine.
> >
> > Signed-off-by: OGAWA Hirofumi <[email protected]>
> > ---
> >
> > fs/nfs/dir.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff -puN fs/nfs/dir.c~nfs-d_rehash-fix fs/nfs/dir.c
> > --- linux-2.6/fs/nfs/dir.c~nfs-d_rehash-fix 2009-12-28 06:18:09.000000000 +0900
> > +++ linux-2.6-hirofumi/fs/nfs/dir.c 2009-12-28 06:18:16.000000000 +0900
> > @@ -1615,6 +1615,7 @@ static int nfs_rename(struct inode *old_
> > goto out;
> >
> > new_dentry = dentry;
> > + rehash = NULL;
> > new_inode = NULL;
> > }
> > }
>
> Guys, what's the status of this fix? Did Marvin have a chance to test
> it? Are the NFS developers aware of it?
>
> Thanks.
>

Sorry for the delay. The above fix looks correct to me, but I too would
like a confirmation that it fixes the Oops before I push it to Linus.

In the meantime, I've committed it to my linux-next branch.

Cheers
Trond