2003-02-26 11:28:19

by Zilvinas Valinskas

[permalink] [raw]
Subject: kernel Ooops (2.5.63 bk latest)

Hello,

$ mount
/dev/ide/host0/bus0/target0/lun0/part5 on / type ext3
(rw,errors=remount-ro)
proc on /proc type proc (rw)
/dev/ide/host0/bus0/target0/lun0/part11 on /tmp type ext3 (rw)
/dev/ide/host0/bus0/target0/lun0/part6 on /boot type ext3 (rw)
/dev/ide/host0/bus0/target0/lun0/part8 on /usr type ext3 (rw)
/dev/ide/host0/bus0/target0/lun0/part9 on /home type ext3 (rw)
/dev/ide/host0/bus0/target0/lun0/part10 on /var type ext3 (rw)
usbdevfs on /proc/bus/usb type usbdevfs (rw)
automount(pid4452) on /mnt type autofs
(rw,fd=5,pgrp=4452,minproto=2,maxproto=4)
//192.168.2.123/exchange on /mnt/exchange type smbfs (rw)
//192.168.2.123/install on /mnt/install type smbfs (rw)

XMMS is stuck in D state.
/mnt/exchange and /mnt/install are mounted of windows 2k server.

the decoded oops and dmesg are attached below.


Attachments:
(No filename) (818.00 B)
feb26-dmesg (11.64 kB)
feb26-ksymoops.decoded (3.30 kB)
Download all attachments

2003-02-28 06:45:54

by Maneesh Soni

[permalink] [raw]
Subject: Re: kernel Ooops (2.5.63 bk latest)

Hi Linus,

The BUG was caught in d_validate() --> dget(). I think the
dentry to be validated can be already on LRU list with d_count
as zero. So, dget_locked should be used in place of dget().
dcache_rcu mistakingly used dget. This patch corrects it.

Please apply the following patch.

diff -urN linux-2.5.63-bk3/fs/dcache.c linux-2.5.63-bk3-d_validate/fs/dcache.c
--- linux-2.5.63-bk3/fs/dcache.c 2003-02-28 12:06:09.000000000 +0530
+++ linux-2.5.63-bk3-d_validate/fs/dcache.c 2003-02-28 12:16:30.000000000 +0530
@@ -1056,7 +1056,7 @@
* as it is parsed under dcache_lock
*/
if (dentry == list_entry(lhp, struct dentry, d_hash)) {
- dget(dentry);
+ __dget_locked(dentry);
spin_unlock(&dcache_lock);
return 1;
}


Regards,
Maneesh


--
Maneesh Soni
IBM Linux Technology Center,
IBM India Software Lab, Bangalore.
Phone: +91-80-5044999 email: [email protected]
http://lse.sourceforge.net/

2003-02-28 07:23:30

by Andrew Morton

[permalink] [raw]
Subject: Re: kernel Ooops (2.5.63 bk latest)

Maneesh Soni <[email protected]> wrote:
>
> Hi Linus,
>
> The BUG was caught in d_validate() --> dget(). I think the
> dentry to be validated can be already on LRU list with d_count
> as zero. So, dget_locked should be used in place of dget().
> dcache_rcu mistakingly used dget. This patch corrects it.
>
> Please apply the following patch.
>
> diff -urN linux-2.5.63-bk3/fs/dcache.c linux-2.5.63-bk3-d_validate/fs/dcache.c
> --- linux-2.5.63-bk3/fs/dcache.c 2003-02-28 12:06:09.000000000 +0530
> +++ linux-2.5.63-bk3-d_validate/fs/dcache.c 2003-02-28 12:16:30.000000000 +0530
> @@ -1056,7 +1056,7 @@
> * as it is parsed under dcache_lock
> */
> if (dentry == list_entry(lhp, struct dentry, d_hash)) {
> - dget(dentry);
> + __dget_locked(dentry);
> spin_unlock(&dcache_lock);
> return 1;

Is this correct? If smbfs is playing around with dentries which are on
dentry_unused and which have a zero refcount then these can be freed up at
any time. The filesystem should have taken a ref on the dentry to prevent it
from being scavenged.

Isn't the bug over in smb_fill_cache(), which does:

newdent = d_lookup(...);
...
ctl.cache->dentry[ctl.idx] = newdent;
...
dput(newdent);

I suspect we need to take an extra ref on the dentry when it is copied to the
cache, and put that ref back when smb_readdir() has finished using the dentry
(it looks like it's already doing that).

If so, the same problem is present in 2.4, but nobody noticed because 2.4 is
already using __dget_locked() and escapes the BUG check.

2003-02-28 08:52:23

by Maneesh Soni

[permalink] [raw]
Subject: Re: kernel Ooops (2.5.63 bk latest)

On Thu, Feb 27, 2003 at 11:34:34PM -0800, Andrew Morton wrote:
> Maneesh Soni <[email protected]> wrote:
> >
> > Hi Linus,
> >
> > The BUG was caught in d_validate() --> dget(). I think the
> > dentry to be validated can be already on LRU list with d_count
> > as zero. So, dget_locked should be used in place of dget().
> > dcache_rcu mistakingly used dget. This patch corrects it.
> >
> > Please apply the following patch.
> >
> > diff -urN linux-2.5.63-bk3/fs/dcache.c linux-2.5.63-bk3-d_validate/fs/dcache.c
> > --- linux-2.5.63-bk3/fs/dcache.c 2003-02-28 12:06:09.000000000 +0530
> > +++ linux-2.5.63-bk3-d_validate/fs/dcache.c 2003-02-28 12:16:30.000000000 +0530
> > @@ -1056,7 +1056,7 @@
> > * as it is parsed under dcache_lock
> > */
> > if (dentry == list_entry(lhp, struct dentry, d_hash)) {
> > - dget(dentry);
> > + __dget_locked(dentry);
> > spin_unlock(&dcache_lock);
> > return 1;
>
> Is this correct? If smbfs is playing around with dentries which are on
> dentry_unused and which have a zero refcount then these can be freed up at
> any time. The filesystem should have taken a ref on the dentry to prevent it
> from being scavenged.
> Isn't the bug over in smb_fill_cache(), which does:
>
> newdent = d_lookup(...);
> ...
> ctl.cache->dentry[ctl.idx] = newdent;
> ...
> dput(newdent);
>
> I suspect we need to take an extra ref on the dentry when it is copied to the
> cache, and put that ref back when smb_readdir() has finished using the dentry
> (it looks like it's already doing that).
>
> If so, the same problem is present in 2.4, but nobody noticed because 2.4 is
> already using __dget_locked() and escapes the BUG check.

ref is taken in d_validate, which is called before using the smbfs cached
dentry. It is this ref which is taken back in smb_readdir().

I am not sure when it is proper to take the extra ref that is either when
dentry is introduced in the smbfs cache or when it is read from the cache.

Maneesh

--
Maneesh Soni
IBM Linux Technology Center,
IBM India Software Lab, Bangalore.
Phone: +91-80-5044999 email: [email protected]
http://lse.sourceforge.net/

2003-02-28 09:28:34

by Andrew Morton

[permalink] [raw]
Subject: Re: kernel Ooops (2.5.63 bk latest)

Maneesh Soni <[email protected]> wrote:
>
> ref is taken in d_validate, which is called before using the smbfs cached
> dentry. It is this ref which is taken back in smb_readdir().

hmm, thinking about this a bit more...

It appears that d_validate's role in life is to take some random pointer
which the caller thinks might be a dentry and to verify that it is indeed
still a valid dentry (or a totally new dentry at the same address!). If
so, pin that dentry and tell the user.

Perhaps the dentry's address was received across the network, although it
doesn't look that way.

In which case passing in a might-be pointer to a zero-ref dentry is
appropriate and your patch is OK. One would need some understanding of ncpfs
and smbfs to say for sure.