2005-11-22 21:33:32

by Steven Rostedt

[permalink] [raw]
Subject: What protection does sysfs_readdir have with SMP/Preemption?

Hi,

I'm developing a custom kernel on top of Ingo's -rt patch. My kernel
makes race conditions in the vanilla kernel show up very well :-)

I just hit a bug, actually a page fault in fs/sysfs/dir.c in
sysfs_readdir:



for (p=q->next; p!= &parent_sd->s_children; p=p->next) {
struct sysfs_dirent *next;
const char * name;
int len;

next = list_entry(p, struct sysfs_dirent,
s_sibling);
if (!next->s_element)
continue;

name = sysfs_get_name(next);
len = strlen(name);
if (next->s_dentry)
ino = next->s_dentry->d_inode->i_ino;

^^^^
This is where I had a bad pointer reference.

else
ino = iunique(sysfs_sb, 2);

if (filldir(dirent, name, len, filp->f_pos, ino,
dt_type(next)) < 0)
return 0;


Looking at this code, I don't see anything protecting the s_dentry. For
example, couldn't the following happen:

sysfs_create_dir is called, which calls create_dir. Now we create a
dentry with no d_inode. In sysfs_make_dirent which calls
sysfs_new_dirent which adds to the parents s_children. Then
sysfs_make_dirent sets s_dentry = dentry (the one that was just made
with no d_inode assigned yet). Then create_dir calls sysfs_create which
finally assigns the d_inode.

So, either there is some hidden protection and my modification to the
kernel has caused this to bug, or we have just been lucky the whole time
in the vanilla kernel.

-- Steve



2005-11-22 21:41:45

by Greg KH

[permalink] [raw]
Subject: Re: What protection does sysfs_readdir have with SMP/Preemption?

On Tue, Nov 22, 2005 at 04:33:22PM -0500, Steven Rostedt wrote:
> Hi,
>
> I'm developing a custom kernel on top of Ingo's -rt patch. My kernel
> makes race conditions in the vanilla kernel show up very well :-)
>
> I just hit a bug, actually a page fault in fs/sysfs/dir.c in
> sysfs_readdir:
>
>
>
> for (p=q->next; p!= &parent_sd->s_children; p=p->next) {
> struct sysfs_dirent *next;
> const char * name;
> int len;
>
> next = list_entry(p, struct sysfs_dirent,
> s_sibling);
> if (!next->s_element)
> continue;
>
> name = sysfs_get_name(next);
> len = strlen(name);
> if (next->s_dentry)
> ino = next->s_dentry->d_inode->i_ino;
>
> ^^^^
> This is where I had a bad pointer reference.
>
> else
> ino = iunique(sysfs_sb, 2);
>
> if (filldir(dirent, name, len, filp->f_pos, ino,
> dt_type(next)) < 0)
> return 0;
>
>
> Looking at this code, I don't see anything protecting the s_dentry. For
> example, couldn't the following happen:
>
> sysfs_create_dir is called, which calls create_dir. Now we create a
> dentry with no d_inode. In sysfs_make_dirent which calls
> sysfs_new_dirent which adds to the parents s_children. Then
> sysfs_make_dirent sets s_dentry = dentry (the one that was just made
> with no d_inode assigned yet). Then create_dir calls sysfs_create which
> finally assigns the d_inode.
>
> So, either there is some hidden protection and my modification to the
> kernel has caused this to bug, or we have just been lucky the whole time
> in the vanilla kernel.

I think we've been lucky :(

Maneesh, any ideas?

thanks,

greg k-h

2005-11-23 04:53:24

by Maneesh Soni

[permalink] [raw]
Subject: Re: What protection does sysfs_readdir have with SMP/Preemption?

On Tue, Nov 22, 2005 at 01:39:47PM -0800, Greg KH wrote:
> On Tue, Nov 22, 2005 at 04:33:22PM -0500, Steven Rostedt wrote:
> > Hi,
> >
> > I'm developing a custom kernel on top of Ingo's -rt patch. My kernel
> > makes race conditions in the vanilla kernel show up very well :-)
> >
> > I just hit a bug, actually a page fault in fs/sysfs/dir.c in
> > sysfs_readdir:
> >
> >
> >
> > for (p=q->next; p!= &parent_sd->s_children; p=p->next) {
> > struct sysfs_dirent *next;
> > const char * name;
> > int len;
> >
> > next = list_entry(p, struct sysfs_dirent,
> > s_sibling);
> > if (!next->s_element)
> > continue;
> >
> > name = sysfs_get_name(next);
> > len = strlen(name);
> > if (next->s_dentry)
> > ino = next->s_dentry->d_inode->i_ino;
> >
> > ^^^^
> > This is where I had a bad pointer reference.
> >
> > else
> > ino = iunique(sysfs_sb, 2);
> >
> > if (filldir(dirent, name, len, filp->f_pos, ino,
> > dt_type(next)) < 0)
> > return 0;
> >
> >
> > Looking at this code, I don't see anything protecting the s_dentry. For
> > example, couldn't the following happen:
> >
> > sysfs_create_dir is called, which calls create_dir. Now we create a
> > dentry with no d_inode. In sysfs_make_dirent which calls
> > sysfs_new_dirent which adds to the parents s_children. Then
> > sysfs_make_dirent sets s_dentry = dentry (the one that was just made
> > with no d_inode assigned yet). Then create_dir calls sysfs_create which
> > finally assigns the d_inode.
> >
> > So, either there is some hidden protection and my modification to the
> > kernel has caused this to bug, or we have just been lucky the whole time
> > in the vanilla kernel.
>
> I think we've been lucky :(
>
> Maneesh, any ideas?
>

The dir operation sysfs_readdir() is called under directory inode's i_sem
taken in vfs_readdir() and create_dir() also takes parent directory inode's
i_sem. So in this case I think following are the relevant steps happening
which look safe to me.

cpu 0
vfs_readdir()
down(dir inode i_sem)
sysfs_readdir(dir)
parse through dir->s_dirent s_children list
up(dir inode i_sem)


cpu 1
sysfs_create_dir()
create_dir()
down(parent dir inode i_sem)
lookup_one_len (allocates & makes the new directory dentry visible)
sysfs_make_diret()
sysfs_new_dirent()
attach the new directory s_dirent to parent's s_children list)
up(parent dir inode i_sem)


Basically, sysfs_readdir for a directory is protected against any
addition/deletion in the directory by directory inode's i_sem.

But the bad pointer reference seen in sysfs_readdir() has to be debugged.
Assumption here is that if there is a dentry attached to s_dirent, there
has to be a inode associated becuase negative dentries are not created
in sysfs. Is it possible to get some more information about the recreation
scenario. Could you enable DEBUG printks for lib/kobject.c and
drivers/base/class.c to see the events happening.


Thanks
Maneesh

--
Maneesh Soni
Linux Technology Center,
IBM India Software Labs,
Bangalore, India
email: [email protected]
Phone: 91-80-25044990

2005-11-23 08:18:54

by Ingo Molnar

[permalink] [raw]
Subject: Re: What protection does sysfs_readdir have with SMP/Preemption?


* Maneesh Soni <[email protected]> wrote:

> But the bad pointer reference seen in sysfs_readdir() has to be
> debugged. Assumption here is that if there is a dentry attached to
> s_dirent, there has to be a inode associated becuase negative dentries
> are not created in sysfs. Is it possible to get some more information
> about the recreation scenario. Could you enable DEBUG printks for
> lib/kobject.c and drivers/base/class.c to see the events happening.

on a related note - i've been carrying the patch below in -rt for 2
months (i.e. Steven's kernel has it too), as a workaround against the
crash described below.

so it appears that the -rt kernel is triggering some genuine sysfs race.
[note that it only happens on an SMP kernel, booting an UP kernel or
with maxcpus=1 makes the bug go away.] I have done full kobject
debugging but no conclusive results. Also, that particular crash happens
earliest with PAGEALLOC enabled. [i have packed up the email discussion
related to that crash, and i'm sending it to Maneesh separately.
Maneesh, any ideas or suggestions?]

note that Steven has a dual-core Athlon64 X2 system. Steven, do you get
the crash even with maxcpus=1?

Ingo

-----
i'm occasionally getting the crash below on a PREEMPT_RT kernel. Might
be a PREEMPT_RT bug, or might be some sysfs race only visible under
PREEMPT_RT. Any ideas? The crash is at:

(gdb) list *0xc01a2095
0xc01a2095 is in sysfs_hash_and_remove (fs/sysfs/inode.c:229).
224 }
225
226 void sysfs_hash_and_remove(struct dentry * dir, const char * name)
227 {
228 struct sysfs_dirent * sd;
229 struct sysfs_dirent * parent_sd = dir->d_fsdata;
230
231 if (dir->d_inode == NULL)
232 /* no inode means this hasn't been made visible yet */
233 return;
(gdb)

[...]
Calling initcall 0xc05ba6e0: spi_transport_init+0x0/0x30()
Calling initcall 0xc05ba710: ahc_linux_init+0x0/0xf0()
ACPI: PCI Interrupt 0000:03:04.0[A] -> GSI 18 (level, low) -> IRQ 20
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
<Adaptec aic7899 Ultra160 SCSI adapter>
aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs

Vendor: SEAGATE Model: ST336706LC Rev: 010A
Type: Direct-Access ANSI SCSI revision: 03
scsi0:A:0:0: Tagged Queuing enabled. Depth 32
target0:0:0: Beginning Domain Validation
target0:0:0: wide asynchronous.
target0:0:0: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 63)
target0:0:0: Ending Domain Validation
Vendor: SEAGATE Model: ST336706LC Rev: 010A
Type: Direct-Access ANSI SCSI revision: 03
scsi0:A:1:0: Tagged Queuing enabled. Depth 32
target0:0:1: Beginning Domain Validation
target0:0:1: wide asynchronous.
target0:0:1: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 63)
target0:0:1: Ending Domain Validation
BUG: Unable to handle kernel paging request at virtual address f6c47fc0
printing eip:
c01a2095
*pde = 006cc067
*pte = 36c47000
Oops: 0000 [#1]
PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU: 1
EIP: 0060:[<c01a2095>] Not tainted VLI
EFLAGS: 00010282 (2.6.14-rc2-rt2)
EIP is at sysfs_hash_and_remove+0x15/0x110
eax: f6c47f2c ebx: f6c42f64 ecx: c013edb4 edx: f6c3e5b8
esi: f6c42f5c edi: c04fd880 ebp: c277ec88 esp: c277ec70
ds: 007b es: 007b ss: 0068 preempt: 00000001
Process swapper (pid: 1, threadinfo=c277e000 task=c277d900 stack_left=3132 worst_left=-1)
Stack: f6c4a3b8 f6c3e5b8 f6c47f2c f6c42f64 f6c42f5c c04fd880 c277ec90 c01a3ddb
c277ecb0 c02a8051 c04fd888 f6c3e5b8 c04fd7a0 f6c42f5c f7ad6c68 c04fd7a0
c277ecbc c02a9d52 00000000 c277ecd4 c02a9f88 f6c42f5c f7ad6c68 f79ac80c
Call Trace:
[<c010405a>] show_stack+0x7a/0x90 (32)
[<c010421e>] show_registers+0x18e/0x1f0 (56)
[<c0104420>] die+0x100/0x180 (68)
[<c0442ea8>] do_page_fault+0x368/0x556 (92)
[<c0103d0b>] error_code+0x4f/0x54 (84)
[<c01a3ddb>] sysfs_remove_link+0xb/0x10 (8)
[<c02a8051>] class_device_del+0xf1/0x100 (32)
[<c02a9d52>] attribute_container_class_device_del+0x12/0x20 (12)
[<c02a9f88>] transport_remove_classdev+0x38/0x70 (24)
[<c02a9bfd>] attribute_container_device_trigger+0x8d/0xc0 (40)
[<c02a9fcd>] transport_remove_device+0xd/0x10 (8)
[<c032f01b>] scsi_target_reap+0x9b/0xb0 (20)
[<c032feb4>] __scsi_scan_target+0x94/0x130 (44)
[<c0330068>] scsi_scan_channel+0x78/0x90 (32)
[<c0330109>] scsi_scan_host_selected+0x89/0xf0 (32)
[<c0330192>] scsi_scan_host+0x22/0x30 (16)
[<c0349a85>] ahc_linux_register_host+0x1b5/0x1c0 (132)
[<c034d53d>] ahc_linux_pci_dev_probe+0xed/0x140 (132)
[<c022a02d>] pci_call_probe+0xd/0x10 (12)
[<c022a081>] __pci_device_probe+0x51/0x60 (20)
[<c022a0b9>] pci_device_probe+0x29/0x60 (16)
[<c02a6d76>] driver_probe_device+0x36/0xb0 (36)
[<c02a6ebd>] __driver_attach+0x4d/0x70 (20)
[<c02a6419>] bus_for_each_dev+0x49/0x70 (40)
[<c02a6ef9>] driver_attach+0x19/0x20 (12)
[<c02a68e1>] bus_add_driver+0x81/0xd0 (36)
[<c02a72a1>] driver_register+0x51/0x60 (20)
[<c022a34b>] pci_register_driver+0x8b/0xa0 (16)
[<c034d59d>] ahc_linux_pci_init+0xd/0x20 (8)
[<c05ba799>] ahc_linux_init+0x89/0xf0 (24)
[<c059ba62>] do_initcalls+0x32/0xe0 (36)
[<c059bb35>] do_basic_setup+0x25/0x30 (8)
[<c01003de>] init+0xae/0x2d0 (24)
[<c0101359>] kernel_thread_helper+0x5/0xc (1032327196)
---------------------------
| preempt count: 00000001 ]
| 1-level deep critical section nesting:
----------------------------------------
.. [<c013fc41>] .... add_preempt_count+0x11/0x20
.....[<c01169a0>] .. ( <= try_to_wake_up+0x50/0x440)

------------------------------
| showing all locks held by: | (swapper/1 [c277d900, 116]):
------------------------------

#001: [c067618c] {(struct semaphore *)(&hwif->gendev_rel_sem)}
... acquired at: init_hwif_data+0x8d/0x180

#002: [c0676d0c] {(struct semaphore *)(&hwif->gendev_rel_sem)}
... acquired at: init_hwif_data+0x8d/0x180

#003: [c067788c] {(struct semaphore *)(&hwif->gendev_rel_sem)}
... acquired at: init_hwif_data+0x8d/0x180

#004: [c067840c] {(struct semaphore *)(&hwif->gendev_rel_sem)}
... acquired at: init_hwif_data+0x8d/0x180

#005: [c0678f8c] {(struct semaphore *)(&hwif->gendev_rel_sem)}
... acquired at: init_hwif_data+0x8d/0x180

#006: [c0679b0c] {(struct semaphore *)(&hwif->gendev_rel_sem)}
... acquired at: init_hwif_data+0x8d/0x180

#007: [c067a68c] {(struct semaphore *)(&hwif->gendev_rel_sem)}
... acquired at: init_hwif_data+0x8d/0x180

#008: [c067b20c] {(struct semaphore *)(&hwif->gendev_rel_sem)}
... acquired at: init_hwif_data+0x8d/0x180

#009: [c067bd8c] {(struct semaphore *)(&hwif->gendev_rel_sem)}
... acquired at: init_hwif_data+0x8d/0x180

#010: [c067c90c] {(struct semaphore *)(&hwif->gendev_rel_sem)}
... acquired at: init_hwif_data+0x8d/0x180

#011: [f79a7a00] {(struct semaphore *)(&dev->sem)}
... acquired at: __driver_attach+0x22/0x70

#012: [f7aee8ac] {(struct semaphore *)(&shost->scan_mutex)}
... acquired at: scsi_scan_host_selected+0x56/0xf0

#013: [c04dc604] {kernel_sem.lock}
... acquired at: __reacquire_kernel_lock+0x33/0x70

#014: [c04eed24] {attribute_container_mutex.lock}
... acquired at: attribute_container_device_trigger+0x18/0xc0

Code: 8b 7c 24 08 89 ec 5d c3 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 55 89 e5 83 ec 18 89 5d f4 89 75 f8 89 7d fc 89 45 f0 89 55 ec <8b> b0 94 00 00 00 8b 40 4c 85 c0 75 0e 8b 5d f4 8b 75 f8 8b 7d
<0>Kernel panic - not syncing: Attempted to kill init!


Signed-off-by: Ingo Molnar <[email protected]>

drivers/base/class.c | 4 ++++
1 files changed, 4 insertions(+)

Index: linux/drivers/base/class.c
===================================================================
--- linux.orig/drivers/base/class.c
+++ linux/drivers/base/class.c
@@ -520,8 +520,10 @@ int class_device_add(struct class_device
class_name = make_class_name(class_dev);
sysfs_create_link(&class_dev->kobj,
&class_dev->dev->kobj, "device");
+ /*
sysfs_create_link(&class_dev->dev->kobj, &class_dev->kobj,
class_name);
+ */
}

/* notify any interfaces this device is now here */
@@ -618,7 +620,9 @@ void class_device_del(struct class_devic
if (class_dev->dev) {
class_name = make_class_name(class_dev);
sysfs_remove_link(&class_dev->kobj, "device");
+ /*
sysfs_remove_link(&class_dev->dev->kobj, class_name);
+ */
}
if (class_dev->devt_attr)
class_device_remove_file(class_dev, class_dev->devt_attr);

2005-11-23 12:35:26

by Steven Rostedt

[permalink] [raw]
Subject: Re: What protection does sysfs_readdir have with SMP/Preemption?


On Wed, 23 Nov 2005, Ingo Molnar wrote:

>
> note that Steven has a dual-core Athlon64 X2 system. Steven, do you get
> the crash even with maxcpus=1?
>

Actually Ingo, this happened on my UP test machine, a 368MHz Pentium.

But unfortunately, it so far only happened once, and I've been trying to
recreate it, with no success. The test that crashed it was running 10
tasks that would read the entire filesystem. I was debugging another bug
(something specific to my kernel, or maybe -rt) when I hit this bug.
Looking at it, it seemed to not be related to the changes I made. Perhaps
it could be related to your changes?

-- Steve

PS. I only got my AMD64 x2 to debug your kernel ;-) I have no requirement
to have my kernel run on it. I just needed a faster machine, also to move
off my SMP box to make that a test box too. Since I saw lots of people
having problems with -rt and AMD64 I chose to get that one.

2005-11-23 12:53:25

by Maneesh Soni

[permalink] [raw]
Subject: Re: What protection does sysfs_readdir have with SMP/Preemption?

On Wed, Nov 23, 2005 at 09:18:45AM +0100, Ingo Molnar wrote:
>
> * Maneesh Soni <[email protected]> wrote:
>
> > But the bad pointer reference seen in sysfs_readdir() has to be
> > debugged. Assumption here is that if there is a dentry attached to
> > s_dirent, there has to be a inode associated becuase negative dentries
> > are not created in sysfs. Is it possible to get some more information
> > about the recreation scenario. Could you enable DEBUG printks for
> > lib/kobject.c and drivers/base/class.c to see the events happening.
>
> on a related note - i've been carrying the patch below in -rt for 2
> months (i.e. Steven's kernel has it too), as a workaround against the
> crash described below.

[ replied in the separate thread ]

> so it appears that the -rt kernel is triggering some genuine sysfs race.
> [note that it only happens on an SMP kernel, booting an UP kernel or
> with maxcpus=1 makes the bug go away.] I have done full kobject
> debugging but no conclusive results. Also, that particular crash happens
> earliest with PAGEALLOC enabled. [i have packed up the email discussion
> related to that crash, and i'm sending it to Maneesh separately.
> Maneesh, any ideas or suggestions?]

Still waiting for that mail to show up. Looks like this discussion is not
on lkml.

The kdobject or driver core debugging messages can possibly narrow the problem
down to some particular sysfs user like some driver or module and throw some
light on how the sysfs calls are being made.

> note that Steven has a dual-core Athlon64 X2 system. Steven, do you get
> the crash even with maxcpus=1?
>
> Ingo
>

--
Maneesh Soni
Linux Technology Center,
IBM India Software Labs,
Bangalore, India
email: [email protected]
Phone: 91-80-25044990

2005-11-23 12:55:04

by Maneesh Soni

[permalink] [raw]
Subject: [OOPS] sysfs_hash_and_remove (was Re: What protection ....)

On Wed, Nov 23, 2005 at 09:18:45AM +0100, Ingo Molnar wrote:
>
[..]
> on a related note - i've been carrying the patch below in -rt for 2
> months (i.e. Steven's kernel has it too), as a workaround against the
> crash described below.
>
[..]

> i'm occasionally getting the crash below on a PREEMPT_RT kernel. Might
> be a PREEMPT_RT bug, or might be some sysfs race only visible under
> PREEMPT_RT. Any ideas? The crash is at:
>
> (gdb) list *0xc01a2095
> 0xc01a2095 is in sysfs_hash_and_remove (fs/sysfs/inode.c:229).
> 224 }
> 225
> 226 void sysfs_hash_and_remove(struct dentry * dir, const char * name)
> 227 {
> 228 struct sysfs_dirent * sd;
> 229 struct sysfs_dirent * parent_sd = dir->d_fsdata;
> 230
> 231 if (dir->d_inode == NULL)
> 232 /* no inode means this hasn't been made visible yet */
> 233 return;
> (gdb)
>
Looks like here it is crashing due to bogus dentry pointer in the kobject
kobj->dentry. Could be some stale pointer?

Just got the mbox.. will go thru it more closely..

--
Maneesh Soni
Linux Technology Center,
IBM India Software Labs,
Bangalore, India
email: [email protected]
Phone: 91-80-25044990

2005-11-23 12:56:52

by Steven Rostedt

[permalink] [raw]
Subject: Re: What protection does sysfs_readdir have with SMP/Preemption?

On Wed, 23 Nov 2005, Maneesh Soni wrote:
>
> The dir operation sysfs_readdir() is called under directory inode's i_sem
> taken in vfs_readdir() and create_dir() also takes parent directory inode's
> i_sem. So in this case I think following are the relevant steps happening
> which look safe to me.
>
> cpu 0
> vfs_readdir()
> down(dir inode i_sem)
> sysfs_readdir(dir)
> parse through dir->s_dirent s_children list
> up(dir inode i_sem)
>
>
> cpu 1
> sysfs_create_dir()
> create_dir()
> down(parent dir inode i_sem)
> lookup_one_len (allocates & makes the new directory dentry visible)
> sysfs_make_diret()
> sysfs_new_dirent()
> attach the new directory s_dirent to parent's s_children list)
> up(parent dir inode i_sem)
>
>
> Basically, sysfs_readdir for a directory is protected against any
> addition/deletion in the directory by directory inode's i_sem.

OK, I missed that, thanks.

>
> But the bad pointer reference seen in sysfs_readdir() has to be debugged.
> Assumption here is that if there is a dentry attached to s_dirent, there
> has to be a inode associated becuase negative dentries are not created
> in sysfs. Is it possible to get some more information about the recreation
> scenario. Could you enable DEBUG printks for lib/kobject.c and
> drivers/base/class.c to see the events happening.

The bug that I've been fighting in my own kernel is a memory leak. So I
started looking into this at what would happen in verious places if an
allocation didn't work.

In create_dir:

error = sysfs_make_dirent(p->d_fsdata, *d, k, mode, SYSFS_DIR);
// Above is where the entry is added to the parent link list.

if (!error) {
error = sysfs_create(*d, mode, init_dir);
// If sysfs_create fails to allocate an inode, when below
// does the element get removed from the parent?
if (!error) {
p->d_inode->i_nlink++;
(*d)->d_op = &sysfs_dentry_ops;
d_rehash(*d);
}
}
if (error && (error != -EEXIST)) {
sysfs_put((*d)->d_fsdata);
// sysfs_put only seems to handle the kobject portion

d_drop(*d);
// d_drop handles the unhash
}
dput(*d);

So I'm not sure an error from sysfs_create will remove the object from the
link list. In fact, it might be worst since now there's an object on the
link list that may no long even be an object.

I'll test this by forcing a failure at sysfs_create.

-- Steve


2005-11-23 12:57:18

by Maneesh Soni

[permalink] [raw]
Subject: Re: What protection does sysfs_readdir have with SMP/Preemption?

On Wed, Nov 23, 2005 at 07:35:04AM -0500, Steven Rostedt wrote:
>
> On Wed, 23 Nov 2005, Ingo Molnar wrote:
>
> >
> > note that Steven has a dual-core Athlon64 X2 system. Steven, do you get
> > the crash even with maxcpus=1?
> >
>
> Actually Ingo, this happened on my UP test machine, a 368MHz Pentium.
>
> But unfortunately, it so far only happened once, and I've been trying to
> recreate it, with no success. The test that crashed it was running 10
> tasks that would read the entire filesystem. I was debugging another bug
> (something specific to my kernel, or maybe -rt) when I hit this bug.
> Looking at it, it seemed to not be related to the changes I made. Perhaps
> it could be related to your changes?
>


I normally test the sysfs races by running these two loops simultenously
on a SMP box. Basically running these will create/delete sysfs files and
directories and also do readdir.

while true; do insmod drivers/net/dummy.ko; rmmod dummy; done
while true; do find /sys/class/net/dummy0/ | xargs cat > /dev/null; done


--
Maneesh Soni
Linux Technology Center,
IBM India Software Labs,
Bangalore, India
email: [email protected]
Phone: 91-80-25044990

2005-11-23 14:01:23

by Maneesh Soni

[permalink] [raw]
Subject: Re: What protection does sysfs_readdir have with SMP/Preemption?

On Wed, Nov 23, 2005 at 07:56:39AM -0500, Steven Rostedt wrote:
[..]
>
> The bug that I've been fighting in my own kernel is a memory leak. So I
> started looking into this at what would happen in verious places if an
> allocation didn't work.
>
> In create_dir:
>
> error = sysfs_make_dirent(p->d_fsdata, *d, k, mode, SYSFS_DIR);
> // Above is where the entry is added to the parent link list.
>
> if (!error) {
> error = sysfs_create(*d, mode, init_dir);
> // If sysfs_create fails to allocate an inode, when below
> // does the element get removed from the parent?
> if (!error) {
> p->d_inode->i_nlink++;
> (*d)->d_op = &sysfs_dentry_ops;
> d_rehash(*d);
> }
> }
> if (error && (error != -EEXIST)) {
> sysfs_put((*d)->d_fsdata);
> // sysfs_put only seems to handle the kobject portion
>
> d_drop(*d);
> // d_drop handles the unhash
> }
> dput(*d);
>
> So I'm not sure an error from sysfs_create will remove the object from the
> link list. In fact, it might be worst since now there's an object on the
> link list that may no long even be an object.
>
> I'll test this by forcing a failure at sysfs_create.
>

hmm looks like we got some situation which is not desirable and could lead
to bogus sysfs_dirent in the parent list. It may not be the exact problem
in this case though, but needs fixing IMO.

After sysfs_make_dirent(), the ref count for sysfs dirent will be 2.
(one from allocation, and after linking the new dentry to it). On
error from sysfs_create(), we do sysfs_put() once, decrementing the
ref count to 1. And again when the new dentry for which we couldn't
allocate the d_inode, is d_drop()'ed. In sysfs_d_iput() we again
sysfs_put(), and decrement the sysfs dirent's ref count to 0, which will
be the final sysfs_put(), and it will free the sysfs_dirent but never
unlinks it from the parent list. So, parent list could still will having
links to the freed sysfs_dirent in its s_children list.

so basically list_del_init(&sd->s_sibling) should be done in error path
in create_dir().

Could you also put the appended patch in your trial runs..


Thanks
Maneesh





o Following patch corrects the buggy create_dir() error path. This bug could
end up in having a bogus sysfs_dirent in the parent list. Now the newly
allocated and linked sysfs_dirent is also un-linked in case of error
resulting from sysfs_create()

o Many thanks to Steven Rostedt and Ingo Molnar for pointing this out.

Signed-off-by: Maneesh Soni <[email protected]>
---

linux-2.6.15-rc2-mm1-maneesh/fs/sysfs/dir.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletion(-)

diff -puN fs/sysfs/dir.c~fix-create_dir-error-path fs/sysfs/dir.c
--- linux-2.6.15-rc2-mm1/fs/sysfs/dir.c~fix-create_dir-error-path 2005-11-23 18:59:36.072449992 +0530
+++ linux-2.6.15-rc2-mm1-maneesh/fs/sysfs/dir.c 2005-11-23 19:07:53.475833184 +0530
@@ -112,7 +112,9 @@ static int create_dir(struct kobject * k
}
}
if (error && (error != -EEXIST)) {
- sysfs_put((*d)->d_fsdata);
+ struct sysfs_dirent *sd = (*d)->d_fsdata;
+ list_del_init(&sd->s_sibling);
+ sysfs_put(sd);
d_drop(*d);
}
dput(*d);
_


--
Maneesh Soni
Linux Technology Center,
IBM India Software Labs,
Bangalore, India
email: [email protected]
Phone: 91-80-25044990

2005-11-23 14:16:01

by Steven Rostedt

[permalink] [raw]
Subject: Re: What protection does sysfs_readdir have with SMP/Preemption?

On Wed, 2005-11-23 at 19:28 +0530, Maneesh Soni wrote:

>
> hmm looks like we got some situation which is not desirable and could lead
> to bogus sysfs_dirent in the parent list. It may not be the exact problem
> in this case though, but needs fixing IMO.
>
> After sysfs_make_dirent(), the ref count for sysfs dirent will be 2.
> (one from allocation, and after linking the new dentry to it). On
> error from sysfs_create(), we do sysfs_put() once, decrementing the
> ref count to 1. And again when the new dentry for which we couldn't
> allocate the d_inode, is d_drop()'ed. In sysfs_d_iput() we again
> sysfs_put(), and decrement the sysfs dirent's ref count to 0, which will
> be the final sysfs_put(), and it will free the sysfs_dirent but never
> unlinks it from the parent list. So, parent list could still will having
> links to the freed sysfs_dirent in its s_children list.
>
> so basically list_del_init(&sd->s_sibling) should be done in error path
> in create_dir().
>
> Could you also put the appended patch in your trial runs..
>

I'm already playing around with this. You might want this patch instead.
I noticed that if sysfs_make_dirent fails to allocate the sd, then a
null will be passed to sysfs_put.

But this is not the end of the problems. I'll follow up on that comment
right after this.

-- Steve

Signed-off-by: Steven Rostedt <[email protected]>

Index: linux-2.6.15-rc2-git2/fs/sysfs/dir.c
===================================================================
--- linux-2.6.15-rc2-git2.orig/fs/sysfs/dir.c 2005-11-23 08:40:33.000000000 -0500
+++ linux-2.6.15-rc2-git2/fs/sysfs/dir.c 2005-11-23 08:52:57.000000000 -0500
@@ -112,7 +112,11 @@
}
}
if (error && (error != -EEXIST)) {
- sysfs_put((*d)->d_fsdata);
+ struct sysfs_dirent *sd = (*d)->d_fsdata;
+ if (sd) {
+ list_del_init(&sd->s_sibling);
+ sysfs_put(sd);
+ }
d_drop(*d);
}
dput(*d);


2005-11-23 14:21:13

by Steven Rostedt

[permalink] [raw]
Subject: Re: What protection does sysfs_readdir have with SMP/Preemption?

On Wed, 2005-11-23 at 09:15 -0500, Steven Rostedt wrote:

>
> But this is not the end of the problems. I'll follow up on that comment
> right after this.

Here's what I mean. I'm using the below patch to see what happens on
the error cases, and things are still bombing.

The patch returns a failure after 30 calls. Even with my previous patch,
I'm crashing.

PCI: Probing PCI hardware (bus 00)
ACPI: Assume root bridge [\_SB_.PCI0] bus is 0
kobject_register failed for CHN1 (-12)
[<c01041ee>] dump_stack+0x1e/0x20
[<c02206ab>] kobject_register+0x6b/0x80
[<c0255e33>] acpi_device_register+0x105/0x11b
[<c0256b8e>] acpi_add_single_object+0xf6/0x146
[<c0256cee>] acpi_bus_scan+0x110/0x17b
[<c03d6297>] acpi_scan_init+0x6b/0x89
[<c03be9e7>] do_initcalls+0x57/0xd0
[<c03bea85>] do_basic_setup+0x25/0x30
[<c01002e5>] init+0x35/0x170
[<c01013e5>] kernel_thread_helper+0x5/0x10
Unable to handle kernel NULL pointer dereference at virtual address 00000008
printing eip:
c01ab6ef
*pde = 00000000
Oops: 0000 [#1]
PREEMPT
Modules linked in:
CPU: 0
EIP: 0060:[<c01ab6ef>] Not tainted VLI
EFLAGS: 00010296 (2.6.15-rc2-git2)
EIP is at create_dir+0xf/0x250
eax: 00000000 ebx: cfe92a0c ecx: cfe8b224 edx: 00000000
esi: cfe92a08 edi: cfe92e08 ebp: cffc1ec0 esp: cffc1e9c
ds: 007b es: 007b ss: 0068
Process swapper (pid: 1, threadinfo=cffc0000 task=c127aa50)
Stack: 00000000 00000010 cffc1eb8 c023b56b cfff2a40 000000d0 cfe92a08 cfe92a08
cfe92e08 cffc1ee0 c01ab998 cfe92a08 00000000 cfe92a0c cffc1ed8 00000000
00000000 cffc1ef4 c022035f cfe92a08 cfe92a08 fffffffe cffc1f0c c02205db
Call Trace:
[<c010418b>] show_stack+0xab/0xf0
[<c010437f>] show_registers+0x18f/0x230
[<c01045bd>] die+0xed/0x190
[<c032731a>] do_page_fault+0x33a/0x670
[<c0103df7>] error_code+0x4f/0x54
[<c01ab998>] sysfs_create_dir+0x38/0x80
[<c022035f>] create_dir+0x1f/0x60
[<c02205db>] kobject_add+0x8b/0xf0
[<c0220668>] kobject_register+0x28/0x80
[<c0255e33>] acpi_device_register+0x105/0x11b
[<c0256b8e>] acpi_add_single_object+0xf6/0x146
[<c0256cee>] acpi_bus_scan+0x110/0x17b
[<c03d6297>] acpi_scan_init+0x6b/0x89
[<c03be9e7>] do_initcalls+0x57/0xd0
[<c03bea85>] do_basic_setup+0x25/0x30
[<c01002e5>] init+0x35/0x170
[<c01013e5>] kernel_thread_helper+0x5/0x10
Code: 37 c0 89 e5 8b 45 08 89 88 8c 00 00 00 31 c0 5d c3 8d 74 26 00 8d bc 27 00 00 00 00 55 89 e5 57 56 53 83 ec 18 8b 45 0c 8b 5d 10 <8b> 50 08 ff 4a 70 0f 88 8c 0e 00 00 31 c0 b9 ff ff ff ff 89 df
<0>Kernel panic - not syncing: Attempted to kill init!


I'm still looking into this, to see who can't handle the -ENOMEM.

-- Steve

Index: linux-2.6.15-rc2-git2/fs/sysfs/dir.c
===================================================================
--- linux-2.6.15-rc2-git2.orig/fs/sysfs/dir.c 2005-10-27 20:02:08.000000000 -0400
+++ linux-2.6.15-rc2-git2/fs/sysfs/dir.c 2005-11-23 08:04:42.000000000 -0500
@@ -98,13 +98,17 @@
{
int error;
umode_t mode = S_IFDIR| S_IRWXU | S_IRUGO | S_IXUGO;
+ static int breakme = 0;

down(&p->d_inode->i_sem);
*d = lookup_one_len(n, p, strlen(n));
if (!IS_ERR(*d)) {
error = sysfs_make_dirent(p->d_fsdata, *d, k, mode, SYSFS_DIR);
if (!error) {
- error = sysfs_create(*d, mode, init_dir);
+ if ((++breakme % 30))
+ error = sysfs_create(*d, mode, init_dir);
+ else
+ error = -ENOMEM;
if (!error) {
p->d_inode->i_nlink++;
(*d)->d_op = &sysfs_dentry_ops;



2005-11-23 15:24:57

by Steven Rostedt

[permalink] [raw]
Subject: kobject_register needs return value checks (was: What protection does sysfs_readdir have with SMP/Preemption?)

I'm doing some tests to see what happens on a failure of setting up
something in the sysfs, and I've discovered a few areas that don't test
the return value of kobject_register. The test was due to a memory
problem in a custom kernel that showed that sysfs didn't quite handle
the error cases well.

I did the following command:

find . -name "*.c" ! -type d | xargs grep "kobject_register"

I found 27 hits. Of these:

14 - checked the return value.
1 - reference in .mod.c file /* ignore it */
3 - in comments /* ignore it */
1 - declaration of actual function /* ignore it */
1 - EXPORT_SYMBOL /* ignore it */
1 - inside a printk quote. /* ignore it */

and ...

6 - calls without checking return values.

Here are the culprits:

./drivers/acpi/scan.c: kobject_register(&device->kobj);
./drivers/firmware/efivars.c: kobject_register(&new_efivar->kobj);
./drivers/md/md.c: kobject_register(&mddev->kobj);
./drivers/parisc/pdc_stable.c: kobject_register(&entry->kobj);
./fs/partitions/check.c: kobject_register(&p->kobj);
./kernel/params.c: kobject_register(&mk->kobj);


Normally, these would not return errors, but in case they do, the kernel
should be robust enough to handle it.

I tried to CC all the maintainers of the above files.

-- Steve


2005-11-24 04:18:43

by Maneesh Soni

[permalink] [raw]
Subject: Re: What protection does sysfs_readdir have with SMP/Preemption?

On Wed, Nov 23, 2005 at 09:15:44AM -0500, Steven Rostedt wrote:
> On Wed, 2005-11-23 at 19:28 +0530, Maneesh Soni wrote:
>
> >
> > hmm looks like we got some situation which is not desirable and could lead
> > to bogus sysfs_dirent in the parent list. It may not be the exact problem
> > in this case though, but needs fixing IMO.
> >
> > After sysfs_make_dirent(), the ref count for sysfs dirent will be 2.
> > (one from allocation, and after linking the new dentry to it). On
> > error from sysfs_create(), we do sysfs_put() once, decrementing the
> > ref count to 1. And again when the new dentry for which we couldn't
> > allocate the d_inode, is d_drop()'ed. In sysfs_d_iput() we again
> > sysfs_put(), and decrement the sysfs dirent's ref count to 0, which will
> > be the final sysfs_put(), and it will free the sysfs_dirent but never
> > unlinks it from the parent list. So, parent list could still will having
> > links to the freed sysfs_dirent in its s_children list.
> >
> > so basically list_del_init(&sd->s_sibling) should be done in error path
> > in create_dir().
> >
> > Could you also put the appended patch in your trial runs..
> >
>
> I'm already playing around with this. You might want this patch instead.
> I noticed that if sysfs_make_dirent fails to allocate the sd, then a
> null will be passed to sysfs_put.
>
> But this is not the end of the problems. I'll follow up on that comment
> right after this.
>
> -- Steve
>
> Signed-off-by: Steven Rostedt <[email protected]>
>
> Index: linux-2.6.15-rc2-git2/fs/sysfs/dir.c
> ===================================================================
> --- linux-2.6.15-rc2-git2.orig/fs/sysfs/dir.c 2005-11-23 08:40:33.000000000 -0500
> +++ linux-2.6.15-rc2-git2/fs/sysfs/dir.c 2005-11-23 08:52:57.000000000 -0500
> @@ -112,7 +112,11 @@
> }
> }
> if (error && (error != -EEXIST)) {
> - sysfs_put((*d)->d_fsdata);
> + struct sysfs_dirent *sd = (*d)->d_fsdata;
> + if (sd) {
> + list_del_init(&sd->s_sibling);
> + sysfs_put(sd);
> + }
> d_drop(*d);
> }
> dput(*d);
>
>

Agreed. This makes more sense.


Thanks
Maneesh


--
Maneesh Soni
Linux Technology Center,
IBM India Software Labs,
Bangalore, India
email: [email protected]
Phone: 91-80-25044990

2005-11-24 12:28:57

by Maneesh Soni

[permalink] [raw]
Subject: Re: [OOPS] sysfs_hash_and_remove (was Re: What protection ....)

On Wed, Nov 23, 2005 at 06:22:13PM +0530, Maneesh Soni wrote:
> On Wed, Nov 23, 2005 at 09:18:45AM +0100, Ingo Molnar wrote:
> >
> [..]
> > on a related note - i've been carrying the patch below in -rt for 2
> > months (i.e. Steven's kernel has it too), as a workaround against the
> > crash described below.
> >
> [..]
>
> > i'm occasionally getting the crash below on a PREEMPT_RT kernel. Might
> > be a PREEMPT_RT bug, or might be some sysfs race only visible under
> > PREEMPT_RT. Any ideas? The crash is at:
> >
> > (gdb) list *0xc01a2095
> > 0xc01a2095 is in sysfs_hash_and_remove (fs/sysfs/inode.c:229).
> > 224 }
> > 225
> > 226 void sysfs_hash_and_remove(struct dentry * dir, const char * name)
> > 227 {
> > 228 struct sysfs_dirent * sd;
> > 229 struct sysfs_dirent * parent_sd = dir->d_fsdata;
> > 230
> > 231 if (dir->d_inode == NULL)
> > 232 /* no inode means this hasn't been made visible yet */
> > 233 return;
> > (gdb)
> >
> Looks like here it is crashing due to bogus dentry pointer in the kobject
> kobj->dentry. Could be some stale pointer?
>

[From the mbox sent by Ingo]

On Sat, Sep 24, 2005 at 09:06:27AM -0500, James Bottomley wrote:
> On Sat, 2005-09-24 at 14:36 +0200, Ingo Molnar wrote:
> > so to me this seems like that in certain cases the starget->dev gets
> > released once too often? Full log attached.
>
> But that's not what your traces show. The trace shows the target0:0:2
> object going through the last two phases of its lifecycle; remove and
> then release (which your trace doesn't show it as having got to).
>
> The trace also seems to show that the object that is already gone is the
> dentry.
>
> So, based on this, I think it's a dev<->classdev linkage problem. This
> is what I think it is:
>
> Previously, classdevs being interfaces on devs used to (2.6.13 and
> before) simply point to their various devs via links. The remove (not
> release) order was always dev then classdevs. However, remove is making
> the object invisible, so this is actually physically removing pieces of
> the sysfs infrastructure. The problem seems to be that the dev sysfs
> directory is removed first, followed by the classdevs sysfs dir. Even
> though the classdevs sysfs dir contains symlinks into dev, this is fine
> because symlinks don't ping the dentries of targets. However, post

sysfs symlinks do pin dentry though not directly. The target kobject is
pinned (kobject_get) when a symlink pointing to it is created. And as
long as the target kobject is alive, the corresponding dentry will remain.
The target kobject is un-pinned when the symlink is released.

> 2.6.13 greg introduced a backlink from dev->classdev so you can tell
> what interfaces exist on a device. In this case, that's the
> spi_transport:target0:0:2 dentry in the dev dir. However, the dev dir
> removal appears to have released this, and we just don't notice.
>
> To test the theory, try this patch. However, I have no clue what to do
> about this if it truly is the problem ... we have bidirectional cross
> object links, so remove order can't be altered to fix it. It looks like
> we need to understand why we have to remove the links in the first
> place ... if the kobject dir removal does that for us, then they're
> unnecessary.

yes, sysfs_remove_directory() will remove the contents (including symlinks)
first and then remove the directory. But there is some complication in case
of bidirectional links. Lets say we have

[maneesh@bigbang ingo]$ ls -la a b
a:
total 8
drwxrwxr-x 2 maneesh maneesh 4096 Nov 23 23:13 .
drwxrwxr-x 4 maneesh maneesh 4096 Nov 23 23:12 ..
lrwxrwxrwx 1 maneesh maneesh 4 Nov 23 23:13 link-to-b-in-a -> ../b

b:
total 8
drwxrwxr-x 2 maneesh maneesh 4096 Nov 23 23:13 .
drwxrwxr-x 4 maneesh maneesh 4096 Nov 23 23:12 ..
lrwxrwxrwx 1 maneesh maneesh 4 Nov 23 23:13 link-to-a-in-b -> ../a

so when "link-to-b-in-a" is created a ref is taken for "b" and vice versa.

Assuming kobject "a" is deleted first (without removing the symlink
"link-to-b-in-a"). Because of extra ref taken while creating "link-to-a-in-b",
kobject "a" is not removed and hence sysfs_remove_dir() is also not called.

So, IMO, it is necessary to explicitly remove links before unregistering
the kobject in case of bidirectional cross symlinks.

The patch from James, is working, because it is not creating the cross
symlink itself.



> James
>
> diff --git a/drivers/base/class.c b/drivers/base/class.c
> --- a/drivers/base/class.c
> +++ b/drivers/base/class.c
> @@ -520,8 +520,10 @@ int class_device_add(struct class_device
> class_name = make_class_name(class_dev);
> sysfs_create_link(&class_dev->kobj,
> &class_dev->dev->kobj, "device");
> + /*
> sysfs_create_link(&class_dev->dev->kobj, &class_dev->kobj,
> class_name);
> + */
> }
>
> /* notify any interfaces this device is now here */
> @@ -618,7 +620,9 @@ void class_device_del(struct class_devic
> if (class_dev->dev) {
> class_name = make_class_name(class_dev);
> sysfs_remove_link(&class_dev->kobj, "device");
> + /*
> sysfs_remove_link(&class_dev->dev->kobj, class_name);
> + */
> }
> if (class_dev->devt_attr)
> class_device_remove_file(class_dev, class_dev->devt_attr);
>


--
Maneesh Soni
Linux Technology Center,
IBM India Software Labs,
Bangalore, India
email: [email protected]
Phone: 91-80-25044990

2005-11-24 14:32:38

by Ingo Molnar

[permalink] [raw]
Subject: Re: What protection does sysfs_readdir have with SMP/Preemption?


* Maneesh Soni <[email protected]> wrote:

> > I'm already playing around with this. You might want this patch instead.
> > I noticed that if sysfs_make_dirent fails to allocate the sd, then a
> > null will be passed to sysfs_put.

> Agreed. This makes more sense.

ok, i've applied Steven's patch.

Ingo

2005-11-24 14:34:52

by Ingo Molnar

[permalink] [raw]
Subject: Re: [OOPS] sysfs_hash_and_remove (was Re: What protection ....)


* Maneesh Soni <[email protected]> wrote:

> So, IMO, it is necessary to explicitly remove links before
> unregistering the kobject in case of bidirectional cross symlinks.
>
> The patch from James, is working, because it is not creating the cross
> symlink itself.

so, what is your suggestion, what should be done to fix the problem? The
patch below:

> > + /*
> > sysfs_create_link(&class_dev->dev->kobj, &class_dev->kobj,
> > class_name);
> > + */

> > + /*
> > sysfs_remove_link(&class_dev->dev->kobj, class_name);
> > + */

isnt fit for upstream inclusion :-)

Ingo

2005-11-26 22:26:20

by James Bottomley

[permalink] [raw]
Subject: Re: [OOPS] sysfs_hash_and_remove (was Re: What protection ....)

On Thu, 2005-11-24 at 15:34 +0100, Ingo Molnar wrote:
> * Maneesh Soni <[email protected]> wrote:
>
> > So, IMO, it is necessary to explicitly remove links before
> > unregistering the kobject in case of bidirectional cross symlinks.
> >
> > The patch from James, is working, because it is not creating the cross
> > symlink itself.
>
> so, what is your suggestion, what should be done to fix the problem? The
> patch below:
> isnt fit for upstream inclusion :-)

Well, the patch was just intended to confirm the problem diagnosis.

The solution Maneesh appears to be advocating to the issue is imposing
del ordering, the issue being that device_del is the call that actually
removes the directories and symlinks, so all callers have to make sure
they've called class_device_del for every class on the device before
calling device_del. Since this trigger point happens regardless of
references, we can't expect references to get us out of this one, so
we'll have to audit the failing code manually. I did find and fix one
of these issues in SCSI, but there may be more lurking around ...

James


2006-02-11 00:33:48

by Greg KH

[permalink] [raw]
Subject: Re: [OOPS] sysfs_hash_and_remove (was Re: What protection ....)

On Wed, Nov 23, 2005 at 06:22:13PM +0530, Maneesh Soni wrote:
> On Wed, Nov 23, 2005 at 09:18:45AM +0100, Ingo Molnar wrote:
> >
> [..]
> > on a related note - i've been carrying the patch below in -rt for 2
> > months (i.e. Steven's kernel has it too), as a workaround against the
> > crash described below.
> >
> [..]
>
> > i'm occasionally getting the crash below on a PREEMPT_RT kernel. Might
> > be a PREEMPT_RT bug, or might be some sysfs race only visible under
> > PREEMPT_RT. Any ideas? The crash is at:
> >
> > (gdb) list *0xc01a2095
> > 0xc01a2095 is in sysfs_hash_and_remove (fs/sysfs/inode.c:229).
> > 224 }
> > 225
> > 226 void sysfs_hash_and_remove(struct dentry * dir, const char * name)
> > 227 {
> > 228 struct sysfs_dirent * sd;
> > 229 struct sysfs_dirent * parent_sd = dir->d_fsdata;
> > 230
> > 231 if (dir->d_inode == NULL)
> > 232 /* no inode means this hasn't been made visible yet */
> > 233 return;
> > (gdb)
> >
> Looks like here it is crashing due to bogus dentry pointer in the kobject
> kobj->dentry. Could be some stale pointer?

Did you ever figure anything out here? I'm seeing a lot more reports of
this problem lately, especially if you enable slab debugging. For
example:
http://bugzilla.kernel.org/show_bug.cgi?id=5876

thanks,

greg k-h

2006-02-11 15:46:58

by Steven Rostedt

[permalink] [raw]
Subject: Re: [OOPS] sysfs_hash_and_remove (was Re: What protection ....)

On Fri, 10 Feb 2006, Greg KH wrote:

> On Wed, Nov 23, 2005 at 06:22:13PM +0530, Maneesh Soni wrote:
> > Looks like here it is crashing due to bogus dentry pointer in the kobject
> > kobj->dentry. Could be some stale pointer?
>
> Did you ever figure anything out here? I'm seeing a lot more reports of
> this problem lately, especially if you enable slab debugging. For
> example:
> http://bugzilla.kernel.org/show_bug.cgi?id=5876
>

The patch has just been added to the 2.6.16 series so if this does fix the
bug, we wont know until someone tries one of the 2.6.16-rc kernels (or
later).

Have you seen this bug in one of those kernels?

-- Steve

2006-02-24 01:05:12

by Greg KH

[permalink] [raw]
Subject: Re: [OOPS] sysfs_hash_and_remove (was Re: What protection ....)

On Sat, Feb 11, 2006 at 10:46:45AM -0500, Steven Rostedt wrote:
> On Fri, 10 Feb 2006, Greg KH wrote:
>
> > On Wed, Nov 23, 2005 at 06:22:13PM +0530, Maneesh Soni wrote:
> > > Looks like here it is crashing due to bogus dentry pointer in the kobject
> > > kobj->dentry. Could be some stale pointer?
> >
> > Did you ever figure anything out here? I'm seeing a lot more reports of
> > this problem lately, especially if you enable slab debugging. For
> > example:
> > http://bugzilla.kernel.org/show_bug.cgi?id=5876
> >
>
> The patch has just been added to the 2.6.16 series so if this does fix the
> bug, we wont know until someone tries one of the 2.6.16-rc kernels (or
> later).

Ugh, you're right, sorry for the noise. Too many patches floating
around here :)

> Have you seen this bug in one of those kernels?

No I haven't.

thanks,

greg k-h