2008-10-28 15:12:18

by folkert

Subject: [2.6.26] kobject_add_internal failed for 2:0 with -EEXIST / unable to handle kernel NULL pointer dereference in sysfs_create_link

While running my http://vanheusden.com/pyk/ script (which randomly
inserts and removes modules) I triggered the following oops in a 2.6.26
kernel on an IBM xSeries 260. A 2.6.18 kernel on the same system did not
trigger this oops (or any oops at all).

[ 42.507375] FDC 0 is a National Semiconductor PC87306
[ 42.509057] kobject_add_internal failed for 2:0 with -EEXIST, don't try to register things with the same name in the same directory.
[ 42.509291] Pid: 5301, comm: modprobe Not tainted 2.6.26-1-amd64 #1
[ 42.509431]
[ 42.509433] Call Trace:
[ 42.509685] [<ffffffff8031b031>] kobject_add_internal+0x13f/0x17e
[ 42.509823] [<ffffffff8031b46e>] kobject_add+0x74/0x7c
[ 42.509969] [<ffffffff802e2470>] sysfs_addrm_finish+0x19/0x1ea
[ 42.510141] [<ffffffff802e21b4>] sysfs_find_dirent+0x1b/0x2f
[ 42.510331] [<ffffffff802e2741>] create_dir+0x5a/0x87
[ 42.510466] [<ffffffff8031ae88>] kobject_get+0x12/0x17
[ 42.510614] [<ffffffff80382771>] get_device+0x17/0x20
[ 42.510754] [<ffffffff80382d81>] device_add+0x9b/0x53f
[ 42.510915] [<ffffffff8031acf2>] kobject_init+0x41/0x69
[ 42.511374] [<ffffffff803832d1>] device_create_vargs+0x9a/0xc6
[ 42.511519] [<ffffffff8027d23b>] bdi_register+0x57/0xb4
[ 42.511657] [<ffffffff8030ac34>] elv_register_queue+0x67/0x6f
[ 42.511818] [<ffffffff8030e54e>] blk_register_queue+0x77/0x9b
[ 42.511818] [<ffffffff80311ff4>] add_disk+0x64/0x87
[ 42.511818] [<ffffffffa0071f04>] :floppy:floppy_module_init+0xdf3/0xea8
[ 42.511818] [<ffffffff8022c184>] try_to_wake_up+0x118/0x129
[ 42.511840] [<ffffffff80254e9b>] sys_init_module+0x190e/0x1aa4
[ 42.511992] [<ffffffff8030cc77>] blk_init_queue+0x0/0x8
[ 42.512148] [<ffffffff8020be9a>] system_call_after_swapgs+0x8a/0x8f
[ 42.512290]
[ 42.512410] BUG: unable to handle kernel NULL pointer dereference at 0000000000000128
[ 42.512703] IP: [<ffffffff802e2f17>] sysfs_create_link+0x44/0x105
[ 42.512889] PGD 1bdc12067 PUD 1ba9d5067 PMD 0
[ 42.513198] Oops: 0000 [1] SMP
[ 42.513422] CPU 2
[ 42.513576] Modules linked in: floppy(+) output ide_cd_mod serio_raw dm_snapshot ata_generic snd_pcm snd_timer i2c_piix4 dm_mirror ehci_hcd battery usbhid ff_memless pcspkr(-) fan thermal_sys libata loop i2c_core joydev dm_log hid cdrom snd_page_alloc evdev netconsole configfs ipv6 snd soundcore ext3 jbd mbcache dm_mod dock enclosure sd_mod serverworks aacraid scsi_mod tg3 ide_pci_generic ide_core [last unloaded: psmouse]
[ 42.519374] Pid: 5301, comm: modprobe Not tainted 2.6.26-1-amd64 #1
[ 42.519374] RIP: 0010:[<ffffffff802e2f17>] [<ffffffff802e2f17>] sysfs_create_link+0x44/0x105
[ 42.519374] RSP: 0018:ffff8101ba991d48 EFLAGS: 00010246
[ 42.519374] RAX: 0000000000009292 RBX: 00000000000000f0 RCX: ffffffff804fe088
[ 42.519374] RDX: ffffffff804b341d RSI: 00000000000000f0 RDI: ffffffff80653d80
[ 42.519374] RBP: ffffffff804b341d R08: 0000000000000000 R09: ffff8101bb44d000
[ 42.519374] R10: 0000000000000001 R11: 0000000000000046 R12: 00000000fffffff2
[ 42.519374] R13: ffff8101bac6c5f0 R14: 0000000000000008 R15: 0000000000000000
[ 42.519374] FS: 00007f97800126e0(0000) GS:ffff8101bf0ad0c0(0000) knlGS:0000000000000000
[ 42.519374] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 42.519374] CR2: 0000000000000128 CR3: 00000001a9c89000 CR4: 00000000000006e0
[ 42.519374] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 42.519374] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 42.519374] Process modprobe (pid: 5301, threadinfo ffff8101ba990000, task ffff8101bf20e3c0)
[ 42.519374] Stack: 0000000000000000 ffffffff8030e54e ffff8101bc0d9c00 0000000000000000
[ 42.519374] ffff8101bc0d9d58 0000000000000000 0000000000000000 ffffffffa01fee30
[ 42.522582] ffffffffa01fee20 ffffffffa0071f04 ffffffff8022c184 0000000000000282
[ 42.522582] Call Trace:
[ 42.522582] [<ffffffff8030e54e>] ? blk_register_queue+0x77/0x9b
[ 42.522582] [<ffffffffa0071f04>] ? :floppy:floppy_module_init+0xdf3/0xea8
[ 42.522582] [<ffffffff8022c184>] ? try_to_wake_up+0x118/0x129
[ 42.522582] [<ffffffff80254e9b>] ? sys_init_module+0x190e/0x1aa4
[ 42.522582] [<ffffffff8030cc77>] ? blk_init_queue+0x0/0x8
[ 42.522582] [<ffffffff8020be9a>] ? system_call_after_swapgs+0x8a/0x8f
[ 42.522582]
[ 42.522582]
[ 42.522582] Code: 48 85 ff 49 c7 c5 a0 ba 50 80 74 13 4c 8b 6f 38 41 bc f2 ff ff ff 4d 85 ed 0f 84 bc 00 00 00 48 c7 c7 80 3d 65 80 e8 fb 68 14 00 <48> 8b 5b 38 48 85 db 74 19 83 3b 00 75 11 be 81 00 00 00 48 c7
[ 42.526571] RIP [<ffffffff802e2f17>] sysfs_create_link+0x44/0x105
[ 42.526571] RSP <ffff8101ba991d48>
[ 42.526571] CR2: 0000000000000128
[ 42.531559] ---[ end trace 4eb65a6452398ce5 ]---


Folkert van Heusden

--
Multitail is a tool for viewing log files and/or following the
execution of commands. Filtering, keyword coloring, merging, viewing
differences (diff-view), etc. http://www.vanheusden.com/multitail/
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, http://www.vanheusden.com


2008-10-28 23:07:32

by Kay Sievers

Subject: Re: [2.6.26] kobject_add_internal failed for 2:0 with -EEXIST / unable to handle kernel NULL pointer dereference in sysfs_create_link

On Tue, Oct 28, 2008 at 16:11, Folkert van Heusden wrote:
> While running my http://vanheusden.com/pyk/ script (which randomly
> inserts and removes modules) I triggered the following oops in a 2.6.26
> kernel on an IBM xSeries 260. This oops (in fact no oops at all) did not
> get triggered in a 2.6.18 kernel on that system.
>
> [ 42.507375] FDC 0 is a National Semiconductor PC87306
> [ 42.509057] kobject_add_internal failed for 2:0 with -EEXIST, don't try to register things with the same name in the same directory.
> [ 42.509291] Pid: 5301, comm: modprobe Not tainted 2.6.26-1-amd64 #1
> [ 42.509431]
> [ 42.509433] Call Trace:
> [ 42.509685] [<ffffffff8031b031>] kobject_add_internal+0x13f/0x17e
> [ 42.509823] [<ffffffff8031b46e>] kobject_add+0x74/0x7c
> [ 42.509969] [<ffffffff802e2470>] sysfs_addrm_finish+0x19/0x1ea
> [ 42.510141] [<ffffffff802e21b4>] sysfs_find_dirent+0x1b/0x2f
> [ 42.510331] [<ffffffff802e2741>] create_dir+0x5a/0x87
> [ 42.510466] [<ffffffff8031ae88>] kobject_get+0x12/0x17
> [ 42.510614] [<ffffffff80382771>] get_device+0x17/0x20
> [ 42.510754] [<ffffffff80382d81>] device_add+0x9b/0x53f
> [ 42.510915] [<ffffffff8031acf2>] kobject_init+0x41/0x69
> [ 42.511374] [<ffffffff803832d1>] device_create_vargs+0x9a/0xc6
> [ 42.511519] [<ffffffff8027d23b>] bdi_register+0x57/0xb4

Looks like bdi sees two devices with the same devnum, or didn't
clean up an old entry.

What does: ls -l "/sys/class/bdi/" print?

How many floppies (or emulated floppies) does this system have?

Kay
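
A minimal sketch of the check behind the question above, assuming that a
leftover bdi entry would show up as a major:minor directory in
/sys/class/bdi with no whole-disk device of the same number in /sys/block.
The function name stale_bdi_entries is hypothetical, and the sysfs root is
a parameter only so the sketch can be tried against a scratch directory.

```shell
#!/bin/sh
# List bdi entries under ROOT/class/bdi that have no whole-disk block device
# with the same major:minor under ROOT/block. ROOT defaults to /sys.
# stale_bdi_entries is a hypothetical helper name, not an existing tool.
stale_bdi_entries() {
    root="${1:-/sys}"
    for entry in "$root"/class/bdi/*; do
        [ -d "$entry" ] || continue
        name="${entry##*/}"
        # Skip names that are not of the form major:minor (e.g. "default").
        case "$name" in
            *[!0-9:]*|*:*:*|:*|*:) continue ;;
            *:*) ;;
            *) continue ;;
        esac
        found=no
        for devfile in "$root"/block/*/dev; do
            [ -f "$devfile" ] || continue
            if [ "$(cat "$devfile")" = "$name" ]; then
                found=yes
                break
            fi
        done
        # No block device carries this number: report it as a candidate.
        [ "$found" = yes ] || echo "$name"
    done
}
```

Called without arguments it scans /sys. Anything it prints is only a
candidate for a stale entry: bdi devices that do not belong to a block
device at all would be listed too.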

2008-10-29 09:41:05

by folkert

Subject: Re: [2.6.26] kobject_add_internal failed for 2:0 with -EEXIST / unable to handle kernel NULL pointer dereference in sysfs_create_link

> > While running my http://vanheusden.com/pyk/ script (which randomly
> > inserts and removes modules) I triggered the following oops in a 2.6.26
> > kernel on an IBM xSeries 260. This oops (in fact no oops at all) did not
> > get triggered in a 2.6.18 kernel on that system.
> >
> > [ 42.507375] FDC 0 is a National Semiconductor PC87306
> > [ 42.509057] kobject_add_internal failed for 2:0 with -EEXIST, don't try to register things with the same name in the same directory.
> > [ 42.509291] Pid: 5301, comm: modprobe Not tainted 2.6.26-1-amd64 #1
> > [ 42.509431]
> > [ 42.509433] Call Trace:
> > [ 42.509685] [<ffffffff8031b031>] kobject_add_internal+0x13f/0x17e
> > [ 42.509823] [<ffffffff8031b46e>] kobject_add+0x74/0x7c
> > [ 42.509969] [<ffffffff802e2470>] sysfs_addrm_finish+0x19/0x1ea
> > [ 42.510141] [<ffffffff802e21b4>] sysfs_find_dirent+0x1b/0x2f
> > [ 42.510331] [<ffffffff802e2741>] create_dir+0x5a/0x87
> > [ 42.510466] [<ffffffff8031ae88>] kobject_get+0x12/0x17
> > [ 42.510614] [<ffffffff80382771>] get_device+0x17/0x20
> > [ 42.510754] [<ffffffff80382d81>] device_add+0x9b/0x53f
> > [ 42.510915] [<ffffffff8031acf2>] kobject_init+0x41/0x69
> > [ 42.511374] [<ffffffff803832d1>] device_create_vargs+0x9a/0xc6
> > [ 42.511519] [<ffffffff8027d23b>] bdi_register+0x57/0xb4
>
> Looks like bdi sees two devices with the same devnum, or didn't
> cleanup an old entry.
> What does: ls -l "/sys/class/bdi/" print?

The following:
folkert@debiantesthw:~$ ls -l /sys/class/bdi/
total 0
drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:0
drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:1
drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:10
drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:11
drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:12
drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:13
drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:14
drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:15
drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:2
drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:3
drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:4
drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:5
drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:6
drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:7
drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:8
drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:9
drwxr-xr-x 3 root root 0 2008-10-28 18:32 2:0
drwxr-xr-x 3 root root 0 2008-10-28 18:32 2:1
drwxr-xr-x 3 root root 0 2008-10-28 18:32 254:0
drwxr-xr-x 3 root root 0 2008-10-28 18:32 254:1
drwxr-xr-x 3 root root 0 2008-10-28 18:32 254:2
drwxr-xr-x 3 root root 0 2008-10-28 18:32 254:3
drwxr-xr-x 3 root root 0 2008-10-28 18:32 254:4
drwxr-xr-x 3 root root 0 2008-10-28 18:32 254:5
drwxr-xr-x 3 root root 0 2008-10-28 18:32 3:0
drwxr-xr-x 3 root root 0 2008-10-29 11:39 7:0
drwxr-xr-x 3 root root 0 2008-10-29 11:39 7:1
drwxr-xr-x 3 root root 0 2008-10-29 11:39 7:2
drwxr-xr-x 3 root root 0 2008-10-29 11:39 7:3
drwxr-xr-x 3 root root 0 2008-10-29 11:39 7:4
drwxr-xr-x 3 root root 0 2008-10-29 11:39 7:5
drwxr-xr-x 3 root root 0 2008-10-29 11:39 7:6
drwxr-xr-x 3 root root 0 2008-10-29 11:39 7:7
drwxr-xr-x 3 root root 0 2008-10-28 18:32 8:0
drwxr-xr-x 3 root root 0 2008-10-28 18:32 default

> How many floppies (or emulated floppies) does this system have?

1 physical.


Folkert van Heusden

--
Ever wonder what is out there? Any alien races? Then please support
the seti@home project: setiathome.ssl.berkeley.edu
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, http://www.vanheusden.com

2008-10-29 10:02:13

by Bryan Wu

Subject: Re: [2.6.26] kobject_add_internal failed for 2:0 with -EEXIST / unable to handle kernel NULL pointer dereference in sysfs_create_link

On Wed, Oct 29, 2008 at 5:40 PM, Folkert van Heusden wrote:
>> > While running my http://vanheusden.com/pyk/ script (which randomly
>> > inserts and removes modules) I triggered the following oops in a 2.6.26
>> > kernel on an IBM xSeries 260. This oops (in fact no oops at all) did not
>> > get triggered in a 2.6.18 kernel on that system.
>> >
>> > [ 42.507375] FDC 0 is a National Semiconductor PC87306
>> > [ 42.509057] kobject_add_internal failed for 2:0 with -EEXIST, don't try to register things with the same name in the same directory.
>> > [ 42.509291] Pid: 5301, comm: modprobe Not tainted 2.6.26-1-amd64 #1
>> > [ 42.509431]
>> > [ 42.509433] Call Trace:
>> > [ 42.509685] [<ffffffff8031b031>] kobject_add_internal+0x13f/0x17e
>> > [ 42.509823] [<ffffffff8031b46e>] kobject_add+0x74/0x7c
>> > [ 42.509969] [<ffffffff802e2470>] sysfs_addrm_finish+0x19/0x1ea
>> > [ 42.510141] [<ffffffff802e21b4>] sysfs_find_dirent+0x1b/0x2f
>> > [ 42.510331] [<ffffffff802e2741>] create_dir+0x5a/0x87
>> > [ 42.510466] [<ffffffff8031ae88>] kobject_get+0x12/0x17
>> > [ 42.510614] [<ffffffff80382771>] get_device+0x17/0x20
>> > [ 42.510754] [<ffffffff80382d81>] device_add+0x9b/0x53f
>> > [ 42.510915] [<ffffffff8031acf2>] kobject_init+0x41/0x69
>> > [ 42.511374] [<ffffffff803832d1>] device_create_vargs+0x9a/0xc6
>> > [ 42.511519] [<ffffffff8027d23b>] bdi_register+0x57/0xb4
>>
>> Looks like bdi sees two devices with the same devnum, or didn't
>> cleanup an old entry.
>> What does: ls -l "/sys/class/bdi/" print?
>
> The following:
> folkert@debiantesthw:~$ ls -l /sys/class/bdi/
> total 0
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:0
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:1
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:10
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:11
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:12
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:13
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:14
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:15
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:2
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:3
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:4
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:5
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:6
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:7
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:8
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 1:9
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 2:0
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 2:1
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 254:0
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 254:1
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 254:2
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 254:3
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 254:4
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 254:5
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 3:0
> drwxr-xr-x 3 root root 0 2008-10-29 11:39 7:0
> drwxr-xr-x 3 root root 0 2008-10-29 11:39 7:1
> drwxr-xr-x 3 root root 0 2008-10-29 11:39 7:2
> drwxr-xr-x 3 root root 0 2008-10-29 11:39 7:3
> drwxr-xr-x 3 root root 0 2008-10-29 11:39 7:4
> drwxr-xr-x 3 root root 0 2008-10-29 11:39 7:5
> drwxr-xr-x 3 root root 0 2008-10-29 11:39 7:6
> drwxr-xr-x 3 root root 0 2008-10-29 11:39 7:7
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 8:0
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 default
>
>> How many floppies (or emulated floppies) does this system have?
>
> 1 physical.
>
>

Hi guys,

I found a similar issue on a Blackfin board, and I believe it is a common
bug in bdi:
http://lkml.org/lkml/2008/10/16/126

There has been no response to that report so far, but I am working on it.

-Bryan

2008-10-29 12:25:28

by Kay Sievers

Subject: Re: [2.6.26] kobject_add_internal failed for 2:0 with -EEXIST / unable to handle kernel NULL pointer dereference in sysfs_create_link

On Wed, Oct 29, 2008 at 10:40, Folkert van Heusden wrote:
>> > While running my http://vanheusden.com/pyk/ script (which randomly
>> > inserts and removes modules) I triggered the following oops in a 2.6.26
>> > kernel on an IBM xSeries 260. This oops (in fact no oops at all) did not
>> > get triggered in a 2.6.18 kernel on that system.
>> >
>> > [ 42.507375] FDC 0 is a National Semiconductor PC87306
>> > [ 42.509057] kobject_add_internal failed for 2:0 with -EEXIST, don't try to register things with the same name in the same directory.
>> > [ 42.509291] Pid: 5301, comm: modprobe Not tainted 2.6.26-1-amd64 #1
>> > [ 42.509431]
>> > [ 42.509433] Call Trace:
>> > [ 42.509685] [<ffffffff8031b031>] kobject_add_internal+0x13f/0x17e
>> > [ 42.509823] [<ffffffff8031b46e>] kobject_add+0x74/0x7c
>> > [ 42.509969] [<ffffffff802e2470>] sysfs_addrm_finish+0x19/0x1ea
>> > [ 42.510141] [<ffffffff802e21b4>] sysfs_find_dirent+0x1b/0x2f
>> > [ 42.510331] [<ffffffff802e2741>] create_dir+0x5a/0x87
>> > [ 42.510466] [<ffffffff8031ae88>] kobject_get+0x12/0x17
>> > [ 42.510614] [<ffffffff80382771>] get_device+0x17/0x20
>> > [ 42.510754] [<ffffffff80382d81>] device_add+0x9b/0x53f
>> > [ 42.510915] [<ffffffff8031acf2>] kobject_init+0x41/0x69
>> > [ 42.511374] [<ffffffff803832d1>] device_create_vargs+0x9a/0xc6
>> > [ 42.511519] [<ffffffff8027d23b>] bdi_register+0x57/0xb4
>>
>> Looks like bdi sees two devices with the same devnum, or didn't
>> cleanup an old entry.
>> What does: ls -l "/sys/class/bdi/" print?
>
> The following:
> folkert@debiantesthw:~$ ls -l /sys/class/bdi/
> total 0

> drwxr-xr-x 3 root root 0 2008-10-28 18:32 2:0
> drwxr-xr-x 3 root root 0 2008-10-28 18:32 2:1

Oh, you are running the old sysfs layout without symlinks. Care to
tell where the "device" link in these directories points to?

Thanks,
Kay

2008-10-29 12:28:24

by Kay Sievers

Subject: Re: [2.6.26] kobject_add_internal failed for 2:0 with -EEXIST / unable to handle kernel NULL pointer dereference in sysfs_create_link

On Wed, Oct 29, 2008 at 11:01, Bryan Wu wrote:
> On Wed, Oct 29, 2008 at 5:40 PM, Folkert van Heusden wrote:
>>> > While running my http://vanheusden.com/pyk/ script (which randomly
>>> > inserts and removes modules) I triggered the following oops in a 2.6.26
>>> > kernel on an IBM xSeries 260. This oops (in fact no oops at all) did not
>>> > get triggered in a 2.6.18 kernel on that system.
>>> >
>>> > [ 42.507375] FDC 0 is a National Semiconductor PC87306
>>> > [ 42.509057] kobject_add_internal failed for 2:0 with -EEXIST, don't try to register things with the same name in the same directory.
>>> > [ 42.509291] Pid: 5301, comm: modprobe Not tainted 2.6.26-1-amd64 #1
>>> > [ 42.509431]
>>> > [ 42.509433] Call Trace:
>>> > [ 42.509685] [<ffffffff8031b031>] kobject_add_internal+0x13f/0x17e
>>> > [ 42.509823] [<ffffffff8031b46e>] kobject_add+0x74/0x7c
>>> > [ 42.509969] [<ffffffff802e2470>] sysfs_addrm_finish+0x19/0x1ea
>>> > [ 42.510141] [<ffffffff802e21b4>] sysfs_find_dirent+0x1b/0x2f
>>> > [ 42.510331] [<ffffffff802e2741>] create_dir+0x5a/0x87
>>> > [ 42.510466] [<ffffffff8031ae88>] kobject_get+0x12/0x17
>>> > [ 42.510614] [<ffffffff80382771>] get_device+0x17/0x20
>>> > [ 42.510754] [<ffffffff80382d81>] device_add+0x9b/0x53f
>>> > [ 42.510915] [<ffffffff8031acf2>] kobject_init+0x41/0x69
>>> > [ 42.511374] [<ffffffff803832d1>] device_create_vargs+0x9a/0xc6
>>> > [ 42.511519] [<ffffffff8027d23b>] bdi_register+0x57/0xb4
>>>
>>> Looks like bdi sees two devices with the same devnum, or didn't
>>> cleanup an old entry.
>>> What does: ls -l "/sys/class/bdi/" print?
...
>> drwxr-xr-x 3 root root 0 2008-10-28 18:32 2:0
>> drwxr-xr-x 3 root root 0 2008-10-28 18:32 2:1
...
>> drwxr-xr-x 3 root root 0 2008-10-28 18:32 default
>>
>>> How many floppies (or emulated floppies) does this system have?
>>
>> 1 physical.

> Hi guys,
>
> I found similar issue on Blackfin board and I believe it is a common bug for bdi
> http://lkml.org/lkml/2008/10/16/126
>
> But there is no response about this bug, although I am working on it.

Peter, any idea? BDI seems to try to create duplicate devices.

Thanks,
Kay

2008-10-29 13:27:19

by folkert

Subject: Re: [2.6.26] kobject_add_internal failed for 2:0 with -EEXIST / unable to handle kernel NULL pointer dereference in sysfs_create_link

> >> > While running my http://vanheusden.com/pyk/ script (which randomly
> >> > inserts and removes modules) I triggered the following oops in a 2.6.26
> >> > kernel on an IBM xSeries 260. This oops (in fact no oops at all) did not
> >> > get triggered in a 2.6.18 kernel on that system.
> >> >
> >> > [ 42.507375] FDC 0 is a National Semiconductor PC87306
> >> > [ 42.509057] kobject_add_internal failed for 2:0 with -EEXIST, don't try to register things with the same name in the same directory.
> >> > [ 42.509291] Pid: 5301, comm: modprobe Not tainted 2.6.26-1-amd64 #1
> >> > [ 42.509431]
> >> > [ 42.509433] Call Trace:
> >> > [ 42.509685] [<ffffffff8031b031>] kobject_add_internal+0x13f/0x17e
...
> >> > [ 42.511519] [<ffffffff8027d23b>] bdi_register+0x57/0xb4
> >>
> >> Looks like bdi sees two devices with the same devnum, or didn't
> >> cleanup an old entry. What does: ls -l "/sys/class/bdi/" print?
> >
> > The following:
> > folkert@debiantesthw:~$ ls -l /sys/class/bdi/
> > drwxr-xr-x 3 root root 0 2008-10-28 18:32 2:0
> > drwxr-xr-x 3 root root 0 2008-10-28 18:32 2:1
>
> Oh, you are running the old sysfs layout without symlinks. Care to
> tell where the "device" link in these directories points to?

None exist:
folkert@debiantesthw:~$ ls -la /sys/class/bdi/*/device
ls: cannot access /sys/class/bdi/*/device: No such file or directory


Folkert van Heusden

--
Multitail is a flexible tool for viewing log files and following the
execution of commands. It supports filtering, adding colors, merging
files, viewing differences (diff-view), etc.
http://www.vanheusden.com/multitail/
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, http://www.vanheusden.com

2008-10-29 14:49:21

by Kay Sievers

Subject: Re: [2.6.26] kobject_add_internal failed for 2:0 with -EEXIST / unable to handle kernel NULL pointer dereference in sysfs_create_link

On Wed, Oct 29, 2008 at 14:27, Folkert van Heusden wrote:
>> >> > While running my http://vanheusden.com/pyk/ script (which randomly
>> >> > inserts and removes modules) I triggered the following oops in a 2.6.26
>> >> > kernel on an IBM xSeries 260. This oops (in fact no oops at all) did not
>> >> > get triggered in a 2.6.18 kernel on that system.
>> >> >
>> >> > [ 42.507375] FDC 0 is a National Semiconductor PC87306
>> >> > [ 42.509057] kobject_add_internal failed for 2:0 with -EEXIST, don't try to register things with the same name in the same directory.
>> >> > [ 42.509291] Pid: 5301, comm: modprobe Not tainted 2.6.26-1-amd64 #1
>> >> > [ 42.509431]
>> >> > [ 42.509433] Call Trace:
>> >> > [ 42.509685] [<ffffffff8031b031>] kobject_add_internal+0x13f/0x17e
> ...
>> >> > [ 42.511519] [<ffffffff8027d23b>] bdi_register+0x57/0xb4
>> >>
>> >> Looks like bdi sees two devices with the same devnum, or didn't
>> >> cleanup an old entry. What does: ls -l "/sys/class/bdi/" print?
>> >
>> > The following:
>> > folkert@debiantesthw:~$ ls -l /sys/class/bdi/
>> > drwxr-xr-x 3 root root 0 2008-10-28 18:32 2:0
>> > drwxr-xr-x 3 root root 0 2008-10-28 18:32 2:1
>>
>> Oh, you are running the old sysfs layout without symlinks. Care to
>> tell where the "device" link in these directories points to?
>
> None exist:
> folkert@debiantesthw:~$ ls -la /sys/class/bdi/*/device
> ls: cannot access /sys/class/bdi/*/device: No such file or directory

Ah, sorry. It seems the bdi code never passes the usual parent device
when registering its device, so the bdi device does not show up at the
right place in the device tree.

Let's see which devices on your box currently have major 2:
find /sys -name dev | xargs grep '^2:'

Thanks,
Kay
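
The find/grep one-liner above can be wrapped in a small function. This is
only a sketch: devs_with_major is a hypothetical name, and it assumes a
grep that supports -H (GNU and BSD grep do). Restricting find to regular
files and using "-exec ... {} +" is a design choice that avoids running
grep on directories and copes with odd path names.

```shell
#!/bin/sh
# Print "path:major:minor" for every sysfs "dev" attribute under ROOT whose
# major number is MAJOR. devs_with_major is a hypothetical helper name.
devs_with_major() {
    root="${1:-/sys}"
    major="$2"
    # -H forces the file name to be printed even if grep gets a single file.
    # Errors from unreadable directories are discarded; the exit status of
    # find/grep is not meaningful here, so it is ignored.
    find "$root" -name dev -type f -exec grep -H "^${major}:" {} + 2>/dev/null || true
}
```

For the floppy major used in this thread the call would be
"devs_with_major /sys 2". The pattern is anchored and includes the colon,
so major 22 is not matched by accident.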

2008-10-29 15:26:14

by folkert

Subject: Re: [2.6.26] kobject_add_internal failed for 2:0 with -EEXIST / unable to handle kernel NULL pointer dereference in sysfs_create_link

> >> >> > While running my http://vanheusden.com/pyk/ script (which randomly
> >> >> > inserts and removes modules) I triggered the following oops in a 2.6.26
> >> >> > kernel on an IBM xSeries 260. This oops (in fact no oops at all) did not
> >> >> > get triggered in a 2.6.18 kernel on that system.
> >> >> >
> >> >> > [ 42.507375] FDC 0 is a National Semiconductor PC87306
> >> >> > [ 42.509057] kobject_add_internal failed for 2:0 with -EEXIST, don't try to register things with the same name in the same directory.
> >> >> > [ 42.509291] Pid: 5301, comm: modprobe Not tainted 2.6.26-1-amd64 #1
> >> >> > [ 42.509431]
> >> >> > [ 42.509433] Call Trace:
> >> >> > [ 42.509685] [<ffffffff8031b031>] kobject_add_internal+0x13f/0x17e
> > ...
> >> >> > [ 42.511519] [<ffffffff8027d23b>] bdi_register+0x57/0xb4
> >> >>
> >> >> Looks like bdi sees two devices with the same devnum, or didn't
> >> >> cleanup an old entry. What does: ls -l "/sys/class/bdi/" print?
> >> >
> >> > The following:
> >> > folkert@debiantesthw:~$ ls -l /sys/class/bdi/
> >> > drwxr-xr-x 3 root root 0 2008-10-28 18:32 2:0
> >> > drwxr-xr-x 3 root root 0 2008-10-28 18:32 2:1
> >>
> >> Oh, you are running the old sysfs layout without symlinks. Care to
> >> tell where the "device" link in these directories points to?
> >
> > None exist:
> > folkert@debiantesthw:~$ ls -la /sys/class/bdi/*/device
> > ls: cannot access /sys/class/bdi/*/device: No such file or directory
>
> Ah, sorry. Seems the bdi stuff never got to pass the usual parent
> device with the device registration, to let the bdi device show up at
> the right place in the device tree.
>
> Let's see what current devices on your box have the major 2:
> find /sys -name dev | xargs grep '^2:'

/sys/block/fd0/dev:2:0
/sys/block/fd1/dev:2:1

As my script runs modprobe/rmmod in parallel (4 processes), maybe this is
a conflict between one process doing a modprobe of floppy while another
does an rmmod? Or two processes doing a modprobe at the same time?


Folkert van Heusden

--
MultiTail is a versatile program for monitoring logs and the output of
computer commands. Merging, coloring, filter monitoring, and so on.
http://www.vanheusden.com/multitail/
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, http://www.vanheusden.com

2008-10-29 21:51:25

by Kay Sievers

Subject: Re: [2.6.26] kobject_add_internal failed for 2:0 with -EEXIST / unable to handle kernel NULL pointer dereference in sysfs_create_link

On Wed, Oct 29, 2008 at 16:25, Folkert van Heusden wrote:
>> >> >> > While running my http://vanheusden.com/pyk/ script (which randomly
>> >> >> > inserts and removes modules) I triggered the following oops in a 2.6.26
>> >> >> > kernel on an IBM xSeries 260. This oops (in fact no oops at all) did not
>> >> >> > get triggered in a 2.6.18 kernel on that system.
>> >> >> >
>> >> >> > [ 42.507375] FDC 0 is a National Semiconductor PC87306
>> >> >> > [ 42.509057] kobject_add_internal failed for 2:0 with -EEXIST, don't try to register things with the same name in the same directory.
>> >> >> > [ 42.509291] Pid: 5301, comm: modprobe Not tainted 2.6.26-1-amd64 #1
>> >> >> > [ 42.509431]
>> >> >> > [ 42.509433] Call Trace:
>> >> >> > [ 42.509685] [<ffffffff8031b031>] kobject_add_internal+0x13f/0x17e
>> > ...
>> >> >> > [ 42.511519] [<ffffffff8027d23b>] bdi_register+0x57/0xb4
>> >> >>
>> >> >> Looks like bdi sees two devices with the same devnum, or didn't
>> >> >> cleanup an old entry. What does: ls -l "/sys/class/bdi/" print?
>> >> >
>> >> > The following:
>> >> > folkert@debiantesthw:~$ ls -l /sys/class/bdi/
>> >> > drwxr-xr-x 3 root root 0 2008-10-28 18:32 2:0
>> >> > drwxr-xr-x 3 root root 0 2008-10-28 18:32 2:1
>> >>
>> >> Oh, you are running the old sysfs layout without symlinks. Care to
>> >> tell where the "device" link in these directories points to?
>> >
>> > None exist:
>> > folkert@debiantesthw:~$ ls -la /sys/class/bdi/*/device
>> > ls: cannot access /sys/class/bdi/*/device: No such file or directory
>>
>> Ah, sorry. Seems the bdi stuff never got to pass the usual parent
>> device with the device registration, to let the bdi device show up at
>> the right place in the device tree.
>>
>> Let's see what current devices on your box have the major 2:
>> find /sys -name dev | xargs grep '^2:'
>
> /sys/block/fd0/dev:2:0
> /sys/block/fd1/dev:2:1
>
> As my script does modprobe/rmmod in parallel (4 processes) maybe it is a
> conflict of one process doing an modprobe of floppy while the other does
> an rmmod? Or both a modprobe?

Might be, yes. If you just boot up and don't run your modprobe/rmmod
script, does the box have 2 floppy devices in /sys too?

Kay

2008-10-30 10:55:56

by folkert

Subject: Re: [2.6.26] kobject_add_internal failed for 2:0 with -EEXIST / unable to handle kernel NULL pointer dereference in sysfs_create_link

> >> >> >> > While running my http://vanheusden.com/pyk/ script (which randomly
> >> >> >> > inserts and removes modules) I triggered the following oops in a 2.6.26
> >> >> >> > kernel on an IBM xSeries 260. This oops (in fact no oops at all) did not
> >> >> >> > get triggered in a 2.6.18 kernel on that system.
> >> >> >> >
> >> >> >> > [ 42.507375] FDC 0 is a National Semiconductor PC87306
> >> >> >> > [ 42.509057] kobject_add_internal failed for 2:0 with -EEXIST, don't try to register things with the same name in the same directory.
> >> >> >> > [ 42.509291] Pid: 5301, comm: modprobe Not tainted 2.6.26-1-amd64 #1
> >> >> >> > [ 42.509431]
> >> >> >> > [ 42.509433] Call Trace:
> >> >> >> > [ 42.509685] [<ffffffff8031b031>] kobject_add_internal+0x13f/0x17e
> >> > ...
> >> >> >> > [ 42.511519] [<ffffffff8027d23b>] bdi_register+0x57/0xb4
> >> >> >>
> >> >> >> Looks like bdi sees two devices with the same devnum, or didn't
> >> >> >> cleanup an old entry. What does: ls -l "/sys/class/bdi/" print?
> >> >> >
> >> >> > The following:
> >> >> > folkert@debiantesthw:~$ ls -l /sys/class/bdi/
> >> >> > drwxr-xr-x 3 root root 0 2008-10-28 18:32 2:0
> >> >> > drwxr-xr-x 3 root root 0 2008-10-28 18:32 2:1
> >> >>
> >> >> Oh, you are running the old sysfs layout without symlinks. Care to
> >> >> tell where the "device" link in these directories points to?
> >> >
> >> > None exist:
> >> > folkert@debiantesthw:~$ ls -la /sys/class/bdi/*/device
> >> > ls: cannot access /sys/class/bdi/*/device: No such file or directory
> >>
> >> Ah, sorry. Seems the bdi stuff never got to pass the usual parent
> >> device with the device registration, to let the bdi device show up at
> >> the right place in the device tree.
> >>
> >> Let's see what current devices on your box have the major 2:
> >> find /sys -name dev | xargs grep '^2:'
> >
> > /sys/block/fd0/dev:2:0
> > /sys/block/fd1/dev:2:1
> >
> > As my script does modprobe/rmmod in parallel (4 processes) maybe it is a
> > conflict of one process doing an modprobe of floppy while the other does
> > an rmmod? Or both a modprobe?
>
> Might be, yes. If you just bootup, and don't run your modprobe/rmmod
> script, does the box have 2 floppy devices in /sys too?

Yes it does. One physical drive.


Folkert van Heusden

--
Multitail is a tool for viewing log files and/or following the
execution of commands. Filtering, keyword coloring, merging, viewing
differences (diff-view), etc. http://www.vanheusden.com/multitail/
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, http://www.vanheusden.com

2008-10-30 23:07:12

by Kay Sievers

Subject: Re: [2.6.26] kobject_add_internal failed for 2:0 with -EEXIST / unable to handle kernel NULL pointer dereference in sysfs_create_link

On Thu, Oct 30, 2008 at 11:55, Folkert van Heusden wrote:
>> >> >> >> > While running my http://vanheusden.com/pyk/ script (which randomly
>> >> >> >> > inserts and removes modules) I triggered the following oops in a 2.6.26
>> >> >> >> > kernel on an IBM xSeries 260. This oops (in fact no oops at all) did not
>> >> >> >> > get triggered in a 2.6.18 kernel on that system.
>> >> >> >> >
>> >> >> >> > [ 42.507375] FDC 0 is a National Semiconductor PC87306
>> >> >> >> > [ 42.509057] kobject_add_internal failed for 2:0 with -EEXIST, don't try to register things with the same name in the same directory.
>> >> >> >> > [ 42.509291] Pid: 5301, comm: modprobe Not tainted 2.6.26-1-amd64 #1
>> >> >> >> > [ 42.509431]
>> >> >> >> > [ 42.509433] Call Trace:
>> >> >> >> > [ 42.509685] [<ffffffff8031b031>] kobject_add_internal+0x13f/0x17e
>> >> > ...
>> >> >> >> > [ 42.511519] [<ffffffff8027d23b>] bdi_register+0x57/0xb4
>> >> >> >>
>> >> >> >> Looks like bdi sees two devices with the same devnum, or didn't
>> >> >> >> cleanup an old entry. What does: ls -l "/sys/class/bdi/" print?
>> >> >> >
>> >> >> > The following:
>> >> >> > folkert@debiantesthw:~$ ls -l /sys/class/bdi/
>> >> >> > drwxr-xr-x 3 root root 0 2008-10-28 18:32 2:0
>> >> >> > drwxr-xr-x 3 root root 0 2008-10-28 18:32 2:1
>> >> >>
>> >> >> Oh, you are running the old sysfs layout without symlinks. Care to
>> >> >> tell where the "device" link in these directories points to?
>> >> >
>> >> > None exist:
>> >> > folkert@debiantesthw:~$ ls -la /sys/class/bdi/*/device
>> >> > ls: cannot access /sys/class/bdi/*/device: No such file or directory
>> >>
>> >> Ah, sorry. Seems the bdi stuff never got to pass the usual parent
>> >> device with the device registration, to let the bdi device show up at
>> >> the right place in the device tree.
>> >>
>> >> Let's see what current devices on your box have the major 2:
>> >> find /sys -name dev | xargs grep '^2:'
>> >
>> > /sys/block/fd0/dev:2:0
>> > /sys/block/fd1/dev:2:1
>> >
>> > As my script does modprobe/rmmod in parallel (4 processes) maybe it is a
>> > conflict of one process doing an modprobe of floppy while the other does
>> > an rmmod? Or both a modprobe?
>>
>> Might be, yes. If you just bootup, and don't run your modprobe/rmmod
>> script, does the box have 2 floppy devices in /sys too?
>
> Yes it does. One physical drive.

This seems to happen whenever there are multiple floppies; I can
reproduce it here with qemu. It does not seem to be related to
modprobing. mtd devices suffer from the same problem, as bug reports show.

It might be a bug in bdi. It looks like floppies share a single queue,
and the bdi structure lives in the queue. We now register a bdi device
for every block device, but because the queue is shared, the previously
recorded dev_t in the bdi structure is overwritten. When the bdi device
is unregistered, the earlier devices using the same queue are not removed.
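
To make the suspected sequence concrete, here is a small userspace toy model of it. This is only a sketch and not kernel code: every name in it (toy_sysfs, toy_bdi, toy_bdi_register, toy_bdi_unregister) is made up for illustration, and it assumes that unregistering removes only the entry the shared structure currently remembers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model, NOT kernel code: every name below is hypothetical. */
#define MAX_ENTRIES 8

struct toy_sysfs {
	const char *names[MAX_ENTRIES];	/* entries under a pretend /sys/class/bdi/ */
	int count;
};

struct toy_bdi {
	const char *dev;	/* stands in for bdi->dev; NULL when unregistered */
};

/* Returns the index of the named entry, or -1 if it does not exist. */
static int toy_sysfs_find(const struct toy_sysfs *fs, const char *name)
{
	for (int i = 0; i < fs->count; i++)
		if (strcmp(fs->names[i], name) == 0)
			return i;
	return -1;
}

/* Returns 0 on success, -1 for a duplicate name (the -EEXIST case). */
static int toy_bdi_register(struct toy_sysfs *fs, struct toy_bdi *bdi,
			    const char *name)
{
	if (toy_sysfs_find(fs, name) >= 0)
		return -1;
	assert(fs->count < MAX_ENTRIES);
	fs->names[fs->count++] = name;
	bdi->dev = name;	/* silently overwrites any earlier registration */
	return 0;
}

/* Removes only the entry that the shared bdi currently remembers. */
static void toy_bdi_unregister(struct toy_sysfs *fs, struct toy_bdi *bdi)
{
	int i;

	if (!bdi->dev)
		return;
	i = toy_sysfs_find(fs, bdi->dev);
	if (i >= 0)
		fs->names[i] = fs->names[--fs->count];
	bdi->dev = NULL;
}
```

In this model, registering "2:0" and then "2:1" against one shared toy_bdi, followed by a single unregister, leaves "2:0" behind, and a later attempt to register "2:0" again fails as a duplicate. That would match the -EEXIST message in the report.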

Peter, could you please check whether something like this can happen?

Thanks,
Kay

2008-10-30 23:23:31

by Kay Sievers

Subject: Re: [2.6.26] kobject_add_internal failed for 2:0 with -EEXIST / unable to handle kernel NULL pointer dereference in sysfs_create_link

On Fri, Oct 31, 2008 at 00:06, Kay Sievers <[email protected]> wrote:
> On Thu, Oct 30, 2008 at 11:55, Folkert van Heusden
> <[email protected]> wrote:
>>> >> >> >> > While running my http://vanheusden.com/pyk/ script (which randomly
>>> >> >> >> > inserts and removes modules) I triggered the folllowing oops in a 2.6.26
>>> >> >> >> > kernel on an IBM xSeries 260. This oops (in fact no oops at all) did not
>>> >> >> >> > get triggered in a 2.6.18 kernel on that system.
>>> >> >> >> >
>>> >> >> >> > [ 42.507375] FDC 0 is a National Semiconductor PC87306
>>> >> >> >> > [ 42.509057] kobject_add_internal failed for 2:0 with -EEXIST, don't try to register things with the same name in the same directory.
>>> >> >> >> > [ 42.509291] Pid: 5301, comm: modprobe Not tainted 2.6.26-1-amd64 #1
>>> >> >> >> > [ 42.509431]
>>> >> >> >> > [ 42.509433] Call Trace:
>>> >> >> >> > [ 42.509685] [<ffffffff8031b031>] kobject_add_internal+0x13f/0x17e
>>> >> > ...
>>> >> >> >> > [ 42.511519] [<ffffffff8027d23b>] bdi_register+0x57/0xb4
>>> >> >> >>
>>> >> >> >> Looks like bdi sees two devices with the same devnum, or didn't
>>> >> >> >> cleanup an old entry. What does: ls -l "/sys/class/bdi/" print?
>>> >> >> >
>>> >> >> > The following:
>>> >> >> > folkert@debiantesthw:~$ ls -l /sys/class/bdi/
>>> >> >> > drwxr-xr-x 3 root root 0 2008-10-28 18:32 2:0
>>> >> >> > drwxr-xr-x 3 root root 0 2008-10-28 18:32 2:1
>>> >> >>
>>> >> >> Oh, you are running the old sysfs layout without symlinks. Care to
>>> >> >> tell where the "device" link in these directories points to?
>>> >> >
>>> >> > None exist:
>>> >> > folkert@debiantesthw:~$ ls -la /sys/class/bdi/*/device
>>> >> > ls: cannot access /sys/class/bdi/*/device: No such file or directory
>>> >>
>>> >> Ah, sorry. Seems the bdi stuff never got to pass the usual parent
>>> >> device with the device registration, to let the bdi device show up at
>>> >> the right place in the device tree.
>>> >>
>>> >> Let's see what current devices on your box have the major 2:
>>> >> find /sys -name dev | xargs grep '^2:'
>>> >
>>> > /sys/block/fd0/dev:2:0
>>> > /sys/block/fd1/dev:2:1
>>> >
>>> > As my script does modprobe/rmmod in parallel (4 processes) maybe it is a
>>> > conflict of one process doing an modprobe of floppy while the other does
>>> > an rmmod? Or both a modprobe?
>>>
>>> Might be, yes. If you just bootup, and don't run your modprobe/rmmod
>>> script, does the box have 2 floppy devices in /sys too?
>>
>> Yes it does. One physical drive.
>
> Seems that always happens with multiple floppies. I can reproduce it
> here with qemu. It seems not related to modprobing. Also mtd devices
> suffer from the same problem, as bug reports show.
>
> It might be a bug in bdi. Looks like floppies share a single queue,
> the bdi structure lives in the queue. Now we register for every device
> a bdi device, but the queue is shared and the former recorded dev_t in
> the bdi structure is overwritten. At unregistering the bdi device, all
> earlier devices using the same queue are not removed.
>
> Peter, please check, if something like this can happen?

Ok, I get annoyed by these sysfs bugs. :)

Peter, it looks like bdi does not work for devices which share a single queue.
If I add:
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -184,6 +184,8 @@ int bdi_register(struct backing_dev_info *bdi, struct device *parent,
 		goto exit;
 	}

+	printk("XXXXXXX old bdidev is %p\n", bdi->dev);
+	printk("XXXXXXX new bdidev is %p\n", dev);
 	bdi->dev = dev;
 	bdi_debug_register(bdi, dev_name(dev));

I get:
$ modprobe floppy
Floppy drive(s): fd0 is 1.44M, fd1 is 1.44M
FDC 0 is a S82078B
XXXXXXX old bdidev is 0000000000000000
XXXXXXX new bdidev is ffff88001f20cd10
XXXXXXX old bdidev is ffff88001f20cd10
XXXXXXX new bdidev is ffff88001f20de30

which very much looks like bdi will not remove any earlier registered
device, only the last one, right?

Thanks,
Kay

2008-10-31 01:13:48

by Kay Sievers

Subject: Re: [2.6.26] kobject_add_internal failed for 2:0 with -EEXIST / unable to handle kernel NULL pointer dereference in sysfs_create_link

On Fri, 2008-10-31 at 00:23 +0100, Kay Sievers wrote:
> On Fri, Oct 31, 2008 at 00:06, Kay Sievers <[email protected]> wrote:
> > On Thu, Oct 30, 2008 at 11:55, Folkert van Heusden
> > <[email protected]> wrote:
> >>> >> >> >> > While running my http://vanheusden.com/pyk/ script (which randomly
> >>> >> >> >> > inserts and removes modules) I triggered the folllowing oops in a 2.6.26
> >>> >> >> >> > kernel on an IBM xSeries 260. This oops (in fact no oops at all) did not
> >>> >> >> >> > get triggered in a 2.6.18 kernel on that system.
> >>> >> >> >> >
> >>> >> >> >> > [ 42.507375] FDC 0 is a National Semiconductor PC87306
> >>> >> >> >> > [ 42.509057] kobject_add_internal failed for 2:0 with -EEXIST, don't try to register things with the same name in the same directory.
> >>> >> >> >> > [ 42.509291] Pid: 5301, comm: modprobe Not tainted 2.6.26-1-amd64 #1
> >>> >> >> >> > [ 42.509431]
> >>> >> >> >> > [ 42.509433] Call Trace:
> >>> >> >> >> > [ 42.509685] [<ffffffff8031b031>] kobject_add_internal+0x13f/0x17e
> >>> >> > ...
> >>> >> >> >> > [ 42.511519] [<ffffffff8027d23b>] bdi_register+0x57/0xb4
> >>> >> >> >>
> >>> >> >> >> Looks like bdi sees two devices with the same devnum, or didn't
> >>> >> >> >> cleanup an old entry. What does: ls -l "/sys/class/bdi/" print?
> >>> >> >> >
> >>> >> >> > The following:
> >>> >> >> > folkert@debiantesthw:~$ ls -l /sys/class/bdi/
> >>> >> >> > drwxr-xr-x 3 root root 0 2008-10-28 18:32 2:0
> >>> >> >> > drwxr-xr-x 3 root root 0 2008-10-28 18:32 2:1
> >>> >> >>
> >>> >> >> Oh, you are running the old sysfs layout without symlinks. Care to
> >>> >> >> tell where the "device" link in these directories points to?
> >>> >> >
> >>> >> > None exist:
> >>> >> > folkert@debiantesthw:~$ ls -la /sys/class/bdi/*/device
> >>> >> > ls: cannot access /sys/class/bdi/*/device: No such file or directory
> >>> >>
> >>> >> Ah, sorry. Seems the bdi stuff never got to pass the usual parent
> >>> >> device with the device registration, to let the bdi device show up at
> >>> >> the right place in the device tree.
> >>> >>
> >>> >> Let's see what current devices on your box have the major 2:
> >>> >> find /sys -name dev | xargs grep '^2:'
> >>> >
> >>> > /sys/block/fd0/dev:2:0
> >>> > /sys/block/fd1/dev:2:1
> >>> >
> >>> > As my script does modprobe/rmmod in parallel (4 processes) maybe it is a
> >>> > conflict of one process doing an modprobe of floppy while the other does
> >>> > an rmmod? Or both a modprobe?
> >>>
> >>> Might be, yes. If you just bootup, and don't run your modprobe/rmmod
> >>> script, does the box have 2 floppy devices in /sys too?
> >>
> >> Yes it does. One physical drive.
> >
> > Seems that always happens with multiple floppies. I can reproduce it
> > here with qemu. It seems not related to modprobing. Also mtd devices
> > suffer from the same problem, as bug reports show.
> >
> > It might be a bug in bdi. Looks like floppies share a single queue,
> > the bdi structure lives in the queue. Now we register for every device
> > a bdi device, but the queue is shared and the former recorded dev_t in
> > the bdi structure is overwritten. At unregistering the bdi device, all
> > earlier devices using the same queue are not removed.
> >
> > Peter, please check, if something like this can happen?
>
> Ok, I get annoyed by these sysfs bugs. :)
>
> Peter, it looks like bdi does not work for devices which share a single queue.
> If I add:
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -184,6 +184,8 @@ int bdi_register(struct backing_dev_info *bdi,
> struct device *parent,
> goto exit;
> }
>
> + printk("XXXXXXX old bdidev is %p\n", bdi->dev);
> + printk("XXXXXXX new bdidev is %p\n", dev);
> bdi->dev = dev;
> bdi_debug_register(bdi, dev_name(dev));
>
> I get:
> $ modprobe floppy
> Floppy drive(s): fd0 is 1.44M, fd1 is 1.44M
> FDC 0 is a S82078B
> XXXXXXX old bdidev is 0000000000000000
> XXXXXXX new bdidev is ffff88001f20cd10
> XXXXXXX old bdidev is ffff88001f20cd10
> XXXXXXX new bdidev is ffff88001f20de30
>
> which very much looks like bdi will not remove any earlier registered
> device, only the last one, right?

This fixes it for me.

Thanks,
Kay


From: Kay Sievers <[email protected]>
Subject: bdi: register sysfs bdi device only once per queue

Devices which share the same queue, like floppies and mtd devices,
get registered multiple times in the bdi interface, but bdi accounts
only the last registered device of the devices sharing one queue.

On remove, all earlier registered devices leak, stay around in
sysfs, and cause "duplicate filename" errors if the devices are
recreated.

This prevents the creation of multiple bdi interfaces per queue,
and the bdi device will carry the dev_t name of the block device
which is the first one registered, of the pool of devices using
the same queue.

Signed-off-by: Kay Sievers <[email protected]>
---


diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index f2e574d..e6676e5 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -176,6 +176,9 @@ int bdi_register(struct backing_dev_info *bdi, struct device *parent,
 	int ret = 0;
 	struct device *dev;

+	if (bdi->dev)
+		goto exit;
+
 	va_start(args, fmt);
 	dev = device_create_vargs(bdi_class, parent, MKDEV(0, 0), bdi, fmt, args);
 	va_end(args);
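
For readers who want to see the effect of the early exit outside the kernel, here is a userspace toy model of the guarded registration. It is only a sketch under stated assumptions, not the kernel implementation: all names (toy_sysfs, toy_bdi, toy_bdi_register_once, toy_bdi_unregister) are hypothetical, and it assumes that unregistering removes exactly the one entry the shared structure remembers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model, NOT kernel code: every name below is hypothetical. */
#define MAX_ENTRIES 8

struct toy_sysfs {
	const char *names[MAX_ENTRIES];	/* entries under a pretend /sys/class/bdi/ */
	int count;
};

struct toy_bdi {
	const char *dev;	/* stands in for bdi->dev; NULL when unregistered */
};

/* Returns the index of the named entry, or -1 if it does not exist. */
static int toy_sysfs_find(const struct toy_sysfs *fs, const char *name)
{
	for (int i = 0; i < fs->count; i++)
		if (strcmp(fs->names[i], name) == 0)
			return i;
	return -1;
}

/*
 * Registers at most one entry per shared bdi, mirroring the
 * "if (bdi->dev) goto exit;" guard. Returns 0 on success or when the
 * bdi is already registered, -1 for a duplicate name.
 */
static int toy_bdi_register_once(struct toy_sysfs *fs, struct toy_bdi *bdi,
				 const char *name)
{
	if (bdi->dev)		/* already registered for this queue */
		return 0;
	if (toy_sysfs_find(fs, name) >= 0)
		return -1;
	assert(fs->count < MAX_ENTRIES);
	fs->names[fs->count++] = name;
	bdi->dev = name;
	return 0;
}

/* Removes the single entry that the shared bdi remembers. */
static void toy_bdi_unregister(struct toy_sysfs *fs, struct toy_bdi *bdi)
{
	int i;

	if (!bdi->dev)
		return;
	i = toy_sysfs_find(fs, bdi->dev);
	if (i >= 0)
		fs->names[i] = fs->names[--fs->count];
	bdi->dev = NULL;
}
```

In this model, registering "2:0" and then "2:1" against one shared toy_bdi creates only the "2:0" entry, a single unregister removes it, and registering "2:0" again afterwards succeeds. The cost is that the second device gets no entry of its own, as the patch description says.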

2008-10-31 09:28:38

by Peter Zijlstra

Subject: Re: [2.6.26] kobject_add_internal failed for 2:0 with -EEXIST / unable to handle kernel NULL pointer dereference in sysfs_create_link

On Fri, 2008-10-31 at 00:23 +0100, Kay Sievers wrote:

> Peter, it looks like bdi does not work for devices which share a single queue.
> If I add:
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -184,6 +184,8 @@ int bdi_register(struct backing_dev_info *bdi,
> struct device *parent,
> goto exit;
> }
>
> + printk("XXXXXXX old bdidev is %p\n", bdi->dev);
> + printk("XXXXXXX new bdidev is %p\n", dev);
> bdi->dev = dev;
> bdi_debug_register(bdi, dev_name(dev));
>
> I get:
> $ modprobe floppy
> Floppy drive(s): fd0 is 1.44M, fd1 is 1.44M
> FDC 0 is a S82078B
> XXXXXXX old bdidev is 0000000000000000
> XXXXXXX new bdidev is ffff88001f20cd10
> XXXXXXX old bdidev is ffff88001f20cd10
> XXXXXXX new bdidev is ffff88001f20de30
>
> which very much looks like bdi will not remove any earlier registered
> device, only the last one, right?

Sharing a bdi is odd to begin with, let me poke at this a little.

2008-11-03 11:53:19

by Kay Sievers

Subject: Re: [2.6.26] kobject_add_internal failed for 2:0 with -EEXIST / unable to handle kernel NULL pointer dereference in sysfs_create_link

On Fri, Oct 31, 2008 at 10:28, Peter Zijlstra <[email protected]> wrote:
> On Fri, 2008-10-31 at 00:23 +0100, Kay Sievers wrote:
>
>> Peter, it looks like bdi does not work for devices which share a single queue.
>> If I add:
>> --- a/mm/backing-dev.c
>> +++ b/mm/backing-dev.c
>> @@ -184,6 +184,8 @@ int bdi_register(struct backing_dev_info *bdi,
>> struct device *parent,
>> goto exit;
>> }
>>
>> + printk("XXXXXXX old bdidev is %p\n", bdi->dev);
>> + printk("XXXXXXX new bdidev is %p\n", dev);
>> bdi->dev = dev;
>> bdi_debug_register(bdi, dev_name(dev));
>>
>> I get:
>> $ modprobe floppy
>> Floppy drive(s): fd0 is 1.44M, fd1 is 1.44M
>> FDC 0 is a S82078B
>> XXXXXXX old bdidev is 0000000000000000
>> XXXXXXX new bdidev is ffff88001f20cd10
>> XXXXXXX old bdidev is ffff88001f20cd10
>> XXXXXXX new bdidev is ffff88001f20de30
>>
>> which very much looks like bdi will not remove any earlier registered
>> device, only the last one, right?
>
> Sharing a bdi is odd to begin with, let me poke at this a little.

Yeah, it's odd, but I'm not sure if you want to touch floppy.c. :)
Any objection to this patch, until you possibly fix it differently?
http://marc.info/?l=linux-kernel&m=122541569310798&w=4

Kay

2008-11-03 11:55:16

by Peter Zijlstra

Subject: Re: [2.6.26] kobject_add_internal failed for 2:0 with -EEXIST / unable to handle kernel NULL pointer dereference in sysfs_create_link

On Mon, 2008-11-03 at 12:53 +0100, Kay Sievers wrote:
> On Fri, Oct 31, 2008 at 10:28, Peter Zijlstra <[email protected]> wrote:
> > On Fri, 2008-10-31 at 00:23 +0100, Kay Sievers wrote:
> >
> >> Peter, it looks like bdi does not work for devices which share a single queue.
> >> If I add:
> >> --- a/mm/backing-dev.c
> >> +++ b/mm/backing-dev.c
> >> @@ -184,6 +184,8 @@ int bdi_register(struct backing_dev_info *bdi,
> >> struct device *parent,
> >> goto exit;
> >> }
> >>
> >> + printk("XXXXXXX old bdidev is %p\n", bdi->dev);
> >> + printk("XXXXXXX new bdidev is %p\n", dev);
> >> bdi->dev = dev;
> >> bdi_debug_register(bdi, dev_name(dev));
> >>
> >> I get:
> >> $ modprobe floppy
> >> Floppy drive(s): fd0 is 1.44M, fd1 is 1.44M
> >> FDC 0 is a S82078B
> >> XXXXXXX old bdidev is 0000000000000000
> >> XXXXXXX new bdidev is ffff88001f20cd10
> >> XXXXXXX old bdidev is ffff88001f20cd10
> >> XXXXXXX new bdidev is ffff88001f20de30
> >>
> >> which very much looks like bdi will not remove any earlier registered
> >> device, only the last one, right?
> >
> > Sharing a bdi is odd to begin with, let me poke at this a little.
>
> Yeah, it's odd, but I'm not sure if you want to touch floppy.c. :)
> > Any objection to this patch, until you possibly fix it differently?
> http://marc.info/?l=linux-kernel&m=122541569310798&w=4

None at all, touching floppy.c is dangerous, folks might assume you know
something about it ;-)