2007-05-11 06:39:55

by Kenji Kaneshige

[permalink] [raw]
Subject: [BUG][PATCH] Fix race condition about network device name allocation

Hi,

I encountered the following error when I was hot-plugging network card
using pci hotplug driver.

kobject_add failed for eth8 with -EEXIST, don't try to register things with the same name in the same directory.

Call Trace:
[<a000000100013940>] show_stack+0x40/0xa0
sp=e0000000652cfbf0 bsp=e0000000652c9450
[<a0000001000139d0>] dump_stack+0x30/0x60
sp=e0000000652cfdc0 bsp=e0000000652c9438
[<a0000001003b5af0>] kobject_shadow_add+0x3b0/0x440
sp=e0000000652cfdc0 bsp=e0000000652c93f0
[<a0000001003b5bb0>] kobject_add+0x30/0x60
sp=e0000000652cfdc0 bsp=e0000000652c93d0
[<a000000100488240>] device_add+0x160/0xf40
sp=e0000000652cfdc0 bsp=e0000000652c9358
[<a00000010061af00>] netdev_register_sysfs+0xc0/0xe0
sp=e0000000652cfdc0 bsp=e0000000652c9328
[<a0000001006006c0>] register_netdevice+0x600/0x7e0
sp=e0000000652cfdc0 bsp=e0000000652c92e8
[<a000000100600910>] register_netdev+0x70/0xc0
sp=e0000000652cfdd0 bsp=e0000000652c92c0
[<a00000020018dd70>] e1000_probe+0x1710/0x1a00 [e1000]
sp=e0000000652cfdd0 bsp=e0000000652c9258
[<a0000001003d56c0>] pci_device_probe+0xc0/0x120
sp=e0000000652cfde0 bsp=e0000000652c9218
[<a00000010048dc60>] really_probe+0x1c0/0x300
sp=e0000000652cfde0 bsp=e0000000652c91c8
[<a00000010048df40>] driver_probe_device+0x1a0/0x1c0
sp=e0000000652cfde0 bsp=e0000000652c9198
[<a00000010048df90>] __device_attach+0x30/0x60
sp=e0000000652cfde0 bsp=e0000000652c9170
[<a00000010048c080>] bus_for_each_drv+0x80/0x120
sp=e0000000652cfde0 bsp=e0000000652c9138
[<a00000010048e0c0>] device_attach+0xa0/0x100
sp=e0000000652cfe00 bsp=e0000000652c9110
[<a00000010048bec0>] bus_attach_device+0x60/0xe0
sp=e0000000652cfe00 bsp=e0000000652c90e0
[<a000000100488a30>] device_add+0x950/0xf40
sp=e0000000652cfe00 bsp=e0000000652c9068
[<a0000001003ca360>] pci_bus_add_device+0x20/0x100
sp=e0000000652cfe00 bsp=e0000000652c9040
[<a0000001003ca490>] pci_bus_add_devices+0x50/0x320
sp=e0000000652cfe00 bsp=e0000000652c9010
[<a000000200081520>] shpchp_configure_device+0x460/0x4a0 [shpchp]
sp=e0000000652cfe00 bsp=e0000000652c8fc0
[<a00000020007e7c0>] board_added+0x720/0x8c0 [shpchp]
sp=e0000000652cfe00 bsp=e0000000652c8f80
[<a00000020007f280>] shpchp_enable_slot+0x920/0xa40 [shpchp]
sp=e0000000652cfe10 bsp=e0000000652c8f30
[<a000000200080a40>] shpchp_sysfs_enable_slot+0x120/0x220 [shpchp]
sp=e0000000652cfe20 bsp=e0000000652c8ef8
[<a00000020007d270>] enable_slot+0x90/0xc0 [shpchp]
sp=e0000000652cfe20 bsp=e0000000652c8ed0
[<a00000020002a760>] power_write_file+0x1e0/0x2a0 [pci_hotplug]
sp=e0000000652cfe20 bsp=e0000000652c8ea0
[<a000000200028100>] hotplug_slot_attr_store+0x60/0xa0 [pci_hotplug]
sp=e0000000652cfe20 bsp=e0000000652c8e68
[<a0000001001cf010>] sysfs_write_file+0x270/0x300
sp=e0000000652cfe20 bsp=e0000000652c8e18
[<a0000001001395b0>] vfs_write+0x1b0/0x300
sp=e0000000652cfe20 bsp=e0000000652c8dc0
[<a00000010013a1b0>] sys_write+0x70/0xe0
sp=e0000000652cfe20 bsp=e0000000652c8d48
[<a00000010000bc20>] ia64_ret_from_syscall+0x0/0x20
sp=e0000000652cfe30 bsp=e0000000652c8d48
[<a000000000010620>] __kernel_syscall_via_break+0x0/0x20
sp=e0000000652d0000 bsp=e0000000652c8d48

This error is caused by the race condition between dev_alloc_name()
and unregistering net device. The dev_alloc_name() function checks
dev_base list to find a suitable id. The net device is removed from
the dev_base list when net device is unregistered. Those are done with
rtnl lock held, so there is no problem. However, removing sysfs entry
is delayed until netdev_run_todo() which is outside rtnl lock. Because
of this, dev_alloc_name() can create a device name which is duplicated
with the existing sysfs entry.

The following patch fixes this by changing dev_alloc_name() function
to check not only dev_base list but also todo list and snapshot list
to find a suitable id. If some devices exists on the todo list or
snapshot list, sysfs entries corresponding to them are not deleted
yet.

Thanks,
Kenji Kaneshige

---
Fix the race condition between dev_alloc_name() and net device
unregistration.

dev_alloc_name checks `dev_base' list to find a suitable ID. On the
other hand, net devices are removed from that list on net device
unregistration. Both of them are done with holding rtnl lock. However,
removing sysfs entry is delayed until netdev_run_todo(), which doesn't
acquire rtnl lock. Therefore, on dev_alloc_name(), device name could
conflict with the device, which is unregistered, but its sysfs entry
still exist.

To fix this bug, change dev_alloc_name to check not only `dev_base'
list but also todo list and snapshot list.

Signed-off-by: Kenji Kaneshige <[email protected]>

---
net/core/dev.c | 48 ++++++++++++++++++++++++++++++++++++++----------
1 files changed, 38 insertions(+), 10 deletions(-)

Index: linux-2.6.21/net/core/dev.c
===================================================================
--- linux-2.6.21.orig/net/core/dev.c
+++ linux-2.6.21/net/core/dev.c
@@ -218,6 +218,11 @@ extern void netdev_unregister_sysfs(stru
#define netdev_unregister_sysfs(dev) do { } while(0)
#endif

+/* Delayed registration/unregisteration */
+static DEFINE_SPINLOCK(net_todo_list_lock);
+static LIST_HEAD(net_todo_list);
+static LIST_HEAD(net_todo_snapshot_list);
+

/*******************************************************************************

@@ -702,7 +707,32 @@ int dev_alloc_name(struct net_device *de
set_bit(i, inuse);
}

+ spin_lock(&net_todo_list_lock);
+ list_for_each_entry(d, &net_todo_list, todo_list) {
+ if (!sscanf(d->name, name, &i))
+ continue;
+ if (i < 0 || i >= max_netdevices)
+ continue;
+
+ /* avoid cases where sscanf is not exact inverse of printf */
+ snprintf(buf, sizeof(buf), name, i);
+ if (!strncmp(buf, d->name, IFNAMSIZ))
+ set_bit(i, inuse);
+ }
+ list_for_each_entry(d, &net_todo_snapshot_list, todo_list) {
+ if (!sscanf(d->name, name, &i))
+ continue;
+ if (i < 0 || i >= max_netdevices)
+ continue;
+
+ /* avoid cases where sscanf is not exact inverse of printf */
+ snprintf(buf, sizeof(buf), name, i);
+ if (!strncmp(buf, d->name, IFNAMSIZ))
+ set_bit(i, inuse);
+ }
i = find_first_zero_bit(inuse, max_netdevices);
+ spin_unlock(&net_todo_list_lock);
+
free_page((unsigned long) inuse);
}

@@ -2843,10 +2873,6 @@ static int dev_new_index(void)

static int dev_boot_phase = 1;

-/* Delayed registration/unregisteration */
-static DEFINE_SPINLOCK(net_todo_list_lock);
-static struct list_head net_todo_list = LIST_HEAD_INIT(net_todo_list);
-
static inline void net_set_todo(struct net_device *dev)
{
spin_lock(&net_todo_list_lock);
@@ -3105,8 +3131,6 @@ static void netdev_wait_allrefs(struct n
static DEFINE_MUTEX(net_todo_run_mutex);
void netdev_run_todo(void)
{
- struct list_head list;
-
/* Need to guard against multiple cpu's getting out of order. */
mutex_lock(&net_todo_run_mutex);

@@ -3120,13 +3144,13 @@ void netdev_run_todo(void)

/* Snapshot list, allow later requests */
spin_lock(&net_todo_list_lock);
- list_replace_init(&net_todo_list, &list);
+ list_replace_init(&net_todo_list, &net_todo_snapshot_list);
spin_unlock(&net_todo_list_lock);

- while (!list_empty(&list)) {
+ while (!list_empty(&net_todo_snapshot_list)) {
struct net_device *dev
- = list_entry(list.next, struct net_device, todo_list);
- list_del(&dev->todo_list);
+ = list_entry(net_todo_snapshot_list.next,
+ struct net_device, todo_list);

if (unlikely(dev->reg_state != NETREG_UNREGISTERING)) {
printk(KERN_ERR "network todo '%s' but state %d\n",
@@ -3146,6 +3170,10 @@ void netdev_run_todo(void)
BUG_TRAP(!dev->ip6_ptr);
BUG_TRAP(!dev->dn_ptr);

+ spin_lock(&net_todo_list_lock);
+ list_del(&dev->todo_list);
+ spin_unlock(&net_todo_list_lock);
+
/* It must be the very last action,
* after this 'dev' may point to freed up memory.
*/



2007-05-11 15:50:35

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [BUG][PATCH] Fix race condition about network device name allocation

On Fri, 11 May 2007 14:40:45 +0900
Kenji Kaneshige <[email protected]> wrote:

> Hi,
>
> I encountered the following error when I was hot-plugging network card
> using pci hotplug driver.
>
> kobject_add failed for eth8 with -EEXIST, don't try to register things with the same name in the same directory.
>
> Call Trace:
> [<a000000100013940>] show_stack+0x40/0xa0
> sp=e0000000652cfbf0 bsp=e0000000652c9450
> [<a0000001000139d0>] dump_stack+0x30/0x60
> sp=e0000000652cfdc0 bsp=e0000000652c9438
> [<a0000001003b5af0>] kobject_shadow_add+0x3b0/0x440
> sp=e0000000652cfdc0 bsp=e0000000652c93f0
> [<a0000001003b5bb0>] kobject_add+0x30/0x60
> sp=e0000000652cfdc0 bsp=e0000000652c93d0
> [<a000000100488240>] device_add+0x160/0xf40
> sp=e0000000652cfdc0 bsp=e0000000652c9358
> [<a00000010061af00>] netdev_register_sysfs+0xc0/0xe0
> sp=e0000000652cfdc0 bsp=e0000000652c9328
> [<a0000001006006c0>] register_netdevice+0x600/0x7e0
> sp=e0000000652cfdc0 bsp=e0000000652c92e8
> [<a000000100600910>] register_netdev+0x70/0xc0
> sp=e0000000652cfdd0 bsp=e0000000652c92c0
> [<a00000020018dd70>] e1000_probe+0x1710/0x1a00 [e1000]
> sp=e0000000652cfdd0 bsp=e0000000652c9258
> [<a0000001003d56c0>] pci_device_probe+0xc0/0x120
> sp=e0000000652cfde0 bsp=e0000000652c9218
> [<a00000010048dc60>] really_probe+0x1c0/0x300
> sp=e0000000652cfde0 bsp=e0000000652c91c8
> [<a00000010048df40>] driver_probe_device+0x1a0/0x1c0
> sp=e0000000652cfde0 bsp=e0000000652c9198
> [<a00000010048df90>] __device_attach+0x30/0x60
> sp=e0000000652cfde0 bsp=e0000000652c9170
> [<a00000010048c080>] bus_for_each_drv+0x80/0x120
> sp=e0000000652cfde0 bsp=e0000000652c9138
> [<a00000010048e0c0>] device_attach+0xa0/0x100
> sp=e0000000652cfe00 bsp=e0000000652c9110
> [<a00000010048bec0>] bus_attach_device+0x60/0xe0
> sp=e0000000652cfe00 bsp=e0000000652c90e0
> [<a000000100488a30>] device_add+0x950/0xf40
> sp=e0000000652cfe00 bsp=e0000000652c9068
> [<a0000001003ca360>] pci_bus_add_device+0x20/0x100
> sp=e0000000652cfe00 bsp=e0000000652c9040
> [<a0000001003ca490>] pci_bus_add_devices+0x50/0x320
> sp=e0000000652cfe00 bsp=e0000000652c9010
> [<a000000200081520>] shpchp_configure_device+0x460/0x4a0 [shpchp]
> sp=e0000000652cfe00 bsp=e0000000652c8fc0
> [<a00000020007e7c0>] board_added+0x720/0x8c0 [shpchp]
> sp=e0000000652cfe00 bsp=e0000000652c8f80
> [<a00000020007f280>] shpchp_enable_slot+0x920/0xa40 [shpchp]
> sp=e0000000652cfe10 bsp=e0000000652c8f30
> [<a000000200080a40>] shpchp_sysfs_enable_slot+0x120/0x220 [shpchp]
> sp=e0000000652cfe20 bsp=e0000000652c8ef8
> [<a00000020007d270>] enable_slot+0x90/0xc0 [shpchp]
> sp=e0000000652cfe20 bsp=e0000000652c8ed0
> [<a00000020002a760>] power_write_file+0x1e0/0x2a0 [pci_hotplug]
> sp=e0000000652cfe20 bsp=e0000000652c8ea0
> [<a000000200028100>] hotplug_slot_attr_store+0x60/0xa0 [pci_hotplug]
> sp=e0000000652cfe20 bsp=e0000000652c8e68
> [<a0000001001cf010>] sysfs_write_file+0x270/0x300
> sp=e0000000652cfe20 bsp=e0000000652c8e18
> [<a0000001001395b0>] vfs_write+0x1b0/0x300
> sp=e0000000652cfe20 bsp=e0000000652c8dc0
> [<a00000010013a1b0>] sys_write+0x70/0xe0
> sp=e0000000652cfe20 bsp=e0000000652c8d48
> [<a00000010000bc20>] ia64_ret_from_syscall+0x0/0x20
> sp=e0000000652cfe30 bsp=e0000000652c8d48
> [<a000000000010620>] __kernel_syscall_via_break+0x0/0x20
> sp=e0000000652d0000 bsp=e0000000652c8d48
>
> This error is caused by the race condition between dev_alloc_name()
> and unregistering net device. The dev_alloc_name() function checks
> dev_base list to find a suitable id. The net device is removed from
> the dev_base list when net device is unregistered. Those are done with
> rtnl lock held, so there is no problem. However, removing sysfs entry
> is delayed until netdev_run_todo() which is outside rtnl lock. Because
> of this, dev_alloc_name() can create a device name which is duplicated
> with the existing sysfs entry.
>
> The following patch fixes this by changing dev_alloc_name() function
> to check not only dev_base list but also todo list and snapshot list
> to find a suitable id. If some devices exists on the todo list or
> snapshot list, sysfs entries corresponding to them are not deleted
> yet.
>
> Thanks,
> Kenji Kaneshige
>
> ---
> Fix the race condition between dev_alloc_name() and net device
> unregistration.
>
> dev_alloc_name checks `dev_base' list to find a suitable ID. On the
> other hand, net devices are removed from that list on net device
> unregistration. Both of them are done with holding rtnl lock. However,
> removing sysfs entry is delayed until netdev_run_todo(), which doesn't
> acquire rtnl lock. Therefore, on dev_alloc_name(), device name could
> conflict with the device, which is unregistered, but its sysfs entry
> still exist.
>
> To fix this bug, change dev_alloc_name to check not only `dev_base'
> list but also todo list and snapshot list.
>
> Signed-off-by: Kenji Kaneshige <[email protected]>

There should be a better way to fix this by getting rid of the kobjects
on deletion rather than the overhead another lock and list scans....

2007-05-11 16:25:35

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [BUG][PATCH] Fix race condition about network device name allocation

On Fri, 11 May 2007 14:40:45 +0900
Kenji Kaneshige <[email protected]> wrote:

> Hi,
>
> I encountered the following error when I was hot-plugging network card
> using pci hotplug driver.
>
> kobject_add failed for eth8 with -EEXIST, don't try to register things with the same name in the same directory.
>
> Call Trace:
> [<a000000100013940>] show_stack+0x40/0xa0
> sp=e0000000652cfbf0 bsp=e0000000652c9450
> [<a0000001000139d0>] dump_stack+0x30/0x60
> sp=e0000000652cfdc0 bsp=e0000000652c9438
> [<a0000001003b5af0>] kobject_shadow_add+0x3b0/0x440
> sp=e0000000652cfdc0 bsp=e0000000652c93f0
> [<a0000001003b5bb0>] kobject_add+0x30/0x60
> sp=e0000000652cfdc0 bsp=e0000000652c93d0
> [<a000000100488240>] device_add+0x160/0xf40
> sp=e0000000652cfdc0 bsp=e0000000652c9358
> [<a00000010061af00>] netdev_register_sysfs+0xc0/0xe0
> sp=e0000000652cfdc0 bsp=e0000000652c9328
> [<a0000001006006c0>] register_netdevice+0x600/0x7e0
> sp=e0000000652cfdc0 bsp=e0000000652c92e8
> [<a000000100600910>] register_netdev+0x70/0xc0
> sp=e0000000652cfdd0 bsp=e0000000652c92c0
> [<a00000020018dd70>] e1000_probe+0x1710/0x1a00 [e1000]
> sp=e0000000652cfdd0 bsp=e0000000652c9258
> [<a0000001003d56c0>] pci_device_probe+0xc0/0x120
> sp=e0000000652cfde0 bsp=e0000000652c9218
> [<a00000010048dc60>] really_probe+0x1c0/0x300
> sp=e0000000652cfde0 bsp=e0000000652c91c8
> [<a00000010048df40>] driver_probe_device+0x1a0/0x1c0
> sp=e0000000652cfde0 bsp=e0000000652c9198
> [<a00000010048df90>] __device_attach+0x30/0x60
> sp=e0000000652cfde0 bsp=e0000000652c9170
> [<a00000010048c080>] bus_for_each_drv+0x80/0x120
> sp=e0000000652cfde0 bsp=e0000000652c9138
> [<a00000010048e0c0>] device_attach+0xa0/0x100
> sp=e0000000652cfe00 bsp=e0000000652c9110
> [<a00000010048bec0>] bus_attach_device+0x60/0xe0
> sp=e0000000652cfe00 bsp=e0000000652c90e0
> [<a000000100488a30>] device_add+0x950/0xf40
> sp=e0000000652cfe00 bsp=e0000000652c9068
> [<a0000001003ca360>] pci_bus_add_device+0x20/0x100
> sp=e0000000652cfe00 bsp=e0000000652c9040
> [<a0000001003ca490>] pci_bus_add_devices+0x50/0x320
> sp=e0000000652cfe00 bsp=e0000000652c9010
> [<a000000200081520>] shpchp_configure_device+0x460/0x4a0 [shpchp]
> sp=e0000000652cfe00 bsp=e0000000652c8fc0
> [<a00000020007e7c0>] board_added+0x720/0x8c0 [shpchp]
> sp=e0000000652cfe00 bsp=e0000000652c8f80
> [<a00000020007f280>] shpchp_enable_slot+0x920/0xa40 [shpchp]
> sp=e0000000652cfe10 bsp=e0000000652c8f30
> [<a000000200080a40>] shpchp_sysfs_enable_slot+0x120/0x220 [shpchp]
> sp=e0000000652cfe20 bsp=e0000000652c8ef8
> [<a00000020007d270>] enable_slot+0x90/0xc0 [shpchp]
> sp=e0000000652cfe20 bsp=e0000000652c8ed0
> [<a00000020002a760>] power_write_file+0x1e0/0x2a0 [pci_hotplug]
> sp=e0000000652cfe20 bsp=e0000000652c8ea0
> [<a000000200028100>] hotplug_slot_attr_store+0x60/0xa0 [pci_hotplug]
> sp=e0000000652cfe20 bsp=e0000000652c8e68
> [<a0000001001cf010>] sysfs_write_file+0x270/0x300
> sp=e0000000652cfe20 bsp=e0000000652c8e18
> [<a0000001001395b0>] vfs_write+0x1b0/0x300
> sp=e0000000652cfe20 bsp=e0000000652c8dc0
> [<a00000010013a1b0>] sys_write+0x70/0xe0
> sp=e0000000652cfe20 bsp=e0000000652c8d48
> [<a00000010000bc20>] ia64_ret_from_syscall+0x0/0x20
> sp=e0000000652cfe30 bsp=e0000000652c8d48
> [<a000000000010620>] __kernel_syscall_via_break+0x0/0x20
> sp=e0000000652d0000 bsp=e0000000652c8d48
>
> This error is caused by the race condition between dev_alloc_name()
> and unregistering net device. The dev_alloc_name() function checks
> dev_base list to find a suitable id. The net device is removed from
> the dev_base list when net device is unregistered. Those are done with
> rtnl lock held, so there is no problem. However, removing sysfs entry
> is delayed until netdev_run_todo() which is outside rtnl lock. Because
> of this, dev_alloc_name() can create a device name which is duplicated
> with the existing sysfs entry.
>
> The following patch fixes this by changing dev_alloc_name() function
> to check not only dev_base list but also todo list and snapshot list
> to find a suitable id. If some devices exists on the todo list or
> snapshot list, sysfs entries corresponding to them are not deleted
> yet.
>
> Thanks,
> Kenji Kaneshige

How about the following (untested), instead. It hold a removes the sysfs
entries earlier (on unregister_netdevice), but holds a kobject reference.
Then when todo runs the actual last put free happens.

---
net/core/dev.c | 10 ++++++----
net/core/net-sysfs.c | 8 +++++++-
2 files changed, 13 insertions(+), 5 deletions(-)

--- net-2.6.orig/net/core/dev.c 2007-05-11 09:10:10.000000000 -0700
+++ net-2.6/net/core/dev.c 2007-05-11 09:21:01.000000000 -0700
@@ -3245,7 +3245,6 @@ void netdev_run_todo(void)
continue;
}

- netdev_unregister_sysfs(dev);
dev->reg_state = NETREG_UNREGISTERED;

netdev_wait_allrefs(dev);
@@ -3256,11 +3255,11 @@ void netdev_run_todo(void)
BUG_TRAP(!dev->ip6_ptr);
BUG_TRAP(!dev->dn_ptr);

- /* It must be the very last action,
- * after this 'dev' may point to freed up memory.
- */
if (dev->destructor)
dev->destructor(dev);
+
+ /* Free network device */
+ kobject_put(&dev->dev.kobj);
}

out:
@@ -3411,6 +3410,9 @@ void unregister_netdevice(struct net_dev
/* Notifier chain MUST detach us from master device. */
BUG_TRAP(!dev->master);

+ /* Remove entries from sysfs */
+ netdev_unregister_sysfs(dev);
+
/* Finish processing unregister after unlock */
net_set_todo(dev);

--- net-2.6.orig/net/core/net-sysfs.c 2007-05-11 09:10:14.000000000 -0700
+++ net-2.6/net/core/net-sysfs.c 2007-05-11 09:14:45.000000000 -0700
@@ -456,9 +456,15 @@ static struct class net_class = {
#endif
};

+/* Delete sysfs entries but hold kobject reference until after all
+ * netdev references are gone.
+ */
void netdev_unregister_sysfs(struct net_device * net)
{
- device_del(&(net->dev));
+ struct device *dev = &(net->dev);
+
+ kobject_get(&dev->kobj);
+ device_del(dev);
}

/* Create sysfs entries for network device. */

2007-05-14 01:35:04

by Kenji Kaneshige

[permalink] [raw]
Subject: Re: [BUG][PATCH] Fix race condition about network device name allocation

Hi Stephen,

Thank you for comments. I'll try your patch.

I have one concern about your patch, though I don't know very much
about netdev codes.

@@ -3411,6 +3410,9 @@ void unregister_netdevice(struct net_dev
> /* Notifier chain MUST detach us from master device. */
> BUG_TRAP(!dev->master);
>
> + /* Remove entries from sysfs */
> + netdev_unregister_sysfs(dev);
> +
> /* Finish processing unregister after unlock */
> net_set_todo(dev);
>

With your patch, the netdev_unregister_sysfs() will be called
earlier than now. Does it have no side effect? I'm worrying
about if there are somebody that depend on sysfs entry until
net_run_todo() is called.

Anyway, I'll try your patch.

Thanks,
Kenji Kaneshige


2007-05-11 (金) の 09:25 -0700 に Stephen Hemminger さんは書きました:
> On Fri, 11 May 2007 14:40:45 +0900
> Kenji Kaneshige <[email protected]> wrote:
>
> > Hi,
> >
> > I encountered the following error when I was hot-plugging network card
> > using pci hotplug driver.
> >
> > kobject_add failed for eth8 with -EEXIST, don't try to register things with the same name in the same directory.
> >
> > Call Trace:
> > [<a000000100013940>] show_stack+0x40/0xa0
> > sp=e0000000652cfbf0 bsp=e0000000652c9450
> > [<a0000001000139d0>] dump_stack+0x30/0x60
> > sp=e0000000652cfdc0 bsp=e0000000652c9438
> > [<a0000001003b5af0>] kobject_shadow_add+0x3b0/0x440
> > sp=e0000000652cfdc0 bsp=e0000000652c93f0
> > [<a0000001003b5bb0>] kobject_add+0x30/0x60
> > sp=e0000000652cfdc0 bsp=e0000000652c93d0
> > [<a000000100488240>] device_add+0x160/0xf40
> > sp=e0000000652cfdc0 bsp=e0000000652c9358
> > [<a00000010061af00>] netdev_register_sysfs+0xc0/0xe0
> > sp=e0000000652cfdc0 bsp=e0000000652c9328
> > [<a0000001006006c0>] register_netdevice+0x600/0x7e0
> > sp=e0000000652cfdc0 bsp=e0000000652c92e8
> > [<a000000100600910>] register_netdev+0x70/0xc0
> > sp=e0000000652cfdd0 bsp=e0000000652c92c0
> > [<a00000020018dd70>] e1000_probe+0x1710/0x1a00 [e1000]
> > sp=e0000000652cfdd0 bsp=e0000000652c9258
> > [<a0000001003d56c0>] pci_device_probe+0xc0/0x120
> > sp=e0000000652cfde0 bsp=e0000000652c9218
> > [<a00000010048dc60>] really_probe+0x1c0/0x300
> > sp=e0000000652cfde0 bsp=e0000000652c91c8
> > [<a00000010048df40>] driver_probe_device+0x1a0/0x1c0
> > sp=e0000000652cfde0 bsp=e0000000652c9198
> > [<a00000010048df90>] __device_attach+0x30/0x60
> > sp=e0000000652cfde0 bsp=e0000000652c9170
> > [<a00000010048c080>] bus_for_each_drv+0x80/0x120
> > sp=e0000000652cfde0 bsp=e0000000652c9138
> > [<a00000010048e0c0>] device_attach+0xa0/0x100
> > sp=e0000000652cfe00 bsp=e0000000652c9110
> > [<a00000010048bec0>] bus_attach_device+0x60/0xe0
> > sp=e0000000652cfe00 bsp=e0000000652c90e0
> > [<a000000100488a30>] device_add+0x950/0xf40
> > sp=e0000000652cfe00 bsp=e0000000652c9068
> > [<a0000001003ca360>] pci_bus_add_device+0x20/0x100
> > sp=e0000000652cfe00 bsp=e0000000652c9040
> > [<a0000001003ca490>] pci_bus_add_devices+0x50/0x320
> > sp=e0000000652cfe00 bsp=e0000000652c9010
> > [<a000000200081520>] shpchp_configure_device+0x460/0x4a0 [shpchp]
> > sp=e0000000652cfe00 bsp=e0000000652c8fc0
> > [<a00000020007e7c0>] board_added+0x720/0x8c0 [shpchp]
> > sp=e0000000652cfe00 bsp=e0000000652c8f80
> > [<a00000020007f280>] shpchp_enable_slot+0x920/0xa40 [shpchp]
> > sp=e0000000652cfe10 bsp=e0000000652c8f30
> > [<a000000200080a40>] shpchp_sysfs_enable_slot+0x120/0x220 [shpchp]
> > sp=e0000000652cfe20 bsp=e0000000652c8ef8
> > [<a00000020007d270>] enable_slot+0x90/0xc0 [shpchp]
> > sp=e0000000652cfe20 bsp=e0000000652c8ed0
> > [<a00000020002a760>] power_write_file+0x1e0/0x2a0 [pci_hotplug]
> > sp=e0000000652cfe20 bsp=e0000000652c8ea0
> > [<a000000200028100>] hotplug_slot_attr_store+0x60/0xa0 [pci_hotplug]
> > sp=e0000000652cfe20 bsp=e0000000652c8e68
> > [<a0000001001cf010>] sysfs_write_file+0x270/0x300
> > sp=e0000000652cfe20 bsp=e0000000652c8e18
> > [<a0000001001395b0>] vfs_write+0x1b0/0x300
> > sp=e0000000652cfe20 bsp=e0000000652c8dc0
> > [<a00000010013a1b0>] sys_write+0x70/0xe0
> > sp=e0000000652cfe20 bsp=e0000000652c8d48
> > [<a00000010000bc20>] ia64_ret_from_syscall+0x0/0x20
> > sp=e0000000652cfe30 bsp=e0000000652c8d48
> > [<a000000000010620>] __kernel_syscall_via_break+0x0/0x20
> > sp=e0000000652d0000 bsp=e0000000652c8d48
> >
> > This error is caused by the race condition between dev_alloc_name()
> > and unregistering net device. The dev_alloc_name() function checks
> > dev_base list to find a suitable id. The net device is removed from
> > the dev_base list when net device is unregistered. Those are done with
> > rtnl lock held, so there is no problem. However, removing sysfs entry
> > is delayed until netdev_run_todo() which is outside rtnl lock. Because
> > of this, dev_alloc_name() can create a device name which is duplicated
> > with the existing sysfs entry.
> >
> > The following patch fixes this by changing dev_alloc_name() function
> > to check not only dev_base list but also todo list and snapshot list
> > to find a suitable id. If some devices exists on the todo list or
> > snapshot list, sysfs entries corresponding to them are not deleted
> > yet.
> >
> > Thanks,
> > Kenji Kaneshige
>
> How about the following (untested), instead. It hold a removes the sysfs
> entries earlier (on unregister_netdevice), but holds a kobject reference.
> Then when todo runs the actual last put free happens.
>
> ---
> net/core/dev.c | 10 ++++++----
> net/core/net-sysfs.c | 8 +++++++-
> 2 files changed, 13 insertions(+), 5 deletions(-)
>
> --- net-2.6.orig/net/core/dev.c 2007-05-11 09:10:10.000000000 -0700
> +++ net-2.6/net/core/dev.c 2007-05-11 09:21:01.000000000 -0700
> @@ -3245,7 +3245,6 @@ void netdev_run_todo(void)
> continue;
> }
>
> - netdev_unregister_sysfs(dev);
> dev->reg_state = NETREG_UNREGISTERED;
>
> netdev_wait_allrefs(dev);
> @@ -3256,11 +3255,11 @@ void netdev_run_todo(void)
> BUG_TRAP(!dev->ip6_ptr);
> BUG_TRAP(!dev->dn_ptr);
>
> - /* It must be the very last action,
> - * after this 'dev' may point to freed up memory.
> - */
> if (dev->destructor)
> dev->destructor(dev);
> +
> + /* Free network device */
> + kobject_put(&dev->dev.kobj);
> }
>
> out:
> @@ -3411,6 +3410,9 @@ void unregister_netdevice(struct net_dev
> /* Notifier chain MUST detach us from master device. */
> BUG_TRAP(!dev->master);
>
> + /* Remove entries from sysfs */
> + netdev_unregister_sysfs(dev);
> +
> /* Finish processing unregister after unlock */
> net_set_todo(dev);
>
> --- net-2.6.orig/net/core/net-sysfs.c 2007-05-11 09:10:14.000000000 -0700
> +++ net-2.6/net/core/net-sysfs.c 2007-05-11 09:14:45.000000000 -0700
> @@ -456,9 +456,15 @@ static struct class net_class = {
> #endif
> };
>
> +/* Delete sysfs entries but hold kobject reference until after all
> + * netdev references are gone.
> + */
> void netdev_unregister_sysfs(struct net_device * net)
> {
> - device_del(&(net->dev));
> + struct device *dev = &(net->dev);
> +
> + kobject_get(&dev->kobj);
> + device_del(dev);
> }
>
> /* Create sysfs entries for network device. */

2007-05-14 08:19:51

by Kenji Kaneshige

[permalink] [raw]
Subject: Re: [BUG][PATCH] Fix race condition about network device name allocation

Hi,

> How about the following (untested), instead. It hold a removes the sysfs
> entries earlier (on unregister_netdevice), but holds a kobject reference.
> Then when todo runs the actual last put free happens.

I've confirmed the problem is fixed by Stephen's patch on my reproduction
environment.

Thanks,
Kenji Kaneshige


2007-05-11 (金) の 09:25 -0700 に Stephen Hemminger さんは書きました:
> On Fri, 11 May 2007 14:40:45 +0900
> Kenji Kaneshige <[email protected]> wrote:
>
> > Hi,
> >
> > I encountered the following error when I was hot-plugging network card
> > using pci hotplug driver.
> >
> > kobject_add failed for eth8 with -EEXIST, don't try to register things with the same name in the same directory.
> >
> > Call Trace:
> > [<a000000100013940>] show_stack+0x40/0xa0
> > sp=e0000000652cfbf0 bsp=e0000000652c9450
> > [<a0000001000139d0>] dump_stack+0x30/0x60
> > sp=e0000000652cfdc0 bsp=e0000000652c9438
> > [<a0000001003b5af0>] kobject_shadow_add+0x3b0/0x440
> > sp=e0000000652cfdc0 bsp=e0000000652c93f0
> > [<a0000001003b5bb0>] kobject_add+0x30/0x60
> > sp=e0000000652cfdc0 bsp=e0000000652c93d0
> > [<a000000100488240>] device_add+0x160/0xf40
> > sp=e0000000652cfdc0 bsp=e0000000652c9358
> > [<a00000010061af00>] netdev_register_sysfs+0xc0/0xe0
> > sp=e0000000652cfdc0 bsp=e0000000652c9328
> > [<a0000001006006c0>] register_netdevice+0x600/0x7e0
> > sp=e0000000652cfdc0 bsp=e0000000652c92e8
> > [<a000000100600910>] register_netdev+0x70/0xc0
> > sp=e0000000652cfdd0 bsp=e0000000652c92c0
> > [<a00000020018dd70>] e1000_probe+0x1710/0x1a00 [e1000]
> > sp=e0000000652cfdd0 bsp=e0000000652c9258
> > [<a0000001003d56c0>] pci_device_probe+0xc0/0x120
> > sp=e0000000652cfde0 bsp=e0000000652c9218
> > [<a00000010048dc60>] really_probe+0x1c0/0x300
> > sp=e0000000652cfde0 bsp=e0000000652c91c8
> > [<a00000010048df40>] driver_probe_device+0x1a0/0x1c0
> > sp=e0000000652cfde0 bsp=e0000000652c9198
> > [<a00000010048df90>] __device_attach+0x30/0x60
> > sp=e0000000652cfde0 bsp=e0000000652c9170
> > [<a00000010048c080>] bus_for_each_drv+0x80/0x120
> > sp=e0000000652cfde0 bsp=e0000000652c9138
> > [<a00000010048e0c0>] device_attach+0xa0/0x100
> > sp=e0000000652cfe00 bsp=e0000000652c9110
> > [<a00000010048bec0>] bus_attach_device+0x60/0xe0
> > sp=e0000000652cfe00 bsp=e0000000652c90e0
> > [<a000000100488a30>] device_add+0x950/0xf40
> > sp=e0000000652cfe00 bsp=e0000000652c9068
> > [<a0000001003ca360>] pci_bus_add_device+0x20/0x100
> > sp=e0000000652cfe00 bsp=e0000000652c9040
> > [<a0000001003ca490>] pci_bus_add_devices+0x50/0x320
> > sp=e0000000652cfe00 bsp=e0000000652c9010
> > [<a000000200081520>] shpchp_configure_device+0x460/0x4a0 [shpchp]
> > sp=e0000000652cfe00 bsp=e0000000652c8fc0
> > [<a00000020007e7c0>] board_added+0x720/0x8c0 [shpchp]
> > sp=e0000000652cfe00 bsp=e0000000652c8f80
> > [<a00000020007f280>] shpchp_enable_slot+0x920/0xa40 [shpchp]
> > sp=e0000000652cfe10 bsp=e0000000652c8f30
> > [<a000000200080a40>] shpchp_sysfs_enable_slot+0x120/0x220 [shpchp]
> > sp=e0000000652cfe20 bsp=e0000000652c8ef8
> > [<a00000020007d270>] enable_slot+0x90/0xc0 [shpchp]
> > sp=e0000000652cfe20 bsp=e0000000652c8ed0
> > [<a00000020002a760>] power_write_file+0x1e0/0x2a0 [pci_hotplug]
> > sp=e0000000652cfe20 bsp=e0000000652c8ea0
> > [<a000000200028100>] hotplug_slot_attr_store+0x60/0xa0 [pci_hotplug]
> > sp=e0000000652cfe20 bsp=e0000000652c8e68
> > [<a0000001001cf010>] sysfs_write_file+0x270/0x300
> > sp=e0000000652cfe20 bsp=e0000000652c8e18
> > [<a0000001001395b0>] vfs_write+0x1b0/0x300
> > sp=e0000000652cfe20 bsp=e0000000652c8dc0
> > [<a00000010013a1b0>] sys_write+0x70/0xe0
> > sp=e0000000652cfe20 bsp=e0000000652c8d48
> > [<a00000010000bc20>] ia64_ret_from_syscall+0x0/0x20
> > sp=e0000000652cfe30 bsp=e0000000652c8d48
> > [<a000000000010620>] __kernel_syscall_via_break+0x0/0x20
> > sp=e0000000652d0000 bsp=e0000000652c8d48
> >
> > This error is caused by the race condition between dev_alloc_name()
> > and unregistering net device. The dev_alloc_name() function checks
> > dev_base list to find a suitable id. The net device is removed from
> > the dev_base list when net device is unregistered. Those are done with
> > rtnl lock held, so there is no problem. However, removing sysfs entry
> > is delayed until netdev_run_todo() which is outside rtnl lock. Because
> > of this, dev_alloc_name() can create a device name which is duplicated
> > with the existing sysfs entry.
> >
> > The following patch fixes this by changing dev_alloc_name() function
> > to check not only dev_base list but also todo list and snapshot list
> > to find a suitable id. If some devices exists on the todo list or
> > snapshot list, sysfs entries corresponding to them are not deleted
> > yet.
> >
> > Thanks,
> > Kenji Kaneshige
>
> How about the following (untested), instead. It hold a removes the sysfs
> entries earlier (on unregister_netdevice), but holds a kobject reference.
> Then when todo runs the actual last put free happens.
>
> ---
> net/core/dev.c | 10 ++++++----
> net/core/net-sysfs.c | 8 +++++++-
> 2 files changed, 13 insertions(+), 5 deletions(-)
>
> --- net-2.6.orig/net/core/dev.c 2007-05-11 09:10:10.000000000 -0700
> +++ net-2.6/net/core/dev.c 2007-05-11 09:21:01.000000000 -0700
> @@ -3245,7 +3245,6 @@ void netdev_run_todo(void)
> continue;
> }
>
> - netdev_unregister_sysfs(dev);
> dev->reg_state = NETREG_UNREGISTERED;
>
> netdev_wait_allrefs(dev);
> @@ -3256,11 +3255,11 @@ void netdev_run_todo(void)
> BUG_TRAP(!dev->ip6_ptr);
> BUG_TRAP(!dev->dn_ptr);
>
> - /* It must be the very last action,
> - * after this 'dev' may point to freed up memory.
> - */
> if (dev->destructor)
> dev->destructor(dev);
> +
> + /* Free network device */
> + kobject_put(&dev->dev.kobj);
> }
>
> out:
> @@ -3411,6 +3410,9 @@ void unregister_netdevice(struct net_dev
> /* Notifier chain MUST detach us from master device. */
> BUG_TRAP(!dev->master);
>
> + /* Remove entries from sysfs */
> + netdev_unregister_sysfs(dev);
> +
> /* Finish processing unregister after unlock */
> net_set_todo(dev);
>
> --- net-2.6.orig/net/core/net-sysfs.c 2007-05-11 09:10:14.000000000 -0700
> +++ net-2.6/net/core/net-sysfs.c 2007-05-11 09:14:45.000000000 -0700
> @@ -456,9 +456,15 @@ static struct class net_class = {
> #endif
> };
>
> +/* Delete sysfs entries but hold kobject reference until after all
> + * netdev references are gone.
> + */
> void netdev_unregister_sysfs(struct net_device * net)
> {
> - device_del(&(net->dev));
> + struct device *dev = &(net->dev);
> +
> + kobject_get(&dev->kobj);
> + device_del(dev);
> }
>
> /* Create sysfs entries for network device. */

2007-05-14 15:42:20

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [BUG][PATCH] Fix race condition about network device name allocation

On Mon, 14 May 2007 10:33:01 +0900
Kenji Kaneshige <[email protected]> wrote:

> Hi Stephen,
>
> Thank you for comments. I'll try your patch.
>
> I have one concern about your patch, though I don't know very much
> about netdev codes.
>
> @@ -3411,6 +3410,9 @@ void unregister_netdevice(struct net_dev
> > /* Notifier chain MUST detach us from master device. */
> > BUG_TRAP(!dev->master);
> >
> > + /* Remove entries from sysfs */
> > + netdev_unregister_sysfs(dev);
> > +
> > /* Finish processing unregister after unlock */
> > net_set_todo(dev);
> >
>
> With your patch, the netdev_unregister_sysfs() will be called
> earlier than now. Does it have no side effect? I'm worrying
> about if there are somebody that depend on sysfs entry until
> net_run_todo() is called.


The unregister_sysfs() removes the entry in /sys/class/net and then
decrements the reference count. When ref count goes to zero, the kobject_release
callback gets called and which does kfree().

Changing the shutdown order makes the /sys/class/net entry disappear
sooner. In order to prevent the free from happening until after all the dev_put()
calls have happened, the reference count for the kobject needs to be increased.

2007-05-14 15:59:18

by Stephen Hemminger

[permalink] [raw]
Subject: [PATCH] Fix race condition about network device name allocation

Kenji Kaneshige found this race between device removal and
registration. On unregister it is possible for the old device to
exist, because sysfs file is still open. A new device with 'eth%d'
will select the same name, but sysfs kobject register will fial.

The following changes the shutdown order slightly. It hold a removes the sysfs
entries earlier (on unregister_netdevice), but holds a kobject reference.
Then when todo runs the actual last put free happens.

Signed-off-by: Stephen Hemminger <[email protected]>

---
net/core/dev.c | 10 ++++++----
net/core/net-sysfs.c | 8 +++++++-
2 files changed, 13 insertions(+), 5 deletions(-)

--- 2.6.21-rc1.orig/net/core/dev.c 2007-05-11 11:02:55.000000000 -0700
+++ 2.6.21-rc1/net/core/dev.c 2007-05-14 08:44:52.000000000 -0700
@@ -3245,7 +3245,6 @@ void netdev_run_todo(void)
continue;
}

- netdev_unregister_sysfs(dev);
dev->reg_state = NETREG_UNREGISTERED;

netdev_wait_allrefs(dev);
@@ -3256,11 +3255,11 @@ void netdev_run_todo(void)
BUG_TRAP(!dev->ip6_ptr);
BUG_TRAP(!dev->dn_ptr);

- /* It must be the very last action,
- * after this 'dev' may point to freed up memory.
- */
if (dev->destructor)
dev->destructor(dev);
+
+ /* Free network device */
+ kobject_put(&dev->dev.kobj);
}

out:
@@ -3411,6 +3410,9 @@ void unregister_netdevice(struct net_dev
/* Notifier chain MUST detach us from master device. */
BUG_TRAP(!dev->master);

+ /* Remove entries from sysfs */
+ netdev_unregister_sysfs(dev);
+
/* Finish processing unregister after unlock */
net_set_todo(dev);

--- 2.6.21-rc1.orig/net/core/net-sysfs.c 2007-05-11 11:02:55.000000000 -0700
+++ 2.6.21-rc1/net/core/net-sysfs.c 2007-05-14 08:44:52.000000000 -0700
@@ -456,9 +456,15 @@ static struct class net_class = {
#endif
};

+/* Delete sysfs entries but hold kobject reference until after all
+ * netdev references are gone.
+ */
void netdev_unregister_sysfs(struct net_device * net)
{
- device_del(&(net->dev));
+ struct device *dev = &(net->dev);
+
+ kobject_get(&dev->kobj);
+ device_del(dev);
}

/* Create sysfs entries for network device. */

2007-06-13 09:45:36

by Dan Aloni

[permalink] [raw]
Subject: Re: [PATCH] Fix race condition about network device name allocation

On Mon, May 14, 2007 at 08:58:40AM -0700, Stephen Hemminger wrote:
> Kenji Kaneshige found this race between device removal and
> registration. On unregister it is possible for the old device to
> exist, because sysfs file is still open. A new device with 'eth%d'
> will select the same name, but sysfs kobject register will fial.
>
> The following changes the shutdown order slightly. It hold a removes the sysfs
> entries earlier (on unregister_netdevice), but holds a kobject reference.
> Then when todo runs the actual last put free happens.
>
> Signed-off-by: Stephen Hemminger <[email protected]>

That patch breaks the bonding driver. After reverting it I avoid this crash:

<6>[1115260.637351] Ethernet Channel Bonding Driver: v3.1.2 (January 20, 2007)
<6>[1115260.637358] bonding: MII link monitoring set to 100 ms
<6>[1115260.637767] bonding: bond0 is being deleted...
<1>[1115260.695812] Unable to handle kernel NULL pointer dereference at 0000000000000020 RIP:
<1>[1115260.701504] [<ffffffff802805e2>] __lookup_hash+0x22/0x150
<6>[1115260.709754] PGD 798bf067 PUD 798c4067 PMD 0
<0>[1115260.714394] Oops: 0000 [1] SMP
[1]kdb>
[1]kdb> btc
btc: cpu status: Currently on cpu 1
Available cpus: 0(I), 1
Stack traceback for pid 0
0xffffffff80585420 0 0 1 0 I 0xffffffff80585710 swapper
rsp rip Function (args)
0xffffffff8066feb0 0xffffffff8046783f thread_return+0x5e
0xffffffff8066fec0 0xffffffff80469df9 _spin_unlock_irq+0x9
0xffffffff8066ff28 0xffffffff80209176 mwait_idle+0x46
0xffffffff8066ff50 0xffffffff802088e2 enter_idle+0x22
0xffffffff8066ff60 0xffffffff802090bc cpu_idle+0x5c
0xffffffff8066ff80 0xffffffff80207146 rest_init+0x26
0xffffffff8066ff90 0xffffffff80678c1a start_kernel+0x2ea
0xffffffff8066ffc0 0xffffffff8067815f _sinittext+0x15f
Stack traceback for pid 66
0xffff81006a411850 66 1 1 1 R 0xffff81006a411b40 *platform_node
rsp rip Function (args)
0xffff81006a46dd98 0xffffffff802805e2 __lookup_hash+0x22
0xffff81006a46de00 0xffffffff80280cba lookup_one_len+0x9a
0xffff81006a46de20 0xffffffff802be671 sysfs_remove_group+0x31
0xffff81006a46de50 0xffffffff8800fe4a [bonding]bond_destroy_sysfs_entry+0x1a
0xffff81006a46de60 0xffffffff88011974 [bonding]bonding_store_bonds+0x214
0xffff81006a46deb0 0xffffffff8037c9d4 class_attr_store+0x24
[1]more>
0xffff81006a46dec0 0xffffffff802bbe30 sysfs_write_file+0x100
0xffff81006a46df10 0xffffffff80277d7e vfs_write+0xbe
0xffff81006a46df40 0xffffffff80278400 sys_write+0x50
0xffff81006a46df80 0xffffffff80209e6e system_call+0x7e

--
Dan Aloni
XIV LTD, http://www.xivstorage.com
da-x (at) monatomic.org, dan (at) xiv.co.il

2007-06-13 16:37:36

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [PATCH] Fix race condition about network device name allocation

On Wed, 13 Jun 2007 12:45:21 +0300
Dan Aloni <[email protected]> wrote:

> On Mon, May 14, 2007 at 08:58:40AM -0700, Stephen Hemminger wrote:
> > Kenji Kaneshige found this race between device removal and
> > registration. On unregister it is possible for the old device to
> > exist, because sysfs file is still open. A new device with 'eth%d'
> > will select the same name, but sysfs kobject register will fial.
> >
> > The following changes the shutdown order slightly. It hold a removes the sysfs
> > entries earlier (on unregister_netdevice), but holds a kobject reference.
> > Then when todo runs the actual last put free happens.
> >
> > Signed-off-by: Stephen Hemminger <[email protected]>
>
> That patch breaks the bonding driver. After reverting it I avoid this crash:
>
> <6>[1115260.637351] Ethernet Channel Bonding Driver: v3.1.2 (January 20, 2007)
> <6>[1115260.637358] bonding: MII link monitoring set to 100 ms
> <6>[1115260.637767] bonding: bond0 is being deleted...
> <1>[1115260.695812] Unable to handle kernel NULL pointer dereference at 0000000000000020 RIP:
> <1>[1115260.701504] [<ffffffff802805e2>] __lookup_hash+0x22/0x150
> <6>[1115260.709754] PGD 798bf067 PUD 798c4067 PMD 0
> <0>[1115260.714394] Oops: 0000 [1] SMP
> [1]kdb>
> [1]kdb> btc
> btc: cpu status: Currently on cpu 1
> Available cpus: 0(I), 1
> Stack traceback for pid 0
> 0xffffffff80585420 0 0 1 0 I 0xffffffff80585710 swapper
> rsp rip Function (args)
> 0xffffffff8066feb0 0xffffffff8046783f thread_return+0x5e
> 0xffffffff8066fec0 0xffffffff80469df9 _spin_unlock_irq+0x9
> 0xffffffff8066ff28 0xffffffff80209176 mwait_idle+0x46
> 0xffffffff8066ff50 0xffffffff802088e2 enter_idle+0x22
> 0xffffffff8066ff60 0xffffffff802090bc cpu_idle+0x5c
> 0xffffffff8066ff80 0xffffffff80207146 rest_init+0x26
> 0xffffffff8066ff90 0xffffffff80678c1a start_kernel+0x2ea
> 0xffffffff8066ffc0 0xffffffff8067815f _sinittext+0x15f
> Stack traceback for pid 66
> 0xffff81006a411850 66 1 1 1 R 0xffff81006a411b40 *platform_node
> rsp rip Function (args)
> 0xffff81006a46dd98 0xffffffff802805e2 __lookup_hash+0x22
> 0xffff81006a46de00 0xffffffff80280cba lookup_one_len+0x9a
> 0xffff81006a46de20 0xffffffff802be671 sysfs_remove_group+0x31
> 0xffff81006a46de50 0xffffffff8800fe4a [bonding]bond_destroy_sysfs_entry+0x1a
> 0xffff81006a46de60 0xffffffff88011974 [bonding]bonding_store_bonds+0x214
> 0xffff81006a46deb0 0xffffffff8037c9d4 class_attr_store+0x24
> [1]more>
> 0xffff81006a46dec0 0xffffffff802bbe30 sysfs_write_file+0x100
> 0xffff81006a46df10 0xffffffff80277d7e vfs_write+0xbe
> 0xffff81006a46df40 0xffffffff80278400 sys_write+0x50
> 0xffff81006a46df80 0xffffffff80209e6e system_call+0x7e
>

I assume this happens when bonded slave device is removed?
Which kernel version?

2007-06-13 22:54:44

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [PATCH] Fix race condition about network device name allocation

Bonding refers to device after unregistering. This has always been
a dangerous thing. The following UNTESTED should fix the problem.

--- a/drivers/net/bonding/bond_sysfs.c 2007-06-13 15:48:37.000000000 -0700
+++ b/drivers/net/bonding/bond_sysfs.c 2007-06-13 15:49:17.000000000 -0700
@@ -164,9 +164,10 @@ static ssize_t bonding_store_bonds(struc
printk(KERN_INFO DRV_NAME
": %s is being deleted...\n",
bond->dev->name);
- unregister_netdevice(bond->dev);
+
bond_deinit(bond->dev);
bond_destroy_sysfs_entry(bond);
+ unregister_netdevice(bond->dev);
rtnl_unlock();
goto out;
}

2007-06-14 04:36:52

by Jay Vosburgh

[permalink] [raw]
Subject: Re: [PATCH] Fix race condition about network device name allocation


The following patch (based on a patch from Stephen Hemminger
<[email protected]>) removes use after free conditions in
the unregister path for the bonding master. Without this patch, an
operation of the form "echo -bond0 > /sys/class/net/bonding_masters"
would trigger a NULL pointer dereference in sysfs. I was not able to
induce the failure with the non-sysfs code path, but for consistency I
updated that code as well.

I also did some testing of the bonding /proc file being open
while the bond is being deleted, and didn't see any problems there.

Signed-off-by: Jay Vosburgh <[email protected]>

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 223517d..6287ffb 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4345,8 +4345,8 @@ static void bond_free_all(void)
bond_mc_list_destroy(bond);
/* Release the bonded slaves */
bond_release_all(bond_dev);
- unregister_netdevice(bond_dev);
bond_deinit(bond_dev);
+ unregister_netdevice(bond_dev);
}

#ifdef CONFIG_PROC_FS
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index a122baa..60cccf2 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -164,9 +164,9 @@ static ssize_t bonding_store_bonds(struct class *cls, const char *buffer, size_t
printk(KERN_INFO DRV_NAME
": %s is being deleted...\n",
bond->dev->name);
- unregister_netdevice(bond->dev);
bond_deinit(bond->dev);
bond_destroy_sysfs_entry(bond);
+ unregister_netdevice(bond->dev);
rtnl_unlock();
goto out;
}

2007-06-14 06:07:54

by Dan Aloni

[permalink] [raw]
Subject: Re: [PATCH] Fix race condition about network device name allocation

On Wed, Jun 13, 2007 at 09:36:31AM -0700, Stephen Hemminger wrote:
> On Wed, 13 Jun 2007 12:45:21 +0300
> Dan Aloni <[email protected]> wrote:
>
> > On Mon, May 14, 2007 at 08:58:40AM -0700, Stephen Hemminger wrote:
> > > Kenji Kaneshige found this race between device removal and
> > > registration. On unregister it is possible for the old device to
> > > exist, because sysfs file is still open. A new device with 'eth%d'
> > > will select the same name, but sysfs kobject register will fial.
> > >
> > > The following changes the shutdown order slightly. It hold a removes the sysfs
> > > entries earlier (on unregister_netdevice), but holds a kobject reference.
> > > Then when todo runs the actual last put free happens.
> > >
> > > Signed-off-by: Stephen Hemminger <[email protected]>
> >
> > That patch breaks the bonding driver. After reverting it I avoid this crash:
> >
>[..]
> >
>
> I assume this happens when bonded slave device is removed?

Yes, it's just a simple removal via sysfs.

> Which kernel version?

2.6.21.5

--
Dan Aloni
XIV LTD, http://www.xivstorage.com
da-x (at) monatomic.org, dan (at) xiv.co.il

2007-06-19 15:29:34

by Stephen Hemminger

[permalink] [raw]
Subject: Why is this patch not in 2.6.22-rc5?

On Wed, 13 Jun 2007 21:36:30 -0700
Jay Vosburgh <[email protected]> wrote:

>
> The following patch (based on a patch from Stephen Hemminger
> <[email protected]>) removes use after free conditions in
> the unregister path for the bonding master. Without this patch, an
> operation of the form "echo -bond0 > /sys/class/net/bonding_masters"
> would trigger a NULL pointer dereference in sysfs. I was not able to
> induce the failure with the non-sysfs code path, but for consistency I
> updated that code as well.
>
> I also did some testing of the bonding /proc file being open
> while the bond is being deleted, and didn't see any problems there.
>
> Signed-off-by: Jay Vosburgh <[email protected]>

Hey David, this patch fixes one of the bugs listed in 2.6.22-rc5
list. Jay submitted last week but it hasn't made it upstream.

2007-06-19 17:52:53

by Jeff Garzik

[permalink] [raw]
Subject: Re: Why is this patch not in 2.6.22-rc5?

Stephen Hemminger wrote:
> On Wed, 13 Jun 2007 21:36:30 -0700
> Jay Vosburgh <[email protected]> wrote:
>
>> The following patch (based on a patch from Stephen Hemminger
>> <[email protected]>) removes use after free conditions in
>> the unregister path for the bonding master. Without this patch, an
>> operation of the form "echo -bond0 > /sys/class/net/bonding_masters"
>> would trigger a NULL pointer dereference in sysfs. I was not able to
>> induce the failure with the non-sysfs code path, but for consistency I
>> updated that code as well.
>>
>> I also did some testing of the bonding /proc file being open
>> while the bond is being deleted, and didn't see any problems there.
>>
>> Signed-off-by: Jay Vosburgh <[email protected]>
>
> Hey David, this patch fixes one of the bugs listed in 2.6.22-rc5
> list. Jay submitted last week but it hasn't made it upstream.

Get him to forward it to me, like he does for other bonding patches...

Jeff



2007-06-19 18:12:33

by Jay Vosburgh

[permalink] [raw]
Subject: [PATCH] bonding: Fix use after free in unregister path


The following patch (based on a patch from Stephen Hemminger
<[email protected]>) removes use after free conditions in
the unregister path for the bonding master. Without this patch, an
operation of the form "echo -bond0 > /sys/class/net/bonding_masters"
would trigger a NULL pointer dereference in sysfs. I was not able to
induce the failure with the non-sysfs code path, but for consistency I
updated that code as well.

I also did some testing of the bonding /proc file being open
while the bond is being deleted, and didn't see any problems there.

Signed-off-by: Jay Vosburgh <[email protected]>

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 223517d..6287ffb 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4345,8 +4345,8 @@ static void bond_free_all(void)
bond_mc_list_destroy(bond);
/* Release the bonded slaves */
bond_release_all(bond_dev);
- unregister_netdevice(bond_dev);
bond_deinit(bond_dev);
+ unregister_netdevice(bond_dev);
}

#ifdef CONFIG_PROC_FS
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index a122baa..60cccf2 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -164,9 +164,9 @@ static ssize_t bonding_store_bonds(struct class *cls, const char *buffer, size_t
printk(KERN_INFO DRV_NAME
": %s is being deleted...\n",
bond->dev->name);
- unregister_netdevice(bond->dev);
bond_deinit(bond->dev);
bond_destroy_sysfs_entry(bond);
+ unregister_netdevice(bond->dev);
rtnl_unlock();
goto out;
}

2007-06-19 22:04:29

by David Miller

[permalink] [raw]
Subject: Re: Why is this patch not in 2.6.22-rc5?

From: Jeff Garzik <[email protected]>
Date: Tue, 19 Jun 2007 13:52:14 -0400

> Stephen Hemminger wrote:
> > On Wed, 13 Jun 2007 21:36:30 -0700
> > Jay Vosburgh <[email protected]> wrote:
> >
> >> The following patch (based on a patch from Stephen Hemminger
> >> <[email protected]>) removes use after free conditions in
> >> the unregister path for the bonding master. Without this patch, an
> >> operation of the form "echo -bond0 > /sys/class/net/bonding_masters"
> >> would trigger a NULL pointer dereference in sysfs. I was not able to
> >> induce the failure with the non-sysfs code path, but for consistency I
> >> updated that code as well.
> >>
> >> I also did some testing of the bonding /proc file being open
> >> while the bond is being deleted, and didn't see any problems there.
> >>
> >> Signed-off-by: Jay Vosburgh <[email protected]>
> >
> > Hey David, this patch fixes one of the bugs listed in 2.6.22-rc5
> > list. Jay submitted last week but it hasn't made it upstream.
>
> Get him to forward it to me, like he does for other bonding patches...

Yes, this patch definitely should go via Jeff, he handles merging the
bonding bits.

2007-06-20 23:13:23

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] bonding: Fix use after free in unregister path

Jay Vosburgh wrote:
> The following patch (based on a patch from Stephen Hemminger
> <[email protected]>) removes use after free conditions in
> the unregister path for the bonding master. Without this patch, an
> operation of the form "echo -bond0 > /sys/class/net/bonding_masters"
> would trigger a NULL pointer dereference in sysfs. I was not able to
> induce the failure with the non-sysfs code path, but for consistency I
> updated that code as well.
>
> I also did some testing of the bonding /proc file being open
> while the bond is being deleted, and didn't see any problems there.
>
> Signed-off-by: Jay Vosburgh <[email protected]>

applied to #upstream-fixes


2007-06-21 00:09:55

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH] bonding: Fix use after free in unregister path

* Jeff Garzik ([email protected]) wrote:
> Jay Vosburgh wrote:
> > The following patch (based on a patch from Stephen Hemminger
> ><[email protected]>) removes use after free conditions in
> >the unregister path for the bonding master. Without this patch, an
> >operation of the form "echo -bond0 > /sys/class/net/bonding_masters"
> >would trigger a NULL pointer dereference in sysfs. I was not able to
> >induce the failure with the non-sysfs code path, but for consistency I
> >updated that code as well.
> >
> > I also did some testing of the bonding /proc file being open
> >while the bond is being deleted, and didn't see any problems there.
> >
> >Signed-off-by: Jay Vosburgh <[email protected]>
>
> applied to #upstream-fixes

This was originally discovered on 2.6.21.5 IIRC, so plan to send this
to -stable as well?

thanks,
-chris