2008-10-28 15:01:42

by Nadia Derbey

Subject: [PATCH] SYSVIPC - Fix the ipc structures initialization


A problem was found while reviewing the code after Bugzilla bug
http://bugzilla.kernel.org/show_bug.cgi?id=11796.

In ipc_addid(), the newly allocated ipc structure is inserted into the ipcs
tree (i.e. made visible to readers) without locking it.
This is not correct since its initialization continues after it has been
inserted in the tree.

This patch moves the ipc structure lock initialization + locking before
the actual insertion.

Regards,
Nadia


Signed-off-by: Nadia Derbey <[email protected]>

---
ipc/util.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)

Index: linux-2.6.27/ipc/util.c
===================================================================
--- linux-2.6.27.orig/ipc/util.c 2008-10-23 15:20:46.000000000 +0200
+++ linux-2.6.27/ipc/util.c 2008-10-28 16:52:17.000000000 +0100
@@ -266,9 +266,17 @@ int ipc_addid(struct ipc_ids* ids, struc
if (ids->in_use >= size)
return -ENOSPC;

+ spin_lock_init(&new->lock);
+ new->deleted = 0;
+ rcu_read_lock();
+ spin_lock(&new->lock);
+
err = idr_get_new(&ids->ipcs_idr, new, &id);
- if (err)
+ if (err) {
+ spin_unlock(&new->lock);
+ rcu_read_unlock();
return err;
+ }

ids->in_use++;

@@ -280,10 +288,6 @@ int ipc_addid(struct ipc_ids* ids, struc
ids->seq = 0;

new->id = ipc_buildid(id, new->seq);
- spin_lock_init(&new->lock);
- new->deleted = 0;
- rcu_read_lock();
- spin_lock(&new->lock);
return id;
}
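
For readers without the tree at hand, here is roughly what ipc_addid() looks
like with this patch applied. It is reconstructed from the diff and its
context; the unchanged uid/gid and sequence-number setup is elided, so it is
not a verbatim copy of ipc/util.c:

int ipc_addid(struct ipc_ids *ids, struct kern_ipc_perm *new, int size)
{
	int id, err;

	/* ... size clamping elided ... */
	if (ids->in_use >= size)
		return -ENOSPC;

	/*
	 * Initialize and take the per-object lock *before* the object
	 * becomes visible to readers through the idr.
	 */
	spin_lock_init(&new->lock);
	new->deleted = 0;
	rcu_read_lock();
	spin_lock(&new->lock);

	err = idr_get_new(&ids->ipcs_idr, new, &id);
	if (err) {
		spin_unlock(&new->lock);
		rcu_read_unlock();
		return err;
	}

	ids->in_use++;

	/* ... uid/gid and sequence-number setup unchanged, elided ... */

	new->id = ipc_buildid(id, new->seq);
	/*
	 * As before, we return with new->lock held and the RCU read side
	 * active; the caller finishes initialization and then unlocks.
	 */
	return id;
}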


--


2008-10-28 17:20:44

by Manfred Spraul

Subject: Re: [PATCH] SYSVIPC - Fix the ipc structures initialization

[email protected] wrote:
> Index: linux-2.6.27/ipc/util.c
> ===================================================================
> --- linux-2.6.27.orig/ipc/util.c 2008-10-23 15:20:46.000000000 +0200
> +++ linux-2.6.27/ipc/util.c 2008-10-28 16:52:17.000000000 +0100
> @@ -266,9 +266,17 @@ int ipc_addid(struct ipc_ids* ids, struc
> if (ids->in_use >= size)
> return -ENOSPC;
>
> + spin_lock_init(&new->lock);
> + new->deleted = 0;
>
Is "new->deleted = 0" still necessary?
Acquiring the lock should be sufficient.
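
(For context, this is roughly how the reader side consumes ->deleted; a
from-memory sketch of the 2.6.27-era ipc_lock(), not a verbatim quote of
ipc/util.c:)

struct kern_ipc_perm *ipc_lock(struct ipc_ids *ids, int id)
{
	struct kern_ipc_perm *out;
	int lid = ipcid_to_idx(id);

	rcu_read_lock();
	out = idr_find(&ids->ipcs_idr, lid);
	if (out == NULL) {
		rcu_read_unlock();
		return ERR_PTR(-EINVAL);
	}

	spin_lock(&out->lock);

	/*
	 * ipc_rmid() may already have freed the ID while we were
	 * spinning on the lock: check ->deleted under the lock to be
	 * sure the structure is still valid.
	 */
	if (out->deleted) {
		spin_unlock(&out->lock);
		rcu_read_unlock();
		return ERR_PTR(-EINVAL);
	}

	return out;
}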

--
Manfred

2008-10-29 09:11:20

by cboulte

Subject: Re: [PATCH] SYSVIPC - Fix the ipc structures initialization

On Tue, Oct 28, 2008 at 3:59 PM, <[email protected]> wrote:
>
> A problem was found while reviewing the code after Bugzilla bug
> http://bugzilla.kernel.org/show_bug.cgi?id=11796.
>
> In ipc_addid(), the newly allocated ipc structure is inserted into the ipcs
> tree (i.e. made visible to readers) without locking it.
> This is not correct since its initialization continues after it has been
> inserted in the tree.
>
> This patch moves the ipc structure lock initialization + locking before
> the actual insertion.
>
> Regards,
> Nadia
>
>
> Signed-off-by: Nadia Derbey <[email protected]>
>
> ---
> ipc/util.c | 14 +++++++++-----
> 1 file changed, 9 insertions(+), 5 deletions(-)
>
> Index: linux-2.6.27/ipc/util.c
> ===================================================================
> --- linux-2.6.27.orig/ipc/util.c 2008-10-23 15:20:46.000000000 +0200
> +++ linux-2.6.27/ipc/util.c 2008-10-28 16:52:17.000000000 +0100
> @@ -266,9 +266,17 @@ int ipc_addid(struct ipc_ids* ids, struc
> if (ids->in_use >= size)
> return -ENOSPC;
>
> + spin_lock_init(&new->lock);
> + new->deleted = 0;
> + rcu_read_lock();
> + spin_lock(&new->lock);
> +
> err = idr_get_new(&ids->ipcs_idr, new, &id);
> - if (err)
> + if (err) {
> + spin_unlock(&new->lock);
> + rcu_read_unlock();
> return err;
> + }
>
> ids->in_use++;
>
> @@ -280,10 +288,6 @@ int ipc_addid(struct ipc_ids* ids, struc
> ids->seq = 0;
>
> new->id = ipc_buildid(id, new->seq);
> - spin_lock_init(&new->lock);
> - new->deleted = 0;
> - rcu_read_lock();
> - spin_lock(&new->lock);
> return id;
> }
>
>
> --
>

Sorry but the bug is still present. It takes longer to show up...

[ 4028.704877] INFO: trying to register non-static key.
[ 4028.708007] the code is fine but needs lockdep annotation.
[ 4028.708007] turning off the locking correctness validator.
[ 4028.708007] Pid: 8051, comm: sysv_test2 Not tainted 2.6.27-ipc_lock #2
[ 4028.708007]
[ 4028.708007] Call Trace:
[ 4028.708007] [<ffffffff80257055>] static_obj+0x60/0x77
[ 4028.708007] [<ffffffff8025af59>] __lock_acquire+0x1c8/0x779
[ 4028.708007] [<ffffffff8025b4ac>] __lock_acquire+0x71b/0x779
[ 4028.708007] [<ffffffff8025b59f>] lock_acquire+0x95/0xc2
[ 4028.708007] [<ffffffff802feb07>] ipc_lock+0x62/0x99
[ 4028.708007] [<ffffffff8045119d>] _spin_lock+0x2d/0x5a
[ 4028.708007] [<ffffffff802feb07>] ipc_lock+0x62/0x99
[ 4028.708007] [<ffffffff802feb07>] ipc_lock+0x62/0x99
[ 4028.708007] [<ffffffff802feaa5>] ipc_lock+0x0/0x99
[ 4028.708007] [<ffffffff803002e3>] sys_msgctl+0x188/0x461
[ 4028.708007] [<ffffffff802feb46>] ipc_lock_check+0x8/0x53
[ 4028.708007] [<ffffffff803002e3>] sys_msgctl+0x188/0x461
[ 4028.708007] [<ffffffff8044ef84>] thread_return+0x3e/0xa8
[ 4028.708007] [<ffffffff80450d69>] trace_hardirqs_on_thunk+0x3a/0x3f
[ 4028.708007] [<ffffffff80259ac7>] trace_hardirqs_on_caller+0x100/0x12a
[ 4028.708007] [<ffffffff80212e09>] sched_clock+0x5/0x7
[ 4028.708007] [<ffffffff80450d69>] trace_hardirqs_on_thunk+0x3a/0x3f
[ 4028.708007] [<ffffffff80213021>] native_sched_clock+0x8c/0xa5
[ 4028.708007] [<ffffffff80212e09>] sched_clock+0x5/0x7
[ 4028.708007] [<ffffffff8020bf7a>] system_call_fastpath+0x16/0x1b
[ 4028.708007]
[ 4094.180003] BUG: soft lockup - CPU#0 stuck for 61s! [sysv_test2:8051]
[ 4094.180007] Modules linked in: ipv6 nfs lockd nfs_acl sunrpc button
battery ac loop dm_mod md_mod usbkbd usbhid hid ff_memless mptctl
evdev tg3 iTCO_wdt libphy shpchp i2c_i801 i2c_core ehci_hcd
pci_hotplug rng_core uhci_hcd e752x_edac edac_core reiserfs edd fan
thermal processor thermal_sys mptspi mptscsih sg mptbase sr_mod cdrom
scsi_transport_spi ata_piix libata dock sd_mod scsi_mod [last
unloaded: freq_table]
[ 4094.180007] irq event stamp: 2118797
[ 4094.180007] hardirqs last enabled at (2118797):
[<ffffffff80450d69>] trace_hardirqs_on_thunk+0x3a/0x3f
[ 4094.180007] hardirqs last disabled at (2118796):
[<ffffffff80450da8>] trace_hardirqs_off_thunk+0x3a/0x3c
[ 4094.180007] softirqs last enabled at (2117034):
[<ffffffff8020d9bc>] call_softirq+0x1c/0x28
[ 4094.180007] softirqs last disabled at (2117029):
[<ffffffff8020d9bc>] call_softirq+0x1c/0x28
[ 4094.180007] CPU 0:
[ 4094.180007] Modules linked in: ipv6 nfs lockd nfs_acl sunrpc button
battery ac loop dm_mod md_mod usbkbd usbhid hid ff_memless mptctl
evdev tg3 iTCO_wdt libphy shpchp i2c_i801 i2c_core ehci_hcd
pci_hotplug rng_core uhci_hcd e752x_edac edac_core reiserfs edd fan
thermal processor thermal_sys mptspi mptscsih sg mptbase sr_mod cdrom
scsi_transport_spi ata_piix libata dock sd_mod scsi_mod [last
unloaded: freq_table]
[ 4094.180007] Pid: 8051, comm: sysv_test2 Not tainted 2.6.27-ipc_lock #2
[ 4094.180007] RIP: 0010:[<ffffffff80213541>] [<ffffffff80213541>] native_read_tsc+0x8/0x18
[ 4094.180007] RSP: 0018:ffff88014580fe20 EFLAGS: 00000206
[ 4094.180007] RAX: 00000000f432530c RBX: fffffffff4325241 RCX: 00000000ffffffff
[ 4094.180007] RDX: 0000000000000ac2 RSI: ffffffff8053d176 RDI: 0000000000000001
[ 4094.180007] RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000000
[ 4094.180007] R10: 0000000000000000 R11: ffffffff8033a71e R12: ffff88002f1861a0
[ 4094.180007] R13: ffff8800ae95f000 R14: ffff88014580e000 R15: ffffffff80827890
[ 4094.180007] FS: 00007fa497b416d0(0000) GS:ffffffff80622a80(0000) knlGS:0000000000000000
[ 4094.180007] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 4094.180007] CR2: 00007fa4978d3ae0 CR3: 000000013f980000 CR4: 00000000000006e0
[ 4094.180007] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4094.180007] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 4094.180007]
[ 4094.180007] Call Trace:
[ 4094.180007] [<ffffffff8033a73c>] delay_tsc+0x1e/0x45
[ 4094.180007] [<ffffffff8033dc8b>] _raw_spin_lock+0x98/0x100
[ 4094.180007] [<ffffffff804511be>] _spin_lock+0x4e/0x5a
[ 4094.180007] [<ffffffff802feb07>] ipc_lock+0x62/0x99
[ 4094.180007] [<ffffffff802feb07>] ipc_lock+0x62/0x99
[ 4094.180007] [<ffffffff802feaa5>] ipc_lock+0x0/0x99
[ 4094.180007] [<ffffffff803002e3>] sys_msgctl+0x188/0x461
[ 4094.180007] [<ffffffff802feb46>] ipc_lock_check+0x8/0x53
[ 4094.180007] [<ffffffff803002e3>] sys_msgctl+0x188/0x461
[ 4094.180007] [<ffffffff8044ef84>] thread_return+0x3e/0xa8
[ 4094.180007] [<ffffffff80450d69>] trace_hardirqs_on_thunk+0x3a/0x3f
[ 4094.180007] [<ffffffff80259ac7>] trace_hardirqs_on_caller+0x100/0x12a
[ 4094.180007] [<ffffffff80212e09>] sched_clock+0x5/0x7
[ 4094.180007] [<ffffffff80450d69>] trace_hardirqs_on_thunk+0x3a/0x3f
[ 4094.180007] [<ffffffff80213021>] native_sched_clock+0x8c/0xa5
[ 4094.180007] [<ffffffff80212e09>] sched_clock+0x5/0x7
[ 4094.180007] [<ffffffff8020bf7a>] system_call_fastpath+0x16/0x1b
[ 4094.180007]

I don't understand what's going on... I hope I'm not the only one who
still gets it, because I'm starting to have doubts about this bug.

c.

2008-11-11 22:16:58

by Andrew Morton

Subject: Re: [PATCH] SYSVIPC - Fix the ipc structures initialization

On Wed, 29 Oct 2008 10:11:06 +0100
[email protected] wrote:

> On Tue, Oct 28, 2008 at 3:59 PM, <[email protected]> wrote:
> >
> > A problem was found while reviewing the code after Bugzilla bug
> > http://bugzilla.kernel.org/show_bug.cgi?id=11796.
> >
> > In ipc_addid(), the newly allocated ipc structure is inserted into the ipcs
> > tree (i.e. made visible to readers) without locking it.
> > This is not correct since its initialization continues after it has been
> > inserted in the tree.
> >
> > This patch moves the ipc structure lock initialization + locking before
> > the actual insertion.
> >
> > Regards,
> > Nadia
> >
> >
> > Signed-off-by: Nadia Derbey <[email protected]>
> >
> > ---
> > ipc/util.c | 14 +++++++++-----
> > 1 file changed, 9 insertions(+), 5 deletions(-)
> >
> > Index: linux-2.6.27/ipc/util.c
> > ===================================================================
> > --- linux-2.6.27.orig/ipc/util.c 2008-10-23 15:20:46.000000000 +0200
> > +++ linux-2.6.27/ipc/util.c 2008-10-28 16:52:17.000000000 +0100
> > @@ -266,9 +266,17 @@ int ipc_addid(struct ipc_ids* ids, struc
> > if (ids->in_use >= size)
> > return -ENOSPC;
> >
> > + spin_lock_init(&new->lock);
> > + new->deleted = 0;
> > + rcu_read_lock();
> > + spin_lock(&new->lock);
> > +
> > err = idr_get_new(&ids->ipcs_idr, new, &id);
> > - if (err)
> > + if (err) {
> > + spin_unlock(&new->lock);
> > + rcu_read_unlock();
> > return err;
> > + }
> >
> > ids->in_use++;
> >
> > @@ -280,10 +288,6 @@ int ipc_addid(struct ipc_ids* ids, struc
> > ids->seq = 0;
> >
> > new->id = ipc_buildid(id, new->seq);
> > - spin_lock_init(&new->lock);
> > - new->deleted = 0;
> > - rcu_read_lock();
> > - spin_lock(&new->lock);
> > return id;
> > }
> >
> >
> > --
> >
>
> Sorry but the bug is still present. It takes longer to show up...
>
> [ 4028.704877] INFO: trying to register non-static key.
> [ 4028.708007] the code is fine but needs lockdep annotation.
> [ 4028.708007] turning off the locking correctness validator.
> [ 4028.708007] Pid: 8051, comm: sysv_test2 Not tainted 2.6.27-ipc_lock #2
> [ 4028.708007]
> [ 4028.708007] Call Trace:
> [ 4028.708007] [<ffffffff80257055>] static_obj+0x60/0x77
> [ 4028.708007] [<ffffffff8025af59>] __lock_acquire+0x1c8/0x779
> [ 4028.708007] [<ffffffff8025b4ac>] __lock_acquire+0x71b/0x779
> [ 4028.708007] [<ffffffff8025b59f>] lock_acquire+0x95/0xc2
> [ 4028.708007] [<ffffffff802feb07>] ipc_lock+0x62/0x99
> [ 4028.708007] [<ffffffff8045119d>] _spin_lock+0x2d/0x5a
> [ 4028.708007] [<ffffffff802feb07>] ipc_lock+0x62/0x99
> [ 4028.708007] [<ffffffff802feb07>] ipc_lock+0x62/0x99
> [ 4028.708007] [<ffffffff802feaa5>] ipc_lock+0x0/0x99
> [ 4028.708007] [<ffffffff803002e3>] sys_msgctl+0x188/0x461
> [ 4028.708007] [<ffffffff802feb46>] ipc_lock_check+0x8/0x53
> [ 4028.708007] [<ffffffff803002e3>] sys_msgctl+0x188/0x461
> [ 4028.708007] [<ffffffff8044ef84>] thread_return+0x3e/0xa8
> [ 4028.708007] [<ffffffff80450d69>] trace_hardirqs_on_thunk+0x3a/0x3f
> [ 4028.708007] [<ffffffff80259ac7>] trace_hardirqs_on_caller+0x100/0x12a
> [ 4028.708007] [<ffffffff80212e09>] sched_clock+0x5/0x7
> [ 4028.708007] [<ffffffff80450d69>] trace_hardirqs_on_thunk+0x3a/0x3f
> [ 4028.708007] [<ffffffff80213021>] native_sched_clock+0x8c/0xa5
> [ 4028.708007] [<ffffffff80212e09>] sched_clock+0x5/0x7
> [ 4028.708007] [<ffffffff8020bf7a>] system_call_fastpath+0x16/0x1b
> [ 4028.708007]
> [ 4094.180003] BUG: soft lockup - CPU#0 stuck for 61s! [sysv_test2:8051]
> [ 4094.180007] Modules linked in: ipv6 nfs lockd nfs_acl sunrpc button
> battery ac loop dm_mod md_mod usbkbd usbhid hid ff_memless mptctl
> evdev tg3 iTCO_wdt libphy shpchp i2c_i801 i2c_core ehci_hcd
> pci_hotplug rng_core uhci_hcd e752x_edac edac_core reiserfs edd fan
> thermal processor thermal_sys mptspi mptscsih sg mptbase sr_mod cdrom
> scsi_transport_spi ata_piix libata dock sd_mod scsi_mod [last
> unloaded: freq_table]
> [ 4094.180007] irq event stamp: 2118797
> [ 4094.180007] hardirqs last enabled at (2118797):
> [<ffffffff80450d69>] trace_hardirqs_on_thunk+0x3a/0x3f
> [ 4094.180007] hardirqs last disabled at (2118796):
> [<ffffffff80450da8>] trace_hardirqs_off_thunk+0x3a/0x3c
> [ 4094.180007] softirqs last enabled at (2117034):
> [<ffffffff8020d9bc>] call_softirq+0x1c/0x28
> [ 4094.180007] softirqs last disabled at (2117029):
> [<ffffffff8020d9bc>] call_softirq+0x1c/0x28
> [ 4094.180007] CPU 0:
> [ 4094.180007] Modules linked in: ipv6 nfs lockd nfs_acl sunrpc button
> battery ac loop dm_mod md_mod usbkbd usbhid hid ff_memless mptctl
> evdev tg3 iTCO_wdt libphy shpchp i2c_i801 i2c_core ehci_hcd
> pci_hotplug rng_core uhci_hcd e752x_edac edac_core reiserfs edd fan
> thermal processor thermal_sys mptspi mptscsih sg mptbase sr_mod cdrom
> scsi_transport_spi ata_piix libata dock sd_mod scsi_mod [last
> unloaded: freq_table]
> [ 4094.180007] Pid: 8051, comm: sysv_test2 Not tainted 2.6.27-ipc_lock #2
> [ 4094.180007] RIP: 0010:[<ffffffff80213541>] [<ffffffff80213541>] native_read_tsc+0x8/0x18
> [ 4094.180007] RSP: 0018:ffff88014580fe20 EFLAGS: 00000206
> [ 4094.180007] RAX: 00000000f432530c RBX: fffffffff4325241 RCX: 00000000ffffffff
> [ 4094.180007] RDX: 0000000000000ac2 RSI: ffffffff8053d176 RDI: 0000000000000001
> [ 4094.180007] RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000000
> [ 4094.180007] R10: 0000000000000000 R11: ffffffff8033a71e R12: ffff88002f1861a0
> [ 4094.180007] R13: ffff8800ae95f000 R14: ffff88014580e000 R15: ffffffff80827890
> [ 4094.180007] FS: 00007fa497b416d0(0000) GS:ffffffff80622a80(0000) knlGS:0000000000000000
> [ 4094.180007] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 4094.180007] CR2: 00007fa4978d3ae0 CR3: 000000013f980000 CR4: 00000000000006e0
> [ 4094.180007] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 4094.180007] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 4094.180007]
> [ 4094.180007] Call Trace:
> [ 4094.180007] [<ffffffff8033a73c>] delay_tsc+0x1e/0x45
> [ 4094.180007] [<ffffffff8033dc8b>] _raw_spin_lock+0x98/0x100
> [ 4094.180007] [<ffffffff804511be>] _spin_lock+0x4e/0x5a
> [ 4094.180007] [<ffffffff802feb07>] ipc_lock+0x62/0x99
> [ 4094.180007] [<ffffffff802feb07>] ipc_lock+0x62/0x99
> [ 4094.180007] [<ffffffff802feaa5>] ipc_lock+0x0/0x99
> [ 4094.180007] [<ffffffff803002e3>] sys_msgctl+0x188/0x461
> [ 4094.180007] [<ffffffff802feb46>] ipc_lock_check+0x8/0x53
> [ 4094.180007] [<ffffffff803002e3>] sys_msgctl+0x188/0x461
> [ 4094.180007] [<ffffffff8044ef84>] thread_return+0x3e/0xa8
> [ 4094.180007] [<ffffffff80450d69>] trace_hardirqs_on_thunk+0x3a/0x3f
> [ 4094.180007] [<ffffffff80259ac7>] trace_hardirqs_on_caller+0x100/0x12a
> [ 4094.180007] [<ffffffff80212e09>] sched_clock+0x5/0x7
> [ 4094.180007] [<ffffffff80450d69>] trace_hardirqs_on_thunk+0x3a/0x3f
> [ 4094.180007] [<ffffffff80213021>] native_sched_clock+0x8c/0xa5
> [ 4094.180007] [<ffffffff80212e09>] sched_clock+0x5/0x7
> [ 4094.180007] [<ffffffff8020bf7a>] system_call_fastpath+0x16/0x1b
> [ 4094.180007]
>
> I don't understand what's going on... I hope I'm not the only one who
> still gets it, because I'm starting to have doubts about this bug.
>

Time is starting to press on this one. Is there something which we can
revert which would fix this bug?

Thanks.

2008-11-12 06:41:56

by Manfred Spraul

Subject: Re: [PATCH] SYSVIPC - Fix the ipc structures initialization

Andrew Morton wrote:
>> I don't understand what's going on... I hope I'm not the only one who
>> still gets it, because I'm starting to have doubts about this bug.
>>
One thing is odd:
The test app uses fork(), two processes.
One process constantly does
msgctl(0, IPC_STAT, &buf);

The other process creates a new msg queue, then it destroys it.
But: the probability that the new queue has msqid 0 is tiny
[impossible?].
That means the idr_find() call should [must?] fail.
But then the spin_lock() should never be executed. Why do you get an
oops in that call?

Do you still use the test app that is on bugzilla?
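
To make the scenario concrete, here is a minimal userspace sketch of the
access pattern described above. It is only an illustration, not the bugzilla
test program; the exact flags and loop structure are assumptions:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <unistd.h>

int main(void)
{
	if (fork() == 0) {
		/* child: hammer msqid 0 with IPC_STAT */
		struct msqid_ds buf;

		for (;;)
			msgctl(0, IPC_STAT, &buf);
	}

	/* parent: repeatedly create and destroy a private queue */
	for (;;) {
		int id = msgget(IPC_PRIVATE, IPC_CREAT | 0600);

		if (id >= 0)
			msgctl(id, IPC_RMID, NULL);
	}

	return 0;
}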

>>
>
> Time is starting to press on this one. Is there something which we can
> revert which would fix this bug?
>
> Thanks.
>

2008-11-13 06:10:14

by Manfred Spraul

Subject: Re: [PATCH] SYSVIPC - Fix the ipc structures initialization

Andrew Morton wrote:
> Time is starting to press on this one. Is there something which we can
> revert which would fix this bug?
>
My previous analysis was bogus, let's start from scratch:

1) the initial oops report:
http://bugzilla.kernel.org/show_bug.cgi?id=11796#c0

- lockdep is enabled, the oops is somewhere in __lock_acquire
- the instruction that oopses is

>>> lock incl 0x138(%r12)
R12 is 0x0038004000000000

That could be a debug_atomic_inc() in __lock_acquire. The class pointer
in the spinlock_t is not initialized, thus it crashes.
Ingo - is that possible?

2) the latest oops was actually a soft lockup:

It starts with:
> [ 400.393024] INFO: trying to register non-static key.
> [ 400.397005] the code is fine but needs lockdep annotation.
> [ 400.397005] turning off the locking correctness validator.
> [ 400.397005] Pid: 4207, comm: sysv_test2 Not tainted 2.6.27-ipc_lock #1
> [ 400.397005] Call Trace:
> [ 400.397005] [<ffffffff80257055>] static_obj+0x60/0x77
> [ 400.397005] [<ffffffff8025af59>] __lock_acquire+0x1c8/0x779
> [ 400.397005] [<ffffffff8025b59f>] lock_acquire+0x95/0xc2
> [ 400.397005] [<ffffffff802feb07>] ipc_lock+0x62/0x99
> [ 400.397005] [<ffffffff8045117d>] _spin_lock+0x2d/0x5a
> [ 400.397005] [<ffffffff802feb07>] ipc_lock+0x62/0x99
> [ 400.397005] [<ffffffff802feb07>] ipc_lock+0x62/0x99
> [ 400.397005] [<ffffffff802feaa5>] ipc_lock+0x0/0x99
> [ 400.397005] [<ffffffff802feb46>] ipc_lock_check+0x8/0x53
> [ 400.397005] [<ffffffff803002c3>] sys_msgctl+0x188/0x461
> [ 400.397005] [<ffffffff80259ac7>] trace_hardirqs_on_caller+0x100/0x12a
> [ 400.397005] [<ffffffff80450d49>] trace_hardirqs_on_thunk+0x3a/0x3f
> [ 400.397005] [<ffffffff80259ac7>] trace_hardirqs_on_caller+0x100/0x12a
> [ 400.397005] [<ffffffff80212e09>] sched_clock+0x5/0x7
> [ 400.397005] [<ffffffff80450d49>] trace_hardirqs_on_thunk+0x3a/0x3f
> [ 400.397005] [<ffffffff80213021>] native_sched_clock+0x8c/0xa5
> [ 400.397005] [<ffffffff80212e09>] sched_clock+0x5/0x7
> [ 400.397005] [<ffffffff8020bf7a>] system_call_fastpath+0x16/0x1b
> [ 400.397005]
> [ 464.933003] BUG: soft lockup - CPU#2 stuck for 61s! [sysv_test2:4207]
> [ 464.933006] Call Trace:
> [ 464.933006] [<ffffffff8033dc6b>] _raw_spin_lock+0x98/0x100
> [ 464.933006] [<ffffffff8045119e>] _spin_lock+0x4e/0x5a
> [ 464.933006] [<ffffffff802feb07>] ipc_lock+0x62/0x99

For me, it reads like an uninitialized spinlock_t:
The static_obj test in kernel/lockdep.c notices that something is wrong and disables itself.
But then _raw_spin_lock() tries to acquire the uninitialized spinlock and loops forever, because no one does spin_unlock().
After 60 seconds, the soft lockup detection notices the problem and oopses.


2008-11-13 08:07:17

by Peter Zijlstra

Subject: Re: [PATCH] SYSVIPC - Fix the ipc structures initialization

On Thu, 2008-11-13 at 07:10 +0100, Manfred Spraul wrote:
> Andrew Morton wrote:
> > Time is starting to press on this one. Is there something which we can
> > revert which would fix this bug?
> >
> My previous analysis was bogus, let's start from scratch:
>
> 1) the initial oops report:
> http://bugzilla.kernel.org/show_bug.cgi?id=11796#c0
>
> - lockdep is enabled, the oops is somewhere in __lock_acquire
> - the instruction that oopses is

> For me, it reads like an uninitialized spinlock_t:

Yes, most (if not all) oopsen in either __lock_acquire or __lock_release
are stomped-on spinlocks, either use-after-free or otherwise.

2008-11-13 10:09:41

by Nadia Derbey

Subject: Re: [PATCH] SYSVIPC - Fix the ipc structures initialization

On Thu, 2008-11-13 at 07:10 +0100, Manfred Spraul wrote:
> Andrew Morton wrote:
> > Time is starting to press on this one. Is there something which we can
> > revert which would fix this bug?
> >
> My previous analysis was bogus, let's start from scratch:
>
> 1) the initial oops report:
> http://bugzilla.kernel.org/show_bug.cgi?id=11796#c0
>
> - lockdep is enabled, the oops is somewhere in __lock_acquire
> - the instruction that oopses is
>
> >>> lock incl 0x138(%r12)
> R12 is 0x0038004000000000
>
> That could be a debug_atomic_inc() in __lock_acquire. The class pointer
> in the spinlock_t is not initialized, thus it crashes.
> Ingo - is that possible?
>
> 2) the latest oops was actually a soft lockup:
>
> It starts with:
> > [ 400.393024] INFO: trying to register non-static key.
> > [ 400.397005] the code is fine but needs lockdep annotation.
> > [ 400.397005] turning off the locking correctness validator.
> > [ 400.397005] Pid: 4207, comm: sysv_test2 Not tainted 2.6.27-ipc_lock #1
> > [ 400.397005] Call Trace:
> > [ 400.397005] [<ffffffff80257055>] static_obj+0x60/0x77
> > [ 400.397005] [<ffffffff8025af59>] __lock_acquire+0x1c8/0x779
> > [ 400.397005] [<ffffffff8025b59f>] lock_acquire+0x95/0xc2
> > [ 400.397005] [<ffffffff802feb07>] ipc_lock+0x62/0x99
> > [ 400.397005] [<ffffffff8045117d>] _spin_lock+0x2d/0x5a
> > [ 400.397005] [<ffffffff802feb07>] ipc_lock+0x62/0x99
> > [ 400.397005] [<ffffffff802feb07>] ipc_lock+0x62/0x99
> > [ 400.397005] [<ffffffff802feaa5>] ipc_lock+0x0/0x99
> > [ 400.397005] [<ffffffff802feb46>] ipc_lock_check+0x8/0x53
> > [ 400.397005] [<ffffffff803002c3>] sys_msgctl+0x188/0x461
> > [ 400.397005] [<ffffffff80259ac7>] trace_hardirqs_on_caller+0x100/0x12a
> > [ 400.397005] [<ffffffff80450d49>] trace_hardirqs_on_thunk+0x3a/0x3f
> > [ 400.397005] [<ffffffff80259ac7>] trace_hardirqs_on_caller+0x100/0x12a
> > [ 400.397005] [<ffffffff80212e09>] sched_clock+0x5/0x7
> > [ 400.397005] [<ffffffff80450d49>] trace_hardirqs_on_thunk+0x3a/0x3f
> > [ 400.397005] [<ffffffff80213021>] native_sched_clock+0x8c/0xa5
> > [ 400.397005] [<ffffffff80212e09>] sched_clock+0x5/0x7
> > [ 400.397005] [<ffffffff8020bf7a>] system_call_fastpath+0x16/0x1b
> > [ 400.397005]
> > [ 464.933003] BUG: soft lockup - CPU#2 stuck for 61s! [sysv_test2:4207]
> > [ 464.933006] Call Trace:
> > [ 464.933006] [<ffffffff8033dc6b>] _raw_spin_lock+0x98/0x100
> > [ 464.933006] [<ffffffff8045119e>] _spin_lock+0x4e/0x5a
> > [ 464.933006] [<ffffffff802feb07>] ipc_lock+0x62/0x99
>
> For me, it reads like an uninitialized spinlock_t:

That's what the patch I sent at the beginning of this thread
(http://lkml.org/lkml/2008/10/28/177) fixes: the ipc structure is not
completely initialized before being inserted in the tree.

The patched kernel has been running for 4 days on my victim machine without any
problem, but Clement is still facing a problem (one that shows up later).

I think this patch should be included anyway, because the ipc structure
initialization is not correct the way it is done today.

I'm still reviewing the code to see what else could be wrong.

Regards,
Nadia

> The static_obj test in kernel/lockdep.c notices that something is wrong and disables itself.
> But then _raw_spin_lock() tries to acquire the uninitialized spinlock and loops forever, because no one does spin_unlock().
> After 60 seconds, the soft lockup detection notices the problem and oopses.
>
>
>
>
--
Nadia Derbey <[email protected]>

2008-11-13 17:54:09

by Manfred Spraul

Subject: Re: [PATCH] SYSVIPC - Fix the ipc structures initialization

Nadia Derbey wrote:
> That's what the patch I sent at the beginning of this thread
> (http://lkml.org/lkml/2008/10/28/177) fixes: the ipc structure is not
> completely initialized before being inserted in the tree.
>
> The patched kernel has been running for 4 days on my victim machine without any
> problem, but Clement is still facing a problem (one that shows up later).
>
> I think this patch should be included anyway, because the ipc structure
> initialization is not correct the way it is done today.
>
>
Definitely: your patch belongs in 2.6.28 and in the next 2.6.27 stable update.

> I'm still reviewing the code to see what else could be wrong.
>
I started looking into the idr code: perhaps idr returns a stale entry?
A race somewhere in the RCU code.
I.e. the new bug is not corruption due to access before
spin_lock_init(), but due to access after kfree()?

Clement: Could you send us an ipcs output from before the test?
Do you have any sysv msg queues on your system?
And: do you use PREEMPT? Do you use PREEMPT_RCU?

If there are no message queues around, then every msgget()/IPC_RMID cycle will
create a new idr layer and tear it down again.

--
Manfred