Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753290AbYJ1Jot (ORCPT ); Tue, 28 Oct 2008 05:44:49 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751896AbYJ1Jol (ORCPT ); Tue, 28 Oct 2008 05:44:41 -0400 Received: from fg-out-1718.google.com ([72.14.220.152]:31098 "EHLO fg-out-1718.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751099AbYJ1Joj (ORCPT ); Tue, 28 Oct 2008 05:44:39 -0400 DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version :content-type:content-transfer-encoding:content-disposition :references; b=P1xNVUO/m30jNShOzw5erck7Pa+rn1qsp0PZb57u8vSoJCh4p9pKN19WDCvwq6hZr7 fizk86tY0vCw/rG2Nb2uZkG64YgaMbeCLeE9TXlCccgN+CiXKWZbHOV27J6tpafAJKjc uqyWTizP38oxIP3+XkE5L33QoB3ottj2mZKJw= Message-ID: <4f3ee3290810280244ua378523kf77cfaed8a1ebc38@mail.gmail.com> Date: Tue, 28 Oct 2008 10:44:37 +0100 From: cboulte@gmail.com To: "Nadia Derbey" Subject: Re: [PATCH 1/1] (v3) SYSVIPC - Fix the ipc structures initialization Cc: akpm@linux-foundation.org, manfred@colorfullife.com, linux-kernel@vger.kernel.org In-Reply-To: <1225122161.2595.248.camel@frecb000730.frec.bull.fr> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <20081027072826.266777838@bull.net> <20081027072836.438113395@bull.net> <4f3ee3290810270332p56e1f99q24a5eea818e626ee@mail.gmail.com> <1225105450.2595.215.camel@frecb000730.frec.bull.fr> <1225122161.2595.248.camel@frecb000730.frec.bull.fr> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 11526 Lines: 275 On Mon, Oct 27, 2008 at 4:42 PM, Nadia Derbey wrote: > On Mon, 2008-10-27 at 12:04 +0100, Nadia Derbey wrote: >> On Mon, 2008-10-27 at 11:32 +0100, cboulte@gmail.com wrote: >> > On Mon, Oct 27, 2008 at 8:28 AM, wrote: >> > > >> > > This patch is a fix for Bugzilla bug >> > > http://bugzilla.kernel.org/show_bug.cgi?id=11796. >> > > >> > > To summarize, a simple testcase is concurrently running an infinite loop on >> > > msgctl(IPC_STAT) and a call to msgget(): >> > > >> > > while (1) >> > > msgctl(id, IPC_STAT) 1 >> > > | >> > > | >> > > | >> > > 2 id = msgget(key, IPC_CREAT) >> > > | >> > > | >> > > | >> > > >> > > In the interval [1-2], the id doesn't exist yet. >> > > >> > > In that test, the problem is the following: >> > > When we are calling ipc_addid() from msgget() the msq structure is not >> > > completely initialized. So idr_get_new() is inserting a pointer into the >> > > idr tree, and the structure which is pointed to has, among other fields, >> > > its lock uninitialized. >> > > >> > > Since msgctl(IPC_STAT) is looping while (1), idr_find() returns the >> > > pointer as soon as it is inserted into the IDR tree. And ipc_lock() >> > > calls spin_lock(&mqs->lock), while we have not initialized that lock >> > > yet. >> > > >> > > This patch moves the spin_lock_init() before the call to ipc_addid(). >> > > It also sets the "deleted" flag to 1 in the window between msg structure >> > > allocation and msg structure locking in ipc_addid(). >> > > >> > > >> > > Regards, >> > > Nadia >> > > >> > > >> > > Signed-off-by: Nadia Derbey >> > > >> > > --- >> > > ipc/util.c | 16 ++++++++++++++-- >> > > 1 file changed, 14 insertions(+), 2 deletions(-) >> > > >> > > Index: linux-2.6.27/ipc/util.c >> > > =================================================================== >> > > --- linux-2.6.27.orig/ipc/util.c 2008-10-23 15:20:46.000000000 +0200 >> > > +++ linux-2.6.27/ipc/util.c 2008-10-24 17:48:33.000000000 +0200 >> > > @@ -266,6 +266,17 @@ int ipc_addid(struct ipc_ids* ids, struc >> > > if (ids->in_use >= size) >> > > return -ENOSPC; >> > > >> > > + spin_lock_init(&new->lock); >> > > + >> > > + /* >> > > + * We have a window between the time new is inserted into the idr >> > > + * tree and the time it is actually locked. >> > > + * In order to be safe during that window set the new ipc structure >> > > + * as deleted: a concurrent ipc_lock() will see it as not present >> > > + * until the initialization phase is complete. >> > > + */ >> > > + new->deleted = 1; >> > > + >> > > err = idr_get_new(&ids->ipcs_idr, new, &id); >> > > if (err) >> > > return err; >> > > @@ -280,10 +291,11 @@ int ipc_addid(struct ipc_ids* ids, struc >> > > ids->seq = 0; >> > > >> > > new->id = ipc_buildid(id, new->seq); >> > > - spin_lock_init(&new->lock); >> > > - new->deleted = 0; >> > > rcu_read_lock(); >> > > spin_lock(&new->lock); >> > > + >> > > + new->deleted = 0; >> > > + >> > > return id; >> > > } >> > > >> > > >> > > -- >> > > >> > >> > Still got the lock... I'm using a 4 cpus node: Intel Xeon @ 2.8GHz... >> > don't know if it has an impact. >> ??? >> The bad new, is that it becomes unreprodicible on my side. >> For my part, I've got 2 2.8 GHz Xeon CPUs. >> >> Will review the code once more. >> >> Thanks! >> Nadia >> >> > The only way I found to have no lock, it's to spin_lock the ipc >> > _before_ inserting it into the idr. >> > >> > Best regards, c. >> > > > I agree with you that it's more logical and correct to take the lock > before inserting the ipc structure (i.e. making it visible to readers). > > But I wanted to understand what's wrong with > 1. new->lock init > 2. new->deleted = 1 > 3. insert(new) > > I've been looking at the code again and again and the only thing I see > could have happened, is that instructions have been reordered and the > insertion done before the lock actually being initialized. > This means that a memory barrier is missing (this would explain why your > fix works: the spin_lock acts as a barrier). > But this memory barrier is supposed to be invoked by > rcu_assign_pointer() in idr_get_new(). So may be there's a problem with > the idr code. > Before going into a review of this code, I'd like to confirm what I'm > saying, doing the following (I'm sorry to ask you do it, but I can't > reproduce the problem in my side anymore): would you mind adding a > smp_wmb() just before the idr_get_new in ipc_addid() and tell me if this > solves the problem. > (BTW, I didn't ask you before, but I guess you're getting the same call > trace?) > > Regards, > Nadia > > -- > Nadia Derbey > > I tried this patch: Index: bug-sysv/ipc/util.c =================================================================== --- bug-sysv.orig/ipc/util.c 2008-10-27 09:21:44.000000000 +0100 +++ bug-sysv/ipc/util.c 2008-10-27 19:04:33.000000000 +0100 @@ -266,6 +266,19 @@ int ipc_addid(struct ipc_ids* ids, struc if (ids->in_use >= size) return -ENOSPC; + spin_lock_init(&new->lock); + + /* + * We have a window between the time new is inserted into the idr + * tree and the time it is actually locked. + * In order to be safe during that window set the new ipc structure + * as deleted: a concurrent ipc_lock() will see it as not present + * until the initialization phase is complete. + */ + new->deleted = 1; + + smp_wmb(); + err = idr_get_new(&ids->ipcs_idr, new, &id); if (err) return err; @@ -280,10 +293,11 @@ int ipc_addid(struct ipc_ids* ids, struc ids->seq = 0; new->id = ipc_buildid(id, new->seq); - spin_lock_init(&new->lock); - new->deleted = 0; rcu_read_lock(); spin_lock(&new->lock); + + new->deleted = 0; + return id; } And got the lock (the node is still usuable but I guess it's because only 1 cpu out of 4 is locked): [ 400.393024] INFO: trying to register non-static key. [ 400.397005] the code is fine but needs lockdep annotation. [ 400.397005] turning off the locking correctness validator. [ 400.397005] Pid: 4207, comm: sysv_test2 Not tainted 2.6.27-ipc_lock #1 [ 400.397005] [ 400.397005] Call Trace: [ 400.397005] [] static_obj+0x60/0x77 [ 400.397005] [] __lock_acquire+0x1c8/0x779 [ 400.397005] [] lock_acquire+0x95/0xc2 [ 400.397005] [] ipc_lock+0x62/0x99 [ 400.397005] [] _spin_lock+0x2d/0x5a [ 400.397005] [] ipc_lock+0x62/0x99 [ 400.397005] [] ipc_lock+0x62/0x99 [ 400.397005] [] ipc_lock+0x0/0x99 [ 400.397005] [] ipc_lock_check+0x8/0x53 [ 400.397005] [] sys_msgctl+0x188/0x461 [ 400.397005] [] trace_hardirqs_on_caller+0x100/0x12a [ 400.397005] [] trace_hardirqs_on_thunk+0x3a/0x3f [ 400.397005] [] trace_hardirqs_on_caller+0x100/0x12a [ 400.397005] [] sched_clock+0x5/0x7 [ 400.397005] [] trace_hardirqs_on_thunk+0x3a/0x3f [ 400.397005] [] native_sched_clock+0x8c/0xa5 [ 400.397005] [] sched_clock+0x5/0x7 [ 400.397005] [] system_call_fastpath+0x16/0x1b [ 400.397005] [ 464.933003] BUG: soft lockup - CPU#2 stuck for 61s! [sysv_test2:4207] [ 464.933006] Modules linked in: ipv6 nfs lockd nfs_acl sunrpc button battery ac loop dm_mod md_mod usbkbd usbhid hid ff_memless mptctl evdev tg3 libphy iTCO_wdt e752x_edac edac_core uhci_hcd rng_core shpchp i2c_i801 pci_hotplug i2c_core ehci_hcd reiserfs edd fan thermal processor thermal_sys mptspi mptscsih sg mptbase scsi_transport_spi sr_mod cdrom ata_piix libata dock sd_mod scsi_mod [last unloaded: freq_table] [ 464.933006] irq event stamp: 2136363 [ 464.933006] hardirqs last enabled at (2136363): [] trace_hardirqs_on_thunk+0x3a/0x3f [ 464.933006] hardirqs last disabled at (2136361): [] __do_softirq+0xa3/0xf7 [ 464.933006] softirqs last enabled at (2136362): [] call_softirq+0x1c/0x28 [ 464.933006] softirqs last disabled at (2136357): [] call_softirq+0x1c/0x28 [ 464.933006] CPU 2: [ 464.933006] Modules linked in: ipv6 nfs lockd nfs_acl sunrpc button battery ac loop dm_mod md_mod usbkbd usbhid hid ff_memless mptctl evdev tg3 libphy iTCO_wdt e752x_edac edac_core uhci_hcd rng_core shpchp i2c_i801 pci_hotplug i2c_core ehci_hcd reiserfs edd fan thermal processor thermal_sys mptspi mptscsih sg mptbase scsi_transport_spi sr_mod cdrom ata_piix libata dock sd_mod scsi_mod [last unloaded: freq_table] [ 464.933006] Pid: 4207, comm: sysv_test2 Not tainted 2.6.27-ipc_lock #1 [ 464.933006] RIP: 0010:[] [] _raw_spin_lock+0x98/0x100 [ 464.933006] RSP: 0018:ffff880145473e48 EFLAGS: 00000206 [ 464.933006] RAX: 00000000000000cb RBX: 000000001830d3f9 RCX: 00000000ffffffff[ 464.933006] RDX: 0000018500000000 RSI: ffffffff8053d176 RDI: 0000000000000001[ 464.933006] RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000000[ 464.933006] R10: 0000000000000000 R11: ffffffff8033a6fe R12: 0000000000000000[ 464.933006] R13: ffffffff8033a6fe R14: ffffffff8020c7ee R15: 0000000000000002[ 464.933006] FS: 00007f40899b86d0(0000) GS:ffff88014707f508(0000) knlGS:0000000000000000 [ 464.933006] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 464.933006] CR2: 00007f408974aae0 CR3: 0000000143003000 CR4: 00000000000006e0[ 464.933006] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000[ 464.933006] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400[ 464.933006] [ 464.933006] Call Trace: [ 464.933006] [] _raw_spin_lock+0x98/0x100 [ 464.933006] [] _spin_lock+0x4e/0x5a [ 464.933006] [] ipc_lock+0x62/0x99 [ 464.933006] [] ipc_lock+0x62/0x99 [ 464.933006] [] ipc_lock+0x0/0x99 [ 464.933006] [] ipc_lock_check+0x8/0x53 [ 464.933006] [] sys_msgctl+0x188/0x461 [ 464.933006] [] trace_hardirqs_on_caller+0x100/0x12a [ 464.933006] [] trace_hardirqs_on_thunk+0x3a/0x3f [ 464.933006] [] trace_hardirqs_on_caller+0x100/0x12a [ 464.933006] [] sched_clock+0x5/0x7 [ 464.933006] [] trace_hardirqs_on_thunk+0x3a/0x3f [ 464.933006] [] native_sched_clock+0x8c/0xa5 [ 464.933006] [] sched_clock+0x5/0x7 [ 464.933006] [] system_call_fastpath+0x16/0x1b [ 464.933006] I checked it with two different distributions: Debian Lenny and Sles 10 SP 1. Regards, c. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/