Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754826Ab2BARML (ORCPT ); Wed, 1 Feb 2012 12:12:11 -0500 Received: from cantor2.suse.de ([195.135.220.15]:54526 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752519Ab2BARMJ (ORCPT ); Wed, 1 Feb 2012 12:12:09 -0500 Date: Wed, 1 Feb 2012 18:11:29 +0100 (CET) From: Jiri Kosina X-X-Sender: jikos@twin.jikos.cz To: Linus Torvalds Cc: Colin Walters , Jan Kara , LKML , linux-ia64@vger.kernel.org, dsterba@suse.cz, ptesarik@suse.cz, rguenther@suse.de, gcc@gcc.gnu.org Subject: Re: Memory corruption due to word sharing In-Reply-To: Message-ID: References: <20120201151918.GC16714@quack.suse.cz> <1328114266.5355.44.camel@lenny> User-Agent: Alpine 2.00 (LRH 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3024 Lines: 101 On Wed, 1 Feb 2012, Linus Torvalds wrote: > But the compiler turns the access to the bitfield (in a 32-bit aligned > word) into a 64-bit access that accesses the word *next* to it. > > That word next to it might *be* the lock, for example. > > So we could literally have this kind of situation: > > struct { > atomic_t counter; > unsigned int val:4, other:4, data:24; > }; > > and if we write code like this: > > spin_lock(&somelock); > s->data++; > spin_unlock(&somelock); > > and on another CPU we might do > > atomic_inc(&counter); > > and the access to the bitfield will *corrupt* the atomic counter, even > though both of them are perfectly fine! > > Quite frankly, if the bug is simply because gcc doesn't actually know > or care about the underlying size of the bitmask, it is possible that > we can find a case where gcc clearly is buggy even according to the > original C rules. > > Honza - since you have access to the compiler in question, try > compiling this trivial test-program: > > > struct example { > volatile int a; > int b:1; > }; > > .. > s->b = 1; > .. > > and if that bitfield access actually does a 64-bit access that also > touches 's->a', then dammit, that's a clear violation of even the > *old* C standard, and the gcc people cannot just wave away their bugs > by saying "we've got standads, pttthththt". > > And I suspect it really is a generic bug that can be shown even with > the above trivial example. I have actually tried exactly this earlier today (because while looking at this, I had an idea that putting volatile in place could be a workaround, causing gcc to generate a saner code), but it doesn't work either: # cat x.c struct x { long a; volatile unsigned int lock; unsigned int full:1; }; void wrong(struct x *ptr) { ptr->full = 1; } int main() { wrong(0); } # gcc -O2 x.c # gdb -q ./a.out Reading symbols from /root/a.out...done. (gdb) disassemble wrong Dump of assembler code for function wrong: 0x40000000000005c0 <+0>: [MMI] adds r32=8,r32 0x40000000000005c1 <+1>: nop.m 0x0 0x40000000000005c2 <+2>: mov r15=1;; 0x40000000000005d0 <+16>: [MMI] ld8 r14=[r32];; 0x40000000000005d1 <+17>: nop.m 0x0 0x40000000000005d2 <+18>: dep r14=r15,r14,32,1;; 0x40000000000005e0 <+32>: [MIB] st8 [r32]=r14 0x40000000000005e1 <+33>: nop.i 0x0 0x40000000000005e2 <+34>: br.ret.sptk.many b0;; In my opinion, this is a clear bug in gcc (while the original problem, without explitict volatile, is not a C spec violation per se, it's just very inconvenient :) ). -- Jiri Kosina SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/