Date: Sat, 23 Aug 2008 22:25:31 -0600 (MDT)
Subject: Re: [ANNOUNCE] mdb: Merkey's Linux Kernel Debugger 2.6.27-rc4 released
From: jmerkey@wolfmountaingroup.com
To: "Linus Torvalds"
Cc: "Jeremy Fitzhardinge", "Nick Piggin", jmerkey@wolfmountaingroup.com,
    "Stefan Richter", paulmck@linux.vnet.ibm.com, "Peter Zijlstra",
    linux-kernel@vger.kernel.org, "David Howells"

Results from Analysis of GCC volatile/memory barriers

Use of volatile produces the intended results in files that contain shared
data elements, but in some cases global data that is not referenced outside
the file, and that has not itself been declared volatile, is treated as
static and optimized into local variables.  If volatile is avoided
entirely, the compiler appears to make correct assumptions about which
references are in fact to global memory.

My conclusion is that gcc's code generation appears correct and in fact
does a better job than Microsoft's implementation of shared data management
on SMP systems.  If you use volatile and optimization at the same time, the
compiler will take you at your word and may optimize global references into
local variables.  This is still better than MS cl, which will ALWAYS
optimize global data into locals if volatile is not used with SMP data
within a single file.

If you choose to use volatile, then you had better use it on every variable
you need shared between processors -- or just leave it out entirely, in
which case gcc does appear to figure out the references properly (though
some of the resulting code is quite odd).  While this may be
counter-intuitive, it makes sense: by using volatile you are telling the
compiler that anything not declared volatile within a given file is fair
game for local optimization when optimization is turned on.
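
As a stand-alone illustration of the behavior described above, here is a
minimal userspace sketch (the names go_flag, wait_for_go and
compiler_barrier are invented for the example and are not taken from the
debugger code).  Compiled with gcc -O2, the first loop may be reduced to a
single load followed by an unconditional jump, because nothing tells the
compiler that go_flag can change behind its back; the volatile and
compiler-barrier variants force a reload on every iteration:

unsigned long go_flag;                        /* set by another processor */
volatile unsigned long go_flag_v;             /* volatile variant */

#define compiler_barrier()  __asm__ __volatile__("" : : : "memory")

void wait_for_go(void)
{
        while (!go_flag)                      /* load may be hoisted out */
                ;
}

void wait_for_go_volatile(void)
{
        while (!go_flag_v)                    /* reloaded each iteration */
                ;
}

void wait_for_go_barrier(void)
{
        while (!go_flag)
                compiler_barrier();           /* forces the reload */
}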
Analysis

Code generation for two atomic_t variables, one an array element and the
other standalone -- macros in the kernel includes and their interactions
with the compiler may be the basis of some of these cases.

atomic_inc(&debuggerActive);
atomic_inc(&debuggerProcessors[processor]);

    55a0:       f0 ff 05 00 00 00 00    lock incl 0x0
    55a7:       8d 2c 8d 00 00 00 00    lea    0x0(,%ecx,4),%ebp
    55ae:       8d 85 00 00 00 00       lea    0x0(%ebp),%eax
    55b4:       89 04 24                mov    %eax,(%esp)
    55b7:       f0 ff 85 00 00 00 00    lock incl 0x0(%ebp)

Although the emitted assembly is essentially correct, it is odd: two
identical data types, one emitted as a global fixup and the other as a
relative fixup indirected from the stack frame.  This works, since the
fixup (the substitute for the 0x0) inserted by the loader is a negative
offset relative to the entire 32-bit address space, i.e. lock incl
[ebp-f800XXXX], but it is still an odd way to treat an atomic variable.
I would have expected these to result in an absolute address fixup record
rather than being treated as data referenced from the stack frame.

Code section with mixed volatile declarations

volatile unsigned long ProcessorHold[MAX_PROCESSORS];
unsigned long ProcessorState[MAX_PROCESSORS];

    case 2:     /* nmi */
       if (ProcessorHold[processor])  /* hold processor */
       {
          ProcessorHold[processor] = 0;
          ProcessorState[processor] = PROCESSOR_SUSPEND;

          /* processor suspend loop */
          atomic_inc(&nmiProcessors[processor]);
          while ((ProcessorState[processor] != PROCESSOR_RESUME) &&
                 (ProcessorState[processor] != PROCESSOR_SWITCH))
          {
             if ((ProcessorState[processor] == PROCESSOR_RESUME) ||
                 (ProcessorState[processor] == PROCESSOR_SWITCH))
                break;
             touch_nmi_watchdog();
             cpu_relax();
          }
          atomic_dec(&nmiProcessors[processor]);

    56ec:       83 3c b5 00 00 00 00    cmpl   $0x0,0x0(,%esi,4)
    56f3:       00
    56f4:       74 1b                   je     5711
    56f6:       c7 04 b5 00 00 00 00    movl   $0x0,0x0(,%esi,4)
    56fd:       00 00 00 00
    5701:       f0 ff 85 00 00 00 00    lock incl 0x0(%ebp)
    5708:       e8 fc ff ff ff          call   5709
    570d:       f3 90                   pause
    570f:       eb f7                   jmp    5708

    // THIS APPEARS BROKEN - THE COMPILER IS TREATING A GLOBAL ARRAY
    // AS LOCAL DATA

    5711:       89 f1                   mov    %esi,%ecx
    5713:       89 da                   mov    %ebx,%edx
    5715:       b8 02 00 00 00          mov    $0x2,%eax
    571a:       eb 06                   jmp    5722
    571c:       89 f1                   mov    %esi,%ecx
    571e:       89 da                   mov    %ebx,%edx
    5720:       89 f8                   mov    %edi,%eax
    5722:       e8 fc ff ff ff          call   5723
    5727:       83 3c b5 00 00 00 00    cmpl   $0x0,0x0(,%esi,4)
    572e:       00
    572f:       75 c5                   jne    56f6
    5731:       e8 fc ff ff ff          call   5732
    5736:       e8 fc ff ff ff          call   5737
    573b:       85 c0                   test   %eax,%eax
    573d:       74 0f                   je     574e
    573f:       89 f0                   mov    %esi,%eax
    5741:       c1 e0 07                shl    $0x7,%eax
    5744:       05 00 00 00 00          add    $0x0,%eax
    5749:       e8 fc ff ff ff          call   574a
    574e:       c7 04 b5 00 00 00 00    movl   $0x0,0x0(,%esi,4)
    5755:       00 00 00 00
    5759:       8b 04 24                mov    (%esp),%eax
    575c:       c7 04 b5 00 00 00 00    movl   $0x1,0x0(,%esi,4)
    5763:       01 00 00 00
    5767:       f0 ff 08                lock decl (%eax)

Code section without ANY volatile declarations (CODE GENERATION CORRECT)

unsigned long ProcessorHold[MAX_PROCESSORS];
unsigned long ProcessorState[MAX_PROCESSORS];

    case 2:     /* nmi */
       if (ProcessorHold[processor])  /* hold processor */
       {
          ProcessorHold[processor] = 0;
          ProcessorState[processor] = PROCESSOR_SUSPEND;

          /* processor suspend loop */
          atomic_inc(&nmiProcessors[processor]);
          while ((ProcessorState[processor] != PROCESSOR_RESUME) &&
                 (ProcessorState[processor] != PROCESSOR_SWITCH))
          {
             if ((ProcessorState[processor] == PROCESSOR_RESUME) ||
                 (ProcessorState[processor] == PROCESSOR_SWITCH))
                break;
             touch_nmi_watchdog();
             cpu_relax();
          }
          atomic_dec(&nmiProcessors[processor]);

Code output from section without ANY volatile declarations

    56f2:       83 3c bd 00 00 00 00    cmpl   $0x0,0x0(,%edi,4)
    56f9:       00
    56fa:       74 5f                   je     575b
    56fc:       c7 04 bd 00 00 00 00    movl   $0x0,0x0(,%edi,4)
    5703:       00 00 00 00
    5707:       8d b5 00 00 00 00       lea    0x0(%ebp),%esi
    570d:       c7 04 bd 00 00 00 00    movl   $0x2,0x0(,%edi,4)
    5714:       02 00 00 00
    5718:       f0 ff 85 00 00 00 00    lock incl 0x0(%ebp)
    571f:       eb 11                   jmp    5732
    5721:       83 f8 03                cmp    $0x3,%eax
    5724:       74 1d                   je     5743
    5726:       83 f8 07                cmp    $0x7,%eax
    5729:       74 18                   je     5743
    572b:       e8 fc ff ff ff          call   572c
    5730:       f3 90                   pause
    5732:       8b 04 bd 00 00 00 00    mov    0x0(,%edi,4),%eax
    5739:       83 f8 03                cmp    $0x3,%eax
    573c:       74 05                   je     5743
    573e:       83 f8 07                cmp    $0x7,%eax
    5741:       75 de                   jne    5721
    5743:       f0 ff 0e                lock decl (%esi)
    5746:       83 3c bd 00 00 00 00    cmpl   $0x7,0x0(,%edi,4)
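
A related way to get a fresh load in the mixed case, without marking the
whole array volatile, is to force volatility only at the point of access;
the kernel's ACCESS_ONCE() macro uses the same volatile-cast idea.  A rough
sketch of the suspend loop rewritten that way follows (READ_ONCE_ULONG and
wait_for_resume are invented names for illustration, and the declarations
of ProcessorState, PROCESSOR_RESUME, PROCESSOR_SWITCH, touch_nmi_watchdog()
and cpu_relax() are assumed to be the same as in the snippets above):

/* force a volatile load of a plain unsigned long at the point of use */
#define READ_ONCE_ULONG(x)  (*(volatile unsigned long *)&(x))

static void wait_for_resume(int processor)
{
        unsigned long state;

        for (;;)
        {
                state = READ_ONCE_ULONG(ProcessorState[processor]);
                if ((state == PROCESSOR_RESUME) ||
                    (state == PROCESSOR_SWITCH))
                        break;                /* resume or switch requested */
                touch_nmi_watchdog();
                cpu_relax();
        }
}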
Code from section with volatile declarations (CODE GENERATION CORRECT)

volatile unsigned long ProcessorHold[MAX_PROCESSORS];
volatile unsigned long ProcessorState[MAX_PROCESSORS];

    case 2:     /* nmi */
       if (ProcessorHold[processor])  /* hold processor */
       {
          ProcessorHold[processor] = 0;
          ProcessorState[processor] = PROCESSOR_SUSPEND;

          /* processor suspend loop */
          atomic_inc(&nmiProcessors[processor]);
          while ((ProcessorState[processor] != PROCESSOR_RESUME) &&
                 (ProcessorState[processor] != PROCESSOR_SWITCH))
          {
             if ((ProcessorState[processor] == PROCESSOR_RESUME) ||
                 (ProcessorState[processor] == PROCESSOR_SWITCH))
                break;
             touch_nmi_watchdog();
             cpu_relax();
          }
          atomic_dec(&nmiProcessors[processor]);

Code Output from section with volatile declarations

    5896:       8b 04 9d 00 00 00 00    mov    0x0(,%ebx,4),%eax
    589d:       85 c0                   test   %eax,%eax
    589f:       74 73                   je     5914
    58a1:       c7 04 9d 00 00 00 00    movl   $0x0,0x0(,%ebx,4)
    58a8:       00 00 00 00
    58ac:       8d bd 00 00 00 00       lea    0x0(%ebp),%edi
    58b2:       c7 04 9d 00 00 00 00    movl   $0x2,0x0(,%ebx,4)
    58b9:       02 00 00 00
    58bd:       f0 ff 85 00 00 00 00    lock incl 0x0(%ebp)
    58c4:       eb 1f                   jmp    58e5
    58c6:       8b 04 9d 00 00 00 00    mov    0x0(,%ebx,4),%eax
    58cd:       83 f8 03                cmp    $0x3,%eax
    58d0:       74 2b                   je     58fd
    58d2:       8b 04 9d 00 00 00 00    mov    0x0(,%ebx,4),%eax
    58d9:       83 f8 07                cmp    $0x7,%eax
    58dc:       74 1f                   je     58fd
    58de:       e8 fc ff ff ff          call   58df
    58e3:       f3 90                   pause
    58e5:       8b 04 9d 00 00 00 00    mov    0x0(,%ebx,4),%eax
    58ec:       83 f8 03                cmp    $0x3,%eax
    58ef:       74 0c                   je     58fd
    58f1:       8b 04 9d 00 00 00 00    mov    0x0(,%ebx,4),%eax
    58f8:       83 f8 07                cmp    $0x7,%eax
    58fb:       75 c9                   jne    58c6
    58fd:       f0 ff 0f                lock decl (%edi)
    5900:       8b 04 9d 00 00 00 00    mov    0x0(,%ebx,4),%eax

Code from sections without volatile declaration using wmb()/rmb()
(CODE GENERATION CORRECT)

for (i=0; i < MAX_PROCESSORS; i++)
{
   if (ProcessorState[i] != PROCESSOR_HOLD)
   {
      wmb();
      ProcessorState[i] = PROCESSOR_RESUME;
   }
}

unsigned long ProcessorHold[MAX_PROCESSORS];
unsigned long ProcessorState[MAX_PROCESSORS];

    case 2:     /* nmi */
       if (ProcessorHold[processor])  /* hold processor */
       {
          ProcessorHold[processor] = 0;
          ProcessorState[processor] = PROCESSOR_SUSPEND;

          /* processor suspend loop */
          atomic_inc(&nmiProcessors[processor]);
          while ((ProcessorState[processor] != PROCESSOR_RESUME) &&
                 (ProcessorState[processor] != PROCESSOR_SWITCH))
          {
             rmb();
             if ((ProcessorState[processor] == PROCESSOR_RESUME) ||
                 (ProcessorState[processor] == PROCESSOR_SWITCH))
                break;
             touch_nmi_watchdog();
             cpu_relax();
          }
          atomic_dec(&nmiProcessors[processor]);

Code output from sections without volatile declaration using wmb()/rmb()

    56fa:       83 3c b5 00 00 00 00    cmpl   $0x0,0x0(,%esi,4)
    5701:       00
    5702:       74 6b                   je     576f
    5704:       c7 04 b5 00 00 00 00    movl   $0x0,0x0(,%esi,4)
    570b:       00 00 00 00
    570f:       8d bd 00 00 00 00       lea    0x0(%ebp),%edi
    5715:       c7 04 b5 00 00 00 00    movl   $0x2,0x0(,%esi,4)
    571c:       02 00 00 00
    5720:       f0 ff 85 00 00 00 00    lock incl 0x0(%ebp)
    5727:       eb 1d                   jmp    5746
    5729:       f0 83 04 24 00          lock addl $0x0,(%esp)
    572e:       8b 04 b5 00 00 00 00    mov    0x0(,%esi,4),%eax
    5735:       83 f8 03                cmp    $0x3,%eax
    5738:       74 1d                   je     5757
    573a:       83 f8 07                cmp    $0x7,%eax
    573d:       74 18                   je     5757
    573f:       e8 fc ff ff ff          call   5740
    5744:       f3 90                   pause
    5746:       8b 04 b5 00 00 00 00    mov    0x0(,%esi,4),%eax
    574d:       83 f8 03                cmp    $0x3,%eax
    5750:       74 05                   je     5757
    5752:       83 f8 07                cmp    $0x7,%eax
    5755:       75 d2                   jne    5729
    5757:       f0 ff 0f                lock decl (%edi)
    575a:       83 3c b5 00 00 00 00    cmpl   $0x7,0x0(,%esi,4)
    5761:       07
    5762:       75 21                   jne    5785
    5764:       89 f1                   mov    %esi,%ecx

000001e1 :
     1e1:       31 c0                   xor    %eax,%eax
     1e3:       83 3c 85 00 00 00 00    cmpl   $0x8,0x0(,%eax,4)
     1ea:       08
     1eb:       74 10                   je     1fd
     1ed:       f0 83 04 24 00          lock addl $0x0,(%esp)
     1f2:       c7 04 85 00 00 00 00    movl   $0x3,0x0(,%eax,4)
     1f9:       03 00 00 00
     1fd:       40                      inc    %eax
     1fe:       83 f8 08                cmp    $0x8,%eax
     201:       75 e0                   jne    1e3
     203:       c3                      ret
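
For reference, the primitives used in these tests expand roughly as follows
on 32-bit x86 of this vintage.  barrier() is a pure compiler barrier, while
mb()/rmb()/wmb() also emit the "lock addl $0x0,(%esp)" visible in the
listing above (the kernel may patch in mfence/lfence/sfence through the
alternatives mechanism on CPUs that support them).  These are
approximations for experimenting outside the kernel tree, not the exact
header contents:

/* compiler barrier only: forces cached register copies to be discarded */
#define barrier()   __asm__ __volatile__("" : : : "memory")

/* full barrier approximation: orders the CPU as well as the compiler */
#define mb()        __asm__ __volatile__("lock; addl $0,0(%%esp)" : : : "memory")
#define rmb()       mb()
#define wmb()       mb()

Consistent with this, the barrier() output below differs from the
wmb()/rmb() output above essentially only by the absence of the lock addl
instructions.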
Code from sections without volatile declaration using barrier()
(CODE GENERATION CORRECT)

for (i=0; i < MAX_PROCESSORS; i++)
{
   if (ProcessorState[i] != PROCESSOR_HOLD)
   {
      barrier();
      ProcessorState[i] = PROCESSOR_RESUME;
   }
}

unsigned long ProcessorHold[MAX_PROCESSORS];
unsigned long ProcessorState[MAX_PROCESSORS];

    case 2:     /* nmi */
       if (ProcessorHold[processor])  /* hold processor */
       {
          ProcessorHold[processor] = 0;
          ProcessorState[processor] = PROCESSOR_SUSPEND;

          /* processor suspend loop */
          atomic_inc(&nmiProcessors[processor]);
          while ((ProcessorState[processor] != PROCESSOR_RESUME) &&
                 (ProcessorState[processor] != PROCESSOR_SWITCH))
          {
             barrier();
             if ((ProcessorState[processor] == PROCESSOR_RESUME) ||
                 (ProcessorState[processor] == PROCESSOR_SWITCH))
                break;
             touch_nmi_watchdog();
             cpu_relax();
          }
          atomic_dec(&nmiProcessors[processor]);

Code output from sections without volatile declaration using barrier()

    56f5:       83 3c b5 00 00 00 00    cmpl   $0x0,0x0(,%esi,4)
    56fc:       00
    56fd:       74 66                   je     5765
    56ff:       c7 04 b5 00 00 00 00    movl   $0x0,0x0(,%esi,4)
    5706:       00 00 00 00
    570a:       8d bd 00 00 00 00       lea    0x0(%ebp),%edi
    5710:       c7 04 b5 00 00 00 00    movl   $0x2,0x0(,%esi,4)
    5717:       02 00 00 00
    571b:       f0 ff 85 00 00 00 00    lock incl 0x0(%ebp)
    5722:       eb 18                   jmp    573c
    5724:       8b 04 b5 00 00 00 00    mov    0x0(,%esi,4),%eax
    572b:       83 f8 03                cmp    $0x3,%eax
    572e:       74 1d                   je     574d
    5730:       83 f8 07                cmp    $0x7,%eax
    5733:       74 18                   je     574d
    5735:       e8 fc ff ff ff          call   5736
    573a:       f3 90                   pause
    573c:       8b 04 b5 00 00 00 00    mov    0x0(,%esi,4),%eax
    5743:       83 f8 03                cmp    $0x3,%eax
    5746:       74 05                   je     574d
    5748:       83 f8 07                cmp    $0x7,%eax
    574b:       75 d7                   jne    5724
    574d:       f0 ff 0f                lock decl (%edi)
    5750:       83 3c b5 00 00 00 00    cmpl   $0x7,0x0(,%esi,4)
    5757:       07

000001e1 :
     1e1:       31 c0                   xor    %eax,%eax
     1e3:       83 3c 85 00 00 00 00    cmpl   $0x8,0x0(,%eax,4)
     1ea:       08
     1eb:       74 0b                   je     1f8
     1ed:       c7 04 85 00 00 00 00    movl   $0x3,0x0(,%eax,4)
     1f4:       03 00 00 00
     1f8:       40                      inc    %eax
     1f9:       83 f8 08                cmp    $0x8,%eax
     1fc:       75 e5                   jne    1e3
     1fe:       c3                      ret

Jeff