Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965179AbdCXSTZ (ORCPT ); Fri, 24 Mar 2017 14:19:25 -0400 Received: from mail-vk0-f52.google.com ([209.85.213.52]:35079 "EHLO mail-vk0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S935561AbdCXSRa (ORCPT ); Fri, 24 Mar 2017 14:17:30 -0400 MIME-Version: 1.0 In-Reply-To: <20170324180838.crc2dmxswklqmyrx@hirez.programming.kicks-ass.net> References: <20170324142140.vpyzl755oj6rb5qv@hirez.programming.kicks-ass.net> <20170324164108.ibcxxqbhvx6ao54r@hirez.programming.kicks-ass.net> <20170324172342.radlrhk2z6mwmdgk@hirez.programming.kicks-ass.net> <20170324180838.crc2dmxswklqmyrx@hirez.programming.kicks-ass.net> From: Dmitry Vyukov Date: Fri, 24 Mar 2017 19:16:57 +0100 Message-ID: Subject: Re: locking/atomic: Introduce atomic_try_cmpxchg() To: Peter Zijlstra Cc: Andy Lutomirski , Andrew Morton , Andy Lutomirski , Borislav Petkov , Brian Gerst , Denys Vlasenko , "H. Peter Anvin" , Josh Poimboeuf , Linus Torvalds , Paul McKenney , Thomas Gleixner , Ingo Molnar , LKML Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6574 Lines: 123 On Fri, Mar 24, 2017 at 7:08 PM, Peter Zijlstra wrote: > On Fri, Mar 24, 2017 at 06:51:15PM +0100, Dmitry Vyukov wrote: >> On Fri, Mar 24, 2017 at 6:23 PM, Peter Zijlstra wrote: >> > On Fri, Mar 24, 2017 at 09:54:46AM -0700, Andy Lutomirski wrote: >> >> > So the first snipped I tested regressed like so: >> >> > >> >> > >> >> > 0000000000000000 : 0000000000000000 : >> >> > 0: 8b 07 mov (%rdi),%eax 0: 8b 17 mov (%rdi),%edx >> >> > 2: 83 f8 ff cmp $0xffffffff,%eax 2: 83 fa ff cmp $0xffffffff,%edx >> >> > 5: 74 13 je 1a 5: 74 1a je 21 >> >> > 7: 85 c0 test %eax,%eax 7: 85 d2 test %edx,%edx >> >> > 9: 74 0d je 18 9: 74 13 je 1e >> >> > b: 8d 50 01 lea 0x1(%rax),%edx b: 8d 4a 01 lea 0x1(%rdx),%ecx >> >> > e: f0 0f b1 17 lock cmpxchg %edx,(%rdi) e: 89 d0 mov %edx,%eax >> >> > 12: 75 ee jne 2 10: f0 0f b1 0f lock cmpxchg %ecx,(%rdi) >> >> > 14: ff c2 inc %edx 14: 74 04 je 1a >> >> > 16: 75 02 jne 1a 16: 89 c2 mov %eax,%edx >> >> > 18: 0f 0b ud2 18: eb e8 jmp 2 >> >> > 1a: c3 retq 1a: ff c1 inc %ecx >> >> > 1c: 75 03 jne 21 >> >> > 1e: 0f 0b ud2 >> >> > 20: c3 retq >> >> > 21: c3 retq >> >> > >> This seems to help ;) >> >> #define try_cmpxchg(ptr, pold, new) __atomic_compare_exchange_n(ptr, pold, new, 0, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST) > > That gets me: > > 0000000000000000 : > 0: 8b 07 mov (%rdi),%eax > 2: 89 44 24 fc mov %eax,-0x4(%rsp) > 6: 8b 44 24 fc mov -0x4(%rsp),%eax > a: 83 f8 ff cmp $0xffffffff,%eax > d: 74 1c je 2b > f: 85 c0 test %eax,%eax > 11: 75 07 jne 1a > 13: 8b 44 24 fc mov -0x4(%rsp),%eax > 17: 0f 0b ud2 > 19: c3 retq > 1a: 8d 50 01 lea 0x1(%rax),%edx > 1d: 8b 44 24 fc mov -0x4(%rsp),%eax > 21: f0 0f b1 17 lock cmpxchg %edx,(%rdi) > 25: 75 db jne 2 > 27: ff c2 inc %edx > 29: 74 e8 je 13 > 2b: c3 retq > > > Which is even worse... (I did double check it actually compiled) For me gcc 7.0.1 generates: 0000000000000330 : 330: 55 push %rbp 331: 8b 07 mov (%rdi),%eax 333: 48 89 e5 mov %rsp,%rbp 336: 83 f8 ff cmp $0xffffffff,%eax 339: 74 12 je 34d 33b: 85 c0 test %eax,%eax 33d: 74 10 je 34f 33f: 8d 50 01 lea 0x1(%rax),%edx 342: f0 0f b1 17 lock cmpxchg %edx,(%rdi) 346: 75 ee jne 336 348: 83 fa ff cmp $0xffffffff,%edx 34b: 74 04 je 351 34d: 5d pop %rbp 34e: c3 retq 34f: 31 c0 xor %eax,%eax 351: 0f 0b ud2 353: 5d pop %rbp 354: c3 retq with: if (!success) \ *_old = __old; \ 0000000000000320 : 320: 8b 0f mov (%rdi),%ecx 322: 55 push %rbp 323: 48 89 e5 mov %rsp,%rbp 326: 83 f9 ff cmp $0xffffffff,%ecx 329: 74 2d je 358 32b: 85 c9 test %ecx,%ecx 32d: 74 25 je 354 32f: 8d 71 01 lea 0x1(%rcx),%esi 332: 89 c8 mov %ecx,%eax 334: f0 0f b1 37 lock cmpxchg %esi,(%rdi) 338: 89 c2 mov %eax,%edx 33a: 74 20 je 35c 33c: 83 fa ff cmp $0xffffffff,%edx 33f: 74 17 je 358 341: 85 d2 test %edx,%edx 343: 74 0f je 354 345: 8d 72 01 lea 0x1(%rdx),%esi 348: 89 d0 mov %edx,%eax 34a: f0 0f b1 37 lock cmpxchg %esi,(%rdi) 34e: 74 0a je 35a 350: 89 c2 mov %eax,%edx 352: eb e8 jmp 33c 354: 31 c9 xor %ecx,%ecx 356: 0f 0b ud2 358: 5d pop %rbp 359: c3 retq 35a: 89 d1 mov %edx,%ecx 35c: 83 fe ff cmp $0xffffffff,%esi 35f: 74 f5 je 356 361: 5d pop %rbp 362: c3 retq with __atomic_compare_exchange_n: exactly the same as the original code. But I don't have an answer for runtime patching of LOCK. Looks like something to fix in gcc.