Date: Fri, 24 Mar 2017 19:13:03 +0100
From: Peter Zijlstra
To: Dmitry Vyukov
Cc: Andy Lutomirski, Andrew Morton, Andy Lutomirski, Borislav Petkov,
 Brian Gerst, Denys Vlasenko, "H. Peter Anvin", Josh Poimboeuf,
 Linus Torvalds, Paul McKenney, Thomas Gleixner, Ingo Molnar, LKML
Subject: Re: locking/atomic: Introduce atomic_try_cmpxchg()
Message-ID: <20170324181303.fnasnmtw2tmcy27u@hirez.programming.kicks-ass.net>
References: <20170324142140.vpyzl755oj6rb5qv@hirez.programming.kicks-ass.net>
 <20170324164108.ibcxxqbhvx6ao54r@hirez.programming.kicks-ass.net>
 <20170324172342.radlrhk2z6mwmdgk@hirez.programming.kicks-ass.net>
 <20170324180838.crc2dmxswklqmyrx@hirez.programming.kicks-ass.net>
In-Reply-To: <20170324180838.crc2dmxswklqmyrx@hirez.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Mar 24, 2017 at 07:08:38PM +0100, Peter Zijlstra wrote:
> On Fri, Mar 24, 2017 at 06:51:15PM +0100, Dmitry Vyukov wrote:
> > On Fri, Mar 24, 2017 at 6:23 PM, Peter Zijlstra wrote:
> > > On Fri, Mar 24, 2017 at 09:54:46AM -0700, Andy Lutomirski wrote:
> > >> > So the first snipped I tested regressed like so:
> > >> >
> > >> >
> > >> > 0000000000000000 :                        0000000000000000 :
> > >> >  0: 8b 07       mov (%rdi),%eax            0: 8b 17       mov (%rdi),%edx
> > >> >  2: 83 f8 ff    cmp $0xffffffff,%eax       2: 83 fa ff    cmp $0xffffffff,%edx
> > >> >  5: 74 13       je 1a                      5: 74 1a       je 21
> > >> >  7: 85 c0       test %eax,%eax             7: 85 d2       test %edx,%edx
> > >> >  9: 74 0d       je 18                      9: 74 13       je 1e
> > >> >  b: 8d 50 01    lea 0x1(%rax),%edx         b: 8d 4a 01    lea 0x1(%rdx),%ecx
> > >> >  e: f0 0f b1 17 lock cmpxchg %edx,(%rdi)   e: 89 d0       mov %edx,%eax
> > >> > 12: 75 ee       jne 2                     10: f0 0f b1 0f lock cmpxchg %ecx,(%rdi)
> > >> > 14: ff c2       inc %edx                  14: 74 04       je 1a
> > >> > 16: 75 02       jne 1a                    16: 89 c2       mov %eax,%edx
> > >> > 18: 0f 0b       ud2                       18: eb e8       jmp 2
> > >> > 1a: c3          retq                      1a: ff c1       inc %ecx
> > >> >                                           1c: 75 03       jne 21
> > >> >                                           1e: 0f 0b       ud2
> > >> >                                           20: c3          retq
> > >> >                                           21: c3          retq
> > >> >
> >
> > This seems to help ;)
> >
> > #define try_cmpxchg(ptr, pold, new) __atomic_compare_exchange_n(ptr, pold, new, 0, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)
>
> That gets me:
>
> 0000000000000000 :
>  0: 8b 07       mov (%rdi),%eax
>  2: 89 44 24 fc mov %eax,-0x4(%rsp)
>  6: 8b 44 24 fc mov -0x4(%rsp),%eax
>  a: 83 f8 ff    cmp $0xffffffff,%eax
>  d: 74 1c       je 2b
>  f: 85 c0       test %eax,%eax
> 11: 75 07       jne 1a
> 13: 8b 44 24 fc mov -0x4(%rsp),%eax
> 17: 0f 0b       ud2
> 19: c3          retq
> 1a: 8d 50 01    lea 0x1(%rax),%edx
> 1d: 8b 44 24 fc mov -0x4(%rsp),%eax
> 21: f0 0f b1 17 lock cmpxchg %edx,(%rdi)
> 25: 75 db       jne 2
> 27: ff c2       inc %edx
> 29: 74 e8       je 13
> 2b: c3          retq
>
>
> Which is even worse... (I did double check it actually compiled)

Not to mention we cannot use the C11 atomics in kernel because we want
to be able to runtime patch LOCK prefixes when only 1 CPU is available.
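
For anyone wanting to reproduce the comparison outside the kernel, the loop behind the quoted listings can be approximated in userspace. This is only a minimal sketch and not the kernel's code: it reuses the try_cmpxchg() macro quoted in the thread on top of the GCC/Clang __atomic builtins, and refcount_inc_sketch() is a hypothetical stand-in whose checks are inferred from the disassembly (compare against 0xffffffff, test for zero, lea 0x1). It returns false where the kernel version would presumably take the ud2 (WARN) path, and it omits the post-increment check visible after the cmpxchg.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * The macro as quoted in the thread: a strong compare-and-swap. On failure
 * the builtin writes the value currently in memory back into *pold, so the
 * caller does not need to reload it before retrying.
 */
#define try_cmpxchg(ptr, pold, new) \
	__atomic_compare_exchange_n(ptr, pold, new, 0, \
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)

/*
 * Hypothetical userspace stand-in for the function being compiled
 * (assumption: structure inferred from the quoted disassembly).
 */
static bool refcount_inc_sketch(unsigned int *refs)
{
	unsigned int val = __atomic_load_n(refs, __ATOMIC_RELAXED);
	unsigned int new;

	do {
		if (val == UINT_MAX)	/* saturated: leave the counter alone */
			return true;
		if (val == 0)		/* increment from zero: kernel would WARN */
			return false;
		new = val + 1;
		/* on failure, val now holds the fresh value; loop re-checks it */
	} while (!try_cmpxchg(refs, &val, new));

	return true;
}
```

Building this with something like `gcc -O2 -c` and running objdump -d on the result should give code that can be set next to the listings above, though the exact instructions will depend on compiler version and flags.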