In-Reply-To: <20170324181303.fnasnmtw2tmcy27u@hirez.programming.kicks-ass.net>
References: <20170324142140.vpyzl755oj6rb5qv@hirez.programming.kicks-ass.net>
 <20170324164108.ibcxxqbhvx6ao54r@hirez.programming.kicks-ass.net>
 <20170324172342.radlrhk2z6mwmdgk@hirez.programming.kicks-ass.net>
 <20170324180838.crc2dmxswklqmyrx@hirez.programming.kicks-ass.net>
 <20170324181303.fnasnmtw2tmcy27u@hirez.programming.kicks-ass.net>
From: Andy Lutomirski
Date: Fri, 24 Mar 2017 12:16:11 -0700
Subject: Re: locking/atomic: Introduce atomic_try_cmpxchg()
To: Peter Zijlstra
Cc: Dmitry Vyukov, Andrew Morton, Andy Lutomirski, Borislav Petkov,
 Brian Gerst, Denys Vlasenko, "H. Peter Anvin", Josh Poimboeuf,
 Linus Torvalds, Paul McKenney, Thomas Gleixner, Ingo Molnar, LKML

On Fri, Mar 24, 2017 at 11:13 AM, Peter Zijlstra wrote:
> On Fri, Mar 24, 2017 at 07:08:38PM +0100, Peter Zijlstra wrote:
>> On Fri, Mar 24, 2017 at 06:51:15PM +0100, Dmitry Vyukov wrote:
>> > On Fri, Mar 24, 2017 at 6:23 PM, Peter Zijlstra wrote:
>> > > On Fri, Mar 24, 2017 at 09:54:46AM -0700, Andy Lutomirski wrote:
>> > >> > So the first snipped I tested regressed like so:
>> > >> >
>> > >> >
>> > >> > 0000000000000000 :                         0000000000000000 :
>> > >> >  0: 8b 07        mov (%rdi),%eax            0: 8b 17        mov (%rdi),%edx
>> > >> >  2: 83 f8 ff     cmp $0xffffffff,%eax       2: 83 fa ff     cmp $0xffffffff,%edx
>> > >> >  5: 74 13        je 1a                      5: 74 1a        je 21
>> > >> >  7: 85 c0        test %eax,%eax             7: 85 d2        test %edx,%edx
>> > >> >  9: 74 0d        je 18                      9: 74 13        je 1e
>> > >> >  b: 8d 50 01     lea 0x1(%rax),%edx         b: 8d 4a 01     lea 0x1(%rdx),%ecx
>> > >> >  e: f0 0f b1 17  lock cmpxchg %edx,(%rdi)   e: 89 d0        mov %edx,%eax
>> > >> > 12: 75 ee        jne 2                     10: f0 0f b1 0f  lock cmpxchg %ecx,(%rdi)
>> > >> > 14: ff c2        inc %edx                  14: 74 04        je 1a
>> > >> > 16: 75 02        jne 1a                    16: 89 c2        mov %eax,%edx
>> > >> > 18: 0f 0b        ud2                       18: eb e8        jmp 2
>> > >> > 1a: c3           retq                      1a: ff c1        inc %ecx
>> > >> >                                            1c: 75 03        jne 21
>> > >> >                                            1e: 0f 0b        ud2
>> > >> >                                            20: c3           retq
>> > >> >                                            21: c3           retq
>> > >>
>>
>> > This seems to help ;)
>> >
>> > #define try_cmpxchg(ptr, pold, new) __atomic_compare_exchange_n(ptr, pold, new, 0, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)
>>
>> That gets me:
>>
>> 0000000000000000 :
>>  0: 8b 07        mov (%rdi),%eax
>>  2: 89 44 24 fc  mov %eax,-0x4(%rsp)
>>  6: 8b 44 24 fc  mov -0x4(%rsp),%eax
>>  a: 83 f8 ff     cmp $0xffffffff,%eax
>>  d: 74 1c        je 2b
>>  f: 85 c0        test %eax,%eax
>> 11: 75 07        jne 1a
>> 13: 8b 44 24 fc  mov -0x4(%rsp),%eax
>> 17: 0f 0b        ud2
>> 19: c3           retq
>> 1a: 8d 50 01     lea 0x1(%rax),%edx
>> 1d: 8b 44 24 fc  mov -0x4(%rsp),%eax
>> 21: f0 0f b1 17  lock cmpxchg %edx,(%rdi)
>> 25: 75 db        jne 2
>> 27: ff c2        inc %edx
>> 29: 74 e8        je 13
>> 2b: c3           retq
>>
>>
>> Which is even worse... (I did double check it actually compiled)
>
> Not to mention we cannot use the C11 atomics in kernel because we want
> to be able to runtime patch LOCK prefixes when only 1 CPU is available.

Is this really a show-stopper?  I bet that objtool could be persuaded
to emit a list of the locations of all those LOCK prefixes.

--Andy
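
For readers following the codegen comparison, the kind of loop being compiled can be sketched in userspace C. This is a minimal sketch under stated assumptions, not the kernel code: the try_cmpxchg() macro is the one quoted in the thread (it relies on the GCC/Clang __atomic builtins), while sketch_inc_not_zero() and its return convention are hypothetical names made up for illustration. The zero and saturation checks are only inferred from the cmp $0xffffffff / test / ud2 pattern in the listings, and the sketch simply returns false where the compiled code traps.

```c
#include <limits.h>
#include <stdbool.h>

/* The macro quoted in the thread: strong compare-exchange, SEQ_CST. */
#define try_cmpxchg(ptr, pold, new) \
	__atomic_compare_exchange_n(ptr, pold, new, 0, \
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)

/*
 * Hypothetical helper (name and return convention are illustrative):
 * increment *r unless it is zero or already saturated at UINT_MAX.
 * Returns false only when the counter was zero.
 */
bool sketch_inc_not_zero(unsigned int *r)
{
	unsigned int val = __atomic_load_n(r, __ATOMIC_RELAXED);

	do {
		if (val == UINT_MAX)	/* saturated: leave the value alone */
			return true;
		if (val == 0)		/* the listings trap (ud2) here */
			return false;
		/*
		 * On failure, try_cmpxchg() stores the value it observed
		 * into val, so the retry needs no separate reload.
		 */
	} while (!try_cmpxchg(r, &val, val + 1));

	return true;
}
```

Calling sketch_inc_not_zero() on a counter holding 1 should leave it at 2 and return true; on a counter holding 0 it returns false and leaves the counter untouched. Because the builtin takes the address of val, the compiler may give val a stack slot, which appears to be what the -0x4(%rsp) traffic in the second listing reflects.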