From: Linus Torvalds
Date: Fri, 24 Mar 2017 12:17:28 -0700
Subject: Re: locking/atomic: Introduce atomic_try_cmpxchg()
To: Andy Lutomirski
Cc: Peter Zijlstra, Dmitry Vyukov, Andrew Morton, Andy Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko, "H. Peter Anvin", Josh Poimboeuf, Paul McKenney, Thomas Gleixner, Ingo Molnar, LKML

On Fri, Mar 24, 2017 at 11:45 AM, Andy Lutomirski wrote:
>
> Is there some hack like
>
>     if __builtin_is_unescaped(*val) *val = old;
>
> that would work?

See my recent email suggesting a completely different interface, which
avoids this problem. My interface generates:

   0000000000000000 :
      0:   8b 07                   mov    (%rdi),%eax
      2:   83 f8 ff                cmp    $0xffffffff,%eax
      5:   74 12                   je     19
      7:   85 c0                   test   %eax,%eax
      9:   74 0a                   je     15
      b:   8d 50 01                lea    0x1(%rax),%edx
      e:   f0 0f b1 17             lock cmpxchg %edx,(%rdi)
     12:   75 ee                   jne    2
     14:   c3                      retq
     15:   31 c0                   xor    %eax,%eax
     17:   0f 0b                   ud2
     19:   c3                      retq

for PeterZ's test-case, which seems optimal.

Of course, PeterZ used -Os, which isn't actually very natural for the
kernel. Using -O2 I get something else.
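For readers without PeterZ's test case at hand: the listing above is consistent with a saturating reference-count increment. It loads the counter, returns untouched if the counter is already 0xffffffff, traps (ud2) if it is zero, and otherwise tries `lock cmpxchg` with old+1, re-running the checks on the value cmpxchg hands back if it fails. Below is a minimal userspace C11 sketch of those semantics, assuming that reading of the assembly; it is not PeterZ's actual code. The function name `refcount_inc_sketch` is hypothetical, and abort() stands in for the kernel's BUG()/ud2.

```c
#include <stdatomic.h>
#include <stdlib.h>

/*
 * Hypothetical userspace model of the saturating increment the listing
 * appears to implement. abort() stands in for BUG() (the ud2).
 */
static void refcount_inc_sketch(atomic_uint *r)
{
        unsigned int old = atomic_load_explicit(r, memory_order_relaxed);

        do {
                if (old == 0xffffffffu)  /* cmp $0xffffffff: saturated, */
                        return;          /* stay there silently */
                if (old == 0)            /* test %eax,%eax: use-after-free */
                        abort();
                /*
                 * lock cmpxchg: on failure 'old' is overwritten with the
                 * current value and the checks above run again.
                 */
        } while (!atomic_compare_exchange_weak_explicit(r, &old, old + 1,
                        memory_order_relaxed, memory_order_relaxed));
}
```

The weak compare-exchange may fail spuriously on some architectures, which the loop absorbs; on x86 it compiles to the same single `lock cmpxchg`.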
It turns out that my macro should use

        if (likely(__txchg_success)) goto success_label;

(that "likely()" is critical) to make gcc not try to optimize for the
looping case. So with that "likely()" fixed, with -O2 I get:

   0000000000000000 :
      0:   8b 07                   mov    (%rdi),%eax
      2:   83 f8 ff                cmp    $0xffffffff,%eax
      5:   74 0d                   je     14
      7:   85 c0                   test   %eax,%eax
      9:   74 12                   je     1d
      b:   8d 50 01                lea    0x1(%rax),%edx
      e:   f0 0f b1 17             lock cmpxchg %edx,(%rdi)
     12:   75 02                   jne    16
     14:   f3 c3                   repz retq
     16:   83 f8 ff                cmp    $0xffffffff,%eax
     19:   75 ec                   jne    7
     1b:   f3 c3                   repz retq
     1d:   31 c0                   xor    %eax,%eax
     1f:   0f 0b                   ud2
     21:   c3                      retq

which again looks pretty optimal (it did indeed generate bigger but
potentially higher-performance code by making the good case a
fall-through, and the unlikely case a _forward_ jump that will be
predicted not-taken in the absence of other prediction information).

(Of course, this also depends on the exact behavior that PeterZ's code
had, namely an exception for use-after-free, but a silent saturation.)

              Linus
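The `likely()` point can be sketched in userspace C. The kernel defines `likely(x)` as `__builtin_expect(!!(x), 1)`. The macro below is a hypothetical stand-in for the goto-label style interface mentioned in the email: its name `try_cmpxchg_goto` and its parameters are invented, and only the `if (likely(__txchg_success)) goto success_label;` pattern comes from the text. It uses GCC's `__atomic_compare_exchange_n` builtin, which on failure writes the value it found back into `*old_p`, so the caller's checks re-run on fresh data.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Same definition as the kernel's non-profiling likely(). */
#define likely(x) __builtin_expect(!!(x), 1)

/*
 * Hypothetical macro (name and shape invented for illustration): try to
 * swap *ptr from *old_p to new_val. On success jump to 'label'; on
 * failure *old_p holds the current value and execution falls through.
 */
#define try_cmpxchg_goto(ptr, old_p, new_val, label)                    \
        do {                                                            \
                bool __txchg_success = __atomic_compare_exchange_n(    \
                        (ptr), (old_p), (new_val), false,               \
                        __ATOMIC_RELAXED, __ATOMIC_RELAXED);            \
                if (likely(__txchg_success))                            \
                        goto label;                                     \
        } while (0)

/* Hypothetical test case: trap on zero, saturate silently at 0xffffffff. */
static void refcount_inc_goto(unsigned int *r)
{
        unsigned int old = __atomic_load_n(r, __ATOMIC_RELAXED);

        for (;;) {
                if (old == 0xffffffffu)
                        return;         /* saturated: leave it alone */
                if (old == 0)
                        abort();        /* use-after-free: BUG()/ud2 */
                try_cmpxchg_goto(r, &old, old + 1, done);
                /* cmpxchg failed: 'old' now holds the current value */
        }
done:
        return;
}
```

The `likely()` hint only biases gcc's block placement toward making the successful cmpxchg a fall-through; the exact layout still depends on compiler version and flags.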