Message-ID: <5718F32B.3050409@linux.vnet.ibm.com>
Date: Thu, 21 Apr 2016 23:35:07 +0800
From: Pan Xinhui
To: Peter Zijlstra
CC: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
    benh@kernel.crashing.org, paulus@samba.org, mpe@ellerman.id.au,
    boqun.feng@gmail.com, paulmck@linux.vnet.ibm.com, tglx@linutronix.de
Subject: Re: [PATCH V3] powerpc: Implement {cmp}xchg for u8 and u16
References: <5715D04E.9050009@linux.vnet.ibm.com>
    <571782F0.2020201@linux.vnet.ibm.com>
    <20160420142408.GF3430@twins.programming.kicks-ass.net>
In-Reply-To: <20160420142408.GF3430@twins.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2016-04-20 22:24, Peter Zijlstra wrote:
> On Wed, Apr 20, 2016 at 09:24:00PM +0800, Pan Xinhui wrote:
>
>> +#define __XCHG_GEN(cmp, type, sfx, skip, v)                          \
>> +static __always_inline unsigned long                                 \
>> +__cmpxchg_u32##sfx(v unsigned int *p, unsigned long old,             \
>> +                   unsigned long new);                               \
>> +static __always_inline u32                                           \
>> +__##cmp##xchg_##type##sfx(v void *ptr, u32 old, u32 new)             \
>> +{                                                                    \
>> +        int size = sizeof (type);                                    \
>> +        int off = (unsigned long)ptr % sizeof(u32);                  \
>> +        volatile u32 *p = ptr - off;                                 \
>> +        int bitoff = BITOFF_CAL(size, off);                          \
>> +        u32 bitmask = ((0x1 << size * BITS_PER_BYTE) - 1) << bitoff; \
>> +        u32 oldv, newv, tmp;                                         \
>> +        u32 ret;                                                     \
>> +        oldv = READ_ONCE(*p);                                        \
>> +        do {                                                         \
>> +                ret = (oldv & bitmask) >> bitoff;                    \
>> +                if (skip && ret != old)                              \
>> +                        break;                                       \
>> +                newv = (oldv & ~bitmask) | (new << bitoff);          \
>> +                tmp = oldv;                                          \
>> +                oldv = __cmpxchg_u32##sfx((v u32*)p, oldv, newv);    \
>> +        } while (tmp != oldv);                                       \
>> +        return ret;                                                  \
>> +}
>
> So for an LL/SC based arch using cmpxchg() like that is sub-optimal.
>
> Why did you choose to write it entirely in C?
>
Yes, you are right: the C version does more loads and stores. However,
xchg_u8/u16 is currently used only by qspinlock, and I did not see any
performance regression there, so I wrote it in C for simplicity. :)

Of course I have run xchg tests. Several threads each run code like
xchg((u8*)&v, j++); and the results are:

[  768.374264] use time[1550072]ns in xchg_u8_asm
[  768.377102] use time[2826802]ns in xchg_u8_c

I think this is because there is one more load in the C version.
If possible, we can move this code into asm-generic/.

thanks
xinhui
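
For readers outside the kernel tree, the mask-and-shift idea in the quoted macro can be sketched in portable C11. This is only an illustration under stated assumptions, not the patch itself: bit_offset(), xchg_u8_in_word() and cmpxchg_u8_in_word() are hypothetical names, bit_offset() is a guess at what BITOFF_CAL computes, and <stdatomic.h> stands in for __cmpxchg_u32. The plain load before each loop is presumably the "one more load" mentioned above.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for BITOFF_CAL: bit position of a `size`-byte
 * field located `off` bytes into its aligned 32-bit word.
 */
static unsigned bit_offset(size_t size, size_t off)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return (unsigned)((sizeof(uint32_t) - size - off) * 8);
#else
    return (unsigned)(off * 8);
#endif
}

/*
 * Exchange the byte at byte offset `off` of *word, returning the old byte.
 * The kernel macro derives `word` and `off` from an arbitrary pointer;
 * here the caller passes them in to keep the sketch free of pointer casts.
 */
static uint8_t xchg_u8_in_word(_Atomic uint32_t *word, size_t off,
                               uint8_t newval)
{
    unsigned shift = bit_offset(sizeof(uint8_t), off);
    uint32_t mask = (uint32_t)0xff << shift;
    /* The extra plain load that a hand-written LL/SC loop would not need. */
    uint32_t oldv = atomic_load_explicit(word, memory_order_relaxed);
    uint32_t newv;

    do {
        /* Splice the new byte into the surrounding word. */
        newv = (oldv & ~mask) | ((uint32_t)newval << shift);
        /* On failure, oldv is refreshed with the current word contents. */
    } while (!atomic_compare_exchange_weak(word, &oldv, newv));

    return (uint8_t)((oldv & mask) >> shift);
}

/*
 * Compare-and-exchange variant, corresponding to `skip` being true in the
 * macro: stop without storing when the byte differs from `expected`.
 */
static uint8_t cmpxchg_u8_in_word(_Atomic uint32_t *word, size_t off,
                                  uint8_t expected, uint8_t newval)
{
    unsigned shift = bit_offset(sizeof(uint8_t), off);
    uint32_t mask = (uint32_t)0xff << shift;
    uint32_t oldv = atomic_load_explicit(word, memory_order_relaxed);
    uint32_t newv;
    uint8_t cur;

    do {
        cur = (uint8_t)((oldv & mask) >> shift);
        if (cur != expected)
            break;      /* byte differs: report what was observed */
        newv = (oldv & ~mask) | ((uint32_t)newval << shift);
    } while (!atomic_compare_exchange_weak(word, &oldv, newv));

    return cur;
}
```

On an LL/SC machine the compare-exchange is itself a reservation loop, so this shape nests one retry loop inside another. An asm version can do the masking between the load-reserve and the store-conditional instead.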