From: Vineet Gupta
To: Peter Zijlstra
CC: lkml, "arc-linux-dev@synopsys.com"
Subject: Re: [PATCH 5/6] ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff
Date: Mon, 3 Aug 2015 13:01:56 +0000
References: <1438596188-10875-1-git-send-email-vgupta@synopsys.com> <1438596188-10875-6-git-send-email-vgupta@synopsys.com> <20150803114104.GK25159@twins.programming.kicks-ass.net>

On Monday 03 August 2015 05:11 PM, Peter Zijlstra wrote:
> On Mon, Aug 03, 2015 at 03:33:07PM +0530, Vineet Gupta wrote:
>> +#define SCOND_FAIL_RETRY_VAR_DEF \
>> +    unsigned int delay = 1, tmp; \
>> +
>> +#define SCOND_FAIL_RETRY_ASM \
>> +    "    bz      4f                      \n" \
>> +    "    ; --- scond fail delay ---      \n" \
>> +    "    mov     %[tmp], %[delay]        \n" /* tmp = delay */ \
>> +    "2:  brne.d  %[tmp], 0, 2b           \n" /* while (tmp != 0) */ \
>> +    "    sub     %[tmp], %[tmp], 1       \n" /* tmp-- */ \
>> +    "    asl     %[delay], %[delay], 1   \n" /* delay *= 2 */ \
>> +    "    b       1b                      \n" /* start over */ \
>> +    "4: ; --- success ---                \n" \
>> +
>> +#define SCOND_FAIL_RETRY_VARS \
>> +    ,[delay] "+&r" (delay),[tmp] "=&r" (tmp) \
>> +
>> +#define ATOMIC_OP(op, c_op, asm_op) \
>> +static inline void atomic_##op(int i, atomic_t *v) \
>> +{ \
>> +    unsigned int val, delay = 1, tmp; \
>
> Maybe use your SCOND_FAIL_RETRY_VAR_DEF ?

Right - not sure how I missed that !

>
>> + \
>> +    __asm__ __volatile__( \
>> +    "1:  llock   %[val], [%[ctr]]        \n" \
>> +    "    " #asm_op " %[val], %[val], %[i] \n" \
>> +    "    scond   %[val], [%[ctr]]        \n" \
>> +    "                                    \n" \
>> +    SCOND_FAIL_RETRY_ASM \
>> + \
>> +    : [val] "=&r" (val) /* Early clobber to prevent reg reuse */ \
>> +      SCOND_FAIL_RETRY_VARS \
>> +    : [ctr] "r" (&v->counter), /* Not "m": llock only supports reg direct addr mode */ \
>> +      [i] "ir" (i) \
>> +    : "cc"); \
>> +} \
>> +
>> +#define ATOMIC_OP_RETURN(op, c_op, asm_op) \
>> +static inline int atomic_##op##_return(int i, atomic_t *v) \
>> +{ \
>> +    unsigned int val, delay = 1, tmp; \
>
> Idem.

OK !

>> + \
>> +    /* \
>> +     * Explicit full memory barrier needed before/after as \
>> +     * LLOCK/SCOND themselves don't provide any such semantics \
>> +     */ \
>> +    smp_mb(); \
>> + \
>> +    __asm__ __volatile__( \
>> +    "1:  llock   %[val], [%[ctr]]        \n" \
>> +    "    " #asm_op " %[val], %[val], %[i] \n" \
>> +    "    scond   %[val], [%[ctr]]        \n" \
>> +    "                                    \n" \
>> +    SCOND_FAIL_RETRY_ASM \
>> + \
>> +    : [val] "=&r" (val) \
>> +      SCOND_FAIL_RETRY_VARS \
>> +    : [ctr] "r" (&v->counter), \
>> +      [i] "ir" (i) \
>> +    : "cc"); \
>> + \
>> +    smp_mb(); \
>> + \
>> +    return val; \
>> +}

>> +#define SCOND_FAIL_RETRY_VAR_DEF \
>> +    unsigned int delay, tmp; \
>> +
>> +#define SCOND_FAIL_RETRY_ASM \
>> +    "    ; --- scond fail delay ---      \n" \
>> +    "    mov     %[tmp], %[delay]        \n" /* tmp = delay */ \
>> +    "2:  brne.d  %[tmp], 0, 2b           \n" /* while (tmp != 0) */ \
>> +    "    sub     %[tmp], %[tmp], 1       \n" /* tmp-- */ \
>> +    "    asl     %[delay], %[delay], 1   \n" /* delay *= 2 */ \
>> +    "    b       1b                      \n" /* start over */ \
>> +    "                                    \n" \
>> +    "4: ; --- done ---                   \n" \
>> +
>> +#define SCOND_FAIL_RETRY_VARS \
>> +    ,[delay] "=&r" (delay), [tmp] "=&r" (tmp) \
>
> This is looking remarkably similar to the previous ones, why not a
> shared header?

I thought about it when duplicating the code. However, it seemed that
readability was better with the code present in the same file, rather
than having to look it up in a different header with no context at all.

Plus there are some subtle differences between the two when you look
closely. Basically, spinlocks need the reset-to-1 quirk which atomics
don't: the delay has to be reset to 1 inside the spinlock inline asm,
which in turn needs a different inline asm constraint ("=&r" instead of
"+&r"). Also, for atomics the success branch (bz 4f) is folded into the
macro, while we can't do that for the lock try routines, as that branch
uses a delay slot.

Agreed that all of this is in the micro-optimization realm, but I
suppose it is worth it when you have a 10-stage pipeline.

>> +static inline void arch_spin_lock(arch_spinlock_t *lock)
>> +{
>> +    unsigned int val;
>> +    SCOND_FAIL_RETRY_VAR_DEF;
>> +
>> +    smp_mb();
>> +
>> +    __asm__ __volatile__(
>> +    "0:  mov     %[delay], 1             \n"
>> +    "1:  llock   %[val], [%[slock]]      \n"
>> +    "    breq    %[val], %[LOCKED], 1b   \n" /* spin while LOCKED */
>> +    "    scond   %[LOCKED], [%[slock]]   \n" /* acquire */
>> +    "    bz      4f                      \n" /* done */
>> +    "                                    \n"
>> +    SCOND_FAIL_RETRY_ASM
>
> But,... in the case that macro is empty, the label 4 does not actually
> exist. I see no real reason for this to be different from the previous
> incarnation either.

Per current code, the macro is never empty. I initially wrote it to have
one version of the routines with different macro definitions, but that
was getting terribly difficult to follow, so I resorted to duplicating
all the routines, with macros to kind of compensate for the duplication
by factoring out common code in the duplicated code :-)

For locks, I can again fold the bz into the macro, but then we can't use
the delay slot in the try versions !
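
For readers who do not speak ARC assembly, the retry logic discussed
above can be approximated in portable C. This is only a sketch under
stated assumptions, not the code from the patch: C11
atomic_compare_exchange_weak() stands in for the LLOCK/SCOND pair (both
may fail and force a retry), its default sequentially consistent
ordering stands in for the explicit smp_mb() calls, and the names
backoff_spin(), backoff_atomic_add_return(), backoff_spin_lock() and
LOCKED are hypothetical. It shows the one difference that matters for
the constraints: the atomics variant initializes delay in C, while the
lock variant sets it at the top of the acquire sequence.

```c
#include <stdatomic.h>

#define LOCKED 1u	/* hypothetical lock value, for illustration only */

/* Hypothetical helper: busy-wait about n iterations (the brne.d/sub loop). */
static inline void backoff_spin(unsigned int n)
{
	for (volatile unsigned int tmp = n; tmp != 0; tmp--)
		;
}

/* Atomics flavour: delay starts at 1 in C and doubles on every failure. */
static inline int backoff_atomic_add_return(atomic_int *v, int i)
{
	unsigned int delay = 1;
	int old = atomic_load_explicit(v, memory_order_relaxed);

	/* A failed exchange reloads 'old', much like a fresh LLOCK. */
	while (!atomic_compare_exchange_weak(v, &old, old + i)) {
		backoff_spin(delay);
		delay <<= 1;		/* delay *= 2 */
	}
	return old + i;
}

/* Lock flavour: delay is set to 1 on entry ("0: mov %[delay], 1"). */
static inline void backoff_spin_lock(atomic_uint *slock)
{
	unsigned int delay = 1;

	for (;;) {
		unsigned int val = atomic_load_explicit(slock,
							memory_order_relaxed);
		if (val == LOCKED)
			continue;	/* spin while LOCKED */
		if (atomic_compare_exchange_weak(slock, &val, LOCKED))
			return;		/* acquired ("bz 4f") */
		backoff_spin(delay);	/* store failed: back off, retry */
		delay <<= 1;
	}
}
```

The backoff only kicks in when the store fails, not while the lock is
merely observed as held, which matches the quoted arch_spin_lock() where
the breq loops straight back to label 1.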