Date: Sat, 23 May 2015 10:26:01 +0200
From: Ingo Molnar
To: Peter Zijlstra
Cc: Stephane Eranian, Vince Weaver, Jiri Olsa, LKML
Subject: Re: [PATCH 02/10] perf/x86: Improve HT workaround GP counter constraint
Message-ID: <20150523082601.GB7025@gmail.com>
In-Reply-To: <20150522134811.GI3644@twins.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

* Peter Zijlstra wrote:

> On Fri, May 22, 2015 at 06:40:49AM -0700, Stephane Eranian wrote:
> > On Fri, May 22, 2015 at 6:36 AM, Peter Zijlstra wrote:
> > > On Fri, May 22, 2015 at 06:29:47AM -0700, Stephane Eranian wrote:
> > >> On Fri, May 22, 2015 at 6:25 AM, Peter Zijlstra wrote:
> > >> > On Fri, May 22, 2015 at 06:07:00AM -0700, Stephane Eranian wrote:
> > >> >>
> > >> >> One other thing I noticed is that the --n_excl needs to be protected by the
> > >> >> excl_cntrs->lock in put_excl_constraints().
> > >> >
> > >> > Nah, it's strictly per cpu.
> > >>
> > >> No. The excl_cntrs struct is pointed to by cpuc but it is shared between the
> > >> sibling HT. Otherwise this would not work!
> > >
> > > n_excl is per cpuc; see the trickery with has_exclusive vs
> > > exclusive_present on how I avoid the lock.
> >
> > Yes, but I believe you create a store-forwarding penalty with this.
> > You store 16 bits and you load 32 bits on the same cache line.

Accesses to the same cacheline carry no such penalty: it only applies
when the partial store and the wider load overlap the same word.

> The store and load are fairly well spaced -- the entire scheduling
> fast path is in between.
>
> And such a penalty is still cheap compared to locking, no?

The 'penalty' is essentially just a delay in executing the load if the
store has not completed yet: typically less than 10 cycles, and around
3 cycles on recent uarchs.

So it should not be a big issue if there is indeed that much code
between them; it is probably not causing any delay at all.

Thanks,

	Ingo