Date: Wed, 19 Jun 2013 17:54:04 +0200
From: Oleg Nesterov
To: Frederic Weisbecker
Cc: Ingo Molnar, Vince Weaver, linux-kernel@vger.kernel.org,
	Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo,
	trinity@vger.kernel.org, Jiri Olsa
Subject: Re: [PATCH 2/2] hw_breakpoint: Introduce "struct bp_cpuinfo"
Message-ID: <20130619155404.GB9176@redhat.com>
References: <20130602194912.GA3277@redhat.com>
	<20130602195057.GC3277@redhat.com>
	<20130618123741.GC17619@somewhere.redhat.com>
	<20130618144225.GA26920@redhat.com>
	<20130618170145.GI17619@somewhere.redhat.com>
In-Reply-To: <20130618170145.GI17619@somewhere.redhat.com>

On 06/18, Frederic Weisbecker wrote:
>
> On Tue, Jun 18, 2013 at 04:42:25PM +0200, Oleg Nesterov wrote:
> >
> > Simplest example,
> >
> > 	for_each_possible_cpu(cpu)
> > 		total_count = per_cpu(per_cpu_count, cpu);
> >
> > Every per_cpu() likely means the cache miss. Not to mention we need the
> > additional math to calculate the address of the local counter.
> >
> > 	for_each_possible_cpu(cpu)
> > 		total_count = bootmem_or_kmalloc_array[cpu];
> >
> > is much better in this respect.
> >
> > And note also that per_cpu_count above can share the cacheline with
> > another "hot" per-cpu variable.
>
> Ah I see, that's good to know.
>
> But these variables are supposed to only be touched from slow path
> (perf events syscall, ptrace breakpoints creation, etc...), right?
> So this is probably not a problem?

Yes, sure. But please note that this can also penalize other CPUs.
For example, toggle_bp_slot() writes to per_cpu(nr_cpu_bp_pinned); this
invalidates the cacheline, which can contain another per-cpu variable.

But let me clarify. I agree, this is all minor; I am not trying to say
this change can actually improve performance. The main point of this
patch is to make the code look a bit better, and you seem to agree.
The changelog mentions s/percpu/array/ only as a potential change
which obviously needs more discussion; I didn't mean that we should
necessarily do this.

Although yes, personally I really dislike per-cpu in this case, but of
course this is subjective and I won't argue ;)

Oleg.
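
To make the cacheline argument above concrete, here is a minimal
userspace sketch (not kernel code, and not part of the patch). It only
models the two layouts by counting how many distinct cache lines a
summing loop would touch. The 64-byte line size, the 4096-byte distance
between per-CPU areas, and the helper names lines_touched() and
sum_counts() are all assumptions made for illustration; real per-CPU
offsets depend on the architecture and the allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, for illustration only; real systems differ. */
#define CACHELINE_BYTES 64u    /* assumed cache line size */
#define PERCPU_STRIDE   4096u  /* hypothetical distance between per-CPU areas */

/*
 * Hypothetical helper: count the distinct cache lines touched when a loop
 * reads one elem_size-byte counter per CPU, where the counter of CPU n
 * lives at byte offset n * stride from a cache-line-aligned base.
 */
static unsigned int lines_touched(unsigned int ncpus, size_t stride,
				  size_t elem_size)
{
	unsigned int lines = 0;
	size_t next_new = 0;	/* lowest line index not counted yet */

	assert(elem_size > 0 && stride >= elem_size);

	for (unsigned int cpu = 0; cpu < ncpus; cpu++) {
		size_t first = (cpu * stride) / CACHELINE_BYTES;
		size_t last = (cpu * stride + elem_size - 1) / CACHELINE_BYTES;

		/* Skip lines already counted for a lower-numbered CPU. */
		if (first < next_new)
			first = next_new;
		if (last >= first) {
			lines += (unsigned int)(last - first + 1);
			next_new = last + 1;
		}
	}
	return lines;
}

/* Summing a flat array: the counters sit next to each other in memory. */
static unsigned long sum_counts(const uint32_t *counts, unsigned int ncpus)
{
	unsigned long total = 0;

	for (unsigned int cpu = 0; cpu < ncpus; cpu++)
		total += counts[cpu];
	return total;
}
```

Under these assumptions, a loop over 16 CPUs touches 16 cache lines
with the per-CPU-style stride but a single line with the flat array,
since sixteen 4-byte counters fit in one 64-byte line. The flip side is
that counters packed into one line are shared between CPUs on every
write, so the sketch only speaks to the read-all-CPUs loop discussed
above.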