Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S943567AbcJSSlo (ORCPT ); Wed, 19 Oct 2016 14:41:44 -0400
Received: from mx2.suse.de ([195.135.220.15]:37182 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753173AbcJSSln (ORCPT ); Wed, 19 Oct 2016 14:41:43 -0400
Date: Wed, 19 Oct 2016 11:41:32 -0700
From: Davidlohr Bueso
To: Sebastian Andrzej Siewior
Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	linux-kernel@vger.kernel.org, Davidlohr Bueso
Subject: Re: [PATCH] perf/bench-futex: Avoid worker cacheline bouncing
Message-ID: <20161019184132.GC28074@linux-80c1.suse>
References: <20161016190803.3392-1-bigeasy@linutronix.de>
	<20161018010949.GD29373@linux-80c1.suse>
	<20161019130722.t7viruflpg2xu5sx@linutronix.de>
	<20161019175933.GA28074@linux-80c1.suse>
	<20161019181308.maacqqzdx4ep5yld@linutronix.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Disposition: inline
In-Reply-To: <20161019181308.maacqqzdx4ep5yld@linutronix.de>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 1375
Lines: 34

On Wed, 19 Oct 2016, Sebastian Andrzej Siewior wrote:

>On 2016-10-19 10:59:33 [-0700], Davidlohr Bueso wrote:
>> Sebastian noted that overhead for worker thread ops (throughput)
>> accounting was producing 'perf' to appear in the profiles, consuming
>> a non-trivial (ie 13%) amount of CPU. This is due to cacheline
>> bouncing due to the increment of w->ops. We can easily fix this by
>> just working on a local copy and updating the actual worker once
>> done running, and ready to show the program summary. There is no
>> danger of the worker being concurrent, so we can trust that no stale
>> value is being seen by another thread.
>>
>> Reported-by: Sebastian Andrzej Siewior
>Acked-by: Sebastian Andrzej Siewior

Thanks.
>
>> --- a/tools/perf/bench/futex-hash.c
>> +++ b/tools/perf/bench/futex-hash.c
>> @@ -63,8 +63,9 @@ static const char * const bench_futex_hash_usage[] = {
>>  static void *workerfn(void *arg)
>>  {
>>  	int ret;
>> -	unsigned int i;
>>  	struct worker *w = (struct worker *) arg;
>> +	unsigned int i;
>> +	unsigned long ops = w->ops; /* avoid cacheline bouncing */
>
>we start at 0 so there is probably no need to init it with w->ops.

Yeah, but I prefer having it this way - separates the init from the
actual work (although no big deal here). The extra load happens ncpu
times, so also no big deal.
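
For anyone following along, here is a rough standalone sketch of the pattern
discussed above: count into a local variable and write the shared worker
field once at the end. This is not the perf code. The struct layout, the
nloops field and run_workers() are made up for illustration, and the
increment merely stands in for one futex operation. Reading w->ops after
pthread_join() is safe because the join orders the worker's final store
before the read.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the benchmark's per-thread state. */
struct worker {
	pthread_t thread;
	unsigned long ops;	/* written once, when the thread finishes */
	unsigned long nloops;	/* illustrative: fixed amount of work */
};

static void *workerfn(void *arg)
{
	struct worker *w = arg;
	unsigned long ops = w->ops;	/* local copy, one load per thread */
	unsigned long i;

	for (i = 0; i < w->nloops; i++)
		ops++;			/* stands in for one futex operation */

	w->ops = ops;			/* single store back to shared memory */
	return NULL;
}

/* Run nthreads workers; return the summed op count, or 0 on any error. */
unsigned long run_workers(unsigned int nthreads, unsigned long nloops)
{
	struct worker *workers = calloc(nthreads, sizeof(*workers));
	unsigned long total = 0;
	unsigned int i, started = 0;

	if (!workers)
		return 0;

	for (i = 0; i < nthreads; i++) {
		workers[i].nloops = nloops;
		if (pthread_create(&workers[i].thread, NULL, workerfn,
				   &workers[i]))
			break;
		started++;
	}

	/* Join orders each worker's final store before we read ops. */
	for (i = 0; i < started; i++) {
		pthread_join(workers[i].thread, NULL);
		total += workers[i].ops;
	}

	free(workers);
	return started == nthreads ? total : 0;
}
```

Build with -pthread and call run_workers() from a small main(). Since
calloc() zeroes the array, the initial load of w->ops reads 0 here, which
matches the "we start at 0" remark; keeping the load just separates the
init from the loop.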