Date: Wed, 19 Oct 2016 15:07:23 +0200
From: Sebastian Andrzej Siewior
To: Davidlohr Bueso
Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar, linux-kernel@vger.kernel.org, Davidlohr Bueso
Subject: Re: [PATCH 1/2] perf bench futex: cache align the worer struct
Message-ID: <20161019130722.t7viruflpg2xu5sx@linutronix.de>
References: <20161016190803.3392-1-bigeasy@linutronix.de> <20161018010949.GD29373@linux-80c1.suse>
In-Reply-To: <20161018010949.GD29373@linux-80c1.suse>

On 2016-10-17 18:09:49 [-0700], Davidlohr Bueso wrote:
> On Sun, 16 Oct 2016, Sebastian Andrzej Siewior wrote:
>
> > It popped up in perf testing that the worker consumes some amount of
> > CPU. It boils down to the increment of `ops` which causes cache line
> > bouncing between the individual threads.
>
> Are you referring to this?
>
>        │      for (i = 0; i < nfutexes; i++, w->ops++) {
>        │ be:   add    $0x1,%ebx
>  65.87 │       addq   $0x1,0x18(%r12)
>
> (which is like 65% of 13% on my box with a default futex-hash run).

correct.

> Even better, could we get rid entirely of the ops increments and just
> use a local variable, then update the worker at the end of the function.
> The following makes 'perf' pretty much disappear in the profile.

this should do it, too.
So what remains is the read access for w->futex, but since it does not
show up in perf, it is probably not that important.

> Thanks,
> Davidlohr

Sebastian