Subject: Re: [PATCH] Fix WARN_ON / WARN_ON_ONCE regression
From: Tim Chen <tim.c.chen@linux.intel.com>
Reply-To: tim.c.chen@linux.intel.com
To: Andrew Morton <akpm@osdl.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>, herbert@gondor.apana.org.au,
       linux-kernel@vger.kernel.org, leonid.i.ananiev@intel.com
In-Reply-To: <20061004103408.1a38b8ad.akpm@osdl.org>
References: <1159916644.8035.35.camel@localhost.localdomain>
	 <4522FB04.1080001@goop.org>
	 <1159919263.8035.65.camel@localhost.localdomain>
	 <45233B1E.3010100@goop.org>
	 <1159968095.8035.76.camel@localhost.localdomain>
	 <20061004093025.ab235eaa.akpm@osdl.org>
	 <1159978929.8035.109.camel@localhost.localdomain>
	 <20061004103408.1a38b8ad.akpm@osdl.org>
Content-Type: text/plain
Organization: Intel
Date: Mon, 09 Oct 2006 18:09:01 -0700
Message-Id: <1160442541.4548.15.camel@localhost.localdomain>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2354
Lines: 65

On Wed, 2006-10-04 at 10:34 -0700, Andrew Morton wrote:

> > > Please don't just ignore my questions.  *why* are we getting a cache miss
> > > rate on that integer which is causing measurable performance changes?  If
> > > we're reading it that frequently then the variable should be in cache(!).
> > > 
> > 
> > The point is valid, __warn_once should be in cache, unless something
> > evicts it. What I have found so far is with patch by Andrew and Leonid
> > that avoid looking up the __warn_once integer, the cache miss rate is
> > reduced to the level before.  

> 
> I see, thanks.  How very peculiar.
> 
> I wonder if we just got unlucky and that particular benchmark with that
> particular kernel build just happens to reach the cache system's
> associativity threshold, and this one extra cacheline took it over the
> edge.  Or something.
> 


We believe we found the real cause of the performance regression:
it is a self-inflicted cache line conflict on the __warn_once
global variable, caused by the backend assembly code optimizer of
the older gcc (v3.4.5).  The newer version of gcc (v4.1.0) does not
have this problem.

We fell into the trap of thinking that the __warn_once variable
is truly read-mostly and is written only once, in the very unlikely
case of a bug triggering. But the compiler turns the following
innocent, read-only-looking code into assembly code that always
writes. The compiler does so to avoid a conditional jump.


The original "C" code looks very innocent:

    if (WARN_ON(__ret_warn_once))
        __warn_once = 0;

The asm code generated by gcc, written back as C, is equivalent to:

    temp = 0;
    if (!WARN_ON(__ret_warn_once))
        temp = __warn_once;
    __warn_once = temp;
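
To make the difference between the two shapes concrete, here is a
minimal user-space sketch that counts the stores each one issues. It is
not kernel code: warn_once_flag, store_flag(), count_stores() and both
check_*() functions are hypothetical names made up for this
illustration, and the counter only models the stores, it does not
measure cache traffic.

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration only, not kernel code. */
static int warn_once_flag = 1;      /* plays the role of __warn_once */
static unsigned long store_count;   /* stores issued to warn_once_flag */

/* Every write to the flag goes through here so that it can be counted. */
static void store_flag(int value)
{
    warn_once_flag = value;
    store_count++;
}

/* Shape of the source: the flag is written only when the condition fires. */
static void check_conditional_store(int cond)
{
    if (cond)
        store_flag(0);
}

/* Shape described for the gcc 3.4.5 output: pick a value, always store it. */
static void check_unconditional_store(int cond)
{
    int temp = 0;

    if (!cond)
        temp = warn_once_flag;
    store_flag(temp);
}

/* Run one variant `calls` times with the condition false; return the stores. */
static unsigned long count_stores(void (*check)(int), unsigned long calls)
{
    unsigned long i;

    warn_once_flag = 1;
    store_count = 0;
    for (i = 0; i < calls; i++)
        check(0);
    assert(warn_once_flag == 1);    /* both shapes leave the value unchanged */
    return store_count;
}
```

With the condition never firing, count_stores(check_conditional_store,
1000) returns 0 and count_stores(check_unconditional_store, 1000)
returns 1000. The value of the flag is the same either way; only the
number of writes to it differs.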


As a result, the global variable is written by every CPU each time
this code runs, which causes excessive cache line bouncing on SMP.
We measured that HITM events increased by 75% and
read-for-ownership events increased by 50%. Adding a
__read_mostly directive to __warn_once didn't help
because gcc still generates assembly code that writes to
that global variable.
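
For illustration only, one general C technique for keeping a store
conditional is to go through a volatile-qualified lvalue, since
compilers are in practice not expected to invent extra volatile
accesses. This is a sketch under that assumption, not the fix discussed
in this thread, and it has not been checked against gcc 3.4.5;
once_flag and warn_once_volatile() are hypothetical names.

```c
/* Hypothetical stand-in for __warn_once; illustration only. */
static int once_flag = 1;

/*
 * Returns 1 the first time cond is true, 0 otherwise.
 * The flag is read and written only through volatile lvalues, so the
 * store is expected to stay inside the branch instead of being turned
 * into an unconditional write.
 */
static int warn_once_volatile(int cond)
{
    if (cond && *(volatile int *)&once_flag) {
        *(volatile int *)&once_flag = 0;
        return 1;
    }
    return 0;
}
```

In the common case where cond is false, the flag is neither read nor
written, because the && short-circuits before the volatile access.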

Thanks.

Tim