Subject: Re: [PATCH] Fix WARN_ON / WARN_ON_ONCE regression
From: Tim Chen <tim.c.chen@linux.intel.com>
Reply-To: tim.c.chen@linux.intel.com
To: Andrew Morton <akpm@osdl.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>, herbert@gondor.apana.org.au,
       linux-kernel@vger.kernel.org, leonid.i.ananiev@intel.com
In-Reply-To: <20061004093025.ab235eaa.akpm@osdl.org>
References: <1159916644.8035.35.camel@localhost.localdomain>
	 <4522FB04.1080001@goop.org>
	 <1159919263.8035.65.camel@localhost.localdomain>
	 <45233B1E.3010100@goop.org>
	 <1159968095.8035.76.camel@localhost.localdomain>
	 <20061004093025.ab235eaa.akpm@osdl.org>
Content-Type: text/plain
Organization: Intel
Date: Wed, 04 Oct 2006 09:22:09 -0700
Message-Id: <1159978929.8035.109.camel@localhost.localdomain>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2347
Lines: 62

On Wed, 2006-10-04 at 09:30 -0700, Andrew Morton wrote:

> > I have measured the cache miss with tool.  So it is not just my theory.
> > 
> 
> And what did that tool tell you?

I am using emon.  Over a 20-second stretch of a tbench run, the L2
cache misses went from 14 million to 25 million on each CPU core.
> 
> Please don't just ignore my questions.  *why* are we getting a cache miss
> rate on that integer which is causing measurable performance changes?  If
> we're reading it that frequently then the variable should be in cache(!).
> 

The point is valid: __warn_once should be in cache unless something
evicts it.  What I have found so far is that with the patch by Andrew and
Leonid, which avoids looking up the __warn_once integer, the cache miss
rate drops back to its earlier level.
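
To illustrate what "avoid looking up the __warn_once integer" means, here
is a minimal user-space sketch.  It is not the actual patch: the macro
names, the flag_reads counter and the two helper functions are made up for
illustration, and the counter only counts reads of the flag, it does not
model cache behaviour.  It relies on GCC statement expressions.

```c
#include <stdio.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical counter: how many times the "once" flag was read. */
static long flag_reads;

/* Illustrative variant that reads the static flag on every pass. */
#define WARN_ONCE_FLAG_FIRST(cond) ({                           \
        static int warn_once = 1;                               \
        int ret = !!(cond);                                     \
        flag_reads++;                                           \
        if (warn_once && unlikely(ret)) {                       \
                warn_once = 0;                                  \
                fprintf(stderr, "warning (flag first)\n");      \
        }                                                       \
        ret;                                                    \
})

/* Illustrative variant that reads the flag only if cond is true. */
#define WARN_ONCE_COND_FIRST(cond) ({                           \
        static int warn_once = 1;                               \
        int ret = !!(cond);                                     \
        if (unlikely(ret)) {                                    \
                flag_reads++;                                   \
                if (warn_once) {                                \
                        warn_once = 0;                          \
                        fprintf(stderr, "warning (cond first)\n"); \
                }                                               \
        }                                                       \
        ret;                                                    \
})

/* Run the flag-first variant; the condition is true only at trigger_at. */
long reads_flag_first(int iterations, int trigger_at)
{
        flag_reads = 0;
        for (int i = 0; i < iterations; i++)
                (void)WARN_ONCE_FLAG_FIRST(i == trigger_at);
        return flag_reads;
}

/* Run the condition-first variant with the same inputs. */
long reads_cond_first(int iterations, int trigger_at)
{
        flag_reads = 0;
        for (int i = 0; i < iterations; i++)
                (void)WARN_ONCE_COND_FIRST(i == trigger_at);
        return flag_reads;
}
```

With a condition that is never true (trigger_at = -1) and 1000
iterations, the first helper reads the flag 1000 times and the second
reads it 0 times, so the hot path no longer touches the flag's cache line.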

> Again: do you know which callsite is causing the problem?  I assume one of
> the ones in softirq.c?  Do you know what the cache miss frequency is?  etc.
> 
Unfortunately emon does not directly give the callsite.  Oprofile data
shows a marked increase in time spent in do_softirq and local_bh_enable.
What I could do is turn off WARN_ON_ONCE individually at these sites
and see whether they are responsible for the cache misses.  I will let
you know what I find.

Oprofile data --
Before WARN_ON_ONCE patch:

118383 thread_return
 88171 copy_user_generic
..
  8281 local_bh_enable
  6790 do_softirq

After WARN_ON_ONCE patch:

117767 thread_return
106651 local_bh_enable
 83767 tcp_v4_rcv
 72266 copy_user_generic_unrolled
 47136 do_softirq
 41100 tcp_recvmsg
 39394 tcp_sendmsg


Tim