Subject: Re: RCU latency regression in 2.6.16-rc1
From: Lee Revell <rlrevell@joe-job.com>
To: Eric Dumazet <dada1@cosmosbay.com>
Cc: dipankar@in.ibm.com, paulmck@us.ibm.com, Ingo Molnar <mingo@elte.hu>,
       linux-kernel <linux-kernel@vger.kernel.org>,
       Linus Torvalds <torvalds@osdl.org>
In-Reply-To: <43DBCB62.7030308@cosmosbay.com>
References: <20060124092330.GA7060@elte.hu>
	 <1138095856.2771.103.camel@mindpipe> <20060124162846.GA7139@in.ibm.com>
	 <20060124213802.GC7139@in.ibm.com> <1138224506.3087.22.camel@mindpipe>
	 <20060126191809.GC6182@us.ibm.com> <1138388123.3131.26.camel@mindpipe>
	 <20060128170302.GB5633@in.ibm.com> <1138471203.2799.13.camel@mindpipe>
	 <1138474283.2799.24.camel@mindpipe> <20060128193412.GH5633@in.ibm.com>
	 <43DBCB62.7030308@cosmosbay.com>
Content-Type: text/plain
Date: Sun, 29 Jan 2006 02:38:02 -0500
Message-Id: <1138520283.2799.103.camel@mindpipe>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1854
Lines: 47

On Sat, 2006-01-28 at 20:52 +0100, Eric Dumazet wrote:
> > Your new trace shows that we are held up in in rt_run_flush(). 
> > I guess we need to investigate why we spend so much time in rt_run_flush(),
> > because of a big route table or the lock acquisitions.
> 
> Some machines have millions of entries in their route cache.
> 
> I suspect we cannot queue all them (or only hash heads as your
> previous patch) by RCU. Latencies and/or OOM can occur.
> 
> What can be done is :
> 
> in rt_run_flush(), allocate a new empty hash table, and exchange the
> hash tables.
> 
> Then wait a quiescent/grace RCU period (may be the exact term is not
> this one, sorry, I'm not RCU expert)
> 
> Then free all the entries from the old hash table (direclty of course,
> no need for RCU grace period), and free the hash table.
> 
> As the hash table can be huge, we might need allocate it at boot time,
> just in case a flush is needed (it usually is :) ). If we choose
> dynamic allocation and this allocation fails, then fallback to what is
> done today.
> 

No problem, I'm not a networking expert...

Ingo's response to these traces was that softirq preemption, which
simply offloads all softirq processing to softirqd and has been tested
in the -rt patchset for over a year, is the easiest solution.  Any
thoughts on that?  Personally, I'd rather fix the very few problematic
softirqs, than take such a drastic step - this softirq appears to be one
of the last obstacles to being able to meet a 1ms soft RT constraint
with the mainline kernel.

Thanks for looking at this; I'd be glad to test any patches...

Lee


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/