Date: Wed, 17 Jun 2009 10:45:38 +0200 (CEST)
From: Thomas Gleixner
To: LKML
cc: rt-users, Ingo Molnar, Steven Rostedt, Peter Zijlstra, Carsten Emde,
    Clark Williams, Frank Rowand, Robin Gareus, Gregory Haskins,
    Philippe Reynes, Fernando Lopez-Lezcano, Will Schmidt, Darren Hart,
    Jan Blunck, Sven-Thorsten Dietrich, Jon Masters
Subject: [ANNOUNCE] 2.6.29.5-rt21

We are pleased to announce the next update to our new preempt-rt series.

 - update to 2.6.29.5 (2.6.29.5-rt20, which I uploaded yesterday but did
   not announce due to the findings below)

 - softirq: lower default priority below hardirq default priority

   This fixes a long-standing default priority configuration problem of
   the -rt series. On UP machines it can result in the net_tx softirq
   running in an endless loop, starving the irq threads, the other
   softirq threads and, of course, everything with lower priority. It
   may also happen on SMP machines when the hardirq thread affinities
   are tweaked in just the right way.
   What happens is:

   tx interrupt
     lock(card->tx_lock);
     dev_kfree_skb_any(skb);
       blocks on a contended lock

   net_tx softirq runs
     unlocks contended lock but does not schedule away due to equal prio
   repeat:
     calls xmit
       try_lock(card->tx_lock) fails
       -> reschedule skb, which keeps net_tx running
     goto repeat;

   The scheduler does not schedule away net_tx, so this goes on forever.

   This has been there forever, but it seems to be easier to trigger in
   the 2.6.29-rt series, probably due to the slab cache lock breaks we
   did.

   The problem is restricted to a dozen wireless adapters and network
   cards, e1000e being the most popular one. We could patch the affected
   drivers for -rt, but we need to take a closer look at the general
   assumptions drivers make about hardirq vs. softirq context.

   Note that this is not a mainline problem, as the semantics are
   entirely correct there.

   Lowering the priorities of the softirq threads below the priorities
   of the hardirq threads is a safe workaround for now. It prevents the
   runaway scenario under all circumstances, as it closely resembles the
   mainline semantics.

   On all existing -rt systems the problem can be solved without
   patching the kernel by adjusting the priority of the softirq threads
   from the init scripts with chrt.

   It's extremely hard to trigger this; we never had a report of it
   before. I want to say thanks to Bernd Oelker, who meticulously worked
   on reproducing the problem and debugging it with all the evil methods
   and patches I could come up with.
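   The runaway scenario can be illustrated with a small toy model. This is
   a hypothetical sketch, not kernel code: it assumes a single CPU,
   SCHED_FIFO-like rules where a woken thread preempts the running one
   only if its priority is strictly higher, and a net_tx loop that never
   blocks while retrying. The function name simulate_tx_livelock() and
   the priority values used with it are made up for illustration and are
   not the actual -rt defaults.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model of the UP runaway scenario (not kernel code).
 *
 * Starting point: the tx irq thread holds card->tx_lock and has just been
 * woken because the net_tx softirq released the contended lock it was
 * blocked on. net_tx keeps running and retries xmit in a loop.
 *
 * Assumption: SCHED_FIFO-like rules, i.e. a woken thread preempts the
 * running thread only if its priority is strictly higher.
 *
 * Returns the number of failed try_lock() attempts before xmit succeeds,
 * or -1 if the step budget runs out (the model's stand-in for "forever").
 */
int simulate_tx_livelock(int irq_prio, int softirq_prio, int max_steps)
{
    bool irq_thread_runnable = true; /* woken, waiting for the CPU */
    bool tx_lock_held = true;        /* card->tx_lock owned by irq thread */
    int failed_trylocks = 0;

    assert(max_steps > 0);

    for (int step = 0; step < max_steps; step++) {
        /* Preemption check: only a strictly higher priority wins. */
        if (irq_thread_runnable && irq_prio > softirq_prio) {
            /* irq thread completes dev_kfree_skb_any() and drops tx_lock */
            irq_thread_runnable = false;
            tx_lock_held = false;
        }

        /* net_tx: try_lock(card->tx_lock) */
        if (!tx_lock_held)
            return failed_trylocks; /* xmit can proceed */

        /* try_lock failed -> reschedule skb, goto repeat */
        failed_trylocks++;
    }
    return -1; /* net_tx never let the irq thread run */
}
```

   With equal priorities, e.g. simulate_tx_livelock(50, 50, 1000), the irq
   thread never gets the CPU and the model returns -1. With the softirq one
   level below the irq thread, simulate_tx_livelock(50, 49, 1000) returns
   0: the irq thread preempts net_tx as soon as it is woken and releases
   card->tx_lock before the first retry.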
   And no, I'm not going to tell you which nasty hacks made it possible
   to decode this :)

Download locations:

   http://rt.et.redhat.com/download/
   http://www.kernel.org/pub/linux/kernel/projects/rt/

Information on the RT patch can be found at:

   http://rt.wiki.kernel.org/index.php/Main_Page

To build the 2.6.29.5-rt21 tree, the following tarball and patch are
needed:

   http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.29.5.tar.bz2
   http://www.kernel.org/pub/linux/kernel/projects/rt/patch-2.6.29.5-rt21.bz2

The broken-out patches are also available at the same download
locations.

Enjoy!

   tglx