Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753749AbZFBTc0 (ORCPT ); Tue, 2 Jun 2009 15:32:26 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1751766AbZFBTcP (ORCPT ); Tue, 2 Jun 2009 15:32:15 -0400
Received: from e3.ny.us.ibm.com ([32.97.182.143]:48810 "EHLO e3.ny.us.ibm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750970AbZFBTcO (ORCPT ); Tue, 2 Jun 2009 15:32:14 -0400
Date: Tue, 2 Jun 2009 12:32:02 -0700
From: Gary Hade
To: mingo@elte.hu, mingo@redhat.com
Cc: linux-kernel@vger.kernel.org, tglx@linutronix.de, hpa@zytor.com,
	x86@kernel.org, ebiederm@xmission.com, yinghai@kernel.org,
	lcm@us.ibm.com
Subject: [RESEND] [PATCH v2] [BUGFIX] x86/x86_64: fix CPU offlining triggered "inactive" device IRQ interrruption
Message-ID: <20090602193202.GB7282@us.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.17+20080114 (2008-01-14)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3179
Lines: 86

Impact: Eliminates a race that can leave the system in an unusable state

During rapid offlining of multiple CPUs there is a chance that an IRQ
affinity move destination CPU will be offlined before the IRQ affinity
move initiated during the offlining of a previous CPU completes. This
can happen when the device is not very active and thus fails to generate
the IRQ that is needed to complete the IRQ affinity move before the move
destination CPU is offlined. When this happens there is an -EBUSY return
from __assign_irq_vector() during the offlining of the IRQ move
destination CPU which prevents initiation of a new IRQ affinity move
operation to an online CPU. This leaves the IRQ affinity set to an
offlined CPU.

I have been able to reproduce the problem on some of our systems using
the following script.
When the system is idle the problem often reproduces during the first
CPU offlining sequence.

#!/bin/sh

SYS_CPU_DIR=/sys/devices/system/cpu
VICTIM_IRQ=25
IRQ_MASK=f0

iteration=0
while true; do
	echo $iteration
	echo $IRQ_MASK > /proc/irq/$VICTIM_IRQ/smp_affinity
	for cpudir in $SYS_CPU_DIR/cpu[1-9] $SYS_CPU_DIR/cpu??; do
		echo 0 > $cpudir/online
	done
	for cpudir in $SYS_CPU_DIR/cpu[1-9] $SYS_CPU_DIR/cpu??; do
		echo 1 > $cpudir/online
	done
	iteration=`expr $iteration + 1`
done

The proposed fix takes advantage of the fact that when all CPUs in the
old domain are offline there is nothing to be done by
send_cleanup_vector() during the affinity move completion. So, we simply
avoid setting cfg->move_in_progress preventing the above mentioned
-EBUSY return from __assign_irq_vector(). This allows initiation of a
new IRQ affinity move to a CPU that is not going offline.

Successfully tested with Ingo's linux-2.6-tip (32 and 64-bit builds) on
the IBM x460, x3550 M2, x3850, and x3950 M2.

v2: modified to integrate with Yinghai Lu's "x86/irq: remove leftover
    code from NUMA_MIGRATE_IRQ_DESC" patch which modified intersecting
    lines. Only comment changes were affected. The actual change to
    the code is the same.
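
For readers who want to see the effect of the check in isolation, here
is a rough user-space model of the logic described above. It is only a
sketch and not kernel code: struct irq_cfg_model, assign_vector_model(),
offline_two_cpus_model() and the with_fix switch are hypothetical names
made up for this illustration, CPU sets are plain bitmasks (bit n = CPU
n), an existing old vector is always assumed, and the -EINVAL return for
an empty target is a choice of the model rather than what
__assign_irq_vector() does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the per-IRQ vector state. */
struct irq_cfg_model {
	unsigned long domain;     /* CPUs the IRQ currently targets */
	unsigned long old_domain; /* CPUs targeted before the last move */
	int move_in_progress;     /* set while old_domain cleanup is pending */
};

/*
 * Model of one affinity move. Refuses with -EBUSY while an earlier move
 * is still pending, as described for __assign_irq_vector().
 * with_fix == 0: always mark the move as in progress (old behaviour).
 * with_fix != 0: mark it only if some CPU of the old domain is online.
 */
static int assign_vector_model(struct irq_cfg_model *cfg,
			       unsigned long target,
			       unsigned long online, int with_fix)
{
	assert(cfg != NULL);

	if (cfg->move_in_progress)
		return -EBUSY;

	target &= online;
	if (!target)
		return -EINVAL; /* model-only error for "no online target" */

	cfg->old_domain = cfg->domain;
	if (!with_fix || (cfg->old_domain & online))
		cfg->move_in_progress = 1;
	cfg->domain = target;
	return 0;
}

/*
 * Offline CPU 1 and then CPU 2 on a 4-CPU system while the device stays
 * silent, so no IRQ arrives to complete a pending move in between.
 * Returns the result of the second move (to CPU 3).
 */
static int offline_two_cpus_model(int with_fix)
{
	struct irq_cfg_model cfg = { .domain = 1UL << 1 };
	unsigned long online = 0xfUL;
	int ret;

	online &= ~(1UL << 1);	/* CPU 1 goes offline, move IRQ to CPU 2 */
	ret = assign_vector_model(&cfg, 1UL << 2, online, with_fix);
	if (ret)
		return ret;

	online &= ~(1UL << 2);	/* CPU 2 goes offline, move IRQ to CPU 3 */
	return assign_vector_model(&cfg, 1UL << 3, online, with_fix);
}
```

In this model offline_two_cpus_model(0) ends with -EBUSY on the second
move, which corresponds to the stuck affinity described above, while
offline_two_cpus_model(1) returns 0 because the old domain has no online
CPU left and the move is never marked as in progress.
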

Signed-off-by: Gary Hade

---
 arch/x86/kernel/apic/io_apic.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Index: linux-2.6-tip/arch/x86/kernel/apic/io_apic.c
===================================================================
--- linux-2.6-tip.orig/arch/x86/kernel/apic/io_apic.c	2009-05-14 14:06:30.000000000 -0700
+++ linux-2.6-tip/arch/x86/kernel/apic/io_apic.c	2009-05-14 14:09:42.000000000 -0700
@@ -1218,8 +1218,11 @@ next:
 		current_vector = vector;
 		current_offset = offset;
 		if (old_vector) {
-			cfg->move_in_progress = 1;
 			cpumask_copy(cfg->old_domain, cfg->domain);
+			if (cpumask_intersects(cfg->old_domain,
+					       cpu_online_mask)) {
+				cfg->move_in_progress = 1;
+			}
 		}
 		for_each_cpu_and(new_cpu, tmp_mask, cpu_online_mask)
 			per_cpu(vector_irq, new_cpu)[vector] = irq;