Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933706AbbBDN2F (ORCPT ); Wed, 4 Feb 2015 08:28:05 -0500 Received: from cantor2.suse.de ([195.135.220.15]:43233 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933488AbbBDN16 (ORCPT ); Wed, 4 Feb 2015 08:27:58 -0500 Date: Wed, 4 Feb 2015 14:27:54 +0100 From: Joerg Roedel To: Thomas Gleixner , Ingo Molnar , "H. Peter Anvin" , Prarit Bhargava , Yinghai Lu , Jiang Liu Cc: x86@kernel.org, linux-kernel@vger.kernel.org, alnovak@suse.com, jroedel@suse.de, joro@8bytes.org Subject: [PATCH] x86: irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable Message-ID: <20150204132754.GA10078@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2345 Lines: 70 Hi, here is a patch to fix a kernel panic at shutdown we have seen recently. The issue is hard to reproduce, so the patch description about the root cause of the bug comes only from review and my current understanding of x86 irq code. So if what I wrote in the patch description doesn't make sense, please let me know. Constructive comments and feedback welcome. Thanks, Joerg >From 153e8f6cf39c42dd9767eb49a27eacfb69738fb0 Mon Sep 17 00:00:00 2001 From: Joerg Roedel Date: Wed, 4 Feb 2015 13:33:33 +0100 Subject: [PATCH] x86: irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable When an interrupt is migrated away from a cpu it will stay in its vector_irq array until smp_irq_move_cleanup_interrupt succeeded. The cfg->move_in_progress flag is cleared already when the IPI was sent. When the interrupt is destroyed after migration its 'struct irq_desc' is freed and the vector_irq arrays are cleaned up. But since cfg->move_in_progress is already 0 the references at cpus before the last migration will not be cleared. So this would leave a reference to an already destroyed irq alive. When the cpu is taken down at this point, the check_irq_vectors_for_cpu_disable function finds a valid irq number in the vector_irq array, but gets NULL for its descriptor and dereferences it, causing a kernel panic. This has been observed on real systems at shutdown. Add a check to check_irq_vectors_for_cpu_disable for a valid 'struct irq_desc' to prevent this issue. Signed-off-by: Joerg Roedel --- arch/x86/kernel/irq.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c index 705ef8d..67b1cbe 100644 --- a/arch/x86/kernel/irq.c +++ b/arch/x86/kernel/irq.c @@ -302,6 +302,9 @@ int check_irq_vectors_for_cpu_disable(void) irq = __this_cpu_read(vector_irq[vector]); if (irq >= 0) { desc = irq_to_desc(irq); + if (!desc) + continue; + data = irq_desc_get_irq_data(desc); cpumask_copy(&affinity_new, data->affinity); cpu_clear(this_cpu, affinity_new); -- 1.8.4.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/