Message-ID: <55843EC9.9000401@linux.intel.com>
Date: Sat, 20 Jun 2015 00:09:45 +0800
From: Jiang Liu
Organization: Intel
To: Thomas Gleixner, Sergey Senozhatsky
CC: Borislav Petkov, linux-kernel@vger.kernel.org, Sergey Senozhatsky
Subject: Re: [-next] !irqd_can_balance() WARNINGs at irq_move_masked_irq()
References: <20150619071123.GA511@swordfish>

On 2015/6/19 20:21, Thomas Gleixner wrote:
> On Fri, 19 Jun 2015, Thomas Gleixner wrote:
>> On Fri, 19 Jun 2015, Sergey Senozhatsky wrote:
>>> [ 0.412291] WARNING: CPU: 0 PID: 0 at kernel/irq/migration.c:21 irq_move_masked_irq+0x57/0xc4()
>>> [ 0.412371] Can't balance irq 0 [edge]
>>
>> Yuck.
>>
>>> Do you guys want to replace WARN_ON() with WARN_ONCE(), perhaps? This, of course,
>>> doesn't fix anything; but at least one can boot the system. (not really a patch,
>>> just an idea).
>>
>> Indeed. We really want to clear the move pending bit before the can
>> balance check. Patch below. But that does not explain why this happens
>> in the first place.
>>
>> Can you please send me a full dmesg, kernel config and output of
>> /proc/interrupts ? (Private mail is fine, or upload it to some place)
>
> Thanks for providing the data. I think I know what happens.
>
> Something in the kernel (not yet clear what) tries to move the hpet
> irq 0 by calling irq_set_affinity(). That's a kernel-internal
> interface which does not check whether the NO BALANCE flag is set for
> the irq. So the call runs and triggers the move from next interrupt
> machinery which ends up calling irq_move_masked_irq() and that trips
> over the flag and yells.
>
> That's why I changed the WARN to a pr_warn() because we already know
> the call stack.
>
> So the core behaviour is inconsistent. We let the caller of
> irq_set_affinity() succeed and yell later because we think it's wrong.
>
> I'm pretty sure that we must drop the check for NO BALANCE in
> irq_move_masked_irq() and only check for the per_cpu bit, but at the
> same time I really want to know where that call to irq_set_affinity(irq0)
> is coming from.
>
> Can you please collect the output of /proc/timer_list for the previous
> patch and then replace the previous patch with the one below and
> gather all the data again?

Hi Thomas,
	Maybe it's caused by the hpet driver itself?
irq_set_affinity() may set the IRQD_SETAFFINITY_PENDING flag,
thus triggering the warning.
---------------------------------------------------------------
static int hpet_setup_irq(struct hpet_dev *dev)
{
	if (request_irq(dev->irq, hpet_interrupt_handler,
			IRQF_TIMER | IRQF_NOBALANCING,
			dev->name, dev))
		return -1;

	disable_irq(dev->irq);
	irq_set_affinity(dev->irq, cpumask_of(dev->cpu));
	enable_irq(dev->irq);

	printk(KERN_DEBUG "hpet: %s irq %d for MSI\n",
			dev->name, dev->irq);

	return 0;
}
-------------------------------------------------------------
Thanks!
Gerry

>
> Thanks,
>
> 	tglx
>
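
To make the suspected sequence concrete, here is a minimal user-space sketch of it. This is a toy model only, not kernel code: struct model_irq, the MODEL_* bits and both model_* functions are hypothetical names invented for illustration. It assumes, as described in the thread, that the set-affinity path marks a move as pending without looking at the no-balancing flag, and that the later move path warns and returns before clearing the pending bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the irq state bits discussed in the thread. */
enum {
	MODEL_NO_BALANCING     = 1 << 0, /* models an irq requested with IRQF_NOBALANCING */
	MODEL_AFFINITY_PENDING = 1 << 1, /* models IRQD_SETAFFINITY_PENDING */
};

struct model_irq {
	unsigned int state;
	int target_cpu;
	int warnings;	/* how many times the move path complained */
};

/*
 * Models the internal set-affinity call: it succeeds and marks the move
 * as pending without checking the no-balancing flag.
 */
static int model_set_affinity(struct model_irq *irq, int cpu)
{
	assert(irq != NULL);
	irq->target_cpu = cpu;
	irq->state |= MODEL_AFFINITY_PENDING;
	return 0;
}

/*
 * Models the move done on the next interrupt. Returns true if the move
 * was carried out. On the warning path the pending bit is left set
 * (an assumption taken from the thread), so every later call warns again.
 */
static bool model_move_masked_irq(struct model_irq *irq)
{
	assert(irq != NULL);
	if (!(irq->state & MODEL_AFFINITY_PENDING))
		return false;

	if (irq->state & MODEL_NO_BALANCING) {
		irq->warnings++;
		fprintf(stderr, "model: can't balance irq\n");
		return false;
	}

	irq->state &= ~MODEL_AFFINITY_PENDING;
	return true;
}
```

In this model, an irq set up with MODEL_NO_BALANCING that then goes through model_set_affinity() makes every following model_move_masked_irq() call warn, which would match a warning on each interrupt rather than a single one.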