Date: Fri, 19 Jun 2015 22:11:46 +0900
From: Sergey Senozhatsky
To: Thomas Gleixner
Cc: Sergey Senozhatsky, Jiang Liu, Borislav Petkov, linux-kernel@vger.kernel.org, Sergey Senozhatsky
Subject: Re: [-next] !irqd_can_balance() WARNINGs at irq_move_masked_irq()
Message-ID: <20150619131146.GA2365@swordfish>
References: <20150619071123.GA511@swordfish>

On (06/19/15 14:21), Thomas Gleixner wrote:
> On Fri, 19 Jun 2015, Thomas Gleixner wrote:
> > On Fri, 19 Jun 2015, Sergey Senozhatsky wrote:
> > > [    0.412291] WARNING: CPU: 0 PID: 0 at kernel/irq/migration.c:21 irq_move_masked_irq+0x57/0xc4()
> > > [    0.412371] Can't balance irq 0 [edge]
> >
> > Yuck.
> >
> > > Do you guys want to replace WARN_ON() with WARN_ONCE(), perhaps? This, of course,
> > > doesn't fix anything; but at least one can boot the system. (not really a patch,
> > > just an idea).
> >
> > Indeed. We really want to clear the move pending bit before the can
> > balance check. Patch below. But that does not explain why this happens
> > in the first place.
> >
> > Can you please send me a full dmesg, kernel config and output of
> > /proc/interrupts ? (Private mail is fine, or upload it to some place)
>
> Thanks for providing the data. I think I know what happens.
>
> Something in the kernel (not yet clear what) tries to move the hpet
> irq 0 by calling irq_set_affinity(). That's a kernel-internal
> interface which does not check whether the NO BALANCE flag is set for
> the irq. So the call runs and triggers the move from next interrupt
> machinery which ends up calling irq_move_masked_irq() and that trips
> over the flag and yells.
>
> That's why I changed the WARN to a pr_warn() because we already know
> the call stack.
>
> So the core behaviour is inconsistent. We let the caller of
> irq_set_affinity() succeed and yell later because we think it's wrong.
>
> I'm pretty sure that we must drop the check for NO BALANCE in
> irq_move_masked_irq() and only check for the per_cpu bit, but at the
> same time I really want to know where that call to irq_set_affinity(irq0)
> is coming from.
>
> Can you please collect the output of /proc/timer_list for the previous
> patch and then replace the previous patch with the one below and
> gather all the data again?
>

It's 10pm here in Korea and I'm out of the office already. I'll try to
collect the data tomorrow (or on Monday in the worst case).

Thank you.

	-ss
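
For readers following the thread, the inconsistency described in the quoted
analysis can be sketched as a toy userspace model. This is not kernel code
and not the patch under discussion: struct model_irq and the model_*
functions are hypothetical names invented for illustration. The setter
succeeds without looking at the no-balancing flag, and the deferred move
path is the one that complains later. The model also clears the pending bit
before the balance check, as suggested in the thread, so the complaint is
emitted once instead of on every interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy userspace model; none of these names are real kernel APIs. */
struct model_irq {
    unsigned int irq;
    bool no_balancing;  /* stands in for the NO BALANCE flag */
    bool move_pending;  /* stands in for the move pending bit */
    int warnings;       /* how many times the move path complained */
};

/* Models the setter from the thread: it never checks no_balancing. */
static int model_set_affinity(struct model_irq *d)
{
    d->move_pending = true;  /* defer the move to the next interrupt */
    return 0;                /* the caller sees success */
}

/* Models the deferred move, clearing the pending bit before the check. */
static void model_move_masked_irq(struct model_irq *d)
{
    if (!d->move_pending)
        return;

    d->move_pending = false; /* clear first so the complaint cannot repeat */

    if (d->no_balancing) {
        d->warnings++;
        fprintf(stderr, "model: can't balance irq %u\n", d->irq);
        return;
    }
    /* A real implementation would reprogram the affinity here. */
}
```

Calling model_set_affinity() on a descriptor with no_balancing set returns 0,
and only the following model_move_masked_irq() call records a warning, which
is the "succeed now, yell later" behaviour the analysis objects to.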