From: Pingfan Liu
Date: Fri, 6 Nov 2020 13:53:59 +0800
Subject: Re: [PATCH 0/3] warn and suppress irqflood
To: Thomas Gleixner
Cc: Guilherme Piccoli, LKML, Peter Zijlstra, Jisheng Zhang, Andrew Morton,
    Petr Mladek, Marc Zyngier, Linus Walleij, afzal mohammed, Lina Iyer,
    "Gustavo A. R. Silva", Maulik Shah, Al Viro, Jonathan Corbet,
    Pawan Gupta, Mike Kravetz, Oliver Neukum, linux-doc@vger.kernel.org,
    Kexec Mailing List, Bjorn Helgaas
References: <1603346163-21645-1-git-send-email-kernelfans@gmail.com>
    <871rhq7j1h.fsf@nanos.tec.linutronix.de>
    <87y2js3ghv.fsf@nanos.tec.linutronix.de>
    <87tuueftou.fsf@nanos.tec.linutronix.de>
In-Reply-To: <87tuueftou.fsf@nanos.tec.linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Oct 28, 2020 at 7:58 PM Thomas Gleixner wrote:
> [...]
> ---
>  include/linux/irqdesc.h | 4 ++
>  kernel/irq/manage.c | 3 +
>  kernel/irq/spurious.c | 74 +++++++++++++++++++++++++++++++++++-------------
>  3 files changed, 61 insertions(+), 20 deletions(-)
>
> --- a/include/linux/irqdesc.h
> +++ b/include/linux/irqdesc.h
> @@ -30,6 +30,8 @@ struct pt_regs;
>   * @tot_count: stats field for non-percpu irqs
>   * @irq_count: stats field to detect stalled irqs
>   * @last_unhandled: aging timer for unhandled count
> + * @storm_count: Counter for irq storm detection
> + * @storm_checked: Timestamp for irq storm detection
>   * @irqs_unhandled: stats field for spurious unhandled interrupts
>   * @threads_handled: stats field for deferred spurious detection of threaded handlers
>   * @threads_handled_last: comparator field for deferred spurious detection of theraded handlers
> @@ -65,6 +67,8 @@ struct irq_desc {
>  	unsigned int tot_count;
>  	unsigned int irq_count; /* For detecting broken IRQs */
>  	unsigned long last_unhandled; /* Aging timer for unhandled count */
> +	unsigned long storm_count;
> +	unsigned long storm_checked;
>  	unsigned int irqs_unhandled;
>  	atomic_t threads_handled;
>  	int threads_handled_last;
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -1581,6 +1581,9 @@ static int
>  	if (!shared) {
>  		init_waitqueue_head(&desc->wait_for_threads);
>
> +		/* Take a timestamp for interrupt storm detection */
> +		desc->storm_checked = jiffies;
> +
>  		/* Setup the type (level, edge polarity) if configured: */
>  		if (new->flags & IRQF_TRIGGER_MASK) {
>  			ret = __irq_set_trigger(desc,
> --- a/kernel/irq/spurious.c
> +++ b/kernel/irq/spurious.c
> @@ -21,6 +21,7 @@ static void poll_spurious_irqs(struct ti
>  static DEFINE_TIMER(poll_spurious_irq_timer, poll_spurious_irqs);
>  static int irq_poll_cpu;
>  static atomic_t irq_poll_active;
> +static unsigned long irqstorm_limit __ro_after_init;
>
>  /*
>   * We wait here for a poller to finish.
> @@ -189,18 +190,21 @@ static inline int bad_action_ret(irqretu
>   * (The other 100-of-100,000 interrupts may have been a correctly
>   * functioning device sharing an IRQ with the failing one)
>   */
> -static void __report_bad_irq(struct irq_desc *desc, irqreturn_t action_ret)
> +static void __report_bad_irq(struct irq_desc *desc, irqreturn_t action_ret,
> +			     bool storm)
>  {
>  	unsigned int irq = irq_desc_get_irq(desc);
>  	struct irqaction *action;
>  	unsigned long flags;
>
> -	if (bad_action_ret(action_ret)) {
> -		printk(KERN_ERR "irq event %d: bogus return value %x\n",
> -				irq, action_ret);
> -	} else {
> -		printk(KERN_ERR "irq %d: nobody cared (try booting with "
> +	if (!storm) {
> +		if (bad_action_ret(action_ret)) {
> +			pr_err("irq event %d: bogus return value %x\n",
> +			       irq, action_ret);
> +		} else {
> +			pr_err("irq %d: nobody cared (try booting with "
>  				"the \"irqpoll\" option)\n", irq);
> +		}
>  	}
>  	dump_stack();
>  	printk(KERN_ERR "handlers:\n");
> @@ -228,7 +232,7 @@ static void report_bad_irq(struct irq_de
>
>  	if (count > 0) {
>  		count--;
> -		__report_bad_irq(desc, action_ret);
> +		__report_bad_irq(desc, action_ret, false);
>  	}
>  }
>
> @@ -267,6 +271,33 @@ try_misrouted_irq(unsigned int irq, stru
>  	return action && (action->flags & IRQF_IRQPOLL);
>  }
>
> +static void disable_stuck_irq(struct irq_desc *desc, irqreturn_t action_ret,
> +			      const char *reason, bool storm)
> +{
> +	__report_bad_irq(desc, action_ret, storm);
> +	pr_emerg("Disabling %s IRQ #%d\n", reason, irq_desc_get_irq(desc));
> +	desc->istate |= IRQS_SPURIOUS_DISABLED;
> +	desc->depth++;
> +	irq_disable(desc);
> +}
> +
> +/* Interrupt storm detector for runaway interrupts (handled or not). */
> +static bool irqstorm_detected(struct irq_desc *desc)
> +{
> +	unsigned long now = jiffies;
> +
> +	if (++desc->storm_count < irqstorm_limit) {
> +		if (time_after(now, desc->storm_checked + HZ)) {
> +			desc->storm_count = 0;
> +			desc->storm_checked = now;
> +		}
> +		return false;
> +	}
> +
> +	disable_stuck_irq(desc, IRQ_NONE, "runaway", true);
> +	return true;
> +}
> +
>  #define SPURIOUS_DEFERRED 0x80000000
>
>  void note_interrupt(struct irq_desc *desc, irqreturn_t action_ret)
> @@ -403,24 +434,16 @@ void note_interrupt(struct irq_desc *des
>  		desc->irqs_unhandled -= ok;
>  	}
>
> +	if (unlikely(irqstorm_limit && irqstorm_detected(desc)))
> +		return;
> +
>  	desc->irq_count++;
>  	if (likely(desc->irq_count < 100000))
>  		return;
>
>  	desc->irq_count = 0;
>  	if (unlikely(desc->irqs_unhandled > 99900)) {
> -		/*
> -		 * The interrupt is stuck
> -		 */
> -		__report_bad_irq(desc, action_ret);
> -		/*
> -		 * Now kill the IRQ
> -		 */
> -		printk(KERN_EMERG "Disabling IRQ #%d\n", irq);
> -		desc->istate |= IRQS_SPURIOUS_DISABLED;
> -		desc->depth++;
> -		irq_disable(desc);
> -
> +		disable_stuck_irq(desc, action_ret, "unhandled", false);
>  		mod_timer(&poll_spurious_irq_timer,
>  			  jiffies + POLL_SPURIOUS_IRQ_INTERVAL);
>  	}
> @@ -462,5 +485,16 @@ static int __init irqpoll_setup(char *st
>  				"performance\n");
>  	return 1;
>  }
> -
>  __setup("irqpoll", irqpoll_setup);
> +
> +static int __init irqstorm_setup(char *arg)
> +{
> +	int res = kstrtoul(arg, 0, &irqstorm_limit);
> +
> +	if (!res) {
> +		pr_info("Interrupt storm detector enabled. Limit=%lu / s\n",
> +			irqstorm_limit);
> +	}
> +	return !!res;
> +}
> +__setup("irqstorm_limit", irqstorm_setup);

It should be:

	__setup("irqstorm_limit=", irqstorm_setup);

I have tested this patch on the P9 machine, where I set the limit to
70000. It works for the kdump kernel.

Thanks,
Pingfan
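
Note on the trailing "=": as far as I understand the __setup() matching, the registered string is compared as a prefix of the command-line parameter and the handler is handed whatever follows that prefix. Without the "=", irqstorm_setup() would therefore see "=70000", which kstrtoul() rejects. The following is a minimal userspace sketch of that behaviour, not kernel code. match_setup(), parse_ulong_strict() and setup_limit() are hypothetical names made up for the illustration; parse_ulong_strict() is only a stand-in for kstrtoul() (it is stricter, e.g. it accepts no leading "+" or trailing newline), and plain strncmp() stands in for the kernel's own parameter comparison.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for kstrtoul(): the whole string must be a
 * number (base auto-detected), otherwise an error code is returned.
 */
static int parse_ulong_strict(const char *s, unsigned long *out)
{
	char *end;
	unsigned long v;

	assert(s && out);

	/* Reject empty input and anything not starting with a digit, e.g. "=70000". */
	if (*s < '0' || *s > '9')
		return -EINVAL;

	errno = 0;
	v = strtoul(s, &end, 0);
	if (errno == ERANGE)
		return -ERANGE;
	if (*end != '\0')
		return -EINVAL;

	*out = v;
	return 0;
}

/*
 * Assumed model of the prefix match: returns 1 and points *arg at the
 * text following the prefix when param starts with prefix, else 0.
 */
static int match_setup(const char *param, const char *prefix, const char **arg)
{
	size_t n = strlen(prefix);

	if (strncmp(param, prefix, n) != 0)
		return 0;
	*arg = param + n;
	return 1;
}

/* Match the parameter against the prefix, then parse the remainder as the limit. */
static int setup_limit(const char *param, const char *prefix, unsigned long *limit)
{
	const char *arg;

	if (!match_setup(param, prefix, &arg))
		return -ENOENT;
	return parse_ulong_strict(arg, limit);
}
```

In this sketch, setup_limit("irqstorm_limit=70000", "irqstorm_limit=", &limit) stores 70000 and returns 0, while the same parameter with the prefix "irqstorm_limit" returns -EINVAL because the value string still starts with "=".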