Received: by 2002:a05:6358:1087:b0:cb:c9d3:cd90 with SMTP id j7csp4294593rwi; Mon, 17 Oct 2022 04:29:26 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4lfssCrk8OLg331INIMBtHvPe25lmwzpOsDuXil4tNRKCT3voKXjk07q8H9rEyNj6y9Az0 X-Received: by 2002:a05:6402:f96:b0:459:4180:6cf4 with SMTP id eh22-20020a0564020f9600b0045941806cf4mr9797934edb.64.1666006165880; Mon, 17 Oct 2022 04:29:25 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666006165; cv=none; d=google.com; s=arc-20160816; b=tsL4djZu5zaMQWEqG45N7s6GyHrGGVzXPa9+hLjD8nD5/U4yBvCjz/xjjhBhDKWj3t gdo+QQCkyPctuTAzM3VGbWh3Rn+mFqKRCqgAEqYHkUiCTUqxQN1oNbmLBE2bloIJBvPj ayNYyY2Ai9QH4Zt7dPYS+zuwIRk/6/TEofAp878xHnYcIxCBTGxKH31R0UFLimAV/101 cAV15rh4VJsPj380Xd/VcuGzzx6WKKt6VBy2w32YOPW6/2ZiBIQpNec2lXcRSdqIpbf2 /M2GcwBzmzLP0So29vV/9P0Kc3yOUbmEff36lKqenaRP/MwI+5umVUn/8qYUWs7iKo0L ZKyw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:message-id:date:references :in-reply-to:subject:cc:to:dkim-signature:dkim-signature:from; bh=NwZndv+r/c65ODTpqsHYf2SNVrRCUFFbsjAWIg9kKY4=; b=DccjqR6L9W8LZ4wgMlXUH2H79rB4j+YpneYBnDSsw1q/ITGNlS6y4AJmjw3xnD4wAg 6zeLbyPtaeTO1+U1aPtNWTunU9OPehh3UqCVcgMUFcUMznpXTzKFL02y3hPeLXRcn4Pf 3QsYb2MDz374FwJWap9QoFIaQMEYDv1oI5nHUXMz/MEaFGhLHVYZ7M2icI1BdnLb4/5d LTdAvE7ospgFZLAv8H0mb5O524s/K+Y80p7zQQYXbu0LpiMckJbcdyGN5bP4PzaNfp5A ypUl34pHM9IWeB50qL1QYkKDl1wmvywm1LsgkaF2SpqSHrn+sVuuC1iqV+3skxhGcFgV /SfA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=Avu8o04A; dkim=neutral (no key) header.i=@linutronix.de header.b=BGwiEI+y; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id xg11-20020a170907320b00b0077e9417b5a7si9969287ejb.938.2022.10.17.04.29.00; Mon, 17 Oct 2022 04:29:25 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=Avu8o04A; dkim=neutral (no key) header.i=@linutronix.de header.b=BGwiEI+y; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229732AbiJQLVe (ORCPT + 99 others); Mon, 17 Oct 2022 07:21:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45654 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230113AbiJQLVc (ORCPT ); Mon, 17 Oct 2022 07:21:32 -0400 Received: from galois.linutronix.de (Galois.linutronix.de [IPv6:2a0a:51c0:0:12e:550::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C8D815F123 for ; Mon, 17 Oct 2022 04:21:24 -0700 (PDT) From: Thomas Gleixner DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1666005682; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=NwZndv+r/c65ODTpqsHYf2SNVrRCUFFbsjAWIg9kKY4=; b=Avu8o04AU2agPazv92IW7hNK5OL16hToBAA6+kuliXcr0jeSAa+vnvfWUvsQyluNSHfJ0+ zpVOrOo9L52qjMeoPeuD8VH7pjZdTsMY39mi1YRSO+r1HHm3Dm/LDHBNAYXBTrjQH9czJZ 7IMunqc5PvpryVXBZThrmfl3TTVn0XN38URBxZAr79joSjRKYmrAWCCeaj3IS6o0+cEhaP i7c8AWNHLsKWNfY0gbdwp7qhvlUNkAlRi3NcshGWdEAzQ278DtSAne6XaVI61qz0bk4IC2 qrv3qSJ900OAWhHlufImz/Tf7ojGoabn20pK6w/nlarIP9oPiUWu5dr5zkK4Pw== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1666005682; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=NwZndv+r/c65ODTpqsHYf2SNVrRCUFFbsjAWIg9kKY4=; b=BGwiEI+y2tuUADNq47OpiRmlyv5FD7kidfoNo6JV7nnM6TopoeEx8+6Vwdidia04rB8Rqt CEKkAmqyDP9zWZAA== To: Zhang Xincheng , maz Cc: linux-kernel , oleksandr , Hans de Goede , bigeasy , "mark.rutland" , michael Subject: Re: [PATCH] interrupt: discover and disable very frequent interrupts In-Reply-To: References: <20220930064042.14564-1-zhangxincheng@uniontech.com> <86bkqx6wrd.wl-maz@kernel.org> <868rm16tbu.wl-maz@kernel.org> <87tu4dhhh2.wl-maz@kernel.org> Date: Mon, 17 Oct 2022 13:21:21 +0200 Message-ID: <8735bmr8zy.ffs@tglx> MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Oct 09 2022 at 18:02, Zhang Xincheng wrote: > > +config FREQUENT_IRQ_DEBUG > + bool "Support for finding and reporting frequent interrupt" > + default n > + help > + Pointless newline. > + This is a mechanism to detect and report that interrupts > + are triggered too frequently. > + > +config COUNT_PER_SECOND > + int "Interrupt limit per second" > + depends on FREQUENT_IRQ_DEBUG > + default "2000" > + help 2000 interrupts per second is just a hillarious low default. Trivial to reach with networking. Aside of that on systems where the per CPU timer interrupt goes through this code, that's trivial to exceed with something simple like a periodic timer with a 250us interval. > +#ifdef CONFIG_FREQUENT_IRQ_DEBUG > +#define COUNT_PER_SECOND_MASK 0x0000ffff > +#define DURATION_LIMIT_MASK 0xffff0000 > +#define DURATION_LIMIT_COUNT 0x00010000 > +#define DURATION_LIMIT_OFFSET 16 > +static unsigned int count_per_second = CONFIG_COUNT_PER_SECOND; > +static unsigned int duration_limit = CONFIG_DURATION_LIMIT; > +static bool disable_frequent_irq; > +#endif /* CONFIG_FREQUENT_IRQ_DEBUG */ > + > /* > * We wait here for a poller to finish. > * > @@ -189,18 +199,16 @@ static inline int bad_action_ret(irqreturn_t action_ret) > * (The other 100-of-100,000 interrupts may have been a correctly > * functioning device sharing an IRQ with the failing one) > */ > -static void __report_bad_irq(struct irq_desc *desc, irqreturn_t action_ret) > +static void __report_bad_irq(struct irq_desc *desc, irqreturn_t action_ret, const char *msg) > { > unsigned int irq = irq_desc_get_irq(desc); > struct irqaction *action; > unsigned long flags; > > if (bad_action_ret(action_ret)) { > - printk(KERN_ERR "irq event %d: bogus return value %x\n", > - irq, action_ret); > + printk(msg, irq, action_ret); This wants to be pr_err() and that change needs to be split out into a seperate patch if at all. > +#ifdef CONFIG_FREQUENT_IRQ_DEBUG > +/* > + * Some bad hardware will trigger interrupts very frequently, which will > + * cause the CPU to process hardware interrupts all the time. We found > + * and reported it, and disabling it is optional. > + */ > +void report_frequent_irq(struct irq_desc *desc, irqreturn_t action_ret) static, no? > +{ > + if (desc->have_reported) > + return; > + > + if ((desc->gap_count & DURATION_LIMIT_MASK) == 0) What's the point of this mask dance here? Use seperate variables. This is unreadable and overoptimized for no value. Also why is a simple count per second not sufficient? Why do you need the extra duration limit? > + desc->gap_time = get_jiffies_64(); Why does this need 64bit jiffies? 32bit are plenty enough. > + > + desc->gap_count++; > + > + if ((desc->gap_count & COUNT_PER_SECOND_MASK) >= count_per_second) { > + if ((get_jiffies_64() - desc->gap_time) < HZ) { > + desc->gap_count += DURATION_LIMIT_COUNT; > + desc->gap_count &= DURATION_LIMIT_MASK; > + } else { > + desc->gap_count = 0; > + } > + > + if ((desc->gap_count >> DURATION_LIMIT_OFFSET) >= duration_limit) { > + __report_bad_irq(desc, action_ret, KERN_ERR "irq %d: triggered too " > + "frequently\n"); > + desc->have_reported = true; > + if (disable_frequent_irq) > + irq_disable(desc); How is this rate limiting? This is simply disabling the interrupt. So again if your limit is too narrow you might simply disable the wrong interrupt and render the machine useless. So if enabled in Kconfig it must be default off and you need a command line parameter to turn it on, but TBH I'm less than convinced that this is actually useful for general purpose debugging in it's current form simply because it is hard to get the limit right. Thanks, tglx