Date: Fri, 19 Oct 2018 17:23:07 +0200
From: Lukas Wunner
To: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org, Mathias Duckeck, Akshay Bhat, Casey Fitzpatrick
Subject: Re: [PATCH] genirq: Fix race on spurious interrupt detection
Message-ID: <20181019152307.62t6al6ney5ofo36@wunner.de>
References: <1dfd8bbd16163940648045495e3e9698e63b50ad.1539867047.git.lukas@wunner.de>

On Fri, Oct 19, 2018 at 04:31:30PM +0200, Thomas Gleixner wrote:
> On Thu, 18 Oct 2018, Lukas Wunner wrote:
> > Commit 1e77d0a1ed74 ("genirq: Sanitize spurious interrupt detection of
> > threaded irqs") made detection of spurious interrupts work for threaded
> > handlers by:
> >
> > a) incrementing a counter every time the thread returns IRQ_HANDLED, and
> > b) checking whether that counter has increased every time the thread is
> >    woken.
> >
> > However for oneshot interrupts, the commit unmasks the interrupt before
> > incrementing the counter. If another interrupt occurs right after
> > unmasking but before the counter is incremented, that interrupt is
> > incorrectly considered spurious:
> >
> > time
> >  |  irq_thread()
> >  |    irq_thread_fn()
> >  |      action->thread_fn()
> >  |      irq_finalize_oneshot()
> >  |        unmask_threaded_irq()            /* interrupt is unmasked */
> >  |
> >  |                  /* interrupt fires, incorrectly deemed spurious */
> >  |
> >  |    atomic_inc(&desc->threads_handled); /* counter is incremented */
> >  v
> >
> > I am seeing this with a hi3110 CAN controller receiving data at high
> > volume (from a separate machine sending with "cangen -g 0 -i -x"):
> > The controller signals a huge number of interrupts (hundreds of millions
> > per day) and every second there are about a dozen which are deemed
> > spurious. The issue is benign in this case, mostly just an irritation,
> > but I'm worrying that at high CPU load and in the presence of higher
> > priority tasks, the number of incorrectly detected spurious interrupts
> > might increase beyond the 99,900 threshold and cause disablement of the
> > IRQ.
>
> I doubt that this can happen in reality, so I'd rather reword that
> paragraph slightly:
>
>   In theory high CPU load and in the presence of higher priority tasks, the
>   number of incorrectly detected spurious interrupts might increase beyond
>   the 99,900 threshold and cause disablement of the interrupt.
>
>   In practice it just increments the spurious interrupt count.
>   But that can cause people to waste time investigating it over and over.
>
> Hmm?

Sure, fine by me. Would you prefer me to resend with that change
or can you fold it in when applying?

FWIW I did manage to reach the 99,900 threshold once because I had
added copious amounts of printk() to the hi3110 IRQ thread to debug
another issue. But I never experienced that without those printk()'s.
Here's the resulting splat:

irq 194: nobody cared (try booting with the "irqpoll" option)
CPU: 0 PID: 1929 Comm: candump Tainted: G O 4.9.76-rt60-v7+ #1
Hardware name: BCM2835
[<8011106c>] (unwind_backtrace) from [<8010cdd8>] (show_stack+0x20/0x24)
[<8010cdd8>] (show_stack) from [<8047cb2c>] (dump_stack+0xc8/0x10c)
[<8047cb2c>] (dump_stack) from [<8018192c>] (__report_bad_irq+0x3c/0xdc)
[<8018192c>] (__report_bad_irq) from [<80181d94>] (note_interrupt+0x29c/0x2ec)
[<80181d94>] (note_interrupt) from [<8017ec9c>] (handle_irq_event_percpu+0x78/0x84)
[<8017ec9c>] (handle_irq_event_percpu) from [<8017ed20>] (handle_irq_event+0x78/0xbc)
[<8017ed20>] (handle_irq_event) from [<80182ad8>] (handle_edge_irq+0x13c/0x1e8)
[<80182ad8>] (handle_edge_irq) from [<8017db64>] (generic_handle_irq+0x34/0x44)
[<8017db64>] (generic_handle_irq) from [<804ae1a0>] (bcm2835_gpio_irq_handle_bank+0x88/0xac)
[<804ae1a0>] (bcm2835_gpio_irq_handle_bank) from [<804ae2ac>] (bcm2835_gpio_irq_handler+0xe8/0x154)
[<804ae2ac>] (bcm2835_gpio_irq_handler) from [<8017db64>] (generic_handle_irq+0x34/0x44)
[<8017db64>] (generic_handle_irq) from [<804a7720>] (bcm2836_chained_handle_irq+0x38/0x50)
[<804a7720>] (bcm2836_chained_handle_irq) from [<8017db64>] (generic_handle_irq+0x34/0x44)
[<8017db64>] (generic_handle_irq) from [<8017e144>] (__handle_domain_irq+0x6c/0xc4)
[<8017e144>] (__handle_domain_irq) from [<8010155c>] (bcm2836_arm_irqchip_handle_irq+0xac/0xb0)
[<8010155c>] (bcm2836_arm_irqchip_handle_irq) from [<80775dec>] (__irq_usr+0x4c/0x60)
Exception stack(0xb6b15fb0 to 0xb6b15ff8)
5fa0:                                     76ec1d50 0000000a 011f8026 fbad2aa4
5fc0: 76ec1d50 0000000a 76f1a000 000263bc 00000000 000263fc 00000001 7ec3f470
5fe0: 00000444 7ec3f2e8 76de5b90 76def528 40000010 ffffffff
handlers:
[<8017edc0>] irq_default_primary_handler threaded [<7f37c734>] hi3110_can_ist [hi311x]
Disabling IRQ #194

Thanks,

Lukas