From: Thomas Gleixner
To: Guilherme Piccoli, Pingfan Liu
Cc: LKML, Peter Zijlstra, Jisheng Zhang, Andrew Morton, Petr Mladek, Marc Zyngier, Linus Walleij, afzal mohammed, Lina Iyer, "Gustavo A. R. Silva", Maulik Shah, Al Viro, Jonathan Corbet, Pawan Gupta, Mike Kravetz, Oliver Neukum, linux-doc@vger.kernel.org, Kexec Mailing List, Bjorn Helgaas
Subject: Re: [PATCH 0/3] warn and suppress irqflood
Date: Mon, 26 Oct 2020 20:59:56 +0100
Message-ID: <87y2js3ghv.fsf@nanos.tec.linutronix.de>
References: <1603346163-21645-1-git-send-email-kernelfans@gmail.com> <871rhq7j1h.fsf@nanos.tec.linutronix.de>

On Mon, Oct 26 2020 at 12:06, Guilherme Piccoli wrote:
> On Sun, Oct 25, 2020 at 8:12 AM Pingfan Liu wrote:
>
> Some time ago (2 years) we faced a similar issue in x86-64, a hard to
> debug problem in kdump, that eventually was narrowed to a buggy NIC FW
> flooding IRQs in kdump kernel, and no messages showed (although kernel
> changed a lot since that time, today we might have better IRQ
> handling/warning). We tried an early-boot fix, by disabling MSIs (as
> per PCI spec) early in x86 boot, but it wasn't accepted - Bjorn asked
> pertinent questions that I couldn't respond (I lost the reproducer)
> [0].
...
> [0] lore.kernel.org/linux-pci/20181018183721.27467-1-gpiccoli@canonical.com

With that broken firmware the NIC continued to send MSI messages to the
vector/CPU which was assigned to it before the crash. But the crash
kernel has no interrupt descriptor for this vector installed. So Liu's
patches won't print anything simply because the interrupt core cannot
detect it.
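A toy model may help show why the two cases differ. The C sketch below is not kernel code. The names vector_table, handle_vector() and FLOOD_LIMIT are hypothetical, and the per-descriptor counter only stands in, in spirit, for the kind of flood detection Liu's patches add. The point it illustrates is narrow: any such detector hangs off an interrupt descriptor, so a vector that has none can only be reported, never disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Toy model (hypothetical, not kernel code): each vector slot points to
 * an interrupt descriptor, or is NULL when nothing requested it.
 * FLOOD_LIMIT is a made-up storm threshold.
 */
#define NR_VECTORS  256
#define FLOOD_LIMIT 1000

struct irq_desc {
	const char *name;
	unsigned long count;
	bool disabled;
};

static struct irq_desc *vector_table[NR_VECTORS];
static unsigned long unhandled;

/* Dispatch one interrupt that arrived on 'vector'. */
static void handle_vector(int vector)
{
	struct irq_desc *desc;

	assert(vector >= 0 && vector < NR_VECTORS);
	desc = vector_table[vector];

	if (!desc) {
		/*
		 * Stale MSI target: there is no descriptor, so nothing can be
		 * accounted or disabled. Only a message can be printed
		 * (text here is illustrative, not the kernel's exact format).
		 */
		if (unhandled++ == 0)
			printf("vector %d: No irq handler for vector\n", vector);
		return;
	}

	if (desc->disabled)
		return;		/* models a masked interrupt */

	/* A requested interrupt can be counted and shut off when it floods. */
	if (++desc->count > FLOOD_LIMIT) {
		desc->disabled = true;
		printf("%s: irq flood, disabling\n", desc->name);
	}
}
```

In this model, 5000 hits on a requested vector disable it after FLOOD_LIMIT hits. The same 5000 hits on an unrequested vector only bump the unhandled counter, which is all the crash kernel can do about a stale MSI target.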
To answer Bjorn's still open question about when the point X is:

  https://lore.kernel.org/linux-pci/20181023170343.GA4587@bhelgaas-glaptop.roam.corp.google.com/

It gets flooded right at the point where the crash kernel enables
interrupts in start_kernel(). At that point there is no device driver
and no interrupt requested. All you can see on the console for this is:

  "common_interrupt: $VECTOR.$CPU No irq handler for vector"

And contrary to Liu's patches, which try to disable a requested
interrupt if too many of them arrive, the kernel cannot do anything
because there is nothing to disable in your case. That's why you needed
to do the MSI disable magic in the early PCI quirks which run before
interrupts get enabled.

Also Liu's patch only works if:

  1) CONFIG_IRQ_TIME_ACCOUNTING is enabled

  2) the runaway interrupt has been requested by the relevant driver in
     the dump kernel.

Especially #1 is not a sensible restriction.

Thanks,

        tglx
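The "MSI disable magic" in the early PCI quirks presumably comes down to clearing the Enable bit in each MSI and MSI-X capability before start_kernel() enables interrupts. That stops the device from targeting the stale vector. The C sketch below illustrates this under stated assumptions:

- It works on an in-memory copy of one device's 256-byte config space, not on hardware.
- The helpers cfg_read16(), cfg_write16() and disable_msi() are hypothetical. They are neither the kernel's early-quirk code nor the rejected patch.
- The register offsets and bits follow the PCI specification and reuse the macro names from Linux's pci_regs.h.

```c
#include <assert.h>
#include <stdint.h>

/* Config space offsets and bits (PCI spec; names as in pci_regs.h). */
#define PCI_STATUS            0x06
#define PCI_STATUS_CAP_LIST   0x10
#define PCI_CAPABILITY_LIST   0x34
#define PCI_CAP_ID_MSI        0x05
#define PCI_CAP_ID_MSIX       0x11
#define PCI_MSI_FLAGS         2      /* MSI Message Control, from cap start */
#define PCI_MSI_FLAGS_ENABLE  0x0001
#define PCI_MSIX_FLAGS        2      /* MSI-X Message Control, from cap start */
#define PCI_MSIX_FLAGS_ENABLE 0x8000

/* Hypothetical accessors on a config space copy (little endian). */
static uint16_t cfg_read16(const uint8_t *cfg, unsigned int off)
{
	assert(off + 1 < 256);
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

static void cfg_write16(uint8_t *cfg, unsigned int off, uint16_t val)
{
	assert(off + 1 < 256);
	cfg[off] = (uint8_t)(val & 0xff);
	cfg[off + 1] = (uint8_t)(val >> 8);
}

/* Clear MSI and MSI-X Enable; return how many capabilities were changed. */
static int disable_msi(uint8_t *cfg)
{
	int changed = 0, ttl = 48;	/* guard against looping cap lists */
	uint8_t pos;

	if (!(cfg_read16(cfg, PCI_STATUS) & PCI_STATUS_CAP_LIST))
		return 0;

	pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
	while (pos >= 0x40 && ttl--) {
		uint8_t id = cfg[pos];
		uint16_t ctrl = cfg_read16(cfg, pos + PCI_MSI_FLAGS);

		if (id == PCI_CAP_ID_MSI && (ctrl & PCI_MSI_FLAGS_ENABLE)) {
			cfg_write16(cfg, pos + PCI_MSI_FLAGS,
				    (uint16_t)(ctrl & ~PCI_MSI_FLAGS_ENABLE));
			changed++;
		} else if (id == PCI_CAP_ID_MSIX && (ctrl & PCI_MSIX_FLAGS_ENABLE)) {
			cfg_write16(cfg, pos + PCI_MSIX_FLAGS,
				    (uint16_t)(ctrl & ~PCI_MSIX_FLAGS_ENABLE));
			changed++;
		}
		pos = cfg[pos + 1] & 0xfc;	/* next capability pointer */
	}
	return changed;
}
```

A real early quirk would do this for every device, using raw config-space accessors that work before the PCI core is initialized. The TTL counter is the usual defence against malformed, looping capability lists.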