From: Alexander Duyck
Date: Thu, 3 Oct 2019 10:39:49 -0700
Subject: Re: [v1] e1000e: EEH on e1000e adapter detects io perm failure can trigger crash
To: David Dai
Cc: Jeff Kirsher, David Miller, intel-wired-lan, Netdev, LKML, zdai@us.ibm.com
In-Reply-To: <1570121672-12172-1-git-send-email-zdai@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Oct 3, 2019 at 9:59 AM David Dai wrote:
>
> We see the behavior when EEH e1000e adapter detects io permanent failure,
> it will crash kernel with this stack:
> EEH: Beginning: 'error_detected(permanent failure)'
> EEH: PE#900000 (PCI 0115:90:00.1): Invoking e1000e->error_detected(permanent failure)
> EEH: PE#900000 (PCI 0115:90:00.1): e1000e driver reports: 'disconnect'
> EEH: PE#900000 (PCI 0115:90:00.0): Invoking e1000e->error_detected(permanent failure)
> EEH: PE#900000 (PCI 0115:90:00.0): e1000e driver reports: 'disconnect'
> EEH: Finished:'error_detected(permanent failure)'
> Oops: Exception in kernel mode, sig: 5 [#1]
> NIP [c0000000007b1be0] free_msi_irqs+0xa0/0x280
> LR [c0000000007b1bd0] free_msi_irqs+0x90/0x280
> Call Trace:
> [c0000004f491ba10] [c0000000007b1bd0] free_msi_irqs+0x90/0x280 (unreliable)
> [c0000004f491ba70] [c0000000007b260c] pci_disable_msi+0x13c/0x180
> [c0000004f491bab0] [d0000000046381ac] e1000_remove+0x234/0x2a0 [e1000e]
> [c0000004f491baf0] [c000000000783cec] pci_device_remove+0x6c/0x120
> [c0000004f491bb30] [c00000000088da6c] device_release_driver_internal+0x2bc/0x3f0
> [c0000004f491bb80] [c00000000076f5a8] pci_stop_and_remove_bus_device+0xb8/0x110
> [c0000004f491bbc0] [c00000000006e890] pci_hp_remove_devices+0x90/0x130
> [c0000004f491bc50] [c00000000004ad34] eeh_handle_normal_event+0x1d4/0x660
> [c0000004f491bd10] [c00000000004bf10] eeh_event_handler+0x1c0/0x1e0
> [c0000004f491bdc0] [c00000000017c4ac] kthread+0x1ac/0x1c0
> [c0000004f491be30] [c00000000000b75c] ret_from_kernel_thread+0x5c/0x80
>
> Basically the e1000e irqs haven't been freed at the time eeh is trying to
> remove the e1000e device.
> Need to make sure when e1000e_close is called to bring down the NIC,
> if adapter error_state is pci_channel_io_perm_failure, it should also
> bring down the link and free irqs.
>
> Reported-by: Morumuri Srivalli
> Signed-off-by: David Dai
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index d7d56e4..cf618e1 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -4715,7 +4715,8 @@ int e1000e_close(struct net_device *netdev)
>
>         pm_runtime_get_sync(&pdev->dev);
>
> -       if (!test_bit(__E1000_DOWN, &adapter->state)) {
> +       if (!test_bit(__E1000_DOWN, &adapter->state) ||
> +           (adapter->pdev->error_state == pci_channel_io_perm_failure)) {
>                 e1000e_down(adapter, true);
>                 e1000_free_irq(adapter);

It seems like the issue is the fact that e1000_io_error_detected is
calling e1000e_down without the e1000_free_irq() bit. Instead of doing
this, couldn't you simply add the following to e1000_io_slot_reset in
the "result = PCI_ERS_RESULT_DISCONNECT" case:

        if (netif_running(netdev))
                e1000_free_irq(adapter);

Alternatively we could look at freeing and reallocating the IRQs in
the event of an error, like we do for the e1000e_pm_freeze and
e1000e_pm_thaw cases. That might make more sense: since we are dealing
with an error, we might want to free and reallocate the IRQ resources
assigned to the device.

Thanks.

- Alex
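
For illustration only, here is a minimal userspace sketch of the first
suggestion above. It is not driver code: every type and function name
in it (model_adapter, model_down, model_free_irq and so on) is
hypothetical, and the three booleans merely stand in for
netif_running(), the __E1000_DOWN bit and the registered IRQ. It
assumes the sequence described in this thread: the error path brings
the adapter down without freeing the IRQ, the later close is skipped
because the adapter is already down, and remove then finds the IRQ
still registered.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only. Every name below is hypothetical and merely
 * stands in for the driver state discussed in this thread. */
struct model_adapter {
	bool running;       /* stands in for netif_running(netdev) */
	bool down;          /* stands in for the __E1000_DOWN state bit */
	bool irq_requested; /* true while the IRQ is still registered */
};

/* Stands in for e1000e_down(): marks the adapter down, keeps the IRQ. */
void model_down(struct model_adapter *a)
{
	a->down = true;
}

/* Stands in for e1000_free_irq(); freeing twice is a bug in this model. */
void model_free_irq(struct model_adapter *a)
{
	assert(a->irq_requested);
	a->irq_requested = false;
}

/* Error path as described above: down without freeing the IRQ. */
void model_error_detected(struct model_adapter *a)
{
	if (a->running)
		model_down(a);
}

/* Disconnect branch of the slot-reset path; free_irq_fix toggles the
 * suggested "if running, free the IRQ" addition. */
void model_slot_reset_disconnect(struct model_adapter *a, bool free_irq_fix)
{
	if (free_irq_fix && a->running)
		model_free_irq(a);
}

/* Close path before the quoted patch: teardown is skipped if already down. */
void model_close(struct model_adapter *a)
{
	if (!a->down) {
		model_down(a);
		model_free_irq(a);
	}
	a->running = false;
}

/* Remove path: returns true if an IRQ is still registered after close,
 * which is the condition the quoted oops trips over in this model. */
bool model_remove_leaks_irq(struct model_adapter *a)
{
	model_close(a);
	return a->irq_requested;
}
```

With free_irq_fix set to false the model reports a leaked IRQ at
remove time. With it set to true the IRQ is released in the disconnect
branch, and close still skips its own teardown because the adapter is
already marked down, so nothing is freed twice.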