Received: by 2002:a05:6902:102b:0:0:0:0 with SMTP id x11csp2096421ybt; Mon, 15 Jun 2020 18:38:29 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwzemPFSI15Car2YvIcbgkJiKTxR3iSLMWN3Q4k9ocvC7kL5P3PCGiuToUY/fShyO3uPzQF X-Received: by 2002:aa7:cdcb:: with SMTP id h11mr522612edw.218.1592271509536; Mon, 15 Jun 2020 18:38:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1592271509; cv=none; d=google.com; s=arc-20160816; b=Sy9cbJotCf8xxyyEZT6hYwenPAuLuos5N5xrIt/CbE82qopV8133VX3PjI8/pR8Po2 Z6jOwg4nors6+emxFs7Qz/e4FaSYqXaCMsbpMMtL6/snBP5B2SkzNftwVqUByWSS56bn akA3BGjzC21DX3vS7Ln5NMCkkyp59+hoOT9MTie2UuVR29gw7Rs6ru69otaca+CXROiM PaASOcga8+IMDJt3Lvp38iaB1LzhpsfAzIJNb7bdqEb9RxDeZv60KW2MKVZejHXdJNWn dej32x/ncROKOjvqKLcwPD4xo6dE+8RzY0a+YZaNo9BT6rGnHQjXoJjgFy/wRdHClG88 KPSA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=cDOEIQV5YQAeIdEMMGx3+/GevUyhzhDnAbEIVpCOnXI=; b=U8kxyEI27H3FqQIBb35ThlFyhU0lRQ1fBDbulfDFK4O+c5ewdGI+rgw2RwNP5lSxyB 1oJAv8m/y5uIBjstuSozvFpyKc2kSP+zd9VJXcnq+4yWwSeMgrHrahLxUTYz/m5NWkDC Y7ixfzhmnRwRC0txchAl0mzs7tDBay+HmYrgVgTfwxu8NaW2oNhXPBgLOMSesz8UMxsi bHnKZfGXMizpANSYTWRbTMUzuZu7G2qa9ZuQf6PhpwGwE0+rCu2IuHOl778dlRDpUITe Mz5DXJJsmSrGQFopO/RNcSpKV9VrtNSj7/nHaDASIkBIkuzRVhLtTDDz2jrI8GwaCzhw yFbw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@broadcom.com header.s=google header.b=B8O7J4rJ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=broadcom.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id h12si9726270eji.721.2020.06.15.18.38.06; Mon, 15 Jun 2020 18:38:29 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@broadcom.com header.s=google header.b=B8O7J4rJ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=broadcom.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726317AbgFPBdm (ORCPT + 99 others); Mon, 15 Jun 2020 21:33:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55860 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726253AbgFPBdm (ORCPT ); Mon, 15 Jun 2020 21:33:42 -0400 Received: from mail-qt1-x842.google.com (mail-qt1-x842.google.com [IPv6:2607:f8b0:4864:20::842]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F2DC9C08C5C2 for ; Mon, 15 Jun 2020 18:33:41 -0700 (PDT) Received: by mail-qt1-x842.google.com with SMTP id c12so14279243qtq.11 for ; Mon, 15 Jun 2020 18:33:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=broadcom.com; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=cDOEIQV5YQAeIdEMMGx3+/GevUyhzhDnAbEIVpCOnXI=; b=B8O7J4rJokQDXjU31zgPybo1cTW2Je/Kd6MMZp1L9D1ePsA7Yu4unjGsty4wjVY6y5 6KOLOawoFG95cEFHEsatqraAC7BNZXmTMnc1cgE/6paJHcMrwiZUwPlyLqKnmoXZzH6E f8uVTRRbgDFf56GPhBFtDkrkqau8ltI7y0uho= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=cDOEIQV5YQAeIdEMMGx3+/GevUyhzhDnAbEIVpCOnXI=; b=JaYNSUC1u5ZABeZCieSfKqJAcny3ri0u9VyJ9XNWj/7dPeSwHmaJ+48yORnbgYYqDC v9CZFbwupN83C7yOTotluG/DsueFGsOMhGUBtUS1dcKupln4DsBXCO8NZTEyQdicb0I7 7hwcckXqsFjPTLFE1vedbLGLTzgZK0saRquh43WGBeOG8f5csZdyDA1mFNVaKl3BBNuA s1cwxw243InmUv0G37F4zo3LsPYyqYTZUPbS7X+pRucKdbhea5yvSkhIu23bHyjS5sgB YLwUudIoqiN4sb6I+fH3KDaMQ2M0m0RV4OAkOkfKHJ9wGMrXJqXA1ivbuHQRWUhsI6we ythg== X-Gm-Message-State: AOAM533O+UZG6wBbg1B+mB/pxmb48Xr7FsKahnw0CHW1L32NgGpimMpT G5tSeRU4d6v4uGsI1eN3Gou/2jszZGTVWPkXHNrMcA== X-Received: by 2002:aed:35af:: with SMTP id c44mr18911337qte.349.1592271221067; Mon, 15 Jun 2020 18:33:41 -0700 (PDT) MIME-Version: 1.0 References: <20200615190119.382589-1-drc@linux.vnet.ibm.com> <95bf20c6-a812-32ad-fd38-45cba7e10491@linux.vnet.ibm.com> In-Reply-To: <95bf20c6-a812-32ad-fd38-45cba7e10491@linux.vnet.ibm.com> From: Michael Chan Date: Mon, 15 Jun 2020 18:33:29 -0700 Message-ID: Subject: Re: [PATCH] tg3: driver sleeps indefinitely when EEH errors exceed eeh_max_freezes To: David Christensen Cc: Netdev , Siva Reddy Kallam , Prashant Sreedharan , Michael Chan , open list Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jun 15, 2020 at 3:21 PM David Christensen wrote: > > On 6/15/20 1:45 PM, Michael Chan wrote: > > On Mon, Jun 15, 2020 at 12:01 PM David Christensen > > wrote: > >> > >> The driver function tg3_io_error_detected() calls napi_disable twice, > >> without an intervening napi_enable, when the number of EEH errors exceeds > >> eeh_max_freezes, resulting in an indefinite sleep while holding rtnl_lock. > >> > >> The function is called once with the PCI state pci_channel_io_frozen and > >> then called again with the state pci_channel_io_perm_failure when the > >> number of EEH failures in an hour exceeds eeh_max_freezes. > >> > >> Protecting the calls to napi_enable/napi_disable with a new state > >> variable prevents the long sleep. > > > > This works, but I think a simpler fix is to check tp->pcierr_recovery > > in tg3_io_error_detected() and skip most of the tg3 calls (including > > the one that disables NAPI) if the flag is true. > > This might be the smallest change that would work. Does it make sense > to the reader? > > diff --git a/drivers/net/ethernet/broadcom/tg3.c > b/drivers/net/ethernet/broadcom/tg3.c > index 7a3b22b35238..1f37c69d213d 100644 > --- a/drivers/net/ethernet/broadcom/tg3.c > +++ b/drivers/net/ethernet/broadcom/tg3.c > @@ -18168,8 +18168,8 @@ static pci_ers_result_t > tg3_io_error_detected(struct pci_dev *pdev, > > rtnl_lock(); > > - /* We probably don't have netdev yet */ > - if (!netdev || !netif_running(netdev)) > + /* May be second call or maybe we don't have netdev yet */ > + if (tp->pcierr_recovery || !netdev || !netif_running(netdev)) Dereferencing tp needs to be done after checking netdev. If we don't have netdev, tp won't be valid. Other than that, I think the logic looks good and is quite clear. Thanks.