Received: by 2002:a05:6a10:22f:0:0:0:0 with SMTP id 15csp1865340pxk; Sat, 3 Oct 2020 00:08:51 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxONgzYBcgQjxJXLAtURU2sPgqKWQcpZx/hVmy4iW6+xJ6ovYh2jVM5TfPJn1Du8dMC7v6T X-Received: by 2002:aa7:c48d:: with SMTP id m13mr6698961edq.326.1601708931772; Sat, 03 Oct 2020 00:08:51 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1601708931; cv=none; d=google.com; s=arc-20160816; b=jPG7//7Md6en54jPsNqc0BZggXz/Nz2IGEeXX64336FtjFZ1M4zABhUHMJQ3c80VMz XcitsbnTDZd6d/rE9wBqlfep12hXaS74dye2N93Dr7TUQkLJCrthT6gLIiBo7W71A0VW 1O3g+n8qi2Nc/UjZ14zFc3BiMOB4muJ469DNYxXvEpqKpHoqwUYhPmEANQ3hLwJXti8F 4G1A2bxAS6ohFr1w+mVgvb6lFPBpsTb6ZcSsbcWklpDwCn60gr4hCN58/hX0MV+Q9UcP J3EmeW/1dnmPmIjCdl6tDcC3oFh/u97TUpKxGmOAt7hquZlY82Kup5HLLLZsxvwEk7RK 8K0Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=pJyOcOKILBJdWGFJmI1yQoUhibY5AAXXh2Y4ApfaMWE=; b=goWVSM/o92FBx4O7BlRTTozCYUof8mu/M5Bw56ovWU5lp4e9G+tyrjmZ+tKXfJP2Bb G9oqEKcc9MuFSnH6HB3OlCbrhKZmu5DaYkt1Bg5dmV/3KSG5tI+MMHupzRnwbtRyJ5bD Bx2yaqZI1aUqIarHEJNFm5Wp9ljLWd+U8U1iZe/MviTwiTNNWhSdiLma3vWjQuYmPoer RQo4kzefnPAPlsIR8emOB6RqxrkS4V7CNA7jrGc9825XOs4pI+yVOK83NEVfS2lh1d6L smYXWUlsaWmsmyebOrLHp1+fM/69ZLe/w3WZanOYO/FHPRDaCBxxS8OGW8q4uez4By6S HZjA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b="AZD7OG/+"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id p3si2688152edh.537.2020.10.03.00.08.06; Sat, 03 Oct 2020 00:08:51 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b="AZD7OG/+"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725767AbgJCHFt (ORCPT + 99 others); Sat, 3 Oct 2020 03:05:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38670 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725446AbgJCHFs (ORCPT ); Sat, 3 Oct 2020 03:05:48 -0400 Received: from mail-ej1-x644.google.com (mail-ej1-x644.google.com [IPv6:2a00:1450:4864:20::644]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 73F09C0613D0; Sat, 3 Oct 2020 00:05:48 -0700 (PDT) Received: by mail-ej1-x644.google.com with SMTP id u8so4791375ejg.1; Sat, 03 Oct 2020 00:05:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=pJyOcOKILBJdWGFJmI1yQoUhibY5AAXXh2Y4ApfaMWE=; b=AZD7OG/+LqTNmUAFpueGqp7AJ+zOuQmlmqEWDJbKcdCXjxTGHZL0KcgszQ0z7ovAMH ev0DM9g6Glb8goNvE5YELOhqxd129DqHqlE77+jbBfzVXQct2096+NQoj6DxqvDFWLQT SvXBnSzok9IU/IX6ccM2eOQdWzd8hayx6FPtcdc2BVmmo+c8wx9vDb8ED6cdibU9sC02 sxGU7cO7cGbNTONruIufd2N04t7RV+a46Q1picwtxhO6mxjz2mCrHFKWVLaAwYgKhekK ygt7OFtNEmcG9tVQntO8KuG9S7IllexJ8G1qGP7JCalsR5GY4mK/puK0KvRby6s8p9og Dvkg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=pJyOcOKILBJdWGFJmI1yQoUhibY5AAXXh2Y4ApfaMWE=; b=f+4FjSn10Yy87MeyvGOKRhgDNxWoILTtVNLbqYTLzPo6f2WnXYoxFFQYFxvhQM8iiK hbGPp/0n01Gk9vfw/+da1bU/S9eQZLyv5NUWF07og8WwEp0QZIWMNahz1SdRvDysNcum 65dQymbGyE5eQ8y4h2fe6UncDFt/PHupN+z7HYLviYF5kME/YQicYDoVqJXnXznvYcrW mNeTpChEd/J74iOT6FsCZm8odFo220v0s5LecPZwpmVBStvn+0u0ANwpfElmU3S90JU6 3QU1hP1bTZW62paO8ByVDdgG2j1+TCIXtlI3xf5NDWaOytt1FcJQ84Jy5fiWKAw3jQRE Nn2g== X-Gm-Message-State: AOAM530hn4k5O/B9gxzvSQd/LHJ7ZJjHkCE5gAyH24MrWKOwi35Jf/Bd vHl5BWrkabpJmtT4FyEU0K6YGEsa737Df5p/RUo= X-Received: by 2002:a17:906:30c5:: with SMTP id b5mr5655000ejb.98.1601708747130; Sat, 03 Oct 2020 00:05:47 -0700 (PDT) MIME-Version: 1.0 References: <20200930070537.30982-5-haifeng.zhao@intel.com> <20201002172913.GA2809822@bjorn-Precision-5520> In-Reply-To: <20201002172913.GA2809822@bjorn-Precision-5520> From: Ethan Zhao Date: Sat, 3 Oct 2020 15:05:35 +0800 Message-ID: Subject: Re: [PATCH v6 4/5] PCI: only return true when dev io state is really changed To: Bjorn Helgaas Cc: Ethan Zhao , Bjorn Helgaas , Oliver , ruscur@russell.cc, Lukas Wunner , Andy Shevchenko , Stuart Hayes , Alexandru Gagniuc , Mika Westerberg , linux-pci , Linux Kernel Mailing List , ashok.raj@linux.intel.com, Sathyanarayanan Kuppuswamy , Sinan Kaya Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Bjorn, On Sat, Oct 3, 2020 at 1:29 AM Bjorn Helgaas wrote: > > [+cc Sinan] > > On Wed, Sep 30, 2020 at 03:05:36AM -0400, Ethan Zhao wrote: > > When uncorrectable error happens, AER driver and DPC driver interrupt > > handlers likely call > > > > pcie_do_recovery() > > ->pci_walk_bus() > > ->report_frozen_detected() > > > > with pci_channel_io_frozen the same time. > > If pci_dev_set_io_state() return true even if the original state is > > pci_channel_io_frozen, that will cause AER or DPC handler re-enter > > the error detecting and recovery procedure one after another. > > The result is the recovery flow mixed between AER and DPC. > > So simplify the pci_dev_set_io_state() function to only return true > > when dev->error_state is changed. > > > > Signed-off-by: Ethan Zhao > > Tested-by: Wen Jin > > Tested-by: Shanshan Zhang > > Reviewed-by: Alexandru Gagniuc > > Reviewed-by: Andy Shevchenko > > --- > > Changnes: > > v2: revise description and code according to suggestion from Andy. > > v3: change code to simpler. > > v4: no change. > > v5: no change. > > v6: no change. > > > > drivers/pci/pci.h | 37 +++++-------------------------------- > > 1 file changed, 5 insertions(+), 32 deletions(-) > > > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > > index 455b32187abd..f2beeaeda321 100644 > > --- a/drivers/pci/pci.h > > +++ b/drivers/pci/pci.h > > @@ -359,39 +359,12 @@ struct pci_sriov { > > static inline bool pci_dev_set_io_state(struct pci_dev *dev, > > pci_channel_state_t new) > > { > > - bool changed = false; > > - > > device_lock_assert(&dev->dev); > > - switch (new) { > > - case pci_channel_io_perm_failure: > > - switch (dev->error_state) { > > - case pci_channel_io_frozen: > > - case pci_channel_io_normal: > > - case pci_channel_io_perm_failure: > > - changed = true; > > - break; > > - } > > - break; > > - case pci_channel_io_frozen: > > - switch (dev->error_state) { > > - case pci_channel_io_frozen: > > - case pci_channel_io_normal: > > - changed = true; > > - break; > > - } > > - break; > > - case pci_channel_io_normal: > > - switch (dev->error_state) { > > - case pci_channel_io_frozen: > > - case pci_channel_io_normal: > > - changed = true; > > - break; > > - } > > - break; > > - } > > - if (changed) > > - dev->error_state = new; > > - return changed; > > + if (dev->error_state == new) > > + return false; > > + > > + dev->error_state = new; > > + return true; > > } > > IIUC this changes the behavior of the function, but it's difficult to > analyze because it does a lot of simplification at the same time. > > Please consider the following, which is intended to simplify the > function while preserving the behavior (but please verify; it's been a > long time since I looked at this). Then maybe see how your patch > could be done on top of this? > > Alternatively, come up with your own simplification patch + the > functionality change. > > > commit 983d9b1f8177 ("PCI/ERR: Simplify pci_dev_set_io_state()") > Author: Bjorn Helgaas > Date: Tue May 19 12:28:57 2020 -0500 > > PCI/ERR: Simplify pci_dev_set_io_state() > > Truth table: > > requested new state > current ------------------------------------------ > state normal frozen perm_failure > ------------ + ------------- ------------- ------------ > normal | normal frozen perm_failure > frozen | normal frozen perm_failure > perm_failure | perm_failure* perm_failure* perm_failure > > * "not changed", returns false > > No functional change intended. > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index 6d3f75867106..81408552f7c9 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -358,39 +358,21 @@ struct pci_sriov { > static inline bool pci_dev_set_io_state(struct pci_dev *dev, > pci_channel_state_t new) > { > - bool changed = false; > - > device_lock_assert(&dev->dev); > - switch (new) { > - case pci_channel_io_perm_failure: > - switch (dev->error_state) { > - case pci_channel_io_frozen: > - case pci_channel_io_normal: > - case pci_channel_io_perm_failure: > - changed = true; > - break; > - } > - break; > - case pci_channel_io_frozen: > - switch (dev->error_state) { > - case pci_channel_io_frozen: > - case pci_channel_io_normal: > - changed = true; > - break; > - } > - break; > - case pci_channel_io_normal: > - switch (dev->error_state) { > - case pci_channel_io_frozen: > - case pci_channel_io_normal: > - changed = true; > - break; > - } > - break; > + > + /* Can always put a device in perm_failure state */ > + if (new == pci_channel_io_perm_failure) { > + dev->error_state == pci_channel_io_perm_failure; > + return true; > } > - if (changed) > - dev->error_state = new; > - return changed; > + > + /* If already in perm_failure, can't set to normal or frozen */ > + if (dev->error_state == pci_channel_io_perm_failure) > + return false; This line goes to the first. > + > + /* Can always change normal to frozen or vice versa */ > + dev->error_state = new; > + return true; > } Per code grepping, there are two kinds of cases pci_dev_set_io_state() are called. 1. doesn't care about the return value, just set it to pci_chnnel_io_perm_failure. and later handling depends on dev->error_state, whatever the value changed or not. pciehp_unconfigure_device() -> pci_walk_bus() -> pci_dev_set_disconnected() ->pci_dev_set_io_state() with pci_chnnel_io_perm_failure 2. cares about the return value or the dev->error_state is changed or not. and the later handling also depends on if the value of dev->error_state is changed or not. pcie_do_recovery() ->pci_walk_bus() ->report_frozen_detected() ->report_error_detected() ->pci_dev_set_io_state() with pci_channel_io_frozen. This case, if the original value was pci_channel_io_frozen, no need to set it again and say it is *changed*. ----- return true only it is really changed. pcie_do_recovery() ->pci_walk_bus() ->report_resume() ->pcI_dev_set_ios_state() with pci_channel_io_normal. The same. pcie_do_recovery() ->pci_walk_bus() ->report_normal_detected() with pci_channel_io_normal The same. I will combat the 'Truth Table' to keep the original functions and let it return true only when it really changes to avoid re-enter the later handling process. Will send v7 for your review. Thanks, Ethan > > static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)