Received: by 2002:a05:6a10:a0d1:0:0:0:0 with SMTP id j17csp75804pxa; Fri, 31 Jul 2020 06:56:02 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzBwjaODf4Ia6AO2TOMtruCmRNWDVQeFxj9KKXKVmw6bHaOOZeK/1FU1VqYoaMkGAq4eHpw X-Received: by 2002:a05:6402:2d7:: with SMTP id b23mr4050570edx.145.1596203762410; Fri, 31 Jul 2020 06:56:02 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1596203762; cv=none; d=google.com; s=arc-20160816; b=bbuix/z5ERjFGeHe+X5kN/yv/YDA2raLT4lureVfZuq97JTPf4nxKGa9dHA3DhZcu1 SlUsRDP6UeV35O3wT2luT7/bJuDlXCQPMLD1lRgKjO6Vdo6ou4eYuQf12BUIo8ZfoV6E W2muLYOEvnvmTUAG0myo4M7xlJ2mMiEIaiOVSdYHdyvTE7RnDIXUzST720uJYXqcai+6 bqM9kugRSQNAR71BWf8/BGzd6e7ON+MLrJIC9U0s1PPSHKvMnMOF8dVhCEui7H/o+pLG PTErqP30++qRh2LkZX0dJ0XILe9h/6chJluMyEB4dV6IGRY9ilC7lqzvB/wrdnndq9OY M+6Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:message-id:subject:cc:to:from:date:dkim-signature; bh=jLzojmXb5+iLxiWWwH5CnbkxWwVbiBJfY95fBISaAek=; b=hFJyNDo+fn6kK3KE9drkCiRf6oG4QZlpVVTs0k7TZbUetYnRKe/hpJP37GqiGInSri w5LSB1lvrIGhdE9b/mZ+w0loLGrNiswpFvBm5hcKCEopT7gSTGN28oe8ziQdgyXk5hMm UlnGgaGdtd1PrhAZ0Xf1gOkpyUVXn/ha0ZIa4cOcCgzR12e2eEW26H+Hptf994CuIwTw /rqVIxExdmj6Jt5ybAIAMSzEPbtQJlFrjisQ3Ph9LwKeoshHAjHuVopAtc2wzHzxFpwk ZJ9y56v9/UjzI2Gg+SiGUYH0hg9sZUjv4lhCUFqi2HWoX4x6KuaxARRwi1rwUHyOuBC8 NJTg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=Mmszk77K; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id bo27si5326762edb.0.2020.07.31.06.55.38; Fri, 31 Jul 2020 06:56:02 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=Mmszk77K; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731564AbgGaNz0 (ORCPT + 99 others); Fri, 31 Jul 2020 09:55:26 -0400 Received: from mail.kernel.org ([198.145.29.99]:56320 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730706AbgGaNz0 (ORCPT ); Fri, 31 Jul 2020 09:55:26 -0400 Received: from localhost (mobile-166-175-186-42.mycingular.net [166.175.186.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 2CF0C206DA; Fri, 31 Jul 2020 13:55:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1596203725; bh=vYmjK8mMYkxQN9Q5kjSg6X+4qsltfw1XYti+2tjxa/4=; h=Date:From:To:Cc:Subject:In-Reply-To:From; b=Mmszk77KtX/XuyJPvUeBF+uSMjC3wMTXVesDiMr1rqTPEibn4GlNI1dHLpdRa7Sl+ K+gM6afW6Fp5XeCSnC07IMsYqculLdCzaaTRS86Mi/gu9fmp8W75ncG97tp7g1TtvI VsUEIYwFaaR3pr2CWtVYygMUe62iVp1ky5BhBHvc= Date: Fri, 31 Jul 2020 08:55:23 -0500 From: Bjorn Helgaas To: "Saheed O. Bolarinwa" Cc: Mike Marciniszyn , Dennis Dalessandro , Doug Ledford , Jason Gunthorpe , bjorn@helgaas.com, skhan@linuxfoundation.org, linux-kernel-mentees@lists.linuxfoundation.org, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, "Michael J. Ruhl" , Ashutosh Dixit , Ian Kumlien , Puranjay Mohan Subject: Re: [PATCH v4 01/12] IB/hfi1: Check if pcie_capability_read_*() reads ~0 Message-ID: <20200731135523.GA3717@bjorn-Precision-5520> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200731110240.98326-2-refactormyself@gmail.com> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org [+cc Michael, Ashutosh, Ian, Puranjay] On Fri, Jul 31, 2020 at 01:02:29PM +0200, Saheed O. Bolarinwa wrote: > On failure pcie_capability_read_dword() sets it's last parameter, > val to 0. In this case dn and up will be 0, so aspm_hw_l1_supported() > will return false. > However, with Patch 12/12, it is possible that val is set to ~0 on > failure. This would introduce a bug because (x & x) == (~0 & x). So > with dn and up being 0x02, a true value is return when the read has > actually failed. > > Since, the value ~0 is invalid here, > > Reset dn and up to 0 when a value of ~0 is read into them, this > ensures false is returned on failure in this case. > > Suggested-by: Bjorn Helgaas > Signed-off-by: Saheed O. Bolarinwa > --- > > drivers/infiniband/hw/hfi1/aspm.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/infiniband/hw/hfi1/aspm.c b/drivers/infiniband/hw/hfi1/aspm.c > index a3c53be4072c..9605b2145d19 100644 > --- a/drivers/infiniband/hw/hfi1/aspm.c > +++ b/drivers/infiniband/hw/hfi1/aspm.c > @@ -33,13 +33,13 @@ static bool aspm_hw_l1_supported(struct hfi1_devdata *dd) > return false; > > pcie_capability_read_dword(dd->pcidev, PCI_EXP_LNKCAP, &dn); > - dn = ASPM_L1_SUPPORTED(dn); > + dn = (dn == (u32)~0) ? 0 : ASPM_L1_SUPPORTED(dn); > > pcie_capability_read_dword(parent, PCI_EXP_LNKCAP, &up); > - up = ASPM_L1_SUPPORTED(up); > + up = (up == (u32)~0) ? 0 : ASPM_L1_SUPPORTED(up); I don't want to change this. The driver shouldn't be mucking with ASPM at all. The PCI core should take care of this automatically. If it doesn't, we need to fix the core. If the driver needs to disable ASPM to work around device errata or something, the core has an interface for that. But the driver should not override the system-wide policy for managing ASPM. Ah, some archaeology finds affa48de8417 ("staging/rdma/hfi1: Add support for enabling/disabling PCIe ASPM"), which says: hfi1 HW has a high PCIe ASPM L1 exit latency and also advertises an acceptable latency less than actual ASPM latencies. That suggests that either there is a device defect, e.g., advertising incorrect ASPM latencies, or a PCI core defect, e.g., incorrectly enabling ASPM when the path exit latency exceeds that hfi1 can tolerate. Coincidentally, Ian recently debugged a problem in how the PCI core computes exit latencies over a path [1]. Can anybody supply details about the hfi1 ASPM parameters, e.g., the output of "sudo lspci -vv"? Any details about the configuration where the problem occurs? Is there a switch in the path? [1] https://lore.kernel.org/r/20200727213045.2117855-1-ian.kumlien@gmail.com > /* ASPM works on A-step but is reported as not supported */ > - return (!!dn || is_ax(dd)) && !!up; > + return (dn || is_ax(dd)) && up; > } > > /* Set L1 entrance latency for slower entry to L1 */ > -- > 2.18.4 >