Message-ID: <012d24d58a542ed44c8af9f517f1bd61ab912037.camel@kernel.crashing.org>
Subject: Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]
From: Benjamin Herrenschmidt
To: Jason Gunthorpe
Cc: David Gibson, Leon Romanovsky, davem@davemloft.net,
    saeedm@mellanox.com, ogerlitz@mellanox.com, tariqt@mellanox.com,
    bhelgaas@google.com, linux-kernel@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, netdev@vger.kernel.org,
    alex.williamson@redhat.com, linux-pci@vger.kernel.org,
    linux-rdma@vger.kernel.org, sbest@redhat.com, paulus@samba.org,
    Alexey Kardashevskiy
Date: Wed, 09 Jan 2019 16:09:02 +1100
In-Reply-To: <20190108040129.GE5336@ziepe.ca>
References: <20181206041951.22413-1-david@gibson.dropbear.id.au>
    <20181206064509.GM15544@mtr-leonro.mtl.com>
    <20190104034401.GA2801@umbus.fritz.box>
    <20190105175116.GB14238@ziepe.ca>
    <20190108040129.GE5336@ziepe.ca>

On Mon, 2019-01-07 at 21:01 -0700, Jason Gunthorpe wrote:
>
> > In a very cryptic way that requires manual parsing using non-public
> > docs sadly but yes. From the look of it, it's a completion timeout.
> >
> > Looks to me like we don't get a response to a config space access
> > during the change of D state. I don't know if it's the write of the D3
> > state itself or the read back though (it's probably detected on the
> > read back or a subsequent read, but that doesn't tell me which specific
> > one failed).
>
> If it is just one card doing it (again, check you have latest
> firmware) I wonder if it is a sketchy PCI-E electrical link that is
> causing a long re-training cycle? Can you tell if the PCI-E link is
> permanently gone or does it eventually return?

No, it's 100% reproducible on systems with that specific card model,
not card instance, and maybe different systems/cards as well. I'll let
David & Alexey comment further on that.

> Does the card work in Gen 3 when it starts? Is there any indication of
> PCI-E link errors?

Nope.

> Every time or sometimes?
>
> POWER 8 firmware is good? If the link does eventually come back, is
> the POWER8's D3 resumption timeout long enough?
>
> If this doesn't lead to an obvious conclusion you'll probably need to
> connect to IBM's Mellanox support team to get more information from
> the card side.

We are IBM :-)

So far, it seems that the card is doing something not quite right,
but we don't know what. We might need to engage Mellanox themselves.

Cheers,
Ben.