Subject: Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]
From: Benjamin Herrenschmidt
To: Jason Gunthorpe, David Gibson
Cc: Leon Romanovsky, davem@davemloft.net, saeedm@mellanox.com, ogerlitz@mellanox.com, tariqt@mellanox.com, bhelgaas@google.com, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, netdev@vger.kernel.org, alex.williamson@redhat.com, linux-pci@vger.kernel.org, linux-rdma@vger.kernel.org, sbest@redhat.com, paulus@samba.org
Date: Sun, 06 Jan 2019 09:43:46 +1100
In-Reply-To: <20190105175116.GB14238@ziepe.ca>
References: <20181206041951.22413-1-david@gibson.dropbear.id.au> <20181206064509.GM15544@mtr-leonro.mtl.com> <20190104034401.GA2801@umbus.fritz.box> <20190105175116.GB14238@ziepe.ca>
X-Mailing-List:
 linux-kernel@vger.kernel.org

On Sat, 2019-01-05 at 10:51 -0700, Jason Gunthorpe wrote:
>
> > Interesting. I've investigated this further, though I don't have as
> > many new clues as I'd like. The problem occurs reliably, at least on
> > one particular type of machine (a POWER8 "Garrison" with ConnectX-4).
> > I don't yet know if it occurs with other machines, I'm having trouble
> > getting access to other machines with a suitable card. I didn't
> > manage to reproduce it on a different POWER8 machine with a
> > ConnectX-5, but I don't know if it's the difference in machine or
> > difference in card revision that's important.
>
> Make sure the card has the latest firmware is always good advice..
>
> > So possibilities that occur to me:
> >   * It's something specific about how the vfio-pci driver uses D3
> >     state - have you tried rebinding your device to vfio-pci?
> >   * It's something specific about POWER, either the kernel or the PCI
> >     bridge hardware
> >   * It's something specific about this particular type of machine
>
> Does the EEH indicate what happened to actually trigger it?

Yes, but sadly in a very cryptic way that requires manual parsing using
non-public docs. From the look of it, it's a completion timeout.

Looks to me like we don't get a response to a config space access
during the change of D state. I don't know if it's the write of the D3
state itself or the read back though (it's probably detected on the
read back or a subsequent read, but that doesn't tell me which specific
one failed).

Some extra logging in OPAL might help pin that down by checking the InA
error state in the config accessor after the config write (and polling
on it for a while, as from a CPU perspective I don't know if the write
is synchronous, probably not).

Cheers,
Ben.
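
For illustration only, the logging idea above could look roughly like the sketch below. It is a standalone C simulation, not OPAL code: `struct cfg_accessor`, `error_pending()`, `set_d3_and_classify()` and the fake device are all hypothetical names made up for this example, and the error latch merely stands in for whatever error state the real config accessor exposes. Only the PMCSR offset and power-state field values come from the PCI power management definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PMCSR layout, same values as Linux's pci_regs.h. */
#define PCI_PM_CTRL            0x04   /* PMCSR offset inside the PM capability */
#define PCI_PM_CTRL_STATE_MASK 0x0003 /* power state field */
#define PCI_D3HOT              0x3

enum cfg_fail { CFG_OK, CFG_FAIL_BEFORE_WRITE, CFG_FAIL_AFTER_WRITE,
                CFG_FAIL_AFTER_READBACK };

/* HYPOTHETICAL accessor; real firmware will look different. */
struct cfg_accessor {
    void     (*write16)(void *ctx, uint16_t off, uint16_t val);
    uint16_t (*read16)(void *ctx, uint16_t off);
    bool     (*error_pending)(void *ctx); /* hypothetical error latch */
    void     *ctx;
};

/* Move a function to D3hot and report which access the error follows. */
enum cfg_fail set_d3_and_classify(const struct cfg_accessor *acc,
                                  uint16_t pm_cap, unsigned max_polls)
{
    assert(acc && acc->write16 && acc->read16 && acc->error_pending);

    uint16_t pmcsr = acc->read16(acc->ctx, pm_cap + PCI_PM_CTRL);
    if (acc->error_pending(acc->ctx))
        return CFG_FAIL_BEFORE_WRITE;

    pmcsr = (uint16_t)((pmcsr & ~PCI_PM_CTRL_STATE_MASK) | PCI_D3HOT);
    acc->write16(acc->ctx, pm_cap + PCI_PM_CTRL, pmcsr);

    /* The write may not be synchronous from the CPU's point of view, so
     * poll the latch for a while. Real code would delay between polls. */
    for (unsigned i = 0; i < max_polls; i++)
        if (acc->error_pending(acc->ctx))
            return CFG_FAIL_AFTER_WRITE;

    (void)acc->read16(acc->ctx, pm_cap + PCI_PM_CTRL); /* read back */
    if (acc->error_pending(acc->ctx))
        return CFG_FAIL_AFTER_READBACK;
    return CFG_OK;
}

/* ---- Fake device so the sketch can run without hardware ---- */
enum fake_mode { FAKE_HEALTHY, FAKE_WRITE_TIMEOUT, FAKE_READ_TIMEOUT };

struct fake_dev {
    enum fake_mode mode;
    uint16_t pmcsr;
    unsigned err_delay;  /* polls before a failed write becomes visible */
    bool write_failed;
    bool err;
};

static void fake_write16(void *ctx, uint16_t off, uint16_t val)
{
    struct fake_dev *d = ctx;
    (void)off;
    if (d->mode == FAKE_WRITE_TIMEOUT)
        d->write_failed = true;  /* error is latched later, not now */
    else
        d->pmcsr = val;
}

static uint16_t fake_read16(void *ctx, uint16_t off)
{
    struct fake_dev *d = ctx;
    (void)off;
    if (d->mode == FAKE_READ_TIMEOUT &&
        (d->pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D3HOT) {
        d->err = true;           /* read in D3 never completes */
        return 0xffff;
    }
    return d->pmcsr;
}

static bool fake_error_pending(void *ctx)
{
    struct fake_dev *d = ctx;
    if (d->write_failed) {
        if (d->err_delay == 0)
            d->err = true;
        else
            d->err_delay--;
    }
    return d->err;
}

/* Run one simulated D3 transition against the fake device. */
enum cfg_fail run_fake(enum fake_mode mode, unsigned err_delay,
                       unsigned max_polls)
{
    struct fake_dev dev = { .mode = mode, .err_delay = err_delay };
    struct cfg_accessor acc = { fake_write16, fake_read16,
                                fake_error_pending, &dev };
    return set_d3_and_classify(&acc, 0x40 /* arbitrary cap offset */,
                               max_polls);
}
```

In this simulation, `run_fake(FAKE_WRITE_TIMEOUT, 3, 8)` attributes the failure to the write because the latch is set within the polling window, while `run_fake(FAKE_READ_TIMEOUT, 0, 8)` attributes it to the read back. The polling window is the key design choice: if it is shorter than the time a failed write takes to surface, the classification is no longer reliable.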