Date: Mon, 7 Jan 2019 21:01:29 -0700
From: Jason Gunthorpe
To: Benjamin Herrenschmidt
Cc: David Gibson, Leon Romanovsky, davem@davemloft.net,
    saeedm@mellanox.com, ogerlitz@mellanox.com, tariqt@mellanox.com,
    bhelgaas@google.com, linux-kernel@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, netdev@vger.kernel.org,
    alex.williamson@redhat.com, linux-pci@vger.kernel.org,
    linux-rdma@vger.kernel.org, sbest@redhat.com, paulus@samba.org
Subject: Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]
Message-ID: <20190108040129.GE5336@ziepe.ca>
References: <20181206041951.22413-1-david@gibson.dropbear.id.au>
    <20181206064509.GM15544@mtr-leonro.mtl.com>
    <20190104034401.GA2801@umbus.fritz.box>
    <20190105175116.GB14238@ziepe.ca>

On Sun, Jan 06, 2019 at 09:43:46AM +1100, Benjamin Herrenschmidt wrote:
> On Sat, 2019-01-05 at 10:51 -0700, Jason Gunthorpe wrote:
> >
> > > Interesting. I've investigated this further, though I don't have as
> > > many new clues as I'd like. The problem occurs reliably, at least on
> > > one particular type of machine (a POWER8 "Garrison" with ConnectX-4).
> > > I don't yet know if it occurs with other machines, I'm having trouble
> > > getting access to other machines with a suitable card. I didn't
> > > manage to reproduce it on a different POWER8 machine with a
> > > ConnectX-5, but I don't know if it's the difference in machine or
> > > difference in card revision that's important.
> >
> > Make sure the card has the latest firmware is always good advice..
> >
> > > So possibilities that occur to me:
> > > * It's something specific about how the vfio-pci driver uses D3
> > >   state - have you tried rebinding your device to vfio-pci?
> > > * It's something specific about POWER, either the kernel or the PCI
> > >   bridge hardware
> > > * It's something specific about this particular type of machine
> >
> > Does the EEH indicate what happened to actually trigger it?
>
> In a very cryptic way that requires manual parsing using non-public
> docs sadly but yes. From the look of it, it's a completion timeout.
>
> Looks to me like we don't get a response to a config space access
> during the change of D state. I don't know if it's the write of the D3
> state itself or the read back though (it's probably detected on the
> read back or a subsequent read, but that doesn't tell me which specific
> one failed).

If it is just one card doing it (again, check you have the latest
firmware), I wonder if a sketchy PCI-E electrical link is causing a
long re-training cycle? Can you tell if the PCI-E link is permanently
gone, or does it eventually return?

Does the card work in Gen 3 when it starts? Is there any indication of
PCI-E link errors?

Does it happen every time or only sometimes?

Is the POWER8 firmware good? If the link does eventually come back, is
the POWER8's D3 resumption timeout long enough?
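For the Gen 3 question, a rough sketch like the one below could compare the
negotiated link against what the card advertises. It assumes a kernel that
exposes current_link_speed, max_link_speed, current_link_width and
max_link_width in sysfs for the device; the function names and the example
address are made up for illustration. It only catches a link that trained
down to a lower speed or width. It does not show link error counters, and it
won't say anything useful while the device is inaccessible.

```python
import sys
from pathlib import Path

# Standard sysfs attribute names; availability depends on the kernel version.
LINK_ATTRS = ("current_link_speed", "max_link_speed",
              "current_link_width", "max_link_width")


def read_link_status(device_dir):
    """Return the raw link attributes found under one PCI device directory."""
    base = Path(device_dir)
    status = {}
    for name in LINK_ATTRS:
        path = base / name
        status[name] = path.read_text().strip() if path.is_file() else None
    return status


def leading_number(text):
    """Parse '8 GT/s', '8.0 GT/s PCIe' or '16' to a float, else None."""
    try:
        return float(text.split()[0])
    except (AttributeError, IndexError, ValueError):
        return None


def link_degraded(status):
    """True if the link trained below its maximum, None if undetermined."""
    values = {name: leading_number(status.get(name)) for name in LINK_ATTRS}
    if any(v is None for v in values.values()):
        return None
    return (values["current_link_speed"] < values["max_link_speed"]
            or values["current_link_width"] < values["max_link_width"])


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: pass the device address, e.g. 0000:01:00.0
    status = read_link_status(Path("/sys/bus/pci/devices") / sys.argv[1])
    print(status)
    print("degraded:", link_degraded(status))
```

Running it once after boot and again after a D3 round trip would show whether
the link came back at a lower speed or width.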
If this doesn't lead to an obvious conclusion, you'll probably need to
connect to IBM's Mellanox support team to get more information from the
card side.

Jason