From: Arjun Vynipadath
To: bhelgaas@google.com, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, davem@davemloft.net, netdev@vger.kernel.org
Cc: arjun@chelsio.com, leedom@chelsio.com, santosh@chelsio.com, ganeshgr@chelsio.com, nirranjan@chelsio.com, kumaras@chelsio.com, swise@opengridcomputing.com
Subject: [REGRESSION, bisect] pci: cxgb4 probe fails after commit 104daa71b3961434 ("PCI: Determine actual VPD size on first access")
Date: Tue, 23 Jan 2018 17:59:09 +0530
Message-Id: <1516710549-26660-1-git-send-email-arjun@chelsio.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Sending on behalf of "Casey Leedom"

Way back on April 11, 2016 we reported a regression in Linux kernel 4.6-rc2 brought on by kernel.org commit 104daa71b396.
This commit calculates the size of a PCI Device's VPD area by parsing the VPD Structure at offset 0x000, and restricts accesses to the VPD to that computed size. Our devices have a second VPD structure, located starting at offset 0x400, which is the "real" VPD[1]. The 104daa71b396 commit (plus a follow-on commit 408641e93aa5) caused efforts to read past the end of that computed length of the VPD to return silently without error, leaving stack junk in the VPD read buffers.

We introduced kernel.org commit cb92148b to allow a driver to tell the kernel how large the VPD area really is, introducing a new API pci_set_vpd_size() for this purpose.

Now we've discovered a new subtlety to the problem. We have a KVM Hypervisor running a 4.9.70 kernel, so it has all of the above commits. When we attach our Physical Function 4 to a Virtual Machine and attempt to run cxgb4 in that VM, we see the problem again.

The issue is that all of the VM Guest OS's efforts to access the PCIe VPD Capability are trapped into the KVM 4.9.70 kernel and executed there, with the results routed back to the VM Guest OS. The cxgb4 driver in the VM Guest OS uses the new pci_set_vpd_size() to notify the OS of the true size of the VPD, but that information of course is never sent to the KVM 4.9.70 Hypervisor. (And, truth be told, if the Guest OS were older than 4.6, it wouldn't even know that it needed to do this.) The result is that again we get silent VPD read failures with random stack garbage in the VPD read buffers. (sigh)

It strikes me that the only way to handle this issue is to have KVM circumvent the VPD-Size Restricted logic which was added in kernel.org commits 104daa71b396 and 408641e93aa5. Maybe via a __pci_read_vpd() or similar API. But we are open to other suggestions.

Thoughts?

Casey.

[1] Chelsio adapters actually have two VPD structures stored in the VPD: an abbreviated one at Offset 0x0 and the complete VPD at Offset 0x400.
The abbreviated one only contains the PN, SN and EC Keywords, while the complete VPD contains those plus various adapter constants contained in V0, V1, etc. It also contains the Base Ethernet MAC Address in the "NA" Keyword, which the cxgb4 driver needs when it can't contact the adapter firmware. (We don't have the "NA" Keyword in the VPD Structure at Offset 0x000 because that's not an allowed VPD Keyword in the PCI-E 3.0 specification.)

Note that two other drivers look like they may also do something similar: the Broadcom bnx2x and tg3.
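
For anyone who wants the failure mode spelled out, the size restriction described above can be modelled in user space roughly as follows. This is only a minimal sketch under stated assumptions, not the kernel's code: vpd_size_sketch(), vpd_read_restricted(), vpd_demo() and the example bytes are all invented for illustration. Only the tag encoding (large tags with a 2-byte little-endian length, small End tag 0x78) follows the PCI VPD format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VPD_LARGE_FLAG 0x80u /* bit 7 set: large resource data type */
#define VPD_END_TAG    0x78u /* small resource End tag, zero length */

/*
 * Walk the resource tags starting at offset 0 and return the number of
 * bytes up to and including the End tag, or 0 if no End tag is found
 * within max bytes. Hypothetical helper modelling the idea only.
 */
static size_t vpd_size_sketch(const uint8_t *buf, size_t max)
{
	size_t off = 0;

	assert(buf != NULL);
	while (off < max) {
		uint8_t tag = buf[off];

		if (tag & VPD_LARGE_FLAG) {
			/* Large tag: 1 tag byte + 2-byte little-endian length. */
			if (max - off < 3)
				return 0;
			off += 3 + ((size_t)buf[off + 1] |
				    ((size_t)buf[off + 2] << 8));
		} else {
			/* Small tag: the length lives in the low 3 bits. */
			off += 1 + (tag & 0x07u);
			if (tag == VPD_END_TAG)
				return off <= max ? off : 0;
		}
	}
	return 0;
}

/*
 * Hypothetical size-restricted read: copies at most the bytes that lie
 * inside [0, size) and returns how many were copied. A request that
 * starts at or past size copies nothing and leaves out untouched.
 */
static size_t vpd_read_restricted(const uint8_t *vpd, size_t size,
				  size_t pos, uint8_t *out, size_t count)
{
	if (pos >= size)
		return 0;
	if (count > size - pos)
		count = size - pos;
	memcpy(out, vpd + pos, count);
	return count;
}

/*
 * Build a made-up 2 KiB image with a short structure at 0x000 and a
 * second one at 0x400, then try to read the second one through the
 * restricted path. Returns the number of bytes actually read.
 */
static size_t vpd_demo(void)
{
	/* Identifier String "ab", empty VPD-R, End tag (invented bytes). */
	static const uint8_t first[] = { 0x82, 0x02, 0x00, 'a', 'b',
					 0x90, 0x00, 0x00, 0x78 };
	static uint8_t image[2048];
	/* 0xAA stands in for whatever stale data the caller's buffer held. */
	uint8_t out[4] = { 0xAA, 0xAA, 0xAA, 0xAA };
	size_t size;

	memset(image, 0xFF, sizeof(image));
	memcpy(image, first, sizeof(first));
	memcpy(image + 0x400, first, sizeof(first));

	/* Only the structure at 0x000 is walked, so size stops there. */
	size = vpd_size_sketch(image, sizeof(image));
	return vpd_read_restricted(image, size, 0x400, out, sizeof(out));
}
```

In this model the size is computed from the structure at 0x000 alone, so the read at 0x400 copies nothing and the caller's buffer keeps its old contents. A caller that does not check the returned count would then treat those stale bytes as VPD data.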