From: Casey Leedom
To: Bjorn Helgaas, Ding Tianhong
CC: "ashok.raj@intel.com", "bhelgaas@google.com", Michael Werner,
 Ganesh GR, "asit.k.mallick@intel.com", "patrick.j.cramer@intel.com",
 "Suravee.Suthikulpanit@amd.com", "Bob.Shaw@amd.com",
 "l.stach@pengutronix.de", "amira@mellanox.com",
 "gabriele.paoloni@huawei.com", "David.Laight@aculab.com",
 "jeffrey.t.kirsher@intel.com", "catalin.marinas@arm.com",
 "will.deacon@arm.com", "mark.rutland@arm.com", "robin.murphy@arm.com",
 "davem@davemloft.net", "alexander.duyck@gmail.com",
 "linux-arm-kernel@lists.infradead.org", "netdev@vger.kernel.org",
 "linux-pci@vger.kernel.org", "linux-kernel@vger.kernel.org",
 "linuxarm@huawei.com"
Subject: Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported
Date: Wed, 9 Aug 2017 12:33:34 +0000
References: <1501917313-9812-1-git-send-email-dingtianhong@huawei.com>
 <1501917313-9812-3-git-send-email-dingtianhong@huawei.com>
 <20170809022239.GP16580@bhelgaas-glaptop.roam.corp.google.com>
In-Reply-To: <20170809022239.GP16580@bhelgaas-glaptop.roam.corp.google.com>
Content-Type: text/plain; charset="iso-8859-1"
MIME-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List:
 linux-kernel@vger.kernel.org

| From: Bjorn Helgaas
| Sent: Tuesday, August 8, 2017 7:22 PM
| ...
| and the caller should do something like this:
|
|   if (pcie_relaxed_ordering_broken(pci_find_pcie_root_port(pdev)))
|           adapter->flags |= ROOT_NO_RELAXED_ORDERING;
|
| That way it's obvious where the issue is, and it's obvious that the
| answer might be different for peer-to-peer transactions than it is for
| transactions to the root port, i.e., to coherent memory.
| ...

  Which is back to something very close to what I initially suggested in
my first patch submission.  Because you're right, this isn't about broken
Source Devices, it's about broken Completing Devices.

  Unfortunately, as Alexander Duyck noted, in a Virtual Machine we won't
be able to see our Root Port in order to make this determination.  (And in
some Hypervisor implementations I've seen, there's not even a synthetic
Root Port available to the VM at all, let alone read-only access to the
real one.)

  So the current scheme was developed: the Hypervisor kernel traverses
down the PCIe Fabric when it finds a broken Root Port implementation (the
issue we've mostly been focused upon) and turns off the PCIe Capability
Device Control[Relaxed Ordering Enable].  This was to serve two purposes:

 1. Turn that off in order to prevent sending any Transaction Layer
    Packets with the Relaxed Ordering Attribute to any Completer, which
    unfortunately would also prevent Peer-to-Peer use of the Relaxed
    Ordering Attribute.

 2. Act as a message to Device Drivers for those downstream devices that
    they were dealing with a broken Root Port implementation.  And this
    would work even for a driver in a VM with an attached device, since
    it would be able to see the PCIe Configuration Space for the
    attached device.

  I haven't been excited about any of this because:

 A. While so far all of the examples we've talked about are broken Root
    Port Completers, it's perfectly possible that other devices could be
    broken -- say an NVMe Device which is not "Coherent Memory".  How
    would this fit into the framework APIs being described?

 B. I have yet to see an example of how the currently proposed API
    framework would be used in a hybrid environment where TLPs to the
    Root Port would not use Relaxed Ordering, but TLPs to a Peer would
    use Relaxed Ordering.  So far it's all been about using a "big
    hammer" to completely disable the use of Relaxed Ordering.

  But the VM problem keeps cropping up over and over.  A driver in a VM
doesn't have access to the Root Port to determine if it's on a "Black
List", and our only way of communicating with the driver in the VM is to
leave the device in a particular state (i.e. initialize the PCIe
Capability Device Control[Relaxed Ordering Enable] to "off").

  Oh, and also, on the current patch submission's focus on broken Root
Port implementations: one could suggest that even if we're stuck with the
"Device attached to a VM Conundrum", what we should really be thinking
about is whether ANY device within a PCIe Fabric has Relaxed Ordering
completion problems, and, if so, "poisoning" the entire containing PCIe
Fabric by turning off Relaxed Ordering Enable for every device, up, down,
sideways -- including the Root Port itself.

| ...
| This associates the message with the Requester that may potentially
| use relaxed ordering.  But there's nothing wrong or unusual about the
| Requester; the issue is with the *Completer*, so I think the message
| should be in the quirk where we set PCI_DEV_FLAGS_NO_RELAXED_ORDERING.
| Maybe it should be both places; I dunno.
|
| This implementation assumes the device only initiates transactions to
| coherent memory, i.e., it assumes the device never does peer-to-peer
| DMA.  I guess we'll have to wait and see if we trip over any
| peer-to-peer issues, then figure out how to handle them.
| ...

  Yes, as soon as we want to implement the hybrid use of Relaxed Ordering
I mentioned above, which the Intel document itself mentions.

Casey
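
  To make the two roles of the Relaxed Ordering Enable bit concrete, here
is a minimal user-space sketch.  It is a simulation only, not kernel code
and not the proposed patch: struct sim_dev and every sim_* function are
hypothetical names invented for illustration.  The only value taken from
the kernel is the bit itself (0x0010, the same value as
PCI_EXP_DEVCTL_RELAX_EN).  It models the host clearing the bit below a
broken Root Port, a driver that can only look at its own Configuration
Space, and the per-Completer decision a hybrid scheme would need.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same bit value as PCI_EXP_DEVCTL_RELAX_EN in the kernel's pci_regs.h. */
#define DEVCTL_RELAX_EN 0x0010

/* Hypothetical stand-in for a PCIe function; not struct pci_dev. */
struct sim_dev {
	uint16_t devctl;          /* simulated Device Control register */
	bool ro_broken;           /* Completer mishandles Relaxed Ordering */
	struct sim_dev *child;    /* first device below this one */
	struct sim_dev *sibling;  /* next device at the same level */
};

/* Host side: clear Relaxed Ordering Enable on everything below dev. */
static void sim_clear_ro_below(struct sim_dev *dev)
{
	struct sim_dev *c;

	for (c = dev->child; c; c = c->sibling) {
		c->devctl &= (uint16_t)~DEVCTL_RELAX_EN;
		sim_clear_ro_below(c);
	}
}

/* Host side: the "big hammer", applied only under a broken Root Port. */
static void sim_quirk_broken_root(struct sim_dev *root)
{
	if (root->ro_broken)
		sim_clear_ro_below(root);
}

/*
 * Driver side (possibly in a VM): the driver sees only its own
 * Configuration Space, so the enable bit is the whole message.
 */
static bool sim_driver_may_use_ro(const struct sim_dev *self)
{
	return (self->devctl & DEVCTL_RELAX_EN) != 0;
}

/*
 * Hybrid decision: depends on the Completer of each TLP, which is
 * exactly the information a driver in a VM cannot obtain.
 */
static bool sim_tlp_may_use_ro(const struct sim_dev *completer)
{
	return !completer->ro_broken;
}
```

  With a broken Root Port above a NIC and a healthy Peer,
sim_quirk_broken_root() leaves sim_driver_may_use_ro() false for the NIC
even though sim_tlp_may_use_ro() is true for the Peer, which is the
Peer-to-Peer cost described in item 1 above.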