From: "Nath, Arindam"
To: "'Joerg Roedel'", "Deucher, Alexander"
CC: "'Joerg Roedel'", Bjorn Helgaas, "linux-pci@vger.kernel.org", "linux-kernel@vger.kernel.org", Daniel Drake
Subject: RE: [PATCH] PCI: Blacklist AMD Stoney GPU devices for ATS
Date: Wed, 29 Mar 2017 07:15:42 +0000
In-Reply-To: <20170328222627.GS8329@suse.de>

>-----Original Message-----
>From: 'Joerg Roedel' [mailto:jroedel@suse.de]
>Sent: Wednesday, March 29, 2017 3:56 AM
>To: Deucher, Alexander
>Cc: 'Joerg Roedel'; Bjorn Helgaas; linux-pci@vger.kernel.org;
>linux-kernel@vger.kernel.org; Daniel Drake; Nath, Arindam
>Subject: Re: [PATCH] PCI: Blacklist AMD Stoney GPU devices for ATS
>
>On Tue, Mar 28, 2017 at 09:13:23PM +0000, Deucher, Alexander wrote:
>> If I understand Arindam's patch correctly, it only flushes TLB entries
>> for domains in the flush queue whereas the previous behavior was to
>> flush all domains. If there was no TLB flush in the queue for that
>> domain, could flushing it cause a problem?
>
>No, that can't cause a problem. An io/tlb flush for the device is just a
>message that the device should invalidate its own tlb. The device can't
>know and doesn't need to know whether the page-tables it used to fill
>the tlb really changed.

Joerg, as per my limited understanding of ATS, the ATC will respond to
an invalidation request only after making sure there are no in-flight
DMA transactions using the address the IOMMU has asked it to invalidate.
Since the IOMMU was sending invalidate commands to the GPU even when
there was no explicit page unmapping request from the graphics
subsystem, we _might_ end up in a situation where the ATC takes longer
than the invalidation timeout to respond to the IOMMU.

With the patch I provided, only those domains that have actually
requested unmapping of pages are added to the flush queue, so we send
TLB invalidation commands only to those specific domains. This avoids
sending an invalidation command to the GPU ATC on every single flush.

I do agree that Stoney might have some issue which prevents it from
completing the invalidation command in time, since we are not observing
the issue on CZ and other ASICs.

Thanks,
Arindam

>
>As it looks, the problem we are seeing here is that we are sending a
>large amount of these requests to the GPU device, and wait for its
>completion every time. This shouldn't be a problem for ATS devices, but
>the GPU here seems to fail at some point and doesn't answer the
>invalidation request anymore, causing the completion-wait loop timeouts.
>
>Arindam's patch makes the high flush-frequency less likely, but it can
>still happen, depending on how the GPU is used. So it's best to
>keep ATS disabled on the device as it doesn't work correctly and we risk
>running into the same problem again when we leave it enabled and just
>make the trigger less likely.
>
>
>	Joerg
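
To make the difference between the two flushing strategies discussed in this thread concrete, here is a small toy model in Python. It is only a sketch under stated assumptions and not the kernel's AMD IOMMU code: `Domain`, `FlushQueue`, `flush_all_domains`, `flush_queued_domains` and `simulate` are hypothetical names, and the workload (a NIC unmapping on every round, the GPU on every tenth round) is invented purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Hypothetical stand-in for an IOMMU protection domain."""
    name: str
    invalidations_received: int = 0  # device TLB (ATC) invalidations sent


@dataclass
class FlushQueue:
    """Toy flush queue: remembers which domains queued an unmap."""
    pending: list = field(default_factory=list)

    def queue_unmap(self, domain: Domain) -> None:
        self.pending.append(domain)


def flush_all_domains(queue: FlushQueue, all_domains: list) -> None:
    """Earlier behaviour as described in the thread: flush every domain."""
    for dom in all_domains:
        dom.invalidations_received += 1
    queue.pending.clear()


def flush_queued_domains(queue: FlushQueue) -> None:
    """Patched behaviour as described: flush only domains in the queue."""
    seen = set()
    for dom in queue.pending:
        if id(dom) not in seen:  # one invalidation per domain per flush
            seen.add(id(dom))
            dom.invalidations_received += 1
    queue.pending.clear()


def simulate(flush_queued_only: bool, rounds: int = 100) -> int:
    """Return how many invalidations the GPU domain receives."""
    gpu = Domain("gpu")
    nic = Domain("nic")
    queue = FlushQueue()
    for i in range(rounds):
        queue.queue_unmap(nic)      # assumed: NIC unmaps on every round
        if i % 10 == 0:
            queue.queue_unmap(gpu)  # assumed: GPU unmaps on every 10th round
        if flush_queued_only:
            flush_queued_domains(queue)
        else:
            flush_all_domains(queue, [gpu, nic])
    return gpu.invalidations_received
```

In this model the GPU receives one invalidation per flush under the old scheme and only one per flush in which it actually queued an unmap under the patched scheme. The number of GPU invalidations still scales with how often the GPU unmaps pages, which is consistent with the point above that the patch makes the trigger less likely without removing it.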