Date: Tue, 28 Mar 2017 22:28:44 +0200
From: Joerg Roedel
To: "Deucher, Alexander"
Cc: "'Joerg Roedel'", Bjorn Helgaas, "linux-pci@vger.kernel.org", "linux-kernel@vger.kernel.org", Daniel Drake, "Nath, Arindam"
Subject: Re: [PATCH] PCI: Blacklist AMD Stoney GPU devices for ATS
Message-ID: <20170328202844.GQ8329@suse.de>
References: <1490703404-4944-1-git-send-email-joro@8bytes.org>

On Tue, Mar 28, 2017 at 08:18:26PM +0000, Deucher, Alexander wrote:
> > -----Original Message-----
> > From: Joerg Roedel [mailto:joro@8bytes.org]
> > Sent: Tuesday, March 28, 2017 8:17 AM
> > To: Bjorn Helgaas
> > Cc: linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Joerg Roedel;
> > Daniel Drake; Deucher, Alexander
> > Subject: [PATCH] PCI: Blacklist AMD Stoney GPU devices for ATS
> >
> > From: Joerg Roedel
> >
> > ATS is broken on these devices. Under invalidation load, the
> > GPU does not reply to invalidations anymore, causing
> > Completion-wait loop timeouts on the AMD IOMMU driver side.
> > Fix it by not enabling ATS on these devices.
> >
> > Note that below mentioned commit is not broken, it just
> > triggers the issue because it might cause invalidation
> > storms on devices.
> >
> > Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
> > Reported-by: Daniel Drake
> > Cc: Daniel Drake
> > Cc: Alexander Deucher
> > Signed-off-by: Joerg Roedel
>
> Did you see Arindam's patch from yesterday[1]? Not sure which is the proper fix, maybe both?

Arindam's patch makes sense on its own, but not as a fix for this issue.
It lowers the invalidation load on the GPU, but there are still ways to
trigger a high invalidation rate on the device. So it might hide the
issue, but not fix it.

We need to disable ATS on the device if it doesn't work reliably.

	Joerg