Date: Tue, 23 Jun 2015 16:06:31 +0200
From: Joerg Roedel
To: David Woodhouse
Cc: Joerg Roedel, iommu@lists.linux-foundation.org, zhen-hual@hp.com,
    bhe@redhat.com, ddutile@redhat.com, alex.williamson@redhat.com,
    dyoung@redhat.com, linux-kernel@vger.kernel.org, jroedel@8bytes.org
Subject: Re: [PATCH 00/19] Fix Intel IOMMU breakage in kdump kernel
Message-ID: <20150623140631.GB2724@suse.de>
References: <1434178047-17809-1-git-send-email-joro@8bytes.org>
    <1435066290.12045.2.camel@infradead.org>
In-Reply-To: <1435066290.12045.2.camel@infradead.org>

On Tue, Jun 23, 2015 at 02:31:30PM +0100, David Woodhouse wrote:
> However, it's still fairly gratuitous for all non-broken hardware, and
> will tend to hide hardware and driver bugs during testing of new
> hardware.
>
> I'd much rather see this limited to a blacklist of known-broken
> devices, an accompanied by a kernel message along the lines of
>
> 'Preserving VT-d page tables for broken HP device xxxx:xxxx'
>
> For *any* device which isn't so broken that it craps itself on taking
> a DMA fault and cannot be reset, this page table copy shouldn't be
> needed, right?

In theory yes, but as it came to my mind recently, there is this BIOS
"value-add" called APEI (ACPI Platform Error Interface) which has a
'Firmware first' mode.
So when this is active the firmware handles any errors happening in the
system and reports them to the OS with a severity it can decide on its
own. Such errors could be DMA target aborts, for example. And I have
seen systems where at least rejected interrupt requests were reported
to the OS as fatal errors, causing a kernel panic in Linux. But the
firmware is also free to report ordinary DMA failures as fatal errors,
who knows...

So while you are right that these changes might hide hardware and
driver bugs, I think it is still best to try to avoid such faults at
all costs in the kdump kernel to actually get a dump, even if the
device would actually be able to recover from the master abort.

	Joerg