Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S933829AbaDJPhz (ORCPT ); Thu, 10 Apr 2014 11:37:55 -0400
Received: from mga09.intel.com ([134.134.136.24]:32470 "EHLO mga09.intel.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1030317AbaDJPhu (ORCPT ); Thu, 10 Apr 2014 11:37:50 -0400
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="4.97,834,1389772800"; d="p7s'?scan'208";a="518570538"
From: "Woodhouse, David"
To: "Elliott@hp.com", "bhelgaas@google.com"
CC: "linux-kernel@vger.kernel.org", "bhe@redhat.com",
        "jiang.liu@linux.intel.com", "joro@8bytes.org",
        "linux-scsi@vger.kernel.org", "iommu@lists.linux-foundation.org",
        "James.Bottomley@hansenpartnership.com", "linux-pci@vger.kernel.org",
        "scameron@beardog.cce.hp.com", "davidlohr@hp.com"
Subject: Re: hpsa driver bug crack kernel down!
Thread-Topic: hpsa driver bug crack kernel down!
Thread-Index: AQHPVI0Vc4dwKY4180e5mx3ldqxwrpsK98jq///zBgA=
Date: Thu, 10 Apr 2014 15:34:26 +0000
Message-ID: <1397144058.1308.22.camel@i7.infradead.org>
References: <20140409023935.GE11839@dhcp-16-105.nay.redhat.com>
        <1397083799.2608.20.camel@buesod1.americas.hpqcorp.net>
        <1397084904.9519.62.camel@dabdike>
        <1397085044.9519.63.camel@dabdike>
        <1397086817.2608.25.camel@buesod1.americas.hpqcorp.net>
        <1397087425.9519.67.camel@dabdike>
        <1397089180.2608.27.camel@buesod1.americas.hpqcorp.net>
        <1397111557.2608.29.camel@buesod1.americas.hpqcorp.net>
        <20140410071535.GX13491@8bytes.org>
        <1397119587.19944.14.camel@shinybook.infradead.org>
In-Reply-To:
Accept-Language: en-GB, en-US
Content-Language: en-US
X-MS-Has-Attach: yes
X-MS-TNEF-Correlator:
x-originating-ip: [10.252.120.181]
Content-Type: multipart/signed; micalg=sha-1;
        protocol="application/x-pkcs7-signature";
        boundary="=-TLy+xqEq5eSJQS6eGO4U"
MIME-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

--=-TLy+xqEq5eSJQS6eGO4U
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Thu, 2014-04-10 at 09:14 -0600, Bjorn Helgaas wrote:
> > Thus, my first guess would be that we are quite happily setting up the
> > requested DMA maps on the *wrong* IOMMU, and then taking faults when the
> > device actually tries to do DMA.
> >
> I like the "wrong IOMMU (or no IOMMU at all)" theory. If we didn't
> connect the device with an IOMMU at all, that would explain the device
> DMAing directly to a physical address, wouldn't it?

An unlikely failure mode. We're much more likely to see *wrong* IOMMU
than no IOMMU. And thus we'd still see the distinctive virtual addresses
just below 4GiB.

However, Rob's answer may solve that puzzle. If this is one of those
abominations where the device continues to do DMA to system memory even
after the OS is up and running and *thinks* it has control of the
hardware, then the offending address will be listed in an RMRR entry
(which tells the OS to set up a 1:1 mapping for access to certain memory
ranges for a given device). And will be inside an E820 reserved region.

A little odd that such an error would trigger only when we're actually
trying to initialise the device from the Linux driver, not as soon as we
enable the IOMMU. But all things are possible.

But the DMAR table and dmesg that I asked for would give us a bit more
information and hopefully let us stop speculating...

> > We should also rate-limit DMA faults, which would avoid the lockup
> > failure mode. Bjorn, what should an IOMMU driver *do* when it detects
> > that a device is creating an endless stream of DMA faults and isn't
> > aborting the transaction?
>
> You mentioned that POWER with EEH does something intelligent in this
> case, but I'm not familiar with that code. We have AER support, which
> can result in resetting a device, but I think DMA faults are reported
> differently, and I don't think there's any nice existing way for PCI
> to deal with them. Maybe there should be, though.

Quite frankly, I don't care how *you* deal with them, or even if you
can. All I want to know is how I tell you about the problem, because *I*
sure as hell don't want to be trying to deal with it in the IOMMU code.
That's a generic PCI layer thing. :)

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation

--=-TLy+xqEq5eSJQS6eGO4U--
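
The RMRR hypothesis in the message above can be checked mechanically once the DMAR table and the E820 map are to hand. What follows is only a minimal sketch of that check, not code from the thread or from the kernel: the names `Range`, `Rmrr` and `classify_fault` are hypothetical, and the addresses and PCI device name are made up for illustration. It assumes the RMRR entries and the E820 reserved regions have already been extracted by hand (for example from a decoded DMAR table and from dmesg) into inclusive address ranges.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Range:
    """An inclusive physical address range [start, end]."""
    start: int
    end: int

    def contains(self, addr: int) -> bool:
        return self.start <= addr <= self.end


@dataclass(frozen=True)
class Rmrr:
    """One RMRR entry: a device and the memory range it may keep using."""
    device: str   # PCI address, e.g. "0000:03:00.0" (hypothetical value)
    region: Range


def classify_fault(addr: int, device: str,
                   rmrrs: Iterable[Rmrr],
                   e820_reserved: Iterable[Range]) -> Tuple[bool, bool]:
    """Return (in_rmrr_for_device, in_e820_reserved) for a faulting address.

    If both are True, the fault is consistent with firmware-driven DMA that
    should have been covered by a 1:1 mapping. If both are False, the
    "wrong IOMMU" explanation remains the more plausible one.
    """
    in_rmrr = any(r.device == device and r.region.contains(addr)
                  for r in rmrrs)
    in_reserved = any(r.contains(addr) for r in e820_reserved)
    return in_rmrr, in_reserved


if __name__ == "__main__":
    # All values below are invented for illustration only.
    rmrrs = [Rmrr("0000:03:00.0", Range(0xBDF6E000, 0xBDF84FFF))]
    reserved = [Range(0xBDF00000, 0xBFFFFFFF)]
    print(classify_fault(0xBDF6F000, "0000:03:00.0", rmrrs, reserved))
```

The check is deliberately split into two booleans rather than a single verdict, because an address that sits in an E820 reserved region but in no RMRR for the faulting device would point at a firmware table problem rather than an IOMMU driver problem.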