Hi,
Is the following normal on an X9SRL-F board (bios 1.0a)?
In the manual it states:
Data Direct I/O
Select Enabled to enable Intel I/OAT (I/O Acceleration Technology), which
significantly reduces CPU overhead by leveraging CPU architectural
improvements and freeing the system resource for other tasks. The options
are Disabled and Enabled.
Default is Enabled.
When enabled in the kernel, I see the following:
[ 0.696357] ioatdma: Intel(R) QuickData Technology Driver 4.00
[ 0.696487] ioatdma 0000:00:04.0: channel error register unreachable
[ 0.696546] ioatdma 0000:00:04.0: channel enumeration error
[ 0.696604] ioatdma 0000:00:04.0: Intel(R) I/OAT DMA Engine init failed
[ 0.696721] ioatdma 0000:00:04.1: channel error register unreachable
[ 0.696779] ioatdma 0000:00:04.1: channel enumeration error
[ 0.697522] ioatdma 0000:00:04.1: Intel(R) I/OAT DMA Engine init failed
[ 0.697617] ioatdma 0000:00:04.2: channel error register unreachable
[ 0.697681] ioatdma 0000:00:04.2: channel enumeration error
[ 0.697739] ioatdma 0000:00:04.2: Intel(R) I/OAT DMA Engine init failed
[ 0.697831] ioatdma 0000:00:04.3: channel error register unreachable
[ 0.697890] ioatdma 0000:00:04.3: channel enumeration error
[ 0.697948] ioatdma 0000:00:04.3: Intel(R) I/OAT DMA Engine init failed
[ 0.698037] ioatdma 0000:00:04.4: channel error register unreachable
[ 0.698095] ioatdma 0000:00:04.4: channel enumeration error
[ 0.698153] ioatdma 0000:00:04.4: Intel(R) I/OAT DMA Engine init failed
[ 0.698245] ioatdma 0000:00:04.5: channel error register unreachable
[ 0.698303] ioatdma 0000:00:04.5: channel enumeration error
[ 0.698360] ioatdma 0000:00:04.5: Intel(R) I/OAT DMA Engine init failed
[ 0.698449] ioatdma 0000:00:04.6: channel error register unreachable
[ 0.698508] ioatdma 0000:00:04.6: channel enumeration error
[ 0.698565] ioatdma 0000:00:04.6: Intel(R) I/OAT DMA Engine init failed
[ 0.698676] ioatdma 0000:00:04.7: channel error register unreachable
[ 0.698735] ioatdma 0000:00:04.7: channel enumeration error
[ 0.698792] ioatdma 0000:00:04.7: Intel(R) I/OAT DMA Engine init failed
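A log like the one above can be condensed by grouping the ioatdma messages per PCI function. The following is only a minimal Python sketch and not part of the original report; the helper names `summarize_ioatdma` and `failed_functions` are made up for illustration, and the line format is assumed to match the dmesg excerpt shown here.

```python
import re
from collections import defaultdict

# Matches per-device lines such as:
# [ 0.696487] ioatdma 0000:00:04.0: channel error register unreachable
# The driver banner ("ioatdma: Intel(R) QuickData ...") is deliberately skipped.
LINE_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+ioatdma\s+"
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+(?P<msg>.*)$"
)


def summarize_ioatdma(dmesg_text):
    """Return {pci_function: [messages]} for every per-device ioatdma line."""
    per_device = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            per_device[match.group("bdf")].append(match.group("msg"))
    return dict(per_device)


def failed_functions(summary):
    """PCI functions whose messages include the 'init failed' text."""
    return sorted(
        bdf for bdf, msgs in summary.items()
        if any("init failed" in msg for msg in msgs)
    )


if __name__ == "__main__":
    sample = """\
[ 0.696357] ioatdma: Intel(R) QuickData Technology Driver 4.00
[ 0.696487] ioatdma 0000:00:04.0: channel error register unreachable
[ 0.696546] ioatdma 0000:00:04.0: channel enumeration error
[ 0.696604] ioatdma 0000:00:04.0: Intel(R) I/OAT DMA Engine init failed
"""
    print(failed_functions(summarize_ioatdma(sample)))
```

Feeding it the full output of `dmesg` would list every function (04.0 through 04.7 in this report) that failed to initialize.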
--
Also, I tried using ASPM (enabled in BIOS), but since ACPI Linux query is
ignored, it fails to work:
[ 0.562229] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored
I assume this is something Supermicro has to fix?
Justin.
Hi Justin,
On Sat, 24 November 2012 "Justin Piszcz" wrote:
> Is the following normal on an X9SRL-F board (bios 1.0a)?
> [...]
> I assume this is something Supermicro has to fix?
You are probably missing some kernel config option(s) :) - I did fight similar
issues on a Fujitsu SandyBridge Xeon based server.
Check if enabling CONFIG_X86_X2APIC helps as well as other APIC/IOMMU options.
Bruno
> [ 0.696357] ioatdma: Intel(R) QuickData Technology Driver 4.00
> [ 0.696487] ioatdma 0000:00:04.0: channel error register unreachable
> I assume this is something Supermicro has to fix?
You are probably missing some kernel config option(s) :) - I did fight similar
issues on a Fujitsu SandyBridge Xeon based server.
Check if enabling CONFIG_X86_X2APIC helps as well as other APIC/IOMMU options.
Bruno
=> Enabled:
CONFIG_IOMMU_SUPPORT
CONFIG_INTEL_IOMMU
CONFIG_INTEL_IOMMU_DEFAULT_ON
CONFIG_IRQ_REMAP
Also tried enabling NUMA, etc:
[ 0.330998] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
[ 0.331068] ACPI: bus type pci registered
[ 0.615234] ACPI: Dynamic OEM Table Load:
[ 0.615373] ACPI: PRAD (null) 000BE (v02 PRADID PRADTID 00000001 MSFT 04000000)
[ 0.615631] \_SB_:_OSC invalid UUID
[ 0.615633] _OSC request data:1 7
[ 0.663138] pci 0000:ff:13.5: [8086:3c44] type 00 class 0x110100
[ 0.663170] pci 0000:ff:13.6: [8086:3c45] type 00 class 0x088000
[ 0.663211] pci0000:ff: ACPI _OSC support notification failed, disabling PCIe ASPM
[ 0.663281] pci0000:ff: Unable to request _OSC control (_OSC support mask: 0x08)
:(
Justin.
[+cc Dan]
On Mon, Nov 26, 2012 at 2:42 PM, Bruno Prémont
<[email protected]> wrote:
> Hi Justin,
>
> On Sat, 24 November 2012 "Justin Piszcz" wrote:
>> Is the following normal on an X9SRL-F board (bios 1.0a)?
>> [...]
>> I assume this is something Supermicro has to fix?
>
> You are probably missing some kernel config option(s) :) - I did fight similar
> issues on a Fujitsu SandyBridge Xeon based server.
>
> Check if enabling CONFIG_X86_X2APIC helps as well as other APIC/IOMMU options.
Changing config options is not a valid fix for error messages like
this. We should be able to make the config smarter by adding
dependencies or something, or else make the driver smart enough to
give a more useful diagnostic.
The "channel error register unreachable" message indicates that
pci_read_config_dword() failed. The register in question
(IOAT_PCI_CHANERR_INT_OFFSET) is at 0x180, so possibly we don't have
PCI config accessors for the extended config space (0x100-0xfff). A
complete dmesg log should show that.
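The offset argument above can be made concrete with a few lines of Python. This is only an illustrative sketch: the 0x180 offset and the 0x100-0xfff range come from the message above, while the helper name `dword_reachable` is hypothetical and the two size constants are simply the 256-byte and 4096-byte config space sizes.

```python
PCI_CFG_SPACE_SIZE = 0x100       # conventional config space: 0x000-0x0ff
PCI_CFG_SPACE_EXP_SIZE = 0x1000  # PCIe extended config space: up to 0xfff
CHANERR_INT_OFFSET = 0x180       # register offset quoted in the message above


def dword_reachable(offset, cfg_size):
    """True if a 4-byte read at `offset` fits inside `cfg_size` bytes."""
    return offset >= 0 and offset + 4 <= cfg_size


if __name__ == "__main__":
    for size in (PCI_CFG_SPACE_SIZE, PCI_CFG_SPACE_EXP_SIZE):
        print(hex(size), dword_reachable(CHANERR_INT_OFFSET, size))
```

With only the conventional 256 bytes accessible, a read at 0x180 cannot succeed. On a running system, one rough indicator (an assumption worth verifying locally) is whether the size of /sys/bus/pci/devices/<device>/config is 256 or 4096 bytes.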
[Try Dan's current email address; sorry Dan]
-----Original Message-----
From: Bjorn Helgaas [mailto:[email protected]]
Sent: Monday, November 26, 2012 8:00 PM
To: Bruno Prémont
Cc: Justin Piszcz; [email protected]; [email protected]; Dan Williams
Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
[Try Dan's current email address; sorry Dan]
--
Here is the full dmesg: (I went back to my older kernel, let me know if you
need a dmesg w/ those options enabled)
http://home.comcast.net/~jpiszcz/20121126/dmesg.txt
Justin.
On Mon, Nov 26, 2012 at 4:56 PM, Bjorn Helgaas <[email protected]> wrote:
> The "channel error register unreachable" message indicates that
> pci_read_config_dword() failed. The register in question
> (IOAT_PCI_CHANERR_INT_OFFSET) is at 0x180, so possibly we don't have
> PCI config accessors for the extended config space (0x100-0xfff). A
> complete dmesg log should show that.
Yes, this happens when extended PCI configuration space is not
reachable. However, it turns out the driver no longer needs to access
this register. So it appears this patch [1] from Dave should be modified
to just stop touching that register altogether and then go to -stable.
--
Dan
[1]: http://marc.info/?l=linux-kernel&m=135310841032707&w=2
On Mon, Nov 26, 2012 at 6:00 PM, Justin Piszcz <[email protected]> wrote:
> Here is the full dmesg: (I went back to my older kernel, let me know if you
> need a dmesg w/ those options enabled)
> http://home.comcast.net/~jpiszcz/20121126/dmesg.txt
It looks like maybe you don't have CONFIG_PCI_MMCONFIG turned on?
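A check like this can be scripted against a kernel config file. The sketch below is a hedged illustration rather than anything posted in the thread: `parse_kconfig` and `missing_options` are hypothetical helper names, and where the config text comes from (for example a saved `.config` from the build tree) is left to the reader.

```python
import re


def parse_kconfig(text):
    """Parse .config-style text into {option: value}.

    Lines of the form '# CONFIG_FOO is not set' are recorded as 'n'.
    """
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        unset = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if unset:
            options[unset.group(1)] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def missing_options(text, wanted):
    """Return the options from `wanted` that are neither =y nor =m."""
    opts = parse_kconfig(text)
    return [name for name in wanted if opts.get(name, "n") not in ("y", "m")]


if __name__ == "__main__":
    sample = """\
CONFIG_PCI=y
# CONFIG_PCI_MMCONFIG is not set
CONFIG_INTEL_IOATDMA=y
"""
    print(missing_options(sample, ["CONFIG_PCI_MMCONFIG", "CONFIG_INTEL_IOATDMA"]))
```

An option that is absent from the file is treated the same as one that is explicitly not set, which is a simplification chosen for this sketch.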
-----Original Message-----
From: Bjorn Helgaas [mailto:[email protected]]
Sent: Monday, November 26, 2012 8:12 PM
To: Justin Piszcz
Cc: Bruno Prémont; [email protected]; [email protected]; Dan Williams
Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
It looks like maybe you don't have CONFIG_PCI_MMCONFIG turned on?
Hi,
I am trying this on two Supermicro boards. On the second system (X8DTH-6F),
with all of these options enabled, the system does not boot: it cannot talk
to the SATA boot drive.
" 5520 chips built in, the X8DTH-6/X8DTH-6F/X8DTH-i/X8DTH-iF offers ......
The Intel I/OAT (I/O Acceleration Technology) significantly reduces CPU
overhead by ..."
When the following options are enabled, the system does not boot:
+CONFIG_HAVE_INTEL_TXT=y
+CONFIG_IOMMU_API=y
+CONFIG_IOMMU_SUPPORT=y
+CONFIG_DMAR_TABLE=y
+CONFIG_INTEL_IOMMU=y
+CONFIG_INTEL_IOMMU_DEFAULT_ON=y
+CONFIG_INTEL_IOMMU_FLOPPY_WA=y
It fails like so:
(Fails to talk to the SSD)
http://home.comcast.net/~jpiszcz/20121127/photo1-resize.jpg
(then, a few moments later: Kernel panic)
http://home.comcast.net/~jpiszcz/20121127/photo2-resize.jpg
With those options disabled, the system boots (and always has booted fine).
Is there a certain combination of parameters that allows I/OAT to be enabled
_and_ allow the system to boot?
Justin.
> It looks like maybe you don't have CONFIG_PCI_MMCONFIG turned on?
===> FOR I/OAT DMA
Latest status: it _appears_ it's working on the X9SRL-F now, thank you!
1) Supermicro X9SRL-F (GOOD)
[ 0.738510] ioatdma: Intel(R) QuickData Technology Driver 4.00
[ 0.738719] ioatdma 0000:00:04.0: irq 75 for MSI/MSI-X
[ 0.739088] ioatdma 0000:00:04.1: irq 76 for MSI/MSI-X
[ 0.739408] ioatdma 0000:00:04.2: irq 77 for MSI/MSI-X
[ 0.739739] ioatdma 0000:00:04.3: irq 78 for MSI/MSI-X
[ 0.740040] ioatdma 0000:00:04.4: irq 79 for MSI/MSI-X
[ 0.740342] ioatdma 0000:00:04.5: irq 80 for MSI/MSI-X
[ 0.740670] ioatdma 0000:00:04.6: irq 81 for MSI/MSI-X
[ 0.740971] ioatdma 0000:00:04.7: irq 82 for MSI/MSI-X
It is _not_ working on the:
2) Supermicro X8DTH-6F (the boot drive in this system is running off a PCI-e
card, could the IRQ for the I/O controller be getting re-mapped and fail?)--
worst case I can move the SSD from the 6.0gbps SATA card to the motherboard
and see if that works, but that kind of defeats the purpose of a 6.0gbps
SATA SSD.
(Fails to talk to the SSD)
http://home.comcast.net/~jpiszcz/20121127/photo1-resize.jpg
(then, a few moments later: Kernel panic)
http://home.comcast.net/~jpiszcz/20121127/photo2-resize.jpg
Would be curious if anyone had any suggestions besides removing the
controller card?
--
==> Further issues with the X9SRL-F -- does this board support ASPM or is
this a Linux/ASPM implementation issue?
[ 0.632170] pci0000:ff: ACPI _OSC support notification failed, disabling PCIe ASPM
[ 0.632239] pci0000:ff: Unable to request _OSC control (_OSC support mask: 0x08)
Justin.
> It is _not_ working on the:
> 2) Supermicro X8DTH-6F (the boot drive in this system is running off a PCI-e
> card, could the IRQ for the I/O controller be getting re-mapped and fail?)--
> worst case I can move the SSD from the 6.0gbps SATA card to the motherboard
> and see if that works, but that kind of defeats the purpose of a 6.0gbps
> SATA SSD.
When IOMMU is disabled, I/OAT DMA is successful on the second motherboard
(X8DTH-6F).
Specifically:
--- DMA Engine support
[*] Intel I/OAT DMA support
[*] Network: TCP receive copy offload
[*] Async_tx: Offload support for the async_tx api
When IOMMU/X2APIC is enabled on the X8DTH-6F it fails to boot.
Will keep doing more testing to see if I get anywhere w/regards to the
IOMMU.
Proof of success:
[ 0.757467] ioatdma: Intel(R) QuickData Technology Driver 4.00
[ 0.757690] ioatdma 0000:00:16.0: irq 88 for MSI/MSI-X
[ 0.757948] ioatdma 0000:00:16.1: irq 89 for MSI/MSI-X
[ 0.758166] ioatdma 0000:00:16.2: irq 90 for MSI/MSI-X
[ 0.758377] ioatdma 0000:00:16.3: irq 91 for MSI/MSI-X
[ 0.758577] ioatdma 0000:00:16.4: irq 92 for MSI/MSI-X
[ 0.758794] ioatdma 0000:00:16.5: irq 93 for MSI/MSI-X
[ 0.759000] ioatdma 0000:00:16.6: irq 94 for MSI/MSI-X
[ 0.759214] ioatdma 0000:00:16.7: irq 95 for MSI/MSI-X
[ 0.759461] ioatdma 0000:80:16.0: irq 96 for MSI/MSI-X
[ 0.759720] ioatdma 0000:80:16.1: irq 97 for MSI/MSI-X
[ 0.759963] ioatdma 0000:80:16.2: irq 98 for MSI/MSI-X
[ 0.760190] ioatdma 0000:80:16.3: irq 99 for MSI/MSI-X
[ 0.760414] ioatdma 0000:80:16.4: irq 100 for MSI/MSI-X
[ 0.760630] ioatdma 0000:80:16.5: irq 101 for MSI/MSI-X
[ 0.760862] ioatdma 0000:80:16.6: irq 102 for MSI/MSI-X
[ 0.761081] ioatdma 0000:80:16.7: irq 103 for MSI/MSI-X
Justin.
-----Original Message-----
From: Justin Piszcz [mailto:[email protected]]
Sent: Tuesday, November 27, 2012 8:56 AM
To: 'Bjorn Helgaas'
Cc: 'Bruno Prémont'; [email protected]; [email protected]; 'Dan Williams'
Subject: RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
> It is _not_ working on the:
> 2) Supermicro X8DTH-6F (the boot drive in this system is running off a PCI-e
> card, could the IRQ for the I/O controller be getting re-mapped and fail?)--
> worst case I can move the SSD from the 6.0gbps SATA card to the motherboard
> and see if that works, but that kind of defeats the purpose of a 6.0gbps
> SATA SSD.
When I removed the Highpoint 2-port SATA card and plugged the SSD directly
into the motherboard, the system boots.
So if you use a HIGHPOINT 2-PORT SATA 6.0gbps card, do NOT enable IOMMU or
it will fail to initialize the Highpoint 2-port SATA controller card!
I also tried upgrading the motherboard BIOS (no difference).
I also tried leaving the SATA card installed and plugging the SSD into the
motherboard (no difference).
Only removing the Highpoint 2-port SATA card gave success. It would be nice
to use that card with IOMMU support though. Is it just not compatible
(Marvell problem?) or is it a driver bug, based on the pictures etc. sent
earlier?
$ dmesg|grep -i iommu
[ 0.055134] dmar: IOMMU 0: reg_base_addr cfdfe000 ver 1:0 cap c90780106f0462 ecap f020f6
[ 0.055396] dmar: IOMMU 1: reg_base_addr fecfe000 ver 1:0 cap c90780106f0462 ecap f020f6
[ 0.760665] IOMMU 0 0xcfdfe000: using Queued invalidation
[ 0.760803] IOMMU 1 0xfecfe000: using Queued invalidation
[ 0.760937] IOMMU: Setting RMRR:
[ 0.761102] IOMMU: Setting identity map for device 0000:00:1d.0 [0xbf7ec000 - 0xbf7fffff]
[ 0.761329] IOMMU: Setting identity map for device 0000:00:1d.1 [0xbf7ec000 - 0xbf7fffff]
[ 0.761542] IOMMU: Setting identity map for device 0000:00:1d.2 [0xbf7ec000 - 0xbf7fffff]
[ 0.761758] IOMMU: Setting identity map for device 0000:00:1d.7 [0xbf7ec000 - 0xbf7fffff]
[ 0.761974] IOMMU: Setting identity map for device 0000:00:1a.0 [0xbf7ec000 - 0xbf7fffff]
[ 0.762190] IOMMU: Setting identity map for device 0000:00:1a.1 [0xbf7ec000 - 0xbf7fffff]
[ 0.762407] IOMMU: Setting identity map for device 0000:00:1a.2 [0xbf7ec000 - 0xbf7fffff]
[ 0.762620] IOMMU: Setting identity map for device 0000:00:1a.7 [0xbf7ec000 - 0xbf7fffff]
[ 0.762816] IOMMU: Setting identity map for device 0000:00:1d.0 [0xec000 - 0xeffff]
[ 0.763010] IOMMU: Setting identity map for device 0000:00:1d.1 [0xec000 - 0xeffff]
[ 0.763197] IOMMU: Setting identity map for device 0000:00:1d.2 [0xec000 - 0xeffff]
[ 0.763382] IOMMU: Setting identity map for device 0000:00:1d.7 [0xec000 - 0xeffff]
[ 0.763567] IOMMU: Setting identity map for device 0000:00:1a.0 [0xec000 - 0xeffff]
[ 0.763749] IOMMU: Setting identity map for device 0000:00:1a.1 [0xec000 - 0xeffff]
[ 0.763934] IOMMU: Setting identity map for device 0000:00:1a.2 [0xec000 - 0xeffff]
[ 0.764127] IOMMU: Setting identity map for device 0000:00:1a.7 [0xec000 - 0xeffff]
[ 0.764311] IOMMU: Prepare 0-16MiB unity mapping for LPC
[ 0.764465] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
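To see at a glance which devices received identity maps, the IOMMU lines above can be collected per device. This is a rough Python sketch under stated assumptions: the helper names `unwrap` and `identity_maps` are invented here, and the parser assumes that any line without a leading `[timestamp]` is a mail-wrapped continuation of the previous log line.

```python
import re

TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]")
IDMAP_RE = re.compile(
    r"Setting identity map for device (?P<bdf>\S+) "
    r"\[(?P<start>0x[0-9a-f]+) - (?P<end>0x[0-9a-f]+)\]"
)


def unwrap(dmesg_text):
    """Re-join wrapped lines: a line without a leading [timestamp] is
    appended to the previous record."""
    records = []
    for line in dmesg_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if TIMESTAMP_RE.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def identity_maps(dmesg_text):
    """Return {device: [(start, end), ...]} with addresses as integers."""
    maps = {}
    for record in unwrap(dmesg_text):
        match = IDMAP_RE.search(record)
        if match:
            maps.setdefault(match.group("bdf"), []).append(
                (int(match.group("start"), 16), int(match.group("end"), 16))
            )
    return maps


if __name__ == "__main__":
    sample = (
        "[ 0.761102] IOMMU: Setting identity map for device 0000:00:1d.0\n"
        "[0xbf7ec000 - 0xbf7fffff]\n"
    )
    for device, ranges in identity_maps(sample).items():
        print(device, [(hex(lo), hex(hi)) for lo, hi in ranges])
```

The sketch only reorganizes what the log already says; it does not explain why the add-in SATA controller fails when the IOMMU is enabled.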
Justin.
On Tue, Nov 27, 2012 at 6:49 AM, Justin Piszcz <[email protected]> wrote:
>
>> It looks like maybe you don't have CONFIG_PCI_MMCONFIG turned on?
>
> ===> FOR I/OAT DMA
> Latest status, it _appears_ its working on the X9SRL-F now, thank you!
>
> 1) Supermicro X9SRL-F (GOOD)
> [ 0.738510] ioatdma: Intel(R) QuickData Technology Driver 4.00
> [ 0.738719] ioatdma 0000:00:04.0: irq 75 for MSI/MSI-X
> [ 0.739088] ioatdma 0000:00:04.1: irq 76 for MSI/MSI-X
> [ 0.739408] ioatdma 0000:00:04.2: irq 77 for MSI/MSI-X
> [ 0.739739] ioatdma 0000:00:04.3: irq 78 for MSI/MSI-X
> [ 0.740040] ioatdma 0000:00:04.4: irq 79 for MSI/MSI-X
> [ 0.740342] ioatdma 0000:00:04.5: irq 80 for MSI/MSI-X
> [ 0.740670] ioatdma 0000:00:04.6: irq 81 for MSI/MSI-X
> [ 0.740971] ioatdma 0000:00:04.7: irq 82 for MSI/MSI-X
Good. You have two issues, and I'm going to separate them and only
address the first one here. I opened a bug report [1] against the
IOAT driver. It should do something more useful when
CONFIG_PCI_MMCONFIG=n so we don't have to debug this again in the
future. But otherwise, it sounds like this issue is resolved.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=51101
On Tue, Nov 27, 2012 at 7:35 AM, Justin Piszcz <[email protected]> wrote:
>
>
> -----Original Message-----
> From: Justin Piszcz [mailto:[email protected]]
> Sent: Tuesday, November 27, 2012 8:56 AM
> To: 'Bjorn Helgaas'
> Cc: 'Bruno Prémont'; [email protected]; [email protected];
> 'Dan Williams'
> Subject: RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
> bug question
>
>
>> It is _not_ working on the:
>
>> 2) Supermicro X8DTH-F (the boot drive in this system is running off a PCI-e
>> card, could the IRQ for the I/O controller be getting re-mapped and fail?)--
>> worst case I can move the SSD from the 6.0gbps SATA card to the motherboard
>> and see if that works, but that kind of defeats the purpose of a 6.0gbps
>> SATA SSD.
>
> When I removed the Highpoint 2-port SATA card and plugged the SSD into the
> motherboard, the system boots.
> So if you use a HIGHPOINT 2-PORT SATA 6.0gbps card, do NOT enable IOMMU or
> it will fail to initialize the Highpoint 2-port SATA controller card!
> I also tried upgrading the motherboard BIOS (no difference).
> I also tried leaving the SATA card in and plugging the SSD into the
> motherboard (no difference).
> Only after removing the Highpoint 2-port SATA card did it succeed. It would
> be nice to use that card with IOMMU support though. Is it just not compatible
> (Marvell problem?) or is it a driver bug, based on the pictures etc. sent
> earlier?
I would guess this is a core bug, but it's hard to tell without more
information.
If you boot with "intel_iommu=off", I would guess the Highpoint card
would work (this should have the same effect as turning off
CONFIG_INTEL_IOMMU). I'd like to compare the complete dmesg log for
that boot with the one that fails.
It sounds like it might be hard to collect the log for the failing
case -- you said the boot fails when the Highpoint card is in the
system even if the SSD is connected to the motherboard instead of the
Highpoint card. The panic in the photo2 image looks like it's just a
failure to mount the root filesystem, which is what I'd expect if we
can't find the SSD. It seems like we ought to be able to *boot* with
the SSD connected to the motherboard, even if the Highpoint card
doesn't work. But worst-case, a video of the failing boot might be
enough, especially if you can slow it down with "boot_delay="
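For anyone wanting to do the comparison suggested above, here is a minimal sketch (not part of the original exchange) of diffing two saved dmesg logs while ignoring the printk timestamps. The function names and the "intel_iommu=off" / "iommu-enabled" labels are assumptions chosen for illustration; it only assumes each log line starts with the usual "[ seconds.micros]" prefix.

```python
import difflib
import re

# Leading printk timestamp such as "[    7.771060] "; stripped so that
# timing jitter between the two boots does not show up as a difference.
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def strip_timestamps(text):
    """Return the log as a list of lines with printk timestamps removed."""
    return [_TIMESTAMP.sub("", line) for line in text.splitlines()]


def diff_dmesg(good, bad):
    """Unified diff of a working boot log against a failing one."""
    return list(
        difflib.unified_diff(
            strip_timestamps(good),
            strip_timestamps(bad),
            fromfile="intel_iommu=off",   # hypothetical label for the good boot
            tofile="iommu-enabled",       # hypothetical label for the bad boot
            lineterm="",
        )
    )


if __name__ == "__main__":
    good_log = "[ 0.100000] ahci: ok\n[ 0.200000] ata14: link up"
    bad_log = "[ 0.300000] ahci: ok\n[ 0.900000] ata14.00: qc timeout (cmd 0xa1)"
    print("\n".join(diff_dmesg(good_log, bad_log)))
```

Lines that differ only in their timestamp come out as unchanged context, so what remains is mostly the messages that are actually new or missing in the failing boot.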
> $ dmesg|grep -i iommu
> [ 0.055134] dmar: IOMMU 0: reg_base_addr cfdfe000 ver 1:0 cap
> c90780106f0462 ecap f020f6
> [ 0.055396] dmar: IOMMU 1: reg_base_addr fecfe000 ver 1:0 cap
> c90780106f0462 ecap f020f6
> [ 0.760665] IOMMU 0 0xcfdfe000: using Queued invalidation
> [ 0.760803] IOMMU 1 0xfecfe000: using Queued invalidation
> [ 0.760937] IOMMU: Setting RMRR:
> [ 0.761102] IOMMU: Setting identity map for device 0000:00:1d.0
> [0xbf7ec000 - 0xbf7fffff]
> [ 0.761329] IOMMU: Setting identity map for device 0000:00:1d.1
> [0xbf7ec000 - 0xbf7fffff]
> [ 0.761542] IOMMU: Setting identity map for device 0000:00:1d.2
> [0xbf7ec000 - 0xbf7fffff]
> [ 0.761758] IOMMU: Setting identity map for device 0000:00:1d.7
> [0xbf7ec000 - 0xbf7fffff]
> [ 0.761974] IOMMU: Setting identity map for device 0000:00:1a.0
> [0xbf7ec000 - 0xbf7fffff]
> [ 0.762190] IOMMU: Setting identity map for device 0000:00:1a.1
> [0xbf7ec000 - 0xbf7fffff]
> [ 0.762407] IOMMU: Setting identity map for device 0000:00:1a.2
> [0xbf7ec000 - 0xbf7fffff]
> [ 0.762620] IOMMU: Setting identity map for device 0000:00:1a.7
> [0xbf7ec000 - 0xbf7fffff]
> [ 0.762816] IOMMU: Setting identity map for device 0000:00:1d.0 [0xec000
> - 0xeffff]
> [ 0.763010] IOMMU: Setting identity map for device 0000:00:1d.1 [0xec000
> - 0xeffff]
> [ 0.763197] IOMMU: Setting identity map for device 0000:00:1d.2 [0xec000
> - 0xeffff]
> [ 0.763382] IOMMU: Setting identity map for device 0000:00:1d.7 [0xec000
> - 0xeffff]
> [ 0.763567] IOMMU: Setting identity map for device 0000:00:1a.0 [0xec000
> - 0xeffff]
> [ 0.763749] IOMMU: Setting identity map for device 0000:00:1a.1 [0xec000
> - 0xeffff]
> [ 0.763934] IOMMU: Setting identity map for device 0000:00:1a.2 [0xec000
> - 0xeffff]
> [ 0.764127] IOMMU: Setting identity map for device 0000:00:1a.7 [0xec000
> - 0xeffff]
> [ 0.764311] IOMMU: Prepare 0-16MiB unity mapping for LPC
> [ 0.764465] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 -
> 0xffffff]
>
> --
>
>
> ==> Further issues with the X9SRL-F -- does this board support ASPM or is
> this a Linux/ASPM implementation issue?
> [ 0.632170] pci0000:ff: ACPI _OSC support notification failed, disabling
> PCIe ASPM
> [ 0.632239] pci0000:ff: Unable to request _OSC control (_OSC support
> mask: 0x08)
I'm going to ignore this issue for the time being. I know we complain
about this on many machines, and I don't know whether it's a real
problem or just an overly alarming message.
Bjorn
On 11/27/2012 07:49 AM, Justin Piszcz wrote:
>
>> It looks like maybe you don't have CONFIG_PCI_MMCONFIG turned on?
>
> ===> FOR I/OAT DMA
> Latest status, it _appears_ it's working on the X9SRL-F now, thank you!
>
> 1) Supermicro X9SRL-F (GOOD)
> [ 0.738510] ioatdma: Intel(R) QuickData Technology Driver 4.00
> [ 0.738719] ioatdma 0000:00:04.0: irq 75 for MSI/MSI-X
> [ 0.739088] ioatdma 0000:00:04.1: irq 76 for MSI/MSI-X
> [ 0.739408] ioatdma 0000:00:04.2: irq 77 for MSI/MSI-X
> [ 0.739739] ioatdma 0000:00:04.3: irq 78 for MSI/MSI-X
> [ 0.740040] ioatdma 0000:00:04.4: irq 79 for MSI/MSI-X
> [ 0.740342] ioatdma 0000:00:04.5: irq 80 for MSI/MSI-X
> [ 0.740670] ioatdma 0000:00:04.6: irq 81 for MSI/MSI-X
> [ 0.740971] ioatdma 0000:00:04.7: irq 82 for MSI/MSI-X
>
> It is _not_ working on the:
>
> 2) Supermicro X8DTH-F (the boot drive in this system is running off a PCI-e
> card, could the IRQ for the I/O controller be getting re-mapped and fail?)--
> worse case I can move the SSD from the 6.0gbpa SATA card to the motherboard
> and see if that works, but that kind of defeats the purpose of a 6.0gbps
> SATA SSD.
>
> (Fails to talk to the SSD)
> http://home.comcast.net/~jpiszcz/20121127/photo1-resize.jpg
>
> (then, a few moments later: Kernel panic)
> http://home.comcast.net/~jpiszcz/20121127/photo2-resize.jpg
>
> Would be curious if anyone had any suggestions besides removing the
> controller card?
What does lspci -vv show on that controller? Not sure what actual
chipset that controller is, but there's a known issue with some Marvell
6Gbps SATA controllers with DMAR enabled - it seems the device issues
memory read/write requests from the wrong PCI function ID and the IOMMU
rightly denies access as the function listed in the requests doesn't
have any mapping to that memory. I don't think there's presently a
workaround other than disabling DMAR. We could (and likely should) be
detecting that device and adding some kind of quirk for it.
>
> --
>
>
> ==> Further issues with the X9SRL-F -- does this board support ASPM or is
> this a Linux/ASPM implementation issue?
> [ 0.632170] pci0000:ff: ACPI _OSC support notification failed, disabling
> PCIe ASPM
> [ 0.632239] pci0000:ff: Unable to request _OSC control (_OSC support
> mask: 0x08)
What's the full dmesg from this machine (or is it already posted somewhere)?
-----Original Message-----
From: Bjorn Helgaas [mailto:[email protected]]
Sent: Wednesday, November 28, 2012 6:54 PM
To: Justin Piszcz
Cc: Bruno Prémont; [email protected]; [email protected]; Dan
Williams
Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
bug question
Good. You have two issues, and I'm going to separate them and only
address the first one here. I opened a bug report [1] against the
IOAT driver. It should do something more useful when
CONFIG_PCI_MMCONFIG=n so we don't have to debug this again in the
future. But otherwise, it sounds like this issue is resolved.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=51101
--
Yes--(agree w/ config option) Thank you!
Justin.
-----Original Message-----
From: Bjorn Helgaas [mailto:[email protected]]
Sent: Wednesday, November 28, 2012 7:09 PM
To: Justin Piszcz
Cc: Bruno Prémont; [email protected]; [email protected]; Dan
Williams
Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
bug question
I would guess this is a core bug, but it's hard to tell without more
information.
If you boot with "intel_iommu=off", I would guess the Highpoint card
would work (this should have the same effect as turning off
CONFIG_INTEL_IOMMU). I'd like to compare the complete dmesg log for
that boot with the one that fails.
It sounds like it might be hard to collect the log for the failing
case -- you said the boot fails when the Highpoint card is in the
system even if the SSD is connected to the motherboard instead of the
Highpoint card. The panic in the photo2 image looks like it's just a
failure to mount the root filesystem, which is what I'd expect if we
can't find the SSD. It seems like we ought to be able to *boot* with
the SSD connected to the motherboard, even if the Highpoint card
doesn't work. But worst-case, a video of the failing boot might be
enough, especially if you can slow it down with "boot_delay="
--
SUMMARY: The card fails with IOMMU support in the kernel, but the system does
now boot (3.6.8) with the card installed as long as the system disk isn't
attached to it; not sure what was wrong earlier.
It seems to be working now:
=> SSD on motherboard
=> PCI-e card (Highpoint in the system but not used, no disks attached)
(After I enabled nouveau, not sure that has anything to do with it) I put
the card in, and it errors as usual, but with the SSD now on the motherboard
it does boot successfully.
Here are the errors from the kernel trying to initialize the board with
iommu enabled (retrieved via netconsole) also picture below (w/help from
boot_delay=100 && nouveau enabled):
http://home.comcast.net/~jpiszcz/20121128/highpoint.jpg
Nov 28 19:30:16 p34 [ 7.771060] ata14.00: qc timeout (cmd 0xa1)
Nov 28 19:30:16 p34 [ 8.270153] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Nov 28 19:30:17 p34 [ 9.073935] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Nov 28 19:30:27 p34 [ 19.058915] ata14.00: qc timeout (cmd 0xa1)
Nov 28 19:30:28 p34 [ 19.557885] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Nov 28 19:30:28 p34 [ 19.558478] ata14: limiting SATA link speed to 1.5 Gbps
Nov 28 19:30:29 p34 [ 20.363658] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Nov 28 19:30:48 p34 [ 39.568234] dmar: DRHD: handling fault status reg 502
Nov 28 19:30:48 p34 [ 39.571508] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0
[ 39.571508] DMAR:[fault reason 06] PTE Read access is not set
Nov 28 19:30:59 p34 [ 50.318146] ata14.00: qc timeout (cmd 0xa1)
Nov 28 19:30:59 p34 [ 50.818061] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Nov 28 19:31:00 p34 [ 51.621827] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Justin.
-----Original Message-----
From: Robert Hancock [mailto:[email protected]]
Sent: Wednesday, November 28, 2012 7:35 PM
To: Justin Piszcz
Cc: 'Bjorn Helgaas'; 'Bruno Prémont'; [email protected];
[email protected]; 'Dan Williams'
Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
bug question
What does lspci -vv show on that controller? Not sure what actual
chipset that controller is, but there's a known issue with some Marvell
6Gbps SATA controllers with DMAR enabled - it seems the device issues
memory read/write requests from the wrong PCI function ID and the IOMMU
rightly denies access as the function listed in the requests doesn't
have any mapping to that memory. I don't think there's presently a
workaround other than disabling DMAR. We could (and likely should) be
detecting that device and adding some kind of quirk for it.
That sounds likely...
It is shown below:
Card name: HighPoint Rocket 620 Dual Port SATA 6 Gbps PCI Express 2.0 Host
Adapter
lspci -vv output:
84:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA
6.0 Gb/s controller (rev 11) (prog-if 01 [AHCI 1.0])
Subsystem: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s
controller
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 256 bytes
Interrupt: pin A routed to IRQ 119
Region 0: I/O ports at e000 [size=8]
Region 1: I/O ports at dc00 [size=4]
Region 2: I/O ports at ec00 [size=8]
Region 3: I/O ports at e800 [size=4]
Region 4: I/O ports at e400 [size=16]
Region 5: Memory at cfeee000 (32-bit, non-prefetchable) [size=2K]
Expansion ROM at cfef0000 [disabled] [size=64K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
Address: fee20000 Data: 4076
Capabilities: [70] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns,
L1 <64us
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt-
ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis+
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-,
Selectable De-emphasis: -6dB
Transmit Margin: Normal Operating Range, EnterModifiedCompliance-
ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-,
EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF-
MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF-
MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+
MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
Kernel driver in use: ahci
>
> --
>
>
> ==> Further issues with the X9SRL-F -- does this board support ASPM or is
> this a Linux/ASPM implementation issue?
> [ 0.632170] pci0000:ff: ACPI _OSC support notification failed,
disabling
> PCIe ASPM
> [ 0.632239] pci0000:ff: Unable to request _OSC control (_OSC support
> mask: 0x08)
What's the full dmesg from this machine (or is it already posted somewhere)?
It is now available here:
http://home.comcast.net/~jpiszcz/20121128/dmesg.txt
Justin.
On Wed, Nov 28, 2012 at 6:49 PM, Justin Piszcz <[email protected]> wrote:
>
>
> -----Original Message-----
> From: Robert Hancock [mailto:[email protected]]
> Sent: Wednesday, November 28, 2012 7:35 PM
> To: Justin Piszcz
> Cc: 'Bjorn Helgaas'; 'Bruno Prémont'; [email protected];
> [email protected]; 'Dan Williams'
> Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
> bug question
>
>
> What does lspci -vv show on that controller? Not sure what actual
> chipset that controller is, but there's a known issue with some Marvell
> 6Gbps SATA controllers with DMAR enabled - it seems the device issues
> memory read/write requests from the wrong PCI function ID and the IOMMU
> rightly denies access as the function listed in the requests doesn't
> have any mapping to that memory. I don't think there's presently a
> workaround other than disabling DMAR. We could (and likely should) be
> detecting that device and adding some kind of quirk for it.
>
> That sounds likely...
> It is shown below:
>
> Card name: HighPoint Rocket 620 Dual Port SATA 6 Gbps PCI Express 2.0 Host
> Adapter
>
> lspci -vv output:
>
> 84:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA
> 6.0 Gb/s controller (rev 11) (prog-if 01 [AHCI 1.0])
> Subsystem: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s
> controller
Yeah, that's one of those controllers I think. But I can't tell from
the bit of the dmesg you posted exactly what's going on. Can you post
a full boot log from having the card installed and some drive attached
(by putting the boot drive on another controller for example)?
>> ==> Further issues with the X9SRL-F -- does this board support ASPM or is
>> this a Linux/ASPM implementation issue?
>> [ 0.632170] pci0000:ff: ACPI _OSC support notification failed,
> disabling
>> PCIe ASPM
>> [ 0.632239] pci0000:ff: Unable to request _OSC control (_OSC support
>> mask: 0x08)
>
> What's the full dmesg from this machine (or is it already posted somewhere)?
>
> It is now available here:
> http://home.comcast.net/~jpiszcz/20121128/dmesg.txt
Is that the same boot log? It doesn't have this error in it.
-----Original Message-----
From: Robert Hancock [mailto:[email protected]]
Sent: Wednesday, November 28, 2012 7:55 PM
To: Justin Piszcz
Cc: Bjorn Helgaas; Bruno Prémont; [email protected];
[email protected]; Dan Williams
Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
bug question
> Is that the same boot log? It doesn't have this error in it.
Yes, the error is here (it's towards the bottom):
[ 7.973015] ata14.00: qc timeout (cmd 0xa1)
[ 8.472120] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 9.275922] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 19.260667] ata14.00: qc timeout (cmd 0xa1)
[ 19.759828] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 19.760451] ata14: limiting SATA link speed to 1.5 Gbps
[ 20.566598] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 50.521078] ata14.00: qc timeout (cmd 0xa1)
[ 51.020880] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 51.824664] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 51.824682] dmar: DRHD: handling fault status reg 502
[ 51.824686] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0
[ 51.824686] DMAR:[fault reason 06] PTE Read access is not set
[ 52.338871] EXT3-fs (sdb2): error: couldn't mount because of unsupported
optional features (240)
[ 52.348938] EXT2-fs (sdb2): error: couldn't mount because of unsupported
optional features (240)
[ 52.360314] EXT4-fs (sdb2): mounted filesystem with ordered data mode.
Opts: (null)
The system does not boot when the SSD is on that SATA controller.
The error we were trying to capture earlier (kernel panic) -- I cannot
reproduce it anymore after adding nouveau, for whatever reason.
So to recap: it boots now with nothing connected to the controller, but the
controller is unusable, as shown above.
When you put the SSD on it, it cannot mount rootfs.
Justin.
On Thu, Nov 29, 2012 at 1:55 AM, Justin Piszcz <[email protected]> wrote:
> Yes, the error is here: (its towards the bottom)
>
> [ 7.973015] ata14.00: qc timeout (cmd 0xa1)
> [ 8.472120] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [ 9.275922] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [ 19.260667] ata14.00: qc timeout (cmd 0xa1)
> [ 19.759828] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [ 19.760451] ata14: limiting SATA link speed to 1.5 Gbps
> [ 20.566598] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> [ 50.521078] ata14.00: qc timeout (cmd 0xa1)
> [ 51.020880] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [ 51.824664] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> [ 51.824682] dmar: DRHD: handling fault status reg 502
> [ 51.824686] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0
> [ 51.824686] DMAR:[fault reason 06] PTE Read access is not set
You have these devices:
pci 0000:04:00.0: [10de:01d3] type 00 class 0x030000 nVidia G72
pci 0000:84:00.0: [1b4b:9123] type 00 class 0x010601 Marvell 88SE9123 SATA
pci 0000:84:00.1: [1b4b:91a4] type 00 class 0x01018f Marvell 88SE9128 IDE
I think the 04:00.0 DMAR errors are symptoms of nouveau driver issues,
and if you get rid of that driver, they'll probably go away.
But this 84:00.1 DMAR error:
dmar: DMAR:[DMA Read] Request device [84:00.1] fault addr fff00000
DMAR:[fault reason 02] Present bit in context entry is clear
looks like the probable cause of the Marvell issue. It looks similar
to https://bugzilla.kernel.org/show_bug.cgi?id=42679, although the
reports there show a bb:dd.0 device (but no bb:dd.1 device), and the
DMAR rejects DMA that appears to be from bb:dd.1.
Another report that's even more similar is
https://bugzilla.redhat.com/show_bug.cgi?id=757166 . In that case,
both bb:dd.0 and bb:dd.1 exist (as in your system), and the DMAR fault
is exactly like what you're seeing.
So you're not alone, but unfortunately, nobody seems to be working on
either bug report. I took the liberty of adding you to the cc: list of
both.
I don't really know what else to do at this point. Maybe a SATA
expert with some Marvell docs could figure out why we're seeing DMA
from the IDE controller, but I'm not that person :)
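To make the matching of DMAR faults to devices easier to repeat, here is a minimal sketch (an illustration, not something posted in the thread) that pulls the requester ID out of each fault line and looks it up in a table. The table is copied by hand from the three "pci 0000:..." lines quoted above; the regular expression assumes the fault-line format shown in this thread, and the names are hypothetical.

```python
import re

# Fault line format as it appears in this thread's dmesg excerpts.
FAULT = re.compile(
    r"DMAR:\[DMA (Read|Write)\] Request device "
    r"\[([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\] fault addr ([0-9a-f]+)"
)

# Hand-copied from the device list above; in practice this would come
# from lspci or the "pci 0000:..." enumeration lines in the same log.
DEVICES = {
    "04:00.0": "nVidia G72",
    "84:00.0": "Marvell 88SE9123 SATA",
    "84:00.1": "Marvell 88SE9128 IDE",
}


def dmar_faults(log):
    """Return (requester, access, fault address, device label) per fault."""
    faults = []
    for match in FAULT.finditer(log):
        access, requester, addr = match.groups()
        label = DEVICES.get(requester, "unknown requester")
        faults.append((requester, access, int(addr, 16), label))
    return faults


if __name__ == "__main__":
    sample = (
        "dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0\n"
        "dmar: DMAR:[DMA Read] Request device [84:00.1] fault addr fff00000\n"
    )
    for fault in dmar_faults(sample):
        print(fault)
```

A requester that maps to the IDE function (84:00.1) while the AHCI driver is bound to 84:00.0 is the pattern described in the two bug reports referenced above.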
> [ 52.338871] EXT3-fs (sdb2): error: couldn't mount because of unsupported
> optional features (240)
> [ 52.348938] EXT2-fs (sdb2): error: couldn't mount because of unsupported
> optional features (240)
> [ 52.360314] EXT4-fs (sdb2): mounted filesystem with ordered data mode.
> Opts: (null)
>
> The system does not boot when the SSD is on that SATA controller.
> The error we were trying to get earlier (kernel panic)-- I cannot reproduce
> that anymore after adding nouveau for whatever reason.
> So to re-cap it boots now with nothing connected to the controller but the
> controller is non-workable/useless, as shown above.
> When you put the SSD on it, it cannot mount rootfs.
>
> Justin.
>
>
On Thu, Nov 29, 2012 at 12:16 PM, Bjorn Helgaas <[email protected]> wrote:
> On Thu, Nov 29, 2012 at 1:55 AM, Justin Piszcz <[email protected]> wrote:
>>
>>
>> -----Original Message-----
>> From: Robert Hancock [mailto:[email protected]]
>> Sent: Wednesday, November 28, 2012 7:55 PM
>> To: Justin Piszcz
>> Cc: Bjorn Helgaas; Bruno Pr?mont; [email protected];
>> [email protected]; Dan Williams
>> Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
>> bug question
>>
>> On Wed, Nov 28, 2012 at 6:49 PM, Justin Piszcz <[email protected]>
>> wrote:
>>>
>>>
>>> -----Original Message-----
>>> From: Robert Hancock [mailto:[email protected]]
>>> Sent: Wednesday, November 28, 2012 7:35 PM
>>> To: Justin Piszcz
>>> Cc: 'Bjorn Helgaas'; 'Bruno Pr?mont'; [email protected];
>>> [email protected]; 'Dan Williams'
>>> Subject: Re: Supermicro X9SRL-F - channel enumeration error &
>> ACPI/firmware
>>> bug question
>>>
>>>
>>> What does lspci -vv show on that controller? Not sure what actual
>>> chipset that controller is, but there's a known issue with some Marvell
>>> 6Gbps SATA controllers with DMAR enabled - it seems the device issues
>>> memory read/write requests from the wrong PCI function ID and the IOMMU
>>> rightly denies access as the function listed in the requests doesn't
>>> have any mapping to that memory. I don't think there's presently a
>>> workaround other than disabling DMAR. We could (and likely should) be
>>> detecting that device and adding some kind of quirk for it.
>>>
>>> That sounds likely...
>>> It is shown below:
>>>
>>> Card name: HighPoint Rocket 620 Dual Port SATA 6 Gbps PCI Express 2.0 Host
>>> Adapter
>>>
>>> lspci -vv output:
>>>
>>> 84:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA
>>> 6.0 Gb/s controller (rev 11) (prog-if 01 [AHCI 1.0])
>>> Subsystem: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s
>>> controller
>>
>> Yeah, that's one of those controllers I think. But I can't tell from
>> the bit of the dmesg you posted exactly what's going on. Can you post
>> a full boot log from having the card installed and some drive attached
>> (by putting the boot drive on another controller for example)?
>>
>>>> ==> Further issues with the X9SRL-F -- does this board support ASPM or is
>>>> this a Linux/ASPM implementation issue?
>>>> [ 0.632170] pci0000:ff: ACPI _OSC support notification failed, disabling PCIe ASPM
>>>> [ 0.632239] pci0000:ff: Unable to request _OSC control (_OSC support mask: 0x08)
>>>
>>> What's the full dmesg from this machine (or is it already posted somewhere)?
>>>
>>> It is now available here:
>>> http://home.comcast.net/~jpiszcz/20121128/dmesg.txt
>>
>>> Is that the same boot log? It doesn't have this error in it.
>>
>> Yes, the error is here (it's towards the bottom):
>>
>> [ 7.973015] ata14.00: qc timeout (cmd 0xa1)
>> [ 8.472120] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [ 9.275922] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>> [ 19.260667] ata14.00: qc timeout (cmd 0xa1)
>> [ 19.759828] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [ 19.760451] ata14: limiting SATA link speed to 1.5 Gbps
>> [ 20.566598] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>> [ 50.521078] ata14.00: qc timeout (cmd 0xa1)
>> [ 51.020880] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [ 51.824664] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>> [ 51.824682] dmar: DRHD: handling fault status reg 502
>> [ 51.824686] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0
>> [ 51.824686] DMAR:[fault reason 06] PTE Read access is not set
>
> You have these devices:
>
> pci 0000:04:00.0: [10de:01d3] type 00 class 0x030000 nVidia G72
> pci 0000:84:00.0: [1b4b:9123] type 00 class 0x010601 Marvell 88SE9123 SATA
> pci 0000:84:00.1: [1b4b:91a4] type 00 class 0x01018f Marvell 88SE9128 IDE
>
> I think the 04:00.0 DMAR errors are symptoms of nouveau driver issues,
> and if you get rid of that driver, they'll probably go away.
>
> But this 84:00.1 DMAR error:
>
> dmar: DMAR:[DMA Read] Request device [84:00.1] fault addr fff00000
> DMAR:[fault reason 02] Present bit in context entry is clear
>
> looks like the probable cause of the Marvell issue. It looks similar
> to https://bugzilla.kernel.org/show_bug.cgi?id=42679, although the
> reports there show a bb:dd.0 device (but no bb:dd.1 device), and the
> DMAR rejects DMA that appears to be from bb:dd.1.
>
> Another report that's even more similar is
> https://bugzilla.redhat.com/show_bug.cgi?id=757166 . In that case,
> both bb:dd.0 and bb:dd.1 exist (as in your system), and the DMAR fault
> is exactly like what you're seeing.
>
> So you're not alone, but unfortunately, nobody seems to be working on
> either bug report. I took the liberty to add you to the cc: list of
> both.
>
> I don't really know what else to do at this point. Maybe a SATA
> expert with some Marvell docs could figure out why we're seeing DMA
> from the IDE controller, but I'm not that person :)

I doubt any Marvell docs would really be very helpful (except for
maybe an errata list but that likely would just tell us what we can
already figure out). The SATA controller part of the device seems to
just be issuing accesses with the wrong PCI function ID.

The only solution I can think of would be at the PCI/DMAR layer -
basically functions 0 and 1 on this device should be allowed to access
each other's DMA regions.
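
To make the idea above concrete, here is a toy model in Python of what
"allowing functions 0 and 1 to access each other's DMA regions" would
mean. It is only a sketch and not kernel code: the ToyIommu class, its
method names, and the fault strings are all hypothetical, and the
address is simply the one from the fault line, reused for illustration.
It assumes the IOMMU looks up permissions purely by requester ID
(bus, device, function).

```python
# Toy model only: not kernel code. All names here are hypothetical.


class ToyIommu:
    """Tracks which DMA addresses each requester ID (bus, dev, fn) may access."""

    def __init__(self):
        # requester ID -> set of DMA addresses mapped for that requester
        self.mappings = {}

    def map(self, owner, addr):
        """Map one DMA address for the given function."""
        self.mappings.setdefault(owner, set()).add(addr)

    def share(self, a, b):
        """Make functions a and b use one common set of mappings."""
        merged = self.mappings.get(a, set()) | self.mappings.get(b, set())
        self.mappings[a] = merged
        self.mappings[b] = merged

    def dma_read(self, requester, addr):
        """Return "ok" or a fault string for a read tagged with requester."""
        allowed = self.mappings.get(requester)
        if allowed is None:
            return "fault: no context entry"
        if addr not in allowed:
            return "fault: address not mapped"
        return "ok"


SATA = (0x84, 0x00, 0)  # 84:00.0, the AHCI function whose driver maps buffers
IDE = (0x84, 0x00, 1)   # 84:00.1, the function ID the requests arrive with
BUF = 0xFFF00000        # address from the fault line, for illustration only

iommu = ToyIommu()
iommu.map(SATA, BUF)

# The request is tagged with function 1, which has no mappings of its own.
before = iommu.dma_read(IDE, BUF)

# Proposed workaround: let both functions share the same DMA mappings.
iommu.share(SATA, IDE)
after = iommu.dma_read(IDE, BUF)

print(before)
print(after)
```

In this model the read tagged with 84:00.1 faults until the two
functions share mappings, after which it succeeds. Whether a real fix
belongs in the IOMMU driver or in a PCI quirk is a separate question
that the sketch does not address.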
[+cc Jeff, linux-ide, David, Joerg, iommu]
On Thu, Nov 29, 2012 at 7:39 PM, Robert Hancock <[email protected]> wrote:
> I doubt any Marvell docs would really be very helpful (except for
> maybe an errata list but that likely would just tell us what we can
> already figure out). The SATA controller part of the device seems to
> just be issuing accesses with the wrong PCI function ID.
>
> The only solution I can think of would be at the PCI/DMAR layer -
> basically functions 0 and 1 on this device should be allowed to access
> each other's DMA regions.

That's essentially the patch at
https://bugzilla.redhat.com/show_bug.cgi?id=757166#c16, which in my
opinion is too ugly to consider. But fortunately, I'm not the
maintainer for any IOMMU drivers.

My point about the docs is that often we think "this hardware is
clearly broken and the only workaround is X," but sometimes it's just
that we don't understand the hardware designer's intent. It may be
that the hardware was just never tested with DMAR and is indeed
broken, or it may be that it does work with DMAR given a different
driver structure or different device initialization. I just don't
want lack of imagination to force us to assume there's only one
workaround.