LinuxLists.cc - RFC: IOMMU/AMD: Error Handling

2013-04-29 19:45:51

Subject: RFC: IOMMU/AMD: Error Handling

Joerg,

We are in the process of implementing AMD IOMMU error handling, and I
would like some comments from you and the community.

Currently, the AMD IOMMU driver only reports events from the event log
in the dmesg, and does not try to handle them in case of errors. AMD
IOMMU errors can be categorized as device-specific errors and IOMMU errors.

1. For IOMMU errors such as:
- DEV_TAB_HARDWARE_ERROR
- PAGE_TAB_ERROR
- COMMAND_HARDWARE_ERROR
If the error is detected during IOMMU initialization, we could disable
IOMMU and proceed. If the error occurs after IOMMU is initialized, we
won't be able to recover from this, and might need to panic.

2. For device-specific errors such as:
- ILLEGAL_DEV_TABLE_ENTRY
- IO_PAGE_FAULT
- INVALID_DEVICE_REQUEST
We think the AMD IOMMU driver should try to isolate the device. This
involves blocking device transactions at the IOMMU DTE and trying to disable
the device (e.g. calling the remove(struct pci_dev *pdev) interface
generally provided by device drivers). This could prevent the device
from continuing to fail and from risking system instability.
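
The two categories above can be sketched as a small decision function. This is only an illustration of the proposal as written, not driver code: the enum names, the action values and choose_action() are hypothetical, and the mapping simply restates points 1 and 2.

```c
#include <stdbool.h>

/* Hypothetical identifiers, named after the events listed in the proposal. */
enum iommu_event {
    EV_DEV_TAB_HARDWARE_ERROR,
    EV_PAGE_TAB_ERROR,
    EV_COMMAND_HARDWARE_ERROR,
    EV_ILLEGAL_DEV_TABLE_ENTRY,
    EV_IO_PAGE_FAULT,
    EV_INVALID_DEVICE_REQUEST,
};

/* Hypothetical responses, one per outcome described in points 1 and 2. */
enum iommu_action {
    ACT_DISABLE_IOMMU,   /* IOMMU error seen during initialization */
    ACT_PANIC,           /* IOMMU error seen after initialization */
    ACT_ISOLATE_DEVICE,  /* device-specific error */
};

/* True for errors that concern the IOMMU itself rather than one device. */
static bool is_iommu_error(enum iommu_event ev)
{
    switch (ev) {
    case EV_DEV_TAB_HARDWARE_ERROR:
    case EV_PAGE_TAB_ERROR:
    case EV_COMMAND_HARDWARE_ERROR:
        return true;
    default:
        return false;
    }
}

/* Pick a response from the event type and whether init has completed. */
static enum iommu_action choose_action(enum iommu_event ev, bool init_done)
{
    if (!is_iommu_error(ev))
        return ACT_ISOLATE_DEVICE;
    return init_done ? ACT_PANIC : ACT_DISABLE_IOMMU;
}
```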

3. In the case of a posted memory write transaction, the device driver might
not be aware that the transaction has failed and been blocked at the IOMMU. If
there is no HW IOMMU, I believe this is handled by PCI error handling code.
If the IOMMU hardware reports such a case, could this potentially leverage
the Linux IOMMU fault handling interface, iommu_set_fault_handler() and
report_iommu_fault(), to communicate to the device driver or PCI driver?
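
To make the register/report idea in point 3 concrete, here is a small user-space model. It is a sketch under assumptions: the model_* names and struct fault_record are hypothetical stand-ins, not the kernel API, and returning -ENOSYS when no handler is registered reflects my understanding of how report_iommu_fault() behaves rather than anything stated in this thread.

```c
#include <errno.h>
#include <stddef.h>

struct model_domain;

/* Hypothetical handler type: gets the faulting address, flags and a token. */
typedef int (*model_fault_handler_t)(struct model_domain *dom,
                                     unsigned long iova, int flags,
                                     void *token);

struct model_domain {
    model_fault_handler_t handler; /* NULL until a handler is registered */
    void *token;                   /* opaque pointer handed back to it */
};

/* Stand-in for registering a fault handler on a domain. */
static void model_set_fault_handler(struct model_domain *dom,
                                    model_fault_handler_t handler,
                                    void *token)
{
    dom->handler = handler;
    dom->token = token;
}

/* Stand-in for reporting a fault: forwards to the handler if there is one. */
static int model_report_fault(struct model_domain *dom,
                              unsigned long iova, int flags)
{
    if (!dom->handler)
        return -ENOSYS; /* assumption: nobody registered interest */
    return dom->handler(dom, iova, flags, dom->token);
}

/* Example consumer: a driver that records what it was told. */
struct fault_record {
    unsigned long last_iova;
    int count;
};

static int record_fault(struct model_domain *dom, unsigned long iova,
                        int flags, void *token)
{
    struct fault_record *rec = token;

    (void)dom;
    (void)flags;
    rec->last_iova = iova;
    rec->count++;
    return 0; /* 0 here means "handled" in this model */
}
```

With this shape, a blocked posted write that the device never learns about could still reach the driver, provided the driver registered a handler beforehand.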

Any feedback or comments are appreciated.

Thank you,
Suravee

2013-04-29 20:10:27

by Donald Dutile

Subject: Re: RFC: IOMMU/AMD: Error Handling

On 04/29/2013 03:45 PM, Suravee Suthikulanit wrote:
> Joerg,
>
> We are in the process of implementing AMD IOMMU error handling, and I would like some comments from you and the community.
>
> Currently, the AMD IOMMU driver only reports events from the event log in the dmesg, and does not try to handle them in case of errors. AMD IOMMU errors can be categorized as device-specific errors and IOMMU errors.
>
> 1. For IOMMU errors such as:
> - DEV_TAB_HARDWARE_ERROR
> - PAGE_TAB_ERROR
> - COMMAND_HARDWARE_ERROR
> If the error is detected during IOMMU initialization, we could disable IOMMU and proceed. If the error occurs after IOMMU is initialized, we won't be able to recover from this, and might need to panic.
>
> 2. For device-specific errors such as:
> - ILLEGAL_DEV_TABLE_ENTRY
> - IO_PAGE_FAULT
> - INVALID_DEVICE_REQUEST
> We think the AMD IOMMU driver should try to isolate the device. This involves blocking device transactions at the IOMMU DTE and trying to disable the device (e.g. calling the remove(struct pci_dev *pdev) interface generally provided by device drivers). This could prevent the device from continuing to fail and from risking system instability.
>
Disabling the device is not an option.
We've seen misconfigured ACPI tables generate storms
of invalid DTE messages after IOMMU setup but before they are cleared up when
the OS driver is started & resets the device. The original storm is from BIOS use
of the IOMMU with a device.
I'd recommend creating a filter that prevents further logging from a device
for 5 mins at a time if a storm of DTE-related errors is seen.
By definition, the DMA is blocked from corrupting/changing memory, so isolation has been established;
keeping the failure log from consuming the system is the needed fix.
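
A per-device filter along those lines might look like the sketch below. It is illustrative only: struct fault_filter, the function name, the one-second detection window and the threshold of 10 are assumptions made for the example; only the 5-minute mute period comes from the suggestion above. Time is passed in as seconds so the logic can be exercised outside the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

#define STORM_WINDOW_SEC 1u   /* assumed: window used to detect a storm */
#define STORM_THRESHOLD  10u  /* assumed: faults per window before muting */
#define MUTE_SEC         300u /* 5 minutes, as suggested above */

/* One instance per device; zero-initialize before first use. */
struct fault_filter {
    uint64_t window_start; /* start of the current counting window */
    unsigned int count;    /* faults seen in the current window */
    uint64_t muted_until;  /* logging is suppressed before this time */
};

/*
 * Returns true if this fault should be logged. The hardware still blocks
 * the DMA either way; only the log message is filtered.
 */
static bool fault_filter_should_log(struct fault_filter *f, uint64_t now)
{
    if (now < f->muted_until)
        return false;

    /* Start a fresh window once the previous one has expired. */
    if (now - f->window_start >= STORM_WINDOW_SEC) {
        f->window_start = now;
        f->count = 0;
    }

    /* Too many faults in one window: treat it as a storm and mute. */
    if (++f->count > STORM_THRESHOLD) {
        f->muted_until = now + MUTE_SEC;
        return false;
    }
    return true;
}
```

The first few faults from a device still reach the log, so the problem stays visible, while a storm costs at most STORM_THRESHOLD messages per mute period.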

> 3. In the case of a posted memory write transaction, the device driver might not be aware that the transaction has failed and been blocked at the IOMMU. If there is no HW IOMMU, I believe this is handled by PCI error handling code. If the IOMMU hardware reports such a case, could this potentially leverage the Linux IOMMU fault handling interface, iommu_set_fault_handler() and report_iommu_fault(), to communicate to the device driver or PCI driver?
>
Wondering if you could use an AER-like callback mechanism so a driver can be invoked when an IOMMU error occurs,
so the device driver can quiesce or reset the device if it deems it transient.
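
One possible shape for such a callback mechanism, loosely patterned on the way PCI error recovery gives drivers a table of handlers, is sketched below. Every name here (struct iommu_err_handlers, notify_device_fault(), the result codes, the demo driver) is hypothetical, and the "reset after more than three faults" policy exists only to make the example do something.

```c
#include <stddef.h>

/* Hypothetical verdicts a driver can return when told about a fault. */
enum fault_result {
    FAULT_IGNORED,    /* no handler registered, nothing done */
    FAULT_QUIESCED,   /* driver stopped the offending DMA itself */
    FAULT_NEED_RESET, /* driver asks for the device to be reset */
};

struct model_device;

/* Hypothetical per-driver callback table. */
struct iommu_err_handlers {
    enum fault_result (*fault_detected)(struct model_device *dev,
                                        unsigned long iova);
    void (*reset)(struct model_device *dev);
};

struct model_device {
    const struct iommu_err_handlers *err_handlers; /* may be NULL */
    int faults_seen;
    int resets;
};

/* Called by the (modelled) IOMMU layer when a device faults. */
static enum fault_result notify_device_fault(struct model_device *dev,
                                             unsigned long iova)
{
    const struct iommu_err_handlers *h = dev->err_handlers;
    enum fault_result res;

    if (!h || !h->fault_detected)
        return FAULT_IGNORED;

    res = h->fault_detected(dev, iova);
    if (res == FAULT_NEED_RESET && h->reset)
        h->reset(dev);
    return res;
}

/* Demo driver: tolerate a few faults, then ask for a reset. */
static enum fault_result demo_fault_detected(struct model_device *dev,
                                             unsigned long iova)
{
    (void)iova;
    dev->faults_seen++;
    return dev->faults_seen > 3 ? FAULT_NEED_RESET : FAULT_QUIESCED;
}

static void demo_reset(struct model_device *dev)
{
    dev->resets++;
    dev->faults_seen = 0;
}

static const struct iommu_err_handlers demo_handlers = {
    .fault_detected = demo_fault_detected,
    .reset = demo_reset,
};
```

The point of the design is that the decision to quiesce or reset stays with the driver, which knows whether the fault is transient; the IOMMU layer only delivers the notification.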

> Any feedback or comments are appreciated.
>
> Thank you,
> Suravee
>
>
>
>
> _______________________________________________
> iommu mailing list
> [email protected]
> https://lists.linuxfoundation.org/mailman/listinfo/iommu

2013-04-29 20:34:36

by Duran, Leo

Subject: RE: RFC: IOMMU/AMD: Error Handling

I'm wondering if resetting the IOMMU at init-time (once) would clear any BIOS induced noise.
Leo

2013-04-29 21:42:31

by Donald Dutile

Subject: Re: RFC: IOMMU/AMD: Error Handling

On 04/29/2013 04:34 PM, Duran, Leo wrote:
> I'm wondering if resetting the IOMMU at init-time (once) would clear any BIOS induced noise.
> Leo
>
Well, depends what you mean by 'reset'....
(a) setting it up for OS use is effectively a reset, but doesn't quiesce a device
doing DMA reads of a (BIOS-setup) queue. Then the noisy messages begin.
(b) disable the IOMMU, and then the DMA just occurs... and that is bad for writes, potentially.

A similar issue is being reported & worked on for kdump, where devices are still
doing DMA while the system is trying to 'reset' to the kexec'd kernel and
take a crash dump.

Solution: stop devices from doing DMA... but some you _want_ enabled throughout...
like keyboard & mouse via the USB controller, so you get to pick the OS from
grub... not so for kexec...

So, again, for isolation faults.... let the hw do its job -- isolate,
and throttle/silence the fault messages on a per-device, time-duration heuristic
so the system can get through boot-up to where enough of the OS is init'd (drivers started)
to stop the temporary noise.

2013-04-29 22:31:30

by Duran, Leo

Subject: RE: RFC: IOMMU/AMD: Error Handling

I see... I suppose the trick is going to be how to 'filter' this unintended behavior (once, during OS boot).
Thanks,
Leo.

2013-04-30 14:49:26

by Suthikulpanit, Suravee

Subject: Re: RFC: IOMMU/AMD: Error Handling

On 4/29/2013 3:10 PM, Don Dutile wrote:
> On 04/29/2013 03:45 PM, Suravee Suthikulanit wrote:
>> Joerg,
>>
>> We are in the process of implementing AMD IOMMU error handling, and I
>> would like some comments from you and the community.
>>
>> Currently, the AMD IOMMU driver only reports events from the event
>> log in the dmesg, and does not try to handle them in case of errors.
>> AMD IOMMU errors can be categorized as device-specific errors and
>> IOMMU errors.
>>
>> 1. For IOMMU errors such as:
>> - DEV_TAB_HARDWARE_ERROR
>> - PAGE_TAB_ERROR
>> - COMMAND_HARDWARE_ERROR
>> If the error is detected during IOMMU initialization, we could
>> disable IOMMU and proceed. If the error occurs after IOMMU is
>> initialized, we won't be able to recover from this, and might need to
>> panic.
>>
>> 2. For device-specific errors such as:
>> - ILLEGAL_DEV_TABLE_ENTRY
>> - IO_PAGE_FAULT
>> - INVALID_DEVICE_REQUEST
>> We think the AMD IOMMU driver should try to isolate the device. This
>> involves blocking device transactions at the IOMMU DTE and trying to
>> disable the device (e.g. calling the remove(struct pci_dev *pdev)
>> interface generally provided by device drivers). This could prevent
>> the device from continuing to fail and from risking system instability.
>>
> Disabling the device is not an option.
> We've seen misconfigured ACPI tables generate storms
> of invalid DTE messages after IOMMU setup but before they are cleared up when
> the OS driver is started & resets the device. The original storm is from BIOS use
> of the IOMMU with a device.
Would some sort of threshold to help determine the badness of errors
be sufficient? For instance, if the device has generated N errors,
it is then removed (where N is tunable through sysfs or kernel boot
options).

> I'd recommend creating a filter that prevents further logging from a device
> for 5 mins at a time if a storm of DTE-related errors is seen.
> By definition, the DMA is blocked from corrupting/changing memory, so isolation has been established;
> keeping the failure log from consuming the system is the needed fix.

I believe the IOMMU hardware can be configured to suppress logging of subsequent I/O page fault errors until
the device table cache is cleared. This should help avoid the storm of interrupts you are seeing.

>
>> 3. In the case of a posted memory write transaction, the device driver
>> might not be aware that the transaction has failed and been blocked at
>> the IOMMU. If there is no HW IOMMU, I believe this is handled by PCI
>> error handling code. If the IOMMU hardware reports such a case, could
>> this potentially leverage the Linux IOMMU fault handling interface,
>> iommu_set_fault_handler() and report_iommu_fault(), to communicate to
>> the device driver or PCI driver?
>>
> Wondering if you could use an AER-like callback mechanism so a driver can be invoked when an IOMMU error occurs,
> so the device driver can quiesce or reset the device if it deems it transient.
That might also be possible. I might need to look into it more.

Suravee

2013-04-30 14:56:30

by Suthikulpanit, Suravee

Subject: Re: RFC: IOMMU/AMD: Error Handling

On 4/29/2013 4:42 PM, Don Dutile wrote:
> On 04/29/2013 04:34 PM, Duran, Leo wrote:
>> I'm wondering if resetting the IOMMU at init-time (once) would clear
>> any BIOS induced noise.
>> Leo
>>
> Well, depends what you mean by 'reset'....
> (a) setting it up for OS use is effectively a reset, but doesn't quiesce a device
> doing DMA reads of a (BIOS-setup) queue. Then the noisy messages begin.
> (b) disable the IOMMU, and then the DMA just occurs... and that is bad for writes, potentially.
>
> A similar issue is being reported & worked on for kdump, where devices are still
> doing DMA while the system is trying to 'reset' to the kexec'd kernel and
> take a crash dump.
>
> Solution: stop devices from doing DMA... but some you _want_ enabled throughout...
> like keyboard & mouse via the USB controller, so you get to pick the OS from
> grub... not so for kexec...
>
> So, again, for isolation faults.... let the hw do its job -- isolate,
> and throttle/silence the fault messages on a per-device, time-duration heuristic
> so the system can get through boot-up to where enough of the OS is init'd (drivers started)
> to stop the temporary noise.
This sounds more like an issue with the order in which things are
initialized in the system.
If so, could we separate out the code that enables IOMMU error
logging/handling and delay it until we are certain that the system is stable?

Suravee

2013-04-30 15:06:39

by Donald Dutile

Subject: Re: RFC: IOMMU/AMD: Error Handling

On 04/30/2013 10:49 AM, Suravee Suthikulanit wrote:
> On 4/29/2013 3:10 PM, Don Dutile wrote:
>> On 04/29/2013 03:45 PM, Suravee Suthikulanit wrote:
>>> Joerg,
>>>
>>> We are in the process of implementing AMD IOMMU error handling, and I would like some comments from you and the community.
>>>
>>> Currently, the AMD IOMMU driver only reports events from the event log in the dmesg, and does not try to handle them in case of errors. AMD IOMMU errors can be categorized as device-specific errors and IOMMU errors.
>>>
>>> 1. For IOMMU errors such as:
>>> - DEV_TAB_HARDWARE_ERROR
>>> - PAGE_TAB_ERROR
>>> - COMMAND_HARDWARE_ERROR
>>> If the error is detected during IOMMU initialization, we could disable IOMMU and proceed. If the error occurs after IOMMU is initialized, we won't be able to recover from this, and might need to panic.
>>>
>>> 2. For device-specific errors such as:
>>> - ILLEGAL_DEV_TABLE_ENTRY
>>> - IO_PAGE_FAULT
>>> - INVALID_DEVICE_REQUEST
>>> We think the AMD IOMMU driver should try to isolate the device. This involves blocking device transactions at the IOMMU DTE and trying to disable the device (e.g. calling the remove(struct pci_dev *pdev) interface generally provided by device drivers). This could prevent the device from continuing to fail and from risking system instability.
>>>
>> Disabling the device is not an option.
>> We've seen misconfigured ACPI tables generate storms
>> of invalid DTE messages after IOMMU setup but before they are cleared up when
>> the OS driver is started & resets the device. The original storm is from BIOS use
>> of the IOMMU with a device.
> Would some sort of threshold to help determine the badness of errors be sufficient? For instance, if the device has generated N errors, it is then removed (where N is tunable through sysfs or kernel boot options).
>
No! Removing a device is _not_ acceptable.
Again, the most common case I've seen is the *boot* device
not having the proper IVMD (AMD) or RMRR (Intel) structures in the ACPI tables,
or they are temporarily invalidated during reboot (esp. during kexec'd kdump kernels).
Second most common -- the USB controller that the user may need to control the
system on power-up. It'll be more fun when IPMI + IOMMU are put together in the ARM space.
Filter faults from a device; 'nuf said.

>> I'd recommend creating a filter that prevents further logging from a device
>> for 5 mins at a time if a storm of DTE-related errors is seen.
>> By definition, the DMA is blocked from corrupting/changing memory, so isolation has been established;
>> keeping the failure log from consuming the system is the needed fix.
>
> I believe the IOMMU hardware can be configured to suppress logging of subsequent I/O page fault errors until
> the device table cache is cleared. This should help avoid the storm of interrupts you are seeing.
>
If the tables are correct... if not.... then hung system.

>>
>>> 3. In the case of a posted memory write transaction, the device driver might not be aware that the transaction has failed and been blocked at the IOMMU. If there is no HW IOMMU, I believe this is handled by PCI error handling code. If the IOMMU hardware reports such a case, could this potentially leverage the Linux IOMMU fault handling interface, iommu_set_fault_handler() and report_iommu_fault(), to communicate to the device driver or PCI driver?
>>>
>> Wondering if you could use an AER-like callback mechanism so a driver can be invoked when an IOMMU error occurs,
>> so the device driver can quiesce or reset the device if it deems it transient.
> That might also be possible. I might need to look into it more.
>
> Suravee

In summary: when BIOSes are made perfect, then you could implement your perfect disabling algorithm;
unfortunately, esp. with IOMMUs & intr-remap ACPI tables, the BIOSes are notoriously buggy.

2013-04-30 15:09:13

by Donald Dutile

Subject: Re: RFC: IOMMU/AMD: Error Handling

On 04/30/2013 10:56 AM, Suravee Suthikulanit wrote:
> On 4/29/2013 4:42 PM, Don Dutile wrote:
>> On 04/29/2013 04:34 PM, Duran, Leo wrote:
>>> I'm wondering if resetting the IOMMU at init-time (once) would clear any BIOS induced noise.
>>> Leo
>>>
>> Well, depends what you mean by 'reset'....
>> (a) setting it up for OS use is effectively a reset, but doesn't quiesce a device
>> doing DMA reads of a (BIOS-setup) queue. Then the noisy messages begin.
>> (b) disable the IOMMU, and then the DMA just occurs... and that is bad for writes, potentially.
>>
>> A similar issue is being reported & worked on for kdump, where devices are still
>> doing DMA while the system is trying to 'reset' to the kexec'd kernel and
>> take a crash dump.
>>
>> Solution: stop devices from doing DMA... but some you _want_ enabled throughout...
>> like keyboard & mouse via the USB controller, so you get to pick the OS from
>> grub... not so for kexec...
>>
>> So, again, for isolation faults.... let the hw do its job -- isolate,
>> and throttle/silence the fault messages on a per-device, time-duration heuristic
>> so the system can get through boot-up to where enough of the OS is init'd (drivers started)
>> to stop the temporary noise.
> This sounds more like an issue with the order in which things are
> initialized in the system.
> If so, could we separate out the code that enables IOMMU error
> logging/handling and delay it until we are certain that the system is stable?
>
So, you are proposing we not enable fault events when IOMMU is initially configured;
use the IOMMU through boot/driver-config, hoping all is well, and if not, continue blindly,
and then enable IOMMU faults post/late-init ?

> Suravee
>

2013-04-30 15:21:52

by Joerg Roedel

Subject: Re: RFC: IOMMU/AMD: Error Handling

On Tue, Apr 30, 2013 at 09:56:22AM -0500, Suthikulpanit, Suravee wrote:

> This sounds more like an issue with the order in which things are
> initialized in the system.

No, the problem is that almost all BIOS-provided IVRS tables are buggy
because they do not define a unity-mapped region for devices that need
one (like USB controllers). So there is a time window, from IOMMU driver
initialization until the USB driver takes over the controller, in which
these IO page faults can happen.
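
A toy model of why a missing unity-mapped region produces faults in that window is sketched below. It is an assumption-laden illustration, not how the driver represents IVRS data: struct unity_region and dma_is_unity_mapped() are hypothetical, and the model treats any access outside a listed region as one that would fault until the OS driver sets up its own mappings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one unity-mapped (identity-mapped) region. */
struct unity_region {
    uint16_t devid;  /* device the region applies to */
    uint64_t start;  /* first byte covered */
    uint64_t length; /* size in bytes */
};

/*
 * Returns true if a DMA access by devid to addr falls inside a region the
 * firmware tables declared as unity-mapped. In this model, false means the
 * access would be blocked and reported as an IO page fault.
 */
static bool dma_is_unity_mapped(const struct unity_region *tbl, size_t n,
                                uint16_t devid, uint64_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].devid == devid &&
            addr >= tbl[i].start &&
            addr - tbl[i].start < tbl[i].length)
            return true;
    }
    return false;
}
```

If the firmware table has no entry for a USB controller that the BIOS left running, every access in this model returns false, which corresponds to the fault storm described earlier in the thread.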

Joerg

2013-04-30 16:02:33

by Alex Williamson

Subject: Re: RFC: IOMMU/AMD: Error Handling

On Tue, 2013-04-30 at 11:06 -0400, Don Dutile wrote:
> On 04/30/2013 10:49 AM, Suravee Suthikulanit wrote:
> > On 4/29/2013 3:10 PM, Don Dutile wrote:
> >> On 04/29/2013 03:45 PM, Suravee Suthikulanit wrote:
> >>> Joerg,
> >>>
> >>> We are in the process of implementing AMD IOMMU error handling, and I would like some comments from you and the community.
> >>>
> >>> Currently, the AMD IOMMU driver only reports events from the event log in the dmesg, and does not try to handle them in case of errors. AMD IOMMU errors can be categorized as device-specific errors and IOMMU errors.
> >>>
> >>> 1. For IOMMU errors such as:
> >>> - DEV_TAB_HARDWARE_ERROR
> >>> - PAGE_TAB_ERROR
> >>> - COMMAND_HARDWARE_ERROR
> >>> If the error is detected during IOMMU initialization, we could disable IOMMU and proceed. If the error occurs after IOMMU is initialized, we won't be able to recover from this, and might need to panic.
> >>>
> >>> 2. For device-specific errors such as:
> >>> - ILLEGAL_DEV_TABLE_ENTRY
> >>> - IO_PAGE_FAULT
> >>> - INVALID_DEVICE_REQUEST
> >>> We think the AMD IOMMU driver should try to isolate the device. This involves blocking device transactions at the IOMMU DTE and trying to disable the device (e.g. calling the remove(struct pci_dev *pdev) interface generally provided by device drivers). This could prevent the device from continuing to fail and from risking system instability.
> >>>
> >> Disabling the device is not an option.
> >> We've seen misconfigured ACPI tables generate storms
> >> of invalid DTE messages after IOMMU setup but before they are cleared up when
> >> the OS driver is started & resets the device. The original storm is from BIOS use
> >> of the IOMMU with a device.
> > Would some sort of threshold to help determine the badness of errors be sufficient? For instance, if the device has generated N errors, it is then removed (where N is tunable through sysfs or kernel boot options).
> >
> No! Removing a device is _not_ acceptable.
> Again, the most common case I've seen is the *boot* device
> not having the proper IVMD (AMD) or RMRR (Intel) structures in the ACPI tables,
> or they are temporarily invalidated during reboot (esp. during kexec'd kdump kernels).
> Second most common -- the USB controller that the user may need to control the
> system on power-up. It'll be more fun when IPMI + IOMMU are put together in the ARM space.
> Filter faults from a device; 'nuf said.
>
>
> >> I'd recommend creating a filter that prevents further logging from a device
> >> for 5 mins at a time if a storm of DTE-related errors is seen.
> >> By definition, the DMA is blocked from corrupting/changing memory, so isolation has been established;
> >> keeping the failure log from consuming the system is the needed fix.
> >
> > I believe the IOMMU hardware can be configured to suppress logging of subsequent I/O page fault errors until
> > the device table cache is cleared. This should help avoid the storm of interrupts you are seeing.
> >
> If the tables are correct... if not.... then hung system.
>
> >>
> >>> 3. In the case of a posted memory write transaction, the device driver might not be aware that the transaction has failed and been blocked at the IOMMU. If there is no HW IOMMU, I believe this is handled by PCI error handling code. If the IOMMU hardware reports such a case, could this potentially leverage the Linux IOMMU fault handling interface, iommu_set_fault_handler() and report_iommu_fault(), to communicate to the device driver or PCI driver?
> >>>
> >> Wondering if you could use an AER-like callback mechanism so a driver can be invoked when an IOMMU error occurs,
> >> so the device driver can quiesce or reset the device if it deems it transient.
> > That might also be possible. I might need to look into it more.
> >
> > Suravee
>
> In summary: when BIOSes are made perfect, then you could implement your perfect disabling algorithm;
> unfortunately, esp. with IOMMUs & intr-remap ACPI tables, the BIOSes are notoriously buggy.

I don't think it's just the BIOS. I netboot systems and regularly see a
few faults between IOMMU init and driver initialization of the device
(AMD & Intel, otherwise perfect BIOS). It's similar to the kexec case.
Maybe you can brush that aside as another boot-time issue as well, but
then we have devices assigned to guests, which can generate faults any
time. Do we really want an IOMMU-driver-directed .remove() in response
to those faults? I don't think so. What's wrong with just rate
limiting the errors? The IOMMU is doing its job; if you want to have
something take action based on the error, create a path to tell the
driver or at least the subsystem about it, but randomly
calling .remove() sounds like a hack. Thanks,

Alex