2012-06-19 10:21:01

by Joerg Roedel

Subject: IO_PAGE_FAULTS with igb or igbvf on AMD IOMMU system

Hi,

I am trying to use an Intel 82576 NIC on an AMD IOMMU system with
SR-IOV. When I load the igb module with max_vfs=1 to enable a virtual
function I get IO_PAGE_FAULTS from the virtual functions. The relevant
part of dmesg is:

[ 45.788134] igb: Intel(R) Gigabit Ethernet Network Driver - version 3.4.7-k
[ 45.795090] igb: Copyright (c) 2007-2012 Intel Corporation.
[ 45.801049] igb 0000:02:00.0: irq 80 for MSI/MSI-X
[ 45.801056] igb 0000:02:00.0: irq 81 for MSI/MSI-X
[ 45.801061] igb 0000:02:00.0: irq 82 for MSI/MSI-X
[ 45.801067] igb 0000:02:00.0: irq 83 for MSI/MSI-X
[ 45.901445] pci 0000:02:10.0: [8086:10ca] type 00 class 0x020000
[ 45.901585] AMD-Vi: New device 0000:02:10.0
[ 45.906486] igb 0000:02:00.0: 1 VFs allocated
[ 45.937918] igbvf: Intel(R) Gigabit Virtual Function Network Driver - version 2.0.1-k
[ 45.945751] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
[ 46.071749] igb 0000:02:00.0: Intel(R) Gigabit Ethernet Network Connection
[ 46.078605] igb 0000:02:00.0: eth5: (PCIe:2.5Gb/s:Width x4) 00:1b:21:49:2e:cc
[ 46.085804] igb 0000:02:00.0: eth5: PBA No: E43709-003
[ 46.090946] igb 0000:02:00.0: Using MSI-X interrupts. 2 rx queue(s), 1 tx queue(s)
[ 46.098870] igb 0000:02:00.1: irq 84 for MSI/MSI-X
[ 46.098876] igb 0000:02:00.1: irq 85 for MSI/MSI-X
[ 46.098881] igb 0000:02:00.1: irq 86 for MSI/MSI-X
[ 46.098886] igb 0000:02:00.1: irq 87 for MSI/MSI-X
[ 46.104262] AMD-Vi: Using protection domain 23 for device 0000:02:00.0
[ 46.172988] IPv6: ADDRCONF(NETDEV_UP): eth5: link is not ready
[ 46.202875] pci 0000:02:10.1: [8086:10ca] type 00 class 0x020000
[ 46.203013] AMD-Vi: New device 0000:02:10.1
[ 46.207935] igb 0000:02:00.1: 1 VFs allocated
[ 46.373149] igb 0000:02:00.1: Intel(R) Gigabit Ethernet Network Connection
[ 46.380019] igb 0000:02:00.1: eth1: (PCIe:2.5Gb/s:Width x4) 00:1b:21:49:2e:cd
[ 46.387213] igb 0000:02:00.1: eth6: PBA No: E43709-003
[ 46.392347] igb 0000:02:00.1: Using MSI-X interrupts. 2 rx queue(s), 1 tx queue(s)
[ 46.400072] igbvf 0000:02:10.0: enabling device (0000 -> 0002)
[ 46.405977] igbvf 0000:02:10.0: irq 88 for MSI/MSI-X
[ 46.405983] igbvf 0000:02:10.0: irq 89 for MSI/MSI-X
[ 46.405988] igbvf 0000:02:10.0: irq 90 for MSI/MSI-X
[ 46.411492] AMD-Vi: Using protection domain 24 for device 0000:02:00.1
[ 46.480625] IPv6: ADDRCONF(NETDEV_UP): eth6: link is not ready
[ 46.486980] igbvf 0000:02:10.0: Intel(R) 82576 Virtual Function
[ 46.492895] igbvf 0000:02:10.0: Address: ce:5e:41:2f:36:ce
[ 46.498510] igbvf 0000:02:10.1: enabling device (0000 -> 0002)
[ 46.504394] igbvf 0000:02:10.1: irq 91 for MSI/MSI-X
[ 46.504400] igbvf 0000:02:10.1: irq 92 for MSI/MSI-X
[ 46.504405] igbvf 0000:02:10.1: irq 93 for MSI/MSI-X
[ 46.527012] igbvf 0000:02:10.1: Intel(R) 82576 Virtual Function
[ 46.532931] igbvf 0000:02:10.1: Address: 52:3e:8f:47:60:da
[ 46.573209] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:10.0 domain=0x0000 address=0x000000021e170000 flags=0x0050]
[ 46.575620] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
[ 46.589607] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:10.0 domain=0x0000 address=0x000000021e170040 flags=0x0050]
[ 46.600186] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:10.0 domain=0x0000 address=0x000000021e170080 flags=0x0050]
[ 46.610763] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:10.0 domain=0x0000 address=0x000000021e1700c0 flags=0x0050]
[ 46.669940] IPv6: ADDRCONF(NETDEV_UP): eth2: link is not ready

The devices (physical and virtual) are not operational after this. I
think this is a problem in the igb or igbvf driver. The addresses
reported in the IO_PAGE_FAULTS are system-RAM addresses and were not
handed out by the AMD IOMMU driver (the driver only hands out DMA
handles below 4GB). Also, the reported domain is 0, which means that the
driver for that device has not yet issued _any_ call to the DMA-API. But
the device is doing a DMA write request (as seen in the flags). Any
ideas? Also let me know if you need any additional information.
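
For anyone who wants to repeat the two checks above on their own log
(fault address at or above 4GB, reported domain equal to 0), here is a
minimal sketch. The function names and the SAMPLE string are only
illustrative, the regular expression assumes the exact "Event logged"
format shown in the dmesg excerpt, and the flags value is parsed but
deliberately not decoded.

```python
import re

# Matches the AMD-Vi event line format seen in the dmesg excerpt above.
FAULT_RE = re.compile(
    r"IO_PAGE_FAULT device=(?P<device>\S+) domain=0x(?P<domain>[0-9a-fA-F]+) "
    r"address=0x(?P<address>[0-9a-fA-F]+) flags=0x(?P<flags>[0-9a-fA-F]+)"
)
FOUR_GIB = 1 << 32


def parse_faults(dmesg_text):
    """Return one dict per IO_PAGE_FAULT event found in the text."""
    faults = []
    for m in FAULT_RE.finditer(dmesg_text):
        faults.append({
            "device": m.group("device"),
            "domain": int(m.group("domain"), 16),
            "address": int(m.group("address"), 16),
            "flags": int(m.group("flags"), 16),  # kept raw, not interpreted
        })
    return faults


def summarize(faults):
    """Summarize the properties discussed in the report."""
    return {
        "devices": sorted({f["device"] for f in faults}),
        "all_domain_zero": all(f["domain"] == 0 for f in faults),
        "all_above_4gib": all(f["address"] >= FOUR_GIB for f in faults),
        # Distance between consecutive fault addresses, in bytes.
        "strides": [b["address"] - a["address"]
                    for a, b in zip(faults, faults[1:])],
    }


# The four fault lines from the report, copied verbatim.
SAMPLE = """
[ 46.573209] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:10.0 domain=0x0000 address=0x000000021e170000 flags=0x0050]
[ 46.589607] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:10.0 domain=0x0000 address=0x000000021e170040 flags=0x0050]
[ 46.600186] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:10.0 domain=0x0000 address=0x000000021e170080 flags=0x0050]
[ 46.610763] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:10.0 domain=0x0000 address=0x000000021e1700c0 flags=0x0050]
"""

if __name__ == "__main__":
    print(summarize(parse_faults(SAMPLE)))
```

On the four lines above, every fault comes from 02:10.0, uses domain 0,
and targets an address above 4GB; consecutive addresses are 0x40 bytes
apart.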

Thanks,

Joerg

--
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


2012-06-19 18:19:22

by Duyck, Alexander H

Subject: Re: IO_PAGE_FAULTS with igb or igbvf on AMD IOMMU system

On 06/19/2012 03:20 AM, Joerg Roedel wrote:
> Hi,
>
> I am trying to use an Intel 82576 NIC on an AMD IOMMU system with
> SR-IOV. When I load the igb module with max_vfs=1 to enable a virtual
> function I get IO_PAGE_FAULTS from the virtual functions. The relevant
> part of dmesg is:
>
> [ 45.788134] igb: Intel(R) Gigabit Ethernet Network Driver - version 3.4.7-k
> [ 45.795090] igb: Copyright (c) 2007-2012 Intel Corporation.
> [ 45.801049] igb 0000:02:00.0: irq 80 for MSI/MSI-X
> [ 45.801056] igb 0000:02:00.0: irq 81 for MSI/MSI-X
> [ 45.801061] igb 0000:02:00.0: irq 82 for MSI/MSI-X
> [ 45.801067] igb 0000:02:00.0: irq 83 for MSI/MSI-X
> [ 45.901445] pci 0000:02:10.0: [8086:10ca] type 00 class 0x020000
> [ 45.901585] AMD-Vi: New device 0000:02:10.0
> [ 45.906486] igb 0000:02:00.0: 1 VFs allocated
> [ 45.937918] igbvf: Intel(R) Gigabit Virtual Function Network Driver - version 2.0.1-k
> [ 45.945751] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
> [ 46.071749] igb 0000:02:00.0: Intel(R) Gigabit Ethernet Network Connection
> [ 46.078605] igb 0000:02:00.0: eth5: (PCIe:2.5Gb/s:Width x4) 00:1b:21:49:2e:cc
> [ 46.085804] igb 0000:02:00.0: eth5: PBA No: E43709-003
> [ 46.090946] igb 0000:02:00.0: Using MSI-X interrupts. 2 rx queue(s), 1 tx queue(s)
> [ 46.098870] igb 0000:02:00.1: irq 84 for MSI/MSI-X
> [ 46.098876] igb 0000:02:00.1: irq 85 for MSI/MSI-X
> [ 46.098881] igb 0000:02:00.1: irq 86 for MSI/MSI-X
> [ 46.098886] igb 0000:02:00.1: irq 87 for MSI/MSI-X
> [ 46.104262] AMD-Vi: Using protection domain 23 for device 0000:02:00.0
> [ 46.172988] IPv6: ADDRCONF(NETDEV_UP): eth5: link is not ready
> [ 46.202875] pci 0000:02:10.1: [8086:10ca] type 00 class 0x020000
> [ 46.203013] AMD-Vi: New device 0000:02:10.1
> [ 46.207935] igb 0000:02:00.1: 1 VFs allocated
> [ 46.373149] igb 0000:02:00.1: Intel(R) Gigabit Ethernet Network Connection
> [ 46.380019] igb 0000:02:00.1: eth1: (PCIe:2.5Gb/s:Width x4) 00:1b:21:49:2e:cd
> [ 46.387213] igb 0000:02:00.1: eth6: PBA No: E43709-003
> [ 46.392347] igb 0000:02:00.1: Using MSI-X interrupts. 2 rx queue(s), 1 tx queue(s)
> [ 46.400072] igbvf 0000:02:10.0: enabling device (0000 -> 0002)
> [ 46.405977] igbvf 0000:02:10.0: irq 88 for MSI/MSI-X
> [ 46.405983] igbvf 0000:02:10.0: irq 89 for MSI/MSI-X
> [ 46.405988] igbvf 0000:02:10.0: irq 90 for MSI/MSI-X
> [ 46.411492] AMD-Vi: Using protection domain 24 for device 0000:02:00.1
> [ 46.480625] IPv6: ADDRCONF(NETDEV_UP): eth6: link is not ready
> [ 46.486980] igbvf 0000:02:10.0: Intel(R) 82576 Virtual Function
> [ 46.492895] igbvf 0000:02:10.0: Address: ce:5e:41:2f:36:ce
> [ 46.498510] igbvf 0000:02:10.1: enabling device (0000 -> 0002)
> [ 46.504394] igbvf 0000:02:10.1: irq 91 for MSI/MSI-X
> [ 46.504400] igbvf 0000:02:10.1: irq 92 for MSI/MSI-X
> [ 46.504405] igbvf 0000:02:10.1: irq 93 for MSI/MSI-X
> [ 46.527012] igbvf 0000:02:10.1: Intel(R) 82576 Virtual Function
> [ 46.532931] igbvf 0000:02:10.1: Address: 52:3e:8f:47:60:da
> [ 46.573209] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:10.0 domain=0x0000 address=0x000000021e170000 flags=0x0050]
> [ 46.575620] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
> [ 46.589607] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:10.0 domain=0x0000 address=0x000000021e170040 flags=0x0050]
> [ 46.600186] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:10.0 domain=0x0000 address=0x000000021e170080 flags=0x0050]
> [ 46.610763] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:10.0 domain=0x0000 address=0x000000021e1700c0 flags=0x0050]
> [ 46.669940] IPv6: ADDRCONF(NETDEV_UP): eth2: link is not ready
>
> The devices (physical and virtual) are not operational after this. I
> think this is a problem in the igb or igbvf driver. The addresses
> reported in the IO_PAGE_FAULTS are system-ram addresses are were not
> handed out by the AMD IOMMU driver (the driver only hands out DMA
> handles below 4GB). Also the reported domain is 0 which means that the
> driver for that device has not yet issued _any_ call to the DMA-API. But
> the device is doing an DMA write request (as seen in the flags). Any
> ideas? Also let me know if you need any additional information.
>
> Thanks,
>
> Joerg
>
Joerg,

Based on the faults it would look like accessing the descriptor rings is
probably triggering the errors. We allocate the descriptor rings using
dma_alloc_coherent so the rings should be mapped correctly.

The PF and VF will end up being locked out since they are hung on an
uncompleted DMA transaction. Normally we recommend that PCIe Advanced
Error Reporting be enabled if an IOMMU is enabled so the device can be
reset after triggering a page fault event.

The first thing that pops into my head for possible issues would be that
maybe the VF pci_dev structure or the device structure isn't being
correctly initialized when SR-IOV is enabled on the igb interface. Do
you know if there are any AMD IOMMU specific values on those structures,
such as the domain, that are supposed to be initialized prior to making
DMA API calls? If so, have you tried adding debug output to verify
whether those values are initialized on a VF prior to bringing up a VF
interface?

Also have you tried any other SR-IOV capable devices on this system?
That would be a valuable data point because we could then exclude the
SR-IOV code as being a possible cause for the issues if other SR-IOV
devices are working without any issues.

Thanks,

Alex

2012-06-20 09:48:56

by Joerg Roedel

Subject: Re: IO_PAGE_FAULTS with igb or igbvf on AMD IOMMU system

Hi Alexander,

On Tue, Jun 19, 2012 at 11:19:20AM -0700, Alexander Duyck wrote:
> Based on the faults it would look like accessing the descriptor rings is
> probably triggering the errors. We allocate the descriptor rings using
> dma_alloc_coherent so the rings should be mapped correctly.

Can this happen before the driver has actually allocated the descriptors?
As I said, the faults appear before any DMA-API call was made for that
device (hence, domain=0x0000, because the domain is assigned on the
first call to the DMA-API for a device).

Also, I don't see the faults every time. One out of ten times
(estimated) there are no faults. Is it possible that this is a race
condition, e.g. that the card tries to access its descriptor rings before
the driver has allocated them (or something like that)?

> The PF and VF will end up being locked out since they are hung on an
> uncompleted DMA transaction. Normally we recommend that PCIe Advanced
> Error Reporting be enabled if an IOMMU is enabled so the device can be
> reset after triggering a page fault event.
>
> The first thing that pops into my head for possible issues would be that
> maybe the VF pci_dev structure or the device structure isn't being
> correctly initialized when SR-IOV is enabled on the igb interface. Do
> you know if there are any AMD IOMMU specific values on those structures,
> such as the domain, that are supposed to be initialized prior to calling
> the DMA API calls? If so, have you tried adding debug output to verify
> if those values are initialized on a VF prior to bringing up a VF interface?

Well, when the device appears in the system the IOMMU driver gets
notified about it using the device_change notifiers. It will then
allocate all necessary data structures. I also verified that this works
correctly while debugging this issue. So I am pretty sure the problem
isn't there :)

> Also have you tried any other SR-IOV capable devices on this system?
> That would be a valuable data point because we could then exclude the
> SR-IOV code as being a possible cause for the issues if other SR-IOV
> devices are working without any issues.

I have another SR-IOV device, but that fails to even enable SR-IOV
because the BIOS did not leave enough MMIO resources. So I couldn't
try it with that device. With the 82576 card, enabling SR-IOV works fine
but results in the faults from the VF.

Regards,

Joerg


2012-06-20 16:51:41

by Rose, Gregory V

Subject: RE: IO_PAGE_FAULTS with igb or igbvf on AMD IOMMU system

> -----Original Message-----
> From: Joerg Roedel [mailto:[email protected]]
> Sent: Wednesday, June 20, 2012 2:49 AM
> To: Duyck, Alexander H
> Cc: Kirsher, Jeffrey T; Brandeburg, Jesse; Allan, Bruce W; Wyborny,
> Carolyn; Skidmore, Donald C; Rose, Gregory V; Waskiewicz Jr, Peter P;
> Ronciak, John; [email protected]; linux-
> [email protected]
> Subject: Re: IO_PAGE_FAULTS with igb or igbvf on AMD IOMMU system
>
> Hi Alexander,
>
> On Tue, Jun 19, 2012 at 11:19:20AM -0700, Alexander Duyck wrote:
> > Based on the faults it would look like accessing the descriptor rings
> > is probably triggering the errors. We allocate the descriptor rings
> > using dma_alloc_coherent so the rings should be mapped correctly.
>
> Can this happen before the driver actually allocated the descriptors? As I
> said, the faults appear before any DMA-API call was made for that device
> (hence, domain=0x0000, because the domain is assigned on the first call to
> the DMA-API for a device).
>
> Also, I don't see the faults every time. One out of ten times
> (estimated) there are not faults. Is it possible that this is a race
> condition, e.g. that the card trys to access its descriptor rings before
> the driver allocated them (or something like that).
>
> > The PF and VF will end up being locked out since they are hung on an
> > uncompleted DMA transaction. Normally we recommend that PCIe Advanced
> > Error Reporting be enabled if an IOMMU is enabled so the device can be
> > reset after triggering a page fault event.
> >
> > The first thing that pops into my head for possible issues would be
> > that maybe the VF pci_dev structure or the device structure isn't
> > being correctly initialized when SR-IOV is enabled on the igb
> > interface. Do you know if there are any AMD IOMMU specific values on
> > those structures, such as the domain, that are supposed to be
> > initialized prior to calling the DMA API calls? If so, have you tried
> > adding debug output to verify if those values are initialized on a VF
> prior to bringing up a VF interface?
>
> Well, when the device appears in the system the IOMMU driver gets notified
> about it using the device_change notifiers. It will then allocate all
> necessary data structures. I also verified that this works correctly while
> debugging this issue. So I am pretty sure the problem isn't there :)
>
> > Also have you tried any other SR-IOV capable devices on this system?
> > That would be a valuable data point because we could then exclude the
> > SR-IOV code as being a possible cause for the issues if other SR-IOV
> > devices are working without any issues.
>
> I have another SR-IOV device, but that fails to even enable SR-IOV because
> the BIOS did not let enough MMIO resources left. So I couldn't try it with
> that device. With the 82576 card enabling SR-IOV works fine but results in
> the faults from the VF.

That sounds very suspicious to me. The 82576 might still seem to work
because it has fewer than 8 VFs, which might be why it isn't reporting
the MMIO resources issue. That doesn't mean it would work correctly, and
I suspect that the IO_PAGE_FAULT error is due to an MMIO access, not a
DMA access. MMIO resources for devices are page mapped, and if your BIOS
is broken that might not be done correctly.

I have the feeling the issue is the BIOS. You probably want to contact
your system vendor and make sure you have the correct BIOS installed, or
even ask whether they claim that the system is supposed to support
SR-IOV.

- Greg

>
> Regards,
>
> Joerg
>
> --
> AMD Operating System Research Center
>
> Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General
> Managers: Alberto Bozzo
> Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr.
> 43632

2012-06-20 22:48:47

by Duyck, Alexander H

Subject: Re: IO_PAGE_FAULTS with igb or igbvf on AMD IOMMU system

On 06/20/2012 02:48 AM, Joerg Roedel wrote:
> Hi Alexander,
>
> On Tue, Jun 19, 2012 at 11:19:20AM -0700, Alexander Duyck wrote:
>> Based on the faults it would look like accessing the descriptor rings is
>> probably triggering the errors. We allocate the descriptor rings using
>> dma_alloc_coherent so the rings should be mapped correctly.
> Can this happen before the driver actually allocated the descriptors? As
> I said, the faults appear before any DMA-API call was made for that
> device (hence, domain=0x0000, because the domain is assigned on the
> first call to the DMA-API for a device).
>
> Also, I don't see the faults every time. One out of ten times
> (estimated) there are not faults. Is it possible that this is a race
> condition, e.g. that the card trys to access its descriptor rings before
> the driver allocated them (or something like that).
The descriptor rings are located in system memory so it shouldn't be
possible for the driver to start accessing anything without requesting
the address via the DMA-API calls. As I said, we use dma_alloc_coherent
to allocate the descriptor rings so I don't think it is possible for us
to configure the rings without calling the DMA-API. You might try
double checking the timing of the igbvf driver load versus the notifier
getting to the IOMMU. I suppose this could be some sort of
initialization race between the driver and the IOMMU. One way to verify
that would be to blacklist the VF driver and make sure it is unloaded
prior to loading the PF driver. Load it manually via insmod after you
have created the PCI devices by loading the PF module. This way
you can guarantee that the igbvf load isn't somehow occurring prior to
the IOMMU notifier being triggered.
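
As a rough first pass on the timing question, before blacklisting
anything, the message order in an existing dmesg capture can be
compared. The sketch below assumes the "AMD-Vi: New device" and "igbvf
...: enabling device" message formats seen in the log quoted earlier in
this thread; the function names are made up for the example. Note that
message timestamps only show when each line was printed, so this cannot
prove the absence of a race.

```python
import re

LINE_RE = re.compile(r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)")
NEW_DEV_RE = re.compile(r"AMD-Vi: New device (?P<bdf>\S+)")
VF_PROBE_RE = re.compile(r"igbvf (?P<bdf>\S+): enabling device")


def first_events(dmesg_text):
    """Map each PCI address to the first timestamp of each event kind."""
    seen = {}
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts = float(m.group("ts"))
        msg = m.group("msg")
        for kind, rx in (("iommu", NEW_DEV_RE), ("probe", VF_PROBE_RE)):
            hit = rx.search(msg)
            if hit:
                # setdefault keeps the earliest timestamp per kind.
                seen.setdefault(hit.group("bdf"), {}).setdefault(kind, ts)
    return seen


def notifier_ran_first(events):
    """True per device if the IOMMU message precedes the igbvf probe."""
    return {
        bdf: ev["iommu"] < ev["probe"]
        for bdf, ev in events.items()
        if "iommu" in ev and "probe" in ev
    }


# Lines copied from the dmesg excerpt in the original report.
SAMPLE = """
[ 45.901585] AMD-Vi: New device 0000:02:10.0
[ 46.203013] AMD-Vi: New device 0000:02:10.1
[ 46.400072] igbvf 0000:02:10.0: enabling device (0000 -> 0002)
[ 46.498510] igbvf 0000:02:10.1: enabling device (0000 -> 0002)
"""

if __name__ == "__main__":
    print(notifier_ran_first(first_events(SAMPLE)))
```

In the posted log the AMD-Vi message for each VF is printed before igbvf
enables that VF.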

>> The PF and VF will end up being locked out since they are hung on an
>> uncompleted DMA transaction. Normally we recommend that PCIe Advanced
>> Error Reporting be enabled if an IOMMU is enabled so the device can be
>> reset after triggering a page fault event.
>>
>> The first thing that pops into my head for possible issues would be that
>> maybe the VF pci_dev structure or the device structure isn't being
>> correctly initialized when SR-IOV is enabled on the igb interface. Do
>> you know if there are any AMD IOMMU specific values on those structures,
>> such as the domain, that are supposed to be initialized prior to calling
>> the DMA API calls? If so, have you tried adding debug output to verify
>> if those values are initialized on a VF prior to bringing up a VF interface?
> Well, when the device appears in the system the IOMMU driver gets
> notified about it using the device_change notifiers. It will then
> allocate all necessary data structures. I also verified that this works
> correctly while debugging this issue. So I am pretty sure the problem
> isn't there :)
You might also want to try adding some code to the dma_alloc_coherent
function in your kernel to verify that the domain info is set up
correctly when the VF driver is allocating the descriptor rings.

>> Also have you tried any other SR-IOV capable devices on this system?
>> That would be a valuable data point because we could then exclude the
>> SR-IOV code as being a possible cause for the issues if other SR-IOV
>> devices are working without any issues.
> I have another SR-IOV device, but that fails to even enable SR-IOV
> because the BIOS did not let enough MMIO resources left. So I couldn't
> try it with that device. With the 82576 card enabling SR-IOV works fine
> but results in the faults from the VF.
If you are working on a recent kernel you should be able to overcome the
BIOS issues. I believe there is the option "pci=assign-busses" for when
the BIOS doesn't place enough buses on the bridge to support SR-IOV, and
there is "pci=realloc", which will reassign the MMIO resources to make
enough room for the VF MMIO BARs.
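
A small helper to check whether a given kernel command line already
carries those two options might look like the sketch below. The function
names and the example command line are made up for illustration; on a
running system the string would typically come from /proc/cmdline.

```python
def pci_options(cmdline):
    """Collect the comma-separated sub-options of every pci= token."""
    opts = []
    for token in cmdline.split():
        if token.startswith("pci="):
            opts.extend(o for o in token[len("pci="):].split(",") if o)
    return opts


def missing_sriov_workarounds(cmdline, wanted=("assign-busses", "realloc")):
    """Return the wanted pci= options that are absent from cmdline."""
    present = pci_options(cmdline)
    return [
        w for w in wanted
        # Accept both the bare form and a "name=value" form.
        if not any(p == w or p.startswith(w + "=") for p in present)
    ]


if __name__ == "__main__":
    # Hypothetical command line, not taken from the system in this thread.
    example = "root=/dev/sda1 ro pci=realloc quiet"
    print(missing_sriov_workarounds(example))
```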

Thanks,

Alex

2012-06-25 11:20:49

by Joerg Roedel

Subject: Re: IO_PAGE_FAULTS with igb or igbvf on AMD IOMMU system

Hi,
On Wed, Jun 20, 2012 at 03:48:38PM -0700, Alexander Duyck wrote:
> On 06/20/2012 02:48 AM, Joerg Roedel wrote:

> If you are working on a recent kernel you should be able to overcome the
> BIOS issues. I believe there are the options "pci=assign-busses" if the
> BIOS doesn't place enough buses on the bridge to support SR-IOV, and
> there is "pci=realloc" which will reassign the MMIO resources to make
> enough room for VF MMIO bars.

Thanks for all your help, it turned out to be a bug in the AMD IOMMU
driver. The pdev->dev didn't get the right dma_ops struct assigned, so
mapping requests never actually made it to the IOMMU driver. This
only happened with hotplugged devices and VFs of SR-IOV devices. Too
bad, but I sent a fix for that and it works again now. Thanks again for
your help.
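
To make that failure mode concrete, here is a toy model in plain Python,
not kernel code. All class and function names are invented for the
example, and it assumes that the wrongly assigned dma_ops simply handed
the physical address back as the DMA handle; the thread only states that
the mapping requests never reached the IOMMU driver.

```python
class ToyIommu:
    """Per-device translation tables; unmapped accesses are logged as faults."""

    def __init__(self):
        self.domains = {}       # device -> domain id, set on first mapping
        self.mappings = {}      # device -> {dma handle: physical address}
        self.next_domain = 1
        self.next_handle = 0x1000
        self.faults = []        # (device, domain, address)

    def map(self, device, phys):
        if device not in self.domains:
            self.domains[device] = self.next_domain
            self.next_domain += 1
        handle = self.next_handle
        self.next_handle += 0x1000
        self.mappings.setdefault(device, {})[handle] = phys
        return handle

    def device_access(self, device, addr):
        table = self.mappings.get(device, {})
        if addr in table:
            return table[addr]
        # No mapping: record a fault; domain 0 means "never mapped anything".
        self.faults.append((device, self.domains.get(device, 0), addr))
        return None


def iommu_map(iommu, device, phys):
    """dma_ops stand-in that goes through the IOMMU driver."""
    return iommu.map(device, phys)


def direct_map(iommu, device, phys):
    """Assumed wrong dma_ops stand-in: handle equals the physical address."""
    return phys


class ToyDevice:
    def __init__(self, name, dma_ops):
        self.name = name
        self.dma_ops = dma_ops


def run(device, iommu, phys):
    """Driver maps a buffer, then the device performs DMA to the handle."""
    handle = device.dma_ops(iommu, device.name, phys)
    return iommu.device_access(device.name, handle)


if __name__ == "__main__":
    iommu = ToyIommu()
    pf = ToyDevice("02:00.0", iommu_map)
    vf = ToyDevice("02:10.0", direct_map)
    print(run(pf, iommu, 0x21e170000))  # translated successfully
    print(run(vf, iommu, 0x21e170000))  # None: access faulted
    print(iommu.faults)
```

In this model the VF's driver believes it mapped its buffer, yet the
IOMMU never saw a request for that device, so the access faults with
domain 0 at a raw system-RAM address, matching the pattern in the
original report.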

Regards,

Joerg
