2019-09-23 14:15:33

by Kurt Garloff

[permalink] [raw]
Subject: IOMMU vs Ryzen embedded EMMC controller

Hi,



enabling the IOMMU on my Ryzen v1605b (UDOO Bolt v8) does result in a non-working EMMC driver.
Without enabling IOMMU, it works like a charm.
From my POV this needs fixing, and I consider this a bug.

I looked into sdhci to see whether the right dma_map_sg() calls are missing, but they are there. The sdhci driver appears to do the right thing.
It seems that the EMMC controller is not considered and reported as PCI device while it still goes through the PCI IOMMU :-O

Here is what happens on loading the sdhci-acpi driver on kernel 5.3:

[12916.740148] mmc0: ADMA error
[12916.740154] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[12916.740163] mmc0: sdhci: Sys addr:  0x00000000 | Version:  0x00001002
[12916.740170] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000001
[12916.740179] mmc0: sdhci: Argument:  0x00000000 | Trn mode: 0x00000013
[12916.740184] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:13.1 domain=0x0000 address=0x6f2163200 flags=0x0050]
[12916.740193] mmc0: sdhci: Present:   0xf1ff0000 | Host ctl: 0x00000019
[12916.740202] mmc0: sdhci: Power:     0x0000000f | Blk gap:  0x00000000
[12916.740211] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x0000f447
[12916.740219] mmc0: sdhci: Timeout:   0x0000000a | Int stat: 0x00000000
[12916.740226] mmc0: sdhci: Int enab:  0x03ff000b | Sig enab: 0x03ff000b
[12916.740232] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[12916.740239] mmc0: sdhci: Caps:      0x71fec8b2 | Caps_1:   0x00000577
[12916.740246] mmc0: sdhci: Cmd:       0x0000083a | Max curr: 0x00c80064
[12916.740253] mmc0: sdhci: Resp[0]:   0x00000700 | Resp[1]:  0xffffffff
[12916.740259] mmc0: sdhci: Resp[2]:   0x328f5903 | Resp[3]:  0x00d00f00
[12916.740264] mmc0: sdhci: Host ctl2: 0x00000000
[12916.740273] mmc0: sdhci: ADMA Err:  0x00000001 | ADMA Ptr: 0x00000006f2163200
[12916.740274] mmc0: sdhci: ============================================
[12916.740337] mmc0: error -5 whilst initialising MMC card

As you can see, from an IOMMU perspective, this is PCI device 00:13.1.
However, from a kernel perspective, it's not on the PCI bus and does not require IOMMU translation, but rather just direct DMA mapping.


[email protected](/):~/linux-53/drivers/mmc/host [0]$ sudo /sbin/lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:01.6 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:01.7 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Internal PCIe GPP Bridge 0 to Bus A
00:08.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Internal PCIe GPP Bridge 0 to Bus B
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 7
01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
03:00.0 Network controller: Intel Corporation Dual Band Wireless-AC 3168NGW [Stone Peak] (rev 10)
04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] (rev 83)
05:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Raven/Raven2/Fenghuang HDMI/DP Audio Controller
05:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
05:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
05:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
05:00.5 Multimedia controller: Advanced Micro Devices, Inc. [AMD] Raven/Raven2/FireFlight/Renoir Audio Processor
05:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller
05:00.7 Non-VGA unclassified device: Advanced Micro Devices, Inc. [AMD] Raven/Raven2/Renoir Non-Sensor Fusion Hub KMDF driver
06:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 61)

Looks like we'd need some quirks to actually create a pci_device handle for the embedded AMD eMMC controller?
Thoughts?

PS: Please copy me on responses, I'm off LKML for half a decade now :-O

--
Kurt Garloff<[email protected]e>
Cologne, Germany


2019-09-26 09:40:53

by Jiri Kosina

[permalink] [raw]
Subject: Re: IOMMU vs Ryzen embedded EMMC controller

On Sat, 21 Sep 2019, Kurt Garloff wrote:

> Hi,
>
>
> enabling the IOMMU on my Ryzen v1605b (UDOO Bolt v8) does result in a
> non-working EMMC driver.
> Without enabling IOMMU, it works like a charm.
> From my POV this needs fixing, and I consider this a bug.

[ CCing Joerg ]

> I looked into sdhci to see whether the right dma_map_sg() calls are
> missing, but they are there. The sdhci driver appears to do the right
> thing. It seems that the EMMC controller is not considered and reported
> as PCI device while it still goes through the PCI IOMMU :-O
>
> Here is what happens on loading the sdhci-acpi driver on kernel 5.3:
>
> [12916.740148] mmc0: ADMA error
> [12916.740154] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
> [12916.740163] mmc0: sdhci: Sys addr:  0x00000000 | Version:  0x00001002
> [12916.740170] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000001
> [12916.740179] mmc0: sdhci: Argument:  0x00000000 | Trn mode: 0x00000013
> [12916.740184] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:13.1
> domain=0x0000 address=0x6f2163200 flags=0x0050]
> [12916.740193] mmc0: sdhci: Present:   0xf1ff0000 | Host ctl: 0x00000019
> [12916.740202] mmc0: sdhci: Power:     0x0000000f | Blk gap:  0x00000000
> [12916.740211] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x0000f447
> [12916.740219] mmc0: sdhci: Timeout:   0x0000000a | Int stat: 0x00000000
> [12916.740226] mmc0: sdhci: Int enab:  0x03ff000b | Sig enab: 0x03ff000b
> [12916.740232] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
> [12916.740239] mmc0: sdhci: Caps:      0x71fec8b2 | Caps_1:   0x00000577
> [12916.740246] mmc0: sdhci: Cmd:       0x0000083a | Max curr: 0x00c80064
> [12916.740253] mmc0: sdhci: Resp[0]:   0x00000700 | Resp[1]:  0xffffffff
> [12916.740259] mmc0: sdhci: Resp[2]:   0x328f5903 | Resp[3]:  0x00d00f00
> [12916.740264] mmc0: sdhci: Host ctl2: 0x00000000
> [12916.740273] mmc0: sdhci: ADMA Err:  0x00000001 | ADMA Ptr:
> 0x00000006f2163200
> [12916.740274] mmc0: sdhci: ============================================
> [12916.740337] mmc0: error -5 whilst initialising MMC card

Do you have BAR memory allocation failures in dmesg with IOMMU on?
Actually, sharing both working and non-working dmesg, as well as
/proc/iomem contents, would be helpful.

Keeping the rest of mail below untouched for reference.

>
> As you can see, from an IOMMU perspective, this is PCI device 00:13.1.
> However, from a kernel perspective, it's not on the PCI bus and does not
> require IOMMU translation, but rather just direct DMA mapping.
>
>
> [email protected](/):~/linux-53/drivers/mmc/host [0]$ sudo /sbin/lspci
> 00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Root
> Complex
> 00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 IOMMU
> 00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 00h-1fh) PCIe Dummy Host Bridge
> 00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP
> Bridge [6:0]
> 00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP
> Bridge [6:0]
> 00:01.6 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP
> Bridge [6:0]
> 00:01.7 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP
> Bridge [6:0]
> 00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 00h-1fh) PCIe Dummy Host Bridge
> 00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Internal
> PCIe GPP Bridge 0 to Bus A
> 00:08.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Internal
> PCIe GPP Bridge 0 to Bus B
> 00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev
> 61)
> 00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
> 00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device
> 24: Function 0
> 00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device
> 24: Function 1
> 00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device
> 24: Function 2
> 00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device
> 24: Function 3
> 00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device
> 24: Function 4
> 00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device
> 24: Function 5
> 00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device
> 24: Function 6
> 00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device
> 24: Function 7
> 01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD
> Controller SM981/PM981/PM983
> 03:00.0 Network controller: Intel Corporation Dual Band Wireless-AC 3168NGW
> [Stone Peak] (rev 10)
> 04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411
> PCI Express Gigabit Ethernet Controller (rev 0c)
> 05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] (rev 83)
> 05:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI]
> Raven/Raven2/Fenghuang HDMI/DP Audio Controller
> 05:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h
> (Models 10h-1fh) Platform Security Processor
> 05:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
> 05:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
> 05:00.5 Multimedia controller: Advanced Micro Devices, Inc. [AMD]
> Raven/Raven2/FireFlight/Renoir Audio Processor
> 05:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 10h-1fh) HD Audio Controller
> 05:00.7 Non-VGA unclassified device: Advanced Micro Devices, Inc. [AMD]
> Raven/Raven2/Renoir Non-Sensor Fusion Hub KMDF driver
> 06:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA
> Controller [AHCI mode] (rev 61)
>
> Looks like we'd need some quirks to actually create a pci_device handle for
> the embedded AMD eMMC controller?
> Thoughts?
>
> PS: Please copy me on responses, I'm off LKML for half a decade now :-O
>
> --
> Kurt Garloff<[email protected]>
> Cologne, Germany
>

--
Jiri Kosina
SUSE Labs

2019-09-26 09:42:43

by Joerg Roedel

[permalink] [raw]
Subject: Re: IOMMU vs Ryzen embedded EMMC controller

On Wed, Sep 25, 2019 at 05:27:32PM +0200, Jiri Kosina wrote:
> On Sat, 21 Sep 2019, Kurt Garloff wrote:
> > [12916.740274] mmc0: sdhci: ============================================
> > [12916.740337] mmc0: error -5 whilst initialising MMC card
>
> Do you have BAR memory allocation failures in dmesg with IOMMU on?
> Actually, sharing both working and non-working dmesg, as well as
> /proc/iomem contents, would be helpful.

Yes, can you please grab dmesg from a boot with iommu enabled and add
'amd_iommu_dump' to the kernel command line? That should give some hints
on what is going on.

Regards,

Joerg

2019-09-27 09:57:41

by Kurt Garloff

[permalink] [raw]
Subject: Re: IOMMU vs Ryzen embedded EMMC controller

Hi Jörg,

On 25/09/2019 17:42, Joerg Roedel wrote:
> On Wed, Sep 25, 2019 at 05:27:32PM +0200, Jiri Kosina wrote:
>> On Sat, 21 Sep 2019, Kurt Garloff wrote:
>>> [12916.740274] mmc0: sdhci: ============================================
>>> [12916.740337] mmc0: error -5 whilst initialising MMC card
>> Do you have BAR memory allocation failures in dmesg with IOMMU on?

No. The device is *not* treated as PCI device and I still
think that this is the source of the evil.

>> Actually, sharing both working and non-working dmesg, as well as
>> /proc/iomem contents, would be helpful.
> Yes, can you please grab dmesg from a boot with iommu enabled and add
> 'amd_iommu_dump' to the kernel command line? That should give some hints
> on what is going on.

For now I attach a dmesg and iomem from the boot with IOMMU
enabled.
Nothing much interesting without IOMMU, sdhci-acpi there
just works -- let me know if you still want me to send the
kernel msg.

Thanks for looking into this!

--

Kurt Garloff <[email protected]>
Cologne, Germany


Attachments:
iomem.iommu.txt (3.09 kB)
dmesg.iommu.txt (109.61 kB)
Download all attachments

2019-09-27 10:30:59

by Shah, Nehal-bakulchandra

[permalink] [raw]
Subject: Re: IOMMU vs Ryzen embedded EMMC controller

Hi Kurf

On 9/27/2019 3:17 PM, Kurt Garloff wrote:
> Hi Jörg,
>
> On 25/09/2019 17:42, Joerg Roedel wrote:
>> On Wed, Sep 25, 2019 at 05:27:32PM +0200, Jiri
>> Kosina wrote:
>>> On Sat, 21 Sep 2019, Kurt Garloff wrote:
>>>> [12916.740274] mmc0: sdhci:
>>>> ============================================
>>>> [12916.740337] mmc0: error -5 whilst
>>>> initialising MMC card
>>> Do you have BAR memory allocation failures in
>>> dmesg with IOMMU on?
>
> No. The device is *not* treated as PCI device and
> I still think that this is the source of the evil.
>
>>> Actually, sharing both working and non-working
>>> dmesg, as well as
>>> /proc/iomem contents, would be helpful.
>> Yes, can you please grab dmesg from a boot with
>> iommu enabled and add
>> 'amd_iommu_dump' to the kernel command line?
>> That should give some hints
>> on what is going on.
>
> For now I attach a dmesg and iomem from the boot
> with IOMMU enabled.
> Nothing much interesting without IOMMU, sdhci-acpi
> there just works -- let me know if you still want
> me to send the kernel msg.
>
> Thanks for looking into this!
>

I have added Suravee from AMD in the mail loop. He
works on IOMMU part. As per my understanding, it
needs a patch in IOMMU driver for adding support
of EMMC. Note that on Ryzen platform we have EMMC
5.0 as ACPI device.

Thanks

Nehal Shah

2019-10-09 19:26:07

by Jiri Kosina

[permalink] [raw]
Subject: Re: IOMMU vs Ryzen embedded EMMC controller

On Fri, 27 Sep 2019, Shah, Nehal-bakulchandra wrote:

> >>> Do you have BAR memory allocation failures in dmesg with IOMMU on?
> >
> > No. The device is *not* treated as PCI device and I still think that
> > this is the source of the evil.
> >
> >>> Actually, sharing both working and non-working dmesg, as well as
> >>> /proc/iomem contents, would be helpful.

> >> Yes, can you please grab dmesg from a boot with iommu enabled and add
> >> 'amd_iommu_dump' to the kernel command line? That should give some
> >> hints on what is going on.
> >
> > For now I attach a dmesg and iomem from the boot with IOMMU enabled.
> > Nothing much interesting without IOMMU, sdhci-acpi there just works --
> > let me know if you still want me to send the kernel msg.
> >
> > Thanks for looking into this!
> >
>
> I have added Suravee from AMD in the mail loop. He works on IOMMU part.
> As per my understanding, it needs a patch in IOMMU driver for adding
> support of EMMC. Note that on Ryzen platform we have EMMC 5.0 as ACPI
> device.

Friendly ping ... any news here?

Thanks,

--
Jiri Kosina
SUSE Labs

2019-10-09 19:37:45

by Suthikulpanit, Suravee

[permalink] [raw]
Subject: Re: IOMMU vs Ryzen embedded EMMC controller

Hi,

On 10/9/19 2:25 PM, Jiri Kosina wrote:
> On Fri, 27 Sep 2019, Shah, Nehal-bakulchandra wrote:
>
>>>>> Do you have BAR memory allocation failures in dmesg with IOMMU on?
>>>
>>> No. The device is *not* treated as PCI device and I still think that
>>> this is the source of the evil.
>>>
>>>>> Actually, sharing both working and non-working dmesg, as well as
>>>>> /proc/iomem contents, would be helpful.
>
>>>> Yes, can you please grab dmesg from a boot with iommu enabled and add
>>>> 'amd_iommu_dump' to the kernel command line? That should give some
>>>> hints on what is going on.
>>>
>>> For now I attach a dmesg and iomem from the boot with IOMMU enabled.
>>> Nothing much interesting without IOMMU, sdhci-acpi there just works --
>>> let me know if you still want me to send the kernel msg.
>>>
>>> Thanks for looking into this!
>>>
>>
>> I have added Suravee from AMD in the mail loop. He works on IOMMU part.
>> As per my understanding, it needs a patch in IOMMU driver for adding
>> support of EMMC. Note that on Ryzen platform we have EMMC 5.0 as ACPI
>> device.
>
> Friendly ping ... any news here?
>
> Thanks,
>

Could you please boot the system w/ kernel option amd_iommu_dump=1,
and do "dmesg | grep AMD-Vi". Then provide the output.

I suspect that there is something missing in the IVRS table, where it needs
to provide ACPI HID for the eMMC device.

See kernel parameter:

ivrs_acpihid [HW,X86_64]
Provide an override to the ACPI-HID:UID<->DEVICE-ID
mapping provided in the IVRS ACPI table. For
example, to map UART-HID:UID AMD0020:0 to
PCI device 00:14.5 write the parameter as:
ivrs_acpihid[00:14.5]=AMD0020:0

Here we might need to do.

ivrs_acpihid[00:13.1]=<emmc ACPI HID on that system>

Thanks,
Suravee