Baolu,
On 1/5/2023 4:07 PM, Baolu Lu wrote:
> On 2023/1/5 18:27, Vasant Hegde wrote:
>> On 1/5/2023 6:39 AM, Matt Fagnani wrote:
>>> I built 6.2-rc2 with the patch applied. The same black screen problem happened
>>> with 6.2-rc2 with the patch. I tried to use early kdump with 6.2-rc2 with the
>>> patch twice by panicking the kernel with sysrq+alt+c after the black screen
>>> happened. The system rebooted after about 10-20 seconds both times, but no kdump
>>> and dmesg files were saved in /var/crash. I'm attaching the lspci -vvv output as
>>> requested.
>>>
>> Thanks for testing. As mentioned earlier I was not expecting this patch to fix
>> the black screen issue. It should fix kernel warnings and IOMMU page fault
>> related call traces. By any chance do you have the kernel boot logs?
>>
>>
>> @Baolu,
>> Looking into lspci output, it doesn't list ACS feature for Graphics card. So
>> with your fix it didn't enable PASID and hence it failed to boot.
>
> So do you mind telling why does the PASID need to be enabled for the
> graphic device? Or in another word, what does the graphic driver use the
> PASID for?
Honestly I don't know the complete details of how PASID works with graphics
card. May be Alex or Joerg can explain it better.
-Vasant
[AMD Official Use Only - General]
> -----Original Message-----
> From: Hegde, Vasant <[email protected]>
> Sent: Thursday, January 5, 2023 5:46 AM
> To: Baolu Lu <[email protected]>; Matt Fagnani
> <[email protected]>; Thorsten Leemhuis <[email protected]>;
> Deucher, Alexander <[email protected]>; Joerg Roedel
> <[email protected]>
> Cc: [email protected]; LKML <[email protected]>;
> [email protected]; Linux PCI <[email protected]>; Bjorn
> Helgaas <[email protected]>
> Subject: Re: [regression, bisected, pci/iommu] Bug 216865 - Black screen
> when amdgpu started during 6.2-rc1 boot with AMD IOMMU enabled
>
> Baolu,
>
>
> On 1/5/2023 4:07 PM, Baolu Lu wrote:
> > On 2023/1/5 18:27, Vasant Hegde wrote:
> >> On 1/5/2023 6:39 AM, Matt Fagnani wrote:
> >>> I built 6.2-rc2 with the patch applied. The same black screen
> >>> problem happened with 6.2-rc2 with the patch. I tried to use early
> >>> kdump with 6.2-rc2 with the patch twice by panicking the kernel with
> >>> sysrq+alt+c after the black screen happened. The system rebooted
> >>> after about 10-20 seconds both times, but no kdump and dmesg files
> >>> were saved in /var/crash. I'm attaching the lspci -vvv output as
> requested.
> >>>
> >> Thanks for testing. As mentioned earlier I was not expecting this
> >> patch to fix the black screen issue. It should fix kernel warnings
> >> and IOMMU page fault related call traces. By any chance do you have the
> kernel boot logs?
> >>
> >>
> >> @Baolu,
> >> Looking into lspci output, it doesn't list ACS feature for
> >> Graphics card. So with your fix it didn't enable PASID and hence it failed to
> boot.
> >
> > So do you mind telling why does the PASID need to be enabled for the
> > graphic device? Or in another word, what does the graphic driver use
> > the PASID for?
>
> Honestly I don't know the complete details of how PASID works with graphics
> card. May be Alex or Joerg can explain it better.
+ Felix
The GPU driver uses the pasid for shared virtual memory between the CPU and GPU. I.e., so that the user apps can use the same virtual address space on the GPU and the CPU. It also uses pasid to take advantage of recoverable device page faults using PRS.
Alex
Am 2023-01-05 um 09:46 schrieb Deucher, Alexander:
> [AMD Official Use Only - General]
>
>> -----Original Message-----
>> From: Hegde, Vasant <[email protected]>
>> Sent: Thursday, January 5, 2023 5:46 AM
>> To: Baolu Lu <[email protected]>; Matt Fagnani
>> <[email protected]>; Thorsten Leemhuis <[email protected]>;
>> Deucher, Alexander <[email protected]>; Joerg Roedel
>> <[email protected]>
>> Cc: [email protected]; LKML <[email protected]>;
>> [email protected]; Linux PCI <[email protected]>; Bjorn
>> Helgaas <[email protected]>
>> Subject: Re: [regression, bisected, pci/iommu] Bug 216865 - Black screen
>> when amdgpu started during 6.2-rc1 boot with AMD IOMMU enabled
>>
>> Baolu,
>>
>>
>> On 1/5/2023 4:07 PM, Baolu Lu wrote:
>>> On 2023/1/5 18:27, Vasant Hegde wrote:
>>>> On 1/5/2023 6:39 AM, Matt Fagnani wrote:
>>>>> I built 6.2-rc2 with the patch applied. The same black screen
>>>>> problem happened with 6.2-rc2 with the patch. I tried to use early
>>>>> kdump with 6.2-rc2 with the patch twice by panicking the kernel with
>>>>> sysrq+alt+c after the black screen happened. The system rebooted
>>>>> after about 10-20 seconds both times, but no kdump and dmesg files
>>>>> were saved in /var/crash. I'm attaching the lspci -vvv output as
>> requested.
>>>> Thanks for testing. As mentioned earlier I was not expecting this
>>>> patch to fix the black screen issue. It should fix kernel warnings
>>>> and IOMMU page fault related call traces. By any chance do you have the
>> kernel boot logs?
>>>> @Baolu,
>>>> Looking into lspci output, it doesn't list ACS feature for
>>>> Graphics card. So with your fix it didn't enable PASID and hence it failed to
>> boot.
>>> So do you mind telling why does the PASID need to be enabled for the
>>> graphic device? Or in another word, what does the graphic driver use
>>> the PASID for?
>> Honestly I don't know the complete details of how PASID works with graphics
>> card. May be Alex or Joerg can explain it better.
> + Felix
>
> The GPU driver uses the pasid for shared virtual memory between the CPU and GPU. I.e., so that the user apps can use the same virtual address space on the GPU and the CPU. It also uses pasid to take advantage of recoverable device page faults using PRS.
Agreed. This applies to GPU computing on some older AMD APUs that take
advantage of memory coherence and IOMMUv2 address translation to create
a shared virtual address space between the CPU and GPU. In this case it
seems to be a Carrizo APU. It is also true for Raven APUs.
Regards,
Felix
>
> Alex
+Jason
On 1/5/23 11:27 PM, Felix Kuehling wrote:
> Am 2023-01-05 um 09:46 schrieb Deucher, Alexander:
>> [AMD Official Use Only - General]
>>
>>> -----Original Message-----
>>> From: Hegde, Vasant <[email protected]>
>>> Sent: Thursday, January 5, 2023 5:46 AM
>>> To: Baolu Lu <[email protected]>; Matt Fagnani
>>> <[email protected]>; Thorsten Leemhuis <[email protected]>;
>>> Deucher, Alexander <[email protected]>; Joerg Roedel
>>> <[email protected]>
>>> Cc: [email protected]; LKML <[email protected]>;
>>> [email protected]; Linux PCI <[email protected]>;
>>> Bjorn
>>> Helgaas <[email protected]>
>>> Subject: Re: [regression, bisected, pci/iommu] Bug 216865 - Black screen
>>> when amdgpu started during 6.2-rc1 boot with AMD IOMMU enabled
>>>
>>> Baolu,
>>>
>>>
>>> On 1/5/2023 4:07 PM, Baolu Lu wrote:
>>>> On 2023/1/5 18:27, Vasant Hegde wrote:
>>>>> On 1/5/2023 6:39 AM, Matt Fagnani wrote:
>>>>>> I built 6.2-rc2 with the patch applied. The same black screen
>>>>>> problem happened with 6.2-rc2 with the patch. I tried to use early
>>>>>> kdump with 6.2-rc2 with the patch twice by panicking the kernel with
>>>>>> sysrq+alt+c after the black screen happened. The system rebooted
>>>>>> after about 10-20 seconds both times, but no kdump and dmesg files
>>>>>> were saved in /var/crash. I'm attaching the lspci -vvv output as
>>> requested.
>>>>> Thanks for testing. As mentioned earlier I was not expecting this
>>>>> patch to fix the black screen issue. It should fix kernel warnings
>>>>> and IOMMU page fault related call traces. By any chance do you have
>>>>> the
>>> kernel boot logs?
>>>>> @Baolu,
>>>>> Looking into lspci output, it doesn't list ACS feature for
>>>>> Graphics card. So with your fix it didn't enable PASID and hence it
>>>>> failed to
>>> boot.
>>>> So do you mind telling why does the PASID need to be enabled for the
>>>> graphic device? Or in another word, what does the graphic driver use
>>>> the PASID for?
>>> Honestly I don't know the complete details of how PASID works with
>>> graphics
>>> card. May be Alex or Joerg can explain it better.
>> + Felix
>>
>> The GPU driver uses the pasid for shared virtual memory between the
>> CPU and GPU. I.e., so that the user apps can use the same virtual
>> address space on the GPU and the CPU. It also uses pasid to take
>> advantage of recoverable device page faults using PRS.
>
> Agreed. This applies to GPU computing on some older AMD APUs that take
> advantage of memory coherence and IOMMUv2 address translation to create
> a shared virtual address space between the CPU and GPU. In this case it
> seems to be a Carrizo APU. It is also true for Raven APUs.
Thanks for the explanation.
This is actually the problem that commit 201007ef707a was trying to fix.
The PCIe fabric routes Memory Requests based on the TLP address,
ignoring any PASID (PCIe r6.0, sec 2.2.10.4), so a TLP with PASID that
should go upstream to the IOMMU may instead be routed as a P2P
Request if its address falls in a bridge window.
In SVA case, the IOMMU shares the address space of a user application.
The user application side has no knowledge about the PCI bridge window.
It is entirely possible that the device is programed with a P2P address
and results in a disaster.
--
Best regards,
baolu
[+cc Christian, Xinhui, amd-gfx]
On Fri, Jan 06, 2023 at 01:48:11PM +0800, Baolu Lu wrote:
> On 1/5/23 11:27 PM, Felix Kuehling wrote:
> > Am 2023-01-05 um 09:46 schrieb Deucher, Alexander:
> > > > -----Original Message-----
> > > > From: Hegde, Vasant <[email protected]>
> > > > On 1/5/2023 4:07 PM, Baolu Lu wrote:
> > > > > On 2023/1/5 18:27, Vasant Hegde wrote:
> > > > > > On 1/5/2023 6:39 AM, Matt Fagnani wrote:
> > > > > > > I built 6.2-rc2 with the patch applied. The same black
> > > > > > > screen problem happened with 6.2-rc2 with the patch. I
> > > > > > > tried to use early kdump with 6.2-rc2 with the patch
> > > > > > > twice by panicking the kernel with sysrq+alt+c after the
> > > > > > > black screen happened. The system rebooted after about
> > > > > > > 10-20 seconds both times, but no kdump and dmesg files
> > > > > > > were saved in /var/crash. I'm attaching the lspci -vvv
> > > > > > > output as requested. ...
> > > > > > Looking into lspci output, it doesn't list ACS feature
> > > > > > for Graphics card. So with your fix it didn't enable PASID
> > > > > > and hence it failed to boot. ...
> > > > > So do you mind telling why does the PASID need to be enabled
> > > > > for the graphic device? Or in another word, what does the
> > > > > graphic driver use the PASID for? ...
> > > The GPU driver uses the pasid for shared virtual memory between
> > > the CPU and GPU.? I.e., so that the user apps can use the same
> > > virtual address space on the GPU and the CPU.? It also uses
> > > pasid to take advantage of recoverable device page faults using
> > > PRS. ...
> > Agreed. This applies to GPU computing on some older AMD APUs that
> > take advantage of memory coherence and IOMMUv2 address translation
> > to create a shared virtual address space between the CPU and GPU.
> > In this case it seems to be a Carrizo APU. It is also true for
> > Raven APUs. ...
> Thanks for the explanation.
>
> This is actually the problem that commit 201007ef707a was trying to
> fix. The PCIe fabric routes Memory Requests based on the TLP
> address, ignoring any PASID (PCIe r6.0, sec 2.2.10.4), so a TLP with
> PASID that should go upstream to the IOMMU may instead be routed as
> a P2P Request if its address falls in a bridge window.
>
> In SVA case, the IOMMU shares the address space of a user
> application. The user application side has no knowledge about the
> PCI bridge window. It is entirely possible that the device is
> programed with a P2P address and results in a disaster.
Is this stalled? We explored the idea of changing the PCI core so
that for devices that use ATS/PRI, we could enable PASID without
checking for ACS [1], but IIUC we ultimately concluded that it was
based on a misunderstanding of how ATS Translation Requests are routed
and that an AMD driver change would be required [2].
So it seems like we still have this regression, and we're running out
of time before v6.2.
[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/
[+Shimmer, Aaron]
Am 2023-02-15 um 10:39 schrieb Bjorn Helgaas:
> [+cc Christian, Xinhui, amd-gfx]
>
> On Fri, Jan 06, 2023 at 01:48:11PM +0800, Baolu Lu wrote:
>> On 1/5/23 11:27 PM, Felix Kuehling wrote:
>>> Am 2023-01-05 um 09:46 schrieb Deucher, Alexander:
>>>>> -----Original Message-----
>>>>> From: Hegde, Vasant <[email protected]>
>>>>> On 1/5/2023 4:07 PM, Baolu Lu wrote:
>>>>>> On 2023/1/5 18:27, Vasant Hegde wrote:
>>>>>>> On 1/5/2023 6:39 AM, Matt Fagnani wrote:
>>>>>>>> I built 6.2-rc2 with the patch applied. The same black
>>>>>>>> screen problem happened with 6.2-rc2 with the patch. I
>>>>>>>> tried to use early kdump with 6.2-rc2 with the patch
>>>>>>>> twice by panicking the kernel with sysrq+alt+c after the
>>>>>>>> black screen happened. The system rebooted after about
>>>>>>>> 10-20 seconds both times, but no kdump and dmesg files
>>>>>>>> were saved in /var/crash. I'm attaching the lspci -vvv
>>>>>>>> output as requested. ...
>>>>>>> Looking into lspci output, it doesn't list ACS feature
>>>>>>> for Graphics card. So with your fix it didn't enable PASID
>>>>>>> and hence it failed to boot. ...
>>>>>> So do you mind telling why does the PASID need to be enabled
>>>>>> for the graphic device? Or in another word, what does the
>>>>>> graphic driver use the PASID for? ...
>>>> The GPU driver uses the pasid for shared virtual memory between
>>>> the CPU and GPU. I.e., so that the user apps can use the same
>>>> virtual address space on the GPU and the CPU. It also uses
>>>> pasid to take advantage of recoverable device page faults using
>>>> PRS. ...
>>> Agreed. This applies to GPU computing on some older AMD APUs that
>>> take advantage of memory coherence and IOMMUv2 address translation
>>> to create a shared virtual address space between the CPU and GPU.
>>> In this case it seems to be a Carrizo APU. It is also true for
>>> Raven APUs. ...
>> Thanks for the explanation.
>>
>> This is actually the problem that commit 201007ef707a was trying to
>> fix. The PCIe fabric routes Memory Requests based on the TLP
>> address, ignoring any PASID (PCIe r6.0, sec 2.2.10.4), so a TLP with
>> PASID that should go upstream to the IOMMU may instead be routed as
>> a P2P Request if its address falls in a bridge window.
>>
>> In SVA case, the IOMMU shares the address space of a user
>> application. The user application side has no knowledge about the
>> PCI bridge window. It is entirely possible that the device is
>> programed with a P2P address and results in a disaster.
> Is this stalled? We explored the idea of changing the PCI core so
> that for devices that use ATS/PRI, we could enable PASID without
> checking for ACS [1], but IIUC we ultimately concluded that it was
> based on a misunderstanding of how ATS Translation Requests are routed
> and that an AMD driver change would be required [2].
>
> So it seems like we still have this regression, and we're running out
> of time before v6.2.
>
> [1] https://lore.kernel.org/all/[email protected]/
> [2] https://lore.kernel.org/all/[email protected]/
If I understand this correctly, the HW or the BIOS is doing something
wrong about reporting ACS. I don't know what the GPU driver can do other
than add some quirk to stop using AMD IOMMUv2 on this HW/BIOS.
It looks like the problem is triggered when the driver calls
amd_iommu_init_device. That's when the first WARNs happen, soon followed
by a kernel oops in report_iommu_fault. The driver doesn't know anything
is wrong because amd_iommu_init_device seems to return "success". And
the oops is not in the GPU driver either.
I guess this could also be handled more gracefully in the IOMMU driver
(i.e. fail gracefully in amd_iommu_init_device and let the caller know
that something is wrong, don't oops in report_iommu_fault).
Regards,
Felix
On Wed, Feb 15, 2023 at 07:35:45PM -0500, Felix Kuehling wrote:
>
> If I understand this correctly, the HW or the BIOS is doing something wrong
> about reporting ACS. I don't know what the GPU driver can do other than add
> some quirk to stop using AMD IOMMUv2 on this HW/BIOS.
How about this:
diff --git a/drivers/iommu/amd/iommu_v2.c b/drivers/iommu/amd/iommu_v2.c
index 864e4ffb6aa94e..cc027ce9a6e86f 100644
--- a/drivers/iommu/amd/iommu_v2.c
+++ b/drivers/iommu/amd/iommu_v2.c
@@ -732,6 +732,7 @@ EXPORT_SYMBOL(amd_iommu_unbind_pasid);
int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
{
+ struct iommu_dev_data *dev_data = dev_iommu_priv_get(&pdev->dev);
struct device_state *dev_state;
struct iommu_group *group;
unsigned long flags;
@@ -740,6 +741,9 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
might_sleep();
+ if (!dev_data->ats.enabled)
+ return -EINVAL;
+
/*
* When memory encryption is active the device is likely not in a
* direct-mapped domain. Forbid using IOMMUv2 functionality for now.
Felix, Jason, Matt,
On 2/16/2023 6:05 AM, Felix Kuehling wrote:
> [+Shimmer, Aaron]
>
> Am 2023-02-15 um 10:39 schrieb Bjorn Helgaas:
>> [+cc Christian, Xinhui, amd-gfx]
>>
>> On Fri, Jan 06, 2023 at 01:48:11PM +0800, Baolu Lu wrote:
>>> On 1/5/23 11:27 PM, Felix Kuehling wrote:
>>>> Am 2023-01-05 um 09:46 schrieb Deucher, Alexander:
>>>>>> -----Original Message-----
>>>>>> From: Hegde, Vasant <[email protected]>
>>>>>> On 1/5/2023 4:07 PM, Baolu Lu wrote:
>>>>>>> On 2023/1/5 18:27, Vasant Hegde wrote:
>>>>>>>> On 1/5/2023 6:39 AM, Matt Fagnani wrote:
>>>>>>>>> I built 6.2-rc2 with the patch applied. The same black
>>>>>>>>> screen problem happened with 6.2-rc2 with the patch. I
>>>>>>>>> tried to use early kdump with 6.2-rc2 with the patch
>>>>>>>>> twice by panicking the kernel with sysrq+alt+c after the
>>>>>>>>> black screen happened. The system rebooted after about
>>>>>>>>> 10-20 seconds both times, but no kdump and dmesg files
>>>>>>>>> were saved in /var/crash. I'm attaching the lspci -vvv
>>>>>>>>> output as requested. ...
>>>>>>>> Looking into lspci output, it doesn't list ACS feature
>>>>>>>> for Graphics card. So with your fix it didn't enable PASID
>>>>>>>> and hence it failed to boot. ...
>>>>>>> So do you mind telling why does the PASID need to be enabled
>>>>>>> for the graphic device? Or in another word, what does the
>>>>>>> graphic driver use the PASID for? ...
>>>>> The GPU driver uses the pasid for shared virtual memory between
>>>>> the CPU and GPU. I.e., so that the user apps can use the same
>>>>> virtual address space on the GPU and the CPU. It also uses
>>>>> pasid to take advantage of recoverable device page faults using
>>>>> PRS. ...
>>>> Agreed. This applies to GPU computing on some older AMD APUs that
>>>> take advantage of memory coherence and IOMMUv2 address translation
>>>> to create a shared virtual address space between the CPU and GPU.
>>>> In this case it seems to be a Carrizo APU. It is also true for
>>>> Raven APUs. ...
>>> Thanks for the explanation.
>>>
>>> This is actually the problem that commit 201007ef707a was trying to
>>> fix. The PCIe fabric routes Memory Requests based on the TLP
>>> address, ignoring any PASID (PCIe r6.0, sec 2.2.10.4), so a TLP with
>>> PASID that should go upstream to the IOMMU may instead be routed as
>>> a P2P Request if its address falls in a bridge window.
>>>
>>> In SVA case, the IOMMU shares the address space of a user
>>> application. The user application side has no knowledge about the
>>> PCI bridge window. It is entirely possible that the device is
>>> programed with a P2P address and results in a disaster.
>> Is this stalled? We explored the idea of changing the PCI core so
>> that for devices that use ATS/PRI, we could enable PASID without
>> checking for ACS [1], but IIUC we ultimately concluded that it was
>> based on a misunderstanding of how ATS Translation Requests are routed
>> and that an AMD driver change would be required [2].
>>
>> So it seems like we still have this regression, and we're running out
>> of time before v6.2.
>>
>> [1] https://lore.kernel.org/all/[email protected]/
>> [2] https://lore.kernel.org/all/[email protected]/
>
> If I understand this correctly, the HW or the BIOS is doing something wrong
> about reporting ACS. I don't know what the GPU driver can do other than add some
> quirk to stop using AMD IOMMUv2 on this HW/BIOS.
>
> It looks like the problem is triggered when the driver calls
> amd_iommu_init_device. That's when the first WARNs happen, soon followed by a
> kernel oops in report_iommu_fault. The driver doesn't know anything is wrong
> because amd_iommu_init_device seems to return "success". And the oops is not in
> the GPU driver either.
WARN is fixed and its in Joerg's tree.
https://lore.kernel.org/all/[email protected]/
report_iommu_fault() happened because in amd_iommu_init_device() path it failed
to attach devices to new domain and returned error. But it didn't put devices
back to old domain properly. It left in incosistent state and resulted in IO
page fault. I have proposed series to handle device to domain attachment failure
and better handling of subsequent report_iommu_fault().
https://lore.kernel.org/linux-iommu/[email protected]/
@Matt,
Can you please help to verify above patches on your system where you hit the
issue originally?
(Grab above two series, apply it on top of latest kernel and test it)
-Vasant
Hi Jason,
On 2/16/2023 6:14 AM, Jason Gunthorpe wrote:
> On Wed, Feb 15, 2023 at 07:35:45PM -0500, Felix Kuehling wrote:
>>
>> If I understand this correctly, the HW or the BIOS is doing something wrong
>> about reporting ACS. I don't know what the GPU driver can do other than add
>> some quirk to stop using AMD IOMMUv2 on this HW/BIOS.
>
> How about this:
>
> diff --git a/drivers/iommu/amd/iommu_v2.c b/drivers/iommu/amd/iommu_v2.c
> index 864e4ffb6aa94e..cc027ce9a6e86f 100644
> --- a/drivers/iommu/amd/iommu_v2.c
> +++ b/drivers/iommu/amd/iommu_v2.c
> @@ -732,6 +732,7 @@ EXPORT_SYMBOL(amd_iommu_unbind_pasid);
>
> int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
> {
> + struct iommu_dev_data *dev_data = dev_iommu_priv_get(&pdev->dev);
> struct device_state *dev_state;
> struct iommu_group *group;
> unsigned long flags;
> @@ -740,6 +741,9 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
>
> might_sleep();
>
> + if (!dev_data->ats.enabled)
> + return -EINVAL;
> +
Thanks for the proposed fix. But aactually this will not solve the issue because
current flow is :
- in this function it tries to allocate new domain
- Calls iommu_attach_group() which will call attach_device. In that path
it will try to enable ATS/PASID and hitting error.
As I mentioned in other reply I think even current code returns error from
amd_iommu_init_device() to GPU. But the issue is, in __iommu_attach_group() path
it detached device from current domain, failed to attach to new domain and
returned error. We didn't put the device back to old domain thats causing the
issue. Below series should fix this issue.
https://lore.kernel.org/linux-iommu/[email protected]/
-Vasant
[+Suravee]
Am 2023-02-15 um 19:44 schrieb Jason Gunthorpe:
> On Wed, Feb 15, 2023 at 07:35:45PM -0500, Felix Kuehling wrote:
>> If I understand this correctly, the HW or the BIOS is doing something wrong
>> about reporting ACS. I don't know what the GPU driver can do other than add
>> some quirk to stop using AMD IOMMUv2 on this HW/BIOS.
> How about this:
>
> diff --git a/drivers/iommu/amd/iommu_v2.c b/drivers/iommu/amd/iommu_v2.c
> index 864e4ffb6aa94e..cc027ce9a6e86f 100644
> --- a/drivers/iommu/amd/iommu_v2.c
> +++ b/drivers/iommu/amd/iommu_v2.c
> @@ -732,6 +732,7 @@ EXPORT_SYMBOL(amd_iommu_unbind_pasid);
>
> int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
> {
> + struct iommu_dev_data *dev_data = dev_iommu_priv_get(&pdev->dev);
> struct device_state *dev_state;
> struct iommu_group *group;
> unsigned long flags;
> @@ -740,6 +741,9 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
>
> might_sleep();
>
> + if (!dev_data->ats.enabled)
> + return -EINVAL;
> +
> /*
> * When memory encryption is active the device is likely not in a
> * direct-mapped domain. Forbid using IOMMUv2 functionality for now.
Hi Suravee,
What to you think about this proposed change?
Regards,
Felix
[+Suravee]
Am 2023-02-16 um 00:37 schrieb Vasant Hegde:
> Hi Jason,
>
>
> On 2/16/2023 6:14 AM, Jason Gunthorpe wrote:
>> On Wed, Feb 15, 2023 at 07:35:45PM -0500, Felix Kuehling wrote:
>>> If I understand this correctly, the HW or the BIOS is doing something wrong
>>> about reporting ACS. I don't know what the GPU driver can do other than add
>>> some quirk to stop using AMD IOMMUv2 on this HW/BIOS.
>> How about this:
>>
>> diff --git a/drivers/iommu/amd/iommu_v2.c b/drivers/iommu/amd/iommu_v2.c
>> index 864e4ffb6aa94e..cc027ce9a6e86f 100644
>> --- a/drivers/iommu/amd/iommu_v2.c
>> +++ b/drivers/iommu/amd/iommu_v2.c
>> @@ -732,6 +732,7 @@ EXPORT_SYMBOL(amd_iommu_unbind_pasid);
>>
>> int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
>> {
>> + struct iommu_dev_data *dev_data = dev_iommu_priv_get(&pdev->dev);
>> struct device_state *dev_state;
>> struct iommu_group *group;
>> unsigned long flags;
>> @@ -740,6 +741,9 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
>>
>> might_sleep();
>>
>> + if (!dev_data->ats.enabled)
>> + return -EINVAL;
>> +
> Thanks for the proposed fix. But aactually this will not solve the issue because
> current flow is :
> - in this function it tries to allocate new domain
> - Calls iommu_attach_group() which will call attach_device. In that path
> it will try to enable ATS/PASID and hitting error.
>
> As I mentioned in other reply I think even current code returns error from
> amd_iommu_init_device() to GPU. But the issue is, in __iommu_attach_group() path
> it detached device from current domain, failed to attach to new domain and
> returned error. We didn't put the device back to old domain thats causing the
> issue. Below series should fix this issue.
>
> https://lore.kernel.org/linux-iommu/[email protected]/
>
> -Vasant
>
Vasant,
I applied your four patches to 6.2-rc8 and built that. The black screen,
null pointer dereference, and warnings didn't happen when booting
6.2-rc8 with your patches. There were errors that the IOMMU wasn't
restarted when amdgpu and amdkfd was starting though at kernel: kfd kfd:
amdgpu: Failed to resume IOMMU for device 1002:9874. I don't know if
those IOMMU errors were expected or not, but I did see those types of
messages when I used amd_iommu=off to work around the black screen
before. I didn't use amd_iommu=off when testing 6.2-rc8 with your
patches. There were also a different amdgpu warning at
drivers/gpu/drm/amd/amdgpu/../pm/powerplay/smumgr/smu8_smumgr.c:98
smu8_send_msg_to_smc_with_parameter+0x103/0x140 and errors about amdgpu
timeouts on 1/3 boots. Plasma took over a minute to start and shut down
on that boot which was unusually long. I've seen those sorts of amdgpu
warnings and errors infrequently before so they might be unrelated to
the IOMMU problem. The part of the journal where those errors started
was the following.
Feb 16 13:22:31 kernel: [drm] amdgpu kernel modesetting enabled.
Feb 16 13:22:31 kernel: amdgpu: Topology: Add APU node [0x0:0x0]
Feb 16 13:22:31 kernel: [drm] initializing kernel modesetting (CARRIZO
0x1002:0x9874 0x103C:0x8332 0xCA).
Feb 16 13:22:31 kernel: [drm] register mmio base: 0xF0400000
Feb 16 13:22:31 kernel: [drm] register mmio size: 262144
Feb 16 13:22:31 kernel: [drm] add ip block number 0 <vi_common>
Feb 16 13:22:31 kernel: [drm] add ip block number 1 <gmc_v8_0>
Feb 16 13:22:31 kernel: [drm] add ip block number 2 <cz_ih>
Feb 16 13:22:31 kernel: [drm] add ip block number 3 <gfx_v8_0>
Feb 16 13:22:31 kernel: [drm] add ip block number 4 <sdma_v3_0>
Feb 16 13:22:31 kernel: [drm] add ip block number 5 <powerplay>
Feb 16 13:22:31 kernel: [drm] add ip block number 6 <dm>
Feb 16 13:22:31 kernel: [drm] add ip block number 7 <uvd_v6_0>
Feb 16 13:22:31 kernel: [drm] add ip block number 8 <vce_v3_0>
Feb 16 13:22:31 kernel: [drm] add ip block number 9 <acp_ip>
Feb 16 13:22:31 kernel: amdgpu 0000:00:01.0: amdgpu: Fetched VBIOS from VFCT
Feb 16 13:22:31 kernel: amdgpu: ATOM BIOS: 113-C75100-031
Feb 16 13:22:31 kernel: [drm] UVD is enabled in physical mode
Feb 16 13:22:31 kernel: [drm] VCE enabled in physical mode
Feb 16 13:22:31 kernel: Console: switching to colour dummy device 80x25
Feb 16 13:22:31 kernel: amdgpu 0000:00:01.0: vgaarb: deactivate vga console
Feb 16 13:22:31 kernel: amdgpu 0000:00:01.0: amdgpu: Trusted Memory Zone
(TMZ) feature not supported
Feb 16 13:22:31 kernel: [drm] vm size is 64 GB, 2 levels, block size is
10-bit, fragment size is 9-bit
Feb 16 13:22:31 kernel: amdgpu 0000:00:01.0: amdgpu: VRAM: 512M
0x000000F400000000 - 0x000000F41FFFFFFF (512M used)
Feb 16 13:22:31 kernel: amdgpu 0000:00:01.0: amdgpu: GART: 1024M
0x000000FF00000000 - 0x000000FF3FFFFFFF
Feb 16 13:22:31 kernel: [drm] Detected VRAM RAM=512M, BAR=512M
Feb 16 13:22:31 kernel: [drm] RAM width 64bits UNKNOWN
Feb 16 13:22:31 kernel: [drm] amdgpu: 512M of VRAM memory ready
Feb 16 13:22:31 kernel: [drm] amdgpu: 3704M of GTT memory ready.
Feb 16 13:22:31 kernel: [drm] GART: num cpu pages 262144, num gpu pages
262144
Feb 16 13:22:31 kernel: [drm] PCIE GART of 1024M enabled (table at
0x000000F400600000).
Feb 16 13:22:31 kernel: amdgpu: hwmgr_sw_init smu backed is smu8_smu
Feb 16 13:22:31 kernel: [drm] Found UVD firmware Version: 1.91 Family ID: 11
Feb 16 13:22:31 kernel: [drm] UVD ENC is disabled
Feb 16 13:22:31 kernel: [drm] Found VCE firmware Version: 52.4 Binary ID: 3
Feb 16 13:22:31 kernel: amdgpu: smu version 27.18.00
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: values for Engine clock
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 300000
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 480000
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 533340
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 576000
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 626090
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 685720
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 720000
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 757900
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: Validation clocks:
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: engine_max_clock: 75790
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: memory_max_clock: 93300
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: level : 8
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: values for Display clock
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 300000
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 400000
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 496560
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 626090
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 685720
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 757900
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 800000
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 847060
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: Validation clocks:
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: engine_max_clock: 75790
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: memory_max_clock: 93300
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: level : 8
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: values for Memory clock
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 667000
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 933000
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: Validation clocks:
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: engine_max_clock: 75790
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: memory_max_clock: 93300
Feb 16 13:22:31 kernel: [drm] DM_PPLIB: level : 8
Feb 16 13:22:31 kernel: [drm] Display Core initialized with v3.2.215!
Feb 16 13:22:32 kernel: [drm] UVD initialized successfully.
Feb 16 13:22:32 kernel: [drm] VCE initialized successfully.
Feb 16 13:22:32 kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on gart
Feb 16 13:22:32 kernel: amdgpu: sdma_bitmap: f
Feb 16 13:22:32 kernel: kfd kfd: amdgpu: Failed to resume IOMMU for
device 1002:9874
Feb 16 13:22:32 kernel: kfd kfd: amdgpu: device 1002:9874 NOT added due
to errors
Feb 16 13:22:32 kernel: amdgpu 0000:00:01.0: amdgpu: SE 1, SH per SE 1,
CU per SH 8, active_cu_number 6
Feb 16 13:22:32 kernel: [drm] Initialized amdgpu 3.49.0 20150101 for
0000:00:01.0 on minor 0
Feb 16 13:22:32 kernel: fbcon: amdgpudrmfb (fb0) is primary device
Feb 16 13:22:33 kernel: Console: switching to colour frame buffer device
170x48
Feb 16 13:22:33 kernel: amdgpu 0000:00:01.0: [drm] fb0: amdgpudrmfb
frame buffer device
Feb 16 13:22:33 kernel: audit: type=1334 audit(1676571753.397:17):
prog-id=21 op=LOAD
Feb 16 13:22:33 kernel: audit: type=1130 audit(1676571753.419:18): pid=1
uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=dbus-broker
comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=?
terminal=? res=success'
Feb 16 13:22:33 kernel: audit: type=1130 audit(1676571753.456:19): pid=1
uid=0 auid=4294967295 ses=4294967295 subj=kernel
msg='unit=dracut-initqueue comm="systemd" exe="/usr/lib/systemd/systemd"
hostname=? addr=? terminal=? res=success'
Feb 16 13:22:33 kernel: audit: type=1130 audit(1676571753.492:20): pid=1
uid=0 auid=4294967295 ses=4294967295 subj=kernel
msg='unit=systemd-fsck-root comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Feb 16 13:22:33 kernel: EXT4-fs (dm-0): mounted filesystem
00107de9-54ef-4784-a03f-61802ed0b350 with ordered data mode. Quota mode:
none.
Feb 16 13:22:36 kernel: ------------[ cut here ]------------
Feb 16 13:22:36 kernel: smu8_send_msg_to_smc_with_parameter(0x0009, 0x0)
timed out after 2814625 us
Feb 16 13:22:36 kernel: WARNING: CPU: 1 PID: 112 at
drivers/gpu/drm/amd/amdgpu/../pm/powerplay/smumgr/smu8_smumgr.c:98
smu8_send_msg_to_smc_with_parameter+0x103/0x140 [amdgpu]
Feb 16 13:22:36 kernel: Modules linked in: amdgpu i2c_algo_bit
drm_ttm_helper ttm iommu_v2 mfd_core drm_buddy gpu_sched
drm_display_helper drm_kms_helper hid_logitech_hidpp drm
crct10dif_pclmul crc32_pclmul crc32c_intel r8169 sd_mod
ghash_clmulni_intel t10_pi sha512_ssse3 crc64_rocksoft_generic
crc64_rocksoft wdat_wdt sp5100_tco hid_logitech_dj crc64 cec video wmi
fuse dm_multipath
Feb 16 13:22:36 kernel: CPU: 1 PID: 112 Comm: kworker/1:3 Not tainted
6.2.0-rc8+ #94
Feb 16 13:22:36 kernel: Hardware name: HP HP Laptop 15-bw0xx/8332, BIOS
F.52 12/03/2019
Feb 16 13:22:36 kernel: Workqueue: events amdgpu_vce_idle_work_handler
[amdgpu]
Feb 16 13:22:36 kernel: RIP:
0010:smu8_send_msg_to_smc_with_parameter+0x103/0x140 [amdgpu]
Feb 16 13:22:36 kernel: Code: 20 48 c7 c7 28 1c c1 c0 48 89 c1 48 f7 ea
48 89 c8 44 89 e9 48 c1 f8 3f 48 c1 fa 07 48 29 c2 49 89 d0 44 89 e2 e8
c5 28 48 e0 <0f> 0b eb b0 bd ea ff ff ff eb a9 48 8b 7b 40 be c0 01 00
00 48 8b
Feb 16 13:22:36 kernel: RSP: 0018:ffffb997004c7db8 EFLAGS: 00010282
Feb 16 13:22:36 kernel: RAX: 000000000000004b RBX: ffff8b4e4f596800 RCX:
0000000000000000
Feb 16 13:22:36 kernel: RDX: 0000000000000001 RSI: ffffffffa14cf075 RDI:
00000000ffffffff
Feb 16 13:22:36 kernel: RBP: 00000000ffffffc2 R08: 0000000000000000 R09:
ffffb997004c7c68
Feb 16 13:22:36 kernel: R10: 0000000000000003 R11: ffffffffa1d42e48 R12:
0000000000000009
Feb 16 13:22:36 kernel: R13: 0000000000000000 R14: 00000003ded365a4 R15:
0000000000000002
Feb 16 13:22:36 kernel: FS: 0000000000000000(0000)
GS:ffff8b4f37480000(0000) knlGS:0000000000000000
Feb 16 13:22:36 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 16 13:22:36 kernel: CR2: 00007f950a698364 CR3: 0000000033c10000 CR4:
00000000001506e0
Feb 16 13:22:36 kernel: Call Trace:
Feb 16 13:22:36 kernel: <TASK>
Feb 16 13:22:36 kernel: smum_send_msg_to_smc+0xba/0xf0 [amdgpu]
Feb 16 13:22:36 kernel: smu8_dpm_powergate_vce+0x15a/0x180 [amdgpu]
Feb 16 13:22:36 kernel: pp_set_powergating_by_smu+0xed/0x1f0 [amdgpu]
Feb 16 13:22:36 kernel: amdgpu_dpm_set_powergating_by_smu+0x84/0xf0 [amdgpu]
Feb 16 13:22:36 kernel: amdgpu_dpm_enable_vce+0x29/0xa0 [amdgpu]
Feb 16 13:22:36 kernel: process_one_work+0x1c8/0x380
Feb 16 13:22:36 kernel: worker_thread+0x4d/0x380
Feb 16 13:22:36 kernel: ? _raw_spin_lock_irqsave+0x23/0x50
Feb 16 13:22:36 kernel: ? __pfx_worker_thread+0x10/0x10
Feb 16 13:22:36 kernel: kthread+0xe9/0x110
Feb 16 13:22:36 kernel: ? __pfx_kthread+0x10/0x10
Feb 16 13:22:36 kernel: ret_from_fork+0x2c/0x50
Feb 16 13:22:36 kernel: </TASK>
Feb 16 13:22:36 kernel: ---[ end trace 0000000000000000 ]---
Feb 16 13:22:39 kernel: amdgpu:
smu8_send_msg_to_smc_with_parameter(0x0004) aborted; SMU still servicing
msg (0x0009)
Feb 16 13:22:41 kernel: amdgpu:
smu8_send_msg_to_smc_with_parameter(0x0007) aborted; SMU still servicing
msg (0x0009)
I'm attaching the kernel log for the boot of 6.2-rc8 + patches with the
IOMMU errors and amdgpu warnings and timeouts.
Thanks,
Matt
On 2/16/23 00:25, Vasant Hegde wrote:
> Felix, Jason, Matt,
>
>
> On 2/16/2023 6:05 AM, Felix Kuehling wrote:
>> [+Shimmer, Aaron]
>>
>> Am 2023-02-15 um 10:39 schrieb Bjorn Helgaas:
>>> [+cc Christian, Xinhui, amd-gfx]
>>>
>>> On Fri, Jan 06, 2023 at 01:48:11PM +0800, Baolu Lu wrote:
>>>> On 1/5/23 11:27 PM, Felix Kuehling wrote:
>>>>> Am 2023-01-05 um 09:46 schrieb Deucher, Alexander:
>>>>>>> -----Original Message-----
>>>>>>> From: Hegde, Vasant <[email protected]>
>>>>>>> On 1/5/2023 4:07 PM, Baolu Lu wrote:
>>>>>>>> On 2023/1/5 18:27, Vasant Hegde wrote:
>>>>>>>>> On 1/5/2023 6:39 AM, Matt Fagnani wrote:
>>>>>>>>>> I built 6.2-rc2 with the patch applied. The same black
>>>>>>>>>> screen problem happened with 6.2-rc2 with the patch. I
>>>>>>>>>> tried to use early kdump with 6.2-rc2 with the patch
>>>>>>>>>> twice by panicking the kernel with sysrq+alt+c after the
>>>>>>>>>> black screen happened. The system rebooted after about
>>>>>>>>>> 10-20 seconds both times, but no kdump and dmesg files
>>>>>>>>>> were saved in /var/crash. I'm attaching the lspci -vvv
>>>>>>>>>> output as requested. ...
>>>>>>>>> Looking into lspci output, it doesn't list ACS feature
>>>>>>>>> for Graphics card. So with your fix it didn't enable PASID
>>>>>>>>> and hence it failed to boot. ...
>>>>>>>> So do you mind telling why does the PASID need to be enabled
>>>>>>>> for the graphic device? Or in another word, what does the
>>>>>>>> graphic driver use the PASID for? ...
>>>>>> The GPU driver uses the pasid for shared virtual memory between
>>>>>> the CPU and GPU. I.e., so that the user apps can use the same
>>>>>> virtual address space on the GPU and the CPU. It also uses
>>>>>> pasid to take advantage of recoverable device page faults using
>>>>>> PRS. ...
>>>>> Agreed. This applies to GPU computing on some older AMD APUs that
>>>>> take advantage of memory coherence and IOMMUv2 address translation
>>>>> to create a shared virtual address space between the CPU and GPU.
>>>>> In this case it seems to be a Carrizo APU. It is also true for
>>>>> Raven APUs. ...
>>>> Thanks for the explanation.
>>>>
>>>> This is actually the problem that commit 201007ef707a was trying to
>>>> fix. The PCIe fabric routes Memory Requests based on the TLP
>>>> address, ignoring any PASID (PCIe r6.0, sec 2.2.10.4), so a TLP with
>>>> PASID that should go upstream to the IOMMU may instead be routed as
>>>> a P2P Request if its address falls in a bridge window.
>>>>
>>>> In SVA case, the IOMMU shares the address space of a user
>>>> application. The user application side has no knowledge about the
>>>> PCI bridge window. It is entirely possible that the device is
>>>> programed with a P2P address and results in a disaster.
>>> Is this stalled? We explored the idea of changing the PCI core so
>>> that for devices that use ATS/PRI, we could enable PASID without
>>> checking for ACS [1], but IIUC we ultimately concluded that it was
>>> based on a misunderstanding of how ATS Translation Requests are routed
>>> and that an AMD driver change would be required [2].
>>>
>>> So it seems like we still have this regression, and we're running out
>>> of time before v6.2.
>>>
>>> [1] https://lore.kernel.org/all/[email protected]/
>>> [2] https://lore.kernel.org/all/[email protected]/
>> If I understand this correctly, the HW or the BIOS is doing something wrong
>> about reporting ACS. I don't know what the GPU driver can do other than add some
>> quirk to stop using AMD IOMMUv2 on this HW/BIOS.
>>
>> It looks like the problem is triggered when the driver calls
>> amd_iommu_init_device. That's when the first WARNs happen, soon followed by a
>> kernel oops in report_iommu_fault. The driver doesn't know anything is wrong
>> because amd_iommu_init_device seems to return "success". And the oops is not in
>> the GPU driver either.
> WARN is fixed and its in Joerg's tree.
> https://lore.kernel.org/all/[email protected]/
>
> report_iommu_fault() happened because in amd_iommu_init_device() path it failed
> to attach devices to new domain and returned error. But it didn't put devices
> back to old domain properly. It left in incosistent state and resulted in IO
> page fault. I have proposed series to handle device to domain attachment failure
> and better handling of subsequent report_iommu_fault().
> https://lore.kernel.org/linux-iommu/[email protected]/
>
>
> @Matt,
> Can you please help to verify above patches on your system where you hit the
> issue originally?
> (Grab above two series, apply it on top of latest kernel and test it)
>
> -Vasant
>
> Feb 16 13:22:32 kernel: kfd kfd: amdgpu: Failed to resume IOMMU for
> device 1002:9874
> Feb 16 13:22:32 kernel: kfd kfd: amdgpu: device 1002:9874 NOT added
> due to errors
This look like IOMMU device initialization still fails (but more
gracefully now). Vasant, is that expected?
This would lead to KFD not being available on Carrizo with this kernel,
which is probably not a big limitation in practice. It would only affect
compute applications using the ROCm user mode stack. I don't think
anyone does that these days on these old APUs.
The SMU errors seem unrelated to this unless there is some subtle
interaction I'm missing.
Regards,
Felix
Am 2023-02-16 um 13:59 schrieb Matt Fagnani:
> Vasant,
>
> I applied your four patches to 6.2-rc8 and built that. The black
> screen, null pointer dereference, and warnings didn't happen when
> booting 6.2-rc8 with your patches. There were errors that the IOMMU
> wasn't restarted when amdgpu and amdkfd was starting though at kernel:
> kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:9874. I don't
> know if those IOMMU errors were expected or not, but I did see those
> types of messages when I used amd_iommu=off to work around the black
> screen before. I didn't use amd_iommu=off when testing 6.2-rc8 with
> your patches. There were also a different amdgpu warning at
> drivers/gpu/drm/amd/amdgpu/../pm/powerplay/smumgr/smu8_smumgr.c:98
> smu8_send_msg_to_smc_with_parameter+0x103/0x140 and errors about
> amdgpu timeouts on 1/3 boots. Plasma took over a minute to start and
> shut down on that boot which was unusually long. I've seen those sorts
> of amdgpu warnings and errors infrequently before so they might be
> unrelated to the IOMMU problem. The part of the journal where those
> errors started was the following.
>
> Feb 16 13:22:31 kernel: [drm] amdgpu kernel modesetting enabled.
> Feb 16 13:22:31 kernel: amdgpu: Topology: Add APU node [0x0:0x0]
> Feb 16 13:22:31 kernel: [drm] initializing kernel modesetting (CARRIZO
> 0x1002:0x9874 0x103C:0x8332 0xCA).
> Feb 16 13:22:31 kernel: [drm] register mmio base: 0xF0400000
> Feb 16 13:22:31 kernel: [drm] register mmio size: 262144
> Feb 16 13:22:31 kernel: [drm] add ip block number 0 <vi_common>
> Feb 16 13:22:31 kernel: [drm] add ip block number 1 <gmc_v8_0>
> Feb 16 13:22:31 kernel: [drm] add ip block number 2 <cz_ih>
> Feb 16 13:22:31 kernel: [drm] add ip block number 3 <gfx_v8_0>
> Feb 16 13:22:31 kernel: [drm] add ip block number 4 <sdma_v3_0>
> Feb 16 13:22:31 kernel: [drm] add ip block number 5 <powerplay>
> Feb 16 13:22:31 kernel: [drm] add ip block number 6 <dm>
> Feb 16 13:22:31 kernel: [drm] add ip block number 7 <uvd_v6_0>
> Feb 16 13:22:31 kernel: [drm] add ip block number 8 <vce_v3_0>
> Feb 16 13:22:31 kernel: [drm] add ip block number 9 <acp_ip>
> Feb 16 13:22:31 kernel: amdgpu 0000:00:01.0: amdgpu: Fetched VBIOS
> from VFCT
> Feb 16 13:22:31 kernel: amdgpu: ATOM BIOS: 113-C75100-031
> Feb 16 13:22:31 kernel: [drm] UVD is enabled in physical mode
> Feb 16 13:22:31 kernel: [drm] VCE enabled in physical mode
> Feb 16 13:22:31 kernel: Console: switching to colour dummy device 80x25
> Feb 16 13:22:31 kernel: amdgpu 0000:00:01.0: vgaarb: deactivate vga
> console
> Feb 16 13:22:31 kernel: amdgpu 0000:00:01.0: amdgpu: Trusted Memory
> Zone (TMZ) feature not supported
> Feb 16 13:22:31 kernel: [drm] vm size is 64 GB, 2 levels, block size
> is 10-bit, fragment size is 9-bit
> Feb 16 13:22:31 kernel: amdgpu 0000:00:01.0: amdgpu: VRAM: 512M
> 0x000000F400000000 - 0x000000F41FFFFFFF (512M used)
> Feb 16 13:22:31 kernel: amdgpu 0000:00:01.0: amdgpu: GART: 1024M
> 0x000000FF00000000 - 0x000000FF3FFFFFFF
> Feb 16 13:22:31 kernel: [drm] Detected VRAM RAM=512M, BAR=512M
> Feb 16 13:22:31 kernel: [drm] RAM width 64bits UNKNOWN
> Feb 16 13:22:31 kernel: [drm] amdgpu: 512M of VRAM memory ready
> Feb 16 13:22:31 kernel: [drm] amdgpu: 3704M of GTT memory ready.
> Feb 16 13:22:31 kernel: [drm] GART: num cpu pages 262144, num gpu
> pages 262144
> Feb 16 13:22:31 kernel: [drm] PCIE GART of 1024M enabled (table at
> 0x000000F400600000).
> Feb 16 13:22:31 kernel: amdgpu: hwmgr_sw_init smu backed is smu8_smu
> Feb 16 13:22:31 kernel: [drm] Found UVD firmware Version: 1.91 Family
> ID: 11
> Feb 16 13:22:31 kernel: [drm] UVD ENC is disabled
> Feb 16 13:22:31 kernel: [drm] Found VCE firmware Version: 52.4 Binary
> ID: 3
> Feb 16 13:22:31 kernel: amdgpu: smu version 27.18.00
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: values for Engine clock
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 300000
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 480000
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 533340
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 576000
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 626090
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 685720
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 720000
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 757900
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: Validation clocks:
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: engine_max_clock: 75790
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: memory_max_clock: 93300
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: level : 8
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: values for Display clock
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 300000
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 400000
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 496560
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 626090
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 685720
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 757900
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 800000
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 847060
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: Validation clocks:
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: engine_max_clock: 75790
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: memory_max_clock: 93300
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: level : 8
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: values for Memory clock
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 667000
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: 933000
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: Validation clocks:
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: engine_max_clock: 75790
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: memory_max_clock: 93300
> Feb 16 13:22:31 kernel: [drm] DM_PPLIB: level : 8
> Feb 16 13:22:31 kernel: [drm] Display Core initialized with v3.2.215!
> Feb 16 13:22:32 kernel: [drm] UVD initialized successfully.
> Feb 16 13:22:32 kernel: [drm] VCE initialized successfully.
> Feb 16 13:22:32 kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on gart
> Feb 16 13:22:32 kernel: amdgpu: sdma_bitmap: f
> Feb 16 13:22:32 kernel: kfd kfd: amdgpu: Failed to resume IOMMU for
> device 1002:9874
> Feb 16 13:22:32 kernel: kfd kfd: amdgpu: device 1002:9874 NOT added
> due to errors
> Feb 16 13:22:32 kernel: amdgpu 0000:00:01.0: amdgpu: SE 1, SH per SE
> 1, CU per SH 8, active_cu_number 6
> Feb 16 13:22:32 kernel: [drm] Initialized amdgpu 3.49.0 20150101 for
> 0000:00:01.0 on minor 0
> Feb 16 13:22:32 kernel: fbcon: amdgpudrmfb (fb0) is primary device
> Feb 16 13:22:33 kernel: Console: switching to colour frame buffer
> device 170x48
> Feb 16 13:22:33 kernel: amdgpu 0000:00:01.0: [drm] fb0: amdgpudrmfb
> frame buffer device
> Feb 16 13:22:33 kernel: audit: type=1334 audit(1676571753.397:17):
> prog-id=21 op=LOAD
> Feb 16 13:22:33 kernel: audit: type=1130 audit(1676571753.419:18):
> pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel
> msg='unit=dbus-broker comm="systemd" exe="/usr/lib/systemd/systemd"
> hostname=? addr=? terminal=? res=success'
> Feb 16 13:22:33 kernel: audit: type=1130 audit(1676571753.456:19):
> pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel
> msg='unit=dracut-initqueue comm="systemd"
> exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> Feb 16 13:22:33 kernel: audit: type=1130 audit(1676571753.492:20):
> pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel
> msg='unit=systemd-fsck-root comm="systemd"
> exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> Feb 16 13:22:33 kernel: EXT4-fs (dm-0): mounted filesystem
> 00107de9-54ef-4784-a03f-61802ed0b350 with ordered data mode. Quota
> mode: none.
> Feb 16 13:22:36 kernel: ------------[ cut here ]------------
> Feb 16 13:22:36 kernel: smu8_send_msg_to_smc_with_parameter(0x0009,
> 0x0) timed out after 2814625 us
> Feb 16 13:22:36 kernel: WARNING: CPU: 1 PID: 112 at
> drivers/gpu/drm/amd/amdgpu/../pm/powerplay/smumgr/smu8_smumgr.c:98
> smu8_send_msg_to_smc_with_parameter+0x103/0x140 [amdgpu]
> Feb 16 13:22:36 kernel: Modules linked in: amdgpu i2c_algo_bit
> drm_ttm_helper ttm iommu_v2 mfd_core drm_buddy gpu_sched
> drm_display_helper drm_kms_helper hid_logitech_hidpp drm
> crct10dif_pclmul crc32_pclmul crc32c_intel r8169 sd_mod
> ghash_clmulni_intel t10_pi sha512_ssse3 crc64_rocksoft_generic
> crc64_rocksoft wdat_wdt sp5100_tco hid_logitech_dj crc64 cec video wmi
> fuse dm_multipath
> Feb 16 13:22:36 kernel: CPU: 1 PID: 112 Comm: kworker/1:3 Not tainted
> 6.2.0-rc8+ #94
> Feb 16 13:22:36 kernel: Hardware name: HP HP Laptop 15-bw0xx/8332,
> BIOS F.52 12/03/2019
> Feb 16 13:22:36 kernel: Workqueue: events amdgpu_vce_idle_work_handler
> [amdgpu]
> Feb 16 13:22:36 kernel: RIP:
> 0010:smu8_send_msg_to_smc_with_parameter+0x103/0x140 [amdgpu]
> Feb 16 13:22:36 kernel: Code: 20 48 c7 c7 28 1c c1 c0 48 89 c1 48 f7
> ea 48 89 c8 44 89 e9 48 c1 f8 3f 48 c1 fa 07 48 29 c2 49 89 d0 44 89
> e2 e8 c5 28 48 e0 <0f> 0b eb b0 bd ea ff ff ff eb a9 48 8b 7b 40 be c0
> 01 00 00 48 8b
> Feb 16 13:22:36 kernel: RSP: 0018:ffffb997004c7db8 EFLAGS: 00010282
> Feb 16 13:22:36 kernel: RAX: 000000000000004b RBX: ffff8b4e4f596800
> RCX: 0000000000000000
> Feb 16 13:22:36 kernel: RDX: 0000000000000001 RSI: ffffffffa14cf075
> RDI: 00000000ffffffff
> Feb 16 13:22:36 kernel: RBP: 00000000ffffffc2 R08: 0000000000000000
> R09: ffffb997004c7c68
> Feb 16 13:22:36 kernel: R10: 0000000000000003 R11: ffffffffa1d42e48
> R12: 0000000000000009
> Feb 16 13:22:36 kernel: R13: 0000000000000000 R14: 00000003ded365a4
> R15: 0000000000000002
> Feb 16 13:22:36 kernel: FS: 0000000000000000(0000)
> GS:ffff8b4f37480000(0000) knlGS:0000000000000000
> Feb 16 13:22:36 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Feb 16 13:22:36 kernel: CR2: 00007f950a698364 CR3: 0000000033c10000
> CR4: 00000000001506e0
> Feb 16 13:22:36 kernel: Call Trace:
> Feb 16 13:22:36 kernel: <TASK>
> Feb 16 13:22:36 kernel: smum_send_msg_to_smc+0xba/0xf0 [amdgpu]
> Feb 16 13:22:36 kernel: smu8_dpm_powergate_vce+0x15a/0x180 [amdgpu]
> Feb 16 13:22:36 kernel: pp_set_powergating_by_smu+0xed/0x1f0 [amdgpu]
> Feb 16 13:22:36 kernel: amdgpu_dpm_set_powergating_by_smu+0x84/0xf0
> [amdgpu]
> Feb 16 13:22:36 kernel: amdgpu_dpm_enable_vce+0x29/0xa0 [amdgpu]
> Feb 16 13:22:36 kernel: process_one_work+0x1c8/0x380
> Feb 16 13:22:36 kernel: worker_thread+0x4d/0x380
> Feb 16 13:22:36 kernel: ? _raw_spin_lock_irqsave+0x23/0x50
> Feb 16 13:22:36 kernel: ? __pfx_worker_thread+0x10/0x10
> Feb 16 13:22:36 kernel: kthread+0xe9/0x110
> Feb 16 13:22:36 kernel: ? __pfx_kthread+0x10/0x10
> Feb 16 13:22:36 kernel: ret_from_fork+0x2c/0x50
> Feb 16 13:22:36 kernel: </TASK>
> Feb 16 13:22:36 kernel: ---[ end trace 0000000000000000 ]---
> Feb 16 13:22:39 kernel: amdgpu:
> smu8_send_msg_to_smc_with_parameter(0x0004) aborted; SMU still
> servicing msg (0x0009)
> Feb 16 13:22:41 kernel: amdgpu:
> smu8_send_msg_to_smc_with_parameter(0x0007) aborted; SMU still
> servicing msg (0x0009)
>
> I'm attaching the kernel log for the boot of 6.2-rc8 + patches with
> the IOMMU errors and amdgpu warnings and timeouts.
>
> Thanks,
>
> Matt
>
> On 2/16/23 00:25, Vasant Hegde wrote:
>> Felix, Jason, Matt,
>>
>>
>> On 2/16/2023 6:05 AM, Felix Kuehling wrote:
>>> [+Shimmer, Aaron]
>>>
>>> Am 2023-02-15 um 10:39 schrieb Bjorn Helgaas:
>>>> [+cc Christian, Xinhui, amd-gfx]
>>>>
>>>> On Fri, Jan 06, 2023 at 01:48:11PM +0800, Baolu Lu wrote:
>>>>> On 1/5/23 11:27 PM, Felix Kuehling wrote:
>>>>>> Am 2023-01-05 um 09:46 schrieb Deucher, Alexander:
>>>>>>>> -----Original Message-----
>>>>>>>> From: Hegde, Vasant <[email protected]>
>>>>>>>> On 1/5/2023 4:07 PM, Baolu Lu wrote:
>>>>>>>>> On 2023/1/5 18:27, Vasant Hegde wrote:
>>>>>>>>>> On 1/5/2023 6:39 AM, Matt Fagnani wrote:
>>>>>>>>>>> I built 6.2-rc2 with the patch applied. The same black
>>>>>>>>>>> screen problem happened with 6.2-rc2 with the patch. I
>>>>>>>>>>> tried to use early kdump with 6.2-rc2 with the patch
>>>>>>>>>>> twice by panicking the kernel with sysrq+alt+c after the
>>>>>>>>>>> black screen happened. The system rebooted after about
>>>>>>>>>>> 10-20 seconds both times, but no kdump and dmesg files
>>>>>>>>>>> were saved in /var/crash. I'm attaching the lspci -vvv
>>>>>>>>>>> output as requested. ...
>>>>>>>>>> Looking into lspci output, it doesn't list ACS feature
>>>>>>>>>> for Graphics card. So with your fix it didn't enable PASID
>>>>>>>>>> and hence it failed to boot. ...
>>>>>>>>> So do you mind telling why does the PASID need to be enabled
>>>>>>>>> for the graphic device? Or in another word, what does the
>>>>>>>>> graphic driver use the PASID for? ...
>>>>>>> The GPU driver uses the pasid for shared virtual memory between
>>>>>>> the CPU and GPU. I.e., so that the user apps can use the same
>>>>>>> virtual address space on the GPU and the CPU. It also uses
>>>>>>> pasid to take advantage of recoverable device page faults using
>>>>>>> PRS. ...
>>>>>> Agreed. This applies to GPU computing on some older AMD APUs that
>>>>>> take advantage of memory coherence and IOMMUv2 address translation
>>>>>> to create a shared virtual address space between the CPU and GPU.
>>>>>> In this case it seems to be a Carrizo APU. It is also true for
>>>>>> Raven APUs. ...
>>>>> Thanks for the explanation.
>>>>>
>>>>> This is actually the problem that commit 201007ef707a was trying to
>>>>> fix. The PCIe fabric routes Memory Requests based on the TLP
>>>>> address, ignoring any PASID (PCIe r6.0, sec 2.2.10.4), so a TLP with
>>>>> PASID that should go upstream to the IOMMU may instead be routed as
>>>>> a P2P Request if its address falls in a bridge window.
>>>>>
>>>>> In SVA case, the IOMMU shares the address space of a user
>>>>> application. The user application side has no knowledge about the
>>>>> PCI bridge window. It is entirely possible that the device is
>>>>> programed with a P2P address and results in a disaster.
>>>> Is this stalled? We explored the idea of changing the PCI core so
>>>> that for devices that use ATS/PRI, we could enable PASID without
>>>> checking for ACS [1], but IIUC we ultimately concluded that it was
>>>> based on a misunderstanding of how ATS Translation Requests are routed
>>>> and that an AMD driver change would be required [2].
>>>>
>>>> So it seems like we still have this regression, and we're running out
>>>> of time before v6.2.
>>>>
>>>> [1]
>>>> https://lore.kernel.org/all/[email protected]/
>>>> [2] https://lore.kernel.org/all/[email protected]/
>>> If I understand this correctly, the HW or the BIOS is doing
>>> something wrong
>>> about reporting ACS. I don't know what the GPU driver can do other
>>> than add some
>>> quirk to stop using AMD IOMMUv2 on this HW/BIOS.
>>>
>>> It looks like the problem is triggered when the driver calls
>>> amd_iommu_init_device. That's when the first WARNs happen, soon
>>> followed by a
>>> kernel oops in report_iommu_fault. The driver doesn't know anything
>>> is wrong
>>> because amd_iommu_init_device seems to return "success". And the
>>> oops is not in
>>> the GPU driver either.
>> WARN is fixed and its in Joerg's tree.
>> https://lore.kernel.org/all/[email protected]/
>>
>> report_iommu_fault() happened because in amd_iommu_init_device() path
>> it failed
>> to attach devices to new domain and returned error. But it didn't put
>> devices
>> back to old domain properly. It left in incosistent state and
>> resulted in IO
>> page fault. I have proposed series to handle device to domain
>> attachment failure
>> and better handling of subsequent report_iommu_fault().
>> https://lore.kernel.org/linux-iommu/[email protected]/
>>
>>
>>
>> @Matt,
>> Can you please help to verify above patches on your system where
>> you hit the
>> issue originally?
>> (Grab above two series, apply it on top of latest kernel and test it)
>>
>> -Vasant
>>
Matt,
Thanks a lot for testing and the dmesg log.
On 2/17/2023 12:29 AM, Matt Fagnani wrote:
> Vasant,
>
> I applied your four patches to 6.2-rc8 and built that. The black screen, null
> pointer dereference, and warnings didn't happen when booting 6.2-rc8 with your
> patches. There were errors that the IOMMU wasn't restarted when amdgpu and
> amdkfd was starting though at kernel: kfd kfd: amdgpu: Failed to resume IOMMU
> for device 1002:9874. I don't know if those IOMMU errors were expected or not,
This patch is not for fixing PASID enablement issue. Its more of gracefully
handling the error path.
This means patch worked in expected way. i. e. It failed to enable PASID because
of original patch (commit 201007ef70), it didn't attach devices to new domain
and attach devices back to default domain.
It returned error to GPU saying we couldn't enable PASID/PRI. Hence we saw above
error message.
-Vasant
Hi Felix,
On 2/17/2023 1:29 AM, Felix Kuehling wrote:
>> Feb 16 13:22:32 kernel: kfd kfd: amdgpu: Failed to resume IOMMU for device
>> 1002:9874
>> Feb 16 13:22:32 kernel: kfd kfd: amdgpu: device 1002:9874 NOT added due to errors
> This look like IOMMU device initialization still fails (but more gracefully
> now). Vasant, is that expected?
My fix is to gracefully handle failure paths in IOMMU. So above logs are
expected. Basically it means IOMMU couldn't attach devices to new domain
(because it couldn't enable PASID on AMD GPU as ACS RR/UF flags are missing, see
commit 201007ef707 ) and we did fall back to old domain properly.
It also means that GPU will not be able to use PASID/PRI. If you need these
feauteres then you have to look into commit 201007ef707 and see how we can
enable PASID for GPU (without ACS UF/RR flag?).
>
> This would lead to KFD not being available on Carrizo with this kernel, which is
> probably not a big limitation in practice. It would only affect compute
> applications using the ROCm user mode stack. I don't think anyone does that
> these days on these old APUs.
>
> The SMU errors seem unrelated to this unless there is some subtle interaction
> I'm missing.
I have no idea about GPU warning. All I can say is IOMMU side looks good but
PASID/PRI is not enabled for GPU.
-Vasant