2023-09-22 13:49:27

by Hector Martin

Subject: [PATCH REGRESSION] iommu: Only allocate FQ domains for IOMMUs that support them

Commit a4fdd9762272 ("iommu: Use flush queue capability") hid the
IOMMU_DOMAIN_DMA_FQ domain type from domain allocation. A check was
introduced in iommu_dma_init_domain() to fall back if not supported, but
this check runs too late: by that point, devices have been attached to
the IOMMU, and the IOMMU driver might not expect FQ domains at
ops->attach_dev() time.

Ensure that we immediately clamp FQ domains to plain DMA if not
supported by the driver at device attach time, not later.

This regressed apple-dart in v6.5.

Cc: [email protected]
Cc: [email protected]
Fixes: a4fdd9762272 ("iommu: Use flush queue capability")
Signed-off-by: Hector Martin <[email protected]>
---
drivers/iommu/iommu.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3bfc56df4f78..12464eaa8d91 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2039,6 +2039,15 @@ static int __iommu_attach_device(struct iommu_domain *domain,
 	if (unlikely(domain->ops->attach_dev == NULL))
 		return -ENODEV;
 
+	/*
+	 * Ensure we do not try to attach devices to FQ domains if the
+	 * IOMMU does not support them. We can safely fall back to
+	 * non-FQ.
+	 */
+	if (domain->type == IOMMU_DOMAIN_DMA_FQ &&
+	    !device_iommu_capable(dev, IOMMU_CAP_DEFERRED_FLUSH))
+		domain->type = IOMMU_DOMAIN_DMA;
+
 	ret = domain->ops->attach_dev(domain, dev);
 	if (ret)
 		return ret;

---
base-commit: ce9ecca0238b140b88f43859b211c9fdfd8e5b70
change-id: 20230922-iommu-type-regression-25b4f43df770

Best regards,
--
Hector Martin <[email protected]>


2023-09-22 14:38:53

by Robin Murphy

Subject: Re: [PATCH REGRESSION] iommu: Only allocate FQ domains for IOMMUs that support them

On 22/09/2023 2:40 pm, Hector Martin wrote:
> Commit a4fdd9762272 ("iommu: Use flush queue capability") hid the
> IOMMU_DOMAIN_DMA_FQ domain type from domain allocation. A check was
> introduced in iommu_dma_init_domain() to fall back if not supported, but
> this check runs too late: by that point, devices have been attached to
> the IOMMU, and the IOMMU driver might not expect FQ domains at
> ops->attach_dev() time.
>
> Ensure that we immediately clamp FQ domains to plain DMA if not
> supported by the driver at device attach time, not later.
>
> This regressed apple-dart in v6.5.

Apologies, I missed that apple-dart was doing something unusual here.
However, could we just fix that directly instead?

diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index 2082081402d3..0b8927508427 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -671,8 +671,7 @@ static int apple_dart_attach_dev(struct iommu_domain *domain,
 		return ret;
 
 	switch (domain->type) {
-	case IOMMU_DOMAIN_DMA:
-	case IOMMU_DOMAIN_UNMANAGED:
+	default:
 		ret = apple_dart_domain_add_streams(dart_domain, cfg);
 		if (ret)
 			return ret;


That's pretty much where we're headed with the domain_alloc_paging
redesign anyway - at the driver level, operations on a paging domain
should not need to know about the higher-level usage intent of that
domain. Ideally, blocking and identity domains should have their own
distinct ops now as well, but that might be a bit too big a change for
an immediate fix here.

Thanks,
Robin.


2023-09-22 15:59:17

by Hector Martin

Subject: Re: [PATCH REGRESSION] iommu: Only allocate FQ domains for IOMMUs that support them

On 22/09/2023 23.21, Robin Murphy wrote:
> On 22/09/2023 2:40 pm, Hector Martin wrote:
>> Commit a4fdd9762272 ("iommu: Use flush queue capability") hid the
>> IOMMU_DOMAIN_DMA_FQ domain type from domain allocation. A check was
>> introduced in iommu_dma_init_domain() to fall back if not supported, but
>> this check runs too late: by that point, devices have been attached to
>> the IOMMU, and the IOMMU driver might not expect FQ domains at
>> ops->attach_dev() time.
>>
>> Ensure that we immediately clamp FQ domains to plain DMA if not
>> supported by the driver at device attach time, not later.
>>
>> This regressed apple-dart in v6.5.
>
> Apologies, I missed that apple-dart was doing something unusual here.
> However, could we just fix that directly instead?
>
> diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
> index 2082081402d3..0b8927508427 100644
> --- a/drivers/iommu/apple-dart.c
> +++ b/drivers/iommu/apple-dart.c
> @@ -671,8 +671,7 @@ static int apple_dart_attach_dev(struct iommu_domain *domain,
>  		return ret;
>  
>  	switch (domain->type) {
> -	case IOMMU_DOMAIN_DMA:
> -	case IOMMU_DOMAIN_UNMANAGED:
> +	default:
>  		ret = apple_dart_domain_add_streams(dart_domain, cfg);
>  		if (ret)
>  			return ret;
>
>
> That's pretty much where we're headed with the domain_alloc_paging
> redesign anyway - at the driver level, operations on a paging domain
> should not need to know about the higher-level usage intent of that
> domain. Ideally, blocking and identity domains should have their own
> distinct ops now as well, but that might be a bit too big a change for
> an immediate fix here.

Sure, but it sounded like if there's a capability for this, the core
should probably use it and not expose the type at all to drivers that
can't support it :)

If you think defaulting to that branch in DART is correctly future-proof,
I can make that change. It's not the only driver checking the domain
type in attach_dev(), but it might be the only one enumerating all the
options instead of checking for specific cases only (e.g. intel checks
for IOMMU_DOMAIN_IDENTITY).

- Hector

2023-09-23 04:34:02

by Jason Gunthorpe

Subject: Re: [PATCH REGRESSION] iommu: Only allocate FQ domains for IOMMUs that support them

On Fri, Sep 22, 2023 at 03:21:17PM +0100, Robin Murphy wrote:
> On 22/09/2023 2:40 pm, Hector Martin wrote:
> > Commit a4fdd9762272 ("iommu: Use flush queue capability") hid the
> > IOMMU_DOMAIN_DMA_FQ domain type from domain allocation. A check was
> > introduced in iommu_dma_init_domain() to fall back if not supported, but
> > this check runs too late: by that point, devices have been attached to
> > the IOMMU, and the IOMMU driver might not expect FQ domains at
> > ops->attach_dev() time.
> >
> > Ensure that we immediately clamp FQ domains to plain DMA if not
> > supported by the driver at device attach time, not later.
> >
> > This regressed apple-dart in v6.5.
>
> Apologies, I missed that apple-dart was doing something unusual here.
> However, could we just fix that directly instead?
>
> diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
> index 2082081402d3..0b8927508427 100644
> --- a/drivers/iommu/apple-dart.c
> +++ b/drivers/iommu/apple-dart.c
> @@ -671,8 +671,7 @@ static int apple_dart_attach_dev(struct iommu_domain *domain,
>  		return ret;
>  
>  	switch (domain->type) {
> -	case IOMMU_DOMAIN_DMA:
> -	case IOMMU_DOMAIN_UNMANAGED:
> +	default:
>  		ret = apple_dart_domain_add_streams(dart_domain, cfg);
>  		if (ret)
>  			return ret;

Yes, I much prefer this to the original patch please. Drivers should
not be testing DMA_FQ at all.

I already wrote a series to convert DART to domain_alloc_paging() that
fixes this inadvertently.

Robin's suggestion is good for a temporary -rc fix.

Removing the switch is slightly more robust:

	if (domain->type & __IOMMU_DOMAIN_PAGING) {
		/* ... */
		return 0;
	}

	if (domain->type == IOMMU_DOMAIN_BLOCKED) {
		/* ... */
	}

	return -EOPNOTSUPP;

But not so worthwhile since I deleted all this anyhow...

I'll send out the DART series; it can't go to -rc, so a patch is still needed.

Thanks,
Jason

Subject: Re: [PATCH REGRESSION] iommu: Only allocate FQ domains for IOMMUs that support them

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 22.09.23 15:40, Hector Martin wrote:
> Commit a4fdd9762272 ("iommu: Use flush queue capability") hid the
> IOMMU_DOMAIN_DMA_FQ domain type from domain allocation. A check was
> introduced in iommu_dma_init_domain() to fall back if not supported, but
> this check runs too late: by that point, devices have been attached to
> the IOMMU, and the IOMMU driver might not expect FQ domains at
> ops->attach_dev() time.
>
> Ensure that we immediately clamp FQ domains to plain DMA if not
> supported by the driver at device attach time, not later.
>
> This regressed apple-dart in v6.5.
> [...]


Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced a4fdd9762272
#regzbot title iommu: apple-dart regressed
#regzbot monitor: https://lore.kernel.org/all/[email protected]/
#regzbot fix: iommu/apple-dart: Handle DMA_FQ domains in attach_dev()
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it is already being
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.