2024-04-16 13:01:03

by Robin Murphy

[permalink] [raw]
Subject: [PATCH] iommu: Fix def_domain_type interaction with untrusted devices

Previously, an untrusted device forcing IOMMU_DOMAIN_DMA always took
precedence over whatever a driver's def_domain_type may have wanted to
say. This was intentionally handled in core code since 3 years prior,
to avoid drivers poking at the details of what is essentially a policy
between the PCI core and the IOMMU core. Now, though, we go to the
length of evaluating both constraints to check for any conflict, and if
so throw our toys out of the pram and refuse to handle the device at
all. Regardless of any intent, in practice this leaves the device, and
potentially the rest of its group or even the whole IOMMU, in a largely
undetermined state, which at worst may render the whole system unusable.
Unfortunately it turns out that this is a realistic situation to run
into by connecting a PASID-capable device (e.g. a GPU) to an AMD-based
laptop via a Thunderbolt expansion box, since the AMD IOMMU driver needs
an identity default domain for PASIDs to be usable, and thus sets a
def_domain_type override based on PASID capability.

In general, restoring the old behaviour of forcing translation will not
make that device's operation any more broken than leaving it potentially
blocked or subject to the rest of a group's translations would, nor will
it be any less safe than leaving it potentially bypassed or subject to
the rest of a group's translations would, so do that, and let eGPUs work
again.

Reported-by: Eric Wagner <[email protected]>
Link: https://lore.kernel.org/linux-iommu/CAHudX3zLH6CsRmLE-yb+gRjhh-v4bU5_1jW_xCcxOo_oUUZKYg@mail.gmail.com
Fixes: 59ddce4418da ("iommu: Reorganize iommu_get_default_domain_type() to respect def_domain_type()")
Signed-off-by: Robin Murphy <[email protected]>
---
drivers/iommu/iommu.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 996e79dc582d..90dbea14d4d6 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1772,9 +1772,8 @@ static int iommu_get_default_domain_type(struct iommu_group *group,
if (driver_type && driver_type != IOMMU_DOMAIN_DMA) {
dev_err_ratelimited(
untrusted,
- "Device is not trusted, but driver is overriding group %u to %s, refusing to probe.\n",
+ "IOMMU_DOMAIN_DMA for untrusted device overrides driver request of %s for group %u, expect issues...\n",
group->id, iommu_domain_type_str(driver_type));
- return -1;
}
driver_type = IOMMU_DOMAIN_DMA;
}
--
2.39.2.101.g768bb238c484.dirty



2024-04-16 14:34:27

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] iommu: Fix def_domain_type interaction with untrusted devices

On Tue, Apr 16, 2024 at 02:00:43PM +0100, Robin Murphy wrote:
> Previously, an untrusted device forcing IOMMU_DOMAIN_DMA always took
> precedence over whatever a driver's def_domain_type may have wanted to
> say. This was intentionally handled in core code since 3 years prior,
> to avoid drivers poking at the details of what is essentially a policy
> between the PCI core and the IOMMU core. Now, though, we go to the
> length of evaluating both constraints to check for any conflict, and if
> so throw our toys out of the pram and refuse to handle the device at
> all. Regardless of any intent, in practice this leaves the device, and
> potentially the rest of its group or even the whole IOMMU, in a largely
> undetermined state, which at worst may render the whole system unusable.
> Unfortunately it turns out that this is a realistic situation to run
> into by connecting a PASID-capable device (e.g. a GPU) to an AMD-based
> laptop via a Thunderbolt expansion box, since the AMD IOMMU driver needs
> an identity default domain for PASIDs to be usable, and thus sets a
> def_domain_type override based on PASID capability.
>
> In general, restoring the old behaviour of forcing translation will not
> make that device's operation any more broken than leaving it potentially
> blocked or subject to the rest of a group's translations would, nor will
> it be any less safe than leaving it potentially bypassed or subject to
> the rest of a group's translations would, so do that, and let eGPUs work
> again.
>
> Reported-by: Eric Wagner <[email protected]>
> Link: https://lore.kernel.org/linux-iommu/CAHudX3zLH6CsRmLE-yb+gRjhh-v4bU5_1jW_xCcxOo_oUUZKYg@mail.gmail.com
> Fixes: 59ddce4418da ("iommu: Reorganize iommu_get_default_domain_type() to respect def_domain_type()")
> Signed-off-by: Robin Murphy <[email protected]>
> ---
> drivers/iommu/iommu.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 996e79dc582d..90dbea14d4d6 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1772,9 +1772,8 @@ static int iommu_get_default_domain_type(struct iommu_group *group,
> if (driver_type && driver_type != IOMMU_DOMAIN_DMA) {
> dev_err_ratelimited(
> untrusted,
> - "Device is not trusted, but driver is overriding group %u to %s, refusing to probe.\n",
> + "IOMMU_DOMAIN_DMA for untrusted device overrides driver request of %s for group %u, expect issues...\n",
> group->id, iommu_domain_type_str(driver_type));
> - return -1;
> }
> driver_type = IOMMU_DOMAIN_DMA;
> }
> --
> 2.39.2.101.g768bb238c484.dirty
>
>

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman. You have sent him
a patch that has triggered this response. He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created. Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- You have marked a patch with a "Fixes:" tag for a commit that is in an
older released kernel, yet you do not have a cc: stable line in the
signed-off-by area at all, which means that the patch will not be
applied to any older kernel releases. To properly fix this, please
follow the documented rules in the
Documentation/process/stable-kernel-rules.rst file for how to resolve
this.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot

2024-04-16 15:29:53

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] iommu: Fix def_domain_type interaction with untrusted devices

On Tue, Apr 16, 2024 at 02:00:43PM +0100, Robin Murphy wrote:
> Previously, an untrusted device forcing IOMMU_DOMAIN_DMA always took
> precedence over whatever a driver's def_domain_type may have wanted to
> say. This was intentionally handled in core code since 3 years prior,
> to avoid drivers poking at the details of what is essentially a policy
> between the PCI core and the IOMMU core. Now, though, we go to the
> length of evaluating both constraints to check for any conflict, and if
> so throw our toys out of the pram and refuse to handle the device at
> all. Regardless of any intent, in practice this leaves the device, and
> potentially the rest of its group or even the whole IOMMU, in a largely
> undetermined state, which at worst may render the whole system
> unusable.

For systems supporting untrusted device security the translation must
be BLOCKED at this point.

> Unfortunately it turns out that this is a realistic situation to run
> into by connecting a PASID-capable device (e.g. a GPU) to an AMD-based
> laptop via a Thunderbolt expansion box, since the AMD IOMMU driver needs
> an identity default domain for PASIDs to be usable, and thus sets a
> def_domain_type override based on PASID capability.

The majority of places implementing def_domain_type are using it as a
statement of HW capability that should not be ignored by the core code:

- DART
* system page size is too small, can't map IOPTEs, force identity
* iommu does not support IDENTITY at all, force paging
- tegra: Device quirks mean paging and DMA API doesn't work
- amd: The driver can't support PAGING when in SNP mode
- SMMU: The driver can't support paging when in legacy binding mode and
paging domain allocation fails as well
- qcom: Looks like these devices have some iommu bypass bus in their
HW and paging doesn't work
- SMMUv3: The comment says HISI devices cannot support paging due to a HW quirk

For these force overriding the driver knowledge will either result in
domain allocate/attach failure or a broken DMA environment anyhow.

The AMD PASID thing is actually unique because the driver can still
fully support PAGING, despite it wrongly telling the core code that it
can't.

This is happening because it is all just a hack to work around the
incomplete SW implementation in the AMD driver. When the AMD driver is
completed its def_domain_type should be removed entirely.

Since actual PASID AMD attach isn't implemented yet we could just
remove that check from def_domain_type as an RC fix. Vasant can sort
it out properly by disabling pasid support on untrusted devices until
the DTE logic is fully completed.

> In general, restoring the old behaviour of forcing translation will not
> make that device's operation any more broken than leaving it potentially
> blocked or subject to the rest of a group's translations would, nor will
> it be any less safe than leaving it potentially bypassed or subject to
> the rest of a group's translations would, so do that, and let eGPUs work
> again.

Well, this is true, since we can't handle the probe error it doesn't
matter what we do.

Jason

2024-04-17 05:23:30

by Vasant Hegde

[permalink] [raw]
Subject: Re: [PATCH] iommu: Fix def_domain_type interaction with untrusted devices

Hi Jason, Robin,


On 4/16/2024 8:59 PM, Jason Gunthorpe wrote:
> On Tue, Apr 16, 2024 at 02:00:43PM +0100, Robin Murphy wrote:
>> Previously, an untrusted device forcing IOMMU_DOMAIN_DMA always took
>> precedence over whatever a driver's def_domain_type may have wanted to
>> say. This was intentionally handled in core code since 3 years prior,
>> to avoid drivers poking at the details of what is essentially a policy
>> between the PCI core and the IOMMU core. Now, though, we go to the
>> length of evaluating both constraints to check for any conflict, and if
>> so throw our toys out of the pram and refuse to handle the device at
>> all. Regardless of any intent, in practice this leaves the device, and
>> potentially the rest of its group or even the whole IOMMU, in a largely
>> undetermined state, which at worst may render the whole system
>> unusable.
>
> For systems supporting untrusted device security the translation must
> be BLOCKED at this point.
>
>> Unfortunately it turns out that this is a realistic situation to run
>> into by connecting a PASID-capable device (e.g. a GPU) to an AMD-based
>> laptop via a Thunderbolt expansion box, since the AMD IOMMU driver needs
>> an identity default domain for PASIDs to be usable, and thus sets a
>> def_domain_type override based on PASID capability.
>
> The majority of places implementing def_domain_type are using it as a
> statement of HW capability that should not be ignored by the core code:
>
> - DART
> * system page size is too small, can't map IOPTEs, force identity
> * iommu does not support IDENTITY at all, force paging
> - tegra: Device quirks mean paging and DMA API doesn't work
> - amd: The driver can't support PAGING when in SNP mode

Actually When SNP (Secure Nested Paging) is enabled in host, AMD driver forces
DMA translation mode with AMD V1 page table.


> - SMMU: The driver can't support paging when in legacy binding mode and
> paging domain allocation fails as well
> - qcom: Looks like these devices have some iommu bypass bus in their
> HW and paging doesn't work
> - SMMUv3: The comment says HISI devices cannot support paging due to a HW quirk
>
> For these force overriding the driver knowledge will either result in
> domain allocate/attach failure or a broken DMA environment anyhow.
>
> The AMD PASID thing is actually unique because the driver can still
> fully support PAGING, despite it wrongly telling the core code that it
> can't.

As I mentioned in other thread, AMD driver will be fixed to support paging mode
with V2 page table for PASID. I will look into it soon.


>
> This is happening because it is all just a hack to work around the
> incomplete SW implementation in the AMD driver. When the AMD driver is
> completed its def_domain_type should be removed entirely.

Not related to this topic, but for completeness.. AMD driver has many condition
to deal. like :
- Memory Encryption support - by default enforce paging mode
- SNP - Enforce paging mode with AMD V1 page table
- GPUs - Identity mapping

>
> Since actual PASID AMD attach isn't implemented yet we could just
> remove that check from def_domain_type as an RC fix. Vasant can sort
> it out properly by disabling pasid support on untrusted devices until
> the DTE logic is fully completed.

Keeping PASID support aside, largely the question is who should handle/decide
domain type for untrusted device? Is it core IOMMU layer -OR- HW driver?
- If its core layer, then this patch looks good to me.
- If its individual driver, then I can try to add fix in AMD driver. But then
what is the expectation? Driver is expected to return IOMMU_DOMAIN_DMA -OR- core
IOMMU layer is expected to adhere to whatever driver returned?


-Vasant


>
>> In general, restoring the old behaviour of forcing translation will not
>> make that device's operation any more broken than leaving it potentially
>> blocked or subject to the rest of a group's translations would, nor will
>> it be any less safe than leaving it potentially bypassed or subject to
>> the rest of a group's translations would, so do that, and let eGPUs work
>> again.
>
> Well, this is true, since we can't handle the probe error it doesn't
> matter what we do.
>
> Jason

2024-04-17 16:07:41

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] iommu: Fix def_domain_type interaction with untrusted devices

On Wed, Apr 17, 2024 at 10:53:00AM +0530, Vasant Hegde wrote:
> > - DART
> > * system page size is too small, can't map IOPTEs, force identity
> > * iommu does not support IDENTITY at all, force paging
> > - tegra: Device quirks mean paging and DMA API doesn't work
> > - amd: The driver can't support PAGING when in SNP mode
>
> Actually When SNP (Secure Nested Paging) is enabled in host, AMD driver forces
> DMA translation mode with AMD V1 page table.

Why does it return IDENTITY?

if (!cc_platform_has(CC_ATTR_MEM_ENCRYPT) && !amd_iommu_snp_en)
return IOMMU_DOMAIN_IDENTITY;

If the HW can't support IDENTITY then it needs to return
IOMMU_DOMAIN_DMA..

> > Since actual PASID AMD attach isn't implemented yet we could just
> > remove that check from def_domain_type as an RC fix. Vasant can sort
> > it out properly by disabling pasid support on untrusted devices until
> > the DTE logic is fully completed.
>
> Keeping PASID support aside, largely the question is who should handle/decide
> domain type for untrusted device? Is it core IOMMU layer -OR- HW
> driver?

def_domain_type() should return a HW limitation. It has got muddled up
with the default domain type stuff, but in essence it's usage right
now is to tell the core code that a specific device cannot support
IDENTITY or PAGING. It should be combined with a matching test in
attach_dev to block the unsupported domain combination.

The core code should use that information to decice on policy. If the
device reports it cannot support PAGING and the device is untrusted
then, IMHO the core code should leave the device blocked and
inoperable for security.

The key thing is this API should not be used for the driver to inject
some kind policy/optimization choice.

> - If its individual driver, then I can try to add fix in AMD driver. But then
> what is the expectation? Driver is expected to return IOMMU_DOMAIN_DMA -OR- core
> IOMMU layer is expected to adhere to whatever driver returned?

If driver and HW supports PAGING and IDENTITY attaches to the devices
RID then it should return 0.

If only PAGING is supported it should return
IOMMU_DOMAIN_DMA. alloc_domain_paging(dev) should select the correct
page table format - including a format to enable features like PASID.

If only IDENTITY is supported it should return IOMMU_DOMAIN_IDENTITY.

If AMD SW cannot support PASID & RID PAGING right now then it should
return 0 and fail that attachment combination during set_dev_pasid().

In the interm you should instruct users to use the command line option
to force IDENTITY until the driver work is completed.

If there are embedded GPUs or something that have an implementation
that bypasses the IOMMU for DMA then you need an ID list to check
against like some of the ARM drivers do.

Jason

2024-04-18 11:45:22

by Vasant Hegde

[permalink] [raw]
Subject: Re: [PATCH] iommu: Fix def_domain_type interaction with untrusted devices

Jason,


On 4/17/2024 9:36 PM, Jason Gunthorpe wrote:
> On Wed, Apr 17, 2024 at 10:53:00AM +0530, Vasant Hegde wrote:
>>> - DART
>>> * system page size is too small, can't map IOPTEs, force identity
>>> * iommu does not support IDENTITY at all, force paging
>>> - tegra: Device quirks mean paging and DMA API doesn't work
>>> - amd: The driver can't support PAGING when in SNP mode
>>
>> Actually When SNP (Secure Nested Paging) is enabled in host, AMD driver forces
>> DMA translation mode with AMD V1 page table.
>
> Why does it return IDENTITY?

We return IDENTITY only if SNP and memory encryption is not enabled and device
is SVA capable. Upstream has below code (v6.9-rc4)

if (pdev_pasid_supported(dev_data) &&
!cc_platform_has(CC_ATTR_MEM_ENCRYPT) &&
!amd_iommu_snp_en) {
return IOMMU_DOMAIN_IDENTITY;
}


and during boot, we will enforce Paging domain when Encryption/SNP is enabled.


>
> if (!cc_platform_has(CC_ATTR_MEM_ENCRYPT) && !amd_iommu_snp_en)
> return IOMMU_DOMAIN_IDENTITY;
>
> If the HW can't support IDENTITY then it needs to return
> IOMMU_DOMAIN_DMA..
>
>>> Since actual PASID AMD attach isn't implemented yet we could just
>>> remove that check from def_domain_type as an RC fix. Vasant can sort
>>> it out properly by disabling pasid support on untrusted devices until
>>> the DTE logic is fully completed.
>>
>> Keeping PASID support aside, largely the question is who should handle/decide
>> domain type for untrusted device? Is it core IOMMU layer -OR- HW
>> driver?
>
> def_domain_type() should return a HW limitation. It has got muddled up
> with the default domain type stuff, but in essence it's usage right
> now is to tell the core code that a specific device cannot support
> IDENTITY or PAGING. It should be combined with a matching test in
> attach_dev to block the unsupported domain combination.
>
> The core code should use that information to decice on policy. If the
> device reports it cannot support PAGING and the device is untrusted
> then, IMHO the core code should leave the device blocked and
> inoperable for security.
>
> The key thing is this API should not be used for the driver to inject
> some kind policy/optimization choice.

So far the policy was core layer to handle untrusted device which got changed
and hence we hit this bug.

I don't see any issue with enforcing these checks inside AMD driver.. If we go
with this approach then IMO core should just adhere to what driver returned
rather than failing. Having policy decision at two level is inviting trouble
like this one.



>
>> - If its individual driver, then I can try to add fix in AMD driver. But then
>> what is the expectation? Driver is expected to return IOMMU_DOMAIN_DMA -OR- core
>> IOMMU layer is expected to adhere to whatever driver returned?
>
> If driver and HW supports PAGING and IDENTITY attaches to the devices
> RID then it should return 0.
>
> If only PAGING is supported it should return
> IOMMU_DOMAIN_DMA. alloc_domain_paging(dev) should select the correct
> page table format - including a format to enable features like PASID.
>
> If only IDENTITY is supported it should return IOMMU_DOMAIN_IDENTITY.
>
> If AMD SW cannot support PASID & RID PAGING right now then it should
> return 0 and fail that attachment combination during set_dev_pasid().
>
> In the interm you should instruct users to use the command line option
> to force IDENTITY until the driver work is completed.

This is a regression hence it should be fixed. All other enhancement/fixes can
be done later.

-Vasant

>
> If there are embedded GPUs or something that have an implementation
> that bypasses the IOMMU for DMA then you need an ID list to check
> against like some of the ARM drivers do.
>
> Jason

2024-04-18 12:03:14

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] iommu: Fix def_domain_type interaction with untrusted devices

On Thu, Apr 18, 2024 at 05:14:59PM +0530, Vasant Hegde wrote:
> We return IDENTITY only if SNP and memory encryption is not enabled and device
> is SVA capable. Upstream has below code (v6.9-rc4)
>
> if (pdev_pasid_supported(dev_data) &&
> !cc_platform_has(CC_ATTR_MEM_ENCRYPT) &&
> !amd_iommu_snp_en) {
> return IOMMU_DOMAIN_IDENTITY;
> }
>
>
> and during boot, we will enforce Paging domain when Encryption/SNP is enabled.

It is very confusing considering the comment says identity isn't supported:

* Do not identity map IOMMUv2 capable devices when:
* - memory encryption is active, because some of those devices
* (AMD GPUs) don't have the encryption bit in their DMA-mask
* and require remapping.
* - SNP is enabled, because it prohibits DTE[Mode]=0.
*/

Then it goes on to return IDENTITY anyhow.

If the HW cannot identity map it needs to return DMA for those cases:

/* SNP is enabled, because it prohibits DTE[Mode]=0, IDENTITY is not
* supported */
if (amd_snp_en)
return IOMMU_DOMAIN_DMA;

/* memory encryption is active, because some of those devices
* (AMD GPUs) don't have the encryption bit in their DMA-mask
* and require remapping. IDENTITY is not supported.
*/
if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
return IOMMU_DOMAIN_DMA;


> So far the policy was core layer to handle untrusted device which got changed
> and hence we hit this bug.

Well I would say the AMD driver changed to mis-use def_domain for
policy and that causes problems.

> I don't see any issue with enforcing these checks inside AMD driver.. If we go
> with this approach then IMO core should just adhere to what driver returned
> rather than failing. Having policy decision at two level is inviting trouble
> like this one.

Again, no policy in drivers.

def_domain should return only the HW capability. If it returns
IDENTITY then the driver must fail attaches of PAGING.

You should send a patch fixing this function and remove the PASID test
entirely for now.

Jason

2024-04-23 11:28:01

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH] iommu: Fix def_domain_type interaction with untrusted devices

On 16/04/2024 4:29 pm, Jason Gunthorpe wrote:
> On Tue, Apr 16, 2024 at 02:00:43PM +0100, Robin Murphy wrote:
>> Previously, an untrusted device forcing IOMMU_DOMAIN_DMA always took
>> precedence over whatever a driver's def_domain_type may have wanted to
>> say. This was intentionally handled in core code since 3 years prior,
>> to avoid drivers poking at the details of what is essentially a policy
>> between the PCI core and the IOMMU core. Now, though, we go to the
>> length of evaluating both constraints to check for any conflict, and if
>> so throw our toys out of the pram and refuse to handle the device at
>> all. Regardless of any intent, in practice this leaves the device, and
>> potentially the rest of its group or even the whole IOMMU, in a largely
>> undetermined state, which at worst may render the whole system
>> unusable.
>
> For systems supporting untrusted device security the translation must
> be BLOCKED at this point.

So why is that not enforced then? Simply hoping that calling
ops->release_device() or returning an error from iommu_device_register()
might do the right thing is no guarantee. Furthermore I'm pretty sure
we're still letting an untrusted device be hotplugged into an existing
group without any checks at all.

Meanwhile if we go back to letting untrusted take priority and attach
the device to an empty DMA domain, oh hey look it's blocked, and what's
more we also retain control of the IOMMU to keep it that way. Problem
solved. You intentionally changed the code to be less effective at what
you say it should be doing, so once again I'm left a little puzzled as
to what point you're trying to argue here.

>> Unfortunately it turns out that this is a realistic situation to run
>> into by connecting a PASID-capable device (e.g. a GPU) to an AMD-based
>> laptop via a Thunderbolt expansion box, since the AMD IOMMU driver needs
>> an identity default domain for PASIDs to be usable, and thus sets a
>> def_domain_type override based on PASID capability.
>
> The majority of places implementing def_domain_type are using it as a
> statement of HW capability that should not be ignored by the core code:
>
> - DART
> * system page size is too small, can't map IOPTEs, force identity

Not a hardware limitation at all, it supports paging just fine, and
iommu-dma *could* in principle work with larger pages with a bit of
effort (some proof-of-concept patches were posted some time ago). This
is entirely the driver hacking around the legacy of iommu-dma and core
code not expecting the situation and thus not detecting it or handling
it gracefully. In fact I should be able to sort that out relatively
soon, once my other stuff lands and I'm able to start pulling
iommu_dma_init_domain() apart.

> * iommu does not support IDENTITY at all, force paging
> - tegra: Device quirks mean paging and DMA API doesn't work

Hmm, I don't recall any specific device concerns. What I do know is that
historically this driver has always wanted to prevent the ARM DMA domain
or older versions of iommu-dma (prior to iommu_get_dma_domain()) from
getting in the way of what the host1x drivers expect, so if there is any
actual hardware policy here, it's very much implicitly hanging on the
coat-tails of software policy.

> - amd: The driver can't support PAGING when in SNP mode
> - SMMU: The driver can't support paging when in legacy binding mode and
> paging domain allocation fails as well

Quite obviously nothing to do with hardware. That's there solely because
using the old binding prevents us from reasonably being able to order
binding the IOMMU driver against the client drivers, so we can't risk
having iommu-dma come up and change the DMA ops underneath an
already-active device.

> - qcom: Looks like these devices have some iommu bypass bus in their
> HW and paging doesn't work

I believe that is true for some of the modem stuff, however again
there's also plenty of software policy in there too; some of it the same
thing about default domains getting in the way of how the GPU/display
drivers want to use dma-direct for cache maintenance underneath their
own unmanaged domains, one is completely abusing the domain type in a
devious hack which just happens to make some behaviour fall out of the
rest of the driver to satisfy an unrelated hardware/firmware constraint
(which is still *supposed* to be only a temporary fix pending a proper
solution, ha...)

> - SMMUv3: The comment says HISI devices cannot support paging due to a HW quirk
>
> For these force overriding the driver knowledge will either result in
> domain allocate/attach failure or a broken DMA environment anyhow.
>
> The AMD PASID thing is actually unique because the driver can still
> fully support PAGING, despite it wrongly telling the core code that it
> can't.
>
> This is happening because it is all just a hack to work around the
> incomplete SW implementation in the AMD driver.

Not really, it's shown up because the core code took an illogical step
backwards. Without significant surgery to IOMMU and/or driver core code,
the best and most reasonable thing to do at the moment is still the one
which happens to work out OK for the current AMD behaviour. Yes, if we
*had* done all the work to make the core code super-robust then the
issue could have been the GPU driver gracefully failing rather than
wedging the whole system, and the remaining blame could then be fairly
laid on amd_iommu_def_domain_type(), but that is not the case and cannot
be claimed to be.

Per our API expectations, a driver's def_domain_type can go one of three
ways; do nothing, request IOMMU_DOMAIN_DMA, or request
IOMMU_DOMAIN_IDENTITY. For the first two it's clear that untrusted
forcing IOMMU_DOMAIN_DMA without even asking is perfectly fine. In the
third case, we still definitely want untrusted to take precedence, since
our policy is that the device not having access to things it shouldn't
is more important that it having to access things it should. Sure, in
the almost-impossible case that we did have an external device which
genuinely couldn't handle translation, then the user may allow a driver
to bind, which ends up not working and causing IOMMU faults, however
that would be equally true of any kind of blocking notion we could
implement right now, so there's no appreciable loss of functionality.
However, the overwhelming fact is that the few devices genuinely
requiring IDENTITY overrides are on-chip integrated things, any of which
suddenly turning up on external ports would be highly suspect, so that's
actually a big argument in *favour* of not believing def_domain_type
either in such cases.

In fact the more I think about it, I could swear I've had this
discussion and worked through this reasoning before, possibly back when
we first introduced it?

> When the AMD driver is
> completed its def_domain_type should be removed entirely.
>
> Since actual PASID AMD attach isn't implemented yet we could just
> remove that check from def_domain_type as an RC fix. Vasant can sort
> it out properly by disabling pasid support on untrusted devices until
> the DTE logic is fully completed.
>
>> In general, restoring the old behaviour of forcing translation will not
>> make that device's operation any more broken than leaving it potentially
>> blocked or subject to the rest of a group's translations would, nor will
>> it be any less safe than leaving it potentially bypassed or subject to
>> the rest of a group's translations would, so do that, and let eGPUs work
>> again.
>
> Well, this is true, since we can't handle the probe error it doesn't
> matter what we do.

OK, so after all that you do in fact agree? In that case, why are we
still mucking about proposing hacks on top of hacks in the AMD driver
rather than just fixing the regression sensibly?

Thanks,
Robin.

2024-04-24 13:05:13

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] iommu: Fix def_domain_type interaction with untrusted devices

On Tue, Apr 23, 2024 at 12:26:41PM +0100, Robin Murphy wrote:
> On 16/04/2024 4:29 pm, Jason Gunthorpe wrote:
> > On Tue, Apr 16, 2024 at 02:00:43PM +0100, Robin Murphy wrote:
> > > Previously, an untrusted device forcing IOMMU_DOMAIN_DMA always took
> > > precedence over whatever a driver's def_domain_type may have wanted to
> > > say. This was intentionally handled in core code since 3 years prior,
> > > to avoid drivers poking at the details of what is essentially a policy
> > > between the PCI core and the IOMMU core. Now, though, we go to the
> > > length of evaluating both constraints to check for any conflict, and if
> > > so throw our toys out of the pram and refuse to handle the device at
> > > all. Regardless of any intent, in practice this leaves the device, and
> > > potentially the rest of its group or even the whole IOMMU, in a largely
> > > undetermined state, which at worst may render the whole system
> > > unusable.
> >
> > For systems supporting untrusted device security the translation must
> > be BLOCKED at this point.
>
> So why is that not enforced then?

I'm not sure what you mean? At this point the core code has no idea
what translation the iommu driver setup when it booted the HW..

I mean if a iommu driver is going to support untrusted then at no
point can it be doing IDENTITY while the untrusted link is up.

> Simply hoping that calling ops->release_device() or returning an
> error from iommu_device_register() might do the right thing is no
> guarantee.

The systems that support untrusted must put things back to BLOCKED in
their release_device() to be ready for the next hotplug of an
untrusted device.

> Furthermore I'm pretty sure we're still letting an
> untrusted device be hotplugged into an existing group without any
> checks at all.

Yes, combining trusted and untrusted in the same group, regardless of
domain type, is fundamentally wrong and we don't check it.

> > > Unfortunately it turns out that this is a realistic situation to run
> > > into by connecting a PASID-capable device (e.g. a GPU) to an AMD-based
> > > laptop via a Thunderbolt expansion box, since the AMD IOMMU driver needs
> > > an identity default domain for PASIDs to be usable, and thus sets a
> > > def_domain_type override based on PASID capability.
> >
> > The majority of places implementing def_domain_type are using it as a
> > statement of HW capability that should not be ignored by the core code:
> >
> > - DART
> > * system page size is too small, can't map IOPTEs, force identity
>
> Not a hardware limitation at all, it supports paging just fine, and
> iommu-dma *could* in principle work with larger pages with a bit of effort
> (some proof-of-concept patches were posted some time ago).

I was not clear, I mean a HW/SW/Driver/system issue. The net result is
that PAGING or IDENTITY does not work in the Linux kernel for a whole
bunch of good and bad reasons.

> > * iommu does not support IDENTITY at all, force paging

Like this one is a good reason. No HW support. It is checked during
attach domain and fails it.

> > - tegra: Device quirks mean paging and DMA API doesn't work
>
> Hmm, I don't recall any specific device concerns.

Thierry told me once, it is another ugly GPU driver hack IIRC. It is a
bad reason, some hacky GPU driver issue and is overkill since it
should of_match like qcom.

> > - amd: The driver can't support PAGING when in SNP mode
> > - SMMU: The driver can't support paging when in legacy binding mode and
> > paging domain allocation fails as well
>
> Quite obviously nothing to do with hardware.

But attach_domain still fails. From a core perspective PAGING doesn't
work.

> OK, so after all that you do in fact agree? In that case, why are we still
> mucking about proposing hacks on top of hacks in the AMD driver rather than
> just fixing the regression sensibly?

It is because your proposal is regressing the meaning of
def_domain_type back to a policy knob when I've spent a bunch of work
emptying out def_domain type implementations to get it into a
capability report.

def_domain_type is now about *capability*. Does the
HW/SW/Driver/system support PAGING/IDENTITY or not.

Meaning if def_domain_type says it is not supported then the core code
should not use it. This is how everything was working until AMD
changed their driver to lie about what their attach_domain would
accept.

I do not want to see def_domain_type regress back to being confused
where some drivers are policy advice and some drivers are capability!

AMD should hack their driver for the rc fix and then go and fix it
properly to remove the PASID logic entirely from def_domain_type. I
will also point again out that in v6.9-rc AMD doesn't even support
PASID yet so this abuse of def_domain_type isn't even needed. :(

The core code should contiue to treat def_domain_type as capability.

This has obviously become confusing. I think we should rename
def_domain_type to have a clearer name and clearer purpose. Call it
supported_domain_types() returning an enum of:

SUPPORTS_ALL
SUPPORTS_IDENTITY
SUPPORTS_PAGING
SUPPORTS_IDENTITY_PASID

Which is much more clearly about some full system support, not a way
for drivers to inject arbitrary policy.

Jason

2024-04-24 14:48:55

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] iommu: Fix def_domain_type interaction with untrusted devices

On Wed, Apr 24, 2024 at 10:18:00PM +0800, Baolu Lu wrote:

> For example, the intel iommu driver allows users to opt-in graphic in
> passthrough mode, in that case def_domain_type will return
> IOMMU_DOMAIN_IDENTITY no matter the device is trusted or not.
>
> if ((iommu_identity_mapping & IDENTMAP_GFX) && IS_GFX_DEVICE(pdev))
> return IOMMU_DOMAIN_IDENTITY;
>
> this potentially creates same conflict as the amd driver.

These performance policy choices should be done in the core code and
they should interact correctly with other policy knobs like untrusted.

If Intel Graphics has some performance reason to prefer IDENTITY then
it should work the same no matter the IOMMU it is connected to. I
think just because the GPU is co-packaged with the IOMMU isn't a good
reason to organize the software like this.

If having a policy of a performance boost to some devices is
legitimate then I guess we'd need more levels on the command line:
fast all IDENTITY
fast-secure all DMA expect IDENTIY for special devices
mostly-secure all DMA but unmapping is not strict
secure all DMA and strict unmapping

How exactly you decide when the performance reason justfies IDENTITY,
I don't know.. Would mlx5 800G NICs that can overwhelm most IOMMUs
also go in that bucket too?

But yes, I'm quite adament that drivers should not be using
def_domain_type as some kind of performance policy thing.

Jason

2024-04-24 16:32:06

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH] iommu: Fix def_domain_type interaction with untrusted devices

On 2024/4/24 21:04, Jason Gunthorpe wrote:
>> Furthermore I'm pretty sure we're still letting an
>> untrusted device be hotplugged into an existing group without any
>> checks at all.
> Yes, combining trusted and untrusted in the same group, regardless of
> domain type, is fundamentally wrong and we don't check it.

I suppose this should be fixed in the PCI layer. The current algorithm
is setting external_facing to the pci port which has been marked as an
external facing port.

static void pci_acpi_set_external_facing(struct pci_dev *dev)
{
u8 val;

if (pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT)
return;
if (device_property_read_u8(&dev->dev, "ExternalFacingPort", &val))
return;

/*
* These root ports expose PCIe (including DMA) outside of the
* system. Everything downstream from them is external.
*/
if (val)
dev->external_facing = 1;
}

Then, all devices connected to this port are marked as untrusted.

static void set_pcie_untrusted(struct pci_dev *dev)
{
struct pci_dev *parent;

/*
* If the upstream bridge is untrusted we treat this device
* untrusted as well.
*/
parent = pci_upstream_bridge(dev);
if (parent && (parent->untrusted || parent->external_facing))
dev->untrusted = true;
}

The above algorithms don't consider the ACS. Hence, an untrusted device
could possibly gain P2P access to other devices that are treated as
trusted. This appears to be already broken.

Thus, the impact has already beyond iommu group if I didn't overlook
anything.

Best regards,
baolu

2024-04-24 18:22:21

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH] iommu: Fix def_domain_type interaction with untrusted devices

On 2024/4/24 21:04, Jason Gunthorpe wrote:
>> OK, so after all that you do in fact agree? In that case, why are we still
>> mucking about proposing hacks on top of hacks in the AMD driver rather than
>> just fixing the regression sensibly?
> It is because your proposal is regressing the meaning of
> def_domain_type back to a policy knob when I've spent a bunch of work
> emptying out def_domain type implementations to get it into a
> capability report.
>
> def_domain_type is now about*capability*. Does the
> HW/SW/Driver/system support PAGING/IDENTITY or not.
>
> Meaning if def_domain_type says it is not supported then the core code
> should not use it. This is how everything was working until AMD
> changed their driver to lie about what their attach_domain would
> accept.
>
> I do not want to see def_domain_type regress back to being confused
> where some drivers are policy advice and some drivers are capability!
>
> AMD should hack their driver for the rc fix and then go and fix it
> properly to remove the PASID logic entirely from def_domain_type. I
> will also point again out that in v6.9-rc AMD doesn't even support
> PASID yet so this abuse of def_domain_type isn't even needed. ????
>
> The core code should contiue to treat def_domain_type as capability.

I agree with this. Ideally there should be some mechanism to disallow
any device driver to bind to the device if there's a conflict between
the core policy and iommu hw capability because iommu-dma functionality
is already compromised.

If we all agree with the statement that "the core should treat
def_domain_type as capability", the intel iommu driver needs some
enhancement as well.

For example, the intel iommu driver allows users to opt-in graphic in
passthrough mode, in that case def_domain_type will return
IOMMU_DOMAIN_IDENTITY no matter the device is trusted or not.

if ((iommu_identity_mapping & IDENTMAP_GFX) && IS_GFX_DEVICE(pdev))
return IOMMU_DOMAIN_IDENTITY;

this potentially creates same conflict as the amd driver.

Best regards,
baolu

2024-04-25 01:43:38

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH] iommu: Fix def_domain_type interaction with untrusted devices

On 4/24/24 10:37 PM, Jason Gunthorpe wrote:
> On Wed, Apr 24, 2024 at 10:18:00PM +0800, Baolu Lu wrote:
>
>> For example, the intel iommu driver allows users to opt-in graphic in
>> passthrough mode, in that case def_domain_type will return
>> IOMMU_DOMAIN_IDENTITY no matter the device is trusted or not.
>>
>> if ((iommu_identity_mapping & IDENTMAP_GFX) && IS_GFX_DEVICE(pdev))
>> return IOMMU_DOMAIN_IDENTITY;
>>
>> this potentially creates same conflict as the amd driver.
> These performance policy choices should be done in the core code and
> they should interact correctly with other policy knobs like untrusted.
>
> If Intel Graphics has some performance reason to prefer IDENTITY then
> it should work the same no matter the IOMMU it is connected to. I
> think just because the GPU is co-packaged with the IOMMU isn't a good
> reason to organize the software like this.
>
> If having a policy of a performance boost to some devices is
> legitimate then I guess we'd need more levels on the command line:
> fast all IDENTITY
> fast-secure all DMA expect IDENTIY for special devices
> mostly-secure all DMA but unmapping is not strict
> secure all DMA and strict unmapping
>
> How exactly you decide when the performance reason justfies IDENTITY,
> I don't know.. Would mlx5 800G NICs that can overwhelm most IOMMUs
> also go in that bucket too?
>
> But yes, I'm quite adament that drivers should not be using
> def_domain_type as some kind of performance policy thing.

Yes. I will follow up to address this issue in the intel driver.

Best regards,
baolu