2024-01-18 10:13:20

by Nikhil V

[permalink] [raw]
Subject: [PATCH 1/1] iommu: Avoid races around default domain allocations

From: Charan Teja Kalla <[email protected]>

This fix is applicable for 6.1 kernel. In latest kernels, this race
issue is fixed by the patch series [1] and [2]. This fix can be taken
as alternative instead of backporting the series of patches as these
seems too intrusive to be picked for stable branches.
[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/

A race condition is observed when arm_smmu_device_probe and
modprobe of client devices happens in parallel. This results
in the allocation of a new default domain for the iommu group
even though it was previously allocated and the respective iova
domain(iovad) was initialized. However, for this newly allocated
default domain, iovad will not be initialized. As a result, for
devices requesting dma allocations, this uninitialized iovad will
be used, thereby causing NULL pointer dereference issue.

Flow:
- During arm_smmu_device_probe, bus_iommu_probe() will be called
as part of iommu_device_register(). This results in the device probe,
__iommu_probe_device().

- When the modprobe of the client device happens in parallel, it
sets up the DMA configuration for the device using of_dma_configure_id(),
which inturn calls iommu_probe_device(). Later, default domain is
allocated and attached using iommu_alloc_default_domain() and
__iommu_attach_device() respectively. It then ends up initializing a
mapping domain(IOVA domain) and rcaches for the device via
arch_setup_dma_ops()->iommu_setup_dma_ops().

- Now, in the bus_iommu_probe() path, it again tries to allocate
a default domain via probe_alloc_default_domain(). This results in
allocating a new default domain(along with IOVA domain) via
__iommu_domain_alloc(). However, this newly allocated IOVA domain
will not be initialized.

- Now, when the same client device tries dma allocations via
iommu_dma_alloc(), it ends up accessing the rcaches of the newly
allocated IOVA domain, which is not initialized. This results
into NULL pointer dereferencing.

Fix this issue by adding a check in iommu_group_alloc_default_domain()
to see if the iommu_group already has a default domain allocated
and initialized.

Signed-off-by: Charan Teja Kalla <[email protected]>
Co-developed-by: Nikhil V <[email protected]>
Signed-off-by: Nikhil V <[email protected]>
---
drivers/iommu/iommu.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8b3897239477..99f8cd5af497 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1594,6 +1594,9 @@ static int iommu_group_alloc_default_domain(struct bus_type *bus,
{
struct iommu_domain *dom;

+ if (group->default_domain)
+ return 0;
+
dom = __iommu_domain_alloc(bus, type);
if (!dom && type != IOMMU_DOMAIN_DMA) {
dom = __iommu_domain_alloc(bus, IOMMU_DOMAIN_DMA);
--
2.17.1



2024-01-29 08:00:02

by Nikhil V

[permalink] [raw]
Subject: Re: [PATCH 1/1] iommu: Avoid races around default domain allocations



On 1/18/2024 3:41 PM, Nikhil V wrote:
> From: Charan Teja Kalla <[email protected]>
>
> This fix is applicable for 6.1 kernel. In latest kernels, this race
> issue is fixed by the patch series [1] and [2]. This fix can be taken
> as alternative instead of backporting the series of patches as these
> seems too intrusive to be picked for stable branches.
> [1] https://lore.kernel.org/all/[email protected]/
> [2] https://lore.kernel.org/all/[email protected]/
>
> A race condition is observed when arm_smmu_device_probe and
> modprobe of client devices happens in parallel. This results
> in the allocation of a new default domain for the iommu group
> even though it was previously allocated and the respective iova
> domain(iovad) was initialized. However, for this newly allocated
> default domain, iovad will not be initialized. As a result, for
> devices requesting dma allocations, this uninitialized iovad will
> be used, thereby causing NULL pointer dereference issue.
>
> Flow:
> - During arm_smmu_device_probe, bus_iommu_probe() will be called
> as part of iommu_device_register(). This results in the device probe,
> __iommu_probe_device().
>
> - When the modprobe of the client device happens in parallel, it
> sets up the DMA configuration for the device using of_dma_configure_id(),
> which inturn calls iommu_probe_device(). Later, default domain is
> allocated and attached using iommu_alloc_default_domain() and
> __iommu_attach_device() respectively. It then ends up initializing a
> mapping domain(IOVA domain) and rcaches for the device via
> arch_setup_dma_ops()->iommu_setup_dma_ops().
>
> - Now, in the bus_iommu_probe() path, it again tries to allocate
> a default domain via probe_alloc_default_domain(). This results in
> allocating a new default domain(along with IOVA domain) via
> __iommu_domain_alloc(). However, this newly allocated IOVA domain
> will not be initialized.
>
> - Now, when the same client device tries dma allocations via
> iommu_dma_alloc(), it ends up accessing the rcaches of the newly
> allocated IOVA domain, which is not initialized. This results
> into NULL pointer dereferencing.
>
> Fix this issue by adding a check in iommu_group_alloc_default_domain()
> to see if the iommu_group already has a default domain allocated
> and initialized.
>
> Signed-off-by: Charan Teja Kalla <[email protected]>
> Co-developed-by: Nikhil V <[email protected]>
> Signed-off-by: Nikhil V <[email protected]>
> ---
> drivers/iommu/iommu.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 8b3897239477..99f8cd5af497 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1594,6 +1594,9 @@ static int iommu_group_alloc_default_domain(struct bus_type *bus,
> {
> struct iommu_domain *dom;
>
> + if (group->default_domain)
> + return 0;
> +
> dom = __iommu_domain_alloc(bus, type);
> if (!dom && type != IOMMU_DOMAIN_DMA) {
> dom = __iommu_domain_alloc(bus, IOMMU_DOMAIN_DMA);

Hi,

Gentle ping to have your valuable feedback. This fix is helping us
downstream without which we see a bunch of kernel crashes.

Thanks
Nikhil V

2024-02-01 16:25:52

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 1/1] iommu: Avoid races around default domain allocations

On Mon, Jan 29, 2024 at 01:29:12PM +0530, Nikhil V wrote:

> Gentle ping to have your valuable feedback. This fix is helping us
> downstream without which we see a bunch of kernel crashes.

What are you expecting here? This was fixed in Linus's tree some time
ago now

Are you asking for the stable team to put something weird in 6.1? I
don't think they generally do that?

Jason

2024-02-07 14:28:21

by Nikhil V

[permalink] [raw]
Subject: Re: [PATCH 1/1] iommu: Avoid races around default domain allocations



On 2/1/2024 9:53 PM, Jason Gunthorpe wrote:
> On Mon, Jan 29, 2024 at 01:29:12PM +0530, Nikhil V wrote:
>
>> Gentle ping to have your valuable feedback. This fix is helping us
>> downstream without which we see a bunch of kernel crashes.
>
> What are you expecting here? This was fixed in Linus's tree some time
> ago now
>
> Are you asking for the stable team to put something weird in 6.1? I
> don't think they generally do that?
>
> Jason


Hi @Jason,

Considering that the issue is reported on 6.1, which is an __LTS
kernel__, any suggestion to fix this issue cleanly would help us a lot.
Right thing here would have been propagating the changes from 6.6 (like
for any stability issue), but considering the intrusiveness of them, is
it even possible?

Just to be open about reproducibility of the issue, a bunch of them are
reported, both internally and by customers.

Thanks
Nikhil V

2024-02-07 15:03:24

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 1/1] iommu: Avoid races around default domain allocations

On Wed, Feb 07, 2024 at 07:56:25PM +0530, Nikhil V wrote:
>
>
> On 2/1/2024 9:53 PM, Jason Gunthorpe wrote:
> > On Mon, Jan 29, 2024 at 01:29:12PM +0530, Nikhil V wrote:
> >
> > > Gentle ping to have your valuable feedback. This fix is helping us
> > > downstream without which we see a bunch of kernel crashes.
> >
> > What are you expecting here? This was fixed in Linus's tree some time
> > ago now
> >
> > Are you asking for the stable team to put something weird in 6.1? I
> > don't think they generally do that?
> >
> > Jason
>
>
> Hi @Jason,
>
> Considering that the issue is reported on 6.1, which is an __LTS kernel__,
> any suggestion to fix this issue cleanly would help us a lot. Right thing
> here would have been propagating the changes from 6.6 (like for any
> stability issue), but considering the intrusiveness of them, is it even
> possible?
>
> Just to be open about reproducibility of the issue, a bunch of them are
> reported, both internally and by customers.

I think you need to talk to the stable maintainers not the iommu
upstream folks. I don't well know their policy.

Frankly, I'd suggest just proposing the necessary (and tested)
upstream patches to 6.1, however large they are, and see what Greg and
Sasha say. This is the usual working model they have, as I understand
it.

Jason

2024-02-08 00:05:33

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH 1/1] iommu: Avoid races around default domain allocations

On 2024-02-07 2:56 pm, Jason Gunthorpe wrote:
> On Wed, Feb 07, 2024 at 07:56:25PM +0530, Nikhil V wrote:
>>
>>
>> On 2/1/2024 9:53 PM, Jason Gunthorpe wrote:
>>> On Mon, Jan 29, 2024 at 01:29:12PM +0530, Nikhil V wrote:
>>>
>>>> Gentle ping to have your valuable feedback. This fix is helping us
>>>> downstream without which we see a bunch of kernel crashes.
>>>
>>> What are you expecting here? This was fixed in Linus's tree some time
>>> ago now
>>>
>>> Are you asking for the stable team to put something weird in 6.1? I
>>> don't think they generally do that?
>>>
>>> Jason
>>
>>
>> Hi @Jason,
>>
>> Considering that the issue is reported on 6.1, which is an __LTS kernel__,
>> any suggestion to fix this issue cleanly would help us a lot. Right thing
>> here would have been propagating the changes from 6.6 (like for any
>> stability issue), but considering the intrusiveness of them, is it even
>> possible?
>>
>> Just to be open about reproducibility of the issue, a bunch of them are
>> reported, both internally and by customers.
>
> I think you need to talk to the stable maintainers not the iommu
> upstream folks. I don't well know their policy.
>
> Frankly, I'd suggest just proposing the necessary (and tested)
> upstream patches to 6.1, however large they are, and see what Greg and
> Sasha say. This is the usual working model they have, as I understand
> it.

To be blunt, hell no. Stable is far enough from its namesake already;
the ongoing bordering-on-ridiculous brokenness of your mainline changes
where each "fix" keeps affecting something else is a massive NAK to
backporting any of it, let alone 43+ patches to achieve a 2-line fix.

Nikhil, if this is truly sufficient to resolve the issues you see
(AFAICS things end up serialised by the group mutex so probably should
be robust enough), then I'm OK with you proposing it as a dedicated
stable-only fix, as an "equivalent" patch per Option 3 of
stable-kernel-rules.rst - I reckon your commit message is already pretty
good with regards to the final point there, but I'll be happy to help
argue the case if necessary. Just one point - is it genuinely not
relevant to 5.15 and earlier or is it just the case that 6.1 is the
oldest thing you're actively testing? (Apologies, I've already forgotten
where things were that far back).

That said, I also don't think there would be any harm in applying this
to mainline as a belt-and-braces thing either, if it helps makes a
backport easier and Joerg doesn't mind. There's already a bunch of stuff
I'll be cleaning up once the underlying issue behind all of this is
properly fixed, so adding a couple more lines of code to that list is no
big deal as far as I'm concerned.

Thanks,
Robin.

2024-02-08 01:13:40

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 1/1] iommu: Avoid races around default domain allocations

On Thu, Feb 08, 2024 at 12:04:44AM +0000, Robin Murphy wrote:
> > Frankly, I'd suggest just proposing the necessary (and tested)
> > upstream patches to 6.1, however large they are, and see what Greg and
> > Sasha say. This is the usual working model they have, as I understand
> > it.
>
> To be blunt, hell no. Stable is far enough from its namesake already; the
> ongoing bordering-on-ridiculous brokenness of your mainline changes where

What on earth are you even talking about? POWER?

> That said, I also don't think there would be any harm in applying this to
> mainline as a belt-and-braces thing either,

Really?

Now that you've made me look, this patch breaks the
iommu_group_store_type() flow both on latest and on v6.1 from what I
can see.

On v6.1:

iommu_change_dev_def_domain():

prev_dom = group->default_domain;
if (!prev_dom) {
ret = -EINVAL;
goto out;
}
[..]
/* Sets group->default_domain to the newly allocated domain */
ret = iommu_group_alloc_default_domain(dev->bus, group, type);
if (ret)
goto out;

But this patch changes iommu_group_alloc_default_domain() to succeed
always without doing anythiing.

So this patch needs some fixing.

There is a very good reason the stable people don't like boutique patches..

Jason

2024-02-08 01:38:05

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH 1/1] iommu: Avoid races around default domain allocations

On 2024-02-08 1:13 am, Jason Gunthorpe wrote:
> On Thu, Feb 08, 2024 at 12:04:44AM +0000, Robin Murphy wrote:
>>> Frankly, I'd suggest just proposing the necessary (and tested)
>>> upstream patches to 6.1, however large they are, and see what Greg and
>>> Sasha say. This is the usual working model they have, as I understand
>>> it.
>>
>> To be blunt, hell no. Stable is far enough from its namesake already; the
>> ongoing bordering-on-ridiculous brokenness of your mainline changes where
>
> What on earth are you even talking about? POWER?

I mean you're literally getting bug reports for your fix for your fix
for your grand idea, so what should we figure, that reality not aligning
with your expectations is all reality's fault?

>> That said, I also don't think there would be any harm in applying this to
>> mainline as a belt-and-braces thing either,
>
> Really?

"at 12:04:44AM +0000, Robin Murphy wrote:"

It's late, I should have gone to bed hours ago, so I apologise for any
lack of clarity; that was very much meant to be an implication of
agreement with the overall approach, not the exact patch as is, which if
you read the rest of my response you will see I still had questions
about and did not formally ack or review.

> Now that you've made me look, this patch breaks the
> iommu_group_store_type() flow both on latest and on v6.1 from what I
> can see.
>
> On v6.1:
>
> iommu_change_dev_def_domain():
>
> prev_dom = group->default_domain;
> if (!prev_dom) {
> ret = -EINVAL;
> goto out;
> }
> [..]
> /* Sets group->default_domain to the newly allocated domain */
> ret = iommu_group_alloc_default_domain(dev->bus, group, type);
> if (ret)
> goto out;
>
> But this patch changes iommu_group_alloc_default_domain() to succeed
> always without doing anythiing.
>
> So this patch needs some fixing.

Hurrah! Please apply that kind of rigour to your own patches also.

Thanks,
Robin.

[ you get two responses this week since I admit I ran out of patience
and motivation to finish last week's on time ]

2024-02-08 15:18:48

by Nikhil V

[permalink] [raw]
Subject: Re: [PATCH 1/1] iommu: Avoid races around default domain allocations



On 2/8/2024 7:07 AM, Robin Murphy wrote:
> On 2024-02-08 1:13 am, Jason Gunthorpe wrote:
>> On Thu, Feb 08, 2024 at 12:04:44AM +0000, Robin Murphy wrote:
>>>> Frankly, I'd suggest just proposing the necessary (and tested)
>>>> upstream patches to 6.1, however large they are, and see what Greg and
>>>> Sasha say. This is the usual working model they have, as I understand
>>>> it.
>>>
>>> To be blunt, hell no. Stable is far enough from its namesake already;
>>> the
>>> ongoing bordering-on-ridiculous brokenness of your mainline changes
>>> where
>>
>> What on earth are you even talking about? POWER?
>
> I mean you're literally getting bug reports for your fix for your fix
> for your grand idea, so what should we figure, that reality not aligning
> with your expectations is all reality's fault?
>
>>> That said, I also don't think there would be any harm in applying
>>> this to
>>> mainline as a belt-and-braces thing either,
>>
>> Really?
>
> "at 12:04:44AM +0000, Robin Murphy wrote:"
>
> It's late, I should have gone to bed hours ago, so I apologise for any
> lack of clarity; that was very much meant to be an implication of
> agreement with the overall approach, not the exact patch as is, which if
> you read the rest of my response you will see I still had questions
> about and did not formally ack or review.
>
>> Now that you've made me look, this patch breaks the
>> iommu_group_store_type() flow both on latest and on v6.1 from what I
>> can see.
>>
>> On v6.1:
>>
>> iommu_change_dev_def_domain():
>>          prev_dom = group->default_domain;
>>          if (!prev_dom) {
>>                  ret = -EINVAL;
>>                  goto out;
>>          }
>> [..]
>>          /* Sets group->default_domain to the newly allocated domain */
>>          ret = iommu_group_alloc_default_domain(dev->bus, group, type);
>>          if (ret)
>>                  goto out;
>>
>> But this patch changes iommu_group_alloc_default_domain() to succeed
>> always without doing anythiing.
>>
>> So this patch needs some fixing.
>
> Hurrah! Please apply that kind of rigour to your own patches also.
>
> Thanks,
> Robin.
>
> [ you get two responses this week since I admit I ran out of patience
> and motivation to finish last week's on time ]


Hi Robin,

Bunch of issues were reported on 6.1 because that is where we do master
testing. The change proposed was fixing the reported issues.

On your query regarding 5.15, from code perspective, looks like this fix
is applicable on 5.15 as well. However, as mentioned, we have done
master testings on 6.1 codebase and hence could see the issues there only.

Also, as Jason pointed out, this fix would not allow changing the
default domain for an iommu group. We didn't have any usecase for
changing default domain, hence couldn't see any issues with the fix
proposed. So, we would have to rework the patch for it to be upstream
ready. Any suggestions for this would be really helpful and
appreciated. Thanks!!!

Thanks
Nikhil V


2024-02-08 16:09:44

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 1/1] iommu: Avoid races around default domain allocations

On Thu, Feb 08, 2024 at 08:47:42PM +0530, Nikhil V wrote:

> Also, as Jason pointed out, this fix would not allow changing the
> default domain for an iommu group. We didn't have any usecase for
> changing default domain, hence couldn't see any issues with the fix
> proposed. So, we would have to rework the patch for it to be
> upstream ready. Any suggestions for this would be really helpful and
> appreciated. Thanks!!!

I think you can lift and duplicate the check into the callers and that
way restrict it only to the probe paths.

Jason