2024-03-14 07:59:15

by Tyler Hicks

[permalink] [raw]
Subject: Why is the ARM SMMU v1/v2 put into bypass mode on kexec?

Given that drivers are only optionally asked to implement the .shutdown
hook, which is required to properly quiesce devices before a kexec, why
is it that we put the ARM SMMU v1/v2 into bypass mode in the arm-smmu
driver's own .shutdown hook?

arm_smmu_device_shutdown() -> set SMMU_sCR0.CLIENTPD bit to 1

Driver authors often forget to even implement a .shutdown hook, which
results in some hard-to-debug memory corruption issues in the kexec'ed
target kernel due to pending DMA operations happening on untranslated
addresses. Why not leave the SMMU in translate mode but clear the stream
mapping table (or maybe even call arm_smmu_device_reset()) in the SMMU's
shutdown hook to prevent the memory corruption from happening in the
first place?

Fully acknowledging that the proper fix is to quiesce the devices, I
feel like resetting the SMMU and leaving it in translate mode across
kexec would be more consistent with the intent behind v5.2 commit
954a03be033c ("iommu/arm-smmu: Break insecure users by disabling bypass
by default"). The incoming transactions of devices, that weren't
properly quiesced during a kexec, would be blocked until their drivers
have a chance to reinitialize the devices in the new kernel.

I appreciate any help understanding why bypass mode is utilized here as
I'm sure there are nuances that I haven't considered. Thank you!

Tyler


2024-03-14 19:07:45

by Tyler Hicks

[permalink] [raw]
Subject: Re: Why is the ARM SMMU v1/v2 put into bypass mode on kexec?

On 2024-03-14 09:55:46, Tyler Hicks wrote:
> Given that drivers are only optionally asked to implement the .shutdown
> hook, which is required to properly quiesce devices before a kexec, why
> is it that we put the ARM SMMU v1/v2 into bypass mode in the arm-smmu
> driver's own .shutdown hook?
>
> arm_smmu_device_shutdown() -> set SMMU_sCR0.CLIENTPD bit to 1
>
> Driver authors often forget to even implement a .shutdown hook, which
> results in some hard-to-debug memory corruption issues in the kexec'ed
> target kernel due to pending DMA operations happening on untranslated
> addresses. Why not leave the SMMU in translate mode but clear the stream
> mapping table (or maybe even call arm_smmu_device_reset()) in the SMMU's
> .shutdown hook to prevent the memory corruption from happening in the
> first place?
>
> Fully acknowledging that the proper fix is to quiesce the devices, I
> feel like resetting the SMMU and leaving it in translate mode across
> kexec would be more consistent with the intent behind v5.2 commit
> 954a03be033c ("iommu/arm-smmu: Break insecure users by disabling bypass
> by default"). The incoming transactions of devices, that weren't
> properly quiesced during a kexec, would be blocked until their drivers
> have a chance to reinitialize the devices in the new kernel.
>
> I appreciate any help understanding why bypass mode is utilized here as
> I'm sure there are nuances that I haven't considered. Thank you!

I now see that Will has previously mentioned that he'd be open to such a
change:

One thing I would be in favour of is changing the ->shutdown() code to
honour disable_bypass=1 so that we put the SMMU in an aborting state
instead of passthrough. Would that help at all? It would at least
avoid the memory corruption on missing shutdown callback.

- https://lore.kernel.org/linux-arm-kernel/20200608113852.GA3108@willie-the-truck/

Robin mentions the need to support kexec into a non-SMMU-aware OS. I
hadn't considered that bit of complexity:

... consider if the first kernel *did* leave it enabled with whatever
left-over translations in place, and kexec'ed into another OS that
wasn't SMMU-aware...

- https://lore.kernel.org/linux-arm-kernel/[email protected]/

Now that we're 3-4 years removed from that conversation, has anything
changed? Will, is there anything we'd need to watch out for if we were
to prototype this sort of change? For example, would it be wise to
disable fault interrupts when putting the SMMU in an aborting state
before kexec'ing?

Thanks again!

Tyler

2024-03-19 12:58:07

by Robin Murphy

[permalink] [raw]
Subject: Re: Why is the ARM SMMU v1/v2 put into bypass mode on kexec?

On 2024-03-14 7:06 pm, Tyler Hicks wrote:
> On 2024-03-14 09:55:46, Tyler Hicks wrote:
>> Given that drivers are only optionally asked to implement the .shutdown
>> hook, which is required to properly quiesce devices before a kexec, why
>> is it that we put the ARM SMMU v1/v2 into bypass mode in the arm-smmu
>> driver's own .shutdown hook?
>>
>> arm_smmu_device_shutdown() -> set SMMU_sCR0.CLIENTPD bit to 1
>>
>> Driver authors often forget to even implement a .shutdown hook, which
>> results in some hard-to-debug memory corruption issues in the kexec'ed
>> target kernel due to pending DMA operations happening on untranslated
>> addresses. Why not leave the SMMU in translate mode but clear the stream
>> mapping table (or maybe even call arm_smmu_device_reset()) in the SMMU's
>> .shutdown hook to prevent the memory corruption from happening in the
>> first place?
>>
>> Fully acknowledging that the proper fix is to quiesce the devices, I
>> feel like resetting the SMMU and leaving it in translate mode across
>> kexec would be more consistent with the intent behind v5.2 commit
>> 954a03be033c ("iommu/arm-smmu: Break insecure users by disabling bypass
>> by default"). The incoming transactions of devices, that weren't
>> properly quiesced during a kexec, would be blocked until their drivers
>> have a chance to reinitialize the devices in the new kernel.
>>
>> I appreciate any help understanding why bypass mode is utilized here as
>> I'm sure there are nuances that I haven't considered. Thank you!
>
> I now see that Will has previously mentioned that he'd be open to such a
> change:
>
> One thing I would be in favour of is changing the ->shutdown() code to
> honour disable_bypass=1 so that we put the SMMU in an aborting state
> instead of passthrough. Would that help at all? It would at least
> avoid the memory corruption on missing shutdown callback.
>
> - https://lore.kernel.org/linux-arm-kernel/20200608113852.GA3108@willie-the-truck/
>
> Robin mentions the need to support kexec into a non-SMMU-aware OS. I
> hadn't considered that bit of complexity:
>
> ... consider if the first kernel *did* leave it enabled with whatever
> left-over translations in place, and kexec'ed into another OS that
> wasn't SMMU-aware...
>
> - https://lore.kernel.org/linux-arm-kernel/[email protected]/
>
> Now that we're 3-4 years removed from that conversation, has anything
> changed? Will, is there anything we'd need to watch out for if we were
> to prototype this sort of change? For example, would it be wise to
> disable fault interrupts when putting the SMMU in an aborting state
> before kexec'ing?

Fundamentally, we expect the SMMU to be disabled at initial boot, so per
the intent of kexec we put it back in that state. That also seems the
most likely expectation of anything we could kexec into, given that it
is the natural state of an untouched SMMU after a hard reset, and thus
comes out as the least-worst option.

Beyond properly quiescing and resetting the system back to a boot-time
state, the outgoing kernel in a kexec can only really do things which
affect itself. Sure, we *could* configure the SMMU to block all traffic
and disable the interrupt to avoid getting stuck in a storm of faults on
the way out, but what does that mean for the incoming kexec payload?
That it can have the pleasure of discovering the SMMU, innocently
enabling the interrupt and getting stuck in an unexpected storm of
faults. Or perhaps just resetting the SMMU into a disabled state and
thus still unwittingly allowing its memory to be corrupted by the
previous kernel not supporting kexec properly.

So no, I would not say that anything has changed here, at least not in
favour of this idea. If anything, it's become even more impractical now
that we have RMRs to properly support cases like an EFI framebuffer
where neither the outgoing nor incoming kernels necessarily have the
ability to quiesce the underlying DMA or recover it from faults, thus we
have to be even more careful.

Thanks,
Robin.

2024-03-19 15:48:10

by Will Deacon

[permalink] [raw]
Subject: Re: Why is the ARM SMMU v1/v2 put into bypass mode on kexec?

On Tue, Mar 19, 2024 at 12:57:52PM +0000, Robin Murphy wrote:
> On 2024-03-14 7:06 pm, Tyler Hicks wrote:
> > On 2024-03-14 09:55:46, Tyler Hicks wrote:
> > > Given that drivers are only optionally asked to implement the .shutdown
> > > hook, which is required to properly quiesce devices before a kexec, why
> > > is it that we put the ARM SMMU v1/v2 into bypass mode in the arm-smmu
> > > driver's own .shutdown hook?
> > >
> > > arm_smmu_device_shutdown() -> set SMMU_sCR0.CLIENTPD bit to 1
> > >
> > > Driver authors often forget to even implement a .shutdown hook, which
> > > results in some hard-to-debug memory corruption issues in the kexec'ed
> > > target kernel due to pending DMA operations happening on untranslated
> > > addresses. Why not leave the SMMU in translate mode but clear the stream
> > > mapping table (or maybe even call arm_smmu_device_reset()) in the SMMU's
> > > .shutdown hook to prevent the memory corruption from happening in the
> > > first place?
> > >
> > > Fully acknowledging that the proper fix is to quiesce the devices, I
> > > feel like resetting the SMMU and leaving it in translate mode across
> > > kexec would be more consistent with the intent behind v5.2 commit
> > > 954a03be033c ("iommu/arm-smmu: Break insecure users by disabling bypass
> > > by default"). The incoming transactions of devices, that weren't
> > > properly quiesced during a kexec, would be blocked until their drivers
> > > have a chance to reinitialize the devices in the new kernel.
> > >
> > > I appreciate any help understanding why bypass mode is utilized here as
> > > I'm sure there are nuances that I haven't considered. Thank you!
> >
> > I now see that Will has previously mentioned that he'd be open to such a
> > change:
> >
> > One thing I would be in favour of is changing the ->shutdown() code to
> > honour disable_bypass=1 so that we put the SMMU in an aborting state
> > instead of passthrough. Would that help at all? It would at least
> > avoid the memory corruption on missing shutdown callback.
> >
> > - https://lore.kernel.org/linux-arm-kernel/20200608113852.GA3108@willie-the-truck/
> >
> > Robin mentions the need to support kexec into a non-SMMU-aware OS. I
> > hadn't considered that bit of complexity:
> >
> > ... consider if the first kernel *did* leave it enabled with whatever
> > left-over translations in place, and kexec'ed into another OS that
> > wasn't SMMU-aware...
> >
> > - https://lore.kernel.org/linux-arm-kernel/[email protected]/
> >
> > Now that we're 3-4 years removed from that conversation, has anything
> > changed? Will, is there anything we'd need to watch out for if we were
> > to prototype this sort of change? For example, would it be wise to
> > disable fault interrupts when putting the SMMU in an aborting state
> > before kexec'ing?

I've grown older and wiser in those four years and no longer think that's
a good idea :) Well, older maybe, but the reality is that the code around
the driver has evolved and 'disable_bypass' is even more of a hack now
than it used to be.

> Fundamentally, we expect the SMMU to be disabled at initial boot, so per the
> intent of kexec we put it back in that state. That also seems the most
> likely expectation of anything we could kexec into, given that it is the
> natural state of an untouched SMMU after a hard reset, and thus comes out as
> the least-worst option.

Heh, that sounded too good to be true when I read it so I went and looked at
the code:

SMMUv3: arm_smmu_device_shutdown() -> clears CR0 but doesn't touch GBPA
SMMUv2: arm_smmu_device_shutdown() -> writes CLIENTPD to CR0

So it's a bit of a muddle afaict; SMMUv2 explicitly goes into bypass
whereas SMMUv3 probably does honour disable_bypass=false! Did I miss
something?

As discussed elsewhere, if we remove disable_bypass from SMMUv3, then
we should be able to be consistent here. The question is, what's the
right behaviour?

> Beyond properly quiescing and resetting the system back to a boot-time
> state, the outgoing kernel in a kexec can only really do things which affect
> itself. Sure, we *could* configure the SMMU to block all traffic and disable
> the interrupt to avoid getting stuck in a storm of faults on the way out,
> but what does that mean for the incoming kexec payload? That it can have the
> pleasure of discovering the SMMU, innocently enabling the interrupt and
> getting stuck in an unexpected storm of faults. Or perhaps just resetting
> the SMMU into a disabled state and thus still unwittingly allowing its
> memory to be corrupted by the previous kernel not supporting kexec properly.

Right, it's hard to win if DMA-active devices weren't quiesced properly
by the outgoing kernel. Either the SMMU was left in abort (leading to the
problems you list above) or the SMMU is left in bypass (leading to possible
data corruption). Which is better?

The best solution is obviously to implement those missing ->shutdown()
callbacks.

Will

2024-03-19 17:54:33

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: Why is the ARM SMMU v1/v2 put into bypass mode on kexec?

On Tue, Mar 19, 2024 at 03:47:56PM +0000, Will Deacon wrote:

> Right, it's hard to win if DMA-active devices weren't quiesced properly
> by the outgoing kernel. Either the SMMU was left in abort (leading to the
> problems you list above) or the SMMU is left in bypass (leading to possible
> data corruption). Which is better?

For whatever reason (and I really don't like this design) alot of work
was done on x86 so that device continues to work as-was right up until
the crash kernel does the first DMA operation. Including having the
crash kernel non disruptively inherit and retain the IOMMU
configuration. (eg see translation_pre_enabled() stuff in intel
driver)

I think the idea was that the crash kernel driver will recover control
of the device prior to trying to do DMA. Devices without a driver or
devices that are not operated by the crash kernel just keep going as
they were.

In general practice this is unworkable as some devices can't be
recovered without doing DMA in the first place creating a catch-22.

So now lots of devices use their shutdown handler to quiet the device
before handing over to the crash kernel.

I think this emerged as some 'small work' to try and make crash
kernels functional at all. Implementing every shutdown handler would
be pretty hard, but many (?) devices seem to work OK if the crash
kernel drivers runs for a bit before destroying their DMA setup. We
don't trigger weird platform crashes or anything due to failing DMA
operations either.

Now we have all kinds of infrastructure and deployed crash kernels
that have this assumption baked in. :( It sure would be nice to not
spread this full complexity to ARM.

If the original kernel could signal to the crash kernel that specific
devices are quieted and then the crash kernel could simply ignore
unquieted devices and set the IOMMU to abort them and don't allow any
crash drivers to attach. (or maybe FLR them?) If someone wants a
device to be usuable in the crash kernel then the original kernel
needs to implement the shutdown handler.

Regardless, I think if your goal is to support crash kernels then you
have to do at least a bit of the x86 'keep the iommu unchanged'. The
iommu shutdown should do less like x86 does and the iommu startup
should detect the special case and try to atomic switch to the new STE
table that aborts unquieted devices.

Booting a non-crash OS is a different matter and in that case you
really want every bit of HW put back to a clean "just booted" state,
and arguably it can't work unless the original kernel implements all
the shutdown handlers... I don't know if x86 kexec actually support
this, it looks like it only works on Linux OS and things like the
Linux iommu driver have code to support the crash focused hand over
even in non crash cases.

Jason

2024-03-19 18:17:57

by Robin Murphy

[permalink] [raw]
Subject: Re: Why is the ARM SMMU v1/v2 put into bypass mode on kexec?

On 2024-03-19 3:47 pm, Will Deacon wrote:
> On Tue, Mar 19, 2024 at 12:57:52PM +0000, Robin Murphy wrote:
>> On 2024-03-14 7:06 pm, Tyler Hicks wrote:
>>> On 2024-03-14 09:55:46, Tyler Hicks wrote:
>>>> Given that drivers are only optionally asked to implement the .shutdown
>>>> hook, which is required to properly quiesce devices before a kexec, why
>>>> is it that we put the ARM SMMU v1/v2 into bypass mode in the arm-smmu
>>>> driver's own .shutdown hook?
>>>>
>>>> arm_smmu_device_shutdown() -> set SMMU_sCR0.CLIENTPD bit to 1
>>>>
>>>> Driver authors often forget to even implement a .shutdown hook, which
>>>> results in some hard-to-debug memory corruption issues in the kexec'ed
>>>> target kernel due to pending DMA operations happening on untranslated
>>>> addresses. Why not leave the SMMU in translate mode but clear the stream
>>>> mapping table (or maybe even call arm_smmu_device_reset()) in the SMMU's
>>>> .shutdown hook to prevent the memory corruption from happening in the
>>>> first place?
>>>>
>>>> Fully acknowledging that the proper fix is to quiesce the devices, I
>>>> feel like resetting the SMMU and leaving it in translate mode across
>>>> kexec would be more consistent with the intent behind v5.2 commit
>>>> 954a03be033c ("iommu/arm-smmu: Break insecure users by disabling bypass
>>>> by default"). The incoming transactions of devices, that weren't
>>>> properly quiesced during a kexec, would be blocked until their drivers
>>>> have a chance to reinitialize the devices in the new kernel.
>>>>
>>>> I appreciate any help understanding why bypass mode is utilized here as
>>>> I'm sure there are nuances that I haven't considered. Thank you!
>>>
>>> I now see that Will has previously mentioned that he'd be open to such a
>>> change:
>>>
>>> One thing I would be in favour of is changing the ->shutdown() code to
>>> honour disable_bypass=1 so that we put the SMMU in an aborting state
>>> instead of passthrough. Would that help at all? It would at least
>>> avoid the memory corruption on missing shutdown callback.
>>>
>>> - https://lore.kernel.org/linux-arm-kernel/20200608113852.GA3108@willie-the-truck/
>>>
>>> Robin mentions the need to support kexec into a non-SMMU-aware OS. I
>>> hadn't considered that bit of complexity:
>>>
>>> ... consider if the first kernel *did* leave it enabled with whatever
>>> left-over translations in place, and kexec'ed into another OS that
>>> wasn't SMMU-aware...
>>>
>>> - https://lore.kernel.org/linux-arm-kernel/[email protected]/
>>>
>>> Now that we're 3-4 years removed from that conversation, has anything
>>> changed? Will, is there anything we'd need to watch out for if we were
>>> to prototype this sort of change? For example, would it be wise to
>>> disable fault interrupts when putting the SMMU in an aborting state
>>> before kexec'ing?
>
> I've grown older and wiser in those four years and no longer think that's
> a good idea :) Well, older maybe, but the reality is that the code around
> the driver has evolved and 'disable_bypass' is even more of a hack now
> than it used to be.
>
>> Fundamentally, we expect the SMMU to be disabled at initial boot, so per the
>> intent of kexec we put it back in that state. That also seems the most
>> likely expectation of anything we could kexec into, given that it is the
>> natural state of an untouched SMMU after a hard reset, and thus comes out as
>> the least-worst option.
>
> Heh, that sounded too good to be true when I read it so I went and looked at
> the code:
>
> SMMUv3: arm_smmu_device_shutdown() -> clears CR0 but doesn't touch GBPA
> SMMUv2: arm_smmu_device_shutdown() -> writes CLIENTPD to CR0
>
> So it's a bit of a muddle afaict; SMMUv2 explicitly goes into bypass
> whereas SMMUv3 probably does honour disable_bypass=false! Did I miss
> something?

I think so, namely the utter madness around how and when we do actually
touch GBPA - if we found SMMUEN set at the start of probe, then we set
GBPA to abort before initially clearing SMMUEN; if the DT is broken then
we set GBPA to bypass instead of enabling SMMUEN at the end of probe,
*unless* disable_bypass was set. Thus by the time we get to shutdown,
SMMUEN may or may not already be 0 and GPBA may or may not have been
changed from its initial value to either one of bypass or abort.

> As discussed elsewhere, if we remove disable_bypass from SMMUv3, then
> we should be able to be consistent here. The question is, what's the
> right behaviour?

"Not that", at the very least ;)

In terms of the shutdown behaviour, I think it actually works out as-is.
For the normal case we haven't touched GBPA, so we are truly returning
to the boot-time condition; in the unexpected case where SMMUEN was
already enabled then we'll go into an explicit GPBA abort state, but
that seems a not-unreasonable compromise for not preserving the entire
boot-time Stream Table etc., whose presence kind of implies it wouldn't
have been bypassing everything anyway.

The more I look at the remaining aspect of disable_bypass for
controlling broken-DT behaviour the more I suspect it can't actually be
useful either way, especially not since default domains. I have no
memory of what my original reasoning might have been, so I'm inclined to
just rip that all out and let probe fail. I see no reason these days not
to expect a broken DT to leads to a broken system, especially not now
with DTSchema validation.

Then there's just the kdump warning it suppresses, of which I also have
no idea why it's there either, but apparently that one's on you :P

Cheers,
Robin.

2024-03-19 19:15:05

by Tyler Hicks

[permalink] [raw]
Subject: Re: Why is the ARM SMMU v1/v2 put into bypass mode on kexec?

On 2024-03-19 15:47:56, Will Deacon wrote:
> On Tue, Mar 19, 2024 at 12:57:52PM +0000, Robin Murphy wrote:
> > Beyond properly quiescing and resetting the system back to a boot-time
> > state, the outgoing kernel in a kexec can only really do things which affect
> > itself. Sure, we *could* configure the SMMU to block all traffic and disable
> > the interrupt to avoid getting stuck in a storm of faults on the way out,
> > but what does that mean for the incoming kexec payload? That it can have the
> > pleasure of discovering the SMMU, innocently enabling the interrupt and
> > getting stuck in an unexpected storm of faults. Or perhaps just resetting
> > the SMMU into a disabled state and thus still unwittingly allowing its
> > memory to be corrupted by the previous kernel not supporting kexec properly.
>
> Right, it's hard to win if DMA-active devices weren't quiesced properly
> by the outgoing kernel. Either the SMMU was left in abort (leading to the
> problems you list above) or the SMMU is left in bypass (leading to possible
> data corruption). Which is better?

My thoughts are that a loud and obvious failure (via unidentified stream
fault messages and/or a possible interrupt storm preventing the new
kernel from booting) is favorable to silent and subtle data corruption
of the target kernel.

> The best solution is obviously to implement those missing ->shutdown()
> callbacks.

Completely agree here but it can be difficult to even identify that a
missing ->shutdown hook is the root cause without code changes to put
the SMMU into abort mode and sleep for a bit in the SMMU's ->shutdown
hook.

Tyler

2024-03-22 15:52:10

by Will Deacon

[permalink] [raw]
Subject: Re: Why is the ARM SMMU v1/v2 put into bypass mode on kexec?

On Tue, Mar 19, 2024 at 06:17:39PM +0000, Robin Murphy wrote:
> In terms of the shutdown behaviour, I think it actually works out as-is. For
> the normal case we haven't touched GBPA, so we are truly returning to the
> boot-time condition; in the unexpected case where SMMUEN was already enabled
> then we'll go into an explicit GPBA abort state, but that seems a
> not-unreasonable compromise for not preserving the entire boot-time Stream
> Table etc., whose presence kind of implies it wouldn't have been bypassing
> everything anyway.
>
> The more I look at the remaining aspect of disable_bypass for controlling
> broken-DT behaviour the more I suspect it can't actually be useful either
> way, especially not since default domains. I have no memory of what my
> original reasoning might have been, so I'm inclined to just rip that all out
> and let probe fail. I see no reason these days not to expect a broken DT to
> leads to a broken system, especially not now with DTSchema validation.

That sounds reasonable to me, although we may end up having to back it
out if we regress systems with borked firmware :(

> Then there's just the kdump warning it suppresses, of which I also have no
> idea why it's there either, but apparently that one's on you :P

I think _that_ one is because the previous (crashed) kernel won't have
torn anything down, so there could be active DMA using translations in
the SMMU. In that case, the crashkernel (which is running from some
carveout) may find the SMMU enabled, but it really can't stick it into
bypass mode because that's likely to corrupt random memory. So in that
case, we do stick it into abort before we reinitialise it and then we
disabling fault reporting altogether to avoid the log spam:

if (is_kdump_kernel())
enables &= ~(CR0_EVTQEN | CR0_PRIQEN)

Will

2024-03-22 15:55:41

by Will Deacon

[permalink] [raw]
Subject: Re: Why is the ARM SMMU v1/v2 put into bypass mode on kexec?

Hey Jason,

On Tue, Mar 19, 2024 at 02:50:07PM -0300, Jason Gunthorpe wrote:
> On Tue, Mar 19, 2024 at 03:47:56PM +0000, Will Deacon wrote:
>
> > Right, it's hard to win if DMA-active devices weren't quiesced properly
> > by the outgoing kernel. Either the SMMU was left in abort (leading to the
> > problems you list above) or the SMMU is left in bypass (leading to possible
> > data corruption). Which is better?
>
> For whatever reason (and I really don't like this design) alot of work
> was done on x86 so that device continues to work as-was right up until
> the crash kernel does the first DMA operation. Including having the
> crash kernel non disruptively inherit and retain the IOMMU
> configuration. (eg see translation_pre_enabled() stuff in intel
> driver)

Right, I'm also not thrilled about trying to implement that :)
What we have at the moment seems to be good enough to avoid folks
complaining about it.

For the case Tyler is reporting, though, I _think_ it's just a standard
kexec() rather than a crashkernel.

Will

2024-03-22 16:07:16

by Will Deacon

[permalink] [raw]
Subject: Re: Why is the ARM SMMU v1/v2 put into bypass mode on kexec?

On Tue, Mar 19, 2024 at 02:14:26PM -0500, Tyler Hicks wrote:
> On 2024-03-19 15:47:56, Will Deacon wrote:
> > On Tue, Mar 19, 2024 at 12:57:52PM +0000, Robin Murphy wrote:
> > > Beyond properly quiescing and resetting the system back to a boot-time
> > > state, the outgoing kernel in a kexec can only really do things which affect
> > > itself. Sure, we *could* configure the SMMU to block all traffic and disable
> > > the interrupt to avoid getting stuck in a storm of faults on the way out,
> > > but what does that mean for the incoming kexec payload? That it can have the
> > > pleasure of discovering the SMMU, innocently enabling the interrupt and
> > > getting stuck in an unexpected storm of faults. Or perhaps just resetting
> > > the SMMU into a disabled state and thus still unwittingly allowing its
> > > memory to be corrupted by the previous kernel not supporting kexec properly.
> >
> > Right, it's hard to win if DMA-active devices weren't quiesced properly
> > by the outgoing kernel. Either the SMMU was left in abort (leading to the
> > problems you list above) or the SMMU is left in bypass (leading to possible
> > data corruption). Which is better?
>
> My thoughts are that a loud and obvious failure (via unidentified stream
> fault messages and/or a possible interrupt storm preventing the new
> kernel from booting) is favorable to silent and subtle data corruption
> of the target kernel.

Looking at the SMMUv3 spec, the architecture does actually allow hardware
to reset into an aborting state:

[GBPA.ABORT]
| Note: An implementation can reset this field to 1, in order to
| implement a default deny policy on reset.

so perhaps it's not that unreasonable. I just dread the flood of emails
I'll get because the SMMU driver is noisy due to missing ->shutdown()
callbacks elsewhere :/

> > The best solution is obviously to implement those missing ->shutdown()
> > callbacks.
>
> Completely agree here but it can be difficult to even identify that a
> missing ->shutdown hook is the root cause without code changes to put
> the SMMU into abort mode and sleep for a bit in the SMMU's ->shutdown
> hook.

Perhaps that's the thing to tackle first, then? If we make it easier for
folks to diagnose and fix the missing ->shutdown() callbacks, then going
into abort is much more reasonable,

Will

2024-03-22 19:53:10

by Tyler Hicks

[permalink] [raw]
Subject: Re: Why is the ARM SMMU v1/v2 put into bypass mode on kexec?

On 2024-03-22 15:55:29, Will Deacon wrote:
> Hey Jason,
>
> On Tue, Mar 19, 2024 at 02:50:07PM -0300, Jason Gunthorpe wrote:
> > On Tue, Mar 19, 2024 at 03:47:56PM +0000, Will Deacon wrote:
> >
> > > Right, it's hard to win if DMA-active devices weren't quiesced properly
> > > by the outgoing kernel. Either the SMMU was left in abort (leading to the
> > > problems you list above) or the SMMU is left in bypass (leading to possible
> > > data corruption). Which is better?
> >
> > For whatever reason (and I really don't like this design) alot of work
> > was done on x86 so that device continues to work as-was right up until
> > the crash kernel does the first DMA operation. Including having the
> > crash kernel non disruptively inherit and retain the IOMMU
> > configuration. (eg see translation_pre_enabled() stuff in intel
> > driver)
>
> Right, I'm also not thrilled about trying to implement that :)
> What we have at the moment seems to be good enough to avoid folks
> complaining about it.
>
> For the case Tyler is reporting, though, I _think_ it's just a standard
> kexec() rather than a crashkernel.

That's correct.

Tyler

2024-04-02 16:33:17

by Robin Murphy

[permalink] [raw]
Subject: Re: Why is the ARM SMMU v1/v2 put into bypass mode on kexec?

On 2024-03-22 3:51 pm, Will Deacon wrote:
> On Tue, Mar 19, 2024 at 06:17:39PM +0000, Robin Murphy wrote:
>> In terms of the shutdown behaviour, I think it actually works out as-is. For
>> the normal case we haven't touched GBPA, so we are truly returning to the
>> boot-time condition; in the unexpected case where SMMUEN was already enabled
>> then we'll go into an explicit GPBA abort state, but that seems a
>> not-unreasonable compromise for not preserving the entire boot-time Stream
>> Table etc., whose presence kind of implies it wouldn't have been bypassing
>> everything anyway.
>>
>> The more I look at the remaining aspect of disable_bypass for controlling
>> broken-DT behaviour the more I suspect it can't actually be useful either
>> way, especially not since default domains. I have no memory of what my
>> original reasoning might have been, so I'm inclined to just rip that all out
>> and let probe fail. I see no reason these days not to expect a broken DT to
>> leads to a broken system, especially not now with DTSchema validation.
>
> That sounds reasonable to me, although we may end up having to back it
> out if we regress systems with borked firmware :(
>
>> Then there's just the kdump warning it suppresses, of which I also have no
>> idea why it's there either, but apparently that one's on you :P
>
> I think _that_ one is because the previous (crashed) kernel won't have
> torn anything down, so there could be active DMA using translations in
> the SMMU. In that case, the crashkernel (which is running from some
> carveout) may find the SMMU enabled, but it really can't stick it into
> bypass mode because that's likely to corrupt random memory. So in that
> case, we do stick it into abort before we reinitialise it and then we
> disabling fault reporting altogether to avoid the log spam:
>
> if (is_kdump_kernel())
> enables &= ~(CR0_EVTQEN | CR0_PRIQEN)

Oh, I know why we do what we do for the kdump situation in general - it
was merely the matter of why we chose to demand that the user explicitly
tells us to do what we know is the right thing (and scream at them if
they don't), rather than to just go ahead and do the right thing anyway.

(the significance of disable_bypass for kdump is after we turn the SMMU
back on from GBPA Abort state - we don't want any ongoing traffic being
able to inadvertently bypass via an STE config either)

Cheers,
Robin.