2022-10-10 19:06:06

by Jerry Snitselaar

[permalink] [raw]
Subject: Issue seen since commit f5ff79fddf0e ("dma-mapping: remove CONFIG_DMA_REMAP")

I've been looking at an odd issue that shows up with commit
f5ff79fddf0e ("dma-mapping: remove CONFIG_DMA_REMAP"). What is being
seen is the bnx2fc driver calling dma_free_coherent(), and eventually
hits the BUG_ON() in vunmap(). bnx2fc_free_session_resc() does a
spin_lock_bh() around the dma_free_coherent() calls, and looking at
preempt.h that will trigger in_interrupt() to return positive, so that
makes sense. The really odd part is this only happens with the
shutdown of the kernel after a system install. Reboots after that do not
hit the BUG_ON() in vunmap().

I still need to grab a system and try to see what it is doing on the
subsequent shutdowns, because it seems to me that any time
bnx2fc_free_session_resc() is called it will end up there, unless the
allocs are not coming from vmalloc() in the later boots. Between the
comments in dma_free_attrs(), and preempt.h, dma_free_coherent()
shouldn't be called under a spin_lock_bh(), yes? I think the comments
in dma_free_attrs() might be out of date with commit f5ff79fddf0e
("dma-mapping: remove CONFIG_DMA_REMAP") in place since now it is more
general that you can land in vunmap(). Also, should that WARN_ON() in
dma_free_attrs() trigger as well for the BH disabled case?

It was also reproduced with a 6.0-rc5 kernel build[1].

Regards,
Jerry


[1] https://koji.fedoraproject.org/koji/buildinfo?buildID=2061881


2022-10-11 07:47:21

by Christoph Hellwig

[permalink] [raw]
Subject: Re: Issue seen since commit f5ff79fddf0e ("dma-mapping: remove CONFIG_DMA_REMAP")

On Mon, Oct 10, 2022 at 11:57:39AM -0700, Jerry Snitselaar wrote:
> I still need to grab a system and try to see what it is doing on the
> subsequent shutdowns, because it seems to me that any time
> bnx2fc_free_session_resc() is called it will end up there, unless the
> allocs are not coming from vmalloc() in the later boots. Between the
> comments in dma_free_attrs(), and preempt.h, dma_free_coherent()
> shouldn't be called under a spin_lock_bh(), yes?

It shouldn't. And you would have seen the same BUG_ON on many non-x86
architectures even beforethe commit..

> I think the comments
> in dma_free_attrs() might be out of date with commit f5ff79fddf0e
> ("dma-mapping: remove CONFIG_DMA_REMAP") in place since now it is more
> general that you can land in vunmap().

Yes, the comment has always been a bit too specific, and is even more
so now.

> Also, should that WARN_ON() in
> dma_free_attrs() trigger as well for the BH disabled case?

Probably. And we should probably early return in that case to "just"
leak the memory instead of crashing the kernel when called from a buggy
driver.

2022-10-11 12:46:25

by Robin Murphy

[permalink] [raw]
Subject: Re: Issue seen since commit f5ff79fddf0e ("dma-mapping: remove CONFIG_DMA_REMAP")

On 2022-10-10 19:57, Jerry Snitselaar wrote:
> I've been looking at an odd issue that shows up with commit
> f5ff79fddf0e ("dma-mapping: remove CONFIG_DMA_REMAP"). What is being
> seen is the bnx2fc driver calling dma_free_coherent(), and eventually
> hits the BUG_ON() in vunmap(). bnx2fc_free_session_resc() does a
> spin_lock_bh() around the dma_free_coherent() calls, and looking at
> preempt.h that will trigger in_interrupt() to return positive, so that
> makes sense. The really odd part is this only happens with the
> shutdown of the kernel after a system install. Reboots after that do not
> hit the BUG_ON() in vunmap().

Most likely a difference in IOMMU config/parameters between the
installer and the installed kernel - if the latter is defaulting to
passthrough then it won't be remapping (assuming the device is coherent).

> I still need to grab a system and try to see what it is doing on the
> subsequent shutdowns, because it seems to me that any time
> bnx2fc_free_session_resc() is called it will end up there, unless the
> allocs are not coming from vmalloc() in the later boots. Between the
> comments in dma_free_attrs(), and preempt.h, dma_free_coherent()
> shouldn't be called under a spin_lock_bh(), yes? I think the comments
> in dma_free_attrs() might be out of date with commit f5ff79fddf0e
> ("dma-mapping: remove CONFIG_DMA_REMAP") in place since now it is more
> general that you can land in vunmap(). Also, should that WARN_ON() in
> dma_free_attrs() trigger as well for the BH disabled case?
>
> It was also reproduced with a 6.0-rc5 kernel build[1].

Looking at the history of that comment I guess I was just trying to
capture the most common case to explain the original motivation for
having the WARN_ON(). It was never meant to imply that that's the *only*
reason, especially since iommu-dma was already well established by that
point. That warning has been present on x86 in one form or another for
15 years, so I guess the real issue at hand is the difference between
irqs_disabled() and in_interrupt()?

As far as that particular driver goes it looks rather questionable
anyway - it seems like a terrible design flaw if a race between
consuming things and freeing them can exist at all, but then it looks
like bnx2fc_arm_cq() might actually program the hardware to *reuse* a CQ
which may already be waiting to be freed as soon as the lock is
dropped... that can't be good :/

Thanks,
Robin.

2022-10-12 03:13:26

by Jerry Snitselaar

[permalink] [raw]
Subject: Re: Issue seen since commit f5ff79fddf0e ("dma-mapping: remove CONFIG_DMA_REMAP")

On Tue, Oct 11, 2022 at 01:15:57PM +0100, Robin Murphy wrote:
> On 2022-10-10 19:57, Jerry Snitselaar wrote:
> > I've been looking at an odd issue that shows up with commit
> > f5ff79fddf0e ("dma-mapping: remove CONFIG_DMA_REMAP"). What is being
> > seen is the bnx2fc driver calling dma_free_coherent(), and eventually
> > hits the BUG_ON() in vunmap(). bnx2fc_free_session_resc() does a
> > spin_lock_bh() around the dma_free_coherent() calls, and looking at
> > preempt.h that will trigger in_interrupt() to return positive, so that
> > makes sense. The really odd part is this only happens with the
> > shutdown of the kernel after a system install. Reboots after that do not
> > hit the BUG_ON() in vunmap().
>
> Most likely a difference in IOMMU config/parameters between the installer
> and the installed kernel - if the latter is defaulting to passthrough then
> it won't be remapping (assuming the device is coherent).
>

I'm pretty sure that is the difference now. I'm still trying to get access to
a system to verify. I think what is happening is the install occurs
with intel_iommu=on, but they aren't setting up the system to use intel_iommu=on
afterwards. They are saying they aren't installing with intel_iommu=on,
but it looks like the netboot configuration has it, and they aren't
going to get to __iommu_dma_free() if it isn't. :) So, I think during
install the iommu is enabled and uses dma-iommu, and then afterwards
it isn't enabled so they are going through dma-direct, which still
has a possibility of vunmap() in the code. I should have
verification of that tomorrow. Thank you for the responses.

Thanks,
Jerry

> > I still need to grab a system and try to see what it is doing on the
> > subsequent shutdowns, because it seems to me that any time
> > bnx2fc_free_session_resc() is called it will end up there, unless the
> > allocs are not coming from vmalloc() in the later boots. Between the
> > comments in dma_free_attrs(), and preempt.h, dma_free_coherent()
> > shouldn't be called under a spin_lock_bh(), yes? I think the comments
> > in dma_free_attrs() might be out of date with commit f5ff79fddf0e
> > ("dma-mapping: remove CONFIG_DMA_REMAP") in place since now it is more
> > general that you can land in vunmap(). Also, should that WARN_ON() in
> > dma_free_attrs() trigger as well for the BH disabled case?
> >
> > It was also reproduced with a 6.0-rc5 kernel build[1].
>
> Looking at the history of that comment I guess I was just trying to capture
> the most common case to explain the original motivation for having the
> WARN_ON(). It was never meant to imply that that's the *only* reason,
> especially since iommu-dma was already well established by that point. That
> warning has been present on x86 in one form or another for 15 years, so I
> guess the real issue at hand is the difference between irqs_disabled() and
> in_interrupt()?
>
> As far as that particular driver goes it looks rather questionable anyway -
> it seems like a terrible design flaw if a race between consuming things and
> freeing them can exist at all, but then it looks like bnx2fc_arm_cq() might
> actually program the hardware to *reuse* a CQ which may already be waiting
> to be freed as soon as the lock is dropped... that can't be good :/
>
> Thanks,
> Robin.