2024-04-26 15:47:53

by Timur Tabi

[permalink] [raw]
Subject: Re: [PATCH 1/2] drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

On Fri, 2024-04-26 at 11:41 -0400, Lyude Paul wrote:
> We hit this because when initializing firmware of type
> NVKM_FIRMWARE_IMG_DMA we allocate coherent memory and then attempt to
> include that coherent memory in a scatterlist.

I'm sure this patch is a good one, and I will try to test it soon, but I am
very curious to know why including coherent memory in a scatterlist is bad.


2024-04-28 15:53:05

by Lyude Paul

[permalink] [raw]
Subject: Re: [PATCH 1/2] drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

On Fri, 2024-04-26 at 15:47 +0000, Timur Tabi wrote:
> On Fri, 2024-04-26 at 11:41 -0400, Lyude Paul wrote:
> > We hit this because when initializing firmware of type
> > NVKM_FIRMWARE_IMG_DMA we allocate coherent memory and then attempt
> > to
> > include that coherent memory in a scatterlist.
>
> I'm sure this patch is a good one, and I will try to test it soon,
> but I am
> very curious to know why including coherent memory in a scatterlist
> is bad.

Thanks for asking this as I think you unintentionally pointed out this
explanation I gave doesn't make sense - so I looked a bit more into it.
The issue isn't coherent memory in the scatterlist, the issue is that
we're allocating with dma_alloc_coherent(). And according to the source
in dma_alloc_attrs() (which dma_alloc_coherent() is just a wrapper)
for):

/*
* DMA allocations can never be turned back into a page pointer, so
* requesting compound pages doesn't make sense (and can't even be
* supported at all by various backends).
*/
if (WARN_ON_ONCE(flag & __GFP_COMP))
return NULL;

Which explains the check in sg_set_buf() that this patch stops us from
hitting:

BUG_ON(!virt_addr_valid(buf));

Scatterlists need page pointers (we use one later down here:)

sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));

But we can't get a page pointer from an allocation made by
dma_alloc_coherent() - but we can from vmalloc(). I'll fix the patch
explanation in the next version, I have to send out another version
anyhow since I realized that patch #2 still needs one more check to
work properly
--
Cheers,
Lyude Paul (she/her)
Software Engineer at Red Hat


2024-04-28 18:20:11

by Timur Tabi

[permalink] [raw]
Subject: Re: [PATCH 1/2] drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

On Sun, 2024-04-28 at 11:52 -0400, Lyude Paul wrote:
> But we can't get a page pointer from an allocation made by
> dma_alloc_coherent() - but we can from vmalloc(). I'll fix the patch
> explanation in the next version, I have to send out another version
> anyhow since I realized that patch #2 still needs one more check to
> work properly

Glad my confusion was helpful! You wouldn't happent to know why this
statement is true?

DMA allocations can never be turned back into a page pointer

I'm guessing DMA allocations are taken from some memory pool that isn't
tracked using page pointers, but I don't understand why. It seems weird that
you can have a virtual address to the buffer, therefore it is surely mapped on
a page-by-page basis, so why are there no page pointers?