DMA buffers allocated from the CMA dma-buf heap get counted under
RssFile for processes that map them and trigger page faults. In
addition to the incorrect accounting reported to userspace, reclaim
behavior was influenced by the MM_FILEPAGES counter until linux 6.8, but
this memory is not reclaimable. [1] Change the CMA dma-buf heap to set
VM_PFNMAP on the VMA so MM does not poke at the memory managed by this
dma-buf heap, and use vmf_insert_pfn to correct the RSS accounting.
The system dma-buf heap does not suffer from this issue since
remap_pfn_range is used during the mmap of the buffer, which also sets
VM_PFNMAP on the VMA.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/vmscan.c?id=fb46e22a9e3863e08aef8815df9f17d0f4b9aede
Fixes: b61614ec318a ("dma-buf: heaps: Add CMA heap to dmabuf heaps")
Signed-off-by: T.J. Mercier <[email protected]>
---
drivers/dma-buf/heaps/cma_heap.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
index ee899f8e6721..4a63567e93ba 100644
--- a/drivers/dma-buf/heaps/cma_heap.c
+++ b/drivers/dma-buf/heaps/cma_heap.c
@@ -168,10 +168,7 @@ static vm_fault_t cma_heap_vm_fault(struct vm_fault *vmf)
if (vmf->pgoff > buffer->pagecount)
return VM_FAULT_SIGBUS;
- vmf->page = buffer->pages[vmf->pgoff];
- get_page(vmf->page);
-
- return 0;
+ return vmf_insert_pfn(vma, vmf->address, page_to_pfn(buffer->pages[vmf->pgoff]));
}
static const struct vm_operations_struct dma_heap_vm_ops = {
@@ -185,6 +182,8 @@ static int cma_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
return -EINVAL;
+ vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
+
vma->vm_ops = &dma_heap_vm_ops;
vma->vm_private_data = buffer;
--
2.43.0.381.gb435a96ce8-goog
On Thu, Jan 18, 2024 at 6:49 AM Daniel Vetter <[email protected]> wrote:
>
> On Thu, Jan 18, 2024 at 11:02:22AM +0100, Christian König wrote:
> > Am 17.01.24 um 19:11 schrieb T.J. Mercier:
> > > DMA buffers allocated from the CMA dma-buf heap get counted under
> > > RssFile for processes that map them and trigger page faults. In
> > > addition to the incorrect accounting reported to userspace, reclaim
> > > behavior was influenced by the MM_FILEPAGES counter until linux 6.8, but
> > > this memory is not reclaimable. [1] Change the CMA dma-buf heap to set
> > > VM_PFNMAP on the VMA so MM does not poke at the memory managed by this
> > > dma-buf heap, and use vmf_insert_pfn to correct the RSS accounting.
> > >
> > > The system dma-buf heap does not suffer from this issue since
> > > remap_pfn_range is used during the mmap of the buffer, which also sets
> > > VM_PFNMAP on the VMA.
> >
> > Mhm, not an issue with this patch but Daniel wanted to add a check for
> > VM_PFNMAP to dma_buf_mmap() which would have noted this earlier.
> >
> > I don't fully remember the discussion but for some reason that was never
> > committed. We should probably try that again.
>
> Iirc the issue is that dma_mmap is not guaranteed to give you a VM_SPECIAL
> mapping, at least on absolutely all architectures. That's why I defacto
> dropped that idea, but it would indeed be really great if we could
> resurrect it.
I actually had it in my head that it was a BUG_ON check for VM_PFNMAP
in dma_buf_mmap and it was merged, so I was surprised to discover that
it wasn't set for these CMA buffers.
> Maybe for x86 only? Or x86+armv8, I'm honestly not sure anymore which
> exact cases ended up with a VM_NORMAL mapping ... Would need a pile of
> digging.
Looking back at the patch, the CI email at the end of the thread lists
a bunch of now-broken links to DMESG-WARN test failures I assume
pointed at a large chunk of them.
https://lore.kernel.org/all/[email protected]/
> >
> > > [1]https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/vmscan.c?id=fb46e22a9e3863e08aef8815df9f17d0f4b9aede
> > >
> > > Fixes: b61614ec318a ("dma-buf: heaps: Add CMA heap to dmabuf heaps")
> > > Signed-off-by: T.J. Mercier<[email protected]>
> >
> > Acked-by: Christian König <[email protected]>
Thanks Christian.
On Thu, Jan 18, 2024 at 11:02:22AM +0100, Christian K?nig wrote:
> Am 17.01.24 um 19:11 schrieb T.J. Mercier:
> > DMA buffers allocated from the CMA dma-buf heap get counted under
> > RssFile for processes that map them and trigger page faults. In
> > addition to the incorrect accounting reported to userspace, reclaim
> > behavior was influenced by the MM_FILEPAGES counter until linux 6.8, but
> > this memory is not reclaimable. [1] Change the CMA dma-buf heap to set
> > VM_PFNMAP on the VMA so MM does not poke at the memory managed by this
> > dma-buf heap, and use vmf_insert_pfn to correct the RSS accounting.
> >
> > The system dma-buf heap does not suffer from this issue since
> > remap_pfn_range is used during the mmap of the buffer, which also sets
> > VM_PFNMAP on the VMA.
>
> Mhm, not an issue with this patch but Daniel wanted to add a check for
> VM_PFNMAP to dma_buf_mmap() which would have noted this earlier.
>
> I don't fully remember the discussion but for some reason that was never
> committed. We should probably try that again.
Iirc the issue is that dma_mmap is not guaranteed to give you a VM_SPECIAL
mapping, at least on absolutely all architectures. That's why I defacto
dropped that idea, but it would indeed be really great if we could
resurrect it.
Maybe for x86 only? Or x86+armv8, I'm honestly not sure anymore which
exact cases ended up with a VM_NORMAL mapping ... Would need a pile of
digging.
I think all the other patches I've landed.
-Sima
>
> > [1]https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/vmscan.c?id=fb46e22a9e3863e08aef8815df9f17d0f4b9aede
> >
> > Fixes: b61614ec318a ("dma-buf: heaps: Add CMA heap to dmabuf heaps")
> > Signed-off-by: T.J. Mercier<[email protected]>
>
> Acked-by: Christian K?nig <[email protected]>
>
> > ---
> > drivers/dma-buf/heaps/cma_heap.c | 7 +++----
> > 1 file changed, 3 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
> > index ee899f8e6721..4a63567e93ba 100644
> > --- a/drivers/dma-buf/heaps/cma_heap.c
> > +++ b/drivers/dma-buf/heaps/cma_heap.c
> > @@ -168,10 +168,7 @@ static vm_fault_t cma_heap_vm_fault(struct vm_fault *vmf)
> > if (vmf->pgoff > buffer->pagecount)
> > return VM_FAULT_SIGBUS;
> > - vmf->page = buffer->pages[vmf->pgoff];
> > - get_page(vmf->page);
> > -
> > - return 0;
> > + return vmf_insert_pfn(vma, vmf->address, page_to_pfn(buffer->pages[vmf->pgoff]));
> > }
> > static const struct vm_operations_struct dma_heap_vm_ops = {
> > @@ -185,6 +182,8 @@ static int cma_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
> > if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
> > return -EINVAL;
> > + vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
> > +
> > vma->vm_ops = &dma_heap_vm_ops;
> > vma->vm_private_data = buffer;
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
On Thu, Jan 18, 2024 at 08:57:16AM -0800, T.J. Mercier wrote:
> On Thu, Jan 18, 2024 at 6:49 AM Daniel Vetter <[email protected]> wrote:
> >
> > On Thu, Jan 18, 2024 at 11:02:22AM +0100, Christian König wrote:
> > > Am 17.01.24 um 19:11 schrieb T.J. Mercier:
> > > > DMA buffers allocated from the CMA dma-buf heap get counted under
> > > > RssFile for processes that map them and trigger page faults. In
> > > > addition to the incorrect accounting reported to userspace, reclaim
> > > > behavior was influenced by the MM_FILEPAGES counter until linux 6.8, but
> > > > this memory is not reclaimable. [1] Change the CMA dma-buf heap to set
> > > > VM_PFNMAP on the VMA so MM does not poke at the memory managed by this
> > > > dma-buf heap, and use vmf_insert_pfn to correct the RSS accounting.
> > > >
> > > > The system dma-buf heap does not suffer from this issue since
> > > > remap_pfn_range is used during the mmap of the buffer, which also sets
> > > > VM_PFNMAP on the VMA.
> > >
> > > Mhm, not an issue with this patch but Daniel wanted to add a check for
> > > VM_PFNMAP to dma_buf_mmap() which would have noted this earlier.
> > >
> > > I don't fully remember the discussion but for some reason that was never
> > > committed. We should probably try that again.
> >
> > Iirc the issue is that dma_mmap is not guaranteed to give you a VM_SPECIAL
> > mapping, at least on absolutely all architectures. That's why I defacto
> > dropped that idea, but it would indeed be really great if we could
> > resurrect it.
>
> I actually had it in my head that it was a BUG_ON check for VM_PFNMAP
> in dma_buf_mmap and it was merged, so I was surprised to discover that
> it wasn't set for these CMA buffers.
>
> > Maybe for x86 only? Or x86+armv8, I'm honestly not sure anymore which
> > exact cases ended up with a VM_NORMAL mapping ... Would need a pile of
> > digging.
>
> Looking back at the patch, the CI email at the end of the thread lists
> a bunch of now-broken links to DMESG-WARN test failures I assume
> pointed at a large chunk of them.
>
> https://lore.kernel.org/all/[email protected]/
I thought there was a more recent submission, where I at least fixed the
various fallout in gem code. But maybe I only dreamed ...
Also I did the code grepping again, and at least iommu_dma_mmap() in
drivers/iommu/dma-iommu.c and arm_iommu_mmap_attrs() for arm use
vm_map_pages in certain cases, which is _not_ VM_PFNMAP.
Means really no cases where I think we can assume we'll always get
VM_PFNMAP, and unfortunately we need VM_PFNMAP or VM_IO to prevent
get_user_pages and similar bad things from happening to dma-buf mmaps.
So still no luck :-/
-Sima
>
> > >
> > > > [1]https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/vmscan.c?id=fb46e22a9e3863e08aef8815df9f17d0f4b9aede
> > > >
> > > > Fixes: b61614ec318a ("dma-buf: heaps: Add CMA heap to dmabuf heaps")
> > > > Signed-off-by: T.J. Mercier<[email protected]>
> > >
> > > Acked-by: Christian König <[email protected]>
>
> Thanks Christian.
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch