When a buffer is exported to userspace, use memcg to attribute the
buffer to the allocating cgroup until all buffer references are
released.
Unlike the dmabuf sysfs stats implementation, this memcg accounting
avoids contention over the kernfs_rwsem incurred when creating or
removing nodes.
Signed-off-by: T.J. Mercier <[email protected]>
---
Documentation/admin-guide/cgroup-v2.rst | 4 ++++
drivers/dma-buf/dma-buf.c | 5 +++++
include/linux/dma-buf.h | 3 +++
include/linux/memcontrol.h | 1 +
mm/memcontrol.c | 4 ++++
5 files changed, 17 insertions(+)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index c8ae7c897f14..538ae22bc514 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1455,6 +1455,10 @@ PAGE_SIZE multiple when read back.
Amount of memory used for storing in-kernel data
structures.
+ dmabuf (npn)
+ Amount of memory used for exported DMA buffers allocated by the cgroup.
+ Stays with the allocating cgroup regardless of how the buffer is shared.
+
workingset_refault_anon
Number of refaults of previously evicted anonymous pages.
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index e6528767efc7..ac45dd101c4d 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -75,6 +75,8 @@ static void dma_buf_release(struct dentry *dentry)
*/
BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
+ mod_memcg_state(dmabuf->memcg, MEMCG_DMABUF, -dmabuf->size);
+ mem_cgroup_put(dmabuf->memcg);
dma_buf_stats_teardown(dmabuf);
dmabuf->ops->release(dmabuf);
@@ -673,6 +675,9 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
if (ret)
goto err_dmabuf;
+ dmabuf->memcg = get_mem_cgroup_from_mm(current->mm);
+ mod_memcg_state(dmabuf->memcg, MEMCG_DMABUF, dmabuf->size);
+
file->private_data = dmabuf;
file->f_path.dentry->d_fsdata = dmabuf;
dmabuf->file = file;
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 6fa8d4e29719..1f0ffb8e4bf5 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -22,6 +22,7 @@
#include <linux/fs.h>
#include <linux/dma-fence.h>
#include <linux/wait.h>
+#include <linux/memcontrol.h>
struct device;
struct dma_buf;
@@ -446,6 +447,8 @@ struct dma_buf {
struct dma_buf *dmabuf;
} *sysfs_entry;
#endif
+ /* The cgroup to which this buffer is currently attributed */
+ struct mem_cgroup *memcg;
};
/**
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d3c8203cab6c..1c1da2da20a6 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -37,6 +37,7 @@ enum memcg_stat_item {
MEMCG_KMEM,
MEMCG_ZSWAP_B,
MEMCG_ZSWAPPED,
+ MEMCG_DMABUF,
MEMCG_NR_STAT,
};
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ab457f0394ab..680189bec7e0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1502,6 +1502,7 @@ static const struct memory_stat memory_stats[] = {
{ "unevictable", NR_UNEVICTABLE },
{ "slab_reclaimable", NR_SLAB_RECLAIMABLE_B },
{ "slab_unreclaimable", NR_SLAB_UNRECLAIMABLE_B },
+ { "dmabuf", MEMCG_DMABUF },
/* The memory events */
{ "workingset_refault_anon", WORKINGSET_REFAULT_ANON },
@@ -1519,6 +1520,7 @@ static int memcg_page_state_unit(int item)
switch (item) {
case MEMCG_PERCPU_B:
case MEMCG_ZSWAP_B:
+ case MEMCG_DMABUF:
case NR_SLAB_RECLAIMABLE_B:
case NR_SLAB_UNRECLAIMABLE_B:
case WORKINGSET_REFAULT_ANON:
@@ -4042,6 +4044,7 @@ static const unsigned int memcg1_stats[] = {
WORKINGSET_REFAULT_ANON,
WORKINGSET_REFAULT_FILE,
MEMCG_SWAP,
+ MEMCG_DMABUF,
};
static const char *const memcg1_stat_names[] = {
@@ -4057,6 +4060,7 @@ static const char *const memcg1_stat_names[] = {
"workingset_refault_anon",
"workingset_refault_file",
"swap",
+ "dmabuf",
};
/* Universal VM events cgroup1 shows, original sort order */
--
2.39.0.314.g84b9a713c41-goog
On Mon 09-01-23 21:38:04, T.J. Mercier wrote:
> When a buffer is exported to userspace, use memcg to attribute the
> buffer to the allocating cgroup until all buffer references are
> released.
>
> Unlike the dmabuf sysfs stats implementation, this memcg accounting
> avoids contention over the kernfs_rwsem incurred when creating or
> removing nodes.
I am not familiar with dmabuf infrastructure so please bear with me.
AFAIU this patch adds a dmabuf specific counter to find out the amount
of dmabuf memory used. But I do not see any actual charging implemented
for that memory.
I have looked at two random users of dma_buf_export cma_heap_allocate
and it allocates pages to back the dmabuf (AFAIU) by cma_alloc
which doesn't account to memcg, system_heap_allocate uses
alloc_largest_available which relies on order_flags which doesn't seem
to ever use __GFP_ACCOUNT.
This would mean that the counter doesn't represent any actual memory
reflected in the overall memory consumption of a memcg. I believe this
is rather unexpected and confusing behavior. While some counters
overlap and their sum would exceed the charged memory we do not have any
that doesn't correspond to any memory (at least not for non-root memcgs).
--
Michal Hocko
SUSE Labs
On Tue, Jan 10, 2023 at 12:58 AM Michal Hocko <[email protected]> wrote:
>
> On Mon 09-01-23 21:38:04, T.J. Mercier wrote:
> > When a buffer is exported to userspace, use memcg to attribute the
> > buffer to the allocating cgroup until all buffer references are
> > released.
> >
> > Unlike the dmabuf sysfs stats implementation, this memcg accounting
> > avoids contention over the kernfs_rwsem incurred when creating or
> > removing nodes.
>
> I am not familiar with dmabuf infrastructure so please bear with me.
> AFAIU this patch adds a dmabuf specific counter to find out the amount
> of dmabuf memory used. But I do not see any actual charging implemented
> for that memory.
>
> I have looked at two random users of dma_buf_export cma_heap_allocate
> and it allocates pages to back the dmabuf (AFAIU) by cma_alloc
> which doesn't account to memcg, system_heap_allocate uses
> alloc_largest_available which relies on order_flags which doesn't seem
> to ever use __GFP_ACCOUNT.
>
> This would mean that the counter doesn't represent any actual memory
> reflected in the overall memory consumption of a memcg. I believe this
> is rather unexpected and confusing behavior. While some counters
> overlap and their sum would exceed the charged memory we do not have any
> that doesn't correspond to any memory (at least not for non-root memcgs).
>
> --
> Michal Hocko
> SUSE Labs
Thank you, that behavior is not intentional. I'm not looking at the
overall memcg charge yet otherwise I would have noticed this. I think
I understand what's needed for the charging part, but Shakeel
mentioned some additional work for "reclaim, OOM and charge context
and failure cases" on the cover letter which I need to look into.