2023-06-15 03:53:15

by Zhongkun He

Subject: [RFC PATCH 1/3] zram: charge the compressed RAM to the page's memcgroup

The compressed RAM is currently charged to the kernel, not to
any memory cgroup, which does not satisfy our usage scenario.
If the memory of a task is limited by a memcg, it will swap
memory out to the zram swap device when memory is insufficient.
In that case, the memory limit has no effect.

So it makes sense to charge the compressed RAM to the page's
memory cgroup.

Signed-off-by: Zhongkun He <[email protected]>
---
drivers/block/zram/zram_drv.c | 22 ++++++++++++++++------
1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index f6d90f1ba5cf..03b508447473 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -33,6 +33,7 @@
#include <linux/debugfs.h>
#include <linux/cpuhotplug.h>
#include <linux/part_stat.h>
+#include <linux/memcontrol.h>

#include "zram_drv.h"

@@ -1419,6 +1420,10 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 	struct zcomp_strm *zstrm;
 	unsigned long element = 0;
 	enum zram_pageflags flags = 0;
+	struct mem_cgroup *memcg, *old_memcg;
+
+	memcg = page_memcg(page);
+	old_memcg = set_active_memcg(memcg);
 
 	mem = kmap_atomic(page);
 	if (page_same_filled(mem, &element)) {
@@ -1426,7 +1431,7 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 		/* Free memory associated with this sector now. */
 		flags = ZRAM_SAME;
 		atomic64_inc(&zram->stats.same_pages);
-		goto out;
+		goto out_free;
 	}
 	kunmap_atomic(mem);
 
@@ -1440,7 +1445,7 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 		zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
 		pr_err("Compression failed! err=%d\n", ret);
 		zs_free(zram->mem_pool, handle);
-		return ret;
+		goto out;
 	}
 
 	if (comp_len >= huge_class_size)
@@ -1470,8 +1475,10 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 		handle = zs_malloc(zram->mem_pool, comp_len,
 				   GFP_NOIO | __GFP_HIGHMEM |
 				   __GFP_MOVABLE);
-		if (IS_ERR_VALUE(handle))
-			return PTR_ERR((void *)handle);
+		if (IS_ERR_VALUE(handle)) {
+			ret = PTR_ERR((void *)handle);
+			goto out;
+		}
 
 		if (comp_len != PAGE_SIZE)
 			goto compress_again;
@@ -1491,7 +1498,8 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 	if (zram->limit_pages && alloced_pages > zram->limit_pages) {
 		zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
 		zs_free(zram->mem_pool, handle);
-		return -ENOMEM;
+		ret = -ENOMEM;
+		goto out;
 	}
 
 	dst = zs_map_object(zram->mem_pool, handle, ZS_MM_WO);
@@ -1506,7 +1514,7 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 	zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
 	zs_unmap_object(zram->mem_pool, handle);
 	atomic64_add(comp_len, &zram->stats.compr_data_size);
-out:
+out_free:
 	/*
 	 * Free memory associated with this sector
 	 * before overwriting unused sectors.
@@ -1531,6 +1539,8 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 
 	/* Update stats */
 	atomic64_inc(&zram->stats.pages_stored);
+out:
+	set_active_memcg(old_memcg);
 	return ret;
 }

--
2.25.1



2023-06-15 05:20:05

by Yu Zhao

Subject: Re: [RFC PATCH 1/3] zram: charge the compressed RAM to the page's memcgroup

On Wed, Jun 14, 2023 at 9:48 PM Zhongkun He
<[email protected]> wrote:
>
> The compressed RAM is currently charged to the kernel, not to
> any memory cgroup, which does not satisfy our usage scenario.
> If the memory of a task is limited by a memcg, it will swap
> memory out to the zram swap device when memory is insufficient.
> In that case, the memory limit has no effect.
>
> So it makes sense to charge the compressed RAM to the page's
> memory cgroup.

We used to do this a long time ago, but we had per-memcg swapfiles [1]
to prevent compressed pages from different memcgs from sharing the
same zspage.

Does this patchset alone suffer from the same problem, i.e., memcgs
sharing zspages?

[1] https://lwn.net/Articles/592923/

2023-06-15 09:49:42

by Fabian Deutsch

Subject: Re: [RFC PATCH 1/3] zram: charge the compressed RAM to the page's memcgroup

On Thu, Jun 15, 2023 at 6:59 AM Yu Zhao <[email protected]> wrote:
>
> On Wed, Jun 14, 2023 at 9:48 PM Zhongkun He
> <[email protected]> wrote:
> >
> > The compressed RAM is currently charged to the kernel, not to
> > any memory cgroup, which does not satisfy our usage scenario.
> > If the memory of a task is limited by a memcg, it will swap
> > memory out to the zram swap device when memory is insufficient.
> > In that case, the memory limit has no effect.
> >
> > So it makes sense to charge the compressed RAM to the page's
> > memory cgroup.

While looking at this in the past weeks, I believe that there are two
distinct problems:
1. Direct zram usage by a process within a cg, i.e. a process writing to
a zram device
2. Indirect zram usage by a process within a cg via swap (described above)

Both of them probably require different solutions.
In order to fix #1, zram device usage should be accounted towards a
cgroup. IMHO this is something that should be fixed.

Yu Zhao and Yosry are probably much more familiar with the solution to #2.
WRT per-cgroup-swapfile, to me this is addressing #2, but I agree with
Yu Zhao that there are probably better solutions to this.

Lastly, while this patchset will possibly not address the swap
issue (#2) completely, does it satisfy the needs of #1?

- fabian

> We used to do this a long time ago, but we had per-memcg swapfiles [1]
> to prevent compressed pages from different memcgs from sharing the
> same zspage.
>
> Does this patchset alone suffer from the same problem, i.e., memcgs
> sharing zspages?
>
> [1] https://lwn.net/Articles/592923/
>


2023-06-15 09:53:24

by David Hildenbrand

Subject: Re: [RFC PATCH 1/3] zram: charge the compressed RAM to the page's memcgroup

On 15.06.23 05:48, Zhongkun He wrote:
> The compressed RAM is currently charged to the kernel, not to
> any memory cgroup, which does not satisfy our usage scenario.
> If the memory of a task is limited by a memcg, it will swap
> memory out to the zram swap device when memory is insufficient.
> In that case, the memory limit has no effect.
>
> So it makes sense to charge the compressed RAM to the page's
> memory cgroup.

Interesting. When looking at possible ways to achieve that in a clean
way, my understanding was that the page might not always be accounted to
a memcg (e.g., simply some buffer?). Besides the swap->zram case I was
thinking about other fs->zram cases, or just a straight read/write to the
zram device.

The easiest way to see where that goes wrong I think is
zram_bvec_write_partial(), where we simply allocate a temporary page.

Either I am missing something important, or this only works in some
special cases.

I came to the conclusion that the only reasonable way is to assign a
complete zram device to a single cgroup and have all memory accounted to
that cgroup. That also does not solve all use cases (especially the
swap->zram case, which might be better off just using zswap?) but at least
the memory gets accounted in a more predictable way.

Happy to learn more.

--
Cheers,

David / dhildenb


2023-06-15 09:53:59

by Michal Hocko

Subject: Re: [RFC PATCH 1/3] zram: charge the compressed RAM to the page's memcgroup

On Thu 15-06-23 11:48:30, Zhongkun He wrote:
[...]
> @@ -1419,6 +1420,10 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
>  	struct zcomp_strm *zstrm;
>  	unsigned long element = 0;
>  	enum zram_pageflags flags = 0;
> +	struct mem_cgroup *memcg, *old_memcg;
> +
> +	memcg = page_memcg(page);
> +	old_memcg = set_active_memcg(memcg);
>  
>  	mem = kmap_atomic(page);
>  	if (page_same_filled(mem, &element)) {
[...]
> @@ -1470,8 +1475,10 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
>  		handle = zs_malloc(zram->mem_pool, comp_len,
>  				   GFP_NOIO | __GFP_HIGHMEM |
>  				   __GFP_MOVABLE);
> -		if (IS_ERR_VALUE(handle))
> -			return PTR_ERR((void *)handle);
> +		if (IS_ERR_VALUE(handle)) {
> +			ret = PTR_ERR((void *)handle);
> +			goto out;
> +		}
>  
>  		if (comp_len != PAGE_SIZE)
>  			goto compress_again;

I am not really deeply familiar with zram implementation nor usage but
how is the above allocation going to be charged without __GFP_ACCOUNT in
the gfp mask?
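
For context, the usual convention is that an allocation only gets
charged when the caller marks it explicitly; a minimal sketch:

	/*
	 * Only allocations that pass __GFP_ACCOUNT are charged to the
	 * active memcg (or, by default, to the current task's memcg):
	 */
	old_memcg = set_active_memcg(memcg);
	buf = kmalloc(size, GFP_NOIO | __GFP_ACCOUNT);
	set_active_memcg(old_memcg);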

Also what exactly is going to happen for the swap backed by the zram
device? Your memcg might be hitting the hard limit and therefore
swapping out. Wouldn't zs_malloc fail very likely under that condition
making the swap effectively unusable?
--
Michal Hocko
SUSE Labs

2023-06-15 10:14:29

by Zhongkun He

Subject: Re: [External] Re: [RFC PATCH 1/3] zram: charge the compressed RAM to the page's memcgroup

On Thu, Jun 15, 2023 at 1:00 PM Yu Zhao <[email protected]> wrote:
>
> On Wed, Jun 14, 2023 at 9:48 PM Zhongkun He
> <[email protected]> wrote:
> >
> > The compressed RAM is currently charged to the kernel, not to
> > any memory cgroup, which does not satisfy our usage scenario.
> > If the memory of a task is limited by a memcg, it will swap
> > memory out to the zram swap device when memory is insufficient.
> > In that case, the memory limit has no effect.
> >
> > So it makes sense to charge the compressed RAM to the page's
> > memory cgroup.
>
> We used to do this a long time ago, but we had per-memcg swapfiles [1]
> to prevent compressed pages from different memcgs from sharing the
> same zspage.
>
> Does this patchset alone suffer from the same problem, i.e., memcgs
> sharing zspages?
>
> [1] https://lwn.net/Articles/592923/

Thanks for your reply. Yes, different memcgs may share the same zspage.

2023-06-15 12:17:08

by Zhongkun He

Subject: Re: [External] Re: [RFC PATCH 1/3] zram: charge the compressed RAM to the page's memcgroup

On Thu, Jun 15, 2023 at 5:27 PM David Hildenbrand <[email protected]> wrote:
>
> Interesting. When looking at possible ways to achieve that in a clean
> way, my understanding was that the page might not always be accounted to
> a memcg (e.g., simply some buffer?). Besides the swap->zram case I was
> thinking about other fs->zram cases, or just a straight read/write to the
> zram device.
>
> The easiest way to see where that goes wrong I think is
> zram_bvec_write_partial(), where we simply allocate a temporary page.
>
> Either I am missing something important, or this only works in some
> special cases.
>
> I came to the conclusion that the only reasonable way is to assign a
> complete zram device to a single cgroup and have all memory accounted to
> that cgroup. That also does not solve all use cases (especially the
> swap->zram case, which might be better off just using zswap?) but at least
> the memory gets accounted in a more predictable way.
>
> Happy to learn more.
>
> --
> Cheers,
>
> David / dhildenb
>

Hi David, thanks for your reply. This is my mistake: I didn't describe
the problem clearly.

The reason for the problem is not the temporary page allocated at the
beginning of zram_bvec_write_partial() and released at the end, but the
compressed memory allocated by zs_malloc() in zram_write_page().

So assigning a complete zram device to a single cgroup may not meet our
need; we need to charge the compressed memory to the original page's
memory cgroup, which is not charged so far.
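
For reference, this is the allocation in question, as it appears in the
patch:

	handle = zs_malloc(zram->mem_pool, comp_len,
			   GFP_NOIO | __GFP_HIGHMEM |
			   __GFP_MOVABLE);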

2023-06-15 12:21:12

by Zhongkun He

Subject: Re: [External] Re: [RFC PATCH 1/3] zram: charge the compressed RAM to the page's memcgroup

Hi Michal, glad to hear from you.

> I am not really deeply familiar with zram implementation nor usage but
> how is the above allocation going to be charged without __GFP_ACCOUNT in
> the gfp mask?

Yes, zs_malloc() does not charge compressed memory, even if we add this
gfp flag, so we need to implement that in this patchset. But this flag
should be used to enable the feature.

> Also what exactly is going to happen for the swap backed by the zram
> device? Your memcg might be hitting the hard limit and therefore
> swapping out. Wouldn't zs_malloc fail very likely under that condition
> making the swap effectively unusable?

This is the key point. As I said above, zs_malloc() does not charge
compressed memory, so zs_malloc() will not fail under that condition.
If the zram swap is large enough, zs_malloc() never fails unless the
system OOMs, so memory.max is effectively invalidated.


> --
> Michal Hocko
> SUSE Labs

2023-06-15 12:50:51

by Michal Hocko

Subject: Re: [External] Re: [RFC PATCH 1/3] zram: charge the compressed RAM to the page's memcgroup

On Thu 15-06-23 19:58:37, 贺中坤 wrote:
> Hi michal, glad to hear from you.
>
> > I am not really deeply familiar with zram implementation nor usage but
> > how is the above allocation going to be charged without __GFP_ACCOUNT in
> > the gfp mask?
>
> Yes, zs_malloc() does not charge compressed memory, even if we add this
> gfp flag, so we need to implement that in this patchset. But this flag
> should be used to enable the feature.

Let me check I understand. This patch on its own doesn't really do
anything. You need the zs_malloc support implemented in patch 3 for this
to have any effect. Even with that in place the zs_malloc doesn't follow
the __GFP_ACCOUNT scheme we use for allocation tracking. Correct?

> > Also what exactly is going to happen for the swap backed by the zram
> > device? Your memcg might be hitting the hard limit and therefore
> > swapping out. Wouldn't zs_malloc fail very likely under that condition
> > making the swap effectively unusable?
>
> This is the key point. As I said above, zs_malloc() does not charge
> compressed memory, so zs_malloc() will not fail under that condition.
> If the zram swap is large enough, zs_malloc() never fails unless the
> system OOMs, so memory.max is effectively invalidated.

I do not think this is answering my question. Or maybe I just
misunderstand. Let me try again. Say you have a memcg under hard limit
pressure so any further charge is going to fail. How can you reasonably
implement zram-backed swapout if the memory is charged?

--
Michal Hocko
SUSE Labs

2023-06-15 13:16:23

by Zhongkun He

Subject: Re: [External] Re: [RFC PATCH 1/3] zram: charge the compressed RAM to the page's memcgroup

> Let me check I understand. This patch on its own doesn't really do
> anything. You need the zs_malloc support implemented in patch 3 for this
> to have any effect. Even with that in place the zs_malloc doesn't follow
> the __GFP_ACCOUNT scheme we use for allocation tracking. Correct?
>

Yes, I will use it in the next version.

> I do not think this is answering my question. Or maybe I just
> misunderstand. Let me try again. Say you have a memcg under hard limit
> pressure so any further charge is going to fail. How can you reasonably
> implement zram-backed swapout if the memory is charged?
>

Sorry, let me try to explain again. I have a memcg under hard limit
pressure. Any further charge will try to free memory and swap it out to
the zram backend, which compresses the data and stores it in memory, so
any further charge is not going to fail. The charged memory is swapped
out to compressed memory step by step, but the compressed memory is not
charged to the original memcg. So actual memory usage can already be
greater than the hard limit in some cases. This patchset will charge the
compressed memory to the original memcg, limited by memory.max.
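
In code terms this is what the hunk in this patch does: the page's
memcg, not the task doing the swapout, should pay.

	/*
	 * zram_write_page() runs in the context of whoever performs the
	 * swapout (often kswapd), so "current" is not the owner of the
	 * page; the patch redirects the charge to the page's own memcg:
	 */
	memcg = page_memcg(page);
	old_memcg = set_active_memcg(memcg);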

> --
> Michal Hocko
> SUSE Labs

2023-06-15 13:40:59

by Michal Hocko

Subject: Re: [External] Re: [RFC PATCH 1/3] zram: charge the compressed RAM to the page's memcgroup

On Thu 15-06-23 21:09:16, 贺中坤 wrote:
> > Let me check I understand. This patch on its own doesn't really do
> > anything. You need the zs_malloc support implemented in patch 3 for this
> > to have any effect. Even with that in place the zs_malloc doesn't follow
> > the __GFP_ACCOUNT scheme we use for allocation tracking. Correct?
> >
>
> Yes, I will use it in the next version.

OK, also make sure that the zsmalloc support is implemented before zram
depends on it.

> > I do not think this is answering my question. Or maybe I just
> > misunderstand. Let me try again. Say you have a memcg under hard limit
> > pressure so any further charge is going to fail. How can you reasonably
> > implement zram back swapout if the memory is charged?
> >
>
> Sorry, let me try to explain again. I have a memcg under hard limit
> pressure. Any further charge will try to free memory and swap it out to
> the zram backend, which compresses the data and stores it in memory, so
> any further charge is not going to fail. The charged memory is swapped
> out to compressed memory step by step, but the compressed memory is not
> charged to the original memcg. So actual memory usage can already be
> greater than the hard limit in some cases. This patchset will charge the
> compressed memory to the original memcg, limited by memory.max.

This is not really answering my question though. A memcg under its hard
limit is not really my concern. That is the simpler case. I am not saying
it doesn't need to get addressed, but it is the memcg hard-limited case
that is much more interesting, because your charges are going to fail very
likely and that would mean that swapout would fail AFAIU. If my
understanding is wrong then it would really help to describe that case
much more in the changelog.

--
Michal Hocko
SUSE Labs

2023-06-15 14:42:24

by Zhongkun He

Subject: Re: [External] Re: [RFC PATCH 1/3] zram: charge the compressed RAM to the page's memcgroup

>
> OK, also make sure that the zsmalloc support is implemented before zram
> depends on it.
>

OK. I got it.

>
> This is not really answering my question though. A memcg under its hard
> limit is not really my concern. That is the simpler case. I am not saying
> it doesn't need to get addressed, but it is the memcg hard-limited case
> that is much more interesting, because your charges are going to fail very
> likely and that would mean that swapout would fail AFAIU. If my
> understanding is wrong then it would really help to describe that case
> much more in the changelog.
>

OK, got it. In many cases I have tested, it will not fail, because we do
not charge the page directly but the objects (like slab objects and
compressed pages), since the zspage may be shared by any memcg.

> --
> Michal Hocko
> SUSE Labs

2023-06-15 14:52:24

by Michal Hocko

Subject: Re: [External] Re: [RFC PATCH 1/3] zram: charge the compressed RAM to the page's memcgroup

On Thu 15-06-23 22:13:09, 贺中坤 wrote:
> > This is not really answering my question though. A memcg under its hard
> > limit is not really my concern. That is the simpler case. I am not saying
> > it doesn't need to get addressed, but it is the memcg hard-limited case
> > that is much more interesting, because your charges are going to fail very
> > likely and that would mean that swapout would fail AFAIU. If my
> > understanding is wrong then it would really help to describe that case
> > much more in the changelog.
> >
>
> OK, got it. In many cases I have tested, it will not fail, because we do
> not charge the page directly but the objects (like slab objects and
> compressed pages), since the zspage may be shared by any memcg.

That sounds like a broken design to me.
--
Michal Hocko
SUSE Labs

2023-06-16 01:56:32

by Yosry Ahmed

Subject: Re: [RFC PATCH 1/3] zram: charge the compressed RAM to the page's memcgroup

On Thu, Jun 15, 2023 at 1:57 AM Fabian Deutsch <[email protected]> wrote:
>
>
> On Thu, Jun 15, 2023 at 6:59 AM Yu Zhao <[email protected]> wrote:
>>
>> On Wed, Jun 14, 2023 at 9:48 PM Zhongkun He
>> <[email protected]> wrote:
>> >
>> > The compressed RAM is currently charged to the kernel, not to
>> > any memory cgroup, which does not satisfy our usage scenario.
>> > If the memory of a task is limited by a memcg, it will swap
>> > memory out to the zram swap device when memory is insufficient.
>> > In that case, the memory limit has no effect.
>> >
>> > So it makes sense to charge the compressed RAM to the page's
>> > memory cgroup.
>
>
> While looking at this in the past weeks, I believe that there are two distinct problems:
> 1. Direct zram usage by a process within a cg, i.e. a process writing to a zram device
> 2. Indirect zram usage by a process within a cg via swap (described above)
>
> Both of them probably require different solutions.
> In order to fix #1, zram device usage should be accounted towards a cgroup. IMHO this is something that should be fixed.
>
> Yu Zhao and Yosry are probably much more familiar with the solution to #2.
> WRT per-cgroup-swapfile, to me this is addressing #2, but I agree with Yu Zhao that there are probably better solutions to this.

Thanks Fabian for tagging me.

I am not familiar with #1, so I will speak to #2. Zhongkun, There are
a few parts that I do not understand -- hopefully you can help me out
here:

(1) If I understand correctly in this patch we set the active memcg
trying to charge any pages allocated in a zspage to the current memcg,
yet that zspage will contain multiple compressed object slots, not
just the one used by this memcg. Aren't we overcharging the memcg?
Basically the first memcg that happens to allocate the zspage will pay
for all the objects in this zspage, even after it stops using the
zspage completely?

(2) Patch 3 seems to be charging the compressed slots to the memcgs,
yet this patch is trying to charge the entire zspage. Aren't we double
charging the zspage? I am guessing this isn't happening because (as
Michal pointed out) we are not using __GFP_ACCOUNT here anyway, so
this patch may be NOP, and the actual charging is coming from patch 3
only.

(3) Zswap recently implemented per-memcg charging of compressed
objects in a much simpler way. If your main interest is #2 (which is
what I understand from the commit log), it seems like zswap might be
providing this already? Why can't you use zswap? Is it the fact that
zswap requires a backing swapfile?

Thanks!

>
> Lastly, while this patchset will possibly not address the swap issue (#2) completely, does it satisfy the needs of #1?
>
> - fabian
>
>>
>> We used to do this a long time ago, but we had per-memcg swapfiles [1]
>> to prevent compressed pages from different memcgs from sharing the
>> same zspage.
>>
>> Does this patchset alone suffer from the same problem, i.e., memcgs
>> sharing zspages?
>>
>> [1] https://lwn.net/Articles/592923/
>>

2023-06-16 03:49:12

by Zhongkun He

Subject: Re: [External] Re: [RFC PATCH 1/3] zram: charge the compressed RAM to the page's memcgroup

On Thu, Jun 15, 2023 at 10:21 PM Michal Hocko <[email protected]> wrote:
>
> On Thu 15-06-23 22:13:09, 贺中坤 wrote:
> > > This is not really answering my question though. A memcg under its hard
> > > limit is not really my concern. That is the simpler case. I am not saying
> > > it doesn't need to get addressed, but it is the memcg hard-limited case
> > > that is much more interesting, because your charges are going to fail very
> > > likely and that would mean that swapout would fail AFAIU. If my
> > > understanding is wrong then it would really help to describe that case
> > > much more in the changelog.
> > >
> >
> > OK, got it. In many cases I have tested, it will not fail, because we do
> > not charge the page directly but the objects (like slab objects and
> > compressed pages), since the zspage may be shared by any memcg.
>
> That sounds like a broken design to me.
> --
> Michal Hocko
> SUSE Labs

I will try more cases in different compression ratios and make sure
that swapout will not fail.

2023-06-16 04:51:52

by Zhongkun He

Subject: Re: [External] Re: [RFC PATCH 1/3] zram: charge the compressed RAM to the page's memcgroup

> Thanks Fabian for tagging me.
>
> I am not familiar with #1, so I will speak to #2. Zhongkun, There are
> a few parts that I do not understand -- hopefully you can help me out
> here:
>
> (1) If I understand correctly in this patch we set the active memcg
> trying to charge any pages allocated in a zspage to the current memcg,
> yet that zspage will contain multiple compressed object slots, not
> just the one used by this memcg. Aren't we overcharging the memcg?
> Basically the first memcg that happens to allocate the zspage will pay
> for all the objects in this zspage, even after it stops using the
> zspage completely?

It will not overcharge. As you said below, we are not using
__GFP_ACCOUNT here; the compressed slots are charged to the memcgs.

>
> (2) Patch 3 seems to be charging the compressed slots to the memcgs,
> yet this patch is trying to charge the entire zspage. Aren't we double
> charging the zspage? I am guessing this isn't happening because (as
> Michal pointed out) we are not using __GFP_ACCOUNT here anyway, so
> this patch may be NOP, and the actual charging is coming from patch 3
> only.

Yes, the actual charging is coming from patch 3. This patch just
passes the BIO page's memcg to the current task, which is not the
consumer.

>
> (3) Zswap recently implemented per-memcg charging of compressed
> objects in a much simpler way. If your main interest is #2 (which is
> what I understand from the commit log), it seems like zswap might be
> providing this already? Why can't you use zswap? Is it the fact that
> zswap requires a backing swapfile?

Thanks for your reply and review. Yes, zswap requires a backing
swapfile. The I/O path is very complex and can sometimes throttle the
whole system when some resources are short, so we hope to use zram.

>
> Thanks!
>

2023-06-16 06:53:38

by Michal Hocko

Subject: Re: [External] Re: [RFC PATCH 1/3] zram: charge the compressed RAM to the page's memcgroup

On Fri 16-06-23 11:31:18, 贺中坤 wrote:
> On Thu, Jun 15, 2023 at 10:21 PM Michal Hocko <[email protected]> wrote:
> >
> > On Thu 15-06-23 22:13:09, 贺中坤 wrote:
> > > > This is not really answering my question though. A memcg under its hard
> > > > limit is not really my concern. That is the simpler case. I am not saying
> > > > it doesn't need to get addressed, but it is the memcg hard-limited case
> > > > that is much more interesting, because your charges are going to fail very
> > > > likely and that would mean that swapout would fail AFAIU. If my
> > > > understanding is wrong then it would really help to describe that case
> > > > much more in the changelog.
> > > >
> > >
> > > OK, got it. In many cases I have tested, it will not fail, because we do
> > > not charge the page directly but the objects (like slab objects and
> > > compressed pages), since the zspage may be shared by any memcg.
> >
> > That sounds like a broken design to me.
> > --
> > Michal Hocko
> > SUSE Labs
>
> I will try more cases in different compression ratios and make sure
> that swapout will not fail.

I do not think different compression methods will cut it. You
essentially need some form of memory reserves - in the memcg world a
pre-charged pool of memory readily available for the swapout. Another
way would be to allow the charge to bypass the limit with a guarantee
that this will be a temporal breach.
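
Purely as a hypothetical sketch (this helper does not exist), the
second option could look roughly like:

	/*
	 * HYPOTHETICAL, for illustration only: charge the compressed
	 * object but tolerate a temporary breach of the hard limit. The
	 * swapout that triggered this is about to uncharge a full
	 * uncompressed page from the same memcg, so the breach is
	 * short-lived and bounded.
	 */
	if (memcg_charge_allow_breach(memcg, nr_pages, GFP_NOIO))
		return -ENOMEM;	/* only if even the breach is refused */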

--
Michal Hocko
SUSE Labs

2023-06-16 07:47:54

by Yosry Ahmed

Subject: Re: [External] Re: [RFC PATCH 1/3] zram: charge the compressed RAM to the page's memcgroup

On Thu, Jun 15, 2023 at 9:41 PM 贺中坤 <[email protected]> wrote:
>
> > Thanks Fabian for tagging me.
> >
> > I am not familiar with #1, so I will speak to #2. Zhongkun, There are
> > a few parts that I do not understand -- hopefully you can help me out
> > here:
> >
> > (1) If I understand correctly in this patch we set the active memcg
> > trying to charge any pages allocated in a zspage to the current memcg,
> > yet that zspage will contain multiple compressed object slots, not
> > just the one used by this memcg. Aren't we overcharging the memcg?
> > Basically the first memcg that happens to allocate the zspage will pay
> > for all the objects in this zspage, even after it stops using the
> > zspage completely?
>
> It will not overcharge. As you said below, we are not using
> __GFP_ACCOUNT here; the compressed slots are charged to the memcgs.
>
> >
> > (2) Patch 3 seems to be charging the compressed slots to the memcgs,
> > yet this patch is trying to charge the entire zspage. Aren't we double
> > charging the zspage? I am guessing this isn't happening because (as
> > Michal pointed out) we are not using __GFP_ACCOUNT here anyway, so
> > this patch may be NOP, and the actual charging is coming from patch 3
> > only.
>
> Yes, the actual charging is coming from patch 3. This patch just
> passes the BIO page's memcg to the current task, which is not the
> consumer.
>
> >
> > (3) Zswap recently implemented per-memcg charging of compressed
> > objects in a much simpler way. If your main interest is #2 (which is
> > what I understand from the commit log), it seems like zswap might be
> > providing this already? Why can't you use zswap? Is it the fact that
> > zswap requires a backing swapfile?
>
> Thanks for your reply and review. Yes, zswap requires a backing
> swapfile. The I/O path is very complex and can sometimes throttle the
> whole system when some resources are short, so we hope to use zram.

Is the only problem with zswap for you the requirement of a backing swapfile?

If yes, I am in the early stages of developing a solution to make
zswap work without a backing swapfile. This was discussed in LSF/MM
[1]. Would this make zswap usable for your use case?

[1]https://lore.kernel.org/linux-mm/CAJD7tkYb_sGN8mfGVjr2JxdB8Pz8Td=yj9_sBCMrmsKQo56vTg@mail.gmail.com/

>
> >
> > Thanks!
> >

2023-06-16 08:08:58

by David Hildenbrand

Subject: Re: [External] Re: [RFC PATCH 1/3] zram: charge the compressed RAM to the page's memcgroup

On 16.06.23 09:37, Yosry Ahmed wrote:
> On Thu, Jun 15, 2023 at 9:41 PM 贺中坤 <[email protected]> wrote:
>>
>>> Thanks Fabian for tagging me.
>>>
>>> I am not familiar with #1, so I will speak to #2. Zhongkun, There are
>>> a few parts that I do not understand -- hopefully you can help me out
>>> here:
>>>
>>> (1) If I understand correctly in this patch we set the active memcg
>>> trying to charge any pages allocated in a zspage to the current memcg,
>>> yet that zspage will contain multiple compressed object slots, not
>>> just the one used by this memcg. Aren't we overcharging the memcg?
>>> Basically the first memcg that happens to allocate the zspage will pay
>>> for all the objects in this zspage, even after it stops using the
>>> zspage completely?
>>
>> It will not overcharge. As you said below, we are not using
>> __GFP_ACCOUNT here; the compressed slots are charged to the memcgs.
>>
>>>
>>> (2) Patch 3 seems to be charging the compressed slots to the memcgs,
>>> yet this patch is trying to charge the entire zspage. Aren't we double
>>> charging the zspage? I am guessing this isn't happening because (as
>>> Michal pointed out) we are not using __GFP_ACCOUNT here anyway, so
>>> this patch may be NOP, and the actual charging is coming from patch 3
>>> only.
>>
>> Yes, the actual charging is coming from patch 3. This patch just
>> passes the BIO page's memcg to the current task, which is not the
>> consumer.
>>
>>>
>>> (3) Zswap recently implemented per-memcg charging of compressed
>>> objects in a much simpler way. If your main interest is #2 (which is
>>> what I understand from the commit log), it seems like zswap might be
>>> providing this already? Why can't you use zswap? Is it the fact that
>>> zswap requires a backing swapfile?
>>
>> Thanks for your reply and review. Yes, zswap requires a backing
>> swapfile. The I/O path is very complex and can sometimes throttle the
>> whole system when some resources are short, so we hope to use zram.
>
> Is the only problem with zswap for you the requirement of a backing swapfile?
>
> If yes, I am in the early stages of developing a solution to make
> zswap work without a backing swapfile. This was discussed in LSF/MM
> [1]. Would this make zswap usable for your use case?

Out of curiosity, are there any other known pros/cons when using
zswap-without-swap instead of zram?

I know that zram requires sizing (size of the virtual block device) and
consumes metadata, zswap doesn't.

--
Cheers,

David / dhildenb


2023-06-16 08:34:02

by Yosry Ahmed

Subject: Re: [External] Re: [RFC PATCH 1/3] zram: charge the compressed RAM to the page's memcgroup

On Fri, Jun 16, 2023 at 12:57 AM David Hildenbrand <[email protected]> wrote:
>
> On 16.06.23 09:37, Yosry Ahmed wrote:
> > On Thu, Jun 15, 2023 at 9:41 PM 贺中坤 <[email protected]> wrote:
> >>
> >>> Thanks Fabian for tagging me.
> >>>
> >>> I am not familiar with #1, so I will speak to #2. Zhongkun, There are
> >>> a few parts that I do not understand -- hopefully you can help me out
> >>> here:
> >>>
> >>> (1) If I understand correctly in this patch we set the active memcg
> >>> trying to charge any pages allocated in a zspage to the current memcg,
> >>> yet that zspage will contain multiple compressed object slots, not
> >>> just the one used by this memcg. Aren't we overcharging the memcg?
> >>> Basically the first memcg that happens to allocate the zspage will pay
> >>> for all the objects in this zspage, even after it stops using the
> >>> zspage completely?
> >>
> >> It will not overcharge. As you said below, we are not using
> >> __GFP_ACCOUNT here; the compressed slots are charged to the memcgs.
> >>
> >>>
> >>> (2) Patch 3 seems to be charging the compressed slots to the memcgs,
> >>> yet this patch is trying to charge the entire zspage. Aren't we double
> >>> charging the zspage? I am guessing this isn't happening because (as
> >>> Michal pointed out) we are not using __GFP_ACCOUNT here anyway, so
> >>> this patch may be NOP, and the actual charging is coming from patch 3
> >>> only.
> >>
> >> Yes, the actual charging is coming from patch 3. This patch just
> >> passes the BIO page's memcg to the current task, which is not the
> >> consumer.
> >>
> >>>
> >>> (3) Zswap recently implemented per-memcg charging of compressed
> >>> objects in a much simpler way. If your main interest is #2 (which is
> >>> what I understand from the commit log), it seems like zswap might be
> >>> providing this already? Why can't you use zswap? Is it the fact that
> >>> zswap requires a backing swapfile?
> >>
> >> Thanks for your reply and review. Yes, zswap requires a backing
> >> swapfile. The I/O path is very complex and can sometimes throttle the
> >> whole system when some resources are short, so we hope to use zram.
> >
> > Is the only problem with zswap for you the requirement of a backing swapfile?
> >
> > If yes, I am in the early stages of developing a solution to make
> > zswap work without a backing swapfile. This was discussed in LSF/MM
> > [1]. Would this make zswap usable for your use case?
>
> Out of curiosity, are there any other known pros/cons when using
> zswap-without-swap instead of zram?
>
> I know that zram requires sizing (size of the virtual block device) and
> consumes metadata, zswap doesn't.

We don't use zram in our data centers so I am not an expert about
zram, but off the top of my head there are a few more advantages to
zswap:
(1) Better memcg support (which this series is attempting to address
in zram, although in a much more complicated way).

(2) We internally have incompressible memory handling on top of zswap,
which is something that we would like to upstream when
zswap-without-swap is supported. Basically if a page does not compress
well enough to save memory we reject it from zswap and make it
unevictable (if there is no backing swapfile). The existence of zswap
in the MM layer helps with this. Since zram is a block device from the
MM perspective, it's more difficult to do something like this.
Incompressible pages just sit in zram AFAICT.

(3) Writeback support. If you're running out of memory to store
compressed pages you can add a swapfile at runtime and zswap will
start writing to it, freeing up space to compress more pages. This
wouldn't be possible in the same way in zram. Zram supports writing to
a backing device, but in a more manual way (userspace has to write to
an interface to tell zram to write some pages).

The disadvantage is that zswap doesn't currently support being used
without a swapfile, but once this support exists, I am not sure what
other disadvantages zswap would have compared to zram.

>
> --
> Cheers,
>
> David / dhildenb
>

2023-06-16 08:53:39

by David Hildenbrand

Subject: Re: [External] Re: [RFC PATCH 1/3] zram: charge the compressed RAM to the page's memcgroup

On 16.06.23 10:04, Yosry Ahmed wrote:
> On Fri, Jun 16, 2023 at 12:57 AM David Hildenbrand <[email protected]> wrote:
>>
>> On 16.06.23 09:37, Yosry Ahmed wrote:
>>> On Thu, Jun 15, 2023 at 9:41 PM 贺中坤 <[email protected]> wrote:
>>>>
>>>>> Thanks Fabian for tagging me.
>>>>>
>>>>> I am not familiar with #1, so I will speak to #2. Zhongkun, There are
>>>>> a few parts that I do not understand -- hopefully you can help me out
>>>>> here:
>>>>>
>>>>> (1) If I understand correctly in this patch we set the active memcg
>>>>> trying to charge any pages allocated in a zspage to the current memcg,
>>>>> yet that zspage will contain multiple compressed object slots, not
>>>>> just the one used by this memcg. Aren't we overcharging the memcg?
>>>>> Basically the first memcg that happens to allocate the zspage will pay
>>>>> for all the objects in this zspage, even after it stops using the
>>>>> zspage completely?
>>>>
>>>> It will not overcharge. As you said below, we are not using
>>>> __GFP_ACCOUNT here; the compressed slots are charged to the memcgs.
>>>>
>>>>>
>>>>> (2) Patch 3 seems to be charging the compressed slots to the memcgs,
>>>>> yet this patch is trying to charge the entire zspage. Aren't we double
>>>>> charging the zspage? I am guessing this isn't happening because (as
>>>>> Michal pointed out) we are not using __GFP_ACCOUNT here anyway, so
>>>>> this patch may be NOP, and the actual charging is coming from patch 3
>>>>> only.
>>>>
>>>> Yes, the actual charging is coming from patch 3. This patch just
>>>> passes the BIO page's memcg to the current task, which is not the
>>>> consumer.
>>>>
>>>>>
>>>>> (3) Zswap recently implemented per-memcg charging of compressed
>>>>> objects in a much simpler way. If your main interest is #2 (which is
>>>>> what I understand from the commit log), it seems like zswap might be
>>>>> providing this already? Why can't you use zswap? Is it the fact that
>>>>> zswap requires a backing swapfile?
>>>>
>>>> Thanks for your reply and review. Yes, zswap requires a backing
>>>> swapfile. The I/O path is very complex and can sometimes throttle the
>>>> whole system when some resources are short, so we hope to use zram.
>>>
>>> Is the only problem with zswap for you the requirement of a backing swapfile?
>>>
>>> If yes, I am in the early stages of developing a solution to make
>>> zswap work without a backing swapfile. This was discussed in LSF/MM
>>> [1]. Would this make zswap usable for your use case?
>>
>> Out of curiosity, are there any other known pros/cons when using
>> zswap-without-swap instead of zram?
>>
>> I know that zram requires sizing (size of the virtual block device) and
>> consumes metadata, zswap doesn't.
>
> We don't use zram in our data centers so I am not an expert about
> zram, but off the top of my head there are a few more advantages to
> zswap:

Thanks!

> (1) Better memcg support (which this series is attempting to address
> in zram, although in a much more complicated way).

Right. I think this patch also misses applying the charging in the
recompress case (only triggered by user space IIUC).

>
> (2) We internally have incompressible memory handling on top of zswap,
> which is something that we would like to upstream when
> zswap-without-swap is supported. Basically if a page does not compress
> well enough to save memory we reject it from zswap and make it
> unevictable (if there is no backing swapfile). The existence of zswap
> in the MM layer helps with this. Since zram is a block device from the
> MM perspective, it's more difficult to do something like this.
> Incompressible pages just sit in zram AFAICT.

I see. With ZRAM_HUGE we still have to store the uncompressed page
(because it's a block device and has to hold that data).

>
> (3) Writeback support. If you're running out of memory to store
> compressed pages you can add a swapfile at runtime and zswap will
> start writing to it, freeing up space to compress more pages. This
> wouldn't be possible in the same way in zram. Zram supports writing to
> a backing device, but in a more manual way (userspace has to write to
> an interface to tell zram to write some pages).

Right, that zram backing device stuff is really sub-optimal and only useful
in corner cases (most probably not datacenters).

What one can do with zram is to add a second swap device with lower priority.
Looking at my Fedora machine:

$ cat /proc/swaps
Filename        Type        Size      Used  Priority
/dev/dm-2       partition   16588796  0     -2
/dev/zram0      partition   8388604   0     100


Guess the difference here is that you won't be writing out the compressed
data to the disk, but anything that gets swapped out afterwards will
end up on the disk. I can see how the zswap behavior might be better in that case
(instead of swapping out some additional pages you relocate the
already-swapped-out-to-zswap pages to the disk).

--
Cheers,

David / dhildenb