2021-12-01 10:59:22

by Zhaoyang Huang

Subject: [RFC PATCH] mm: count zram read/write into PSI_IO_WAIT

From: Zhaoyang Huang <[email protected]>

Have zram reading/writing be counted in PSI_IO_WAIT.

Signed-off-by: Zhaoyang Huang <[email protected]>
---
drivers/block/zram/zram_drv.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index fcaf275..b0e4766 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -34,6 +34,7 @@
#include <linux/debugfs.h>
#include <linux/cpuhotplug.h>
#include <linux/part_stat.h>
+#include <linux/psi.h>

#include "zram_drv.h"

@@ -1246,7 +1247,9 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
zram_get_element(zram, index),
bio, partial_io);
}
-
+#ifdef CONFIG_PSI
+ psi_task_change(current, 0, TSK_IOWAIT);
+#endif
handle = zram_get_handle(zram, index);
if (!handle || zram_test_flag(zram, index, ZRAM_SAME)) {
unsigned long value;
@@ -1257,6 +1260,9 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
zram_fill_page(mem, PAGE_SIZE, value);
kunmap_atomic(mem);
zram_slot_unlock(zram, index);
+#ifdef CONFIG_PSI
+ psi_task_change(current, TSK_IOWAIT, 0);
+#endif
return 0;
}

@@ -1284,6 +1290,9 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
if (WARN_ON(ret))
pr_err("Decompression failed! err=%d, page=%u\n", ret, index);

+#ifdef CONFIG_PSI
+ psi_task_change(current, TSK_IOWAIT, 0);
+#endif
return ret;
}

@@ -1471,7 +1480,13 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
vec.bv_offset = 0;
}

+#ifdef CONFIG_PSI
+ psi_task_change(current, 0, TSK_IOWAIT);
+#endif
ret = __zram_bvec_write(zram, &vec, index, bio);
+#ifdef CONFIG_PSI
+ psi_task_change(current, TSK_IOWAIT, 0);
+#endif
out:
if (is_partial_io(bvec))
__free_page(page);
@@ -1607,7 +1622,6 @@ static blk_qc_t zram_submit_bio(struct bio *bio)
atomic64_inc(&zram->stats.invalid_io);
goto error;
}
-
__zram_make_request(zram, bio);
return BLK_QC_T_NONE;

--
1.9.1



2021-12-01 11:12:55

by Zhaoyang Huang

Subject: Re: [RFC PATCH] mm: count zram read/write into PSI_IO_WAIT

Right now zram reads/writes never get counted in PSI_IO_WAIT, because
zram handles the request entirely in the current context without ever
going through submit_bio() and io_schedule().
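
(For context, the path by which iowait normally reaches PSI is roughly
the following; this is a simplified sketch of existing scheduler code,
not part of the patch. A task marks itself in_iowait in io_schedule(),
and psi_dequeue() turns that into TSK_IOWAIT when the task goes to
sleep. zram never hits this path because it compresses/decompresses
synchronously in the caller's context.)

/* kernel/sched/core.c, simplified */
int io_schedule_prepare(void)
{
	int old_iowait = current->in_iowait;

	current->in_iowait = 1;	/* mark the task as waiting for IO */
	/* ... flush any plugged block IO ... */

	return old_iowait;
}

/* kernel/sched/stats.h, simplified (memstall handling omitted) */
static inline void psi_dequeue(struct task_struct *p, bool sleep)
{
	int clear = TSK_RUNNING, set = 0;

	if (static_branch_likely(&psi_disabled))
		return;

	if (sleep && p->in_iowait)
		set |= TSK_IOWAIT;	/* this is what feeds PSI_IO */

	psi_task_change(p, clear, set);
}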

On Wed, Dec 1, 2021 at 6:59 PM Huangzhaoyang <[email protected]> wrote:
>
> From: Zhaoyang Huang <[email protected]>
>
> Have zram reading/writing be counted in PSI_IO_WAIT.
>
> Signed-off-by: Zhaoyang Huang <[email protected]>

2021-12-02 16:28:43

by Johannes Weiner

Subject: Re: [RFC PATCH] mm: count zram read/write into PSI_IO_WAIT

On Wed, Dec 01, 2021 at 07:12:30PM +0800, Zhaoyang Huang wrote:
> Right now zram reads/writes never get counted in PSI_IO_WAIT, because
> zram handles the request entirely in the current context without ever
> going through submit_bio() and io_schedule().

Hm, but you're also not waiting for a real IO device - time during
which the CPU could be doing something else. You're waiting for
decompression. The thread also isn't in D-state during that time. What
scenario would benefit from this accounting? How is IO pressure from
comp/decomp paths actionable to you?

What about when you use zram with disk writeback enabled, and you see
a mix of decompression and actual disk IO? Wouldn't you want to be
able to tell the two apart, to see if you're short on CPU or short on
IO bandwidth in this setup? Your patch would make that impossible.

This needs a much more comprehensive changelog.

> > @@ -1246,7 +1247,9 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
> > zram_get_element(zram, index),
> > bio, partial_io);
> > }
> > -
> > +#ifdef CONFIG_PSI
> > + psi_task_change(current, 0, TSK_IOWAIT);
> > +#endif

Add psi_iostall_enter() and leave() helpers that encapsulate the ifdefs.
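
(A minimal sketch of what such helpers could look like, modeled on the
in-tree psi_memstall_enter()/psi_memstall_leave() pattern; the names
follow the suggestion above and are not an existing kernel API. Note
that the real memstall helpers also take the runqueue lock around
psi_task_change(), which is omitted here to mirror the RFC's bare
calls.)

#ifdef CONFIG_PSI
static inline void psi_iostall_enter(void)
{
	/* flag the current task as stalled on IO */
	psi_task_change(current, 0, TSK_IOWAIT);
}

static inline void psi_iostall_leave(void)
{
	/* clear the iowait state again */
	psi_task_change(current, TSK_IOWAIT, 0);
}
#else
static inline void psi_iostall_enter(void) { }
static inline void psi_iostall_leave(void) { }
#endif

The zram call sites would then shrink to plain enter/leave pairs with
no #ifdef clutter in the driver.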

2021-12-03 09:17:11

by Zhaoyang Huang

Subject: Re: [RFC PATCH] mm: count zram read/write into PSI_IO_WAIT

On Fri, Dec 3, 2021 at 12:28 AM Johannes Weiner <[email protected]> wrote:
>
> On Wed, Dec 01, 2021 at 07:12:30PM +0800, Zhaoyang Huang wrote:
> > Right now zram reads/writes never get counted in PSI_IO_WAIT, because
> > zram handles the request entirely in the current context without ever
> > going through submit_bio() and io_schedule().
>
> Hm, but you're also not waiting for a real IO device - time during
> which the CPU could be doing something else. You're waiting for
> decompression. The thread also isn't in D-state during that time. What
> scenario would benefit from this accounting? How is IO pressure from
> comp/decomp paths actionable to you?
No. Block-device-related D-state time is already counted via
psi_dequeue(io_wait). What I am proposing here is to NOT ignore the
non-productive time caused by huge numbers of in-context swap in/outs
(zram-like). That would make IO pressure more accurate and consistent
with the PSWPIN/PSWPOUT counters. It is similar to how the IO time
spent in filemap_fault->wait_on_page_bit_common is counted into
psi_memstall, which makes IO show up as high memory pressure.
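
(For reference, the pattern being referred to, roughly as it appears in
mm/filemap.c's wait_on_page_bit_common(); a simplified sketch with the
sleep loop and other details elided:)

	unsigned long pflags;
	bool thrashing = false;

	/* a locked, not-uptodate workingset page means we are thrashing */
	if (bit_nr == PG_locked && !PageUptodate(page) &&
	    PageWorkingset(page)) {
		psi_memstall_enter(&pflags);
		thrashing = true;
	}

	/* ... sleep until the page bit clears (refault IO completes) ... */

	if (thrashing)
		psi_memstall_leave(&pflags);
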
>
> What about when you use zram with disk writeback enabled, and you see
> a mix of decompression and actual disk IO? Wouldn't you want to be
> able to tell the two apart, to see if you're short on CPU or short on
> IO bandwidth in this setup? Your patch would make that impossible.
OK. Would it be better to start the IO accounting from pageout()?
Both bdev-backed and RAM-backed swap would benefit from that.
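
(A rough illustration of that idea, reusing the hypothetical
psi_iostall_enter()/leave() helpers sketched earlier around reclaim's
pageout() call in mm/vmscan.c; untested and for discussion only:)

	/* mm/vmscan.c, shrink_page_list(), hypothetical change */
	pageout_t ret;

	psi_iostall_enter();
	ret = pageout(page, mapping);	/* for zram this compresses synchronously */
	psi_iostall_leave();

	/* existing PAGE_KEEP/PAGE_ACTIVATE/PAGE_SUCCESS handling follows */
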
>
> This needs a much more comprehensive changelog.
>
> > > @@ -1246,7 +1247,9 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
> > > zram_get_element(zram, index),
> > > bio, partial_io);
> > > }
> > > -
> > > +#ifdef CONFIG_PSI
> > > + psi_task_change(current, 0, TSK_IOWAIT);
> > > +#endif
>
> Add psi_iostall_enter() and leave helpers that encapsulate the ifdefs.
OK.

2021-12-08 16:38:41

by Johannes Weiner

Subject: Re: [RFC PATCH] mm: count zram read/write into PSI_IO_WAIT

On Fri, Dec 03, 2021 at 05:16:47PM +0800, Zhaoyang Huang wrote:
> On Fri, Dec 3, 2021 at 12:28 AM Johannes Weiner <[email protected]> wrote:
> >
> > On Wed, Dec 01, 2021 at 07:12:30PM +0800, Zhaoyang Huang wrote:
> > > Right now zram reads/writes never get counted in PSI_IO_WAIT, because
> > > zram handles the request entirely in the current context without ever
> > > going through submit_bio() and io_schedule().
> >
> > Hm, but you're also not waiting for a real IO device - time during
> > which the CPU could be doing something else. You're waiting for
> > decompression. The thread also isn't in D-state during that time. What
> > scenario would benefit from this accounting? How is IO pressure from
> > comp/decomp paths actionable to you?
> No. Block-device-related D-state time is already counted via
> psi_dequeue(io_wait). What I am proposing here is to NOT ignore the
> non-productive time caused by huge numbers of in-context swap in/outs
> (zram-like). That would make IO pressure more accurate and consistent
> with the PSWPIN/PSWPOUT counters. It is similar to how the IO time
> spent in filemap_fault->wait_on_page_bit_common is counted into
> psi_memstall, which makes IO show up as high memory pressure.

It's not ignored, it shows up as memory pressure. Because those delays
occur due to a lack of memory.

On the other hand, having a faster IO device would make no difference
to the time spent on compression and decompression. Counting this time
as IO pressure makes no sense to me.

I'm against merging this patch.

2021-12-08 17:45:27

by Chris Down

Subject: Re: [RFC PATCH] mm: count zram read/write into PSI_IO_WAIT

Zhaoyang Huang writes:
>No. Block-device-related D-state time is already counted via
>psi_dequeue(io_wait). What I am proposing here is to NOT ignore the
>non-productive time caused by huge numbers of in-context swap in/outs
>(zram-like). That would make IO pressure more accurate and consistent
>with the PSWPIN/PSWPOUT counters. It is similar to how the IO time
>spent in filemap_fault->wait_on_page_bit_common is counted into
>psi_memstall, which makes IO show up as high memory pressure.

I think part of the confusion here is that the name "io" doesn't really just
mean "io", it means "disk I/O". As in, we are targeting real, physical or
network disk I/O. Of course, we can only do what's reasonable if the device
we're accounting for is layers upon layers eventually leading to a
memory-backed device, but _intentionally_ polluting that with more memory-bound
accesses doesn't make any sense when we already have separate accounting for
memory. Why would anyone want that?

I'm with Johannes here, I think this would actively make memory pressure
monitoring less useful. This is a NAK from my perspective as someone who
actually uses these things in production.

2021-12-08 17:47:05

by Chris Down

Subject: Re: [RFC PATCH] mm: count zram read/write into PSI_IO_WAIT

Chris Down writes:
>I'm with Johannes here, I think this would actively make memory
>pressure monitoring less useful. This is a NAK from my perspective as
>someone who actually uses these things in production.

s/memory/io/