From: Zhaoyang Huang <[email protected]>
It doesn't make sense to count IO time into psi memstall. Bail out after
bio submitted.
Signed-off-by: Zhaoyang Huang <[email protected]>
---
mm/page_io.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/page_io.c b/mm/page_io.c
index c493ce9..1d131fc 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -423,6 +423,7 @@ int swap_readpage(struct page *page, bool synchronous)
 	count_vm_event(PSWPIN);
 	bio_get(bio);
 	qc = submit_bio(bio);
+	psi_memstall_leave(&pflags);
 	while (synchronous) {
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		if (!READ_ONCE(bio->bi_private))
@@ -433,7 +434,7 @@ int swap_readpage(struct page *page, bool synchronous)
 	}
 	__set_current_state(TASK_RUNNING);
 	bio_put(bio);
-
+	return ret;
 out:
 	psi_memstall_leave(&pflags);
 	return ret;
--
1.9.1
On 9/7/21 13:59, Huangzhaoyang wrote:
> From: Zhaoyang Huang <[email protected]>
>
> It doesn't make sense to count IO time into psi memstall. Bail out after
> bio submitted.
Isn't that the point of psi, to observe real stalls, which include IO?
Anyway, CCing Johannes.
On Tue, Sep 7, 2021 at 8:03 PM Vlastimil Babka <[email protected]> wrote:
>
> On 9/7/21 13:59, Huangzhaoyang wrote:
> > From: Zhaoyang Huang <[email protected]>
> >
> > It doesn't make sense to count IO time into psi memstall. Bail out after
> > bio submitted.
>
> Isn't that the point of psi, to observe real stalls, which include IO?
> Anyway, CCing Johannes.
IO stalls can be observed within blk_io_schedule(). What is counted
here is the time it takes to move the data from the block device to RAM.
The original purpose is to handle ZRAM-like devices, which process the
bio locally instead of submitting it to a request queue.
On Tue, Sep 07, 2021 at 08:15:30PM +0800, Zhaoyang Huang wrote:
> On Tue, Sep 7, 2021 at 8:03 PM Vlastimil Babka <[email protected]> wrote:
> >
> > On 9/7/21 13:59, Huangzhaoyang wrote:
> > > From: Zhaoyang Huang <[email protected]>
> > >
> > > It doesn't make sense to count IO time into psi memstall. Bail out after
> > > bio submitted.
> >
> > Isn't that the point of psi, to observe real stalls, which include IO?
Yes, correct.
> IO stalls can be observed within blk_io_schedule(). What is counted
> here is the time it takes to move the data from the block device to RAM.
Yes, that is on purpose. The time a thread waits for swap read IO is
time in which the thread is not productive due to a lack of memory.
For async-submitted IO, this happens in lock_page() called from
do_swap_page(). If the submitting thread directly waits after the
submit_bio(), then that should be accounted too.
This patch doesn't make sense to me.
On Tue, Sep 7, 2021 at 9:24 PM Johannes Weiner <[email protected]> wrote:
>
> On Tue, Sep 07, 2021 at 08:15:30PM +0800, Zhaoyang Huang wrote:
> > On Tue, Sep 7, 2021 at 8:03 PM Vlastimil Babka <[email protected]> wrote:
> > >
> > > On 9/7/21 13:59, Huangzhaoyang wrote:
> > > > From: Zhaoyang Huang <[email protected]>
> > > >
> > > > It doesn't make sense to count IO time into psi memstall. Bail out after
> > > > bio submitted.
> > >
> > > Isn't that the point of psi, to observe real stalls, which include IO?
>
> Yes, correct.
>
> > IO stalls can be observed within blk_io_schedule(). What is counted
> > here is the time it takes to move the data from the block device to RAM.
>
> Yes, that is on purpose. The time a thread waits for swap read IO is
> time in which the thread is not productive due to a lack of memory.
>
> For async-submitted IO, this happens in lock_page() called from
> do_swap_page(). If the submitting thread directly waits after the
> submit_bio(), then that should be accounted too.
IMO, memstall accounting should end once the bio has been submitted. The
time the blk driver takes to fetch the request and the operation on the
real device shouldn't be counted. It makes even less sense in a
virtualized system such as Xen, where the blk driver is split into a
frontend and a backend, which introduces latency unrelated to memory.
>
> This patch doesn't make sense to me.
On Wed, Sep 08, 2021 at 11:35:40AM +0800, Zhaoyang Huang wrote:
> On Tue, Sep 7, 2021 at 9:24 PM Johannes Weiner <[email protected]> wrote:
> >
> > On Tue, Sep 07, 2021 at 08:15:30PM +0800, Zhaoyang Huang wrote:
> > > On Tue, Sep 7, 2021 at 8:03 PM Vlastimil Babka <[email protected]> wrote:
> > > >
> > > > On 9/7/21 13:59, Huangzhaoyang wrote:
> > > > > From: Zhaoyang Huang <[email protected]>
> > > > >
> > > > > It doesn't make sense to count IO time into psi memstall. Bail out after
> > > > > bio submitted.
> > > >
> > > > Isn't that the point of psi, to observe real stalls, which include IO?
> >
> > Yes, correct.
> >
> > > IO stalls can be observed within blk_io_schedule(). What is counted
> > > here is the time it takes to move the data from the block device to RAM.
> >
> > Yes, that is on purpose. The time a thread waits for swap read IO is
> > time in which the thread is not productive due to a lack of memory.
> >
> > For async-submitted IO, this happens in lock_page() called from
> > do_swap_page(). If the submitting thread directly waits after the
> > submit_bio(), then that should be accounted too.
> IMO, memstall accounting should end once the bio has been submitted. The
> time the blk driver takes to fetch the request and the operation on the
> real device shouldn't be counted. It makes even less sense in a
> virtualized system such as Xen, where the blk driver is split into a
> frontend and a backend, which introduces latency unrelated to memory.
Yes, but the entire IO operation and all the associated latency only
happen due to a shortage of memory in the first place. The thread is
incurring these delays due to a lack of memory.
What is a memstall if not the latencies and wait times incurred in the
process of reloading pages that were evicted prematurely?
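To make the disagreement concrete, here is a minimal user-space sketch of
the accounting window for the synchronous case. It is not kernel code. It
assumes the memstall section is opened earlier in swap_readpage() and
compares closing it at the out: label (current code) with closing it right
after submit_bio() (the proposed patch). The struct, enum, function names
and timestamps are all hypothetical and only illustrate the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one synchronous swap-in; not a kernel structure. */
struct swapin_timeline {
	uint64_t enter_ns;     /* memstall section opened in swap_readpage() */
	uint64_t submitted_ns; /* submit_bio() has returned */
	uint64_t completed_ns; /* synchronous wait loop has finished */
};

/* Where the memstall section is closed (illustrative names only). */
enum leave_point {
	LEAVE_AT_OUT_LABEL, /* current code: after the wait loop */
	LEAVE_AFTER_SUBMIT, /* proposed patch: right after submit_bio() */
};

/* Time reported as memory stall for the given placement of the leave call. */
uint64_t memstall_ns(const struct swapin_timeline *t, enum leave_point where)
{
	/* The timeline must be ordered: enter <= submitted <= completed. */
	assert(t->enter_ns <= t->submitted_ns);
	assert(t->submitted_ns <= t->completed_ns);

	uint64_t leave_ns = (where == LEAVE_AFTER_SUBMIT) ? t->submitted_ns
							  : t->completed_ns;
	return leave_ns - t->enter_ns;
}

/* Time the task spent on the swap read that is not reported as memstall. */
uint64_t unaccounted_wait_ns(const struct swapin_timeline *t,
			     enum leave_point where)
{
	return (t->completed_ns - t->enter_ns) - memstall_ns(t, where);
}
```

For a hypothetical timeline with the section opened at 1000 ns, submit_bio()
returning at 1200 ns and the read completing at 5200 ns, the current
placement reports 4200 ns of memstall. The patched placement reports 200 ns
and leaves 4000 ns of waiting unreported, even though the thread was blocked
on a page that had to be read back from swap.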