2022-01-02 03:31:51

by DaeRo Lee

[permalink] [raw]
Subject: [PATCH] mm/vmscan.c: no need to double-check if free pages are under high-watermark

From: Daero Lee <[email protected]>

In kswapd_try_to_sleep function, to check whether kswapd can sleep,
the prepare_kswapd_sleep function is called twice.

If free pages are below high-watermark in the first call,
the @remaining variable is not updated at 0 and the
prepare_kswapd_sleep function is called for the second time.

I think it is necessary to set the initial value of the
@remaining to a non-zero value to prevent consecutive calls
to the same function.

Signed-off-by: Daero Lee <[email protected]>
---
mm/vmscan.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 700434db5735..1217ecec5bbb 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4331,7 +4331,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
/*
* Return the order kswapd stopped reclaiming at as
* prepare_kswapd_sleep() takes it into account. If another caller
- * entered the allocator slow path while kswapd was awake, order will
+ * entered the allqocator slow path while kswapd was awake, order will
* remain at the higher level.
*/
return sc.order;
@@ -4355,7 +4355,7 @@ static enum zone_type kswapd_highest_zoneidx(pg_data_t *pgdat,
static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order,
unsigned int highest_zoneidx)
{
- long remaining = 0;
+ long remaining = ~0;
DEFINE_WAIT(wait);

if (freezing(current) || kthread_should_stop())
--
2.25.1



2022-01-06 00:17:13

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] mm/vmscan.c: no need to double-check if free pages are under high-watermark

On Sun, 2 Jan 2022 12:31:29 +0900 [email protected] wrote:

> From: Daero Lee <[email protected]>
>
> In kswapd_try_to_sleep function, to check whether kswapd can sleep,
> the prepare_kswapd_sleep function is called twice.
>
> If free pages are below high-watermark in the first call,
> the @remaining variable is not updated at 0 and the
> prepare_kswapd_sleep function is called for the second time.
>
> I think it is necessary to set the initial value of the
> @remaining to a non-zero value to prevent consecutive calls
> to the same function.

Seems right. Mel & Joonsoo, could you please check?

> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4331,7 +4331,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
> /*
> * Return the order kswapd stopped reclaiming at as
> * prepare_kswapd_sleep() takes it into account. If another caller
> - * entered the allocator slow path while kswapd was awake, order will
> + * entered the allqocator slow path while kswapd was awake, order will

whoops.

> * remain at the higher level.
> */
> return sc.order;
> @@ -4355,7 +4355,7 @@ static enum zone_type kswapd_highest_zoneidx(pg_data_t *pgdat,
> static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order,
> unsigned int highest_zoneidx)
> {
> - long remaining = 0;
> + long remaining = ~0;
> DEFINE_WAIT(wait);
>
> if (freezing(current) || kthread_should_stop())


2022-01-06 09:46:57

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH] mm/vmscan.c: no need to double-check if free pages are under high-watermark

On Sun, Jan 02, 2022 at 12:31:29PM +0900, [email protected] wrote:
> From: Daero Lee <[email protected]>
>
> In kswapd_try_to_sleep function, to check whether kswapd can sleep,
> the prepare_kswapd_sleep function is called twice.
>
> If free pages are below high-watermark in the first call,
> the @remaining variable is not updated at 0 and the
> prepare_kswapd_sleep function is called for the second time.
>
> I think it is necessary to set the initial value of the
> @remaining to a non-zero value to prevent consecutive calls
> to the same function.
>
> Signed-off-by: Daero Lee <[email protected]>
> ---
> mm/vmscan.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 700434db5735..1217ecec5bbb 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4331,7 +4331,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
> /*
> * Return the order kswapd stopped reclaiming at as
> * prepare_kswapd_sleep() takes it into account. If another caller
> - * entered the allocator slow path while kswapd was awake, order will
> + * entered the allqocator slow path while kswapd was awake, order will
> * remain at the higher level.
> */
> return sc.order;

This hunk just adds a typo, drop it.

> @@ -4355,7 +4355,7 @@ static enum zone_type kswapd_highest_zoneidx(pg_data_t *pgdat,
> static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order,
> unsigned int highest_zoneidx)
> {
> - long remaining = 0;
> + long remaining = ~0;
> DEFINE_WAIT(wait);
>
> if (freezing(current) || kthread_should_stop())

While this does avoid calling prepare_kswapd_sleep() twice if the pgdat
is balanced on the first try, it then does not restore the vmstat
thresholds and doesn't call schedul() for kswapd to go to sleep.

I think you did spot a problem but I suspect you want something like
the following untested patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 700434db5735..40784693c840 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4355,7 +4355,8 @@ static enum zone_type kswapd_highest_zoneidx(pg_data_t *pgdat,
static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order,
unsigned int highest_zoneidx)
{
- long remaining = 0;
+ long remaining;
+ bool balanced;
DEFINE_WAIT(wait);

if (freezing(current) || kthread_should_stop())
@@ -4370,7 +4371,8 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
* eligible zone balanced that it's also unlikely that compaction will
* succeed.
*/
- if (prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx)) {
+ balanced = prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx);
+ if (balanced) {
/*
* Compaction records what page blocks it recently failed to
* isolate pages from and skips them in the future scanning.
@@ -4387,6 +4389,10 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o

remaining = schedule_timeout(HZ/10);

+ /* Is pgdat balanced after a short sleep? */
+ balanced = prepare_kswapd_sleep(pgdat, reclaim_order,
+ highest_zoneidx);
+
/*
* If woken prematurely then reset kswapd_highest_zoneidx and
* order. The values will either be from a wakeup request or
@@ -4406,11 +4412,11 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
}

/*
- * After a short sleep, check if it was a premature sleep. If not, then
- * go fully to sleep until explicitly woken up.
+ * If balanced to the high watermark, restore vmstat thresholds and
+ * kswapd goes to sleep. If kswapd remains awake, account whether
+ * the low or high watermark was hit quickly.
*/
- if (!remaining &&
- prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx)) {
+ if (balanced) {
trace_mm_vmscan_kswapd_sleep(pgdat->node_id);

/*

2022-01-06 12:03:51

by DaeRo Lee

[permalink] [raw]
Subject: Re: [PATCH] mm/vmscan.c: no need to double-check if free pages are under high-watermark

> > @@ -4355,7 +4355,7 @@ static enum zone_type kswapd_highest_zoneidx(pg_data_t *pgdat,
> > static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order,
> > unsigned int highest_zoneidx)
> > {
> > - long remaining = 0;
> > + long remaining = ~0;
> > DEFINE_WAIT(wait);
> >
> > if (freezing(current) || kthread_should_stop())
>
> While this does avoid calling prepare_kswapd_sleep() twice if the pgdat
> is balanced on the first try, it then does not restore the vmstat
> thresholds and doesn't call schedul() for kswapd to go to sleep.

I intended not to call prepare_kswapd_sleep() twice when the pgdat is NOT
balanced on the first try:)

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 700434db5735..40784693c840 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4355,7 +4355,8 @@ static enum zone_type kswapd_highest_zoneidx(pg_data_t *pgdat,
> static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order,
> unsigned int highest_zoneidx)
> {
> - long remaining = 0;
> + long remaining;
> + bool balanced;
> DEFINE_WAIT(wait);
>
> if (freezing(current) || kthread_should_stop())
> @@ -4370,7 +4371,8 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
> * eligible zone balanced that it's also unlikely that compaction will
> * succeed.
> */
> - if (prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx)) {
> + balanced = prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx);
> + if (balanced) {
> /*
> * Compaction records what page blocks it recently failed to
> * isolate pages from and skips them in the future scanning.
> @@ -4387,6 +4389,10 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
>
> remaining = schedule_timeout(HZ/10);
>
> + /* Is pgdat balanced after a short sleep? */
> + balanced = prepare_kswapd_sleep(pgdat, reclaim_order,
> + highest_zoneidx);
> +
> /*
> * If woken prematurely then reset kswapd_highest_zoneidx and
> * order. The values will either be from a wakeup request or
> @@ -4406,11 +4412,11 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
> }
>
> /*
> - * After a short sleep, check if it was a premature sleep. If not, then
> - * go fully to sleep until explicitly woken up.
> + * If balanced to the high watermark, restore vmstat thresholds and
> + * kswapd goes to sleep. If kswapd remains awake, account whether
> + * the low or high watermark was hit quickly.
> */
> - if (!remaining &&
> - prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx)) {
> + if (balanced) {
> trace_mm_vmscan_kswapd_sleep(pgdat->node_id);
>
> /*

But, I think what you did is more readable and nice.
Thanks!

> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 700434db5735..1217ecec5bbb 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -4331,7 +4331,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
> > /*
> > * Return the order kswapd stopped reclaiming at as
> > * prepare_kswapd_sleep() takes it into account. If another caller
> > - * entered the allocator slow path while kswapd was awake, order will
> > + * entered the allqocator slow path while kswapd was awake, order will
> > * remain at the higher level.
> > */
> > return sc.order;
>
> This hunk just adds a typo, drop it.

Sorry about that;;

2022-01-06 12:58:03

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH] mm/vmscan.c: no need to double-check if free pages are under high-watermark

On Thu, Jan 06, 2022 at 09:03:34PM +0900, DaeRo Lee wrote:
> > > @@ -4355,7 +4355,7 @@ static enum zone_type kswapd_highest_zoneidx(pg_data_t *pgdat,
> > > static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order,
> > > unsigned int highest_zoneidx)
> > > {
> > > - long remaining = 0;
> > > + long remaining = ~0;
> > > DEFINE_WAIT(wait);
> > >
> > > if (freezing(current) || kthread_should_stop())
> >
> > While this does avoid calling prepare_kswapd_sleep() twice if the pgdat
> > is balanced on the first try, it then does not restore the vmstat
> > thresholds and doesn't call schedul() for kswapd to go to sleep.
>
> I intended not to call prepare_kswapd_sleep() twice when the pgdat is NOT
> balanced on the first try:)
>

Stupid typo on my part.

> > @@ -4406,11 +4412,11 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
> > }
> >
> > /*
> > - * After a short sleep, check if it was a premature sleep. If not, then
> > - * go fully to sleep until explicitly woken up.
> > + * If balanced to the high watermark, restore vmstat thresholds and
> > + * kswapd goes to sleep. If kswapd remains awake, account whether
> > + * the low or high watermark was hit quickly.
> > */
> > - if (!remaining &&
> > - prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx)) {
> > + if (balanced) {
> > trace_mm_vmscan_kswapd_sleep(pgdat->node_id);
> >
> > /*
>
> But, I think what you did is more readable and nice.
> Thanks!
>

Feel free to pick it up, rerun your tests to ensure it's behaving as
expected and resend! Include something in the changelog about user-visible
effects if any (or a note saying that it reduces unnecssary overhead)
and resend with me added to the cc.

> > > @@ -4331,7 +4331,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
> > > /*
> > > * Return the order kswapd stopped reclaiming at as
> > > * prepare_kswapd_sleep() takes it into account. If another caller
> > > - * entered the allocator slow path while kswapd was awake, order will
> > > + * entered the allqocator slow path while kswapd was awake, order will
> > > * remain at the higher level.
> > > */
> > > return sc.order;
> >
> > This hunk just adds a typo, drop it.
>
> Sorry about that;;

No need to be sorry, it happens :)

Thanks!

--
Mel Gorman
SUSE Labs

2022-03-26 17:39:20

by Wei Yang

[permalink] [raw]
Subject: Re: [PATCH] mm/vmscan.c: no need to double-check if free pages are under high-watermark

On Thu, Jan 06, 2022 at 12:57:58PM +0000, Mel Gorman wrote:
>On Thu, Jan 06, 2022 at 09:03:34PM +0900, DaeRo Lee wrote:
>> > > @@ -4355,7 +4355,7 @@ static enum zone_type kswapd_highest_zoneidx(pg_data_t *pgdat,
>> > > static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order,
>> > > unsigned int highest_zoneidx)
>> > > {
>> > > - long remaining = 0;
>> > > + long remaining = ~0;
>> > > DEFINE_WAIT(wait);
>> > >
>> > > if (freezing(current) || kthread_should_stop())
>> >
>> > While this does avoid calling prepare_kswapd_sleep() twice if the pgdat
>> > is balanced on the first try, it then does not restore the vmstat
>> > thresholds and doesn't call schedul() for kswapd to go to sleep.
>>
>> I intended not to call prepare_kswapd_sleep() twice when the pgdat is NOT
>> balanced on the first try:)
>>
>
>Stupid typo on my part.
>
>> > @@ -4406,11 +4412,11 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
>> > }
>> >
>> > /*
>> > - * After a short sleep, check if it was a premature sleep. If not, then
>> > - * go fully to sleep until explicitly woken up.
>> > + * If balanced to the high watermark, restore vmstat thresholds and
>> > + * kswapd goes to sleep. If kswapd remains awake, account whether
>> > + * the low or high watermark was hit quickly.
>> > */
>> > - if (!remaining &&
>> > - prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx)) {
>> > + if (balanced) {
>> > trace_mm_vmscan_kswapd_sleep(pgdat->node_id);
>> >
>> > /*
>>
>> But, I think what you did is more readable and nice.
>> Thanks!
>>
>
>Feel free to pick it up, rerun your tests to ensure it's behaving as
>expected and resend! Include something in the changelog about user-visible
>effects if any (or a note saying that it reduces unnecssary overhead)
>and resend with me added to the cc.
>

Hi, All

Seems this thread stops here. I don't see following patch and current upstream
doesn't include this change.

May I continue this? Of course, with author-ship from DaeRo Lee <[email protected]>.

Mel,

Would you mind suggesting some cases that I could do to see the effects from
this change? Such as the overhead or throughput? Or what cases you expect?


--
Wei Yang
Help you, Help me

2022-04-07 20:42:34

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH] mm/vmscan.c: no need to double-check if free pages are under high-watermark

On Sat, Mar 26, 2022 at 03:50:22PM +0000, Wei Yang wrote:
> On Thu, Jan 06, 2022 at 12:57:58PM +0000, Mel Gorman wrote:
> >On Thu, Jan 06, 2022 at 09:03:34PM +0900, DaeRo Lee wrote:
> >> > > @@ -4355,7 +4355,7 @@ static enum zone_type kswapd_highest_zoneidx(pg_data_t *pgdat,
> >> > > static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order,
> >> > > unsigned int highest_zoneidx)
> >> > > {
> >> > > - long remaining = 0;
> >> > > + long remaining = ~0;
> >> > > DEFINE_WAIT(wait);
> >> > >
> >> > > if (freezing(current) || kthread_should_stop())
> >> >
> >> > While this does avoid calling prepare_kswapd_sleep() twice if the pgdat
> >> > is balanced on the first try, it then does not restore the vmstat
> >> > thresholds and doesn't call schedul() for kswapd to go to sleep.
> >>
> >> I intended not to call prepare_kswapd_sleep() twice when the pgdat is NOT
> >> balanced on the first try:)
> >>
> >
> >Stupid typo on my part.
> >
> >> > @@ -4406,11 +4412,11 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
> >> > }
> >> >
> >> > /*
> >> > - * After a short sleep, check if it was a premature sleep. If not, then
> >> > - * go fully to sleep until explicitly woken up.
> >> > + * If balanced to the high watermark, restore vmstat thresholds and
> >> > + * kswapd goes to sleep. If kswapd remains awake, account whether
> >> > + * the low or high watermark was hit quickly.
> >> > */
> >> > - if (!remaining &&
> >> > - prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx)) {
> >> > + if (balanced) {
> >> > trace_mm_vmscan_kswapd_sleep(pgdat->node_id);
> >> >
> >> > /*
> >>
> >> But, I think what you did is more readable and nice.
> >> Thanks!
> >>
> >
> >Feel free to pick it up, rerun your tests to ensure it's behaving as
> >expected and resend! Include something in the changelog about user-visible
> >effects if any (or a note saying that it reduces unnecssary overhead)
> >and resend with me added to the cc.
> >
>
> Hi, All
>
> Seems this thread stops here. I don't see following patch and current upstream
> doesn't include this change.
>
> May I continue this? Of course, with author-ship from DaeRo Lee <[email protected]>.
>

I've no objections. When I said "Feel free to pick it up", I meant that
I was ok with you taking the patch and putting your team on it.

> Mel,
>
> Would you mind suggesting some cases that I could do to see the effects from
> this change? Such as the overhead or throughput? Or what cases you expect?
>

I don't have any suggestions on artificially triggering it. I had assumed
you had encountered the bug in practice and had a test case but it would
be ok to note that the patch is a theoretical fix based on code review.

--
Mel Gorman
SUSE Labs