2023-06-28 08:45:13

by Yin, Fengwei

Subject: [PATCH v2] readahead: Correct the start and size in ondemand_readahead()

Commit
9425c591e06a ("page cache: fix page_cache_next/prev_miss off by one")
updated page_cache_next_miss() to return an index beyond the
range when there are no gaps in the range.

But it breaks the start/size of ra in ondemand_readahead() because
the off-by-one offset is accumulated into readahead_index. As a
consequence, a suboptimal readahead order is picked.

Tracing of the order parameter of filemap_alloc_folio() showed:
page order :    count   distribution
         0 :   892073 |                                        |
         1 :        0 |                                        |
         2 : 65120457 |****************************************|
         3 : 32914005 |********************                    |
         4 : 33020991 |********************                    |
with 9425c591e06a9.

With parent commit:
page order :    count   distribution
         0 :  3417288 |****                                    |
         1 :        0 |                                        |
         2 :   877012 |*                                       |
         3 :      288 |                                        |
         4 :  5607522 |*******                                 |
         5 : 29974228 |****************************************|

Fix the issue by removing the off-by-one offset when
page_cache_next_miss() finds no gaps in the range.

After the fix:
page order :    count   distribution
         0 :  2598561 |***                                     |
         1 :        0 |                                        |
         2 :   687739 |                                        |
         3 :      288 |                                        |
         4 :   207210 |                                        |
         5 : 32628260 |****************************************|

Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Fixes: 9425c591e06a ("page cache: fix page_cache_next/prev_miss off by one")
Signed-off-by: Yin Fengwei <[email protected]>
---
Changes from v1:
- Only remove the off-by-one offset when no gaps are found by
  page_cache_next_miss()
- Update commit message to include the histogram of page order
  after the fix

mm/readahead.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 47afbca1d122..a93af773686f 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -614,9 +614,17 @@ static void ondemand_readahead(struct readahead_control *ractl,
 				max_pages);
 		rcu_read_unlock();
 
-		if (!start || start - index > max_pages)
+		if (!start || start - index - 1 > max_pages)
 			return;
 
+		/*
+		 * If no gaps in the range, page_cache_next_miss() returns
+		 * index beyond range. Adjust it back to make sure
+		 * ractl->_index is updated correctly later.
+		 */
+		if ((start - index - 1) == max_pages)
+			start--;
+
 		ra->start = start;
 		ra->size = start - index;	/* old async_size */
 		ra->size += req_size;
--
2.39.2
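
The arithmetic behind the two checks in the hunk above can be tried out
in userspace. The following is only a minimal sketch, not kernel code:
toy_next_miss() is a hypothetical stand-in for page_cache_next_miss()
that scans a plain bool array and, as the commit message describes for
9425c591e06a, returns the index just beyond the range when it finds no
gap. old_check_bails() and patched_check() are illustrative names for
the conditions before and after the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy stand-in for page_cache_next_miss() (NOT the kernel function).
 * Scans max_scan slots starting at index and returns the first missing
 * one. If every scanned slot is present, it returns the index just
 * beyond the scanned range, which is the behavior the commit message
 * attributes to 9425c591e06a. Wrap-around handling is omitted.
 */
static unsigned long toy_next_miss(const bool *present, size_t nr_slots,
				   unsigned long index, unsigned long max_scan)
{
	unsigned long i;

	assert(present != NULL);
	for (i = index; i < index + max_scan; i++) {
		/* Slots past the end of the array count as missing. */
		if (i >= nr_slots || !present[i])
			return i;
	}
	return index + max_scan;	/* no gap: one past the range */
}

/* Check before the patch: true means the function would return early. */
static bool old_check_bails(unsigned long start, unsigned long index,
			    unsigned long max_pages)
{
	return !start || start - index > max_pages;
}

/*
 * Check and adjustment from the patch. Returns false for the
 * early-return case; otherwise *start holds the value that would be
 * stored in ra->start.
 */
static bool patched_check(unsigned long *start, unsigned long index,
			  unsigned long max_pages)
{
	if (!*start || *start - index - 1 > max_pages)
		return false;
	if (*start - index - 1 == max_pages)
		(*start)--;	/* undo the "beyond range" offset */
	return true;
}
```

With index = 2, max_pages = 8 and every slot present, calling
toy_next_miss() with index + 1 (as the caller in the hunk does) gives
11. The old check then returns early because 11 - 2 > 8, while the
patched check accepts the value and moves start back to 10, so
start - index equals max_pages. When a gap exists inside the range,
both checks accept it and start is left unchanged.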



2023-07-03 18:56:57

by Mike Kravetz

Subject: Re: [PATCH v2] readahead: Correct the start and size in ondemand_readahead()

On 06/28/23 12:43, Yin Fengwei wrote:
> The commit
> 9425c591e06a ("page cache: fix page_cache_next/prev_miss off by one")
> updated the page_cache_next_miss() to return the index beyond
> range.
>
> But it breaks the start/size of ra in ondemand_readahead() because
> the offset by one is accumulated to readahead_index. As a consequence,
> not best readahead order is picked.
>
> Tracing of the order parameter of filemap_alloc_folio() showed:
> page order : count distribution
> 0 : 892073 | |
> 1 : 0 | |
> 2 : 65120457 |****************************************|
> 3 : 32914005 |******************** |
> 4 : 33020991 |******************** |
> with 9425c591e06a9.
>
> With parent commit:
> page order : count distribution
> 0 : 3417288 |**** |
> 1 : 0 | |
> 2 : 877012 |* |
> 3 : 288 | |
> 4 : 5607522 |******* |
> 5 : 29974228 |****************************************|
>
> Fix the issue by removing the offset by one when page_cache_next_miss()
> returns no gaps in the range.
>
> After the fix:
> page order : count distribution
> 0 : 2598561 |*** |
> 1 : 0 | |
> 2 : 687739 | |
> 3 : 288 | |
> 4 : 207210 | |
> 5 : 32628260 |****************************************|
>

Thank you for your detailed analysis!

When the regression was initially discovered, I sent a patch to revert
commit 9425c591e06a. Andrew has picked up this change. And, Andrew has
also picked up this patch.

I have not verified yet, but I suspect that this patch is going to cause
a regression because it depends on the behavior of page_cache_next_miss
in 9425c591e06a which has been reverted.

Sorry for the delay in responding as I was traveling.
--
Mike Kravetz



> Reported-by: kernel test robot <[email protected]>
> Closes: https://lore.kernel.org/oe-lkp/[email protected]
> Fixes: 9425c591e06a ("page cache: fix page_cache_next/prev_miss off by one")
> Signed-off-by: Yin Fengwei <[email protected]>
> ---
> Changes from v1:
> - only removing offset by one when there is no gaps found by
> page_cache_next_miss()
> - Update commit message to include the histogram of page order
> after fix
>
> mm/readahead.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 47afbca1d122..a93af773686f 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -614,9 +614,17 @@ static void ondemand_readahead(struct readahead_control *ractl,
> max_pages);
> rcu_read_unlock();
>
> - if (!start || start - index > max_pages)
> + if (!start || start - index - 1 > max_pages)
> return;
>
> + /*
> + * If no gaps in the range, page_cache_next_miss() returns
> + * index beyond range. Adjust it back to make sure
> + * ractl->_index is updated correctly later.
> + */
> + if ((start - index - 1) == max_pages)
> + start--;
> +
> ra->start = start;
> ra->size = start - index; /* old async_size */
> ra->size += req_size;
> --
> 2.39.2
>

2023-07-04 02:03:10

by Yin, Fengwei

Subject: Re: [PATCH v2] readahead: Correct the start and size in ondemand_readahead()



On 7/4/2023 2:49 AM, Mike Kravetz wrote:
> On 06/28/23 12:43, Yin Fengwei wrote:
>> The commit
>> 9425c591e06a ("page cache: fix page_cache_next/prev_miss off by one")
>> updated the page_cache_next_miss() to return the index beyond
>> range.
>>
>> But it breaks the start/size of ra in ondemand_readahead() because
>> the offset by one is accumulated to readahead_index. As a consequence,
>> not best readahead order is picked.
>>
>> Tracing of the order parameter of filemap_alloc_folio() showed:
>> page order : count distribution
>> 0 : 892073 | |
>> 1 : 0 | |
>> 2 : 65120457 |****************************************|
>> 3 : 32914005 |******************** |
>> 4 : 33020991 |******************** |
>> with 9425c591e06a9.
>>
>> With parent commit:
>> page order : count distribution
>> 0 : 3417288 |**** |
>> 1 : 0 | |
>> 2 : 877012 |* |
>> 3 : 288 | |
>> 4 : 5607522 |******* |
>> 5 : 29974228 |****************************************|
>>
>> Fix the issue by removing the offset by one when page_cache_next_miss()
>> returns no gaps in the range.
>>
>> After the fix:
>> page order : count distribution
>> 0 : 2598561 |*** |
>> 1 : 0 | |
>> 2 : 687739 | |
>> 3 : 288 | |
>> 4 : 207210 | |
>> 5 : 32628260 |****************************************|
>>
>
> Thank you for your detailed analysis!
>
> When the regression was initially discovered, I sent a patch to revert
> commit 9425c591e06a. Andrew has picked up this change. And, Andrew has
> also picked up this patch.
Oh. I didn't notice that you had sent a revert patch. My understanding
is that commit 9425c591e06a is a good change.

>
> I have not verified yet, but I suspect that this patch is going to cause
> a regression because it depends on the behavior of page_cache_next_miss
> in 9425c591e06a which has been reverted.
Yes. If 9425c591e06a is reverted, this patch could introduce a regression.
Which fix do you prefer: reverting 9425c591e06a, or this patch? Then we
can suggest that Andrew take it.


Regards
Yin, Fengwei

2023-07-05 17:09:27

by Mike Kravetz

Subject: Re: [PATCH v2] readahead: Correct the start and size in ondemand_readahead()

On 07/04/23 09:41, Yin, Fengwei wrote:
> On 7/4/2023 2:49 AM, Mike Kravetz wrote:
> > On 06/28/23 12:43, Yin Fengwei wrote:
> >
> > Thank you for your detailed analysis!
> >
> > When the regression was initially discovered, I sent a patch to revert
> > commit 9425c591e06a. Andrew has picked up this change. And, Andrew has
> > also picked up this patch.
> Oh. I didn't notice that you sent revert patch. My understanding is that
> commit 9425c591e06a is a good change.
>
> >
> > I have not verified yet, but I suspect that this patch is going to cause
> > a regression because it depends on the behavior of page_cache_next_miss
> > in 9425c591e06a which has been reverted.
> Yes. If the 9425c591e06a was reverted, this patch could introduce regression.
> Which fixing do you prefer? reverting 9425c591e06a or this patch? Then we
> can suggest to Andrew to take it.

For now, I suggest we go with the revert. Why?
- The revert is already going into stable trees.
- I may not be remembering correctly, but I seem to recall Matthew
mentioning plans to redo/redesign the page cache and possibly
readahead code. If this is the case, then better to keep the legacy
behavior for now. But, I am not sure if this is actually part of any
plan or work in progress.

--
Mike Kravetz

2023-07-06 02:00:08

by Yin, Fengwei

Subject: Re: [PATCH v2] readahead: Correct the start and size in ondemand_readahead()



On 7/6/23 00:52, Mike Kravetz wrote:
> On 07/04/23 09:41, Yin, Fengwei wrote:
>> On 7/4/2023 2:49 AM, Mike Kravetz wrote:
>>> On 06/28/23 12:43, Yin Fengwei wrote:
>>>
>>> Thank you for your detailed analysis!
>>>
>>> When the regression was initially discovered, I sent a patch to revert
>>> commit 9425c591e06a. Andrew has picked up this change. And, Andrew has
>>> also picked up this patch.
>> Oh. I didn't notice that you sent revert patch. My understanding is that
>> commit 9425c591e06a is a good change.
>>
>>>
>>> I have not verified yet, but I suspect that this patch is going to cause
>>> a regression because it depends on the behavior of page_cache_next_miss
>>> in 9425c591e06a which has been reverted.
>> Yes. If the 9425c591e06a was reverted, this patch could introduce regression.
>> Which fixing do you prefer? reverting 9425c591e06a or this patch? Then we
>> can suggest to Andrew to take it.
>
> For now, I suggest we go with the revert. Why?
> - The revert is already going into stable trees.
> - I may not be remembering correctly, but I seem to recall Matthew
> mentioning plans to redo/redesign the page cache and possibly
> readahead code. If this is the case, then better to keep the legacy
> behavior for now. But, I am not sure if this is actually part of any
> plan or work in progress.
>
That's fine with me, and thanks a lot for the detailed explanation.


Hi Andrew,
Could you please help to drop this patch? Thanks.


Regards
Yin, Fengwei