From: Zhaoyang Huang <[email protected]>
As the eg bellowing, using GFP_KERNEL could confuse the registered .releasepage
or .shrinker functions when called in kswapd and have them acting wrongly.Mask
__GFP_DIRECT_RECLAIM in kswapd.
eg,
kswapd
shrink_page_list
try_to_release_page
__fscache_maybe_release_page
...
if (!(gfp & __GFP_DIRECT_RECLAIM) || !(gfp & __GFP_FS)) {
fscache_stat(&fscache_n_store_vmscan_busy);
return false;
}
Signed-off-by: Zhaoyang Huang <[email protected]>
---
mm/vmscan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ef4a6dc..3b5c5e6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4083,7 +4083,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
bool boosted;
struct zone *zone;
struct scan_control sc = {
- .gfp_mask = GFP_KERNEL,
+ .gfp_mask = GFP_KERNEL & ~__GFP_DIRECT_RECLAIM,
.order = order,
.may_unmap = 1,
};
--
1.9.1
On Mon, 6 Dec 2021 11:19:22 +0800 Huangzhaoyang <[email protected]> wrote:
> From: Zhaoyang Huang <[email protected]>
>
> As the eg bellowing, using GFP_KERNEL could confuse the registered .releasepage
> or .shrinker functions when called in kswapd and have them acting wrongly.Mask
> __GFP_DIRECT_RECLAIM in kswapd.
>
> eg,
> kswapd
> shrink_page_list
> try_to_release_page
> __fscache_maybe_release_page
> ...
> if (!(gfp & __GFP_DIRECT_RECLAIM) || !(gfp & __GFP_FS)) {
> fscache_stat(&fscache_n_store_vmscan_busy);
> return false;
> }
Well, we have thus far been permitting kswapd's memory allocations to
enter direct reclaim. Forbidding that kernel-wide might be the right
thing to do, or might not be. But disabling it kernel-wide because of
a peculiar hack in fscache is not good justification.
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4083,7 +4083,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
> bool boosted;
> struct zone *zone;
> struct scan_control sc = {
> - .gfp_mask = GFP_KERNEL,
> + .gfp_mask = GFP_KERNEL & ~__GFP_DIRECT_RECLAIM,
> .order = order,
> .may_unmap = 1,
> };
Maybe hack the hack like this?
--- a/fs/fscache/page.c~a
+++ a/fs/fscache/page.c
@@ -126,8 +126,10 @@ page_busy:
* sleeping on memory allocation, so we may need to impose a timeout
* too. */
if (!(gfp & __GFP_DIRECT_RECLAIM) || !(gfp & __GFP_FS)) {
- fscache_stat(&fscache_n_store_vmscan_busy);
- return false;
+ if (!current_is_kswapd()) {
+ fscache_stat(&fscache_n_store_vmscan_busy);
+ return false;
+ }
}
fscache_stat(&fscache_n_store_vmscan_wait);
_
But please, do cc the fscache mailing list and maintainer when mucking
with these things.
On Tue, Dec 7, 2021 at 9:23 AM Andrew Morton <[email protected]> wrote:
>
> On Mon, 6 Dec 2021 11:19:22 +0800 Huangzhaoyang <[email protected]> wrote:
>
> > From: Zhaoyang Huang <[email protected]>
> >
> > As the eg bellowing, using GFP_KERNEL could confuse the registered .releasepage
> > or .shrinker functions when called in kswapd and have them acting wrongly.Mask
> > __GFP_DIRECT_RECLAIM in kswapd.
> >
> > eg,
> > kswapd
> > shrink_page_list
> > try_to_release_page
> > __fscache_maybe_release_page
> > ...
> > if (!(gfp & __GFP_DIRECT_RECLAIM) || !(gfp & __GFP_FS)) {
> > fscache_stat(&fscache_n_store_vmscan_busy);
> > return false;
> > }
>
> Well, we have thus far been permitting kswapd's memory allocations to
> enter direct reclaim. Forbidding that kernel-wide might be the right
> thing to do, or might not be. But disabling it kernel-wide because of
> a peculiar hack in fscache is not good justification.
By checking the whole path of kswapd reclaiming, I don't find any
steps need __GFP_DIRECT_RECLAIM but the hooked slab shrinker and fs's
releasepage functions. It doesn't make sense for kswapd be aware of
there is a concurrent direct reclaim.
>
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -4083,7 +4083,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
> > bool boosted;
> > struct zone *zone;
> > struct scan_control sc = {
> > - .gfp_mask = GFP_KERNEL,
> > + .gfp_mask = GFP_KERNEL & ~__GFP_DIRECT_RECLAIM,
> > .order = order,
> > .may_unmap = 1,
> > };
>
> Maybe hack the hack like this?
>
> --- a/fs/fscache/page.c~a
> +++ a/fs/fscache/page.c
> @@ -126,8 +126,10 @@ page_busy:
> * sleeping on memory allocation, so we may need to impose a timeout
> * too. */
> if (!(gfp & __GFP_DIRECT_RECLAIM) || !(gfp & __GFP_FS)) {
> - fscache_stat(&fscache_n_store_vmscan_busy);
> - return false;
> + if (!current_is_kswapd()) {
> + fscache_stat(&fscache_n_store_vmscan_busy);
> + return false;
> + }
> }
>
> fscache_stat(&fscache_n_store_vmscan_wait);
This method works. However, there are several other hook functions as
below using this flag for judging the context. IMHO,
__GFP_DIRECT_RECLAIM just only takes affection for two points. Have
page_alloc_slow_path judging if enter direct_reclaim and the reclaimer
tell the context.
eg.
xfs_qm_shrink_scan(
...
if ((sc->gfp_mask & (__GFP_FS|__GFP_DIRECT_RECLAIM)) !=
(__GFP_FS|__GFP_DIRECT_RECLAIM))
return 0;
static int ceph_releasepage(struct page *page, gfp_t gfp)
...
if (PageFsCache(page)) {
if (!(gfp & __GFP_DIRECT_RECLAIM) || !(gfp & __GFP_FS))
return 0;
static int afs_releasepage(struct page *page, gfp_t gfp_flags)
...
if (PageFsCache(page)) {
if (!(gfp_flags & __GFP_DIRECT_RECLAIM) ||
!(gfp_flags & __GFP_FS))
return false;
> _
>
> But please, do cc the fscache mailing list and maintainer when mucking
> with these things.
Andrew Morton <[email protected]> wrote:
> > __fscache_maybe_release_page
> > ...
> > if (!(gfp & __GFP_DIRECT_RECLAIM) || !(gfp & __GFP_FS)) {
> > fscache_stat(&fscache_n_store_vmscan_busy);
> > return false;
> > }
>
> Well, we have thus far been permitting kswapd's memory allocations to
> enter direct reclaim. Forbidding that kernel-wide might be the right
> thing to do, or might not be. But disabling it kernel-wide because of
> a peculiar hack in fscache is not good justification.
It's avoiding sleeping in ->releasepage() if fscache is doing something with
the page. With the old I/O still used by nfs and cifs, PG_fscache means that
the page is known to fscache and it might be doing something with it in the
background. You have to ask fscache to release the page, which may require
I/O to take place, to get rid of the mark.
With the new I/O, as used by 9p, afs and ceph, where we're doing async DIO
between the page and the cache, PG_fscache just means that there's a DIO write
in progress from the page. It will be cleared when the DIO completes.
I'm fine with changing the condition in the if-statement. Note that in my
fscache-rewrite branch:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-rewrite
I've been changing this to:
if (!gfpflags_allow_blocking(gfp) || !(gfp & __GFP_FS))
and the old I/O is gone. This is aimed at the next merge window. If you want
me to change it there, let me know.
David